Scaling Effectiveness with Docs - Finding Stale Docs

In a previous post, I argued that to help your team be effective, you need to have up-to-date docs, and to have this happen, you need some way of flagging stale documentation.

In this series, I show you how you can automate this process by creating a seed script, a check script, and then automating the check script. In today's post, let's develop the check script.

Breaking Down the Check Script

At a high level, our script will need to perform the following steps (see the sketch after this list):

  1. Specify the location to search.
  2. Find all the markdown files in the directory.
  3. Get the "Last Reviewed" line of text.
  4. Check if the date is more than 90 days in the past.
  5. If so, print the file to the screen.
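To see where we're headed, here's a rough sketch of how those steps will fit together (the helper functions don't exist yet; we'll build each one over the rest of this post):

// Sketch of the finished check script; each comment maps to a step above.
const directory = Deno.env.get("REPO_DIRECTORY")!; // 1. the location to search
const files = getMarkdownFilesFromDirectory(directory); // 2. find the markdown files
files
  .map((file) => ({ file, line: getLastReviewedLine(file) })) // 3. grab the "Last Reviewed" line
  .filter(({ line }) => doesFileNeedReview(line)) // 4. is the date more than 90 days in the past?
  .forEach(({ file }) => console.log(file)); // 5. print the file to the screen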

Specifying Location

Our script is going to search over our repository; however, I don't want the script to be responsible for cloning and cleaning up those files. Since the long-term plan is for our script to run through GitHub Actions, we can make the pipeline responsible for cloning the repo.

This means that our script will have to be told where to search and since it can't take in manual input, we're going to use an environment variable to tell the script where to search.

First, let's create a .env file that will store the path of the repository:

.env
REPO_DIRECTORY="ABSOLUTE PATH GOES HERE"

From there, we can start working on our script to have it use this environment variable.

index.ts
import { load } from "https://deno.land/std@0.195.0/dotenv/mod.ts";

await load({ export: true }); // this loads the env file into our environment

const directory = Deno.env.get("REPO_DIRECTORY");

if (!directory) {
  console.log("Couldn't retrieve the REPO_DIRECTORY value from environment.");
  Deno.exit();
}
console.log(directory);

If we run our Deno script with deno run --allow-read --allow-env ./index.ts, we should see the environment variable logged to the screen.

Finding all the Markdown Files

Now that we have a directory, we need a way to get all the markdown files from that location.

Doing some digging, I didn't find anything in Deno's core API for doing this, but building our own isn't too terrible.

By using Deno.readDir/Sync, we can get all the entries in the specified directory.

From here, we can then recurse into the other folders and get their markdown files as well.

Let's create a new file, utility.ts, and add a new function, getMarkdownFilesFromDirectory.

utility.ts
export function getMarkdownFilesFromDirectory(directory: string): string[] {
  // let's get all the files from the directory
  const allEntries: Deno.DirEntry[] = Array.from(Deno.readDirSync(directory));

  // Get all the markdown files in the current directory
  const markdownFiles = allEntries.filter(
    (x) => x.isFile && x.name.endsWith(".md")
  );
  // Find all the folders in the directory
  const folders = allEntries.filter(
    (x) => x.isDirectory && !x.name.startsWith(".")
  );
  // Recurse into each folder and get their markdown files
  const subFiles = folders.flatMap((x) =>
    getMarkdownFilesFromDirectory(`${directory}/${x.name}`)
  );
  // Return the markdown files in the current directory and the markdown files in the children directories
  return markdownFiles.map((x) => `${directory}/${x.name}`).concat(subFiles);
}
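As an aside, if you'd rather not hand-roll the recursion, the standard library's fs module ships a walk helper that can do the same job. A minimal sketch, assuming std@0.195.0's walkSync:

import { walkSync } from "https://deno.land/std@0.195.0/fs/walk.ts";

export function getMarkdownFilesViaWalk(directory: string): string[] {
  // walkSync recurses for us; exts filters to markdown files, and skip ignores dot-folders like .git
  return Array.from(
    walkSync(directory, { exts: [".md"], includeDirs: false, skip: [/\/\./] })
  ).map((entry) => entry.path);
}

Either approach works; I'll stick with the hand-rolled version for the rest of this post.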

With this function in place, we can update our index.ts script to be the following:

index.ts
import { load } from "https://deno.land/std@0.195.0/dotenv/mod.ts";
import { getMarkdownFilesFromDirectory } from "./utility.ts";

await load({ export: true }); // load the .env file into the environment

const directory = Deno.env.get("REPO_DIRECTORY");

if (!directory) {
  console.error("Couldn't retrieve the REPO_DIRECTORY value from the environment.");
  Deno.exit(1);
}

const files = getMarkdownFilesFromDirectory(directory);
console.log(files);

Running the script with deno run --allow-read --allow-env ./index.ts should print a list of all the markdown files to the screen.

Getting the Last Reviewed Text

Now that we have each file, we need a way to get its "Last Reviewed On" line of text.

Using Deno.readTextFile/Sync, we can get the file contents. From there, we can split the contents into lines and then find the last occurrence of Last Reviewed On.

Let's add a new function, getLastReviewedLine, to the utility.ts file.

utility.ts
export function getLastReviewedLine(fullPath: string): string {
  // Get the contents of the file, removing extra whitespace and blank lines
  const fileContent = Deno.readTextFileSync(fullPath).trim();

  // Convert the block of text to an array of strings
  const lines = fileContent.split("\n");

  // Find the last line that starts with Last Reviewed On
  const lastReviewed = lines.findLast((x) => x.startsWith("Last Reviewed On"));

  // If we found it, return the line, otherwise, return an empty string
  return lastReviewed ?? "";
}
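As a quick sanity check, here's how the function behaves (these file paths are hypothetical):

// assuming ./docs/onboarding.md ends with a "Last Reviewed On: 2023/08/12" footer
getLastReviewedLine("./docs/onboarding.md"); // "Last Reviewed On: 2023/08/12"

// assuming ./docs/unreviewed.md has no footer at all
getLastReviewedLine("./docs/unreviewed.md"); // ""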

Let's try this function out by modifying our index.ts file to display the files that have a Last Reviewed On line.

index.ts
import { load } from "https://deno.land/std@0.195.0/dotenv/mod.ts";
import {
  getMarkdownFilesFromDirectory,
  getLastReviewedLine,
} from "./utility.ts";

await load({ export: true }); // load the .env file into the environment

const directory = Deno.env.get("REPO_DIRECTORY");

if (!directory) {
  console.error("Couldn't retrieve the REPO_DIRECTORY value from the environment.");
  Deno.exit(1);
}

const files = getMarkdownFilesFromDirectory(directory);
files
  .filter((x) => getLastReviewedLine(x) !== "")
  .forEach((s) => console.log(s)); // print them to the screen

Determining If A Page Is Stale

At this point, we can get the "Last Reviewed On" line from a file, but we've got some more business rules to implement.

  • If there's a Last Reviewed On line, but there's no date, then the file needs to be reviewed.
  • If there's a Last Reviewed On line, but the date is invalid, then the file needs to be reviewed.
  • If there's a Last Reviewed On line, and the date is more than 90 days old, then the file needs to be reviewed.
  • Otherwise, the file doesn't need review.

We know from our filter logic that we're only going to be looking at lines that start with "Last Reviewed On", so now we need to extract the date.

Since we assume the line starts with Last Reviewed On:, we can strip that prefix to get the rest of the line. We're also going to assume that the date will be in YYYY/MM/DD format.

utility.ts
export function doesFileNeedReview(line: string): boolean {
  if (!line.startsWith("Last Reviewed On: ")) {
    return true;
  }
  const date = line.replace("Last Reviewed On: ", "").trim();
  const parsedDate = new Date(Date.parse(date));
  // An unparseable date yields an Invalid Date object, which is still truthy,
  // so we check getTime() for NaN instead of relying on truthiness
  if (isNaN(parsedDate.getTime())) {
    return true;
  }

  // We could use something like DayJS, but to keep libraries to a minimum, we can do the following
  const cutOffDate = new Date(new Date().setDate(new Date().getDate() - 90));

  return parsedDate < cutOffDate;
}
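Before wiring this in, here's roughly how the function behaves (assuming today's date is well past early 2020):

doesFileNeedReview("Some other line");              // true - missing the "Last Reviewed On: " prefix
doesFileNeedReview("Last Reviewed On: not a date"); // true - the date is invalid
doesFileNeedReview("Last Reviewed On: 2020/01/01"); // true - more than 90 days in the past
doesFileNeedReview("Last Reviewed On: 2099/01/01"); // false - within the last 90 days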

Let's update our index.ts file to use the new function.

index.ts
import { load } from "https://deno.land/std@0.195.0/dotenv/mod.ts";
import {
  getMarkdownFilesFromDirectory,
  getLastReviewedLine,
  doesFileNeedReview,
} from "./utility.ts";

await load({ export: true }); // load the .env file into the environment

const directory = Deno.env.get("REPO_DIRECTORY");

if (!directory) {
  console.error("Couldn't retrieve the REPO_DIRECTORY value from the environment.");
  Deno.exit(1);
}

getMarkdownFilesFromDirectory(directory)
  .map((file) => ({ file, line: getLastReviewedLine(file) })) // pair each file with its footer line
  .filter(({ line }) => line !== "")
  .filter(({ line }) => doesFileNeedReview(line)) // pass the line (not the file path) to the check
  .forEach(({ file }) => console.log(file)); // print them to the screen

And just like that, we're able to print stale docs to the screen. At this point, you could create a scheduled batch job and start using this script.
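For example, a hypothetical cron entry (assuming a Linux box with Deno installed and the script's folder containing your .env file) could look like this:

# run the stale-doc check every Monday at 9am (the path is a placeholder)
0 9 * * 1 cd /path/to/doc-checker && deno run --allow-read --allow-env ./index.ts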

However, if you wanted to share this with others (and have this run not on your box), then stay tuned for the final post in this series where we put this into a GitHub Action and post a message to Slack!

Scaling Effectiveness with Docs - Seeding Dates

In a previous article, I argued that to help your team be effective, you need to have up-to-date docs, and to have this happen, you need some way of flagging stale documentation.

This process lends itself to being easily automated, so in this series of posts, we'll build out the necessary scripts to check for docs that haven't been reviewed in the last 90 days.

All code used in this post can be found on my GitHub.

Approach

To make this happen, we'll need to create the following:

  1. A seed script that will add a Last Reviewed Date to all of our pages.
  2. A check script that will check files for the Last Reviewed Date, returning which ones are either missing a date or are older than 90 days.
  3. A scheduled job, using GitHub Actions, that runs our check script and posts a message to our Slack channel.

For this post, we'll be creating the seed script.

Breaking Down the Seed Script

For this script to work, we need to be able to do three things:

  1. Determine the last commit date for a file.
  2. Add text to the end of the file.
  3. Get a list of files in a directory.

To determine the last commit date for a file, we can leverage git and its log command (more on this in a moment). Since we're mainly doing file manipulation, we could use Deno here, but it makes much more sense to me to use something like bash or PowerShell.

Determining the Last Commit Date For a File

To make this automation work, we need a date for the Last Reviewed On footer. You don't want to set every file to the same date, though, because they'd all come up for review in one big batch.

So, you're going to want to stagger the dates. You can do this by generating random dates, but honestly, getting the last commit date should be "good" enough.

To do this, we can take advantage of git's log command with the --pretty flag.

We can test this out by using the following script.

file=YourFileHere.md
commitDate=$(git log -n 1 --pretty=format:%aI -- $file)
## formatting date to YYYY/MM/DD
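## Note: date -d is GNU date syntax; on macOS, you may need gdate from coreutils instead.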
formattedDate=$(date -d "$commitDate" "+%Y/%m/%d")
echo $formattedDate

Assuming the file has been checked into Git, we should get the date back in a YYYY/MM/DD format. Success!

Adding Text to End of File

Now that we have a way to get the date, we need to add some text to the end of the file. Since we're working in markdown, we can use --- to denote a footer and then place our text.

Since we're going to be appending multiple lines of text, we can use the cat command with here-docs.

file=YourFileHere.md
## Note the blank lines, this is to make sure that the footer is separated from the text in the file
## Note: The closing EOF has to be on its own line with no whitespace or other characters in front of it.
cat << EOF >> $file


---
Last Reviewed On: 2023/08/12
EOF

After running this script, we'll see that the file has appended blank lines and our new footer.

Combining Into a New Script

Now that we have both of these steps figured out, we can combine them into a single script like the following:

file=YourFileHere.md
commitDate=$(git log -n 1 --pretty=format:%aI -- $file)
## formatting date to YYYY/MM/DD
formattedDate=$(date -d "$commitDate" "+%Y/%m/%d")
## Note the blank lines, this is to make sure that the footer is separated from the text in the file
## Note: The closing EOF has to be on its own line with no whitespace or other characters in front of it.
cat << EOF >> $file


---
Last Reviewed On: $formattedDate
EOF

Nice! Given a file, we can figure out its last commit date and append it to the file. Let's make this more powerful by not having to hardcode a file name.

Finding Files In a Directory

At this point, we can update a file, but the file name is hardcoded. We're going to have a lot of docs to review, and we don't want to do this manually, so let's figure out how to get all the markdown files in a directory.

For this exercise, we can use the find command. In our case, we need to find all the files with a .md extension, no matter what directory they're in.

directory=DirectoryPathGoesHere
find $directory -name "*.md" -type f

We're going to need to process each of these files, so some type of iteration would be helpful. Doing some digging, I found that Bash supports a for loop, so let's use that.

directory=DirectoryPathGoesHere
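## Note: looping over find's output like this assumes the file paths contain no spaces.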
for file in $(find $directory -name "*.md" -type f); do
  echo "printing $file"
done

If everything works, we should see each markdown file name being printed.

When a Plan Comes Together

We've got all the pieces, so let's bring this together:

directory=DirectoryPathGoesHere
for file in $(find $directory -name "*.md" -type f); do
  commitDate=$(git log -n 1 --pretty=format:%aI -- $file)
  # formatting date to YYYY/MM/DD
  formattedDate=$(date -d "$commitDate" "+%Y/%m/%d")
  # Note the blank lines, this is to make sure that the footer is separated from the text in the file
  # Note: The closing EOF has to be on its own line with no whitespace or other characters in front of it.
  cat << EOF >> $file


---
Last Reviewed On: $formattedDate
EOF
done

Bells and Whistles

This script works, and we could ship it; however, it's a bit rough.

For example, the script assumes that it's in the same directory as your git repository. It also assumes that your repository is up-to-date and that it's safe to make changes on the current branch.

Let's make our script a bit more durable by making the following tweaks:

  1. Clone the repo to a new temp directory.
  2. Create a new branch for making changes.
  3. Commit changes and publish the branch.

Getting the latest version of the repo

For this step, let's add logic for creating a new temp directory and adding a call to git clone.

## see https://unix.stackexchange.com/questions/30091/fix-or-alternative-for-mktemp-in-os-x#answer-84980
## for why tmpDir is being created this way
docRepo="RepoUrlGoesHere"
tmpDir=$(mktemp -d 2>/dev/null || mktemp -d -t 'docSeed')
cd $tmpDir
echo "Cloning from $docRepo"
## Note the . here, this allows us to clone to the temp folder and not to a new folder of repo name
git clone "$docRepo" . &> /dev/null

Making a new branch and pushing changes

Now that we've got the repo, we can add the steps for switching branches, committing changes, and publishing the branch.

## ... code to clone repository
git switch -c 'adding-seed-dates'
## ... code to make file changes
git add --all
git commit -m "Adding seed dates"
git push -u origin adding-seed-dates

Final Script

Let's take a look at our final script:

#!/bin/bash
docRepo="RepoUrlGoesHere"
tmpDir=$(mktemp -d 2>/dev/null || mktemp -d -t 'docSeed')
cd $tmpDir

echo "Cloning from $docRepo"
git clone "$docRepo" . &> /dev/null

git switch -c 'adding-dates-to-files'

for file in $(find . -name "*.md" -type f); do
  echo "updating $file"
  commitDate="$(git log -n 1 --pretty=format:%aI -- $file)"
  formattedDate=$(date -d "$commitDate" "+%Y/%m/%d")
  cat << EOT >> $file


---
Last Reviewed On: $formattedDate
EOT
done
git add --all
git commit -m "Adding initial dates"
git push -u origin adding-dates-to-files

Wrapping Up

In this post, we wrote a bash script to clone our docs and add a new footer to every page with the file's last commit date. In the next post, we'll build the script that checks for stale files.

Scaling Effectiveness with Docs

As a leader, I'm always looking for ways to help my team be more efficient. To me, an efficient team is self-sufficient, able to find the information it needs to solve its problems.

I've found that having up-to-date documentation is critical for a team because it scales out knowledge asynchronously, removing the need for manual knowledge transfers.

For example, my team has a wiki that contains information for onboarding into our space, how to complete certain processes (requesting time off, resetting a password), how to run our Agile activities, and our support guidebook. At any point, if someone on the team doesn't know how to do something, they can consult the wiki and find the necessary information.

Docs. Why Did It Have to Be Docs?

I enjoy up-to-date documentation, but the main problem with docs is that they capture the state of the world as it was when they were written; they don't react to changes. If the process for resetting your password changes, the documentation doesn't auto-update. So unless you're spending time reviewing the docs, they'll grow stale and become worthless, or even worse, mislead others into doing the wrong things.

A good mental model for documentation is to think of them as a garden. When planted, it looks great, and everyone enjoys the environment. Over time, weeds will grow, and plants will become overgrown, causing the garden to be less attractive. Eventually, people will stop visiting, and the garden will go into disrepair. To prevent this, we must take care of the garden, removing the weeds and trimming the plants.

(Photo of an outdoor green space by Robin Wersich via Unsplash.com)

Alright, I get it, documentation is important, but my team has commitments, so how do we carve out time to review?

Cameron Learns About Document Control

I started my career in healthcare, and one of my first jobs was writing software for a medical diagnostic device. We were ISO 9001 certified, and the device was considered Class II by the FDA. Long story short, this meant that we had to provide documentation for our device and software and also show that we were keeping things up to date.

To comply, we would find docs that hadn't been updated in a specific time period (like 90 days) and review them. If everything checked out, we'd bump up the review date. Otherwise, we'd make the necessary changes and revalidate the document.

At the time, all of our files were in Word, so searching them wasn't easy (I recall we had Outlook reminders, but this was many moons ago).

Baking this into our process helped make our work more visible, which, in turn, gave us a better idea of the team's capacity for the sprint.

Thankfully, we have better technology than Word for sharing information, so how can we take this approach and bring it up to the modern day?

Modern Take on an Old Classic

First, I think that having your docs in source control is a great idea. If you're using tools like Git, you already have a way of leaving comments and keeping track of approvals through pull requests.

To make the most of Git, you should keep your docs in plaintext, as it's easy to see the differences. I enjoy using Markdown, and tools like MkDocs make this workflow possible.

With this figured out, our next step is to know when the file was last reviewed. We can do that by adding a new line to the bottom of each file, Last Reviewed On: YYYY/MM/DD. To come up with the initial date, we could use the last time the file was modified (thanks git log!).
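For example, the bottom of a page would end up looking like this (date illustrative):

---
Last Reviewed On: 2023/08/12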

At this point, we have a way to see the last time a file was reviewed. The next step is to write a script that can find files that haven't been reviewed in the last 90 days. At a high level, we'd do the following:

  1. Get the latest version of the doc repository.
  2. Get all the markdown files for the repository.
  3. Get the last line of the file.
  4. If the line doesn't start with Last Reviewed On:, we flag it for review as it's never been reviewed.
  5. If the line has a date, but it's older than 90 days, we flag it for review as it might be stale.
  6. Print all flagged files to the screen.

With the script created, we could manually run it on Mondays. But we're technical, right? Why not create a scheduled task to execute the script instead? This removes a manual task and gives us visibility into which docs need review.

Wrapping Up

When scaling your knowledge out, having great documentation is necessary, as it allows your team to self-serve and work in a more asynchronous manner. The main problem with documentation is that it captures the state of the world when it was written, but it doesn't automatically update when the world changes.

Therefore, we need to have some process to flag and review stale docs. To ensure it gets done, we provide visibility by creating work items and committing to them during the sprint.

Five Minutes at Five Guys - When Metrics Conflict with UX

In a recent post, I spoke about the flaw of using a single metric to tell the story and how Goodhart's Law tells us that once a measure becomes a target, it ceases to be a good measure.

Let's look at a real-world example with the popular fast food chain, Five Guys.

All I Wanted Was a Burger

Five Guys is known for making good burgers and delivering a mountain of piping hot fries as part of your order. Seriously, an order of small fries is a mountain of spuds. Five Guys makes their fries to order, so they're not sitting around under a heat lamp.

Yo Dawg, I heard you wanted fries, so I put fries in your fries

This approach works great when ordering in person, but what happens if you order online? The process is essentially the same: the crew works on the burgers but won't start the fries until you're at the restaurant, guaranteeing that you always get freshly made fries.

At this point, it's clear that receiving a mountain of hot, cooked-to-order fries is part of the experience and what customers expect, right?

Keeping Track - My Task Tracking Approach

When it comes to keeping track of things to do, I recall an ill-fated attempt at using a planner. My middle school required students to use one to keep track of due dates (and, weirdly enough, as a hall pass to go to the bathroom).

Looking back, the intent was to have the students be more organized, but that wasn't what I learned. I found it cumbersome and a pain to keep track of. Also, you had to pay to replace it if it was lost or stolen.

What I learned to do instead was to keep track of everything I needed to do in my memory, and if I forgot, well, I had to pay the penalty.

I recall seeing my peers in high school and college be much more organized, and they made it so simple. Just color code these things, add these other things to a book and highlight these things.

I didn't realize that my peers had developed a system for studying and keeping track of what they needed to do. Since I didn't know what it was called and felt awkward admitting I didn't know what it was, I would continue relying on my memory to get things done. However, this approach doesn't scale and is prone to having tasks drop from the list.

When I started working at my second professional job, I found my boss to be organized and meticulous, and he never let anything slip. I learned a ton from him about process improvement and was introduced to a Kanban board for the first time.

As an engineer, I would use a version of his approach for years, but when I got into leadership, I felt that I needed a better system. As an individual contributor, I could rely on the task board for what I needed to do, but that approach doesn't work for a leader because not all of your tasks are timely or fit in a neat Jira ticket.

Why a System?

Why do we need a system at all? Isn't memory good enough? The problem is that the human mind is fantastic at problem-solving but isn't great when it comes to recollection. In fact, multiple studies (like this one or this one) have shown that the more stressed you are, the worse your memory can become.

With this context, you need to have some system to get the tasks out of your head and stored elsewhere. Whether that's physical sticky notes in your office, a notebook that you use, or some other tooling, I don't particularly care, but you do need something.

My Approach

I'm loosely inspired by the Getting Things Done approach to task completion, which I've implemented as a Trello board. Having an online tool works for me because I can access it anywhere on my phone (no need to carry a notebook or other materials).

Another side effect of having an online tool is that at any point I have an idea or a task that I need to do, I can add it to my Trello board in two clicks. No more worries about remembering to add the task when I'm back home or in the office, which allows me to not stress about it.

Work Intake Process

On my Trello board (which you can copy a template from here), all tasks end up in the first column, called Inbox. The inbox is the landing spot for anything and everything. Throughout the day, I process the list and move each item to the appropriate column.

  • Is it a task that I can knock out in 5 minutes or less? Just do it!
  • Is it a task that will take more than 5 minutes? Then I move it into the To Do column
  • Is it a task that I might be interested in? Is it a bigger task that I need to think more about? Then that goes into the Some Day column
  • If the task is no longer needed, then it gets deleted.

Deciding What To Do Next

Once the inbox is emptied, I look at the items in the To Do column and pick the most important one. However, determining the most important one is not always easy.

For this, I leverage the Eisenhower Matrix approach.

Named after Dwight D. Eisenhower, the matrix has two axes, one labeled Important and the other labeled Urgent. With these labels, tasks fall into four buckets:

  • Urgent and Important - (e.g., production broken, everything is on fire)
  • Urgent and Not Important - (e.g., last minute request, something that needs to be done, but not necessarily by you)
  • Not Urgent and Important - (e.g., strategic work, things that need to get done, but not necessarily this moment)
  • Not Urgent and Not Important - (e.g., time wasters, delete these tasks)
(Figure: the Eisenhower Matrix's four quadrants: Urgent & Important, Urgent & Not Important, Not Urgent & Important, and Not Urgent & Not Important. Source: Time management. (2023, March 7). In Wikipedia. https://en.wikipedia.org/wiki/Time_management)

Dealing with Roadblocks

In an ideal world, you could take an item and run it to completion, but things aren't always that easy. You might need help from another person or are waiting for someone to do their part.

When this happens, I'll move the item to the Waiting column and pick up a new task as I don't like to be stalled.

However, I keep an eye on the number of items in flight; I've found that with more than three, I struggle to make progress and spend my time context-switching between the items instead of completing work. It can be especially challenging if the tasks are wholly unrelated (development tasks, writing, and reviewing pull requests), as the cost of regaining context feels higher than when the tasks are related (e.g., reviewing multiple pull requests for the same repository).

Getting Things Done

As items get completed, I add them to the Done column for the week. To help keep track of what I got done, I typically name the Done column after the week it spans (e.g., Apr 17-23, 2023). Once the week ends, I can refer back to the column, see where I spent my time, and reflect on whether I made the right choices for the week.

Finally, I'll archive the list, create a new column for the next week, and repeat.