Searching for and navigating Git commits

I sometimes encounter code that puzzles me. When that happens, I try to find the commit that added it. Perhaps the code is that way because of a bug fix that's not obvious at a glance, or maybe there's some constraint I'm not aware of. Either way, more context is needed.

The obvious solution is to use Git blame to view the commit (and associated pull request) that added the code.

The pull request often has a description (or a link to an issue with one) that clarifies the change, or discussions from code review (those are super valuable!). When that fails, the commit itself frequently contains related changes that provide context to the code.

But it's not always that straightforward. Sometimes the commit that Git blame points to is not the change that introduced the behavior, but something like a refactor or formatting change. I used to solve this by repeatedly running git blame and reading diff after diff, but that could become terribly laborious and time-consuming.

I recently encountered a particularly tough case where I got fed up and decided to find a better way. In this post, I'll share the tools I found for effectively searching for and navigating through Git commits.

Running Git blame on a piece of code

Let's say we have a mysterious piece of code whose intent is not clear:

const { MPP_ACTIVE } = process.env;
if (MPP_ACTIVE === "true") {
  doSomeFunkyStuff();
}

What does "MPP" stand for? And what does it mean for it to be active? Let's use git blame to see the commit that added this code:

git blame example.js -s -L 7,9
1edd8004 7) if (MPP_ACTIVE === "true") {
c457405a 8)   doSomeFunkyStuff();
c457405a 9) }

Note: The -s option strips author and date information. -L selects specific lines.

Let's look at the diff of 1edd8004, the commit that last touched line 7 and with it the MPP_ACTIVE check.

Hmm, no, that's not it: that commit just made the comparison more specific. We need to go further back to find the change that introduced the if statement itself.

To do that, we might repeat the process and run git blame again on the prior version of the file. Let's do that and see what we get.
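Mechanically, one round of this means running git blame at the parent of the commit it just reported (the `<commit>^` revision). Here's a minimal sketch, assuming bash and git are installed. It builds a throwaway repo whose file name, contents, and commit messages are made up to mirror the example, and it assumes line 1 stays line 1 between revisions (in a real file, lines often shift, so you may need to adjust -L each round).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repo; file names, contents, and commit messages are made up.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

printf '%s\n' 'if (MPP_ACTIVE == "true") {' '  doSomeFunkyStuff();' '}' > example.js
git add example.js
git commit -qm "add MPP check"

printf '%s\n' 'if (MPP_ACTIVE === "true") {' '  doSomeFunkyStuff();' '}' > example.js
git commit -qam "tighten comparison"

# Hash of the commit that last touched line 1 as of the given revision.
blame_sha() {
  git blame --porcelain -L 1,1 "$1" -- example.js | awk 'NR == 1 { print $1 }'
}
# Subject of that commit, read from the porcelain "summary" header.
blame_subject() {
  git blame --porcelain -L 1,1 "$1" -- example.js | sed -n 's/^summary //p'
}

latest=$(blame_subject HEAD)
sha=$(blame_sha HEAD)
# Blame again at the parent of that commit to look past it.
previous=$(blame_subject "$sha^")
echo "blame at HEAD:  $latest"
echo "blame at $sha^: $previous"
```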

What we find is a tweak...

...a refactoring change...

...and a change just moving things around.

We don't care about these changes. What we care about is the commit where the MPP_ACTIVE condition was introduced. Ideally, we'd be able to search for commits that mention MPP_ACTIVE and just look at the earliest one.

That is exactly what git log -S lets us do.

Searching for commits by code

By default, git log lists every commit in your branch. The -S option lets us pass a string used to filter out commits whose diff doesn't include that specific string.

# Show commits that include "getUser" in the diff
git log -S "getUser"

Note: The string passed to -S is case-sensitive. getUser will not match GetUser.
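Here's a small sketch of that filtering, assuming bash and git are installed. It builds a throwaway repo with two made-up commits. Only the commit whose diff adds getUser is reported; the second commit's diff shows the getUser line only as unchanged context, which doesn't count. Searching for GetUser finds nothing.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repo; file names, contents, and commit messages are made up.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

printf 'const user = getUser(id);\n' > app.js
git add app.js
git commit -qm "fetch the user"

printf 'const user = getUser(id);\nrender(user);\n' > app.js
git commit -qam "render the user"

# -S takes a literal, case-sensitive string.
exact=$(git log -S "getUser" --format=%s)
wrong_case=$(git log -S "GetUser" --format=%s)
echo "getUser -> $exact"
echo "GetUser -> ${wrong_case:-<no commits>}"
```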

More specifically, -S matches commits that change the number of times the string appears in a file, which in practice means code containing it was added or deleted. If no file's count changes, the commit is not included in the output.

As a mental model, you can imagine the -S option being implemented like so:

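// Hypothetical helper: count non-overlapping occurrences of str in text
const countOccurrences = (text, str) => text.split(str).length - 1;

if (typeof args.S === "string") {
  // Keep a commit only if some file's occurrence count of the string changed
  commits = commits.filter(commit =>
    commit.files.some(file =>
      countOccurrences(file.before, args.S) !==
      countOccurrences(file.after, args.S)
    )
  );
}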

It's worth emphasising that the string we're searching for needs to have been added or deleted, not just moved, for the commit to be included. Moving lines that contain the search string around within a file doesn't change how many times it appears, so it doesn't constitute a match. This filters out "just moving things around" commits that would otherwise add noise.
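To check this behavior yourself, here's a sketch assuming bash and git are installed. It builds a throwaway repo whose made-up history mirrors the MPP_ACTIVE example: one commit adds the check, one only tightens == to ===, and one only moves a line. git log -S reports just the commit that added the string, because the other two leave the number of MPP_ACTIVE occurrences unchanged.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repo; file names, contents, and commit messages are made up.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

# Commit 1: no MPP_ACTIVE yet.
printf 'startServer();\n' > example.js
git add example.js
git commit -qm "initial"

# Commit 2: add the check (MPP_ACTIVE occurrences: 0 -> 2).
cat > example.js <<'EOF'
const { MPP_ACTIVE } = process.env;
if (MPP_ACTIVE == "true") {
  doSomeFunkyStuff();
}
startServer();
EOF
git commit -qam "add MPP check"

# Commit 3: tweak the comparison (occurrences stay at 2).
cat > example.js <<'EOF'
const { MPP_ACTIVE } = process.env;
if (MPP_ACTIVE === "true") {
  doSomeFunkyStuff();
}
startServer();
EOF
git commit -qam "tighten comparison"

# Commit 4: move a line around (occurrences still 2).
cat > example.js <<'EOF'
startServer();
const { MPP_ACTIVE } = process.env;
if (MPP_ACTIVE === "true") {
  doSomeFunkyStuff();
}
EOF
git commit -qam "move startup call"

total=$(git rev-list --count HEAD)
matches=$(git log -S "MPP_ACTIVE" --format=%s)
echo "commits in history: $total"
echo "commits matched by -S: $matches"
```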

Let's try running git log -S with "MPP_ACTIVE" and see what we get:

git log -S "MPP_ACTIVE"
commit 33a8b6ea050963e452b1d16165f64a77df3ff054
Author: johndoe42 <[email protected]>
Date:   Tue Sept 14 14:05:52 2022 +0000

    refactor

commit 8fed03eadf2afc5efe91ddf0cf7a7837c8b680fe
Author: aliceb76 <[email protected]>
Date:   Mon Jan 19 10:44:19 2021 +0000

    do funky stuff if MPP_ACTIVE is set

I find the default output format far too verbose, so I almost always use the --oneline flag to compact the output:

git log -S "MPP_ACTIVE" --oneline
33a8b6e refactor
8fed03e do funky stuff if MPP_ACTIVE is set

The commits returned from git log are ordered from newest to oldest, which means that 8fed03e is the first commit in the codebase that mentioned MPP_ACTIVE. It turns out that 8fed03e is exactly the commit we're looking for!

Let's move past this toy example and try git log -S on a larger codebase. I'll use the Next.js codebase as an example and try finding the commit that implemented a specific feature.

Using git log -S in larger codebases

The vercel/next.js codebase has over 25,000 commits added over 8 years, so it's a fairly large and mature codebase!

Next.js has a distDir option that allows the user to specify a custom build directory. Let's try finding the commit that added this option.

As a first step, let's run git log -S with "distDir" and see what we get:

git log -S "distDir" --oneline
5c1828bdd6 Handle source map requests for Turbopack client chunks (#71168)
d8c0539b08 fix: allow custom app routes for metadata conventions (#71153)
490704430b Add source map support for [...] in the browser (#71042)
13f8fcbb6b [...] Implement support for webpack’s `stats.json` (#70996)
3b9889e1d8 [Turbopack] add new backend (#69667)
...over 400 more commits

Hmm... these are all very recent commits. As we touched on earlier, git log orders commits from newest to oldest by default. Since we want to find the earliest mentions of distDir, we can use the handy --reverse flag to get the oldest commits first:

git log -S "distDir" --oneline --reverse
acc1983f80 Don't delete `.next` folder before a replacement is built (#1139)
141ab99888 build on tmp dir (#1150)
9347c8bdd0 Specify a different build directory for #1513 (#1599)
8d2bbf940d Refactor the build server to remove tie to fs (#1656)
dec85fe6c4 Add CDN support with assetPrefix (#1700)
...over 400 more commits

Nice! This gives us the first commits mentioning distDir, though it's not necessarily obvious which one we care about.

We could look through the diffs to figure that out, but that would be a lot of work. Let's instead explore some tools that we can use to analyze these commits at a high level so that we can quickly figure out which commits we care about.
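The ordering behavior can be checked in a throwaway repo, too. This sketch assumes bash and git are installed; the files and commit messages are made up. git log -S lists matches newest first, --reverse flips that, and the first line of the reversed output is the oldest match.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repo; file names, contents, and commit messages are made up.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

printf "const distDir = 'out';\n" > build.js
git add build.js
git commit -qm "add distDir variable"

printf "module.exports = { distDir: 'build' };\n" > next.config.js
git add next.config.js
git commit -qm "expose distDir option"

printf 'node_modules\n' > .gitignore
git add .gitignore
git commit -qm "ignore node_modules"

newest_first=$(git log -S "distDir" --format=%s)
oldest_first=$(git log -S "distDir" --format=%s --reverse)
# The first line of the reversed list is the oldest matching commit.
oldest=$(printf '%s\n' "$oldest_first" | sed -n '1p')
printf 'newest first:\n%s\n\noldest first:\n%s\n' "$newest_first" "$oldest_first"
```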

Commits at a glance

A quick way to get a feel for a commit is to view the files that it touched, which we can do via git show <commit> --stat. Let's try that on the first commit in the list:

git show acc1983f80 --stat --oneline
acc1983f80 Don't delete `.next` folder before a replacement is built (#1139)
.gitignore | 3 ++-
server/build/clean.js | 4 ++--
server/build/gzip.js | 4 ++--
server/build/index.js | 19 +++++++++++--------
server/build/replace.js | 18 ++++++++++++++++++
server/build/webpack.js | 4 ++--
server/hot-reloader.js | 2 +-
7 files changed, 38 insertions(+), 16 deletions(-)

Note: The --oneline option works the same as in git log, compacting the commit log.

show --stat gives us a great overview of the files that the commit touches, and to what extent.

Still, we can narrow this down even further with the -S option. Using -S "distDir" in conjunction with show --stat shows us only the touched files whose diff includes distDir:

git show acc1983f80 --stat -S "distDir" --oneline
acc1983f80 Don't delete `.next` folder before a replacement is built (#1139)
server/build/replace.js | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

That certainly narrows it down! Let's view the diff for that specific file via show <commit> -- <file>:

git show acc1983f80 -- server/build/replace.js
+++ b/server/build/replace.js
@@ -0,0 +1,18 @@
+
+ const distDir = path.resolve(dir, distFolder);
+ const buildDir = path.resolve(dir, buildFolder);
+

I've shortened the output for clarity.

Hmm, distDir is just a local variable name in this commit. Let's keep looking.
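The git show variations used so far can be tried on a throwaway repo as well. This sketch assumes bash and git are installed; the paths merely echo the Next.js example and their contents are made up. One commit touches two files, but only one of them gains distDir, so adding -S to git show drops the other file from the listing.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repo; paths, contents, and commit messages are made up.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

mkdir -p server/build
printf 'module.exports = gzip;\n' > server/build/gzip.js
printf 'module.exports = replace;\n' > server/build/replace.js
git add server
git commit -qm "initial"

# One commit touching two files; only replace.js gains "distDir".
printf '// gzip the output\nmodule.exports = gzip;\n' > server/build/gzip.js
printf "const distDir = path.resolve(dir, distFolder);\nmodule.exports = replace;\n" > server/build/replace.js
git commit -qam "build into a temporary folder"

# Every touched file, then only files whose diff adds or removes "distDir".
git --no-pager show HEAD --stat --oneline
git --no-pager show HEAD --stat -S "distDir" --oneline

# The same filtering as bare file names (blank lines removed).
all_files=$(git show HEAD --format= --name-only | sed '/^$/d')
distdir_files=$(git show HEAD --format= --name-only -S "distDir" | sed '/^$/d')

# The full diff for a single file.
git --no-pager show HEAD -- server/build/replace.js
```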

The next commit of interest seems to be 9347c8bdd0, which talks about specifying a build directory:

git log -S "distDir" --oneline --reverse
acc1983f80 Don't delete `.next` folder before a replacement is built (#1139)
141ab99888 build on tmp dir (#1150)
9347c8bdd0 Specify a different build directory for #1513 (#1599)
...

As a first step, let's look at a summary of the changes in 9347c8bdd0 via show --stat:

git show 9347c8bdd0 --stat --oneline
9347c8bdd0 Specify a different build directory for #1513 (#1599)
.gitignore | 2 ++
bin/next-build | 5 +++--
bin/next-dev | 5 +++--
bin/next-start | 9 ++++++---
readme.md | 13 ++++++++++++-
server/build/clean.js | 4 +++-
...9 more files
15 files changed, 128 insertions(+), 34 deletions(-)

We can shorten this by only showing files whose diff includes distDir via the -S option:

git show 9347c8bdd0 --stat -S "distDir" --oneline
9347c8bdd0 Specify a different build directory for #1513 (#1599)
bin/next-start | 9 ++++++---
readme.md | 13 ++++++++++++-
server/build/clean.js | 4 +++-
server/build/index.js | 19 ++++++++++++-------
server/build/replace.js | 9 ++++++---
server/build/webpack.js | 2 +-
...4 more files
10 files changed, 68 insertions(+), 30 deletions(-)

This looks promising! Let's start looking at some diffs to see if this is the commit we're looking for. We can do that in two ways:

  1. Look at the diff for a specific file via show <commit> -- <file>, or
  2. look at diffs for all files that mention distDir via show <commit> -S <code>.

Since we don't know which file to look at, let's use the latter option and browse through files mentioning distDir. After a bit of scrolling, this addition to readme.md crops up:

git show 9347c8bdd0 -S "distDir" --oneline
+++ b/readme.md
@@ -644,6 +644,17 @@
+ #### Setting a custom build directory
+
+ You can specify a name to use for a custom build directory. For
+ example, the following config will create a `build` folder instead
+ of a `.next` folder. If no configuration is specified then next
+ will create a `.next` folder.
+
+ ```javascript
+ // next.config.js
+ module.exports = {
+ distDir: 'build'
+ }
+ ```

Looks like 9347c8bdd0 was the commit that added the distDir option! Opening the commit on GitHub shows us the associated pull request, which also links to the issue requesting the feature.

The issue provides us with the original motive for adding distDir as an option:

I am trying to deploy next to Firebase functions, and it looks like the .next build directory is ignored by Firebase CLI.

Firebase CLI seems to ignore all hidden files, so I want to use a differently named directory.

The PR also contains a design decision where the option was renamed from options.dist to distDir.

It didn't take us long to track down the commit that added the distDir option!

Effective use of -S

Our usage of the -S option was quite simplistic: we were just looking for a single term. But that term was distinct enough that we got useful results.

How useful the results of -S are depends heavily on how distinct the search term is. So when searching for common terms, it can help to make your query more distinct. For example:

  • Given a function called createContext, you could use -S "createContext(" to find invocations of that function.
  • To find code referencing a property called numInstances, you could search for -S ".numInstances".
  • If you have a React component called SmallNote, you could look for usage of that component via -S "<SmallNote".

I've found it quite useful to try multiple surrounding syntaxes. For example, if looking for a property called foo, I might try the following:

  • -S ".foo" to find property accesses,
  • -S "foo =" to find variable assignments,
  • -S "foo: " to find object literal assignments,
  • -S "foo," for passing as an argument, or assignments using the shorthand syntax, or destructuring assignments,
  • -S " foo " to find standalone destructuring assignments.
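Here's a sketch of why a more distinct query helps, assuming bash and git are installed and using a throwaway repo with made-up code. A bare foo also matches a commit that only adds a variable called food, while .foo narrows the results to the property access. Since -S treats its argument as a literal string, the dot needs no escaping.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repo; file names, contents, and commit messages are made up.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

printf 'const food = 0;\n' > app.js
git add app.js
git commit -qm "add food counter"

printf 'const food = 0;\nconsole.log(config.foo);\n' > app.js
git commit -qam "read foo property"

broad=$(git log -S "foo" --format=%s)
narrow=$(git log -S ".foo" --format=%s)
echo 'Matches for -S "foo":'
echo "$broad"
echo 'Matches for -S ".foo":'
echo "$narrow"
```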

For example, when we were looking for distDir in the Next.js codebase, searching for ".distDir" or "distDir:" would have returned the commit we were looking for as the first commit.

You can also search for entire lines of code:

git log -S '[key, str] = part.split("=").map(s => s.trim());'

If you try searching for multiple lines of code using -S, keep in mind that the -S option is sensitive to whitespace.
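As a sketch, assuming bash (for its $'...' quoting, which turns \n into a real newline) and git, in a throwaway repo with made-up code: -S can match a string spanning several lines, but the indentation has to match exactly.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repo; file names, contents, and commit messages are made up.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

printf 'if (ready) {\n  start();\n}\n' > boot.js
git add boot.js
git commit -qm "add startup guard"

# Two-space indentation, exactly as committed: matches.
two_spaces=$(git log -S $'if (ready) {\n  start();' --format=%s)
# Four-space indentation: no match.
four_spaces=$(git log -S $'if (ready) {\n    start();' --format=%s)
echo "two spaces  -> ${two_spaces:-<no commits>}"
echo "four spaces -> ${four_spaces:-<no commits>}"
```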

There are tons of ways to make effective use of the -S option. Try experimenting and see what works for you!

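One option that I've yet to try is -G, which takes a regex instead of a literal string and matches commits whose added or removed lines match it (rather than commits that change the number of occurrences). See the git log documentation for details.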
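To see how -G behaves next to -S in practice, here's a sketch assuming bash and git are installed, in a throwaway repo with made-up commits. A commit that only tightens == to === leaves the number of MPP_ACTIVE occurrences unchanged, so -S skips it. -G still reports it, because the lines that commit removes and adds both match the regex.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repo; file names, contents, and commit messages are made up.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

printf '%s\n' 'if (MPP_ACTIVE == "true") {' '  doSomeFunkyStuff();' '}' > example.js
git add example.js
git commit -qm "add MPP check"

printf '%s\n' 'if (MPP_ACTIVE === "true") {' '  doSomeFunkyStuff();' '}' > example.js
git commit -qam "tighten comparison"

# -S: literal string, matched by change in occurrence count.
s_hits=$(git log -S "MPP_ACTIVE" --format=%s)
# -G: regex (here "MPP_" followed by uppercase letters), matched against changed lines.
g_hits=$(git log -G 'MPP_[[:upper:]]*' --format=%s)
printf -- '-S matches:\n%s\n-G matches:\n%s\n' "$s_hits" "$g_hits"
```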

Performance

git log prints commits as soon as it finds them, walking from the newest commit backwards. So if you're looking for a relatively recent commit, git log will surface it pretty much instantly. But if the commit you're looking for is far back in history, you might need to wait a while. --reverse doesn't change that: Git still walks from newest to oldest, and it has to finish the whole walk before it can print anything in reverse order.

Searching through the entire commit history of the Next.js codebase takes about a minute on my M1 MacBook Air:

time git --no-pager log -S "distDir" --reverse --oneline
60.63s user 1.93s system 95% cpu 1:05.43 total

If your codebase is significantly larger, you may run into performance bottlenecks. If you do, I'd love to hear how you work around them!

Final words

Until recently, I wasn't aware of the -S option, and neither was a colleague I showed this to who has been writing software since before Git was created. There are probably a ton of developers who would benefit from being aware that Git has this capability!

I've used the -S option a few times since discovering it, and it's made searching for commits much more enjoyable. Go ahead and try the -S option the next time you need to search a commit history. I hope it proves useful!

Alex Harri
