Beginner-friendly post. If commits scare you and you treat them like glass, this post will help you break free. To experiment along the way, navigate to an empty directory in your terminal and initialize a repository with git init.

Today we will gain intuition for what "Commits" are at a lower level, which will allow us to confidently rewrite history locally (with git rebase) and roll back (with git reflog). We will wrap up with how to apply this knowledge in a shared repository.

1 - Intuition

Commits are read-only. Well, maybe... please don't quote me on this: I couldn't find an explicit mention in the docs or the freely accessible Pro Git Book. However, I find it easier to reason about commit history with this hypothesis, and there is a strong case for it:

  • commits have a unique identifier (a SHA-1 hash) visible in git log
    • example: c8e9a6e5294c01847cec3dee65f166f3e1d1b2c6
      (shorthand c8e9a is easier to type)
  • the SHA-1 hash is calculated over data (~= files) and metadata (who, what, when, and the hashes of the parent commits)
  • git commit --amend creates a new commit even when you only alter the commit message (the metadata is different)
  • git rebase creates a new commit for every changed commit. Since a commit's hash covers its parents, every parent update cascades down to the children, replacing our tree with a bunch of new commits even though we "only change one".
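
To watch this happen, here is a minimal sketch in a throwaway repository. The file name, the commit messages and the demo identity are placeholders chosen for illustration.

```shell
# Work in a temporary directory so nothing real is touched.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "Initial commt"          # typo on purpose
before=$(git rev-parse HEAD)

# Only the message changes, the files stay the same.
git commit -q --amend -m "Initial commit"
after=$(git rev-parse HEAD)

echo "before: $before"
echo "after:  $after"

# Same files (same tree), yet a different commit.
git rev-parse "${before}^{tree}" "${after}^{tree}"

# The old commit object was not modified, it is simply no longer on the branch.
git cat-file -t "$before"                 # prints "commit"
```

The two commit hashes differ while the two tree hashes are identical: the amend built a second commit next to the first one instead of editing it.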

Hence for all practical purposes that I can think of: commits are immutable and discarded whenever you change them or their parents. Now you understand why you should be wary of amending or rebasing commits already sent with git push: you're breaking "commit continuity" and creating a new tree, deleting the old one. Do that on a shared branch and the poor souls who base their work on deleted commits are in a world of pain.

2 - Confidently rewrite history

Have you acquired a taste for caution? Perfect, you are ready for the opposite viewpoint. I think rewriting commits often and extensively might be the safest and fastest workflow in the long run.

Rewriting history is safe

This seems contradictory: how could it be safe when we risk losing work? Let's review git's architecture to assuage our concerns:

  • commits are read-only
    • you can't accidentally modify them
    • data corruption would raise "SHA-1 mismatch" errors
  • git is averse to physically erasing "deleted" commits
    • they are stored locally for a long time, several weeks typically
    • in this state they are called unreachable, dangling or orphan (roughly meaning "separated from the main tree")
  • to roll back we only need one commit
    • this is usually the most recent commit of a deleted tree
    • we "restore" a commit by creating a reference to it
    • parents are restored recursively.
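
The last three points can be sketched in a throwaway repository. The branch name "restored", the file name and the commit messages are made up for the example.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

for n in 1 2 3; do
  echo "version $n" > file.txt
  git add file.txt
  git commit -q -m "commit $n"
done
tip=$(git rev-parse HEAD)

# "Delete" the last two commits from the current branch.
git reset -q --hard HEAD~2
count_after_reset=$(git rev-list --count HEAD)

# One new reference to the most recent commit is enough:
# its parents come back with it.
git branch restored "$tip"
count_restored=$(git rev-list --count restored)

echo "current branch: $count_after_reset commit(s)"
echo "restored:       $count_restored commit(s)"
```

Nothing was copied or rebuilt here. The commits were in the repository all along, the new branch only makes them reachable again.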

Git could be seen as an "append-only" database that we can safely edit to our heart's content. To roll back (restore a previous state), all we need is a tool to find unreachable commits. We will get to this, but first a preview of how convenient rewrites are.

Rewriting history is fast

Treating commit history as sacred is likely to push you into habits such as:

  • fear of committing: commits are few and far between
  • writer's block: halt progress to ponder on a "perfect" commit message

Suppose we had a tool affording us complete control over commit history, then instead we could:

  • commit often while fleshing out a feature
    • polish later when we have insight and inspiration
  • switch into "history edit mode" at regular intervals
    • re-arrange the order of commits to group them as bags of related changes
    • combine small commits into one
    • rewrite commit messages with our newfound insights.

Interactive rebase

If the above sounds too good to be true you are in luck: there is a one-stop shop command to do all of this in git.

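git rebase -i choose-commit-anchor-for-rebase   # general form: edit the commits made after the anchor
git rebase -i main                              # replay the current branch's commits that are not in main on top of main
git rebase -i commit-in-the-past                # the anchor can be any commit, e.g. HEAD~3 or a SHA-1
man git-rebase                                  # full manual, same as: git help rebase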
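
For a first taste without touching real work, here is a small sketch that combines two commits into one. Normally git rebase -i opens the todo list in your editor; to keep the example copy-pastable it sets GIT_SEQUENCE_EDITOR so that sed performs the edit instead. File names and commit messages are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base"  > base.txt  && git add . && git commit -q -m "Initial commit"
echo "draft" > notes.txt && git add . && git commit -q -m "Add notes"
echo "fixed" > notes.txt && git add . && git commit -q -m "oops, fix notes"

# The todo list for HEAD~2 holds two lines, oldest first:
#   pick <sha> Add notes
#   pick <sha> oops, fix notes
# Turning the second "pick" into "fixup" melds that commit into the
# previous one and drops its message.
GIT_SEQUENCE_EDITOR='sed -i.bak "2s/^pick/fixup/"' git rebase -q -i HEAD~2

git --no-pager log --oneline
```

The log now lists two commits, "Add notes" and "Initial commit", and notes.txt contains the fixed content. When doing this by hand, the same todo list also accepts reword, squash, drop, and reordering of lines.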

If you need a starter please take 5 minutes to complete this basic tutorial:


Exercise 1:


3 - Roll back

Now that we are started on a rewrite frenzy, all we need is a tool to dig up commits when pushing our luck a little too far... or cheats to avoid shovelling.

Cheats

Cheat #1: backup branch

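 git switch my-precious-branch
 git branch backup-my-precious-branch        # bookmark the current tip before experimenting
 # <insert OOPS moment in 'my-precious-branch' here>
 git switch my-precious-branch
 git reset --hard backup-my-precious-branch  # move the branch back to the bookmark (discards uncommitted changes)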

As long as the reference backup-my-precious-branch exists the commits are alive indefinitely and easily accessible.

Cheat #2: you're in the middle of a git rebase

git rebase --abort

Cheat #3: you know the SHA-1

If you still have access to a past git log output listing a commit known to be in a good state, take note of that commit's SHA-1 and:

 git switch the-branch-I-want-to-rollback
 git reset --hard INSERT_SHA-1_HERE

Read the reflog (Reference log)

Roll up your sleeves, run the following commands and expect somewhat alien output:

git reflog
git reflog some-branch
git log --graph --oneline --reflog

Among the things Git remembers is where references (~= branches) used to point. This information is stored in the "reference log", or "reflog" for short. Locally there is one reflog for every reference.

Previously referenced commits

main@{2} is an alias for "commit referenced by branch main 2 steps ago"
some-branch@{0} is an alias for "commit referenced by some-branch".

It is worth noting that commands such as git reset create an entry at the top of the branch's reflog and thus main@{2} becomes main@{3}.
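
A short sketch of that shift, in a throwaway repository. The branch name is read from the repository because the default differs between setups; file name and commit messages are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

for n in 1 2 3; do
  echo "version $n" > file.txt
  git add file.txt
  git commit -q -m "commit $n"
done
branch=$(git symbolic-ref --short HEAD)
tip=$(git rev-parse HEAD)          # right now this is <branch>@{0}

# The reset adds a new entry on top of the branch's reflog...
git reset -q --hard HEAD~2

# ...so the old tip has moved from <branch>@{0} to <branch>@{1}.
git --no-pager reflog "$branch"    # newest entry first
previous=$(git rev-parse "${branch}@{1}")

echo "old tip:     $tip"
echo "${branch}@{1}: $previous"
```

Both lines print the same hash: the commit that used to be the tip is one step back in the reflog, ready to be reset to.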

Explore and reset

After a mishap, find the latest "known-good-picture" commit using the commit messages, or check commits one by one with git checkout my-commit-id. Try not to disturb the branch's reflog during exploration: when caught early, the error is probably among the first entries. Working from a new, separate "restore" branch could help.

When the precious commit is found, celebrate and see Cheat #3 above to hard-reset.


Exercise 2:

  • create a test repository, create multiple commits
  • rebase
  • use the reflog to rollback the rebase.
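
If you get stuck, here is one possible walkthrough, to be read as a sketch rather than the reference solution. The "mishap" is an interactive rebase that drops a commit; GIT_SEQUENCE_EDITOR and sed stand in for the edit you would make by hand in the todo list. File names and commit messages are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

for n in 1 2 3; do
  echo "content $n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "commit $n"
done
branch=$(git symbolic-ref --short HEAD)
good=$(git rev-parse HEAD)

# The mishap: "commit 2" is dropped during an interactive rebase.
GIT_SEQUENCE_EDITOR='sed -i.bak "1s/^pick/drop/"' git rebase -q -i HEAD~2
ls                                 # file2.txt is gone

# The rebase added a single entry to the branch's reflog,
# so <branch>@{1} is where the branch pointed just before it.
git --no-pager reflog "$branch"
git reset -q --hard "${branch}@{1}"
ls                                 # file2.txt is back
```

Note that the reflog of HEAD (plain git reflog) is noisier than the branch's, since it records every step of the rebase. Reading the branch's own reflog usually makes the pre-rebase state easier to spot.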

A good way to get comfortable with the reflog is to observe it at every step while interacting with the repository. You could also read the fabulous manual:

man git-reflog
git reflog --help

The reflog is a local piece of information: it traces movements of references in a local copy. Reflog entries have an expiration date, which implies that unreferenced commits can't live forever.

Please note that I avoid the subject of remote repositories: the edge cases and interactions between reflog, unreachable commits and push/pull from remote still feel fuzzy to me.

4 - Expiry date

Running git commit --amend, git rebase (and sometimes git reset) is likely to produce commits that no branch or tag points to anymore. Git still treats these commits as "referenced" as long as an entry for them exists in the reflog, but reflog entries are local and will expire.

To summarize, here is what Git remembers:

  • commits that are referenced (a branch, tag or reflog entry points to it)
  • parent commits of referenced commits, all the way up to the "Initial commit"
  • unreferenced/unreachable/dangling/orphaned commits for a limited time
    • a reflog entry counts as a reference until the entry expires
    • after being dropped from the reflog, the commit will be garbage collected eventually
      • in practice you rarely have to extract commits in this state
      • git fsck and assorted tools might interest you then

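How long a reflog entry lives depends on repository settings (gc.reflogExpire and gc.reflogExpireUnreachable). With Git's defaults, entries for commits no longer reachable from the current tip of the branch are kept for 30 days and the other entries for 90 days, so you typically have at least a couple of weeks. Since the reflog is local, you should treat your local repository with care during recovery: a fresh git clone would not contain the reflog's information.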
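
For the curious, here is a sketch of what digging with git fsck can look like. It never waits for anything to expire: the --no-reflogs option asks fsck to ignore the reflog, which previews the state a commit is in once its entries are gone. The branch name "rescued", the file name and the commit messages are made up for the example.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "one" > file.txt && git add file.txt && git commit -q -m "first"
echo "two" > file.txt && git commit -q -am "secnd"
lost=$(git rev-parse HEAD)
git commit -q --amend -m "second"   # "secnd" is now kept alive by the reflog only

# Retention settings; no output means Git's built-in default applies.
git config --get gc.reflogExpire
git config --get gc.reflogExpireUnreachable

# List objects that no branch or tag can reach, pretending the reflog is empty.
report=$(git fsck --unreachable --no-reflogs 2>/dev/null)
echo "$report" | grep commit

# Creating a reference makes the commit reachable again.
git branch rescued "$lost"
```

The fsck report lists the amended-away commit, and the new branch takes it off that list for good.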

5 - Caution with remote

As you become liberated from your shackles and start rewriting history left and right, I would like to remind you of that initial feeling of caution: your freedom ends where another person's begins.

You are rewriting history and thus will be prompted in the future to boldly --force changes during your next git push. Be wary, this is where friendships are broken.

Thankfully you now have a better idea of what you are manipulating: a set of commits linked in a tree-like structure that you can easily (and should) re-arrange to your liking. Tack simple rules on top and you can keep your freedom and be a good neighbor.

Write your rules

Try to stick to a flowchart for remote branches, here is an example:

  • I am the sole "owner" of a remote branch
    • I can rewrite history and force-push it, but I should check for changes first.
  • I share branch abc with other people
    • I should never rewrite the history of the remote abc
    • a commit is considered "local" before it is pushed, and "sacred" afterwards
    • history rewrite should concern local commits only, and thus never involve --force
    • to perform history rewrites, I can:
      • (A): carefully rebase my local commits inside my local copy of branch abc
      • or (B): create local branch abc-my-copy, work there, rebase on abc at the end (see below for an example of this strategy)
  • I am the team leader and want to rewrite branch abc
    • I make sure everyone checks in their current work then stops contributing
    • I rebase locally, force-push my changes
    • I ask everyone to start from a fresh git clone as a precaution
    • Work can resume. (this procedure is "atomic" and thus costly)

For a shared branch, strategy (B) seems safest as it introduces an extra step to merge:

git branch abc-my-copy   # create branch abc-my-copy
git switch abc-my-copy
# ...
# commits and rebases on branch abc-my-copy
# ...
git switch abc
git pull                 # local copy of 'abc' is up-to-date
git switch abc-my-copy
git rebase -i abc        # rebase 'abc-my-copy' commits on up-to-date 'abc'
git switch abc
git merge --ff-only abc-my-copy    # (optional: ask explicitly for a fast-forward)

Use --force-with-lease

git push --force-with-lease is a safer drop-in replacement for git push --force.

Force-with-lease conveys the following intention:

  • I rebased a branch locally
  • I want to push the rebased commits to remote (and thus delete the previous tree)
  • if other people pushed new commits to the branch while I was rebasing, abort.

This spares you from manually "checking for changes" before force-pushing, not to mention a commit could always sneak in between git fetch and git push.
Force-with-lease is practical but not bulletproof.
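
To see the abort in action, here is a sketch with a local bare repository playing the role of the remote and two clones standing in for teammates. The names alice, bob and remote.git are invented for the example.

```shell
work=$(mktemp -d) && cd "$work"
git init -q --bare remote.git

# Alice creates the first commit and pushes it.
git clone -q remote.git alice 2>/dev/null
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "First commit"
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"
cd ..

# Bob clones, adds a commit and pushes it.
git clone -q remote.git bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.com"
echo "v2" >> file.txt
git commit -q -am "Bob's commit"
bob_sha=$(git rev-parse HEAD)
git push -q origin "$branch"
cd ../alice

# Alice rewrites her commit, unaware of Bob's push.
git commit -q --amend -m "First commit, reworded"
if git push -q --force-with-lease origin "$branch" 2>/dev/null; then
  lease_result="pushed"
else
  lease_result="rejected"
fi
echo "force-with-lease: $lease_result"

# The remote still holds Bob's commit.
remote_sha=$(git -C ../remote.git rev-parse "$branch")
```

The push is rejected and Bob's commit survives, whereas a plain --force would have discarded it. As for "not bulletproof": according to the git push documentation, the check compares the remote with your remote-tracking branch, so anything that runs git fetch for you in the background refreshes that reference and the push then goes through, even though you never looked at the new commits.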


Conclusion

You have a new intuition about commits and can re-arrange them confidently with the power of rebase. Should a mistake happen, the reflog has everything you need to land on your feet. You can take this freedom with you on your next team endeavor by sticking to simple rules.

Extra reading material:

Special thanks to:

  • Robin for his valuable feedback and for coming up with the enlightening concept of "read-only commits"
  • Pablo, the inspiration for starting this blog

Comments, feedback, error reports: (protonmail.com) devmartinx