Beginner-friendly post.
If commits are scary and you treat them like glass, this post will help you
break free. Navigate to an empty directory in your terminal and initialize a repository
with git init to experiment with.
Today we will gain intuition for what "Commits" are at a lower level,
which will allow us to confidently rewrite history locally (with git rebase)
and roll back (with git reflog). We will wrap up with how to apply this
knowledge in a shared repository.
1 - Intuition
Commits are read-only. Well, maybe... please don't quote me on this, I couldn't find an explicit mention in the docs or the freely accessible Pro Git Book. However, I find it easier to reason about commit history with this hypothesis, and there is a strong case for it:
- commits have a unique identifier (a SHA-1 hash) visible in git log
  - example: c8e9a6e5294c01847cec3dee65f166f3e1d1b2c6 (shorthand c8e9a is easier to type)
- the SHA-1 hash is calculated over data (~= files) and metadata (who, what, when..)
- git commit --amend creates a new commit even when you only alter the commit message (the metadata is different)
- git rebase creates a new commit for every changed commit. Commits depend on their parents, thus every parent update cascades down to the children, replacing our tree with a bunch of new commits even though we "only change one".
Hence for all practical purposes that I can think of: commits are immutable,
and they are discarded and replaced by new ones whenever you "change" them or their parents.
Now you understand why you should be wary of amending or rebasing commits already sent with git push:
you're breaking "commit continuity" and creating a new tree, deleting the old one.
Do that on a shared branch and the poor souls who base their work on deleted commits are in a world of pain.
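If you would rather see the "new commit" claim than take my word for it, here is a small sketch that amends only a commit message in a throwaway repository and compares the hashes before and after. The file name, the commit messages and the demo identity are placeholders made up for this illustration.

```shell
set -e
# Placeholder identity so the demo does not depend on your global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway repository in a temporary directory.
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "hello" > file.txt
git add file.txt
git commit -q -m "first draft"
before=$(git rev-parse HEAD)

# Only the message changes; the file content stays the same.
git commit -q --amend -m "better message"
after=$(git rev-parse HEAD)

echo "before amend: $before"
echo "after amend:  $after"

# The old commit object still exists, it is just no longer on the branch.
old_type=$(git cat-file -t "$before")
echo "old object is still a: $old_type"
```

The two hashes differ even though file.txt is untouched, and the pre-amend commit is still sitting in the local object store.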
2 - Confidently rewrite history
Have you acquired a taste for caution? Perfect, you are ready for the opposite viewpoint. I think rewriting commits often and extensively might be the safest and fastest workflow in the long run.
Rewriting history is safe
This seems contradictory: how could it be safe when we risk losing work? Let's review git's architecture to assuage our concerns:
- commits are read-only
  - you can't accidentally modify them
  - data corruption would raise "SHA-1 mismatch" errors
- git is averse to physically erasing "deleted" commits
  - they are stored locally for a long time, several weeks typically
  - in this state they are called unreachable, dangling or orphan (roughly meaning "separated from the main tree")
- to roll back we only need one commit
  - this is usually the most recent commit of a deleted tree
  - we "restore" a commit by creating a reference to it
  - parents are restored recursively.
Git could be seen as an "append-only" database that we can safely edit to our heart's content. To roll back (restore a previous state), all we need is a tool to find unreachable commits. We will get to this, but first a preview of how convenient rewrites are.
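The "restore by creating a reference" idea can be tried in a few lines. This is a minimal sketch in a throwaway repository: it drops a commit with git reset --hard, checks that the object is still stored, then points a new branch at it. The branch name restored, the file name and the demo identity are made up for the example.

```shell
set -e
# Placeholder identity so the demo does not depend on your global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

echo one > notes.txt
git add notes.txt
git commit -q -m "keep me"
echo two >> notes.txt
git commit -q -am "about to be dropped"
lost=$(git rev-parse HEAD)

# "Delete" the latest commit from the branch.
git reset -q --hard HEAD~1

# The object is still in the local store...
git cat-file -e "$lost" && echo "commit $lost still exists"

# ...and a new reference makes it (and its parents) reachable again.
git branch restored "$lost"
restored_tip=$(git rev-parse restored)
restored_count=$(git rev-list --count restored)
echo "restored branch holds $restored_count commits"
```

One reference to the most recent commit is enough: the parent comes back along with it because the commit itself records who its parent is.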
Rewriting history is fast
Treating commit history as sacred is likely to push you into habits such as:
- fear of committing: commits are few and far between
- writer's block: halt progress to ponder on a "perfect" commit message
Suppose we had a tool affording us complete control over commit history, then instead we could:
- commit often while fleshing out a feature
- polish later when we have insight and inspiration
- switch into "history edit mode" at regular intervals
- re-arrange the order of commits to group them as bags of related changes
- combine small commits into one
- rewrite commit messages with our newfound insights.
Interactive rebase
If the above sounds too good to be true you are in luck: there is a one-stop shop command to do all of this in git.
git rebase -i choose-commit-anchor-for-rebase
git rebase -i main
git rebase -i commit-in-the-past
man git rebase
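Interactive rebase normally opens a "todo list" in your editor. To make the mechanics visible without an editor, the sketch below lets sed play the editor through the GIT_SEQUENCE_EDITOR environment variable and turns the second pick into fixup, which melds the last commit into the one before it. The file name, the "wip" messages and the demo identity are assumptions for the illustration; in day-to-day use you would simply edit the list by hand.

```shell
set -e
# Placeholder identity so the demo does not depend on your global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

# Three small work-in-progress commits.
for n in 1 2 3; do
  echo "step $n" >> feature.txt
  git add feature.txt
  git commit -q -m "wip $n"
done

# The todo list for HEAD~2 contains two lines: "pick ... wip 2" and "pick ... wip 3".
# sed rewrites line 2 to "fixup", so "wip 3" is folded into "wip 2"
# and the message of "wip 2" is kept.
GIT_SEQUENCE_EDITOR="sed -i.bak '2s/^pick/fixup/'" git rebase -q -i HEAD~2

commit_count=$(git rev-list --count HEAD)
last_subject=$(git log -1 --format=%s)
git log --oneline
```

The history goes from three commits to two while feature.txt keeps all three lines. Replacing fixup with squash, reword or a re-ordering of the lines gives the other operations listed above.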
If you need a starter please take 5 minutes to complete this basic tutorial:
Exercise 1:
- (external link to GitHub): Changing a commit message
3 - Roll back
Now that we have started on a rewrite frenzy, all we need is a tool to dig up commits when pushing our luck a little too far... or cheats to avoid shovelling.
Cheats
Cheat #1: backup branch
git switch my-precious-branch
git branch backup-my-precious-branch
# <insert OOPS moment in 'my-precious-branch' here>
git switch my-precious-branch
git reset --hard backup-my-precious-branch
As long as the reference backup-my-precious-branch exists, the
commits are alive indefinitely and easily accessible.
Cheat #2: you're in the middle of a git rebase
git rebase --abort
Cheat #3: you know the SHA-1
In case you have access to a past git log output listing a commit known
to be a correct picture, take note of the commit's SHA-1 and:
git switch the-branch-I-want-to-rollback
git reset --hard INSERT_SHA-1_HERE
Read the reflog (Reference log)
Roll up your sleeves, run the following commands and expect somewhat alien output:
git reflog
git reflog some-branch
git log --graph --oneline --reflog
Among what Git remembers is where references (~= branches) used to point. This information is stored in the "reference log" or "reflog" for short. Locally there is one reflog for every reference.
Previously referenced commits
- main@{2} is an alias for "commit referenced by branch main 2 steps ago"
- some-branch@{0} is an alias for "commit referenced by some-branch".
It is worth noting that commands such as git reset create an entry
at the top of the branch's reflog and thus main@{2} becomes main@{3}.
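The shifting of reflog positions is easier to believe once you have watched it happen. Here is a small sketch, in a throwaway repository, that resolves the branch@{n} aliases before and after a git reset. The commit messages "A" and "B", the file name and the demo identity are placeholders; the branch name is read from the repository because it may be main or master depending on your setup.

```shell
set -e
# Placeholder identity so the demo does not depend on your global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
branch=$(git symbolic-ref --short HEAD)   # "main" or "master", depending on your setup

echo a > f.txt
git add f.txt
git commit -q -m "A"
a=$(git rev-parse HEAD)
echo b >> f.txt
git commit -q -am "B"
b=$(git rev-parse HEAD)

# Branch reflog so far: @{0} is B, @{1} is A.
before_reset=$(git rev-parse "${branch}@{1}")

# The reset adds a new entry on top, pushing older entries one step down.
git reset -q --hard "$a"
after_reset_0=$(git rev-parse "${branch}@{0}")   # A, the result of the reset
after_reset_1=$(git rev-parse "${branch}@{1}")   # B, what the branch pointed to before
after_reset_2=$(git rev-parse "${branch}@{2}")   # A, the original first commit

git reflog "$branch"
```

Commit A was one step back before the reset and sits at both position 0 and position 2 afterwards, which is why it pays to re-read the reflog after every command during a recovery.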
Explore and reset
After a mishap, find the "latest known-good-picture" commit using the commit messages
or check commits one by one with git checkout my-commit-id. Try not to disturb the branch's reflog
during exploration: when caught early, the error is probably among the first entries.
Working from a new separate "restore" branch could help.
When the precious commit is found, celebrate and see Cheat #3 above to hard-reset.
Exercise 2:
- create a test repository, create multiple commits
- rebase
- use the reflog to roll back the rebase.
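If you get stuck on Exercise 2, here is one possible run-through, written as a sketch rather than the reference solution. It relies on the branch's own reflog recording a completed rebase as a single step, so the entry one step back is the tip from just before the rebase. The branch name topic, the file names and the demo identity are made up for the example.

```shell
set -e
# Placeholder identity so the demo does not depend on your global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
base=$(git symbolic-ref --short HEAD)   # "main" or "master", depending on your setup

echo base > base.txt
git add base.txt
git commit -q -m "base"

# Two commits on a topic branch.
git switch -q -c topic
echo t1 > topic.txt
git add topic.txt
git commit -q -m "topic 1"
echo t2 >> topic.txt
git commit -q -am "topic 2"
good=$(git rev-parse topic)

# Meanwhile the base branch moves on.
git switch -q "$base"
echo more > more.txt
git add more.txt
git commit -q -m "base moves on"

# The rebase replaces both topic commits with new ones.
git switch -q topic
git rebase -q "$base"
rebased=$(git rev-parse topic)

# Inspect the branch reflog, then go back one step.
git reflog topic
git reset -q --hard "topic@{1}"
rolled_back=$(git rev-parse topic)
```

After the reset, topic points at the exact pre-rebase commit again. Note that the HEAD reflog (plain git reflog) is noisier because it also records every switch and every intermediate rebase step.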
A good way to get comfortable with the reflog is to observe it at every step while interacting with the repository. You could also read the fabulous manual:
man git reflog
git reflog --help
The reflog is a local piece of information: it traces movements of references in a local copy. Reflog entries have an expiration date implying that unreferenced commits can't live forever.
Please note that I avoid the subject of remote repositories: the edge cases and interactions between reflog, unreachable commits and push/pull from remote still feel fuzzy to me.
4 - Expiry date
Running git commit --amend, git rebase (and sometimes git reset)
is likely to produce unreferenced commits. These unreferenced commits are still considered
"referenced" as long as an entry exists in the reflog, but reflog entries are local
and will expire.
To summarize, here is what Git remembers:
- commits that are referenced (a branch, tag or reflog entry points to them)
- parent commits of referenced commits, all the way up to the "Initial commit"
- unreferenced/unreachable/dangling/orphaned commits for a limited time
  - a reflog entry counts as a reference until the entry is flushed
  - after being dropped from the reflog, the commit will eventually be garbage collected
  - in practice you rarely have to extract commits in this state
    - git fsck and assorted tools might interest you then
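How long a reflog entry lives depends on repository settings, but
typically you have at least a couple of weeks: by default, entries for commits
no longer reachable from the branch's current tip are kept for 30 days
(gc.reflogExpireUnreachable) and other entries for 90 days (gc.reflogExpire).
Since the reflog is local, you should
treat your local repository with care during recovery:
a fresh git clone would not contain the reflog's information.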
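To check what "kept alive by the reflog" means in practice, the sketch below amends a commit in a throwaway repository and asks git fsck about the old one twice: once with reflog entries counted as references (the default) and once with --no-reflogs, which previews what becomes garbage once the entries expire. The file name, messages and demo identity are placeholders for the illustration.

```shell
set -e
# Placeholder identity so the demo does not depend on your global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

echo one > f.txt
git add f.txt
git commit -q -m "first wording"
old=$(git rev-parse HEAD)

# After the amend, only the reflog still points to the old commit.
git commit -q --amend -m "second wording"

# Expiry settings: no value printed means the built-in default applies.
git config --get gc.reflogExpire || echo "gc.reflogExpire: built-in default"
git config --get gc.reflogExpireUnreachable || echo "gc.reflogExpireUnreachable: built-in default"

# Reflogs count as references by default: the old commit is not reported.
with_reflog=$(git fsck 2>/dev/null | grep -c "$old" || true)

# Ignoring reflogs, the old commit shows up as dangling.
without_reflog=$(git fsck --no-reflogs 2>/dev/null | grep -c "$old" || true)

echo "reported with reflogs:    $with_reflog"
echo "reported without reflogs: $without_reflog"
```

The second fsck call lists the pre-amend commit as a dangling commit, which is the state it will be in for real once its reflog entries have expired and before garbage collection removes it.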
5 - Caution with remote
As you become liberated from your shackles and start rewriting history left and right, I would like to remind you of that initial feeling of caution: your freedom ends where another person's begins.
You are rewriting history and thus will be prompted in the future
to boldly --force changes during your next git push. Be wary, this is where
friendships are broken.
Thankfully you now have a better idea of what you are manipulating: a set of commits linked in a tree-like structure that you can easily (and should) re-arrange to your liking. Tack simple rules on top and you can keep your freedom and be a good neighbor.
Write your rules
Try to stick to a flowchart for remote branches, here is an example:
- I am the sole "owner" of a remote branch
  - I can rewrite history and force-push it, but I should check for changes first.
- I share branch abc with other people
  - I should never rectify the history of the remote abc
    - a commit is considered "local" before it is pushed, and "sacred" afterwards
    - history rewrite should concern local commits only, and thus never involve --force
  - to perform history rewrites, I can:
    - (A): carefully rebase my local commits inside my local copy of branch abc
    - or (B): create local branch abc-my-copy, work there, rebase on abc at the end (see below for an example of this strategy)
- I am the team leader and want to rewrite branch abc
  - I make sure everyone checks in their current work then stops contributing
  - I rebase locally, force-push my changes
  - I ask everyone to start from a fresh git clone as a precaution
  - Work can resume. (this procedure is "atomic" and thus costly)
For a shared branch, strategy (B) seems safest as it introduces an extra step to merge:
git branch abc-my-copy # create branch abc-my-copy
git switch abc-my-copy
# ...
# commits and rebases on branch abc-my-copy
# ...
git switch abc
git pull # local copy of 'abc' is up-to-date
git switch abc-my-copy
git rebase -i abc # rebase 'abc-my-copy' commits on up-to-date 'abc'
git switch abc
git merge --ff-only abc-my-copy # (optional: ask explicitly for a fast-forward)
Use --force-with-lease
git push --force-with-lease is a safer drop-in replacement for git push --force.
Force-with-lease conveys the following intention:
- I rebased a branch locally
- I want to push the rebased commits to remote (and thus delete the previous tree)
- if other people pushed new commits to the branch while I was rebasing, abort.
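This spares you from manually "checking for changes" before force-pushing,
not to mention a commit could always sneak in between git fetch and git push.
Force-with-lease is practical but not bulletproof: the check compares the remote
with your local remote-tracking reference, so anything that fetches in the
background (an IDE, for instance) can update that reference and let the push through.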
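Here is a sketch of the scenario force-with-lease is meant to catch, played out locally with a bare repository standing in for the remote and two clones standing in for two people. The names alice, bob, remote.git and the file names are invented for the illustration; the branch is called abc as in the rules above.

```shell
set -e
# Placeholder identity so the demo does not depend on your global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git

# Alice creates branch abc and publishes it.
git clone -q remote.git alice 2>/dev/null
cd alice
echo a > a.txt
git add a.txt
git commit -q -m "alice: first"
git branch -m abc
git push -q -u origin abc
cd ..

# Bob adds a commit to the shared branch.
git clone -q -b abc remote.git bob
cd bob
echo b > b.txt
git add b.txt
git commit -q -m "bob: second"
git push -q origin abc
bob_tip=$(git rev-parse HEAD)
cd ..

# Alice rewrites her commit without fetching first, then tries to push.
cd alice
git commit -q --amend -m "alice: first (reworded)"
if git push -q --force-with-lease origin abc 2>/dev/null; then
  push_result=accepted
else
  push_result=rejected
fi
cd ..

remote_tip=$(git --git-dir=remote.git rev-parse abc)
echo "push was $push_result"
```

The push is refused because the remote branch no longer matches what Alice last saw, and Bob's commit survives. A plain --force in the same spot would have silently discarded it.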
Conclusion
You have a new intuition about
commits and can re-arrange them confidently with the power of rebase.
Should a mistake happen, the reflog has everything you need to land on your feet.
You can take this freedom with you on your next team endeavor by sticking to simple rules.
Extra reading material:
- git internals, SHA-1, etc: Pro Git: Git's internals.
Special thanks to:
- Robin for his valuable feedback and for coming up with the enlightening concept of "read-only commits"
- Pablo, the inspiration for starting this blog
Comments, feedback, error reports: (protonmail.com) devmartinx