Previous Next

The day after code review

Today I want to talk about code reviews and the advantages and disadvantages of particular workflows with an eye towards changes in the future. A time will come when people will need to find out, why changes on the mainline have been done the way they have been done. There is always (for a positive thinking individuum) a reason or motivation behind a change and hope you are in the lucky position that

it has been recorded not only in the brains of your colleagues. I've never heard people saying that we should look at our bug-tracker, review-system or project-wiki to gather that piece of information. It usually stops at the history of your SCM as this process is time-consuming and sometimes a bit frustrating as well - Okay, it depends on your infrastructure, management, and discipline.

An assumption for now: Any relevant information, including 'keep in mind!, side effects, workaround, ..' etc., should be manifested as close to the source as possible. I recommend reading 5-useful-tips-for-a-better-commit-message or commit-messages to anyone who wants to improve his/her commit style.

If we keep information as close to the source as possible then we could ask our SCM to show all the commits that have been involved in a particular code change - may be a few lines of code, a function and so on. This is called 'blame' and works better or worse for this or that SCM. Apart from the negative connotation, you should see that tool more in a way of retrieving the evolution of a code block, where each step includes the information of the developer and other involved code parts. 

Let me try to build a bridge back to the introduction. If we need to find a reason for why things break, we want to narrow the scope to the lines where we expect the regression and then expand it in a way that we don't feel overwhelmed. At this time the code review process is already finished and what we then look at is a reflection of our review process and this includes, directly or indirectly, the technical possibilities, guidelines or best practices of the tool/process we are using. While a good review supports us to share knowledge across the team and to find critical parts in earlier phases of the development, it should reveal the weak points or effects of our changesets in respect of the future as well. This could include static analysis of the sole changesets or the provisioning of different workflows at SCM level. Today I would like to focus on the latter.

At the company, I'm working for we use Gerrit for some projects, but I think most of the following statements are valid for other tools as well. Let us enforce on two rules for the next minutes. We can either 1) change the current commit (amend) or 2) create a new commit and push it into the review process afterward.

If you need a legend for the diagrams you may want to continue reading a few more lines:

A bullet should be seen as a small, self-contained functional unit (commit) and a transparent bullet should be seen as a patch for a previous bullet or a patch. The color should denote that our currently developed feature/issue needs changes in multiple components (API, service, utility, ..)

Workflow

A) one commit to rule them all

  • Advantage
    • a commit on the main line is always functional and we can always jump back to an earlier commit if we encounter a regression
    • a patch commit is noticeable only as an item in the commit message
  • Neutral
    • a commit contains all the information for a reviewer at once, which some developers like and others not - a highly subjective topic
      • code review tools should offer a good diff support
  • Disadvantage
    • a commit contains all changes, even if sub-components do not cause the regression or could be used independently
      • scope is not narrow and contrary to the main goal
    • a newer commit may already depend on a sub-component and we cannot take a subset of those commits without a split of this regression commit

 the day after review 1

B) my master-gun can fire many commits per second, what about yours?

  • Advantage
    • a commit shows a pretty small part of the evolution process and allows a fine-grained description
  • Neutral
    • a commit can be reviewed/validated easily but many development cycle commits (add, remove) might hide the final solution or thoughts behind it
  • Disadvantage
    • any kind of patch commit is noticeable
    • a commit on the mainline is not guaranteed to be feature complete and may break the application
      • an explicit merge strategy can be used to circumvent this problem
    • an interleaving feature from your colleague might shadow the feature correlation
      • which commit belongs to which feature

 the day after review 2

If we try to combine those diagrams from workflow A and B we might end up with something like this. 

 C) reapply your patches/improvements in local iterations, don't let them leak

  • Advantage
    • a commit on the main line is always functional and we can always jump back to an earlier commit if we encounter a regression
    • a patch commit is noticeable only as an item in the commit message
    • a commit shows a pretty small part of the evolution process and allows a fine-grained description
    • a commit set can be associated with a feature and split in case of regression
  • Disadvantage
    • local iterations involve more communication if you work with multiple developers on the same feature
    • requires feature branches and you need to apply an additional strategy for continuous integration
      • like ci branch etc.

the day after review 3

Which can be translated into this: Every independent feature component is developed in a separate commit, but patches, review improvements etc. are always reapplied right on top of them. In the end, we reintegrate the feature using an explicit merge commit. One thing we have to do is to weaken our first enforcement into: We can either change the current commit or apply the change on a previous feature commit. Any relevant change can be documented in that particular component commit message and is therefore available in the future. Further, the scope of a commit is pretty narrow and this enables us to reexamine the change with additional knowledge about the problem or assumptions we made weeks or years ago. The good news is, that modern SCM systems can support you here very well - called a 'rebase' in Git.

As long as we are guarded by a review process, which encourages us to modify and improve our code, there is nothing wrong with using rebase techniques to apply changes where they are needed. As already mentioned we use Gerrit as review system and manage the code by Git. I personally prefer the latest workflow for my everyday work and in a subsequent article, I want to talk more detailed about local iterations and what I consider to be a patch.

Just one note at the end: Newer versions of Gerrit now score your commits concerning the amount of LOC changes. Feel free to post your comments or variations.

The photograph for this article was taken by Antony Theobald.


Become a backer or share this article with your colleagues and friends. Any kind of interaction is appreciated.

Quick Link