Eric Le Merdy

Learned performing Data Munging kata

Published by Éric Le Merdy


I am always looking for new ideas to liven up the regular coding dojo sessions at my company, One2Team.

This kata is not particularly new. The introduction given by Dave Thomas, its author, was quite appealing:

Martin Fowler gave me a hard time for Kata02, complaining that it was yet another single-function, academic exercise. Which, of course, it was. So this week let's mix things up a bit.

Here’s an exercise in three parts to do with real world data.

The original kata subject can be found here: Kata04: Data Munging - CodeKata.

My implementation can be found here: one-kata-per-day/data-munging at master · ericlemerdy/one-kata-per-day

To what extent did the design decisions you made when writing the original programs make it easier or harder to factor out common code?

  • Separating responsibilities into several classes made it harder to identify common code, to the point that I inlined the whole design before writing the abstract version.

  • Libraries and structures are the same in the original programs and the generic one.

Was the way you wrote the second program influenced by writing the first?

  • Totally! I literally duplicated the design, copy-pasting tests and code… And so, a kitten has died. Seriously, only the names, the column indexes and the stripping of some characters change.

Is factoring out as much common code as possible always a good thing? Did the readability of the programs suffer because of this requirement? How about the maintainability?

  • I don't think factoring out as much as possible is always a good thing. Even if you have to let commonalities in the behavior emerge, it should not become an obsession; otherwise everything is frozen and hard to move. I have not found anything other than experience to help the right abstraction emerge.

  • In this kata, maybe two programs are not enough to justify extracting common code. In the real world, I might have left the two programs without any shared code. In fact, they already share good library usage!

  • I have identified several phases:

    • read file

    • compute an operation for each line (difference)

    • extract the line with the lowest value

    In this context, it is important to separate file reading (implementation) and data extraction from the actual computation (business). If your data can fit in memory, everything remains simple and clear.
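The three phases above can be sketched roughly as follows, in Python. This is only an illustration, not the implementation from the repository: the function names, the column indexes, the `*` marker stripped from values and the sample lines are all assumptions made up for the example.

```python
from typing import Iterable, List, NamedTuple


class Row(NamedTuple):
    """One extracted record: a label and the two values to compare."""
    name: str
    first: float
    second: float


def read_lines(path: str) -> List[str]:
    """Phase 1 (implementation): load the whole file in memory."""
    with open(path, encoding="utf-8") as handle:
        return handle.read().splitlines()


def extract_rows(lines: Iterable[str], name_col: int, first_col: int,
                 second_col: int, strip_chars: str = "*") -> List[Row]:
    """Data extraction: only column indexes and stripped chars vary."""
    rows = []
    for line in lines:
        fields = line.split()
        try:
            rows.append(Row(
                fields[name_col],
                float(fields[first_col].strip(strip_chars)),
                float(fields[second_col].strip(strip_chars)),
            ))
        except (IndexError, ValueError):
            # Headers, separators and blank lines are not data: skip them.
            continue
    return rows


def smallest_spread(rows: Iterable[Row]) -> Row:
    """Phases 2 and 3 (business): difference per row, then the lowest."""
    return min(rows, key=lambda row: abs(row.first - row.second))


# Hypothetical sample, shaped like a day / max / min temperature table.
sample = [
    "Dy MxT MnT",
    "",
    "1  88  59",
    "2  79  63",
    "3  77* 55",
]
print(smallest_spread(extract_rows(sample, 0, 1, 2)).name)  # prints "2"
```

Since `extract_rows` and `smallest_spread` only see lines that are already in memory, they can be tested without touching the file system, and a second data file would only need different column indexes.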