Programblings

Rambling about programming and life as a programmer - by Mathieu Martin

Archive for the 'garbage out' Category


The illustrated guide to recovering lost commits with Git

7th June 2008

Git is one hell of a power tool.

As with any such tool, as soon as you get to know it well enough, you start pushing the boundaries. Git gives you a lot of control over your repository:

The list goes on…

More traditional version control systems don’t give you anywhere near as much power as Git, by any stretch of the imagination. Using them is like taking a walk in the woods with your parents, at age 14.

You’re probably gonna see and do neat stuff, but you sure ain’t gonna get lost or anything.

Using Git on the other hand is more akin to being handed a cool motocross to go play alone in the woods… Also at age 14.


We all know what’s bound to happen, right?

You’ll smash into a tree.

The source control equivalent of slamming into a tree is losing commits. Getting all of Git’s power and flexibility at once can be somewhat dangerous. You’ll find it so easy and helpful to branch and merge that you’ll start doing it way more often. On the other hand, especially in the beginning, you’ll misunderstand or plainly miss some important warnings, and make errors. Or you may just end up in weird merging situations you never thought of, and don’t necessarily understand. These situations can often result in losing commits or whole branches.

My goal with this article is to make sure you understand the situation you’re really in when this happens: your commits or branches are only temporarily lost.

Disclaimer

This article assumes basic knowledge of how Git works, e.g. committing, branching and merging.

My first time

The first time I lost a commit was a good while ago. I can’t remember the details, but basically I got bitten by the fact that under the covers, Git uses hard links liberally. That means copying and pasting your code directory as a safety net isn’t going to save your ass when you attempt a potentially damaging operation you don’t fully understand.

Note that compressing your code directory will, though.

So there I was, after attempting an operation I didn’t really understand. I knew the operation had failed and I knew I had lost my last commit. Ironically, I still had Gitk open, displaying that very commit. As long as I didn’t refresh the Gitk view with F5 I could see the lost commit.

Here’s a fun fact: under OSX (not sure about Linux) you cannot select and copy text from Gitk’s interface, except for the SHA1 field [1]. I knew Git probably had a way to recover from that… But you know, I just wanted to get back to work and NOT search documentation and blog posts endlessly.

So I took screenshots, passed them real quick through GOCR, just to see how far it would get.

The result: GOCR doesn’t like the font Monaco :-)

How to (really) recover lost commits with Git

Recently I lost a commit again. This time, however, Gitk was not up to date. I knew I’d just lost something I wouldn’t necessarily remember in its entirety: an hour-old commit touching many files. And I have a crappy memory.

This time I had to do it the right way. I found out it’s really easy (once you figure it out), but I found no really clear explanation anywhere. So here goes.

Initial setup

If you wanna follow along (and I strongly recommend it), here are the few boring steps to create a dummy repo and bring it up to speed for the rest of this article. We’re going to beat the hell out of this repo and it’s going to be fun.

So just paste the following into a console:

mkdir recovery; cd recovery
git init
touch file
git add file
git commit -m "First commit"
echo "Hello World" > file
git add .
git commit -m "Greetings"
git branch cool_branch
git checkout cool_branch
echo "What up world?" > cool_file
git add .
git commit -m "Now that was cool"
git checkout master
echo "What does that mean?" >> file

Ok, let’s look at where we’re at:

gitk --all &

The --all option lets you see all branches at the same time, as well as your stashes.


[Screenshot: Initial setup for recovering Git commits]

We can see cool_branch, as well as some not-yet-committed changes on top of the master branch.

mathieu@ml recovery (master)$ ls -l
total 16
-rw-r--r--  1 mathieu  staff    15B  7 Jun 18:19 cool_file
-rw-r--r--  1 mathieu  staff    33B  7 Jun 18:19 file

Got my 2 files, I’m good to go.

Let’s make a mistake

Let’s say I decide I want to bring these cool changes into master. I’ll do it with a rebase. I know there’s no big risk of conflicts, so that’s a no-brainer.

mathieu@ml recovery (master)$ git rebase cool_branch
file: needs update


Now if you look carefully you’ll notice I wasn’t paying attention when Git gave me a feeble complaint about ‘file’.

Everything’s well, so I think “Ok, I don’t need cool_branch anymore”.

mathieu@ml recovery (master)$ git branch -d cool_branch
error: The branch ‘cool_branch’ is not an ancestor of your current HEAD.
If you are sure you want to delete it, run ‘git branch -D cool_branch’.

Huh? Whatever you say, Linus. Let’s get on with it.

mathieu@ml recovery (master)$ git branch -D cool_branch
Deleted branch cool_branch.

Ahh, it feels good to be a Git ninja. Now let’s see where we’re at and refresh Gitk with F5.

[Screenshot: Gitk after the refresh, the oh shit moment]

Oops, my cool commit is gone! That thing can’t be right. Let’s panic:

mathieu@ml recovery (master)$ ls
file

mathieu@ml recovery (master)$ git status
# On branch master
# Changed but not updated:
#   (use “git add <file>…” to update what will be committed)
#
#    modified:   file
#
no changes added to commit (use “git add” and/or “git commit -a”)

mathieu@ml recovery (master)$ git diff
diff --git a/file b/file
index 557db03..f2a8bf3 100644
--- a/file
+++ b/file
@@ -1 +1,2 @@
 Hello World
+What does that mean?


So the ‘file: needs update’ message back there meant that the rebase didn’t happen, because I had pending changes.

Helpful.

Recovering a lost commit

Since I don’t think my uncommitted work is complete, I’ll just stash it instead of committing it. Then I’ll hunt down my lost work.

mathieu@ml recovery (master)$ git stash save "Questioning the universe"
Saved working directory and index state “On master: Questioning the universe” HEAD is now at 6da726f… Greetings

In the name of paranoia, let’s make sure this got in right:

[Screenshot: In a paranoia moment, we make sure the stash is saved correctly]

Ok, let’s get on with our rescue mission:

mathieu@ml recovery (master)$ git fsck --lost-found
dangling commit 93b0c51cfea8c731aa385109b8e99d19b38a55be

That sounds right, exactly one commit in the lost and found.
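With a single dangling commit, eyeballing the output is fine. If your lost and found gets more crowded, a few lines of Ruby can pull out just the commit IDs. Here's a minimal sketch, assuming the "dangling <type> <sha1>" line format shown above; dangling_commits is a made-up helper, not part of Git, and the blob line in the sample is hypothetical.

```ruby
# Minimal sketch: extract commit IDs from `git fsck --lost-found` output.
# Assumes one "dangling <type> <sha1>" entry per line, as in the session above.
def dangling_commits(fsck_output)
  fsck_output.each_line
             .map(&:split)
             .select { |status, type, _sha| status == "dangling" && type == "commit" }
             .map { |_status, _type, sha| sha }
end

# The blob line is a hypothetical entry, included only to show the filtering.
sample = <<~OUTPUT
  dangling commit 24e3752f7a73ae98b361ce1c260e1f285d653447
  dangling blob 0000000000000000000000000000000000000000
  dangling commit 93b0c51cfea8c731aa385109b8e99d19b38a55be
OUTPUT

p dangling_commits(sample)
# => ["24e3752f7a73ae98b361ce1c260e1f285d653447", "93b0c51cfea8c731aa385109b8e99d19b38a55be"]
```

Inside a real repository, Ruby's %x() operator can supply the live output, e.g. dangling_commits(%x(git fsck --lost-found)), and each ID can then go to git show.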

Let’s just make sure:

mathieu@ml recovery (master)$ git show 93b0c51cfea8c731aa385109b8e99d19b38a55be | mate

[Screenshot: TextMate showing that this is our lost commit]

Bingo!

Different ways to recover the commit

There are a few different ways to recover that commit. Obviously we can just copy and paste that snippet, but in the case of a bigger commit, that approach will just amount to a lot of error-prone busywork.

I’ll reclaim my Git ninja status and try it a few different ways.

Recover it with rebase

Let’s just replay this change on top of master:

mathieu@ml recovery (master)$ git rebase 93b0c51cfea8c731aa385109b8e99d19b38a55be
First, rewinding head to replay your work on top of it…
HEAD is now at 93b0c51… Now that was cool
Fast-forwarded master to 93b0c51cfea8c731aa385109b8e99d19b38a55be.

[Screenshot: Commit recovered with rebase]

Neat! Now I feel like a ninja worthy of the title again.

So let’s rewind one commit and try it another way.

mathieu@ml recovery (master)$ git reset --hard HEAD^
HEAD is now at 6da726f… Greetings

[Screenshot: Rewound to a state where we’ve lost our commit]

Ok, the commit’s gone.

(Don’t tell anyone but my inner ninja is feeling queasy again.)

Recover it with merge

There are cases where rebase is not powerful enough, for example when you expect to face a lot of conflicts. In that case, merge is a better solution:

mathieu@ml recovery (master)$ git merge 93b0c51cfea8c731aa385109b8e99d19b38a55be
Updating 6da726f..93b0c51
Fast forward
 cool_file |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 cool_file

[Screenshot: Commit recovered with merge]

Too easy… Rewind!

mathieu@ml recovery (master)$ git reset --hard HEAD^
HEAD is now at 6da726f… Greetings

Recover it with cherry-pick

If instead you had a few commits in a row but only wanted to pick the last one, rebase and merge won’t do: they would bring the whole branch back into master. That’s a situation for cherry-pick.
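One way to picture the difference: treat each commit as pointing to its parent. Merging or rebasing a branch has to bring over every commit between the common ancestor and the branch tip, while cherry-pick copies the changes of just the one commit you name. The sketch below is only a toy model (single parents, made-up commit names, no real Git objects), and commits_since is a hypothetical helper.

```ruby
# Toy history: each commit name maps to its parent (nil for the root).
# A simplified model, not how Git actually stores commits.
PARENTS = {
  "first"     => nil,
  "greetings" => "first",
  "cool_1"    => "greetings",
  "cool_2"    => "cool_1",
  "cool_3"    => "cool_2"
}.freeze

# Commits reachable from +tip+ but not from +base+, oldest first:
# roughly what a merge or rebase of +tip+ onto +base+ has to bring over.
def commits_since(tip, base, parents = PARENTS)
  base_ancestors = []
  node = base
  while node
    base_ancestors << node
    node = parents[node]
  end

  result = []
  node = tip
  until node.nil? || base_ancestors.include?(node)
    result << node
    node = parents[node]
  end
  result.reverse
end

# Merge or rebase: the whole branch comes along.
p commits_since("cool_3", "greetings")  # => ["cool_1", "cool_2", "cool_3"]
# Cherry-picking "cool_3" would copy only that one commit's changes.
```

In Git terms, that list is roughly what git log greetings..cool_3 would show.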

mathieu@ml recovery (master)$ git cherry-pick 93b0c51cfea8c731aa385109b8e99d19b38a55be
Finished one cherry-pick.
Created commit f443703: Now that was cool
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 cool_file

[Screenshot: Commit recovered with cherry-pick]

Insane!

This only leaves one open question: WHO’S YOUR DADDY NOW, GIT?

Now that we’ve established the answer to that question, let’s get back to work!

Let’s make a second mistake

mathieu@ml recovery (master)$ git stash clear

Or was it Git stash apply?

[Screenshot: Oops! Accidentally lost the stash]

Oh jeez, there we go again…

mathieu@ml recovery (master)$ git fsck --lost-found
dangling commit 24e3752f7a73ae98b361ce1c260e1f285d653447
dangling commit 93b0c51cfea8c731aa385109b8e99d19b38a55be

Ok, we still see the one we lost earlier, 93b0c51… Let’s look at the other one.

mathieu@ml recovery (master)$ git show 24e3752f7a73ae98b361ce1c260e1f285d653447
commit 24e3752f7a73ae98b361ce1c260e1f285d653447
Merge: 6da726f… c90f079…
Author: Mathieu Martin <webmat@gmail.com>
Date:   Sat Jun 7 16:02:57 2008 -0400

On master: Questioning the universe

diff --cc file
index 557db03,557db03..f2a8bf3
--- a/file
+++ b/file
@@@ -1,1 -1,1 +1,2 @@@
Hello World
++What does that mean?

Spot on. Let’s try something wild, while we’re here.

mathieu@ml recovery (master)$ git checkout 24e3752f7a73ae98b361ce1c260e1f285d653447
Note: moving to "24e3752f7a73ae98b361ce1c260e1f285d653447" which isn’t a local branch
If you want to create a new branch from this checkout, you may do so
(now or later) by using -b with the checkout command again. Example:
  git checkout -b <new_branch_name>
HEAD is now at 24e3752… On master: Questioning the universe

mathieu@ml recovery (24e3752…)$

As you may have noticed, so far my console has always indicated which branch I’m in [2]. But now I seem to be in some kind of twilight zone, which Gitk confirms.


Let’s follow Git’s suggestion and make that a branch.

mathieu@ml recovery (24e3752…)$ git checkout -b recovery
Switched to a new branch “recovery”

mathieu@ml recovery (recovery)$

[Screenshot: Stash recovered as a branch]

Looks weird, like stashed items always do, but at least we have our commit.

After fiddling around with what’s been recovered from the stash, I recommend NOT keeping it as a commit.

If you try to replay the change in the recovery branch over master’s most recent commit, you lose the “Questioning the universe” commit. That’s probably because a stash is a weird kind of commit: the git show output above lists two parents on its Merge: line, and rebase drops merge commits by default.

(Don’t follow this one in your console)

mathieu@ml recovery (recovery)$ git rebase master  #I said don’t do this one
First, rewinding head to replay your work on top of it…
HEAD is now at 93b0c51… Now that was cool
Nothing to do.

[Screenshot: Rebasing the recovered stash over master doesn’t work]

If instead I check out master and then rebase its last commit onto the ‘recovery’ branch, it seems to work.

[Screenshot: Recovered stash back in master]

However since I just saw a commit disappear when rebasing the other way around, I get the feeling that this isn’t a normal commit and it may come back to haunt me later.

Recover it by applying a diff

Let’s just apply the diff to master. I’ll pretend it was actually a substantial commit, involving lots of modifications to lots of files, and apply it automatically with ‘git apply’.

First let’s visualize where we’re at, again:

[Screenshot: Stash recovered as a branch]

A diff against master is not what we want since master includes a new (very cool) commit.

Instead we just want to see the changes introduced by the current commit. To do this we can compare it with the common ancestor of the master and recovery branches. So let’s start by finding its ID.
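Here I’ll find it by eye in Gitk, but git merge-base master recovery prints the same ID directly. Conceptually, on a history where every commit has a single parent, the common ancestor is the first ancestor of one tip that also appears in the other tip’s ancestry. Here’s a toy Ruby model of that idea; the names are made up, and a real merge base also has to handle merge commits (the stash commit itself has two parents).

```ruby
# Toy single-parent history with made-up names: child => parent.
PARENTS = {
  "first"     => nil,
  "greetings" => "first",
  "cool"      => "greetings",  # tip of master
  "stash"     => "greetings"   # tip of recovery (simplified to one parent)
}.freeze

# The commit itself followed by all of its ancestors, newest first.
def ancestors(commit, parents = PARENTS)
  chain = []
  while commit
    chain << commit
    commit = parents[commit]
  end
  chain
end

# First commit in a's ancestry that also appears in b's ancestry.
def common_ancestor(a, b, parents = PARENTS)
  b_chain = ancestors(b, parents)
  ancestors(a, parents).find { |c| b_chain.include?(c) }
end

p common_ancestor("cool", "stash")  # => "greetings"
```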

[Screenshot: Finding the ID of the common ancestor]

mathieu@ml recovery (recovery)$ git diff 6da726f37683c83947d54314cd32ca1ee9d490e0
diff --git a/file b/file
index 557db03..f2a8bf3 100644
--- a/file
+++ b/file
@@ -1 +1,2 @@
Hello World
+What does that mean?

Looks good. Now we throw that diff upstairs.

git diff 6da726f37683c83947d54314cd32ca1ee9d490e0 > ../recovery.diff

Then we apply it to our master branch.

mathieu@ml recovery (recovery)$ git checkout master
Switched to branch “master”

mathieu@ml recovery (master)$ git apply ../recovery.diff

And we finally confirm that everything’s under control.

mathieu@ml recovery (master)$ git status
# On branch master
# Changed but not updated:
#   (use “git add <file>…” to update what will be committed)
#
#    modified:   file
#
no changes added to commit (use “git add” and/or “git commit -a”)

mathieu@ml recovery (master)$ git diff
diff --git a/file b/file
index 557db03..f2a8bf3 100644
--- a/file
+++ b/file
@@ -1 +1,2 @@
Hello World
+What does that mean?

This change was first stashed rather than committed because I felt it was not complete. Applying it with Git apply only introduces it as an unstaged change, which works perfectly for this situation. Now I can keep banging at the code until I feel this actually deserves to be committed.

mathieu@ml recovery (master)$ echo "I don't know" >> file

mathieu@ml recovery (master)$ git commit -a -m "Conversation of staggering depth"
Created commit 65a4794: Conversation of staggering depth
 1 files changed, 2 insertions(+), 0 deletions(-)

Cleaning up the crud

Ok, so now I still have this weird looking recovery branch.

[Screenshot: Now we want to get rid of this weird recovery branch]

Since it’s now useless we can get rid of it.

mathieu@ml recovery (master)$ git branch -d recovery
error: The branch ‘recovery’ is not an ancestor of your current HEAD.
If you are sure you want to delete it, run ‘git branch -D recovery’.

Aha! This time everything’s committed correctly, so I know I can delete it for real. Git is complaining because that commit was not included through its normal merge or rebase commands. So it warns me that I may be about to lose something. However I know I got everything through the diff I made and re-applied.

mathieu@ml recovery (master)$ git branch -D recovery
Deleted branch recovery.

Now that I’m aware that commits stick around even when they’re no longer on any branch, I wonder about my repo’s size.

[Screenshot: Repository size with a few dangling commits: 224 KB]

mathieu@ml recovery (master)$ git gc
Counting objects: 22, done.
Compressing objects: 100% (14/14), done.
Writing objects: 100% (22/22), done.
Total 22 (delta 7), reused 0 (delta 0)

mathieu@ml recovery (master)$ git prune

[Screenshot: Repository size after cleaning up the crud: 152 KB]

Fair enough. I would expect the unused commits to be gone by now, but strangely enough:

mathieu@ml recovery (master)$ git fsck --lost-found
dangling commit 49ed65cdea22443af3f1fd400754fe1517421b24
dangling commit 4b1bf4792cba929e88114379d7d5e86a2dc9990f
dangling commit 6cdf88318109dede7bd3c1a75be76c7255708ded
dangling commit 715a6b2cfe797383216d0f9b04fe8f50e90e779f
dangling commit f443703e5060d9f3b4d97504bda5f97e5a0b31e8

If anyone finds out what that’s all about, please let me know!

Maybe Git’s just refusing to do any work unless it’s going to actually save a considerable amount of space? I have no idea.

Conclusion

Once you know how to recover from bad mistakes, you’ll find that Git is not only a very powerful tool, but also a very forgiving one. As opposed to a motocross.

The following commands will help you figure your way out of most bad situations:

  • git show
  • git fsck --lost-found
  • git diff

And these will actually get you out of them:

  • git rebase
  • git cherry-pick
  • git merge
  • git apply

As I think I demonstrated, Git gives you the ability to recover from most bad mistakes. The fact that any single commit can be cherry-picked, checked out, rebased or merged makes it really easy to recover from hairy situations.

The only case where you might actually lose information is when something has not been committed or stashed yet, which I think is perfectly reasonable.

So if you take only one thing away from this article, let it be this. Git is much safer than a motocross.

Footnotes

[1] At the time I didn’t know that just having the SHA1 id was enough to save me.

[2] See how to configure your console in the same manner and also get auto-completion for Git here.

Posted in garbage out, git

Git is a dangerous tool to use

14th May 2008

Quote from the Git documentation:

<branch>

When this parameter names a non-branch (but still a valid commit object), your HEAD becomes detached.
Junio C. Hamano — the checkout documentation

Git — the only SCM that beheads its users.

Posted in garbage out, git, programming

irbrc for the runtime tramp

11th April 2008

I’m taking a break from the Rubinius for the Layman series. I don’t know why my posts always spiral into multi-thousand-word essays :-) Even worse, when I try to write about Rubinius it’s so easy to get into ratholes and start fiddling, poking at and exploring this wonderful beast. All of that instead of writing, of course.

Tonight’s fiddling led me to play around with my .irbrc file. I guess we’ve all searched for examples of config files for IRB at one time or another. When you spend a lot of time in your interactive console, you naturally end up wanting to tweak it to your liking.

However you’ll start having problems as soon as you start fooling around with the different runtimes. Or when you share your irbrc between different machines. Maybe even on different OSes. Either some gems are unsupported or you haven’t installed them yet (and don’t necessarily care).

So tonight I tried to fix that problem. My solution is only a small method that I define in my irbrc, so I’m not sure it’s worth putting it on github yet :-) Let me know what you think.

Presenting tramp_require

Usage

When my irbrc requires something just so I don’t need to do it manually:

tramp_require 'pp' #=> true/false like a normal require, or nil if the require failed

Outcome if the gem is not installed:

** Unable to require 'wirble'
--> LoadError: Did not find file to load: wirble

And your IRB session loads without a problem (just without the gem pre-loaded).

When I require something I actually use in my .irbrc:

tramp_require('wirble') do
  Wirble.init(:skip_prompt=>true)
  Wirble.colorize
end

If the gem is loaded successfully, the block is executed.

If the gem isn’t loaded successfully, the block’s not executed and the same warning message is shown.

Note that the user code passed in the block is not rescued: if your code fails, it’s your problem :-)

Implementation

Here I’ll paste 2 equivalent implementations. The first one is a clean and understandable version (also on pastie):

def tramp_require(what, &block)
  loaded, require_result = false, nil 

  begin
    require_result = require what
    loaded = true 

  rescue Exception => ex
    puts "** Unable to require '#{what}'"
    puts "--> #{ex.class}: #{ex.message}"
  end 

  yield if loaded and block_given? 

  require_result
end

This second version is the one I actually use. With the debug variable set to false, it behaves exactly like the previous implementation. With it set to true, I get much more information, including the full backtrace of the exception (also on pastie):

$debug_irbrc=false 

def tramp_require(what, &block)
  loaded, require_result = false, nil 

  begin
    puts('', "requiring #{what}") if $debug_irbrc
    require_result = require what
    loaded = true
    puts "successfully required #{what}" if $debug_irbrc 

  rescue Exception => ex
    puts "** Unable to require '#{what}'"
    exception_details = "#{ex.class}: #{ex.message}"
    if $debug_irbrc
      ex.backtrace.reverse.each{|l| exception_details << "\n   #{l}"}
    else
      exception_details.insert(0, "--> ")
    end
    puts exception_details
  end 

  if loaded and block_given?
    puts "executing block for #{what}" if $debug_irbrc
    yield
  end 

  require_result
end
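Both versions rescue Exception rather than using a bare rescue, and here that matters: a bare rescue only catches StandardError, while LoadError inherits from ScriptError and would sail right past it. A quick sketch to check that (the helper name is made up for this illustration):

```ruby
# Returns true if a bare `rescue` (which means `rescue StandardError`)
# catches an exception of the given class.
def caught_by_bare_rescue?(error_class)
  begin
    raise error_class, "simulated failure"
  rescue
    return true
  end
rescue Exception
  # The exception escaped the bare rescue above.
  false
end

p LoadError.ancestors.include?(StandardError)  # => false
p caught_by_bare_rescue?(LoadError)            # => false
p caught_by_bare_rescue?(RuntimeError)         # => true
```

If a missing library is the only failure you care about, rescuing LoadError (or ScriptError) is a narrower net, since rescuing Exception also catches things like Interrupt. The wide net does keep the irbrc alive when a gem blows up with some other error while loading, though.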

Feel free to use these snippets as you see fit :-)

The obligatory full paste of my irbrc is available on pastie, for the curious.

One word of warning, however. I’m guessing some of you will rename tramp_require to something more boring. Just don’t rename it to irb_require. IRB already defines a method named this way.

Posted in garbage out, programming, ruby

Do not learn Ruby

20th February 2008

Ruby will get under your skin. You will miss its features and quirks when you’re not using it. You might even find other languages insufferable, once you get comfortable with Ruby.

After you’ve started using Ruby, there’s a significant chance you’ll start loathing whatever code base you currently have to work on. Especially if it’s a statically compiled language. A code base you used to think was ok, except for its few quirks.

After a while perusing the different Ruby-related blogs, you’ll have heard other Rubyists speak of their work with words like beauty, productivity, expressiveness, conciseness and fun, and you’ll realize just how far your current language is taking you from all of these words.

You’ll see

Dictionary<string, string> someDic = new Dictionary<string, string>();

And dream of

some_dic = {}

You’ll see multiple overloads of the same method trying to emulate optional parameters, and think of Ruby’s symbols and options hashes, where a method as simple as:

def method_with_options(options = {})
  encoding = options[:encoding] || 'utf-8'
  puts('Other option detected') if options[:other_option]
  #...
end

Enables all of the following calls:

method_with_options :encoding => 'utf-16', :other_option => true
method_with_options :encoding => 'utf-16'
method_with_options :other_option => true
method_with_options
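A common variation on this idiom merges the caller’s hash over a hash of defaults, so every default lives in one place instead of being scattered through || expressions. A sketch; DEFAULT_OPTIONS and resolve_options are made-up names:

```ruby
# Hypothetical defaults for a method like method_with_options.
DEFAULT_OPTIONS = { :encoding => 'utf-8', :other_option => false }.freeze

# Keys passed by the caller win; missing keys fall back to the defaults.
# Hash#merge returns a new hash, so DEFAULT_OPTIONS is never modified.
def resolve_options(options = {})
  DEFAULT_OPTIONS.merge(options)
end

opts = resolve_options(:encoding => 'utf-16')
puts opts[:encoding]      # prints utf-16
puts opts[:other_option]  # prints false
```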

You’ll hear about metaprogramming and the complex syntax or frameworks that can bend Java, C# or C++ to allow a programmer to achieve what he seeks.

In the back of your head, you’ll think of the insane flexibility allowed by

- simple Ruby syntax for method chaining or redefinition;

- dynamic class definition, that lets you add methods to any existing class, even Ruby’s core classes;

- duck typing, where objects of any type can be passed to a method, as long as they respond to the expected method calls in a reasonable fashion;

and oh so many other Ruby niceties.
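A small sketch of the last two points: reopening a core class to add a method, and a method that accepts any object responding to each. The shout and count_items names are made up for illustration.

```ruby
# Reopening a core class: every String now has a #shout method.
class String
  def shout
    upcase + "!"
  end
end

# Duck typing: any object that responds to #each will do, whatever its class.
def count_items(collection)
  count = 0
  collection.each { |_item| count += 1 }
  count
end

p "hello".shout             # => "HELLO!"
p count_items([1, 2, 3])    # => 3
p count_items(1..4)         # => 4
p count_items({ :a => 1 })  # => 1
```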

You’ll encounter twisted method definitions such as

bool SomeMethod(int param1, ref SomeClass someClass, out SomeEnum resultType, out string result)

and think of Ruby’s multiple return values, which let you clearly separate return values from parameters:

success, resultType, result = some_method(param1, someClass)
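Under the hood, return a, b, c simply returns an array, and the caller’s multiple assignment unpacks it. A sketch with a hypothetical parse_port method standing in for some_method:

```ruby
# Returns three values instead of filling in ref/out parameters:
# a success flag, a result type and the parsed value (or nil).
def parse_port(text)
  return false, :not_a_number, nil unless text =~ /\A\d+\z/
  port = text.to_i
  return false, :out_of_range, port unless (1..65_535).cover?(port)
  return true, :ok, port
end

success, result_type, result = parse_port("8080")
p [success, result_type, result]  # => [true, :ok, 8080]
p parse_port("http")              # => [false, :not_a_number, nil]
```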

You’ll delve into huge, puzzling class hierarchies that struggle just to use the right abstraction level for class names… You’ll eventually realize that the whole hierarchy was simply there to share a few methods among loosely similar classes.

Then you’ll really get irritated at all the accidental complexity that could have been avoided by simply using mixins, where you define a common method and then include it in any appropriate class. And all of this without ever puzzling over strange abstract names for classes that happen to sit between 2 clear-cut levels of abstraction.
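Here’s roughly what that looks like: a module defines the shared method once, and unrelated classes include it without needing a common parent. The class and module names are hypothetical.

```ruby
# Shared behavior lives in a module...
module Describable
  def describe
    "#{self.class.name}: #{name}"
  end
end

# ...and unrelated classes mix it in; they only need to provide #name.
class Invoice
  include Describable
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

class Customer
  include Describable
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

p Invoice.new("INV-42").describe  # => "Invoice: INV-42"
p Customer.new("Alice").describe  # => "Customer: Alice"
```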

You’ll want to explore a new part of the .Net API by playing with it. You’ll create a dummy project in some random directory, find a name for it, include the proper parts of the API in the generic main class created by default and finally start playing with the construct of interest.

All this time you’ll be thinking of IRB, the Ruby interactive console. It not only allows you to play with an existing API with absolutely no fuss, but thanks to Ruby’s flexibility, you can even define classes in the console and then play with them!

But then you’ll think “Oh yeah, IRB is in fact so flexible that I can use it from a frickin’ web page (with a tutorial)!” And you’ll go play there for a couple of minutes (when no one’s looking), just to keep yourself sane for a couple more hours. Until the C++ / C# / Java drudgery is over for the day.

You’ve been warned. If you learn Ruby, you’ll start thinking it’s impossible for you to keep using the technology you’re currently using at work. If you’re patient you’ll try to introduce it there gently (and most likely get frustrated at the time it takes). If you’re not so patient, you’ll just end up changing jobs.

Next Monday I’m joining the great team at Karabunga to work on Defensio. I’ll be doing Ruby and a bit of Rails. Liberation is coming :-)

Posted in garbage out, programming, ruby

An easy way to make your code more testable

13th December 2007

James Golick wrote a very good article about testing a while ago. In it he dissects (and refutes) the too often heard arguments where people say they don’t write automated tests because they don’t have the time.

In the comments, some people concluded that yes, they should try to write more tests, but didn’t know where to start. In this post I won’t suggest frameworks, or specific tutorials. I’d just like to give one very first step that will help you write code that is easier to test. You’ll benefit from it even if you don’t use a testing framework yet.

As the title implies, my suggestion is pretty simple: write side effect free methods/functions. Simple, isn’t it? The rest of this article is just about explaining my point. So if the light bulb went off already, you can stop reading now.

I’m kidding, of course! So let’s not take anything for granted, instead let’s make sure we’re on the same page and define “side effect free”. It means that a function (substitute with “method” if you like) receives parameters, spits out a result and has not touched anything outside of it. There are two key parts to this definition:

  1. The function does not depend on anything other than its parameters: it does not expect a variable to be set outside of it to work properly (at the object, class or global level).
  2. It does not modify anything. Its result can be entirely observed either from the return value or from the exception thrown.

Of course, you often have to modify the state of the application as a result of a computation. What I’m suggesting is not a substitute for that. What I’m suggesting is simply to put the juicy bits of your computation in a side effect free function. This part will be trivial to test, but you’ll still have the code that uses this function. That other part, which modifies the state of the app, will need unit tests or other higher level testing.
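Here’s that split on a toy example. The names ($tax_rate, price_with_tax, record_total) are made up: the side effect free function does the computation, and a thin wrapper handles the application state.

```ruby
# Application state living outside any single method (hypothetical).
$tax_rate = 0.25
$last_total = nil

# Side effect free: depends only on its parameters and touches nothing else.
def price_with_tax(prices, tax_rate)
  prices.sum * (1 + tax_rate)
end

# The part that reads and modifies application state stays small and separate.
def record_total(prices)
  $last_total = price_with_tax(prices, $tax_rate)
end

p price_with_tax([10, 20], 0.25)  # => 37.5
```

Only record_total needs any setup to test; price_with_tax can be called with whatever inputs you like.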

Trivial examples of side effect free functions can be found in any good math library supplied with a language. Of course no modern language expects you to set a global variable in order to compute a square root. Those who want to follow the Ruby examples can do so on Try Ruby (fear not, you’ll be able to follow along even if you don’t know any Ruby).

Math.sqrt(4)
  => 2.0

And

Math.sqrt(-1)
Errno::EDOM: Numerical argument out of domain - sqrt
  from (irb):6:in `sqrt'
  from (irb):6

So there we have it. The result of sqrt is observed either from the return value or from the exception thrown. We don’t expect any state to have been modified anywhere else in the application. I’ll present a less trivial example in a bit, but first let me just say that the direct consequence of writing this kind of code is that you can test it trivially, whatever your technique of choice.

  • If you’re in your debugger or in your interactive console, you can call it as many times as you want, with different parameters and check out if its behavior is what you expect.
  • If you use unit tests, you can code the interesting scenarios and verify their expected outcome, just as in the console but in a repeatable manner.

Now since we all have a Math library of some sort in our language of choice, let’s look at another example: analyzing the parameters that will dictate the execution of your program. This is valid for configuration files with a bit more work, but let’s keep the example simple and just analyze command-line parameters.

Most languages provide us with some kind of array of parameters when entering our main function. The common way of dealing with them is to slap a big if or switch statement somewhere at the beginning of your program, which sets the state of the application accordingly, before actually starting to work on the application’s main task.

A side effect free approach would be to split the process in two parts:

  1. parse the parameters
  2. set the state / do some work

For example we could define a function that accepts an array and returns a hash (a Dictionary for .Net folks, a Map for Java folks) of the execution parameters:

{
  'arg1' => 'val1',
  'arg2' => 'val2'
}

So let’s say we start with the following method to analyze our arguments:

def analyze_args(arg_list)
  parsed_args = {}
  #We check that each argument is in the form 'arg=value'
  arg_list.each { |arg|
    key, value = arg.split '='
    if (key.nil? || value.nil?)
      raise "Some arguments are not in the 'arg=value' format"
    end
    parsed_args[key] = value
  }
  return parsed_args
end

Now we can trivially test the parsing of the command-line params:

analyze_args(["mom"])

RuntimeError: Some arguments are not in the 'arg=value' format
  from (irb):23:in `analyze_args'
  from (irb):20:in `each'
  from (irb):20:in `analyze_args'
  from (irb):29
  from :0
analyze_args(["mom=food"])
  => {"mom"=>"food"}
analyze_args(["mom=food", "dad=car"])
  => {"mom"=>"food", "dad"=>"car"}

Now that this part is taken care of, I can test it with whatever input I want, trivially.
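The same checks can also live in a script you rerun after every change instead of retyping them in irb. A minimal sketch using plain raise-based checks, so it needs no test framework; the check helper is hypothetical, and analyze_args is repeated (with the same behavior) so the script runs on its own.

```ruby
def analyze_args(arg_list)
  parsed_args = {}
  # Each argument must be in the form 'arg=value'
  arg_list.each do |arg|
    key, value = arg.split '='
    if key.nil? || value.nil?
      raise "Some arguments are not in the 'arg=value' format"
    end
    parsed_args[key] = value
  end
  parsed_args
end

# Tiny hypothetical assertion helper: fails loudly, otherwise reports success.
def check(description)
  raise "FAILED: #{description}" unless yield
  puts "ok - #{description}"
end

check("rejects a bare word") do
  begin
    analyze_args(["mom"])
    false
  rescue RuntimeError
    true
  end
end
check("parses one pair") { analyze_args(["mom=food"]) == { "mom" => "food" } }
check("parses two pairs") { analyze_args(["mom=food", "dad=car"]) == { "mom" => "food", "dad" => "car" } }
check("accepts no arguments") { analyze_args([]) == {} }
```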

To reiterate, in order to make your code easier to test, just extract the juicy bits of your program into side effect free functions and keep them apart from the rest of your program, which in turn makes sense of the result. At least the side effect free parts will be trivial to test.

This is just the beginning of the testability and automated tests journey, however. Of course you still have the rest of the program to test, preferably in an automated fashion.

Posted in garbage out, programming, ruby