05.20.08

Evaluating equivalent solutions

Posted in tech at 1:43 am by stacywong

Consider the following regexes, numbered for readability:

1 f?oo
2 [f]?oo
3 [f?]oo
4 (f?)oo
5 (f)?oo

Do they behave as expected? Do they behave in the same way? Are they equivalent?

To answer whether they behave as expected, we first need to set our expectations. Looking at the regexes, it’s pretty obvious that we are interested in matching “oo” prefixed with an optional “f”. Now that we know what we expect, let’s throw some test cases at them: “oo” and “foo” seem appropriate.

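We run our tests and discover that all except #3 produce the expected result, a successful match, for both test cases. Specifically, sneaky #3 will correctly match “foo” but fail to match “oo”, because inside square brackets the “?” is a literal character rather than a quantifier, so [f?] requires exactly one character that is either “f” or “?”.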
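The post doesn’t show the test program itself, so here is a minimal sketch of that experiment in Python. The helper names (matches, run_tests) are made up for illustration, and it assumes an unanchored search, which is consistent with the results described above.

```python
import re

# The five patterns from the post, keyed by the post's own numbering.
PATTERNS = {
    1: r"f?oo",
    2: r"[f]?oo",
    3: r"[f?]oo",
    4: r"(f?)oo",
    5: r"(f)?oo",
}


def matches(pattern, text):
    """Return True if the pattern matches anywhere in text (unanchored)."""
    return re.search(pattern, text) is not None


def run_tests(inputs=("oo", "foo")):
    """Map each pattern number to a dict of {input: matched?}."""
    return {n: {s: matches(p, s) for s in inputs} for n, p in PATTERNS.items()}


if __name__ == "__main__":
    for number, row in run_tests().items():
        print(number, PATTERNS[number], row)
```

Only #3 rejects “oo”. It would, however, accept the literal string “?oo”, which is a quick way to see what the brackets are really doing.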

Now on to the next question, “Do they behave the same way?” You might be tempted to say “yes” because they appear to return the same result given the same input. But, if you’re pedantic about regexes, you’ll remember that parentheses indicate that matched text should be stored in a capture group. I wrote a little program to print “$+” after regex matching, and got the following:

Input: foo
1 f?oo          MATCH           last capture:
2 [f]?oo        MATCH           last capture:
3 [f?]oo        MATCH           last capture:
4 (f?)oo        MATCH           last capture: f
5 (f)?oo        MATCH           last capture: f
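The original program isn’t shown, so the following is a stand-in sketch in Python rather than the author’s code. It assumes that Match.lastindex (the index of the last capturing group that took part in the match) is a close enough analogue of “$+” for patterns with a single group. The function name last_capture is hypothetical.

```python
import re


def last_capture(pattern, text):
    """Return (matched, last capture) for an unanchored search.

    The capture is None when the pattern has no capturing group, or when
    no group took part in the match.
    """
    m = re.search(pattern, text)
    if m is None:
        return (False, None)
    if m.lastindex is None:
        return (True, None)
    return (True, m.group(m.lastindex))


if __name__ == "__main__":
    for pattern in (r"f?oo", r"[f]?oo", r"[f?]oo", r"(f?)oo", r"(f)?oo"):
        for text in ("foo", "oo"):
            print(text, pattern, last_capture(pattern, text))
```

Running this on the second input, “oo”, is worth a look as well. In Python, (f?)oo records an empty string in its group, while (f)?oo leaves the group unset, so 4 and 5 are not quite interchangeable for a caller that inspects captures.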

When you’re evaluating different options on how to code something, don’t just look at inputs and outputs. Side effects have the potential to be very dangerous, especially because they’re hard to debug. In this case we’re probably fine, but of course it depends on what your application is doing and who is calling you and how often…

So now we’ve looked at functionality and some of the side effects of our examples. Is it safe to say that options 1 and 2 are the same, and options 4 and 5 are the same? Not yet. Here is where the intangibles come in. For example, it can be argued that options 2, 4, and 5 are more readable, and hence better choices. Also consider that requirements might change: what if I wanted my regex to match “boo” and “foo” but not “oo”? Then option 2 makes sense because the square brackets indicate a character set, so I would only have to add the letter “b” and drop the “?”, which gives [bf]oo. Okay, how about “boo” and “foo” and “taboo”? Uh-oh. Now we can’t use the character set because we’ve introduced a prefix longer than a single character. Maybe 4 or 5 are the way to go, since a group can hold alternatives of any length…

This is my favorite part of the code reviewing process. It’s easy to tell if the code is working*, it’s moderately easy to tell if it’s not doing anything bad, but figuring out if it’s extensible, maintainable, scalable? That’s more art than science.

BTW, if we’re deciding between 4 and 5, my pick is 5, since it makes it clear that the “?” acts on the entire group of options, and not just the last one. :)

* Especially if you have tests!

05.02.08

Gold star stickers

Posted in tech at 2:52 am by stacywong

I pushed some bad software this week which had some unfortunate ramifications… but even worse than pushing broken code, the problem remained undiscovered for a long time because I neglected to test my stuff after the push. No excuses, no prevaricating — I did not test because I was being lazy.

How do we prevent this from happening in the future? You can document your standard operating procedures, but at the end of the day it’s up to the human to actually read and adhere to the SOP.

This is why automation is good. You can automate your tests, and tie them to your development process — code can’t be checked in, built, or deployed unless all tests pass. However, setting up the framework for such automation across several systems often requires time, and even when you get to the point of a fully automated system, problems are still going to slip through the cracks. So until you perfect your automation, you still need to keep a human in the loop.
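As a rough illustration of tying tests to the process, here is a minimal sketch in Python. Everything in it (gated_deploy, the check callables, the status strings) is hypothetical and not a description of any real pipeline. It runs checks before the push and, since that was the step skipped in this story, again afterwards.

```python
from typing import Callable, Iterable

Check = Callable[[], bool]


def gated_deploy(pre_checks: Iterable[Check],
                 deploy: Callable[[], None],
                 post_checks: Iterable[Check]) -> str:
    """Deploy only if every pre-check passes, then verify the result."""
    # Build lists so every check runs and no failure is hidden by an early exit.
    if not all([check() for check in pre_checks]):
        return "blocked"
    deploy()
    if not all([check() for check in post_checks]):
        return "deployed-but-failing"
    return "ok"


if __name__ == "__main__":
    status = gated_deploy(
        pre_checks=[lambda: True],
        deploy=lambda: print("pushing..."),
        post_checks=[lambda: True],
    )
    print(status)
```

The post-push check is the part that takes the decision away from a lazy human: the push itself reports whether the deployed code is actually working.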

With that in mind, the question becomes how do you make sure your human does the right thing? First you need to make it easy to do the right thing. Then you need to create incentives to do the right thing.

Beyond personal integrity and reputation, what was my incentive to do the appropriate testing in this instance? I’m not really sure, but I think it is worth exploring how to positively reinforce correct behavior so that you do not rely on good intentions alone.