Defining Terms

Pivotal Tracker Git Hook

At work we use Pivotal Tracker to manage some of our projects and we like to use the Post-Commit Hook Integration. Basically, if you put the Pivotal Tracker story number in your commit message, it will associate the commit with the story. It’s a nice feature that makes it easy to tie your code changes to features.

The problem is that I often forget to include the story number in my commit message. So, I wrote a small git commit-msg hook to check for a story number. If the message does not include one, the commit is aborted. I’ve been using the hook for a while and it’s been very helpful, so I’m publishing it.

The code is available on GitHub under an MIT license:

https://github.com/aag/pt-story-git-hook
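The core of such a hook is just a regular-expression check on the commit message. Here is a simplified, hypothetical sketch in Ruby (the real hook in the repository may differ in details):

```ruby
#!/usr/bin/env ruby
# Hypothetical simplified commit-msg hook. Git passes the path of the
# file containing the commit message as the first argument.

# Pivotal Tracker's post-commit integration looks for story IDs
# written in square brackets, e.g. [#12345678].
def story_number?(message)
  !!(message =~ /\[#\d+\]/)
end

if ARGV[0]
  message = File.read(ARGV[0])
  unless story_number?(message)
    warn 'Aborting commit: no Pivotal Tracker story number (e.g. [#12345678]) found.'
    exit 1
  end
end
```

Saved as .git/hooks/commit-msg and made executable, a script like this aborts any commit whose message lacks a bracketed story number.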

Pitfalls of Ruby Mixins

Multiple Inheritance

Mixins are Ruby’s way of dealing with the multiple inheritance problem. If inheritance expresses an is-a relationship, the problem occurs when a class is more than one thing. For example, it’s easy to express that an Employee is a Person by making the Employee class inherit from the Person class. But what if Employees are also EmailReporters, who can email their status to their manager? How do you express that?

  class EmailReporter
    def send_report
      # Send an email
    end
  end

  class Person
  end

  class Employee < Person
    # How can this also be an EmailReporter?
  end

Other languages solve this problem by allowing a single class to inherit from multiple other classes, or by using interfaces. Ruby is a single-inheritance language, but solves this problem with mixins.

Mixins

Mixins are a way of adding a set of methods and constants to a class, without using class inheritance. The include method lets you include a module’s contents in a class definition. In addition to inheriting from one other class, classes can include any number of mixins. In our example, the Employee class can inherit from the Person class, but include the EmailReporter module as a mixin. Then, any methods and constants that are defined in the EmailReporter module are added to the Employee class.

  module EmailReporter
    def send_report
      # Send an email
    end
  end

  class Person
  end

  class Employee < Person
    include EmailReporter
  end

Mixins have simplicity as their primary strength. They let us share code between classes without some of the problems of multiple inheritance, which can be complex and sometimes ambiguous. They let us easily create lightweight bundles of methods that can be included in any class where they’re needed. This functionality is simple and convenient, but not without its problems.

Pitfall #1

Mixins have at least two major pitfalls. The first pitfall stems from how mixins are implemented. What really happens when you call the include method with a module? It seems like the module’s methods are injected into the current class, but that’s not actually how it works. Instead, the module is inserted into the inheritance chain, directly above the class where it’s included. Then, when one of the methods in the mixin is called, the interpreter starts going up the inheritance chain looking for the method, and when it gets to the mixin module, the method is found and called.

In the resulting hierarchy, the EmailReporter module sits directly above the Employee class. But where does this new class come from? The Ruby interpreter creates an anonymous class called an include class (or proxy class) that is a wrapper for the module, and this class is inserted into the class hierarchy, directly above the class where it’s included.
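You can inspect this lookup chain directly with Module#ancestors. A small sketch using the classes from the example above:

```ruby
module EmailReporter
  def send_report
    # Send an email
  end
end

class Person
end

class Employee < Person
  include EmailReporter
end

# The include class wrapping EmailReporter sits between Employee
# and its superclass in the method lookup chain.
p Employee.ancestors
# => [Employee, EmailReporter, Person, Object, Kernel, BasicObject]
```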

This all works great, except when a module defines a method that already exists in some other module or class in the hierarchy. When that happens, whichever definition is lowest in the hierarchy silently shadows, or covers up, all the others. That means the behavior of a method call is determined not just by the class hierarchy and which modules are included, but also by the order of the include statements.

Let’s expand our previous class hierarchy to show an example of this:

  module EmailReporter
    def send_report
      # Send an email
    end
  end

  module PDFReporter
    def send_report
      # Write a PDF file
    end
  end

  class Person
  end

  class Employee < Person
    include EmailReporter
    include PDFReporter
  end

  class Vehicle
  end

  class Car < Vehicle
    include PDFReporter
    include EmailReporter
  end

In this hierarchy, we have added Vehicle and Car classes to our previous hierarchy. Also, in addition to the EmailReporter module which emails reports, we have a PDFReporter module which writes reports to PDF files.

Because the Employee and Car classes include the EmailReporter and PDFReporter modules in different orders, calls to the send_report method have different effects:

  an_employee = Employee.new
  a_car = Car.new

  an_employee.send_report # Writes a PDF
  a_car.send_report       # Sends an email

This dependence on statement ordering can be confusing and can make debugging more difficult. And this issue isn’t restricted to just modules interacting with each other. Methods defined in a class definition will silently shadow methods in included modules, and methods defined on classes higher up in the hierarchy will be silently shadowed by any modules lower down in the hierarchy.
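Both of those kinds of shadowing are easy to demonstrate. In this sketch (the module and method names are invented for the example), a module included in a subclass shadows a superclass method, and a method defined directly in a class shadows an included module:

```ruby
module Reporter
  def report
    "module version"
  end
end

class Base
  def report
    "superclass version"
  end
end

# The included module sits below Base in the lookup chain,
# so it silently shadows Base#report.
class Child < Base
  include Reporter
end

# A method defined directly in the class is found first,
# so it silently shadows the included module's version.
class Direct
  include Reporter

  def report
    "class version"
  end
end

puts Child.new.report   # module version
puts Direct.new.report  # class version
```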

Pitfall #2

The second pitfall of mixins is that they break encapsulation, so they can make code more entangled and make code changes harder. Consider the case of the standard Ruby Comparable module. When this module is included in a class, it expects the class to define a <=> method, which the module uses to define the <, <=, ==, >=, and > operators, as well as the between? method.
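For instance, a minimal class that includes Comparable only has to supply <=>, and gets the rest for free (the class and attribute names here are just for illustration):

```ruby
class Version
  include Comparable

  attr_reader :number

  def initialize(number)
    @number = number
  end

  # Comparable builds <, <=, ==, >=, >, and between? on top of this.
  def <=>(other)
    number <=> other.number
  end
end

puts Version.new(2) > Version.new(1)                          # true
puts Version.new(1).between?(Version.new(0), Version.new(3))  # true
```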

The Comparable module is very convenient, but consider what would happen if it changed so that it expected a compare_to method instead of <=>. This change would necessitate changing every class that includes Comparable. This is unlikely to happen with a standard Ruby module like Comparable, but it is fairly likely to happen with the modules you create in your application, especially at the beginning of development when you’re still figuring out how the different classes and modules should interact.

Instead of using mixins, it’s often better to create a new class and call methods on an instance of that class. Then, if the internals of the new class change, you can usually make sure the changes are wrapped in whatever method was originally being used, so the calling code doesn’t have to change.
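As a sketch of that approach, the earlier email-reporting example can be rewritten with composition instead of include (the return value is contrived so the effect is visible):

```ruby
# Reporting lives in its own class instead of a mixin.
class EmailReporter
  def send_report(subject)
    # Send an email here; return a status string for illustration.
    "emailed: #{subject}"
  end
end

class Employee
  def initialize(reporter = EmailReporter.new)
    @reporter = reporter
  end

  # If EmailReporter's internals change, only this wrapper needs
  # updating; callers of Employee#send_report are unaffected.
  def send_report
    @reporter.send_report("status")
  end
end

puts Employee.new.send_report  # emailed: status
```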

Conclusion

In general, mixins are a good solution to the multiple inheritance problem. They can be very useful and make code sharing between classes easier than other solutions like interfaces or true multiple inheritance. However, when using mixins, you have to be aware of the potential pitfalls. Since mixins silently shadow methods, you have to be careful with method names and the order of include calls. Also, since mixins break encapsulation and can make changes difficult, you may want to consider using an instance of a class to perform the same function, especially early on in new projects.

Go Tools With Tmpfs

My desktop computer runs Linux and has an SSD and plenty of RAM, so I’ve mounted /tmp as a tmpfs RAM disk, as others have suggested. This is nice because it makes things faster and cuts down on writes to the SSD, but it causes problems with some of the Go tools.

Both go run and gotour compile binaries to /tmp and try to run them from there. When /tmp is a tmpfs volume mounted with the noexec option, you will get errors like this:

$ go run hello.go
fork/exec /tmp/go-build995932098/command-line-arguments/_obj/a.out: permission denied

One solution is to set the tmp directory to a location that’s not in a ramdisk, just in the shell you’re using to run go. In a bash shell, you can do that with these commands:

$ mkdir ~/tmp
$ export TMPDIR=~/tmp/

Then the go commands will work correctly:

$ go run hello.go
hello, world

You will have to export TMPDIR each time you open a new shell in which you want to use the problematic go tools.

Windows Shell Bug

At work, we build one of our projects with Makefiles. In one of these Makefiles, we have this line:

IF NOT EXIST "$(OUTDIR)" mkdir "$(OUTDIR)"

This should create the output directory if it doesn’t exist. The line would hang the build on my development box from time to time, so I decided to track down the problem. It turns out that IF NOT EXIST will sometimes hang if there are quotes around the file path, at least on Windows XP. You can verify this with a simple Perl script:

for (;;) {
    system("cmd /c IF NOT EXIST \"\\Windows\" echo nodir");
}

Running that script will cause the cmd process to hang after a short amount of time (usually less than a minute).

But, if you run this script, it will continue forever:

for (;;) {
    system("cmd /c IF NOT EXIST \\Windows echo nodir");
}

In our case, the filepath will never contain any spaces, so the solution was to just remove the quotes from the IF NOT EXIST command in the Makefile.

EngineYard SHA-1 Competition

A few weeks ago, EngineYard held a programming competition. Basically, the contest was to see who could get a SHA-1 hash closest to a given hash. I thought it might be fun to see how well I could do with a minimal implementation, so I coded up something in C.

My goal was not to write the most efficient program possible, but to see what results I could get from a reasonable design and no major optimizations. So, I wrote about 160 lines of C code, using OpenSSL for the hashing and the old K&R bit-counting method. Depending on how long the message plaintext was, this got me about 1 to 1.5 million attempts per second per core on my 3.0 GHz Core 2 Duo. The winning CPU-based entry, which was written by D. J. Bernstein, contained an optimized SHA-1 implementation and got around 10 million hashes per second per core. So, I was about an order of magnitude off, which seems reasonable to me.

It turns out the really fast implementations were all on GPUs. Both the winner and the runner up used Nvidia’s CUDA for fast GPGPU processing, which was cool to see.

I ran my program for most of the 30 hours of the competition. How did I do? My best result was 37 bits off of the goal, which put me 7 away from the winner.

C#, the Ternary Operator, and Mono

The Quiz

One of my coworkers recently sent out this C# programming quiz:

static void Main(string[] args)
{
    object x = null;
    object y = (short)4;
    x = (y is System.Int32) ? (System.Int32)y : (System.Int16)y;
    Console.WriteLine(x.GetType());
}

  1. What is printed out?
  2. Try it.
  3. Explain why you were wrong.

If you run the code, you get this output:

System.Int32

The code snippet as it stands doesn’t make it clear exactly where the unexpected behavior is. This is a little clearer:

static void Main(string[] args)
{
    object x = null;
    object y = (short)4;
    x = false ? (System.Int32)y : (System.Int16)y;
    Console.WriteLine(x.GetType());
}

This code outputs the same thing as the first snippet. So why does it output System.Int32, when x clearly gets set to (System.Int16)4? The answer lies in the C# implementation of the ternary operator.

The Answer

In line 5 of the second example above, x is set to the result of the expression on the right side of the equals sign. Since the thing on the right is an expression, it must have a single type.

The C# Language Specification, in section 14.13, spells out how the type of the expression is determined:

The second and third operands of the ?: operator control the type of the conditional expression. Let X and Y be the types of the second and third operands. Then,

  • If X and Y are the same type, then this is the type of the conditional expression.
  • Otherwise, if an implicit conversion exists from X to Y, but not from Y to X, then Y is the type of the conditional expression.
  • Otherwise, if an implicit conversion exists from Y to X, but not from X to Y, then X is the type of the conditional expression.
  • Otherwise, no expression type can be determined, and a compile-time error occurs.

In our example, since there’s an implicit cast from an Int16 to an Int32, but not an implicit cast from an Int32 to an Int16, the compiler says the type of the expression must be Int32. Then, when our Int16 is returned, it’s typecast to an Int32.

More Types

The spec makes it pretty clear what is supposed to happen if there’s an implicit conversion from one of the operands to the other, but not the other way around. But what happens if there’s an implicit conversion in both directions? According to the spec, none of the first three conditions are met, so the compiler must output an error. There is an implicit conversion from a byte to an int, and also one from a const int to a byte, as long as the int’s value is small enough to fit into the byte. So, let’s try compiling this:

static void Main(string[] args)
{
    const int i = 1;
    const byte b = 2;

    object x = null;
    x = true ? i : b;

    Console.WriteLine(x.GetType());
}

The Bug

If you compile this with the .NET 3.5 compiler, it compiles without errors. There’s a warning about hardcoding true into the ternary operator, but nothing about the types. So, the compiler does not conform to the C# language spec. That’s a bug, but it’s not that shocking. There are other places where the .NET compiler doesn’t conform to the spec. It seems that Microsoft has a policy to leave such bugs in, so as to not break compatibility with existing code, so it will probably stay that way for the foreseeable future.

Mono

This got me wondering if the Mono compiler properly supports the spec. So, I tried compiling with the Mono 2.0 C# compiler. Here’s what you get:

Program.cs(13,17): error CS0172: Can not compute type of conditional expression as `int' and `byte' convert implicitly to each other
Compilation failed: 1 error(s), 0 warnings

So it looks like Mono conforms to the spec in this case. It’s a bit amusing that an open source project supports the spec better than Microsoft itself, but there are probably also cases where it goes the other way. However, this means that the Mono implementation is incompatible with the .NET implementation of C#. Now, this particular incompatibility is unlikely to come up that often, since there aren’t many types that have two-way implicit conversions with each other, but it’s something to consider.

The Law

The legal implications of this bug are perhaps the most interesting part. A few weeks ago, there was a lot of talk about the Mono patent situation. This has now been largely resolved with Microsoft putting C# and the CLR under its Community Promise. However, the Community Promise only applies to a given implementation “to the extent it conforms to one of the Covered Specifications.” If an implementation does not conform to the specification, it is not covered by the promise.

You can probably see where I’m going with this. If Mono decided to support compatibility with the .NET compiler by breaking from the spec and implementing the ternary type bug the same way Microsoft has, it might be giving up its protection against patent lawsuits. In order to be legally safer, it’s probably wiser for Mono to stick to the spec and break compatibility with the .NET compiler. This is significant, because the more situations like this crop up, the harder it will be for programmers to port their .NET code to Mono. There’s not much that the Mono project can do about this, but it’s unfortunate that the legal situation forces their hand on compatibility.

Moving From Apache to Nginx

I’ve been having problems with too much memory usage on the 256 MB slice I use to serve my web pages. I was using Apache and mod_php, and the Apache processes were growing large enough that I could only have 4 of them running at once, which killed any hope of decent concurrency. I decided to switch to using PHP with FastCGI and after some research decided to go with nginx. The switch has now been made, and page generation is faster, the server can handle greater concurrency, and memory usage is under control.

I actually ended up using nginx, PHP, PHP-FPM, and xcache. I’m running 8 instances of PHP and 2 nginx worker processes. First, I got started by using the step-by-step instructions I found on someone’s blog. After the initial setup, there was still a lot of configuration to do, some of which was not trivial. Below are some notes from my experience:

  1. When setting up FastCGI processes for PHP, make sure the firewall isn’t blocking access to the port the FastCGI servers are listening on. If your firewall is filtering packets to the port, you’ll get a “504 Gateway Time-out” page from nginx whenever you try to access a PHP page. To open up the port, if you’re using iptables, add something to your iptables startup script like this:

    iptables -A INPUT -p tcp -s localhost --dport 9000 -j ACCEPT
  2. Somewhere I saw an example of a redirect that eliminates the www from URLs, which caused me to add this to the config file:

    server {
        listen          80;
        server_name     www.definingterms.com;
        rewrite         ^/(.*) http://definingterms.com/%1 permanent;
    }
    
    server {
        listen          80 default;
        server_name     definingterms.com;
        [...]
    }

    I’m not sure why I thought I needed this, but it was messing up the loading of static images on this blog. The right thing to do is just list multiple server_names for the host, and let WordPress handle the redirect:

    server {
        listen          80 default;
        server_name     definingterms.com www.definingterms.com;
        [...]
    }
  3. Moving most of my sites from Apache to nginx was easy, but there was the problem of rewrite rules. A lot of the software I use (e.g. WordPress, MediaWiki, Gallery) needs rewrite rules. If you’re using Apache, there’s always an example .htaccess file either in the documentation or in some user forum. However, the nginx rewrite rules work differently, and you can’t always find an example config, so it took some time and brain power to get them all right. Actually, this is true for all parts of the nginx configuration. The English documentation is sparse and it’s sometimes hard to find examples of what you want to do.

  4. nginx doesn’t have CGI support. If you’ve got those one or two sites that use non-PHP CGI, you’ll have to set up a FastCGI instance for each language you want to support. For example, I had a copy of ViewVC running, which is written in Python. With Apache, it was just using CGI. With nginx, I’d have to constantly run a whole Python FastCGI process for the rare event when someone wants to browse ViewVC.

In the end, I think the move to nginx was worth it purely for the performance gains, since it keeps me from having to pay for a heftier VM every month. But, it did take a fair amount of work and if I had slightly more complex needs, it might not have been a good fit for me.

Secure Passwords Not Allowed

For quite a while now, I’ve been using a tiered password system for all of the websites where I have accounts. I knew this was bad practice, but it was easy. Recently there have been a number of stories about websites’ databases being leaked, which made me seriously consider doing something better. Password managers have never impressed me much, both because of the security issues of storing all of your passwords in a central location and the danger of losing the database and not being able to reconstruct it. But, when I came across PasswordMaker, I liked what I saw.

Instead of storing the passwords, they’re generated on the fly using a cryptographically secure method. The software is all open source, so you know what’s going on and can reconstruct it if you need to. Finally, there’s also a very convenient browser plugin for Firefox that I can install everywhere. So, I finally bit the bullet and got away from my tiered password system by moving all of my online accounts to using passwords generated by PasswordMaker.

If I’m going to go through all of that trouble, I might as well use a secure password, right? So I decided to use passwords with letters, numbers, and punctuation. That shouldn’t be anything special; it’s just the standard recommendation for password security. Some systems even have it as a minimum requirement.

What surprised me was that a huge number of websites I use don’t allow passwords with punctuation, 20% in fact. Out of the 97 sites where I tried to update my account, 19 of them would not allow it. These ranged from hip web 2.0 sites like digg.com to big, corporate sites like geico.com. Here’s the full list:

So now I have to use a different set of characters for my passwords at these sites. Fortunately, PasswordMaker lets me configure different profiles for different sites, so I can set it up once and forget about it. But, why should I have to do that, especially when it makes my account less secure?

It seems like it just requires more work from the website makers to restrict certain characters, and I can’t think of any good reason to do so. It might make sense to restrict passwords to ASCII characters if the system doesn’t fully support Unicode, but disallowing all punctuation just doesn’t make any sense. The programmers might be worried about allowing escape characters in passwords, but it seems like it would be no more work to handle them safely internally than to place an additional demand on the user.
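One common way to make the character set irrelevant is to hash passwords before storing them, so the raw characters never reach the database at all. A minimal sketch, using SHA-256 from Ruby’s standard library for brevity (a real system should use a slow, salted scheme such as bcrypt):

```ruby
require 'digest'

# Because only the fixed-length hex digest is stored, punctuation,
# quotes, and other "dangerous" characters in the password never
# touch the database and need no special escaping.
def hash_password(password, salt)
  Digest::SHA256.hexdigest(salt + password)
end

stored = hash_password('p@$$w0rd!?', 'per-user-salt')
puts stored  # a 64-character hex digest, whatever the input characters
```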

If we’re going to expect users to have secure passwords, we need to allow them to do so. I’d like to see the above sites change their password policies, and I want any new sites to allow long, complex passwords.

Dinosaur Remix

I like Dinosaur Comics. Since the drawings and panels are the same in every comic, I thought of the idea of trying to mix and match panels from different comics to see what comes out. It turns out, Ryan North had already done something similar, but I decided to implement my idea anyway. Here’s the result:

Dinosaur Remix

Dinosaur Remix lets you randomly mix together panels, but also lock in certain panels to make a comic you like. Then you can add clever alt text and save it or send it to a dinosaur-loving friend!

It’s written in Python, PHP, and JavaScript. It was my first chance to really use jQuery, which is just about as awesome as everyone says it is. All of the code is available on GitHub.

Translation Hacks

Not so long ago, the only help available when translating text from one language to another was a dictionary and a grammar book. That’s how it was when I started learning German. Now, there are a number of tools online to help you in your bilingual quest, but you have to be clever to get the most use out of them. My examples will be in German, but these techniques can be applied to most languages that have a strong online presence.

  • A Good Dictionary - This is still your first line of defense against the hostile hordes of foreign words. For German, the best is clearly LEO. The reason it’s better than your trusty Langenscheidt’s or Oxford-Duden is that it’s user created and maintained, so you get idioms, slang, and current events.

  • Google for Grammar - If you don’t know which of two or three possible variants is the correct one, Google all of them and see if one has many more hits than the rest. Important here: put the phrase in quotes. This is especially good for things like which preposition to use with more common words. Let’s say you want to translate “I’m going to Chicago.” Which preposition do you use? Try searching for a similar phrase (in this case, altered for geographic relevance) and see if one stands out. Go to google.de and click the “Seiten auf Deutsch” (Pages in German) button. Then try three reasonable guesses:

    1. “gehe zu Berlin”: 7 hits
    2. “gehe nach Berlin”: 2,890 hits
    3. “gehe bei Berlin”: 1 hit

    It looks like we have a clear winner. You can be pretty confident translating your sentence as “Ich gehe nach Chicago”. However, this isn’t always foolproof. If you had searched for “gehe in Berlin” you would have gotten 576 hits and the results wouldn’t have been quite as clear. But, after reading the first few hits, you’d have realized that it’s not what you’re looking for.

  • Wikipedia - This is especially good for technical terms. Suppose you want to talk about the famous Quicksort algorithm in German. LEO won’t help you. So, go to the English Wikipedia page for Quicksort. Then, look down on the left side of the page in the “languages” box. Click “Deutsch” (if it exists) and you’ll be taken to the equivalent German page. In this case, you find out that Quicksort is the same in English and German, so now you can (not) translate it with confidence.

  • Names - You read a foreign name and you’re not sure what gender it is. Sometimes that doesn’t matter, but if you want to talk about this person, it sure is useful to be able to use pronouns. So, to figure it out, do a Google image search, preferably at the Google site for the country, and look at the results. Example: Johannes vs. Johanna (I read this somewhere online, but I’m not sure where. It’s especially helpful for Asian names.)

  • Machine Translation - I list this one last because it’s generally the least helpful. The two main free sites are Google Translate and Babelfish. These sites are slowly getting better, but right now they’re still of limited value. They’re good if you want to read something in a language you don’t know at all or it would take you a long time to do the translation yourself and you just want to get the gist of the text.