Defining Terms

Pivotal Tracker Git Hook

At work we use Pivotal Tracker to manage some of our projects and we like to use the Post-Commit Hook Integration. Basically, if you put the Pivotal Tracker story number in your commit message, it will associate the commit with the story. It’s a nice feature that makes it easy to tie your code changes to features.

The problem is that I often forget to include the story number in my commit message. So, I wrote a small git commit-msg hook to check for a story number. If the message does not include one, the commit is aborted. I’ve been using the hook for a while and it’s been very helpful, so I’m publishing it.

The code is available on GitHub under an MIT license:

https://github.com/aag/pt-story-git-hook
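The core of such a hook is just a regular-expression check on the commit message. Here is a simplified, hypothetical sketch in Ruby (the real hook in the repository may differ in details):

```ruby
#!/usr/bin/env ruby
# Hypothetical simplified commit-msg hook. Git passes the path of the
# file containing the commit message as the first argument.

# Pivotal Tracker's post-commit integration looks for story IDs
# written in square brackets, e.g. [#12345678].
def story_number?(message)
  !!(message =~ /\[#\d+\]/)
end

if ARGV[0]
  message = File.read(ARGV[0])
  unless story_number?(message)
    warn 'Aborting commit: no Pivotal Tracker story number (e.g. [#12345678]) found.'
    exit 1
  end
end
```

Saved as .git/hooks/commit-msg and made executable, a script like this aborts any commit whose message lacks a bracketed story number.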

Pitfalls of Ruby Mixins

Multiple Inheritance

Mixins are Ruby’s way of dealing with the multiple inheritance problem. If inheritance expresses an is-a relationship, the problem occurs when a class is more than one thing. For example, it’s easy to express that an Employee is a Person by making the Employee class inherit from the Person class. But what if Employees are also EmailReporters, who can email their status to their manager? How do you express that?

  class EmailReporter
    def send_report
      # Send an email
    end
  end

  class Person
  end

  class Employee < Person
    # How can this also be an EmailReporter?
  end

Other languages solve this problem by allowing a single class to inherit from multiple other classes, or by using interfaces. Ruby is a single-inheritance language, but solves this problem with mixins.

Mixins

Mixins are a way of adding a set of methods and constants to a class, without using class inheritance. The include method lets you include a module’s contents in a class definition. In addition to inheriting from one other class, classes can include any number of mixins. In our example, the Employee class can inherit from the Person class, but include the EmailReporter module as a mixin. Then, any methods and constants that are defined in the EmailReporter module are added to the Employee class.

  module EmailReporter
    def send_report
      # Send an email
    end
  end

  class Person
  end

  class Employee < Person
    include EmailReporter
  end

Mixins have simplicity as their primary strength. They let us share code between classes without some of the problems of multiple inheritance, which can be complex and sometimes ambiguous. They let us easily create lightweight bundles of methods that can be included in any class where they’re needed. This functionality is simple and convenient, but not without its problems.

Pitfall #1

Mixins have at least two major pitfalls. The first pitfall stems from how mixins are implemented. What really happens when you call the include method with a module? It seems like the module’s methods are injected into the current class, but that’s not actually how it works. Instead, the module is inserted into the inheritance chain, directly above the class where it’s included. Then, when one of the methods in the mixin is called, the interpreter starts going up the inheritance chain looking for the method, and when it gets to the mixin module, the method is found and called.

In the resulting hierarchy, the EmailReporter module sits directly above the Employee class. But where does this new class come from? The Ruby interpreter creates an anonymous class called an include class (or proxy class) that is a wrapper for the module, and this class is inserted into the class hierarchy, directly above the class where it’s included.
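You can inspect this lookup chain directly with Module#ancestors. A small sketch using the classes from the example above:

```ruby
module EmailReporter
  def send_report
    # Send an email
  end
end

class Person
end

class Employee < Person
  include EmailReporter
end

# The include class wrapping EmailReporter sits between Employee
# and its superclass in the method lookup chain.
p Employee.ancestors
# => [Employee, EmailReporter, Person, Object, Kernel, BasicObject]
```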

This all works great, except when a module defines a method that already exists in some other module or class in the hierarchy. When that happens, whichever definition is lowest in the hierarchy silently shadows, or covers up, all the others. That means the behavior of a method call is determined not just by the class hierarchy and which modules are included, but also by the order of the include statements.

Let’s expand our previous class hierarchy to show an example of this:

  module EmailReporter
    def send_report
      # Send an email
    end
  end

  module PDFReporter
    def send_report
      # Write a PDF file
    end
  end

  class Person
  end

  class Employee < Person
    include EmailReporter
    include PDFReporter
  end

  class Vehicle
  end

  class Car < Vehicle
    include PDFReporter
    include EmailReporter
  end

In this hierarchy, we have added Vehicle and Car classes to our previous hierarchy. Also, in addition to the EmailReporter module which emails reports, we have a PDFReporter module which writes reports to PDF files.

Because the Employee and Car classes include the EmailReporter and PDFReporter modules in different orders, calls to the send_report method have different effects:

  an_employee = Employee.new
  a_car = Car.new

  an_employee.send_report # Writes a PDF
  a_car.send_report       # Sends an email

This dependence on statement ordering can be confusing and can make debugging more difficult. And this issue isn’t restricted to just modules interacting with each other. Methods defined in a class definition will silently shadow methods in included modules, and methods defined on classes higher up in the hierarchy will be silently shadowed by any modules lower down in the hierarchy.
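Both of those kinds of shadowing are easy to demonstrate. In this sketch (the module and method names are invented for the example), a module included in a subclass shadows a superclass method, and a method defined directly in a class shadows an included module:

```ruby
module Reporter
  def report
    "module version"
  end
end

class Base
  def report
    "superclass version"
  end
end

# The included module sits below Base in the lookup chain,
# so it silently shadows Base#report.
class Child < Base
  include Reporter
end

# A method defined directly in the class is found first,
# so it silently shadows the included module's version.
class Direct
  include Reporter

  def report
    "class version"
  end
end

puts Child.new.report   # module version
puts Direct.new.report  # class version
```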

Pitfall #2

The second pitfall of mixins is that they break encapsulation, so they can make code more entangled and make code changes harder. Consider the case of the standard Ruby Comparable module. When this module is included in a class, it expects the class to define a <=> method, which the module uses to define the <, <=, ==, >=, and > operators, as well as the between? method.
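For instance, a minimal class that includes Comparable only has to supply <=>, and gets the rest for free (the class and attribute names here are just for illustration):

```ruby
class Version
  include Comparable

  attr_reader :number

  def initialize(number)
    @number = number
  end

  # Comparable builds <, <=, ==, >=, >, and between? on top of this.
  def <=>(other)
    number <=> other.number
  end
end

puts Version.new(2) > Version.new(1)                          # true
puts Version.new(1).between?(Version.new(0), Version.new(3))  # true
```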

The Comparable module is very convenient, but consider what would happen if it changed so that it expected a compare_to method instead of <=>. This change would necessitate changing every class that includes Comparable. This is unlikely to happen with a standard Ruby module like Comparable, but it is fairly likely to happen with the modules you create in your application, especially at the beginning of development when you’re still figuring out how the different classes and modules should interact.

Instead of using mixins, it’s often better to create a new class and call methods on an instance of that class. Then, if the internals of the new class change, you can usually make sure the changes are wrapped in whatever method was originally being used, so the calling code doesn’t have to change.
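As a sketch of that approach, the earlier email-reporting example can be rewritten with composition instead of include (the return value is contrived so the effect is visible):

```ruby
# Reporting lives in its own class instead of a mixin.
class EmailReporter
  def send_report(subject)
    # Send an email here; return a status string for illustration.
    "emailed: #{subject}"
  end
end

class Employee
  def initialize(reporter = EmailReporter.new)
    @reporter = reporter
  end

  # If EmailReporter's internals change, only this wrapper needs
  # updating; callers of Employee#send_report are unaffected.
  def send_report
    @reporter.send_report("status")
  end
end

puts Employee.new.send_report  # emailed: status
```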

Conclusion

In general, mixins are a good solution to the multiple inheritance problem. They can be very useful and make code sharing between classes easier than other solutions like interfaces or true multiple inheritance. However, when using mixins, you have to be aware of the potential pitfalls. Since mixins silently shadow methods, you have to be careful with method names and the order of include calls. Also, since mixins break encapsulation and can make changes difficult, you may want to consider using an instance of a class to perform the same function, especially early on in new projects.

Go Tools With Tmpfs

My desktop computer runs Linux and has an SSD and plenty of RAM, so I’ve mounted /tmp as a tmpfs RAM disk, as others have suggested. This is nice because it makes things faster and cuts down on writes to the SSD, but it causes problems with some of the Go tools.

Both go run and gotour compile binaries to /tmp and try to run them from there. When /tmp is a tmpfs volume mounted with the noexec option, you will get errors like this:

$ go run hello.go
fork/exec /tmp/go-build995932098/command-line-arguments/_obj/a.out: permission denied

One solution is to set the tmp directory to a location that’s not in a ramdisk, just in the shell you’re using to run go. In a bash shell, you can do that with these commands:

$ mkdir ~/tmp
$ export TMPDIR=~/tmp/

Then the go commands will work correctly:

$ go run hello.go
hello, world

You will have to export TMPDIR each time you open a new shell in which you want to use the problematic go tools.

Windows Shell Bug

At work, we build one of our projects with Makefiles. In one of these Makefiles, we have this line:

IF NOT EXIST "$(OUTDIR)" mkdir "$(OUTDIR)"

This should create the output directory if it doesn’t exist. The line would hang the build on my development box from time to time, so I decided to track down the problem. It turns out that IF NOT EXIST will sometimes hang if there are quotes around the file path, at least on Windows XP. You can verify this with a simple Perl script:

for (;;) {
    system("cmd /c IF NOT EXIST \"\\Windows\" echo nodir");
}

Running that script will cause the cmd process to hang after a short amount of time (usually less than a minute).

But, if you run this script, it will continue forever:

for (;;) {
    system("cmd /c IF NOT EXIST \\Windows echo nodir");
}

In our case, the filepath will never contain any spaces, so the solution was to just remove the quotes from the IF NOT EXIST command in the Makefile.

EngineYard SHA-1 Competition

A few weeks ago, EngineYard held a programming competition. Basically, the contest was to see who could get a SHA-1 hash closest to a given hash. I thought it might be fun to see how well I could do with a minimal implementation, so I coded up something in C.

My goal was not to write the most efficient program possible, but to see what results I could get from a reasonable design and no major optimizations. So, I wrote about 160 lines of C code, using OpenSSL for the hashing and the old K&R bit-counting method. Depending on how long the message plaintext was, this got me about 1 to 1.5 million attempts per second per core on my 3.0 GHz Core 2 Duo. The winning CPU-based entry, which was written by D. J. Bernstein, contained an optimized SHA-1 implementation and got around 10 million hashes per second per core. So, I was about an order of magnitude off, which seems reasonable to me.

It turns out the really fast implementations were all on GPUs. Both the winner and the runner up used Nvidia’s CUDA for fast GPGPU processing, which was cool to see.

I ran my program for most of the 30 hours of the competition. How did I do? My best result was 37 bits off of the goal, which put me 7 away from the winner.

C#, the Ternary Operator, and Mono

The Quiz

One of my coworkers recently sent out this C# programming quiz:

static void Main(string[] args)
{
    object x = null;
    object y = (short)4;
    x = (y is System.Int32) ? (System.Int32)y : (System.Int16)y;
    Console.WriteLine(x.GetType());
}

  1. What is printed out?
  2. Try it.
  3. Explain why you were wrong.

If you run the code, you get this output:

System.Int32

The code snippet as it stands doesn’t make it clear exactly where the unexpected behavior is. This is a little clearer:

static void Main(string[] args)
{
    object x = null;
    object y = (short)4;
    x = false ? (System.Int32)y : (System.Int16)y;
    Console.WriteLine(x.GetType());
}

This code outputs the same thing as the first snippet. So why does it output System.Int32, when x clearly gets set to (System.Int16)4? The answer lies in the C# implementation of the ternary operator.

The Answer

In line 5 of the second example above, x is set to the result of the expression on the right side of the equals sign. Since the thing on the right is an expression, it must have a single type.

The C# Language Specification, in section 14.13, spells out how the type of the expression is determined:

The second and third operands of the ?: operator control the type of the conditional expression. Let X and Y be the types of the second and third operands. Then,

  • If X and Y are the same type, then this is the type of the conditional expression.
  • Otherwise, if an implicit conversion exists from X to Y, but not from Y to X, then Y is the type of the conditional expression.
  • Otherwise, if an implicit conversion exists from Y to X, but not from X to Y, then X is the type of the conditional expression.
  • Otherwise, no expression type can be determined, and a compile-time error occurs.

In our example, since there’s an implicit cast from an Int16 to an Int32, but not an implicit cast from an Int32 to an Int16, the compiler says the type of the expression must be Int32. Then, when our Int16 is returned, it’s typecast to an Int32.

More Types

The spec makes it pretty clear what is supposed to happen if there’s an implicit conversion from one of the operands to the other, but not the other way around. But what happens if there’s an implicit conversion in both directions? According to the spec, none of the first three conditions are met, so the compiler must output an error. There is an implicit conversion from a byte to an int, and also one from a const int to a byte, as long as the int’s value is small enough to fit into the byte. So, let’s try compiling this:

static void Main(string[] args)
{
    const int i = 1;
    const byte b = 2;

    object x = null;
    x = true ? i : b;

    Console.WriteLine(x.GetType());
}

The Bug

If you compile this with the .NET 3.5 compiler, it compiles without errors. There’s a warning about hardcoding true into the ternary operator, but nothing about the types. So, the compiler does not conform to the C# language spec. That’s a bug, but it’s not that shocking. There are other places where the .NET compiler doesn’t conform to the spec. It seems that Microsoft has a policy to leave such bugs in, so as to not break compatibility with existing code, so it will probably stay that way for the foreseeable future.

Mono

This got me wondering if the Mono compiler properly supports the spec. So, I tried compiling with the Mono 2.0 C# compiler. Here’s what you get:

Program.cs(13,17): error CS0172: Can not compute type of conditional expression as `int' and `byte' convert implicitly to each other
Compilation failed: 1 error(s), 0 warnings

So it looks like Mono conforms to the spec in this case. It’s a bit amusing that an open source project supports the spec better than Microsoft itself, but there are probably also cases where it goes the other way. However, this means that the Mono implementation is incompatible with the .NET implementation of C#. Now, this particular incompatibility is unlikely to come up that often, since there aren’t many types that have two-way implicit conversions with each other, but it’s something to consider.

The Law

The legal implications of this bug are perhaps the most interesting part. A few weeks ago, there was a lot of talk about the Mono patent situation. This has now been largely resolved with Microsoft putting C# and the CLR under its Community Promise. However, the Community Promise only applies to a given implementation “to the extent it conforms to one of the Covered Specifications.” If an implementation does not conform to the specification, it is not covered by the promise.

You can probably see where I’m going with this. If Mono decided to support compatibility with the .NET compiler by breaking from the spec and implementing the ternary type bug the same way Microsoft has, it might be giving up its protection against patent lawsuits. In order to be legally safer, it’s probably wiser for Mono to stick to the spec and break compatibility with the .NET compiler. This is significant, because the more situations like this crop up, the harder it will be for programmers to port their .NET code to Mono. There’s not much that the Mono project can do about this, but it’s unfortunate that the legal situation forces their hand on compatibility.

Moving From Apache to Nginx

I’ve been having problems with too much memory usage on the 256 MB slice I use to serve my web pages. I was using Apache and mod_php, and the Apache processes were growing large enough that I could only have 4 of them running at once, which killed any hope of decent concurrency. I decided to switch to using PHP with FastCGI and after some research decided to go with nginx. The switch has now been made, and page generation is faster, the server can handle greater concurrency, and memory usage is under control.

I actually ended up using nginx, PHP, PHP-FPM, and xcache. I’m running 8 instances of PHP and 2 nginx worker processes. First, I got started by using the step-by-step instructions I found on someone’s blog. After the initial setup, there was still a lot of configuration to do, some of which was not trivial. Below are some notes from my experience:

  1. When setting up FastCGI processes for PHP, make sure the firewall isn’t blocking access to the port the FastCGI servers are listening on. If your firewall is filtering packets to the port, you’ll get a “504 Gateway Time-out” page from nginx whenever you try to access a PHP page. To open up the port, if you’re using iptables, add something to your iptables startup script like this:

    iptables -A INPUT -p tcp -s localhost --dport 9000 -j ACCEPT
  2. Somewhere I saw an example of a redirect that eliminates the www from URLs, which caused me to add this to the config file:

    server {
        listen          80;
        server_name     www.definingterms.com;
        rewrite         ^/(.*) http://definingterms.com/%1 permanent;
    }
    
    server {
        listen          80 default;
        server_name     definingterms.com;
        [...]
    }

    I’m not sure why I thought I needed this, but it was messing up the loading of static images on this blog. The right thing to do is just list multiple server_names for the host, and let WordPress handle the redirect:

    server {
        listen          80 default;
        server_name     definingterms.com www.definingterms.com;
        [...]
    }
  3. Moving most of my sites from Apache to nginx was easy, but there was the problem of rewrite rules. A lot of the software I use (e.g. WordPress, MediaWiki, Gallery) needs rewrite rules. If you’re using Apache, there’s always an example .htaccess file either in the documentation or in some user forum. However, the nginx rewrite rules work differently, and you can’t always find an example config, so it took some time and brain power to get them all right. Actually, this is true for all parts of the nginx configuration. The English documentation is sparse and it’s sometimes hard to find examples of what you want to do.

  4. nginx doesn’t have CGI support. If you’ve got those one or two sites that use non-PHP CGI, you’ll have to set up a FastCGI instance for each language you want to support. For example, I had a copy of ViewVC running, which is written in Python. With Apache, it was just using CGI. With nginx, I’d have to constantly run a whole Python FastCGI process for the rare event when someone wants to browse ViewVC.

In the end, I think the move to nginx was worth it purely for the performance gains, since it keeps me from having to pay for a heftier VM every month. But, it did take a fair amount of work and if I had slightly more complex needs, it might not have been a good fit for me.

Secure Passwords Not Allowed

For quite a while now, I’ve been using a tiered password system for all of the websites where I have accounts. I knew this was bad practice, but it was easy. Recently there have been a number of stories about websites’ databases being leaked, which made me seriously consider doing something better. Password managers have never impressed me much, both because of the security issues of storing all of your passwords in a central location and the danger of losing the database and not being able to reconstruct it. But, when I came across PasswordMaker, I liked what I saw.

Instead of storing the passwords, they’re generated on the fly using a cryptographically secure method. The software is all open source, so you know what’s going on and can reconstruct it if you need to. Finally, there’s also a very convenient browser plugin for Firefox that I can install everywhere. So, I finally bit the bullet and got away from my tiered password system by moving all of my online accounts to using passwords generated by PasswordMaker.

If I’m going to go through all of that trouble, I might as well use a secure password, right? So I decided to use passwords with letters, numbers, and punctuation. That shouldn’t be anything special; it’s just the standard recommendation for password security. Some systems even have it as a minimum requirement.

What surprised me was that a huge number of websites I use don’t allow passwords with punctuation, 20% in fact. Out of the 97 sites where I tried to update my account, 19 of them would not allow it. These ranged from hip web 2.0 sites like digg.com to big, corporate sites like geico.com. Here’s the full list:

So now I have to use a different set of characters for my passwords at these sites. Fortunately, PasswordMaker lets me configure different profiles for different sites, so I can set it up once and forget about it. But, why should I have to do that, especially when it makes my account less secure?

It seems like it just requires more work from the website makers to restrict certain characters, and I can’t think of any good reason to do so. It might make sense to restrict passwords to ASCII characters if the system doesn’t fully support Unicode, but disallowing all punctuation just doesn’t make any sense. The programmers might be worried about allowing escape characters in passwords, but it seems like it would be no more work to handle them safely internally than to place an additional demand on the user.
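One common way to make the character set irrelevant is to hash passwords before storing them, so the raw characters never reach the database at all. A minimal sketch, using SHA-256 from Ruby’s standard library for brevity (a real system should use a slow, salted scheme such as bcrypt):

```ruby
require 'digest'

# Because only the fixed-length hex digest is stored, punctuation,
# quotes, and other "dangerous" characters in the password never
# touch the database and need no special escaping.
def hash_password(password, salt)
  Digest::SHA256.hexdigest(salt + password)
end

stored = hash_password('p@$$w0rd!?', 'per-user-salt')
puts stored  # a 64-character hex digest, whatever the input characters
```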

If we’re going to expect users to have secure passwords, we need to allow them to do so. I’d like to see the above sites change their password policies, and I want any new sites to allow long, complex passwords.

Dinosaur Remix

I like Dinosaur Comics. Since the drawings and panels are the same in every comic, I thought of the idea of trying to mix and match panels from different comics to see what comes out. It turns out, Ryan North had already done something similar, but I decided to implement my idea anyway. Here’s the result:

Dinosaur Remix

Dinosaur Remix lets you randomly mix together panels, but also lock in certain panels to make a comic you like. Then you can add clever alt text and save it or send it to a dinosaur-loving friend!

It’s written in Python, PHP, and JavaScript. It was my first chance to really use jQuery, which is just about as awesome as everyone says it is. All of the code is available on GitHub.

Translation Hacks

Not so long ago, the only help available when translating text from one language to another was a dictionary and a grammar book. That’s how it was when I started learning German. Now, there are a number of tools online to help you in your bilingual quest, but you have to be clever to get the most use out of them. My examples will be in German, but these techniques can be applied to most languages that have a strong online presence.

  • A Good Dictionary - This is still your first line of defense against the hostile hordes of foreign words. For German, the best is clearly LEO. The reason it’s better than your trusty Langenscheidt’s or Oxford-Duden is that it’s user created and maintained, so you get idioms, slang, and current events.

  • Google for Grammar - If you don’t know which of two or three possible variants is the correct one, Google all of them and see if one has many more hits than the rest. Important here: put the phrase in quotes. This is especially good for things like which preposition to use with more common words. Let’s say you want to translate “I’m going to Chicago.” Which preposition do you use? Try searching for a similar phrase (in this case, altered for geographic relevance) and see if one stands out. Go to google.de and click the “Seiten auf Deutsch” (Pages in German) button. Then try three reasonable guesses:

    1. “gehe zu Berlin”: 7 hits
    2. “gehe nach Berlin”: 2,890 hits
    3. “gehe bei Berlin”: 1 hit

    It looks like we have a clear winner. You can be pretty confident translating your sentence as “Ich gehe nach Chicago”. However, this isn’t always foolproof. If you had searched for “gehe in Berlin” you would have gotten 576 hits and the results wouldn’t have been quite as clear. But, after reading the first few hits, you’d have realized that it’s not what you’re looking for.

  • Wikipedia - This is especially good for technical terms. Suppose you want to talk about the famous Quicksort algorithm in German. LEO won’t help you. So, go to the English Wikipedia page for Quicksort. Then, look down on the left side of the page in the “languages” box. Click “Deutsch” (if it exists) and you’ll be taken to the equivalent German page. In this case, you find out that Quicksort is the same in English and German, so now you can (not) translate it with confidence.

  • Names - You read a foreign name and you’re not sure what gender it is. Sometimes that doesn’t matter, but if you want to talk about this person, it sure is useful to be able to use pronouns. So, to figure it out, do a Google image search, preferably at the Google site for the country, and look at the results. Example: Johannes vs. Johanna (I read this somewhere online, but I’m not sure where. It’s especially helpful for Asian names.)

  • Machine Translation - I list this one last because it’s generally the least helpful. The two main free sites are Google Translate and Babelfish. These sites are slowly getting better, but right now they’re still of limited value. They’re good if you want to read something in a language you don’t know at all or it would take you a long time to do the translation yourself and you just want to get the gist of the text.