Messages and Transactions, Part 1: Erlang and its Discontents
With the rise of multi-core chips, everyone’s talking about how best to do concurrency. I think the two most interesting emerging paradigms are software transactional memory and message-based programming. This two-post series covers both.
(Cedric Beust’s recent post about Erlang inspired me to accelerate this post a bit. Soon: the final Conscientious Software post.)
Message-oriented programming environments are structured around the dictum that all state is decomposed into modular elements that have no direct interaction. All state is local in a message-oriented system. Each stateful entity — processes in Erlang, or vats in E, for example — interacts with other stateful entities solely through asynchronous messaging. Entities can send messages to other entities, and receive messages from other entities. Message receipt happens one at a time; each entity is essentially a single-threaded message loop. Therefore, concurrency conflicts as such cannot happen, because no two messages can ever be operating on the same state concurrently.
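To make the "single-threaded message loop" idea concrete, here is a minimal sketch in Python rather than Erlang or E. The `Counter` class, its message shapes (`"add"`, `"get"`, `"stop"`), and `run_demo` are all invented for illustration; a thread plus a queue only approximates what an Erlang process or an E vat provides.

```python
import queue
import threading


class Counter:
    """A stateful entity: private state plus a mailbox drained by one thread."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # local state, only ever touched by the loop thread
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put(message)

    def _loop(self):
        # Messages are handled strictly one at a time, in arrival order.
        while True:
            message = self._mailbox.get()
            kind = message[0]
            if kind == "add":
                self._count += message[1]
            elif kind == "get":
                message[1].put(self._count)  # reply on the caller's own queue
            elif kind == "stop":
                return


def run_demo(senders=4, per_sender=1000):
    """Several threads send concurrently; no lock guards the counter itself."""
    counter = Counter()

    def burst():
        for _ in range(per_sender):
            counter.send(("add", 1))

    threads = [threading.Thread(target=burst) for _ in range(senders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    reply = queue.Queue()
    counter.send(("get", reply))
    total = reply.get(timeout=5)
    counter.send(("stop",))
    return total
```

Calling `run_demo()` should return 4000: the senders race with each other, but the increments themselves never do, because only the loop thread ever reads or writes `_count`.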
There is much beauty to this model. Erlang heavily leverages the fault isolation properties of such a system; Joe Armstrong’s Erlang thesis (which seems to be offline at the moment?) describes basic patterns of constructing structures of cooperating processes with very high reliability. The basic idea seems to be recursive restartability. A process which receives an invalid message or whose state becomes corrupted is expected to immediately throw a fatal exception and cease to exist. Other supervisor processes monitor their children and detect these failures, restarting the children as necessary.
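The crash-and-restart pattern can be sketched roughly as follows, again in Python and under loud assumptions: `worker`, `supervise`, and `run_demo` are hypothetical names, threads stand in for processes, and this is not how Erlang's own supervision machinery is implemented. The worker fails fast on a message it does not understand, and the supervisor replaces it with a fresh one that starts from clean state.

```python
import queue
import threading


def worker(mailbox, results):
    """Sum integers from the mailbox; crash on anything else (fail fast)."""
    total = 0
    while True:
        message = mailbox.get()
        if message == "stop":
            results.put(("final", total))
            return
        if not isinstance(message, int):
            raise ValueError(f"invalid message: {message!r}")
        total += message


def supervise(mailbox, results, max_restarts=3):
    """Run the worker, restarting it after each crash. Returns the restart count."""
    restarts = 0
    while True:
        crashed = []

        def run():
            try:
                worker(mailbox, results)
            except Exception as exc:  # the child died; record why
                crashed.append(exc)

        child = threading.Thread(target=run)
        child.start()
        child.join()  # "monitoring" here is simply waiting for the child to exit
        if not crashed:
            return restarts
        restarts += 1
        if restarts > max_restarts:
            raise RuntimeError("too many restarts") from crashed[0]


def run_demo():
    mailbox, results = queue.Queue(), queue.Queue()
    for message in (1, 2, "garbage", 10, "stop"):
        mailbox.put(message)
    restarts = supervise(mailbox, results)
    return restarts, results.get(timeout=5)
```

In this sketch `run_demo()` returns `(1, ("final", 10))`: the first worker dies on `"garbage"` and its partial sum of 3 dies with it, while the replacement picks up the remaining messages from a clean slate. Whether discarding state on restart is acceptable is exactly the kind of design question a real system has to answer.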
In E, the same ideas are used to push forward the frontier of interaction between mutually suspicious code. The core concept here is that on the open internet, particularly as software becomes more sophisticated in its interactions with other software, there is a need to allow systems to interact via messages while remaining able to protect themselves against misbehavior on the part of other systems. Message-oriented programming again serves to isolate, but in this case it is not only isolation from unintentional bugs or from external failures, but also isolation from malicious behavior. Really this deserves a whole separate post, so I’ll leave it at that for now.
The great thing about systems like this, in theory, is that they scale very gracefully. Add more entities (processes / vats), and you get scale. Their inherent reliance on distributed messaging means that they work better with more processing power, no matter whether that’s multi-core or multi-machine. You can also upgrade a running system with new code, while maintaining full operation. Erlang is getting a lot of interest right now for this reason.
The thing that frustrates me about the current Erlang wave, however, is that most of the core Erlang papers (Joe Armstrong’s thesis included) do relatively little to describe how one actually takes a real-world problem and expresses it as a message-passing system. For example, the thesis describes a very high-reliability ATM switch. It gives maybe a half-page sketch of some of the core modules of the switch, and it briefly describes which (not how, only which) of the generic Erlang patterns are used in the switch’s code. But when it comes to the details (how requests are handled and handed off; how overload is dealt with; how failures are recovered at different levels of the system; how new features are implemented on top of existing running code), there’s basically nothing.
I really want to see some much more detailed case studies of actual Erlang systems that implement high-reliability solutions, walking in detail through the whole code and process architecture, and talking about how upgrades and extensions are implemented on top of the base product. Without that level of explanation, it’s not at all obvious how an Erlang system is actually built. One small example: consider a bank account transfer, implemented as a single transaction between two account objects. How would that be built in an Erlang system? Would each bank account be a separate process? If so, how would you achieve transactional semantics across the transfer, which involves both accounts? Would you need some kind of transaction manager pattern? Are there higher-level patterns in Erlang that address these issues? Most of the Armstrong Erlang papers focus on failure and fault recovery and supervision, but not on how distributed data structure updates are performed.
The most detailed discussion about an implemented Erlang system that I’ve been able to find is the overview of the Mnesia database. But even here, the focus is primarily on the ways in which the database resembles or leverages other database technologies, and not so much on the ways it leverages Erlang for fault tolerance or online upgrade. The very things that seem to make Erlang unique are the hardest things to get detailed descriptions of! Frustrating!
Cedric’s post also makes a good point that Erlang seems lacking in the areas of error reporting and code structuring. Message-oriented languages in general have an uneasy relationship with traditional object-oriented reuse structures — message-oriented languages in some sense are inherently decoupled, tending towards structural typing (in which messages are described primarily by their contents); whereas traditional class-based languages are oriented towards nominal typing (in which the inheritance lineage of a class determines its compatibility with other classes). There are some hybrid approaches to bridging the two, for example the (somewhat defunct) HydroJ messaging framework for message-oriented programming in Java, but the two paradigms are definitely tricky to couple.
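The structural-versus-nominal distinction is easy to show in a few lines. This is a rough Python sketch with made-up names (`Deposit`, `AuditedDeposit`, `handle_nominal`, `handle_structural`); it is not drawn from Erlang, E, or HydroJ. The nominal handler accepts a message because of its class lineage, while the structural handler accepts anything that carries the right fields.

```python
from dataclasses import dataclass


@dataclass
class Deposit:
    account: str
    amount: int


@dataclass
class AuditedDeposit(Deposit):
    auditor: str = "unknown"


def handle_nominal(message):
    """Accept a message because of what it IS: a Deposit or a subclass of it."""
    if isinstance(message, Deposit):
        return f"deposit {message.amount} to {message.account}"
    return "rejected"


def handle_structural(message):
    """Accept a message because of what it CONTAINS: the required fields."""
    if isinstance(message, dict) and {"account", "amount"} <= message.keys():
        return f"deposit {message['amount']} to {message['account']}"
    return "rejected"
```

Here `handle_nominal(AuditedDeposit("a", 5, "bob"))` is accepted through inheritance, but the same handler rejects the plain dict `{"account": "a", "amount": 5}`. The structural handler accepts that dict, even with extra keys it has never heard of, which is the decoupling (and the loss of compile-time structure) described above.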
Software transactions and message-oriented systems seem to fall on two ends of a language spectrum. In the transactional case, it’s easy to see how to compose previously-sequential code and achieve some amount of concurrent scaling. In the messaging case, it’s not so easy to know how to take a distributed problem and decompose it into messages. I look forward to much more explanation of the design process for message-oriented systems, since only thus will they be competitive with the software transactional model, which is more immediately intuitive to sequential programmers (especially those who already have database experience).
Coming soon: the software transaction end of the spectrum. (And then back to finish off conscientious software!)
Edit after original post: Looks like Joe already answered my transaction question. Solution: bundle up your updates into a single message that you send to the database server. Dang, that’s a foreign concept to me, being a Java weenie used to the Hibernate wrap-your-serial-code-in-a-thread-local-transaction pattern. Erlang definitely demands you structure your code to its patterns from the ground up. A bad thing? Not necessarily… but what about composable concurrency? Better get my next post done soon!
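For the curious, here is roughly what "bundle up your updates into a single message" might look like for the bank transfer, sketched in Python. Everything in it (`ledger_server`, `transfer`, the message tuple layout) is invented for illustration and is not Mnesia's or Erlang's actual API. One server owns all the balances, and the whole transfer arrives as one message, so the debit and the credit can never be interleaved with anyone else's update.

```python
import queue
import threading


def ledger_server(mailbox, balances):
    """Owns every balance; applies each bundled request as one indivisible unit."""
    while True:
        request = mailbox.get()
        if request[0] == "stop":
            return
        _, source, target, amount, reply = request
        # Validate the whole bundle before touching any state.
        if balances.get(source, 0) < amount or target not in balances:
            reply.put(("aborted", dict(balances)))
            continue
        balances[source] -= amount
        balances[target] += amount
        reply.put(("committed", dict(balances)))


def transfer(mailbox, source, target, amount):
    """Send both halves of the transfer as a single message and wait for the outcome."""
    reply = queue.Queue()
    mailbox.put(("transfer", source, target, amount, reply))
    return reply.get(timeout=5)


def run_demo():
    mailbox = queue.Queue()
    server = threading.Thread(
        target=ledger_server, args=(mailbox, {"alice": 100, "bob": 20})
    )
    server.start()
    first = transfer(mailbox, "alice", "bob", 70)
    second = transfer(mailbox, "alice", "bob", 70)  # alice only has 30 left
    mailbox.put(("stop",))
    server.join()
    return first, second
```

In this sketch the first transfer commits and the second aborts without changing anything. Note what had to give: the two accounts are no longer independent entities, and the caller cannot wrap arbitrary sequential code in a transaction; it has to describe the whole update up front.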