Conscientious Software, Part 2: Provable Robustness
Second post in a series. Part 1.
Onwards! Now, what’s really interesting to me is the further implications of this autopoietic / allopoietic dichotomy throughout the software stack. The Conscientious Software paper talks about potential means of implementing autopoietic software in terms of more vaguely defined membranes and other sorts of perception-oriented interfaces between software components, along the lines of Jaron Lanier’s work.
To me, that is a lot harder to think about than the Google example, in which a lot of allopoietic engineering (very clear definition of subsystems, logical contracts, operational proofs, and other extremely “rigid” software engineering techniques) was used to create a system that, to a large extent, is reliably self-sustaining.
In other words, it seems to me that we want our autopoietic systems to be as reliable as we can make them. And the best techniques we have for designing something to be reliable are essentially allopoietic techniques.
Let’s look at this another way. A lot of programming language research right now is devoted towards integrating proof systems into various languages. In some sense, every strongly typed language has a proof system built into it. A type system is a proof system (see Wadler’s paper on Proofs As Programs). The power of these proof systems and type systems is steadily increasing — a lot of work lately is going into logics for proving file system safety, data structure heap correctness, distributed messaging consistency, and on and on.
We are continually developing conceptual techniques to increase the accuracy and correctness with which we can describe the desired properties of a system, and prove that the system in fact has those properties as implemented.
It is a fascinating question, to me, whether all those tools can be turned to the purpose of developing autopoietic systems. Let’s take one example: the above-linked work on logically proving that a file system is reliable.
The intention of this work is to model the state of the computer’s memory relative to the known state of the computer’s disk, in terms of what the memory believes to be true versus what the disk’s contents declare as true. Logical statements relate changes in the memory beliefs to changes in the disk’s beliefs. Overall, there is an error if at any point one can prove that the system’s disk beliefs are inconsistent with memory.
In a sense, this proof work is intended to make the file system robust in the face of any and all environmental failures. The robustness comes from construction, not necessarily from recovery. This is building reliability into the system at the very bottom.
It is not too difficult to postulate that such logics could be further developed, for example, to characterize the consistency properties of the Google File System, or of other large-scale distributed software infrastructures. (Boy, what a handwaving sentence! Yes, it’s not too difficult to postulate that. Actually doing it, on the other hand, is likely to be extremely difficult indeed. But over time even hard problems can get solved….)
Another direction in which to deliver increased reliability is internal self-protection. Almost all major organisms on Earth are composed of cellular structures. A cell is defined by its boundary — without a boundary, a cell does not exist. Some programming languages have similarly oriented themselves around the concept of bounded, encapsulated units which are composed in large numbers to create a greater whole. Possibly the most industrially tested and publicly available example of such a language is Erlang, which composes large and immensely reliable systems out of many small software components, which communicate only by asynchronous message passing. There is no global state in an Erlang program, and no shared state among independent objects. This makes Erlang innately fault-tolerant, insofar as the failure of any one component cannot immediately cause other components to fail. This also allows the system to be upgraded while continuing to run. A really successful autopoietic system eventually reaches a point where it needs to provide continuous availability, with no externally visible downtime at all, ever. Systems decomposed into pure message-passing subcomponents inherently support this, since individual components can be replaced cleanly while the rest of the system continues normal operation.
Of course, autopoietic layers need to be built as other Erlang objects (a common Erlang idiom is to have a number of worker objects overseen by a smaller pool of supervisor objects — this is exactly an autopoietic software layer), but the innate properties of the language contribute to the enforced internal modularity which leads not only to improved stability but also to ease of distribution. (Autopoietic systems will necessarily also be distributed and parallel systems, since any individual hardware unit may fail, and if the overall system is to survive it must have redundant parallel hardware.)
Large-scale autopoietic systems currently, and in the near future, will combine these properties. They will be systems that provide a general-purpose computing environment, with well-defined and consistent contracts, which are engineered with as much built-in and provable correctness as we can give them, which are modularized into independent and well-protected components, and which provide sufficiently reliable guarantees to the allopoietic programs running in those environments.
What is especially interesting is that this view flies in the face of the original paper’s claim that what’s needed is more vagueness, more inconsistency if you will, more tolerance between components for partial failures or partial communication. I’m not convinced by this part of their argument, at all. I think that we currently lack the ability to engineer in that style. Perhaps ultimately we’ll make greater use of artificial evolution for engineering design, and I would expect such design tools to be far better than we are at leveraging interfaces that are ambiguous or otherwise not rigidly defined.
Next: what happens if software can get sick?