Programming is Compilation, Compilation is Optimization

There seems to be a view even (especially?) among expert programmers that human programmers are special, how silly. Of course, there are some tasks better suited to human brains than existing computers, but there seems to be a false boundary drawn between them. There are many such false boundaries, consistent with the fact that the machines will eventually take our lunch and eat us too, but I want to focus on efficiency and optimization. There is a view that only humans can do the low-level optimizations necessary to get really fast code. Indeed, the common benchmark for any shiny new tool is the “handrolled” C/Fortran/Assembly.

Why can’t we do better?

The common thing between C/Fortran/Assembly is that we’re basically working at the bare machine level, with some convenient syntax (ok maybe Fortran is a bit smarter for, but we lose generality because of it, limited to numeric code). So a programmer using them is really using the bare components of the computer. This should tell us something about compilation and why high-level languages get the same performance. In a high-level language we’re no longer working with the machine primitives, but more sensible logical primitives. Then it’s the compiler’s job to convert from these higher-level constructs to the real machine primitives, ideally the most efficient form possible. But look: the programmer who writes their program in Java and compile are conceptually doing the same thing as the programmer who write their prototype in Java and then hand-write a fast C implementation, but the C programmer typically does a better job at efficiency.

In the most abstract, compilation is a transformation from a source language to a target language. It just so happens that some languages have native implementations ~in the universe~ , while others do not. So we start with human thought, that’s a language but not very well suited to implementation on a computer (it is however, perfect for implementation on human hardware!), a programmer compiles that (using their brain and fingers) into some programming language, the compiler then compiles it down to machine code, and the computer interprets it into ~physics~. You might say that people think in specifications but programs need to be written as implementations; but if the programmer always had to choose an implementation then we’re doing no better than assembly. Indeed, something as simple and prevalent as garbage collection is a step away from programming-as-implementation towardsprogramming-as-specification. One example where this is most obvious is in the language prolog, where programs are logical propositions and the language fills in the values in some possibly non-deterministic way (but note, for efficiency reasons, programmers still often have to reason about how the search is implemented, and modify it using things like cuts).

Let’s take a moment to discuss the distinction between compiling and interpreting. Programming is fundamentally about descriptions and language conversions. A program is a description of an action to be performed, an action is also a description of itself. But notice there’s a distinction between these types of descriptions: a program is timeless, it exists all at once in a single place and time, but an action is strung out cross time. This is not such a weird property of a language though, consider that spoken language is also strung out across time. From a pseudotheoretical perspective, things that are strung out across time are “additive”/”co-data“, constantly unfolding a bit at a time and potentially infinite, while things like written languages are “multiplicative”/”(constructive) data”: they exist in their entirety, built from finite components. So roughly, a compiler converts to data, while an interpreter converts to co-data. In practice, the data representations are much more useful than co-data, and the only time we ever want to translate to co-data is then we’re “running” our program. co-data also has the useful property of “productivity”, where for every sub-thing that it processes, it produces some output, unlike constructive things which aren’t well defined until they’re fully built.

Notice: programming is time-like, we must take the type to type into the computer; running a program is also time-like, we typically want our program to perform some task over time. So it seems that “interpreting” is the most natural way to do things: for every bit of program typed in, produce as much relevant output as possible, this is exactly the value of hot swapping. Then why do we ever compile things? In practice, people compile in order to do some types of verification and optimization, but is there some universal reason for this? This is a deep topic and I’ll talk about it more next time, but the short version is that the connective structure of different languages is different, so we must hold the whole program in our head at once to make the most effective “whole program optimizations”

The bottom line is that programming is just compilation from thought to machine, so there’s no reason we should have to compile directly to machine in our heads! Much of the problem comes from people viewing programming as writing instructions for dumb computers (“tell it how to do things”), leading them to write over-specified programs, with too many constraints on how its implemented for the compiler to optimize much. What we should be doing is figuring out how to encode what we want as completely and accurately as possible. People are afraid to program-as-specification because existing systems aren’t very good at getting efficient results; true, but this is not a fundamental limitation and that view is holding us back. There’s also some that think of programming-as-specification as a kind of WYSIWYG fake-programming for novices; I counter: it takes a certain kind of skill and clarity to frame a problem in a way that is both entirely correct and minimally constrained, exposing all of your domain knowledge to the compiler.




Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s