Natural Language is Conflated

For a long time I’ve been unsatisfied with natural language. At first, in my naïvety, I thought it was just English that was unsuitable, but there seems to be a deeper problem that even engineered languages like Esperanto don’t help with. People talk past each other, perpetually misunderstanding one another, driven apart by nitpicking over correct phrasing.

When people asked me whether I thought in words or pictures, I blinked at them: “I think in thoughts, doesn’t everyone?” I’m still not sure, but it seems that many people do not notice the inadequacy of words. Maybe I’m exceptionally bad at phrasing, or maybe most people subconsciously restrict their thoughts to mirror their words, so that translating isn’t a great big mess.

It remains to clarify: what makes up a thought, and how does it differ from speech? First, what is similar? Both are ways to pick out points in thingspace (understand each as an injective map from thoughtspace, resp. wordspace, into thingspace). Almost technically, there are maps wordspace -> thoughtspace -> thingspace, and the map wordspace -> thingspace is the composition, which suggests that thoughts are at least as powerful as words. Thoughts are anonymous, but words are named. Effectively this means that you can pull a word out of the aether by its name, but you can only pull up a thought by association; imagine words organized like a dictionary, and thoughts organized as a graph, with edges as metaphors, links in associative memory, etc. (this gives us some indication of their structure as categories). The most apparent difference is that thoughts can refer directly to internal perception, whereas words are completely incapable of serializing sensory perception unless the receiver shares certain experiences with you. Words can refer to the thought of a sensation (which you assume they share), but not the sensation itself (which they almost certainly do share, being human).

While Wordspace and Thingspace are straightforward, at least in concept, Thoughtspace deserves further explanation as the mediation between the two. The objects of interest in Thoughtspace can be thought of as spatiotemporal firing patterns in the brain. Choosing the correct formalization for these firing patterns is a hard scientific problem (where should we delimit the boundary of one pattern from the next?), so we’ll sidestep it for now by invoking the Mind/Space hypothesis, instead considering the subjective experience of firing patterns as ‘formations’ relative to a fixed observer. Oddly enough, the boundary problem disappears in this perspective: each thought obviously feels distinct, yet thoughts are connected by a similarity metric. The hypothesis is that these formations have some scientific characterization in the brain, which seems like a reasonable assumption. I suspect that thoughts only have a clean representation as formations relative to their originator, and that comparing the “objective” representations of thoughts as firing patterns is intractable in general, because of the encryption problem. So “formations” really are the more natural setting.

Thoughtspace is the world we really “live” in – it’s the only world we can actually experience, but through it we can know both Wordspace and Thingspace. The structure of Wordspace is logical: the objects of interest are strings of symbols (interpreted as logical propositions), and their connectives are algebraic manipulations. We can think of it as the union of all symbolic logics. This includes not just traditional logic, but also all constructive objects, arrangements of particles, etc. Since thoughts are “just” arrangements of discrete particles in the brain (but maybe not, if continuous quantum interactions turn out to be important for cognition), we could represent thoughts directly in Wordspace. However, it makes more sense to think of Thoughtspace as a sort of completion of Wordspace, where thoughts are distinct objects that can be represented as potentially infinite arrangements of particles. The dynamics of thoughts-in-the-world, then, are inherited via projection (Thoughtspace -> Wordspace) into a finite brain. Going in the other direction, Wordspace can be embedded in Thoughtspace as those thoughts that can be written down (or otherwise serialized). This embedding is not natural, though: the association between words and thoughts is learned (in a non-unique way) through experience – it includes many extensional elements such as feelings, objects (tigers, etc.), and temporal patterns. (A lot of communication errors can be cleared up by remembering that words don’t have independent meaning, and trying to determine what thought the other party has in mind rather than assuming they use the same mapping.)

Thoughtspace inherits the logic of Wordspace, but it also has its own fuzzy logic. Internally, these fuzzy arrows are the “feels like” connectives. For example: a bench is like a rocking chair; a bench is more like a rocking chair than it is like a snake. We can justify the statement that a rocking chair is like a bench, but it is not a logical statement. Rather, the “sameness” is a summary of how many contexts they are equivalent in. Ex: for “most purposes”, a rocking chair is indistinguishable from a bench. This acts like a sort of probability space, with 1 being equivalent in all contexts and 0 being never equivalent, so we can start with the category Stoch as a rough approximation. We can also talk about the properties of the contexts where they are similar, so the arrows have computational content in addition to weight. For example, rather than considering all contexts, a rocking chair is even more similar to a bench when restricted to contexts of sitting – they’re still a bit different, in that sitting in a rocking chair feels a bit different – on the other hand, benches are often made of stone while rocking chairs almost never are, detracting from general similarity but not affecting sitting much at all. It’s all a bit handwavy, but I hope the intention is clear – enumerating contexts and quantifying “sameness” is the key.

Careful though! This similarity space is in general NOT a consistent logic in the way you’d expect. My favorite illustration of this principle is the “numerologist’s folly”, exemplified by crackpot sites like this. What’s going on here? At the risk of sounding silly, I’ll point out the general flaw: while each step has high similarity, the contexts considered are different, so care has to be taken when composing them. “New York City”, “Afghanistan”, and “The Pentagon” are all similar in their number of letters, and it’s true that New York was the 11th state of the union, but these two contexts are different! If we want to compose them as connectives, we have to conflate the contexts, so “New York City”, “Afghanistan”, “The Pentagon”, and “the state of New York” are all similar in the context “number of letters OR order of joining the USA” – which doesn’t seem like a very useful connection, now does it? It’s a silly and obvious example, but this kind of conflated reasoning happens quite frequently in subtler cases. Understanding the true structure of Thoughtspace will let you wield its power while avoiding such pitfalls.

The relation between wordspace and thoughtspace can be made succinct by considering the quote:

When you draw a boundary around a group of extensional points empirically clustered in thingspace, you may find at least one exception to every simple intensional rule you can invent.

Then the image of wordspace in thoughtspace/thingspace consists of exactly those boundaries that can be described by intensional rules.

The major task now is to translate these subjective dynamics into something that can be quantified mathematically, measured externally, and communicated clearly between people.


Language Proposal: Meme dereferencing operator

I lied, no programming today. Instead, metaphysics! But actually, natural language is very similar to computer language, except everyone’s afraid to write a standard for it.

Before we start, I’d like to clarify my meaning of “meme”, given the cultural baggage the word has acquired. By meme I refer to “formal cultural entities”, such as religions, companies, political parties, etc., but also to such phenomena as “the fashion of Victorian nobles”.

Additionally, I use “thought” not necessarily to mean a thing which one thinks, but as a thing which one may think, a conceptual ideal. This definition is tricky. In some sense, thoughts are everything. For if you cannot think it, you cannot perceive it, and so it exists in no meaningful sense.

EDIT: Since becoming acquainted with the LessWrong community, I’ve realized the concept I’m describing here is more commonly known as a “thing” in “thingspace”, to be deconvoluted from a “thought” in “thoughtspace”.

Now, let us proceed by analogy and trust for now that there is a point hidden here somewhere.

(1) You can write point-free style in Perl with a little (a lot) of finagling, but would you WANT to? Of course not! By not allowing functions (and higher-order combinators) as convenient first-class values, a language discourages the use of certain styles. In the same way, EVERY language feature subtly affects the kinds of programs that are commonly expressed in the language.
Now, a program (source code) is merely a description of a procedure. In an ideal world, we’d be able to traverse the space of procedures and pluck the right one directly. But the space of procedures is too large to be first class in our universe, so we need programs to give us a handle into this space.

(2) Now consider a similar problem:
One of the great abstractions of “high level” languages is being able to pretend that we’re really dealing with values directly: the addresses are hidden. In reality, the machine has to identify addresses to push data around, a fact that becomes painfully apparent when working with (ex.) assembly or C pointers. But actually, C pointers are an elegant approach to address specification. With pointers, we do not specify memory addresses directly; the space of addresses is too big to be practical. (Though we could specify them directly if we chose; the space is not so big as to be impossible.) Instead, we obtain the address of the relevant data through the reference operator.

Enough computers, what does this have to do with natural language?

Recall problem (1). Natural language serves the same purpose as a program, where procedures are thoughts. We have no way to directly specify objects in this (mindbogglingly infinite) “thoughtspace”, so we use words as a handle into the space. But here’s a problem: thoughtspace is big. Really big. Armed with only our limited language, some ideas will take infinite time to express (consider the completion of the rationals to the reals). Now, you may wonder whether only infinite ideas require infinite time. That would certainly be a nice property, and it’s a valid question. However, given the incredible vastness of thoughtspace, I suspect that there exists an infinite family of “holes”: seemingly innocuous ideas which nevertheless cannot be expressed in finite time (imagine spiraling towards a central point, refining your statement with an infinite series of words). Even if this is not the case, weaken “infinite time” to “prohibitively long time”, and I think there is little question that this is a real problem.
Any given hole can be plugged by anchoring it to a reference point in our universe, either a physical pattern (“that’s a tree”), or via reference (“the way that a clock moves is clockwise”). Thus, the holes are dependent on the language; the language shapes the ideas we can express.

Necessarily, the things which exist, the “reified thoughts”, are only a small subset of possible thoughts. This shapes our own language, as things which readily exist are much easier to encode in speech than those which must be conceived by analogy. As beings of abstraction, we can perceive certain high level abstractions directly, as “first class”. Ex. A forest is a collection of trees, but a tree is a tree. We naturally perceive it as a unit, though in reality a tree is a complex collection of processes. We can easily do this for things which exist at a “level of abstraction” ≤ our own (The case of equality is particularly interesting, but I will not get into it at the moment).

Finally, we may consider memes. Memes are in some sense, the simplest step up from our current level of abstraction. We cannot perceive them directly as a single unit because we are a part of them (or they a part of us, depending on your perspective), in the same way that a (very intelligent) cell could not easily conceive of the body to which it belongs. Because of this, we find it hard to describe memes. A common way of referring to complicated memes without regurgitating their entire description is by implicitly tying them to more easily expressible notions, kind of like a generalized synecdoche. That is, by listing other easily named memes which are “commonly” associated with it under certain circumstances.

This method causes a host of problems, which unnecessarily limit the expressible statements. Of primary concern is ambiguity: it is often not clear whether one is referring to the literal idea or to the meme associated with the idea. This problem is often resolved by cultural context. That is, people of a similar mindset will understand the difference in their own communication, but this is not stable across mindsets; it is almost impossible to communicate this way across large cultural gaps.
There’s a related problem. By the nature of our language, an unqualified statement (usually) contains an implicit ∀. This directly conflicts with implicit meme dereferencing, and which interpretation is intended is left to ambiguous context. Mixing up ∀ and “associated meme” is a dangerous thing to do and can lead to sweeping generalizations. Remember, we are referring to people here, and sweeping generalizations lead to various forms of prejudice.
This is the heart of the problem: the meme is referred to by association with a concrete idea, but the exact relation between the concrete idea and the meme is unspecified and ambiguous. It can be avoided by making the link explicit, but the amount of qualification required to avoid the ambiguity is prohibitively expensive, so these types of statements tend simply to be avoided, limiting our expressive power greatly.

While truly fixing this problem essentially requires a new, carefully designed language, we can make an immediate improvement by at least specifying when we are making a connection. To this end, I propose a new primitive connective, ☀, to mean roughly “the meme primarily inhabiting X”, used as a sort of “dereference” operator. This will at least allow an unambiguous instantiation of “association”. While it cannot represent associations more complicated than “primarily inhabiting”, it covers most common use cases. There are issues with ambiguity when multiple memes may inhabit the same collection of people, which becomes more severe when the memes are mutually exclusive; correct and clever usage of ☀ can remedy this. It is helpful to imagine trying to indicate an animal to someone when you are only allowed to speak of its environment. Ex: ☀Temperate city skies. Can you guess Pigeon?

I’ve played a little trick here. It is not immediately clear that “the people it inhabits” is a consistent description of a meme. Why should memes associate like that? Genes associate because they can only be combined in discrete pairings with small mutations (mating), so offspring are going to be close to the parents. Memes combine in much more complicated ways, and it’s not obvious that they would preserve these associations. In fact, there’s a deeper reason why these associations hold. In biological evolution, organisms reflect their environment; in some sense, a successful organism is a “statement about how to survive in that environment”. What’s interesting about memes is that they act as both the environment and the replicator. More on this later.

Minimalist Programming

I’d like to take some time to speak on Minimalism. I don’t mean anything like http://www.wikipaintings.org/en/kazimir-malevich/black-square-1915.

Nor, at the moment, do I mean this:

#include <stdio.h>
int main(void) {
  printf("It's so simple!\n");
  return 0;
}

But perhaps something more like this:

You may at first think of C as the de-facto ‘minimalist’ language. There is certainly something quite charming about the simplicity of undecorated C. It has regular syntax, simple semantics, and a direct cost model. Certainly, the very definition of minimalist, with all its benefits! So what’s the problem? There are no doubt die-hard C fans wondering the same thing.

I will skip the mystery. The problem with C is reusability, for a variety of reasons. Steadfast C programmers routinely roll their own hash maps, linked lists and trees, yikes. This is the price we’ve come to accept for C’s minimalism.

C++ (and later D) attempted to solve this problem by bolting on an object system and an accidentally conceived template system. Actually, D can be quite nice, but you can hardly call either of them minimal.

In Guy Steele’s classic talk “Growing a Language”, he addresses (among many other things) the problem of minimalism from a language designer’s perspective. At the time, the solution of “minimal with user extensibility” chosen for The Java Programming Language© contributed greatly to its success. However, even Java enthusiasts would likely agree that modern Java is brimming with user-built complexity.

It is often said, “never write the same code twice”, but I should say “never solve the same problem twice”. The subtle difference is that you may unknowingly write different code to solve the same problem! What use are 20 physics libraries when none of them do what I want? Java’s extensibility is not always the right kind of extensibility.

To take advantage of problem similarities, we need a more generic view of programming. One way this can be achieved is through higher-order functions, and a whole host of other generic goodies provided by the so-called “functional languages”. In this approach, the “scaffolding” is disentangled from the solution, allowing the same solution to fit into different-shaped problems. I will return to this notion of composability in the future, as there is quite a bit to say.

Of course now we’ve just gone and thrown minimalism out the window again.

Should we give up this dream of practical minimalism? I think not! Is minimalism the right fit for all problems? Maybe not; it may be that verbosity is a price to pay for flexibility, but we can certainly do much better than at present. To achieve minimalism we must fearlessly distinguish the necessary primitives and forsake the unnecessary components without remorse. In the next few posts I will detail some of what I think may be key to its substantiation.