Convoluted reasoning

There’s a general trend toward convoluted reasoning, which I think captures a wide range of common logical flaws. It describes a frighteningly common pattern, even among brilliant thinkers: with informal reasoning, it’s all too easy to follow a chain of intuition, where each step seems reasonable, only to end up at a totally incorrect conclusion. This is exploited maliciously to convince people of nearly anything; defensively, giving rise to the “motte-and-bailey doctrine” (a.k.a. “strategic equivocation”); internally, as rationalisation; and, in the most sinister case, as a route for smart people to do something truly stupid. It is also the root of the map/territory conflation.

The mark of convoluted reasoning is implicit conversion between distinct objects which are related but follow very different logics. Let’s be more rigorous:

Consider a collection of categories and a chosen isomorphism between each of their underlying sets. Take, for example, the Mind / Space categories, or many copies of the category of (your favorite) verbal logic, connected by the isomorphisms swapping between alternative definitions of words, or the categories induced by different metrics on the same type of object. The connections in an individual category compose, just as logical deductions should, but the isomorphisms on the underlying sets don’t induce functors – they don’t preserve the arrows! Of course, to our naive human brain, they’re all just the same sort of connection – I’d guess that most people don’t explicitly represent the categorical structure, preferring instead to represent everything more like simple set functions. In fact, I’d wager that at the lowest level, our brain saves space by collapsing the multiple categories into one space, so that (set) isomorphic objects (like multiple definitions of “privilege”) are literally identical, until there is a need to disambiguate them. Similarly, in the Mind / Space problem, we don’t often think of mental things being different from physical things, but rather that everything is either mental or physical (depending on which side of the wall you stand on).

So you’re probably already familiar with one example of how things can go wrong with convoluted reasoning, and that’s equivocation. We dance around in one category (one collection of definitions for words), drawing all sorts of conclusions, like that “privileged” people should all shut their traps and maybe die in an excruciating fire for good measure, and then secretly jump into the other category (change definitions), so that you can include everyone you don’t like under the umbrella of “privilege”. Now substitute “privilege” for any demographic term you can think of.

Don’t be fooled though, convolution isn’t just about switching definitions, it’s more about switching contexts. To see this, recognize that the same problem can arise where your isomorphism is literal identity, like so:

One very frequent case where we have many different categories on the same objects is the collection of categories induced by different metrics. In this case, the arrows represent “similarity” or “closeness”. You can often think of each possible metric as the “natural metric” on its own space, but they all happen to get mixed together in the space you’re working with. For example, the Euclidean metric is the “natural metric” for \mathbb{R}^n, while the taxicab metric is the extension of the “natural metric” on a rational grid, embedded in \mathbb{R}^n. It is not always obvious what the “natural metric” you’re looking for is (hopefully I will address this in a future post).
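
To make the disagreement concrete, here is a minimal Haskell sketch (the function names are mine): both metrics live on the same points, yet they disagree about which of two points is nearer, so “closeness” arrows from one category can’t be carried into the other.

-- Two metrics on the same underlying space.
euclidean, taxicab :: [Double] -> [Double] -> Double
euclidean xs ys = sqrt . sum $ zipWith (\x y -> (x - y) ^ 2) xs ys
taxicab   xs ys = sum $ zipWith (\x y -> abs (x - y)) xs ys

-- euclidean [0,0] [3,3] ~ 4.24 < 5.0 = euclidean [0,0] [5,0]
-- taxicab   [0,0] [3,3] = 6.0  > 5.0 = taxicab   [0,0] [5,0]
-- The two metrics disagree about which point is the "nearest neighbor".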

A natural scenario for this to arise in is machine learning / pattern recognition. I’ll talk about a specific case that has interesting implications:

Consider the biological tree of life. Clearly, some species are “more related” than others. What’s not so clear is what we actually mean by “related”. Do we mean “has the most similar function”? This is clearly wrong, and misled biology for a long time before the study of genomics was invented. Then do we mean “has the fewest number of ancestors separating them” (so brothers would be closer than cousins, etc.)? This seems reasonable until you realize that not all mutations cause equal deviation. For example, some organisms, such as ctenophores, have remained relatively basal throughout evolution: they haven’t deviated far in “phenotype space” from their ancestors. So what’s the “right” metric? I won’t try to solve this here because I think it’s a hard problem (and it may be that there is no unqualified “right answer”, and the feeling that there should be one comes from built-in pattern-seeking heuristics).

Rather, I’ll point out that computational biology has developed (read: apprehended from mathematicians some 30 years after the fact) a large collection of different metrics for this purpose, many of which resolve to metrics on genomic or protein sequences, such as edit distance and its more biologically relevant varieties. Because of the complexity of the data, we often need to use slightly different metrics for different types of data, for example, adding different sorts of normalizations to make everything match up. We often want to chain these comparisons up, so that if gene expression profile A is similar to gene expression profile B, and gene expression profile B is similar to drug response profile C, then expression profile A should be similar to drug response profile C, and we’ve found drugs to cure cancer, yay.
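
Since edit distance does the heavy lifting here, a minimal Haskell sketch of it (deliberately naive and exponential-time for clarity; real tools memoize and use biologically weighted costs):

-- Edit (Levenshtein) distance: the minimum number of insertions,
-- deletions, and substitutions turning one sequence into another.
editDistance :: Eq a => [a] -> [a] -> Int
editDistance xs [] = length xs
editDistance [] ys = length ys
editDistance (x:xs) (y:ys)
  | x == y    = editDistance xs ys
  | otherwise = 1 + minimum
      [ editDistance xs (y:ys) -- delete x
      , editDistance (x:xs) ys -- insert y
      , editDistance xs ys ]   -- substitute x by y

-- editDistance "kitten" "sitting" == 3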

This tends to turn out badly for a few reasons. Barring the fact that some of our similarity measures don’t even obey the triangle inequality, the main reason it doesn’t work is that they’re mutually incompatible measures. Sure, they kinda-sorta compose, we benchmark them zealously until something sticks, and you can still discover useful things with them (or at least be convincing enough to get published in a nice journal, and really friends, isn’t that the definition of a useful discovery?), but they lack the theoretical niceness to even put error bounds on how badly they can fail to compose. This is fundamentally because they are arrows in different categories.
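
A toy illustration of the triangle-inequality point (the numbers are mine): squared distance is a perfectly common “dissimilarity” score, yet two short hops can beat the direct path, so chains of “similar” carry no end-to-end guarantee.

-- Squared distance: a common dissimilarity that is not a metric.
dissim :: Double -> Double -> Double
dissim x y = (x - y) ^ 2

-- dissim 0 2 == 4.0, but dissim 0 1 + dissim 1 2 == 2.0,
-- so "A close to B" and "B close to C" does not bound A-to-C.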


Deconditioning

“We dance to the pull of strings that were woven years ago, and in a lightning flash of insight, we may see the lie; the program. It is first necessary to see that there is a program. To say perhaps, this creature is mine, but not wholly me. What follows then is that the prey becomes the hunter, pulling apart the obsession, naming its parts, searching for fragments of understanding in its entrails. Shrinking it, devouring it, peeling the layers of onion-skin”

The Mind / Space Duality

WARNING: This is almost certainly wrong, and the “math” is embarrassingly sloppy – it’s meant mainly as a note to myself and to inspire thought. I assume some familiarity with at least the premise of “modern”* meditation/insight theory as from MCTB / The Overground, and some basic category theory, mainly adjunctions/monads.

Here I’ll expose some ideas regarding the seeming conflict between an objective universe and a personal universe of qualia, building on the so-called “science of meditation”.

First, let me motivate the question. You may scoff: “Of course external reality exists; this kind of philosophical play is pointless, we got over this centuries ago”. I agree, it’s a confused question, and pointless to question external reality, so rather than seeking some non-existent answer, seek to understand how the question arises. It is true, at the most basic level, that we do not, cannot, observe external reality, but only our raw senses. So it should seem at least a bit mysterious that external reality feels so real, even if the question poses no practical barrier to interacting with external reality. I seek to dissolve that mysterious feeling. Really, this should be no huge marvel: we frequently treat math as if it exists platonically, independent of our knowledge of it, but it is also clearly true that you can’t “find” math anywhere; it is a human invention. The two frames of reference fit snugly inside each other.

I’ll attempt to roughly frame the problem through category theory – category theory is useful because it lets us distinguish different types of interaction (arrows in different categories) that might otherwise be convoluted.

Let’s start with what we know: one of the greater epistemic fruits of meditation is the realization that sensory experience is composed of individual frames containing a moment of perception “in motion”, commonly called “formations”. While hard to describe, and even harder to measure, one may think of formations as “differential objects”: not just points but infinitesimal segments of perception, in the context of immediate past and future, where our normal perception is the weaving together (integral) of these building blocks.

Assume for a moment that these formations, as the meditative scholars purport, are the basic building blocks of perceptual reality, that we can never gain any more information, even in theory, than that contained in the continuous sequence of formations we perceive. Then where is there room for objective reality? “Things exist” certainly seems like a reasonable and useful assumption, and moreover, it feels more natural than a universe composed of fleeting thoughtstuff. To answer, let me quickly detour into the distinction between knowledge and belief.

This is where category theory starts to come in handy: both “everything is a belief” and “things can be known for certain” can be true if we’re careful about our typing. Consider the “belief monad” B, where, for a proposition a, B a is the belief in a. Now, every proposition can be lifted into a belief (that the proposition is true), and if we believe that we believe something, we really ought to believe it too! (Though humans are quite bad at this kind of reflective consistency.) So we have mappings

return :: a -> B a
join :: B (B a) -> B a

And that's a monad! If we restrict a to have the type of formations, everything still holds, since "I observed formation a" is just a specific kind of proposition.
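
To make the typing concrete, a minimal Haskell sketch (the representation is a placeholder of mine; a real belief would carry evidence or degrees of confidence):

newtype Belief a = Belief a -- placeholder: the belief that a holds

instance Functor Belief where
  fmap f (Belief a) = Belief (f a)

instance Applicative Belief where
  pure = Belief -- return: lift a proposition into a belief in it
  Belief f <*> Belief a = Belief (f a)

instance Monad Belief where
  Belief a >>= f = f a -- collapses Belief (Belief a): this is join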

Now, since formations contain thoughts, our beliefs are encoded in each moment, i.e. there is an embedding B(\mathrm{FORM}) \to \mathrm{FORM}. But here again higher categories are useful – “isomorphisms are not equalities”: formations may encode beliefs, but they are not beliefs; the missing component is time. Computation takes time, thoughts are a particular kind of computation, but formations are timeless. This is where the mystery of belief vs. knowledge comes from: you “know” your experience for free, from the monadic return, but there is logical uncertainty regarding what your experience represents. The statement “I think, therefore I am” is tantamount to \exists f. f, which resolves trivially – this is why it feels more like “knowledge” than “belief”: there is no logical uncertainty, it is proven immediately by the witness of any formation.

If formations are timeless, what links one moment to the next? How can we predict the future from the past? Now we need to deal instead with indexical uncertainty. To proceed, we need to introduce some notion of “possibility”. The most obvious way is to codify a “belief” as a kind of Bayesian network, which subsumes both classical and stochastic logic, though the real deal is almost certainly more complicated. Now, for any group of observations pulled from some space, there is a “free” probability distribution on that space, generated by maximizing entropy subject to the constraints imposed by the data. This free distribution is the “external reality” – the universal property of maximum entropy makes it unique and “objective” in the sense that anything that could be known about the universe can be computed from it. Of course, computing maximum entropy distributions is hard, even harder if you have to recompute every instant! So we don’t; we compute our lossy “subjective” views of reality. While we can’t ever know the universe entirely, it is interesting to notice that it’s fundamentally made of the same sort of thing as thoughts, leading to a nice interpretation: “learning about the universe is to become increasingly isomorphic to it”.
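
For reference, the maximum-entropy program being appealed to is standard: among all distributions matching the observed expectations E_p[f_i] = c_i, pick the p maximizing H(p) = -\sum_x p(x) \log p(x). Its unique solution has the exponential-family form

p(x) = \frac{1}{Z(\lambda)} \exp\left(\sum_i \lambda_i f_i(x)\right)

with the multipliers \lambda_i fixed by the constraints. The uniqueness (strict concavity of entropy) is the universal property that licenses calling it “objective”.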

Of course, if reality were “just” a probability distribution over perception, that wouldn’t be very satisfactory. The universe is cold and uncaring; it doesn’t feel like it’s about you, it feels like there are “things” independent from yourself. How can we reconcile this? A probability distribution is an opaque function – but opaque functions are always hiding more structure inside. This is the utility of Bayesian networks: to factor our probability distribution, exposing its internal structure. The universe we observe, or rather seek to discover, is then the complete factoring of this “free” distribution, generated by our perception.
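
The factoring in question is the standard Bayesian-network one: the joint distribution decomposes into one local conditional per node,

p(x_1, \ldots, x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{parents}(x_i)),

so the “things” of the universe correspond to factors and the edges between them, rather than to one opaque joint function.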

Now we come full circle, back to what science has suspected for a long time: while the laws of physics may be freely generated from perception, physics can also entirely explain human perception (assuming we know the entire state of the multiverse), by forgetting the rest of the universe and focusing only on the bit responsible for perception. So while reality may emerge from our perception, containing no more information than our experience of it, it can also be viewed as a distinct entity, with a more natural representation, and our perception as simply a view into it.

Programming is Compilation, Compilation is Optimization

There seems to be a view, even (especially?) among expert programmers, that human programmers are special. How silly. Of course, there are some tasks better suited to human brains than existing computers, but there seems to be a false boundary drawn between them. There are many such false boundaries, consistent with the fact that the machines will eventually take our lunch and eat us too, but I want to focus on efficiency and optimization. There is a view that only humans can do the low-level optimizations necessary to get really fast code. Indeed, the common benchmark for any shiny new tool is the “hand-rolled” C/Fortran/Assembly version.

Why can’t we do better?

The common thing between C/Fortran/Assembly is that we’re basically working at the bare machine level, with some convenient syntax (OK, maybe Fortran is a bit smarter, but we lose generality because of it, being limited to numeric code). So a programmer using them is really using the bare components of the computer. This should tell us something about compilation and why high-level languages struggle to get the same performance. In a high-level language we’re no longer working with the machine primitives, but with more sensible logical primitives. Then it’s the compiler’s job to convert from these higher-level constructs to the real machine primitives, ideally in the most efficient form possible. But look: the programmer who writes their program in Java and compiles it is conceptually doing the same thing as the programmer who writes their prototype in Java and then hand-writes a fast C implementation; the C programmer just typically does a better job at efficiency.

In the most abstract sense, compilation is a transformation from a source language to a target language. It just so happens that some languages have native implementations ~in the universe~, while others do not. So we start with human thought; that’s a language, but not very well suited to implementation on a computer (it is, however, perfect for implementation on human hardware!). A programmer compiles that (using their brain and fingers) into some programming language, the compiler then compiles it down to machine code, and the computer interprets it into ~physics~. You might say that people think in specifications but programs need to be written as implementations; but if the programmer always had to choose an implementation then we’re doing no better than assembly. Indeed, something as simple and prevalent as garbage collection is a step away from programming-as-implementation towards programming-as-specification. One example where this is most obvious is the language Prolog, where programs are logical propositions and the language fills in the values in some possibly non-deterministic way (but note, for efficiency reasons, programmers still often have to reason about how the search is implemented, and modify it using things like cuts).
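
Haskell can give a taste of the same style (a toy sketch of mine, with the list monad standing in for Prolog’s nondeterministic search): we state what counts as a solution and let the search strategy, here naive left-to-right enumeration, fill in the values.

-- Specification-style search: Pythagorean triples up to n.
triples :: Int -> [(Int, Int, Int)]
triples n = do
  a <- [1 .. n]
  b <- [a .. n]
  c <- [b .. n]
  True <- pure (a*a + b*b == c*c) -- a constraint, not control flow
  pure (a, b, c)

-- triples 15 == [(3,4,5),(5,12,13),(6,8,10),(9,12,15)]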

Let’s take a moment to discuss the distinction between compiling and interpreting. Programming is fundamentally about descriptions and language conversions. A program is a description of an action to be performed; an action is also a description of itself. But notice there’s a distinction between these types of descriptions: a program is timeless, it exists all at once in a single place and time, but an action is strung out across time. This is not such a weird property of a language though; consider that spoken language is also strung out across time. From a pseudotheoretical perspective, things that are strung out across time are “additive”/“co-data”, constantly unfolding a bit at a time and potentially infinite, while things like written languages are “multiplicative”/“(constructive) data”: they exist in their entirety, built from finite components. So roughly, a compiler converts to data, while an interpreter converts to co-data. In practice, the data representations are much more useful than co-data, and the only time we ever want to translate to co-data is when we’re “running” our program. Co-data also has the useful property of “productivity”, where for every sub-thing that it processes, it produces some output, unlike constructive things, which aren’t well defined until they’re fully built.
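
A minimal Haskell sketch of the co-data side (laziness blurs Haskell’s data/co-data distinction, but the intuition shows through): a stream never exists “all at once”, yet every observation of it returns after finite work.

-- Co-data: a stream has no nil; it only ever unfolds further.
data Stream a = Cons a (Stream a)

nats :: Stream Integer
nats = go 0 where go n = Cons n (go (n + 1))

-- Productivity: each demand is answered after finite work,
-- even though the whole stream is never fully built.
takeS :: Int -> Stream a -> [a]
takeS n _ | n <= 0 = []
takeS n (Cons x s) = x : takeS (n - 1) s

-- takeS 5 nats == [0,1,2,3,4]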

Notice: programming is time-like, we must take the time to type it into the computer; running a program is also time-like, we typically want our program to perform some task over time. So it seems that “interpreting” is the most natural way to do things: for every bit of program typed in, produce as much relevant output as possible; this is exactly the value of hot swapping. Then why do we ever compile things? In practice, people compile in order to do some types of verification and optimization, but is there some universal reason for this? This is a deep topic and I’ll talk about it more next time, but the short version is that the connective structure of different languages is different, so we must hold the whole program in our head at once to make the most effective “whole program optimizations”.

The bottom line is that programming is just compilation from thought to machine, so there’s no reason we should have to compile directly to machine in our heads! Much of the problem comes from people viewing programming as writing instructions for dumb computers (“tell it how to do things”), leading them to write over-specified programs, with too many constraints on how they’re implemented for the compiler to optimize much. What we should be doing is figuring out how to encode what we want as completely and accurately as possible. People are afraid to program-as-specification because existing systems aren’t very good at getting efficient results; true, but this is not a fundamental limitation, and that view is holding us back. There are also some who think of programming-as-specification as a kind of WYSIWYG fake-programming for novices; I counter: it takes a certain kind of skill and clarity to frame a problem in a way that is both entirely correct and minimally constrained, exposing all of your domain knowledge to the compiler.
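
A small Haskell illustration of the gap (a toy example of mine): both functions compute a sum, but the first pins down a traversal order and an accumulator, while the second only names the answer as a fold, leaving the strategy to the library and compiler.

-- Over-specified: dictates the exact traversal and accumulation.
sumLoop :: [Int] -> Int
sumLoop = go 0
  where
    go acc []     = acc
    go acc (x:xs) = go (acc + x) xs

-- Specification-style: "the sum is the fold of (+) over the list".
sumSpec :: [Int] -> Int
sumSpec = foldr (+) 0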


Convolution

In common usage, ‘convoluted’ means “complex”, “hard to follow”, “unclear”. So the main reason to use ‘convoluted’ is for its emotional nuances. But I’ll argue (in the near future) that the full emotional power of language is overkill and makes expressing precise logical meaning quite difficult in practice. So as usual we turn to math for a better meaning.

In math, convolution is a peculiar operation that can be thought of as smearing two functions over their input. Conversely, deconvolution reverses the process, recovering two functions from a smeared one. It’s often used in image/signal processing, where convolution is some kind of blur and deconvolution tries to sharpen. It’s used a little more generally for things like network deconvolution, where you attempt to recover direct effects when you’re only able to measure pairwise effects, each of which may be a direct effect, a transitive chain of length 2, of length 3, etc.
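
For reference, the discrete version of the operation (a minimal sketch, quadratic-time for clarity): each output sample smears together several input samples, which is exactly what deconvolution must undo.

-- Discrete convolution: (f * g)(n) = sum over k of f(k) * g(n - k).
convolve :: Num a => [a] -> [a] -> [a]
convolve f g =
  [ sum [ f !! k * g !! (n - k)
        | k <- [0 .. n], k < length f, n - k < length g ]
  | n <- [0 .. length f + length g - 2] ]

-- convolve [1,2,3] [1,1,1] == [1,3,6,5,3]  (a moving-average blur)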

So let’s use our wonderful abstracting brains to distil the commonality:

DEFINITION: Convolution is when the distinction between multiple independent factors becomes obscured through some interaction. Deconvolution is the intellectual/computational process of dissecting a convoluted result into its factors, and determining how they convolute. Pseudomathematically, a convoluted thing is a thing like C \overset{f}{\cong} C_1 + C_2 + C_3 + \ldots. Deconvolution is finding both the right hand side (the factors) AND the function f, the “how”.

(Note: We could use “conflation” here and it would be closer to the original English meaning. I choose “convolution” instead so that the technical distinction is clear. The image here should be of an object “sitting below” its confounding factors. If this ‘misuse’ of language bothers you, feel free to interpret it as a frequent typo of mine – it doesn’t matter much anyway, as the definition will always be linked when using a “technical” term.)

EXAMPLE: When statistically evaluating cause and effect, we often use correlation as a surrogate. But if two events A, B have a 0.5 correlation, we could have

A \overset{0.5}{\to} B, or A \overset{0.5}{\leftarrow} B, or A \overset{\sqrt{0.5}}{\leftarrow} C \overset{\sqrt{0.5}}{\to} B, or A \to \ldots \to B, or… you get the idea.
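
(Why \sqrt{0.5} on each edge of the fork: for jointly Gaussian, or more generally linear, models, correlations multiply along a path, so A \overset{\sqrt{0.5}}{\leftarrow} C \overset{\sqrt{0.5}}{\to} B yields \rho_{AB} = \sqrt{0.5} \cdot \sqrt{0.5} = 0.5. All of these structures are indistinguishable at the level of pairwise correlation, which is exactly the convolution.)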


Convolution is a big problem that seems to go unnoticed in human thinking (perhaps because there was no word for it in common usage :D), so I’ll be using it as a platform for many more posts in the near future. I started writing them and realized they all had the same braindump preamble, so I factored it out!

HREF EVERYTHING

This begins an experiment where I attempt to carve thoughtspace at its boundaries, to pluck choice thoughtstuff from the æther and distill it to crystalline perfection. There’s great danger in making up new words or repurposing old ones, but programmers seem to do it all the time in designing subroutine names. They avoid disaster by making sure the definition is always within sight of the name usage; to them it is clear that it is not the name that has meaning, but the reference to its definition, reified in the (arbitrary) name. Programmers have an easier time, because programs and words are different stuff, but when the definition of a word is a bunch of other words, it gets convoluted (see what I did there?). Sometimes people get confused and think that words have meaning, that there should be a right definition, or that some words are real and others made up. How silly, to hold all words in a global namespace. For practical reasons it’s a hard problem to fix, but fortunately we live in the ~future~, where references can be reified directly into hyperlinks, so meaning need never be ambiguous or unfamiliar! I encourage everyone everywhere to hyperlink the definition to every term that may be even a bit unclear. This forces us to make the way we’re using a word explicit, and gives more freedom of expression.

The “Science” of meditation

You may wonder “what is meditation really?”. Or perhaps you have a precached notion of meditation as something only hippies and monks do, a religious ritual at best. But really there is much more.

In the broadest sense, it is the “science” of the mind.

First I’m going to shadow your definition of “science” with mine, so that there is no confusion. Hopefully it is sufficiently close to yours that you can glean some benefit from the discussion. By “science”, I mean the acquisition of knowledge about the “outside” world through logic (truth-preserving transformations) and observation: the prediction of future observations given past observations. I make this distinction because the application of “science” directly to the mind gets a bit tricky.

The universe simulates quantum mechanics at some scale, chemistry at a larger scale, Newtonian physics at a still larger scale, general relativity, etc. None of these scientific frameworks are true, but they are convenient abstractions for predicting future observations from past observations to some degree of accuracy. We know that they are not true because there are “leaks” in the abstraction, some things that don’t fit. QM cannot currently be reconciled with GR, so we know we must jump to a “larger” theory to unify them. What if at some point we find no leaks? Does this mean we now know what is “true”? Not necessarily. The universe could be a perfect simulation in a larger universe, a dream in the mind of a god, a quantum fluctuation that exists only for an instant with our memories an illusion.

How can we distinguish these possibilities? There is no way, so long as there is no observable difference. For this reason I suggest that my definition of “science” is a useful one.

This shift in perspective raises a question: if we can’t really say anything about what’s true “out there”, but can only make statements about observations, what exactly is an observation?! It seems obvious that we need to look deeper inside the mind, into the realm of sensory perception, to answer this question. Normally in science we make some basic assumptions, like “objective reality exists”. This is generally a nice assumption to make. But in doing so we make the implicit assumptions that our senses relay accurate information and are not subject to introspection. This leads to problems like the above, where you can make nonsense statements about indistinguishable realities.

So observations are kind of like senses? Well, “sense” is rather arbitrary (in addition to the 5 there’s balance, proprioception, hunger, bladder fullness, etc. depending on how liberal you want to be), but more importantly it relies on assumptions of the outside world. For you, the observer, what does a sense mean? Consider all the preprocessing the retina does before it forwards the image to the rest of the brain. You would still call sight a “sense” yes? Then I suggest that, to the observer, a sense is just a nerve module feeding data that the brain views as “primitive”. So anything that “feels different” is a different sense. Pain, tickle, pressure, etc. are all different senses under this definition!

If we take this notion to its logical conclusion, we see that emotions are senses too. Though it acts on a much larger scale, an emotion is just a preprocessing of your general state to give a heuristic on how to think. It seems weird at first to think this way, but we’ve known it all along. This is why we “feel” sad, or “feel” happy, the same way we “feel” the wind through our hair.

So now it seems clear: “Yes, of course we should study our senses if we want to know anything about our relation to this world!”. But there’s a big problem which has prevented “meditative sciences” from becoming “real sciences”, and that’s reproducibility. The scientific method is in many ways the optimal tool for conducting “science” on the natural world. One of the most fundamental and useful assumptions of science is that the laws of nature are the same everywhere, so if we set up two comparable experiments in two comparable places, we should get comparable results. But when we try to apply this to meditation, it all comes crashing down. To the pure observer, there is no “everywhere”; there is only “here” and “now”. We cannot take for granted “elsewhere” or even our past; consider that memories too are a sense, replaying input through the same channels it originally came in through.

It may seem a wonder that with these limitations we could say anything useful at all! Is this a fundamental limitation? Well, yes and no. We’ve already accepted a far greater weakness: that no logic system can prove its own consistency. Essentially (with much abuse of rigor), we can never “prove” anything at all, at least by most people’s definition of “prove”. But incompleteness doesn’t stop us from pushing forward to find “truth”, and neither should this. It is convenient to assume that “math works”, if we can just get math right; and so too it is convenient to assume that our senses betray some greater external reality, if we can just interpret them correctly. There’s a reason that most philosophers disregard solipsism: it’s mostly useless! But considering it seriously prevents one common epistemic trap. If you can’t experientially distinguish between one possible reality and another, it is meaningless to consider which is “real”. But we generalize “experience” from the typical all-or-nothing definition, so we really mean: it is not meaningful to distinguish between two models that give identical predictions of future experience. Essentially we’ve weakened “reality” to “convenient model”, and the practical result becomes obvious.

But this is all philosophical exploration, which has been reinvented many times over. The real, practical reason that meditation has never been seriously considered is still reproducibility. The mind is in constant flux. You are continuously transforming, never the same person at two points in time. Clearly this wreaks havoc on reproducibility! How can you ever repeat an experiment if you already expect a result? “All the theorems assume that how you think doesn’t affect reality apart from your actual actions.” But in meditation, that is exactly what you’re doing! Meditation is essentially conscious control of the placebo effect! Normally we account for placebo through replication and controls. But what does a meditation control look like? You must always be thinking something! There is no “null thought” by which to compare; or rather, the default is specific to each person. If we had a battalion of identical personalities, this might be possible. As crazy as it sounds, I expect this to be within the reach of science.

Another problem is that we cannot directly share our experiences in a reliable way. One person says “If you do A, you will feel X”, and another asks “Great! What is X like?”, and the original replies “I cannot tell you, you must experience it yourself”. This is actually not so much a problem: what we’re doing here is describing an experiment that will lead to observation X; this is the best we can do when language fails to encode all of our experience. The real difficulty is in determining “feeling equality”. So you’ve done A, but how do you know the X′ you feel is the same as the master’s X? If we have good reason to believe that consciousness is embedded in the physical world (we do), then we can eventually tie a neuronal activity profile to each feeling (or an equivalence class of such profiles, as the case more likely will be).

Meditation makes no assumptions of an external world, it is the pure observation of our existence. Because our senses are the most primitive thing we can observe, meditation requires no tools, no prior knowledge of the world, and no assumptions. It is the study of the “self”, at the purest level. Bridging between this knowledge of the “self” and our understanding of scientific reality requires technical advances that are not available yet, but this does not mean it should be relegated to philosophy alone. Science presses forwards, slowly consuming the domains of philosophy and religion into the sphere of applicable knowledge. We need first only recognize that it is applicable!

Algebraic Data Part 1 – Golden Trees

This post is literate Haskell (ignoring the LaTeX images), so before we set off:

> {-# LANGUAGE TypeOperators #-}
> import Control.Category
> import Prelude hiding ((.),id)

Haskell datatypes form an algebra, complete with sums (a + b \equiv \text{Either a b}), products (a*b \equiv (a,b)) and even exponentials (b^a \equiv a \to b).

So what can we do with this correspondence? Well lots of things, but today I'm going to focus on type isomorphisms.

Notice first that all (unquantified) data declarations can be converted into a standard form. For example:

data List a = Nil | Cons a (List a)

is converted to:

type GenericList a = Either () (a,GenericList a)

This is the basis of Haskell Generics (ex. in GHC.Generics). By converting data into a universal form we can manipulate different structures in a uniform way. Essentially what we are doing here is just "stripping off the tags" that make isomorphic algebraic types distinct.

Each generic representation gives rise to a type equation. For lists: \text{List}(x) \cong 1 + x\,\text{List}(x)

We can do some pretty mindbending stuff with these equations due to a result from category theory. By expanding the equation we get the generating function

[x] \cong 1 + x + x^2 + x^3 + \ldots

we can think of a list as either the empty list, a sequence of one element, a sequence of two elements, a sequence of three elements, etc.

But also, by manipulating the recursive equation into closed form, we see that [x] \cong \frac{1}{1-x}

Even though this doesn't have any sensible interpretation as a datatype, we can take a jaunt through the complex numbers so long as we return to a form without division or subtraction and so long as no side is purely constant.

We need this last constraint to prevent things like shoving a tree into the unit type as follows:

> data Tree a = O a | T (Tree a) (Tree a) deriving (Show,Eq)

T(a) \cong a + T^2(a) \Rightarrow T(a) \cong \frac{1\pm\sqrt{1-4a}}{2}

By setting a=1, the unit type, we get T(1) \cong \frac{1\pm i \sqrt{3}}{2}, which is a sixth root of unity. So since T^6(1) = 1, we would expect an unlabeled tree to be equivalent to the unit type. But of course this makes no sense! As it turns out, one tree IS equivalent to seven trees. There are an infinite number of trees, but the unit type has only a finite number of inhabitants (exactly 1!), which is why we make the stipulation that no final result be constant.

Notice that so far our data declarations only give us equations of the form X = F(X).
Is it possible to get other types of equations? Specifically, can we construct a "golden" datatype that satisfies the equation G^2 \cong G+1, giving the closed form G \cong \phi \equiv \frac{1+\sqrt{5}}{2}, the golden ratio?

We already have our tree equation T(a) \cong \frac{1\pm\sqrt{1-4a}}{2}, which resolves to \phi when a=(-1). But what kind of type looks like -1? Well, looking back, our list equation L(a) \cong \frac{1}{1-a} resolves to -1 when a=2. So what's 2? Why, it's just the Boolean type 1+1! So composing all these together we find our "golden type" to be a tree of lists of booleans, or equivalently, a tree with leaves labeled by (possibly empty) bitstrings.
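
Checking the arithmetic: substituting a = L(2) = \frac{1}{1-2} = -1 into the tree equation gives T(-1) \cong \frac{1 \pm \sqrt{1-4(-1)}}{2} = \frac{1 \pm \sqrt{5}}{2}, and the positive root is \phi, which indeed satisfies \phi^2 = \phi + 1.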

Now this is a nice result, but it may leave you scratching your head about exactly how the identity G^2 \cong G + 1 holds.

So let's construct an isomorphism explicitly.

Let's first define some convenience types.

> data a :+ b = L a | R b deriving (Show,Eq) -- Sum type
> data a :* b = a :* b deriving (Show,Eq) -- Product type
> infixl 6 :+
> infixl 7 :*
> type X = [Bool]
> type G = Tree X
> f # g = g . f -- this will come in handy later

Since we're talking about isomorphisms, we might as well make it explicit

> data a :~ b = ISO {to :: a->b, from :: b -> a}
> instance Category (:~) where
>     id = ISO id id
>     ISO f g . ISO f' g' = ISO (f . f') (g' . g)
> infixr 0 :~

And include some simple unwrapping isomorphisms for trees and lists:

> treeWrap :: Tree a :~ Tree a :* Tree a :+ a
> treeWrap = ISO f g where
>     f (O a) = R a
>     f (T t1 t2) = L $ t1:*t2
>     g (R a) = O a
>     g (L (t1 :* t2)) = T t1 t2
> listWrap :: [a] :~ a:*[a] :+ ()
> listWrap = ISO f g where
>     f [] = R ()
>     f (a:as) = L $ a:*as
>     g (R ()) = []
>     g (L (a:*as)) = a:as

The following derivation is based on the one presented in This Week's Find

First notice that if we can decompose a type Z \cong Z' + X, where X=[\text{Bool}] satisfies the equation of listWrap, then Z + X + 1 \cong Z' + 2X + 1 \cong Z' + X \cong Z

> lemma1 :: (z :~ z' :+ X) -> (z :+ X :+ () :~ z)
> lemma1 f = ISO ff gg where
>   ff zz = from f $ case zz of
>     R ()     -> R []
>     L (R xs) -> R (False:xs)
>     L (L z)  -> case to f z of
>       R xs -> R (True:xs)
>       L z' -> L z'
>   -- gg inverts ff by inspecting where a given z must have come from
>   gg z = case to f z of
>     L _          -> L (L z)
>     R []         -> R ()
>     R (False:xs) -> L (R xs)
>     R (True:xs)  -> L (L (from f (R xs)))

Given an isomorphism f : Z \leftrightarrow Z' + X, we get an isomorphism f' : Z + X + 1 \leftrightarrow Z.

Convince yourself that the above works and play with it a bit. It's only going to get messier from here.

Before we continue we should build some convenience functions to handle basic manipulations of generic terms. For everyday arithmetic we use the commutative, associative and distributive laws without giving it a second thought, but here we need to choose them explicitly. Note that our constructors are left associative and :+ binds less tightly than :* , so for example (a:+b:+c:+d:*e) is the same as (((a:+b):+c):+(d:*e))

> cP :: a :+ b :~ b :+ a -- Commutativity of addition
> cP = ISO f f where f x = case x of {L a -> R a; R b -> L b}
> cT :: a :* b :~ b :* a -- Commutativity of multiplication
> cT = ISO f f where f (a :* b) = b :* a
> aP :: (a :+ b) :+ c :~ a :+ (b :+ c) -- Associativity of addition
> aP = ISO f g where
>   f x = case x of
>     L (L a) -> L a
>     L (R b) -> R $ L b
>     R c -> R $ R c
>   g x = case x of
>     L a -> L $ L a
>     R (L b) -> L $ R b
>     R (R c) -> R c
> aT :: (a :* b) :* c :~ a :* (b :* c) -- Associativity of multiplication
> aT = ISO (\((a:*b):*c) -> a:*(b:*c)) (\(a:*(b:*c)) -> (a:*b):*c)
> unit :: a :~ ():*a -- Multiplicative identity
> unit = ISO (\a -> ():*a) (\(():*a) -> a)
> inv :: (a :~ b) -> (b :~ a) -- Flip an isomorphism around
> inv (ISO f g) = ISO g f
> dist :: a:*(b:+c) :~ a:*b :+ a:*c -- Distributive law
> dist = ISO f g where
>   f (a:*x) = case x of; L b -> L $ a:*b; R c -> R $ a:*c
>   g (L (a:*b)) = a:*(L b)
>   g (R (a:*c)) = a:*(R c)
> {- Isomorphism combinators. These are analogous to the arrow operators of the same name -}
> (+++) :: (a :~ b) -> (c :~ d) -> (a:+c :~ b:+d)
> f +++ g = ISO (to f `pp` to g) (from f `pp` from g) where
>   pp f' g' x = case x of; L a -> L (f' a); R b -> R (g' b)
> (***) :: (a :~ b) -> (c :~ d) -> (a:*c :~ b:*d)
> f *** g = ISO (to f `pp` to g) (from f `pp` from g) where
>   pp f' g' (a:*b) = f' a :* g' b

Now we unwrap the tree into a list and a pair of trees, and unwrap the list into () or a bool and a list, giving G \cong G^2 + X \cong 2X + 1 + G^2

> lemma2 :: G :~ Bool:*X :+ () :+ G:*G
> lemma2 = cP . (id +++ listWrap) . treeWrap

Now we apply this identity to G^2 \cong (X+G^2)^2 \cong (2X + 1 + G^2)^2 and expand just enough to get a free X on the rightmost side. Since the expansion contains an X term, it is of the form Z' + X for some Z'. But really, we don't care what it is since it gets absorbed by lemma1; we just need to separate an X. Nevertheless, the tangle of commutation and association gets a bit hairy.

> lemma3 :: Bool:*a :~ a:+a
> lemma3 = ISO f g where
>   f (b:*a) = case b of {False -> L a; True -> R a}
>   g x = case x of {L a -> False:*a; R a -> True:*a}
> lemma4 = -- Show that G^2 = Z' + X for some (ugly) Z'
>  (lemma2 *** lemma2) # dist # (dist +++ id) # aP # (id +++ cP) # inv aP #
>  (id +++ id +++ (cP . aP . inv unit . cT)) #
>  inv aP #
>  (id +++ lemma3) #
>  inv aP

Fire up a GHCI session and try to follow each step of lemma4 to get a feel for how to manipulate complex expressions. This may seem messy, but imagine trying to write a correct function like this with only case statements, as we did for lemma1! Now that the heavy lifting is out of the way, the rest falls into our laps.

> lemma5 :: G:*G :+ X :+ () :~ G:*G
> lemma5 = lemma1 lemma4

And finally we get our isomorphism by unwrapping the left hand side and applying lemma5

> golden :: G :+ () :~ G:*G
> golden = (treeWrap +++ id) # lemma5

So what kind of mapping do we have? I'm not sure that derivation made it any clearer, but at least we have an honest isomorphism now. With a bit of playing around we notice the following (much simpler) correspondence.

L (O xs)      <=> O (True:False:xs) :* O []
R ()          <=> O [True] :* O []
(L (T t1 t2)) <=> t1 :* t2

Of course, this isomorphism is not canonical, the specifics depend on exactly how you wire lemma4, but it is an isomorphism nevertheless! We can prove this by simply following the types and ensuring that all of our combinators are indeed isomorphisms and that the composition of isomorphisms is an isomorphism; an exercise that I will leave to the motivated reader.

Language Proposal: Meme dereferencing operator


I lied, no programming today. Instead, metaphysics! But actually, natural language is very similar to computer language, except everyone’s afraid to write a standard for it.

Before we start, I’d like to clarify my meaning of meme, given the cultural baggage it has acquired (http://www.dispatch.com/content/graphics/2012/12/06/1a-grumpy-cat-art-gu8kjvrb-1good-morning-cat.jpg). By meme I refer to “formal cultural entities”, such as religions, companies, political parties, etc., but also to such phenomena as “the fashion of Victorian nobles” (http://en.wikipedia.org/wiki/File:Victorian_Woman.jpeg).

Additionally, I use “thought” not necessarily to mean a thing which one thinks, but as a thing which one may think, a conceptual ideal. This definition is tricky. In some sense, thoughts are everything. For if you cannot think it, you cannot perceive it, and so it exists in no meaningful sense.

EDIT: Since becoming acquainted with the lesswrong community, I’ve realized the concept I’m describing here is more commonly known as a “thing” in “thingspace“, to be deconvoluted from a “thought” in “thoughtspace”. 

Now, let us proceed by analogy and trust for now that there is a point hidden here somewhere.

(1) You can write point-free style in Perl with a little (a lot) of finagling, but would you WANT to? Of course not! By not allowing functions (and higher order combinators) as first class, you discourage the use of certain styles. In the same way, EVERY language feature subtly affects the kinds of programs that are commonly expressed in the language.
Now, a program (source code) is merely a description of a procedure. In an ideal world, we’d be able to traverse the space of procedures and pluck the right one directly. But the space of procedures is too large to be first class in our universe, so we need programs to give us a handle into this space.

(2) Now consider a similar problem:
One of the great abstractions of “high level” languages is being able to pretend that we’re really dealing with values directly: the addresses are hidden. In reality the machines have to identify addresses to push data around to, a fact that becomes painfully apparent when working with (ex.) Assembly or C pointers. But actually, C pointers are an elegant approach to address specification. With pointers, we do not specify memory addresses directly; the space of addresses is too big for that to be practical. (Though we could specify them directly if we chose; the space is not so big as to be impossible.) Instead, we obtain the address of the relevant data through the reference operator.

Enough computers, what does this have to do with natural language?

Recall problem (1). Natural language serves the same purpose as a program, where procedures are thoughts. We have no way to directly specify objects in this (mindbogglingly infinite) “thoughtspace”, so we use words as a handle into the space. But here’s a problem: thoughtspace is big. Really big. Armed with only our limited language, some ideas will take infinite time to express (consider the completion of the rationals to the reals). Now, you may wonder if only infinite ideas require infinite time. That would certainly be a nice property and is a valid question. However, given the incredible vastness of thoughtspace, I suspect that there exists an infinite family of “holes”: seemingly innocuous ideas which nevertheless cannot be expressed in finite time (imagine spiraling towards a central point, refining your statement with an infinite series of words). Even if this is not the case, weaken “infinite time” to “prohibitively long time”, and I think there is little question that this is a true problem.
Any given hole can be plugged by anchoring it to a reference point in our universe, either a physical pattern (“that’s a tree”), or via reference (“the way that a clock moves is clockwise”). Thus, the holes are dependent on the language; the language shapes the ideas we can express.

Necessarily, the things which exist, the “reified thoughts”, are only a small subset of possible thoughts. This shapes our own language, as things which readily exist are much easier to encode in speech than those which must be conceived by analogy. As beings of abstraction, we can perceive certain high level abstractions directly, as “first class”. Ex. A forest is a collection of trees, but a tree is a tree. We naturally perceive it as a unit, though in reality a tree is a complex collection of processes. We can easily do this for things which exist at a “level of abstraction” ≤ our own (The case of equality is particularly interesting, but I will not get into it at the moment).

Finally, we may consider memes. Memes are in some sense, the simplest step up from our current level of abstraction. We cannot perceive them directly as a single unit because we are a part of them (or they a part of us, depending on your perspective), in the same way that a (very intelligent) cell could not easily conceive of the body to which it belongs. Because of this, we find it hard to describe memes. A common way of referring to complicated memes without regurgitating their entire description is by implicitly tying them to more easily expressible notions, kind of like a generalized synecdoche. That is, by listing other easily named memes which are “commonly” associated with it under certain circumstances.

This method causes a host of problems, which unnecessarily limit the expressible statements. Of primary concern is ambiguity. It is often not clear when one is referring to the literal idea or the meme associated with the idea. This problem is often resolved by cultural context. That is, people of a certain similar mindset will understand the difference in their own communication, but this is not stable across mindsets; it is almost impossible to communicate in this way across large cultural gaps.
There’s a related problem. By nature of our language, an unqualified statement (usually) contains an implicit ∀. This directly conflicts with implicit meme dereferencing, and the interpretation of which is intended is subject to ambiguous context. Mixing up ∀ and “associated meme” is a dangerous thing to do and can lead to sweeping generalizations. Remember, we are referring to people here, and sweeping generalizations lead to various forms of prejudice.
This is the heart of the problem: the meme is referred to by association with a concrete idea, but exact relation between the concrete idea and the meme is unspecified and ambiguous. It can be avoided by making the link explicit, but the amount of qualification required to avoid this ambiguity is prohibitively expensive, so these types of statements tend to simply be avoided, limiting our expressive power greatly.

While truly fixing this problem essentially requires a new, carefully designed language, we can make an immediate improvement by at least specifying when we are making a connection. To this end, I propose a new primitive connective, ☀, to mean roughly “the meme primarily inhabiting” whatever follows, and to be used as a sort of “dereference” operator. This will at least allow an unambiguous instantiation of “association”. While it cannot represent more complicated associations than “primarily inhabiting”, it covers most common use cases. There are issues with ambiguity when multiple memes may inhabit the same collection of people, which becomes more severe when the memes are mutually exclusive. Correct and clever usage of ☀ can remedy this. It is helpful to imagine trying to indicate an animal to someone where you are only allowed to speak of its environment. Ex: ☀Temperate city skies. Can you guess? Pigeon.

I’ve played a little trick here. It is not immediately clear that “the people inhabited” is a consistent description of memes. Why should memes associate like that? Genes associate because they can only be combined in discrete pairings with small mutation (mating), so you’re going to get something close to the parents. Memes combine in much more complicated ways, and it’s not clear that they would preserve these associations. In fact, there’s a deeper reason for why these associations hold. In biological evolution, organisms reflect their environment. In some sense, a successful organism is a “statement about how to survive in that environment”. What’s interesting about memes is that they act as both the environment and the replicator. More on this later.