## I love it when crazy people say things I agree with

### Quote

“Physicists suffer from a disorder of the mind that causes them to believe that sensible, temporal objects have more reality than eternal, immutable Platonic mathematical objects, and to place more trust in their senses than in their reason, more trust in the scientific method of ‘evidence’ than the mathematical method of eternal proof.”
– Mike Hockney, Why Math Must Replace Science (The God Series Book 18)

# Social Membranes, Genre Encryption, and Super-Secret Tech

There’s a common problem of good ideas being fragmented across genre. Recently, I’ve begun to consider it THE (non-obvious) problem in knowledge advancement.

Let’s take a computer science approach to make it more clear why it’s THE problem. The search for the truth is indeed a kind of search, so it makes sense that you’d want to use a search tree. The nice thing about search trees is that they can be traversed in parallel. If we view humanity as the program, humans are the threads. How do we keep threads separate?

In a Von Neumann computer, keeping threads separate is trivial, but for humans, you’d have to prevent all communication. It gets a bit complicated, because each human (thread) is also concurrent – a human can work on many different things at once. So we can’t simply cut off all communication, because you might need to communicate with different groups for different tasks; we have to be selective about which content can be communicated with whom. How does that work?
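To make the metaphor concrete, here’s a toy sketch (my construction – nothing from the metaphor beyond the tree), assuming the `parallel` package: a search tree explored by independent workers that never communicate.

```haskell
import Control.DeepSeq (NFData)
import Control.Parallel.Strategies (parMap, rdeepseq)

-- The search tree from the metaphor.
data Tree a = Leaf a | Node [Tree a]

-- Each subtree is searched by its own spark ("thread"), with no
-- communication between branches -- which is exactly what keeps the
-- search threads separate.
search :: NFData a => (a -> Bool) -> Tree a -> [a]
search p (Leaf x)  = [x | p x]
search p (Node ts) = concat (parMap rdeepseq (search p) ts)
```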

I’ll propose that one mechanism is via hijacking genre. Genres are a convenient heuristic for grouping information (sciencey sounding things tend to contain information about the natural world, religiousy sounding stuff tends to contain metaphysical poison, etc.), but the heuristic is nowhere near perfect. Viewed in this way, genre can be used as a sort of encryption: phrase an idea in genre-specific lingo, and the only way to decrypt it is to both understand the lingo and buy into the genre. This last bit is important: it’s not enough to just understand what they’re saying, because even the absolute truth spoken in a monologue about Lord Xenu is likely to be dismissed anyway. I should point out that the encryption is not explicit; there is no original plaintext understandable by everyone. It is more that the ideas exist in a different “basis”, and it works because computing idea equality is hard[fragmentation], so most people filter it out via the genre heuristic.

The “team identity” effect causes these fuzzy differences in genre to self-organize into sharper “social membranes”, which roughly approximate different search threads.

I approach all of this as a scientist, and so like to think of the science genre as the “main branch” because it contains sufficient epistemology to ~learn everything~. The end result, from this perspective, is that some useful gems for science get hidden in other genres. This post was motivated by two particular examples, which I’ll get into next time. But there are so many that I’m beginning to collect some of the more exotic ones.

# Consciousness and Intelligence are Convoluted

There’s been great discussion on LW as to the value of consciousness as a concept. The general conclusion many have come away with is that we should probably just taboo ‘consciousness’ and get to the meat. I tend to agree.

I’d like to present a slightly different reasoning though. The feeling of consciousness in ourselves and others is a hardcoded trait. This should immediately lead us to be very suspicious of it as a consistent concept. It’s clearly useful for development, if only as a proxy for “human-ness”, but I’ll argue that it is just that, a heuristic. There are many interesting phenomena hiding in ‘consciousness’, but they should be considered distinct phenomena. They are bound together by their shared name, and we often switch between them in conversation without noticing, assuming they’re the same concept. That’s right, ‘Consciousness’ is convoluted.

Here are a few things people tend to mean by ‘consciousness’, each interesting on its own. It is not immediately obvious that they represent the same phenomenon, though I suspect that they are deeply related. It’s important to lay them out as separate ideas so that their connection can be made explicit, rather than equivalent “by definition”.

1. The feeling of free will – The generation of counterfactual scenarios and an evaluation of those scenarios
2. The feeling of self awareness – To contain a model of one’s self and mental processes
3. Perception of qualia – The recognition of sensations as “internal” experiences, such as awareness of the color red.

Similarly, the term “Intelligence” is convoluted and we should taboo it. Some possible meanings are

1. Consciousness – yes, sometimes consciousness and intelligence are used synonymously
2. Containing sufficient processing power, being sufficiently complex so as to be unpredictable – in common usage, we sometimes say someone is “intelligent” if they can learn and think quickly. Many times things feel intelligent if they are complex.
3. Acting sufficiently agent-like – Intelligent things feel as if they act according to goals and rational decisions based on those goals.
    1. Often a good heuristic for “agent-like” is “self-like”: if you consider the class of all things you encounter, you’re probably one of the more agent-like things you deal with. So in many cases, this is the feeling we’re actually referring to when we say “intelligent”. This one is just flat Wrong, beware of it.

Consider these meanings, which ones you tend to use most often, and how they might be related. Be mindful of how you use them during conversation and of when you feel the urge to switch meanings; it should greatly improve the clarity of your arguments.

# NULL

Sorry, there’s nothing here!

This is a stub to keep track of all the articles that are intended to be written as background or motivation. If there’s a link here, it’s because I already have an idea to fill in, but just need to write it.

I won’t ask you to “just trust me”, but at least trust that if there’s a link here, the point wasn’t simply overlooked. Use this as an invitation to think about why the point in question might or might not make sense.

# Convoluted reasoning

There’s a general trend toward convoluted reasoning, which I think captures a wide range of common logical flaws. It’s a frighteningly common pattern even among brilliant thinkers: with informal reasoning, it’s all too easy to follow a chain of intuition, where each step seems reasonable, only to end up at a totally incorrect conclusion. This is used maliciously to convince people of nearly anything; as a defense, giving rise to the “motte-and-bailey doctrine”/“strategic equivocation”; internally, as rationalisation; and in the most sinister case, as a route for smart people to do something truly stupid. It is also the root of the map/territory conflation.

The mark of convoluted reasoning is implicit conversion between distinct objects which are related but follow very different logics. Let’s be more rigorous:

Consider a collection of categories, with a chosen isomorphism between the underlying sets of each. Take, for example, the Mind / Space categories; or many copies of the category of (your favorite) verbal logic, connected by the isomorphisms swapping between alternative definitions of words; or the categories induced by different metrics on the same type of object. The connections within an individual category compose, just as logical deductions should, but the isomorphisms on the underlying sets don’t induce functors – they don’t preserve the arrows! Of course, to our naive human brain, they’re all just the same sort of connection – I’d guess that most people don’t explicitly represent the categorical structure, preferring instead to represent everything more like simple set functions. In fact, I’d wager that at the lowest level, our brain saves space by collapsing the multiple categories into one space, so that (set) isomorphic objects (like multiple definitions of “privilege”) are literally identical, until there is a need to disambiguate them. Similarly, in the Mind / Space problem, we don’t often think of mental things as being different from physical things, but rather that everything is either mental or physical (depending on which side of the wall you stand on).
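A minimal concrete instance of “isomorphic sets, incompatible arrows” (my own toy example): view $\mathbb{R}$ as two preorder categories, $\mathcal{C} = (\mathbb{R}, \le)$ with an arrow $x \to y$ exactly when $x \le y$, and $\mathcal{D} = (\mathbb{R}, \ge)$. The identity map is an isomorphism of the underlying sets, but it is not a functor $\mathcal{C} \to \mathcal{D}$: the arrow $0 \to 1$ exists in $\mathcal{C}$, while its image would require $0 \ge 1$ in $\mathcal{D}$. Same elements, same “function”, incompatible arrows.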

So you’re probably already familiar with one example of how things can go wrong with convoluted reasoning, and that’s equivocation. We dance around in one category (one collection of definitions for a word), drawing all sorts of conclusions, like that “privileged” people should all shut their trap and maybe die in an excruciating fire for good measure, and then secretly jump into the other category (change definitions), so that we can include everyone we don’t like under the umbrella of “privilege”. Now substitute for “privilege” any demographic term you can think of.

Don’t be fooled though: convolution isn’t just about switching definitions, it’s more about switching contexts. To see this, recognize that the same problem can arise even where your isomorphism is literal identity, like so:

One very frequent case where we have many different categories on the same object is the collection of categories induced by different metrics. In this case, the arrows represent “similarity” or “closeness”. You can often think of each possible metric as the “natural metric” on its own space, but they all happen to get mixed together in the space you’re working with. For example, the Euclidean metric is the “natural metric” for $\mathbb{R}^n$, while the taxicab metric is the extension of the “natural metric” on a rational grid, embedded in $\mathbb{R}^n$. It is not always so obvious what the “natural metric” you’re looking for is (hopefully I will address this in a future post).
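Here’s a minimal Haskell sketch of the point (the helper names and example points are mine): the same set of points, two metrics, two incompatible notions of “nearest”.

```haskell
import Data.List (sortOn)

type Point = [Double]

-- The "natural metric" on R^n ...
euclidean :: Point -> Point -> Double
euclidean p q = sqrt (sum (zipWith (\a b -> (a - b) ^ 2) p q))

-- ... and the extension of the natural metric on a grid.
taxicab :: Point -> Point -> Double
taxicab p q = sum (zipWith (\a b -> abs (a - b)) p q)

-- "Closeness" depends on which category's arrows you use.
nearestFirst :: (Point -> Point -> Double) -> Point -> [Point] -> [Point]
nearestFirst d query = sortOn (d query)

-- nearestFirst euclidean [0,0] [[3,3],[5,0]] ==> [[3,3],[5,0]]  (4.24 < 5)
-- nearestFirst taxicab   [0,0] [[3,3],[5,0]] ==> [[5,0],[3,3]]  (5 < 6)
```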

A natural scenario for this to arise in is machine learning / pattern recognition. I’ll talk about a specific case that has interesting implications:

Consider the biological tree of life. Clearly, some species are “more related” than others. What’s not so clear is what we actually mean by “related”. Do we mean “has the most similar function”? This is clearly wrong, and misled biology for a long period before the study of genomics was invented. Then do we mean “has the fewest number of ancestors separating them” (so brothers would be closer than cousins, etc.)? This seems reasonable until you realize that not all mutations cause equal deviation. For example, some organisms, such as ctenophores, have remained relatively basal throughout evolution; they haven’t deviated too far in “phenotype space” from their ancestors. So what’s the “right” metric? I won’t try to solve this here because I think it’s a hard problem (and it may be that there is no unqualified “right answer”, and the feeling that there should be one comes from built-in pattern-seeking heuristics). Rather, I’ll point out that computational biology has developed (read: apprehended from mathematicians some 30 years after the fact) a large collection of different metrics for this purpose, many of which resolve to metrics on genomic or protein sequences, such as edit-distance and its more biologically relevant varieties (sketched below). Because of the complexity of the data, we often need to use slightly different metrics for different types of data, for example, adding different sorts of normalizations to make everything match up. We often want to chain these comparisons up, so that if gene expression profile A is similar to gene expression profile B, and gene expression profile B is similar to drug response profile C, then expression profile A should be similar to drug response profile C, and we’ve found drugs to cure cancer, yay.
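For concreteness, here’s a minimal sketch of plain edit (Levenshtein) distance, the simplest of those sequence metrics – the biologically relevant varieties weight operations differently (e.g. by substitution matrices), but the shape is the same.

```haskell
-- Standard one-row dynamic programming for Levenshtein distance.
editDistance :: Eq a => [a] -> [a] -> Int
editDistance xs ys = last (foldl nextRow [0 .. length ys] xs)
  where
    -- fold one row of the DP table per element of xs
    nextRow row@(d : ds) x = scanl step (d + 1) (zip3 ys row ds)
      where
        step left (y, diag, up) =
          minimum [left + 1, up + 1, diag + fromEnum (x /= y)]

-- editDistance "kitten" "sitting" ==> 3
```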

This tends to turn out badly for a few reasons. Setting aside the fact that some of our similarity measures don’t even obey the triangle inequality, the main reason this doesn’t work out is that they’re mutually incompatible measures. Sure, they kinda-sorta compose, we benchmark them zealously until something sticks, and you can still discover useful things with them (or at least be convincing enough to get published in a nice journal, and really friends, isn’t that the definition of useful discoveries?), but they lack the theoretical niceness to even put error bounds on how badly they can fail to compose. This is fundamentally because they are arrows in different categories.

## Deconditioning

### Quote

“We dance to the pull of strings that were woven years ago, and in a lightning flash of insight, we may see the lie; the program. It is first necessary to see that there is a program. To say perhaps, this creature is mine, but not wholly me. What follows then is that the prey becomes the hunter, pulling apart the obsession, naming its parts, searching for fragments of understanding in its entrails. Shrinking it, devouring it, peeling the layers of onion-skin”

# The Mind / Space Duality

WARNING: This is almost certainly wrong, and the “math” is embarrassingly sloppy – it’s meant mainly as a note to myself and to inspire thought. I assume some familiarity with at least the premise of “modern”* meditation/insight theory as from MCTB / The Overground, and some basic category theory, mainly adjunctions/monads.

Here I’ll expose some ideas regarding the seeming conflict between an objective universe and a personal universe of qualia, building on the so-called “science of meditation”.

First, let me motivate the question. You may scoff, “Of course external reality exists; this kind of philosophical play is pointless, we got over this centuries ago”. I agree, it’s a confused question, and pointless to question external reality, so rather than seeking some non-existent answer, seek to understand how the question arises. It is true, at the most basic level, that we do not, cannot, observe external reality, but only our raw senses. So it should seem at least a bit mysterious that external reality feels so real, even if the question poses no practical barrier to interacting with external reality. I seek to dissolve that mysterious feeling. Really, this should be no huge marvel: we frequently treat math as if it exists platonically, independent of our knowledge of it, while it is also clearly true that you can’t “find” math anywhere – it is a human invention. The two frames of reference fit snugly inside each other.

I’ll attempt to roughly frame the problem through category theory – category theory is useful because it lets us distinguish different types of interaction (arrows in different categories) that might otherwise be convoluted.

Let’s start with what we know: one of the greater epistemic fruits of meditation is the realization that sensory experience is composed of individual frames containing a moment of perception “in motion”, commonly called “formations”. While hard to describe, and even harder to measure, one may think of formations as “differential objects”: not just points but infinitesimal segments of perception, in the context of immediate past and future, where our normal perception is the weaving together (integral) of these building blocks.

Assume for a moment that these formations, as the meditative scholars purport, are the basic building blocks of perceptual reality – that we can never gain any more information, even in theory, than that contained in the continuous sequence of formations we perceive. Then where is there room for objective reality? “Things exist” certainly seems like a reasonable and useful assumption, and moreover, it feels more natural than a universe composed of fleeting thoughtstuff. To answer, let me quickly detour into the distinction between knowledge and belief.

This is where category theory starts to come in handy: both “everything is a belief” and “things can be known for certain” can be true if we’re careful about our typing. Consider the “belief monad” $B$, where, for a proposition $a$, $B a$ is the belief in $a$. Now, every proposition can be lifted into a belief (that the proposition is true), and if we believe that we believe something, we really ought to believe it too! (Though humans are quite bad at this kind of reflective consistency.) So we have mappings

```haskell
return :: a -> B a
join   :: B (B a) -> B a
```

And that’s a monad! If we restrict $a$ to have the type of formations, everything still holds, since “I observed formation a” is just a specific kind of proposition.
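Fleshing that out as a Haskell sketch (the representation is a placeholder of mine – as a bare wrapper this is just the Identity monad with a suggestive name; a serious version would carry credences, evidence, and so on):

```haskell
newtype Belief a = Belief a

instance Functor Belief where
  fmap f (Belief a) = Belief (f a)

instance Applicative Belief where
  pure = Belief                        -- return: lift a proposition into a belief
  Belief f <*> Belief a = Belief (f a)

instance Monad Belief where
  Belief a >>= f = f a                 -- gives join :: Belief (Belief a) -> Belief a
```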

Now, since formations contain thoughts, our beliefs are encoded in each moment, i.e. there is an embedding $B(\mathrm{FORM}) \to \mathrm{FORM}$. But here again higher categories are useful – “isomorphisms are not equalities”: formations may encode beliefs, but they are not beliefs; the missing component is time. Computation takes time, thoughts are a particular kind of computation, but formations are timeless. This is where the mystery of belief vs knowledge comes from: you “know” your experience for free, from the monadic return, but there is logical uncertainty regarding what your experience represents. The statement “I think therefore I am” is tantamount to $\exists f.\, f$, which resolves trivially – this is why it feels more like “knowledge” than “belief”; there is no logical uncertainty, it is proven immediately by the witness of any formation.

If formations are timeless, what links one moment to the next? How can we predict the future from the past? Now we need to deal instead with indexical uncertainty. To proceed, we need to introduce some notion of “possibility”. The most obvious way is to codify a “belief” as a kind of Bayesian network, which subsumes both classical and stochastic logic, though the real deal is almost certainly more complicated. Now, for any group of observations pulled from some space, there is a “free” probability distribution on that space, generated by maximizing entropy subject to the constraints imposed by the data. This free distribution is the “external reality” – the universal property of maximum entropy makes it unique and “objective”, in the sense that anything that could be known about the universe can be computed from it. Of course, computing maximum entropy distributions is hard, even harder if you have to recompute every instant! So we don’t; we compute our lossy “subjective” views of reality. While we can’t ever know the universe entirely, it is interesting to notice that it’s fundamentally made of the same sort of thing as thoughts, leading to a nice interpretation that “learning about the universe is to become increasingly isomorphic to it”.
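For reference, the construction being invoked here is textbook maximum entropy (nothing specific to this post): given observed constraints $\mathbb{E}_p[f_i] = \hat{f}_i$, the “free” distribution is

$$p^* = \underset{p}{\arg\max} \Big( -\sum_x p(x) \log p(x) \Big) \quad \text{subject to} \quad \sum_x p(x) = 1, \quad \mathbb{E}_p[f_i] = \hat{f}_i,$$

and its solution always takes the exponential-family form $p^*(x) \propto \exp\big(\sum_i \lambda_i f_i(x)\big)$, with the multipliers $\lambda_i$ pinned down by the data – which is the uniqueness being leaned on above.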

Of course, if reality were “just” a probability distribution over perception, that wouldn’t be very satisfactory. The universe is cold and uncaring; it doesn’t feel like it’s about you, it feels like there are “things” independent from yourself. How can we reconcile this? A probability distribution is an opaque function – but opaque functions are always hiding more structure inside. This is the utility of Bayesian networks: to factor our probability distribution, exposing its internal structure. The universe we observe, or rather seek to discover, is then the complete factoring of this “free” distribution, generated by our perception.
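Concretely, the factoring meant here is the standard Bayesian-network decomposition: a joint distribution over variables $x_1, \ldots, x_n$ is exposed as

$$p(x_1, \ldots, x_n) = \prod_{i=1}^{n} p\big(x_i \mid \mathrm{pa}(x_i)\big),$$

where $\mathrm{pa}(x_i)$ are the parents of $x_i$ in the network’s DAG – the “things” live in that graph structure, not in the opaque joint function.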

Now we come full circle, back to what science has long suspected: while the laws of physics may be freely generated from perception, physics can also entirely explain human perception (assuming we know the entire state of the multiverse), by forgetting the rest of the universe and focusing only on the bit responsible for perception. So while reality may emerge from our perception, containing no more information than our experience of it, it can also be viewed as a distinct entity, with a more natural representation, and our perception as simply a view into it.

# Programming is Compilation, Compilation is Optimization

There seems to be a view even (especially?) among expert programmers that human programmers are special. How silly. Of course, there are some tasks better suited to human brains than existing computers, but there seems to be a false boundary drawn between them. There are many such false boundaries, consistent with the fact that the machines will eventually take our lunch and eat us too, but I want to focus on efficiency and optimization. There is a view that only humans can do the low-level optimizations necessary to get really fast code. Indeed, the common benchmark for any shiny new tool is “hand-rolled” C/Fortran/Assembly.

Why can’t we do better?

The common thing between C/Fortran/Assembly is that we’re basically working at the bare machine level, with some convenient syntax (ok, maybe Fortran is a bit smarter, but we lose generality because of it, being limited to numeric code). So a programmer using them is really using the bare components of the computer. This should tell us something about compilation, and about why high-level languages don’t always get the same performance. In a high-level language we’re no longer working with the machine primitives, but with more sensible logical primitives. Then it’s the compiler’s job to convert these higher-level constructs to the real machine primitives, ideally in the most efficient form possible. But look: the programmer who writes their program in Java and compiles it is conceptually doing the same thing as the programmer who writes their prototype in Java and then hand-writes a fast C implementation – yet the C programmer typically does a better job at efficiency.

In the most abstract, compilation is a transformation from a source language to a target language. It just so happens that some languages have native implementations ~in the universe~, while others do not. So we start with human thought – that’s a language, but not very well suited to implementation on a computer (it is, however, perfect for implementation on human hardware!) – a programmer compiles that (using their brain and fingers) into some programming language, the compiler then compiles it down to machine code, and the computer interprets it into ~physics~. You might say that people think in specifications but programs need to be written as implementations; but if the programmer always had to choose an implementation, then we’re doing no better than assembly. Indeed, something as simple and prevalent as garbage collection is a step away from programming-as-implementation towards programming-as-specification. One example where this is most obvious is the language Prolog, where programs are logical propositions and the language fills in the values in some possibly non-deterministic way (but note, for efficiency reasons, programmers still often have to reason about how the search is implemented, and modify it using things like cuts).
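A toy rendering of the specification/implementation gap (in Haskell rather than Prolog, and entirely my own sketch): the executable specification of sorting, versus the efficient implementation a sufficiently smart compiler would ideally reach from it.

```haskell
import Data.List (permutations, sort)

-- Specification: what it means for a list to be sorted.
sorted :: Ord a => [a] -> Bool
sorted xs = and (zipWith (<=) xs (drop 1 xs))

-- Programming-as-specification: "the sorted permutation of the input".
-- Minimally constrained and obviously correct, but factorial-time;
-- closing the gap to `sort`'s performance is the compiler's job.
specSort :: Ord a => [a] -> [a]
specSort = head . filter sorted . permutations
```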

Let’s take a moment to discuss the distinction between compiling and interpreting. Programming is fundamentally about descriptions and language conversions. A program is a description of an action to be performed; an action is also a description of itself. But notice there’s a distinction between these types of descriptions: a program is timeless, it exists all at once in a single place and time, but an action is strung out across time. This is not such a weird property of a language though; consider that spoken language is also strung out across time. From a pseudotheoretical perspective, things that are strung out across time are “additive”/“co-data”, constantly unfolding a bit at a time and potentially infinite, while things like written languages are “multiplicative”/“(constructive) data”: they exist in their entirety, built from finite components. So roughly, a compiler converts to data, while an interpreter converts to co-data. In practice, the data representations are much more useful than co-data, and the only time we ever want to translate to co-data is when we’re “running” our program. Co-data also has the useful property of “productivity”, where for every sub-thing that it processes, it produces some output, unlike constructive things which aren’t well defined until they’re fully built.
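The data/co-data split is easy to sketch in Haskell (where laziness blurs the line, but the types still make the point – this is my illustration, not standard library vocabulary):

```haskell
-- Co-data: a trace that unfolds over time, potentially forever.
data Stream a = Cons a (Stream a)

-- An "interpreter": unfold a step function from a start state into a
-- possibly infinite trace. Productive: each step consumed yields output.
interpret :: (s -> s) -> s -> Stream s
interpret step s = Cons s (interpret step (step s))

-- Observe a finite prefix of the co-data.
takeS :: Int -> Stream a -> [a]
takeS n (Cons x xs)
  | n <= 0    = []
  | otherwise = x : takeS (n - 1) xs

-- takeS 5 (interpret (+ 1) 0) ==> [0,1,2,3,4]
```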

Notice: programming is time-like, we must take the time to type the program into the computer; running a program is also time-like, we typically want our program to perform some task over time. So it seems that “interpreting” is the most natural way to do things: for every bit of program typed in, produce as much relevant output as possible – this is exactly the value of hot swapping. Then why do we ever compile things? In practice, people compile in order to do some types of verification and optimization, but is there some universal reason for this? This is a deep topic and I’ll talk about it more next time, but the short version is that the connective structure of different languages is different, so we must hold the whole program in our head at once to make the most effective “whole program optimizations”.

The bottom line is that programming is just compilation from thought to machine, so there’s no reason we should have to compile directly to machine in our heads! Much of the problem comes from people viewing programming as writing instructions for dumb computers (“tell it how to do things”), leading them to write over-specified programs, with too many constraints on how they’re implemented for the compiler to optimize much. What we should be doing is figuring out how to encode what we want as completely and accurately as possible. People are afraid to program-as-specification because existing systems aren’t very good at getting efficient results; true, but this is not a fundamental limitation, and that view is holding us back. There are also some who think of programming-as-specification as a kind of WYSIWYG fake-programming for novices; I counter: it takes a certain kind of skill and clarity to frame a problem in a way that is both entirely correct and minimally constrained, exposing all of your domain knowledge to the compiler.

# Convolution

In common usage, ‘convoluted’ means “complex”, “hard to follow”, “unclear”. So the main reason to use ‘convoluted’ is for its emotional nuances. But I’ll argue (in the near future) that the full emotional power of language is overkill and makes expressing precise logical meaning quite difficult in practice. So as usual we turn to math for a better meaning.

In math, convolution is a peculiar operation that can be thought of as smearing two functions over their input. Conversely, deconvolution reverses the process, recovering the two functions from a smeared one. It’s often used in image/signal processing, where convolution is some kind of blur and deconvolution tries to sharpen. It’s used a little more generally for things like network deconvolution, where you attempt to recover direct effects when you’re only able to measure pairwise effects, each of which may be a direct effect, a transitive chain of length 2, of length 3, etc.
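For reference, the standard continuous definition (the discrete sum is analogous):

$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, \mathrm{d}\tau$$

Each point of the output mixes contributions from both factors at every offset – that’s the “smearing”, and it’s exactly why recovering $f$ and $g$ from $f * g$ is hard.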

So let’s use our wonderful abstracting brains to distil the commonality:

DEFINITION: Convolution is when the distinction between multiple independent factors becomes obscured through some interaction. Deconvolution is the intellectual/computational process of dissecting a convoluted result into its factors, and determining how they convolute. Pseudomathematically, a convoluted thing is a thing like $C \overset{f}{\cong} C_1 + C_2 + C_3 + \cdots$. Deconvolution is finding both the right hand side (the factors) AND the function $f$, the “how”.

(Note: We could use “conflation” here, and it would be closer to the original English meaning. I choose “convolution” instead so that the technical distinction is clear. The image here should be of an object “sitting below” its confounding factors. If this ‘misuse’ of language bothers you, feel free to interpret it as a frequent typo of mine – it doesn’t matter much anyway, as the definition will always be linked when using a “technical” term.)

EXAMPLE: When statistically evaluating cause and effect, we often use correlation as a surrogate. But if two events $A, B$ have a 0.5 correlation, we could have

$A \overset{0.5}{\to} B$, or $A \overset{0.5}{\leftarrow} B$, or $A \overset{\sqrt{0.5}}{\leftarrow} C \overset{\sqrt{0.5}}{\to} B$, or $A \to \ldots \to B$, or… you get the idea

Convolution is a big problem that seems to go unnoticed in human thinking (perhaps because there was no word for it in common usage :D), so I’ll be using it as a platform for many more posts in the near future. I started writing them and realized they all had the same braindump preamble, so I factored it out!

# HREF EVERYTHING

This begins an experiment where I attempt to carve thoughtspace at its boundaries, to pluck choice thoughtstuff from the æther and distill it to crystalline perfection. There’s great danger in making up new words or repurposing old ones, but programmers seem to do it all the time in designing subroutine names. They avoid disaster by making sure the definition is always within sight of the name usage; to them it is clear that it is not the name that has meaning, but the reference to its definition, reified in the (arbitrary) name. Programmers have an easier time because programs and words are different stuff, but when the definition of a word is a bunch of other words, it gets convoluted (see what I did there?). Sometimes people get confused and think that words have meaning, that there should be a right definition, or that some words are real and others made up. How silly, to hold all words in a global namespace. For practical reasons it’s a hard problem to fix, but fortunately we live in the ~future~, where references can be reified directly into hyperlinks, so meaning need never be ambiguous or unfamiliar! I encourage everyone everywhere to hyperlink the definition to every term that may be even a bit unclear. This forces us to make the way we’re using a word explicit, and gives more freedom of expression.