Computation From Within

I’m going to talk about the qualitative computational strength afforded by evolution. More generally, I’m going to talk about what you gain when you weave a bunch of things that can do computation into an evolutionary game, a system of those things. In pseudocode:

`Computes a => Meta a`

This is very simple, so you might wonder if there’s anything useful we can say about this construction. First let me give some examples, in increasing order of complexity.

1. Many perceptrons combine into an Artificial Neural Network (ANN)
2. Many Actors combine into an Erlang program
3. Many cells combine into an organ
4. Many organs combine into an organism
5. Many organisms combine into a colony
6. Many species combine into an ecosystem

Notice: Individual perceptrons are weak. In an ANN, each perceptron might only do simple addition, but the full network can be Turing complete. Clearly, there is something gained here, but where does this extra power come from? We’re not just talking about extra memory or efficiency from added processors; we’re talking about a qualitative expansion in the types of things that can be computed. This is a big deal. The only place for this extra power to hide is in the connections between neurons. We can thus view ANNs quite clearly as directed graphs with nodes labeled by perceptron computations.

If we generalize perceptrons to be arbitrary functions rather than simple arithmetic, we get something more like an Actor Model. Since ANNs are already Turing complete, this would at first not seem to gain us much other than convenience. Consider, though, that an actor can do something a simple function cannot: it can fail. In fact, Erlang is famous for gracefully handling process failure. You can see now the relation to evolutionary games. If we interpret life forms as hypotheses about how to survive the environment, then it’s a nice property that one hypothesis can fail without bringing the whole system down. But we’re still missing the secret ingredient to life: if we start with a bunch of hypotheses, that’s just one big meta-hypothesis – eventually they could all fail and then we’re out of luck. What we need is a way to introduce new hypotheses. What we need is a Monad.

Unlike an ANN or an Erlang program, lifeforms can replicate themselves (approximately). For simplicity let’s confine ourselves to asexual reproduction. If we take ‘Meta a’ to be the type of a group of ‘a’s, then reproduction is an arrow (a -> Meta a). Naturally, reproduction happens within a larger group, and we can always stitch a reproduction into the larger whole, so we really have (Meta a -> (a -> Meta a’) -> Meta a’). This gives us something quite powerful: a chance at immortality. There’s an old math puzzle that is set up like this:

Suppose we have a bacterium. At each time step, the bacterium either dies or splits in two, with probability p and 1-p respectively.

It turns out that, provided p < 1/2, the exponentially branching growth cancels the exponentially decaying chained probabilities and we get a nonzero probability that the bacterial lineage never dies (Try to prove it!). Now, this comes just from a constant death probability for each bacterium. In real evolution we can do better: the organisms with a lower death probability p are more likely to live longer, so we expect the average p to fall steadily. Barring large extra-systemic fluctuations (like the planet exploding), “life” (which is to say, descendants of the proto-slime) is pretty darn near immortal.
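As a numerical check on the puzzle, here is a minimal sketch that treats the lineage as a branching process. If q is the probability that a lineage started by one bacterium eventually dies out, then q = p + (1 - p) q², since the bacterium either dies immediately or splits into two independent lineages that must both die. The function names and the iteration count are choices made for this illustration.

```python
def extinction_probability(p: float, generations: int = 10_000) -> float:
    """Probability that a lineage started by one bacterium eventually dies out.

    Each bacterium dies with probability p or splits in two with probability
    1 - p, so the extinction probability q satisfies q = p + (1 - p) * q**2.
    Iterating from q = 0 converges to the smallest non-negative solution.
    """
    q = 0.0
    for _ in range(generations):
        q = p + (1 - p) * q * q
    return q


def survival_probability(p: float) -> float:
    """Probability that the lineage never dies out."""
    return 1.0 - extinction_probability(p)


print(round(survival_probability(0.25), 6))  # 0.666667
```

Solving the quadratic directly gives q = p / (1 - p) when p < 1/2, and q = 1 otherwise, so the chance at immortality only exists when splitting is more likely than dying.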

Note: I’ve described asexual reproduction here because it’s simpler. Sexual reproduction also requires (local) interaction of the group, rather than an individual, but it’s otherwise similar.
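The reproduction arrow (a -> Meta a) and the stitching operation (Meta a -> (a -> Meta a’) -> Meta a’) can be sketched concretely if we assume, purely for illustration, that a group is just a Python list. Under that assumption the stitching operation is the familiar list-monad bind. The names `unit`, `bind`, and `split_or_die` are hypothetical and the reproduction rule is a toy.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def unit(x: A) -> List[A]:
    """Wrap one individual as a group of one."""
    return [x]


def bind(group: List[A], reproduce: Callable[[A], List[B]]) -> List[B]:
    """Stitch each individual's offspring back into the larger group."""
    return [child for parent in group for child in reproduce(parent)]


def split_or_die(cell: int) -> List[int]:
    """Toy reproduction rule: even-numbered cells die, odd ones split in two."""
    return [] if cell % 2 == 0 else [cell, cell + 2]


print(bind([1, 2, 3], split_or_die))  # [1, 3, 3, 5]
```

Note that a parent that returns an empty list simply vanishes from the next generation without disturbing the rest of the group, which is the failure tolerance discussed earlier.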

What does this mean for us lowly humans? The good news is that ‘humanity’ is a sort of meta-thing, and so has all the strengths I’ve spoken of. The human meta-entity absorbs knowledge from its constituents, immortalized in a chain of human communication. Even while individuals and groups may wax, wane, and die, humanity marches endlessly forwards. The bad news is that I’ve hidden a problem from you. I’ve hidden it because I’m not sure how to fix it. Unlike the bacterium example, which produces exact copies of itself, real replicators don’t produce exact copies. Certainly our chain of descendants will be near immortal, but in what sense will they be “the same” as us?

We’d like to say that descendants are “the same” in the sense that they are clustered nearby in thingspace. This gives us a clue about what sort of things should be allowed to go meta. Particularly, it makes sense to talk about a collection of `a` as an independent `Meta a` in some context if its behavior in that context does not depend strongly on the behavior of any individual or small group. That is, `Meta a` is differentially private! This criterion makes it clear that one of our previous examples doesn’t work as well as the others. While an organ is stable even if a small group of its cells die, an organism has much less tolerance for organ failure – a small heart defect can take the whole system down! The teleological view is that the body uses heterogeneous organs to save resources in making a “minimum viable human” and makes a stability tradeoff in doing so – homogeneous systems are more stable because they have more symmetries.

Shift perspective downwards: The brain is very stable to seemingly dramatic rewirings, not just the loss of individual cells, so maybe it’s built on something larger? Take the internal view, that the mind is composed of many competing subprocesses vying for control, each one thought of as a hypothesis about which action to take. This creates a sort of evolutionary game for thoughts (spatiotemporal firing patterns), where a thought lives when it’s firing and is otherwise in stasis/dead. The individual thought dies but the mind salvages the remains and is better for it. The power of the mind is the ability to keep playing.

Both individual human minds and the human meta-minds (Kami) are Turing complete, so they should be able to process the same sort of things. Digital immortality suggests that thoughts should be substrate independent. Humans are fragile, the meta-mind is immortal; yet we can live only through our own eyes. Is it possible or even meaningful to “blow up” a human mind, embedding it instead as a distributed entity, rather than porting it one-for-one to a computer? I suspect not totally, since the network topology of human civilization is very different from that of the brain (for one thing, there’s a lot more latency). However, the tales of “charismatic leaders” becoming Kami are tantalizing, and suggest that at the least, human minds can act as a seed for distributed entities.

Reflective Agents

TL;DR: A framework for reasoning about the relationship between present/future beliefs in the presence of self-modification, and about a theory of mind allowing AI to model human experience/utility. Currently just a braindump to be filled in with actual research.

To rephrase a bit from earlier:

An agent must process its sensory information/measurements (the “map”), to infer its actual status in the world (the “territory”). We often consider this as finding hidden variables, but sensory data is fundamentally very different from the universe it betrays. It’s not “about” anything until interpreted so, and could stand independently as a mathematical object.
For a fixed agent `i`, consider two categories.

The first, `MAP_i`, whose objects are sensory data, including internal states (“thoughts”), everything the agent `i` uses to reason about.

The objects of the second, UNIV, can be interpreted as predictive models over MAP_i, but should more accurately be interpreted as “things” (logical propositions, the possible physical states of the universe, etc). The interpretation as predictive models is recovered via fixing a forgetful functor (R_i : UNIV -> MAP_i) mapping physical states to sensor readings. The morphisms of UNIV are more complicated, but should (at least) take the form of a framed bicategory with vertical arrows as “logical” maps and horizontal arrows as “stochastic” maps. A starting point for a concrete representation is the category Stoch (see for ex. arXiv:1205.1488). The causal structure of UNIV lets objects “roughly factor” as a product of smaller (i.e. lower entropy. See: arXiv:1106.1791) objects.

The real challenge is inferring reality from observation. Interpreting Occam’s razor as a maximum entropy principle, we can freely generate the “best approximation to reality” from sensory data by taking the maximum entropy model generating that data. Of course the functor (L_i : MAP_i -> UNIV) is uncomputable, but since it is characterized by a universal property, it is still well-founded as a function, so can be manipulated formally, and computable approximations can be found.

The details above remain to be worked out, but the motivation for this approach comes when we find a suitable adjoint pair, with L_i left adjoint to R_i:

Then they yield a monad (R_i . L_i : MAP_i -> MAP_i) given by inferring the world, and then restricting to sensory data. Intuitively, this is the “self reflection monad”. If we generalize to let the indices vary freely we instead get (R_j . L_i : MAP_i -> MAP_j), the “theory of mind” indexed monad.
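For reference, this is the standard way an adjunction gives rise to a monad, written with the functor names used above. The unit and counit symbols η and ε and the name T_i are notation introduced here for illustration; whether L_i and R_i really form an adjunction is exactly the detail that remains to be worked out.

```latex
L_i \dashv R_i, \qquad
\eta : \mathrm{Id}_{\mathrm{MAP}_i} \Rightarrow R_i L_i, \qquad
\varepsilon : L_i R_i \Rightarrow \mathrm{Id}_{\mathrm{UNIV}}
\\[4pt]
T_i = R_i L_i : \mathrm{MAP}_i \to \mathrm{MAP}_i, \qquad
\mu = R_i \, \varepsilon \, L_i : T_i T_i \Rightarrow T_i
```

Read informally, η says any observation can be replaced by what the inferred world predicts you would observe, and μ says that reflecting twice collapses to reflecting once.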

Now consider the internal logic of this setting, sandwiched between two applications of the monad, i.e. the logic carried out by some agent. Then there are two operations, “reflection” and “reification” that let us shift down or up a level in the monad stack. (see: Andrzej Filinski’s work, ex. “Representing Monads”). Reflection lets us believe we experience what we believe we believe we experience (with some side effects in the form of disciplined effort), and reification lets us believe we experience what we actually experience (with some side effects in the form of introspective effort). Reflection is analogous to an AI self-modifying to believe/act the way it thinks it should believe/act, and reification is analogous to an AI making its beliefs explicit.

An interesting point here is that the monad is over MAP – you can’t talk about what is “true” abstractly, but only make concrete predictions of experience, implicitly utilizing UNIV as part of the context. This may be a flaw needing correction, though it is more compelling to consider it as a solution to inconsistency. By forcing each recursive step to occur between applications of the monad, monadic effects witness each step, and maintain coinductive productivity. However, the classic notion of ‘denotation’ is lost in the general case (we cannot simply avoid Gödel’s theorem). This might not be so bad though, as many propose (ex. J.-Y. Girard) that interaction is primary, while denotation is incomplete. That is, we may still be able to reason about and put bounds on the long term behavior of self-modifying A.I., even if the process never converges.

Fragmentation

There is an equally concerning dual problem to convolution, and that is fragmentation. While convolution is about multiple ideas hiding in the same terminology, fragmentation is about a single idea hiding between multiple terminologies. This causes problems when we understand different aspects of an idea under different names.

For example, I know that eggplants are delicious, and know many recipes for them, but then I’m placed in a bizarre British cooking show, and told that today’s ingredient is aubergine. What is an aubergine??? I’ve heard it’s a nightshade, that sounds poisonous…

Notice there are actually two places for fragmentation to hide: in the hierarchy Thingspace – Thoughtspace – Wordspace, it can hide at either junction. We could have multiple words referring to the same idea. This is analogous to having multiple textual aliases for some function in code. It’s annoying, but not that hard to solve: textual substitution is easy, we can just look up the definition and unify them. In practice, humans are very good at on-the-fly substitution/translation, so these sorts of fragmentation survive but don’t cause us too much trouble. The more difficult sort of fragmentation is at the Thingspace – Thoughtspace link, analogous to two different algorithms that do the same thing. Unlike in Wordspace, where most translations are just direct substitution, and equivalent ideas are strictly equivalent, computing equality of generalized ‘things’ is Hard (think function equality, but harder). Part of this is because ‘things’ want to be weak; the way in which things are “equivalent” matters.
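The two kinds of fragmentation can be seen side by side in a small sketch. The function names here are made up for the illustration: `triangular` is a second name for an existing function (the easy, Wordspace kind), while `sum_to_loop` and `sum_to_formula` are different algorithms that happen to compute the same thing (the hard kind).

```python
def sum_to_loop(n: int) -> int:
    """Add 1..n one term at a time."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


def sum_to_formula(n: int) -> int:
    """Closed form n(n+1)/2."""
    return n * (n + 1) // 2


# Wordspace-style fragmentation: a second name for the very same object.
triangular = sum_to_formula
print(triangular is sum_to_formula)   # True: resolved by looking up the name

# Thingspace-style fragmentation: different code, same behaviour.
print(sum_to_loop is sum_to_formula)  # False: no lookup will unify these
print(all(sum_to_loop(n) == sum_to_formula(n) for n in range(200)))  # True
```

The last check only samples 200 inputs, so it is evidence rather than proof. Showing the two functions agree on every input takes an actual argument (here, induction), and for arbitrary programs no general procedure exists.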

There’s a more general problem. Maybe two ideas don’t represent exactly the same thing, but they sorta seem to have related subcomponents. The “sameness” of thoughts/things is related to the degree to which their subcomponents match.

It seems to me that the optimal way to think (and the optimal way to design an AI.. hint hint), is to break all problems down into a complete set of primitive ideas, and then memoize all the ones that get used frequently. An immediate question comes up: How do we know when we’ve found a minimal concept? This business of seeing internal structure, and factoring concepts into their primary components is exactly the sort of thing category theory is good at, and so this problem has a solution. The notion of “universal property” exactly encodes what we mean by “minimal concept relative to some property”. Roughly, a concept has a “universal property” if every other concept with that property can be described in terms of it. For example, a “product thing” for things A and B has exactly the information to give you an A or a B via projecting. Any other thing that could give you an A or a B can be described as a special case. I encourage you to actually read about universal properties, because I can’t give satisfactory coverage here.
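A minimal sketch of the “product thing” example, assuming we model things as Python values and the product of A and B as a tuple. The helper name `mediate` and the toy record format are invented for this illustration. The point is that anything able to give you an A and a B factors through the pair.

```python
from typing import Callable, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")
X = TypeVar("X")


def fst(pair: Tuple[A, B]) -> A:
    """Projection onto the first component."""
    return pair[0]


def snd(pair: Tuple[A, B]) -> B:
    """Projection onto the second component."""
    return pair[1]


def mediate(f: Callable[[X], A], g: Callable[[X], B]) -> Callable[[X], Tuple[A, B]]:
    """Build the map X -> (A, B) whose projections are f and g."""
    return lambda x: (f(x), g(x))


# A toy "other thing" that can give you a name and an age.
def name_of(record: str) -> str:
    return record.split(":")[0]


def age_of(record: str) -> int:
    return int(record.split(":")[1])


as_pair = mediate(name_of, age_of)
print(as_pair("alice:30"))  # ('alice', 30)
```

Composing `as_pair` with `fst` gives back `name_of`, and with `snd` gives back `age_of`, so the record is described entirely in terms of the pair and its projections.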

The whole point of this is that Fragmentation is a big problem whenever you’re trying to coordinate research between diverse fields of study, because it leads to the encryption problem. Fragmented ideas “want to” be united, in that they have higher entropy than their unification. When two ideas merge into a lower entropy state, the information difference (“counterfactual pressure”) is released as a sort of free energy and can be captured via “counterfactual arbitrage”.

Decisions, Pointwise

Lemma: Decisions can be modeled by some algorithm
Proof: Consider writing down the list of all your actions. There is some algorithm which generates this string.

This is one of the main premises of timeless decision theory. The idea being that to “choose” makes no sense, you always make the decision that subjectively maximizes your utility function, you just aren’t sure what that decision is until you make it. The feeling of “being in control” comes from the process of generating counterfactual scenarios and evaluating their utility until we find a maximum. Because of the way our brain processes memory and imagination, these imagined scenarios feel like they “could have been”, if only we had made a different decision.

One of the practical implications of this approach is some advice on a LW willpower thread – that, rather than decide our action in a particular scenario, we should instead choose as if we’re choosing the output of our decision algorithm. This supposedly makes it easier to maintain (for example) a diet. But why should that be the case?

Let’s take for granted that our decisions are determined by some biological algorithm, but which algorithm? When we make decisions, it feels like each one is a fresh scenario – we could choose to do anything we wanted, it just so happens that we choose predictably. This corresponds to a pointwise encoding of our decision algorithm – the algorithm which simply stores the literal sequence and prints it.

Take for granted that we have some goal (say, to not eat ice cream). This goal imposes a pattern on our sequence of actions, and whenever there’s a pattern to a sequence, we can compress it into a lower entropy representation.
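To make the compression claim concrete, here is a rough sketch that uses a general-purpose compressor as a stand-in for “lower entropy representation”. Everything in it is an assumption made for illustration: one byte per day (`N` for declining the ice cream, `Y` for eating it), a 1000-day horizon, and a seeded pseudo-random sequence playing the role of decisions with no pattern.

```python
import random
import zlib


def compressed_size(decisions: bytes) -> int:
    """Size in bytes of the zlib-compressed decision log."""
    return len(zlib.compress(decisions))


days = 1000
rng = random.Random(0)  # fixed seed so the run is repeatable

# A goal imposes a pattern: say no every single day.
rule_following = b"N" * days

# No goal, no pattern: each day is an independent coin flip.
no_pattern = bytes(rng.choice(b"YN") for _ in range(days))

print(compressed_size(rule_following) < compressed_size(no_pattern))  # True
```

The rule-following log shrinks to little more than “N, repeated”, while the patternless log has to be stored close to literally, which is the pointwise encoding from the previous section.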

Let’s assume for a moment that willpower is a semi-finite resource, and that decision fatigue and ego depletion are real effects. More generally, we can just assume that there is some information (semi)conservation principle in the universe, which seems plausible but is not well understood. In this setting, an agent would want to make high impact but low-complexity decisions – it must make tradeoffs between being correct and conserving energy, so it makes sense to choose a simple decision rule whenever possible. However, the real world is not so simple.

Consider an iterated game between two bounded agents. If the amount of processing power they have is greatly unequal, the stronger agent will nearly always win, because a more complex strategy requires a more complex response. Shifting this perspective, we can consider any environment as an agent. Clearly, the rest of the world has more entropy than you, so in general, any simple strategy you come up with will be incomplete. The best you can hope to do is make decisions pointwise, considering all prior information every time you make one. One way to think of it, which may or may not really be how the brain works (but which is still useful because it is a bound on all computational systems), is that every time you commit to a simple rule, you spin up a subprocess dedicated to that task. In practice, you can’t just “decide” to commit to a rule (as many LW zealots would suggest); it’s more like forming a habit. So committing to a decision rule (spinning up a subprocess) costs energy, but takes much less energy each time it is invoked, because it efficiently compartmentalizes information as an in/out process. A cute example I use is to always pick what the other person orders at a restaurant (or their second choice, if they have one, to improve variety). It took a bit of time to think this up and commit to it (not much!) but it saves a lot of thinking in the future, with pretty good results. The general principle is that the best you can do in an infinite game is to pick simple rules that capture “most” of the value. However, it also costs energy to modify a strategy: killing an old habit is often harder than starting one – it gains momentum.

Shift perspective again to consider not individuals but systems: corporations, religious or political groups, etc. These too can be considered as agents (in a much more salient way than “the environment” in general), with their own goals and strategies. Such systems have a benefit we don’t: they can add computing power (members) relatively easily. This sort of cosmological expansion greatly magnifies the momentum of any existing strategies, as each new member is likely to inherit them. This pattern is commonly seen in the lifecycles of corporations: a lightweight company captures some market inefficiency with a new approach; it balloons up in success and becomes too rigid to adapt, either coasting its way into irrelevance or getting killed by a more agile competitor. If only it could stay at that sweet spot: powerful enough to afford risks, without being stagnated in bureaucracy.

The only groups that seem to resist it are those with “visionary” leaders that can synchronize the organization while still making quick decisions. But these groups are fragile, dying along with their leader. As soon as distributed decision-making is allowed, the system gains momentum, allowing persistence but preventing change.

When enough momentum is gained, weird things can happen, like practices that everyone (well, a sizable majority) hates but that never seem to change. What’s going on here? From the inside, as a single member of an organization, it can be maddening. You can see at a smaller scale than the kami you are a part of: what is “locally obvious” to you may be too subtle, too expensive, for the organization to adopt. The microscopic reasoning is that simple, ambiguous ideas can be fit into more people’s worldviews. Conflation (logically the “with” connective) plays a large role here, allowing multiple ideas to be bundled together, choosing the best one for each convert. Conflation is dangerous though: it is not denotational, but operational, meaning that it does not preserve teleology, so its end result is unpredictable, and it will almost always take on a mind of its own.

So you damn better seed your organization right the first time.

Natural Language is Conflated

For a long time I’ve been unsatisfied with natural language. At first, in my naïvety, I thought it was just English that was unsuitable, but it seems there’s a deeper problem that even engineered languages like Esperanto don’t help with. People talk past each other, perpetually misunderstanding, driven apart by nitpicking correct phrasing.

When people asked me if I thought in words or pictures I blinked at them, “I think in thoughts, doesn’t everyone?”. I’m still not sure, but it seems that many do not notice the inadequacy of words. Maybe I’m exceptionally bad at phrasing, or maybe most people subconsciously restrict their thoughts to mirror their words so that translating isn’t a great big mess.

It remains to clarify: what makes up a thought, how does it differ from speech? First, what is similar? They’re both ways to pick out points in thingspace (which you should understand as an injective map from thoughtspace/wordspace resp. to thingspace). More technically, there are actually maps wordspace -> thoughtspace -> thingspace, and the map wordspace -> thingspace is their composition, so this suggests that thoughts are at least as powerful as words. Thoughts are anonymous, but words are named. Effectively this means that you can pull a word out of the aether by its name, but you can only pull up a thought by association; imagine words as organized like a dictionary, and thoughts organized by a graph, with edges as metaphors/links in associative memory etc. (this gives us some indication towards their structure as categories). The most apparent difference is that thoughts can refer directly to internal perception, whereas words are completely incapable of serializing sensory perception unless the receiver shares certain experience with you. Words can refer to the thought of sensation (which you assume they share), but not the sensation itself (which they almost certainly do share, being human).

While Wordspace and Thingspace are straightforward at least in concept, Thoughtspace deserves further explanation, as the mediation between the two. The objects of interest in Thoughtspace can be thought of as spatiotemporal firing patterns in the brain. Choosing the correct formalization for these firing patterns is a hard scientific problem (where should we delimit the boundary of one pattern to the next?) so we’ll sidestep the problem for now by invoking the Mind/Space hypothesis, instead considering the subjective experience of firing patterns as ‘formations’ relative to a fixed observer. Oddly enough, the boundary problem disappears in this perspective: each thought obviously feels distinct, but they are connected by a similarity metric. The hypothesis is that these formations have some scientific characterization in the brain, which seems like a reasonable assumption. I suspect that thoughts only have a clean representation as formations relative to their originator, and that comparing the “objective” representation of thoughts as firing patterns is intractable in general, because of the encryption problem. So “formations” really are the more natural setting.

Thoughtspace is the world we really “live” in – it’s the only world we can actually experience, but through it we can know both Wordspace and Thingspace. The structure of Wordspace is logical. The objects of interest are strings of symbols (interpreted as logical propositions), and their connectives are algebraic manipulation. We can think of it as the union of all symbolic logics. This includes not just traditional logic, but also all constructive objects, arrangements of particles, etc. Since thoughts are “just” arrangements of discrete particles in the brain (but maybe not, if continuous quantum interactions turn out to be important for cognition), we could represent thoughts directly in Wordspace. However it makes more sense to think of it as a sort of completion of Wordspace, where the thoughts are distinct objects, that can be represented as potentially infinite arrangements of particles. The dynamics of thoughts-in-the-world, then, are inherited via projection (Thoughtspace -> Wordspace) into a finite brain. Going in the other direction, Wordspace can be embedded in Thoughtspace as those thoughts that can be written down (or otherwise serialized). This embedding is not natural though: the association between words and thoughts is learned (in a non-unique way) through experience – it includes many extensional elements such as feelings, objects (tigers, etc.), and temporal patterns. (A lot of communication errors can be cleared up by remembering that words don’t have independent meaning, trying to determine what thought the other party has in mind, rather than assuming they use the same mapping).

Thoughtspace inherits the logic of Wordspace but it also has its own fuzzy logic. Internally, these fuzzy arrows are the “feels like” connectives. For example: a bench is like a rocking chair; a bench is more like a rocking chair than it is like a snake. We can justify the statement that a rocking chair is like a bench, but it is not a logical statement. Rather, the “sameness” is a summary of how many contexts they are equivalent in. Ex: For “most purposes” a rocking chair is indistinguishable from a bench. This acts like a sort of probability space, with 1 being equivalent in all contexts and 0 being never equivalent, so we can start with the category Stoch as a rough approximation. We can also talk about the properties of contexts where they are similar, so the arrows have computational content in addition to weight. For example, rather than considering all contexts, a rocking chair is even more similar to a bench when restricted to contexts of sitting – they’re still a bit different in that sitting in a rocking chair feels a bit different – on the other hand, benches are often made of stone while rocking chairs almost never are, detracting from general similarity but not affecting sitting much at all. It’s all a bit handwavy but I hope the intention is clear – enumerating contexts and quantifying “sameness” is the key.
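Here is one hedged way to turn “how many contexts they are equivalent in” into a number. The contexts, the yes/no answers in `PROFILE`, and the function name `similarity` are all invented for this sketch; a real enumeration of contexts would be far larger and fuzzier.

```python
from typing import Dict, Iterable, Optional

# Hypothetical toy data: how each thing behaves in a handful of contexts.
PROFILE: Dict[str, Dict[str, bool]] = {
    "bench": {
        "can sit on it": True,
        "supports your weight": True,
        "sitting feels steady": True,
        "often made of stone": True,
        "is alive": False,
    },
    "rocking chair": {
        "can sit on it": True,
        "supports your weight": True,
        "sitting feels steady": False,
        "often made of stone": False,
        "is alive": False,
    },
    "snake": {
        "can sit on it": False,
        "supports your weight": False,
        "sitting feels steady": False,
        "often made of stone": False,
        "is alive": True,
    },
}

SITTING = ["can sit on it", "supports your weight", "sitting feels steady"]


def similarity(a: str, b: str, contexts: Optional[Iterable[str]] = None) -> float:
    """Fraction of the chosen contexts in which a and b behave the same."""
    chosen = list(contexts) if contexts is not None else list(PROFILE[a])
    agree = sum(PROFILE[a][c] == PROFILE[b][c] for c in chosen)
    return agree / len(chosen)


print(similarity("bench", "rocking chair"))                     # 3 of 5 contexts agree
print(similarity("bench", "snake"))                             # 0 of 5 contexts agree
print(round(similarity("bench", "rocking chair", SITTING), 2))  # 2 of 3 sitting contexts agree
```

With this toy data a bench is more like a rocking chair than like a snake, and restricting to sitting contexts raises the similarity without making it perfect, matching the example in the text.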

Careful though! This similarity space is in general NOT a consistent logic in the way you’d expect. My favorite illustration of this principle is the “numerologist’s folly”, exemplified by crackpot sites like this. What’s going on here? At the risk of sounding silly, I’ll point out the general flaw: While each step has high similarity, the contexts they consider are different, so care has to be taken when composing them. “New York City”, “Afghanistan”, and “The Pentagon” are all similar in their number of letters, and it’s true that New York was the 11th state of the union, but these two contexts are different! If we want to compose them as connectives, we have to conflate the contexts, so “New York City”, “Afghanistan”, “The Pentagon”, and “The state of New York” are all similar in the context “number of letters OR state order of joining the USA”, which doesn’t seem like a very useful connection now, does it? It’s a silly and obvious example, but this kind of conflated reasoning happens quite frequently in more subtle cases. Understanding the true structure of Thoughtspace will let you wield its power while avoiding such pitfalls.
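One way to keep yourself honest is to make every similarity link carry the context it holds in, so that composing links visibly conflates contexts. The `Link` type, the `compose` function, and the choice to model the conflated context as a set union (read as OR) are assumptions of this sketch.

```python
from typing import FrozenSet, NamedTuple


class Link(NamedTuple):
    """A 'feels like' arrow, tagged with the contexts in which it holds."""
    source: str
    target: str
    contexts: FrozenSet[str]


def compose(first: Link, second: Link) -> Link:
    """Chain two links; the result only holds in the conflated (OR-ed) context."""
    if first.target != second.source:
        raise ValueError("links do not meet end to end")
    return Link(first.source, second.target, first.contexts | second.contexts)


def letters(name: str) -> int:
    """Count the letters in a name, ignoring spaces."""
    return sum(ch.isalpha() for ch in name)


print([letters(n) for n in ("New York City", "Afghanistan", "The Pentagon")])  # [11, 11, 11]

by_letters = Link("Afghanistan", "New York City", frozenset({"number of letters"}))
by_statehood = Link(
    "New York City",
    "The state of New York",
    frozenset({"state order of joining the USA"}),
)

chain = compose(by_letters, by_statehood)
print(" OR ".join(sorted(chain.contexts)))
# number of letters OR state order of joining the USA
```

Each individual link is fine on its own; the composite is technically valid too, but its context has grown into a disjunction that no longer says anything interesting.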

The relation between wordspace and thoughtspace can be made succinct by considering the quote:

When you draw a boundary around a group of extensional points empirically clustered in thingspace, you may find at least one exception to every simple intensional rule you can invent.

Then the points in the embedding of wordspace into thoughtspace/thingspace are exactly the ones that can be described by intensional rules.

The major task now is to translate these subjective dynamics into something that can be quantified mathematically, measured externally, and communicated clearly between people.

I love it when crazy people say things I agree with

Quote

“Physicists suffer from a disorder of the mind that causes them to believe that sensible, temporal objects have more reality than eternal, immutable Platonic mathematical objects, and to place more trust in their senses than in their reason, more trust in the scientific method of ‘evidence’ than the mathematical method of eternal proof. ”
– Mike Hockney, Why Math Must Replace Science (The God Series Book 18)

Social Membranes, Genre Encryption, and Super-Secret Tech

There’s a common problem of good ideas being fragmented across genre. Recently, I’ve begun to consider it THE (non-obvious) problem in knowledge advancement.

Let’s take a computer science approach to make it more clear why it’s THE problem. The search for the truth is indeed a kind of search, so it makes sense that you’d want to use a search tree. The nice thing about search trees is that they can be traversed in parallel. If we view humanity as the program, humans are the threads. How do we keep threads separate?

In a von Neumann computer, keeping threads separate is trivial, but for humans, you’d have to prevent all communication. Now, it gets a bit complicated, because each human (thread) is also concurrent – a human can work on many different things at once. So we can’t simply cut off all communication, because you might need to communicate with different groups for different tasks, and we have to be selective about which content can be communicated with whom. How does that work?

I’ll propose that one mechanism is via hijacking genre. Genres are a convenient heuristic for grouping information (sciencey sounding things tend to contain information about the natural world, religiousy sounding stuff tends to contain metaphysical poison etc.), but the heuristic is nowhere near perfect. Viewed in this way, genre can be used as a sort of encryption: an idea is encrypted by phrasing it in genre-specific lingo. The only way to decrypt it is to both understand the lingo, and buy into the genre. This last bit is important: it’s not enough to just understand what they’re saying, because even the absolute truth spoken in a monologue about Lord Xenu is likely to be dismissed anyway. I should point out that the encryption is not explicit; there is no original plaintext understandable by everyone. It is more so that the ideas exist in a different “basis”, and it works because computing idea equality is hard (see Fragmentation), so most people filter it out via the genre heuristic.

The “team identity” effect causes these fuzzy differences in genre to self organize into sharper “social membranes”, which roughly approximate different search threads.

I approach all of this as a scientist, and so like to think of the science genre as the “main branch” because it contains sufficient epistemology to ~learn everything~. The end result from this perspective is that some useful gems for science get hidden in other genres. This post was motivated by two particular examples, which I’ll get into next time. But there are so many that I’m beginning to collect some of the more exotic ones.

Consciousness and Intelligence are Convoluted

There’s been great discussion on LW as to the value of consciousness as a concept. The general conclusion many have come away with is that we should probably just taboo ‘consciousness’ and get to the meat. I tend to agree.

I’d like to present a slightly different reasoning though. The feeling of consciousness in ourselves and others is a hardcoded trait. This immediately should lead us to be very suspicious of it as a consistent concept. It’s clearly useful for development, if only to be used as a proxy for “human-ness”, but I’ll argue that it is just that, a heuristic. There are many interesting phenomena hiding in ‘consciousness’, but they should be considered distinct phenomena. They are bound together by their shared name, and we often switch between them in conversation without noticing, assuming they’re the same concept. That’s right, ‘Consciousness’ is convoluted.

Here are a few things people tend to mean by consciousness, each interesting on its own. It is not immediately obvious that they represent the same phenomenon, though I suspect that they are deeply related. It’s important to lay them out as separate ideas so that their connection can be made explicit, rather than declared equivalent “by definition”.

1. The feeling of free will – The generation of counterfactual scenarios and an evaluation of those scenarios
2. The feeling of self awareness – To contain a model of oneself and one’s mental processes
3. Perception of qualia – The recognition of sense as “internal” experiences, such as awareness of the color red.

Similarly, the term “Intelligence” is convoluted and we should taboo it. Some possible meanings are

1. Consciousness – yes, sometimes consciousness and intelligence are used synonymously
2. Containing sufficient processing power, being sufficiently complex so as to be unpredictable – in common usage, we sometimes say someone is “intelligent” if they can learn and think quickly. Many times things feel intelligent if they are complex.
3. Acting sufficiently agent-like – Intelligent things feel as if they act according to goals and rational decisions based on those goals.
   1. Often a good heuristic for “agent-like” is “self-like”: if you consider the class of all things you encounter, you’re probably one of the more agent-like things you deal with. So in many cases, this is the feeling we’re actually referring to when we say “intelligent”. This one is just flat Wrong, beware of it.

Consider these meanings, which you tend to use most often, and how they might be related. Be mindful of how you use them during conversation and when you feel the urge to switch meanings; it should greatly improve the clarity of your arguments.

NULL

Sorry, there’s nothing here!

This is a stub to keep track of all the articles that are intended to be written as background or motivation. If there’s a link here, it’s because I already have an idea to fill in, but just need to write it.

I won’t ask you to “just trust me”, but at least trust that if there’s a link here, the point wasn’t simply overlooked. Use this as an invitation to think about why the point in question might or might not make sense.

Convoluted reasoning

There’s a general pattern of convoluted reasoning, which I think captures a wide range of common logical flaws. It is frighteningly common even among brilliant thinkers: with informal reasoning it’s all too easy to follow a chain of intuition, where each step seems reasonable, only to end up at a totally incorrect conclusion. It shows up maliciously, as a way to convince people of nearly anything; defensively, giving rise to the “motte-and-bailey doctrine”/“strategic equivocation”; internally, as rationalisation; and in the most sinister case, as a route for smart people to do something truly stupid. It is also the root of the map/territory conflation.

The mark of convoluted reasoning is implicit conversion between distinct objects which are related but follow very different logics. Let’s be more rigorous:

Consider a collection of categories and a chosen isomorphism between each. Take, for example, the Mind / Space categories, or many copies of the category of (your favorite) verbal logic, connected by the isomorphisms swapping between alternative definitions of words, or the categories induced by different metrics on the same type of object. The connections in an individual category compose, just as logical deductions should, but the isomorphisms on the underlying sets don’t induce functors – they don’t preserve the arrows! Of course, to our naive human brain, they’re all just the same sort of connection – I’d guess that most people don’t explicitly represent the categorical structure, preferring instead to represent everything more like simple set functions. In fact, I’d wager that at the lowest level, our brain saves space by collapsing the multiple categories into one space, so that (set) isomorphic objects (like multiple definitions of “privilege”) are literally identical, until there is a need to disambiguate them. Similarly, in the Mind / Space problem, we don’t often think of mental things being different from physical things, but rather that everything is either mental or physical (depending on which side of the wall you stand on).

So you’re probably already familiar with one example of how things can go wrong with convoluted reasoning, and that’s equivocation. We dance around in one category (one collection of definitions for a word), drawing all sorts of conclusions, like that “privileged” people should all shut their trap and maybe die in an excruciating fire for good measure, and then secretly jump into the other category (change definitions), so that we can include everyone we don’t like under the umbrella of “privilege”. Now substitute any demographic term you can think of for “privilege”.
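To make “the isomorphism doesn’t preserve the arrows” concrete, here is a minimal sketch. It assumes thin categories (at most one arrow between two objects), so each category can be written as a set of pairs. The word “bank”, the object labels and the helper `preserves_arrows` are all made up for illustration; they stand in for any word with two definitions.

```python
def preserves_arrows(arrows_src, arrows_dst, mapping):
    """True if every arrow x -> y in the source lands on an arrow in the destination."""
    return all((mapping[x], mapping[y]) in arrows_dst for x, y in arrows_src)

# Two thin "categories", one per definition of the word "bank".
# An arrow (x, y) reads as "from x you may deduce y".
financial = {("bank_financial", "holds_money")}
riverside = {("bank_riverside", "borders_water")}

# A bijection on objects that only swaps the definition of "bank".
swap_sense = {
    "bank_financial": "bank_riverside",
    "holds_money": "holds_money",
    "borders_water": "borders_water",
}

# The identity map on a single category trivially preserves its arrows.
identity = {"bank_financial": "bank_financial", "holds_money": "holds_money"}

same_context_ok = preserves_arrows(financial, financial, identity)
swapped_context_ok = preserves_arrows(financial, riverside, swap_sense)

print(same_context_ok)     # deductions survive inside one category
print(swapped_context_ok)  # the swapped definition does not carry the deduction along
```

The map is a perfectly good bijection on objects, yet the deduction “a bank holds money” has no counterpart after the swap. Equivocation amounts to acting as if it did.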

Don’t be fooled though, convolution isn’t just about switching definitions, it’s more about switching contexts. To see this, recognize that the same problem can arise where your isomorphism is literal identity, like so:

One very frequent case where we have many different categories on the same object is the collection of categories induced by different metrics. In this case, the arrows represent “similarity” or “closeness”. You can often think of each possible metric as the “natural metric” on its own space, but they all happened to get mixed together in the space you’re working with. For example, the Euclidean metric is the “natural metric” for $\mathbb{R}^n$, while the taxicab metric is the extension of the “natural metric” on a rational grid, embedded in $\mathbb{R}^n$. It is not always so obvious what the “natural metric” you’re looking for is (hopefully I will address this in a future post).
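As a small illustration of two metrics inducing different notions of “closeness” on the same points, here is a sketch in plain Python. The points are arbitrary values chosen for the example, not taken from any dataset.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def taxicab(p, q):
    """Sum of absolute coordinate differences (grid-walking distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))

origin = (0.0, 0.0)
diagonal = (3.0, 3.0)  # Euclidean distance sqrt(18), taxicab distance 6
on_axis = (5.0, 0.0)   # Euclidean distance 5, taxicab distance 5

candidates = [diagonal, on_axis]
nearest_euclidean = min(candidates, key=lambda p: euclidean(origin, p))
nearest_taxicab = min(candidates, key=lambda p: taxicab(origin, p))

print(nearest_euclidean)  # the diagonal point is closer in the Euclidean sense
print(nearest_taxicab)    # the on-axis point is closer in the taxicab sense
```

Same points, same identity map between them, and yet “which neighbour is nearest” changes with the metric. The arrows differ even though the objects do not.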

A natural scenario for this to arise in is machine learning / pattern recognition. I’ll talk about a specific case that has interesting implications:

Consider the biological tree of life. Clearly, some species are “more related” than others. What’s not so clear is what we actually mean by “related”. Do we mean “has the most similar function”? This is clearly wrong, and misled biology for a long period before genomics was developed. Then do we mean “has the fewest number of ancestors separating them” (so brothers would be closer than cousins, etc.)? This seems reasonable until you realize that not all mutations cause equal deviation. For example, some organisms, such as ctenophores, have remained relatively basal throughout evolution: they haven’t deviated too far in “phenotype space” from their ancestors. So what’s the “right” metric? I won’t try to solve this here because I think it’s a hard problem (and it may be that there is no unqualified “right answer”, and the feeling that there should be one comes from built-in pattern-seeking heuristics). Rather, I’ll point out that computational biology has developed (read: appropriated from mathematicians some 30 years after the fact) a large collection of different metrics for this purpose, many of which resolve to metrics on genomic or protein sequences, such as edit distance and its more biologically relevant varieties. Because of the complexity of the data, we often need to use slightly different metrics for different types of data, for example, adding different sorts of normalizations to make everything match up. We often want to chain these comparisons up, so that if gene expression profile A is similar to gene expression profile B, and gene expression profile B is similar to drug response profile C, then expression profile A should be similar to drug response profile C, and we’ve found drugs to cure cancer, yay.
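For readers who haven’t met edit distance, here is a minimal sketch of the plain (Levenshtein) version, where every insertion, deletion and substitution costs 1. The biologically tuned varieties weight these operations differently; this sketch does not attempt that, and the example strings are arbitrary.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,                      # delete char_a
                curr[j - 1] + 1,                  # insert char_b
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))
print(edit_distance("ACGT", "AGT"))
```

With unit costs this is a genuine metric on strings. The trouble described here starts once different normalizations and weightings are mixed across data types.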

This tends to turn out badly for a few reasons. Barring the fact that some of our similarity measures don’t even obey the triangle inequality, the main reason this doesn’t work out is that they’re mutually incompatible measures. Sure, they kinda-sorta compose, we benchmark them zealously until something sticks, and you can still discover useful things with them (or at least be convincing enough to get published in a nice journal, and really friends, isn’t that the definition of useful discoveries?), but they lack the theoretical niceness to even put error bounds on how badly they can fail to compose. This is fundamentally because they are arrows in different categories.
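Two of the failure modes above can be seen on a toy example. The sketch below uses squared Euclidean distance on the number line as a stand-in for “a measure that violates the triangle inequality”, and a thresholded similarity as a stand-in for “A is similar to B”. Both choices, and the threshold value, are assumptions made for illustration rather than measures taken from any particular pipeline.

```python
def squared_distance(x: float, y: float) -> float:
    """Squared Euclidean distance on the line; popular, but not a metric."""
    return (x - y) ** 2

def similar(x: float, y: float, threshold: float = 1.0) -> bool:
    """Declare two values 'similar' when they differ by at most threshold."""
    return abs(x - y) <= threshold

a, b, c = 0.0, 1.0, 2.0

# Triangle inequality would require d(a, c) <= d(a, b) + d(b, c).
direct = squared_distance(a, c)
via_b = squared_distance(a, b) + squared_distance(b, c)
violates_triangle = direct > via_b

# Chaining: a ~ b and b ~ c, yet a ~ c does not follow.
chain_holds = similar(a, b) and similar(b, c)
conclusion_holds = similar(a, c)

print(direct, via_b, violates_triangle)
print(chain_holds, conclusion_holds)
```

Here at least the two failures live inside a single measure, where a bound can still be written down. Chaining across different measures gives up even that.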