On GroupThink

A recent SSC post raised the question: how accurate is this interpretation?

Students who characterized their relationships with other students as “competitive, uninvolved…alienated” were more likely to show gains in critical thinking than were students who portrayed their peer relations as “friendly, supportive, or a sense of belonging.” The data in this study do not permit confident explanation of this relation, but one might speculate that a sense of participation in a friendly, supportive peer environment may require a partial suspension of one’s critical thinking skills.

I see two compelling extremal options:

  1. The analysis in the quote is correct.
  2. Getting along in a diverse world means you have to compartmentalize your object-level beliefs in favor of niceness. This may look like “suspending critical judgment” on paper.

There are less charitable interpretations, like “dumb people are friendlier because they need the support” (kind of the reverse of the etiology proposed in this paper), but I’ll ignore them, because considering perturbative environmental effects is more informative for personal decision-making than considering selection effects.

While I agree that socialization doesn’t require suspension, as an implementation detail I think this is exactly what humans do for efficiency. Consider that every time you think “no. wrong.” in conversation, it generates irritation and consumes mental resources deciding whether to let the transgression slide for the sake of civility. At least, this is how it feels to me, and I can’t think of any way to shortcut this computation without downtuning the module that elevates incongruences to conscious attention (a necessary component of critical thinking).

It’s possible to compartmentalize this effect, so that you can swap out critical thinking modules depending on context, but humans are very sensitive to conditioning[1], so there will always be some bleed-through without (expensive) conscious intervention.

Generally, questions of the form “Is this annoying social thing worth ignoring for the benefit of collaboration?” are hard to evaluate. So the default is to increase or decrease your conditioned sensitivity with each experience, rather than evaluate on a case-by-case basis.

Even more generally, when communicating across differing ontologies, there’s an enormous cost to:

  1. internalizing their ontology
  2. translating it to your own
  3. reflecting to system 1
  4. fast thinking
  5. reifying back to system 2
  6. re-translating back to their ontology

Compare this to

  1. fast thinking

which is possible only if you share ontologies.

I like to think of this as being “homoiconic” with respect to the other person: you can interpret their words directly as thoughts and vice versa.

So there is a huge incentive to be homoiconic with your society, to the extent that we make sacrifices elsewhere. Critical thinking is a process of continually refining your ontology, but this threatens to desync ours from the collective, so the incremental gains are outweighed by the large constant factor of being synchronized.

The “Level Up” behaviour seems to be fragmenting off a “core” ontology through aggressive personal unapologeticness, while selectively switching into special-purpose social modes. Mastery is gained through the two Siddhis of Reflecting and Reifying Ontologies (internally, Amalgams), progressively improving the compartmentalization of thinking modes described at the start.


  1. If you find the evolutionary psychology argument unsavory, consider that you can’t possibly store a separate thinking mode for every person; at some point you have to collapse the probability mass for nearby categories.

Technical Language for Pragmatic Neuromancy

It’s very difficult to talk about the internal subjective dynamics of the mind, precisely because such dynamics are non-denotational. As such, it’s important to use precise language, with clearly distinguished technical terms. Mathematicians tend to overload common words like “category” and “group” as technical terms. This works for them, since it’s usually clear from context and tradition when they’re being technical. However, when discussing concepts as nebulous as the dynamics of mind, it’s very easy to get them mixed up. For example, when I say “idea”, I could mean a mental voice proposing a course of action, a mental image of the solution, or the platonic thing independent of any particular thinker (in the sense of “the idea of communism”), etc.

As such, it’s important to have a technical language for such technical topics. I mean technical in the sense of “precise in its domain”, but I do not imply any of the scientific strength normally associated with technical language. An unfortunate quirk of human reasoning causes named things to be more “real” than unnamed things – naming a concept gives it Authority. One way to avoid this is to give fantastic names to poorly understood phenomena.

Ancient Buddhism has put a lot of time into “research” on the subjective dynamics of the mind, so I will co-opt some of its technical terms. In fact, there is an entire language, Pali, already devoted to precisely the purpose of immortalizing the technical mental concepts of Buddhism. I won’t be using exactly the original meanings, so it should be made clear that whenever I use a Pali word, my definition will shadow the original meaning (and will be appropriately hyperlinked to avoid confusion).

The most central idea of Neuromancy is that mental objects have real subjective existence, and can be manipulated at “their own level”, in a way totally unlike our normal interaction with the world. Siddhi (सिद्धि) is often associated with fantastic supernatural powers, but its direct translation is “accomplishment, attainment, complete knowledge”. I will co-opt it to refer to a particular experiential understanding of mental objects. There is a hierarchy of progressively deeper “truths” to be attained. These aren’t truths like true statements, but truths like seeing in higher fidelity – seeing through an optical illusion. Mastery in neuromancy is about refining perception; the skills of manipulation follow naturally, “for free”. A deeply gratifying feature of this line of practice is that, unlike normal factual knowledge, truly incorporating it changes who you are at a deep level. It is impossible to fully understand a Siddhi intellectually, and it is impossible to understand one without changing the way you interact with the world. Where the light comes, the darkness must be cast out. The progress of enlightenment is the attainment of increasingly central Siddhi, but it is not the straight path many would have you believe. As in the fields of science, there is a menagerie of interrelated experiential skills to achieve – some more central and profound than others.

It seems too good to be true, that mastery over the mind could come as a side effect of simple observation. How could this be?

Suspend disbelief for a moment and imagine yourself as a Cartesian duality: Γ ⊢ Δ. The watcher Γ watching the world Δ. Everything in Δ is what you see, and Γ is who you are. You can’t see inside your mind, but its thoughts and processes can be reflected into the visible world via introspection. Similarly, visible patterns and statements can be absorbed into the context. You can no longer see what’s in the context; it is just how you are.

We normally think of knowledge as the facts we can recite, or the skills we can describe. This sort of knowledge lives on the right – we can see it. The other sort, innate knowledge like riding a bike or painting a beautiful picture, lives on the left. Similarly, what you believe lives on the left, and what you believe you believe lives on the right. There is a clear duality between left and right, between perception and object. To change your view of the world, you can either change the world or change your perception of it. To hold a naturally changing world fixed, you must change perception exactly opposite to it, contravariantly. This dance is in complete symmetry; even to be perfectly wrong requires perfect knowledge of reality.

There is good evidence that our “volition” is illusory: actions can be predicted before we are conscious of the decision. The body runs on autopilot, with the consciousness trailing behind. So what, then, is the point of consciousness at all? There are two purposes I see. One: our sense of volition gives us the ability to audit our actions, training the body-suit for the next encounter. Second, and most salient to neuromancy: by controlling attention, consciousness can filter the perception that’s sent to autopilot. In a very real sense, the only control we have is over what we observe, so we had best train attention to the limit.

It is in this sense that mastery over the mind follows directly from introspection. All you have to do is see a mental phenomenon, and this is sufficient to unravel the illusion, collapsing into a more stable mental state with more degrees of freedom. But take note: by “see” I don’t mean to describe or label. Simply noticing “I’m having maladaptive thoughts” is not sufficient to dispel them – that knowledge lives on the right. They must be truly seen at the root level – the arising of each thought strand, the wave of discursive thoughts they trigger.

Siddhi can be big and small, from the mundane of “how to see negative space”, through the technical of “grokking recursion”, all the way to the borderline supernatural of metabolic control. Generally, they provide a way of measuring Levels of Excellence, and share the property of being hard to evaluate in others without feeling it yourself. I’d like to devote a significant portion of future blogtime to pointing out particular Siddhi of all shapes and proposing exercises to solidify them.

A Narrative Against Narratives

EDIT: Teleology is not the right word here, and sometimes I incorrectly use it interchangeably with Etiology. I actually mean the common thing that acts like both: that primitive mental object that captures the feeling of “becauseness”, so bear with the error until I can unpack and update.


It’s often suggested that the best way to present information is to form a narrative around it. It dresses up boring facts into an engaging package, directs the audience’s attention to the key conclusions, and makes an argument more persuasive – what’s not to love? Well… the last bit. Narratives are a Dark Art: they are more convincing than their argument supports. This poses a serious problem, not only of the unscrupulous brainwashing the masses, but also for over-convincing yourself.

We use narratives to collapse the multitudes of reality into a single legible teleology, and teleologies are certainly very useful. But teleology is not narrative. Narratives are a particular encoding of teleology into language. Language is useful because it encodes exactly those thoughts that can be serialized and transmitted to other people. In a Lispy sort of way, it’s convenient to use the same representation for speech as for thoughts, i.e., our internal monologue is homoiconic. This way, we can compose an argument in our heads and transmit it directly, without translation/parsing for either the speaker or recipient. That is, it allows direct intensional communication.
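As a toy illustration of what homoiconicity buys (a sketch in Lisp style, not a claim about the brain's actual encoding): when an expression is just a nested list, the transmitted form and the thought form are the same object, so no parsing step is needed on either end.

```python
# Lisp-style homoiconicity: expressions are plain nested lists, so
# "speech" (the data) and "thought" (the program) share one
# representation and can be exchanged without translation.

def evaluate(expr):
    """Evaluate a nested-list expression like ["+", 1, ["*", 2, 3]]."""
    if not isinstance(expr, list):
        return expr  # atoms evaluate to themselves
    op, *args = expr
    vals = [evaluate(a) for a in args]
    if op == "+":
        return sum(vals)
    if op == "*":
        out = 1
        for v in vals:
            out *= v
        return out
    raise ValueError(f"unknown operator: {op}")

# The same structure can be transmitted verbatim and run directly:
message = ["+", 1, ["*", 2, 3]]
print(evaluate(message))  # 7
```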

The trouble with narratives, though, is that collapsing into a narrative creates “sunk cost” inertia, making it harder to change your mind later. Even worse, all new incoming information is subconsciously filtered and absorbed into the inertia of the interpretation. It takes effort to even see the unnarrated world. If the narrative is correct, this is no trouble. In fact, it is beneficial to compress the world into symbolic representations strung together by narrative, providing an economy of thought… once you are certain you’re doing it right. But the issue with using narratives for exploratory reasoning should now be apparent. They’re too convincing for their own good.

There are many possible narratives, to the point that it is possible to form one around any collection of data and support any position. You might think that by normalizing against competing narratives (the premise behind debates, courts, etc.), the more convincing one should be correct. Locally, that’s probably true, but the inability to find a better narrative is only weak evidence of its correctness, and this small gain is not worth the burden of cognitive inertia.


What is the alternative? The gold standard of reasoning is constructive logic, but most systems are too complex for it – that’s exactly the role of teleology. Rather than collapse to a narrative, we can try to hold the entire nonverbal, nonsymbolic Amalgam at once. The notion of Amalgam is a technical Neuromancy term, and so requires some unpacking. An Amalgam is like a collection of competing teleologies (the nonverbal precursors to narratives), but it also contains information about how those teleologies are related, and their relative weights. At first pass, one may consider an Amalgam as a dagger category whose objects are teleologies, and whose arrows are oriented relationships between them (hence the dagger involution swaps orientation). The nature of these relationships is nebulous, and can really only be understood (with extant technology) by experience. However, they can be understood as a sort of “sharing” of underlying evidence, like a groupoid but oriented, such that an Amalgam is smaller than the sum of its teleologies. The degree to which they share is proportional to their subjective conditional probabilities, with the total weights estimated by “running” the Amalgam as an MCMC, i.e., considering each argument in relation to the others, switching positions, etc. If that makes no sense, it will do little harm for now to just think of Amalgams as collections of competing hypothesis-narratives – but it’s important to remember that internally they have a more efficient, nonverbal representation, allowing continuous shades of narrative that share some of the neural structure.
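For concreteness, here is a minimal sketch of “running” an Amalgam as MCMC. The hypotheses and their plausibility scores are made-up placeholders; the point is that the visit frequencies of a Metropolis-style chain recover the relative weights without ever committing to one winner.

```python
import random

# "Running" an Amalgam as MCMC: hop between competing hypotheses in
# proportion to their (subjective) plausibility scores, and read off
# relative weights from visit frequencies. Scores are hypothetical.

scores = {"H1": 3.0, "H2": 1.0, "H3": 0.5}  # unnormalized plausibilities

def run_chain(steps=20000, seed=0):
    rng = random.Random(seed)
    current = "H1"
    visits = {h: 0 for h in scores}
    names = list(scores)
    for _ in range(steps):
        proposal = rng.choice(names)  # "switching positions"
        # Metropolis acceptance: always move to a stronger hypothesis,
        # sometimes move to a weaker one.
        if rng.random() < min(1.0, scores[proposal] / scores[current]):
            current = proposal
        visits[current] += 1
    total = sum(visits.values())
    return {h: v / total for h, v in visits.items()}

weights = run_chain()
# weights approximate the scores normalized: roughly 0.67 / 0.22 / 0.11
```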

These Amalgams are the way that knowledge with uncertainty is represented at the deeper conscious level, when not overwhelmed by seductive narratives. By avoiding narrative, we avoid collapsing into one incomplete meaning, instead preferring a network of interrelated meanings, which together constrain an accurate representation of our belief.

Unfortunately, this method has two major restrictions. By its nonsymbolic nature, the entirety of an Amalgam’s structure cannot be delineated and serialized for communication; it can only be conveyed extensionally, by showing directed examples until the idea converges. It can even be difficult to introspect for the thinker inexperienced in nonsymbolic manipulation. Far worse, though, the Amalgam is non-denotational – its interaction is rather operational, bouncing between related but logically distinct meanings. This lack of logical closure poses the same problem as narratives: the possibility of arbitrarily deviating from reality by accidentally or intentionally employing motte-and-bailey style conflations.

The benefit gained, then, in addition to more efficiently holding uncertainty, is that nonverbal Amalgams do not carry so much weight: they are easier to discard in the face of new information, and do not have the infectious power of narratives. Since they can’t be serialized, they can only be communicated by example, which essentially requires “recompiling” the idea, allowing the recipient to form their own interpretation, with bare reality as the ultimate steward of information.

Narratives are certainly useful. There are better ways to structure information clearly to give the right conclusions, but by an unfortunate quirk of the mind, they mostly look boring and encyclopedic, or downright robotic. So the main use of narratives should be for engaging emotional content, where appropriate. In particular, they should be avoided in favor of Amalgams when problem solving in an uncertain environment.





Computation From Within

I’m going to talk about the qualitative computational strength afforded by evolution. More generally, I’m going to talk about what you gain when you weave a bunch of things that can do computation into an evolutionary game, a system of those things. In pseudocode:

Computes a => Meta a

Very simple – you might wonder if there’s anything useful we can say about this construction. First, let me give some examples, in increasing order of complexity.

  1. Many perceptrons combine into an Artificial Neural Network (ANN)
  2. Many Actors combine into an Erlang program
  3. Many cells combine into an organ
  4. Many organs combine into an organism
  5. Many organisms combine into a colony
  6. Many species combine into an ecosystem

Notice: individual perceptrons are weak. In an ANN, each perceptron might only do simple addition, but the full network can be Turing complete. Clearly, something is gained here – but where does this extra power come from? We’re not just talking about extra memory or efficiency from added processors; we’re talking about a qualitative expansion in the types of things that can be computed. This is a big deal. The only place for this extra power to hide is in the connections between neurons. We can thus view ANNs quite clearly as directed graphs with nodes labeled by perceptron computations.
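A classic concrete instance of this qualitative gain (with hand-picked weights, not learned ones): no single threshold perceptron can compute XOR, but wiring three of them into a two-layer network can.

```python
# A single threshold perceptron cannot compute XOR, but a tiny
# two-layer network of them does - the power is in the wiring.
# Weights and biases below are chosen by hand.

def perceptron(weights, bias, inputs):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

def xor_net(x1, x2):
    h1 = perceptron([1, 1], -0.5, [x1, x2])    # OR-like unit
    h2 = perceptron([-1, -1], 1.5, [x1, x2])   # NAND-like unit
    return perceptron([1, 1], -1.5, [h1, h2])  # AND of the two

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # prints the XOR truth table
```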

If we generalize perceptrons to be arbitrary functions rather than simple arithmetic, we get something more like an Actor Model. Since ANNs are already Turing complete, this would at first not seem to gain us much other than convenience. Consider, though: an actor can do something a simple function cannot – it can fail – and in fact Erlang is famous for gracefully handling process failure. You can see now the relation to evolutionary games. If we interpret life forms as hypotheses about how to survive the environment, then it’s a nice property that one hypothesis can fail without bringing the whole system down. But we’re still missing the secret ingredient to life: if we start with a bunch of hypotheses, that’s just one big meta-hypothesis – eventually they could all fail, and then we’re out of luck. What we need is a way to introduce new hypotheses. What we need is a Monad.

Unlike an ANN or an Erlang program, lifeforms can replicate themselves (approximately). For simplicity, let’s confine ourselves to asexual reproduction. If we take (Meta a) to be the type of a group of (a)s, then reproduction is an arrow (a -> Meta a). Naturally, reproduction happens within a larger group, and we can always stitch a reproduction into the larger whole, so we really have (Meta a -> (a -> Meta a') -> Meta a'). This gives us something quite powerful: a chance at immortality. There’s an old math puzzle that sets up like this:

Suppose we have a bacterium. At each time step, the bacterium either dies or splits in two, with probabilities p and 1 − p respectively.

It turns out that the exponentially branching growth cancels the exponentially decaying chained probabilities, and we get a finite probability that the bacterial lineage never dies (try to prove it!). Now, this comes just from a constant death probability for each bacterium. In real evolution we can do better: organisms with a better (lower) p are more likely to persist, so we expect the average death probability to fall steadily. Barring large extra-systemic fluctuations (like the planet exploding), “life” (which is to say, the descendants of the proto-slime) is pretty darn near immortal.
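The puzzle yields to a fixed-point argument, sketched here in code (taking p to be the per-step death probability, as in the setup above): the lineage's extinction probability q satisfies q = p + (1 − p)q², since the lineage ends if the bacterium dies now, or if both branches' sub-lineages eventually die. Iterating from q = 0 converges to the smallest root, which for p < 1/2 is q = p/(1 − p), leaving a positive survival probability.

```python
# Extinction probability of the branching lineage: solve the
# fixed-point equation q = p + (1 - p) * q**2 by iteration from 0,
# which converges to the smallest root.

def extinction_probability(p, iterations=10000):
    q = 0.0
    for _ in range(iterations):
        q = p + (1 - p) * q * q
    return q

p = 0.25
q = extinction_probability(p)
print(1 - q)  # survival probability; for p = 0.25 this is about 2/3
```

For p ≥ 1/2 the iteration converges to q = 1: the lineage dies with certainty, which is why a "better p" matters.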

Note: I’ve described asexual reproduction here because it’s simpler. Sexual reproduction also requires (local) interaction within the group, rather than just an individual, but it’s otherwise similar.
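The stitching arrow (Meta a -> (a -> Meta a') -> Meta a') looks, in code, like a monadic bind. A minimal Python sketch with a toy population type (the names here are my own, not any standard library):

```python
# `Meta` is just a population (multiset) of individuals; `bind`
# applies a reproduction rule (a -> Meta a') to each member and
# flattens the offspring back into one population - the
# (Meta a -> (a -> Meta a') -> Meta a') arrow from the text.

class Meta:
    def __init__(self, members):
        self.members = list(members)

    def bind(self, reproduce):
        offspring = []
        for individual in self.members:
            offspring.extend(reproduce(individual).members)
        return Meta(offspring)

def split_in_two(cell):
    # asexual reproduction: one individual yields two copies
    return Meta([cell, cell])

colony = Meta(["a", "b"])
next_gen = colony.bind(split_in_two)
print(next_gen.members)  # ['a', 'a', 'b', 'b']
```

A rule that returns an empty Meta models death, so the same bind covers both branches of the puzzle above.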

What does this mean for us lowly humans? The good news is that ‘humanity’ is a sort of meta-thing, and so has all the strengths I’ve spoken of. The human meta-entity absorbs knowledge from its constituents, immortalized in a chain of human communication. Even while individuals and groups may wax, wane, and die, humanity marches endlessly forwards. The bad news is that I’ve hidden a problem from you. I’ve hidden it because I’m not sure how to fix it. Unlike the idealized bacterium, which produces exact copies of itself, real replicators don’t produce exact copies. Certainly our chain of descendants will be near immortal, but in what sense will they be “the same” as us?

We’d like to say that descendants are “the same” in the sense that they are clustered nearby in thingspace. This gives us a clue about what sort of things should be allowed to go meta. Particularly, it makes sense to talk about a collection of a as an independent Meta a in some context if its behavior in that context does not depend strongly on the behavior of any individual or small group. That is, Meta a is differentially private! This criterion makes it clear that one of our previous examples doesn’t work as well as the others. While an organ is stable even if a small group of its cells die, an organism has much less tolerance for organ failure – a small heart defect can take the whole system down! The teleological view is that the body uses heterogeneous organs to save resources in making a “minimum viable human”, and makes a stability tradeoff in doing so – homogeneous systems are more stable because they have more symmetries.

Shift perspective downwards: the brain is very stable to seemingly dramatic rewirings, not just the loss of individual cells, so maybe it’s built on something larger? Take the internal view, that the mind is composed of many competing subprocesses vying for control, each one thought of as a hypothesis about which action to take. This creates a sort of evolutionary game for thoughts (spatiotemporal firing patterns), where a thought lives while it’s firing and is otherwise in stasis/dead. The individual thought dies, but the mind salvages the remains and is better for it. The power of the mind is the ability to keep playing.

Both individual human minds and the human meta-minds (Kami) are Turing complete, so they should be able to process the same sorts of things. Digital immortality suggests that thoughts should be substrate independent. Humans are fragile, the meta-mind is immortal; yet we can live only through our own eyes. Is it possible, or even meaningful, to “blow up” a human mind, embedding it as a distributed entity rather than porting it one-for-one to a computer? I suspect not totally, since the network topology of human civilization is very different from that of the brain (for one thing, there’s a lot more latency). However, the tales of “charismatic leaders” becoming Kami are tantalizing, and suggest that, at the least, human minds can act as seeds for distributed entities.

Reflective Agents

TL;DR: A framework for reasoning about the relationship between present/future beliefs in the presence of self-modification, and about a theory of mind allowing AI to model human experience/utility. Currently just a braindump to be filled in with actual research.

To rephrase a bit from earlier:

An agent must process its sensory information/measurements (the “map”) to infer its actual status in the world (the “territory”). We often consider this as finding hidden variables, but sensory data is fundamentally very different from the universe it betrays. It’s not “about” anything until interpreted so, and could stand independently as a mathematical object.
For a fixed agent i, consider two categories.

The first, MAP_i, whose objects are sensory data, including internal states (“thoughts”), everything the agent i uses to reason about.

The objects of the second, UNIV, can be interpreted as predictive models over MAP_i, but should more accurately be interpreted as “things” (logical propositions, the possible physical states of the universe, etc.). The interpretation as predictive models is recovered by fixing a forgetful functor (R_i : UNIV -> MAP_i) mapping physical states to sensor readings. The morphisms of UNIV are more complicated, but should (at least) take the form of a framed bicategory, with vertical arrows as “logical” maps and horizontal arrows as “stochastic” maps. A starting point for a concrete representation is the category Stoch (see e.g. arXiv:1205.1488). The causal structure of UNIV lets objects “roughly factor” as products of smaller (i.e. lower entropy; see arXiv:1106.1791) objects.

The real challenge is inferring reality from observation. Interpreting Occam’s razor as a maximum entropy principle, we can freely generate the “best approximation to reality” from sensory data by taking the maximum entropy model generating that data. Of course, the functor (L_i : MAP_i -> UNIV) is uncomputable, but since it is characterized by a universal property, it is still well-founded as a function, so it can be manipulated formally and computable approximations can be found.

The details above remain to be worked out, but the motivation for this approach comes when we find a suitable adjoint pair of L_i and R_i:

They then yield a monad (R_i . L_i : MAP_i -> MAP_i), given by inferring the world and then restricting to sensory data. Intuitively, this is the “self-reflection monad”. If we generalize to let the indices vary freely, we instead get (R_j . L_i : MAP_i -> MAP_j), the “theory of mind” indexed monad.
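In standard notation (assuming the adjunction really can be arranged), the construction reads:

```latex
L_i \dashv R_i, \qquad L_i : \mathbf{MAP}_i \to \mathbf{UNIV}, \qquad R_i : \mathbf{UNIV} \to \mathbf{MAP}_i,
\\[4pt]
T_i \;=\; R_i \circ L_i : \mathbf{MAP}_i \to \mathbf{MAP}_i, \qquad
\eta : \mathrm{Id}_{\mathbf{MAP}_i} \Rightarrow T_i, \qquad
\mu \;=\; R_i \, \varepsilon \, L_i : T_i \circ T_i \Rightarrow T_i
```

The unit η says raw sense data embeds into its inferred-then-restricted image, and the multiplication μ collapses two rounds of inference into one. Note that the cross-index composite (R_j . L_i) is a functor between different categories rather than an endofunctor, so “indexed” is doing real work in the name.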

Now consider the internal logic of this setting, sandwiched between two applications of the monad, i.e. the logic carried out by some agent. Then there are two operations, “reflection” and “reification”, that let us shift down or up a level in the monad stack (see Andrzej Filinski’s work, e.g. “Representing Monads”). Reflection lets us believe we experience what we believe we believe we experience (with some side effects, in the form of disciplined effort), and reification lets us believe we experience what we actually experience (with some side effects, in the form of introspective effort). Reflection is analogous to an AI self-modifying to believe/act the way it thinks it should believe/act, and reification is analogous to an AI making its beliefs explicit.

An interesting point here is that the monad is over MAP – you can’t talk about what is “true” abstractly; you can only make concrete predictions of experience, implicitly utilizing UNIV as part of the context. This may be a flaw needing correction, though it is more compelling to consider it a solution to inconsistency. By forcing each recursive step to occur between applications of the monad, monadic effects witness each step and maintain coinductive productivity. However, the classic notion of ‘denotation’ is lost in the general case (we cannot simply avoid Gödel’s theorem). This might not be so bad, though, as many propose (e.g. J.-Y. Girard) that interaction is primary while denotation is incomplete – i.e. we may still be able to reason about and put bounds on the long-term behavior of a self-modifying AI, even if the process never converges.


There is an equally concerning dual problem to convolution, and that is fragmentation. While convolution is about multiple ideas hiding in the same terminology, fragmentation is about a single idea hiding behind multiple terminologies. This causes problems when we understand different aspects of an idea under different names, without realizing they are connected.

For example, I know that eggplants are delicious, and know many recipes for them, but then I’m placed in a bizarre British cooking show, and told that today’s ingredient is aubergine. What is an aubergine??? I’ve heard it’s a nightshade, that sounds poisonous…

Notice there are actually two places for fragmentation to hide: in the hierarchy Thingspace – Thoughtspace – Wordspace, it can hide at either junction. We could have multiple words referring to the same idea. This is analogous to having multiple textual aliases for some function in code. It’s annoying, but not that hard to solve: textual substitution is easy, and we can just look up the definition and unify them. In practice, humans are very good at on-the-fly substitution/translation, so these sorts of fragmentation survive but don’t cause us too much trouble. The more difficult sort of fragmentation is at the Thingspace – Thoughtspace link, analogous to two different algorithms that do the same thing. Unlike in Wordspace, where most translations are just direct substitution and equivalent ideas are strictly equivalent, computing equality of generalized ‘things’ is Hard (think function equality, but harder). Part of this is because ‘things’ want to be weak; the way in which things are “equivalent” matters.
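A sketch of the easy (Wordspace) case, with a made-up alias table: unification really is just lookup and substitution.

```python
# Wordspace fragmentation: multiple aliases name one idea, and
# unification is a dictionary lookup. The alias table and recipe
# book are made-up examples.

aliases = {"aubergine": "eggplant", "brinjal": "eggplant"}

def canonicalize(word):
    return aliases.get(word, word)

recipes = {"eggplant": ["parmigiana", "baba ghanoush"]}

# "What is an aubergine???" resolves by substitution:
print(recipes[canonicalize("aubergine")])  # ['parmigiana', 'baba ghanoush']
```

No such table exists for the Thingspace–Thoughtspace junction, which is exactly why that sort of fragmentation is Hard.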

There’s a more general problem. Maybe two ideas don’t represent exactly the same thing, but they sorta seem to have related subcomponents. The “sameness” of thoughts/things is related to the degree to which their subcomponents match.

It seems to me that the optimal way to think (and the optimal way to design an AI… hint hint) is to break all problems down into a complete set of primitive ideas, and then memoize the ones that get used frequently. An immediate question comes up: how do we know when we’ve found a minimal concept? This business of seeing internal structure, and of factoring concepts into their primary components, is exactly the sort of thing category theory is good at, and so this problem has a solution. The notion of “universal property” exactly encodes what we mean by “minimal concept relative to some property”. Roughly, a concept has a “universal property” if every other concept with that property can be described in terms of it. For example, a “product thing” for things A and B has exactly the information to give you an A or a B via projection. Any other thing that could give you an A or a B can be described as a special case of it. I encourage you to actually read about universal properties, because I can’t give satisfactory coverage here.
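As a loose code analogy (illustrative only, and much weaker than the categorical statement): the pair with its two projections is “minimal” in that anything else from which you can extract an A and a B factors through it via a mediating map.

```python
# The product's universal property, sketched: the plain pair (A, B)
# with projections fst/snd is the minimal "thing" giving you an A
# and a B; any richer structure factors through it.

def fst(pair):  # projection to A
    return pair[0]

def snd(pair):  # projection to B
    return pair[1]

def mediate(extract_a, extract_b):
    """Any 'thing' with extractors to A and B is a special case of the pair."""
    return lambda c: (extract_a(c), extract_b(c))

# Example: a record "gives you" a name and an age, so it factors
# through the plain pair. (The record fields are just illustration.)
to_pair = mediate(lambda d: d["name"], lambda d: d["age"])
record = {"name": "Ada", "age": 36, "extra": "ignored"}
pair = to_pair(record)
print(fst(pair), snd(pair))  # Ada 36
```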

The whole point of this is that fragmentation is a big problem whenever you’re trying to coordinate research between diverse fields of study, because it leads to the encryption problem. Fragmented ideas “want to” be united, in that they have higher entropy than their unification. When two ideas merge into a lower entropy state, the information difference (“counterfactual pressure”) is released as a sort of free energy, and can be captured via “counterfactual arbitrage”.

Decisions, Pointwise

Lemma: Decisions can be modeled by some algorithm.
Proof: Consider writing down the list of all your actions; there is some algorithm which generates that string.

This is one of the main premises of timeless decision theory. The idea is that “choosing” makes no sense: you always make the decision that subjectively maximizes your utility function; you just aren’t sure what that decision is until you make it. The feeling of “being in control” comes from the process of generating counterfactual scenarios and evaluating their utility until we find a maximum. Because of the way our brain processes memory and imagination, these imagined scenarios feel like they “could have been”, if only we had made a different decision.

One of the practical implications of this approach is some advice from a LW willpower thread – that, rather than deciding our action in a particular scenario, we should choose as if we’re choosing the output of our decision algorithm. This supposedly makes it easier to maintain (for example) a diet. But why should that be the case?

Let’s take for granted that our decisions are determined by some biological algorithm – but which algorithm? When we make decisions, it feels like each one is a fresh scenario: we could choose to do anything we wanted; it just so happens that we choose predictably. This corresponds to a pointwise encoding of our decision algorithm – the algorithm which simply stores the literal sequence of decisions and prints it.

Take for granted that we have some goal (say, not to eat icecream). This goal imposes a pattern on our sequence of actions, and whenever there’s a pattern to a sequence, we can compress it into a lower entropy representation.

Let’s assume for a moment that willpower is a semi-finite resource, and that decision fatigue and ego depletion are real effects. More generally, we can just assume that there is some information (semi)conservation principle in the universe, which seems plausible but is not well understood. In this setting, an agent would want to make high-impact but low-complexity decisions – it must make tradeoffs between being correct and conserving energy, so it makes sense to choose a simple decision rule whenever possible. However, the real world is not so simple.

Consider an iterated game between two bounded agents. If the amount of processing power they have is greatly unequal, the stronger agent will nearly always win, because a more complex strategy requires a more complex response. Shifting this perspective, we can consider any environment as an agent. Clearly, the rest of the world has more entropy than you, so in general, any simple strategy you come up with will be incomplete. The best you can hope to do is make decisions point-wise, considering all prior information every time you make one.

One way to think of it, which may or may not really be how the brain works (but which is still useful because it is a bound on all computational systems), is that every time you commit to a simple rule, you spin up a subprocess dedicated to that task. In practice, you can’t just “decide” to commit to a rule (as many LW zealots would suggest); it’s more like forming a habit. So committing to a decision rule (spinning up a subprocess) costs energy, but each later invocation takes much less energy, because it efficiently compartmentalizes information as an in/out process.

A cute example I use is to always pick what the other person orders at a restaurant (or their second choice, if they have one, to improve variety). It took a bit of time to think this up and commit to it (not much!), but it saves a lot of thinking in the future, with pretty good results. The general principle is that the best you can do in an infinite game is to pick simple rules that capture “most” of the value. However, it also costs energy to modify a strategy; killing an old habit is often harder than starting one – it gains momentum.
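As a sketch, the restaurant habit can be written as a one-time setup cost followed by cheap invocations. The function names are mine, purely illustrative.

```python
def make_restaurant_rule():
    """One-time cost: committing to the rule (forming the habit)."""
    def decide(companion_order, companion_second_choice=None):
        # Cheap per-invocation step: no menu deliberation needed.
        # Prefer their second choice, if any, to improve variety.
        return companion_second_choice or companion_order
    return decide

decide = make_restaurant_rule()  # paid once, like spinning up a subprocess
print(decide("ramen"))                # -> ramen
print(decide("ramen", "katsu don"))  # -> katsu don
```

The expensive step (designing and committing to the rule) runs once; every later decision is a constant-time lookup.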

Shift perspective again to consider not individuals but systems: corporations, religious or political groups, etc. These too can be considered as agents (in a much more salient way than “the environment” in general), with their own goals and strategies. Such systems have a benefit we don’t: they can add computing power (members) relatively easily. This sort of cosmological expansion greatly magnifies the momentum of any existing strategies, as each new member is likely to inherit them. This pattern is commonly seen in the lifecycles of corporations: a lightweight company captures some market inefficiency with a new approach; it balloons up in success and becomes too rigid to adapt, either coasting its way into irrelevance or getting killed by a more agile competitor. If only it could stay at that sweet spot: powerful enough to afford risks, without being stagnated in bureaucracy.

The only groups that seem to resist this are those with “visionary” leaders who can synchronize the organization while still making quick decisions. But these groups are fragile, dying along with their leader. As soon as distributed decision-making is allowed, the system gains momentum, allowing persistence but preventing change.

When enough momentum is gained, weird things can happen, like practices that everyone (well, a sizable majority) hates but that never seem to change. What’s going on here? From the inside, as a single member of an organization, it can be maddening. You can see at a smaller scale than the kami you are a part of: what is “locally obvious” to you may be too subtle, too expensive, for the organization to adopt. The microscopic reasoning is that simple, ambiguous ideas can be fit into more people’s worldviews. Conflation (logically the “with” connective) plays a large role here, allowing multiple ideas to be bundled together, choosing the best one for each convert. Conflation is dangerous though: it is not denotational but operational, meaning that it does not preserve teleology, so its end result is unpredictable and will almost always take on a mind of its own.

So you’d damn well better seed your organization right the first time.

Natural Language is Conflated

For a long time I’ve been unsatisfied with natural language. At first, in my naïveté, I thought it was just English that was unsuitable, but there seems to be a deeper problem that even engineered languages like Esperanto don’t help with. People talk past each other, perpetually misunderstanding, driven apart by nitpicking over correct phrasing.

When people asked me whether I thought in words or pictures, I blinked at them: “I think in thoughts, doesn’t everyone?” I’m still not sure, but it seems that many people do not notice the inadequacy of words. Maybe I’m exceptionally bad at phrasing, or maybe most people subconsciously restrict their thoughts to mirror their words so that translating isn’t a great big mess.

It remains to clarify: what makes up a thought, and how does it differ from speech? First, what is similar? They’re both ways to pick out points in thingspace (which you should understand as injective maps from thoughtspace and wordspace, respectively, to thingspace). Almost technically, there are actually maps wordspace -> thoughtspace -> thingspace, and the map wordspace -> thingspace is the unique composition, which suggests that thoughts are at least as powerful as words. Thoughts are anonymous, but words are named. Effectively this means that you can pull a word out of the aether by its name, but you can only pull up a thought by association; imagine words as organized like a dictionary, and thoughts organized as a graph, with edges as metaphors, links in associative memory, etc. (this gives us some indication of their structure as categories). The most apparent difference is that thoughts can refer directly to internal perception, whereas words are completely incapable of serializing sensory perception unless the receiver shares certain experiences with you. Words can refer to the thought of a sensation (which you assume they share), but not the sensation itself (which they almost certainly do share, being human).
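A minimal sketch of the dictionary-versus-graph distinction, with entirely made-up data: words are retrieved by name, thoughts only by association, and the word-to-thing map is the composition through thoughtspace.

```python
# Hypothetical miniature of the three spaces.
word_to_thought = {"bench": "t1", "tiger": "t2"}   # words: keyed by name
thought_to_thing = {"t1": "bench-object", "t2": "tiger-object"}
thought_graph = {"t1": ["t2"], "t2": ["t1"]}       # thoughts: reached by association

def word_to_thing(word):
    # The unique composition wordspace -> thoughtspace -> thingspace.
    return thought_to_thing[word_to_thought[word]]

def associated_thoughts(thought):
    # Thoughts are anonymous: no lookup by name, only traversal of links.
    return thought_graph[thought]

print(word_to_thing("bench"))     # -> bench-object
print(associated_thoughts("t1"))  # -> ['t2']
```

Anything reachable from a word is reachable from its thought, but not vice versa, matching the claim that thoughts are at least as powerful as words.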

While Wordspace and Thingspace are straightforward at least in concept, Thoughtspace deserves further explanation, as the mediation between the two. The objects of interest in Thoughtspace can be thought of as spatiotemporal firing patterns in the brain. Choosing the correct formalization for these firing patterns is a hard scientific problem (where should we delimit the boundary between one pattern and the next?), so we’ll sidestep it for now by invoking the Mind/Space hypothesis, instead considering the subjective experience of firing patterns as ‘formations’ relative to a fixed observer. Oddly enough, the boundary problem disappears in this perspective: each thought obviously feels distinct, but they are connected by a similarity metric. The hypothesis is that these formations have some scientific characterization in the brain, which seems like a reasonable assumption. I suspect that thoughts only have a clean representation as formations relative to their originator, and that comparing the “objective” representations of thoughts as firing patterns is intractable in general, because of the encryption problem. So “formations” really are the more natural setting.

Thoughtspace is the world we really “live” in – it’s the only world we can actually experience, but through it we can know both Wordspace and Thingspace. The structure of Wordspace is logical. The objects of interest are strings of symbols (interpreted as logical propositions), and their connectives are algebraic manipulations. We can think of it as the union of all symbolic logics. This includes not just traditional logic, but also all constructive objects, arrangements of particles, etc. Since thoughts are “just” arrangements of discrete particles in the brain (but maybe not, if continuous quantum interactions turn out to be important for cognition), we could represent thoughts directly in Wordspace. However, it makes more sense to think of Thoughtspace as a sort of completion of Wordspace, where thoughts are distinct objects that can be represented as potentially infinite arrangements of particles. The dynamics of thoughts-in-the-world, then, are inherited via projection (Thoughtspace -> Wordspace) into a finite brain. Going in the other direction, Wordspace can be embedded in Thoughtspace as those thoughts that can be written down (or otherwise serialized). This embedding is not natural though: the association between words and thoughts is learned (in a non-unique way) through experience – it includes many extensional elements such as feelings, objects (tigers, etc.), and temporal patterns. (A lot of communication errors can be cleared up by remembering that words don’t have independent meaning; try to determine what thought the other party has in mind, rather than assuming they use the same mapping.)

Thoughtspace inherits the logic of Wordspace, but it also has its own fuzzy logic. Internally, these fuzzy arrows are the “feels like” connectives. For example: a bench is like a rocking chair; a bench is more like a rocking chair than it is like a snake. We can justify the statement that a rocking chair is like a bench, but it is not a logical statement. Rather, the “sameness” is a summary of how many contexts they are equivalent in. E.g., for “most purposes” a rocking chair is indistinguishable from a bench. This acts like a sort of probability space, with 1 being equivalent in all contexts and 0 being never equivalent, so we can start with the category Stoch as a rough approximation. We can also talk about the properties of contexts where they are similar, so the arrows have computational content in addition to weight. For example, rather than considering all contexts, a rocking chair is even more similar to a bench when restricted to contexts of sitting – they’re still a bit different, in that sitting in a rocking chair feels a bit different – on the other hand, benches are often made of stone while rocking chairs almost never are, detracting from general similarity but not affecting sitting much at all. It’s all a bit handwavy, but I hope the intention is clear – enumerating contexts and quantifying “sameness” is the key.
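The “sameness as fraction of contexts” idea can be sketched directly; the context sets below are toy data of my own, not anything rigorous.

```python
# Which concepts are interchangeable in which contexts (toy data).
contexts = {
    "sitting":        {"bench", "rocking chair", "stool"},
    "rocking":        {"rocking chair"},
    "made of stone":  {"bench"},
    "garden fixture": {"bench", "rocking chair"},
}

def similarity(a, b, restrict_to=None):
    """Fraction of (possibly restricted) contexts in which a and b are equivalent."""
    names = restrict_to if restrict_to is not None else list(contexts)
    shared = sum(1 for c in names if a in contexts[c] and b in contexts[c])
    return shared / len(names)

print(similarity("bench", "rocking chair"))               # 0.5 over all contexts
print(similarity("bench", "rocking chair", ["sitting"]))  # 1.0 restricted to sitting
print(similarity("bench", "snake"))                       # 0.0
```

Restricting the context set is what gives the arrows computational content beyond a bare weight: the same pair of concepts scores differently depending on which contexts you enumerate.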

Careful though! This similarity space is in general NOT a consistent logic in the way you’d expect. My favorite illustration of this principle is the “numerologist’s folly”, exemplified by crackpot sites like this. What’s going on here? At the risk of sounding silly, I’ll point out the general flaw: while each step has high similarity, the contexts they consider are different, so care has to be taken when composing them. “New York City”, “Afghanistan”, and “The Pentagon” are all similar in their number of letters, and it’s true that New York was the 11th state of the union, but these two contexts are different! If we want to compose them as connectives, we have to conflate the contexts, so “New York City”, “Afghanistan”, “The Pentagon”, and “The state of New York” are all similar in the context “number of letters OR order of joining the USA” – which doesn’t seem like a very useful connection now, does it? It’s a silly and obvious example, but this kind of conflated reasoning happens quite frequently in more subtle cases. Understanding the true structure of Thoughtspace will let you wield its power while avoiding such pitfalls.
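A sketch of why composition fails: if each similarity link is tagged with the context it holds in, then chaining links forces an OR (union) of contexts, and the composed relation is correspondingly weaker. The link encoding is my own toy representation.

```python
def letters(s):
    return sum(ch.isalpha() for ch in s)

# Each link records (a, context-in-which-similar, b).
link1 = ("The Pentagon", frozenset({"11 letters"}), "New York City")
link2 = ("New York City", frozenset({"11th state"}), "State of New York")

# Sanity check on link1's context: both really do have 11 letters.
assert letters("The Pentagon") == letters("New York City") == 11

def compose(l1, l2):
    a, ctx1, mid1 = l1
    mid2, ctx2, c = l2
    assert mid1 == mid2, "links must share a midpoint"
    # Conflation: the composed link only holds in the OR of the contexts.
    return (a, ctx1 | ctx2, c)

result = compose(link1, link2)
print(sorted(result[1]))  # -> ['11 letters', '11th state']
```

Every composition unions in another context, so a long numerological chain ends up asserting similarity in a disjunction so broad that it relates almost anything to almost anything.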

The relation between wordspace and thoughtspace can be made succinct by considering the quote:

When you draw a boundary around a group of extensional points empirically clustered in thingspace, you may find at least one exception to every simple intensional rule you can invent.

Then the embedding of wordspace into thoughtspace/thingspace picks out exactly those boundaries that can be described by intensional rules.

The major task now is to translate these subjective dynamics into something that can be quantified mathematically, measured externally, and communicated clearly between people.

I love it when crazy people say things I agree with


“Physicists suffer from a disorder of the mind that causes them to believe that sensible, temporal objects have more reality than eternal, immutable Platonic mathematical objects, and to place more trust in their senses than in their reason, more trust in the scientific method of ‘evidence’ than the mathematical method of eternal proof. ”
– Mike Hockney, Why Math Must Replace Science (The God Series Book 18)


Social Membranes, Genre Encryption, and Super-Secret Tech

There’s a common problem of good ideas being fragmented across genre. Recently, I’ve begun to consider it THE (non-obvious) problem in knowledge advancement.

Let’s take a computer science approach to make it clearer why it’s THE problem. The search for truth is indeed a kind of search, so it makes sense that you’d want to use a search tree. The nice thing about search trees is that they can be traversed in parallel. If we view humanity as the program, humans are the threads. How do we keep threads separate?

In a Von Neumann computer, keeping threads separate is trivial, but for humans, you’d have to prevent all communication. Now, it gets a bit complicated, because each human (thread) is also concurrent – a human can work on many different things at once. Now we can’t simply cut off all communication, because you might need to communicate with different groups for different tasks, so we have to be selective about which content can be communicated with whom. How does that work?
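The selective-communication constraint can be sketched as a topic filter: two people (threads) may exchange a message only if both are working on the message’s task. Names and topics are hypothetical.

```python
# Each person (thread) runs several concurrent tasks.
people = {
    "alice": {"physics", "cooking"},
    "bob":   {"physics", "theology"},
    "carol": {"cooking"},
}

def can_communicate(sender, receiver, topic):
    # A message passes only along a search branch both parties occupy.
    return topic in people[sender] and topic in people[receiver]

print(can_communicate("alice", "bob", "physics"))  # True: shared branch
print(can_communicate("alice", "bob", "cooking"))  # False: bob isn't on it
```

The branches stay independent not by cutting all communication, but by filtering messages per task.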

I’ll propose that one mechanism is via hijacking genre. Genres are a convenient heuristic for grouping information (sciencey-sounding things tend to contain information about the natural world, religiousy-sounding stuff tends to contain metaphysical poison, etc.), but the heuristic is nowhere near perfect. Viewed this way, genre can be used as a sort of encryption: phrase an idea in genre-specific lingo, and the only way to decrypt it is to both understand the lingo and buy into the genre. This last bit is important – it’s not enough to just understand what they’re saying, because even the absolute truth spoken in a monologue about Lord Xenu is likely to be dismissed anyway. I should point out that the encryption is not explicit; there is no original plaintext understandable by everyone. It’s more that the ideas exist in a different “basis”, and it works because computing idea equality is hard, so most people filter it out via the genre heuristic. Another way to think of it is as tuning into a radio frequency, or accessing an unlisted IP address; I use the encryption metaphor to highlight the fact that “tuning in” actually requires a deeper knowledge of how to interpret foreign lingo (often using the same words with different technical meanings) into your own.
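The genre-encryption idea, as code: decoding requires both the lingo and the buy-in, and failing either filter discards the idea before its content is ever evaluated. The glossary and message are invented for illustration.

```python
# Hypothetical genre glossary mapping lingo to the reader's own terms.
lingo = {"engram": "trauma", "thetan": "mind"}

def decode(message, knows_lingo, buys_in):
    if not buys_in:
        return None  # dismissed by the genre heuristic before parsing
    if not knows_lingo:
        return None  # taken seriously, but uninterpretable
    return " ".join(lingo.get(word, word) for word in message.split())

print(decode("engram of the thetan", True, True))   # -> trauma of the mind
print(decode("engram of the thetan", False, True))  # -> None
print(decode("engram of the thetan", True, False))  # -> None
```

There is no plaintext stage: the “decryption” is a change of basis into the reader’s own vocabulary, which only happens when both filters pass.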

The “team identity” effect causes these fuzzy differences in genre to self organize into sharper “social membranes”, which roughly approximate different search threads.

I approach all of this as a scientist, and so like to think of the science genre as the “main branch”, because it contains sufficient epistemology to ~learn everything~. The end result, from this perspective, is that some useful gems for science get hidden in other genres. This post was motivated by two particular examples, which I’ll get into next time. But there are so many that I’m beginning to collect some of the more exotic ones.