Dennett on Semantic Information and Shannon Information

In his recent book, From Bacteria to Bach and Back (2017), Dennett utilizes memetics to develop his ideas on consciousness. I haven't finished it yet. I paused after 120 or so pages, because I have concerns about his preliminary discussion of information. There are areas where I agree with Dennett, but I have problems with his discussion of both Shannon information and semantic information. The problems with the latter are unfortunately predictable given Dennett's longstanding approach to intentionality.

I. Shannon Information

In a chapter entitled "What is Information?," Dennett follows a seventy-year tradition of making a strong distinction between semantic information and Shannon information. Here is how Claude Shannon ("A Mathematical Theory of Communication," 1948) introduces the distinction:

The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design.

Shannon's concern is not with the role a signal might play in a system of communication (how signals might "refer" or be "correlated"), which he calls semantic aspects or meaning. Instead, he is talking about the quantifiable properties of the signal itself--how "each possible selection" can be factored into the design of the system's operations. 

He goes on to explain, "If the number of messages in the set is finite then this number or any monotonic function of this number can be regarded as a measure of the information produced when one message is chosen from the set, all choices being equally likely." In other words, if every possible signal in a communication channel is just as likely as any other, then the amount of information in any given message can be measured by the number of possible messages (or by any monotonic function of that number; Shannon opts for the logarithm). More generally, the amount of information in a message falls as the probability of receiving that message rises: on Shannon's measure, it is the logarithm of the reciprocal of that probability. The more probable the message, the less information it contains.
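
To make the arithmetic concrete, here is a minimal sketch in Python. It assumes the base-2 logarithm, which gives the answer in bits; the function names are my own, chosen for illustration, and the probabilities at the end are made up. The first function handles Shannon's special case of equally likely messages, and the second handles a single message of any given probability.

```python
import math


def information_equally_likely(num_messages: int) -> float:
    """Bits produced by choosing one of N equally likely messages: log2(N)."""
    if num_messages < 1:
        raise ValueError("need at least one possible message")
    return math.log2(num_messages)


def information_bits(probability: float) -> float:
    """Bits carried by one message of the given probability: log2(1/p)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return math.log2(1.0 / probability)


# Eight equally likely messages: each has probability 1/8,
# so the two calculations agree.
print(information_equally_likely(8))  # 3.0
print(information_bits(1 / 8))        # 3.0

# Illustrative (made-up) probabilities: a highly probable message
# carries far less information than a highly improbable one.
print(information_bits(0.99) < information_bits(0.01))  # True
```

Notice that a message that is certain to arrive (probability 1) comes out at zero bits, which is the limiting case of "the more probable the message, the less information it contains."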

There is a common-sense way of understanding the basic idea. Imagine you and I are now sitting in a crowded restaurant. There is a constant din of noise: people talking, plates and glasses clinking, children laughing. You might tune in to some of those sounds from time to time, but there is a background noise that you don't consciously differentiate. You need to tune out those sounds, because they are quite loud and you want to focus on what I'm saying. 

You are able to tune out that background noise by (subconsciously) identifying the probability of its sounds. It's not that you don't hear the noise--it's still getting to your brain. It's just that it is all more or less equally probable to you, so it no longer provides information to you. As Shannon would say, it is so probable that the information content is practically zero. This can change at any moment--perhaps somebody will drop a glass, and the improbably sharp noise it creates will distract you for a moment. But since the vast majority of the noise in the restaurant is equally probable to you, you are able to ignore it and focus on the improbable sounds of my voice as I talk to you about information theory.

Another entirely different sound could be just as improbable, and just as distracting, as the sound of shattering glass. That doesn't mean that we would interpret it the same way. So the meaning of the sound (its semantic content) is quite distinct from its probability.

Shannon information is a measure of such probabilities, and it is the foundation of modern day information theory. However, the relationship between these probabilities and semantic content, or meaning, is not immediately obvious. Still, there is at least a prima facie difference between them. 

As my restaurant example illustrates, I think it is very likely that we subconsciously rely on calculations of Shannon information all the time. This is necessary for tuning out background noise (not just aural, but also visual, tactile, and so on), and also for identifying all kinds of objects (again, aural, visual, tactile, and so on). It therefore makes sense to me to think that, when neuroscientists or cognitive scientists talk about regions of the brain devoted to facial recognition, they are talking about Shannon information. Yet, Dennett claims otherwise--because, he says, this information is about something. For it to be Shannon information, he says, we would have to identify an encoding system that could give us "a natural interpretation of transmission in the nervous system that yields a measure of bandwidth or storage capacity" (Dennett 2017, 111).

Dennett does have a point: If we want a calculation of the probabilities involved in our facial recognition processing, we would need a measure along the lines of what he describes. We do not know enough about cognition to come up with those numbers, but that doesn't mean we aren't talking about processes that utilize Shannon information.

Dennett's claim involves another, more questionable, point: Shannon information, he says, is not about anything. It lacks intentional properties. But why should that be? When we calculate a signal-to-noise ratio, our calculation is about something. When we tune out the background noise in a loud room, those subconscious calculations are about something. We can, and should, distinguish between Shannon information and semantic information, but intentionality is not limited to one side of that divide.

Rather than think of Shannon information as a container or vehicle for semantic content (which Dennett suggests by using the analogy of a volume of liquid), we might rather think of Shannon information as an aspect or function of communication which may or may not be semantic. In that case, the difference between semantic communication and non-semantic (or, let's say, natural) communication is not marked by the presence or absence of Shannon information, but by the presence or absence of mental states.


II. Semantic Information

Dennett introduces his approach to semantic information as "design worth getting" (Dennett 2017, 115). The basic idea is that semantic information is how our behavior is in/formed by a design process--where design does not entail the presence of a designer (or so Dennett says).

Leaving aside the issue of design (which I will pick up in a minute), Dennett's idea is not new. Haugeland ("Semantic Engines: An Introduction to Mind Design," 1981) conceptualizes semantic information as ways of solving problems--as algorithms (with strict rules determining input/output relations) and heuristics (with loose rules for such relations). Dennett ("True Believers: The Intentional Strategy and Why It Works," 1981) and Haugeland both observe that any attempt to establish an algorithm for even the simplest of human behaviors very quickly runs into the problem of combinatorial explosion. The hope that human behavior could be perfectly modelled on a universal Turing machine, however conceivable in theory, is therefore said to be a practical impossibility.

Searle ("Minds, Brains, and Programs," 1980) famously objects that such a model is impossible in theory, arguing that the causal properties of the brain are an essential component of human cognition. Exactly what physical qualities are indispensable to human cognition is never explained, which has led some of his detractors to suppose he is attributing mysterious properties to the brain. Searle's objection, however, is interesting--in part because, yes, the brain still holds many mysteries. 

Searle's idea suggests that semantic content might require phenomenal consciousness, and thus phenomenal knowledge. This knowledge, according to Torin Alter ("A limited defense of the knowledge argument," 1998), is not communicable through language. It cannot be coded, and so cannot be programmed into a universal Turing machine. As Alter notes, this is not a challenge to physicalism "unless physicalism entails that all physical facts are discursively learnable." Still, Dennett has responded with arguments that all physical facts are, in principle, discursively learnable. I will not address those arguments here. I only want to observe that the issue is not a simple one. 

The seeds for these ideas can be traced back to Wittgenstein (Philosophical Investigations, 1953) and his view that successfully following a rule is not and cannot be fully determined by rules. To think otherwise would lead to an infinite regress. (This fact was observed even before Wittgenstein, but not developed into a systematic program, in Lewis Carroll's "What the Tortoise Said to Achilles" [1895].) Successful rule following (as evidenced by our "language-games") ultimately just comes down to a "way of life." Wittgenstein's insight is to this day often misunderstood as suggesting, for example, that we simply cannot come up with a valid definition of "playing a game." Quite the contrary. Wittgenstein says we can come up with such definitions for particular purposes; the point is that we don't have them already. We don't need--and simply cannot need--such definitions to follow rules.

Wittgenstein's proposal led directly to meaning skepticism via Quine's thesis of the indeterminacy of translation and to Kripke's skeptical paradox. However, Wittgenstein never considered himself a skeptic about meaning, so either he was wrong about his own views or something has gotten lost in translation. (I think the latter.) I won't dwell on that here, but it is a nugget I would like to explore in the future. In any case, Dennett's view seems to be more-or-less in line with the Quinean approach.

I am attracted to some of the ways Dennett applies his idea of semantic information. For example, he discusses how our heuristics (he does not use that term, at least not yet; he is sticking with "information" here) can be designed to be at odds with our interests: "Much of it sticks because it is designed to stick, by advertisers and propagandists and other agents whose interests are served by building outposts of recognition in other agents' minds" (Dennett 2017, 118, italics in original). Already Dennett is approaching memetics here, identifying some semantic information as invasive entities designed to use people as resources for other ends. I would add (and I'm going to presume that Dennett does add later) that nobody has to intentionally design information as propaganda for it to be propaganda. Propaganda--even large-scale, systematic propaganda--can evolve innocently, without anyone being aware of any deception at all. The benefitting entity can be an institution or a nation, and therefore wouldn't even be conscious of how people were being disempowered by it. (I'm not saying all institutions or nations are disempowering. I'm rather saying that they can be.)

I view propaganda as a covert mode of communication which creates or maintains an imbalance of power by strengthening a belief system at the expense of its target audience. In other words, propaganda is defined by how it works against its target's interests and for the interests of another entity (be it a group, an individual, an institution, a nation, etc.). It does this by infiltrating the target's belief system. I say it is covert because the functionality of the communication is hidden. For propaganda to work, the target cannot see how their beliefs are being debilitated. This is commonly understood as the difference between the apparent meaning and the real meaning of propaganda: Propaganda can appear to be celebrating the freedom of the populace when it is really enslaving them. In general, this sort of dynamic defines propaganda: the appearance of "design worth getting" masks the reality of who actually benefits from the information. It's never the target audience.

Like Dennett, I am tempted to describe this in memetic terms: When I refer to belief systems, that could be interpreted as habitats for memetic replication. Propaganda cultivates one memetic habitat (beliefs which promote whoever or whatever gains from the propaganda) while weakening its target audience’s ability to cultivate a memetic habitat which serves their interests. You don't need the memetic terminology to understand the general point, but I am interested in the possibility of thinking about this in memetic terms.

Despite the appeal of much of what Dennett says, I am ultimately frustrated by his discussion of semantic information. Quite predictably, he refuses to distinguish between semantic information processing and the "design" that occurs through natural selection in biological evolution. The adaptation of complex behavior through blind evolutionary processes is, in Dennett's view, just as much a design process as the development of cultural practices and advanced technology. This leads Dennett to make very unusual claims, such as that the behavior of bedbugs and mice relies on semantic information that has been programmed through natural selection.

Considering that semantics is generally defined in relation to linguistic meaning--as non-natural meaning (to use Grice's phrase)--Dennett's claim here is quite audacious. Is Dennett suggesting that DNA is a language with non-natural meaning?

He addresses the implied connection between semantic information and DNA as a matter of pragmatics (Dennett 2017, 122). However, his argument here does not seem coherent. He presents a hypothetical situation in which he bursts into a house and yells, "Put on the kettle!" (ibid.) Depending on his audience's prior knowledge, they will interpret him differently. If he is communicating his decision to finally steam open somebody else's mail, then a correct interpretation of his behavior would require a good deal of prior knowledge. Dennett makes the well-known point that the non-natural meaning--the semantic content--that is communicated is not determined by "looking closely at the structure of the signal as an acoustic wave or as an English sentence" (Dennett 2017, 123), but then he makes a strange leap. He says, "There is no code for all these different lessons learned" (ibid., italics are his).

On the one hand, Wittgenstein would agree. There could not possibly be a code that accounted for all of our learning. It's a fundamentally analog process that cannot be fully circumscribed by rules. Searle would be happy, too. But Dennett's reasons, I presume, are different from Wittgenstein's and Searle's. For Dennett (I am guessing), the lessons cannot all be coded because there simply aren't enough resources to get the job done. That's why we rely on heuristics. But then, why think that the heuristics can't all be coded? He doesn't say, and this suggests a problem to me.

This is a crucial question for Dennett's project. He wants us to think of the wisdom of bedbugs as uncoded, pragmatic knowledge. This is another version of his concept of "free-floating rationales." He wants to say that, in our everyday acts of communication, we are dealing with the same free-floating rationales as bedbugs. The idea is that there is no fundamental difference--no philosophically interesting difference--between the communication of semantic information and the natural evolution of plant and animal behavior. They're both semantic!

This is again entirely predictable, since Dennett's central thesis--connecting all of his work on intentionality for the last half-century--is that there is no fundamental difference between attributing mental states to people and attributing them to anything else: dogs, bacteria, computers, or even thermostats. The difference is only a matter of degree, not of kind. In his view, it's just a matter of how much predictive value we gain by adopting "the intentional stance."

It's a mistake, he says, to think that there is something fundamentally different going on--something of profound philosophical importance--when we talk about people's beliefs or desires. So of course he will not see a fundamental difference between the semantic information we talk about with respect to human language, on the one hand, and any other behavior that we might find ourselves describing in teleological terms, on the other.

The only difference now is that Dennett isn't merely trying to justify all varieties of teleological talk. He is here opening the door for attributing linguistic content to all varieties of natural phenomena. He tries to resist this conclusion in weak terms: "I think, however, that we should resist the temptation to impose these categories from linguistics on DNA information transmission because they apply, to the extent that they do, only fitfully and retrospectively" (Dennett 2017, 124). Why only fitfully and retrospectively?

Dennett wants to undermine the fundamental difference between when we apply such notions as implication and justification to human behavior and when we apply them to Mother Nature, and yet he also says we should not apply them to Mother Nature. Why not? The only reason is apparently whimsy, which is no reason at all.

Either we should not resist the temptation to attribute linguistic competence to Mother Nature, or we should look very suspiciously on what Dennett is trying to do here. I'm voting for the latter.

A theory of semantic information should account for the fact that we must apply the intentional stance in some cases. The fact that we can regard Mother Nature or a thermostat or a lectern in teleological terms does not mean that we believe that those entities have minds. On the other hand, the fact that we must regard each other in teleological terms (our sanity depends on it!) is based on the fact that we believe we all have actual minds.

This is the problem with Dennett's intentional stance in a nutshell. If Dennett is right, then a complete physical description of the universe will not involve a description of intentional states. There won't be any minds in that picture. There will be apples and galaxies and asteroid belts, but not minds. It's not that he thinks minds are not real. He rather thinks they exist in an intermediate area between reality and fantasy; they occupy the phantom realm of free-floating rationales. He doesn't see any value in classifying mental states as either real or unreal (see "Real Patterns," 1991). But this doesn't work. If mind patterns are just more complex versions of the sorts of patterns that biologists uncover in nature, then why wouldn't a description of the physical world include them? It seems that if a physical description did not include them, it would be incomplete. Therefore, either physicalism is false or Dennett's view of the intentional stance is false--or perhaps both are false.

It seems to me that Dennett's notion of semantic information has merit, but not if he relies on a notion of "design without a designer." That notion is a useful shortcut in many situations, but we need to understand why talk of design is not just a shortcut in some cases. Only then will we have a viable theory of semantics and intentionality.
