Entropy

Entropy is a weird thing. It is used in multiple, seemingly unrelated contexts, and none of them comes with an intuitive definition. I hope to explain simply what entropy is in each of these contexts and why they all deserve to share the same word.


Prerequisites: Someone expects you to understand entropy.

Originally Written: October 2017 – June 2018.

Confidence Level: Established science since the late 1800s to mid 1900s, depending on the section.



We live macroscopically.

Most of the dynamics of the world occurs at scales much smaller than we are. Atoms, molecules, cells, bytes, and many other things are much smaller and move much faster than we usually care to observe.

We only care about macroscopic properties of objects. The microscopic properties are hard to observe, so we usually forget about them when making macroscopic predictions of the world.

Occasionally, the microscopic properties become important. When this happens, we need to know how many different ways we can rearrange the microscopic structure without changing what we observe macroscopically.

The entropy of a macroscopic state tells you how many ways you can rearrange the microscopic structure without changing any of the macroscopic properties.

Entropy is a way of counting a large number of similar things.


Statistical Mechanics

Consider a lump of material: a piece of rubber, water in a cup, or the air in a room.

You can describe this material in one of two ways. You can describe it using macroscopic properties like volume, density, temperature, and pressure. Or you can describe it by tracing what each atom in the lump is doing: where it is, what forces act on it, and which other atoms (if any) it is interacting with.

Microscopic Picture

Let’s focus on the microscopic picture first.

Rubber

Rubber is made of polymers. A polymer is a long chain of atoms. The chain of atoms can be either stretched out or crumpled.

Figure 1: Long chains of atoms in rubber can be (a) crumpled or (b) stretched. Source.

There are many ways that a polymer can be crumpled, but only a few ways that it can be stretched out. To see this, consider a simple polymer consisting of five atoms joined by four bonds. Each atom can sit either above the previous atom (represented here as $\uparrow$) or below the previous one (represented here as $\downarrow$).

There are only two ways that the polymer can be stretched out:

  • $\uparrow \uparrow \uparrow \uparrow$
  • $\downarrow \downarrow \downarrow \downarrow$

There are eight ways that a polymer can be partially crumpled:

  • $\uparrow \uparrow \uparrow \downarrow$
  • $\uparrow \uparrow \downarrow \uparrow$
  • $\uparrow \downarrow \uparrow \uparrow$
  • $\downarrow \uparrow \uparrow \uparrow$
  • $\downarrow \downarrow \downarrow \uparrow$
  • $\downarrow \downarrow \uparrow \downarrow$
  • $\downarrow \uparrow \downarrow \downarrow$
  • $\uparrow \downarrow \downarrow \downarrow$

There are six ways that a polymer can be completely crumpled:

  • $\uparrow \uparrow \downarrow \downarrow$
  • $\uparrow \downarrow \uparrow \downarrow$
  • $\uparrow \downarrow \downarrow \uparrow$
  • $\downarrow \uparrow \uparrow \downarrow$
  • $\downarrow \uparrow \downarrow \uparrow$
  • $\downarrow \downarrow \uparrow \uparrow$

If these examples are hard to visualize, make a chain of paper clips and orient each paper clip in the direction of the arrows.

As the length of the polymer increases, the difference between the number of ways the chain can be stretched out and the number of ways the chain can be crumpled becomes even more dramatic. Real rubber has millions of atoms in each of its chains and an even larger number of chains in a macroscopic piece.
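If you want to check this counting for yourself, here is a minimal sketch in Python (the function name `extension_counts` is mine, and treating each bond as a $\pm 1$ step is just the toy model from above, not a realistic polymer). It enumerates every up/down arrangement of an $n$-bond chain and tallies how many arrangements produce each end-to-end extension.

```python
from itertools import product
from collections import Counter

def extension_counts(n_bonds):
    """Count how many up/down arrangements of an n-bond chain
    produce each possible end-to-end extension."""
    counts = Counter()
    for arrangement in product((+1, -1), repeat=n_bonds):
        counts[abs(sum(arrangement))] += 1
    return counts

# The four-bond toy polymer from the text:
# 2 fully stretched (extension 4), 8 partially crumpled (extension 2),
# 6 completely crumpled (extension 0).
print(extension_counts(4))

# For a longer chain the imbalance is far more dramatic:
counts_20 = extension_counts(20)
print(counts_20[20], "stretched vs.", counts_20[0], "completely crumpled")
```

Already for a 20-bond chain there are 184,756 completely crumpled arrangements for every 2 stretched ones.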

Fluids

Air consists of a large number of molecules (mostly nitrogen and oxygen) moving around randomly and running into each other. The microscopic description of the air would include a list of each of the molecules in the air, each molecule’s current position, and each molecule’s velocity. Since the molecules are moving around randomly, they could end up in any position. They could even line up all in a row down the center of the room. However, there are only a few ways that the molecules can line up in a row. There are far more ways for the molecules to be scattered uniformly throughout the room.

Figure 2: There are more ways for atoms to be spread throughout the room than for the atoms to all be clumped in one corner. Source.

Water is a lot like air. The main difference is that water molecules interact with each other a lot more than air molecules. This doesn’t do much to change the picture of the microscopic motion of the water molecules I described above. There are far more ways for the water molecules to be uniformly scattered throughout the cup than there are for all of the water molecules to be in a row or clustered together against one side of the cup.

Macroscopic Picture

Now we can start thinking about how we would describe these lumps of material macroscopically.

One obvious thing to notice about the rubber is whether it is stretched or relaxed. If the rubber is stretched, then most of the individual polymers have to be either partially or fully stretched as well. If the rubber is relaxed, then most of the individual polymers will be either partially or fully crumpled.

One obvious thing to notice about the air in a room or the water in a cup is whether it has spread out uniformly or whether it is all clumped together in one corner of the room or cup. Luckily, this never happens: otherwise water would be much more difficult to drink, and we would continually be in danger of suffocating whenever all of the air moved to another part of the room.

Assume that all of the microscopic arrangements of the atoms are equally likely. This is true as long as the microscopic arrangements all have the same energy and the atoms can rearrange themselves much more quickly than humans (the macroscopic measuring device) move. Even when all of the microscopic arrangements are equally likely, the macroscopic arrangements are not equally likely to occur. We are more likely to see the macroscopic arrangements which correspond to the largest number of microscopic arrangements.

Since there are more ways to crumple a polymer than to stretch it out, we are more likely to see a relaxed piece of rubber than a stretched one.

Since there are far more ways to arrange the atoms in air or water nearly uniformly than in a row, we are more likely to see air and water spread out uniformly.

There are about $6 \times 10^{23}$ (Avogadro's number) atoms in a macroscopic object. The chance of getting all of them to stretch out together or to gather in one corner of the room is effectively zero.
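To get a feel for just how effectively zero: if each of $N \approx 6 \times 10^{23}$ molecules independently has a $1/2$ chance of being in, say, the left half of the room, the probability that all of them are there at the same moment is $(1/2)^N \approx 10^{-1.8 \times 10^{23}}$: a decimal point followed by roughly $10^{23}$ zeros before the first nonzero digit. (The half-of-the-room setup is just a back-of-the-envelope stand-in for gathering in a corner.)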

It is possible to stretch rubber or to pump air from one part of a room to another. This requires effort because you are working against the statistics of very large groups of atoms.

Entropy in Statistical Mechanics

Entropy in statistical mechanics is a way of making this idea more precise. The entropy of a macroscopic object tells you the number of ways that you can rearrange the atoms in that object without changing any of its macroscopic properties. Since this number is unreasonably large, we define the entropy to be the Boltzmann constant times the log (base $e$) of this number.
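In symbols (using $\Omega$ for the count, a common convention): if a macroscopic state can be realized by $\Omega$ microscopic arrangements, its entropy is $S = k_B \ln \Omega$, where $k_B \approx 1.38 \times 10^{-23}$ J/K is the Boltzmann constant. For the four-bond toy polymer above, the completely crumpled state has $\Omega = 6$ and entropy $k_B \ln 6$, while the stretched state has $\Omega = 2$ and the smaller entropy $k_B \ln 2$.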

If nothing is exerting any force on your piece of rubber, then it is most likely going to be relaxed. This is the macroscopic arrangement of the rubber which corresponds to the most microscopic arrangements of atoms, i.e. with the maximum entropy. If the rubber isn’t already relaxed, it will spontaneously retract to get back to the most likely macroscopic arrangement.

Figure 3: A comparison between a few of the atoms in low entropy ice and higher entropy water. Source.

This is the second law of thermodynamics.

If nothing else is acting on an object, it will automatically approach the macroscopic arrangement with the largest entropy.


Information Theory

Information and Randomness

Information theory deals with random numbers. When we think of a random number, we can either think of the result of a single random experiment or we can think of the probability distribution of what would happen if we did many experiments.

Take a coin toss as an example. We could look at the result of a single coin toss, which might be tails. Or we could look at the probability distribution for the coin. If the coin is fair, then the probability distribution would be that the coin lands on heads half of the time and tails the other half of the time.

Entropy in Information Theory

When we think about entropy in information theory, the macroscopic property is the probability distribution and the microscopic property is the result of a single coin toss or other random experiment.

In this setting, entropy measures how much new information you get from the result of a single experiment with a known probability distribution. Once again, we'll take the log of the number of possibilities to make sure that it doesn't get unreasonably big. This time, we'll use log base 2 because computer scientists like to measure things in bits.

If every result of the random experiment is equally likely, the entropy measures how many possibilities there were to choose from. The entropy of a fair coin is $\log_2 2 = 1$ bit. The entropy of a fair eight-sided die is $\log_2 8 = 3$ bits.

If there is an exact pattern that all of the results of the experiment follow, then the entropy is zero. For example, perhaps every coin toss lands on heads. If you want to fully describe a list of numbers made this way, you just need to write down the pattern. Each additional number in the list adds no additional information.

What if you have an unfair coin which lands on heads $70\%$ of the time and on tails $30\%$ of the time? This is somewhere between the two previous examples, so the entropy should be between $0$ and $1$ bit. If you do the calculation, you'll find that the entropy in this case is $0.88$ bits.
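If you want to check that number yourself, here is a minimal sketch in Python (the function name `shannon_entropy` is mine):

```python
import math

def shannon_entropy(probabilities):
    """Entropy in bits: -sum(p * log2(p)) over the outcomes with p > 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(shannon_entropy([0.5, 0.5]))   # fair coin: 1.0 bit
print(shannon_entropy([1/8] * 8))    # fair eight-sided die: 3.0 bits
print(shannon_entropy([0.7, 0.3]))   # unfair coin: about 0.88 bits
print(shannon_entropy([1.0]))        # a coin that always lands heads: 0.0 bits
```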

How does this relate to describing the list of numbers you get? You could obviously describe it by writing down every random result, but that is not the most efficient way. Instead, you could record the pattern: most of the time, the coin lands heads up. You then record each time the pattern fails. Since you only need to record the $30\%$ of the results that land tails up, you can be more efficient, so the entropy is lower.

Storing and Transmitting Data

Why does this matter?

Information theory is most commonly used in computer science to help you store or transmit information.

Suppose you have a black and white picture that you want to store on your computer without reducing its resolution at all. Each pixel is either black or white. We could get a completely random picture by tossing a fair coin for each pixel.

We could store this picture as a list (or an array) of $0$s for black pixels and $1$s for white pixels. In this case, the entropy for the picture would be the number of pixels – you would need to use one bit for each pixel.

Figure 4: Three 1000×1000-pixel pictures whose pixels are randomly chosen to be 0 (black) or 1 (white). Although there is a lot of microscopic information, the three pictures look the same macroscopically.

Most actual pictures are not completely random. There are usually patterns. One pattern might be that the upper left corner of the picture is entirely white. Another pattern might be that each pixel is more likely to be the same as the pixels next to it. Once we take these patterns into account, the entropy for the picture is reduced.

We can store the picture more efficiently by recording the patterns and when the pixels violate the patterns instead of recording each pixel independently.
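As a rough illustration (this is not how real image formats work, and the exact sizes will vary), a general-purpose compressor such as Python's zlib can squeeze a patterned picture far more than a completely random one:

```python
import random
import zlib

n_pixels = 1000 * 1000      # a 1000x1000 black-and-white picture
n_bytes = n_pixels // 8     # stored at one bit per pixel

random.seed(0)

# "Random" picture: every pixel is an independent fair coin toss.
random_bits = bytes(random.getrandbits(8) for _ in range(n_bytes))

# "Patterned" picture: rows alternate all-black and all-white,
# so almost every pixel matches its neighbors.
patterned_bits = (b"\x00" * 125 + b"\xff" * 125) * (n_pixels // 2000)

print(len(random_bits), "->", len(zlib.compress(random_bits)))        # stays essentially the same size
print(len(patterned_bits), "->", len(zlib.compress(patterned_bits)))  # shrinks dramatically
```

The random picture really does need close to one bit per pixel, matching its maximal entropy; the patterned picture is almost entirely described by its pattern, matching its much lower entropy.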


Chaos Theory

Chaos is our description of how complicated motion, structure, or behavior arises in deterministic systems.

Sensitive Dependence on Initial Conditions

One of the key features of chaos is sensitive dependence on initial conditions. Even if two trajectories start extremely close together, they will not remain close together for very long. The distance between them grows until their motion appears to be completely independent of each other.

Having sensitive dependence on initial conditions is not enough to have chaos. A ball at the top of a hill or a pencil balanced on its tip both have sensitive dependence on initial conditions. You can change which way the pencil falls or which direction the ball rolls by only the slightest disturbance. And yet we wouldn’t describe this motion as chaotic.

We don't describe these as chaotic because their long-time behavior is simple. The ball rolls to the bottom of the hill and stops. The pencil ends up lying on the table. Something chaotic should continue to move in complicated, unpredictable patterns after a long time.

For a chaotic system, there should be lots of qualitatively distinct patterns after a long time.

Entropy in Chaos Theory

In order to introduce the idea of entropy here, we need to specify something macroscopic and some microscopic things that we are going to count. The macroscopic thing is the chaotic system. The microscopic things are the different qualitatively distinct patterns that the motion could follow.

Entropy counts the number of qualitatively distinct patterns in this chaotic system. A chaotic system with higher entropy moves in more distinct patterns than a system with lower entropy.

Entropy thus provides a measurement of how chaotic the system is. To distinguish the entropy as it appears in a chaotic setting from other kinds of entropy, we refer to it as “topological entropy”.

The most familiar example of chaos is the weather. In this example, the macroscopic thing is the climate. The microscopic things are the different weather patterns. Some climates support more distinct weather patterns than others, so we would say that those climates have higher entropy and are more chaotic.

The weather patterns that you can observe depend on how long you watch the weather. You can't say that it has been over 30 degrees every day for the last month if you only bought a thermometer a week ago. The number of distinct weather patterns increases as you observe the weather for longer periods of time. The rate at which the number of weather patterns increases is more interesting than the number of distinct week-long weather patterns that occur.

For chaotic systems, the number of distinct dynamical patterns increases exponentially with time.

The topological entropy is defined as the growth rate of the number of distinct dynamical patterns.
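Stated a little more precisely (this is the standard definition; the symbols here are mine): if $N(t)$ is the number of distinguishable patterns of motion that can occur over a time span $t$, then for a chaotic system $N(t)$ grows roughly like $e^{ht}$, and the topological entropy is the growth rate $h = \lim_{t \to \infty} \frac{1}{t} \ln N(t)$.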

I will describe all of this in more detail in my What Is Chaos? series.


Conclusion

Entropy is a way of counting a large number of similar things.

The details – what to count and how to count them – depend on what situation you use entropy in.

In statistical mechanics, entropy measures the number of ways that the atoms can be arranged without changing any of the macroscopic properties like length or temperature. If you leave something alone for a while, it will most likely end up with the macroscopic properties that have the greatest number of ways to arrange the atoms, i.e. the macroscopic properties with the highest entropy. This is the second law of thermodynamics.

In information theory, entropy measures the number of bits of information you need to describe a random process or a set of data. The more random the data is, the more bits are needed to store it, and the higher its entropy is.

In chaos theory, entropy measures the exponential growth rate of the number of distinct behaviors the chaotic motion has. If there are more different ways that the object can move, the chaos has higher entropy.

In all three situations, there are multiple microscopic arrangements (atomic states, bits, or distinct patterns of motion) for every macroscopically equivalent situation (the temperature of an object, a probability distribution, or a chaotic system).

Thoughts?