What is Chaos? Part VII: Partitions and Symbols

I chose the name The Chaostician for this blog. To live up to this name, this blog should be about chaos, in particular, what a physicist means when referring to ‘chaos theory’.

This eight-part series is my explanation of chaos theory to a popular audience. Chaos is a mechanism that allows deterministic objects to behave unpredictably. I will explain why this happens and what kinds of predictions we can make when something is chaotic.

Those of you who are already experts in nonlinear dynamics know that there are many ways to approach chaos. In this series, I will focus on behavior on strange attractors in dissipative chaos. In particular, the goal is to explain the ideas underlying periodic orbit theory and trace formulas you can use to calculate expectation values. Other series will address other approaches to chaos.

I know I said that the series would end this week. But that would make this post too long and too dense with ideas, so I've split it into two posts.

Part VII begins by stating our final goal. Since we cannot make predictions for a single trajectory if there’s chaos, we should try to make statistical predictions instead. We cannot even approach this less ambitious goal directly. Instead, we first divide the space the chaos moves through into qualitatively different regions, each of which is labeled by some symbol. We then record the motion as a list of symbols of the regions it goes through. Doing this allows us to finally get a good definition of ‘chaos’: motion where the number of qualitatively different behaviors increases exponentially in time. The ‘topological entropy’ helps us make that definition more precise. Next week we will finally reach our goal.


Part I: Introduction.
Part II: The Simplest Chaotic System.
Part III: Lyapunov Exponents.
Part IV: Strange Attractors.
Part V: Continuous Time.
Part VI: Stretch and Fold.

Part VII: Partitions and Symbols.
Part VIII: Periodic Orbit Theory.


Prerequisites: I hope this is understandable with high-school-level math, although I once again use a limit. If anything is unclear, please ask – in the comments or by emailing thechaostician@gmail.com.

Originally Written: August – November 2017.

Confidence Level: Established science since the mid-twentieth century. Occasional philosophy mixed in.



Global Stationary Distribution

One of the primary goals of physics is to determine the long-time behavior of a system given its initial conditions. For a few special systems, we have figured out how to solve this problem exactly. For the overwhelming majority of systems, we can at best do approximations. Many of these approximations are done numerically: approximate the actual (complicated) system as a long sequence of simple problems and have a computer do the rest of the work.

It is extremely difficult to determine the long-time behavior of chaotic systems. Trying to exactly solve a chaotic system is a futile endeavor. Approximations only work for a short amount of time. Making predictions for a longer time is thwarted by positive Lyapunov exponents. Whenever you make an approximation, you introduce a small error to your solution. This isn’t a problem as long as the error is small. Because chaotic systems have sensitive dependence on initial conditions, any errors grow exponentially in time. Even if your approximation is reasonable initially, after a long time, the approximate solution stops modeling the actual solution.
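To make this concrete, here is a minimal sketch (mine, not from the original post) that treats a tiny perturbation of the logistic map from Part II as an approximation error:

```python
# Sketch: why approximation errors doom long-time chaotic predictions.
# Two logistic-map (r = 4) trajectories start 1e-12 apart; think of the
# difference as a tiny numerical error.
r = 4.0
x, y = 0.4, 0.4 + 1e-12
for t in range(1, 61):
    x, y = r * x * (1 - x), r * y * (1 - y)
    if t % 10 == 0:
        print(t, abs(x - y))
# The separation roughly doubles each step (Lyapunov exponent log 2),
# so an error of 1e-12 reaches order one after about 40 steps.
```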

We can’t make an accurate prediction of what will happen to any particular initial condition after it spends a long time changing chaotically. But we can still ask whether some other kind of prediction about the long-time behavior is possible.

What are the best predictions we can make for the long-time behavior of a chaotic system?

The best predictions that we can make are statistical. Although we can’t say exactly where the object will be at a certain time, we might be able to predict the probability of the object being at each location.[1] The probability of finding the object at any point after a long time is called the global stationary distribution.

There is a single global stationary distribution as long as the system only has a single attractor. After a long time, the system will settle down into only one type of behavior, regardless of where it starts. If there are multiple basins of attraction, then each will have its own stationary distribution. To determine even the statistical long-time behavior, you have to know which basin of attraction you are starting in.

Once you have the global stationary distribution, you can calculate the object’s average position, the standard deviation of the object’s position, and any other statistical “observables”.[2]
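As a rough illustration (my own sketch, using the logistic map at $r = 4$ as the example): if time averages along one long trajectory match averages over the stationary distribution, then statistical observables can be estimated without ever writing the distribution down. For $r = 4$ the stationary distribution is known, with mean $\tfrac{1}{2}$ and standard deviation $\sqrt{1/8} \approx 0.354$:

```python
# Sketch: statistical observables from one long chaotic run, assuming
# time averages converge to averages over the stationary distribution.
import math

r, x = 4.0, 0.3
n = 1_000_000
total, total_sq = 0.0, 0.0
for _ in range(n):
    x = r * x * (1 - x)
    total += x
    total_sq += x * x

mean = total / n
std = math.sqrt(total_sq / n - mean * mean)
print(mean, std)   # should come out near 0.5 and 0.354 for r = 4
```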

Unfortunately, it is extremely difficult to calculate the global stationary distribution if there’s chaos. After a long time, the motion will be concentrated near the (fractal) strange attractor, and the approximations you make to keep the computation tractable alter or destroy the attractor’s small-scale fractal structure. Moreover, any location that was once the corner of a fold has a spike in probability. Since there have been many folds after a long time, there are many spikes: two nearby locations can have dramatically different probabilities, even if they are both on the strange attractor. All of this small-scale structure can be destroyed by the approximations needed for computation.

Nevertheless, I will show a numerical approximation for the global stationary distribution of the Hénon attractor.

Figure 1: The global stationary distribution for the Hénon map. The strange attractor can be seen in the $x,y$ plane. The probability of finding the object at each location after a long time is shown vertically (labeled as $\mu$). Too much approximation will alter or destroy both the fractal structure of the attractor and the narrow spikes in probability along the attractor. Source: chaosbook.org, Figure 19.5 (Chapter: Transporting Densities).
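Figure 1 was made with far more care than this, but the basic idea can be sketched in a few lines (my sketch, using the standard Hénon parameters $a = 1.4$, $b = 0.3$): iterate the map, discard the transient, and count how often each small bin is visited. The bin width is exactly the kind of approximation that smears out the fractal structure and the spikes:

```python
# Sketch: a coarse histogram approximation of the Henon map's global
# stationary distribution. Bins 0.01 wide destroy any structure finer
# than the bin size, as warned above.
from collections import Counter

a, b = 1.4, 0.3
x, y = 0.0, 0.0
counts = Counter()
n, transient = 1_000_000, 1_000
for t in range(n):
    x, y = 1.0 - a * x * x + y, b * x
    if t >= transient:
        counts[(round(x, 2), round(y, 2))] += 1

# mu(bin) ~ fraction of time spent in each bin; print the spikiest bins.
for (bx, by), c in counts.most_common(5):
    print(bx, by, c / (n - transient))
```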

Even though the global stationary distribution is prohibitively difficult to calculate for most chaotic systems, we don’t need to abandon making statistical predictions for long-time chaotic motion.


Partitioning the Space

The approximations necessary to compute any long-time chaotic behavior alter or destroy the small-scale structure of the attractor and the global stationary distribution. If the small-scale structure is going to be destroyed anyway, we might as well destroy it in an intelligent way.

Recording the exact location of the object at every time requires a lot of information. Instead, we can partition the space into qualitatively different regions. We then only record which region the object is in at every time.

Choosing a partition is somewhat of an art. You want to divide the space that the object moves through into as few pieces as possible, while still capturing all the qualitative differences between the types of motion. Let’s look at some good partitions for our examples (a short code sketch of these rules follows the list):

  • The three disk pinball has an obvious partition: the three disks. To record a trajectory, simply list which disks the pinball bounces off of in order.
  • The space for the logistic map is a line between 0 and 1. We partition it into the region $ \geq \tfrac{1}{2} $ and the region $ < \tfrac{1}{2} $. We do not choose this partition because it divides the space into equal pieces. We choose this partition so that the boundary between the two regions is at the top of the parabola. If the top were shifted to one side (perhaps by adding a cubic term), the partition should be shifted as well.
  • The Hénon map, much like the logistic map, stretches and folds according to something of the form: $x_t = \ \dots \ {x_{t-1}}^2$. The partition is based on which side of the fold the current location is on. For the Hénon map, partition the two-dimensional space into the region $x \geq 0$ and the region $x < 0$.
  • We also partition the Rössler attractor into two regions based on which side of the fold the current location is on. It is easiest to make this partition for a Poincaré section near $\theta = \tfrac{\pi}{2}$ because it is easiest to see the fold there.
Figure 2: A Poincaré section at $\theta = \tfrac{\pi}{2}$ for the Rössler attractor. The attractor is partitioned into two regions determined by the fold. The lower one (L) is shown in red and the upper one (U) is shown in blue.
  • The partition for the Lorenz system is the two wings of the “butterfly”.
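Here is a minimal sketch of these partition rules as code. The pinball, logistic, and Hénon rules follow the descriptions above; for the Lorenz system I assume the common convention of labeling the wings by the sign of $x$:

```python
# Sketch: partitions as functions from a state to a symbol.

def pinball_symbol(disk: int) -> str:
    return "123"[disk]                 # the disk itself is the symbol

def logistic_symbol(x: float) -> str:
    return 'R' if x >= 0.5 else 'L'    # cut at the top of the parabola

def henon_symbol(x: float, y: float) -> str:
    return 'R' if x >= 0.0 else 'L'    # which side of the fold

def lorenz_symbol(x: float, y: float, z: float) -> str:
    # Assumed convention: the two wings sit at positive and negative x.
    return 'R' if x > 0.0 else 'L'
```

For instance, `logistic_symbol(0.9216)` returns 'R', matching the third row of Figure 3 below.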

Symbolic Dynamics

Now we have our partition.

When we calculate some chaotic trajectory, we don’t need to record the position at every time. Instead, we can just record a sequence of symbols for which part of the attractor the object is moving through. This requires much less memory and gives us something simpler to analyze.

For a discrete time system, a new symbol should be recorded at each time step. For a continuous time system, a new symbol should be recorded each time the trajectory loops around again. It could be recorded each time the trajectory crosses a Poincaré section or you could just observe the path by eye. Figure 3 shows an example of discrete time symbolic dynamics and Figure 4 shows an example of continuous time symbolic dynamics.

Time | Location | Symbol
1 | 0.100000000000000… | L
2 | 0.360000000000000… | L
3 | 0.921600000000000… | R
4 | 0.289013760000000… | L
5 | 0.821939226122650… | R
6 | 0.585420538734197… | R
7 | 0.970813326249438… | R
8 | 0.113339247303761… | L
9 | 0.401973849297512… | L
10 | 0.961563495113813… | R
11 | 0.147836559913285… | L
12 | 0.503923645865164… | R
13 | 0.999938420012499… | R
14 | 0.000246304781624… | L
15 | 0.000984976462315… | L
16 | 0.003936025134734… | L
17 | 0.015682131363489… | L
18 | 0.061744808477550… | L
19 | 0.231729548414484… | L
20 | 0.712123859224412… | R
21 | 0.820013873390967… | R
22 | 0.590364483349242… | R
23 | 0.967337040596098… | R
24 | 0.126384361947522… | L
25 | 0.441645420010560… | L
26 | 0.986378971977024… | R
27 | 0.053741982474292… | L
28 | 0.203415127176100… | L
29 | 0.648149652848124… | R
30 | 0.912206721443921… | R
31 | 0.320342475185814… | L
32 | 0.870892695110561… | R
33 | 0.449754434854499… | L
34 | 0.989901532732837… | R
35 | 0.039985952904069… | L
36 | 0.153548305897691… | L
37 | 0.519884894614559… | R
38 | 0.998418363864672… | R
39 | 0.006316538249856… | L
40 | 0.025106558377575… | L
41 | 0.097904876416033… | L
42 | 0.353278046359976… | L
43 | 0.913890673280218… | R
44 | 0.314778042286590… | L
45 | 0.862771305523247… | R
46 | 0.473587919555836… | L
47 | 0.997209608026444… | R
48 | 0.011130422744760… | L
49 | 0.044026145737131… | L
50 | 0.168351376914654… | L
Figure 3: The logistic map with $r = 4$, starting at 0.1. Fifty time steps are shown. The space is partitioned into two parts: $L$ when $x < \tfrac{1}{2}$ and $R$ when $x \geq \tfrac{1}{2}$. The symbolic dynamics are:
LLRLRRRLLRLRRLLLLLLRRRRLLRLLRRLRLRLLRRLLLLRLRLRLLL
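Here is a minimal sketch that should reproduce Figure 3 (up to floating-point rounding):

```python
# Sketch: the symbolic dynamics of Figure 3. Record a symbol for the
# current location, then apply the logistic map with r = 4.
r, x = 4.0, 0.1
symbols = []
for t in range(1, 51):
    symbols.append('R' if x >= 0.5 else 'L')
    x = r * x * (1 - x)
print(''.join(symbols))   # should print LLRLRRRLLRLRRLLLLLLR...
```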

Figure 4: An orbit of the Rössler system starting on the strange attractor. As time increases from 0 to 75, the color of the orbit changes from red to blue. The attractor is partitioned into two parts: $U$, the upper part of the attractor which folds over, and $L$, the lower part of the attractor. In order to be in $L$, the entire circle must remain in the flat part of the attractor. Instead of recording the entire orbit, we can just record the symbolic dynamics:
U L U U U U U U U L U L U (ULUUUUUUULULU)

The list of L’s and R’s in the symbolic dynamics for the logistic map with $r=4$ looks completely random. There are just as many R’s as L’s and you can find sequences of either L or R of arbitrary length in the list. Indeed, the logistic map is a decent random number generator. After running the logistic map for a long time, the numerical errors, amplified by the positive Lyapunov exponent, overwhelm whatever information was contained in the initial condition. Any finite sequence of R’s and L’s will eventually be found in the chaotic symbolic dynamics.
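As a sketch of that last claim (mine, not from the post), here is a search for one particular word: ten R’s in a row. Each ten-symbol word has probability $2^{-10}$ under the stationary distribution, so it should typically turn up within a few thousand steps:

```python
# Sketch: waiting for the word "RRRRRRRRRR" (ten R's in a row) to appear
# in the r = 4 symbolic dynamics.
r, x = 4.0, 0.3
run, t = 0, 0
while run < 10:
    x = r * x * (1 - x)
    run = run + 1 if x >= 0.5 else 0   # length of the current run of R's
    t += 1
print("ten R's in a row first completed at step", t)
```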

The symbolic dynamics for the Rössler attractor are less random. In particular, there are far fewer L’s than U’s, and the L’s are all isolated from each other. If you run the Rössler system for long enough, you will find sequences of two L’s together, but no longer sequences. U’s can appear in arbitrarily long sequences. You will also find that a pair of L’s is always followed by a UL. You never see a pair of L’s followed by a pair of U’s.

We call this sort of pattern a “pruning rule”. For most chaotic systems, not all sequences of the symbolic dynamics are possible.


Topological Entropy

The intuitive idea here is that something is chaotic if it moves in lots of different ways. Topological entropy is a way of measuring the number of different ways something can move. It will give us a clear and precise definition of chaotic motion.

Entropy is a way of counting a large number of similar things. The entropy of something macroscopic tells you how many ways you can rearrange the microscopic structure without changing any of the macroscopic properties.[3]

For chaos, the macroscopic object is the chaotic system – for example, a strange attractor. The microscopic details are the various distinct patterns something can follow as it travels around the attractor. This type of entropy is called “topological entropy”. It provides a measurement of the number of distinct trajectories that exist in the chaotic system.

The number of distinct trajectories grows as you watch the chaos for longer. The rate at which the number of patterns increases is more interesting than the number of distinct patterns that occur after some fixed amount of time.

For chaotic systems, the number of distinct dynamical patterns increases exponentially with time. The topological entropy is defined as the growth rate of the number of distinct dynamical patterns.

We use the partitions and symbolic dynamics described above to determine whether two patterns are distinct. For some partition, two trajectories are not distinct up until time $T$ if their symbolic dynamics up until time $T$ are the same. Two trajectories are distinct in time $T$ if their symbolic dynamics up to time $T$ are different. Call the number of distinct trajectories, which depends on how long you watch the dynamics, $N_T$.

We expect that the number of distinct trajectories will grow exponentially with time. If the partition has $a$ components and all possible symbolic dynamics are allowed, then the number of distinct trajectories would be: $$ N_T = a^T = e^{T \log(a)} \, .$$ Each additional time step (or return to the Poincaré section) adds one more letter to the list and there are $a$ possibilities for that letter. Chaos often has pruning rules, so the actual number of distinct trajectories is usually smaller than this.

For any chaotic system, we expect that the number of distinct orbits will grow like: $$ N_T \sim e^{h T} \, , $$ where $h$ is the topological entropy.

Now we do all of the same tricks that we used in the definition of the Lyapunov exponent. Take the $\log$ of both sides and divide by $T$ to isolate $h$. Because there are often short-time transients that aren’t significant to the long-time motion, take the limit as $T$ approaches infinity: $$ h \geq \lim_{T\rightarrow\infty} \frac{\log(N_T)}{T} $$ (Why an inequality rather than an equality? The next paragraph explains.) This quantity can be calculated for any partition of anything that moves. As long as it is positive, we know that the number of qualitatively distinct motions grows exponentially, so the motion is chaotic. If it is zero, then the number of qualitatively distinct motions does not grow exponentially, so the motion is not chaotic.
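Here is a sketch of that recipe for the logistic map with $r = 4$: generate a long symbolic sequence, count the distinct words $N_T$ of each length $T$, and watch $\log(N_T)/T$ settle toward $\log(2) \approx 0.693$:

```python
# Sketch: estimating topological entropy by counting distinct words in
# a long symbolic sequence.
import math

r, x = 4.0, 0.3
seq = []
for _ in range(200_000):
    x = r * x * (1 - x)
    seq.append('R' if x >= 0.5 else 'L')
seq = ''.join(seq)

for T in (2, 4, 8, 12):
    words = {seq[i:i + T] for i in range(len(seq) - T + 1)}
    print(T, len(words), math.log(len(words)) / T)
# For r = 4 every word occurs, so N_T = 2**T and the rate is log(2).
```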

There is one more problem that could arise. Maybe we chose a bad partition. If our partition isn’t good enough, then we might be missing some of the interesting dynamics. If this happens, then we will underestimate the topological entropy. There are other distinct patterns that we are unable to see using our partition. If we pick a partition first, then do the calculation, we only know that the topological entropy must be at least this large. Taking a finer partition might make the calculations more tedious, but it could give us a better answer. The definition of topological entropy should be calculated for the partition that gives the maximum entropy. For the mathematicians reading this, I should be more precise and say that we take the supremum of the entropies calculated for all partitions.

The definition of topological entropy is: $$ h = \sup_{\text{partitions}} \ \lim_{T\rightarrow\infty} \frac{\log(N_T)}{T} $$

Topological entropy is my favorite way of measuring how chaotic something is.

In all of the non-chaotic systems that we have looked at, there are only a few things that can happen after a long time. They tend to settle down onto one or a few fixed points or periodic orbits. There are also systems that have many periodic orbits but are not chaotic. For example, a frictionless pendulum’s motion is periodic, regardless of the amplitude you start it with. Although such a system never settles down onto a few orbits, the number of orbits does not grow as you wait for longer and longer times, so the topological entropy is zero.

All of the chaotic systems that we’ve considered have positive topological entropy.

After the end of the period-doubling cascade for the logistic map, there are (unstable) periodic orbits whose period is any power of two. The number of distinct motions grows – you could either be staying on one of the short periodic orbits or you could be on one of the long ones. Most orbits for the logistic map with these parameter values are not periodic, so this undercounts the possible patterns in the motion; a more careful count shows that the topological entropy is positive beyond the end of the cascade. When $r=4$, all possible symbolic sequences of L and R occur in the motion. The partition has $a=2$ components, so the number of distinct trajectories grows as: $$ N_T = 2^T = e^{T \log(2)} $$ The topological entropy for $r = 4$ is $h = \log(2)$.

Motion on the Rössler attractor also has positive topological entropy. Like the logistic map, its natural partition has two components. Because of the pruning rule, not all sequences of L and U are possible. There are fewer distinct long time behaviors, so the topological entropy is smaller than the topological entropy of the logistic map with $r=4$.

The pinball machine’s natural partition has three components corresponding to the three disks. The pruning rules constrain it significantly: you can’t return to the same disk without first hitting another disk. At each time in the motion, the next symbol has two possibilities, so the number of distinct motions grows like $2^T$. The topological entropy is $\log(2)$.
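A sketch of this count using a transition matrix (my construction, not from the post): entry $(i, j)$ is 1 if the ball may go from disk $i$ to disk $j$. Summing the entries of its powers counts admissible itineraries, and the growth rate approaches $\log(2)$:

```python
# Sketch: counting admissible pinball itineraries under the pruning rule
# "never hit the same disk twice in a row".
import math

A = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]   # allowed disk-to-disk transitions

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

M = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # identity: zero bounces so far
for T in range(1, 13):
    N_T = sum(map(sum, M))   # itineraries of length T (T - 1 transitions)
    print(T, N_T, math.log(N_T) / T)
    M = mat_mul(M, A)
# N_T = 3 * 2**(T - 1), so log(N_T)/T tends to log(2) ~ 0.693.
```

The same arithmetic in eigenvalue language: the largest eigenvalue of this matrix is 2, and the topological entropy is its logarithm.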

The topological entropy for the Hénon map and the Lorenz system (and also the logistic map) depends on the values of the parameters. The parameters are $r$ in the logistic map, $a$ & $b$ for the Hénon map, and the constants we fixed at $24.5$ & $\tfrac{8}{3}$ for the Lorenz system. The topological entropy often changes discontinuously as you change the parameters, when bifurcations within the chaos produce or destroy structure.

Motion on a strange repellor can also have positive topological entropy. As long as you remain on the strange repellor, there are more and more distinct patterns that you can move in. If you restrict your calculation to motion that is not on the repellor and instead approaches a fixed or periodic point (which is almost all initial conditions), you will measure the topological entropy in this region to be zero. We saw this for the logistic map with $r = 3.835$, which has transient chaos, but then settles down onto a periodic orbit of period three. If you make sure to include the strange repellor that causes the transient chaos, then you will find that the topological entropy is positive. If you aren’t careful, you might miss the transient chaos and falsely calculate the topological entropy to be zero.
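Here is a sketch of that transient chaos (how long the transient lasts depends sensitively on the starting point):

```python
# Sketch: transient chaos in the logistic map at r = 3.835. The orbit
# wanders irregularly (shadowing the strange repellor) before settling
# onto the stable period-3 cycle.
r, x = 3.835, 0.3
history = []
for t in range(3000):
    x = r * x * (1 - x)
    history.append(x)
print(history[:6])     # irregular, chaotic-looking transient
print(history[-6:])    # repeats with period 3 once settled
```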

Topological entropy is a good measurement for chaos. If the topological entropy is not zero, then the number of distinct patterns of the motion grows exponentially in time. This corresponds well with our intuitive notion of chaos – something that has many unpredictable behaviors. If something has a larger topological entropy, it has more possible behaviors, and so is more chaotic.

Next post.

References

1. This is actually a probability density, not a probability.
2. A statistical observable is any kind of average you can take using a probability distribution.
3. See also my post on Entropy.

Thoughts?