*Entropy measures quantify the uncertainty in the EEG, which roughly corresponds to the number of possible configurations the signal can take and how predictable it is. However, there are many method and parameter choices that can fundamentally change the result and its meaning.*

### The basic idea of entropy

Entropy as a concept first originated in the field of thermodynamics, where a typical physical interpretation of entropy is the disorder of a system, described by the probability distribution of the molecules of a gaseous or fluid system. Shannon introduced this concept into the field of information theory and defined what is commonly known as statistical entropy,

H = -Σ p(x)log(p(x))

To make the concept of statistical entropy more intuitive, consider an experiment of picking a number from the set S = {1, 2, 3}. Say in one case each number is equally likely to be picked:

P(1) = 1/3, P(2) = 1/3, and P(3) = 1/3.

Plugging these values into the equation above (using the natural logarithm), the entropy H turns out to be 1.09.

If instead you could only pick the number 1, there would be only one possibility, with the following probabilities:

P(1) = 1, P(2) = 0, and P(3) = 0,

for which it turns out that H = 0.

Let’s consider one more case, where you can pick either 1 or 2 but not 3. Then the probabilities are

P(1) = 1/2, P(2) = 1/2, and P(3) = 0,

Now it turns out that H = 0.69, which is in between. So entropy is maximal when the number of possibilities is highest and minimal when there is only one possibility. Another way of looking at this is that entropy is maximal when all outcomes are equally likely, and the degree of uncertainty or ignorance about the outcome is therefore highest. When we have more certainty about the outcome, entropy is lower. When entropy is zero, we have maximum information: there is no need to carry out the experiment, since we know the outcome is always 1!
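The three cases above can be reproduced in a few lines of Python (a minimal sketch; note the natural logarithm, which is what gives the 1.09 and 0.69 values quoted here):

```python
import math

def shannon_entropy(probs):
    """H = -sum(p * log p) over outcomes with p > 0 (natural logarithm)."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return abs(h)  # avoid returning -0.0 for the certain case

print(shannon_entropy([1/3, 1/3, 1/3]))  # ln(3) = 1.0986..., rounded to 1.09 in the text
print(shannon_entropy([1, 0, 0]))        # 0.0
print(shannon_entropy([1/2, 1/2, 0]))    # ln(2) = 0.6931...
```

Outcomes with zero probability contribute nothing to the sum, which is why the skip of `p = 0` terms is safe.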

So, entropy simply quantifies the uncertainty or ignorance in a statistical sense, which is equivalent to the number of possible configurations of the system. This is a rather coarse interpretation, but it helps in forming some intuition about the concept.

**Entropy in EEG**

Applying the concept of entropy to a time series like the electroencephalogram (EEG) is a way to quantify, in a statistical sense, the amount of uncertainty or randomness in the pattern, which is also roughly equivalent to the amount of information contained in the signal. Entropy measures in the time domain generally break the signal into segments that are then compared for similarity. This typically depends on a few fundamental parameters – the length of the segment chosen, the transformation of the signal (if any), and the distance metric, i.e., the way the segments are compared. Other types of entropy first transform the EEG signal into the frequency domain, using methods such as the Fourier transform or more complex methods such as wavelets, and then quantify the distribution of these transformed characteristics.

In making these various choices, each entropy measure makes some implicit assumptions about what aspect of the signal is meaningful or important to quantify. Some entropy measures look at the vector distance between segments in the time domain, while others look at transformed elements of the signal such as spectral content or an oscillatory or wavelet component. This has real implications: if you happen to pick an irrelevant aspect of the signal, the result may turn out to be meaningless. Consider, for example, trying to compare words without knowing that they were words, looking instead at how different the letter shapes are from each other – b may be more similar in shape to d than to x, but this misses the point of language. The challenge is that *a priori* we don’t fully know which aspect of the EEG signal is relevant.

So where has this been useful for EEG? So far its most significant impact has been in anesthesia, where most measures of entropy decrease under anesthesia, suggesting that the signal becomes more predictable or repetitive as you go under. Entropy has also been used to classify disease states such as schizophrenia, and there may well be other aspects of cognition where it is relevant.

**Assumptions and issues**

Assuming that the aspect of the signal chosen is relevant, there are several issues to keep in mind. One is noise in the signal, which will affect the entropy measure, particularly for time-domain measures with shorter segment lengths. Another common assumption made by these measures is that the signal at hand is stationary, i.e., that if the data were divided into multiple windows, the statistical distribution of values in every window would be identical. This is often not the case, particularly for EEG signals and computation of the power spectrum. A common way to mitigate this problem is to divide the data into multiple ‘stationary’ segments, but there is no real consensus on what the length of such a segment should be for EEG data. These methods also typically require fairly large amounts of data (which may itself violate the ‘stationarity’ requirement), making their application to experimental EEG data challenging. Although sample entropy (SampEn) has been shown to be more consistent and to work well with shorter data lengths, low signal-to-noise ratio still remains an issue.

To summarize, entropy measures quantify, in a statistical sense, the irregularity or uncertainty in a biological signal like the EEG. This can be done in either the time or the frequency domain. The choice of transform and all the associated parameters make implicit assumptions about what is important in the signal and can produce very different results.

We provide below a tutorial of common entropy measures.

**Entropy in the time-domain**

Two popular choices in the time domain are approximate entropy and sample entropy. These are used to quantify the amount of repeatability or predictability in the waveform patterns of an EEG signal.

Computation of approximate or sample entropy depends on three parameters – 1) the segment length m (the length of signal used for comparison), 2) the threshold for similarity r, and 3) the data length N. It basically works like this:

Given an EEG time series of N points, we create a series of smaller segments of length m. Each such segment, which we call **x**(i) here, is nothing but a block of data of length m starting at time point i in the EEG. We do this for each point i from the first to the last one for which an m-length segment still fits. For an EEG signal of length N, there will be N-m+1 such segments (i.e. **x**(1), **x**(2), … **x**(N-m+1)). Then we seek the answer to this question: **how similar is segment x(i) to the rest of the segments?** Here similarity is defined using the threshold r applied to a distance measure between two segments of data (essentially the distance between two m-dimensional points). If the distance is less than the threshold r, we give the pair a value of 1 (the segments are similar), else a value of 0. For each segment **x**(i) we then compute a quantity C(i,r,m), which is the fraction of segments similar to segment i:

*C(i,r,m) = (number of segments similar to segment x(i)) / (total number of segments).*

Now we compute the average of the logarithm of these similarity fractions over all the segments (the logarithm makes it so that very small fractions don’t dramatically skew the average).

*A(m,r) = 1/(N-m+1) Σ log(C(i,r,m))* (remember that N-m+1 is the number of segments)

We then repeat the above procedure with segment length m+1 and analogously define B(m+1,r). Approximate entropy is then computed as

*AppEn(m,r) = A(m,r) – B(m+1,r)*

The lower the value of approximate entropy, the more regular or repetitive the signal. Random or irregular signals tend to have higher values of approximate entropy. This makes sense because, if the signal is repetitive, there is not much difference between segment lengths m and m+1 when the above statistic is computed! Also note that, to avoid the situation where no segments are similar to **x**(i), which would result in log(0) (undefined!), approximate entropy counts self-matches, and this makes it biased.
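The procedure above can be sketched directly in NumPy. This is a minimal illustration, not an optimized implementation; the maximum (Chebyshev) distance between segments is assumed here, which is the conventional choice:

```python
import numpy as np

def approximate_entropy(signal, m, r):
    """AppEn(m, r) = A(m, r) - B(m+1, r), with self-matches included."""
    x = np.asarray(signal, dtype=float)
    N = len(x)

    def phi(m):
        # all N - m + 1 overlapping segments of length m
        segs = np.array([x[i:i + m] for i in range(N - m + 1)])
        # pairwise Chebyshev (maximum) distances between segments
        dist = np.max(np.abs(segs[:, None, :] - segs[None, :, :]), axis=2)
        # C(i, r, m): fraction of segments within r of segment i
        # (the diagonal gives distance 0, so self-matches are counted and C > 0)
        C = np.mean(dist <= r, axis=1)
        return np.mean(np.log(C))

    return phi(m) - phi(m + 1)

# A repetitive sine wave scores lower than random noise
rng = np.random.default_rng(0)
t = np.linspace(0, 8 * np.pi, 300)
print(approximate_entropy(np.sin(t), m=2, r=0.2))            # low (regular)
print(approximate_entropy(rng.standard_normal(300), m=2, r=0.2))  # higher (irregular)
```

The pairwise distance matrix makes this O(N²) in memory, which is fine for short demo signals but worth optimizing for long recordings.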

Sample entropy differs from approximate entropy in two ways –

1) It does not count self-matches, and

2) It takes one logarithm of a ratio of overall match counts, rather than averaging the logarithm of per-segment fractions.

Instead, one computes the statistic A(m,r) as

*A(m,r) = (number of x(j) vectors within threshold r of x(i)) / (N-m)*

where j = 1…N-m and j ≠ i. One similarly defines B(m+1,r) and computes sample entropy as

*SampEn = -log (B(m+1,r) / A(m,r))*

(The ratio is at most one, since segments that match at length m+1 necessarily also match at length m, so SampEn is non-negative.)

Again, smaller values of sample entropy indicate repeatability in a signal and higher values indicate irregularity.
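Sample entropy can be sketched similarly (again assuming the Chebyshev distance; the ratio of match counts is arranged so that the result is non-negative, since matches at length m+1 are a subset of matches at length m):

```python
import numpy as np

def sample_entropy(signal, m, r):
    """SampEn(m, r): self-matches excluded; one logarithm of the ratio of
    total match counts at segment lengths m + 1 and m."""
    x = np.asarray(signal, dtype=float)
    N = len(x)

    def matches(m):
        # overlapping segments of length m
        segs = np.array([x[i:i + m] for i in range(N - m)])
        # pairwise Chebyshev (maximum) distances between segments
        dist = np.max(np.abs(segs[:, None, :] - segs[None, :, :]), axis=2)
        # count similar pairs, excluding the diagonal (no self-matches)
        return np.sum(dist <= r) - len(segs)

    A = matches(m)      # similar pairs at segment length m
    B = matches(m + 1)  # similar pairs at segment length m + 1
    return -np.log(B / A)  # inf if no matches survive at length m + 1
```

As with approximate entropy, a periodic signal yields a small value and noise a large one; unlike approximate entropy, SampEn can become undefined (infinite) on very short or very irregular data when no length-(m+1) matches remain.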

**Entropy in the frequency-domain**

The basic idea of computing entropy in the frequency domain is to transform the time-domain signal into the frequency domain using standard tools like the Fourier transform or more advanced methods like wavelets. This gives rise to two different entropy measures – 1) spectral entropy and 2) total wavelet entropy.

*Spectral entropy*

Spectral entropy requires the power spectral density (PSD) of an EEG signal, which is obtained via the discrete Fourier transform (DFT). Given two frequency points of interest, let’s say f1 and f2, the power spectrum between these frequencies is normalized and spectral entropy is computed as defined by Shannon entropy

SE = -Σ P_norm log(P_norm),

where the sum is taken over all the frequencies between f1 and f2. For a mono-frequency, periodic signal, SE will be close to zero (think about the experiment we described above where P(X=1) = 1!), whereas for a white-noise random signal, SE will be much higher, as random noise contains power at all frequencies (much like the experiment above where all the outcomes were equally likely).
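A minimal sketch of spectral entropy, using a plain periodogram as the PSD estimate (in practice one might prefer Welch’s method; fs, f1, f2 are assumed to be in Hz):

```python
import numpy as np

def spectral_entropy(signal, fs, f1, f2):
    """Shannon entropy of the normalized power spectrum between f1 and f2 Hz."""
    x = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2            # periodogram estimate of the PSD
    p = psd[(freqs >= f1) & (freqs <= f2)]       # keep only the band of interest
    p_norm = p / p.sum()                         # normalize to sum to 1
    p_norm = p_norm[p_norm > 0]                  # drop empty bins to avoid log(0)
    return -np.sum(p_norm * np.log(p_norm))

fs = 100.0
t = np.arange(0, 10, 1 / fs)                     # 10 s at 100 Hz
rng = np.random.default_rng(0)
print(spectral_entropy(np.sin(2 * np.pi * 10 * t), fs, 1, 40))   # near zero
print(spectral_entropy(rng.standard_normal(len(t)), fs, 1, 40))  # much higher
```

The sine concentrates all its power in one frequency bin (the P(X=1)=1 case), while the noise spreads power across every bin in the band (the equally-likely case).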

See related post: *The Blue Frog in the EEG*.

*Total wavelet entropy*

One can use wavelets to decompose an EEG signal into multiple resolution levels and compute the relative energy at each level j as

*p(j) = ( Energy at level j ) / (total energy of all levels)*

Now the total wavelet entropy, following Shannon, is defined as

TWE = -Σ p(j)log(p(j))

where the sum is taken over all the decomposition levels. TWE measures the amount of order or disorder in a signal. Just as with spectral entropy, a sinusoidal signal will have a TWE value close to zero, while a random signal, with its energy spread over all the bands, will have a high TWE value.
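To illustrate, here is a minimal sketch using a hand-rolled Haar decomposition (chosen only to keep the example dependency-free; in practice one would typically use a smoother wavelet such as a Daubechies family member via a library like PyWavelets):

```python
import numpy as np

def haar_decompose(x, levels):
    """Multi-level Haar decomposition: detail coefficients at each level,
    plus the final approximation."""
    a = np.asarray(x, dtype=float)
    coeffs = []
    for _ in range(levels):
        a = a[: (len(a) // 2) * 2]                        # drop a trailing odd sample
        coeffs.append((a[0::2] - a[1::2]) / np.sqrt(2))   # detail at this level
        a = (a[0::2] + a[1::2]) / np.sqrt(2)              # approximation, carried on
    coeffs.append(a)
    return coeffs

def total_wavelet_entropy(x, levels=5):
    energies = np.array([np.sum(c ** 2) for c in haar_decompose(x, levels)])
    p = energies / energies.sum()                # relative energy p(j) at each level
    p = p[p > 0]                                 # skip empty levels to avoid log(0)
    return -np.sum(p * np.log(p))                # TWE = -Σ p(j) log p(j)

# A slow sine concentrates energy in a few levels; noise spreads it across all
rng = np.random.default_rng(0)
t = np.arange(2048) / 100.0                      # ~20 s at 100 Hz (assumed rate)
print(total_wavelet_entropy(np.sin(2 * np.pi * 1.0 * t)))  # low
print(total_wavelet_entropy(rng.standard_normal(2048)))    # higher
```

A constant signal puts all its energy in the final approximation level, so its TWE is exactly zero, mirroring the single-outcome case from the beginning of the post.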