Entropy can separate randomness from certainty. In my previous post, Anomaly detection in univariate stochastic time series with spectral entropy, I showed the magic of spectral entropy. Now we will change the topic to anomaly detection in multivariate time series, where we will again use entropy in the solution.
Multivariate time series have N time series variables. Each variable depends not only on its own past values but also, to some degree, on the other variables. A typical anomaly occurs when the interactions or dependencies among those variables deviate from what we expect at some data points, while no single variable shows an apparent anomaly on its own. Otherwise, the problem reduces to univariate time series anomaly detection.
For instance, let’s talk about the weather. Temperature and humidity are generally negatively correlated: as the temperature increases, the humidity decreases. So even if the temperature is pretty high, as long as the humidity stays in the lower zone, matching the temperature accordingly, you will feel hot, but that’s a typical hot day, not an anomaly. I don’t know where you live, but you have probably experienced days in summer with very high temperature and humidity at the same time, which made you feel exceptionally uncomfortable. That’s when the correlation between the temperature and humidity series went wrong: an anomaly in the multivariate time series.
Correlation and Structural Entropy
Pearson correlation is a simple measurement of time series interactions. For a multivariate time series with N variables, we have C(N, 2) pairwise correlations (choose two from N), or equivalently an N×N symmetric correlation matrix (the upper and lower triangles are the same).
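As a quick illustration, here is a minimal pandas sketch on synthetic data (the variable names are made up):

```python
import numpy as np
import pandas as pd

# Toy multivariate time series with N = 3 variables (synthetic data,
# made-up variable names).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "temperature": rng.normal(size=100),
    "humidity": rng.normal(size=100),
    "wind": rng.normal(size=100),
})

# N x N symmetric Pearson correlation matrix; its upper triangle holds
# the C(N, 2) = 3 distinct pairwise correlations.
corr = df.corr(method="pearson")
print(corr)
```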
We want to see how much information this matrix contains. A common method is to apply a clustering algorithm to mine the structural information in the matrix. Below is an example of using hierarchical clustering on a correlation matrix.

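A minimal SciPy sketch of that clustering step might look like this (the distance transform and linkage method are my assumptions; the post doesn’t specify them):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_correlation_matrix(corr: np.ndarray, threshold: float = 0.8) -> np.ndarray:
    """Assign each variable a cluster label by cutting an average-linkage
    dendrogram built from the correlation matrix."""
    # Map correlations to distances so highly correlated pairs end up close.
    dist = 1.0 - np.abs(corr)
    np.fill_diagonal(dist, 0.0)  # squareform expects an exactly zero diagonal
    tree = linkage(squareform(dist, checks=False), method="average")
    # Cutting at distance 1 - threshold keeps pairs correlated above the
    # threshold (in absolute value) in the same cluster.
    return fcluster(tree, t=1.0 - threshold, criterion="distance")

labels = cluster_correlation_matrix(corr.to_numpy(), threshold=0.8)
```

Using the absolute correlation is a deliberate choice here: it treats strongly negatively correlated pairs, like temperature and humidity, as tightly linked rather than as far apart.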
Next, we will apply the idea of entropy. Entropy can be defined based on the matrix’s structure, specifically the number of significant clusters satisfying a certain threshold. Suppose all the variables are highly correlated: there will be only one cluster, and the correlation information contained in the multivariate time series is at its minimum. In the other extreme case, where none of the N time series are correlated, there will be N clusters, and the correlation information is at its maximum. Researchers proposed this entropy and named it Structural Entropy.
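The numbers later in this post (0 for one cluster, around 1 for two equal clusters, up to 2 for four independent variables) are consistent with reading structural entropy as the base-2 Shannon entropy of the cluster-size distribution. Here is a sketch under that assumption; the original paper’s exact definition may differ:

```python
import numpy as np

def structural_entropy(labels: np.ndarray) -> float:
    """Base-2 Shannon entropy of the cluster-size distribution:
    one big cluster -> 0 bits; N singleton clusters -> log2(N) bits."""
    _, sizes = np.unique(labels, return_counts=True)
    p = sizes / sizes.sum()
    return float(-(p * np.log2(p)).sum())

# Four variables, as in the trail example below:
print(structural_entropy(np.array([1, 1, 1, 1])))  # one cluster     -> 0.0
print(structural_entropy(np.array([1, 1, 2, 2])))  # two pairs       -> 1.0
print(structural_entropy(np.array([1, 2, 3, 4])))  # four singletons -> 2.0
```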
Using structural entropy, we can transform a group of time series into a single time series of entropy values, which captures the movement of the correlations we are interested in. This transformation is particularly helpful if we want to monitor the correlations and catch anomalies. We will see that in the following two examples with real-world data.
Trail Dataset
We have the dataset of the Seattle Burke Gilman Trail, generated by monitoring pedestrians and bikers on the trail. The description says, "Wires in a diamond formation in the concrete detect bikes, and an infrared sensor mounted on a wooden post detects pedestrians. The counters also capture the direction of travel for both bikes and pedestrians". This public dataset is available for download from Kaggle.
The dataset has four columns: Ped South, Ped North, Bike North, and Bike South, i.e., the counts of pedestrians and bikers traveling in the south and north directions. After aggregating the 2019 data at the day level, we have a multivariate time series with four variables. Each series has seasonality and trend. Plot 1 shows the four density plots.

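For reference, the daily aggregation takes a few lines of pandas; the file and column names below are guesses to be adapted to the actual Kaggle CSV schema:

```python
import pandas as pd

# Hypothetical file and column names; adjust them to the Kaggle CSV schema.
cols = ["Ped South", "Ped North", "Bike North", "Bike South"]
raw = pd.read_csv("burke_gilman_trail.csv", parse_dates=["Date"], index_col="Date")

# Keep 2019 and roll the hourly counts up to daily totals.
daily = raw.loc["2019", cols].resample("D").sum()
```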
It’s expected that the numbers of pedestrians/bikers in both directions should be highly correlated. For instance, if more bikers show up on the southbound trail on a day with nice weather, more bikers should show up on the northbound trail as well.
Plot 2 shows the data as multivariate time series.

If we scrutinize the plot, we can eyeball some interesting anomalies. For some days in August, while the number of pedestrians going south went up, the number of pedestrians going north went down. There are similar anomalous regions for the bikers in August.
To demonstrate those anomalies, I also plot two rolling correlations with a window size of 10 days on the secondary y-axis on the right. These two grey lines clearly show the correlations changing in August.
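Continuing from the `daily` frame sketched above, those rolling correlations are one-liners in pandas:

```python
# 10-day rolling Pearson correlations between the two pedestrian series
# and between the two bike series (the grey lines in Plot 2).
ped_corr = daily["Ped South"].rolling(10).corr(daily["Ped North"])
bike_corr = daily["Bike South"].rolling(10).corr(daily["Bike North"])
```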
Now let’s calculate the structural entropy. There are two parameters involved. The first is the window size; let’s stick with 10 days. The second is the threshold for clustering the correlation matrix. Since we believe the numbers in both directions should be highly correlated, let’s use 0.8. Plot 3 shows the final structural entropy curve.

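Putting the pieces together, a rolling version might look like the sketch below, reusing the cluster_correlation_matrix and structural_entropy helpers from earlier (a guard for degenerate windows, where a constant series yields NaN correlations, is omitted for brevity):

```python
import pandas as pd

def rolling_structural_entropy(df: pd.DataFrame, window: int = 10,
                               threshold: float = 0.8) -> pd.Series:
    """Structural entropy of each trailing `window`-day correlation matrix."""
    values = []
    for end in range(window, len(df) + 1):
        corr = df.iloc[end - window:end].corr().to_numpy()
        labels = cluster_correlation_matrix(corr, threshold)  # sketched earlier
        values.append(structural_entropy(labels))             # sketched earlier
    return pd.Series(values, index=df.index[window - 1:])

entropy_series = rolling_structural_entropy(daily, window=10, threshold=0.8)
```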
We can see the spike in August, implying that the uncertainty of the correlations increased. The curve tells us more than just the August anomaly, though. For most days from March to August, the value is 0, implying the four variables are highly correlated. For the rest of the year, the value is around 1, meaning there are two clusters in the matrix: Ped South with Ped North, and Bike South with Bike North. Generally, both directions are correlated, but pedestrians and bikers are not connected much. Perhaps the reason behind this is the temperature and weather.
In late August, the entropy goes up to 2, the maximum for four variables (four singleton clusters), implying there is no particular correlation among the four variables at all. What happened in August 2019? Should we treat it as a real anomaly?
Honestly, I don’t know the reason. I did find some ‘news’ after searching "Seattle Burke Gilman Trail 2019 August": from August 16th to August 18th, the trail was closed for tree-removal work. This event may have caused disruption lasting longer than three days: people may detour after they see the work notification signs, or even after the job is done. The locations of the work, the sensors, and the signs may all affect how people choose their routes.
But from the data perspective, August indeed looks abnormal. If we zoom in on the data for August, we would label August 4th, August 13th, and August 24th as apparent anomalies.

This example shows that structural entropy catches the odd moments when the correlations of a multivariate time series change from predictable and deterministic to unpredictable and stochastic.
Next is an example of an anomaly in the opposite direction.
Market Dataset
The idea of structural entropy was first applied in stock market analysis. In this example, anomalies appear the other way around: as certainty emerging from randomness, or correlation emerging from a bunch of uncorrelated series.
I downloaded the price history of around 400 leading Canadian stocks from the Yahoo API. Then I calculated the daily return in percentage, obtaining a time series for each stock. Plot 5 shows 20 of those daily return time series.

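The return computation itself is a one-liner once the prices are in a DataFrame; the file name below is a placeholder for however you persisted the Yahoo download:

```python
import pandas as pd

# `prices` holds daily closing prices, one column per ticker, saved
# beforehand from the Yahoo API (hypothetical file name).
prices = pd.read_csv("canadian_stock_prices.csv",
                     parse_dates=["Date"], index_col="Date")

# Daily return in percent for each stock.
returns = prices.pct_change().mul(100).dropna(how="all")
```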
Then we compute the structural entropy on all the series and obtain a new entropy time series, shown in Plot 6. The most significant anomaly occurred around April 2020, when the market was severely hit by COVID-19. Around that period, almost all stocks moved in the same direction: down. There was little uncertainty in that bear market, and the entropy once dropped close to zero. In other periods, the entropy varies between 4 and 8 (with roughly 400 stocks, the maximum under the base-2 reading sketched earlier is about log2(400) ≈ 8.6). That marks a regular market: some stocks/sectors are rising, and some are falling.

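The rolling helper from the trail example applies here unchanged; the window and threshold below are placeholders, since the post does not state the values used for this dataset:

```python
# Placeholder parameters; the post does not specify them for the market data.
market_entropy = rolling_structural_entropy(returns, window=10, threshold=0.8)
```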
Conclusion
Structural entropy provides a powerful tool for detecting anomalies in multivariate time series data when the variables’ correlation is our concern. It can catch the moment when correlated time series become uncorrelated, and vice versa. It’s easy to understand and compute. I recommend data scientists explore this tool for anomaly detection in multivariate time series.
I will share more lessons learned for time series data in the future.
Thanks for reading. Have fun with your time series data.
References
Seattle Burke Gilman Trail Data (CC0: Public Domain)
Yahoo Finance API (BSD 3-Clause "New" or "Revised" License)