{"id":157980,"date":"2023-03-06T19:04:48","date_gmt":"2023-03-06T19:04:48","guid":{"rendered":"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/"},"modified":"2025-01-18T05:22:29","modified_gmt":"2025-01-18T05:22:29","slug":"sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2","status":"publish","type":"post","link":"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/","title":{"rendered":"Sentiment Analysis and Structural Breaks in Time-Series Text Data"},"content":{"rendered":"<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"124893\" data-has-transparency=\"false\" style=\"--dominant-color: #124893;\" loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"1707\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0Yds9_8oU0auz4HKP-scaled.jpg\" alt=\"Photo by Adam \u015amigielski on Unsplash\" class=\"wp-image-157981 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0Yds9_8oU0auz4HKP-scaled.jpg 2560w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0Yds9_8oU0auz4HKP-300x200.jpg 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0Yds9_8oU0auz4HKP-1024x683.jpg 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0Yds9_8oU0auz4HKP-768x512.jpg 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0Yds9_8oU0auz4HKP-1536x1024.jpg 1536w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0Yds9_8oU0auz4HKP-2048x1366.jpg 2048w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/><figcaption class=\"wp-element-caption\">Photo by <a href=\"https:\/\/unsplash.com\/@smigielski?utm_source=medium&amp;utm_medium=referral\">Adam \u015amigielski<\/a> on <a 
href=\"https:\/\/unsplash.com?utm_source=medium&amp;utm_medium=referral\">Unsplash<\/a><\/figcaption><\/figure>\n<h2 class=\"wp-block-heading\">Introduction<\/h2>\n<p class=\"wp-block-paragraph\">Text data contains a lot of qualitative information, which can be quantified with various methods, including sentiment analysis techniques. These models identify, extract, and quantify emotions from text data and are widely used in business and academic research. Since text is often recorded on a time-series basis, text datasets might display structural breaks as the quantitative information changes due to many possible factors.<\/p>\n<p class=\"wp-block-paragraph\">A business analyst might need to measure changes in customer perceptions of a particular brand. A researcher might be interested in shifts in Vladimir Putin&#8217;s public statements over time. <strong><a href=\"https:\/\/pypi.org\/project\/arabica\/\">Arabica<\/a><\/strong> is a Python library specifically designed to deal with such questions. It contains these methods for exploratory analysis of time-series text datasets:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>arabica_freq<\/strong> for descriptive n-gram-based exploratory data analysis (EDA)<\/li>\n<li><strong>cappuccino<\/strong> is a visualization module including <em>heatmap<\/em>, <em>word cloud<\/em>, and <em>line plot<\/em> for unigram, bigram, and trigram frequencies<\/li>\n<li><strong>coffee_break<\/strong> enables sentiment and structural break analysis.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">This article will introduce you to <strong>coffee_break<\/strong>, the sentiment and structural break analysis module. 
Read the <a href=\"https:\/\/arabica.readthedocs.io\/en\/latest\/index.html\">documentation<\/a> and these tutorials for the first two methods: <a href=\"https:\/\/towardsdatascience.com\/text-as-time-series-arabica-1-0-brings-new-features-for-exploratory-text-data-analysis-88eaabb84deb\">arabica_freq<\/a>, <a href=\"https:\/\/medium.com\/towards-data-science\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\">cappuccino<\/a>.<\/p>\n<p class=\"wp-block-paragraph\"><em><strong>EDIT Jul 2023<\/strong>: Arabica has been updated. Check the <strong>documentation<\/strong> for the full list of parameters.<\/em><\/p>\n<h2 class=\"wp-block-heading\">2. Coffee_break: algorithm and structure<\/h2>\n<p class=\"wp-block-paragraph\">The <em>coffee_break<\/em> module has a simple backend architecture. Schematically, it works as follows:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"ecf1e0\" data-has-transparency=\"true\" style=\"--dominant-color: #ecf1e0;\" loading=\"lazy\" decoding=\"async\" width=\"4964\" height=\"4524\" class=\"wp-image-452160 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1MTZabGQYv7BI4wJQJE9EpQ.png\" alt=\"Figure 1. Coffee_break architecture. 
Source: draw.io\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1MTZabGQYv7BI4wJQJE9EpQ.png 4964w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1MTZabGQYv7BI4wJQJE9EpQ-300x273.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1MTZabGQYv7BI4wJQJE9EpQ-1024x933.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1MTZabGQYv7BI4wJQJE9EpQ-768x700.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1MTZabGQYv7BI4wJQJE9EpQ-1536x1400.png 1536w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1MTZabGQYv7BI4wJQJE9EpQ-2048x1866.png 2048w\" sizes=\"auto, (max-width: 4964px) 100vw, 4964px\" \/><figcaption class=\"wp-element-caption\">Figure 1. Coffee_break architecture. Source: draw.io<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Raw text is cleaned with <a href=\"https:\/\/pypi.org\/project\/cleantext\/#description\">cleantext<\/a>, which removes punctuation and numbers. Stop words (the most common words in a language with no significant meaning) are not removed in the <strong>pre-processing step<\/strong> because they don&#8217;t negatively affect sentiment analysis. However, with the <code>skip<\/code> parameter, we can remove a list of additional stop words or unwanted strings (words or word sequences) that should not influence the sentiment analysis.<\/p>\n<p class=\"wp-block-paragraph\"><strong>Sentiment analysis<\/strong> implements <strong>VADER<\/strong> (<em><a href=\"https:\/\/ojs.aaai.org\/index.php\/ICWSM\/article\/view\/14550\">Valence Aware Dictionary and Sentiment Reasoner<\/a><\/em>), a universal pre-trained sentiment classifier [1]. It was trained on social media data from Twitter, but it also works very well on other types of datasets. 
<a href=\"https:\/\/towardsdatascience.com\/the-most-favorable-pre-trained-sentiment-classifiers-in-python-9107c06442c6\">My previous article<\/a> offers a more detailed introduction to the model and coding in Python.<\/p>\n<p class=\"wp-block-paragraph\">Coffee_break uses VADER&#8217;s compound indicator for sentiment evaluation. The aggregate sentiment is calculated as:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"ebebeb\" data-has-transparency=\"false\" style=\"--dominant-color: #ebebeb;\" loading=\"lazy\" decoding=\"async\" width=\"724\" height=\"56\" class=\"wp-image-452161 not-transparent\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/18eCBjBfDlBoymxy4w4i8nw.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/18eCBjBfDlBoymxy4w4i8nw.png 724w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/18eCBjBfDlBoymxy4w4i8nw-300x23.png 300w\" sizes=\"auto, (max-width: 724px) 100vw, 724px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">where <em>t<\/em> is the aggregation period. The aggregate indicator ranges from -1 to 1, with positive sentiment closer to 1 and negative sentiment closer to -1.<\/p>\n<p class=\"wp-block-paragraph\">The aggregated sentiment forms a time series that displays some degree of variability over time. <strong>Structural breaks<\/strong> in the time series are identified with the <em>Fisher-Jenks algorithm<\/em>, or <em>Jenks Optimisation Method<\/em>, originally proposed by George F. Jenks [2].<\/p>\n<p class=\"wp-block-paragraph\">It is a clustering-based method designed to find the best arrangement of values into different classes (clusters). The <code>jenks_breaks<\/code> function from the <code>jenkspy<\/code> library returns a list of values that correspond to the limits of the classes. 
These structural breaks are marked in the plot as vertical lines and visually indicate the breakpoints in the time series of text data.<\/p>\n<p class=\"wp-block-paragraph\">The <strong>libraries used<\/strong> are <a href=\"https:\/\/pypi.org\/project\/matplotlib\/\">Matplotlib<\/a> (visualization), <a href=\"https:\/\/pypi.org\/project\/vaderSentiment\/\">vaderSentiment<\/a> (sentiment analysis), and <a href=\"https:\/\/pypi.org\/project\/jenkspy\/\">jenkspy<\/a> (structural breaks). <a href=\"https:\/\/pypi.org\/project\/pandas\/\">Pandas<\/a> and <a href=\"https:\/\/pypi.org\/project\/numpy\/\">NumPy<\/a> handle the data processing.<\/p>\n<h2 class=\"wp-block-heading\">3. Use case: Twitter sentiment analysis<\/h2>\n<p class=\"wp-block-paragraph\">Let&#8217;s illustrate the coding on the <a href=\"https:\/\/www.kaggle.com\/datasets\/gpreda\/pfizer-vaccine-tweets\">Pfizer Vaccine Tweets dataset<\/a> collected using the Twitter API. The data contains 11 000 tweets about the Pfizer &amp; BioNTech vaccine posted between 2006 and 2021. The dataset is released under the <a href=\"https:\/\/creativecommons.org\/publicdomain\/zero\/1.0\/\">CC0: Public Domain<\/a> license according to <a href=\"https:\/\/developer.twitter.com\/en\/developer-terms\/commercial-terms\">Twitter developer policy<\/a>.<\/p>\n<p class=\"wp-block-paragraph\">The data contains a lot of punctuation and numbers and needs cleaning before any further steps:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"ededed\" data-has-transparency=\"false\" style=\"--dominant-color: #ededed;\" loading=\"lazy\" decoding=\"async\" width=\"1488\" height=\"271\" class=\"wp-image-452162 not-transparent\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/13_FJePuErWs6IIdjDPfOtw.png\" alt=\"Figure 2. 
Pfizer Vaccine Tweets dataset\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/13_FJePuErWs6IIdjDPfOtw.png 1488w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/13_FJePuErWs6IIdjDPfOtw-300x55.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/13_FJePuErWs6IIdjDPfOtw-1024x186.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/13_FJePuErWs6IIdjDPfOtw-768x140.png 768w\" sizes=\"auto, (max-width: 1488px) 100vw, 1488px\" \/><figcaption class=\"wp-element-caption\">Figure 2. Pfizer Vaccine Tweets dataset<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">The <em>coffee_break<\/em> method&#8217;s parameters are:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">def coffee_break(text: str,                 # Text column\n                 time: str,                 # Time column\n                 date_format: str,          # Date format: &#039;eur&#039; - European, &#039;us&#039; - American\n                 time_freq: str =&#039;&#039;,        # Aggregation period: &#039;Y&#039;\/&#039;M&#039;\n                 preprocess: bool = False,  # Clean data from numbers and punctuation\n                 skip: [] ,                 # Remove additional stop words\n                 n_breaks: int =&#039;&#039;          # Number of breaks: min. 2\n)<\/code><\/pre>\n<h2 class=\"wp-block-heading\">3.1. Sentiment analysis over time<\/h2>\n<p class=\"wp-block-paragraph\">Our data has a 15-year time span covering the Covid-19 crisis. 
Changes in the public mood about vaccination, fake news about vaccines, and many other factors are expected to lead to significant variations in sentiment over time.<\/p>\n<h3 class=\"wp-block-heading\">Coding<\/h3>\n<p class=\"wp-block-paragraph\">First, import <em>coffee_break<\/em>:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">from arabica import coffee_break<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Arabica reads dates in <strong>US-style<\/strong> (<em>MM\/DD\/YYYY<\/em>) and <strong>European-style<\/strong> (<em>DD\/MM\/YYYY<\/em>) date and datetime formats. The data is pretty raw and covers 15 years. Displaying sentiment by month is, therefore, not very helpful.<\/p>\n<p class=\"wp-block-paragraph\">Let&#8217;s clean the data and aggregate sentiment by year with this code:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">coffee_break(text = data[&#039;text&#039;],\n             time = data[&#039;date&#039;],\n             date_format = &#039;eur&#039;,  # Read dates in European format\n             time_freq = &#039;Y&#039;,      # Yearly aggregation\n             preprocess = True,    # Clean data - punctuation + numbers\n             skip = None,          # No other stop words removed\n             n_breaks = None)      # No structural break analysis<\/code><\/pre>\n<h3 class=\"wp-block-heading\">Results<\/h3>\n<p class=\"wp-block-paragraph\">Arabica returns a picture that can be manually saved in PNG or JPEG.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"f9fafb\" data-has-transparency=\"true\" style=\"--dominant-color: #f9fafb;\" loading=\"lazy\" decoding=\"async\" width=\"8132\" height=\"5670\" class=\"wp-image-452163 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1l_Pn6lwaLaNDAfMXzDrb-Q.png\" alt=\"Figure 3. 
Sentiment analysis - yearly\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1l_Pn6lwaLaNDAfMXzDrb-Q.png 8132w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1l_Pn6lwaLaNDAfMXzDrb-Q-300x209.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1l_Pn6lwaLaNDAfMXzDrb-Q-1024x714.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1l_Pn6lwaLaNDAfMXzDrb-Q-768x535.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1l_Pn6lwaLaNDAfMXzDrb-Q-1536x1071.png 1536w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1l_Pn6lwaLaNDAfMXzDrb-Q-2048x1428.png 2048w\" sizes=\"auto, (max-width: 8132px) 100vw, 8132px\" \/><figcaption class=\"wp-element-caption\">Figure 3. Sentiment analysis &#8211; yearly<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">At the same time, Arabica returns a dataframe with the corresponding data. The table can be saved simply by assigning the function&#8217;s output to an object:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># generate a dataframe\ndf = coffee_break(text = data[&#039;text&#039;],\n                  time = data[&#039;date&#039;],\n                  date_format = &#039;eur&#039;,\n                  time_freq = &#039;Y&#039;,\n                  preprocess = True,\n                  skip = None,\n                  n_breaks = None)\n\n# save it as a csv\ndf.to_csv(&#039;sentiment_data.csv&#039;)<\/code><\/pre>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p><strong>Results interpretation:<\/strong> we can see that sentiment significantly dropped after Pfizer vaccines started to be used to tackle Covid in 2021 (Figure 3). The reason is likely the global pandemic the world faced and the generally negative mood in these years.<\/p><\/blockquote>\n<h2 class=\"wp-block-heading\">3.2. 
Structural break analysis<\/h2>\n<p class=\"wp-block-paragraph\">Next, let&#8217;s formalize the structural breaks in sentiment statistically. <em>Coffee_break<\/em> identifies a minimum of 2 breakpoints. The following code returns a figure with 3 breakpoints marked by vertical lines and the table with the corresponding time series:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">coffee_break(text = data[&#039;text&#039;],\n             time = data[&#039;date&#039;],\n             date_format = &#039;eur&#039;, # Read dates in European format\n             time_freq = &#039;Y&#039;,     # Yearly aggregation\n             preprocess = True,   # Clean data\n             skip = None,         # No other stop words removed\n             n_breaks = 3)        # 3 breakpoints<\/code><\/pre>\n<p class=\"wp-block-paragraph\">The figure:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"faf9fa\" data-has-transparency=\"true\" style=\"--dominant-color: #faf9fa;\" loading=\"lazy\" decoding=\"async\" width=\"8132\" height=\"5648\" class=\"wp-image-452164 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1Ca0NVl7IpGGBfUbMVVxgpw.png\" alt=\"Figure 4. 
Structural break analysis - yearly\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1Ca0NVl7IpGGBfUbMVVxgpw.png 8132w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1Ca0NVl7IpGGBfUbMVVxgpw-300x208.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1Ca0NVl7IpGGBfUbMVVxgpw-1024x711.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1Ca0NVl7IpGGBfUbMVVxgpw-768x533.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1Ca0NVl7IpGGBfUbMVVxgpw-1536x1067.png 1536w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1Ca0NVl7IpGGBfUbMVVxgpw-2048x1422.png 2048w\" sizes=\"auto, (max-width: 8132px) 100vw, 8132px\" \/><figcaption class=\"wp-element-caption\">Figure 4. Structural break analysis &#8211; yearly<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Subsetting the data to the two Covid years (2020\u20132021), we might observe monthly changes in public sentiment, keeping <code>n_breaks = 3<\/code> and setting <code>time_freq = &#039;M&#039;<\/code> :<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fafafa\" data-has-transparency=\"true\" style=\"--dominant-color: #fafafa;\" loading=\"lazy\" decoding=\"async\" width=\"8202\" height=\"5603\" class=\"wp-image-452165 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1y6x8SyUHkt7us8nM45-mcQ.png\" alt=\"Figure 5. 
Structural break analysis - monthly\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1y6x8SyUHkt7us8nM45-mcQ.png 8202w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1y6x8SyUHkt7us8nM45-mcQ-300x205.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1y6x8SyUHkt7us8nM45-mcQ-1024x700.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1y6x8SyUHkt7us8nM45-mcQ-768x525.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1y6x8SyUHkt7us8nM45-mcQ-1536x1049.png 1536w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1y6x8SyUHkt7us8nM45-mcQ-2048x1399.png 2048w\" sizes=\"auto, (max-width: 8202px) 100vw, 8202px\" \/><figcaption class=\"wp-element-caption\">Figure 5. Structural break analysis &#8211; monthly<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">The graph is not very informative. There are 1577 rows for 24 time observations in this subset, and even after cleaning the raw data, the time series is very volatile. Drawing conclusions from a clustering-based algorithm on such a limited volume of data is not a good idea.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p><strong>Results interpretation:<\/strong> the structural break analysis at yearly frequency statistically confirmed what we could see from the time series of sentiment in Figure 3. The Fisher-Jenks algorithm identified three structural breaks &#8211; in 2009, 2017, and 2021. We can only guess what caused the decline in 2009 and between 2016 and 2018. The drop in 2021 is likely due to the Covid-19 crisis.<\/p><\/blockquote>\n<h2 class=\"wp-block-heading\">4. 
Best practices for structural break analysis<\/h2>\n<p class=\"wp-block-paragraph\">Let&#8217;s summarize the recommendations for the most effective use of <em>coffee_break<\/em>:<\/p>\n<ul class=\"wp-block-list\">\n<li>don&#8217;t use structural break analysis if there are NaN values in the corresponding time series.<\/li>\n<li>identifying more than 3 breakpoints only makes sense in longer time series (at least 12 observations).<\/li>\n<li>breakpoint identification might not work well in highly volatile datasets. Dramatic changes might reflect the quality of the data rather than shifts in sentiment.<\/li>\n<li>the analysis is only as correct as the underlying sentiment data. Before the actual use, briefly explore the raw text dataset to check that (1) it is not too imbalanced in the number of rows for each period and (2) it contains enough information for sentiment evaluation (texts are not too short and don&#8217;t contain mostly digits and special characters).<\/li>\n<\/ul>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"ebebeb\" data-has-transparency=\"true\" style=\"--dominant-color: #ebebeb;\" loading=\"lazy\" decoding=\"async\" width=\"9620\" height=\"2620\" class=\"wp-image-452166 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1sneBBul67KQv17oz8oILAQ.png\" alt=\"Image by author\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1sneBBul67KQv17oz8oILAQ.png 9620w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1sneBBul67KQv17oz8oILAQ-300x82.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1sneBBul67KQv17oz8oILAQ-1024x279.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1sneBBul67KQv17oz8oILAQ-768x209.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1sneBBul67KQv17oz8oILAQ-1536x418.png 1536w, 
https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/1sneBBul67KQv17oz8oILAQ-2048x558.png 2048w\" sizes=\"auto, (max-width: 9620px) 100vw, 9620px\" \/><figcaption class=\"wp-element-caption\">Image by author<\/figcaption><\/figure>\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n<p class=\"wp-block-paragraph\">A drawback of <em>coffee_break<\/em> is that it currently works only with <strong>English texts.<\/strong> Because Arabica is mainly a Pandas-based package (with NumPy vectorization in some parts), <em>coffee_break<\/em> is rather slow in evaluating large datasets. It is time-efficient for datasets of up to approx. 40 000 rows.<\/p>\n<p class=\"wp-block-paragraph\">Read these tutorials to find out more about n-gram and sentiment analysis and visualization of time-series text data:<\/p>\n<ul class=\"wp-block-list\">\n<li><em><strong><a href=\"https:\/\/medium.com\/towards-data-science\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\">Visualization Module in Arabica Speeds Up Text Data Exploration<\/a><\/strong><\/em><\/li>\n<li><em><strong><a href=\"https:\/\/towardsdatascience.com\/customer-satisfaction-measurement-with-n-gram-and-sentiment-analysis-547e291c13a6\">Customer Satisfaction Measurement with N-gram and Sentiment Analysis<\/a><\/strong><\/em><\/li>\n<li><em><strong><a href=\"https:\/\/pub.towardsai.net\/research-article-meta-data-description-made-quick-and-easy-57754e54b550\">Research Article Meta-data Description Made Quick and Easy<\/a><\/strong><\/em><\/li>\n<li><em><strong><a href=\"https:\/\/medium.com\/towards-data-science\/the-most-favorable-pre-trained-sentiment-classifiers-in-python-9107c06442c6\">The Most Favorable Pre-trained Sentiment Classifiers in Python<\/a><\/strong><\/em><\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\"><em>Coffee_break<\/em> has been developed in cooperation with Prof. Jitka Pom\u011bnkov\u00e1 (Brno University of Technology). 
The complete code in this tutorial is on my <a href=\"https:\/\/github.com\/PetrKorab\/Arabica\/blob\/main\/docs\/examples\/coffee_break_examples.ipynb\">GitHub<\/a>.<\/p>\n<p class=\"wp-block-paragraph\"><em>Did you like the article? You can invite me <a href=\"https:\/\/www.buymeacoffee.com\/petrkorab\">for coffee<\/a> and support my writing. You can also subscribe to my <a href=\"https:\/\/medium.com\/subscribe\/@petrkorab\">email list<\/a> to get notified about my new articles. Thanks!<\/em><\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"d8d5d1\" data-has-transparency=\"false\" style=\"--dominant-color: #d8d5d1;\" loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"1707\" class=\"wp-image-452167 not-transparent\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0vmpSaU8n3EN2aIjb-scaled.jpg\" alt=\"Photo by Content Pixie on Unsplash\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0vmpSaU8n3EN2aIjb-scaled.jpg 2560w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0vmpSaU8n3EN2aIjb-300x200.jpg 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0vmpSaU8n3EN2aIjb-1024x683.jpg 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0vmpSaU8n3EN2aIjb-768x512.jpg 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0vmpSaU8n3EN2aIjb-1536x1024.jpg 1536w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0vmpSaU8n3EN2aIjb-2048x1365.jpg 2048w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/><figcaption class=\"wp-element-caption\">Photo by <a href=\"https:\/\/unsplash.com\/@contentpixie?utm_source=medium&amp;utm_medium=referral\">Content Pixie<\/a> on <a href=\"https:\/\/unsplash.com?utm_source=medium&amp;utm_medium=referral\">Unsplash<\/a><\/figcaption><\/figure>\n<h2 class=\"wp-block-heading\">References<\/h2>\n<p class=\"wp-block-paragraph\">[1] Hutto, C., Gilbert, E. (2014). 
<a href=\"https:\/\/ojs.aaai.org\/index.php\/ICWSM\/article\/view\/14550\">VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text<\/a>. <em>Proceedings of the International AAAI Conference on Web and Social Media<\/em>, <em>8<\/em>(1), 216\u2013225.<\/p>\n<p class=\"wp-block-paragraph\">[2] Jenks, G.F. (1977). <em>Optimal data classification for choropleth maps.<\/em> Kansas. University. Dept. of Geography-Meteorology. Occasional paper no. 2.<\/p>","protected":false},"excerpt":{"rendered":"<p>Arabica now offers a structural break and sentiment analysis module to enrich time-series text mining<\/p>\n","protected":false},"author":18,"featured_media":157981,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"is_member_only":true,"sub_heading":"Arabica now offers a structural break and sentiment analysis module to enrich time-series text mining","footnotes":""},"categories":[],"tags":[467,694,1604,461],"sponsor":[],"coauthors":[30697],"class_list":["post-157980","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-python","tag-sentiment-analysis","tag-text-mining","tag-time-series-analysis"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Sentiment Analysis and Structural Breaks in Time-Series Text Data | Towards Data Science<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Sentiment Analysis and Structural Breaks in Time-Series Text Data | Towards Data Science\" \/>\n<meta property=\"og:description\" 
content=\"Arabica now offers a structural break and sentiment analysis module to enrich time-series text mining\" \/>\n<meta property=\"og:url\" content=\"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/\" \/>\n<meta property=\"og:site_name\" content=\"Towards Data Science\" \/>\n<meta property=\"article:published_time\" content=\"2023-03-06T19:04:48+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-01-18T05:22:29+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0Yds9_8oU0auz4HKP-scaled.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1707\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Petr Kor\u00e1b\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@TDataScience\" \/>\n<meta name=\"twitter:site\" content=\"@TDataScience\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Petr Kor\u00e1b\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/\"},\"author\":{\"name\":\"TDS Editors\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/person\/f9925d336b6fe962b03ad8281d90b8ee\"},\"headline\":\"Sentiment Analysis and Structural Breaks in Time-Series Text Data\",\"datePublished\":\"2023-03-06T19:04:48+00:00\",\"dateModified\":\"2025-01-18T05:22:29+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/\"},\"wordCount\":1287,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/towardsdatascience.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0Yds9_8oU0auz4HKP-scaled.jpg\",\"keywords\":[\"Python\",\"Sentiment Analysis\",\"Text Mining\",\"Time Series Analysis\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/\",\"url\":\"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/\",\"name\":\"Sentiment Analysis and Structural Breaks in Time-Series Text Data | Towards Data 
Science\",\"isPartOf\":{\"@id\":\"https:\/\/towardsdatascience.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0Yds9_8oU0auz4HKP-scaled.jpg\",\"datePublished\":\"2023-03-06T19:04:48+00:00\",\"dateModified\":\"2025-01-18T05:22:29+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/#primaryimage\",\"url\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0Yds9_8oU0auz4HKP-scaled.jpg\",\"contentUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0Yds9_8oU0auz4HKP-scaled.jpg\",\"width\":2560,\"height\":1707,\"caption\":\"Photo by Adam \u00c5\u009amigielski on Unsplash\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/towardsdatascience.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Sentiment Analysis and Structural Breaks in Time-Series Text 
Data\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/towardsdatascience.com\/#website\",\"url\":\"https:\/\/towardsdatascience.com\/\",\"name\":\"Towards Data Science\",\"description\":\"Publish AI, ML &amp; data-science insights to a global community of data professionals.\",\"publisher\":{\"@id\":\"https:\/\/towardsdatascience.com\/#organization\"},\"alternateName\":\"TDS\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/towardsdatascience.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/towardsdatascience.com\/#organization\",\"name\":\"Towards Data Science\",\"alternateName\":\"TDS\",\"url\":\"https:\/\/towardsdatascience.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/tds-logo.jpg\",\"contentUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/tds-logo.jpg\",\"width\":696,\"height\":696,\"caption\":\"Towards Data Science\"},\"image\":{\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/TDataScience\",\"https:\/\/www.youtube.com\/c\/TowardsDataScience\",\"https:\/\/www.linkedin.com\/company\/towards-data-science\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/person\/f9925d336b6fe962b03ad8281d90b8ee\",\"name\":\"TDS Editors\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/person\/image\/23494c9101089ad44ae88ce9d2f56aac\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g\",\"caption\":\"TDS 
Editors\"},\"description\":\"Building a vibrant data science and machine learning community. Share your insights and projects with our global audience: bit.ly\/write-for-tds\",\"url\":\"https:\/\/towardsdatascience.com\/author\/towardsdatascience\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Sentiment Analysis and Structural Breaks in Time-Series Text Data | Towards Data Science","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/","og_locale":"en_US","og_type":"article","og_title":"Sentiment Analysis and Structural Breaks in Time-Series Text Data | Towards Data Science","og_description":"Arabica now offers a structural break and sentiment analysis module to enrich time-series text mining","og_url":"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/","og_site_name":"Towards Data Science","article_published_time":"2023-03-06T19:04:48+00:00","article_modified_time":"2025-01-18T05:22:29+00:00","og_image":[{"width":2560,"height":1707,"url":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0Yds9_8oU0auz4HKP-scaled.jpg","type":"image\/jpeg"}],"author":"Petr Kor\u00e1b","twitter_card":"summary_large_image","twitter_creator":"@TDataScience","twitter_site":"@TDataScience","twitter_misc":{"Written by":"Petr Kor\u00e1b","Est. 
reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/#article","isPartOf":{"@id":"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/"},"author":{"name":"TDS Editors","@id":"https:\/\/towardsdatascience.com\/#\/schema\/person\/f9925d336b6fe962b03ad8281d90b8ee"},"headline":"Sentiment Analysis and Structural Breaks in Time-Series Text Data","datePublished":"2023-03-06T19:04:48+00:00","dateModified":"2025-01-18T05:22:29+00:00","mainEntityOfPage":{"@id":"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/"},"wordCount":1287,"commentCount":0,"publisher":{"@id":"https:\/\/towardsdatascience.com\/#organization"},"image":{"@id":"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/#primaryimage"},"thumbnailUrl":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0Yds9_8oU0auz4HKP-scaled.jpg","keywords":["Python","Sentiment Analysis","Text Mining","Time Series Analysis"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/","url":"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/","name":"Sentiment Analysis and Structural Breaks in Time-Series Text Data | Towards Data 
Science","isPartOf":{"@id":"https:\/\/towardsdatascience.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/#primaryimage"},"image":{"@id":"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/#primaryimage"},"thumbnailUrl":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0Yds9_8oU0auz4HKP-scaled.jpg","datePublished":"2023-03-06T19:04:48+00:00","dateModified":"2025-01-18T05:22:29+00:00","breadcrumb":{"@id":"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/#primaryimage","url":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0Yds9_8oU0auz4HKP-scaled.jpg","contentUrl":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/03\/0Yds9_8oU0auz4HKP-scaled.jpg","width":2560,"height":1707,"caption":"Photo by Adam \u00c5\u009amigielski on Unsplash"},{"@type":"BreadcrumbList","@id":"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/towardsdatascience.com\/"},{"@type":"ListItem","position":2,"name":"Sentiment Analysis and Structural Breaks in Time-Series Text Data"}]},{"@type":"WebSite","@id":"https:\/\/towardsdatascience.com\/#website","url":"https:\/\/towardsdatascience.com\/","name":"Towards Data Science","description":"Publish AI, ML &amp; data-science insights to a global community of 
data professionals.","publisher":{"@id":"https:\/\/towardsdatascience.com\/#organization"},"alternateName":"TDS","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/towardsdatascience.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/towardsdatascience.com\/#organization","name":"Towards Data Science","alternateName":"TDS","url":"https:\/\/towardsdatascience.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/towardsdatascience.com\/#\/schema\/logo\/image\/","url":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/tds-logo.jpg","contentUrl":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/tds-logo.jpg","width":696,"height":696,"caption":"Towards Data Science"},"image":{"@id":"https:\/\/towardsdatascience.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/TDataScience","https:\/\/www.youtube.com\/c\/TowardsDataScience","https:\/\/www.linkedin.com\/company\/towards-data-science\/"]},{"@type":"Person","@id":"https:\/\/towardsdatascience.com\/#\/schema\/person\/f9925d336b6fe962b03ad8281d90b8ee","name":"TDS Editors","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/towardsdatascience.com\/#\/schema\/person\/image\/23494c9101089ad44ae88ce9d2f56aac","url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","caption":"TDS Editors"},"description":"Building a vibrant data science and machine learning community. 
Share your insights and projects with our global audience: bit.ly\/write-for-tds","url":"https:\/\/towardsdatascience.com\/author\/towardsdatascience\/"}]}},"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"Towards Data Science","distributor_original_site_url":"https:\/\/towardsdatascience.com","push-errors":false,"_links":{"self":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/posts\/157980","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/users\/18"}],"replies":[{"embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/comments?post=157980"}],"version-history":[{"count":0,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/posts\/157980\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/media\/157981"}],"wp:attachment":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/media?parent=157980"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/categories?post=157980"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/tags?post=157980"},{"taxonomy":"sponsor","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/sponsor?post=157980"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/coauthors?post=157980"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}