{"id":104781,"date":"2023-01-09T21:51:17","date_gmt":"2023-01-09T21:51:17","guid":{"rendered":"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/"},"modified":"2025-01-26T13:03:36","modified_gmt":"2025-01-26T13:03:36","slug":"visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce","status":"publish","type":"post","link":"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/","title":{"rendered":"Visualization Module in Arabica Speeds Up Text Data Exploration"},"content":{"rendered":"<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"eee2e2\" data-has-transparency=\"true\" style=\"--dominant-color: #eee2e2;\" loading=\"lazy\" decoding=\"async\" width=\"6192\" height=\"3811\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1QFOlMLGdhfeBZ0EkL7eYLQ.png\" alt=\"Figure 1. Bigram word cloud, image by author.\" class=\"wp-image-104782 has-transparency\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1QFOlMLGdhfeBZ0EkL7eYLQ.png 6192w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1QFOlMLGdhfeBZ0EkL7eYLQ-300x185.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1QFOlMLGdhfeBZ0EkL7eYLQ-1024x630.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1QFOlMLGdhfeBZ0EkL7eYLQ-768x473.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1QFOlMLGdhfeBZ0EkL7eYLQ-1536x945.png 1536w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1QFOlMLGdhfeBZ0EkL7eYLQ-2048x1260.png 2048w\" sizes=\"auto, (max-width: 6192px) 100vw, 6192px\" \/><figcaption class=\"wp-element-caption\">Figure 1. 
Bigram word cloud, image by author.<\/figcaption><\/figure>\n<h2 class=\"wp-block-heading\">Introduction<\/h2>\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/pypi.org\/project\/arabica\/\">Arabica <\/a>is a Python library for exploratory text data analysis focusing on text from a time-series perspective. It reflects the empirical reality that many text datasets are collected as repeated observations over time. <strong>Time series text data<\/strong> include newspaper article headlines, research article abstracts and metadata, product reviews, social network communication, and many others. <strong>Arabica<\/strong> simplifies exploratory analysis (EDA) of these datasets by providing these methods:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>arabica_freq:<\/strong> descriptive n-gram analysis and time-series n-gram analysis, for n-gram-based EDA of text datasets<\/li>\n<li><strong>cappuccino:<\/strong> for visual exploration of the data.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">This article provides an introduction to <strong>Cappuccino,<\/strong> Arabica&#8217;s visualization module for exploratory analysis of time-series text data. Read the <a href=\"https:\/\/arabica.readthedocs.io\/en\/latest\/index.html\">documentation<\/a> and a tutorial <a href=\"https:\/\/towardsdatascience.com\/text-as-time-series-arabica-1-0-brings-new-features-for-exploratory-text-data-analysis-88eaabb84deb\">here <\/a>for a general introduction to Arabica.<\/p>\n<p class=\"wp-block-paragraph\"><em><strong>EDIT Jan 2023<\/strong>: Arabica has been updated. Check the <strong>documentation<\/strong> for the full list of parameters.<\/em><\/p>\n<h2 class=\"wp-block-heading\">2. Cappuccino, visualization for exploratory text data analysis<\/h2>\n<p class=\"wp-block-paragraph\">The plots implemented are <strong>word cloud<\/strong> (unigram, bigram, and trigram versions), <strong>heatmap<\/strong>, and <strong>line plot<\/strong>. 
They help discover (1) the <strong>most frequent n-grams<\/strong> for the whole dataset, reflecting its time-series character (word clouds), and (2) <strong>n-gram development over time<\/strong> (heatmap, line plot).<\/p>\n<p class=\"wp-block-paragraph\">The graphs are designed for use in presentations, reports, and empirical studies. They are, therefore, in <strong>high resolution<\/strong> (the pixel dimensions depend on the data range displayed in the graphs).<\/p>\n<p class=\"wp-block-paragraph\">Cappuccino relies on <em><a href=\"https:\/\/matplotlib.org\/\">matplotlib<\/a><\/em>, <em><a href=\"https:\/\/pypi.org\/project\/wordcloud\/\">wordcloud<\/a><\/em>, and <em><a href=\"https:\/\/plotnine.readthedocs.io\/en\/stable\/\">plotnine<\/a><\/em> to create and display graphs, and on <em>cleantext<\/em> and the <em><a href=\"https:\/\/www.nltk.org\/\">NLTK<\/a><\/em> corpus of stopwords for pre-processing. <em>Plotnine<\/em> is a Python implementation of the grammar of graphics, based on the popular <em><a href=\"https:\/\/ggplot2.tidyverse.org\/\">ggplot2<\/a><\/em> library. 
The requirements are here.<\/p>\n<p class=\"wp-block-paragraph\">The method&#8217;s parameters are:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">def cappuccino(text: str,                # Text\n               time: str,                # Time\n               plot: str = &#039;&#039;,           # Chart type: &#039;wordcloud&#039;\/&#039;heatmap&#039;\/&#039;line&#039;\n               ngram: int = &#039;&#039;,          # N-gram size, 1 = unigram, 2 = bigram, 3 = trigram\n               time_freq: str = &#039;&#039;,      # Aggregation period: &#039;Y&#039;\/&#039;M&#039;, if no aggregation: &#039;ungroup&#039;\n               max_words: int = &#039;&#039;,      # Max number for most frequent n-grams displayed for each period\n               stopwords: [],            # Languages for stop words\n               skip: [ ],                # Remove additional strings\n               numbers: bool = False,    # Remove numbers\n               punct: bool = False,      # Remove punctuation\n               lower_case: bool = False  # Lowercase text\n)<\/code><\/pre>\n<h2 class=\"wp-block-heading\">3. Descriptive n-gram visualization<\/h2>\n<p class=\"wp-block-paragraph\">Descriptive analysis in Arabica provides n-gram frequency calculations without aggregation over a specific period. In simple terms, n-gram frequencies are first calculated for each text record, then summed for the whole dataset, and finally visualized in a plot.<\/p>\n<h3 class=\"wp-block-heading\">Word cloud<\/h3>\n<p class=\"wp-block-paragraph\">Let&#8217;s illustrate the coding on the <strong><a href=\"https:\/\/www.kaggle.com\/datasets\/therohk\/million-headlines?resource=download\">Million News Headlines<\/a><\/strong> dataset of news headlines published daily between 2003-02-19 and 2016-09-18. 
The dataset is provided by the Australian Broadcasting Corporation under the <a href=\"https:\/\/creativecommons.org\/publicdomain\/zero\/1.0\/\">CC0: Public Domain<\/a> license. We&#8217;ll subset the data to the first 50 000 headlines.<\/p>\n<p class=\"wp-block-paragraph\">First, install Arabica with <code>pip install arabica<\/code>, then import Cappuccino:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-javascript\">from arabica import cappuccino<\/code><\/pre>\n<p class=\"wp-block-paragraph\">After reading the data with <code>pandas<\/code>, the data looks like this:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"e9e9e8\" data-has-transparency=\"false\" style=\"--dominant-color: #e9e9e8;\" loading=\"lazy\" decoding=\"async\" width=\"614\" height=\"260\" class=\"wp-image-545416 not-transparent\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1aYiqwnwyIyKdGL4Fms4LNw.png\" alt=\"Figure 2. Million News Headlines data\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1aYiqwnwyIyKdGL4Fms4LNw.png 614w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1aYiqwnwyIyKdGL4Fms4LNw-300x127.png 300w\" sizes=\"auto, (max-width: 614px) 100vw, 614px\" \/><figcaption class=\"wp-element-caption\">Figure 2. 
Million News Headlines data<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">We lowercase the text, clean the data from punctuation and numbers, remove English stopwords and other unwanted strings (<em>&quot;g&quot;<\/em>, <em>&quot;br&quot;<\/em>), and plot a word cloud with the 100 most frequent words:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">cappuccino(text = data[&#039;headline&#039;],\n           time = data[&#039;date&#039;],\n           plot = &#039;wordcloud&#039;,\n           ngram = 1,               # n-gram size, 1 = unigram, 2 = bigram, 3 = trigram\n           time_freq = &#039;ungroup&#039;,   # no period aggregation\n           max_words = 100,         # displays 100 most frequent words\n           stopwords = [&#039;english&#039;], # remove English stopwords\n           skip = [&#039;g&#039;,&#039;br&#039;],       # remove additional strings\n           numbers = True,          # remove numbers\n           punct = True,            # remove punctuation\n           lower_case = True        # lowercase text\n)\n<\/code><\/pre>\n<p class=\"wp-block-paragraph\">It returns the word cloud:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"ece5e6\" data-has-transparency=\"true\" style=\"--dominant-color: #ece5e6;\" loading=\"lazy\" decoding=\"async\" width=\"6192\" height=\"3811\" class=\"wp-image-545417 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1sk-fhkTdvQngKxydPuErtw.png\" alt=\"Figure 3. 
Word cloud, image by author.\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1sk-fhkTdvQngKxydPuErtw.png 6192w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1sk-fhkTdvQngKxydPuErtw-300x185.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1sk-fhkTdvQngKxydPuErtw-1024x630.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1sk-fhkTdvQngKxydPuErtw-768x473.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1sk-fhkTdvQngKxydPuErtw-1536x945.png 1536w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1sk-fhkTdvQngKxydPuErtw-2048x1260.png 2048w\" sizes=\"auto, (max-width: 6192px) 100vw, 6192px\" \/><figcaption class=\"wp-element-caption\">Figure 3. Word cloud, image by author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">After changing <code>ngram = 2<\/code> , we receive a word cloud with the 100 most frequent bigrams (see the cover picture). Alternatively, <code>ngram = 3<\/code> displays the most frequent trigrams:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"f2e9eb\" data-has-transparency=\"true\" style=\"--dominant-color: #f2e9eb;\" loading=\"lazy\" decoding=\"async\" width=\"6192\" height=\"3811\" class=\"wp-image-545419 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1rDlCI6bK6wTLQadwsocN9Q.png\" alt=\"Figure 4. 
Word cloud - trigram, image by author.\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1rDlCI6bK6wTLQadwsocN9Q.png 6192w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1rDlCI6bK6wTLQadwsocN9Q-300x185.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1rDlCI6bK6wTLQadwsocN9Q-1024x630.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1rDlCI6bK6wTLQadwsocN9Q-768x473.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1rDlCI6bK6wTLQadwsocN9Q-1536x945.png 1536w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1rDlCI6bK6wTLQadwsocN9Q-2048x1260.png 2048w\" sizes=\"auto, (max-width: 6192px) 100vw, 6192px\" \/><figcaption class=\"wp-element-caption\">Figure 4. Word cloud &#8211; trigram, image by author.<\/figcaption><\/figure>\n<h2 class=\"wp-block-heading\">4. Time-series n-gram visualization<\/h2>\n<p class=\"wp-block-paragraph\">Time series text data typically display variability over time. Political statements before elections and newspaper headlines during the Covid-19 pandemic are nice examples. 
To display the n-grams over time, Arabica implements a <strong>heatmap<\/strong> and a <strong>line plot<\/strong> for monthly and yearly periods.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"c6d2e4\" data-has-transparency=\"true\" style=\"--dominant-color: #c6d2e4;\" loading=\"lazy\" decoding=\"async\" width=\"650\" height=\"186\" class=\"wp-image-545421 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1697tN6L6XfIehj0fqzvMdA.png\" alt=\"Image by author, source: Draw.io\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1697tN6L6XfIehj0fqzvMdA.png 650w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1697tN6L6XfIehj0fqzvMdA-300x86.png 300w\" sizes=\"auto, (max-width: 650px) 100vw, 650px\" \/><figcaption class=\"wp-element-caption\">Image by author, source: Draw.io<\/figcaption><\/figure>\n<h3 class=\"wp-block-heading\"><strong>Heatmap<\/strong><\/h3>\n<p class=\"wp-block-paragraph\">A <strong>heatmap<\/strong> with the ten most frequent words in each month is displayed with the following code:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">cappuccino(text = data[&#039;headline&#039;],\n           time = data[&#039;date&#039;],\n           plot = &#039;heatmap&#039;,\n           ngram = 1,               # n-gram size, 1 = unigram, 2 = bigram\n           time_freq = &#039;M&#039;,         # monthly aggregation\n           max_words = 10,          # displays 10 most frequent words for each period\n           stopwords = [&#039;english&#039;], # remove English stopwords\n           skip = [&#039;g&#039;, &#039;br&#039;],      # remove additional strings\n           numbers = True,          # remove numbers\n           punct = True,            # remove punctuation\n           lower_case = True        # lowercase text\n)\n<\/code><\/pre>\n<p class=\"wp-block-paragraph\">The unigram heatmap is the output:<\/p>\n<figure 
class=\"wp-block-image size-large\"><img data-dominant-color=\"e7e9e8\" data-has-transparency=\"true\" style=\"--dominant-color: #e7e9e8;\" loading=\"lazy\" decoding=\"async\" width=\"8078\" height=\"3919\" class=\"wp-image-545422 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/15spB_YmspXFRHi0-1dkZJA.png\" alt=\"Figure 5. Heatmap - unigram, image by author.\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/15spB_YmspXFRHi0-1dkZJA.png 8078w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/15spB_YmspXFRHi0-1dkZJA-300x146.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/15spB_YmspXFRHi0-1dkZJA-1024x497.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/15spB_YmspXFRHi0-1dkZJA-768x373.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/15spB_YmspXFRHi0-1dkZJA-1536x745.png 1536w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/15spB_YmspXFRHi0-1dkZJA-2048x994.png 2048w\" sizes=\"auto, (max-width: 8078px) 100vw, 8078px\" \/><figcaption class=\"wp-element-caption\">Figure 5. Heatmap &#8211; unigram, image by author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">The unigram heatmap gives us the first look at the variability of data over time. 
We can clearly identify the important patterns in the data:<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\"><strong>most frequent n-grams<\/strong>: &quot;us&quot;, &quot;police&quot;, &quot;new&quot;, &quot;man&quot;.<\/p>\n<p class=\"wp-block-paragraph\"><strong>outliers<\/strong> (terms appearing only in one period): &quot;war&quot;, &quot;wa&quot;, &quot;rain&quot;, &quot;killed&quot;, &quot;iraqi&quot;, &quot;concerns&quot;, &quot;budget&quot;, &quot;bali&quot;.<\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\">We might consider removing the outliers at a later stage of the analysis. Alternatively, we can create a <strong>bigram<\/strong> <strong>heatmap<\/strong> by setting <code>ngram = 2<\/code> and <code>max_words = 5<\/code>, which displays the five most frequent bigrams in each period.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"f6f7f6\" data-has-transparency=\"true\" style=\"--dominant-color: #f6f7f6;\" loading=\"lazy\" decoding=\"async\" width=\"8513\" height=\"5767\" class=\"wp-image-545424 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1UODvzoys1mEmO7qlNX4xuw.png\" alt=\"Figure 6. 
Heatmap - bigram, image by author.\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1UODvzoys1mEmO7qlNX4xuw.png 8513w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1UODvzoys1mEmO7qlNX4xuw-300x203.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1UODvzoys1mEmO7qlNX4xuw-1024x694.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1UODvzoys1mEmO7qlNX4xuw-768x520.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1UODvzoys1mEmO7qlNX4xuw-1536x1041.png 1536w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1UODvzoys1mEmO7qlNX4xuw-2048x1387.png 2048w\" sizes=\"auto, (max-width: 8513px) 100vw, 8513px\" \/><figcaption class=\"wp-element-caption\">Figure 6. Heatmap &#8211; bigram, image by author.<\/figcaption><\/figure>\n<h3 class=\"wp-block-heading\"><strong>Line plot<\/strong><\/h3>\n<p class=\"wp-block-paragraph\">A <strong>line plot<\/strong> with n-grams is displayed by changing <code>plot = &#039;line&#039;<\/code>. By setting the <code>ngram<\/code> parameter to 1 and <code>max_words = 5<\/code>, we create a line plot for the five most frequent words in each period:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fafbfb\" data-has-transparency=\"true\" style=\"--dominant-color: #fafbfb;\" loading=\"lazy\" decoding=\"async\" width=\"4778\" height=\"2379\" class=\"wp-image-545426 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1W0ytuUeFrJzqxFI-XGzqeg.png\" alt=\"Figure 7. 
Line plot - unigram, image by author.\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1W0ytuUeFrJzqxFI-XGzqeg.png 4778w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1W0ytuUeFrJzqxFI-XGzqeg-300x149.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1W0ytuUeFrJzqxFI-XGzqeg-1024x510.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1W0ytuUeFrJzqxFI-XGzqeg-768x382.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1W0ytuUeFrJzqxFI-XGzqeg-1536x765.png 1536w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1W0ytuUeFrJzqxFI-XGzqeg-2048x1020.png 2048w\" sizes=\"auto, (max-width: 4778px) 100vw, 4778px\" \/><figcaption class=\"wp-element-caption\">Figure 7. Line plot &#8211; unigram, image by author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Similarly, with <code>ngram = 2<\/code> and <code>max_words = 3<\/code>, the bigram line plot looks like this:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"f9f9f9\" data-has-transparency=\"true\" style=\"--dominant-color: #f9f9f9;\" loading=\"lazy\" decoding=\"async\" width=\"4928\" height=\"2379\" class=\"wp-image-545427 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1zyOFM8YTvMv3p5Bz8UcQHw.png\" alt=\"Figure 8. 
Line plot - bigram, image by author.\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1zyOFM8YTvMv3p5Bz8UcQHw.png 4928w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1zyOFM8YTvMv3p5Bz8UcQHw-300x145.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1zyOFM8YTvMv3p5Bz8UcQHw-1024x494.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1zyOFM8YTvMv3p5Bz8UcQHw-768x371.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1zyOFM8YTvMv3p5Bz8UcQHw-1536x742.png 1536w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1zyOFM8YTvMv3p5Bz8UcQHw-2048x989.png 2048w\" sizes=\"auto, (max-width: 4928px) 100vw, 4928px\" \/><figcaption class=\"wp-element-caption\">Figure 8. Line plot &#8211; bigram, image by author.<\/figcaption><\/figure>\n<h2 class=\"wp-block-heading\">Final remarks<\/h2>\n<p class=\"wp-block-paragraph\">Cappuccino supports the visual exploration of text data with a time-series character. With a single function call, we pre-process the data and get a first exploratory look at the dataset. Here are several tips to follow:<\/p>\n<ul class=\"wp-block-list\">\n<li>The suitable aggregation frequency depends on the <strong>length of the time dimension<\/strong> in the data. For long time series, a monthly plot will not display the data clearly, while a yearly plot of a short time series (less than a year) will not show any variability over time.<\/li>\n<li>Select a suitable form of <strong>visualization on the basis of the dataset in your project<\/strong>. A line plot is not a good choice for datasets with high n-gram variability over time (see Figure 8). In this case, a heatmap gives a clearer picture, even with many n-grams in each period.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">Some questions we can answer with Arabica are (1) how the concepts in a specific domain (economics, biology, etc.) 
evolved over time, using <strong>research article metadata<\/strong>, (2) which key topics were emphasized during a presidential campaign, using <strong>tweets<\/strong>, and (3) which parts of the brand and communication a company should improve, using <strong>customer product reviews<\/strong>.<\/p>\n<p class=\"wp-block-paragraph\">The complete code in this <a href=\"https:\/\/medium.com\/towards-data-science\/text-as-time-series-arabica-1-0-brings-new-features-for-exploratory-text-data-analysis-88eaabb84deb\">tutorial <\/a>is on my <a href=\"https:\/\/github.com\/PetrKorab\/Arabica\/blob\/main\/docs\/examples\/cappuccino_examples.ipynb\">GitHub<\/a>. For more examples, read the <a href=\"https:\/\/arabica.readthedocs.io\/en\/latest\/index.html\">documentation <\/a>and a tutorial on the <em>arabica_freq<\/em> method.<\/p>\n<p class=\"wp-block-paragraph\"><em><strong>EDIT:<\/strong> Arabica now has a <strong>sentiment and structural breaks<\/strong> analytical module. Read more and also check practical applications in these tutorials:<\/em><\/p>\n<ul class=\"wp-block-list\">\n<li><em><strong><a href=\"https:\/\/towardsdatascience.com\/sentiment-analysis-and-structural-breaks-in-time-series-text-data-8109c712ca2\">Sentiment Analysis and Structural Breaks in Time-Series Text Data<\/a><\/strong><\/em><\/li>\n<li><em><strong><a href=\"https:\/\/towardsdatascience.com\/customer-satisfaction-measurement-with-n-gram-and-sentiment-analysis-547e291c13a6\">Customer Satisfaction Measurement with N-gram and Sentiment Analysis<\/a><\/strong><\/em><\/li>\n<li><em><strong><a href=\"https:\/\/pub.towardsai.net\/research-article-meta-data-description-made-quick-and-easy-57754e54b550\">Research Article Meta-data Description Made Quick and Easy<\/a><\/strong><\/em><\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\"><em>Did you like the article? You can invite me <a href=\"https:\/\/www.buymeacoffee.com\/petrkorab\">for coffee<\/a> and support my writing. 
You can also subscribe to my <a href=\"https:\/\/medium.com\/subscribe\/@petrkorab\">email list<\/a> to get notified about my new articles. Thanks!<\/em><\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"f1e7e5\" data-has-transparency=\"false\" style=\"--dominant-color: #f1e7e5;\" loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"1437\" class=\"wp-image-545431 not-transparent\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/0YWoIOYChn4xLiKaM-scaled.jpg\" alt=\"Photo by Kanwardeep Kaur on Unsplash\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/0YWoIOYChn4xLiKaM-scaled.jpg 2560w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/0YWoIOYChn4xLiKaM-300x168.jpg 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/0YWoIOYChn4xLiKaM-1024x575.jpg 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/0YWoIOYChn4xLiKaM-768x431.jpg 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/0YWoIOYChn4xLiKaM-1536x862.jpg 1536w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/0YWoIOYChn4xLiKaM-2048x1150.jpg 2048w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/><figcaption class=\"wp-element-caption\">Photo by <a href=\"https:\/\/unsplash.com\/@kavar05?utm_source=medium&amp;utm_medium=referral\">Kanwardeep Kaur<\/a> on <a href=\"https:\/\/unsplash.com?utm_source=medium&amp;utm_medium=referral\">Unsplash<\/a><\/figcaption><\/figure>","protected":false},"excerpt":{"rendered":"<p>Arabica now offers unigram, bigram, and trigram word cloud, heatmap, and line chart to further accelerate time-series text data analysis<\/p>\n","protected":false},"author":18,"featured_media":104782,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"is_member_only":true,"sub_heading":"Arabica now offers unigram, bigram, and trigram word cloud, heatmap, and line chart to further 
accelerate time-series text data analysis","footnotes":""},"categories":[47],"tags":[508,1223,467,1604],"sponsor":[],"coauthors":[30697],"class_list":["post-104781","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-visualization","tag-data-visualization","tag-exploratory-data-analysis","tag-python","tag-text-mining"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Visualization Module in Arabica Speeds Up Text Data Exploration | Towards Data Science<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Visualization Module in Arabica Speeds Up Text Data Exploration | Towards Data Science\" \/>\n<meta property=\"og:description\" content=\"Arabica now offers unigram, bigram, and trigram word cloud, heatmap, and line chart to further accelerate time-series text data analysis\" \/>\n<meta property=\"og:url\" content=\"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/\" \/>\n<meta property=\"og:site_name\" content=\"Towards Data Science\" \/>\n<meta property=\"article:published_time\" content=\"2023-01-09T21:51:17+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-01-26T13:03:36+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1QFOlMLGdhfeBZ0EkL7eYLQ-1024x630.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"630\" \/>\n\t<meta property=\"og:image:type\" 
content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Petr Kor\u00e1b\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@TDataScience\" \/>\n<meta name=\"twitter:site\" content=\"@TDataScience\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Petr Kor\u00e1b\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/\"},\"author\":{\"name\":\"TDS Editors\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/person\/f9925d336b6fe962b03ad8281d90b8ee\"},\"headline\":\"Visualization Module in Arabica Speeds Up Text Data Exploration\",\"datePublished\":\"2023-01-09T21:51:17+00:00\",\"dateModified\":\"2025-01-26T13:03:36+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/\"},\"wordCount\":1004,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/towardsdatascience.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1QFOlMLGdhfeBZ0EkL7eYLQ.png\",\"keywords\":[\"Data Visualization\",\"Exploratory Data Analysis\",\"Python\",\"Text Mining\"],\"articleSection\":[\"Data 
Visualization\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/\",\"url\":\"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/\",\"name\":\"Visualization Module in Arabica Speeds Up Text Data Exploration | Towards Data Science\",\"isPartOf\":{\"@id\":\"https:\/\/towardsdatascience.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1QFOlMLGdhfeBZ0EkL7eYLQ.png\",\"datePublished\":\"2023-01-09T21:51:17+00:00\",\"dateModified\":\"2025-01-26T13:03:36+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/#primaryimage\",\"url\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1QFOlMLGdhfeBZ0EkL7eYLQ.png\",\"contentUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1QFOlMLGdhfeBZ0EkL7eYLQ.png\",\"width\":6192,\"height\":3811,\"caption\":\"Figure 1. 
Bigram word cloud, image by author.\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/towardsdatascience.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Visualization Module in Arabica Speeds Up Text Data Exploration\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/towardsdatascience.com\/#website\",\"url\":\"https:\/\/towardsdatascience.com\/\",\"name\":\"Towards Data Science\",\"description\":\"Publish AI, ML &amp; data-science insights to a global community of data professionals.\",\"publisher\":{\"@id\":\"https:\/\/towardsdatascience.com\/#organization\"},\"alternateName\":\"TDS\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/towardsdatascience.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/towardsdatascience.com\/#organization\",\"name\":\"Towards Data Science\",\"alternateName\":\"TDS\",\"url\":\"https:\/\/towardsdatascience.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/tds-logo.jpg\",\"contentUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/tds-logo.jpg\",\"width\":696,\"height\":696,\"caption\":\"Towards Data 
Science\"},\"image\":{\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/TDataScience\",\"https:\/\/www.youtube.com\/c\/TowardsDataScience\",\"https:\/\/www.linkedin.com\/company\/towards-data-science\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/person\/f9925d336b6fe962b03ad8281d90b8ee\",\"name\":\"TDS Editors\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/person\/image\/23494c9101089ad44ae88ce9d2f56aac\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g\",\"caption\":\"TDS Editors\"},\"description\":\"Building a vibrant data science and machine learning community. Share your insights and projects with our global audience: bit.ly\/write-for-tds\",\"url\":\"https:\/\/towardsdatascience.com\/author\/towardsdatascience\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Visualization Module in Arabica Speeds Up Text Data Exploration | Towards Data Science","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/","og_locale":"en_US","og_type":"article","og_title":"Visualization Module in Arabica Speeds Up Text Data Exploration | Towards Data Science","og_description":"Arabica now offers unigram, bigram, and trigram word cloud, heatmap, and line chart to further accelerate time-series text data analysis","og_url":"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/","og_site_name":"Towards Data Science","article_published_time":"2023-01-09T21:51:17+00:00","article_modified_time":"2025-01-26T13:03:36+00:00","og_image":[{"width":1024,"height":630,"url":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1QFOlMLGdhfeBZ0EkL7eYLQ-1024x630.png","type":"image\/png"}],"author":"Petr Kor\u00e1b","twitter_card":"summary_large_image","twitter_creator":"@TDataScience","twitter_site":"@TDataScience","twitter_misc":{"Written by":"Petr Kor\u00e1b","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/#article","isPartOf":{"@id":"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/"},"author":{"name":"TDS Editors","@id":"https:\/\/towardsdatascience.com\/#\/schema\/person\/f9925d336b6fe962b03ad8281d90b8ee"},"headline":"Visualization Module in Arabica Speeds Up Text Data Exploration","datePublished":"2023-01-09T21:51:17+00:00","dateModified":"2025-01-26T13:03:36+00:00","mainEntityOfPage":{"@id":"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/"},"wordCount":1004,"commentCount":0,"publisher":{"@id":"https:\/\/towardsdatascience.com\/#organization"},"image":{"@id":"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/#primaryimage"},"thumbnailUrl":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1QFOlMLGdhfeBZ0EkL7eYLQ.png","keywords":["Data Visualization","Exploratory Data Analysis","Python","Text Mining"],"articleSection":["Data Visualization"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/","url":"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/","name":"Visualization Module in Arabica Speeds Up Text Data Exploration | Towards Data 
Science","isPartOf":{"@id":"https:\/\/towardsdatascience.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/#primaryimage"},"image":{"@id":"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/#primaryimage"},"thumbnailUrl":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1QFOlMLGdhfeBZ0EkL7eYLQ.png","datePublished":"2023-01-09T21:51:17+00:00","dateModified":"2025-01-26T13:03:36+00:00","breadcrumb":{"@id":"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/#primaryimage","url":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1QFOlMLGdhfeBZ0EkL7eYLQ.png","contentUrl":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2023\/01\/1QFOlMLGdhfeBZ0EkL7eYLQ.png","width":6192,"height":3811,"caption":"Figure 1. 
Bigram word cloud, image by author."},{"@type":"BreadcrumbList","@id":"https:\/\/towardsdatascience.com\/visualization-module-in-arabica-speeds-up-text-data-exploration-47114ad646ce\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/towardsdatascience.com\/"},{"@type":"ListItem","position":2,"name":"Visualization Module in Arabica Speeds Up Text Data Exploration"}]},{"@type":"WebSite","@id":"https:\/\/towardsdatascience.com\/#website","url":"https:\/\/towardsdatascience.com\/","name":"Towards Data Science","description":"Publish AI, ML &amp; data-science insights to a global community of data professionals.","publisher":{"@id":"https:\/\/towardsdatascience.com\/#organization"},"alternateName":"TDS","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/towardsdatascience.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/towardsdatascience.com\/#organization","name":"Towards Data Science","alternateName":"TDS","url":"https:\/\/towardsdatascience.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/towardsdatascience.com\/#\/schema\/logo\/image\/","url":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/tds-logo.jpg","contentUrl":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/tds-logo.jpg","width":696,"height":696,"caption":"Towards Data Science"},"image":{"@id":"https:\/\/towardsdatascience.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/TDataScience","https:\/\/www.youtube.com\/c\/TowardsDataScience","https:\/\/www.linkedin.com\/company\/towards-data-science\/"]},{"@type":"Person","@id":"https:\/\/towardsdatascience.com\/#\/schema\/person\/f9925d336b6fe962b03ad8281d90b8ee","name":"TDS 
Editors","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/towardsdatascience.com\/#\/schema\/person\/image\/23494c9101089ad44ae88ce9d2f56aac","url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","caption":"TDS Editors"},"description":"Building a vibrant data science and machine learning community. Share your insights and projects with our global audience: bit.ly\/write-for-tds","url":"https:\/\/towardsdatascience.com\/author\/towardsdatascience\/"}]}},"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"Towards Data Science","distributor_original_site_url":"https:\/\/towardsdatascience.com","push-errors":false,"_links":{"self":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/posts\/104781","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/users\/18"}],"replies":[{"embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/comments?post=104781"}],"version-history":[{"count":0,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/posts\/104781\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/media\/104782"}],"wp:attachment":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/media?parent=104781"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/categories?post=104781"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/tags?post=104781"},{"taxonomy":"sponsor","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/sponsor?post=104781"},{"taxonomy
":"author","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/coauthors?post=104781"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}