{"id":606519,"date":"2025-07-08T00:14:05","date_gmt":"2025-07-08T05:14:05","guid":{"rendered":"https:\/\/towardsdatascience.com\/?p=606519"},"modified":"2025-07-08T00:14:35","modified_gmt":"2025-07-08T05:14:35","slug":"run-your-python-code-up-to-80x-faster-using-the-cython-library","status":"publish","type":"post","link":"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/","title":{"rendered":"Run Your Python Code up to 80x Faster Using the Cython Library"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><mdspan datatext=\"el1751951097703\" class=\"mdspan-comment\">Python may be an<\/mdspan> excellent language for rapid prototyping and code development, but one thing I often hear people say about using it is that it\u2019s slow to execute. This is a particular pain point for data scientists and ML engineers, as they often perform computationally intensive operations, such as matrix multiplication, gradient descent calculations or image processing.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Over time, Python has evolved internally to address some of these issues by introducing new features to the language, such as multi-threading or rewriting existing functionality for improved performance. However, Python\u2019s use of the Global Interpreter Lock (GIL) often hamstrung efforts like this.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Many external libraries have also been written to bridge this perceived performance gap between Python and compiled languages such as Java. Perhaps the most used and well-known of these is the <strong>NumPy<\/strong> library. Implemented in the C language, NumPy was designed from the ground up to support multiple CPU cores and super-fast numerical and array processing.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">There are alternatives to NumPy, and in a recent TDS article, I introduced the <strong>numexpr<\/strong> library, which, in many use cases, can even outperform NumPy. 
If you&#8217;re interested in learning more, I&#8217;ll include a link to that story at the end of this article.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Another external library that is very effective is <strong>Numba<\/strong>. Numba utilises a Just-in-Time (JIT) compiler for Python, which translates a subset of Python and NumPy code into fast machine code at runtime. It is designed to accelerate numerical and scientific computing tasks by leveraging LLVM (Low-Level Virtual Machine) compiler infrastructure.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In this article, I would like to discuss another runtime-enhancing external library, <strong>Cython<\/strong>. It&#8217;s one of the most performant Python libraries but also one of the least understood and used. I think this is at least partially because you have to get your hands a little bit dirty and make some changes to your original code. But if you follow the simple four-step plan I&#8217;ll outline below, the performance benefits you can achieve will make it more than worthwhile.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What is&nbsp;Cython?<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">If you haven&#8217;t heard of Cython, it&#8217;s a superset of Python designed to provide C-like performance with code written mainly in Python. It allows for converting Python code into C code, which can then be compiled into shared libraries that can be imported into Python just like regular Python modules. 
This process gives you much of the performance of C while keeping the readability of Python.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I&#8217;ll showcase the benefits you can achieve by converting your code to use Cython, working through three use cases, giving the four steps required to convert existing Python code, and comparing timings for each run.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Setting up a development environment<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Before continuing, we should set up a separate development environment to keep this project&#8217;s dependencies isolated. I&#8217;ll be using Ubuntu under WSL2 on Windows and a Jupyter Notebook for code development. I use the UV package manager to set up my development environment, but feel free to use whatever tools and methods suit you. Note that SciPy is included in the install list because the third example imports it.<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">$ uv init cython-test\n$ cd cython-test\n$ uv venv\n$ source .venv\/bin\/activate\n(cython-test) $ uv pip install cython jupyter numpy scipy pillow matplotlib<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Now, type <strong>jupyter notebook<\/strong> into your command prompt. You should see a notebook open in your browser. If that doesn\u2019t happen automatically, you\u2019ll likely see a screenful of information after running the command. 
Near the bottom of that, there will be a URL you should copy and paste into your browser to open the Jupyter Notebook.<br>Your URL will be different to mine, but it should look something like this:<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">http:\/\/127.0.0.1:8888\/tree?token=3b9f7bd07b6966b41b68e2350721b2d0b6f388d248cc69d<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Example 1 &#8211; Speeding up for&nbsp;loops<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Before we start using Cython, let\u2019s begin with a regular Python function and time how long it takes to run. This will be our baseline benchmark.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We\u2019ll code a simple double-for-loop function that takes several seconds to run, then use Cython to speed it up and measure the difference in runtime between the two versions.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Here is our baseline standard Python code.<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># sum_of_squares.py\nimport timeit\n\n# Define the standard Python function\ndef slow_sum_of_squares(n):\n    total = 0\n    for i in range(n):\n        for j in range(n):\n            total += i * i + j * j\n    return total\n\n# Benchmark the Python function\nprint(&quot;Python function execution time:&quot;)\nprint(&quot;timeit:&quot;, timeit.timeit(\n        lambda: slow_sum_of_squares(20000),\n        number=1))<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">On my system, the above code produces the following output.<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">Python function execution time:\ntimeit: 13.135973724005453<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Let\u2019s see how much Cython improves on that.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The four-step plan for effective Cython use<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Using Cython to boost 
your code run-time in a Jupyter Notebook is a simple four-step process.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\">Don&#8217;t worry if you&#8217;re not a Notebook user, as I&#8217;ll show how to convert regular Python .py files to use Cython later on.<\/p>\n<\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\">1\/ In the first cell of your notebook, load the Cython extension by typing this command.<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">%load_ext Cython<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">2\/ For any subsequent cell that contains Python code you wish to run using Cython, add the <strong>%%cython<\/strong> magic command as the first line of the cell. For example,<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">%%cython\ndef myfunction():\n    # Your function body goes here\n    pass<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">3\/ Give the parameters of your function definitions C types (for example, <strong>int n<\/strong>).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">4\/ Lastly, declare the variables used in your performance-critical code with appropriate C types by using the<strong> cdef<\/strong> statement. 
Also, where it makes sense, use functions from the standard C library (available in Cython through <strong>cimport<\/strong>, for example <strong>from libc.stdlib cimport rand<\/strong>).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Taking our original Python code as an example, this is what it needs to look like to be ready to run in a notebook using Cython after applying all four steps above.<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">%%cython\ndef fast_sum_of_squares(int n):\n    # 64-bit accumulator: for n = 20000 the sum is about 1e17, too large for a 32-bit int\n    cdef long long total = 0\n    cdef int i, j\n    for i in range(n):\n        for j in range(n):\n            total += i * i + j * j\n    return total\n\nimport timeit\nprint(&quot;Cython function execution time:&quot;)\nprint(&quot;timeit:&quot;, timeit.timeit(\n        lambda: fast_sum_of_squares(20000),\n        number=1))<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">As I hope you can see, converting your code is much easier in practice than the four procedural steps might suggest.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The runtime of the above code was impressive. On my system, this new Cython code produces the following output.<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">Cython function execution time:\ntimeit: 0.15829777799808653<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">That is a speed-up of more than 80x.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Example 2 &#8211; Calculating pi using Monte&nbsp;Carlo<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">For our second example, we\u2019ll examine a more complex use case, the foundation of which has numerous real-world applications.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">An area where Cython can show significant performance improvement is in numerical simulations, particularly those involving heavy computation, such as Monte Carlo (MC) simulations. Monte Carlo simulations involve running many iterations of a random process to estimate the properties of a system. 
MC applies to a wide variety of study fields, including climate and atmospheric science, computer graphics, AI search and quantitative finance. It is almost always a very computationally intensive process.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To illustrate, we\u2019ll use Monte Carlo in a simplified manner to calculate the value of Pi. This is a well-known example where we take a square with a side length of one unit and inscribe a quarter circle inside it with a radius of one unit, as shown here.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/07\/mc2.png\" alt=\"\" class=\"wp-image-607660\"\/><figcaption class=\"wp-element-caption\">Image by AI (GPT-4o)<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The ratio of the area of the quarter circle to the area of the square is, obviously, (Pi\/4).&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So, if we consider many random (x,y) points that all lie within or on the bounds of the square, as the total number of these points tends to infinity, the ratio of points that lie on or inside the quarter circle to the total number of points tends towards Pi \/4. 
We then multiply this value by 4 to obtain the value of Pi itself.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Here is some typical Python code you might use to model this.<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import random\nimport time\n\ndef monte_carlo_pi(num_samples):\n    inside_circle = 0\n    for _ in range(num_samples):\n        x = random.uniform(0, 1)\n        y = random.uniform(0, 1)\n        if (x**2) + (y**2) &lt;= 1:  \n            inside_circle += 1\n    return (inside_circle \/ num_samples) * 4\n\n# Benchmark the standard Python function\nnum_samples = 100000000\n\nstart_time = time.time()\npi_estimate = monte_carlo_pi(num_samples)\nend_time = time.time()\n\nprint(f&quot;Estimated Pi (Python): {pi_estimate}&quot;)\nprint(f&quot;Execution Time (Python): {end_time - start_time} seconds&quot;)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Running this produced the following timing result.<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">Estimated Pi (Python): 3.14197216\nExecution Time (Python): 20.67279839515686 seconds<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Now, here is the Cython implementation we get by following our four-step process.<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">%%cython\nimport cython\nimport random\nfrom libc.stdlib cimport rand, RAND_MAX\n\n@cython.boundscheck(False)\n@cython.wraparound(False)\ndef monte_carlo_pi(int num_samples):\n    cdef int inside_circle = 0\n    cdef int i\n    cdef double x, y\n    \n    for i in range(num_samples):\n        x = rand() \/ &lt;double&gt;RAND_MAX\n        y = rand() \/ &lt;double&gt;RAND_MAX\n        if (x**2) + (y**2) &lt;= 1:\n            inside_circle += 1\n            \n    return (inside_circle \/ num_samples) * 4\n\nimport time\n\nnum_samples = 100000000\n\n# Benchmark the Cython function\nstart_time = time.time()\npi_estimate = 
monte_carlo_pi(num_samples)\nend_time = time.time()\n\nprint(f&quot;Estimated Pi (Cython): {pi_estimate}&quot;)\nprint(f&quot;Execution Time (Cython): {end_time - start_time} seconds&quot;)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">And here is the new output.<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">Estimated Pi (Cython): 3.1415012\nExecution Time (Cython): 1.9987852573394775 seconds<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Once again, that\u2019s a pretty impressive speed-up of roughly 10x for the Cython version.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">One thing we did in this code example that we didn\u2019t do in the first is import names from the C standard library. That was this line:<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">from libc.stdlib cimport rand, RAND_MAX<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The <strong>cimport<\/strong> statement is a Cython keyword used to import C functions, variables, constants, and types. We used it to bring in the C function <strong>rand()<\/strong> and the constant <strong>RAND_MAX<\/strong>. Dividing one by the other gives a random value between 0 and 1, which replaces the slower call to Python\u2019s random.uniform().<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Example 3 &#8211; Image manipulation<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">For our final example, we\u2019ll do some image manipulation, specifically image convolution, which is a common operation in image processing. There are many use cases for image convolution. 
We\u2019re going to use it to try to sharpen the slightly blurry image shown below.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/07\/mc3.png\" alt=\"\" class=\"wp-image-607661\"\/><figcaption class=\"wp-element-caption\">Original image by Yury Taranik (licensed from Shutterstock)<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">First, here is the regular Python code.<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">from PIL import Image\nimport numpy as np\nfrom scipy.signal import convolve2d\nimport time\nimport os\nimport matplotlib.pyplot as plt\n\ndef sharpen_image_color(image):\n\n    # Start timing\n    start_time = time.time()\n    \n    # Convert image to RGB in case it&#039;s not already\n    image = image.convert(&#039;RGB&#039;)\n    \n    # Define a sharpening kernel\n    kernel = np.array([[0, -1, 0],\n                       [-1, 5, -1],\n                       [0, -1, 0]])\n    \n    # Convert image to numpy array\n    image_array = np.array(image)\n    \n    # Debugging: Check input values\n    print(&quot;Input array values: Min =&quot;, image_array.min(), &quot;Max =&quot;, image_array.max())\n    \n    # Prepare an empty array for the sharpened image\n    sharpened_array = np.zeros_like(image_array)\n    \n    # Apply the convolution kernel to each channel (assuming RGB image)\n    for i in range(3):\n        channel = image_array[:, :, i]\n        # Perform convolution\n        convolved_channel = convolve2d(channel, kernel, mode=&#039;same&#039;, boundary=&#039;wrap&#039;)\n        \n        # Clip values to be in the range [0, 255]\n        convolved_channel = np.clip(convolved_channel, 0, 255)\n        \n        # Store back in the sharpened array\n        sharpened_array[:, :, i] = convolved_channel.astype(np.uint8)\n    \n    # Debugging: Check output values\n    
print(&quot;Sharpened array values: Min =&quot;, sharpened_array.min(), &quot;Max =&quot;, sharpened_array.max())\n    \n    # Convert array back to image\n    sharpened_image = Image.fromarray(sharpened_array)\n    \n    # End timing\n    duration = time.time() - start_time\n    print(f&quot;Processing time: {duration:.4f} seconds&quot;)\n    \n    return sharpened_image\n\n# Correct path for WSL2 accessing Windows filesystem\nimage_path = &#039;\/mnt\/d\/images\/taj_mahal.png&#039;\n\nimage = Image.open(image_path)\n\n# Sharpen the image\nsharpened_image = sharpen_image_color(image)\n\nif sharpened_image:\n    # Show using PIL&#039;s built-in show method (for debugging)\n    #sharpened_image.show(title=&quot;Sharpened Image (PIL Show)&quot;)\n\n    # Display the original and sharpened images using Matplotlib\n    fig, axs = plt.subplots(1, 2, figsize=(15, 7))\n\n    # Original image\n    axs[0].imshow(image)\n    axs[0].set_title(&quot;Original Image&quot;)\n    axs[0].axis(&#039;off&#039;)\n\n    # Sharpened image\n    axs[1].imshow(sharpened_image)\n    axs[1].set_title(&quot;Sharpened Image&quot;)\n    axs[1].axis(&#039;off&#039;)\n\n    # Show both images side by side\n    plt.show()\nelse:\n    print(&quot;Failed to generate sharpened image.&quot;)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The output is this.<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-markup\">Input array values: Min = 0 Max = 255\nSharpened array values: Min = 0 Max = 255\nProcessing time: 0.1034 seconds<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image alignwide size-large\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/07\/mc4-1-1024x405.png\" alt=\"\" class=\"wp-image-607663\"\/><figcaption class=\"wp-element-caption\">Image  By Author<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Let\u2019s see if Cython can beat that run time of 0.1034 seconds.<\/p>\n\n\n\n<pre 
class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">%%cython\n# cython: language_level=3\n# distutils: define_macros=NPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION\n\nimport numpy as np\ncimport numpy as np\nimport cython\n\n@cython.boundscheck(False)\n@cython.wraparound(False)\ndef sharpen_image_cython(np.ndarray[np.uint8_t, ndim=3] image_array):\n    # Define sharpening kernel\n    cdef int kernel[3][3]\n    kernel[0][0] = 0\n    kernel[0][1] = -1\n    kernel[0][2] = 0\n    kernel[1][0] = -1\n    kernel[1][1] = 5\n    kernel[1][2] = -1\n    kernel[2][0] = 0\n    kernel[2][1] = -1\n    kernel[2][2] = 0\n    \n    # Declare variables outside of loops\n    cdef int height = image_array.shape[0]\n    cdef int width = image_array.shape[1]\n    cdef int channel, i, j, ki, kj\n    cdef int value\n    \n    # Prepare an empty array for the sharpened image\n    cdef np.ndarray[np.uint8_t, ndim=3] sharpened_array = np.zeros_like(image_array)\n\n    # Convolve each channel separately\n    for channel in range(3):  # Iterate over RGB channels\n        for i in range(1, height - 1):\n            for j in range(1, width - 1):\n                value = 0  # Reset value at each pixel\n                # Apply the kernel\n                for ki in range(-1, 2):\n                    for kj in range(-1, 2):\n                        value += kernel[ki + 1][kj + 1] * image_array[i + ki, j + kj, channel]\n                # Clip values to be between 0 and 255\n                sharpened_array[i, j, channel] = min(max(value, 0), 255)\n\n    return sharpened_array\n\n# Python part of the code\nfrom PIL import Image\nimport numpy as np\nimport time as py_time  # Renaming the Python time module to avoid conflict\nimport matplotlib.pyplot as plt\n\n# Load the input image\nimage_path = &#039;\/mnt\/d\/images\/taj_mahal.png&#039;\nimage = Image.open(image_path).convert(&#039;RGB&#039;)\n\n# Convert the image to a NumPy array\nimage_array = np.array(image)\n\n# Time the sharpening with 
Cython\nstart_time = py_time.time()\nsharpened_array = sharpen_image_cython(image_array)\ncython_time = py_time.time() - start_time\n\n# Convert back to an image for displaying\nsharpened_image = Image.fromarray(sharpened_array)\n\n# Display the original and sharpened image\nplt.figure(figsize=(12, 6))\nplt.subplot(1, 2, 1)\nplt.imshow(image)\nplt.title(&quot;Original Image&quot;)\n\nplt.subplot(1, 2, 2)\nplt.imshow(sharpened_image)\nplt.title(&quot;Sharpened Image&quot;)\n\nplt.show()\n\n# Print the time taken for Cython processing\nprint(f&quot;Processing time with Cython: {cython_time:.4f} seconds&quot;)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The output is:<\/p>\n\n\n\n<figure class=\"wp-block-image alignwide size-large\"><img decoding=\"async\" src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/07\/mc5-1024x536.png\" alt=\"\" class=\"wp-image-607664\"\/><figcaption class=\"wp-element-caption\">Image by Author<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Both programs performed well, but Cython was nearly 25 times faster.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What about running Cython outside a Notebook environment?<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">So far, everything I&#8217;ve shown you assumes you&#8217;re running your code inside a Jupyter Notebook. I did this because it&#8217;s the easiest way to introduce Cython and get some code up and running quickly. While the Notebook environment is extremely popular among Python developers, a huge amount of Python code is still contained in regular .py files and run from a terminal using the python command. 
<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">If that&#8217;s your primary mode of coding and running Python scripts, the <strong>%load_ext<\/strong> and <strong>%%cython<\/strong> IPython magic commands won&#8217;t work since those are only understood by Jupyter\/IPython.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So, here&#8217;s how to adapt my four-step Cython conversion process if you&#8217;re running your code as a regular Python script.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s take my first <strong>sum_of_squares<\/strong> example to showcase this.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">1\/ Create a .pyx file instead of using %%cython<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Move your Cython-enhanced code into a file named, for example:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">sum_of_squares.pyx<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># sum_of_squares.pyx\ndef fast_sum_of_squares(int n):\n    # 64-bit accumulator: for n = 20000 the sum is about 1e17, too large for a 32-bit int\n    cdef long long total = 0\n    cdef int i, j\n    for i in range(n):\n        for j in range(n):\n            total += i * i + j * j\n    return total<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">All we did was remove the %%cython magic and the timing code (which will now be in the calling script).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">2\/ Create a setup.py file to compile your .pyx file<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># setup.py\nfrom setuptools import setup\nfrom Cython.Build import cythonize\n\nsetup(\n    name=&quot;cython-test&quot;,\n    ext_modules=cythonize(&quot;sum_of_squares.pyx&quot;, language_level=3),\n    py_modules=[&quot;sum_of_squares&quot;],  # Explicitly state the module\n    zip_safe=False,\n)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">3\/ Run the setup.py file using this command:<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-bash\">$ python setup.py build_ext 
--inplace\nrunning build_ext\ncopying build\/lib.linux-x86_64-cpython-311\/sum_of_squares.cpython-311-x86_64-linux-g\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">4\/ Create a regular Python script to call our Cython code, as shown below, and then run it.<\/p>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># main.py\nimport timeit\nfrom sum_of_squares import fast_sum_of_squares\n\n# Time a single call to the compiled Cython function\nprint(&quot;timeit:&quot;, timeit.timeit(\n        lambda: fast_sum_of_squares(20000),\n        number=1))<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">$ python main.py\n\ntimeit: 0.14675087109208107<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Summary<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Hopefully, I\u2019ve convinced you of the efficacy of using the Cython library in your code. Although it might seem a bit complicated at first sight, with a little effort, you can get substantial run-time improvements over regular Python, even in code that already uses fast numerical libraries such as NumPy.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I provided a four-step process to convert your regular Python code to use Cython for running within Jupyter Notebook environments. Additionally, I explained the steps required to run Cython code from the command line outside a Notebook environment.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Finally, I reinforced the above by showcasing examples of converting regular Python code to use Cython.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In the three examples I showed, we achieved speed-ups of roughly 80x, 10x and 25x, which is not too shabby at all. 
<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">As promised, here is a <a href=\"https:\/\/towardsdatascience.com\/numexpr-the-faster-than-numpy-library-that-no-ones-heard-of\/\" data-type=\"link\" data-id=\"https:\/\/towardsdatascience.com\/numexpr-the-faster-than-numpy-library-that-no-ones-heard-of\/\">link<\/a> to my previous TDS article on utilising the numexpr library to accelerate Python code.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A four-step plan for C language speed where it matters most<\/p>\n","protected":false},"author":18,"featured_media":606520,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"is_member_only":false,"sub_heading":"A four-step plan for C language speed where it matters most","footnotes":""},"categories":[25],"tags":[7871,689,448,468,467],"sponsor":[],"coauthors":[31928],"class_list":["post-606519","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-programming","tag-code-optimization","tag-cython","tag-data-science","tag-deep-dives","tag-python"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Run Your Python Code up to 80x Faster Using the Cython Library | Towards Data Science<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Run Your Python Code up to 80x Faster Using the Cython Library | Towards Data Science\" \/>\n<meta property=\"og:description\" content=\"A four-step plan for C language speed where it 
matters most\" \/>\n<meta property=\"og:url\" content=\"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/\" \/>\n<meta property=\"og:site_name\" content=\"Towards Data Science\" \/>\n<meta property=\"article:published_time\" content=\"2025-07-08T05:14:05+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-07-08T05:14:35+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/07\/Untitled-design-3-fotor-20250707164541-1024x527.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"527\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Thomas Reid\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@TDataScience\" \/>\n<meta name=\"twitter:site\" content=\"@TDataScience\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Thomas Reid\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/\"},\"author\":{\"name\":\"TDS Editors\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/person\/f9925d336b6fe962b03ad8281d90b8ee\"},\"headline\":\"Run Your Python Code up to 80x Faster Using the Cython Library\",\"datePublished\":\"2025-07-08T05:14:05+00:00\",\"dateModified\":\"2025-07-08T05:14:35+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/\"},\"wordCount\":1728,\"publisher\":{\"@id\":\"https:\/\/towardsdatascience.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/07\/Untitled-design-3-fotor-20250707164541.png\",\"keywords\":[\"Code Optimization\",\"Cython\",\"Data Science\",\"Deep Dives\",\"Python\"],\"articleSection\":[\"Programming\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/\",\"url\":\"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/\",\"name\":\"Run Your Python Code up to 80x Faster Using the Cython Library | Towards Data 
Science\",\"isPartOf\":{\"@id\":\"https:\/\/towardsdatascience.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/07\/Untitled-design-3-fotor-20250707164541.png\",\"datePublished\":\"2025-07-08T05:14:05+00:00\",\"dateModified\":\"2025-07-08T05:14:35+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/#primaryimage\",\"url\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/07\/Untitled-design-3-fotor-20250707164541.png\",\"contentUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/07\/Untitled-design-3-fotor-20250707164541.png\",\"width\":2159,\"height\":1111,\"caption\":\"Image by AI (GPT-4o)\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/towardsdatascience.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Run Your Python Code up to 80x Faster Using the Cython Library\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/towardsdatascience.com\/#website\",\"url\":\"https:\/\/towardsdatascience.com\/\",\"name\":\"Towards Data Science\",\"description\":\"Publish AI, ML 
&amp; data-science insights to a global community of data professionals.\",\"publisher\":{\"@id\":\"https:\/\/towardsdatascience.com\/#organization\"},\"alternateName\":\"TDS\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/towardsdatascience.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/towardsdatascience.com\/#organization\",\"name\":\"Towards Data Science\",\"alternateName\":\"TDS\",\"url\":\"https:\/\/towardsdatascience.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/tds-logo.jpg\",\"contentUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/tds-logo.jpg\",\"width\":696,\"height\":696,\"caption\":\"Towards Data Science\"},\"image\":{\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/TDataScience\",\"https:\/\/www.youtube.com\/c\/TowardsDataScience\",\"https:\/\/www.linkedin.com\/company\/towards-data-science\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/person\/f9925d336b6fe962b03ad8281d90b8ee\",\"name\":\"TDS Editors\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/person\/image\/23494c9101089ad44ae88ce9d2f56aac\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g\",\"caption\":\"TDS Editors\"},\"description\":\"Building a vibrant data science and machine learning community. 
Share your insights and projects with our global audience: bit.ly\/write-for-tds\",\"url\":\"https:\/\/towardsdatascience.com\/author\/towardsdatascience\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Run Your Python Code up to 80x Faster Using the Cython Library | Towards Data Science","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/","og_locale":"en_US","og_type":"article","og_title":"Run Your Python Code up to 80x Faster Using the Cython Library | Towards Data Science","og_description":"A four-step plan for C language speed where it matters most","og_url":"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/","og_site_name":"Towards Data Science","article_published_time":"2025-07-08T05:14:05+00:00","article_modified_time":"2025-07-08T05:14:35+00:00","og_image":[{"width":1024,"height":527,"url":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/07\/Untitled-design-3-fotor-20250707164541-1024x527.png","type":"image\/png"}],"author":"Thomas Reid","twitter_card":"summary_large_image","twitter_creator":"@TDataScience","twitter_site":"@TDataScience","twitter_misc":{"Written by":"Thomas Reid","Est. 
reading time":"9 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/#article","isPartOf":{"@id":"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/"},"author":{"name":"TDS Editors","@id":"https:\/\/towardsdatascience.com\/#\/schema\/person\/f9925d336b6fe962b03ad8281d90b8ee"},"headline":"Run Your Python Code up to 80x Faster Using the Cython Library","datePublished":"2025-07-08T05:14:05+00:00","dateModified":"2025-07-08T05:14:35+00:00","mainEntityOfPage":{"@id":"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/"},"wordCount":1728,"publisher":{"@id":"https:\/\/towardsdatascience.com\/#organization"},"image":{"@id":"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/#primaryimage"},"thumbnailUrl":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/07\/Untitled-design-3-fotor-20250707164541.png","keywords":["Code Optimization","Cython","Data Science","Deep Dives","Python"],"articleSection":["Programming"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/","url":"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/","name":"Run Your Python Code up to 80x Faster Using the Cython Library | Towards Data 
Science","isPartOf":{"@id":"https:\/\/towardsdatascience.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/#primaryimage"},"image":{"@id":"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/#primaryimage"},"thumbnailUrl":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/07\/Untitled-design-3-fotor-20250707164541.png","datePublished":"2025-07-08T05:14:05+00:00","dateModified":"2025-07-08T05:14:35+00:00","breadcrumb":{"@id":"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/#primaryimage","url":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/07\/Untitled-design-3-fotor-20250707164541.png","contentUrl":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/07\/Untitled-design-3-fotor-20250707164541.png","width":2159,"height":1111,"caption":"Image by AI (GPT-4o)"},{"@type":"BreadcrumbList","@id":"https:\/\/towardsdatascience.com\/run-your-python-code-up-to-80x-faster-using-the-cython-library\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/towardsdatascience.com\/"},{"@type":"ListItem","position":2,"name":"Run Your Python Code up to 80x Faster Using the Cython Library"}]},{"@type":"WebSite","@id":"https:\/\/towardsdatascience.com\/#website","url":"https:\/\/towardsdatascience.com\/","name":"Towards Data Science","description":"Publish AI, ML &amp; data-science insights to a global community of data 
professionals.","publisher":{"@id":"https:\/\/towardsdatascience.com\/#organization"},"alternateName":"TDS","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/towardsdatascience.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/towardsdatascience.com\/#organization","name":"Towards Data Science","alternateName":"TDS","url":"https:\/\/towardsdatascience.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/towardsdatascience.com\/#\/schema\/logo\/image\/","url":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/tds-logo.jpg","contentUrl":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/tds-logo.jpg","width":696,"height":696,"caption":"Towards Data Science"},"image":{"@id":"https:\/\/towardsdatascience.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/TDataScience","https:\/\/www.youtube.com\/c\/TowardsDataScience","https:\/\/www.linkedin.com\/company\/towards-data-science\/"]},{"@type":"Person","@id":"https:\/\/towardsdatascience.com\/#\/schema\/person\/f9925d336b6fe962b03ad8281d90b8ee","name":"TDS Editors","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/towardsdatascience.com\/#\/schema\/person\/image\/23494c9101089ad44ae88ce9d2f56aac","url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","caption":"TDS Editors"},"description":"Building a vibrant data science and machine learning community. 
Share your insights and projects with our global audience: bit.ly\/write-for-tds","url":"https:\/\/towardsdatascience.com\/author\/towardsdatascience\/"}]}},"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"TDS Contributor Portal","distributor_original_site_url":"https:\/\/contributor.insightmediagroup.io","push-errors":false,"_links":{"self":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/posts\/606519","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/users\/18"}],"replies":[{"embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/comments?post=606519"}],"version-history":[{"count":0,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/posts\/606519\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/media\/606520"}],"wp:attachment":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/media?parent=606519"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/categories?post=606519"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/tags?post=606519"},{"taxonomy":"sponsor","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/sponsor?post=606519"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/coauthors?post=606519"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}