{"id":4190,"date":"2024-08-10T21:09:41","date_gmt":"2024-08-10T21:09:41","guid":{"rendered":"https:\/\/towardsdatascience.com\/introduction-to-support-vector-machines-motivation-and-basics-920e4c1e22e0\/"},"modified":"2025-01-08T16:48:36","modified_gmt":"2025-01-08T16:48:36","slug":"introduction-to-support-vector-machines-motivation-and-basics-920e4c1e22e0","status":"publish","type":"post","link":"https:\/\/towardsdatascience.com\/introduction-to-support-vector-machines-motivation-and-basics-920e4c1e22e0\/","title":{"rendered":"Introduction to Support Vector Machines\u200a-\u200aMotivation and Basics"},"content":{"rendered":"<h1 class=\"wp-block-heading\"><strong>Introduction to Support Vector Machines &#8211; Motivation and Basics<\/strong><\/h1>\n<h3 class=\"wp-block-heading\"><em>Learn basic concepts that make Support Vector Machine a powerful linear classifier<\/em><\/h3>\n\n<h2 class=\"wp-block-heading\">Introduction<\/h2>\n<p class=\"wp-block-paragraph\">In this post, you will learn about the basics of Support Vector Machines (SVM), which is a well-regarded supervised machine learning algorithm.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p><em>This technique needs to be in everyone&#8217;s tool-bag especially people who aspire to be a data scientist one day.<\/em><\/p><\/blockquote>\n<p class=\"wp-block-paragraph\">Since there&#8217;s a lot to learn about, I&#8217;ll introduce SVM to you across two posts so that you can have a coffee break in between \ud83d\ude42<\/p>\n<h2 class=\"wp-block-heading\">Motivation<\/h2>\n<p class=\"wp-block-paragraph\">First, let us try to understand the motivation behind SVM in the context of a binary classification problem. In a binary classification problem, our data belong to two classes and we try to find a decision boundary that splits the data into those two classes while making minimum mistakes. 
Consider the diagram below, which represents our (hypothetical) data on a 2-D plane. As we can see, the data is divided into two classes: Pluses and Stars.<\/p>\n<p class=\"wp-block-paragraph\">Note: For the sake of simplicity, we&#8217;ll only consider linearly separable data for now and learn about cases that are not linearly separable later on.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fefefd\" data-has-transparency=\"true\" style=\"--dominant-color: #fefefd;\" loading=\"lazy\" decoding=\"async\" width=\"618\" height=\"497\" class=\"wp-image-315794 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/08ejp6z8wjMoOkOAH.png\" alt=\"Figure 1: Data representation where we have data points from two distinct classes\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/08ejp6z8wjMoOkOAH.png 618w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/08ejp6z8wjMoOkOAH-300x241.png 300w\" sizes=\"auto, (max-width: 618px) 100vw, 618px\" \/><figcaption class=\"wp-element-caption\">Figure 1: Data representation where we have data points from two distinct classes<\/figcaption><\/figure>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>The goal of SVM, like any classification algorithm, is to find a decision boundary that splits the data into two classes. However, there could be many possible decision boundaries that achieve this purpose, as shown below. 
Which one should we consider?<\/p><\/blockquote>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fdfdfd\" data-has-transparency=\"true\" style=\"--dominant-color: #fdfdfd;\" loading=\"lazy\" decoding=\"async\" width=\"632\" height=\"498\" class=\"wp-image-315795 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/0q0eirNHjRus3U-W-.png\" alt=\"Figure 2: What is the ideal decision boundary?\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/0q0eirNHjRus3U-W-.png 632w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/0q0eirNHjRus3U-W--300x236.png 300w\" sizes=\"auto, (max-width: 632px) 100vw, 632px\" \/><figcaption class=\"wp-element-caption\">Figure 2: What is the ideal decision boundary?<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">The yellow and the black decision boundaries do not seem to be good choices. Why, you ask? Simply because they might not generalize well to new data points, as each of them is awfully close to one of the classes. In this sense, the blue line seems to be a good candidate, as it is far away from both classes. Hence, by extending this chain of thought, we can say that:<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>an ideal decision boundary would be a line whose distance from the closest data point (on either side) is as large as possible.<\/p><\/blockquote>\n<p class=\"wp-block-paragraph\">So, if we think of the decision boundary as a road, we want that road to be as wide as possible. This is exactly what SVM aims to do.<\/p>\n<h2 class=\"wp-block-heading\">How it Works (Mathematically)<\/h2>\n<p class=\"wp-block-paragraph\">Now that we understand what SVM aims to do, our next step is to understand how it finds this decision boundary. 
So, let&#8217;s start from scratch with the help of the following diagram.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fdfdfd\" data-has-transparency=\"true\" style=\"--dominant-color: #fdfdfd;\" loading=\"lazy\" decoding=\"async\" width=\"615\" height=\"482\" class=\"wp-image-315796 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/0-CEMfn_YKJvf_7__.png\" alt=\"Figure 3: Mathematical formulation to identify the ideal decision boundary\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/0-CEMfn_YKJvf_7__.png 615w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/0-CEMfn_YKJvf_7__-300x235.png 300w\" sizes=\"auto, (max-width: 615px) 100vw, 615px\" \/><figcaption class=\"wp-element-caption\">Figure 3: Mathematical formulation to identify the ideal decision boundary<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">First, we will derive the equation of the decision boundary in terms of the data points. To that end, let us suppose we already have a decision boundary (blue line in the above diagram) and two unknown points that we have to classify. We represent these points as vectors <code>u<\/code> and <code>v<\/code> in the 2-D space. We also introduce a vector <code>w<\/code>, which we assume is perpendicular to the decision boundary. Now, we project <code>u<\/code> and <code>v<\/code> onto the direction of <code>w<\/code> and check whether the projected vector falls on the left or right side of the decision boundary based on some threshold <code>c<\/code>.<\/p>\n<p class=\"wp-block-paragraph\">Mathematically, we say that a data point <code>x<\/code> is on the right side of the decision boundary (that is, in the Star class) if <code>w.x \u2265 c<\/code>; otherwise, it is in the Plus class. 
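<\/p>\n<p class=\"wp-block-paragraph\">As a quick sanity check of this rule, here is a minimal NumPy sketch (the values of <code>w<\/code>, <code>c<\/code>, and the test points are made up for illustration):<\/p>

```python
import numpy as np

# hypothetical normal vector and threshold for the decision rule w.x >= c
w = np.array([1.0, 1.0])
c = 3.0

u = np.array([4.0, 2.0])  # first unknown point
v = np.array([0.5, 0.5])  # second unknown point

# project each point onto w and compare against the threshold
is_star_u = np.dot(w, u) >= c  # 6.0 >= 3.0, so u falls in the Star class
is_star_v = np.dot(w, v) >= c  # 1.0 <  3.0, so v falls in the Plus class
```

<p class=\"wp-block-paragraph\">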
This means that the equation of the hyperplane (line in case of 2-D) that separates two classes, in terms of an arbitrary data point <code>x<\/code> is the following:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"f3f3f3\" data-has-transparency=\"false\" style=\"--dominant-color: #f3f3f3;\" loading=\"lazy\" decoding=\"async\" width=\"1114\" height=\"64\" class=\"wp-image-315797 not-transparent\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1PflOKK9evxd77TSsWhS-uQ.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1PflOKK9evxd77TSsWhS-uQ.png 1114w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1PflOKK9evxd77TSsWhS-uQ-300x17.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1PflOKK9evxd77TSsWhS-uQ-1024x59.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1PflOKK9evxd77TSsWhS-uQ-768x44.png 768w\" sizes=\"auto, (max-width: 1114px) 100vw, 1114px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">Now we have the equation of our decision boundary but it is not yet immediately clear how it would help us in maximizing its distance from the data points of both the classes. To that end, we would employ a trick which goes as follows. Usually, in a binary classification problem, the labels of data samples are + 1 or -1. Thus, it would be more convenient for us if our decision rule (i.e. 
<code>w.x + b<\/code>) outputs a quantity greater than or equal to +1 for all the data points belonging to the Star class and a quantity less than or equal to -1 for all the data points belonging to the Plus class.<\/p>\n<p class=\"wp-block-paragraph\">Mathematically, <code>x<\/code> should belong to class Star if <code>w.x + b \u2265 1<\/code> and <code>x<\/code> should belong to class Plus if <code>w.x + b \u2264 -1<\/code>; or, equivalently, we can write<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"f7f7f7\" data-has-transparency=\"false\" style=\"--dominant-color: #f7f7f7;\" loading=\"lazy\" decoding=\"async\" width=\"1036\" height=\"80\" class=\"wp-image-315798 not-transparent\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/12DKOyJTiIpg6gP14tf7TJQ.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/12DKOyJTiIpg6gP14tf7TJQ.png 1036w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/12DKOyJTiIpg6gP14tf7TJQ-300x23.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/12DKOyJTiIpg6gP14tf7TJQ-1024x79.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/12DKOyJTiIpg6gP14tf7TJQ-768x59.png 768w\" sizes=\"auto, (max-width: 1036px) 100vw, 1036px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">for each point <code>x_i<\/code>, where we are considering <code>y_i<\/code> equal to -1 for the Plus class and equal to +1 for the Star class.<\/p>\n<p class=\"wp-block-paragraph\">These two rules correspond to the dotted lines in the following diagram, and the decision boundary is parallel to and equidistant from both. As we can see, the points closest to the decision boundary (on either side) get to dictate its position. Now, since the decision boundary has to be at a maximum distance from the data points, we have to maximize the distance <code>d<\/code> between the dotted lines. 
By the way, the data points that lie on these dotted lines are called support vectors, and the region between the dotted lines is called the margin.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fdfdfd\" data-has-transparency=\"true\" style=\"--dominant-color: #fdfdfd;\" loading=\"lazy\" decoding=\"async\" width=\"616\" height=\"496\" class=\"wp-image-4191 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/0zdY3L-nQhTu7kpVf.png\" alt=\"Figure 4: Margin of a decision boundary\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/0zdY3L-nQhTu7kpVf.png 616w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/0zdY3L-nQhTu7kpVf-300x242.png 300w\" sizes=\"auto, (max-width: 616px) 100vw, 616px\" \/><figcaption class=\"wp-element-caption\">Figure 4: Margin of a decision boundary<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Now, let us denote the closest plus to the decision boundary as <code>x_-<\/code> and the closest star as <code>x_+<\/code>. Then, <code>d<\/code> is the length of the vector <code>x_+<\/code>\u2212<code>x_-<\/code> when projected along the direction of <code>w<\/code> (which is perpendicular to the decision boundary).<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fcfcfc\" data-has-transparency=\"true\" style=\"--dominant-color: #fcfcfc;\" loading=\"lazy\" decoding=\"async\" width=\"656\" height=\"533\" class=\"wp-image-315799 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/09qgrzRRh3V0iWPcW.png\" alt=\"Figure 5: Identifying the width of the margin\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/09qgrzRRh3V0iWPcW.png 656w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/09qgrzRRh3V0iWPcW-300x244.png 300w\" sizes=\"auto, (max-width: 656px) 100vw, 656px\" \/><figcaption class=\"wp-element-caption\">Figure 5: Identifying the width of the margin<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Mathematically, 
<code>d<\/code> could be written as:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"f9f9f9\" data-has-transparency=\"false\" style=\"--dominant-color: #f9f9f9;\" loading=\"lazy\" decoding=\"async\" width=\"1068\" height=\"90\" class=\"wp-image-315800 not-transparent\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1nf4v_DXZmkfp8jPUqMWmPw.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1nf4v_DXZmkfp8jPUqMWmPw.png 1068w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1nf4v_DXZmkfp8jPUqMWmPw-300x25.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1nf4v_DXZmkfp8jPUqMWmPw-1024x86.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1nf4v_DXZmkfp8jPUqMWmPw-768x65.png 768w\" sizes=\"auto, (max-width: 1068px) 100vw, 1068px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">Since <code>x_+<\/code> and <code>x_-<\/code> are closest to the decision boundary and touch the dotted lines as mentioned earlier, they satisfy the following equations:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"f9f9f9\" data-has-transparency=\"false\" style=\"--dominant-color: #f9f9f9;\" loading=\"lazy\" decoding=\"async\" width=\"1030\" height=\"108\" class=\"wp-image-315801 not-transparent\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1f3KTtf85eufrZOWD09ll1Q.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1f3KTtf85eufrZOWD09ll1Q.png 1030w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1f3KTtf85eufrZOWD09ll1Q-300x31.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1f3KTtf85eufrZOWD09ll1Q-1024x107.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1f3KTtf85eufrZOWD09ll1Q-768x81.png 768w\" sizes=\"auto, (max-width: 1030px) 100vw, 1030px\" \/><\/figure>\n<p 
class=\"wp-block-paragraph\">Substituting <code>x_+ . w<\/code> and <code>x_- . w<\/code> in the equation of <code>d<\/code>, we get:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fafafa\" data-has-transparency=\"false\" style=\"--dominant-color: #fafafa;\" loading=\"lazy\" decoding=\"async\" width=\"1026\" height=\"80\" class=\"wp-image-315802 not-transparent\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1UBQ7XmA9shyQOk7KNKYG8Q.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1UBQ7XmA9shyQOk7KNKYG8Q.png 1026w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1UBQ7XmA9shyQOk7KNKYG8Q-300x23.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1UBQ7XmA9shyQOk7KNKYG8Q-1024x80.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1UBQ7XmA9shyQOk7KNKYG8Q-768x60.png 768w\" sizes=\"auto, (max-width: 1026px) 100vw, 1026px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">Thus, if we have to maximize <code>d<\/code>, we can equivalently minimize <code>|w|<\/code> or minimize <code>1\/2.|w|^2<\/code> (this transformation is done for mathematical convenience). However, this optimization must be subjected to the constraint of correctly classifying all the data points. Hence, we&#8217;ll make use of Lagrange Multiplier here to enforce the constraint from equation <code>(A)<\/code>.<\/p>\n<p class=\"wp-block-paragraph\">Now, it is time to do some mathematics. 
Formally, our objective is to minimize the following objective function:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"f5f5f5\" data-has-transparency=\"false\" style=\"--dominant-color: #f5f5f5;\" loading=\"lazy\" decoding=\"async\" width=\"1380\" height=\"88\" class=\"wp-image-315803 not-transparent\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/13fEEq22uKYJ3xfpknoAZPA.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/13fEEq22uKYJ3xfpknoAZPA.png 1380w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/13fEEq22uKYJ3xfpknoAZPA-300x19.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/13fEEq22uKYJ3xfpknoAZPA-1024x65.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/13fEEq22uKYJ3xfpknoAZPA-768x49.png 768w\" sizes=\"auto, (max-width: 1380px) 100vw, 1380px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">Differentiating <code>L<\/code> with respect to <code>w<\/code>, we would obtain the optimal <code>w<\/code> as<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"f9f9f9\" data-has-transparency=\"false\" style=\"--dominant-color: #f9f9f9;\" loading=\"lazy\" decoding=\"async\" width=\"1176\" height=\"82\" class=\"wp-image-315804 not-transparent\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1hasFgPtJzn68chhrAgM2bA.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1hasFgPtJzn68chhrAgM2bA.png 1176w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1hasFgPtJzn68chhrAgM2bA-300x21.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1hasFgPtJzn68chhrAgM2bA-1024x71.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1hasFgPtJzn68chhrAgM2bA-768x54.png 768w\" sizes=\"auto, (max-width: 1176px) 100vw, 1176px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">The interesting 
thing to note here is that the vector <code>w<\/code> is a linear combination of the input vectors (that is, the data points) <code>x_i<\/code>. The next step is to differentiate <code>L<\/code> with respect to <code>b<\/code>, which gives us the following equality<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fafafa\" data-has-transparency=\"false\" style=\"--dominant-color: #fafafa;\" loading=\"lazy\" decoding=\"async\" width=\"1138\" height=\"78\" class=\"wp-image-315805 not-transparent\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1qhxLW4aP0gnUSQzK4SLeDg.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1qhxLW4aP0gnUSQzK4SLeDg.png 1138w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1qhxLW4aP0gnUSQzK4SLeDg-300x21.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1qhxLW4aP0gnUSQzK4SLeDg-1024x70.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1qhxLW4aP0gnUSQzK4SLeDg-768x53.png 768w\" sizes=\"auto, (max-width: 1138px) 100vw, 1138px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">Now, we will substitute <code>(E)<\/code> into <code>(D)<\/code> and use <code>(F)<\/code> to rearrange the objective function into the following:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"f4f4f4\" data-has-transparency=\"false\" style=\"--dominant-color: #f4f4f4;\" loading=\"lazy\" decoding=\"async\" width=\"1300\" height=\"74\" class=\"wp-image-315806 not-transparent\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1Ms0PHAfzrCawyGqZK3J68w.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1Ms0PHAfzrCawyGqZK3J68w.png 1300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1Ms0PHAfzrCawyGqZK3J68w-300x17.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1Ms0PHAfzrCawyGqZK3J68w-1024x58.png 
1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/1Ms0PHAfzrCawyGqZK3J68w-768x44.png 768w\" sizes=\"auto, (max-width: 1300px) 100vw, 1300px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">If you look closely, you will notice that the optimization function now depends only on the dot products of the input vectors (that is, the data points). This is a nice property to have for the reasons we will discuss later on. Also, this optimization problem is convex, so we will not get stuck in a local optimum.<\/p>\n<p class=\"wp-block-paragraph\">Now that we have everything, we can apply an optimization routine (such as projected gradient updates) to find the values of <code>\u03bb<\/code>s. I would encourage you to implement it and observe the obtained values of <code>\u03bb<\/code>s: you will notice that <code>\u03bb<\/code> is zero for all the points except the ones closest to the decision boundary on either side. This means that the points that are far away from the decision boundary don&#8217;t get a say in deciding where the decision boundary should be. All the importance (through non-zero <code>\u03bb<\/code>s) is assigned to the points closest to the boundary, which was our understanding all along.<\/p>\n<h2 class=\"wp-block-heading\">Does it Work in More General Cases?<\/h2>\n<p class=\"wp-block-paragraph\">So, now you know all about what SVM aims at and how it goes about it. 
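<\/p>\n<p class=\"wp-block-paragraph\">To make the sparsity of the <code>\u03bb<\/code>s concrete, here is a small, self-contained sketch that maximizes the dual objective with projected gradient ascent on a made-up toy dataset (the data, learning rate, and iteration count are all illustrative assumptions, not part of the original derivation):<\/p>

```python
import numpy as np

# toy linearly separable data: first three points form the Plus class (y = -1),
# last three form the Star class (y = +1)
X = np.array([[0., 0.], [0., 1.], [1., 0.],
              [3., 3.], [3., 4.], [4., 3.]])
y = np.array([-1., -1., -1., 1., 1., 1.])

Q = np.outer(y, y) * (X @ X.T)      # Q_ij = y_i * y_j * (x_i . x_j)
lam = np.zeros(len(X))

for _ in range(20000):
    lam += 0.01 * (1.0 - Q @ lam)   # gradient ascent on the dual objective
    lam -= y * (lam @ y) / len(X)   # project onto the constraint sum(lam_i * y_i) = 0
    lam = np.clip(lam, 0.0, None)   # enforce lam_i >= 0

w = (lam * y) @ X                   # w is a linear combination of the data points
support = lam > 1e-3                # only points touching the margin keep weight
b = np.mean(y[support] - X[support] @ w)
```

<p class=\"wp-block-paragraph\">On this toy data, the <code>\u03bb<\/code>s of the far-away points shrink to zero while the points on the margin keep non-zero weights, matching the discussion above.<\/p>\n<p class=\"wp-block-paragraph\">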
But, what about the following case where the data points are not linearly separable:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fefefd\" data-has-transparency=\"true\" style=\"--dominant-color: #fefefd;\" loading=\"lazy\" decoding=\"async\" width=\"662\" height=\"524\" class=\"wp-image-315807 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/0BHtt93YmCdg_sCv-.png\" alt=\"Figure 6: Motivating example for linearly inseparable cases\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/0BHtt93YmCdg_sCv-.png 662w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/0BHtt93YmCdg_sCv--300x237.png 300w\" sizes=\"auto, (max-width: 662px) 100vw, 662px\" \/><figcaption class=\"wp-element-caption\">Figure 6: Motivating example for linearly inseparable cases<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">In this case, SVM would struggle to find a suitable position for the decision boundary, and we&#8217;d end up with a poor result at the end of our optimization. Does this mean we can&#8217;t apply this technique anymore? The answer is, fortunately, no. For scenarios like these, we have two options:<\/p>\n<p class=\"wp-block-paragraph\">1 &#8211; We can allow our algorithm to make a certain number of mistakes so that other points can still be classified correctly. In this case, we&#8217;ll modify our objective function to do just that. This is called the soft margin formulation of SVM.<\/p>\n<p class=\"wp-block-paragraph\">2 &#8211; We can transform our data space into a higher dimension (say, from 2-D to 3-D, though even higher dimensions are possible) in the hope that the points would be linearly separable in that space. 
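<\/p>\n<p class=\"wp-block-paragraph\">To preview why the dot-product property matters for option 2, consider the (illustrative) mapping <code>phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)<\/code> from 2-D to 3-D: the dot product in the lifted 3-D space equals the squared dot product in the original 2-D space, so it can be computed without ever applying <code>phi<\/code> explicitly:<\/p>

```python
import numpy as np

def phi(x):
    # explicit lift from 2-D to 3-D: (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

a = np.array([1.0, 2.0])
b = np.array([3.0, 0.5])

lifted = np.dot(phi(a), phi(b))  # dot product in the higher-dimensional space
kernel = np.dot(a, b) ** 2       # the same value, computed entirely in 2-D
```

<p class=\"wp-block-paragraph\">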
We&#8217;ll use the &quot;kernel trick&quot; in this case, which is computationally inexpensive precisely because the objective function depends only on the dot products of the input vectors.<\/p>\n<h2 class=\"wp-block-heading\">Concluding Remarks<\/h2>\n<p class=\"wp-block-paragraph\">We&#8217;ll learn about these two methods in the next post, which is linked below. If you have any questions or suggestions, please let me know in the comments. Thanks for reading. Cheers! \ud83e\udd42<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p><a href=\"https:\/\/towardsdatascience.com\/support-vector-machines-soft-margin-formulation-and-kernel-trick-4c9729dc8efe\"><strong>Support Vector Machines &#8211; Soft Margin Formulation and Kernel Trick<\/strong><\/a><\/p><\/blockquote>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<h2 class=\"wp-block-heading\">Feedback is a Gift<\/h2>\n<p class=\"wp-block-paragraph\">If you found the content informative and would like to learn more about Machine Learning and AI, please follow the profile to get notified about my future posts. Also, let me know in the comments if you have any questions or topic requests. 
All the images in the post are created by me.<\/p>\n<p class=\"wp-block-paragraph\">Feel free to visit my website to get in touch <a href=\"http:\/\/rishabhmisra.github.io\/\">http:\/\/rishabhmisra.github.io\/<\/a> &#8211; I frequently consult on real-world AI applications.<\/p>","protected":false},"excerpt":{"rendered":"<p>Learn basic concepts that make Support Vector Machine a powerful linear classifier<\/p>\n","protected":false},"author":18,"featured_media":4191,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"is_member_only":true,"sub_heading":"Learn basic concepts that make Support Vector Machine a powerful linear classifier","footnotes":""},"categories":[44,22],"tags":[463,749,448,446,750],"sponsor":[],"coauthors":[31018],"class_list":["post-4190","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science","category-machine-learning","tag-ai","tag-classification","tag-data-science","tag-machine-learning","tag-svm-algorithm"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Introduction to Support Vector Machines\u200a-\u200aMotivation and Basics | Towards Data Science<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/towardsdatascience.com\/introduction-to-support-vector-machines-motivation-and-basics-920e4c1e22e0\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Introduction to Support Vector Machines\u200a-\u200aMotivation and Basics | Towards Data Science\" \/>\n<meta property=\"og:description\" content=\"Learn basic concepts that make Support Vector Machine a powerful linear classifier\" \/>\n<meta property=\"og:url\" 
content=\"https:\/\/towardsdatascience.com\/introduction-to-support-vector-machines-motivation-and-basics-920e4c1e22e0\/\" \/>\n<meta property=\"og:site_name\" content=\"Towards Data Science\" \/>\n<meta property=\"article:published_time\" content=\"2024-08-10T21:09:41+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-01-08T16:48:36+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/0zdY3L-nQhTu7kpVf.png\" \/>\n\t<meta property=\"og:image:width\" content=\"616\" \/>\n\t<meta property=\"og:image:height\" content=\"496\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Rishabh Misra\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@TDataScience\" \/>\n<meta name=\"twitter:site\" content=\"@TDataScience\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Rishabh Misra\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/towardsdatascience.com\/introduction-to-support-vector-machines-motivation-and-basics-920e4c1e22e0\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/towardsdatascience.com\/introduction-to-support-vector-machines-motivation-and-basics-920e4c1e22e0\/\"},\"author\":{\"name\":\"TDS Editors\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/person\/f9925d336b6fe962b03ad8281d90b8ee\"},\"headline\":\"Introduction to Support Vector Machines\u200a-\u200aMotivation and Basics\",\"datePublished\":\"2024-08-10T21:09:41+00:00\",\"dateModified\":\"2025-01-08T16:48:36+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/towardsdatascience.com\/introduction-to-support-vector-machines-motivation-and-basics-920e4c1e22e0\/\"},\"wordCount\":1520,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/towardsdatascience.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/towardsdatascience.com\/introduction-to-support-vector-machines-motivation-and-basics-920e4c1e22e0\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/0zdY3L-nQhTu7kpVf.png\",\"keywords\":[\"AI\",\"Classification\",\"Data Science\",\"Machine Learning\",\"Svm Algorithm\"],\"articleSection\":[\"Data Science\",\"Machine Learning\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/towardsdatascience.com\/introduction-to-support-vector-machines-motivation-and-basics-920e4c1e22e0\/\",\"url\":\"https:\/\/towardsdatascience.com\/introduction-to-support-vector-machines-motivation-and-basics-920e4c1e22e0\/\",\"name\":\"Introduction to Support Vector Machines\u200a-\u200aMotivation and Basics | Towards Data 
Science\",\"isPartOf\":{\"@id\":\"https:\/\/towardsdatascience.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/towardsdatascience.com\/introduction-to-support-vector-machines-motivation-and-basics-920e4c1e22e0\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/towardsdatascience.com\/introduction-to-support-vector-machines-motivation-and-basics-920e4c1e22e0\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/0zdY3L-nQhTu7kpVf.png\",\"datePublished\":\"2024-08-10T21:09:41+00:00\",\"dateModified\":\"2025-01-08T16:48:36+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/towardsdatascience.com\/introduction-to-support-vector-machines-motivation-and-basics-920e4c1e22e0\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/towardsdatascience.com\/introduction-to-support-vector-machines-motivation-and-basics-920e4c1e22e0\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/towardsdatascience.com\/introduction-to-support-vector-machines-motivation-and-basics-920e4c1e22e0\/#primaryimage\",\"url\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/0zdY3L-nQhTu7kpVf.png\",\"contentUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/0zdY3L-nQhTu7kpVf.png\",\"width\":616,\"height\":496,\"caption\":\"SVM's classification technique in action\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/towardsdatascience.com\/introduction-to-support-vector-machines-motivation-and-basics-920e4c1e22e0\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/towardsdatascience.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Introduction to Support Vector Machines\u200a-\u200aMotivation and Basics\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/towardsdatascience.com\/#website\",\"url\":\"https:\/\/towardsdatascience.com\/\",\"name\":\"Towards Data 
Science\",\"description\":\"Publish AI, ML &amp; data-science insights to a global community of data professionals.\",\"publisher\":{\"@id\":\"https:\/\/towardsdatascience.com\/#organization\"},\"alternateName\":\"TDS\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/towardsdatascience.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/towardsdatascience.com\/#organization\",\"name\":\"Towards Data Science\",\"alternateName\":\"TDS\",\"url\":\"https:\/\/towardsdatascience.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/tds-logo.jpg\",\"contentUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/tds-logo.jpg\",\"width\":696,\"height\":696,\"caption\":\"Towards Data Science\"},\"image\":{\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/TDataScience\",\"https:\/\/www.youtube.com\/c\/TowardsDataScience\",\"https:\/\/www.linkedin.com\/company\/towards-data-science\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/person\/f9925d336b6fe962b03ad8281d90b8ee\",\"name\":\"TDS Editors\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/person\/image\/23494c9101089ad44ae88ce9d2f56aac\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g\",\"caption\":\"TDS Editors\"},\"description\":\"Building a vibrant data science and machine learning community. 
Share your insights and projects with our global audience: bit.ly\/write-for-tds\",\"url\":\"https:\/\/towardsdatascience.com\/author\/towardsdatascience\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Introduction to Support Vector Machines\u200a-\u200aMotivation and Basics | Towards Data Science","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/towardsdatascience.com\/introduction-to-support-vector-machines-motivation-and-basics-920e4c1e22e0\/","og_locale":"en_US","og_type":"article","og_title":"Introduction to Support Vector Machines\u200a-\u200aMotivation and Basics | Towards Data Science","og_description":"Learn basic concepts that make Support Vector Machine a powerful linear classifier","og_url":"https:\/\/towardsdatascience.com\/introduction-to-support-vector-machines-motivation-and-basics-920e4c1e22e0\/","og_site_name":"Towards Data Science","article_published_time":"2024-08-10T21:09:41+00:00","article_modified_time":"2025-01-08T16:48:36+00:00","og_image":[{"width":616,"height":496,"url":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2024\/08\/0zdY3L-nQhTu7kpVf.png","type":"image\/png"}],"author":"Rishabh Misra","twitter_card":"summary_large_image","twitter_creator":"@TDataScience","twitter_site":"@TDataScience","twitter_misc":{"Written by":"Rishabh Misra","Est. 
reading time":"8 minutes"}},"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"Towards Data Science","distributor_original_site_url":"https:\/\/towardsdatascience.com","push-errors":false,"_links":{"self":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/posts\/4190","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/users\/18"}],"replies":[{"embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/comments?post=4190"}],"version-history":[{"count":0,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/posts\/4190\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/media\/4191"}],"wp:attachment":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/media?parent=4190"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/categories?post=4190"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/tags?post=4190"},{"taxonomy":"sponsor","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/sponsor?post=4190"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/coauthors?post=4190"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}