{"id":8684,"date":"2019-03-03T17:04:11","date_gmt":"2019-03-03T17:04:11","guid":{"rendered":"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/"},"modified":"2025-01-09T02:46:11","modified_gmt":"2025-01-09T02:46:11","slug":"understanding-the-roc-and-auc-curves-a05b68550b69","status":"publish","type":"post","link":"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/","title":{"rendered":"Understanding the ROC and AUC metrics."},"content":{"rendered":"<h1 class=\"wp-block-heading\">Simplifying the ROC and AUC metrics.<\/h1>\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>&quot;The definition of genius is taking the complex and making it simple.&quot;\u2015 Albert Einstein<\/p><\/blockquote>\n<p class=\"wp-block-paragraph\">The ROC curve and AUC are important metrics for evaluating the performance of any classification model. These terms are common in the machine learning community, and each of us encounters them when we start to learn about classification models. However, most of the time they are not completely understood, or are misunderstood, and their real value goes unused. Under the hood, these are very simple calculations that just need a little demystification.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>The concepts of ROC and AUC build upon knowledge of the Confusion Matrix, Specificity and Sensitivity. 
Also, the example that I will use in this article is based on the Logistic Regression algorithm. However, it is important to keep in mind that the concepts of ROC and AUC apply to more than just Logistic Regression.<\/p><\/blockquote>\n<h2 class=\"wp-block-heading\">Reference<\/h2>\n<p class=\"wp-block-paragraph\"><em>The article is an adaptation of this <a href=\"https:\/\/www.youtube.com\/watch?v=xugjARegisk\">excellent video<\/a> by Josh Starmer on ROC and AUC. I recommend watching this video for more clarity. Josh also has many other videos on various statistics and Machine Learning concepts.<\/em><\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p class=\"wp-block-paragraph\">Consider a <strong>hypothetical example<\/strong> containing a group of people. The y-axis has two categories, i.e. <strong><code>Has Heart Disease<\/code><\/strong>, represented by red people, and <strong><code>does not have Heart Disease<\/code><\/strong>, represented by green people. 
Along the x-axis, we have <strong>cholesterol<\/strong> levels, and the classifier tries to classify people into two categories depending upon their cholesterol levels.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fcfbfa\" data-has-transparency=\"true\" style=\"--dominant-color: #fcfbfa;\" loading=\"lazy\" decoding=\"async\" width=\"605\" height=\"478\" class=\"wp-image-330790 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1XZQyEE_Bt3wjyRVZ5ECRgQ.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1XZQyEE_Bt3wjyRVZ5ECRgQ.png 605w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1XZQyEE_Bt3wjyRVZ5ECRgQ-300x237.png 300w\" sizes=\"auto, (max-width: 605px) 100vw, 605px\" \/><\/figure>\n<h3 class=\"wp-block-heading\">Something to notice:<\/h3>\n<ul class=\"wp-block-list\">\n<li>The circled green person has a high level of cholesterol but does not have heart disease. This may be because the person now follows a better lifestyle and exercises regularly.<\/li>\n<li>The circled red person has low cholesterol levels but still had a heart attack. 
This may be because he has other heart-related issues.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\"><em>This is a hypothetical example so the reasons are also hypothetical<\/em> \ud83d\ude03<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<h2 class=\"wp-block-heading\">Logistic Regression<\/h2>\n<p class=\"wp-block-paragraph\">Now if we fit a Logistic Regression curve to the data, the y-axis is converted to the <strong>Probability<\/strong> of a person having heart disease based on their cholesterol level.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fbfafa\" data-has-transparency=\"true\" style=\"--dominant-color: #fbfafa;\" loading=\"lazy\" decoding=\"async\" width=\"607\" height=\"434\" class=\"wp-image-330791 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1Vc64m2-C5cSN1OvUylMqTw.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1Vc64m2-C5cSN1OvUylMqTw.png 607w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1Vc64m2-C5cSN1OvUylMqTw-300x214.png 300w\" sizes=\"auto, (max-width: 607px) 100vw, 607px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">The white dot represents a person with a lower probability of heart disease than the person represented by the black dot.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fbfbfc\" data-has-transparency=\"true\" style=\"--dominant-color: #fbfbfc;\" loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"428\" class=\"wp-image-330792 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1aKlVdIM4iafrjDRCpQ5pkQ.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1aKlVdIM4iafrjDRCpQ5pkQ.png 600w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1aKlVdIM4iafrjDRCpQ5pkQ-300x214.png 300w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" 
\/><\/figure>\n<p class=\"wp-block-paragraph\">However, if we want to classify the people in the two categories, we need a way to turn probabilities into classifications. One way is to set a threshold at 0.5. Next, classify the people who have a probability of heart disease &gt; 0.5 as &quot;<strong>having a heart disease<\/strong>&quot; and classify the people who have a probability of heart disease &lt; 0.5 as &quot; <strong>not having a heart disease<\/strong>&quot;.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fafafa\" data-has-transparency=\"true\" style=\"--dominant-color: #fafafa;\" loading=\"lazy\" decoding=\"async\" width=\"658\" height=\"429\" class=\"wp-image-330793 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1GJLE_YxjlSA5o_TJ9B_h5A.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1GJLE_YxjlSA5o_TJ9B_h5A.png 658w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1GJLE_YxjlSA5o_TJ9B_h5A-300x196.png 300w\" sizes=\"auto, (max-width: 658px) 100vw, 658px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">Let us now evaluate the effectiveness of this logistic regression with the classification threshold set to 0.5, with some new people about whom we already know if they have heart disease or not.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fbfbfa\" data-has-transparency=\"true\" style=\"--dominant-color: #fbfbfa;\" loading=\"lazy\" decoding=\"async\" width=\"924\" height=\"459\" class=\"wp-image-330794 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1KPXQ6VBc1foXXWgkn1q4BA.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1KPXQ6VBc1foXXWgkn1q4BA.png 924w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1KPXQ6VBc1foXXWgkn1q4BA-300x149.png 300w, 
https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1KPXQ6VBc1foXXWgkn1q4BA-768x382.png 768w\" sizes=\"auto, (max-width: 924px) 100vw, 924px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">Our Logistic Regression model correctly classifies everyone except persons 1 and 2.<\/p>\n<ul class=\"wp-block-list\">\n<li>We know person 1 has heart disease, but our model classifies them as not having it.<\/li>\n<li>We also know person 2 doesn&#8217;t have heart disease, but our model classifies them as having it.<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<h2 class=\"wp-block-heading\">Confusion Matrix<\/h2>\n<p class=\"wp-block-paragraph\">Let&#8217;s create a Confusion Matrix to summarize the classifications.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"edf0e0\" data-has-transparency=\"true\" style=\"--dominant-color: #edf0e0;\" loading=\"lazy\" decoding=\"async\" width=\"650\" height=\"296\" class=\"wp-image-330795 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1eiBAP1gwAARCxLi6hGEfEg.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1eiBAP1gwAARCxLi6hGEfEg.png 650w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1eiBAP1gwAARCxLi6hGEfEg-300x137.png 300w\" sizes=\"auto, (max-width: 650px) 100vw, 650px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">Once the confusion matrix is filled in, we can calculate the <strong>Sensitivity<\/strong> and the <strong>Specificity<\/strong> to evaluate this logistic regression at the 0.5 threshold.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<h2 class=\"wp-block-heading\">Specificity and Sensitivity<\/h2>\n<p class=\"wp-block-paragraph\">In the above confusion matrix, let&#8217;s replace the numbers with what they actually represent.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"ebeede\" 
data-has-transparency=\"true\" style=\"--dominant-color: #ebeede;\" loading=\"lazy\" decoding=\"async\" width=\"610\" height=\"315\" class=\"wp-image-330796 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1ZRU18eG-F5Sjtcph9ru8Og.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1ZRU18eG-F5Sjtcph9ru8Og.png 610w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1ZRU18eG-F5Sjtcph9ru8Og-300x155.png 300w\" sizes=\"auto, (max-width: 610px) 100vw, 610px\" \/><\/figure>\n<ul class=\"wp-block-list\">\n<li><strong>True Positives (TP):<\/strong> People who <em>had heart disease<\/em> and were also predicted to have heart disease.<\/li>\n<li><strong>True negatives (TN):<\/strong> People who <em>did not<\/em> <em>have heart disease<\/em> and were also predicted to not have heart disease.<\/li>\n<li><strong>False negatives (FN):<\/strong> People who have heart disease but the prediction says they don&#8217;t.<\/li>\n<li><strong>False positives (FP):<\/strong> People who <em>did not<\/em> <em>have heart disease<\/em> but the prediction says they do.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">We can now calculate two useful metrics based upon the confusion matrix:<\/p>\n<h3 class=\"wp-block-heading\">Sensitivity<\/h3>\n<p class=\"wp-block-paragraph\">Sensitivity tells us what percentage of people <em>with<\/em> heart disease were actually correctly identified.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"e7e7e7\" data-has-transparency=\"false\" style=\"--dominant-color: #e7e7e7;\" loading=\"lazy\" decoding=\"async\" width=\"535\" height=\"53\" class=\"wp-image-330797 not-transparent\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1n8fn2ozINr0f60FrD5ao0A.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1n8fn2ozINr0f60FrD5ao0A.png 535w, 
https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1n8fn2ozINr0f60FrD5ao0A-300x30.png 300w\" sizes=\"auto, (max-width: 535px) 100vw, 535px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">This turns out to be: TP \/ (TP + FN) = 3 \/ (3 + 1) = 0.75<\/p>\n<p class=\"wp-block-paragraph\">This tells us that <strong>75%<\/strong> of people with heart disease were correctly identified by our model.<\/p>\n<h3 class=\"wp-block-heading\">Specificity<\/h3>\n<p class=\"wp-block-paragraph\">Specificity tells us what percentage of people <em>without<\/em> heart disease were actually correctly identified.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"e9e9e9\" data-has-transparency=\"false\" style=\"--dominant-color: #e9e9e9;\" loading=\"lazy\" decoding=\"async\" width=\"535\" height=\"57\" class=\"wp-image-330798 not-transparent\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/14CB_ueBhjQ8v7lIoLgVS8A.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/14CB_ueBhjQ8v7lIoLgVS8A.png 535w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/14CB_ueBhjQ8v7lIoLgVS8A-300x32.png 300w\" sizes=\"auto, (max-width: 535px) 100vw, 535px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">This turns out to be: TN \/ (TN + FP) = 3 \/ (3 + 1) = 0.75<\/p>\n<p class=\"wp-block-paragraph\">This tells us that, again, <strong>75%<\/strong> of people <em>without heart disease<\/em> were correctly identified by our model.<\/p>\n<p class=\"wp-block-paragraph\"><em><strong>If correctly identifying positives is important for us, then we should choose a model with higher Sensitivity. 
However, if correctly identifying negatives is more important, then we should choose specificity as the measurement metric.<\/strong><\/em><\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<h2 class=\"wp-block-heading\">Identifying the Correct Thresholds<\/h2>\n<p class=\"wp-block-paragraph\">Now, let&#8217;s talk about what happens when we use a different threshold for deciding if a person has heart disease or not.<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Setting the Threshold to 0.1<\/strong><\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">This would correctly identify all people who have heart disease. The person labeled 1 is also correctly classified to be a heart patient.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fbfbfa\" data-has-transparency=\"true\" style=\"--dominant-color: #fbfbfa;\" loading=\"lazy\" decoding=\"async\" width=\"880\" height=\"481\" class=\"wp-image-330799 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1FdWxUcsx-sBvAvpn5sTtzg.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1FdWxUcsx-sBvAvpn5sTtzg.png 880w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1FdWxUcsx-sBvAvpn5sTtzg-300x164.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1FdWxUcsx-sBvAvpn5sTtzg-768x420.png 768w\" sizes=\"auto, (max-width: 880px) 100vw, 880px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">However, it would also increase the number of False Positives since now person 2 and 3 will be wrongly classified as having heart disease.<\/p>\n<p class=\"wp-block-paragraph\">Therefore a lower threshold:<\/p>\n<ul class=\"wp-block-list\">\n<li>Increases the number of False Positives<\/li>\n<li>Decreases the number of False Negatives.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">Recalculating the confusion matrix :<\/p>\n<figure class=\"wp-block-image size-large\"><img 
data-dominant-color=\"eef1e1\" data-has-transparency=\"true\" style=\"--dominant-color: #eef1e1;\" loading=\"lazy\" decoding=\"async\" width=\"609\" height=\"329\" class=\"wp-image-330800 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1ZREqKQH4xZxwa1L7o9ad9w.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1ZREqKQH4xZxwa1L7o9ad9w.png 609w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1ZREqKQH4xZxwa1L7o9ad9w-300x162.png 300w\" sizes=\"auto, (max-width: 609px) 100vw, 609px\" \/><\/figure>\n<p class=\"wp-block-paragraph\"><em>In this case, it becomes important to identify people having a heart disease correctly so that the corrective measures can be taken else heart disease can lead to serious complications. This means lowering the threshold is a good idea even if it results in more False Positive cases.<\/em><\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Setting the Threshold to 0.9<\/strong><\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">This would now correctly identify all people who do not have heart disease. 
However, person 1 would be incorrectly classified as having no heart disease.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fbfbfa\" data-has-transparency=\"true\" style=\"--dominant-color: #fbfbfa;\" loading=\"lazy\" decoding=\"async\" width=\"887\" height=\"472\" class=\"wp-image-330801 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/16EglCqnoHkVdux5hbElQ7w.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/16EglCqnoHkVdux5hbElQ7w.png 887w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/16EglCqnoHkVdux5hbElQ7w-300x160.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/16EglCqnoHkVdux5hbElQ7w-768x409.png 768w\" sizes=\"auto, (max-width: 887px) 100vw, 887px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">Therefore a higher threshold:<\/p>\n<ul class=\"wp-block-list\">\n<li>Decreases the number of False Positives<\/li>\n<li>Increases the number of False Negatives.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">Recalculating the confusion matrix:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"ebefdb\" data-has-transparency=\"true\" style=\"--dominant-color: #ebefdb;\" loading=\"lazy\" decoding=\"async\" width=\"488\" height=\"339\" class=\"wp-image-330802 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1kDfLblYqkw5jZ_8DtYh4jg.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1kDfLblYqkw5jZ_8DtYh4jg.png 488w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1kDfLblYqkw5jZ_8DtYh4jg-300x208.png 300w\" sizes=\"auto, (max-width: 488px) 100vw, 488px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">The threshold could be set to any value between 0 and 1. So how do we determine which threshold is the best? Do we need to experiment with all the threshold values? 
Every threshold results in a different confusion matrix, and trying many thresholds produces a large number of confusion matrices, which is not the best way to work.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<h2 class=\"wp-block-heading\">ROC Graphs<\/h2>\n<p class=\"wp-block-paragraph\">The ROC (Receiver Operating Characteristic) curve can help in deciding the best threshold value. It is generated by plotting the True Positive Rate (y-axis) against the False Positive Rate (x-axis) at different thresholds.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"efefef\" data-has-transparency=\"true\" style=\"--dominant-color: #efefef;\" loading=\"lazy\" decoding=\"async\" width=\"511\" height=\"77\" class=\"wp-image-330803 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1n9HBTyFPAJ6fhxGVpjxzOA.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1n9HBTyFPAJ6fhxGVpjxzOA.png 511w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1n9HBTyFPAJ6fhxGVpjxzOA-300x45.png 300w\" sizes=\"auto, (max-width: 511px) 100vw, 511px\" \/><\/figure>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"eeeeee\" data-has-transparency=\"true\" style=\"--dominant-color: #eeeeee;\" loading=\"lazy\" decoding=\"async\" width=\"565\" height=\"71\" class=\"wp-image-330804 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1iNh1Tn3b5vUk4bNrQ3Fbnw.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1iNh1Tn3b5vUk4bNrQ3Fbnw.png 565w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1iNh1Tn3b5vUk4bNrQ3Fbnw-300x38.png 300w\" sizes=\"auto, (max-width: 565px) 100vw, 565px\" \/><\/figure>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\">True Positive Rate indicates what proportion of people &#8216;<strong>with 
heart disease<\/strong>&#8217; were correctly classified. It is the same as Sensitivity.<\/p>\n<p class=\"wp-block-paragraph\">False Positive Rate indicates what proportion of people &#8216;<strong>without heart disease<\/strong>&#8217; were incorrectly classified as having it, i.e. the False Positives. It is equal to 1 - Specificity.<\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\">To get to know the ROC better, let&#8217;s draw one from scratch.<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Threshold classifying all people as having heart disease.<\/strong><\/li>\n<\/ul>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fafaf9\" data-has-transparency=\"true\" style=\"--dominant-color: #fafaf9;\" loading=\"lazy\" decoding=\"async\" width=\"775\" height=\"399\" class=\"wp-image-330805 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1CN9qRMiAbtuEXuQZ-j_b0Q.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1CN9qRMiAbtuEXuQZ-j_b0Q.png 775w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1CN9qRMiAbtuEXuQZ-j_b0Q-300x154.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1CN9qRMiAbtuEXuQZ-j_b0Q-768x395.png 768w\" sizes=\"auto, (max-width: 775px) 100vw, 775px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">The confusion matrix will be:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"f0f4ec\" data-has-transparency=\"true\" style=\"--dominant-color: #f0f4ec;\" loading=\"lazy\" decoding=\"async\" width=\"960\" height=\"261\" class=\"wp-image-330806 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/15jN7IeLJ-582JlVm-g9y5Q.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/15jN7IeLJ-582JlVm-g9y5Q.png 960w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/15jN7IeLJ-582JlVm-g9y5Q-300x82.png 300w, 
https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/15jN7IeLJ-582JlVm-g9y5Q-768x209.png 768w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">This means that when the threshold is so low that every single person is classified as having heart disease, the True Positive Rate is 1: every single person with heart disease was correctly classified.<\/p>\n<p class=\"wp-block-paragraph\">At the same threshold, the False Positive Rate is also 1: every single person without heart disease was wrongly classified.<\/p>\n<p class=\"wp-block-paragraph\">Plotting this point (1,1) on the ROC graph:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fafafa\" data-has-transparency=\"true\" style=\"--dominant-color: #fafafa;\" loading=\"lazy\" decoding=\"async\" width=\"716\" height=\"511\" class=\"wp-image-330807 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1HTVotuY5L-cZjXGAW2A6_A.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1HTVotuY5L-cZjXGAW2A6_A.png 716w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1HTVotuY5L-cZjXGAW2A6_A-300x214.png 300w\" sizes=\"auto, (max-width: 716px) 100vw, 716px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">Any point on the blue diagonal line means that the proportion of correctly classified people with heart disease (True Positive Rate) is equal to the proportion of incorrectly classified people without heart disease (False Positive Rate).<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Increasing the Threshold slightly so that only the two people with the lowest cholesterol values are below the threshold.<\/strong><\/li>\n<\/ul>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fafaf9\" data-has-transparency=\"true\" style=\"--dominant-color: #fafaf9;\" loading=\"lazy\" decoding=\"async\" width=\"740\" height=\"384\" 
class=\"wp-image-330808 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1CqKEwDEVRWnyc5mIN0ggLg.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1CqKEwDEVRWnyc5mIN0ggLg.png 740w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1CqKEwDEVRWnyc5mIN0ggLg-300x156.png 300w\" sizes=\"auto, (max-width: 740px) 100vw, 740px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">The confusion matrix will be:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"f3f5ec\" data-has-transparency=\"true\" style=\"--dominant-color: #f3f5ec;\" loading=\"lazy\" decoding=\"async\" width=\"1005\" height=\"280\" class=\"wp-image-330809 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1cgq1HxSmqCU2rFCLUArFDg.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1cgq1HxSmqCU2rFCLUArFDg.png 1005w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1cgq1HxSmqCU2rFCLUArFDg-300x84.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1cgq1HxSmqCU2rFCLUArFDg-768x214.png 768w\" sizes=\"auto, (max-width: 1005px) 100vw, 1005px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">Let&#8217;s plot this point (0.5,1) on the ROC graph.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fafafa\" data-has-transparency=\"true\" style=\"--dominant-color: #fafafa;\" loading=\"lazy\" decoding=\"async\" width=\"730\" height=\"605\" class=\"wp-image-330810 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1s4tdBI_DQ7xxMmUQUu5wqw.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1s4tdBI_DQ7xxMmUQUu5wqw.png 730w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1s4tdBI_DQ7xxMmUQUu5wqw-300x249.png 300w\" sizes=\"auto, (max-width: 730px) 100vw, 730px\" \/><\/figure>\n<p 
class=\"wp-block-paragraph\">This threshold is better than the previous one: the True Positive Rate is still 1, but the False Positive Rate has dropped from 1 to 0.5.<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Now suppose we go on increasing the threshold and reach a point where we get the following confusion matrix:<\/strong><\/li>\n<\/ul>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"f3f4ec\" data-has-transparency=\"true\" style=\"--dominant-color: #f3f4ec;\" loading=\"lazy\" decoding=\"async\" width=\"999\" height=\"279\" class=\"wp-image-330811 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1J0aWAfkm1BDPfkNCgS3TbA.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1J0aWAfkm1BDPfkNCgS3TbA.png 999w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1J0aWAfkm1BDPfkNCgS3TbA-300x84.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1J0aWAfkm1BDPfkNCgS3TbA-768x214.png 768w\" sizes=\"auto, (max-width: 999px) 100vw, 999px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">Let&#8217;s plot this point (0,0.75) on the ROC graph.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"f9f9f9\" data-has-transparency=\"true\" style=\"--dominant-color: #f9f9f9;\" loading=\"lazy\" decoding=\"async\" width=\"693\" height=\"437\" class=\"wp-image-330812 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/16CxyxXxn9KhX_u-ZqxjbAA.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/16CxyxXxn9KhX_u-ZqxjbAA.png 693w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/16CxyxXxn9KhX_u-ZqxjbAA-300x189.png 300w\" sizes=\"auto, (max-width: 693px) 100vw, 693px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">So far this is the best threshold we have seen, since it produces no False Positives.<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Lastly, we choose a threshold where we classify all people as not 
having heart disease, i.e. a threshold of 1.<\/strong><\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">The point, in this case, would be at (0,0):<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fafafa\" data-has-transparency=\"true\" style=\"--dominant-color: #fafafa;\" loading=\"lazy\" decoding=\"async\" width=\"796\" height=\"496\" class=\"wp-image-330813 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/15-1RHdhFO4TniI-U7-nKeA.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/15-1RHdhFO4TniI-U7-nKeA.png 796w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/15-1RHdhFO4TniI-U7-nKeA-300x187.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/15-1RHdhFO4TniI-U7-nKeA-768x479.png 768w\" sizes=\"auto, (max-width: 796px) 100vw, 796px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">We can then connect the dots, which gives us an ROC graph. The ROC graph summarises the confusion matrices produced for each threshold, so that we do not have to examine them one by one.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"fbfafb\" data-has-transparency=\"true\" style=\"--dominant-color: #fbfafb;\" loading=\"lazy\" decoding=\"async\" width=\"787\" height=\"474\" class=\"wp-image-330814 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1-78-LvCmqxPyAmBIGJaoLw.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1-78-LvCmqxPyAmBIGJaoLw.png 787w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1-78-LvCmqxPyAmBIGJaoLw-300x181.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1-78-LvCmqxPyAmBIGJaoLw-768x463.png 768w\" sizes=\"auto, (max-width: 787px) 100vw, 787px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">Just by glancing over the graph, we can conclude that threshold C is better than threshold B and 
depending on how many False Positives we are willing to accept, we can choose the optimal threshold.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<h2 class=\"wp-block-heading\">AUC<\/h2>\n<p class=\"wp-block-paragraph\">AUC stands for <strong>Area Under the Curve<\/strong>. It summarises the whole ROC curve as a single number between 0 and 1, and a larger AUC means the model separates the two classes better across thresholds. The AUC makes it easy to compare the ROC curve of one model to another.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-dominant-color=\"e5dfeb\" data-has-transparency=\"true\" style=\"--dominant-color: #e5dfeb;\" loading=\"lazy\" decoding=\"async\" width=\"713\" height=\"471\" class=\"wp-image-330815 has-transparency\" src=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1U3KzhfUE4FLy4CL_U-ygpg.png\" alt=\"\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1U3KzhfUE4FLy4CL_U-ygpg.png 713w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/1U3KzhfUE4FLy4CL_U-ygpg-300x198.png 300w\" sizes=\"auto, (max-width: 713px) 100vw, 713px\" \/><\/figure>\n<p class=\"wp-block-paragraph\">The <strong>AUC<\/strong> for the red <strong>ROC<\/strong> curve is greater than the <strong>AUC<\/strong> for the blue <strong>ROC<\/strong> curve. This means that the red curve is better. If the red ROC curve was generated by, say, a Random Forest and the blue ROC curve by Logistic Regression, we could conclude that the Random Forest did a better job of classifying the patients.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n<p class=\"wp-block-paragraph\">AUC and ROC are important metrics for evaluating the performance of any classification model. Therefore, getting to know how they are calculated is as essential as using them. 
Hopefully, next time when you encounter these terms, you will be able to explain them easily in the context of your problem.<\/p>","protected":false},"excerpt":{"rendered":"<p>Taking the confusion out of classification metrics<\/p>\n","protected":false},"author":18,"featured_media":8685,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"is_member_only":true,"sub_heading":"Taking the confusion out of classification metrics","footnotes":""},"categories":[44,22,14668],"tags":[749,448,446,784,459],"sponsor":[],"coauthors":[30586],"class_list":["post-8684","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science","category-machine-learning","category-statistics","tag-classification","tag-data-science","tag-machine-learning","tag-metrics","tag-statistics"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Understanding the ROC and AUC metrics. | Towards Data Science<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Understanding the ROC and AUC metrics. 
| Towards Data Science\" \/>\n<meta property=\"og:description\" content=\"Taking the confusion out of classification metrics\" \/>\n<meta property=\"og:url\" content=\"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/\" \/>\n<meta property=\"og:site_name\" content=\"Towards Data Science\" \/>\n<meta property=\"article:published_time\" content=\"2019-03-03T17:04:11+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-01-09T02:46:11+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/0qbwGe8zB45pruqm9-scaled.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1707\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Parul Pandey\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@TDataScience\" \/>\n<meta name=\"twitter:site\" content=\"@TDataScience\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Parul Pandey\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/\"},\"author\":{\"name\":\"TDS Editors\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/person\/f9925d336b6fe962b03ad8281d90b8ee\"},\"headline\":\"Understanding the ROC and AUC metrics.\",\"datePublished\":\"2019-03-03T17:04:11+00:00\",\"dateModified\":\"2025-01-09T02:46:11+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/\"},\"wordCount\":1523,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/towardsdatascience.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/0qbwGe8zB45pruqm9-scaled.jpg\",\"keywords\":[\"Classification\",\"Data Science\",\"Machine Learning\",\"Metrics\",\"Statistics\"],\"articleSection\":[\"Data Science\",\"Machine Learning\",\"Statistics\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/\",\"url\":\"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/\",\"name\":\"Understanding the ROC and AUC metrics. 
| Towards Data Science\",\"isPartOf\":{\"@id\":\"https:\/\/towardsdatascience.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/0qbwGe8zB45pruqm9-scaled.jpg\",\"datePublished\":\"2019-03-03T17:04:11+00:00\",\"dateModified\":\"2025-01-09T02:46:11+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/#primaryimage\",\"url\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/0qbwGe8zB45pruqm9-scaled.jpg\",\"contentUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/0qbwGe8zB45pruqm9-scaled.jpg\",\"width\":2560,\"height\":1707,\"caption\":\"Photo by Daniele Levis Pelusi on Unsplash\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/towardsdatascience.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Understanding the ROC and AUC metrics.\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/towardsdatascience.com\/#website\",\"url\":\"https:\/\/towardsdatascience.com\/\",\"name\":\"Towards Data Science\",\"description\":\"Publish AI, ML &amp; data-science insights to a global community of data 
professionals.\",\"publisher\":{\"@id\":\"https:\/\/towardsdatascience.com\/#organization\"},\"alternateName\":\"TDS\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/towardsdatascience.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/towardsdatascience.com\/#organization\",\"name\":\"Towards Data Science\",\"alternateName\":\"TDS\",\"url\":\"https:\/\/towardsdatascience.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/tds-logo.jpg\",\"contentUrl\":\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/tds-logo.jpg\",\"width\":696,\"height\":696,\"caption\":\"Towards Data Science\"},\"image\":{\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/TDataScience\",\"https:\/\/www.youtube.com\/c\/TowardsDataScience\",\"https:\/\/www.linkedin.com\/company\/towards-data-science\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/person\/f9925d336b6fe962b03ad8281d90b8ee\",\"name\":\"TDS Editors\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/towardsdatascience.com\/#\/schema\/person\/image\/23494c9101089ad44ae88ce9d2f56aac\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g\",\"caption\":\"TDS Editors\"},\"description\":\"Building a vibrant data science and machine learning community. 
Share your insights and projects with our global audience: bit.ly\/write-for-tds\",\"url\":\"https:\/\/towardsdatascience.com\/author\/towardsdatascience\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Understanding the ROC and AUC metrics. | Towards Data Science","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/","og_locale":"en_US","og_type":"article","og_title":"Understanding the ROC and AUC metrics. | Towards Data Science","og_description":"Taking the confusion out of classification metrics","og_url":"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/","og_site_name":"Towards Data Science","article_published_time":"2019-03-03T17:04:11+00:00","article_modified_time":"2025-01-09T02:46:11+00:00","og_image":[{"width":2560,"height":1707,"url":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/0qbwGe8zB45pruqm9-scaled.jpg","type":"image\/jpeg"}],"author":"Parul Pandey","twitter_card":"summary_large_image","twitter_creator":"@TDataScience","twitter_site":"@TDataScience","twitter_misc":{"Written by":"Parul Pandey","Est. 
reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/#article","isPartOf":{"@id":"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/"},"author":{"name":"TDS Editors","@id":"https:\/\/towardsdatascience.com\/#\/schema\/person\/f9925d336b6fe962b03ad8281d90b8ee"},"headline":"Understanding the ROC and AUC metrics.","datePublished":"2019-03-03T17:04:11+00:00","dateModified":"2025-01-09T02:46:11+00:00","mainEntityOfPage":{"@id":"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/"},"wordCount":1523,"commentCount":0,"publisher":{"@id":"https:\/\/towardsdatascience.com\/#organization"},"image":{"@id":"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/#primaryimage"},"thumbnailUrl":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/0qbwGe8zB45pruqm9-scaled.jpg","keywords":["Classification","Data Science","Machine Learning","Metrics","Statistics"],"articleSection":["Data Science","Machine Learning","Statistics"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/","url":"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/","name":"Understanding the ROC and AUC metrics. 
| Towards Data Science","isPartOf":{"@id":"https:\/\/towardsdatascience.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/#primaryimage"},"image":{"@id":"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/#primaryimage"},"thumbnailUrl":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/0qbwGe8zB45pruqm9-scaled.jpg","datePublished":"2019-03-03T17:04:11+00:00","dateModified":"2025-01-09T02:46:11+00:00","breadcrumb":{"@id":"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/#primaryimage","url":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/0qbwGe8zB45pruqm9-scaled.jpg","contentUrl":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2019\/03\/0qbwGe8zB45pruqm9-scaled.jpg","width":2560,"height":1707,"caption":"Photo by Daniele Levis Pelusi on Unsplash"},{"@type":"BreadcrumbList","@id":"https:\/\/towardsdatascience.com\/understanding-the-roc-and-auc-curves-a05b68550b69\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/towardsdatascience.com\/"},{"@type":"ListItem","position":2,"name":"Understanding the ROC and AUC metrics."}]},{"@type":"WebSite","@id":"https:\/\/towardsdatascience.com\/#website","url":"https:\/\/towardsdatascience.com\/","name":"Towards Data Science","description":"Publish AI, ML &amp; data-science insights to a global community of data 
professionals.","publisher":{"@id":"https:\/\/towardsdatascience.com\/#organization"},"alternateName":"TDS","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/towardsdatascience.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/towardsdatascience.com\/#organization","name":"Towards Data Science","alternateName":"TDS","url":"https:\/\/towardsdatascience.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/towardsdatascience.com\/#\/schema\/logo\/image\/","url":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/tds-logo.jpg","contentUrl":"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/tds-logo.jpg","width":696,"height":696,"caption":"Towards Data Science"},"image":{"@id":"https:\/\/towardsdatascience.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/TDataScience","https:\/\/www.youtube.com\/c\/TowardsDataScience","https:\/\/www.linkedin.com\/company\/towards-data-science\/"]},{"@type":"Person","@id":"https:\/\/towardsdatascience.com\/#\/schema\/person\/f9925d336b6fe962b03ad8281d90b8ee","name":"TDS Editors","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/towardsdatascience.com\/#\/schema\/person\/image\/23494c9101089ad44ae88ce9d2f56aac","url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","caption":"TDS Editors"},"description":"Building a vibrant data science and machine learning community. 
Share your insights and projects with our global audience: bit.ly\/write-for-tds","url":"https:\/\/towardsdatascience.com\/author\/towardsdatascience\/"}]}},"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"Towards Data Science","distributor_original_site_url":"https:\/\/towardsdatascience.com","push-errors":false,"_links":{"self":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/posts\/8684","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/users\/18"}],"replies":[{"embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/comments?post=8684"}],"version-history":[{"count":0,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/posts\/8684\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/media\/8685"}],"wp:attachment":[{"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/media?parent=8684"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/categories?post=8684"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/tags?post=8684"},{"taxonomy":"sponsor","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/sponsor?post=8684"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/towardsdatascience.com\/wp-json\/wp\/v2\/coauthors?post=8684"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}