What are the odds of getting a parking ticket in Toronto?

An exploratory data analysis and a simple statistical model

Photo by Michael Fousert on Unsplash

Performing exploratory data analysis (EDA) and building statistical models are essential skills for data scientists. In this post, I will explore the dataset of Toronto parking tickets. I will analyze and visualize patterns from the time and location perspectives. In the second half of the post, I will present a simple model to evaluate the chance of getting a parking ticket at a specific time and location.

This post aims to show junior data scientists how to explore a dataset and apply Bayesian thinking. It also introduces some interesting insights regarding the parking tickets issued in Toronto.

The public data I used is available at Toronto Open Data. The dataset description states: "Approximately 2.8 million parking tickets are issued annually across the City of Toronto. This dataset contains non-identifiable information relating to each parking ticket issued for each calendar year."


Data overview and preprocessing

Below are five transposed rows of ticket records. The key fields we will use are each infraction’s time, location and type (description).

Example of five transposed rows

Some basic preprocessing has been applied to the raw data, such as merging the date and time columns and removing rows with invalid times.
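
As a minimal sketch of that preprocessing, the snippet below merges a date field and a time field into one timestamp and drops rows whose time cannot be valid. The field names (date_of_infraction, time_of_infraction) and their YYYYMMDD / HHMM formats are assumptions about the raw file, and the sample rows are made up for illustration.

```python
from datetime import datetime


def parse_infraction_datetime(date_str, time_str):
    """Merge a YYYYMMDD date and an HHMM time into one datetime.

    Returns None when a field is missing or the time is impossible
    (for example hour 25 or minute 75), so the caller can drop the row.
    """
    if not date_str or not time_str:
        return None
    try:
        day = datetime.strptime(date_str, "%Y%m%d")
        hhmm = int(time_str)  # "930" -> 09:30, "5" -> 00:05
        return day.replace(hour=hhmm // 100, minute=hhmm % 100)
    except ValueError:
        return None


# Hypothetical raw rows; the real file has more columns.
raw_rows = [
    {"date_of_infraction": "20190104", "time_of_infraction": "1130"},
    {"date_of_infraction": "20190104", "time_of_infraction": "2575"},  # invalid time
    {"date_of_infraction": "20190105", "time_of_infraction": ""},      # missing time
    {"date_of_infraction": "20190106", "time_of_infraction": "5"},     # 00:05
]

clean_rows = []
for row in raw_rows:
    ts = parse_infraction_datetime(row["date_of_infraction"], row["time_of_infraction"])
    if ts is not None:
        clean_rows.append({**row, "timestamp": ts})
```

Only the first and last sample rows survive; the other two are dropped as invalid.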

If we plot the daily ticket count, we can see an apparent COVID-19 dip around April 2020. We will only use the 2019 data because 2020 is not a typical year.

Daily Parking Tickets

What time has the most tickets?

Let’s explore the data from the time perspective. Each infraction type has its own distribution, so we will study the time distribution one infraction type at a time.

First, let’s create a heatmap with 7 x 24 cells for one week (one cell per day of the week and hour of the day), count the number of tickets in each cell, and finally normalize the heatmap so that all cells sum to 1. Darker cells therefore mean a higher share of tickets, and lighter cells a lower share.
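
The counting and normalization step can be sketched as follows. This is an illustrative version using only the standard library; the function name and the sample timestamps are hypothetical, and plotting is left out.

```python
from collections import Counter
from datetime import datetime


def normalized_week_heatmap(timestamps):
    """Return a 7 x 24 grid (rows Monday..Sunday, columns hour 0..23).

    Each cell holds that slot's share of all tickets, so the grid sums to 1.
    """
    counts = Counter((ts.weekday(), ts.hour) for ts in timestamps)
    total = sum(counts.values())
    if total == 0:
        return [[0.0] * 24 for _ in range(7)]
    return [[counts[(day, hour)] / total for hour in range(24)] for day in range(7)]


# Hypothetical timestamps for a single infraction type.
sample = [
    datetime(2019, 1, 7, 9, 15),    # Monday, 9 am slot
    datetime(2019, 1, 7, 9, 40),    # Monday, 9 am slot
    datetime(2019, 1, 8, 14, 5),    # Tuesday, 2 pm slot
    datetime(2019, 1, 12, 23, 59),  # Saturday, 11 pm slot
]
heatmap = normalized_week_heatmap(sample)
```

Running this once per infraction type gives one grid per type, which is what makes the types comparable despite very different ticket volumes.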

Some interesting observations:

Normalized parking tickets distribution

You are unlikely to get a ‘PARK ON PRIVATE PROPERTY’ ticket in the early morning (5 am to 7 am) or in the afternoon (3 pm to 6 pm).

Normalized parking tickets distribution

There are almost no ‘PARK MACHINE-REQD FEE NOT PAID’ tickets issued at night (midnight to 7 am). I guess most machine-paid parking lots are close to office and commercial buildings, where people rarely park overnight. Also, some of those parking lots may be closed in the evening with a barrier preventing entry. It’s reasonable that officers don’t patrol those places overnight.

We can also treat the year-long daily ticket counts as a time series and decompose it into trend, seasonal and residual components. For instance, if we decompose the ‘PARK – LONGER THAN 3 HOURS’ infraction data and look at the trend component, we can see that fewer tickets are issued in winter. The seasonal component shows a weekly pattern. There are also some significant dips (visible in the top line chart and in the residual plot below). They can be explained by Canadian national holidays in summer and extreme weather days in winter.

The ticket counts of the top 5 infraction types are on different scales, so we need to normalize them before we can compare them. I chose to normalize the values based on the ten-day moving average.
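
One possible reading of this normalization, sketched below, is to smooth each daily series with a trailing ten-day moving average and then divide it by its own mean so that every infraction type fluctuates around 1. Both the trailing window and the divide-by-mean step are assumptions made for illustration, and the two daily series are made up.

```python
def moving_average(values, window=10):
    """Trailing moving average; the first window-1 points use a shorter window."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / min(i + 1, window))
    return out


def normalize_by_mean(values):
    """Rescale a series so that its mean is 1 (assumed normalization)."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]


# Two hypothetical daily series with the same shape but different scales.
big_type = [100, 120, 90, 110, 130, 40, 30, 105, 115, 95, 125, 135, 45, 35]
small_type = [v / 10 for v in big_type]

big_norm = normalize_by_mean(moving_average(big_type))
small_norm = normalize_by_mean(moving_average(small_type))
```

After this step the two series overlap, so differences that remain between real infraction types reflect their patterns rather than their volumes.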

Line chart to compare top 5 infraction types

Some observations:

Holidays and extreme weather days significantly reduce ‘PARK – LONGER THAN 3 HOURS’ and ‘PARK MACHINE-REQD FEE NOT PAID’ (green and purple lines) tickets. The takeaway is that on those special days the city usually won’t ticket you for not paying at the machine or for parking longer than three hours.

Type ‘PARK-SIGNED HWY-PROHIBIT DY/TM’ (orange line) has several periodic peaks. The peaks imply that the city may issue those tickets in batches, or that it runs periodic enforcement campaigns for this type of infraction.


Where are the locations with the most tickets?

There is no geographic information in the dataset, so for now we can only treat the locations as strings.

This list is the top 10 streets with the most tickets.

This list shows the top 10 locations with the most tickets.
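
Because locations are plain strings, rankings like these come down to counting normalized strings. The sketch below is illustrative only: the helper names are hypothetical, the sample addresses are toy input, and deriving the street by dropping a leading house number is an assumed simplification.

```python
from collections import Counter


def clean_location(loc):
    """Upper-case the address and collapse repeated whitespace."""
    return " ".join(loc.upper().split())


def street_name(address):
    """Drop a leading house number, e.g. '4001 LESLIE ST' -> 'LESLIE ST'."""
    parts = address.split()
    if len(parts) > 1 and parts[0][0].isdigit():
        return " ".join(parts[1:])
    return address


def top_locations(locations, n=10):
    """Return the n most frequent cleaned addresses with their counts."""
    cleaned = [clean_location(loc) for loc in locations if loc and loc.strip()]
    return Counter(cleaned).most_common(n)


def top_streets(locations, n=10):
    """Return the n most frequent streets with their counts."""
    cleaned = [clean_location(loc) for loc in locations if loc and loc.strip()]
    return Counter(street_name(loc) for loc in cleaned).most_common(n)


# Toy input, not real ticket counts.
sample_locations = ["4001 Leslie St", "4001  LESLIE ST ", "2075 BAYVIEW AVE", "10 LESLIE ST", ""]
```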

Each top location has a unique pattern. For instance, most tickets at 2075 BAYVIEW AVE (a hospital) are for parking on private property and are issued from 10 am to 2 pm.

1 BRIMLEY RD S is a park. There are various reasons for issuing parking tickets there. But most of them are given in the afternoon and evening.


What’s the probability of getting a ticket?

Please note that if you never violate the parking regulations, you will never get a parking ticket.

We assume you violate parking regulations at a specific location where you may get a parking ticket. Whether you get one depends on the parking enforcement officer. If an officer inspects the place while you are parked there, you will get a ticket with 100% certainty; if not, there is a 0% chance of getting a ticket. Therefore, the probability of an inspection given that you violate the parking rules equals the probability of getting a ticket.

The underlying math is the following simple Bayesian equation.

Formula 1

Our target probability is P(Inspection|Infraction). Once we have P(Inspection And Infraction) and P(Infraction), we can obtain P(Inspection|Infraction) by dividing the former by the latter.
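
In symbols, the relationship described here is the definition of conditional probability:

```latex
P(\text{Inspection} \mid \text{Infraction})
  = \frac{P(\text{Inspection and Infraction})}{P(\text{Infraction})}
```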

Evaluate P(Inspection And Infraction)

Let’s figure out those probabilities by combining location and time factors. To make things easier, let’s pick the top spot, 4001 LESLIE ST. It’s North York General Hospital. Below is the ticket count distribution chart for the year 2019.

Each cell indicates how many tickets were issued for that time slot in 2019. Each cell counts joint events: someone violates the rules and an officer catches it.

We assume that once the inspectors are on-site, they will find all the infractions, so a single visit can produce many tickets. The total number of tickets is therefore not a good basis for a probability distribution. Since each time slot occurs about 52 times a year, we can instead count the unique days of the year with at least one ticket in that slot and divide by 52 to get the probability.

For instance, suppose 60 tickets are issued on Fridays from 11 am to 12 pm. The 60 tickets cover only 26 weeks (26 unique days of the year). For the weeks without any tickets on Friday from 11 am to 12 pm, either no infraction occurred or no inspector showed up at that time. We therefore estimate the probability of the joint event as 26/52 = 50%.
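
The worked example can be reproduced with a short sketch. The function name and the generated timestamps are hypothetical; capping the result at 1 is an added safeguard, since a given weekday can occur 53 times in a calendar year.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WEEKS_PER_YEAR = 52


def joint_probability(timestamps):
    """Estimate P(Inspection and Infraction) per (weekday, hour) slot.

    The estimate is the number of distinct dates with at least one ticket
    in the slot, divided by 52 and capped at 1.
    """
    days_seen = defaultdict(set)
    for ts in timestamps:
        days_seen[(ts.weekday(), ts.hour)].add(ts.date())
    return {
        slot: min(len(days) / WEEKS_PER_YEAR, 1.0)
        for slot, days in days_seen.items()
    }


# Build 60 made-up tickets spread over 26 distinct Fridays, 11 am slot.
first_friday = datetime(2019, 1, 4, 11, 0)
tickets = []
for k in range(26):
    day = first_friday + timedelta(days=14 * k)  # every other Friday
    n_tickets = 3 if k < 8 else 2                # 8*3 + 18*2 = 60 tickets
    tickets.extend(day + timedelta(minutes=10 * j) for j in range(n_tickets))

probs = joint_probability(tickets)
friday_11am = probs[(4, 11)]  # weekday() == 4 is Friday
```

Here friday_11am comes out as 26/52 = 0.5 regardless of how the 60 tickets are spread over the 26 days.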

Now we have the probabilities of the joint event P(Inspection And Infraction) for each cell. I smoothed the results by averaging each cell’s value with its two neighbours.
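
A minimal sketch of this smoothing is shown below. It assumes the two neighbours are the previous and next hour on the same day and that edge cells are averaged over the neighbours that exist; both choices are assumptions made for illustration.

```python
def smooth_row(row):
    """Average each cell with its previous and next cell in the same row.

    Edge cells have only one neighbour and are averaged over two values.
    """
    smoothed = []
    for i in range(len(row)):
        window = row[max(i - 1, 0): i + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed


def smooth_grid(grid):
    """Apply the three-cell average to every day (row) of the grid."""
    return [smooth_row(row) for row in grid]


# Hypothetical joint probabilities for four consecutive hours on two days.
grid = [
    [0.0, 0.75, 0.0, 0.5],
    [0.25, 0.25, 0.25, 0.25],
]
smoothed = smooth_grid(grid)
```

Smoothing spreads an isolated spike such as the 0.75 over its neighbouring hours, which reduces the noise that comes from having at most 52 observations per cell.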

Smoothed result. Each cell is an average of itself and two neighbouring cells.

Evaluate P(Infraction)

Next, let’s obtain the probability of infraction for each time slot. We can assume that a small percentage of people will violate the parking rules regardless of the time and location. The more parking visitors there are, the more likely it is that some of those "parking regulation breakers" are among them. The question now becomes whether we can find data on the number of parking visitors.

I found the 2019 statistics of the North York General Hospital online (Reference 3). There were 118,152 emergency patients and 170,132 outpatients, which works out to around 790 patients per day. Of course, not all patients need parking, and some patients may visit and park multiple times. It’s reasonable to believe the number of daily parking visitors is in the range of 500 to 1,000.

I also found a paper on Forecasting Hourly Patient Visits in the Emergency Department (Reference 2). The key takeaway is the distribution of the number of patient arrivals across time.

Combining the above information, I created a distribution of parking visitors for North York General Hospital. The total number of parking visitors is close to 650 on weekdays and close to 300 on weekends.

With the number of parking visitors, if we know the infraction rate, we can calculate the probability of infraction from this formula:

Formula 2.

Unfortunately, I could not find a reliable infraction rate online, but we can experiment with the rate. The following plot shows P(Infraction) with a rate of 0.013, meaning that 13 out of every 1,000 parking visitors violate the parking regulations.
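
To experiment with the rate, here is a minimal sketch. It assumes that visitors violate the rules independently, so that the probability of at least one infraction among n visitors is 1 - (1 - rate)^n. This functional form is an assumption chosen for illustration and may differ from Formula 2; the hourly visitor counts are made up.

```python
def p_infraction(visitors, rate):
    """P(at least one infraction) among `visitors` independent parkers.

    Assumed form: 1 - (1 - rate) ** visitors.
    """
    return 1.0 - (1.0 - rate) ** visitors


RATE = 0.013  # 13 violators per 1,000 parking visitors

# Hypothetical visitor counts for a few hourly slots.
visitors_by_hour = {3: 4, 9: 50, 12: 70, 22: 10}
p_by_hour = {hour: p_infraction(n, RATE) for hour, n in visitors_by_hour.items()}
```

Under this assumption the probability grows quickly with the number of visitors: with 50 visitors in an hour it is already close to one half.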

The assumption that the infraction rate is constant may not be valid. For example, people may be more willing to take the risk of parking illegally at night than in the daytime.

To estimate the hourly infraction rate, I assume P(inspection|infraction) is always 1, so that P(infraction) equals P(inspection and infraction). Solving Formula 2 with the number of parking visitors then gives the infraction rate for each time slot. As expected, the rates at night are indeed higher.

Assuming P(inspection|infraction) is always 1 means the officers work perfectly and catch every violation. The infraction rate derived under this assumption can be treated as the optimal rate. Since real officers miss some violations, the real infraction rate should be higher than the optimal rate.
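
Solving for the rate can be sketched as follows, again under the assumed independent-visitors form P(infraction) = 1 - (1 - rate)^n, which may differ from Formula 2. Setting P(infraction) equal to the joint probability and inverting gives rate = 1 - (1 - P)^(1/n). The inputs below are made up.

```python
def implied_rate(p_joint, visitors):
    """Infraction rate implied by P(inspection | infraction) = 1.

    Inverts the assumed form p = 1 - (1 - rate) ** visitors.
    Returns None when there are no visitors, since the rate is undefined.
    """
    if visitors <= 0:
        return None
    if p_joint >= 1.0:
        return 1.0
    return 1.0 - (1.0 - p_joint) ** (1.0 / visitors)


# Same joint probability, very different visitor counts (hypothetical values).
night_rate = implied_rate(0.3, visitors=5)   # few visitors at night
day_rate = implied_rate(0.3, visitors=50)    # many visitors in the daytime
```

With the same joint probability, fewer visitors imply a higher per-visitor rate, which is one way the higher night-time rates can arise.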

Calculate P(Inspection|Infraction)

Now we can average the optimal infraction rates by hours, then apply Formula 1 and Formula 2 and get the P(inspection|infraction) as follows.
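
A minimal sketch of this last step is given below. It averages the per-slot rates across days for each hour, recomputes P(infraction) with the assumed form 1 - (1 - rate)^n (an assumption that may differ from Formula 2), and then applies Formula 1. Capping the result at 1 is an added safeguard, and all inputs are made up.

```python
def average_rate_by_hour(rate_grid):
    """Average rates over days for each hour; None marks a missing estimate."""
    n_hours = len(rate_grid[0])
    averages = []
    for h in range(n_hours):
        vals = [row[h] for row in rate_grid if row[h] is not None]
        averages.append(sum(vals) / len(vals) if vals else None)
    return averages


def p_inspection_given_infraction(p_joint, visitors, hourly_rate):
    """Formula 1: P(joint) / P(infraction), capped at 1."""
    p_infr = 1.0 - (1.0 - hourly_rate) ** visitors  # assumed form of Formula 2
    if p_infr == 0.0:
        return None  # no infraction possible, so the ratio is undefined
    return min(p_joint / p_infr, 1.0)


# Hypothetical rates for two days and two hourly slots.
rate_grid = [
    [0.01, None],
    [0.03, 0.02],
]
hourly_rates = average_rate_by_hour(rate_grid)  # about [0.02, 0.02]
p_low = p_inspection_given_infraction(0.3, visitors=50, hourly_rate=hourly_rates[0])
p_capped = p_inspection_given_infraction(0.9, visitors=10, hourly_rate=hourly_rates[1])
```

A slot whose own rate is below the hourly average ends up with a probability below 1, while a slot above the average hits the cap.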

Because we are using the average of the optimal infraction rates, most of the probabilities are close to 1, as expected. Even though the result overestimates the actual likelihood, we can still make some interesting observations:

Monday daytime and weekdays from 9 am to 10 am have a relatively lower chance of getting a parking ticket if you park illegally.

Officers are generally more likely to be on-site in the mornings than in the afternoons.

Please note that the results depend heavily on the accuracy of the parking visitor distribution and on how the infraction rate is estimated, which together determine P(infraction). A time slot with a lower probability may indicate a genuinely lower chance of inspection, or it may mean that we overestimated P(infraction), the probability that someone commits a violation.


Conclusion

I explored the Toronto parking tickets dataset by time, location and infraction type, and showed some interesting insights from the 2019 data. I also proposed a simple statistical model for infractions and inspections. To apply it, we need detailed information about the location in question so that we can model how likely someone is to violate the parking regulations and how likely parking enforcement officers are to inspect the site.

Thanks for reading.


References:

[1] Toronto Open Data of Parking Tickets

[2] Morten Hertzum, Forecasting Hourly Patient Visits in the Emergency Department to Counteract Crowding

[3] North York General Hospital Statistics

