SQLite is a lightweight, serverless relational database engine that stores data in a single file on disk.
It is included in Python via the built-in sqlite3 module, allowing you to create, query, and manage databases
without installing external software. SQLite is ideal for small to medium applications, prototyping, and local data storage.
In This Article You Will Learn How To:
- Create a local SQLite database.
- Define table schemas (DDL).
- Load data from CSV files into the database.
- Run SQL queries to analyze the data.
All files related to this article can be found in my GitHub repository at the following link:
https://github.com/analysislessons/Article_SQLitePractice
The repository contains the following resources:
- Three CSV files containing the data: drivers.csv, traffic_violations.csv, and violations_details.csv.
- Two code files: sqlite_practice.ipynb, which contains the same descriptions as this article, and sqlite_practice_script.py, which supports running multiple queries in a row and is the most convenient way to work through the tasks at the end of the article.
The data consists of three entities:
- Drivers: Information about individuals and their vehicle registration.
- Traffic Violations: A log of violation events linking drivers to specific offenses.
- Violations Details: A reference table for types of offenses, points, and fines.
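If you prefer not to download the repository, you can generate minimal stand-in CSV files with the same headers as the real ones. The rows below are invented purely for illustration; they are not the repository data.

```python
import csv

# Hypothetical sample rows -- invented for illustration, not the repository data.
SAMPLES = {
    "drivers.csv": [
        ["PersonID", "FirstName", "LastName", "RegistrationNumber"],
        ["1", "Ana", "Lima", "AB-123-CD"],
        ["2", "Ben", "Klein", "EF-456-GH"],
    ],
    "traffic_violations.csv": [
        ["EventID", "Date", "PersonID", "OffenseID"],
        ["10", "2024-01-15", "1", "100"],
        ["11", "2024-02-03", "2", "101"],
    ],
    "violations_details.csv": [
        ["IdOffense", "Offense", "Points", "Amount"],
        ["100", "Speeding", "5", "250"],
        ["101", "Illegal parking", "1", "60"],
    ],
}

for filename, rows in SAMPLES.items():
    with open(filename, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

Each file starts with a header row, which the loader below skips when inserting data.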
import os
import csv
import sqlite3
# Set database path
DB_NAME = "traffic_violations.db"
if os.path.exists(DB_NAME):
    os.remove(DB_NAME)  # Start fresh for this tutorial
1. Defining the Table Schemas (DDL)
DDL (Data Definition Language) is used to define the structure of our database. We use the CREATE TABLE command to specify column names and data types.
# 1) Drivers table schema
DDL_DRIVERS = '''
CREATE TABLE IF NOT EXISTS drivers (
    PersonID INTEGER PRIMARY KEY,
    FirstName TEXT NOT NULL,
    LastName TEXT NOT NULL,
    RegistrationNumber TEXT NOT NULL
);
'''
# 2) Traffic violations table schema
DDL_TRAFFIC_VIOLATIONS = '''
CREATE TABLE IF NOT EXISTS traffic_violations (
    EventID INTEGER PRIMARY KEY,
    Date TEXT NOT NULL,
    PersonID INTEGER NOT NULL,
    OffenseID INTEGER NOT NULL
);
'''
# 3) Violations details table schema
DDL_VIOLATIONS_DETAILS = '''
CREATE TABLE IF NOT EXISTS violations_details (
    IdOffense INTEGER PRIMARY KEY,
    Offense TEXT NOT NULL,
    Points INTEGER NOT NULL,
    Amount INTEGER NOT NULL
);
'''
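The schemas above link the tables only by convention: matching ID columns. If you want SQLite to actually enforce those relationships, you can declare FOREIGN KEY constraints and enable them per connection (SQLite leaves them off by default). A hedged sketch of what a constrained violations table could look like, using a throwaway in-memory database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores FK constraints unless this is set
conn.execute("CREATE TABLE drivers (PersonID INTEGER PRIMARY KEY, FirstName TEXT)")
conn.execute("""
CREATE TABLE traffic_violations (
    EventID INTEGER PRIMARY KEY,
    Date TEXT NOT NULL,
    PersonID INTEGER NOT NULL REFERENCES drivers(PersonID),
    OffenseID INTEGER NOT NULL
)
""")
conn.execute("INSERT INTO drivers VALUES (1, 'Ana')")
conn.execute("INSERT INTO traffic_violations VALUES (10, '2024-01-15', 1, 100)")  # OK
try:
    # PersonID 99 does not exist in drivers, so this insert is rejected
    conn.execute("INSERT INTO traffic_violations VALUES (11, '2024-01-16', 99, 100)")
except sqlite3.IntegrityError as e:
    print(f"Rejected: {e}")
conn.close()
```

The article's tables work fine without constraints, but foreign keys catch loading mistakes (such as a violation pointing at a driver who does not exist) early.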
2. Creating the Database and Tables
Connecting to a database in Python follows a standardized approach defined in PEP 249 (Python Database API Specification v2.0). This specification ensures that different database drivers (for SQLite, PostgreSQL, MySQL, etc.) provide a consistent interface for developers.
The three key elements of this process are:
- Connection Object: Established using a connect() function, it represents the link to the database file or server.
- Cursor Object: Created from the connection (conn.cursor()). The cursor is the object used to execute SQL commands and fetch results.
- Commit/Close: For databases that support transactions, changes must be committed to be saved. Finally, connections should be closed to free up system resources.
This standard is why connecting to PostgreSQL looks almost identical to SQLite. Instead of the built-in sqlite3 library, you would typically use a library like psycopg2 (or the newer psycopg), but the workflow of connect -> cursor -> execute -> commit remains exactly the same.
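To see that shared workflow in isolation, here is a minimal end-to-end round trip against a throwaway in-memory database (the table and values are invented for the example):

```python
import sqlite3

# connect -> cursor -> execute -> commit -> close, per PEP 249
conn = sqlite3.connect(":memory:")  # 1. Connection object
cur = conn.cursor()                 # 2. Cursor object
cur.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO demo (name) VALUES (?)", ("Ana",))  # parameterized insert
conn.commit()                       # 3. Persist the change
cur.execute("SELECT name FROM demo")
print(cur.fetchone()[0])            # prints: Ana
conn.close()                        # 4. Release resources
```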
Now, let's connect to our SQLite database file and execute our DDL scripts.
conn = sqlite3.connect(DB_NAME)
cur = conn.cursor()
cur.execute(DDL_DRIVERS)
cur.execute(DDL_TRAFFIC_VIOLATIONS)
cur.execute(DDL_VIOLATIONS_DETAILS)
conn.commit()
print("Database and tables created successfully.")
3. Loading Data from CSV Files
We need a helper function to read our CSV files and insert the data into the corresponding SQLite tables. We will handle basic type conversion (e.g., strings to integers) during this process.
def load_csv_to_table(conn, filepath, table_name, columns):
    """Read a comma-separated file and insert rows into the named table."""
    placeholders = ", ".join("?" for _ in columns)
    insert_sql = f'INSERT OR IGNORE INTO {table_name} ({", ".join(columns)}) VALUES ({placeholders})'
    if not os.path.exists(filepath):
        print(f"File not found: {filepath}")
        return
    rows = []
    with open(filepath, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=",")
        next(reader)  # Skip header row
        for r in reader:
            # Basic data cleaning: handle empty strings and convert types
            vals = [None if v == "" else v for v in r]
            for i, col in enumerate(columns):
                if col.lower() in ("personid", "eventid", "idoffense", "offenseid", "points", "amount"):
                    try:
                        vals[i] = int(vals[i]) if vals[i] is not None else None
                    except ValueError:
                        pass
            rows.append(tuple(vals))
    if rows:
        cur = conn.cursor()
        cur.executemany(insert_sql, rows)
        conn.commit()
        print(f"Loaded {len(rows)} rows into {table_name}.")
Now we execute the data loading for all three tables.
load_csv_to_table(conn, "drivers.csv", "drivers", ["PersonID", "FirstName", "LastName", "RegistrationNumber"])
load_csv_to_table(conn, "traffic_violations.csv", "traffic_violations", ["EventID", "Date", "PersonID", "OffenseID"])
load_csv_to_table(conn, "violations_details.csv", "violations_details", ["IdOffense", "Offense", "Points", "Amount"])
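A useful property of INSERT OR IGNORE is that re-running the loader is idempotent: a second insert with an already-present primary key is silently skipped rather than duplicated or rejected. A quick sketch of that behavior (invented table and rows, not the article's data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE drivers (PersonID INTEGER PRIMARY KEY, FirstName TEXT)")
rows = [(1, "Ana"), (2, "Ben")]
cur.executemany("INSERT OR IGNORE INTO drivers VALUES (?, ?)", rows)
cur.executemany("INSERT OR IGNORE INTO drivers VALUES (?, ?)", rows)  # re-run: duplicates ignored
cur.execute("SELECT COUNT(*) FROM drivers")
print(cur.fetchone()[0])  # prints: 2, not 4
conn.close()
```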
4. Verifying the Data
Let us check how many rows were actually inserted into each table.
for tbl in ("drivers", "traffic_violations", "violations_details"):
    cur.execute(f'SELECT COUNT(*) FROM {tbl}')
    cnt = cur.fetchone()[0]
    print(f"{tbl}: {cnt} rows")
Interactive SQL Console
The following code allows the execution of a single custom query. In the attached sqlite_practice_script.py, there is a while loop that enables running multiple queries, not just one; however, in a .ipynb file, this approach does not always work reliably.
Instructions:
- Type your SQL query in the input box.
- Type exit to stop the loop and move to the next section.
print("\n" + "=" * 40)
print("Type your SQL query and press Enter.")
print("=" * 40)
query_input = input("SQL> ")
try:
    cur.execute(query_input)
    if query_input.strip().lower().startswith("select"):
        rows = cur.fetchall()
        cols = [d[0] for d in cur.description] if cur.description else []
        if cols:
            print(" | ".join(cols))
            print("-" * 40)
        for r in rows:
            print(r)
    else:
        conn.commit()
        print(f"Query executed. Rows affected: {cur.rowcount}")
except Exception as e:
    print(f"Error: {e}")
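One caution about the console approach: it executes raw input, which is fine for an interactive learning tool, but in application code you should never build SQL by string-formatting user values into the query. That invites SQL injection and quoting bugs. When the query shape is fixed and only the values vary, pass them as parameters instead, as in this small sketch (invented data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE drivers (PersonID INTEGER PRIMARY KEY, LastName TEXT)")
cur.executemany("INSERT INTO drivers VALUES (?, ?)", [(1, "O'Brien"), (2, "Lima")])

last_name = "O'Brien"  # the apostrophe would break naive f-string SQL
cur.execute("SELECT PersonID FROM drivers WHERE LastName = ?", (last_name,))
print(cur.fetchone()[0])  # prints: 1
conn.close()
```

The `?` placeholder lets the driver handle quoting and escaping, so values containing quotes (or malicious SQL) are treated as plain data.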
Practice Tasks
Use the Interactive SQL Console above, either by running sqlite_practice_script.py or sqlite_practice.ipynb, to solve these tasks. Check your answers against the solutions provided below.
Task 1: Basic Retrieval
Goal: List all drivers' first and last names, sorted alphabetically by their last name.
SELECT FirstName, LastName FROM drivers ORDER BY LastName ASC;
Task 2: Filtering Data
Goal: Find all types of offenses that result in 5 or more penalty points.
SELECT Offense, Points FROM violations_details WHERE Points >= 5;
Task 3: Aggregation
Goal: Calculate the total number of traffic violation events recorded in the system.
SELECT COUNT(*) FROM traffic_violations;
Task 4: Simple Join
Goal: List the dates of violations along with the registration numbers of the cars involved.
SELECT v.Date, d.RegistrationNumber
FROM traffic_violations v
JOIN drivers d ON v.PersonID = d.PersonID;
Task 5: The Challenge
Goal: Find the top 5 drivers who have accumulated the highest total number of penalty points.
SELECT d.FirstName, d.LastName, SUM(det.Points) AS TotalPoints
FROM traffic_violations v
JOIN drivers d ON v.PersonID = d.PersonID
JOIN violations_details det ON v.OffenseID = det.IdOffense
GROUP BY d.PersonID
ORDER BY TotalPoints DESC
LIMIT 5;
Task 6: Grouping and Having
Goal: For each type of offense, find the total amount of fines collected, but only for offenses that have occurred more than 5 times.
SELECT det.Offense, COUNT(v.EventID) AS Occurrences, SUM(det.Amount) AS TotalFines
FROM traffic_violations v
JOIN violations_details det ON v.OffenseID = det.IdOffense
GROUP BY det.Offense
HAVING Occurrences > 5;
5. Querying the Database with Pandas
We can also run queries with the pandas library, which handles the cursor and fetch steps internally. This means less code, results displayed as a neatly formatted pandas DataFrame, and a convenient starting point for further analysis.
query = '''
SELECT
v.Date,
d.FirstName,
d.LastName,
det.Offense,
det.Points,
det.Amount
FROM traffic_violations v
JOIN drivers d ON v.PersonID = d.PersonID
JOIN violations_details det ON v.OffenseID = det.IdOffense
LIMIT 10;
'''
import pandas as pd

df = pd.read_sql_query(query, conn)
display(df)  # display() is available in Jupyter; use print(df) in a plain script
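Once the rows are in a DataFrame, further analysis is ordinary pandas, and per the commit/close rule from earlier, the connection should be closed when you are done. A self-contained sketch using an invented mini-dataset (not the repository data), aggregating fines per offense on the pandas side instead of in SQL:

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE violations (Offense TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO violations VALUES (?, ?)",
    [("Speeding", 250), ("Speeding", 250), ("Illegal parking", 60)],
)
conn.commit()

df = pd.read_sql_query("SELECT Offense, Amount FROM violations", conn)
totals = df.groupby("Offense")["Amount"].sum()  # aggregate in pandas instead of SQL
print(totals.to_dict())  # total fines per offense type
conn.close()  # free resources once analysis is done
```

Whether to aggregate in SQL (GROUP BY) or in pandas (groupby) is mostly a matter of data size: for large tables, letting the database do the work avoids pulling every row into memory.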