DATA ANALYSIS  

PostgreSQL: Introduction

PostgreSQL is a powerful, open-source object-relational database system. This article will guide you through the installation process, basic operations, and understanding its architecture.

All files related to this article can be found in my GitHub repository at the following link: PostgreSQL_Introduction . In addition to the Jupyter Notebook containing the same content as this article, you will find both students_20.csv and students_30.csv for database import, along with the connect_and_load.py and query_database.py scripts. These scripts provide the same functionality as the Python code at the end of this article, allowing you to run the code independently of the notebook environment.

1. Installation

PostgreSQL can be installed on various operating systems. Below are the instructions for the most popular ones.

Ubuntu (Linux)

Ubuntu includes PostgreSQL in its default repositories. To install it, follow these steps:

  1. Update your package index:
    sudo apt update
  2. Install PostgreSQL and the -contrib package which adds extra utilities:
    sudo apt install postgresql postgresql-contrib
  3. Check the status of the service:
    sudo systemctl status postgresql

Windows

The easiest way to install PostgreSQL on Windows is using the interactive installer provided by EDB:

  • Download the installer from the official website.
  • Run the .exe file.
  • During installation, you will be asked to set a password for the postgres superuser, choose a port (default is 5432), and select components like pgAdmin 4.
  • Complete the wizard and PostgreSQL will be installed as a Windows Service.

MacOS

On MacOS, using Homebrew is the most popular choice for developers:

  1. Install PostgreSQL:
    brew install postgresql
  2. Start the service:
    brew services start postgresql
  3. Alternatively, you can use Postgres.app for a full-featured GUI-based installation.

2. Basic PostgreSQL Operations

Once installed, you can interact with PostgreSQL using the psql command-line tool.

Opening psql

To log in as the default administrative user on Linux:

sudo -i -u postgres psql

On Windows/MacOS you might need:

psql -U postgres

and provide password to postgres role that was set during installation (or configured afterward).

Creating a New User and Database

Inside the psql prompt, run the following commands to set up a new environment:

-- Create a new user CREATE USER my_user WITH PASSWORD 'my_password'; -- Create a new database owned by the user CREATE DATABASE my_new_db OWNER my_user;

Working with Tables

Connect to your database and create a simple table:

-- Connect to the database \c my_new_db -- or quit and log as newly created user psql -U my_user -d my_new_db -- Create a students table CREATE TABLE students ( id SERIAL PRIMARY KEY, name VARCHAR(100) NOT NULL, email VARCHAR(150) UNIQUE );

Adding a Fourth Column

To add a new column for the enrollment date, use the ALTER TABLE command:

ALTER TABLE students ADD COLUMN enrollment_date DATE; \d students

Adding Rows Manually

You can insert individual records using the INSERT INTO statement:

INSERT INTO students (name, email, enrollment_date) VALUES ('Michael Scott', 'm.scott@gmail.com', '2023-01-15');

Loading Data from a CSV File

If you have a large dataset in a CSV file, you can use the \copy command in psql. This is very efficient for bulk imports:

\copy students(name, email, enrollment_date) FROM 'students_20.csv' DELIMITER ',' CSV HEADER;

Note: The CSV file must be accessible from the machine where you are running the psql client. By default, it is looked for in the folder from which you started psql because a relative path is provided. You can also provide an absolute path instead of just the file name 'students_20.csv'.

3. Architecture: PostgreSQL as a background service

PostgreSQL vs. SQLite

  • SQLite is a file-based database. It doesn't have a separate server process.
  • PostgreSQL works as a background service. It is always running in the background, waiting for connections.

Client-Server Mechanism

PostgreSQL follows a client-server model:

  1. Server: The PostgreSQL daemon (postgres) manages the database files, accepts connections, and executes queries.
  2. Client: Any application that connects to the server to send commands.

Because of this architecture, clients can connect to PostgreSQL even if it is running on another computer.

Free Client Tools

  • psql: The built-in command-line tool. Fast and powerful, but lacks a graphical interface.
  • pgAdmin 4: The most popular web-based/desktop management tool specifically for PostgreSQL. It is often bundled with the Windows installer.
  • DBeaver: A universal database tool that supports PostgreSQL and many other databases. It provides a very rich GUI and is great for data analysis.

Connection Details and Ports

As a background service, PostgreSQL reserves a network port to listen for client connections. The default port is 5432. To establish a connection, a client typically requires: Host, Port, Database Name, User Name, and Password.

Below is a Python example using psycopg that demonstrates how to connect, load data from a second CSV file (students_30.csv), and count the total records.

import psycopg import csv # Connection details DB_PARAMS = { "host": "localhost", "port": 5432, "dbname": "my_new_db", "user": "my_user", "password": "my_password" } try: with psycopg.connect(**DB_PARAMS) as conn: with conn.cursor() as cur: # Load rows from the 2nd CSV file with open('students_30.csv', 'r') as f: with cur.copy("COPY students (name, email, enrollment_date) FROM STDIN WITH (FORMAT CSV, HEADER)") as copy: while data := f.read(1048576): copy.write(data) print("Data from students_30.csv loaded successfully.") cur.execute("SELECT COUNT(*) FROM students;") total_count = cur.fetchone()[0] print(f"Total students in database: {total_count}") except Exception as e: print(f"An error occurred: {e}")

The following Python script provides an interactive way to query and modify your PostgreSQL database directly from the notebook. It runs in a loop, allowing you to execute multiple SQL commands sequentially. For SELECT queries, it automatically formats and displays the results as a table, while for other operations (like INSERT, UPDATE, or DELETE), it confirms the success of the operation. You can exit the interactive mode at any time by typing q or exit.

import psycopg try: with psycopg.connect(**DB_PARAMS) as conn: while True: query = input("psql > ") if query.lower() in ['q', 'exit']: break if not query.strip(): continue try: with conn.cursor() as cur: cur.execute(query) if cur.description: colnames = [desc[0] for desc in cur.description] print(" | ".join(colnames)) for row in cur.fetchall(): print(" | ".join(map(str, row))) else: print(f"Success: {cur.statusmessage}") conn.commit() except Exception as e: print(f"Query error: {e}") conn.rollback() except Exception as e: print(f"Connection error: {e}")

4. Common Problems and Solutions

Linux (Ubuntu)

Problem: "Peer authentication failed for user 'postgres'".
Solution: Linux defaults to OS-level auth. Use sudo -u postgres psql or update pg_hba.conf.

Windows

Problem: Service fails to start or "Port 5432 already in use".
Solution: Use Resource Monitor to find conflicting processes.

MacOS

Problem: "psql: error: could not connect to server".
Solution: Ensure server is running with brew services start postgresql.

Contact details:

+48 790-430-860

analysislessons@gmail.com