PostgreSQL | SEG

PostgreSQL

This post builds on SQL fundamentals. If you need a refresher on SQL basics, check out our post on Structured Query Language first!

What is PostgreSQL?

PostgreSQL (often called “Postgres”) is an advanced, open-source relational database management system (RDBMS) that emphasizes extensibility and SQL compliance. First released in 1996, it has evolved into one of the most trusted databases in the industry, used by companies like Apple, Instagram, Spotify, and Reddit.

Unlike some databases that prioritize speed over features, PostgreSQL strikes a balance between performance, reliability, and rich functionality. It’s ACID-compliant (Atomicity, Consistency, Isolation, Durability), meaning your data remains consistent and reliable even during system failures.

Why PostgreSQL Matters

In modern software development, choosing the right database is crucial. PostgreSQL stands out for several reasons:

  1. Open Source and Free - No licensing costs, with a permissive license (PostgreSQL License)
  2. Standards Compliant - Adheres closely to SQL standards while adding powerful extensions
  3. Feature Rich - Advanced data types, full-text search, JSON support, and more
  4. Reliable - Battle-tested in production environments for decades
  5. Active Community - Regular updates, extensive documentation, and strong support

PostgreSQL is an excellent choice for your first production database. Its combination of reliability, features, and community support makes it ideal for learning database concepts properly.

Core Concepts

Before diving into PostgreSQL-specific features, let’s understand the fundamental concepts of relational databases.

Database Structure

A PostgreSQL database is organized hierarchically:

flowchart TB
    subgraph Server["PostgreSQL Server"]
        subgraph DB1["Database: ecommerce"]
            subgraph Schema1["Schema: public"]
                T1[Table: users]
                T2[Table: orders]
                T3[Table: products]
            end
            subgraph Schema2["Schema: analytics"]
                T4[Table: page_views]
                T5[Table: events]
            end
        end
        subgraph DB2["Database: blog"]
            subgraph Schema3["Schema: public"]
                T6[Table: posts]
                T7[Table: comments]
            end
        end
    end
    
    style Server fill:#336791,color:#fff
    style DB1 fill:#4285f4,color:#fff
    style DB2 fill:#4285f4,color:#fff
Server - The top-level PostgreSQL instance running on a machine. A PostgreSQL server can host multiple databases. Each server listens on a specific port (default: 5432) and manages all database operations, connections, and security.
Database - A collection of schemas containing related data. Databases are isolated from each other. A query in one database cannot directly access data from another database. Each database has its own set of users, permissions, and configuration.
Schema - A namespace within a database that contains tables, views, and other objects. Schemas help organize database objects logically. The default schema is called "public". You can create multiple schemas to separate different parts of your application (e.g., production tables vs. analytics tables).
Table - The fundamental structure that stores data in rows and columns. Tables define the structure of your data through columns (fields) and store actual data in rows (records). Each table should represent a single entity or concept in your domain.
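The layers above can be seen in a short SQL session. A minimal sketch, assuming a fresh database (the schema and table names here are purely illustrative):

```sql
-- Create a separate namespace for analytics objects
CREATE SCHEMA analytics;

-- Tables can be addressed by schema-qualified names
CREATE TABLE public.users (
    id SERIAL PRIMARY KEY,
    email TEXT UNIQUE NOT NULL
);

CREATE TABLE analytics.page_views (
    id BIGSERIAL PRIMARY KEY,
    url TEXT NOT NULL,
    viewed_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);

-- Unqualified names are resolved via search_path (default: "$user", public)
SET search_path TO analytics, public;
SELECT * FROM page_views;  -- Resolves to analytics.page_views
```

Keeping analytics tables in their own schema makes it easy to grant a reporting user access to just that namespace.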

Relationships Between Tables

Relational databases excel at representing relationships between different entities. PostgreSQL uses foreign keys to establish and enforce these relationships:

erDiagram
    USERS ||--o{ ORDERS : places
    USERS {
        int id PK
        string email
        string name
        timestamp created_at
    }
    ORDERS ||--|{ ORDER_ITEMS : contains
    ORDERS {
        int id PK
        int user_id FK
        decimal total
        timestamp created_at
    }
    PRODUCTS ||--o{ ORDER_ITEMS : "ordered in"
    PRODUCTS {
        int id PK
        string name
        decimal price
        int stock
    }
    ORDER_ITEMS {
        int id PK
        int order_id FK
        int product_id FK
        int quantity
        decimal price
    }

This diagram shows a typical e-commerce database structure:

  • One-to-Many: A user can have many orders
  • One-to-Many: An order contains many order items
  • Many-to-One: Many order items reference one product
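The relationships in the diagram map directly onto foreign keys. A sketch of the one-to-many side between users and orders (column types follow the diagram):

```sql
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email TEXT NOT NULL,
    name TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INT NOT NULL REFERENCES users(id),  -- "a user places many orders"
    total DECIMAL(10, 2),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- The relationship is enforced: this INSERT fails unless user 42 exists
INSERT INTO orders (user_id, total) VALUES (42, 99.90);
```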

PostgreSQL vs Other SQL Databases

While PostgreSQL shares SQL fundamentals with other databases, it has distinctive characteristics:

| Feature | PostgreSQL | MySQL | SQLite |
|---|---|---|---|
| License | PostgreSQL License (permissive) | GPL/Commercial | Public Domain |
| ACID Compliance | ✅ Full | ✅ With InnoDB | ✅ Full |
| JSON Support | ✅ Native JSONB | ✅ JSON | ✅ JSON (limited) |
| Full-Text Search | ✅ Built-in | ✅ Basic | ✅ FTS5 extension |
| Window Functions | ✅ Advanced | ✅ Since 8.0 | ✅ Since 3.25 |
| Arrays | ✅ Native | ❌ No | ❌ No |
| Custom Data Types | ✅ Yes | ❌ Limited | ❌ Limited |
| Replication | ✅ Built-in | ✅ Built-in | ❌ No (file-based) |
| Use Case | General purpose, complex queries | Web applications, read-heavy | Embedded, mobile, testing |

PostgreSQL’s advanced features like native arrays, JSONB, and custom types make it particularly powerful for complex applications. However, for simple read-heavy web applications, MySQL might be simpler to start with.

PostgreSQL Features and Advantages

1. Advanced Data Types

PostgreSQL goes beyond standard SQL types with rich, built-in data types:

-- Array columns
CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    title TEXT,
    tags TEXT[]  -- Array of tags
);

INSERT INTO posts (title, tags)
VALUES ('PostgreSQL Guide', ARRAY['database', 'sql', 'tutorial']);

-- Query with array operators
SELECT * FROM posts WHERE 'database' = ANY(tags);

-- JSONB for flexible document storage
CREATE TABLE user_preferences (
    user_id INT PRIMARY KEY,
    settings JSONB
);

INSERT INTO user_preferences (user_id, settings)
VALUES (1, '{"theme": "dark", "language": "en", "notifications": {"email": true}}');

-- Query JSON data
SELECT settings->>'theme' AS theme
FROM user_preferences
WHERE settings->'notifications'->>'email' = 'true';
UUID - Universally Unique Identifiers for globally unique keys. UUIDs are 128-bit values that are statistically unique across space and time. PostgreSQL can generate them with the built-in gen_random_uuid() function (PostgreSQL 13+) or, on older versions, via the uuid-ossp extension. Useful for distributed systems where auto-incrementing integers might conflict.
JSONB - Binary JSON storage with indexing support. JSONB stores JSON data in a decomposed binary format, which is slightly slower to input but significantly faster to process. Unlike plain JSON type, JSONB supports indexing and efficient querying. Perfect for semi-structured data or flexible schemas.
Geometric Types - Points, lines, circles, and polygons for spatial data. PostgreSQL includes built-in geometric types like POINT, LINE, CIRCLE, and PATH. For more advanced GIS functionality, the PostGIS extension provides comprehensive spatial database capabilities.
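A quick UUID sketch to make this concrete, assuming PostgreSQL 13+ for the built-in gen_random_uuid() (the api_tokens table is illustrative):

```sql
CREATE TABLE api_tokens (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id INT NOT NULL,
    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);

-- The server generates the key; RETURNING hands it back
INSERT INTO api_tokens (user_id) VALUES (1)
RETURNING id;  -- A random UUID such as 'a0eebc99-9c0b-4ef8-...' (value differs each time)
```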

2. Constraints and Data Integrity

PostgreSQL enforces data quality through various constraint types:

CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE,
    age INT CHECK (age >= 18 AND age <= 100),
    department VARCHAR(50) NOT NULL,
    salary DECIMAL(10, 2) CHECK (salary > 0),
    hire_date DATE NOT NULL DEFAULT CURRENT_DATE,
    manager_id INT REFERENCES employees(id)  -- Self-referential foreign key
);

-- Composite unique constraint
CREATE TABLE user_roles (
    user_id INT REFERENCES users(id),
    role VARCHAR(50),
    PRIMARY KEY (user_id, role)  -- A user can't have the same role twice
);

Always define constraints at the database level, not just in application code. This ensures data integrity even if multiple applications or direct SQL access modify the data.

3. Indexes for Performance

Indexes dramatically speed up queries by creating efficient lookup structures:

-- B-tree index (default) for equality and range queries
CREATE INDEX idx_users_email ON users(email);

-- Multi-column index for combined queries
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);

-- Partial index - only index a subset of rows
CREATE INDEX idx_active_users ON users(email) WHERE active = true;

-- GIN index for full-text search
CREATE INDEX idx_posts_search ON posts USING GIN(to_tsvector('english', content));

-- Unique index to enforce uniqueness
CREATE UNIQUE INDEX idx_products_sku ON products(sku);
B-tree Index - The default index type, good for most use cases. B-tree indexes work efficiently for equality and range queries (<, <=, =, >=, >). They maintain data in sorted order and are balanced for consistent performance. Use for columns in WHERE clauses, JOIN conditions, and ORDER BY.
GIN Index - Generalized Inverted Index for array and full-text search. GIN indexes are ideal for columns containing multiple values like arrays, JSONB, or full-text search vectors. They store mappings from values to the rows containing those values. Essential for performance with complex data types.
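GIN indexes also pair naturally with the JSONB type shown earlier. A sketch, assuming the user_preferences table from the Advanced Data Types section:

```sql
-- Index the whole JSONB document for containment queries
CREATE INDEX idx_prefs_settings ON user_preferences USING GIN (settings);

-- The @> containment operator can now use the index
SELECT user_id
FROM user_preferences
WHERE settings @> '{"theme": "dark"}';
```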

Basic SQL Operations in PostgreSQL

Let’s walk through the fundamental operations you’ll use daily.

Creating Tables (DDL)

-- Create a database (run outside of a specific database)
CREATE DATABASE bookstore;

-- Connect to the database (in psql: \c bookstore)

-- Create tables with relationships
CREATE TABLE authors (
    id SERIAL PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    bio TEXT,
    born_date DATE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE books (
    id SERIAL PRIMARY KEY,
    title VARCHAR(500) NOT NULL,
    isbn VARCHAR(13) UNIQUE NOT NULL,
    author_id INT NOT NULL REFERENCES authors(id) ON DELETE CASCADE,
    publication_year INT CHECK (publication_year >= 1000),
    price DECIMAL(10, 2) NOT NULL CHECK (price >= 0),
    stock INT DEFAULT 0 CHECK (stock >= 0),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Add an index for frequent queries
CREATE INDEX idx_books_author ON books(author_id);
CREATE INDEX idx_books_title ON books USING GIN(to_tsvector('english', title));

The SERIAL type is PostgreSQL shorthand for creating an auto-incrementing integer. Under the hood, it creates a sequence and sets the column default to nextval('sequence_name').
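Roughly, the same effect can be spelled out by hand, and since PostgreSQL 10 the SQL-standard identity column is the recommended alternative to SERIAL (table names here are illustrative):

```sql
-- Approximately what SERIAL expands to
CREATE SEQUENCE books_id_seq;
CREATE TABLE books_manual (
    id INT PRIMARY KEY DEFAULT nextval('books_id_seq')
);

-- SQL-standard identity column (PostgreSQL 10+), preferred in new code
CREATE TABLE books_identity (
    id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY
);
```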

Inserting Data (DML)

-- Single insert
INSERT INTO authors (name, bio, born_date)
VALUES ('George Orwell', 'English novelist and essayist', '1903-06-25');

-- Multiple inserts in one statement
INSERT INTO authors (name, bio, born_date) VALUES
    ('Aldous Huxley', 'English writer and philosopher', '1894-07-26'),
    ('Ray Bradbury', 'American author and screenwriter', '1920-08-22');

-- Insert and return the generated ID
INSERT INTO books (title, isbn, author_id, publication_year, price, stock)
VALUES ('1984', '9780451524935', 1, 1949, 15.99, 50)
RETURNING id, title, created_at;

-- Insert from a SELECT (copy data)
INSERT INTO archived_orders
SELECT * FROM orders WHERE created_at < '2020-01-01';
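A related tool: with ON CONFLICT (PostgreSQL 9.5+), an INSERT can fall back to an update or a no-op when it hits a unique constraint, such as the isbn column defined above. A sketch:

```sql
-- Upsert: insert the book, or add to its stock if the ISBN already exists
INSERT INTO books (title, isbn, author_id, publication_year, price, stock)
VALUES ('1984', '9780451524935', 1, 1949, 15.99, 25)
ON CONFLICT (isbn)
DO UPDATE SET stock = books.stock + EXCLUDED.stock;

-- Or silently skip duplicates instead
INSERT INTO books (title, isbn, author_id, publication_year, price, stock)
VALUES ('1984', '9780451524935', 1, 1949, 15.99, 25)
ON CONFLICT (isbn) DO NOTHING;
```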

Querying Data (DQL)

-- Basic SELECT
SELECT title, price FROM books WHERE stock > 0;

-- JOIN to get related data
SELECT 
    b.title,
    b.price,
    a.name AS author_name
FROM books b
INNER JOIN authors a ON b.author_id = a.id
WHERE b.publication_year > 1950
ORDER BY b.title;

-- Aggregate functions
SELECT 
    a.name,
    COUNT(b.id) AS book_count,
    AVG(b.price) AS avg_price
FROM authors a
LEFT JOIN books b ON a.id = b.author_id
GROUP BY a.id, a.name
HAVING COUNT(b.id) > 0
ORDER BY book_count DESC;

-- Subquery example
SELECT title, price
FROM books
WHERE price > (SELECT AVG(price) FROM books);

-- Window function (advanced)
SELECT 
    title,
    price,
    AVG(price) OVER () AS overall_avg_price,
    price - AVG(price) OVER () AS price_difference
FROM books;

Updating Data

-- Simple update
UPDATE books
SET stock = stock + 10
WHERE id = 1;

-- Update with JOIN
UPDATE books
SET price = price * 1.10
FROM authors
WHERE books.author_id = authors.id
  AND authors.name = 'George Orwell';

-- Update and return modified rows
UPDATE books
SET stock = 0
WHERE stock < 5
RETURNING id, title, stock;

Deleting Data

-- Delete specific rows
DELETE FROM books WHERE stock = 0 AND created_at < '2020-01-01';

-- Delete with JOIN (PostgreSQL syntax)
DELETE FROM books
USING authors
WHERE books.author_id = authors.id
  AND authors.name = 'Unknown Author';

-- Cascade delete (if foreign key has ON DELETE CASCADE)
DELETE FROM authors WHERE id = 1;  -- Also deletes all books by this author

-- Truncate for fast deletion of all rows
TRUNCATE TABLE books RESTART IDENTITY;  -- Resets SERIAL counters

Be extremely careful with DELETE and UPDATE without WHERE clauses. Always test with a SELECT first to verify which rows will be affected!

Connecting to PostgreSQL from Java

Here’s how to connect and query PostgreSQL from a Java application using JDBC:

1. Add PostgreSQL JDBC Driver

<!-- Add to pom.xml for Maven projects -->
<dependency>
    <groupId>org.postgresql</groupId>
    <artifactId>postgresql</artifactId>
    <version>42.7.1</version>
</dependency>

2. Basic Connection Example

import java.sql.*;

public class PostgreSQLExample {
    
    private static final String URL = "jdbc:postgresql://localhost:5432/bookstore";
    private static final String USER = "postgres";
    private static final String PASSWORD = "your_password";
    
    public static void main(String[] args) {
        // Try-with-resources ensures connection is closed
        try (Connection conn = DriverManager.getConnection(URL, USER, PASSWORD)) {
            System.out.println("Connected to PostgreSQL database!");
            
            // Query example
            queryBooks(conn);
            
            // Insert example
            insertBook(conn, "Brave New World", "9780060850524", 2, 1932, 14.99, 30);
            
        } catch (SQLException e) {
            System.err.println("Database error: " + e.getMessage());
            e.printStackTrace();
        }
    }
    
    // Query data
    private static void queryBooks(Connection conn) throws SQLException {
        String sql = """
            SELECT b.title, b.price, a.name AS author_name
            FROM books b
            INNER JOIN authors a ON b.author_id = a.id
            WHERE b.stock > ?
            ORDER BY b.title
            """;
        
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setInt(1, 0);  // Parameter for stock > ?
            
            try (ResultSet rs = stmt.executeQuery()) {
                System.out.println("\nBooks in stock:");
                while (rs.next()) {
                    String title = rs.getString("title");
                    double price = rs.getDouble("price");
                    String author = rs.getString("author_name");
                    System.out.printf("- %s by %s ($%.2f)%n", title, author, price);
                }
            }
        }
    }
    
    // Insert data with prepared statement (prevents SQL injection)
    private static void insertBook(Connection conn, String title, String isbn,
                                   int authorId, int year, double price, int stock) 
            throws SQLException {
        String sql = """
            INSERT INTO books (title, isbn, author_id, publication_year, price, stock)
            VALUES (?, ?, ?, ?, ?, ?)
            RETURNING id
            """;
        
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, title);
            stmt.setString(2, isbn);
            stmt.setInt(3, authorId);
            stmt.setInt(4, year);
            stmt.setDouble(5, price);
            stmt.setInt(6, stock);
            
            try (ResultSet rs = stmt.executeQuery()) {
                if (rs.next()) {
                    int newId = rs.getInt("id");
                    System.out.printf("\nInserted book '%s' with ID: %d%n", title, newId);
                }
            }
        }
    }
}

Always use PreparedStatement instead of string concatenation for SQL queries. This prevents SQL injection attacks and improves performance through query plan caching.

3. Transaction Example

public static void transferStock(Connection conn, int fromBookId, int toBookId, int quantity)
        throws SQLException {
    // Disable auto-commit to start a transaction
    conn.setAutoCommit(false);
    
    try {
        // Decrease stock from first book
        String decreaseSql = "UPDATE books SET stock = stock - ? WHERE id = ? AND stock >= ?";
        try (PreparedStatement stmt = conn.prepareStatement(decreaseSql)) {
            stmt.setInt(1, quantity);
            stmt.setInt(2, fromBookId);
            stmt.setInt(3, quantity);
            
            int rowsAffected = stmt.executeUpdate();
            if (rowsAffected == 0) {
                throw new SQLException("Insufficient stock for transfer");
            }
        }
        
        // Increase stock for second book
        String increaseSql = "UPDATE books SET stock = stock + ? WHERE id = ?";
        try (PreparedStatement stmt = conn.prepareStatement(increaseSql)) {
            stmt.setInt(1, quantity);
            stmt.setInt(2, toBookId);
            stmt.executeUpdate();
        }
        
        // Commit the transaction
        conn.commit();
        System.out.println("Stock transfer completed successfully");
        
    } catch (SQLException e) {
        // Rollback on error
        conn.rollback();
        System.err.println("Transaction rolled back: " + e.getMessage());
        throw e;
        
    } finally {
        // Restore auto-commit
        conn.setAutoCommit(true);
    }
}

Data Types in PostgreSQL

PostgreSQL offers a rich set of data types. Here are the most commonly used:

Numeric Types

-- Integer types
smallint    -- 2 bytes: -32,768 to 32,767
integer     -- 4 bytes: -2 billion to 2 billion
bigint      -- 8 bytes: very large integers
serial      -- Auto-incrementing integer
bigserial   -- Auto-incrementing bigint

-- Floating-point
real        -- 4 bytes, 6 decimal precision
double precision  -- 8 bytes, 15 decimal precision

-- Fixed-point (preferred for money)
numeric(precision, scale)  -- e.g., numeric(10,2) for prices
decimal(p, s)              -- Same as numeric

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    price NUMERIC(10, 2),  -- Up to 99,999,999.99
    weight REAL,           -- Approximate: 5.25 kg
    stock INT
);

Character Types

-- Fixed-length (pads with spaces)
char(n)     -- Exactly n characters

-- Variable-length with limit
varchar(n)  -- Up to n characters

-- Unlimited variable-length
text        -- Preferred for most text fields

CREATE TABLE articles (
    code CHAR(10),           -- Fixed product codes
    title VARCHAR(200),      -- Limited title length
    content TEXT             -- Full article content
);

In PostgreSQL there is no meaningful performance difference between varchar, char, and text; if anything, char(n) costs slightly more because of its space padding. Prefer text unless you have a specific reason to limit length.

Date and Time Types

-- Date only (no time)
date                -- '2025-12-31'

-- Time only (no date)
time                -- '14:30:00'

-- Date and time without timezone
timestamp           -- '2025-12-31 14:30:00'

-- Date and time with timezone (recommended)
timestamptz         -- '2025-12-31 14:30:00+01'

-- Time interval
interval            -- '2 days', '3 hours 30 minutes'

CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    name TEXT,
    event_date DATE,
    start_time TIMESTAMPTZ,  -- Use timestamptz for timezone awareness
    duration INTERVAL,
    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);

-- Working with dates
SELECT 
    name,
    start_time,
    start_time + duration AS end_time,
    AGE(start_time, CURRENT_TIMESTAMP) AS time_until_event
FROM events
WHERE start_time > CURRENT_TIMESTAMP;

Boolean Type

boolean  -- true, false, or NULL

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email TEXT UNIQUE NOT NULL,
    is_active BOOLEAN DEFAULT true,
    email_verified BOOLEAN DEFAULT false
);

-- Query with boolean
SELECT * FROM users WHERE is_active AND NOT email_verified;

Performance Basics

Understanding Query Plans

PostgreSQL’s query planner determines how to execute your queries. Use EXPLAIN to understand performance:

-- Show the query plan
EXPLAIN SELECT * FROM books WHERE author_id = 5;

-- Show the plan and actual execution statistics
EXPLAIN ANALYZE
SELECT b.title, a.name
FROM books b
JOIN authors a ON b.author_id = a.id
WHERE b.price > 20;

Output includes:

  • Seq Scan: Full table scan (slow for large tables)
  • Index Scan: Using an index (fast)
  • Nested Loop: Join method
  • Cost: Estimated execution cost
  • Rows: Estimated number of rows returned

If you see Seq Scan on a large table with a WHERE clause, consider adding an index on the filtered columns.
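For intuition, here is what the before-and-after of adding such an index might look like. The plan lines below are illustrative, not real output; actual costs and row estimates depend on your data and statistics:

```sql
EXPLAIN SELECT * FROM books WHERE author_id = 5;
--  Seq Scan on books  (cost=0.00..155.00 rows=12 width=64)
--    Filter: (author_id = 5)

CREATE INDEX idx_books_author ON books(author_id);

EXPLAIN SELECT * FROM books WHERE author_id = 5;
--  Index Scan using idx_books_author on books  (cost=0.29..8.45 rows=12 width=64)
--    Index Cond: (author_id = 5)
```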

When to Add Indexes

flowchart TD
    A[Slow Query?] -->|Yes| B{Check EXPLAIN}
    A -->|No| Z[No action needed]
    B --> C{Seq Scan on<br/>large table?}
    C -->|Yes| D[Add index on<br/>WHERE/JOIN columns]
    C -->|No| E{Many rows<br/>returned?}
    E -->|Yes| F[Optimize query<br/>or add filters]
    E -->|No| G[Check other<br/>bottlenecks]
    D --> H[Test performance]
    H --> I{Improved?}
    I -->|Yes| Z
    I -->|No| F

Good candidates for indexes:

  • Foreign key columns (used in JOINs)
  • Columns in WHERE clauses
  • Columns in ORDER BY
  • Columns frequently used in searches

Don’t over-index:

  • Every index slows down INSERT/UPDATE/DELETE
  • Indexes consume disk space
  • Small tables (< 1000 rows) don’t benefit much

Basic Optimization Tips

-- Use LIMIT to reduce result set size
SELECT * FROM books ORDER BY created_at DESC LIMIT 10;

-- Select only needed columns (not SELECT *)
SELECT title, price FROM books WHERE stock > 0;

-- Use EXISTS instead of COUNT for existence checks
-- Bad: slow for large tables
SELECT * FROM authors WHERE (SELECT COUNT(*) FROM books WHERE author_id = authors.id) > 0;

-- Good: stops after finding first match
SELECT * FROM authors WHERE EXISTS (SELECT 1 FROM books WHERE author_id = authors.id);

-- Batch inserts instead of individual statements
INSERT INTO books (title, isbn, author_id, publication_year, price, stock) VALUES
    ('Book 1', '1234567890123', 1, 2020, 19.99, 50),
    ('Book 2', '1234567890124', 1, 2021, 24.99, 30),
    ('Book 3', '1234567890125', 2, 2022, 29.99, 20);

When to Use PostgreSQL

PostgreSQL is an excellent choice when you need:

✅ Good Use Cases

  1. Complex Data Models
    • Multiple relationships between entities
    • Need for data integrity and constraints
    • ACID compliance is critical
  2. Advanced Features
    • Full-text search without external services
    • JSON/document storage alongside relational data
    • Array or custom data types
    • Geographic data (with PostGIS)
  3. Reporting and Analytics
    • Complex queries with aggregations
    • Window functions for analytical queries
    • Reliable data consistency
  4. General Purpose Applications
    • Web applications with moderate to high complexity
    • E-commerce platforms
    • Content management systems
    • SaaS applications

❌ Consider Alternatives When

  1. Embedded Applications
    • Mobile apps or desktop software that need a bundled database
    • Use SQLite instead (simpler, no server needed)
  2. Simple Key-Value Storage
    • Only need to store and retrieve by ID
    • Consider Redis for caching or simple data
  3. Document-Oriented Data
    • Highly flexible schema that changes frequently
    • Deeply nested documents
    • Consider MongoDB (though PostgreSQL’s JSONB is competitive)
  4. Time-Series Data
    • Massive volumes of time-stamped data
    • Consider TimescaleDB (PostgreSQL extension) or InfluxDB
  5. Extreme Scale Requirements
    • Billions of rows per second
    • Consider specialized databases like Apache Cassandra or distributed SQL like CockroachDB

For most junior developers starting out, PostgreSQL is the best choice for learning relational database concepts. It’s powerful enough to grow with you as you advance.

Common Pitfalls and Best Practices

❌ Common Mistakes

1. Not Using Transactions for Multi-Step Operations

When performing operations that must succeed or fail together, always use transactions:

-- BAD: Without transaction
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- If the second update fails, money disappears!

-- GOOD: With transaction
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- Both succeed or both fail

Always wrap related operations in transactions to maintain data consistency. If any operation fails, the entire transaction is rolled back.

2. Storing Money as FLOAT or REAL

Floating-point numbers are imprecise and should never be used for monetary values:

-- BAD: Floating-point errors
CREATE TABLE orders (price REAL);
-- Results in: 10.10 stored as 10.099999...

-- GOOD: Use NUMERIC for exact decimal values
CREATE TABLE orders (price NUMERIC(10, 2));

3. Not Setting up Proper Indexes

Missing indexes on frequently queried columns leads to slow performance:

-- BAD: No index on foreign key
CREATE TABLE orders (
    user_id INT REFERENCES users(id)
);
-- Queries joining on user_id will be slow

-- GOOD: Add index on foreign key
CREATE TABLE orders (
    user_id INT REFERENCES users(id)
);
CREATE INDEX idx_orders_user_id ON orders(user_id);

4. Using SELECT * in Production Code

Selecting all columns is inefficient and fragile to schema changes:

// BAD: Retrieves unnecessary columns, breaks if schema changes
String sql = "SELECT * FROM users";

// GOOD: Explicit columns
String sql = "SELECT id, email, name FROM users WHERE active = true";

✅ Best Practices

  1. Always Use Prepared Statements
    
    // Inside your query method:
    // Prevents SQL injection
    PreparedStatement stmt = conn.prepareStatement(
        "SELECT * FROM users WHERE email = ?"
    );
    stmt.setString(1, userEmail);
    
  2. Define Constraints at Database Level
    
    -- Enforce rules in the database, not just application
    CREATE TABLE users (
        email VARCHAR(255) UNIQUE NOT NULL,
        age INT CHECK (age >= 18)
    );
    
  3. Use Meaningful Names
    
    -- Clear, descriptive names
    CREATE TABLE customer_orders (...);
    CREATE INDEX idx_orders_customer_created ON customer_orders(customer_id, created_at);
    
  4. Back Up Your Data Regularly
    
    # Use pg_dump for backups
    pg_dump -U postgres -d bookstore -f backup.sql
    
  5. Use Connection Pooling in Production
    
    // In your database configuration:
    // Use HikariCP or similar instead of creating connections directly
    HikariConfig config = new HikariConfig();
    config.setJdbcUrl("jdbc:postgresql://localhost:5432/bookstore");
    HikariDataSource ds = new HikariDataSource(config);
    

Start with good practices from the beginning. It’s much harder to fix data integrity issues and add indexes later than to design properly from the start.

Practical Example: Building a Simple Blog Schema

Let’s put it all together with a complete example:

-- Create database
CREATE DATABASE blog_platform;

-- Create tables
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash TEXT NOT NULL,
    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
    is_active BOOLEAN DEFAULT true
);

CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    author_id INT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    title VARCHAR(200) NOT NULL,
    slug VARCHAR(200) UNIQUE NOT NULL,
    content TEXT NOT NULL,
    published BOOLEAN DEFAULT false,
    published_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE comments (
    id SERIAL PRIMARY KEY,
    post_id INT NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
    user_id INT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    content TEXT NOT NULL,
    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE tags (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50) UNIQUE NOT NULL
);

CREATE TABLE post_tags (
    post_id INT REFERENCES posts(id) ON DELETE CASCADE,
    tag_id INT REFERENCES tags(id) ON DELETE CASCADE,
    PRIMARY KEY (post_id, tag_id)
);

-- Create indexes
CREATE INDEX idx_posts_author ON posts(author_id);
CREATE INDEX idx_posts_published ON posts(published, published_at);
CREATE INDEX idx_comments_post ON comments(post_id);
CREATE INDEX idx_comments_user ON comments(user_id);

-- Full-text search index
CREATE INDEX idx_posts_search ON posts 
USING GIN(to_tsvector('english', title || ' ' || content));

-- Sample queries
-- Find all published posts by an author with comment counts
SELECT 
    p.title,
    p.published_at,
    COUNT(c.id) AS comment_count
FROM posts p
LEFT JOIN comments c ON p.id = c.post_id
WHERE p.author_id = 1 AND p.published = true
GROUP BY p.id, p.title, p.published_at
ORDER BY p.published_at DESC;

-- Full-text search
SELECT title, published_at
FROM posts
WHERE to_tsvector('english', title || ' ' || content) @@ to_tsquery('postgresql & database')
AND published = true
ORDER BY published_at DESC;

-- Get posts with their tags
SELECT 
    p.title,
    array_agg(t.name) AS tags
FROM posts p
LEFT JOIN post_tags pt ON p.id = pt.post_id
LEFT JOIN tags t ON pt.tag_id = t.id
WHERE p.published = true
GROUP BY p.id, p.title;

Next Steps

Now that you understand PostgreSQL fundamentals, here are recommended next steps:

  1. Practice with Real Projects
    • Build a small application with PostgreSQL backend
    • Experiment with different data models
    • Practice writing complex queries
  2. Learn More Advanced Features
    • Triggers and stored procedures
    • Views and materialized views
    • Partitioning for large tables
    • Replication and high availability
  3. Explore Performance Optimization
    • Query optimization techniques
    • Understanding execution plans deeply
    • Vacuum and maintenance operations
  4. Study PostgreSQL Extensions
    • PostGIS for geographic data
    • pg_stat_statements for query analysis
    • TimescaleDB for time-series data
  5. Learn About Database Design
    • Normalization (1NF, 2NF, 3NF)
    • When to denormalize
    • Schema versioning and migrations

Keep the official PostgreSQL documentation bookmarked: https://www.postgresql.org/docs/. It’s comprehensive and well-written.

Conclusion

PostgreSQL is a powerful, reliable, and feature-rich relational database that serves as an excellent foundation for learning database concepts. Its combination of SQL standard compliance, advanced features, and active community support makes it ideal for both learning and production use.

Key takeaways:

  • PostgreSQL excels at maintaining data integrity through ACID compliance and constraints
  • Its advanced features (JSONB, arrays, full-text search) provide flexibility beyond basic SQL
  • Proper indexing and query optimization are essential for performance
  • Always use prepared statements and transactions to ensure data safety and security
  • Start with good database design practices from the beginning

Whether you’re building a simple web application or a complex enterprise system, PostgreSQL provides the tools and reliability you need. Take time to practice the fundamentals, experiment with different features, and gradually build more complex database applications.

"The best database is the one you understand well. PostgreSQL rewards those who invest time in learning its features properly."

Happy querying! 🐘

This post is licensed under CC BY 4.0 by the author.