Databases Practical class 11

Creating an INDEX in SQL

In database systems, an index is a performance-tuning structure associated with a table or view. Its primary purpose is to speed up data retrieval operations (SELECT queries) on a table at the cost of slightly slower data modification (INSERT, UPDATE, DELETE) and additional storage space. Today, we will explore how to work with indexes in PostgreSQL.

We will use a sample table named employees for our examples. Assume this table has been created as follows:

-- Drop table if it exists to start fresh
DROP TABLE IF EXISTS employees;

-- Create the employees table
CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100), -- Removed UNIQUE constraint for unique index example
    hire_date DATE,
    salary NUMERIC(10, 2),
    department_id INT,
    status VARCHAR(10) DEFAULT 'active', -- e.g., 'active', 'inactive'
    profile JSONB -- For JSON indexing example
);

-- Insert some sample data
INSERT INTO employees (first_name, last_name, email, hire_date, salary, department_id, profile) VALUES
('John', 'Doe', 'john.doe@company.com', '2022-01-15', 60000, 1, '{"skills": ["SQL", "Python"], "level": "Mid"}'),
('Jane', 'Smith', 'jane.smith@company.com', '2021-03-10', 75000, 2, '{"skills": ["Java", "Spring"], "level": "Senior"}'),
('Peter', 'Jones', 'peter.jones@company.com', '2023-05-20', 55000, 1, '{"skills": ["HTML", "CSS"], "level": "Junior"}'),
('Alice', 'Brown', 'alice.brown@company.com', '2022-08-01', 80000, 2, '{"skills": ["Project Management"], "level": "Lead"}'),
('Bob', 'White', 'bob.white@company.com', '2023-11-01', 50000, 3, null),
('Jane', 'Doe', 'jane.doe@company.com', '2023-02-28', 62000, 1, '{"skills": ["Python", "Data Analysis"], "level": "Mid"}');

You can execute this setup script in pgAdmin 4’s Query Tool.

1. Create Index – Basic Index Definition

A basic index helps speed up queries that filter or sort data based on the indexed column(s). PostgreSQL uses B-tree indexes by default, which are suitable for a wide range of queries involving comparison operators (=, <, <=, >, >=).

Syntax:
```
CREATE INDEX index_name
ON table_name (column_name);
```
You can optionally specify the index type using USING method (e.g., USING btree), but B-tree is the default.
Example: Create an index on the last_name column of the employees table to speed up searches by last name.
```
CREATE INDEX idx_employees_last_name
ON employees (last_name);
```
Execution: Run this command in the pgAdmin 4 Query Tool connected to your database.
Note: Primary Key and Unique constraints automatically create underlying indexes in PostgreSQL. You don’t need to manually create an index for the employee_id column because it’s the primary key.

2. Unique Index – Enforcing Uniqueness

A unique index serves two purposes: it speeds up data retrieval (like a regular index) and enforces the constraint that values in the indexed column(s) must be unique across the table (allowing NULLs by default, but only one NULL unless specified otherwise in some contexts).

Syntax:

CREATE UNIQUE INDEX index_name
ON table_name (column_name);

Example: Ensure that every employee has a unique email address by creating a unique index on the email column.
```
CREATE UNIQUE INDEX idx_employees_email_unique
ON employees (email);
```
Execution: Run this command. If duplicate emails exist in your sample data (excluding NULLs), the command will fail.

Attempting Violation: Try inserting a duplicate email:

-- This should FAIL because of the unique index
INSERT INTO employees (first_name, last_name, email, hire_date, salary, department_id)
VALUES ('Duplicate', 'User', 'john.doe@company.com', '2024-01-01', 50000, 1);

You will receive an error indicating a violation of the unique constraint imposed by the index.

3. Index on Expression – Indexing Calculated Values

Sometimes, queries filter or sort based on the result of a function or expression applied to a column, rather than the column’s raw value. You can create an index on such expressions.

Syntax:

CREATE INDEX index_name
ON table_name ((expression)); -- Note the double parentheses

Example: If you frequently perform case-insensitive searches on the email column, you can index the lower-cased version of the email.
```
CREATE INDEX idx_employees_email_lower
ON employees (LOWER(email));
```
Execution: Run this command.

Usage: A query like the following can now potentially use this index:

SELECT employee_id, first_name, last_name, email
FROM employees
WHERE LOWER(email) = 'jane.smith@company.com';

4. Partial Index – Indexing a Subset of Rows

A partial index includes entries only for a subset of the table’s rows that satisfy a specific condition (defined by a WHERE clause). This can save storage space and improve performance for queries that target this specific subset.

Syntax:

CREATE INDEX index_name
ON table_name (column_name)
WHERE condition;

Example: Create an index only for active employees based on their hire_date. This is useful if queries frequently look for active employees hired within certain date ranges.
```
CREATE INDEX idx_employees_active_hire_date
ON employees (hire_date)
WHERE status = 'active';
```
Execution: Run this command.

Usage: Queries that filter by hire_date and include WHERE status = 'active' are candidates for using this index:

SELECT first_name, last_name, hire_date
FROM employees
WHERE status = 'active' AND hire_date > '2023-01-01';

5. Multicolumn Indexes – Indexing Multiple Columns

You can create an index on multiple columns. The order of columns in the index definition matters. A multicolumn index can speed up queries that filter or sort on the leading columns of the index, or on all columns in the index.

Syntax:

CREATE INDEX index_name
ON table_name (column1, column2, ...);

Example: Create an index on department_id and salary to optimize queries that filter by department and then sort or filter by salary within that department.

CREATE INDEX idx_employees_dept_salary
ON employees (department_id, salary);

Execution: Run this command.

Usage: This index can potentially optimize queries like:

-- Filter by the first column
SELECT * FROM employees WHERE department_id = 1;

-- Filter by the first and second columns
SELECT * FROM employees WHERE department_id = 2 AND salary > 70000;

-- Filter by the first and sort by the second
SELECT * FROM employees WHERE department_id = 1 ORDER BY salary DESC;

Note: A query filtering only on salary might not use this index effectively, as salary is not the leading column.

6. Reindex – Rebuilding Indexes

Over time, especially in tables with frequent updates or deletes, indexes can become bloated (containing empty space) or fragmented, potentially reducing their efficiency. You might also need to rebuild indexes after changing certain storage parameters. The REINDEX command rebuilds an index or all indexes on a table or database.

Syntax:

-- Rebuild a specific index
REINDEX INDEX index_name;

-- Rebuild all indexes on a specific table
REINDEX TABLE table_name;

-- Rebuild all indexes in the current database (requires appropriate permissions)
REINDEX DATABASE database_name;

-- Rebuild all indexes on system tables in the current database
REINDEX SYSTEM database_name;

Example: Rebuild the idx_employees_last_name index.

REINDEX INDEX idx_employees_last_name;

Execution: Run this command. It will lock the table for writes while rebuilding.

Example: Rebuild all indexes on the employees table.

REINDEX TABLE employees;

Execution: Run this command.

Note: REINDEX typically requires exclusive locks, which can block other operations. For high-availability systems, consider REINDEX CONCURRENTLY, though this has its own trade-offs and is a more advanced topic beyond the scope of simply rebuilding.

7. Drop Index – Deleting an Index

If an index is no longer needed, consumes too much space, or negatively impacts write performance more than it benefits read performance, you can remove it.

Syntax:

DROP INDEX index_name;

You can add IF EXISTS to avoid an error if the index doesn’t exist:

DROP INDEX IF EXISTS index_name;

Example: Remove the expression index on LOWER(email).

DROP INDEX IF EXISTS idx_employees_email_lower;

Execution: Run this command.

Caution: Do not drop indexes created automatically by primary key or unique constraints unless you also intend to drop the constraint itself (which is usually done via ALTER TABLE ... DROP CONSTRAINT).

8. List Indexes – Finding Existing Indexes

You need ways to see which indexes exist on a table or in your database.

Method 1: Using psql command (or SQL Pane in pgAdmin 4) In psql or the pgAdmin 4 Query Tool, you can use the \d meta-command:
```
\d employees
```
Execution: Type this command (without a semicolon) into the Query Tool and execute it. Output: The output will list table columns, constraints, and at the bottom, under “Indexes:”, it will list all indexes associated with the employees table, including their type (e.g., btree) and the columns they cover. It also shows indexes backing constraints.
Method 2: Querying System Catalogs You can query the pg_indexes system view for more detailed information.

SELECT
	indexname,
	indexdef
FROM
	pg_indexes
WHERE
	tablename = 'employees';

Execution: Run this SQL query.

Output: This provides the index name and the CREATE INDEX command used to define it.

Method 3: Using pgAdmin 4 Object Browser
1. In the pgAdmin 4 browser pane on the left, navigate to your server.
2. Expand Databases -> your_database_name -> Schemas -> public (or your relevant schema).
3. Expand Tables -> employees.
4. Expand the Indexes node under the employees table.
5. This will list all indexes on the table by name (e.g., idx_employees_last_name, employees_pkey). Clicking on an index name will show its properties (including the indexed columns and definition) in the properties pane on the right.