Databases | Practical class 11
Creating an INDEX in SQL
In database systems, an index is a performance-tuning structure associated with a table or view. Its primary purpose is to speed up data retrieval operations (SELECT queries) on a table at the cost of slightly slower data modification (INSERT, UPDATE, DELETE) and additional storage space. Today, we will explore how to work with indexes in PostgreSQL.
We will use a sample table named employees
for our examples. Assume this table has been created as follows:
-- Drop table if it exists to start fresh
DROP TABLE IF EXISTS employees;
-- Create the employees table
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100), -- Removed UNIQUE constraint for unique index example
hire_date DATE,
salary NUMERIC(10, 2),
department_id INT,
status VARCHAR(10) DEFAULT 'active', -- e.g., 'active', 'inactive'
profile JSONB -- For JSON indexing example
);
-- Insert some sample data
INSERT INTO employees (first_name, last_name, email, hire_date, salary, department_id, profile) VALUES
('John', 'Doe', 'john.doe@company.com', '2022-01-15', 60000, 1, '{"skills": ["SQL", "Python"], "level": "Mid"}'),
('Jane', 'Smith', 'jane.smith@company.com', '2021-03-10', 75000, 2, '{"skills": ["Java", "Spring"], "level": "Senior"}'),
('Peter', 'Jones', 'peter.jones@company.com', '2023-05-20', 55000, 1, '{"skills": ["HTML", "CSS"], "level": "Junior"}'),
('Alice', 'Brown', 'alice.brown@company.com', '2022-08-01', 80000, 2, '{"skills": ["Project Management"], "level": "Lead"}'),
('Bob', 'White', 'bob.white@company.com', '2023-11-01', 50000, 3, null),
('Jane', 'Doe', 'jane.doe@company.com', '2023-02-28', 62000, 1, '{"skills": ["Python", "Data Analysis"], "level": "Mid"}');
You can execute this setup script in pgAdmin 4’s Query Tool.
1. Create Index – Basic Index Definition
A basic index helps speed up queries that filter or sort data based on the indexed column(s). PostgreSQL uses B-tree indexes by default, which are suitable for a wide range of queries involving comparison operators (=
, <
, <=
, >
, >=
).
- Syntax:
CREATE INDEX index_name ON table_name (column_name);
You can optionally specify the index type using
USING method
(e.g.,USING btree
), but B-tree is the default. - Example: Create an index on the
last_name
column of theemployees
table to speed up searches by last name.CREATE INDEX idx_employees_last_name ON employees (last_name);
Execution: Run this command in the pgAdmin 4 Query Tool connected to your database.
- Note: Primary Key and Unique constraints automatically create underlying indexes in PostgreSQL. You don’t need to manually create an index for the
employee_id
column because it’s the primary key.
2. Unique Index – Enforcing Uniqueness
A unique index serves two purposes: it speeds up data retrieval (like a regular index) and enforces the constraint that values in the indexed column(s) must be unique across the table (allowing NULLs by default, but only one NULL unless specified otherwise in some contexts).
- Syntax:
CREATE UNIQUE INDEX index_name ON table_name (column_name);
- Example: Ensure that every employee has a unique email address by creating a unique index on the
email
column.CREATE UNIQUE INDEX idx_employees_email_unique ON employees (email);
Execution: Run this command. If duplicate emails exist in your sample data (excluding NULLs), the command will fail.
- Attempting Violation: Try inserting a duplicate email:
-- This should FAIL because of the unique index INSERT INTO employees (first_name, last_name, email, hire_date, salary, department_id) VALUES ('Duplicate', 'User', 'john.doe@company.com', '2024-01-01', 50000, 1);
You will receive an error indicating a violation of the unique constraint imposed by the index.
3. Index on Expression – Indexing Calculated Values
Sometimes, queries filter or sort based on the result of a function or expression applied to a column, rather than the column’s raw value. You can create an index on such expressions.
- Syntax:
CREATE INDEX index_name ON table_name ((expression)); -- Note the double parentheses
- Example: If you frequently perform case-insensitive searches on the
email
column, you can index the lower-cased version of the email.CREATE INDEX idx_employees_email_lower ON employees (LOWER(email));
Execution: Run this command.
- Usage: A query like the following can now potentially use this index:
SELECT employee_id, first_name, last_name, email FROM employees WHERE LOWER(email) = 'jane.smith@company.com';
4. Partial Index – Indexing a Subset of Rows
A partial index includes entries only for a subset of the table’s rows that satisfy a specific condition (defined by a WHERE
clause). This can save storage space and improve performance for queries that target this specific subset.
- Syntax:
CREATE INDEX index_name ON table_name (column_name) WHERE condition;
- Example: Create an index only for
active
employees based on theirhire_date
. This is useful if queries frequently look for active employees hired within certain date ranges.CREATE INDEX idx_employees_active_hire_date ON employees (hire_date) WHERE status = 'active';
Execution: Run this command.
- Usage: Queries that filter by
hire_date
and includeWHERE status = 'active'
are candidates for using this index:SELECT first_name, last_name, hire_date FROM employees WHERE status = 'active' AND hire_date > '2023-01-01';
5. Multicolumn Indexes – Indexing Multiple Columns
You can create an index on multiple columns. The order of columns in the index definition matters. A multicolumn index can speed up queries that filter or sort on the leading columns of the index, or on all columns in the index.
- Syntax:
CREATE INDEX index_name ON table_name (column1, column2, ...);
- Example: Create an index on
department_id
andsalary
to optimize queries that filter by department and then sort or filter by salary within that department.
CREATE INDEX idx_employees_dept_salary
ON employees (department_id, salary);
Execution: Run this command.
- Usage: This index can potentially optimize queries like:
-- Filter by the first column
SELECT * FROM employees WHERE department_id = 1;
-- Filter by the first and second columns
SELECT * FROM employees WHERE department_id = 2 AND salary > 70000;
-- Filter by the first and sort by the second
SELECT * FROM employees WHERE department_id = 1 ORDER BY salary DESC;
Note: A query filtering only on salary
might not use this index effectively, as salary
is not the leading column.
6. Reindex – Rebuilding Indexes
Over time, especially in tables with frequent updates or deletes, indexes can become bloated (containing empty space) or fragmented, potentially reducing their efficiency. You might also need to rebuild indexes after changing certain storage parameters. The REINDEX
command rebuilds an index or all indexes on a table or database.
- Syntax:
-- Rebuild a specific index
REINDEX INDEX index_name;
-- Rebuild all indexes on a specific table
REINDEX TABLE table_name;
-- Rebuild all indexes in the current database (requires appropriate permissions)
REINDEX DATABASE database_name;
-- Rebuild all indexes on system tables in the current database
REINDEX SYSTEM database_name;
- Example: Rebuild the
idx_employees_last_name
index.
REINDEX INDEX idx_employees_last_name;
Execution: Run this command. It will lock the table for writes while rebuilding.
- Example: Rebuild all indexes on the
employees
table.
REINDEX TABLE employees;
Execution: Run this command.
- Note:
REINDEX
typically requires exclusive locks, which can block other operations. For high-availability systems, considerREINDEX CONCURRENTLY
, though this has its own trade-offs and is a more advanced topic beyond the scope of simply rebuilding.
7. Drop Index – Deleting an Index
If an index is no longer needed, consumes too much space, or negatively impacts write performance more than it benefits read performance, you can remove it.
- Syntax:
DROP INDEX index_name;
You can add IF EXISTS
to avoid an error if the index doesn’t exist:
DROP INDEX IF EXISTS index_name;
- Example: Remove the expression index on
LOWER(email)
.
DROP INDEX IF EXISTS idx_employees_email_lower;
Execution: Run this command.
- Caution: Do not drop indexes created automatically by primary key or unique constraints unless you also intend to drop the constraint itself (which is usually done via
ALTER TABLE ... DROP CONSTRAINT
).
8. List Indexes – Finding Existing Indexes
You need ways to see which indexes exist on a table or in your database.
- Method 1: Using
psql
command (or SQL Pane in pgAdmin 4) Inpsql
or the pgAdmin 4 Query Tool, you can use the\d
meta-command:\d employees
Execution: Type this command (without a semicolon) into the Query Tool and execute it. Output: The output will list table columns, constraints, and at the bottom, under “Indexes:”, it will list all indexes associated with the
employees
table, including their type (e.g., btree) and the columns they cover. It also shows indexes backing constraints. - Method 2: Querying System Catalogs
You can query the
pg_indexes
system view for more detailed information.
SELECT
indexname,
indexdef
FROM
pg_indexes
WHERE
tablename = 'employees';
Execution: Run this SQL query.
Output: This provides the index name and the CREATE INDEX
command used to define it.
- Method 3: Using pgAdmin 4 Object Browser
- In the pgAdmin 4 browser pane on the left, navigate to your server.
- Expand Databases -> your_database_name -> Schemas -> public (or your relevant schema).
- Expand Tables ->
employees
. - Expand the
Indexes
node under theemployees
table. - This will list all indexes on the table by name (e.g.,
idx_employees_last_name
,employees_pkey
). Clicking on an index name will show its properties (including the indexed columns and definition) in the properties pane on the right.