Creating complex queries. Combining multiple tables and working with “Joins”.

1. Setting Up The Environment (pgAdmin 4)

  1. Launch pgAdmin 4: Open the pgAdmin 4 application.
  2. Connect to Server: Connect to your PostgreSQL server instance. You should have this configured from P1.
  3. Select Database: Navigate to your designated database for this course under Servers -> [Your Server Name] -> Databases. Right-click the database name and select “Query Tool”.
  4. Sample Schema: For today’s examples, we will use a hypothetical university database schema. Execute the following SQL code in the Query Tool (press F5 or click the “Execute/Refresh” button (the triangle/play icon)) to create and populate the necessary tables:
-- Drop tables if they exist (to start fresh)
DROP TABLE IF EXISTS enrollments;
DROP TABLE IF EXISTS courses;
DROP TABLE IF EXISTS instructors;
DROP TABLE IF EXISTS students;
DROP TABLE IF EXISTS departments;

-- Create Departments table
CREATE TABLE departments (
	dept_id SERIAL PRIMARY KEY,
	dept_name VARCHAR(100) NOT NULL UNIQUE,
	building VARCHAR(50)
);

-- Create Instructors table
CREATE TABLE instructors (
	instructor_id SERIAL PRIMARY KEY,
	instructor_name VARCHAR(100) NOT NULL,
	dept_id INT REFERENCES departments(dept_id)
);

-- Create Courses table
CREATE TABLE courses (
	course_id SERIAL PRIMARY KEY,
	course_name VARCHAR(100) NOT NULL,
	credits INT CHECK (credits > 0),
	dept_id INT REFERENCES departments(dept_id),
	instructor_id INT REFERENCES instructors(instructor_id)
);

-- Create Students table
CREATE TABLE students (
	student_id SERIAL PRIMARY KEY,
	student_name VARCHAR(100) NOT NULL,
	major_dept_id INT REFERENCES departments(dept_id),
	enrollment_year INT
);

-- Create Enrollments table (Junction table)
CREATE TABLE enrollments (
	enrollment_id SERIAL PRIMARY KEY,
	student_id INT REFERENCES students(student_id),
	course_id INT REFERENCES courses(course_id),
	grade CHAR(1) CHECK (grade IN ('A', 'B', 'C', 'D', 'F')),
	UNIQUE(student_id, course_id) -- A student can enroll in a course only once
);

-- Insert sample data
INSERT INTO departments (dept_name, building) VALUES
('Computer Science', 'Building A'),
('Mathematics', 'Building B'),
('Physics', 'Building A'),
('History', 'Building C');

INSERT INTO instructors (instructor_name, dept_id) VALUES
('Dr. Alan Turing', 1),
('Dr. Ada Lovelace', 1),
('Dr. Carl Gauss', 2),
('Dr. Marie Curie', 3),
('Dr. Herodotus', 4),
('Dr. Isaac Newton', NULL); -- Instructor not assigned to a department yet

INSERT INTO courses (course_name, credits, dept_id, instructor_id) VALUES
('Introduction to Programming', 4, 1, 1),
('Data Structures', 4, 1, 2),
('Calculus I', 4, 2, 3),
('Linear Algebra', 3, 2, 3),
('Quantum Mechanics', 4, 3, 4),
('Classical Mechanics', 4, 3, 6), -- Taught by Newton
('World History I', 3, 4, 5),
('Database Systems', 3, 1, 2); -- Taught by Lovelace

INSERT INTO students (student_name, major_dept_id, enrollment_year) VALUES
('Alice Smith', 1, 2022),
('Bob Johnson', 2, 2021),
('Charlie Brown', 1, 2022),
('Diana Prince', 3, 2023),
('Eve Adams', NULL, 2021); -- Undecided major

INSERT INTO enrollments (student_id, course_id, grade) VALUES
(1, 1, 'A'), -- Alice, Intro to Prog
(1, 2, 'B'), -- Alice, Data Structures
(1, 8, 'A'), -- Alice, Database Systems
(2, 3, 'A'), -- Bob, Calculus I
(2, 4, 'B'), -- Bob, Linear Algebra
(3, 1, 'B'), -- Charlie, Intro to Prog
(3, 8, 'C'), -- Charlie, Database Systems
(4, 5, 'A'); -- Diana, Quantum Mechanics
-- Note: Student Eve Adams is not enrolled in any course yet.
-- Note: Course World History I has no enrollments yet.

2. Joining Multiple Tables (Three or More)

The fundamental principle remains the same as joining two tables: you chain JOIN clauses. The output of the first join (conceptually, an intermediate table) becomes the left table for the second join, and so on.

Syntax Pattern:

SELECT
    t1.column1,
    t2.column2,
    t3.column3,
    -- ... other columns
FROM
    table1 AS t1
JOIN
    table2 AS t2 ON t1.joining_column = t2.joining_column
JOIN
    table3 AS t3 ON t2.another_joining_column = t3.another_joining_column
-- Additional JOIN clauses can be added here
WHERE
    -- Optional conditions
ORDER BY
    -- Optional sorting;

Key Points:

  • Aliases are Crucial: Use table aliases (t1, t2, t3, or more descriptive names like s, c, d) to keep your query readable and unambiguous, especially when tables have columns with the same name (e.g., dept_id).
  • Join Order Matters (Logically): Ensure the ON condition correctly links the tables being joined at that step. You typically join tables based on foreign key relationships.
  • Default is INNER JOIN: If you just write JOIN, PostgreSQL assumes INNER JOIN.

Example 1: List students, the courses they are enrolled in, and the instructors teaching those courses.

This requires joining students, enrollments, courses, and instructors.

  • students links to enrollments via student_id.
  • enrollments links to courses via course_id.
  • courses links to instructors via instructor_id.
SELECT
    s.student_name,
    c.course_name,
    i.instructor_name
FROM
    students AS s
INNER JOIN
    enrollments AS e ON s.student_id = e.student_id
INNER JOIN
    courses AS c ON e.course_id = c.course_id
INNER JOIN
    instructors AS i ON c.instructor_id = i.instructor_id
ORDER BY
    s.student_name,
    c.course_name;

Execution in pgAdmin 4:

  1. Enter the query into the Query Tool.
  2. Press F5 or click the Execute button.
  3. Observe the results in the Data Output pane. You should see student names paired with the courses they’re taking and the respective instructor names.

3. Combining Different Join Types

Sometimes, you need to combine INNER JOIN with LEFT JOIN (or RIGHT JOIN, FULL OUTER JOIN) to achieve the desired result. For instance, you might want to list all items from one table but only matching items from subsequent joins.

Example 2: List all students and, if they are enrolled in any courses, list those courses and their departments.

We need all students, so we start with students and use LEFT JOIN for enrollments and courses. We also want the course department.

  • Start with students.
  • LEFT JOIN to enrollments (to keep all students, even those not enrolled).
  • LEFT JOIN to courses (to keep students enrolled in courses that might somehow be missing details, though less likely with foreign keys enforced. Also, conceptually keeps the “all students” focus).
  • LEFT JOIN to departments (to get the department name for the course).
SELECT
    s.student_name,
    c.course_name,
    d.dept_name AS course_department
FROM
    students AS s
LEFT JOIN
    enrollments AS e ON s.student_id = e.student_id
LEFT JOIN
    courses AS c ON e.course_id = c.course_id
LEFT JOIN
    departments AS d ON c.dept_id = d.dept_id -- Join based on the course's department
ORDER BY
    s.student_name,
    c.course_name;

Explanation:

  • students LEFT JOIN enrollments: Ensures all students (like ‘Eve Adams’) appear, even if they have no matching rows in enrollments. For these students, columns from enrollments (and subsequent joins) will be NULL.
  • ... LEFT JOIN courses: If a student has enrollments, we try to find the matching course.
  • ... LEFT JOIN departments: If we found a course, we try to find its department.

Observation: You will see ‘Eve Adams’ listed with NULL values for course_name and course_department. Students enrolled in courses will show the course name and the department that offers the course.

Example 3: List all departments and the names of instructors in those departments. Include departments even if they currently have no instructors assigned.

  • Start with departments.
  • LEFT JOIN to instructors.
SELECT
    d.dept_name,
    d.building,
    i.instructor_name
FROM
    departments AS d
LEFT JOIN
    instructors AS i ON d.dept_id = i.dept_id
ORDER BY
    d.dept_name,
    i.instructor_name;

Observation: All departments are listed. If a department has instructors, their names are listed. If a department has no instructors assigned in the instructors table (or if an instructor has dept_id = NULL, like Dr. Isaac Newton), that department will still appear, potentially with NULL for the instructor name, or the instructor might not be linked to any displayed department.


4. Self-Joins

A self-join is used to join a table to itself. This is useful when rows within a single table have a relationship with other rows in the same table. A classic example is an employee table where one column stores the ID of the employee’s manager, who is also an employee in the same table.

Key Requirement: You must use table aliases to distinguish between the two conceptual copies of the table being joined.

Example 4: List instructors and the name of the department they belong to. (Using a self-join on the department table is not typical here, but let’s imagine a scenario where we model prerequisites within the courses table).

Modification for a better self-join example: Let’s add a prerequisite_course_id to the courses table.

First, update the table and data (Run this in pgAdmin Query Tool):

-- Add a prerequisite column to courses
ALTER TABLE courses
ADD COLUMN prerequisite_course_id INT REFERENCES courses(course_id) NULL; -- Allow NULL for courses with no prerequisite

-- Update some courses to have prerequisites
UPDATE courses SET prerequisite_course_id = 1 WHERE course_name = 'Data Structures'; -- Data Structures requires Intro to Prog
UPDATE courses SET prerequisite_course_id = 3 WHERE course_name = 'Linear Algebra'; -- Linear Algebra requires Calculus I
UPDATE courses SET prerequisite_course_id = 1 WHERE course_name = 'Database Systems'; -- Database Systems requires Intro to Prog

Now, the self-join query: List courses and their prerequisites.

SELECT
    c1.course_name AS course,
    c2.course_name AS prerequisite
FROM
    courses AS c1 -- The main course
LEFT JOIN
    courses AS c2 ON c1.prerequisite_course_id = c2.course_id -- Join back to courses table for the prerequisite
ORDER BY
    c1.course_name;

Explanation:

  • We reference the courses table twice, aliasing it as c1 (representing the course) and c2 (representing the prerequisite).
  • The LEFT JOIN ensures that courses without prerequisites (where prerequisite_course_id is NULL) are still listed, showing NULL in the prerequisite column.
  • The ON condition links the prerequisite_course_id of the main course (c1) to the course_id of the prerequisite course (c2).

Observation: The output will show each course and, if it has one defined, the name of its prerequisite course.


5. Combining Joins with Aggregation and Grouping

Complex queries often involve combining joins with aggregate functions (COUNT, AVG, etc. - P10) and GROUP BY / HAVING clauses (P6).

Example 5: List departments and the number of courses offered by each. Only include departments that offer more than one course.

This requires joining departments and courses, then grouping by department and filtering the groups.

SELECT
    d.dept_name,
    COUNT(c.course_id) AS number_of_courses
FROM
    departments AS d
INNER JOIN -- Or LEFT JOIN if you want to see departments with 0 courses (before filtering)
    courses AS c ON d.dept_id = c.dept_id
GROUP BY
    d.dept_id, d.dept_name -- Group by primary key and any other selected non-aggregated columns
HAVING
    COUNT(c.course_id) > 1 -- Filter groups based on the aggregate count
ORDER BY
    number_of_courses DESC,
    d.dept_name;

Explanation:

  1. departments is joined with courses on dept_id. An INNER JOIN only includes departments that have at least one course. A LEFT JOIN would include all departments initially.
  2. GROUP BY d.dept_id, d.dept_name: Groups the rows so that all courses belonging to the same department are together. We group by dept_id (the unique identifier) and include dept_name because it’s in the SELECT list and isn’t an aggregate.
  3. COUNT(c.course_id): Counts the number of courses within each department group.
  4. HAVING COUNT(c.course_id) > 1: Filters out the groups (departments) where the course count is not greater than 1.
  5. ORDER BY: Sorts the final result.

Example 6: Find the average grade for each course.

This requires joining courses and enrollments, then grouping. Note: We need to handle letter grades numerically for averaging.

-- We need a way to convert grades to numerical values for AVG.
-- For simplicity here, we'll just count enrollments per course instead of averaging grades.
-- A true average grade would require a CASE statement or a separate mapping table (potentially beyond P11 scope simplicity).

-- Let's find the number of students enrolled in each course.
SELECT
    c.course_name,
    COUNT(e.student_id) AS number_of_students_enrolled
FROM
    courses AS c
LEFT JOIN -- Use LEFT JOIN to include courses with zero enrollments
    enrollments AS e ON c.course_id = e.course_id
GROUP BY
    c.course_id, c.course_name -- Group by course
ORDER BY
    number_of_students_enrolled DESC,
    c.course_name;

Observation: This query lists all courses and shows how many students are enrolled in each, including courses with 0 enrollments like ‘World History I’.


6. Best Practices for Complex Joins

  • Use Meaningful Aliases: e for employees, d for departments is better than t1, t2.
  • Format Your SQL: Use indentation and line breaks to make complex queries readable. Align JOIN, ON, WHERE clauses.
  • Understand Join Types: Choose INNER, LEFT, RIGHT, or FULL OUTER deliberately based on whether you need to preserve all rows from the left, right, or both tables, even when there’s no match.
  • Specify Join Columns: Always use the ON clause with the correct joining columns. Avoid the older comma-separated join syntax in the FROM clause.
  • Test Incrementally: Build complex queries step-by-step. Start with two tables, verify the result, then add the next join, and so on.