Combining Multiple Tables with JOINs in PostgreSQL

Objectives:

  1. Understand the need for JOINs in relational databases.
  2. Explain the different types of JOINs: INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN, and CROSS JOIN.
  3. Write SQL queries using each type of JOIN in PostgreSQL.
  4. Execute JOIN queries using pgAdmin 4.
  5. Interpret the results of JOIN queries and understand how data is combined from multiple tables.

1. Sample Database Setup

Let’s create a simple database with two tables to illustrate JOINs. We’ll use tables named students and courses.

Create the students table. Execute the following SQL command in the query editor:

CREATE TABLE students (
    student_id SERIAL PRIMARY KEY,
    student_name VARCHAR(100) NOT NULL,
    major VARCHAR(50)
);

Create the courses table. Execute the following SQL command in the query editor:

CREATE TABLE courses (
    course_id SERIAL PRIMARY KEY,
    course_name VARCHAR(100) NOT NULL,
    credits INTEGER
);

Populate the tables with sample data. Execute the following INSERT statements to add some data to our tables:

INSERT INTO students (student_name, major) VALUES
('KOMILJONOV AZIMJON HAMIDBEKOVICH', 'Computer Science'),
('OTAJONOV SHOHZODBEK QAXRAMON O‘G‘LI', 'Mathematics'),
('OTANAZAROV SULTONBEK O‘KTAM O‘G‘LI', 'Physics'),
('QURAMBOYEV DIYORBEK RASHID O‘G‘LI', 'Chemistry'),
('RAXMONBERGANOV XAYRULLO MAXMUD O‘G‘LI', 'Chemistry'),
('SHONAZAROV AZIZBEK XUSHNUDBEK O‘G‘LI', 'Biology');

INSERT INTO courses (course_name, credits) VALUES
('Introduction to Programming', 3),
('Calculus I', 4),
('Classical Mechanics', 3),
('Organic Chemistry', 4),
('Genetics', 3),
('Data Structures', 3);

Add a linking column to represent student enrollment. Execute the following ALTER TABLE command:

ALTER TABLE courses
ADD COLUMN student_id INTEGER REFERENCES students(student_id);

This command adds a new column student_id to the courses table. It’s of type INTEGER and establishes a foreign key relationship with the student_id column in the students table. This means that the student_id in the courses table must either be NULL or match an existing student_id in the students table.

Update the courses table to link courses to students.

Now, let’s update the courses table to indicate which student is enrolled in each course. Note that not all courses will be linked to a student in this example, demonstrating different JOIN behaviours later.

UPDATE courses SET student_id = 1 WHERE course_id IN (1, 6);
UPDATE courses SET student_id = 2 WHERE course_id IN (2); 
UPDATE courses SET student_id = 3 WHERE course_id IN (3); 
UPDATE courses SET student_id = 4 WHERE course_id IN (4); 

We now have two tables, students and courses, with a relationship established through the student_id column in the courses table. Let’s explore different types of JOINs to combine data from these tables.

2. Types of JOINs

There are several types of JOINs in SQL, each designed for different scenarios of combining data. We’ll cover the most common ones:

  • INNER JOIN: Returns rows only when there is a match in both tables based on the join condition.
  • LEFT JOIN (or LEFT OUTER JOIN): Returns all rows from the left table and the matching rows from the right table. If there’s no match in the right table, it returns NULL values for the columns from the right table.
  • RIGHT JOIN (or RIGHT OUTER JOIN): Returns all rows from the right table and the matching rows from the left table. If there’s no match in the left table, it returns NULL values for the columns from the left table.
  • FULL OUTER JOIN: Returns all rows from both tables. If there’s a match, rows are combined. If there’s no match in one table, NULL values are used for the columns from the other table.
  • CROSS JOIN: Returns the Cartesian product of all rows from both tables. It combines every row from the first table with every row from the second table.

Let’s examine each type with examples using our students and courses tables.

INNER JOIN Purpose: To retrieve rows where there is a match in both tables based on a specified condition. Syntax:

SELECT column1, column2, ...
FROM table1
INNER JOIN table2 ON join_condition;
  • table1 and table2 are the tables you want to join.
  • join_condition specifies how the tables should be related. It’s usually based on a common column between the tables. In our case, it will be students.student_id = courses.student_id.

Example: Let’s find the names of students and the courses they are enrolled in. Execute the following query in pgAdmin 4:

SELECT students.student_name, courses.course_name
FROM students
INNER JOIN courses ON students.student_id = courses.student_id;

Explanation:

  • We are selecting student_name from the students table and course_name from the courses table.
  • FROM students INNER JOIN courses: We are specifying that we want to join the students and courses tables using an INNER JOIN.
  • ON students.student_id = courses.student_id: This is the join condition. It specifies that we want to match rows where the student_id in the students table is equal to the student_id in the courses table.

Result: You will see a table like this in the “Data output” pane of pgAdmin 4:

Interpretation: The INNER JOIN only returned rows where there was a matching student_id in both the students and courses tables. Courses like ‘Genetics’ which are not associated with any student (i.e., student_id is NULL or not matching) and students like ‘RAXMONBERGANOV XAYRULLO MAXMUD O‘G‘LI’ who are not explicitly linked to courses in our current setup are not included in the result.

LEFT JOIN (LEFT OUTER JOIN) Purpose: To retrieve all rows from the left table and matching rows from the right table. If there is no match in the right table, it still includes the row from the left table, but with NULL values for the columns from the right table. Syntax:

SELECT column1, column2, ...
FROM table1
LEFT JOIN table2 ON join_condition;
  • table1 is considered the left table, and table2 is the right table in a LEFT JOIN.

Example: Let’s list all students and, if they are enrolled in a course, show the course name. Execute the following query in pgAdmin 4:

SELECT students.student_name, courses.course_name
FROM students
LEFT JOIN courses ON students.student_id = courses.student_id;

Explanation:

  • We are using LEFT JOIN with students as the left table and courses as the right table.
  • The query will return all rows from the students table.
  • For each student, if there’s a matching student_id in the courses table, it will also retrieve the course_name.
  • If a student is not enrolled in any course (according to our data), the course_name will be NULL.

Result:

Interpretation: Notice that ‘RAXMONBERGANOV XAYRULLO MAXMUD O‘G‘LI’ and ‘SHONAZAROV AZIZBEK XUSHNUDBEK O‘G‘LI’ are included in the result, even though she is not associated with any course in our courses table (no matching student_id). The course_name is NULL, indicating no course is linked for this student based on the join condition.

RIGHT JOIN (RIGHT OUTER JOIN) Purpose: To retrieve all rows from the right table and matching rows from the left table. If there is no match in the left table, it still includes the row from the right table, but with NULL values for the columns from the left table. Syntax:

SELECT column1, column2, ...
FROM table1
RIGHT JOIN table2 ON join_condition;
  • table1 is the left table, and table2 is the right table in a RIGHT JOIN.

Example: Let’s list all courses and, if a student is enrolled, show the student’s name. Execute the following query in pgAdmin 4:

SELECT students.student_name, courses.course_name
FROM students
RIGHT JOIN courses ON students.student_id = courses.student_id;

Explanation:

  • We are using RIGHT JOIN with students as the left table and courses as the right table.
  • The query will return all rows from the courses table.
  • For each course, if there’s a matching student_id in the students table, it will also retrieve the student_name.
  • If a course is not associated with any student (although in our example, all courses are linked in some way, let’s imagine if ‘Genetics’ was not linked), the student_name would be NULL.

Result:

Interpretation: ‘Genetics’ is included in the result, even though we haven’t explicitly linked it to a student in our students table for this demonstration. Since our current data links all courses to some student, the practical difference between RIGHT JOIN and INNER JOIN in this specific dataset is that RIGHT JOIN will include courses even if they hypothetically weren’t linked to any student (which is not the case in our current data setup, but conceptually important). If ‘Genetics’ had student_id as NULL or an ID not in students, and we wanted to see it in the result, RIGHT JOIN would ensure it appears.

FULL OUTER JOIN Purpose: To retrieve all rows from both the left and right tables. For matching rows, columns are combined. For non-matching rows from the left table, right table columns will be NULL, and for non-matching rows from the right table, left table columns will be NULL. Syntax:

SELECT column1, column2, ...
FROM table1
FULL OUTER JOIN table2 ON join_condition;

Example: Let’s see all students and all courses, showing matches where they exist and NULL where they don’t. Execute the following query in pgAdmin 4:

SELECT students.student_name, courses.course_name
FROM students
FULL OUTER JOIN courses ON students.student_id = courses.student_id;

Explanation:

  • FULL OUTER JOIN combines the effects of LEFT JOIN and RIGHT JOIN.
  • It will include all students and all courses.
  • Where there’s a match based on student_id, the rows will be combined.
  • For students without a matching course (in our current setup, all students are linked to at least one course), and for courses without a matching student (again, in our setup, all courses are linked), you would see NULL values in the columns from the table without a match.

Result:

Interpretation: This result shows all students and all courses. ‘SHONAZAROV AZIZBEK XUSHNUDBEK O‘G‘LI’, who is not linked to a course, appears with course_name as NULL. ‘Genetics’, which we can imagine might not be linked to a student in some scenarios, appears with student_name as NULL. FULL OUTER JOIN ensures that no row from either table is lost in the result.

CROSS JOIN Purpose: To perform a Cartesian product of the two tables. This means every row from the first table is combined with every row from the second table. It’s generally used less frequently than other JOIN types and can result in very large result sets if tables are large. Syntax:

SELECT column1, column2, ...
FROM table1
CROSS JOIN table2;

Note: CROSS JOIN does not use an ON condition because it combines every row with every other row, regardless of any relationship. Example: Let’s see all possible combinations of students and courses. Execute the following query in pgAdmin 4:

SELECT students.student_name, courses.course_name
FROM students
CROSS JOIN courses;

Explanation:

  • CROSS JOIN combines each student with every course.
  • If you have 5 students and 6 courses, the result will have 6 * 6 = 36 rows. Result: (Only a partial result is shown due to length)

Interpretation: The result is a list of every possible pairing of a student with a course. CROSS JOIN is useful in specific situations, such as generating all possible combinations for testing or analysis, but it’s crucial to understand its behaviour to avoid unintentionally creating extremely large result sets.

3. Aliases for Tables

When writing JOIN queries, especially with multiple tables or when referring to the same table multiple times (though not covered in P8), it’s good practice to use aliases for table names to make queries shorter and more readable.

Example (using INNER JOIN with aliases):

SELECT s.student_name, c.course_name
FROM students AS s
INNER JOIN courses AS c ON s.student_id = c.student_id;

In this example, s is an alias for students, and c is an alias for courses. You can use these aliases throughout the query instead of the full table names.

4. WHERE Clause with JOINs

You can combine JOINs with WHERE clauses to further filter the results. The WHERE clause is applied after the JOIN operation has been performed.

Example: Find the names of Computer Science students and the courses they are enrolled in.

SELECT s.student_name, c.course_name
FROM students AS s
INNER JOIN courses AS c ON s.student_id = c.student_id
WHERE s.major = 'Computer Science';

Explanation:

  • This query first performs an INNER JOIN between students and courses.
  • Then, the WHERE clause filters the result to only include rows where the major in the students table is ‘Computer Science’.

5. Using COALESCE() to Handle NULL Values in JOIN Results

When using LEFT JOIN, RIGHT JOIN, or FULL OUTER JOIN, you might encounter NULL values in your result set, especially for columns from the table that might not have a match. The COALESCE() function is very useful for replacing these NULL values with more meaningful or user-friendly values.

Purpose of COALESCE(): The COALESCE() function returns the first non-NULL expression in a list of expressions. It takes multiple arguments and evaluates them in order until it finds the first one that is not NULL, and then returns that value. If all arguments are NULL, COALESCE() will return NULL.

Syntax:

COALESCE(expression1, expression2, expression3, ...);

Example with LEFT JOIN: In our LEFT JOIN example, we displayed all students and their enrolled courses. For students not enrolled in any course, the course_name was shown as NULL. Let’s use COALESCE() to replace this NULL with the text ‘Not enrolled in any course’.

Execute the following query in pgAdmin 4:

SELECT s.student_name, COALESCE(c.course_name, 'Not enrolled in any course') AS course_name
FROM students AS s
LEFT JOIN courses AS c ON s.student_id = c.student_id;

Explanation:

  • COALESCE(c.course_name, 'Not enrolled in any course'): This part of the query checks the c.course_name (which is courses.course_name due to alias c).
    • If c.course_name is not NULL (meaning the student is enrolled in a course), COALESCE() will return the actual course_name.
    • If c.course_name is NULL (meaning the student is not enrolled in a course in our courses table), COALESCE() will return the second argument, which is the string 'Not enrolled in any course'.
  • AS course_name: We are using an alias course_name for the result of the COALESCE() function, so the output column will still be named course_name.

Result:

Interpretation: Now, instead of seeing NULL for students who are not enrolled in a course, you see ‘Not enrolled in any course’. This makes the output more readable and informative. COALESCE() is a powerful function for handling NULL values and providing default values in your query results, especially when working with JOINs that can introduce NULL values. You can use COALESCE() with more than two expressions to check for multiple potential NULL values and provide a fallback option.