Databases | Practical class 7
Combining Multiple Tables with JOINs in PostgreSQL
Objectives:
- Understand the need for JOINs in relational databases.
- Explain the different types of JOINs:
INNER JOIN
,LEFT JOIN
,RIGHT JOIN
,FULL OUTER JOIN
, andCROSS JOIN
. - Write SQL queries using each type of JOIN in PostgreSQL.
- Execute JOIN queries using pgAdmin 4.
- Interpret the results of JOIN queries and understand how data is combined from multiple tables.
1. Sample Database Setup
Let’s create a simple database with two tables to illustrate JOINs. We’ll use tables named students
and courses
.
Create the students
table.
Execute the following SQL command in the query editor:
CREATE TABLE students (
student_id SERIAL PRIMARY KEY,
student_name VARCHAR(100) NOT NULL,
major VARCHAR(50)
);
Create the courses
table.
Execute the following SQL command in the query editor:
CREATE TABLE courses (
course_id SERIAL PRIMARY KEY,
course_name VARCHAR(100) NOT NULL,
credits INTEGER
);
Populate the tables with sample data.
Execute the following INSERT
statements to add some data to our tables:
INSERT INTO students (student_name, major) VALUES
('KOMILJONOV AZIMJON HAMIDBEKOVICH', 'Computer Science'),
('OTAJONOV SHOHZODBEK QAXRAMON O‘G‘LI', 'Mathematics'),
('OTANAZAROV SULTONBEK O‘KTAM O‘G‘LI', 'Physics'),
('QURAMBOYEV DIYORBEK RASHID O‘G‘LI', 'Chemistry'),
('RAXMONBERGANOV XAYRULLO MAXMUD O‘G‘LI', 'Chemistry'),
('SHONAZAROV AZIZBEK XUSHNUDBEK O‘G‘LI', 'Biology');
INSERT INTO courses (course_name, credits) VALUES
('Introduction to Programming', 3),
('Calculus I', 4),
('Classical Mechanics', 3),
('Organic Chemistry', 4),
('Genetics', 3),
('Data Structures', 3);
Add a linking column to represent student enrollment.
Execute the following ALTER TABLE
command:
ALTER TABLE courses
ADD COLUMN student_id INTEGER REFERENCES students(student_id);
This command adds a new column student_id
to the courses
table. It’s of type INTEGER
and establishes a foreign key relationship with the student_id
column in the students
table. This means that the student_id
in the courses
table must either be NULL
or match an existing student_id
in the students
table.
Update the courses
table to link courses to students.
Now, let’s update the courses
table to indicate which student is enrolled in each course. Note that not all courses will be linked to a student in this example, demonstrating different JOIN behaviours later.
UPDATE courses SET student_id = 1 WHERE course_id IN (1, 6);
UPDATE courses SET student_id = 2 WHERE course_id IN (2);
UPDATE courses SET student_id = 3 WHERE course_id IN (3);
UPDATE courses SET student_id = 4 WHERE course_id IN (4);
We now have two tables, students
and courses
, with a relationship established through the student_id
column in the courses
table. Let’s explore different types of JOINs to combine data from these tables.
2. Types of JOINs
There are several types of JOINs in SQL, each designed for different scenarios of combining data. We’ll cover the most common ones:
- INNER JOIN: Returns rows only when there is a match in both tables based on the join condition.
- LEFT JOIN (or LEFT OUTER JOIN): Returns all rows from the left table and the matching rows from the right table. If there’s no match in the right table, it returns
NULL
values for the columns from the right table. - RIGHT JOIN (or RIGHT OUTER JOIN): Returns all rows from the right table and the matching rows from the left table. If there’s no match in the left table, it returns
NULL
values for the columns from the left table. - FULL OUTER JOIN: Returns all rows from both tables. If there’s a match, rows are combined. If there’s no match in one table,
NULL
values are used for the columns from the other table. - CROSS JOIN: Returns the Cartesian product of all rows from both tables. It combines every row from the first table with every row from the second table.
Let’s examine each type with examples using our students
and courses
tables.
INNER JOIN Purpose: To retrieve rows where there is a match in both tables based on a specified condition. Syntax:
SELECT column1, column2, ...
FROM table1
INNER JOIN table2 ON join_condition;
table1
andtable2
are the tables you want to join.join_condition
specifies how the tables should be related. It’s usually based on a common column between the tables. In our case, it will bestudents.student_id = courses.student_id
.
Example: Let’s find the names of students and the courses they are enrolled in. Execute the following query in pgAdmin 4:
SELECT students.student_name, courses.course_name
FROM students
INNER JOIN courses ON students.student_id = courses.student_id;
Explanation:
- We are selecting
student_name
from thestudents
table andcourse_name
from thecourses
table. FROM students INNER JOIN courses
: We are specifying that we want to join thestudents
andcourses
tables using anINNER JOIN
.ON students.student_id = courses.student_id
: This is the join condition. It specifies that we want to match rows where thestudent_id
in thestudents
table is equal to thestudent_id
in thecourses
table.
Result: You will see a table like this in the “Data output” pane of pgAdmin 4:
Interpretation: The INNER JOIN
only returned rows where there was a matching student_id
in both the students
and courses
tables. Courses like ‘Genetics’ which are not associated with any student (i.e., student_id is NULL or not matching) and students like ‘RAXMONBERGANOV XAYRULLO MAXMUD O‘G‘LI’ who are not explicitly linked to courses in our current setup are not included in the result.
LEFT JOIN (LEFT OUTER JOIN)
Purpose: To retrieve all rows from the left table and matching rows from the right table. If there is no match in the right table, it still includes the row from the left table, but with NULL
values for the columns from the right table.
Syntax:
SELECT column1, column2, ...
FROM table1
LEFT JOIN table2 ON join_condition;
table1
is considered the left table, andtable2
is the right table in aLEFT JOIN
.
Example: Let’s list all students and, if they are enrolled in a course, show the course name. Execute the following query in pgAdmin 4:
SELECT students.student_name, courses.course_name
FROM students
LEFT JOIN courses ON students.student_id = courses.student_id;
Explanation:
- We are using
LEFT JOIN
withstudents
as the left table andcourses
as the right table. - The query will return all rows from the
students
table. - For each student, if there’s a matching
student_id
in thecourses
table, it will also retrieve thecourse_name
. - If a student is not enrolled in any course (according to our data), the
course_name
will beNULL
.
Result:
Interpretation: Notice that ‘RAXMONBERGANOV XAYRULLO MAXMUD O‘G‘LI’ and ‘SHONAZAROV AZIZBEK XUSHNUDBEK O‘G‘LI’ are included in the result, even though she is not associated with any course in our courses
table (no matching student_id
). The course_name
is NULL
, indicating no course is linked for this student based on the join condition.
RIGHT JOIN (RIGHT OUTER JOIN)
Purpose: To retrieve all rows from the right table and matching rows from the left table. If there is no match in the left table, it still includes the row from the right table, but with NULL
values for the columns from the left table.
Syntax:
SELECT column1, column2, ...
FROM table1
RIGHT JOIN table2 ON join_condition;
table1
is the left table, andtable2
is the right table in aRIGHT JOIN
.
Example: Let’s list all courses and, if a student is enrolled, show the student’s name. Execute the following query in pgAdmin 4:
SELECT students.student_name, courses.course_name
FROM students
RIGHT JOIN courses ON students.student_id = courses.student_id;
Explanation:
- We are using
RIGHT JOIN
withstudents
as the left table andcourses
as the right table. - The query will return all rows from the
courses
table. - For each course, if there’s a matching
student_id
in thestudents
table, it will also retrieve thestudent_name
. - If a course is not associated with any student (although in our example, all courses are linked in some way, let’s imagine if ‘Genetics’ was not linked), the
student_name
would beNULL
.
Result:
Interpretation: ‘Genetics’ is included in the result, even though we haven’t explicitly linked it to a student in our students
table for this demonstration. Since our current data links all courses to some student, the practical difference between RIGHT JOIN
and INNER JOIN
in this specific dataset is that RIGHT JOIN
will include courses even if they hypothetically weren’t linked to any student (which is not the case in our current data setup, but conceptually important). If ‘Genetics’ had student_id
as NULL
or an ID not in students
, and we wanted to see it in the result, RIGHT JOIN
would ensure it appears.
FULL OUTER JOIN
Purpose: To retrieve all rows from both the left and right tables. For matching rows, columns are combined. For non-matching rows from the left table, right table columns will be NULL
, and for non-matching rows from the right table, left table columns will be NULL
.
Syntax:
SELECT column1, column2, ...
FROM table1
FULL OUTER JOIN table2 ON join_condition;
Example: Let’s see all students and all courses, showing matches where they exist and NULL
where they don’t.
Execute the following query in pgAdmin 4:
SELECT students.student_name, courses.course_name
FROM students
FULL OUTER JOIN courses ON students.student_id = courses.student_id;
Explanation:
FULL OUTER JOIN
combines the effects ofLEFT JOIN
andRIGHT JOIN
.- It will include all students and all courses.
- Where there’s a match based on
student_id
, the rows will be combined. - For students without a matching course (in our current setup, all students are linked to at least one course), and for courses without a matching student (again, in our setup, all courses are linked), you would see
NULL
values in the columns from the table without a match.
Result:
Interpretation: This result shows all students and all courses. ‘SHONAZAROV AZIZBEK XUSHNUDBEK O‘G‘LI’, who is not linked to a course, appears with course_name
as NULL
. ‘Genetics’, which we can imagine might not be linked to a student in some scenarios, appears with student_name
as NULL
. FULL OUTER JOIN
ensures that no row from either table is lost in the result.
CROSS JOIN Purpose: To perform a Cartesian product of the two tables. This means every row from the first table is combined with every row from the second table. It’s generally used less frequently than other JOIN types and can result in very large result sets if tables are large. Syntax:
SELECT column1, column2, ...
FROM table1
CROSS JOIN table2;
Note: CROSS JOIN
does not use an ON
condition because it combines every row with every other row, regardless of any relationship.
Example: Let’s see all possible combinations of students and courses.
Execute the following query in pgAdmin 4:
SELECT students.student_name, courses.course_name
FROM students
CROSS JOIN courses;
Explanation:
CROSS JOIN
combines each student with every course.- If you have 5 students and 6 courses, the result will have 6 * 6 = 36 rows. Result: (Only a partial result is shown due to length)
Interpretation: The result is a list of every possible pairing of a student with a course. CROSS JOIN
is useful in specific situations, such as generating all possible combinations for testing or analysis, but it’s crucial to understand its behaviour to avoid unintentionally creating extremely large result sets.
3. Aliases for Tables
When writing JOIN queries, especially with multiple tables or when referring to the same table multiple times (though not covered in P8), it’s good practice to use aliases for table names to make queries shorter and more readable.
Example (using INNER JOIN with aliases):
SELECT s.student_name, c.course_name
FROM students AS s
INNER JOIN courses AS c ON s.student_id = c.student_id;
In this example, s
is an alias for students
, and c
is an alias for courses
. You can use these aliases throughout the query instead of the full table names.
4. WHERE
Clause with JOINs
You can combine JOINs with WHERE
clauses to further filter the results. The WHERE
clause is applied after the JOIN operation has been performed.
Example: Find the names of Computer Science students and the courses they are enrolled in.
SELECT s.student_name, c.course_name
FROM students AS s
INNER JOIN courses AS c ON s.student_id = c.student_id
WHERE s.major = 'Computer Science';
Explanation:
- This query first performs an
INNER JOIN
betweenstudents
andcourses
. - Then, the
WHERE
clause filters the result to only include rows where themajor
in thestudents
table is ‘Computer Science’.
5. Using COALESCE()
to Handle NULL Values in JOIN Results
When using LEFT JOIN
, RIGHT JOIN
, or FULL OUTER JOIN
, you might encounter NULL
values in your result set, especially for columns from the table that might not have a match. The COALESCE()
function is very useful for replacing these NULL
values with more meaningful or user-friendly values.
Purpose of COALESCE()
:
The COALESCE()
function returns the first non-NULL expression in a list of expressions. It takes multiple arguments and evaluates them in order until it finds the first one that is not NULL
, and then returns that value. If all arguments are NULL
, COALESCE()
will return NULL
.
Syntax:
COALESCE(expression1, expression2, expression3, ...);
Example with LEFT JOIN
:
In our LEFT JOIN
example, we displayed all students and their enrolled courses. For students not enrolled in any course, the course_name
was shown as NULL
. Let’s use COALESCE()
to replace this NULL
with the text ‘Not enrolled in any course’.
Execute the following query in pgAdmin 4:
SELECT s.student_name, COALESCE(c.course_name, 'Not enrolled in any course') AS course_name
FROM students AS s
LEFT JOIN courses AS c ON s.student_id = c.student_id;
Explanation:
COALESCE(c.course_name, 'Not enrolled in any course')
: This part of the query checks thec.course_name
(which iscourses.course_name
due to aliasc
).- If
c.course_name
is notNULL
(meaning the student is enrolled in a course),COALESCE()
will return the actualcourse_name
. - If
c.course_name
isNULL
(meaning the student is not enrolled in a course in ourcourses
table),COALESCE()
will return the second argument, which is the string'Not enrolled in any course'
.
- If
AS course_name
: We are using an aliascourse_name
for the result of theCOALESCE()
function, so the output column will still be namedcourse_name
.
Result:
Interpretation:
Now, instead of seeing NULL
for students who are not enrolled in a course, you see ‘Not enrolled in any course’. This makes the output more readable and informative. COALESCE()
is a powerful function for handling NULL
values and providing default values in your query results, especially when working with JOINs that can introduce NULL
values. You can use COALESCE()
with more than two expressions to check for multiple potential NULL
values and provide a fallback option.