Databases Midterm Exam 2

Concepts

question: Explain the difference between the VARCHAR(10) and CHAR(10) data types. When might CHAR be used?
answer: VARCHAR(10) stores variable-length strings up to a maximum of 10 characters, using only the space needed for the actual string. CHAR(10) stores fixed-length strings of exactly 10 characters; shorter strings are padded with spaces, and it always uses the full 10 characters of storage. CHAR is typically used for short, fixed-length codes like country codes (CHAR(2)) or state abbreviations (CHAR(2)).

question: Describe the two-step execution order when a subquery is used inside the WHERE clause of an outer query.
answer: Inner Query Execution: The database system first executes the subquery (inside the parentheses) which produces a result set. Outer Query Execution: The database system then executes the outer query, using the result set from the subquery in its WHERE clause (or other clauses) as a filter or condition.

question: Explain the purpose of the ORDER BY clause in SQL. What are the two sorting direction, and which one is the default if not specified?
answer: The ORDER BY clause is used to sort the results of a query based on one or more columns. The two sorting options mentioned are ASC (ascending order, A-Z, smallest to largest), which is the default, and DESC (descending order, Z-A, largest to smallest).

question: What is the main purpose of the GROUP BY clause, and with which type of SQL function is it most powerfully used?
answer: The GROUP BY clause groups rows that have the same values in specified columns into summary rows. It is most powerfully used with Aggregate Functions, allowing calculations to be performed per group rather than on the entire table.

question: Define what a database transaction is. Explain its primary purpose.
answer: According to the lecture, a transaction is a sequence of one or more database operations (like INSERT, UPDATE, DELETE) treated as a single, logical unit of work. Its primary purpose is to ensure data integrity and maintain consistency by guaranteeing that related changes occur together or not at all, preventing partial updates.

question: List the four properties represented by the acronym ACID. What is the overall goal of these properties?
answer: The four properties represented by ACID are: Atomicity, Consistency, Isolation, Durability, The overall goal of these properties is to ensure that database transactions are processed reliably, even in the presence of errors, power failures, or concurrent access.

question: Explain the Consistency property (“C” in ACID). How does it relate to database rules like constraints (e.g., CHECK, NOT NULL)?
answer: Consistency ensures that a transaction transitions the database from one valid state to another consistent state. It preserves all defined database rules (constraints) such as NOT NULL, UNIQUE, CHECK constraints (like Balance >= 0), and Foreign Key relationships. If a transaction attempts an operation that would violate a constraint, it violates consistency and must be rolled back.

question: What is the primary method recommended for controlling the order of operations in complex SQL expressions, and why is it beneficial even if the default precedence seems correct?
answer: The primary method is using parentheses (). It’s beneficial because it explicitly enforces the intended sequence, overrides default operator precedence, and significantly improves the readability and clarity of the code, reducing potential ambiguity and errors.

question: Why is a single, centralized computer generally insufficient for handling the requirements of large-scale internet services like Google Search or Amazon? List two distinct reasons mentioned.
answer: A single computer cannot handle the (1) massive data storage requirements and (2) the high processing load (billions of queries/transactions) and/or (3) the high availability requirements of these large-scale services. This necessitates distributing data and processing.

question: Limitations of centralized databases. Explain the concept of “Vertical Scaling” and state one major drawback associated with it.
answer: Vertical Scaling involves upgrading the central server (more CPU, RAM, faster disks). Drawbacks include physical limits (how big one machine can get) and becoming disproportionately expensive at the high end.

question: What are the key differences between a static website and a dynamic web application in terms of content generation and the need for a database?
answer: Static websites have pre-written, fixed HTML content served directly by the web server, the same for all users, and updates require manual HTML editing. They typically don’t require a database. Dynamic web applications generate content “on-the-fly” using server-side logic and database interaction, allowing for personalized, interactive, and real-time data experiences. Databases are essential for dynamic applications to manage persistent data.

question: Why is Scalability considered a crucial database design consideration for web applications? Name one technique mentioned for achieving horizontal scaling.
answer: Scalability is crucial because web applications often face massive and unpredictable user loads (e.g., viral content, sales events). The database must be designed to handle this growth. Designing for horizontal scaling (adding more machines) is prioritized over solely vertical scaling. Techniques mentioned for horizontal scaling include Replication or Sharding (Partitioning).

question: Explain why Concurrency is a significant challenge for databases supporting web applications, and list one type of database mechanism used to manage it.
answer: Concurrency is a challenge because a high number of web users may attempt to access and modify the same data simultaneously (e.g., booking the last seat, multiple users commenting). This requires mechanisms to manage simultaneous access and prevent data corruption. Database mechanisms mentioned include utilizing Concurrency Control Mechanisms like Locking (e.g., row-level) or setting appropriate Transaction Isolation Levels.

question: What is the fundamental problem that Database Drivers and Database APIs aim to solve when connecting application code to a database?
answer: The problem is how application code (written in various languages like Python, Java, PHP) can communicate reliably and effectively with different database management systems (like PostgreSQL, MySQL, Oracle), each having its own specific protocols and nuances. Applications need a standardized way to interact regardless of the underlying database.

question: What is SQL Injection (SQLi), and what coding practice makes applications vulnerable to it?
answer: SQL Injection is an attack where malicious SQL code is inserted into application input, tricking the database into executing unintended commands. The coding practice that makes applications vulnerable is dynamically constructing SQL queries by concatenating strings that include unvalidated user input (e.g., sql = "SELECT * FROM users WHERE name = '" + userName + "'";).

question: What is an SQL Injection attack, and which part of the system is primarily responsible for the vulnerability?
answer: SQL Injection is an attack where malicious SQL code is inserted into application input fields. If the application improperly constructs SQL queries using this input, the malicious code gets executed by the database. The lecture notes this is primarily an application-level vulnerability.

question: What is Role-Based Access Control (RBAC) and why is it beneficial compared to assigning privileges directly?
answer: RBAC involves grouping related privileges into logical ROLEs and assigning users to these roles. It’s beneficial because it simplifies administration and improves consistency compared to assigning many individual privileges directly to each user.

question: What is the primary purpose of a database index, and what real-world analogy was used to illustrate this?
answer: The primary purpose of a database index is to speed up data retrieval operations (primarily SELECT queries) by providing faster lookup paths. Real-world analogies used were a book index, a library card catalog, or a phone book index, all of which allow direct access to information without a full sequential scan.

question: What is the main difference in supported operations between a Hash index and a B-Tree index in PostgreSQL?
answer: A Hash index primarily supports only equality (=) operations efficiently. A B-Tree index supports a wider range of operations, including equality (=), range queries (<, >, BETWEEN), and sorting (ORDER BY).

question: For what kind of data values are GIN (Generalized Inverted Index) indexes particularly well-suited? Give one specific use case example mentioned.
answer: GIN indexes are well-suited for indexing composite values (values containing multiple components). Use case examples mentioned include: searching elements within arrays, querying keys/values within JSONB, or full-text search.

question: What is a key condition required for a BRIN (Block Range Index) to be highly effective?
answer: A BRIN index is highly effective only when the indexed column’s values have a strong physical correlation with the row storage order in the table (e.g., data naturally ordered on insertion like an auto-incrementing ID or timestamp).

Query

question: Write a SQL query to display the names of students (student_name) and the names of the courses (course_name) they are enrolled in. Only include students who are matched with a course in the courses table. Use table aliases s for students and c for courses.

answer:

SELECT s.student_name, c.course_name
FROM students AS s
INNER JOIN courses AS c ON s.student_id = c.student_id;

question: Write a SQL query to list all student names (student_name). For each student, also list the course_name they are enrolled in. If a student is not associated with any course in the courses table, their name should still appear, but the course_name should be NULL. Use table aliases.

answer:

SELECT s.student_name, c.course_name
FROM students AS s
LEFT JOIN courses AS c ON s.student_id = c.student_id;

question: Write a SQL query to list all course names (course_name). For each course, also list the student_name of the student enrolled in it. If a course is not associated with any student in the students table (based on courses.student_id), the course name should still appear, but the student_name should be NULL. Use table aliases.

answer:

SELECT s.student_name, c.course_name
FROM students AS s
RIGHT JOIN courses AS c ON s.student_id = c.student_id;

question: Write a SQL query to find the student_name and course_name for all courses associated with students whose major is ‘Physics’. Use an INNER JOIN and table aliases.

answer:

SELECT s.student_name, c.course_name
FROM students AS s
INNER JOIN courses AS c ON s.student_id = c.student_id
WHERE s.major = 'Physics';

question: Write a SQL query that lists all student names (student_name). For each student, display the course_name they are enrolled in. If a student is not enrolled in any course, display the text ‘Not Enrolled’ instead of NULL for the course name. Use the COALESCE function and table aliases.

answer:

SELECT s.student_name, COALESCE(c.course_name, 'Not Enrolled') AS course_name
FROM students AS s
LEFT JOIN courses AS c ON s.student_id = c.student_id;

question: Write an SQL query using the employees table to display the first_name and last_name of each employee, concatenated together with a single space in between. Use the PostgreSQL || operator and alias the resulting column as full_name.

answer:

SELECT first_name, last_name, first_name || ' ' || last_name AS full_name
FROM employees;

question: Write an SQL query using the products table to display the product_name in all uppercase letters. Alias the resulting column as upper_product_name.

answer:

SELECT UPPER(product_name) AS upper_product_name
FROM products;

question: The products table has a product_code column which might contain leading or trailing spaces (e.g., ‘ LP15-001 ‘). Write an SQL query to display the product_code for all products after removing any leading and trailing whitespace. Alias the result as trimmed_code.

answer:

SELECT product_code, TRIM(product_code) AS trimmed_code
FROM products;

question: Write an SQL query using the products table to display the product_name and price, with the price rounded to one decimal place. Alias the rounded price column as rounded_price.

answer:

SELECT product_name, price, ROUND(price, 1) AS rounded_price
FROM products;

question: Write an SQL query using the employees table to extract and display the year of hiring for each employee from their hire_date. Alias the year column as hire_year.

answer:

SELECT hire_date, EXTRACT(YEAR FROM hire_date) AS hire_year
FROM employees;

question: Write an SQL query using the employees table to calculate the date that is exactly 90 days after each employee’s hire_date. Alias this calculated date column as review_date.

answer:

SELECT hire_date, hire_date + INTERVAL '90 day' AS review_date
FROM employees;

question: Write an SQL query using the employees table to display the first_name, middle_name, and last_name. If an employee’s middle_name is NULL, display the text ‘None’ instead.

answer:

SELECT first_name, COALESCE(middle_name, 'None') AS middle_name, last_name
FROM employees;

question: Write an SQL query using the products table and a CASE WHEN expression to categorize products based on their price. Display the product_name, price, and a price_category column with values: ‘Budget’ (if price < 50.00), ‘Mid-Range’ (if price >= 50.00 AND price < 200.00), or ‘Premium’ (otherwise).

answer:

SELECT product_name, price,
 CASE
 WHEN price < 50.00 THEN 'Budget'
 WHEN price >= 50.00 AND price < 200.00 THEN 'Mid-Range'
 ELSE 'Premium'
 END AS price_category
FROM products;

question: Write an SQL query using the employees table to find the first_name, last_name, and hire_date of all employees hired in December (Month = 12) of any year. Use a function in the WHERE clause.

answer:

SELECT first_name, last_name, hire_date
FROM employees
WHERE EXTRACT(MONTH FROM hire_date) = 12;

question: Write an SQL query using the employees table to display the first_name and last_name, ordered by the length of the employee’s last name in descending order (longest last name first). Use a function in the ORDER BY clause.

answer:

SELECT first_name, last_name
FROM employees
ORDER BY LENGTH(last_name) DESC;

question: Write an SQL query to count the total number of employees listed in the employees table. Use the alias total_employees for the count.

answer:

SELECT COUNT(*) AS total_employees
FROM employees;

question: Write an SQL query to find the sum of all salaries for employees working in the ‘Engineering’ department (department_id = 2). Use the alias engineering_salary_total.

answer:

SELECT SUM(salary) AS engineering_salary_total
FROM employees
WHERE department_id = 2;

question: Write an SQL query to find the earliest (minimum) hire_date among all employees. Use the alias earliest_hire_date.

answer:

SELECT MIN(hire_date) AS earliest_hire_date
FROM employees;

question: Write an SQL query to calculate the average salary specifically for employees with the job title ‘Software Engineer’. Use the alias avg_salary_software_engineer.

answer:

SELECT AVG(salary) AS avg_salary_software_engineer
FROM employees
WHERE job_title = 'Software Engineer';

question: Write an SQL query to count the number of employees in each department. Only include employees who are assigned to a department (department_id IS NOT NULL). Display the department_id and the count (aliased as num_employees). Order the results by department_id.

answer:

SELECT
department_id,
COUNT(*) AS num_employees
FROM employees
WHERE department_id IS NOT NULL
GROUP BY department_id
ORDER BY department_id;

question: Write an SQL query to find the minimum and maximum salary for each department_id. Only consider employees assigned to a department. Display the department_id, the minimum salary (aliased as min_salary_in_dept), and the maximum salary (aliased as max_salary_in_dept).

answer:

SELECT
department_id,
MIN(salary) AS min_salary_in_dept,
MAX(salary) AS max_salary_in_dept
FROM employees
WHERE department_id IS NOT NULL
GROUP BY department_id
ORDER BY department_id;

question: Write an SQL query to find the department_ids that have more than 2 employees. Display only the department_id and the employee count (aliased as num_employees). Exclude employees not assigned to a department.

answer:

SELECT
department_id,
COUNT(*) AS num_employees
FROM employees
WHERE department_id IS NOT NULL
GROUP BY department_id
HAVING COUNT(*) > 2
ORDER BY department_id;

question: Write an SQL query to find the job titles where the average salary is greater than $70,000. Display the job_title and the average salary (aliased as average_salary_for_job). Exclude employees with NULL job titles or NULL salaries from the calculation.

answer:

SELECT
job_title,
AVG(salary) AS average_salary_for_job
FROM employees
WHERE job_title IS NOT NULL AND salary IS NOT NULL
GROUP BY job_title
HAVING AVG(salary) > 70000
ORDER BY average_salary_for_job DESC;

question: Write an SQL query to count the number of distinct job titles present in the employees table, excluding any NULL job titles. Use the alias distinct_job_titles.

answer:

SELECT COUNT(DISTINCT job_title) AS distinct_job_titles
FROM employees;

question: Using the provided schema, write an SQL query to list the names of all students (student_name) and the names of the courses (course_name) they are currently enrolled in. Order the results alphabetically by student name first, then by course name.

answer:

SELECT
s.student_name,
c.course_name
FROM
students AS s
INNER JOIN
enrollments AS e ON s.student_id = e.student_id
INNER JOIN
courses AS c ON e.course_id = c.course_id
ORDER BY
s.student_name,
c.course_name;

question: Write an SQL query to list the names of all instructors (instructor_name) and the name of the department (dept_name) they belong to. Only include instructors who are assigned to a department (i.e., their dept_id is not NULL).

answer:

SELECT
i.instructor_name,
d.dept_name
FROM
instructors AS i
INNER JOIN
departments AS d ON i.dept_id = d.dept_id
ORDER BY
i.instructor_name;

question: Write an SQL query to list every department (dept_name) and the number of instructors assigned to that department. Include departments even if they currently have no instructors assigned. Alias the count as number_of_instructors. Order by department name.

answer:

SELECT
d.dept_name,
COUNT(i.instructor_id) AS number_of_instructors
FROM
departments AS d
LEFT JOIN
instructors AS i ON d.dept_id = i.dept_id
GROUP BY
d.dept_id, d.dept_name -- Group by the PK and selected non-aggregated column
ORDER BY
d.dept_name;

question: Based on the modified courses table schema from Example 4 (which includes prerequisite_course_id), write an SQL query to list the name of each course (course_name) that has a prerequisite, along with the name of the prerequisite course (course_name). Alias the main course name as course and the prerequisite course name as prerequisite.

answer:

SELECT
c1.course_name AS course,
c2.course_name AS prerequisite
FROM
courses AS c1 -- The main course
INNER JOIN -- Use INNER JOIN to ensure only courses *with* a prerequisite are listed
courses AS c2 ON c1.prerequisite_course_id = c2.course_id -- Join back to courses table for the prerequisite
ORDER BY
c1.course_name;

question: Write an SQL query to find the number of students enrolled in each course. List the course name (course_name) and the count of enrolled students (aliased as enrollment_count). Include courses even if they have zero enrollments. Order the results by course name.

answer:

SELECT
c.course_name,
COUNT(e.student_id) AS enrollment_count
FROM
courses AS c
LEFT JOIN
enrollments AS e ON c.course_id = e.course_id
GROUP BY
c.course_id, c.course_name -- Group by course
ORDER BY
c.course_name;

question: Write the SQL command to create a basic index named idx_emp_hire_date on the hire_date column of the employees table.

answer:

CREATE INDEX idx_emp_hire_date
ON employees (hire_date);

question: Write the SQL command shown in the lecture to create a unique index that ensures every employee has a unique email address. Use the index name specified in the example.

answer:

CREATE UNIQUE INDEX idx_employees_email_unique
ON employees (email);

question: Write the SQL command, exactly as shown in the lecture, to create an index suitable for speeding up case-insensitive searches on the email column of the employees table.

answer:

CREATE INDEX idx_employees_email_lower
ON employees (LOWER(email));

question: Write the SQL command shown in the lecture to create a partial index named idx_employees_active_hire_date only for employees with the status ‘active’, indexing their hire_date.

answer:

CREATE INDEX idx_employees_active_hire_date
ON employees (hire_date)
WHERE status = 'active';

question: Write the SQL command shown in the lecture to create a multicolumn index named idx_employees_dept_salary on the department_id and salary columns of the employees table.

answer:

CREATE INDEX idx_employees_dept_salary
ON employees (department_id, salary);

question: Write the SQL command, as shown in the lecture, to rebuild all indexes specifically on the employees table.
answer:
```
REINDEX TABLE employees;
```

question: Write the SQL command to remove the index named idx_employees_last_name, ensuring the command does not produce an error if the index does not actually exist.

answer:

DROP INDEX IF EXISTS idx_employees_last_name;

question: Besides using \d, what SQL query targeting a system catalog view was shown in the lecture to list the indexes for the employees table?

answer:

SELECT
  indexname,
  indexdef
FROM
  pg_indexes
WHERE
  tablename = 'employees';