Databases | Midterm Exam 2
Concepts
- question: Explain the difference between the VARCHAR(10) and CHAR(10) data types. When might CHAR be used?
- answer: VARCHAR(10) stores variable-length strings up to a maximum of 10 characters, using only the space needed for the actual string. CHAR(10) stores fixed-length strings of exactly 10 characters; shorter strings are padded with spaces, and it always uses the full 10 characters of storage. CHAR is typically used for short, fixed-length codes like country codes (CHAR(2)) or state abbreviations (CHAR(2)).
- question: Describe the two-step execution order when a subquery is used inside the WHERE clause of an outer query.
- answer: Inner Query Execution: The database system first executes the subquery (inside the parentheses) which produces a result set. Outer Query Execution: The database system then executes the outer query, using the result set from the subquery in its WHERE clause (or other clauses) as a filter or condition.
- question: Explain the purpose of the ORDER BY clause in SQL. What are the two sorting direction, and which one is the default if not specified?
- answer: The ORDER BY clause is used to sort the results of a query based on one or more columns. The two sorting options mentioned are ASC (ascending order, A-Z, smallest to largest), which is the default, and DESC (descending order, Z-A, largest to smallest).
- question: What is the main purpose of the GROUP BY clause, and with which type of SQL function is it most powerfully used?
- answer: The GROUP BY clause groups rows that have the same values in specified columns into summary rows. It is most powerfully used with Aggregate Functions, allowing calculations to be performed per group rather than on the entire table.
- question: Define what a database transaction is. Explain its primary purpose.
- answer: According to the lecture, a transaction is a sequence of one or more database operations (like INSERT, UPDATE, DELETE) treated as a single, logical unit of work. Its primary purpose is to ensure data integrity and maintain consistency by guaranteeing that related changes occur together or not at all, preventing partial updates.
- question: List the four properties represented by the acronym ACID. What is the overall goal of these properties?
- answer: The four properties represented by ACID are: Atomicity, Consistency, Isolation, Durability, The overall goal of these properties is to ensure that database transactions are processed reliably, even in the presence of errors, power failures, or concurrent access.
- question: Explain the Consistency property (“C” in ACID). How does it relate to database rules like constraints (e.g., CHECK, NOT NULL)?
- answer: Consistency ensures that a transaction transitions the database from one valid state to another consistent state. It preserves all defined database rules (constraints) such as NOT NULL, UNIQUE, CHECK constraints (like Balance >= 0), and Foreign Key relationships. If a transaction attempts an operation that would violate a constraint, it violates consistency and must be rolled back.
- question: What is the primary method recommended for controlling the order of operations in complex SQL expressions, and why is it beneficial even if the default precedence seems correct?
- answer: The primary method is using parentheses (). It’s beneficial because it explicitly enforces the intended sequence, overrides default operator precedence, and significantly improves the readability and clarity of the code, reducing potential ambiguity and errors.
- question: Why is a single, centralized computer generally insufficient for handling the requirements of large-scale internet services like Google Search or Amazon? List two distinct reasons mentioned.
- answer: A single computer cannot handle the (1) massive data storage requirements and (2) the high processing load (billions of queries/transactions) and/or (3) the high availability requirements of these large-scale services. This necessitates distributing data and processing.
- question: Limitations of centralized databases. Explain the concept of “Vertical Scaling” and state one major drawback associated with it.
- answer: Vertical Scaling involves upgrading the central server (more CPU, RAM, faster disks). Drawbacks include physical limits (how big one machine can get) and becoming disproportionately expensive at the high end.
- question: What are the key differences between a static website and a dynamic web application in terms of content generation and the need for a database?
- answer: Static websites have pre-written, fixed HTML content served directly by the web server, the same for all users, and updates require manual HTML editing. They typically don’t require a database. Dynamic web applications generate content “on-the-fly” using server-side logic and database interaction, allowing for personalized, interactive, and real-time data experiences. Databases are essential for dynamic applications to manage persistent data.
- question: Why is Scalability considered a crucial database design consideration for web applications? Name one technique mentioned for achieving horizontal scaling.
- answer: Scalability is crucial because web applications often face massive and unpredictable user loads (e.g., viral content, sales events). The database must be designed to handle this growth. Designing for horizontal scaling (adding more machines) is prioritized over solely vertical scaling. Techniques mentioned for horizontal scaling include Replication or Sharding (Partitioning).
- question: Explain why Concurrency is a significant challenge for databases supporting web applications, and list one type of database mechanism used to manage it.
- answer: Concurrency is a challenge because a high number of web users may attempt to access and modify the same data simultaneously (e.g., booking the last seat, multiple users commenting). This requires mechanisms to manage simultaneous access and prevent data corruption. Database mechanisms mentioned include utilizing Concurrency Control Mechanisms like Locking (e.g., row-level) or setting appropriate Transaction Isolation Levels.
- question: What is the fundamental problem that Database Drivers and Database APIs aim to solve when connecting application code to a database?
- answer: The problem is how application code (written in various languages like Python, Java, PHP) can communicate reliably and effectively with different database management systems (like PostgreSQL, MySQL, Oracle), each having its own specific protocols and nuances. Applications need a standardized way to interact regardless of the underlying database.
- question: What is SQL Injection (SQLi), and what coding practice makes applications vulnerable to it?
- answer: SQL Injection is an attack where malicious SQL code is inserted into application input, tricking the database into executing unintended commands. The coding practice that makes applications vulnerable is dynamically constructing SQL queries by concatenating strings that include unvalidated user input (e.g.,
sql = "SELECT * FROM users WHERE name = '" + userName + "'";
).
- question: What is an SQL Injection attack, and which part of the system is primarily responsible for the vulnerability?
- answer: SQL Injection is an attack where malicious SQL code is inserted into application input fields. If the application improperly constructs SQL queries using this input, the malicious code gets executed by the database. The lecture notes this is primarily an application-level vulnerability.
- question: What is Role-Based Access Control (RBAC) and why is it beneficial compared to assigning privileges directly?
- answer: RBAC involves grouping related privileges into logical ROLEs and assigning users to these roles. It’s beneficial because it simplifies administration and improves consistency compared to assigning many individual privileges directly to each user.
- question: What is the primary purpose of a database index, and what real-world analogy was used to illustrate this?
- answer: The primary purpose of a database index is to speed up data retrieval operations (primarily SELECT queries) by providing faster lookup paths. Real-world analogies used were a book index, a library card catalog, or a phone book index, all of which allow direct access to information without a full sequential scan.
- question: What is the main difference in supported operations between a Hash index and a B-Tree index in PostgreSQL?
- answer: A Hash index primarily supports only equality (=) operations efficiently. A B-Tree index supports a wider range of operations, including equality (=), range queries (<, >, BETWEEN), and sorting (ORDER BY).
- question: For what kind of data values are GIN (Generalized Inverted Index) indexes particularly well-suited? Give one specific use case example mentioned.
- answer: GIN indexes are well-suited for indexing composite values (values containing multiple components). Use case examples mentioned include: searching elements within arrays, querying keys/values within JSONB, or full-text search.
- question: What is a key condition required for a BRIN (Block Range Index) to be highly effective?
- answer: A BRIN index is highly effective only when the indexed column’s values have a strong physical correlation with the row storage order in the table (e.g., data naturally ordered on insertion like an auto-incrementing ID or timestamp).
Query
- question: Write a SQL query to display the names of students (
student_name
) and the names of the courses (course_name
) they are enrolled in. Only include students who are matched with a course in thecourses
table. Use table aliasess
forstudents
andc
forcourses
. - answer:
SELECT s.student_name, c.course_name FROM students AS s INNER JOIN courses AS c ON s.student_id = c.student_id;
- question: Write a SQL query to list all student names (
student_name
). For each student, also list thecourse_name
they are enrolled in. If a student is not associated with any course in thecourses
table, their name should still appear, but thecourse_name
should beNULL
. Use table aliases. - answer:
SELECT s.student_name, c.course_name FROM students AS s LEFT JOIN courses AS c ON s.student_id = c.student_id;
- question: Write a SQL query to list all course names (
course_name
). For each course, also list thestudent_name
of the student enrolled in it. If a course is not associated with any student in thestudents
table (based oncourses.student_id
), the course name should still appear, but thestudent_name
should beNULL
. Use table aliases. - answer:
SELECT s.student_name, c.course_name FROM students AS s RIGHT JOIN courses AS c ON s.student_id = c.student_id;
- question: Write a SQL query to find the
student_name
andcourse_name
for all courses associated with students whosemajor
is ‘Physics’. Use anINNER JOIN
and table aliases. - answer:
SELECT s.student_name, c.course_name FROM students AS s INNER JOIN courses AS c ON s.student_id = c.student_id WHERE s.major = 'Physics';
- question: Write a SQL query that lists all student names (
student_name
). For each student, display thecourse_name
they are enrolled in. If a student is not enrolled in any course, display the text ‘Not Enrolled’ instead ofNULL
for the course name. Use theCOALESCE
function and table aliases. - answer:
SELECT s.student_name, COALESCE(c.course_name, 'Not Enrolled') AS course_name FROM students AS s LEFT JOIN courses AS c ON s.student_id = c.student_id;
- question: Write an SQL query using the
employees
table to display thefirst_name
andlast_name
of each employee, concatenated together with a single space in between. Use the PostgreSQL||
operator and alias the resulting column asfull_name
. - answer:
SELECT first_name, last_name, first_name || ' ' || last_name AS full_name FROM employees;
- question: Write an SQL query using the
products
table to display theproduct_name
in all uppercase letters. Alias the resulting column asupper_product_name
. - answer:
SELECT UPPER(product_name) AS upper_product_name FROM products;
- question: The
products
table has aproduct_code
column which might contain leading or trailing spaces (e.g., ‘ LP15-001 ‘). Write an SQL query to display theproduct_code
for all products after removing any leading and trailing whitespace. Alias the result astrimmed_code
. - answer:
SELECT product_code, TRIM(product_code) AS trimmed_code FROM products;
- question: Write an SQL query using the
products
table to display theproduct_name
andprice
, with the price rounded to one decimal place. Alias the rounded price column asrounded_price
. - answer:
SELECT product_name, price, ROUND(price, 1) AS rounded_price FROM products;
- question: Write an SQL query using the
employees
table to extract and display the year of hiring for each employee from theirhire_date
. Alias the year column ashire_year
. - answer:
SELECT hire_date, EXTRACT(YEAR FROM hire_date) AS hire_year FROM employees;
- question: Write an SQL query using the
employees
table to calculate the date that is exactly 90 days after each employee’shire_date
. Alias this calculated date column asreview_date
. - answer:
SELECT hire_date, hire_date + INTERVAL '90 day' AS review_date FROM employees;
- question: Write an SQL query using the
employees
table to display thefirst_name
,middle_name
, andlast_name
. If an employee’smiddle_name
is NULL, display the text ‘None’ instead. - answer:
SELECT first_name, COALESCE(middle_name, 'None') AS middle_name, last_name FROM employees;
- question: Write an SQL query using the
products
table and aCASE WHEN
expression to categorize products based on their price. Display theproduct_name
,price
, and aprice_category
column with values: ‘Budget’ (if price < 50.00), ‘Mid-Range’ (if price >= 50.00 AND price < 200.00), or ‘Premium’ (otherwise). - answer:
SELECT product_name, price, CASE WHEN price < 50.00 THEN 'Budget' WHEN price >= 50.00 AND price < 200.00 THEN 'Mid-Range' ELSE 'Premium' END AS price_category FROM products;
- question: Write an SQL query using the
employees
table to find thefirst_name
,last_name
, andhire_date
of all employees hired in December (Month = 12) of any year. Use a function in theWHERE
clause. - answer:
SELECT first_name, last_name, hire_date FROM employees WHERE EXTRACT(MONTH FROM hire_date) = 12;
- question: Write an SQL query using the
employees
table to display thefirst_name
andlast_name
, ordered by the length of the employee’s last name in descending order (longest last name first). Use a function in theORDER BY
clause. - answer:
SELECT first_name, last_name FROM employees ORDER BY LENGTH(last_name) DESC;
- question: Write an SQL query to count the total number of employees listed in the
employees
table. Use the aliastotal_employees
for the count. - answer:
SELECT COUNT(*) AS total_employees FROM employees;
- question: Write an SQL query to find the sum of all salaries for employees working in the ‘Engineering’ department (department_id = 2). Use the alias
engineering_salary_total
. - answer:
SELECT SUM(salary) AS engineering_salary_total FROM employees WHERE department_id = 2;
- question: Write an SQL query to find the earliest (minimum)
hire_date
among all employees. Use the aliasearliest_hire_date
. - answer:
SELECT MIN(hire_date) AS earliest_hire_date FROM employees;
- question: Write an SQL query to calculate the average salary specifically for employees with the job title ‘Software Engineer’. Use the alias
avg_salary_software_engineer
. - answer:
SELECT AVG(salary) AS avg_salary_software_engineer FROM employees WHERE job_title = 'Software Engineer';
- question: Write an SQL query to count the number of employees in each department. Only include employees who are assigned to a department (
department_id
IS NOT NULL). Display thedepartment_id
and the count (aliased asnum_employees
). Order the results bydepartment_id
. - answer:
SELECT department_id, COUNT(*) AS num_employees FROM employees WHERE department_id IS NOT NULL GROUP BY department_id ORDER BY department_id;
- question: Write an SQL query to find the minimum and maximum salary for each
department_id
. Only consider employees assigned to a department. Display thedepartment_id
, the minimum salary (aliased asmin_salary_in_dept
), and the maximum salary (aliased asmax_salary_in_dept
). - answer:
SELECT department_id, MIN(salary) AS min_salary_in_dept, MAX(salary) AS max_salary_in_dept FROM employees WHERE department_id IS NOT NULL GROUP BY department_id ORDER BY department_id;
- question: Write an SQL query to find the
department_id
s that have more than 2 employees. Display only thedepartment_id
and the employee count (aliased asnum_employees
). Exclude employees not assigned to a department. - answer:
SELECT department_id, COUNT(*) AS num_employees FROM employees WHERE department_id IS NOT NULL GROUP BY department_id HAVING COUNT(*) > 2 ORDER BY department_id;
- question: Write an SQL query to find the job titles where the average salary is greater than $70,000. Display the
job_title
and the average salary (aliased asaverage_salary_for_job
). Exclude employees with NULL job titles or NULL salaries from the calculation. - answer:
SELECT job_title, AVG(salary) AS average_salary_for_job FROM employees WHERE job_title IS NOT NULL AND salary IS NOT NULL GROUP BY job_title HAVING AVG(salary) > 70000 ORDER BY average_salary_for_job DESC;
- question: Write an SQL query to count the number of distinct job titles present in the
employees
table, excluding any NULL job titles. Use the aliasdistinct_job_titles
. - answer:
SELECT COUNT(DISTINCT job_title) AS distinct_job_titles FROM employees;
- question: Using the provided schema, write an SQL query to list the names of all students (
student_name
) and the names of the courses (course_name
) they are currently enrolled in. Order the results alphabetically by student name first, then by course name. - answer:
SELECT s.student_name, c.course_name FROM students AS s INNER JOIN enrollments AS e ON s.student_id = e.student_id INNER JOIN courses AS c ON e.course_id = c.course_id ORDER BY s.student_name, c.course_name;
- question: Write an SQL query to list the names of all instructors (
instructor_name
) and the name of the department (dept_name
) they belong to. Only include instructors who are assigned to a department (i.e., theirdept_id
is notNULL
). - answer:
SELECT i.instructor_name, d.dept_name FROM instructors AS i INNER JOIN departments AS d ON i.dept_id = d.dept_id ORDER BY i.instructor_name;
- question: Write an SQL query to list every department (
dept_name
) and the number of instructors assigned to that department. Include departments even if they currently have no instructors assigned. Alias the count asnumber_of_instructors
. Order by department name. - answer:
SELECT d.dept_name, COUNT(i.instructor_id) AS number_of_instructors FROM departments AS d LEFT JOIN instructors AS i ON d.dept_id = i.dept_id GROUP BY d.dept_id, d.dept_name -- Group by the PK and selected non-aggregated column ORDER BY d.dept_name;
- question: Based on the modified
courses
table schema from Example 4 (which includesprerequisite_course_id
), write an SQL query to list the name of each course (course_name
) that has a prerequisite, along with the name of the prerequisite course (course_name
). Alias the main course name ascourse
and the prerequisite course name asprerequisite
. - answer:
SELECT c1.course_name AS course, c2.course_name AS prerequisite FROM courses AS c1 -- The main course INNER JOIN -- Use INNER JOIN to ensure only courses *with* a prerequisite are listed courses AS c2 ON c1.prerequisite_course_id = c2.course_id -- Join back to courses table for the prerequisite ORDER BY c1.course_name;
- question: Write an SQL query to find the number of students enrolled in each course. List the course name (
course_name
) and the count of enrolled students (aliased asenrollment_count
). Include courses even if they have zero enrollments. Order the results by course name. - answer:
SELECT c.course_name, COUNT(e.student_id) AS enrollment_count FROM courses AS c LEFT JOIN enrollments AS e ON c.course_id = e.course_id GROUP BY c.course_id, c.course_name -- Group by course ORDER BY c.course_name;
- question: Write the SQL command to create a basic index named
idx_emp_hire_date
on thehire_date
column of theemployees
table. - answer:
CREATE INDEX idx_emp_hire_date ON employees (hire_date);
- question: Write the SQL command shown in the lecture to create a unique index that ensures every employee has a unique email address. Use the index name specified in the example.
- answer:
CREATE UNIQUE INDEX idx_employees_email_unique ON employees (email);
- question: Write the SQL command, exactly as shown in the lecture, to create an index suitable for speeding up case-insensitive searches on the
email
column of theemployees
table. - answer:
CREATE INDEX idx_employees_email_lower ON employees (LOWER(email));
- question: Write the SQL command shown in the lecture to create a partial index named
idx_employees_active_hire_date
only for employees with the status ‘active’, indexing theirhire_date
. - answer:
CREATE INDEX idx_employees_active_hire_date ON employees (hire_date) WHERE status = 'active';
- question: Write the SQL command shown in the lecture to create a multicolumn index named
idx_employees_dept_salary
on thedepartment_id
andsalary
columns of theemployees
table. - answer:
CREATE INDEX idx_employees_dept_salary ON employees (department_id, salary);
- question: Write the SQL command, as shown in the lecture, to rebuild all indexes specifically on the
employees
table. - answer:
REINDEX TABLE employees;
- question: Write the SQL command to remove the index named
idx_employees_last_name
, ensuring the command does not produce an error if the index does not actually exist. - answer:
DROP INDEX IF EXISTS idx_employees_last_name;
- question: Besides using
\d
, what SQL query targeting a system catalog view was shown in the lecture to list the indexes for theemployees
table? - answer:
SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'employees';