23 Jul 2024
CMU 15-445 notes: Modern SQL
This is a personal note for the CMU 15-445 L2 notes, along with some SQL command explained by Claude.ai.
1. Terminology
1.1. SQL and relational algebra
- Relational algebra is based on sets (unordered, no duplicates); SQL is based on bags (unordered, allows duplicates).
- SQL is a declarative query language; users use SQL to specify the desired result, each DBMS determines the most efficient strategy to produce the answer.
1.2. SQL commands
- Data manipulation language (DML):
SELECT
,INSERT
,UPDATE
,DELETE
. Data definition language (DDL):
CREATE
.CREATE TABLR student ( sid INT PRIMARY KEY, name VARCHAR(16), login VARCHAR(32) UNIQUE, age SMALLINT, gpa FLOAT );
- Data control language (DCL): security, access control.
2. SQL syntax
2.1. Join
Combine columns from one or more tables and produces a new table.
-- All students that get an A in 15-721 SELECT s.name FROM enrolled AS e, student AS s WHERE e.grade = 'A' AND e.cid = '15-721' AND e.sid = s.sid
2.2. Aggregation function
AVG(COL)
,MIN(COL)
,MAX(COL)
,COUNT(COL)
.Take as input a bag of tuples and produce a single scalar value.
-- Get number of students and their average GPA with a '@cs' login SELECT AVG(gpa), COUNT(sid) FROM student WHERE login LIKE '@cs'; -- Get the unique students SELECT COUNT(DISTINCT login) FROM student WHERE login LIKE '@cs';
Non-aggregated values in
SELECT
output must appear inGROUP BY
.-- Get the average GPA in each course SELECT AVG(s.gpa), e.cid FROM enrolled AS e, student AS s WHERE e.sid = s.sid GROUP BY e.cid;
HAVING
: filter output results based on aggregation computation.SELECT AVG(s.gpa), e.cid FROM enrolled AS e, student AS s WHERE e.sid = s.sid GROUP BY e.cid HAVING AVG(s.gpa) > 3.9;
2.3. String operation
- Strings are case sensitive and single-quotes only in the SQL standard.
- Use
LIKE
for string pattern matching:%
matches any sub-string,_
matches any one character
- Standard string functions:
UPPER(S)
,SUBSTRING(S, B, E)
. ||
: string concatenation.
2.4. Date and time
- Attributes:
DATE
,TIME
. - Different DBMS have different date/time operations.
2.5. Output redirection
One can store the results into another table
-- output to a non-existing table SELECT DISTINCT cis INTO CourseIds FROM enrolled; -- output to an existing table with the same number of columns and column type -- but the names do not matter INSERT INTO CourseIds (SELECT DISTINCT cid FROM enrolled);
2.6. Output control
- Use
ORDER
,ASC
andDESC
to sort the output tuples; otherwise the output could have different order every time. Use
LIMIT
,OFFSET
to restrict the output number.SELECT sid FROM enrolled WHERE cid = '15-721' ORDER BY UPPER(grade) DESC, sid + 1 ASC; LIMIT 10 OFFSET 10; -- output 10 tuples, starting from the 11th tuple
2.7. Nested queries
- Nested queries are often difficult to optimize.
- The inner query can access attributes defined in the outer query.
Inner queries can appear anywhere.
-- Output a column 'one' with 1s, the number of 1s -- equals to the number of rows in 'student' SELECT (SELECT 1) AS one FROM student; -- Get the names of students that are enrolled in '15-445' SELECT name FROM students WHERE sid IN ( SELECT sid FROM enrolled WHERE cid = '15-445' ); -- Get student record with the highest id -- that is enrolled in at least one course. SELECT student.sid, name FROM student -- the intermediate output is aliases as max_e JOIN (SELECT MAX(sid) AS sid FROM enrolled) AS max_e -- only select student who has the max_e ON student.sid = max_e.sid; -- the above is same as below, but `join` syntax is more preferred SELECT student.sid, name FROM student AS s, (SELECT MAX(sid) AS sid FROM enrolled) AS max_e WHERE s.sid = max_e.sid;
- Nested query results expression:
ALL
: must satisfy expression for all rows in sub-query.ANY
,IN
: must satisfy expression for at least one row in sub-query.EXISTS
: at least one row is returned.-- Get all courses with no students enrolled in SELECT * FROM course WHERE NOT EXISTS( SELECT * FROM enrolled WHERE course.cid = enrolled.cid ) -- Get students whose gpa is larget than the highest score in '15-712' -- and the login has a level > 3 SELECT student.sid, name FROM student AS S WHERE s.gpa > ALL ( SELECT course.score FROM course WHERE course.cid = '15-712' ) AND student.login IN ( SELECT login FROM enrolled WHERE level > 3 );
2.8. Window functions
- Perform sliding calculation across a set of tuples.
2.9. Common Table Expressions (CTE)
- An alternative to windows or nested queries when writing more complex queries.
CTEs use
WITH
to bind the output of an inner query to a temporary table.WITH cteName (col1, col2) AS ( SELECT 1, 2 ) SELECT col1 + col2 FROM cteName;