Chenyo's org-static-blog

23 Jul 2024

CMU 15-445 notes: Modern SQL

This is a personal note for the CMU 15-445 L2 notes, along with some SQL command explained by Claude.ai.

1. Terminology

1.1. SQL and relational algebra

  • Relational algebra is based on sets (unordered, no duplicates); SQL is based on bags (unordered, allows duplicates).
  • SQL is a declarative query language; users use SQL to specify the desired result, each DBMS determines the most efficient strategy to produce the answer.

1.2. SQL commands

  • Data manipulation language (DML): SELECT, INSERT, UPDATE, DELETE.
  • Data definition language (DDL): CREATE.

    CREATE TABLR student (
        sid INT PRIMARY KEY,
        name VARCHAR(16),
        login VARCHAR(32) UNIQUE,
        age SMALLINT,
        gpa FLOAT
    );
    
  • Data control language (DCL): security, access control.

2. SQL syntax

2.1. Join

  • Combine columns from one or more tables and produces a new table.

    -- All students that get an A in 15-721
    SELECT s.name
        FROM enrolled AS e, student AS s
    WHERE e.grade = 'A' AND e.cid = '15-721'
        AND e.sid = s.sid
    

2.2. Aggregation function

  • AVG(COL), MIN(COL), MAX(COL), COUNT(COL).
  • Take as input a bag of tuples and produce a single scalar value.

    -- Get number of students and their average GPA with a '@cs' login
    SELECT AVG(gpa), COUNT(sid) FROM student WHERE login LIKE '@cs';
    -- Get the unique students
    SELECT COUNT(DISTINCT login) FROM student WHERE login LIKE '@cs';
    
  • Non-aggregated values in SELECT output must appear in GROUP BY.

    -- Get the average GPA in each course
    SELECT AVG(s.gpa), e.cid
        FROM enrolled AS e, student AS s
    WHERE e.sid = s.sid
    GROUP BY e.cid;
    
  • HAVING: filter output results based on aggregation computation.

    SELECT AVG(s.gpa), e.cid
        FROM enrolled AS e, student AS s
    WHERE e.sid = s.sid
    GROUP BY e.cid
    HAVING AVG(s.gpa) > 3.9;
    

2.3. String operation

  • Strings are case sensitive and single-quotes only in the SQL standard.
  • Use LIKE for string pattern matching:
    • % matches any sub-string,
    • _ matches any one character
  • Standard string functions: UPPER(S), SUBSTRING(S, B, E).
  • ||: string concatenation.

2.4. Date and time

  • Attributes: DATE, TIME.
  • Different DBMS have different date/time operations.

2.5. Output redirection

  • One can store the results into another table

    -- output to a non-existing table
    SELECT DISTINCT cis INTO CourseIds FROM enrolled;
    -- output to an existing table with the same number of columns and column type
    -- but the names do not matter
    INSERT INTO CourseIds (SELECT DISTINCT cid FROM enrolled);
    

2.6. Output control

  • Use ORDER, ASC and DESC to sort the output tuples; otherwise the output could have different order every time.
  • Use LIMIT, OFFSET to restrict the output number.

    SELECT sid FROM enrolled WHERE cid = '15-721'
    ORDER BY UPPER(grade) DESC, sid + 1 ASC;
        LIMIT 10 OFFSET 10;  -- output 10 tuples, starting from the 11th tuple
    

2.7. Nested queries

  • Nested queries are often difficult to optimize.
  • The inner query can access attributes defined in the outer query.
  • Inner queries can appear anywhere.

    -- Output a column 'one' with 1s, the number of 1s
    -- equals to the number of rows in 'student'
    SELECT (SELECT 1) AS one FROM student;
    
    -- Get the names of students that are enrolled in '15-445'
    SELECT name FROM students
        WHERE sid IN (
            SELECT sid FROM enrolled
            WHERE cid = '15-445'
    );
    
    -- Get student record with the highest id
    -- that is enrolled in at least one course.
    SELECT student.sid, name
        FROM student
        -- the intermediate output is aliases as max_e
        JOIN (SELECT MAX(sid) AS sid FROM enrolled) AS max_e
        -- only select student who has the max_e
        ON student.sid = max_e.sid;
    
    -- the above is same as below, but `join` syntax is more preferred
    SELECT student.sid, name
    FROM student AS s, (SELECT MAX(sid) AS sid FROM enrolled) AS max_e
    WHERE s.sid = max_e.sid;
    
  • Nested query results expression:
    • ALL: must satisfy expression for all rows in sub-query.
    • ANY, IN: must satisfy expression for at least one row in sub-query.
    • EXISTS: at least one row is returned.

      -- Get all courses with no students enrolled in
      SELECT * FROM course
          WHERE NOT EXISTS(
              SELECT * FROM enrolled
                  WHERE course.cid = enrolled.cid
      )
      
      -- Get students whose gpa is larget than the highest score in '15-712'
      -- and the login has a level > 3
      SELECT student.sid, name
          FROM student AS S
      WHERE s.gpa > ALL (
          SELECT course.score FROM course
              WHERE course.cid = '15-712'
      )
      AND student.login IN (
          SELECT login FROM enrolled
          WHERE level > 3
      );
      

2.8. Window functions

  • Perform sliding calculation across a set of tuples.

2.9. Common Table Expressions (CTE)

  • An alternative to windows or nested queries when writing more complex queries.
  • CTEs use WITH to bind the output of an inner query to a temporary table.

    WITH cteName (col1, col2) AS (
        SELECT 1, 2
    )
    SELECT col1 + col2 FROM cteName;
    
Tags: study database cmu