17 Jul 2024
CMU 15-445 notes: Relational Model & Algebra
This is a personal note for the CMU 15-445 L1 video and CMU 15-445 L1 notes, along with some terminology explained by Claude.ai.
1. Terminology
1.1. Database
- An organized collection of inter-related data that models some aspect of the real-world.
1.2. Database design consideration
- Data integrity: protect invalid writing.
- Implementation: query complexity, concurrent query.
- Durability: replication, fault tolerance.
1.3. Database management system (DBMS)
- A software that manages a database.
- Allow the definition, creation, query, update and administration of databases.
1.4. Data model
- A conceptual, high-level representation of how data is structured
- Defines entities, attributes, relationships between entities and constraints.
1.5. Schema
- A concrete implementation of a data model.
- Defines tables, fields, data types, keys and rules.
- Typically represented by a specific database language.
1.6. Entities and Tables
- Entities: conceptual representations of objects in the logical data model.
- Tables: physical storage structures in the physical data model.
1.7. Attributes and Fields
- Attributes: properties of an entity.
- Fields: columns in a database table.
1.8. Logical layer
- The entities and attributes the database has.
1.9. Physical layer
- How are entities and attributes stored in the database.
1.10. Data manipulation languages (DMLs)
- Methods to store and retrieve information from a database.
- Procedural: the query specifies the (high-level) strategy the DBMS should use to get the results, e.g., with relational algebra.
- Declarative: the query specifies only what data is desired but not how to get it, e.g., with relational calculus (a formal language).
1.11. SQL (Structured Query Language) and relational model
- SQL implements the relational model in DBMS and provides a standard way to create, manipulate and query relational databases.
- Different SQL implementation may vary and do not strictly adhere to the relational model, e.g., allow duplicate rows.
2. Relational model
- A data model that defines a database abstraction to avoid maintenance overhead when changing the physical layer.
- Data is stored as relations/tables.
- Physical layer implementation and execution strategy depends on DBMS implementation.
2.1. A relation
- An unordered set that contains the relationship of attributes that represent entities.
- Relationships are unordered in the relation.
2.2. A domain
- A named set of allowable values for a specific attribute.
2.3. A tuple
- A set of attribute values in the relation.
- Values can also be lists or nested data structures.
Null
: a special value in any attribute which means the attribute in a tuple is undefined.- \(n-ary\): a relation with \(n\) attributes.
2.4. Keys
- Primary key: uniquely identifies a single tuple.
- Foreign key: specifies that an attribute (e.g.,
CustomerID
) in one relation (e.g.,OrderTable
) has to map to a tuple (e.g., the tuple with the sameCustomerID
) in another relation (e.g.,CustomerTable
).
3. Relational Algebra
- A set of fundamental operations to retrieve and manipulate tuples in a relation.
- Each operator takes in one or more relations as inputs, and outputs a new relation; operators can be chained.
- Is a procedure language, meaning the execution always follow the query, even there exists more efficient way to get the same result; A better way is to be more declarative, e.g., SQL’s
where
syntax. - Common relational algebra.