Most contemporary computing systems can be classified as either data-intensive or compute-intensive
This course covers only data-intensive systems; a good resource on them is Kleppmann (2017)
Data-intensive systems
Data-intensive systems can be categorized in many ways, among them OLAP vs OLTP
OLAP stands for Online Analytical Processing; OLAP systems are usually denormalized data warehouses
OLTP stands for Online Transaction Processing; OLTP systems are what we are concerned with in this course
OLTP systems mostly rely on relational databases
These relational databases must obey four properties to work correctly
These four properties are usually known by the acronym ACID
The ACID Principles, 1 of 2
According to Gemini, the four ACID principles are
Atomicity (“All or Nothing”): Each transaction is treated as a single unit, which either succeeds completely or fails completely. If any part of the transaction fails, the entire transaction fails, and the database state remains unchanged
Consistency (Data Validity): Ensures that a transaction brings the database from one valid state to another, maintaining all predefined rules, constraints, and triggers
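Atomicity and consistency can be seen together in a small sketch. It uses Python's built-in sqlite3 module with an in-memory database; the accounts table, the CHECK constraint, and the transfer helper are illustrative assumptions, not part of the course projects. The transfer credits the receiver first so that a failing debit shows the earlier credit being rolled back.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint when funds are short
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    rows = conn.execute("SELECT name, balance FROM accounts ORDER BY name")
    return dict(rows.fetchall())

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # the consistency rule
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)]
    )

ok = transfer(conn, "alice", "bob", 30)         # succeeds: alice 70, bob 50
too_much = transfer(conn, "alice", "bob", 500)  # fails: nothing changes
final = balances(conn)
print(ok, too_much, final)
```

The second transfer fails on the debit, and the credit that had already been applied to bob is undone with it, so the database stays in its last valid state.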
The ACID Principles, 2 of 2
Isolation (Independent Execution): Concurrent transactions do not interfere with each other. Even if multiple transactions occur simultaneously, they are isolated, so each behaves as if it were the only one in the system
Durability (Permanent Changes): Once a transaction is committed, it remains committed, even in the event of a system crash, power failure, or error. The changes are permanently stored, usually via write-ahead logging
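Isolation and durability can be sketched the same way, again assuming Python's sqlite3 module. The stock table, the file name demo.db, and the read_qty helper are hypothetical. Two connections to one database file stand in for two users, and closing and reopening the file only approximates durability, since no real crash is simulated.

```python
import os
import sqlite3
import tempfile

def read_qty(conn):
    # fetchall() runs the query to completion so no read lock lingers
    rows = conn.execute("SELECT qty FROM stock WHERE item = 'widget'")
    return rows.fetchall()[0][0]

path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
writer.execute("INSERT INTO stock VALUES ('widget', 10)")
writer.commit()

reader = sqlite3.connect(path)  # a second, independent "user"

# The UPDATE implicitly opens a transaction that stays open until commit()
writer.execute("UPDATE stock SET qty = 7 WHERE item = 'widget'")
seen_before_commit = read_qty(reader)  # reader still sees the old value
writer.commit()
seen_after_commit = read_qty(reader)   # now the change is visible

# Close everything and reopen the file: the committed value is still there
writer.close()
reader.close()
reopened = sqlite3.connect(path)
seen_after_reopen = read_qty(reopened)
reopened.close()

print(seen_before_commit, seen_after_commit, seen_after_reopen)
```

The reader never observes the writer's uncommitted change, and the committed change survives after every connection is closed.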
Why ACID?
The database must maintain data integrity, even in the face of problems like concurrent users, power outages, hardware failures, and more
The ACID properties ensure data integrity and are the goal of any relational database management system
We haven’t been concerned with them in your projects for this class because you are running single-user projects without concern for concurrency
We assume that you back up your data and you know the state of the data if there is a power failure or hardware failure or the like
In a multi-user setting, we can’t make these assumptions
BASE
Believe it or not, NoSQL proponents coined the term BASE (Basically Available, Soft state, Eventual consistency) as a deliberate contrast to ACID
A further acronym you will encounter in more advanced databases is CAP
The CAP theorem says that a distributed database system can guarantee at most two of the three properties that CAP stands for: consistency, availability, and partition tolerance
Note that distributed database systems, in which the database is divided among many machines, are nearly the opposite of what we’ve been talking about in this course
Partition tolerance means the system keeps operating when communication between the machines fails, splitting them into groups that cannot reach each other
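The CAP trade-off can be illustrated with a toy model. The sketch below is plain Python, not a real distributed database: the Replica and TwoNodeStore classes and the "CP" and "AP" mode names are assumptions made for illustration. During a simulated partition, the store must either refuse the write (keeping the replicas consistent) or accept it on one replica only (staying available).

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.value = 0

class TwoNodeStore:
    """Toy two-replica store.

    mode="CP": refuse writes during a partition (consistent, not available)
    mode="AP": accept writes locally during a partition (available, may diverge)
    """

    def __init__(self, mode):
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True when a and b cannot communicate

    def write(self, replica, value):
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            # Healthy network: the write reaches both replicas
            replica.value = value
            other.value = value
            return True
        if self.mode == "CP":
            return False  # give up availability to stay consistent
        replica.value = value  # give up consistency to stay available
        return True

    def consistent(self):
        return self.a.value == self.b.value

def run(mode):
    store = TwoNodeStore(mode)
    store.write(store.a, 1)       # replicated to both nodes
    store.partitioned = True      # the network link fails
    accepted = store.write(store.a, 2)
    return accepted, store.consistent()

cp_result = run("CP")  # write refused, replicas still agree
ap_result = run("AP")  # write accepted, replicas now disagree
print(cp_result, ap_result)
```

In this model partition tolerance is taken as given, so the only remaining choice is between consistency and availability while the partition lasts.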
END
Colophon
This slideshow was produced using quarto
Fonts are Roboto, Roboto Light, and Victor Mono Nerd Font