Database Management
03 May 2026
The convention we’ll observe for writing out relations is as follows.
relationName( primaryKey, foreignKey, field1, field2, \(\ldots\) )
Some books follow the convention that the name of the relation is in all caps. We will not observe that convention here. We will follow the convention of underlining the primary key and italicizing foreign keys. If you are writing in plain text without the ability to underline or italicize, please obey the following convention.
relationName( primaryKey (pk), foreignKey (fk), field1, field2, \(\ldots\) )
A relation is in first normal form, 1NF, if and only if all columns contain only atomic values—that is, each column can have only one value for each row in the table.
Whether an item is atomic is a matter determined by people.
SQL has no concept of non-atomic items. Oracle has an extension to SQL that explicitly defines some items as non-atomic. Hence, any relation specified in SQL is in 1NF unless that relation was specified using an extension to SQL.
Many database textbooks teach that 1NF eliminates repeating groups, based on the notion that a repeating group is non-atomic. Some textbooks hearken back to the original definition of non-atomic and move the eliminate repeating groups into a more appropriate place in 2NF. Going forward in this class, I would like you to do just that—relegate eliminate repeating groups to 2NF.
Note that this means that you may, if you wish, specify a relational notation for a table in 1NF without reference to a primary key. For example
Supplier(sid, sname, rating, city)
is in 1NF without reference to a primary key, provided that each row is unique.
A relation is in second normal form, 2NF, if and only if it is in 1NF and every nonkey attribute is fully dependent on the primary key. An attribute is fully dependent on the primary key if it is is on the right side of a functional dependency for which the left side is either the primary key or something that can be derived from the primary key using the transitivity of functional dependencies.
In many if not most applications, functional dependencies are implied in the ER diagram but not certain. For example, Apple obtains displays for its iPhones from several different suppliers. To uniquely identify a display item might require a concatenation of the item identifier and the supplier identifier. Alternatively, each item might have a separate identifier that incorporates the information about which supplier provided the item.
Suppose we know the functional dependencies
{Supplier}\(\rightarrow\){Name, Street, City, State, Zip}
and suppose further that we know that
{Zipcode}\(\rightarrow\){City, State}
Then the following ITEM relation is in 2NF
Item(ItemID, ItemName, Description, Returnable, Perishable, ShelfQty, Notes, Colors, RetailPrice, Cost, Supplier)
Notice that the information that is dependent on Supplier is no longer present. To create the required tables, I would also specify
Supplier(Supplier, Name, Street, Zipcode)
as a separate relation. But what happened to City and State? The above table would not be in 2NF if I included them because City and State ordinarily depend only on Zip. Since I noted that explicitly above, I’m going to skip the 1NF version of a Supplier table go right to a combination of Supplier and Zip as follows
Zipcode(Zipcode, City, State)
You may object to using the same word to identify an entity and an attribute. Although we usually add ID to attributes named for entities, I don’t think SQL requires it. Also, SQL offers us a way to qualify ambiguous identifiers.
What about Colors? Is the following functional dependency legitimate?
{Item}\(\rightarrow\){Colors}
No. Knowing the Item does not determine the Colors, given that Colors has been described as multi-valued in the ER diagram. Therefore, it should be eliminated from the ITEM relation and a new relation should be specified as
Itemcolors(Item,Colors)
where each row uniquely identifies a combination of Item and Colors. There is only one superkey, one candidate key, and one primary key possible.
A relation is in third normal form, 3NF, if and only if it is in 2NF and every nonkey attribute is nontransitively dependent on the primary key.
What is transitivity? A transitive dependency exists if a non-key attribute determines a non-key attribute.
Edgar F Codd, the inventor of the relational databases (he actually called them data banks) asserted the following edict: A non-key column should depend on the key, the whole key, and nothing but the key where “the whole key” refers to 2NF and “nothing but the key” refers to 3NF
I asked Opus 4.7 for an example of a 3NF problem. The result (edited):
Given the following relation and functional dependencies
Employee(EmpID, Name, DeptID, DeptName, DeptLocation)
{ EmpID } → { Name, DeptID } (key determines all)
{ DeptID } → { DeptName, DeptLocation } (department determines its own attributes)
DeptName and DeptLocation depend on EmpID only through DeptID. That’s transitive. DeptID is not a superkey, and DeptName/DeptLocation aren’t prime—so this violates 3NF.
Split into two relations:
Employee(EmpID, Name, DeptID) — DeptID is a foreign key
Department(DeptID, DeptName, DeptLocation)
Now each non-key attribute depends only on its table’s key. The decomposition is lossless (joinable back without spurious rows) and dependency-preserving.
A relation is in Boyce-Codd normal form, BCNF, if and only if every determinant (LHS of a functional dependency) is a candidate key.
Claude Opus says: 3NF allows one loophole: a non-trivial dependency X → A is permitted if A is a prime attribute, even when X isn’t a superkey. Boyce-Codd Normal Form (BCNF) closes that loophole — X must always be a superkey. So BCNF ⊆ 3NF. The tradeoff: every relation has a lossless, dependency-preserving 3NF decomposition, but not always a dependency-preserving BCNF one. That’s why 3NF remains the practical target in most designs.
Date (2004), page 385, gives the following definition of fourth normal form: Relvar R is in 4NF if and only if, whenever there exist subsets A and B of the attributes of R such that the nontrivial MVD A \(\twoheadrightarrow\) B is satisified, then all attributes of R are also functionally dependent on A.
To understand this definition, it is necessary to review Date’s definition of multi-valued dependence, MVD, which is as follows. Let R be a relvar and let A, B, and C be subsets of the attributes of R. Then we say that B is multidependent on A—in symbols A \(\twoheadrightarrow\) B (read A multi-determines B or simply A double arrow B)—if and only if, in every legal value of R, the set of B values matching a given AC value pair depends only on the A value and is independent of the C value.
Data in a derived system is the result of taking some existing data from another
System and transforming or processing it in some way.
If you lose derived data, you can recreate it from the original source.
A classic example is a cache: data can be served from the cache if present, but if the cache doesn’t contain what you need, you can fall back to the underlying database.
Denormalized values, indexes, and materialized views also fall into this category.
In recommendation systems, predictive summary data is often derived from usage logs.
This slideshow was produced using quarto
Fonts are Roboto, Roboto Light, and Victor Mono Nerd Font