01eDataShadow

Database Management

Mick McQuaid

University of Texas at Austin

28 Apr 2026

Data Shadow

Origin of the term

The origin of the term “data shadow” is discussed in the following article:

https://circleid.com/posts/20210629-where-did-data-shadow-come-from

The term was apparently coined by Kerstin Anér around 1972 in an essay titled “Dataskuggan” (Swedish for “the data shadow”).

Original meaning of the term

A later writer, quoted in that post, said:

By the term “data shadow” Kerstin Anér meant “the two-dimensional image of the individual that is evoked through one or more data registers” or “the series of crosses in different boxes that a living person is transformed into in the data registers and which tends by those in power to be treated as more real and more interesting than the living man himself.”

Wikipedia’s claim

Wikipedia says:

She [Anér] mentions the terms for the first time (in print) in the Christian cultural magazine Vår Lösen in 1972, in an essay entitled “Dataskuggan” (the Data Shadow).

Our definition

In this course, and in your current assignment, we will take data shadow to mean the traces of a person’s digital activities, which are collected and stored in databases and digital systems, and which can be used to infer information about that person’s behavior, preferences, and identity. This definition is broader than the original meaning of the term, but it captures the essence of the concept in the digital age.

Why is this important?

At the very beginning of your database journey, you need to understand the implications of large-scale database creation and use, especially at the societal level. This is a key concept for understanding databases.

Example from Database Nation

The 2000 book Database Nation by Simson Garfinkel describes a day in the life of a fictional character in the future. Here’s an excerpt:

When you enter the apartment’s elevator, a hidden video camera scans your face, approves your identity, and takes you to the garage in the basement. You hope nobody else gets in the elevator—you don’t relish a repeat of what happened last week to that poor fellow in 4G. It turns out that a neighbor recently broke up with her violent boyfriend and got a restraining order against him. Naturally, the elevator was programmed to recognize the man and, if he was spotted, to notify the police and keep the doors locked until they arrived. Too bad somebody else was in the elevator when it happened. Nobody realized the boyfriend was an undiagnosed (and claustrophobic) psychotic. A hostage situation quickly developed. Too bad for Mr. 4G. Fortunately, everything was captured on videotape.

A personal story

In the 1980s, I knew a woman who worked for the Manhattan DA’s office. She claimed that she could trace a suspect’s movements throughout the day of the crime by following the digital footprint left by the suspect’s interactions with databases, including databases the suspect might not know existed. She even boasted that her surveillance might be illegal! If it turned up valuable information, the investigators might have to claim that they had acted on a hunch or an anonymous tip.

Another story

Some years ago, Congress heard testimony from privacy experts who argued that data brokers (private companies that collect and sell personal data) are difficult, but important, to regulate. For example, an insurance company might be able to find out whether an applicant is HIV-positive and deny them coverage. The insurance company might have to invent an excuse for the denial, but the evidence needed to prosecute it might reside in a data broker’s offshore database, beyond the reach of regulators. Witnesses told Congress that data brokers routinely dissolve and re-form their businesses to avoid scrutiny and regulation.

Yet another story

At a university that shall remain nameless, library queries were stored in a publicly accessible database, along with a string that could identify the searcher. At that time, the university issued employees laptops whose identifiers included the employee’s name! A whistleblower who raised the issue was punished by the university, which quietly changed the laptop identifiers to remove employee names. The university never disclosed what had happened, and no one was ever warned. People in the database had searched for information about trans issues and terrorism, among other controversial topics.

A story about statistics and databases

Suppose you’re the only female full professor in a department at an unnamed university. Suppose the university has to publish information about each member of the department in a database, but anonymously. If gender is reported, then anyone can tell with certainty which record is yours. Now suppose there are two female full professors. Then someone guessing between the two matching records has a 50% chance of picking yours. There is an entire academic field of study, differential privacy, that deals with this kind of problem.
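The arithmetic behind the 100% and 50% figures is just “one over the number of matching records.” The sketch below is a hypothetical illustration, not a differential privacy mechanism: it assumes anonymized rows of (rank, gender), made-up department rosters, and an observer who knows your attributes and guesses uniformly among the rows that match them. The function name `reidentification_probability` is invented for this example.

```python
from collections import Counter


def reidentification_probability(records, attributes):
    """Chance of guessing which anonymized row belongs to a person whose
    published attributes are known, assuming a uniform guess among all
    rows that share those attributes (1 / size of the matching group)."""
    matches = Counter(records)[attributes]
    if matches == 0:
        raise ValueError("no record has these attributes")
    return 1 / matches


# Hypothetical anonymized rows: (rank, gender). Names are withheld.
one_woman = [("full", "F"), ("full", "M"), ("full", "M"), ("associate", "F")]
two_women = one_woman + [("full", "F")]

print(reidentification_probability(one_woman, ("full", "F")))  # 1.0
print(reidentification_probability(two_women, ("full", "F")))  # 0.5
```

Removing a published attribute such as gender merges groups and lowers this probability; differential privacy goes further by adding calibrated noise so that no single person’s presence noticeably changes what is published.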

Points about the stories

  • Even members of the government might misuse databases. (Can you think of a contemporary example?)
  • Data privacy was compromised even in the 1980s.
  • Although databases contain vast amounts of information about nearly everyone, it may be that only a small fraction of these databases are ever actually searched.
  • The excuse “I have nothing to hide” is weak: monitors in government and industry may make mistakes or misinterpret your actions, and your own situation might change. Can you think of more reasons why “I have nothing to hide” is a poor argument?

Your task this week

Your task is in four parts:

  1. Journal your interactions with databases for one day.
  2. Read Chapters 1 (Data as a by-product of computing) and 2 (Data surveillance) of Data and Goliath by Bruce Schneier.
  3. Journal your interactions with databases on a later day.
  4. Compare the two journals and write a short reflection.

Keep in mind that this exercise can take two or three days, so plan ahead!

END

Colophon

This slideshow was produced using Quarto.

Fonts are Roboto, Roboto Light, and Victor Mono Nerd Font