02cDataCollection

Database Management

Mick McQuaid

University of Texas at Austin

12 May 2026

What gets counted counts

An anecdote from Louisiana in the 1960s

My father’s boss, LTC Doty, a casual racist, asserted that black people (he didn’t call them that) don’t commit suicide because they are happy-go-lucky people. My father countered that they commit suicide at a rate at least as high as that of white people and offered a bet, which LTC Doty happily accepted.

My father went to the local hall of records and, to his dismay, found that records of deaths of black people were not kept. Period. No chance to prove his point. What gets counted counts, and LTC Doty could go on believing what he liked.

Reading: Data Feminism, Ch 4

In this chapter, you will learn that better records are kept of the rich and powerful, while sparser records are kept of the poor and vulnerable. Why does this matter? My father’s story is one example. Your job is to think of more.

For example, how much access to services should poor people be given in contemporary society? To answer that question, you need data!

Reading: Racial Discrimination in Facial Recognition

In this reading, you will learn about various kinds of racial discrimination that occur, sometimes intentionally, in the use of facial recognition. Think about this specific example and about the broader problem of surveillance, and try to find out how and why such things happen.

More anecdotes

  • When I was a new doctoral student, two of my international colleagues didn’t get paid on the first paycheck of the year because their names “broke” the payroll database
  • I objected to a university using my Social Security number on my student ID card and pointed out that they couldn’t legally force me to give it to them; an official gleefully told me that they would get and use my SSN as soon as I was employed as a research assistant
  • I already shared the example of the university that used people’s names in their hostnames

Your assignment: problematic data collection

  • You must find and write up an example of problematic data collection
  • Your description should be rich and detailed, including who is disadvantaged and why
  • If you use an LLM, you must include an appendix of the prompts you used; if you don’t use an LLM, you must include an attestation to that effect
  • The length should be 750–1000 words

END

Colophon

This slideshow was produced using Quarto

Fonts are Roboto, Roboto Light, and Victor Mono Nerd Font