This document records the dataset we chose for our project.

Data card information

Make sure you include at least the following:

  • URL
  • source (the original source, not the repository)
  • repository (e.g., HuggingFace, Kaggle, etc.)
  • task you intend to use it for (e.g., question answering, summarization, etc.)
  • size
  • structure (e.g., train, test split)
  • other information, depending on the dataset’s documentation
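As a minimal sketch, the checklist above could be kept as a small Python dictionary and checked for gaps before you submit. The field names, the `missing_fields` helper, and every value in `example_card` are hypothetical placeholders, not part of any required format.

```python
# Required data card fields, taken from the checklist above.
# The key names are an assumption; rename them to suit your project.
REQUIRED_FIELDS = ["url", "source", "repository", "task", "size", "structure"]


def missing_fields(card):
    """Return the required fields that are absent or blank in `card`."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = card.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Placeholder values only; replace them with your dataset's details.
example_card = {
    "url": "https://example.com/my-dataset",
    "source": "Example Organization",
    "repository": "HuggingFace",
    "task": "summarization",
    "size": "",  # left blank on purpose to show the check
    "structure": "train/test split",
}

print(missing_fields(example_card))
```

Any field the check reports still needs to be filled in from the dataset's documentation.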

Data dictionary

Here’s an example of a table:

Here’s the table caption. It, too, may span multiple lines.
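Centered Header   Default Aligned   Right Aligned   Left Aligned
---------------   ---------------   -------------   -----------------------------------------------------
First             row                        12.0   Example of a row that spans multiple lines.

Second            row                         5.0   Here’s another one. Note the blank line between rows.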

Warning: if you use the Visual editor in RStudio, it will mangle the above table.

The data dictionary should be laid out as a table. For each column of the dataset, it should include:

  • column name
  • description
  • type
  • units
  • example
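One way to get started is to generate a skeleton of the data dictionary from the data itself and then fill in the descriptions and units by hand. The sketch below assumes the dataset can be read as CSV. The helper names (`infer_type`, `data_dictionary`) and the sample columns are made up for illustration, and the type is guessed from a single example value, so check it against the dataset's documentation.

```python
import csv
import io


def infer_type(value):
    """Guess a simple type name for a value read from CSV (always a string)."""
    for caster, name in ((int, "integer"), (float, "float")):
        try:
            caster(value)
            return name
        except ValueError:
            continue
    return "string"


def data_dictionary(rows):
    """Build one entry per column from a non-empty list of row dicts."""
    entries = []
    for column in rows[0].keys():
        # Use the first non-empty value in the column as the example.
        example = next((r[column] for r in rows if r[column] != ""), "")
        entries.append({
            "column name": column,
            "description": "",  # fill in by hand
            "type": infer_type(example) if example != "" else "",
            "units": "",        # fill in by hand
            "example": example,
        })
    return entries


# Made-up sample data; in practice, open your own CSV file instead.
sample_csv = "id,title,length_words\n1,First article,120\n2,Second article,85\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))

for entry in data_dictionary(rows):
    print(entry)
```

Each printed entry corresponds to one row of the data dictionary table.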

