Prompt Engineering
29 Jan 2025
Week Three
Let’s pause to look at an exemplary version of the assignment. Notice the detailed commentary and conclusion.
Another way to discover genuine principles is to evaluate LLMs. What qualities are we looking for in an LLM?
Automating evaluation of LLMs
van Schaik focuses on automatic, offline, system-level evaluation of generative AI text: methods for evaluating the quality of summaries
LLMs are the new way to evaluate LLMs!
What could possibly go wrong?
Shankar presents an example solution to the obvious problem
This is the most frequent goal of automatic evaluation
Users need criteria to grade outputs, but grading outputs helps users define criteria
Some criteria cannot be defined a priori
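To make the "what could go wrong" question concrete, here is a minimal sketch of one way to check an automatic grader against a few human grades. It is not Shankar's or van Schaik's method. The names `agreement_rate` and `toy_length_judge` are hypothetical, and the toy judge is a plain word-count rule standing in for a real LLM call.

```python
from typing import Callable, Sequence


def agreement_rate(
    judge: Callable[[str], bool],
    outputs: Sequence[str],
    human_grades: Sequence[bool],
) -> float:
    """Fraction of outputs where the automatic judge matches the human grade."""
    if len(outputs) != len(human_grades):
        raise ValueError("outputs and human_grades must have the same length")
    if not outputs:
        raise ValueError("need at least one graded output")
    matches = sum(judge(o) == g for o, g in zip(outputs, human_grades))
    return matches / len(outputs)


def toy_length_judge(summary: str) -> bool:
    # Hypothetical stand-in for an LLM grader (assumption, not a real model):
    # it passes any summary of at most 12 words.
    return len(summary.split()) <= 12


# Made-up illustrative data: two summaries, each with a human pass/fail grade.
summaries = [
    "The report finds sales rose in the third quarter.",
    "Sales went up and also many other things happened that "
    "the report describes at considerable length over several pages.",
]
human_grades = [True, False]

rate = agreement_rate(toy_length_judge, summaries, human_grades)
print(f"Judge agrees with human grades on {rate:.0%} of examples")
```

Low agreement suggests the grader, the criteria, or both need revising. Since grading tends to change the criteria, a check like this would be rerun each time the criteria are updated.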
This section is mainly of interest to system builders, a small subset of the audience here and not within course scope. I’ll only discuss it if people have questions.
END
This slideshow was produced using Quarto
Fonts are Roboto Light, Roboto Bold, and JetBrains Mono Nerd Font