Prompt Engineering
29 Jan 2025
Week Three
Let’s pause to look at an exemplary version of the assignment. Notice the detailed commentary and conclusion.
Another way to discover genuine principles is to evaluate LLMs. What qualities are we looking for in an LLM?
Automating evaluation of LLMs
van Schaik focuses on automatic, offline, system-level evaluation of generative AI text: methods for evaluating the quality of summaries
LLMs are the new way to evaluate LLMs!
What could possibly go wrong?
Shankar presents an example solution to the obvious problem
This is the most frequent goal of automatic evaluation
Users need criteria to grade outputs, but grading outputs helps users define criteria
Some criteria cannot be defined a priori
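One way to make the loop between grading and criteria concrete is to compare an automated grader's verdicts with human grades on the same outputs, then read the disagreements to see where a criterion may need revising. What follows is only a minimal sketch, assuming binary pass/fail grades. The function names and sample data are hypothetical illustrations, not taken from Shankar or van Schaik.

```python
def agreement_rate(human_grades, judge_grades):
    """Fraction of outputs where the automated judge matches the human grade."""
    if len(human_grades) != len(judge_grades):
        raise ValueError("both grade lists must cover the same outputs")
    if not human_grades:
        raise ValueError("need at least one graded output")
    matches = sum(h == j for h, j in zip(human_grades, judge_grades))
    return matches / len(human_grades)


def disagreements(outputs, human_grades, judge_grades):
    """Outputs where human and judge differ; candidates for revisiting a criterion."""
    return [o for o, h, j in zip(outputs, human_grades, judge_grades) if h != j]


if __name__ == "__main__":
    # Hypothetical sample data: True means "pass", False means "fail".
    outputs = ["summary A", "summary B", "summary C", "summary D"]
    human = [True, False, True, True]
    judge = [True, True, True, False]  # stand-in for an LLM grader's verdicts

    print(agreement_rate(human, judge))
    print(disagreements(outputs, human, judge))
```

A low agreement rate does not say which side is wrong. Reading the disagreeing outputs is the step where a grader might refine or add a criterion.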