Prompt Engineering
24 Nov 2025
Inspect (AI Security Institute 2024) is an open-source Python framework for evaluating LLMs.
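Install the package with pip install inspect-ai, but call it from the command line as inspect.
Create a directory named welcome and copy the file theory.py into it.
Set a default model, then run the task:
export INSPECT_EVAL_MODEL='anthropic/claude-sonnet-4-0'
inspect eval theory.py
To use a different model for a single run, override the default:
inspect eval theory.py --model openai/gpt-4
Results are written to the logs directory. Browse them with
inspect view
which serves a log viewer at http://127.0.0.1:7575
theory.py returns a task composed of three parts: a dataset, a solver, and a scorer.
Datasets can come from Hugging Face or from CSV, JSON, or JSONL files.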
At its simplest, a dataset is a table of input / target pairs.
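A minimal sketch of that idea using only the Python standard library: the same two input / target pairs, stored once as CSV and once as JSONL, load into identical records. The Sample dataclass, the column names input and target, and the load_csv / load_jsonl helpers are illustrative assumptions, not Inspect's own loaders (Inspect ships its own readers, such as csv_dataset() and json_dataset(), and its own Sample type).

```python
import csv
import io
import json
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical stand-in for one dataset row: an input and its target."""
    input: str
    target: str


def load_csv(text: str) -> list[Sample]:
    # Assumes a header row with 'input' and 'target' columns.
    reader = csv.DictReader(io.StringIO(text))
    return [Sample(row["input"], row["target"]) for row in reader]


def load_jsonl(text: str) -> list[Sample]:
    # Assumes one JSON object per line, each with 'input' and 'target' keys.
    samples = []
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            record = json.loads(line)
            samples.append(Sample(record["input"], record["target"]))
    return samples


csv_text = "input,target\nWhat is 2 + 2?,4\nCapital of France?,Paris\n"
jsonl_text = (
    '{"input": "What is 2 + 2?", "target": "4"}\n'
    '{"input": "Capital of France?", "target": "Paris"}\n'
)

print(load_csv(csv_text) == load_jsonl(jsonl_text))  # True
```

Whatever the storage format, each row reduces to the same two fields, which is why a task can swap one file format for another without touching its solver or scorer.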
See the solvers page of the documentation on the home page.
The theory.py example uses three solvers:
generate(), which sends the prompt to the model and returns its output
chain_of_thought(), which encourages the model to work step by step
self_critique(), which prompts the model to critique the result of the previous call to generate()
The scorer is f1, which computes the harmonic mean of precision and recall (see next frame).
Costs of the example runs:
welcome task cost 2.12 USD and ran for 11 minutes
hello task cost 0.00 USD
security_guide task cost 0.11 USD
hellaswag task cost 0.05 USD (I limited it to 50 samples)
gsm8k task cost 0.99 USD and ran for 7 minutes (I limited it to 100 samples)
mathematics task cost 1.39 USD and ran for 9 minutes (I interrupted it after 110 samples)
This slideshow was produced using Quarto.
Fonts are Roboto, Roboto Light, and Victor Mono Nerd Font
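The three solvers in theory.py can be pictured as a pipeline that threads one state through each step. What follows is a toy sketch in plain Python, not Inspect's solver API (Inspect solvers are async functions that operate on its own task state); the State class, the run() helper, the prompt wording, and the echo_model stub are all hypothetical. It places chain_of_thought() before generate(), on the assumption that a prompt-rewriting step only has an effect if it runs before the model is called.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical model: any function from prompt text to completion text.
Model = Callable[[str], str]


@dataclass
class State:
    """Toy conversation state threaded through each solver step."""
    prompt: str
    history: list[str] = field(default_factory=list)
    output: str = ""


# A solver takes the state and a model and returns the (updated) state.
Solver = Callable[[State, Model], State]


def chain_of_thought(state: State, model: Model) -> State:
    # Rewrites the prompt to ask for step-by-step reasoning; no model call.
    state.prompt += "\n\nThink through the problem step by step before answering."
    return state


def generate(state: State, model: Model) -> State:
    # Sends the current prompt to the model and records the completion.
    state.output = model(state.prompt)
    state.history.append(state.output)
    return state


def self_critique(state: State, model: Model) -> State:
    # Asks the model to critique the previous output, then to revise it.
    critique = model(f"Critique this answer:\n{state.output}")
    state.output = model(
        f"Question:\n{state.prompt}\n\nAnswer:\n{state.output}\n\n"
        f"Critique:\n{critique}\n\nWrite an improved answer."
    )
    state.history.extend([critique, state.output])
    return state


def run(solvers: list[Solver], state: State, model: Model) -> State:
    # Apply each solver in order, passing the state along the chain.
    for solver in solvers:
        state = solver(state, model)
    return state


def echo_model(prompt: str) -> str:
    # Stub model so the sketch runs offline: it just reports the prompt length.
    return f"reply to {len(prompt)} chars"


final = run([chain_of_thought, generate, self_critique], State("What is 6 * 7?"), echo_model)
print(len(final.history))  # 3
```

The history holds three model replies: one from generate() and two from self_critique() (the critique and the revised answer), which is one reason self-critique runs cost more than a single generate() call.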
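The f1 scorer computes F1 = 2PR / (P + R), the harmonic mean of precision P and recall R. A minimal sketch, assuming the token-overlap variant common in question answering (lowercased, whitespace-split tokens); the function name f1_score is hypothetical, and Inspect's own f1 scorer may normalize text differently.

```python
from collections import Counter


def f1_score(prediction: str, target: str) -> float:
    """Token-overlap F1: the harmonic mean of precision and recall."""
    pred_tokens = prediction.lower().split()
    target_tokens = target.lower().split()
    # Count tokens shared by both strings, respecting repeated tokens.
    overlap = sum((Counter(pred_tokens) & Counter(target_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)  # share of predicted tokens that match
    recall = overlap / len(target_tokens)   # share of target tokens that were found
    return 2 * precision * recall / (precision + recall)


print(f1_score("Paris is the capital", "Paris"))  # 0.4
```

Because the harmonic mean is pulled toward the smaller of the two values, a verbose answer that contains the target still scores low: here precision is 1/4 and recall is 1, giving 0.4 rather than the arithmetic mean of 0.625.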