Prompting Techniques

Prompt Engineering

Mick McQuaid

University of Texas at Austin

23 Sep 2025

Week Five

Agenda

  • Presentations: Xiaoqi, Zhizhou
  • News
  • Review whatiknow (Dhruvi)
  • eB review
  • eC preview
  • m1 questions
  • Finish previous chatbot
  • Introduce Google AI Studio
  • Techniques

Presentations

News

s1

  • From simplescaling
  • Trained on 1,000 examples
  • Each example is ⟨a question⟩ ⟨an answer⟩ ⟨a reasoning process⟩
  • First tried 59,000 examples

The Batch and Data Points

⟨pause to look at last week’s edition⟩

WhatIKnow (Dhruvi)

⟨pause to discuss contributions⟩

Writing and formatting the doc

  1. Write about topics that excite you. If it greatly interests you, it’s more likely to greatly interest others in the class.
  2. Sign your contributions at the end of the contribution. You can either write your name or use a smart chip with your Google identity.
  3. Add a horizontal rule before and after your contribution. (Only after if there is already one before!)
  4. Use headings and subheadings in your contribution.
  5. Use links in your contribution. Use Links from the Insert menu.
  6. Consolidate tabs: there is no reason to have three tabs for Evaluation. Either delete the second and third tabs (especially the material about articles that were assigned readings) or move their contents into the first tab. Move the content from the Prompt Optimization tab into the Prompt Techniques tab, which already has a contribution about Prompt Optimization anyway.

eB review

Smart strategies

  • adjust the temperature (needs API)
  • adjust the max_tokens (needs API)
  • do it piecemeal
  • use more than one LLM
  • think about the goal more than constraints
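The first two strategies above are API-only knobs. A minimal sketch of where they live, assuming an OpenAI-style Chat Completions request (the parameter names and model name here are illustrative, not prescriptive; adapt them to whichever provider you use):

```python
# Sketch of the two API-only knobs from the list above, expressed as an
# OpenAI-style Chat Completions payload. Model name and prompt content
# are hypothetical placeholders.
request = {
    "model": "gpt-4o-mini",  # placeholder model choice
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the Titanic dataset."},
    ],
    "temperature": 0.2,   # lower = more deterministic output
    "max_tokens": 150,    # hard cap on the length of the reply
}

# With the official SDK this payload would be sent as, e.g.:
#   client.chat.completions.create(**request)
print(request["temperature"], request["max_tokens"])
```

Temperature trades determinism against variety; max_tokens limits cost and reply length but can truncate answers mid-sentence if set too low.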

eC preview

⟨look at the doc⟩

m1 questions

Project levels

There are two groups of students in the class: novices (you’ve just met the requirement of one Python course) and advanced students (you have significant development experience).

It is vital that the course meets the needs of both groups, which is not easy.

To achieve this, one necessity is to have varying levels of projects. Broadly there are two levels, novice and advanced. Both are eligible for an A grade. You should work comfortably at your limit, not shirking but not trying to be heroic.

You should deploy something you can put in your portfolio. It can be anything from a simple chatbot like the one we created using chatlas to an agentic app built with, say, Google’s ADK. Include extensive use of prompting and make it possible to evaluate and compare prompts.

Deliverable

  • A short qmd / html document describing the domain
  • The doc should include specification of model(s)
  • The doc should include a discussion of the possible tools and/or datasets you may use (actual tools and/or datasets are due in m2)
  • You are not required to stick with what you describe here, but it should be your current best guess of what you plan to do
  • Examples: chatbot to emulate a foreign leader; chatbot to triage banking problems; chatbot to analyze tweets; note that you can do other genAI tasks that require prompt engineering, such as image generation, as long as a conversation is involved
  • Note that I’m expecting to see two levels of projects: novice (you’ve just met the requirement of one Python course) or advanced (you have significant development experience)

Simple chatbot revisited

We’ll create a chatbot about the famous Titanic dataset.

pip install shiny
shiny create --template querychat --github posit-dev/py-shiny-templates/gen-ai

Notes

The requirements.txt file contains some spurious code. Delete the part about the python-package.

The app.py file is missing the load_dotenv() code. Add it after line 5.

from dotenv import load_dotenv
load_dotenv()

Otherwise, follow the onscreen instructions.

Google AI Studio Intro

We’ll start by doing the same thing in this environment that we did with chatlas: create a chatbot that offers expense policy advice.

Techniques

According to Schulhoff et al. (2024)

tree of techniques

Important Note

There is no substitute for reading Schulhoff et al. (2024)! I’m just listing the main concepts here. I’ll ask you to pick one and explain it in your own words.

Top level

  • Zero-Shot
  • Few-Shot
  • Thought Generation
  • Ensembling
  • Self-Criticism
  • Decomposition

Few-Shot Design Decisions

  • Exemplar Quantity: as many as possible
  • Exemplar Ordering: randomly order them
  • Exemplar Label Distribution: balance the distribution
  • Exemplar Label Quality: ensure correct labeling
  • Exemplar Format: use a common format
  • Exemplar Similarity: select similar examples to the test instance
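The design decisions above can be sketched as a small prompt builder. A minimal illustration (the exemplars, format, and function name are all hypothetical toy choices, not from Schulhoff et al.):

```python
import random

def build_few_shot_prompt(exemplars, query, seed=0):
    """Assemble a few-shot prompt following the design decisions above:
    randomly order the exemplars and render each one in a common format,
    leaving the final label blank for the model to complete."""
    exemplars = list(exemplars)
    random.Random(seed).shuffle(exemplars)               # Exemplar Ordering
    lines = [f"Input: {x}\nLabel: {y}" for x, y in exemplars]  # common format
    lines.append(f"Input: {query}\nLabel:")              # test instance
    return "\n\n".join(lines)

# Toy sentiment exemplars with a balanced label distribution
exemplars = [
    ("I loved this movie", "positive"),
    ("Terrible acting", "negative"),
    ("A delightful surprise", "positive"),
    ("Waste of time", "negative"),
]
prompt = build_few_shot_prompt(exemplars, "What a great film")
print(prompt)
```

Note how the balanced labels and the shared `Input:`/`Label:` format map directly onto the Label Distribution and Exemplar Format decisions in the list.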

Few-Shot Techniques

  • these are more difficult to implement than basic few-shot prompting
  • K-Nearest Neighbors
  • Vote-K
  • Self-Generated In-Context Learning
  • Prompt Mining
  • more complicated techniques use iterative filtering, embedding and retrieval, and reinforcement learning

Zero-Shot Techniques

  • use no exemplars
  • Role Prompting
  • Style Prompting
  • Emotion Prompting
  • System 2 Attention
  • SimToM
  • Rephrase and Respond
  • Re-reading
  • Self-Ask
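As a concrete instance of the family above, Role Prompting needs no exemplars at all, only a persona prepended to the task. A minimal sketch (the role and task strings are hypothetical examples):

```python
def role_prompt(role, task):
    """Zero-shot Role Prompting: no exemplars, just a persona plus the task."""
    return f"You are {role}. {task}"

p = role_prompt(
    "an experienced travel-expense auditor",
    "Explain whether a $95 team dinner is reimbursable under a $75 per-person policy.",
)
print(p)
```

Style Prompting and Emotion Prompting work the same way: a zero-shot prefix that steers tone or stakes rather than persona.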

Thought Generation

  • prompting the model to articulate its ongoing reasoning
  • Chain-of-Thought
  • Zero-Shot Chain-of-Thought
  • Step-Back Prompting
  • Analogical Prompting
  • Thread-of-Thought Prompting
  • Tabular Chain-of-Thought
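Zero-Shot Chain-of-Thought from the list above is the simplest to show: append the well-known reasoning trigger phrase so the model articulates its reasoning before answering. A minimal sketch (the question is a toy example):

```python
def zero_shot_cot(question):
    """Zero-Shot Chain-of-Thought: append the classic trigger phrase so the
    model spells out its reasoning before giving a final answer."""
    return f"Q: {question}\nA: Let's think step by step."

prompt = zero_shot_cot("If I have 3 apples and eat one, how many remain?")
print(prompt)
```

Few-shot CoT (next slide) instead supplies worked examples whose answers themselves contain reasoning chains.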

Few-Shot CoT

  • multiple examples, including chains-of-thought
  • Contrastive CoT Prompting
  • Uncertainty-Routed CoT Prompting
  • Complexity-based Prompting
  • Active Prompting
  • Memory-of-Thought Prompting
  • Automatic Chain-of-Thought Prompting

Decomposition

  • explicitly decomposing the problem into subproblems
  • Least-to-Most Prompting
  • Plan-and-Solve Prompting
  • Tree-of-Thought Prompting
  • Recursion-of-Thought Prompting
  • Program-of-Thoughts
  • Faithful Chain-of-Thought
  • Skeleton-of-Thought
  • Metacognitive Prompting
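The decomposition idea can be sketched as a Least-to-Most loop: ask the model to split the question into subquestions, solve them in order while accumulating context, then answer the original question. This is a structural sketch only; `ask_model` stands in for any LLM call, and the stub below exists purely so the code runs:

```python
def least_to_most(question, ask_model):
    """Least-to-Most Prompting sketch: decompose the question into
    subquestions, solve them in order (feeding earlier answers into later
    prompts), then answer the original question with that context."""
    subproblems = ask_model(
        f"Break this question into simpler subquestions, one per line:\n{question}"
    ).splitlines()
    context = ""
    for sub in subproblems:
        answer = ask_model(f"{context}Q: {sub}\nA:")
        context += f"Q: {sub}\nA: {answer}\n"
    return ask_model(f"{context}Q: {question}\nA:")

# Stub model for illustration only: returns canned replies
def stub(prompt):
    if prompt.startswith("Break"):
        return "How many apples?\nHow many oranges?"
    return "stub answer"

result = least_to_most("How many pieces of fruit?", stub)
print(result)
```

Plan-and-Solve is similar but asks for a plan in one prompt rather than solving subquestions one at a time.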

Ensembling

  • using multiple prompts to solve the same problem, then aggregating the results, for example, by majority vote
  • Demonstration Ensembling
  • Mixture of Reasoning Experts
  • Max Mutual Information Method
  • Self-Consistency
  • Universal Self-Consistency
  • Meta-Reasoning over Multiple CoTs
  • DiVeRSe
  • Consistency-based Self-Adaptive Prompting
  • Universal Self-Adaptive Prompting
  • Prompt Paraphrasing
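Self-Consistency from the list above has the simplest aggregation step: sample several reasoning chains at a nonzero temperature, extract each final answer, and take a majority vote. A minimal sketch of just the vote (the sampled answers are toy data):

```python
from collections import Counter

def self_consistency(final_answers):
    """Self-Consistency aggregation: majority vote over the final answers
    extracted from several independently sampled reasoning chains."""
    return Counter(final_answers).most_common(1)[0][0]

# Final answers extracted from five sampled chains (toy data)
answers = ["42", "42", "41", "42", "40"]
print(self_consistency(answers))
```

The other ensembling methods listed vary what is sampled (prompts, experts, paraphrases) and how votes are weighted, but share this aggregate-over-many-runs shape.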

Self-Criticism

  • prompting the model to critique its own output
  • Self-Calibration
  • Self-Refine
  • Reversing Chain-of-Thought
  • Self-Verification
  • Chain-of-Verification
  • Cumulative Reasoning
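Self-Refine is the most direct illustration of this family: draft, critique, revise, repeat. A structural sketch only; `ask_model` stands in for any LLM call, and the stub exists purely so the loop runs:

```python
def self_refine(task, ask_model, rounds=2):
    """Self-Refine sketch: produce a draft, ask the model to critique it,
    then ask for a revision that incorporates the critique; repeat."""
    draft = ask_model(f"Answer: {task}")
    for _ in range(rounds):
        critique = ask_model(f"Critique this answer:\n{draft}")
        draft = ask_model(
            f"Revise the answer using the critique.\n"
            f"Answer:\n{draft}\nCritique:\n{critique}"
        )
    return draft

# Stub model for illustration only: numbers its replies
calls = []
def stub(prompt):
    calls.append(prompt)
    return f"reply {len(calls)}"

final = self_refine("Explain the F1 score.", stub, rounds=1)
print(final)
```

One round costs three model calls (draft, critique, revision), so self-criticism techniques trade extra latency and tokens for quality.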

Exercise

Run the two readings, Sahoo et al. (2024) and Schulhoff et al. (2024), through NotebookLM. Ask it to summarize them and then ask about the discrepancy between the F1 scores in Schulhoff’s case study. (Manual had high precision and low recall, while automated had the reverse.)

Then consider the diagram on the following screen, showing graphically the definitions of precision and recall (F1 is the harmonic mean of these two statistics). Comment on your view of how NotebookLM has described the difference between the F1 scores.
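Since F1 is the harmonic mean of precision and recall, two systems with opposite precision/recall profiles can land on the same F1, which is exactly the kind of discrepancy the exercise probes. A quick check with toy numbers (these are illustrative, not the paper's figures):

```python
def f1(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Toy numbers, not from the paper: high precision / low recall and the
# reverse yield the same F1, so F1 alone can hide the manual-vs-automated
# difference described in the case study.
print(f1(0.9, 0.3))   # high precision, low recall
print(f1(0.3, 0.9))   # low precision, high recall, same F1
```

This is why the exercise asks you to judge whether NotebookLM explains the *source* of the discrepancy rather than just comparing the scores.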

END

References

Sahoo, Pranab, Ayush Kumar Singh, Sriparna Saha, Vinija Jain, Samrat Mondal, and Aman Chadha. 2024. “A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications.” https://arxiv.org/abs/2402.07927.
Schulhoff, Sander, Michael Ilie, Nishant Balepur, Konstantine Kahadze, Amanda Liu, Chenglei Si, Yinheng Li, et al. 2024. “The Prompt Report: A Systematic Survey of Prompting Techniques.” https://arxiv.org/abs/2406.06608.
Wilkinson, Leland. 2005. The Grammar of Graphics (Statistics and Computing). Secaucus, NJ, USA: Springer-Verlag.

Colophon

This slideshow was produced using Quarto.

Fonts are Roboto Light, Roboto Bold, and Victor Mono Nerd Font.