09cGenAIandSQL
Database Management
Mick McQuaid
University of Texas at Austin
10 May 2026
GenAI is changing all the time
- It is difficult to give general advice about genAI since the pace of development is so fast
- Development of genAI is also affected by business, political, environmental, and other factors
- In May of 2026, most genAI tools rely on the latest version of one of three large language model families: claude, chatgpt, or gemini
- Frequently, genAI tools are cheaper if they use variants from these families
- Some people run large language models locally, i.e., on their own machines
- To be useful, local models need a lot of memory but have the advantages of privacy and decreased environmental concerns
LLMs are quite good at SQL
- Because SQL is a slowly changing standard, the training data for the frontier models (the three families mentioned previously) is adequate for generic tasks
- Because SQL is mostly declarative, the models don’t get bogged down in procedural details (usually)
- Because you can specify your database engine, the models don’t give you incompatible statements (usually)
LLM problems
- Notice that I used the terms generic and usually in the previous frame
- LLMs are still amalgams of many experiences reported on the web
- LLMs can artificially splice these together
- LLMs can miss the context of their source material
- For these and other reasons, LLMs may hallucinate incorrect answers
- It is critical that you monitor LLMs
Monitoring LLMs
- When a model offers you a snippet of SQL, inspect it for obvious problems
- First, does it CREATE or DROP or ALTER anything?
- Second, does it escape to the operating system and use redirection operators like
> or |?
- Does it refer to columns that don’t exist?
- Have you been generating a lot of prompts so that it may drift from the original goal?
Writing for LLMs
- There’s a popular cartoon where the characters discover that the best way to specify what they want is called “code”
- The closer you can get to code, the less room there is for an LLM to misunderstand you
- You should never write your prompts directly into the LLM but instead use a text editor and copy / paste
- When the query doesn’t work, you should paste it as a new query in your text file and try again so you have an audit trail of queries that sometimes worked and sometimes didn’t
- Try to avoid indefinite references
- Try to write short sentences. Break up long sentences
- Be more explicit than you usually need to. Write long prompts
Debugging LLMs
- You can tell the LLM to write automated tests
- You can tell the LLM to write documentation
- But then you have to use those!
- The same is true of documentation—if you don’t know how to write it, how can you evaluate it?
- Interestingly, sqlite3 is one of the most heavily tested software products in existence—you might want to look more closely at it and its tests
Colophon
This slideshow was produced using quarto
Fonts are Roboto, Roboto Light, and Victor Mono Nerd Font