exercise C
Benchmarking
Intro
This documents my attempts to benchmark a pair of prompts on several models.
Instructions
Step one
Using your best model, identify the two best prompts you can find for detecting sentiment (positive, negative, or neutral) in a tweet.
Step two
Using the dataset example_tweets.csv, found on Canvas, randomly select 20 tweets for testing. You will need to create a Python script to select the tweets and you will include the Python script in your .qmd file under Step two, along with the selected tweets.
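A minimal sketch of one way to do the selection, using only the Python standard library. It assumes that example_tweets.csv has a header row and sits in the same folder as the .qmd file. The function name sample_tweets and the seed value are illustrative choices, not part of the assignment. Fixing the seed means the same 20 tweets are drawn every time the document is rendered.

```python
import csv
import random
from pathlib import Path


def sample_tweets(path, n=20, seed=42):
    """Return n randomly chosen rows from a CSV file as a list of dicts.

    The seed is an arbitrary, hypothetical choice; any fixed integer makes
    the selection reproducible across renders.
    """
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))  # assumes the file has a header row
    if len(rows) < n:
        raise ValueError(f"need at least {n} rows, found {len(rows)}")
    # A private Random instance avoids touching the global random state.
    return random.Random(seed).sample(rows, n)


csv_path = Path("example_tweets.csv")  # assumed location of the dataset
if csv_path.exists():
    for i, row in enumerate(sample_tweets(csv_path), start=1):
        print(i, row)
```

Sampling without replacement guarantees 20 distinct rows. If the file has no header row, csv.reader could be used in place of csv.DictReader.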
Step three
Using Agenta.AI, benchmark your two prompts on the 20 tweets, using any three models. Then try the same tweets on an established sentiment analysis tool.
Step four
Document your results in your .qmd file, clearly identifying the best prompt and the best model.
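A hedged sketch of how the best prompt and best model might be identified once the benchmark outputs are collected. It assumes you have a reference label (positive, negative, or neutral) for each of the 20 tweets, for example from your own hand labeling, and that each prompt and model combination produced one label per tweet. The names accuracy, score_runs, and mean_by are hypothetical helpers, and accuracy is only one possible metric.

```python
from collections import defaultdict


def _norm(label):
    """Normalize a label so 'Positive ' and 'positive' compare equal."""
    return label.strip().lower()


def accuracy(gold, predicted):
    """Fraction of predictions that match the reference labels."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal length")
    hits = sum(_norm(g) == _norm(p) for g, p in zip(gold, predicted))
    return hits / len(gold)


def score_runs(gold, runs):
    """runs maps (prompt_name, model_name) to a list of predicted labels."""
    return {key: accuracy(gold, preds) for key, preds in runs.items()}


def mean_by(scores, index):
    """Average the scores over one key position: 0 for prompt, 1 for model."""
    grouped = defaultdict(list)
    for key, value in scores.items():
        grouped[key[index]].append(value)
    return {name: sum(vals) / len(vals) for name, vals in grouped.items()}
```

With scores = score_runs(gold, runs), the expression max(mean_by(scores, 0), key=mean_by(scores, 0).get) names the prompt with the highest average accuracy, and index 1 does the same for models. With only 20 tweets, small differences in accuracy should be read cautiously.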
Step five
You must upload your .qmd file and your rendered .html file to Canvas. All must have the basename eC. Points will be deducted for any other name. (If you upload it more than once, Canvas will change the name but don’t be concerned about that.)
You should remove this instruction section from the final document.
Step one: two best prompts
\(\langle\) replace this with your process for developing two best prompts \(\rangle\)
Step two: select 20 tweets
\(\langle\) replace this with your Python script and results \(\rangle\)
Step three: benchmark
\(\langle\) replace this with your setup including which three models you chose \(\rangle\)
Step four: results
\(\langle\) replace this with the results of the benchmarking \(\rangle\)
Conclusion
\(\langle\) replace this with your reflections \(\rangle\)
Addendum: Features of this file
Note: delete this section before you turn in the file!
- Front matter
- Includes your name
- Includes the keyword “today” which resolves to the date on which you render the document
- Includes fonts—you should install these fonts on your computer or change the font specification to fonts you already have on your computer
- Includes the format (html) to which Quarto will render
- Includes some directives that are specific to that format: toc and embed-resources
- toc causes the table of contents to be rendered, on the right side of the frame by default
- embed-resources causes any diagrams to be included in the html file itself rather than linked; that way you can just submit the html file and I can view it instead of having to submit linked files
- Headings: top level headings are preceded by a # and a space; second level headings are preceded by ## and a space; you can go down several levels by increasing the number of # symbols
- Bulleted lists, formed by preceding the list with a blank line (or a heading) and beginning each line with a dash and a space (both are important)
- LaTeX symbols, in this case \(\langle\) and \(\rangle\), which resolve to angle brackets when you render the document. You can include any LaTeX math expressions between dollar signs or double dollar signs. By the way, any dollar signs meant as real dollar signs should be preceded by a backslash, like \$ this, so Quarto doesn’t get confused about whether you are starting an equation
- Programmatic keywords, preceded and followed by a backtick, in this case the name of the eB.bib bibliography file; this causes the keyword to be rendered in a code font
- Emphasis, by surrounding an important word with asterisks, causing it to be rendered in italics
Of course, you will delete all the instructions and comments in this file before you turn it in! I don’t need to read them when I read your solution. The files you turn in (the qmd and the rendered html) will just include your work. These instructions and comments are just to help you get going.