
evals
/ɪˈvælz/
systematic tests to measure model performance on specific tasks
evals in a sentence
“Running evals after every prompt change helps catch regressions before they ship.”
Origin of evals
Short for evaluations; from French évaluer, “to find the value of”
Related Words
LLM-as-a-Judge
using a strong LLM to evaluate the outputs of another model
ground truth
the reference answer or label treated as correct when scoring a model's output
tracing
recording the flow of execution and data through a complex system
hallucination rate
the frequency with which a model generates fabricated or unsupported information presented as fact
benchmark
a standardized test or dataset used to compare the performance of different models
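
evals in code
A minimal sketch of how evals, ground truth, and a regression check might fit together, assuming exact-match scoring. The eval set, the stand-in models (model_v1, model_v2), and the helper names (run_eval, has_regressed) are all hypothetical and not taken from any particular library.

```python
from typing import Callable, List, Tuple

# Hypothetical eval set: (prompt, ground-truth answer) pairs.
EVAL_SET: List[Tuple[str, str]] = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
    ("3 * 3", "9"),
]


def run_eval(model: Callable[[str], str], eval_set: List[Tuple[str, str]]) -> float:
    """Return the exact-match accuracy of `model` against the ground truth."""
    correct = sum(1 for prompt, truth in eval_set if model(prompt).strip() == truth)
    return correct / len(eval_set)


def has_regressed(baseline: float, candidate: float, tolerance: float = 0.0) -> bool:
    """Flag a regression when the candidate scores below the baseline."""
    return candidate < baseline - tolerance


# Stand-in "models": lookup tables used in place of a real LLM call (assumption).
def model_v1(prompt: str) -> str:
    return {"2 + 2": "4", "capital of France": "Paris", "3 * 3": "9"}.get(prompt, "")


def model_v2(prompt: str) -> str:
    # Simulates a prompt change that breaks one answer.
    return {"2 + 2": "4", "capital of France": "Lyon", "3 * 3": "9"}.get(prompt, "")


baseline = run_eval(model_v1, EVAL_SET)
candidate = run_eval(model_v2, EVAL_SET)

if has_regressed(baseline, candidate):
    print(f"Regression: accuracy fell from {baseline:.2f} to {candidate:.2f}")
```

Exact match is only one possible scoring rule. For free-form outputs, the comparison inside run_eval could be swapped for an LLM-as-a-Judge call or another metric.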