
benchmark
/ˈbentʃmɑːrk/
a standardized test or dataset used to compare the performance of different systems or models
benchmark in a sentence
“MMLU is a popular benchmark for general knowledge.”
Origin of benchmark
From surveying: a mark cut into stone or another fixed object to serve as a reference point for measuring elevation
Related Words
evals
short for evaluations; systematic tests that measure model performance on specific tasks
LLM-as-a-Judge
using a strong LLM to evaluate the outputs of another model
ground truth
the reference answer or label accepted as correct, against which a model's output is compared
tracing
recording the flow of execution and data through a complex system
hallucination rate
the proportion of a model's outputs that contain fabricated or unsupported information presented as fact
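
Example
Several of the terms above fit together in a very small eval: a set of questions, a ground-truth answer for each, and a score. The Python sketch below is illustrative only. The names `exact_match_accuracy`, `toy_model`, and `EXAMPLES` are hypothetical, the data is made up, and the "model" is a lookup table standing in for a real one. Exact-match scoring is one simple choice among many; real benchmarks often use more forgiving metrics.

```python
from typing import Callable


def exact_match_accuracy(
    model: Callable[[str], str],
    examples: list[tuple[str, str]],
) -> float:
    """Return the fraction of examples where the model's answer equals
    the ground-truth answer, ignoring case and surrounding whitespace."""
    if not examples:
        raise ValueError("examples must not be empty")
    correct = sum(
        model(question).strip().lower() == answer.strip().lower()
        for question, answer in examples
    )
    return correct / len(examples)


# Hypothetical toy benchmark: (question, ground-truth answer) pairs.
EXAMPLES = [
    ("What is the capital of France?", "Paris"),
    ("How many legs does a spider have?", "8"),
    ("What is the chemical symbol for gold?", "Au"),
    ("What is 2 + 2?", "4"),
]


def toy_model(question: str) -> str:
    """Stand-in for a real model: canned answers, one deliberately wrong."""
    canned = {
        "What is the capital of France?": "paris",
        "How many legs does a spider have?": "6",  # wrong on purpose
        "What is the chemical symbol for gold?": "Au",
        "What is 2 + 2?": "4",
    }
    return canned.get(question, "")


accuracy = exact_match_accuracy(toy_model, EXAMPLES)
error_rate = 1 - accuracy
print(f"accuracy={accuracy:.2f} error_rate={error_rate:.2f}")
```

The error rate computed here counts every wrong answer. A hallucination rate would normally need a narrower check, such as a human or LLM-as-a-Judge review of whether each wrong answer was fabricated.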