Segue
benchmark

/ˈbentʃmɑːrk/

📏 Evaluation & Observability

a standardized test or dataset used to compare the performance of different models or systems under the same conditions

benchmark in a sentence

“MMLU is a popular benchmark for general knowledge.”

Origin of benchmark

From surveying: a mark cut into stone by a surveyor as a fixed reference point for measuring elevation

Related Words

evals

systematic tests to measure model performance on specific tasks

LLM-as-a-Judge

using a strong LLM to evaluate the outputs of another model

ground truth

the verified correct answer or label against which a model's output is compared

tracing

recording the flow of execution and data through a complex system

hallucination rate

the proportion of a model's outputs that contain fabricated or factually incorrect information
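
In practice, several of the terms above reduce to simple ratios. The sketch below is a minimal illustration and not a standard implementation: it scores two models against the same ground truth, as a benchmark does, and computes a hallucination rate from per-output flags. The function names, the toy questions, and the exact-match scoring rule are all assumptions made for this example; real benchmarks often use more forgiving scoring, and the flags would typically come from a human reviewer or an LLM-as-a-Judge.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], ground_truth: Sequence[str]) -> float:
    """Fraction of predictions that match the reference answers.

    Matching is case-insensitive exact match, an assumption for this sketch.
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not ground_truth:
        raise ValueError("the benchmark must contain at least one item")
    correct = sum(
        p.strip().lower() == g.strip().lower()
        for p, g in zip(predictions, ground_truth)
    )
    return correct / len(ground_truth)


def hallucination_rate(flags: Sequence[bool]) -> float:
    """Fraction of outputs flagged as containing incorrect information."""
    if not flags:
        raise ValueError("need at least one flagged output")
    return sum(flags) / len(flags)


# Hypothetical toy benchmark: every model is scored on the same fixed answers.
ground_truth = ["paris", "4", "blue"]
model_a = ["Paris", "4", "green"]
model_b = ["Paris", "5", "green"]

scores = {
    "model_a": accuracy(model_a, ground_truth),
    "model_b": accuracy(model_b, ground_truth),
}

# Hypothetical flags: True means a reviewer marked that output as incorrect.
rate = hallucination_rate([False, True, False, False])
```

Because both models answer the same questions and are scored by the same rule, their accuracies are directly comparable, which is the point of a benchmark.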
