
LLM-as-a-Judge
/ˌeɫ eɫ ˈem æz ə ˈdʒʌdʒ/
using a strong LLM to evaluate the outputs of another model
LLM-as-a-Judge in a sentence
“LLM-as-a-Judge scales evaluation better than human review.”
Origin of LLM-as-a-Judge
Industry term (Zheng et al., 2023)
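LLM-as-a-Judge in code
A minimal sketch of the definition above, not a reference implementation: one model's answer is wrapped in a grading prompt, sent to a judge model, and the numeric rating is parsed from the reply. The prompt wording, the 1 to 10 scale, and the names build_judge_prompt, parse_rating, judge, and fake_model are all assumptions made for illustration; call_model stands in for whatever client actually reaches the judge model.

```python
import re
from typing import Callable, Optional

# Hypothetical grading prompt; the wording and the 1-10 scale are assumptions.
JUDGE_TEMPLATE = (
    "You are grading an answer to a question.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Rate the answer from 1 (poor) to 10 (excellent).\n"
    "Reply in the form 'Rating: <number>'."
)


def build_judge_prompt(question: str, answer: str) -> str:
    """Fill the grading template with the item under evaluation."""
    return JUDGE_TEMPLATE.format(question=question, answer=answer)


def parse_rating(reply: str) -> Optional[int]:
    """Extract the rating from the judge's reply; None if missing or out of range."""
    match = re.search(r"Rating:\s*(\d+)", reply)
    if match is None:
        return None
    rating = int(match.group(1))
    return rating if 1 <= rating <= 10 else None


def judge(question: str, answer: str, call_model: Callable[[str], str]) -> Optional[int]:
    """Ask the judge model (any callable from prompt to reply) to rate an answer."""
    return parse_rating(call_model(build_judge_prompt(question, answer)))


def fake_model(prompt: str) -> str:
    """Stand-in for a real judge model so the sketch runs offline."""
    return "Rating: 8"


if __name__ == "__main__":
    print(judge("What is 2 + 2?", "4", fake_model))
```

Passing the judge in as a plain callable keeps the sketch independent of any particular model provider; unparseable or out-of-range replies come back as None so they can be counted rather than silently scored.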
Related Words
ground truth
the reference answer or label accepted as correct and used for comparison
tracing
recording the flow of execution and data through a complex system
hallucination rate
the proportion of a model's outputs that contain fabricated or unsupported information
benchmark
a standardized test used to compare the performance of different models
evals
systematic tests to measure model performance on specific tasks