LLM-as-a-Judge

/ˌeɫ eɫ ˈem æz ə ˈdʒʌdʒ/

using a strong LLM to evaluate the outputs of another model

LLM-as-a-Judge in a sentence

“LLM-as-a-Judge scales evaluation better than human review.”

Industry term (Zheng et al., 2023)
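
A minimal sketch of the idea, in Python. The prompt wording, the 1 to 10 scale, and the names `judge_answer`, `call_model`, and `stub_judge` are illustrative assumptions, not taken from Zheng et al. or from any library. `call_model` stands in for whatever function sends a prompt to the judge model and returns its text reply.

```python
import re
from typing import Callable, Optional

# Hypothetical judge prompt; real rubrics are usually more detailed.
JUDGE_PROMPT = (
    "You are an impartial judge. Rate the answer to the question "
    "on a scale of 1 to 10.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Reply with 'Score: <number>'."
)


def judge_answer(
    question: str, answer: str, call_model: Callable[[str], str]
) -> Optional[int]:
    """Ask the judge model for a score; return None if the reply is unusable."""
    reply = call_model(JUDGE_PROMPT.format(question=question, answer=answer))
    match = re.search(r"Score:\s*(\d+)", reply)
    if match is None:
        return None
    score = int(match.group(1))
    # Reject scores outside the scale the prompt asked for.
    return score if 1 <= score <= 10 else None


def stub_judge(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return "Score: 8"


score = judge_answer("What is 2 + 2?", "4", stub_judge)
```

In practice `stub_judge` would be replaced by a call to a strong model, and unparseable replies (the `None` case) would be retried or logged.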

Related Words

ground truth

the verified correct answer or reference label that a model's output is compared against

tracing

recording the flow of execution and data through a complex system

hallucination rate

the proportion of a model's outputs that contain fabricated or unsupported information
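
As a rough illustration, the rate can be computed as a simple fraction once each output has been labeled. The function name `hallucination_rate` and the per-output boolean flags are assumptions made for this sketch; how each flag is produced (human review, a judge model, or comparison with ground truth) is left open.

```python
from typing import Sequence


def hallucination_rate(flags: Sequence[bool]) -> float:
    """Fraction of outputs flagged as containing a hallucination."""
    if not flags:
        raise ValueError("need at least one labeled output")
    # True counts as 1 and False as 0, so sum() counts the flagged outputs.
    return sum(flags) / len(flags)


# Four labeled outputs, two of them flagged.
rate = hallucination_rate([True, False, False, True])
```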

benchmark

a standardized test or dataset used to compare the performance of different models

evals

systematic tests to measure model performance on specific tasks