
LLM-as-a-Judge
/ˌeɫ eɫ ˈem æz ə ˈdʒʌdʒ/
using a strong LLM to evaluate the outputs of another model
“LLM-as-a-Judge scales evaluation better than human review.”
Origin: Industry term (Zheng et al., 2023)
