red teaming

red teaming

/ˈred ˌtiːmɪŋ/

🛡️ AI Safety & Alignment

adversarial testing to find vulnerabilities and failure modes in AI systems

Red teaming uncovered that the chatbot could be manipulated into giving harmful advice.

Origin: From military exercises where a red team plays the adversary