AI “red teaming” involves simulating attacks on AI systems to uncover vulnerabilities and enhance security. It is becoming an increasingly important practice, as regulatory frameworks—such as the National Institute of Standards and Technology (NIST) AI Risk Management Framework (AI RMF)—emphasize the importance of red teaming. Additionally, some AI vendors may require red teaming in order to benefit from certain indemnification commitments. However, the lack of standardized practices in AI red teaming can lead to varying methodologies and hinder objective comparisons of AI system safety.
Anthropic recently released its latest in a line of red teaming research, delineating the methods they've explored (both manual and automated), and also providing some policy recommendations. Google Deepmind researchers also recently developed and published details about STAR, a sociotechnical framework for red teaming large language models. Establishing consistent red teaming methods is important for managing current risks and preparing for future threats. Hence a careful eye should be kept not only on emerging industry best practices, but also on regulatory developments, such as NIST's Assessing Risks and Impacts of AI (ARIA) program. ARIA is designed to provide more concrete guidance related to the “measure” pillar of the AI RMF, and it includes a specific red teaming evaluation to be developed, making ARIA an important program to keep track of for those looking to understand and implement AI red teaming best practices.