Team: Anka Reuel, Cedric Whitney, Avijit Ghosh
Category: Research
Tags:
This project addresses the need for a structured and systematic approach to documenting AI model evaluations by creating "evaluation cards," focusing specifically on technical base systems in their pre-deployment state. The framework is informed by stakeholder interviews and builds on prior work. By concentrating on context-independent upstream evaluations, it enables early-stage model assessments that capture the core capabilities, risks, and trade-offs relevant to future adopters, regardless of their specific downstream applications. The resulting evaluation cards will give model developers a user-friendly format for documenting key social aspects and evaluation results. Attached to release documentation such as model and system cards, the cards are intended to support transparent communication and informed use of AI models.