Evaluation Cards

Team: Anka Reuel, Cedric Whitney, Avijit Ghosh

Category: Research

Tags:

Description

This project addresses the need for a structured, systematic way to document AI model evaluations by creating "evaluation cards." The cards focus specifically on technical base systems in their pre-deployment state. The framework is informed by stakeholder interviews and builds on prior work. By concentrating on context-independent upstream evaluations, it enables early-stage model assessments that capture the core capabilities, risks, and trade-offs relevant to future adopters, regardless of the specific downstream application. The resulting evaluation cards will give model developers a user-friendly format for documenting key social aspects and evaluation results. Attached to release documentation such as model cards and system cards, they are intended to support transparent communication and informed use of AI models.