EvalEval Coalition

We’re building a coalition on evaluating evaluations (EvalEval)!

The flawed state of evaluations, the lack of consensus on how to document an evaluation's applicability and utility, and the uneven coverage of broader impact categories all hinder the adoption, scientific study, and policy analysis of generative AI systems. Coalition members are working collectively to improve the state of evaluations.

Beyond the working groups' direct outputs, we expect positive effects on research papers' Broader Impact sections, on how evaluations are released, and on public policy more broadly.

Our NeurIPS 2024 “Evaluating Evaluations” workshop kicked off the latest work streams, highlighting tiny paper submissions that both provoked discussion and advanced the state of evaluations. Our original framework for establishing categories of social impact is now published as a chapter in the Oxford Handbook on Generative AI, and we hope it will guide standards for Broader Impact analyses. We previously hosted a series of workshops to hone the framework.

Hosted by Hugging Face, the University of Edinburgh, and EleutherAI, this cross-sector, interdisciplinary coalition operates across three working groups: Research, Infrastructure, and Organization.

Hosts

Research Chairs

Infrastructure Chairs

Organization Chairs