RESEARCH COMMUNITY

EvalEval Coalition

We are a research community developing scientifically grounded research outputs and robust deployment infrastructure for broader impact evaluations.

Research Areas

CURRENT FOCUS

Benchmark Development

Creating comprehensive evaluation frameworks for AI systems across domains.

Evaluation Metrics

Developing robust metrics that capture real-world performance and limitations.

Methodology

Establishing best practices for reproducible and statistically sound evaluations.

Latest Research

BLOG & PUBLICATIONS

May 1, 2025

Rethinking Evaluation Metrics for Generative Models

The rapid advancement of generative AI models has outpaced the development of appropriate evaluation metrics. Current...

Tags: Metrics, Generative AI, Evaluation

Resources

TOOLS & DATASETS

EvalKit

Open-source Python library for standardized model evaluation.

View on GitHub →

GenEval Dataset

Curated dataset for evaluating generative models across domains.

Download →

Evaluation Handbook

Best practices for rigorous AI system evaluation.

Read Online →

Join Our Community

Researchers, practitioners, and students are welcome to contribute to our mission.

CONTACT INFO

evalevalpc@googlegroups.com

JOIN US

Join our Slack community!

WORKING GROUPS

Research

Infrastructure

Organization

HOSTED BY

Hugging Face

University of Edinburgh

EleutherAI
