We are a research community developing scientifically grounded research outputs and robust deployment infrastructure for broader impact evaluations.
Creating comprehensive evaluation frameworks for AI systems across domains.
Developing robust metrics that capture real-world performance and limitations.
Establishing best practices for reproducible and statistically sound evaluations.
Researchers, practitioners, and students are welcome to contribute to our mission.