CounselBench
Large-scale expert evaluation and adversarial benchmarking of LLMs in mental health question answering. Published at ICLR 2026 (Oral).
CounselBench is a benchmark for evaluating large language models on mental health counseling question answering. It combines expert-annotated evaluations with adversarial testing scenarios to rigorously assess LLM capabilities and limitations in sensitive clinical contexts.
Key contributions:
- Large-scale expert evaluation framework
- Adversarial benchmarking methodology
- Comprehensive analysis of LLM strengths and failure modes in mental health QA