Research

CounselBench
A large-scale expert evaluation and adversarial benchmarking of LLMs in mental health question answering. Published at ICLR 2026 (Oral).

Clinical T5 Evaluation
A systematic evaluation of whether clinical T5 models provide meaningful improvements over general-purpose models for clinical NLP tasks.