USC Researchers Stress-Test AI Chatbots for Mental Health Care

A new study enlisted 100 mental health professionals to evaluate how leading language models respond to real patient questions, revealing both promise and safety concerns.

Omega Editorial · June 24, 2026 · 3 min read

As artificial intelligence chatbots become increasingly common sources of mental health support, researchers are working to understand whether these systems can safely handle sensitive conversations about psychological well-being. More than a third of psychologists now report having patients who use AI for mental health support, according to recent surveys.

Ruishan Liu, a computer science professor at the USC Viterbi School of Engineering, is leading efforts to rigorously evaluate how AI performs in these high-stakes scenarios. Her latest research project, CounselBench, recruited 100 mental health professionals, more than 70% of them licensed therapists, to assess how leading language models respond to real patient questions.

Promising performance with persistent risks

The evaluation revealed a complex picture. Current AI models generally scored well on empathy and performed strongly across multiple dimensions. However, high overall ratings did not eliminate safety concerns. Clinicians identified recurring problems including overgeneralization, limited personalization, and advice that sometimes crossed clinical boundaries. These issues appeared even in responses that otherwise seemed empathetic and helpful.

To probe deeper, Liu's team conducted a second phase in which clinicians designed challenging questions specifically to stress-test the models and expose potential weaknesses. The approach departs from traditional AI evaluation methods, which typically focus on knowledge-based tasks such as multiple-choice questions or standardized exams.

Why it matters

With a documented shortage of mental health resources and growing public reliance on AI for psychological support, understanding the capabilities and limitations of these systems has direct implications for patient safety. The research highlights a critical gap: computer scientists evaluating AI responses may miss subtle safety issues that trained clinicians immediately recognize, such as when advice crosses professional boundaries or fails to account for individual patient circumstances.

Beyond evaluation to improvement

Liu's team recently received an OpenAI Mental Health Award to extend the research. The next phase will examine more challenging scenarios, including how models handle resistant or uncooperative clients—common situations in real-world therapy settings.

The research also aims to move from identifying problems to solving them. Now that specific failure patterns have been documented, Liu's team is working on methods to make language models safer and more suitable for deployment in clinical contexts.

Liu emphasizes that building trustworthy AI for healthcare requires collaboration across disciplines. Computer scientists working alone may not recognize risks that domain experts immediately identify. For CounselBench, input from trained mental health professionals proved essential for spotting subtle safety concerns that might otherwise go unnoticed.

The findings were first reported by USC News in an interview with Liu about her research on AI evaluation methods for healthcare applications.


This is an original analysis by the Omega editorial team. Source reporting: AI Watch.
