
CMHL: Contrastive Multi-Head Learning for Emotionally Consistent Text Classification


Menna Elgabry, Ali Hamdi, Khaled Shaban

arXiv:2603.14078v1 Announce Type: new Abstract: Textual Emotion Classification (TEC) is one of the most difficult NLP tasks. State-of-the-art approaches rely on large language models (LLMs) and multi-model ensembles. In this study, we challenge the assumption that larger scale or more complex models are necessary for improved performance. To improve logical consistency, we introduce CMHL, a novel single-model architecture that explicitly models the logical structure of emotions through three key innovations: (1) multi-task learning that jointly predicts primary emotions, valence, and intensity; (2) psychologically-grounded auxiliary supervision derived from Russell's circumplex model; and (3) a novel contrastive contradiction loss that enforces emotional consistency by penalizing mutually incompatible predictions (e.g., simultaneous high confidence in joy and anger). With just 125M parameters, our model outperforms 56x larger LLMs and sLM ensembles, setting a new state-of-the-art F1 score of 93.75% (compared to 86.13%-93.2%) on the dair-ai Emotion dataset. We further show cross-domain generalization on the Reddit Suicide Watch and Mental Health Collection dataset (SWMH), outperforming domain-specific models like MentalBERT and MentalRoBERTa with an F1 score of 72.50% (compared to 68.16%-72.16%) and a recall of 73.30% (compared to 67.05%-70.89%), which translates to enhanced sensitivity for detecting mental health distress. Our work establishes that architectural intelligence, not parameter count, drives progress in TEC. By embedding psychological priors and explicit consistency constraints, a well-designed single model can outperform both massive LLMs and complex ensembles, offering an efficient, interpretable, and clinically-relevant paradigm for affective computing.
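The auxiliary supervision described in the abstract, where valence and intensity targets are derived from the primary emotion label via Russell's circumplex model, can be sketched as follows. This is a minimal illustration: the label set and the exact emotion-to-quadrant mapping are assumptions for demonstration, not the paper's published scheme.

```python
# Hypothetical mapping of primary emotions to circumplex-style auxiliary labels.
# Russell's circumplex places emotions along valence (positive/negative) and
# arousal axes; "intensity" here stands in for arousal. Illustrative only.
CIRCUMPLEX = {
    "joy":      {"valence": "positive", "intensity": "high"},
    "love":     {"valence": "positive", "intensity": "low"},
    "surprise": {"valence": "positive", "intensity": "high"},
    "sadness":  {"valence": "negative", "intensity": "low"},
    "anger":    {"valence": "negative", "intensity": "high"},
    "fear":     {"valence": "negative", "intensity": "high"},
}

def auxiliary_labels(emotion: str) -> tuple[str, str]:
    """Derive (valence, intensity) targets for multi-task supervision
    from a single primary-emotion label."""
    entry = CIRCUMPLEX[emotion]
    return entry["valence"], entry["intensity"]
```

Because the auxiliary targets are deterministic functions of the primary label, they add supervision signal without requiring any extra annotation.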

Executive Summary

This study introduces Contrastive Multi-Head Learning (CMHL), a novel single-model architecture that outperforms large language models and ensemble methods in textual emotion classification tasks. CMHL's key innovations include multi-task learning, psychologically-grounded auxiliary supervision, and a novel contrastive contradiction loss. The model achieves a new state-of-the-art F1 score of 93.75% on the dair-ai Emotion dataset and demonstrates cross-domain generalization on the Reddit Suicide Watch and Mental Health Collection dataset. The study highlights the importance of architectural intelligence and explicit consistency constraints in improving performance. The findings have significant implications for affective computing, particularly in developing efficient, interpretable, and clinically-relevant models for detecting mental health distress.

Key Points

  • Introduction of CMHL, a novel single-model architecture for textual emotion classification
  • CMHL outperforms large language models and ensemble methods on both the dair-ai Emotion and SWMH datasets
  • Key innovations of CMHL include multi-task learning, psychologically-grounded auxiliary supervision, and contrastive contradiction loss
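The contrastive contradiction loss named above can be sketched as a penalty on joint confidence in mutually incompatible emotions. The code below is a minimal NumPy illustration, not the paper's implementation: the incompatible pairs and the product-of-probabilities penalty are assumptions chosen to show the idea.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical label order and incompatible pairs (illustrative).
EMOTIONS = ["joy", "sadness", "anger", "fear", "love", "surprise"]
INCOMPATIBLE = [("joy", "anger"), ("joy", "sadness"), ("love", "anger")]

def contradiction_loss(logits):
    """Penalize simultaneous high confidence in incompatible emotions.
    The product p_a * p_b is large only when BOTH probabilities are large,
    so a confident single-emotion prediction incurs almost no penalty."""
    probs = softmax(logits)
    idx = {e: i for i, e in enumerate(EMOTIONS)}
    return sum(probs[..., idx[a]] * probs[..., idx[b]] for a, b in INCOMPATIBLE)

# A confident "joy" prediction vs. a contradictory joy/anger split.
consistent    = np.array([5.0, -2.0, -2.0, -2.0, -2.0, -2.0])
contradictory = np.array([5.0, -2.0,  5.0, -2.0, -2.0, -2.0])
```

Added to the standard classification loss, this term pushes the model toward probability distributions that respect the logical structure of the emotion space.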

Merits

Improved performance on textual emotion classification tasks

CMHL achieves state-of-the-art results on the dair-ai Emotion dataset and demonstrates cross-domain generalization on the Reddit Suicide Watch and Mental Health Collection dataset.

Efficient and interpretable model architecture

CMHL's 125M-parameter design yields an efficient and interpretable model suitable for affective computing applications.

Clinically-relevant paradigm for affective computing

The study's findings have significant implications for developing clinically-relevant models for detecting mental health distress.

Demerits

Limited generalizability to other NLP tasks

The study focuses on textual emotion classification tasks and may not generalize to other NLP tasks or domains.

Dependence on psychologically-grounded auxiliary supervision

The model's performance may be dependent on the quality and relevance of the psychologically-grounded auxiliary supervision data.

Expert Commentary

The study's contribution is significant, as it challenges the assumption that larger or more complex models are necessary for improved performance in textual emotion classification tasks. The introduction of CMHL, a novel single-model architecture, demonstrates that architectural intelligence and explicit consistency constraints can drive progress in TEC. The study's findings have important implications for affective computing and mental health applications, particularly in developing efficient, interpretable, and clinically-relevant models. However, the study's limitations, such as limited generalizability and dependence on auxiliary supervision data, should be addressed in future research.

Recommendations

  • Future research should investigate the applicability of CMHL to other NLP tasks and domains.
  • The development of standardized evaluation protocols for affective computing and mental health applications is essential for ensuring the reliability and validity of model performance.
