PsihoRo: Depression and Anxiety Romanian Text Corpus

Alexandra Ciobotaru, Ana-Maria Bucur, Liviu P. Dinu

arXiv:2602.18324v1

Abstract: Psychological corpora in NLP are collections of texts used to analyze human psychology, emotions, and mental health. These texts allow researchers to study psychological constructs, detect mental health issues and analyze emotional language. However, mental health data can be difficult to collect correctly from social media, due to suppositions made by the collectors. A more pragmatic strategy involves gathering data through open-ended questions and then assessing this information with self-report screening surveys. This method was employed successfully for English, a language with a lot of psychological NLP resources. However, this cannot be stated for Romanian, which currently has no open-source mental health corpus. To address this gap, we have created the first corpus for depression and anxiety in Romanian, by utilizing a form with 6 open-ended questions along with the standardized PHQ-9 and GAD-7 screening questionnaires. Consisting of the texts of 205 respondents and although it may seem small, PsihoRo is a first step towards understanding and analyzing texts regarding the mental health of the Romanian population. We employ statistical analysis, text analysis using Romanian LIWC, emotion detection and topic modeling to show what are the most important features of this newly introduced resource to the NLP community.

Executive Summary

The article introduces PsihoRo, the first open-source Romanian text corpus focused on depression and anxiety. Created using open-ended questions and standardized screening questionnaires (PHQ-9 and GAD-7), the corpus includes responses from 205 participants. The authors employ statistical analysis, text analysis with Romanian LIWC, emotion detection, and topic modeling to highlight key features of this resource. While the corpus is a significant first step for Romanian NLP in mental health, its small size and potential biases in data collection are notable limitations. The study underscores the need for more robust mental health corpora in underrepresented languages and offers insights into the emotional and psychological language of Romanian speakers.

Key Points

  • PsihoRo is the first open-source Romanian text corpus for depression and anxiety.
  • Data was collected using open-ended questions and standardized screening questionnaires.
  • The corpus includes responses from 205 participants, analyzed using statistical and text analysis methods.
  • The study highlights the importance of mental health corpora in underrepresented languages.

Merits

Innovation in Romanian NLP

The creation of PsihoRo fills a critical gap in the NLP landscape by providing the first open-source mental health corpus in Romanian, enabling future research in psychological and emotional language analysis.

Methodological Rigor

Pairing open-ended questions with the standardized PHQ-9 and GAD-7 screening questionnaires ties each text to a self-reported score, rather than to labels inferred by the collectors, which is the main weakness the authors identify in social media data.
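
As a rough illustration of how such screening responses are usually turned into scores: PHQ-9 and GAD-7 items are each rated from 0 to 3 and summed, giving totals of 0 to 27 and 0 to 21 respectively. The sketch below is not taken from the paper. The function names are hypothetical, and the use of the conventional severity bands is an assumption made here for illustration.

```python
from typing import List, Sequence, Tuple

# Conventional severity bands as (inclusive upper bound, label).
# Whether the PsihoRo authors use these bands is an assumption.
PHQ9_BANDS: List[Tuple[int, str]] = [
    (4, "minimal"), (9, "mild"), (14, "moderate"),
    (19, "moderately severe"), (27, "severe"),
]
GAD7_BANDS: List[Tuple[int, str]] = [
    (4, "minimal"), (9, "mild"), (14, "moderate"), (21, "severe"),
]


def total_score(items: Sequence[int], n_items: int) -> int:
    """Sum questionnaire items after checking count and the 0-3 range."""
    if len(items) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(items)}")
    if any(value not in (0, 1, 2, 3) for value in items):
        raise ValueError("each item must be an integer from 0 to 3")
    return sum(items)


def severity(score: int, bands: Sequence[Tuple[int, str]]) -> str:
    """Map a total score to the first band whose upper bound covers it."""
    if score < 0:
        raise ValueError("score must be non-negative")
    for upper, label in bands:
        if score <= upper:
            return label
    raise ValueError("score exceeds the questionnaire maximum")


# Hypothetical respondent, not real corpus data.
phq9 = total_score([1, 2, 1, 2, 0, 1, 1, 2, 0], n_items=9)
gad7 = total_score([2, 2, 1, 1, 1, 2, 1], n_items=7)
print(phq9, severity(phq9, PHQ9_BANDS))
print(gad7, severity(gad7, GAD7_BANDS))
```

Keeping the bands as data rather than nested conditionals makes it easy to swap in a different cut-off scheme if the corpus documentation specifies one.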

Comprehensive Analysis

The study employs a variety of analytical methods, including statistical analysis, text analysis with Romanian LIWC, emotion detection, and topic modeling, providing a multifaceted understanding of the corpus.
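
For readers unfamiliar with dictionary-based text analysis, the sketch below shows the general idea behind LIWC-style features: the share of words in a text that fall into each category of a lexicon. The lexicon here is a tiny hypothetical stand-in with invented category names, not the Romanian LIWC dictionary, and the tokenization is a simplifying assumption.

```python
import re
from typing import Dict, Set

# Hypothetical toy lexicon for illustration only.
TOY_LEXICON: Dict[str, Set[str]] = {
    "first_person": {"eu", "mă", "mie", "meu"},
    "negative_emotion": {"trist", "obosit", "singur"},
}


def category_rates(text: str, lexicon: Dict[str, Set[str]]) -> Dict[str, float]:
    """Return, per category, the percentage of tokens found in that category."""
    # \w+ is Unicode-aware for str patterns, so Romanian diacritics are kept.
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return {category: 0.0 for category in lexicon}
    return {
        category: 100.0 * sum(token in words for token in tokens) / len(tokens)
        for category, words in lexicon.items()
    }


# Made-up sentence ("I feel sad and tired"), not taken from the corpus.
rates = category_rates("Eu mă simt trist și obosit", TOY_LEXICON)
for category, rate in rates.items():
    print(f"{category}: {rate:.1f}%")
```

Real LIWC dictionaries also match word stems and contain far more entries, so exact-match lookups like this one only approximate the approach.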

Demerits

Small Sample Size

The corpus includes only 205 respondents, which may limit the generalizability of the findings and the robustness of the analytical results.
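
To put the sample size in perspective, the sketch below computes the 95% margin of error for an estimated proportion with 205 respondents. It uses the standard Wald approximation and assumes simple random sampling, which the paper does not claim. The function name and the worst-case proportion of 0.5 are choices made here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wald_margin_of_error(p: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of the Wald confidence interval for a proportion."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * sqrt(p * (1.0 - p) / n)


# Worst case p = 0.5 with the corpus size reported in the abstract.
moe = wald_margin_of_error(p=0.5, n=205)
print(f"margin of error: {moe:.3f}")
```

Under these assumptions the margin is roughly 0.068, or about 7 percentage points either way, and it only halves when the sample size quadruples.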

Potential Bias in Data Collection

The self-report nature of the data collection method may introduce biases, such as response bias or social desirability bias, which could affect the accuracy of the results.

Limited Scope of Analysis

The study focuses primarily on depression and anxiety, which may not capture the full spectrum of mental health issues experienced by the Romanian population.

Expert Commentary

The introduction of PsihoRo marks a significant milestone in NLP and mental health research for the Romanian language. The corpus addresses a gap in the availability of mental health resources for an underrepresented language and gives researchers and practitioners an open resource to build on. The study's methodological approach, which combines open-ended questions with standardized screening questionnaires, anchors each text to a self-reported screening score instead of to labels inferred by the collectors. However, the small sample size and potential biases in data collection are notable limitations that must be addressed in future research. The study also raises important ethical considerations, such as the protection of participant privacy and the responsible use of mental health data. Overall, PsihoRo represents a promising first step towards understanding and analyzing the mental health of the Romanian population, and its implications extend to the broader field of NLP and mental health research.

Recommendations

  • Future research should aim to expand the PsihoRo corpus by including a larger and more diverse sample of participants to enhance the generalizability of the findings.
  • Researchers should explore more advanced NLP techniques, such as deep learning models, to further analyze the corpus and uncover more nuanced insights into mental health language.
