
UniSkill: A Dataset for Matching University Curricula to Professional Competencies

arXiv:2603.03134v1 Abstract: Skill extraction and recommendation systems have been studied from recruiter, applicant, and education perspectives. While AI applications in job advertisements have received broad attention, deficiencies in the instructed skills side remain a challenge. In this work, we address the scarcity of publicly available datasets by releasing both manually annotated and synthetic datasets of skills from the European Skills, Competences, Qualifications and Occupations (ESCO) taxonomy and university course pairs and publishing corresponding annotation guidelines. Specifically, we match graduate-level university courses with skills from the Systems Analysts and Management and Organization Analyst ESCO occupation groups at two granularities: course title with a skill, and course sentence with a skill. We train language models on this dataset to serve as a baseline for retrieval and recommendation systems for course-to-skill and skill-to-course match

Nurlan Musazade, Joszef Mezei, Mike Zhang · March 5, 2026

#cs.CL

Executive Summary

This article presents UniSkill, a dataset for matching university curricula with professional competencies. It consists of manually annotated and synthetic pairs of university courses and skills from the ESCO taxonomy. The authors train language models on the dataset, reaching an 87% F1-score on course-skill matching. The work addresses the scarcity of publicly available datasets on the instructed-skills side and provides a baseline for retrieval and recommendation systems. UniSkill could support more effective skill extraction and recommendation systems, benefiting both education and employment.
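The abstract frames the trained models as a baseline for course-to-skill and skill-to-course retrieval. The sketch below shows only the ranking interface such a baseline plugs into. It replaces the authors' trained language models with plain bag-of-words cosine similarity so that it runs without model weights; the course titles, skill phrases, and the function names `bow`, `cosine`, and `rank` are hypothetical and are not taken from UniSkill or ESCO.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    # Lowercased alphanumeric token counts; a trained encoder would replace this.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity of two sparse count vectors; 0.0 if either is empty.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query: str, candidates: list[str], k: int = 3) -> list[tuple[str, float]]:
    """Return the top-k candidates by similarity to the query (works in either direction)."""
    q = bow(query)
    scored = [(c, cosine(q, bow(c))) for c in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical examples, not taken from UniSkill or the ESCO taxonomy.
courses = [
    "Database Systems and SQL",
    "Organizational Behavior and Change Management",
    "Requirements Engineering for Information Systems",
]
skills = [
    "use SQL databases",
    "manage organizational change",
    "define information system requirements",
]

# Course-to-skill: which skills best match a course title?
print(rank("Database Systems and SQL", skills, k=1)[0][0])  # use SQL databases
# Skill-to-course: which courses best match a skill?
print(rank("define information system requirements", courses, k=1)[0][0])  # Requirements Engineering for Information Systems
```

Because the lexical scorer treats "system" and "systems" (or "manage" and "management") as unrelated tokens, it misses matches that a trained encoder would likely catch, which is one reason to learn the matching function from annotated pairs.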

Key Points

▸ UniSkill is a new dataset for matching university curricula to professional competencies
▸ The dataset includes manually annotated and synthetic pairs of university courses and ESCO taxonomy skills
▸ The authors train language models on the UniSkill dataset, achieving an 87% F1-score on course and skill matching
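The abstract says courses are matched with skills at two granularities: a course title with a skill, and a single course sentence with a skill. A minimal sketch of how one annotated course could expand into pairs at both levels follows, assuming a binary label (1 = the text teaches the skill). The `Pair` record, the naive sentence splitter, the example course, and the skill phrase are all hypothetical; the paper's actual schema and annotation guidelines may differ.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """Hypothetical annotation unit: one course text span paired with one skill label."""
    granularity: str  # "title" or "sentence"
    course_text: str
    skill: str
    label: int  # assumed binary: 1 = the text teaches the skill, 0 = it does not


def split_sentences(description: str) -> list[str]:
    # Naive split after ., ! or ? followed by whitespace; real syllabi need a sturdier splitter.
    parts = re.split(r"(?<=[.!?])\s+", description.strip())
    return [p for p in parts if p]


def build_pairs(title: str, description: str, skill: str,
                title_label: int, sentence_labels: list[int]) -> list[Pair]:
    """Expand one course into a title-level pair plus one sentence-level pair per sentence."""
    sentences = split_sentences(description)
    if len(sentences) != len(sentence_labels):
        raise ValueError("expected exactly one label per sentence")
    pairs = [Pair("title", title, skill, title_label)]
    pairs += [Pair("sentence", s, skill, y) for s, y in zip(sentences, sentence_labels)]
    return pairs


# Illustrative course and skill phrase (hypothetical, not drawn from UniSkill or ESCO).
pairs = build_pairs(
    title="Business Process Modelling",
    description="Students learn BPMN notation. The course ends with a group project.",
    skill="model business processes",
    title_label=1,
    sentence_labels=[1, 0],
)
for p in pairs:
    print(p.granularity, p.label, p.course_text)
```

Keeping both granularities in one record type makes it straightforward to train or evaluate a single matcher on title-level and sentence-level pairs separately or together.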

Merits

Strength in Addressing Dataset Scarcity

The UniSkill dataset fills a significant gap in the availability of publicly accessible datasets for instructed skills, enabling researchers to develop more effective skill extraction and recommendation systems.

Implications for Education and Employment

The UniSkill dataset has the potential to improve the alignment between university curricula and professional competencies, ultimately benefiting education and employment outcomes.

Demerits

Limited Scope of the ESCO Taxonomy

The ESCO taxonomy, while comprehensive for European skills and competencies, may not be universally applicable, potentially limiting the generalizability of the UniSkill dataset.

Need for Further Validation

The baseline models' 87% F1-score is promising but may not reflect performance in real-world settings; further validation and testing are needed to establish the dataset's reliability.
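For readers weighing the 87% figure, the sketch below computes precision, recall, and F1 for binary match predictions, where F1 = 2PR / (P + R). The summary does not say whether the reported score is binary, macro-averaged, or micro-averaged, so treating it as positive-class F1 here is an assumption, and the gold and predicted labels are invented toy values, not UniSkill results.

```python
def precision_recall_f1(gold: list[int], pred: list[int]) -> tuple[float, float, float]:
    """Positive-class precision, recall and F1 for binary labels (1 = match, 0 = no match)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels for ten course-skill pairs (invented, not UniSkill data).
gold = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
pred = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
p, r, f = precision_recall_f1(gold, pred)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.80 R=0.80 F1=0.80
```

A single F1 value hides whether errors come from false matches or missed matches, which is why reporting precision and recall alongside it helps when judging real-world usefulness.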

Expert Commentary

The UniSkill dataset is a significant contribution to education technology, addressing the scarcity of publicly available datasets for instructed skills. The authors' use of language models to match university curricula with professional competencies is a promising approach. However, the limitations of the ESCO taxonomy and the need for further validation should be acknowledged. The dataset has potential applications in both education and employment, and researchers and policymakers should continue to explore and refine it to ensure its reliability and effectiveness.

Recommendations

✓ Further research is needed to validate the UniSkill dataset and explore its generalizability beyond European skills and competencies.
✓ Incorporating additional data sources, such as input from industry partners and professional associations, could enhance the relevance of the UniSkill dataset, while more advanced language models could improve matching accuracy.

Sources

arXiv - cs.CL

UniSkill: A Dataset for Matching University Curricula to Professional Competencies



