UniSkill: A Dataset for Matching University Curricula to Professional Competencies
arXiv:2603.03134v1. Abstract: Skill extraction and recommendation systems have been studied from recruiter, applicant, and education perspectives. While AI applications in job advertisements have received broad attention, deficiencies in the instructed skills side remain a challenge. In this work, we address the scarcity of publicly available datasets by releasing both manually annotated and synthetic datasets of skills from the European Skills, Competences, Qualifications and Occupations (ESCO) taxonomy and university course pairs and publishing corresponding annotation guidelines. Specifically, we match graduate-level university courses with skills from the Systems Analysts and Management and Organization Analyst ESCO occupation groups at two granularities: course title with a skill, and course sentence with a skill. We train language models on this dataset to serve as a baseline for retrieval and recommendation systems for course-to-skill and skill-to-course matching. We evaluate the models on a portion of the annotated data. Our BERT model achieves 87% F1-score, showing that course and skill matching is a feasible task.
Executive Summary
This article presents UniSkill, a dataset for matching university curricula with professional competencies. It pairs graduate-level university courses with skills from the ESCO taxonomy, in both manually annotated and synthetic form, at two granularities: course title with skill, and course sentence with skill. The authors train language models on the dataset as baselines; their BERT model reaches an 87% F1-score on a portion of the annotated data. The work addresses the scarcity of publicly available datasets on the instructed-skills side, that is, the skills that courses teach, and provides a baseline for course-to-skill and skill-to-course retrieval and recommendation. UniSkill could support more effective skill extraction and recommendation systems, to the benefit of both education and employment.
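To make the two granularities concrete, here is a minimal Python sketch of how course-skill pairs of this kind might be represented and filtered. Everything in it is an assumption for illustration: the class and field names, the binary `is_match` label, the `source` values, and the example texts are hypothetical and are not taken from the released dataset or its annotation guidelines.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Granularity(Enum):
    """The two matching levels named in the abstract."""
    TITLE = "title"        # course title paired with a skill
    SENTENCE = "sentence"  # one sentence of a course description paired with a skill


@dataclass(frozen=True)
class CourseSkillPair:
    """One labelled pair. The schema is hypothetical, not the dataset's real format."""
    course_text: str       # a course title or a single course sentence
    skill_label: str       # a skill label (illustrative, not verified against ESCO)
    granularity: Granularity
    is_match: bool         # assumed binary label: does the course teach the skill?
    source: str            # assumed values: "manual" or "synthetic"


def select_pairs(
    pairs: List[CourseSkillPair],
    granularity: Granularity,
    source: Optional[str] = None,
) -> List[CourseSkillPair]:
    """Return the pairs at one granularity, optionally restricted to one source."""
    return [
        p for p in pairs
        if p.granularity is granularity and (source is None or p.source == source)
    ]


# Invented examples, for illustration only.
EXAMPLE_PAIRS = [
    CourseSkillPair("Business Process Modelling", "analyse business processes",
                    Granularity.TITLE, True, "manual"),
    CourseSkillPair("Students learn to elicit and document software requirements.",
                    "define technical requirements",
                    Granularity.SENTENCE, True, "manual"),
    CourseSkillPair("The course closes with a guest lecture.",
                    "analyse business processes",
                    Granularity.SENTENCE, False, "synthetic"),
]

title_pairs = select_pairs(EXAMPLE_PAIRS, Granularity.TITLE)
manual_sentence_pairs = select_pairs(EXAMPLE_PAIRS, Granularity.SENTENCE, source="manual")
print(len(title_pairs), len(manual_sentence_pairs))
```

Keeping the granularity and the source as explicit fields would let a single table serve both title-level and sentence-level experiments, and would make it easy to evaluate on manually annotated pairs only.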
Key Points
- ▸ UniSkill is a new dataset for matching university curricula to professional competencies
- ▸ The dataset consists of manually annotated and synthetic pairs of university courses and ESCO taxonomy skills, released together with annotation guidelines
- ▸ The authors train language models on UniSkill as baselines; their BERT model achieves an 87% F1-score on course and skill matching
Merits
Strength in Addressing Dataset Scarcity
The UniSkill dataset fills a significant gap in the availability of publicly accessible datasets for instructed skills, enabling researchers to develop more effective skill extraction and recommendation systems.
Implications for Education and Employment
The UniSkill dataset has the potential to improve the alignment between university curricula and professional competencies, ultimately benefiting education and employment outcomes.
Demerits
Limited Scope of the ESCO Taxonomy
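The ESCO taxonomy, while comprehensive for European skills and competencies, may not transfer to other regions or taxonomies. The dataset also covers only graduate-level courses and two ESCO occupation groups (Systems Analysts and Management and Organization Analysts), which may limit how far results on UniSkill generalize.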
Need for Further Validation
The reported 87% F1-score is promising, but it was measured on a portion of the annotated data and may not be representative of real-world scenarios; further validation and testing are needed before relying on models trained on the dataset.
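For readers less familiar with the metric, the following Python sketch shows how an F1-score is computed from precision and recall for binary match/no-match predictions. It is an illustration only: the labels are made up, and the sketch assumes a binary F1 on the positive class, which may differ from the averaging the authors used.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for the positive class (1 = course and skill match)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against empty denominators when there are no positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels, not data from the paper: 2 true positives,
# 1 false positive, 1 false negative, 1 true negative.
toy_true = [1, 1, 1, 0, 0]
toy_pred = [1, 1, 0, 1, 0]
toy_f1 = f1_score(toy_true, toy_pred)
print(round(toy_f1, 3))
```

Because the score depends entirely on which pairs are in the evaluation set, the same model can score differently on courses or skill groups that the annotated sample does not cover.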
Expert Commentary
The UniSkill dataset is a useful contribution to education technology, addressing the scarcity of publicly available datasets for instructed skills. Using language models to match university curricula with professional competencies is a promising approach, and the BERT baseline's 87% F1-score suggests the task is feasible. However, the limitations of the ESCO taxonomy and the need for further validation should be acknowledged. The dataset has potential applications in both education and employment, and researchers and policymakers should continue to explore and refine it to ensure its reliability and effectiveness.
Recommendations
- ✓ Further research is needed to validate the UniSkill dataset and explore its generalizability beyond European skills and competencies.
- ✓ Developing more advanced language models and incorporating additional data sources, such as input from industry partners and professional associations, could improve both the models trained on UniSkill and the relevance of the dataset itself.