Eye-Tracking-while-Reading: A Living Survey of Datasets with Open Library Support

arXiv:2602.19598v1 (announce type: new)

Abstract: Eye-tracking-while-reading corpora are a valuable resource for many different disciplines and use cases. Use cases range from studying the cognitive processes underlying reading to machine-learning-based applications, such as gaze-based assessments of reading comprehension. The past decades have seen an increase in the number and size of eye-tracking-while-reading datasets as well as increasing diversity with regard to the stimulus languages covered, the linguistic background of the participants, or accompanying psychometric or demographic data. The spread of data across different disciplines and the lack of data sharing standards across the communities lead to many existing datasets that cannot be easily reused due to a lack of interoperability. In this work, we aim at creating more transparency and clarity with regards to existing datasets and their features across different disciplines by i) presenting an extensive overview of existing datasets, ii) simplifying the sharing of newly created datasets by publishing a living overview online, https://dili-lab.github.io/datasets.html, presenting over 45 features for each dataset, and iii) integrating all publicly available datasets into the Python package pymovements which offers an eye-tracking datasets library. By doing so, we aim to strengthen the FAIR principles in eye-tracking-while-reading research and promote good scientific practices, such as reproducing and replicating studies.

Executive Summary

The article 'Eye-Tracking-while-Reading: A Living Survey of Datasets with Open Library Support' provides a comprehensive overview of eye-tracking-while-reading datasets, highlighting their interdisciplinary value and the challenges associated with data sharing and interoperability. The authors present an extensive survey of existing datasets, create a living online overview to facilitate dataset sharing, and integrate publicly available datasets into the Python package pymovements. This work aims to enhance the FAIR (Findable, Accessible, Interoperable, Reusable) principles in eye-tracking research and promote good scientific practices.

Key Points

  • Eye-tracking-while-reading datasets are valuable for cognitive and machine-learning applications.
  • The lack of data sharing standards leads to interoperability issues.
  • The authors present an extensive overview of existing datasets and publish a living online overview (https://dili-lab.github.io/datasets.html) that lists over 45 features per dataset.
  • Publicly available datasets are integrated into the Python package pymovements.
  • The work aims to strengthen FAIR principles and promote good scientific practices.
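
To make the pymovements point above concrete, here is a minimal sketch of how a publicly available dataset might be fetched and loaded through the package's dataset library. The abstract names no specific dataset, so the name "ToyDataset" and the local "data" directory are assumptions chosen for illustration. The calls `pm.Dataset`, `download()` and `load()` reflect the pymovements interface as commonly documented and should be checked against the installed version.

```python
from pathlib import Path


def dataset_dir(name: str, root: str = "data") -> Path:
    """Return the local directory in which a dataset's files are stored.

    Hypothetical helper for this sketch; not part of pymovements.
    """
    return Path(root) / name


def load_public_dataset(name: str, root: str = "data"):
    """Download and load one dataset from the pymovements dataset library.

    `name` must be a dataset identifier known to pymovements; "ToyDataset"
    is assumed here only as an example.
    """
    # Imported lazily so this module can be read without pymovements installed.
    import pymovements as pm

    dataset = pm.Dataset(name, path=dataset_dir(name, root))
    dataset.download()  # fetch the dataset files from their public source
    dataset.load()      # read the downloaded gaze files into memory
    return dataset
```

Calling `load_public_dataset("ToyDataset")` requires network access and an installed pymovements. Keeping the path logic in a separate helper means the storage layout can be changed without touching the download code.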

Merits

Comprehensive Survey

The article provides an extensive and detailed overview of existing eye-tracking-while-reading datasets, which is crucial for researchers across various disciplines.

Promotion of FAIR Principles

By creating a living online survey and integrating datasets into a Python package, the authors actively promote the FAIR principles, making datasets more findable, accessible, interoperable, and reusable.

Facilitation of Data Sharing

The living online survey simplifies the sharing of newly created datasets, fostering collaboration and reproducibility in research.

Demerits

Potential Bias in Dataset Selection

The survey may be biased in which datasets it includes, which could make the overview less representative of the field as a whole.

Technical Barriers

The integration of datasets into the Python package pymovements may present technical barriers for researchers who are not familiar with Python or programming.

Limited Scope

The article focuses primarily on eye-tracking-while-reading datasets, which may not fully capture the diversity of eye-tracking applications in other fields.

Expert Commentary

The article 'Eye-Tracking-while-Reading: A Living Survey of Datasets with Open Library Support' makes a significant contribution to eye-tracking research by providing a comprehensive survey of existing datasets and promoting the FAIR principles. The living online survey and the integration of datasets into the Python package pymovements are commendable efforts that are likely to facilitate data sharing and collaboration. However, the potential biases in dataset selection and the technical barriers associated with the Python package are important considerations that should be addressed. The article also highlights the broader ethical and privacy concerns related to data sharing, which need to be carefully managed. Overall, this work lays a strong foundation for improving the reproducibility and interoperability of eye-tracking-while-reading datasets, and it is a valuable resource for researchers in this field.

Recommendations

  • Future research should aim to address the potential biases in dataset selection and ensure a more comprehensive and representative survey of eye-tracking-while-reading datasets.
  • Efforts should be made to lower technical barriers by providing user-friendly tools and resources for researchers who may not be familiar with programming or the Python package pymovements.