We generate comprehensive data extracts for a set of Eclipse projects, drawing on data sources such as:
- Software Configuration Management (Eclipse git or GitHub),
- Issue tracking (Bugzilla or GitHub),
- Project metadata checks (PMI),
- Licensing and copyrights (ScanCode), and
- Static Code Analysis (SonarCloud) when available.
Each dataset is composed of:
- Compressed (gzip’d) CSV and JSON files for tool-specific data.
- A full bundle containing all of the above data files for a project.
- An R Markdown document that analyses the extracted files and gives some hints on how to use them. This document also serves as a validation step to identify empty or inconsistent datasets.
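As a rough illustration of how the gzip-compressed CSV and JSON files can be read, here is a minimal Python sketch that uses only the standard library. The helper names `read_gzip_csv` and `read_gzip_json` are hypothetical, and the sketch assumes UTF-8 files, a header row in the CSV files, and one JSON document per JSON file; the actual file names and columns depend on the project and the tool.

```python
import csv
import gzip
import json


def read_gzip_csv(path):
    """Return the rows of a gzip-compressed CSV file as a list of dicts.

    Assumes the first row of the file is a header row.
    """
    # "rt" decompresses and decodes to text; newline="" is what the csv module expects.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


def read_gzip_json(path):
    """Return the parsed content of a gzip-compressed JSON file."""
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `rows = read_gzip_csv("some_extract.csv.gz")` (a placeholder file name) returns one dictionary per row, keyed by column name. An empty list then points to an empty extract, which is the kind of problem the validation step is meant to catch.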
These datasets are published under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) licence. The data is updated weekly, on Sundays at 2 am. If you would like to add a project, please submit an issue.