Thursday, January 20, 2011

IDP Data available on GitHub

The data for the Integrating Digital Papyrology (IDP) project is now available on GitHub: https://github.com/papyri/idp.data. The repository holds the DDbDP and HGV data used to run papyri.info (more than 55,000 papyrological texts), all in EpiDoc XML and made available under a Creative Commons license.

What does this mean for users?
  • Anyone can browse the available data, and see any document's revision history.
  • The repository can easily be downloaded or cloned, enabling re-use.
  • Additionally, copies of the repository can be kept up-to-date and watched for changes.
  • Changes can also be monitored using a variety of features made available by GitHub (generated Atom feeds, built-in "watch" functionality, etc.).
  • Users can fork the repository on GitHub, and contribute suggestions, modifications, and emendations en masse by making a pull request.
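
For anyone who wants to script the "clone once, then keep it up to date" workflow from the list above, here is a minimal sketch in Python. It assumes git is installed and on your PATH; the function names (clone_command, update_command, sync) and the local directory name are made up for illustration and are not part of any IDP tooling. Passing depth=1 asks git for a shallow clone, which skips most of the revision history if you only need the current files.

```python
import subprocess
from pathlib import Path

# Repository URL as given in this post.
REPO_URL = "https://github.com/papyri/idp.data"


def clone_command(dest, branch=None, depth=None):
    """Build the `git clone` command line (hypothetical helper)."""
    cmd = ["git", "clone"]
    if branch:
        cmd += ["--branch", branch]     # check out this branch after cloning
    if depth:
        cmd += ["--depth", str(depth)]  # shallow clone: truncate the history
    cmd += [REPO_URL, str(dest)]
    return cmd


def update_command(dest):
    """Build the command that fast-forwards an existing clone."""
    return ["git", "-C", str(dest), "pull", "--ff-only"]


def sync(dest, branch=None, depth=None):
    """Clone into `dest` if no clone exists there yet, otherwise update it."""
    dest = Path(dest)
    if (dest / ".git").exists():
        cmd = update_command(dest)
    else:
        cmd = clone_command(dest, branch=branch, depth=depth)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if git fails
    return cmd


if __name__ == "__main__":
    # Assumption: a local directory called "idp.data" is where you want the copy.
    sync("idp.data")
```

Running the script a second time finds the existing clone and only pulls new commits, so it could be run on a schedule to keep a local copy current.
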
Changes from the Papyrological Editor that have undergone editorial approval appear on the 'sosol' branch, which is updated hourly. In fact, much of this is possible because the Papyrological Editor uses the SoSOL tooling developed under the previous phase of the IDP project, and all of its internal revision control is done with Git. Git lets users edit and submit changes to large numbers of files simultaneously with minimal overhead for merging those changes together, a process that can be visualized quite well on the GitHub copy of the repository. The repository itself is relatively large (a checkout with the complete history is around 2 GB), as it includes the revision history of the dataset going back several years.
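
One way to watch the 'sosol' branch without cloning anything is to read GitHub's generated Atom feed of commits. The sketch below uses only the Python standard library. The feed URL is an assumption based on GitHub's usual commits/<branch>.atom pattern, so check it in a browser first; parse_entries and fetch_entries are illustrative names, not an official API.

```python
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by Atom feeds.
ATOM = "{http://www.w3.org/2005/Atom}"

# Assumption: GitHub serves a per-branch commit feed at this address.
FEED_URL = "https://github.com/papyri/idp.data/commits/sosol.atom"


def parse_entries(xml_text):
    """Return one dict per <entry> in an Atom feed (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall(ATOM + "entry"):
        title = (entry.findtext(ATOM + "title") or "").strip()
        updated = entry.findtext(ATOM + "updated") or ""
        link = entry.find(ATOM + "link")
        href = link.get("href", "") if link is not None else ""
        entries.append({"title": title, "updated": updated, "link": href})
    return entries


def fetch_entries(url=FEED_URL):
    """Download the feed and parse it; requires network access."""
    with urllib.request.urlopen(url) as resp:
        return parse_entries(resp.read())


if __name__ == "__main__":
    for item in fetch_entries():
        print(item["updated"], item["title"])
```

Since the branch is only updated hourly, polling the feed more often than that should not turn up anything new.
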

We're excited to make this resource available, as we feel it enables true community ownership of the data.