Friday, September 2, 2011

Updates to the DDbDP and PN

Just posted to papylist:

Dear Colleagues:

I write with some updates concerning the DDbDP and PN.

First, we are most pleased to welcome Mark Depauw to the Editorial Boards of the DDbDP and HGV; and it is also our great pleasure to welcome W. Graham Claytor as Assistant Editor of the DDbDP. Both will be great assets to the team.

The latest release features a number of improvements to the PN search and display, many of which you will have seen already; these include line numbers in search returns and enhanced tabular display of search/browse hits. We look forward to releasing vastly improved and significantly redesigned search functionality in the coming weeks. You may also have noticed that we are starting to change the look of the search interface; we are in the process of tightening the integration of the navigator and editor, and bits of the new appearance are already showing.

Frequent users of SoSOL/Papyrological Editor will be happy to know that we now offer syntax for multiple alternate readings, e.g. <:στρατ[ηγὸς]||alt||στρατ[ηλάτης]|στρατ[ηγήσαντα]:>. The editor also offers enhanced display controls for introductory and line-by-line commentary: bold, italics, underline, footnotes, embedded links to PN (DDbDP/HGV/APIS) and any other website; in the next couple of months we shall also support controlled linking to bibliography.

We have also very nearly finished entering CPR 25, CPR 30, P.Heid. 9, O. Stras. 2, and O.Abu Mina; the bugs that we encountered with P.Köln have now been fixed, so that we hope to move quickly to finish those volumes as well. We have also begun entering P.Count and continue (slowly) to enter SB 26. There is much to do, but since rolling out the new system we have entered well over 2000 texts. Joyous thanks to the many and good-spirited contributors who do this work without any pay and in service to our field.

I also want to alert the community to a development currently under way, whose fruits you will start to see in the DDbDP. It has long been DDbDP practice to place orthographically/grammatically normalized forms, and even outright *corrected* forms, in the text and the ancient reading in the apparatus, e.g. TEXT: ἀνωμολογήσαντο, APP: ανομολογησαντο. This is of course not in keeping with papyrological convention and we have heard (loud and clear) over the years that colleagues would like to see this practice ‘flipped’.

We have begun that process. This means putting accents, breathing marks, and capitalization (and in many, many cases Leiden mark-up, including line-breaks) in the appropriate places in the original reading, on 70,000 unique strings (more than 90,000 cases). This is, as you might imagine, a big job and we can neither convert every single case nor verify autoptically every case that we can convert. This cannot be done by hand within any reasonable timeframe or budget. But thanks to generous assistance from the TLG and some excellent programming work done at the Department of Digital Humanities, King’s College London, we now can convert over 75% of these cases with a fairly high degree of accuracy. For example, we can take the original reading as it currently stands, αροστοι, collate it against the regularized reading, ἄρρωστοι, and produce ἄροστοι, which we will soon display in the text (and “read ἄρρωστοι” in the app).

The process is driven by a large table of generalizable equivalencies; the script does not know rules for accenting non-standard forms; rather, it sees that an original form is γιτωνεις, sees also that the regularized form is γείτονες, and knows also that ει/ι, ω/ο, and ε/ει are often in exchange, and so it collates the two forms and moves the accent over from the εί to the ι. It works very well, but we know that we will be introducing some forms whose accents may/will be incorrect; intensive checking on a sample of 10,000 unique strings suggests that such will be quite few, but there will be mistakes. Expect some nouns to have incorrect accents where the regularization involved change of case; expect to see a few iota subscripts where they do not belong or omitted where they do. Some cases cannot be correctly converted automatically; where γυναικος is normalized to γυναῖκες the computer does not ‘know’ whether the normalization is phonetic (in which case → γυναῖκος) or morphological (in which case → γυναικὸς); some such cases may be converted where you would prefer them not to be and some left unconverted where you would prefer otherwise. There will also be a relatively small number of infelicitous conversions, e.g. αιλαεινα → αἰλάεϊνα (regularized: ἐλάϊνα). To avoid all of these errors would mean not (correctly) converting many, many more clear cases. So, we tolerate some mess.
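
For readers who like to see such things spelled out, the collation just described can be sketched roughly as follows. This is only an illustration and not the script used for the conversion: the names PAIRS, align, and transfer_accents are hypothetical, the table holds just the three exchanges mentioned above plus an assumed ρ/ρρ pair (needed for the αροστοι example), and the rule of placing carried-over marks on the last letter of each matched segment is likewise an assumption.

```python
import unicodedata

# Interchange pairs: the first three are named in the text; rho/double rho is an assumed addition.
PAIRS = [("ει", "ι"), ("ω", "ο"), ("ε", "ει"), ("ρ", "ρρ")]
EQUIV = sorted(set(PAIRS) | {(b, a) for a, b in PAIRS})  # usable in both directions


def split_marks(word):
    """Decompose a word into (base letter, combining marks) pairs."""
    units = []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and units:
            units[-1] = (units[-1][0], units[-1][1] + ch)
        else:
            units.append((ch, ""))
    return units


def key(text):
    """Comparison key: lower case, with final sigma treated as plain sigma."""
    return text.lower().replace("ς", "σ")


def align(orig, reg):
    """Return parallel (orig_len, reg_len) segment lengths, or None if no alignment exists."""
    if not orig and not reg:
        return []
    options = [(1, 1)] if orig[:1] and orig[:1] == reg[:1] else []
    options += [(len(a), len(b)) for a, b in EQUIV
                if orig.startswith(a) and reg.startswith(b)]
    for i, j in options:  # backtrack over the possible first segments
        rest = align(orig[i:], reg[j:])
        if rest is not None:
            return [(i, j)] + rest
    return None


def transfer_accents(original, regularized):
    """Copy diacritics from the regularized form onto the original spelling."""
    orig_letters = [base for base, _ in split_marks(original)]  # original assumed unaccented
    reg_units = split_marks(regularized)
    segments = align(key("".join(orig_letters)),
                     key("".join(base for base, _ in reg_units)))
    if segments is None:
        return None  # the table cannot explain the difference; leave unconverted
    out, i, j = [], 0, 0
    for n, m in segments:
        marks = "".join(mk for _, mk in reg_units[j:j + m])
        out.append("".join(orig_letters[i:i + n]) + marks)  # marks land on the last letter
        i, j = i + n, j + m
    return unicodedata.normalize("NFC", "".join(out))


print(transfer_accents("γιτωνεις", "γείτονες"))
print(transfer_accents("αροστοι", "ἄρρωστοι"))
```

With this toy table the two calls give γίτωνεις and ἄροστοι. A pair such as γυναικος/γυναῖκες finds no alignment and is returned as None, that is, left unconverted; a real table would of course need far more equivalencies and some policy for the ambiguous cases.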

Forms that we cannot convert by this collation process, we will pass through the TLG morphological engine; this will catch many but not all of the remaining examples. Names, for example, are less than fully controlled by TLG, so that in many cases we can only partially convert a string. So, where we have corrected to Ἐσμῖνι from εσμινιος, we cannot simply move the accent over (→ Ἐσμῖνιος); and since our system does not know accentuation rules, we have no automatic way to generate the form ᾿Εσμίνιος; but we do know that the regularized form begins with a capital letter and smooth breathing, so that we can output ᾿Εσμινιος, which at least indicates the class of noun and is better than nothing!
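
The partial conversion can be sketched in the same spirit. Again this is a hypothetical illustration (the function name partial_convert and its matching rule are assumptions, not the production code): it carries over only the initial capital and the breathing mark, and only when the letters leading up to the breathing agree in both forms.

```python
import unicodedata

BREATHINGS = {"\u0313", "\u0314"}  # combining smooth and rough breathing


def partial_convert(original, regularized):
    """Carry over only the initial capital and breathing from the regularized form."""
    reg = unicodedata.normalize("NFD", regularized)
    letters = list(unicodedata.normalize("NFD", original))  # original assumed unaccented
    if not reg or not letters:
        return original
    # Locate the first breathing mark and the base letters that precede it.
    pos = next((i for i, ch in enumerate(reg) if ch in BREATHINGS), None)
    if pos is not None:
        prefix = [ch for ch in reg[:pos] if not unicodedata.combining(ch)]
        k = len(prefix)
        same = "".join(letters[:k]).lower() == "".join(prefix).lower()
        if 1 <= k <= 2 and same:  # breathing sits on an initial vowel or diphthong
            letters.insert(k, reg[pos])
    if reg[0].isupper():
        letters[0] = letters[0].upper()
    return unicodedata.normalize("NFC", "".join(letters))


print(partial_convert("εσμινιος", "Ἐσμῖνι"))
```

For εσμινιος against Ἐσμῖνι this prints the capitalized form with smooth breathing and no accent, which is the partially converted string described above.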

Errors, infelicities, and partial conversions can of course be corrected via SoSOL by anyone who wishes! Some degree of mess is simply the cost of delivering the type of text and apparatus that the community wants. And we will clean up the mess over time and, I hope, together. In any case, know that we have not changed the *spelling* of such original readings--only the accentuation; so, in this crucial way the integrity of the data remains intact. And some of the successes are pretty spectacular, thanks to the hard work of the team: for instance, the original επειγιμαινον, which was normalized to ἐπικείμενον, is ‘correctly’ converted to ἐπειγίμαινον!

In the next month or two we should have apparatus criticus syntax that is both much better than the status quo and deploys the new handling of such regularizations.

As always, please feel free to contact Rodney Ast, James Cowey, and myself with questions, comments, complaints about the DDbDP, and Hugh Cayless with the same regarding PN functionality.

Josh Sosin
