Friday, April 20, 2012

PN Search Updates


just posted to papy-list:

Dear Colleagues:

We write with recent updates to the PN.

(1) Search facets now feature auto-complete combo-boxes instead of simple drop-down menus. So, let’s say you search for #οικονομ (strings that start with οικονομ); 913 hits. You are really just interested in the Hermopolite and so you click into the ‘Nome’ facet. As you start to type ‘he’ you see your options shrinking...1 Heliopolite, 1 Heptakomias, 80 Herakleopolite, and 24 Hermopolite. Select Hermopolite and see your 24 hits. The same sort of thing should work with the other facets as well.

(2) APIS Collection and Publication Series facets now play nicely together. Let’s say you remember that a Berkeley papyrus was published in P.Coll.Youtie, but you cannot remember which and you do not have the book to hand. Select Berkeley from the ‘Collection’ facet; 4316 hits. Select ‘DDBDP: p.coll.youtie (2)’ from the facet box, or else start to type, p.coll; 2 hits. Find P.Coll.Youtie 12.

(3) Improved indexing of regularizations. Let’s say you are interested in documents whose first word after the greeting χαίρειν is βούλομαι vel sim. You search for "χαιρειν# #βουλ" (with the quotes; a string equal to or ending in χαιρειν followed immediately by a new-word string starting βουλ); 11 hits. But you will miss Chrest.Wilck. 323, since lin.7-8 read χέρειν(*). βουλό|μεθα.

Why? Because χαίρειν is in this case not, strictly speaking, adjacent to βουλ; χερειν is. This is an important fact to master.

In this particular case, it means that, knowing that χαίρειν is often rendered as χέρειν, you must search carefully. You can either

(a) enter search query "χαιρειν# #βουλ", then click OR and in the new search box enter "χερειν# #βουλ"; 12 hits
or
(b) search for REGEX χ(αι|ε)ρειν\sβουλ; 12 hits. The regular expression for ‘αι or ε’ is (αι|ε) and the regular expression for space is \s.

Note that under both of these searches PN *finds* Chrest.Wilck. 323 but does not properly *highlight* the query for that text. This is a known bug; we are working on it.
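Strategies (a) and (b) can be tried outside PN. The following is a minimal Python sketch, not PN's own code: it assumes that searchable text is lowercased and stripped of diacritics, and the helper names and sample strings are hypothetical.

```python
import re
import unicodedata


def strip_diacritics(text):
    """Lowercase text and drop combining marks (accents, breathings)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical sample phrases, not quotations from the DDbDP.
regular = strip_diacritics("χαίρειν βούλομαι")
variant = strip_diacritics("χέρειν βουλόμεθα")

# (b) a single regular expression covering both spellings
pattern = re.compile(r"χ(αι|ε)ρειν\sβουλ")


# (a) two literal phrases joined by OR
def matches_either(text):
    return "χαιρειν βουλ" in text or "χερειν βουλ" in text


for sample in (regular, variant):
    print(sample, bool(pattern.search(sample)), matches_either(sample))
```

Both approaches accept the two spellings; the regular expression simply states the alternation once instead of repeating the whole phrase.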

(4) We now support abbreviation-aware searching. Say you are looking for abbreviations beginning πρ. Search for #πρ° (enter πρ and then click the abbr button; ° indicates opening parenthesis); 862 hits. NOTE: this new feature has some known bugs:

(a) highlighting: if you search for #πρ° in your first hit you will see highlighting on both Πρ(ώτων), which you expect, and on Πρώτων, which you do not. We are working on it.

(b) highlighting: if you search for τιθ° you will see that in your 6th hit (BGU 9.1891) none of the highlighted hits is τιθ( , which is what you expect to see. But if you click to view the text, you will see (lin.169, 194, 220 etc.) Τιθ(οείους), which is what you expect to see. This isn’t very helpful; you want the right hit on the search results page. We are working on it.

(c) false positives: if you search for αρ° you will find in your second hit (BGU 1.4) that none of the highlighted hits includes αρ( , which is what you expect to find. And (ἑκατοντάρ)χ(ῃ) and χ(ιλιά)ρ(χῃ) (both in lin.1) even seem misleading: in neither case is αρ on the papyrus at all. We are working on it; this particular fix may even be live by Monday, and in any case quite soon.

(d) XML error bug: if you search for °ετους, expecting many many many hits, you find only ten! There is a bug in the search, whose fix will be in soon (along with (c) above); but there are also odd bugs in the XML, which we’ll fix.

As always, if you have questions, please feel free to contact us. Generally best to write to ast@uni-heidelberg.de, hugh.cayless@nyu.edu, james.cowey@urz.uni-heidelberg.de, and joshua.sosin@duke.edu. Or just write the papy-list.

All best,
josh sosin

Friday, April 6, 2012

Aegyptus [1920–2006] at JSTOR

Aegyptus 1920-2006 (Anni 1-86) (Previous Title: Studi della Scuola papirologica
[1915–1920])
ISSN: 0001-9046
Aegyptus, the Italian Egyptology and Papyrology journal, was founded in 1920
by Aristide Calderini and directed by him until his death (1968). The direction
was then entrusted to Orsolina Montevecchi until year n. 80 (2000). Since year
81 (2001) the Director has been Rosario Pintaudi. The editing is in the care of the
Papyrology School of the Catholic University of Milan. A general index of the first
50 years (1920-1970) can be found in “Studia Amstelodamensia” II (1974) edited
by S.M.E. van Lith. This specialized magazine publishes articles of Egyptology,
Greek and Coptic Papyrology written by Italian and foreign scholars, in Italian,
French, English, Spanish and German.

is now available in the Arts & Sciences IX collection at JSTOR to institutions with
subscriptions.

You will find a link to it in AWOL.

And see also The Ancient World in JSTOR: AWOL's full list of journals in JSTOR with substantial representation of the Ancient World.

Thursday, March 1, 2012

IDP Updates

This just sent to papylist:
= = =
Colleagues:

I write with news of recent IDP updates. Frequent users will have noticed already that we have just released significant enhancements to PN search capabilities. Thanks go to our colleagues Rodney Ast and James Cowey, and especially to our fantastic programmers Hugh Cayless and Tim Hill.

Some of the changes are intuitive, others less so. If you visit our search checklist:
https://docs.google.com/spreadsheet/ccc?key=0Ajkz6D9lOd20dGtwczJSUXVhWUx6ZEJCUEhvR1lIYVE#gid=1
and click on the SearchDocumentation tab at the bottom, you will see a more clearly emerging set of search instructions. This will be a useful page to bookmark.

Note that Boolean operators are now deployable with buttons. You will note also that sometimes when you click on a particular operator PN will automatically open another search box for you. For example if you search for πρωτ* THEN αρσιν* within 5 words, and then click ‘and’, PN will open a new box for you and insert the operator ‘AND’. Note also that PN greys out some operator buttons some of the time. IMPORTANT: both of these features are designed to help you avoid running searches that are impossible or so computationally intensive that they will cause disruptions for the wider community. You can type the operators into the search box and you can construct complicated searches in a single search box, but you run the very real risk of a timeout or fail. Please try to help us keep things running smoothly for everyone by using the buttons and not ‘hacking’ the search box!

We now support proximity searches in two forms:

(1) NEAR = word/string X within n words or characters of word/string Y in either direction
(2) THEN = word/string X followed by word/string Y, within n words or characters

Note: NEAR and AND are not the same thing. X AND Y finds documents containing X and Y anywhere within them. A search for X NEAR Y must specify within n characters/words and will only find documents in which X and Y co-occur within the specified distance.

So, if you want to find documents containing a string beginning σπεν- followed immediately by a string -υχεσθ- then enter σπεν* THEN *χεσθ* within 1 word. Note that * indicates wildcards (the absence of a word boundary). You will find one hit, P.Lond. VII 2193.9: σ̣π̣έν̣δοντ̣ες εὐχέσθωισαν. Note that the highlighting and hit-quotation does not work on this search. This is a *known bug* and we are working to fix it.

Note that the PN treats strings and words very differently and this can affect the way you must run certain searches. Take the previous example. A search for σπεν THEN χεσθ within 10 characters will return one hit, P.Lond. VII 2193.9: σ̣π̣έν̣δοντ̣ες εὐχέσθωισαν (with the highlighting bug). However, a search for σπεν THEN χεσθ within 1 word WILL NOT WORK at all. This is because proximity searching looks by default for words not strings. Thus, if you are running a proximity search you must include * to indicate the part of the word that you are looking for. Again, when you constrain a proximity search by characters you are in effect running a substring search; but when you constrain a proximity search by words you are searching for words and so must use asterisks to indicate that you have provided only a part of the word. If you constrain the search by characters you do not need asterisks, because PN automatically treats the queried terms as strings (so πεν finds σπενδ).
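The difference between AND, NEAR, and THEN, and between word-based and character-based distance, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PN's implementation: the function names are hypothetical, the text is assumed to be diacritic-free, and the way character distance is counted (from the end of X to the start of Y) is a guess.

```python
import re
from fnmatch import fnmatchcase


def word_positions(words, pattern):
    """Indexes of whole words matching pattern ('*' is the intended wildcard)."""
    return [i for i, w in enumerate(words) if fnmatchcase(w, pattern)]


def and_search(words, x, y):
    """X AND Y: both occur somewhere, at any distance."""
    return bool(word_positions(words, x)) and bool(word_positions(words, y))


def near_words(words, x, y, n):
    """X NEAR Y: within n words of each other, in either direction."""
    return any(0 < abs(i - j) <= n
               for i in word_positions(words, x)
               for j in word_positions(words, y))


def then_words(words, x, y, n):
    """X THEN Y: Y follows X within n words."""
    return any(0 < j - i <= n
               for i in word_positions(words, x)
               for j in word_positions(words, y))


def then_chars(text, x, y, n):
    """Substring X, then substring Y starting at most n characters later."""
    return re.search(re.escape(x) + ".{0,%d}" % n + re.escape(y), text) is not None


text = "σπενδοντες ευχεσθωισαν"
words = text.split()

print(then_words(words, "σπεν*", "*χεσθ*", 1))  # True
print(then_words(words, "σπεν", "χεσθ", 1))     # False: no word equals σπεν
print(then_chars(text, "σπεν", "χεσθ", 10))     # True: substrings need no asterisks
```

The second call fails for the reason given above: a word-based proximity search compares whole words, so a fragment without asterisks matches nothing.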

Sometimes you can reach the same or similar results via different search strategies. For example, say you want to find documents containing the precise phrase "ομολογουμεν πεπρακεναι" and the precise phrase "του νυν". You can do any of the following:

(1) enter "ομολογουμεν πεπρακεναι" AND "του νυν" in a single box
(2) enter "ομολογουμεν πεπρακεναι", click AND and in the second box enter "του νυν" after the AND, which PN will enter for you.
(3) enter "ομολογουμεν πεπρακεναι" then click search; then refine search with "του νυν"

Lexical searching can be combined with string searching. Let’s say you want to find documents containing any form of the word ἀνήκω followed by the string beginning υπηρετ- within 5 words. Enter LEX ἀνήκω NEAR υπηρετ* within 5 words (you will get 2 hits).

You can use the radio buttons to search Text, Metadata (=HGV and APIS), or Translations. But you can also restrict searches to either HGV or APIS. So a search for HGV:witnesses will return 5 hits; a search for APIS:witnesses will return 87 hits; a search for witnesses alone, with the ‘Metadata’ radio button selected will return 92 hits.

Phrases of more than 2 words are now searchable: enter "οἱ ἐκ τῆς"

For those of you who are comfortable with regular expressions, we now offer support for regex searches. If you want to find documents containing a string beginning αυτο- within 20 characters of a string beginning και-, but not within 20 words of a complete word αδριανου, enter: REGEX αυτο\b.{1,20}\bκαι\s(?!(\S+\s){1,20}αδριανου)
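For readers who want to see how the pieces of that expression behave, here is a small Python sketch. Python's re module is used only as a stand-in (PN's regex engine may differ in details), and the sample strings are hypothetical and diacritic-free. Read literally, \b marks a word boundary, .{1,20} allows 1 to 20 characters, and (?!(\S+\s){1,20}αδριανου) rejects a match if αδριανου begins one of the next words, after 1 to 20 intervening words.

```python
import re

# The expression from the message, without the REGEX keyword.
pattern = re.compile(r"αυτο\b.{1,20}\bκαι\s(?!(\S+\s){1,20}αδριανου)")

# Hypothetical sample strings.
kept = "αυτο τουτο και τα αλλα παντα"
rejected = "αυτο τουτο και του κυριου αδριανου καισαρος"

print(bool(pattern.search(kept)))      # True
print(bool(pattern.search(rejected)))  # False: αδριανου follows within 20 words
```

Note that, as written, αυτο\b requires a word boundary directly after αυτο and και\s requires whitespace directly after και, so in this stand-in both behave as complete words.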

Our best advice is to play around with the new interface and its capabilities, try searches from the SearchDocumentation list. Everything that we have marked as working should work. If something appears not to, let us know. If a particularly important combination of searches does not appear on the list, let us know; we can add it. Let us know also whether that particular combination works or does not, so that we can test and confirm. If anything is especially weird or puzzling please be sure to give us as much information as possible (the precise steps you followed, the browser you are using, etc); even attach screenshots if you like. The more information we have the better we can address problems that you encounter.

Once you get used to the changes, we hope and think you will be as pleased with them as we are.

Best,
Josh Sosin

Thursday, December 1, 2011

papyri.info updates

This just posted by sosin to the papylist:

Colleagues:

I write with some IDP updates.

(1) By now you will have noticed that we have rolled out the new search interface. This is an entirely new way of doing search and will take some getting used to.

A crude test scenario:

Go to http://papyri.info/search . The search screen is divided into two parts. Search results appear on the right and search filters appear on the left.

The primary difference between the old and the new is that rather than coming up with a single search query that aims to give you exactly what you want, it is now possible to begin with a more open-ended search and successively narrow and expand it until you reach a desired end.

So, let's say your class is interested in literacy. Run an initial search for #αγραμματ: enter #αγραμματ in the search box or select "Convert from betacode as you type" and enter #agrammat (by the way, this is a good option if you want to run searches from your smartphone, which probably does not have Unicode Greek); click 'Search'. This will give you words beginning with αγραμματ (# indicates a boundary; you will not find διαγραμματ).
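To show what betacode conversion amounts to, here is a minimal Python sketch. It is not the converter that papyri.info uses; it covers only the plain lowercase letters of standard Beta Code, ignores diacritics and final sigma, and the function name is hypothetical.

```python
# Standard Beta Code letter equivalents (lowercase, no diacritics).
BETA_TO_GREEK = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ",
    "n": "ν", "c": "ξ", "o": "ο", "p": "π", "r": "ρ", "s": "σ",
    "t": "τ", "u": "υ", "f": "φ", "x": "χ", "y": "ψ", "w": "ω",
}


def beta_to_greek(query):
    """Convert Beta Code letters to Greek; leave other characters (e.g. '#') as-is."""
    return "".join(BETA_TO_GREEK.get(ch, ch) for ch in query.lower())


print(beta_to_greek("#agrammat"))  # #αγραμματ
```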

You get 446 hits--too many to show your class. You select (toward the bottom) "Show only records with images from:" / "Papyri.info" and click 'Go'. This will narrow your found set down to the 16 texts that contain a word beginning with αγραμματ and are known (via APIS) to Papyri.info. But there must be more: HGV knows about a great many links to other sites (external to papyri.info). So, you select "Other sites" and 'Go'. Now you have 221 records for which some digital image is known, whether via APIS or HGV. Add "Print publications" and you will see that papyri.info knows of a total of 337 texts that have *some* image associated with them, whether digital or print.

Your class is working on texts from the first 2 centuries CE. So, you set "Date on or after" at "1 CE" and "Date on or before" to "200 CE" and click Go. That's 45 hits. The classroom where you are teaching has digital projection, but no access to books. So, you *remove* the "Print images exist" filter by clicking it away from the top of the right side of your screen. That's 28 hits; you can actually look at several of these in class.

A student asks how many of these were written on behalf of a man or a woman. Well, it wouldn't be a foolproof test, but you can now add another Greek search as an additional filter. So, enter "υπερ αυτης" (as "Word/Phrase search", which assumes that the strings that you enter are complete words) or "υπερ# #αυτης" (as "Substring search", which assumes that the strings you enter are fragments, unless you indicate a boundary with #). Now we have 9 hits.
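The role of # in a substring search can be illustrated by translating such a query into a regular expression. This is a hedged Python sketch of the idea, not a description of how PN evaluates queries; the function name is hypothetical and the sample text is assumed to be diacritic-free.

```python
import re


def substring_query_to_regex(query):
    """Translate a substring query into a regex, treating each '#' as a word boundary."""
    parts = [re.escape(part) for part in query.split("#")]
    return re.compile(r"\b".join(parts))


starts_with = substring_query_to_regex("#αγραμματ")
print(bool(starts_with.search("αγραμματος")))     # True
print(bool(starts_with.search("διαγραμματος")))   # False: no boundary before α

phrase = substring_query_to_regex("υπερ# #αυτης")
print(bool(phrase.search("εγραψα υπερ αυτης")))   # True
```

Without any #, the same function produces a plain substring match, which is why αγραμματ alone would also find διαγραμματ.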

Not all of the students in the class have strong Greek; so, you also set "Translation language" to "English" and you are down to 3 hits. Note: these filters are defined by what is in the system, so that if 5 of these texts had been translated into, say, Italian, you would have found that as one of your filter options. At the moment, most of the translations known to papyri.info come from APIS and so are in English.

You may remove any of these filters by deselecting them (click the X next to each at the top of the window). Remember to remove all such queries before starting a completely *new* search (or click on the “Reset all” button.)

This is a somewhat silly example, but it should give you the basic idea of how this new approach works and how it differs from the old. It takes some getting used to, but once all of the functionality is in place you will find that it is a much more powerful and flexible manner of searching.

Also, if you would like to track our progress toward implementing more complicated search functionality, you may feel free to look at the following Google Doc:
https://docs.google.com/spreadsheet/ccc?key=0Ajkz6D9lOd20dGtwczJSUXVhWUx6ZEJCUEhvR1lIYVE&hl=en_US#gid=0
It is meant for our own internal use and so may not make perfect sense to you, but you should probably get the idea. Y = we expect this to work now. N = we expect this not to work now. dev = our test / development server. prodo = the production version of papyri.info, which you use. If your favorite search pattern is not there, and if you cannot achieve the same result with a combination of filters, we encourage you, please, to *send us an email*. If you don’t report an error or omission, there is a good chance we won’t know to correct it!

(2) You will have noted also that tens of thousands of ‘original’ readings now have diacriticals. So, where the scribe wrote αναδεχομε, we put the normalized “ἀναδέχομαι” in the text and now print “ἀναδέχομε papyrus” in the apparatus. Before the year is out we shall invert this practice, printing “ἀναδέχομε” in the text and “read ἀναδέχομαι” in the apparatus. This first step has been a bigger task than we had imagined, and we owe a great deal to the brilliant and devoted work of Faith Lawrence at the Department of Digital Humanities, King’s College London. A few thousand words still lack accents; please feel free to add such and submit them to us. Some small number of errors have been introduced along the way (outright mistakes, which will be clearly apparent as nonsense); we are hacking away at those, fixing as many of them as we can; but if you notice any please feel free to correct and submit, or else (if it seems to be a systematic kind of error) please just alert us by email.

(3) Note also that the process of adding accents and reversing the display-regime of regularizations was more difficult in the case of regularized words that wrapped from the end of one line to the beginning of the next. Many thousands of these we handled automatically; about 6000 or so still need to be fixed. We think we are in position to have 80-90% fixed in the next couple weeks. In the meantime, all lines affected by this are flagged with a bright red line number--you may have noticed this already. In all such lines you will see that chunks of words appear oddly at the end or beginning of an affected line; the apparatus entries for all such will look fine.

(4) The line-by-line commentary feature has a bug, which will create a dozen or so clones of some commentary entries...this is very frustrating. It should be fixed before the new year. Just don’t enter any commentary in the next few weeks!

(5) Frequent Papyrological Editor users will notice that we have radically overhauled our apparatus criticus capabilities.

* for BL corrections: <:αἱ τοῦ=BL 9.17|ed|Θίτου:>
* for corrections proposed in journals/books: <:(διαγρ(άφου))=N. Gonis, ZPE 143 (2003) 150|ed|(διαγρ(αφῆς)):>
* for corrections proposed direct to DDbDP: <:τοῦ=PN G. Claytor (CPR VI plate 35)|ed|:>
* we now support multiple regularizations: <:ἀνοίγεται (?)|ἀνοίεται (?)||reg||ἀ̣νύεται:>
* we now support ‘regularizations’ by language (useful in multi-lingual texts for example): <:ἄρακος=grc|reg|ⲁⲣⲁⲕ:>
* we now support combinations of all of the above, as the following *fictional* example illustrates:

275a. <:στρ[ατηγὸς]=BL 15.2||ed||
στρ[ατηλάτης]=J. Cowey, ZPE 150 (2020) 321-323|
στρ[ατιώτης]=R. Ast, CdE 100 (2018) 13-15 (BL 14.5)|
Συρ[ίων]=Original Edition:>

* And we support extremely complicated combinations (including nesting of virtually every type of apparatus tag), as the following *fictional* example illustrates:

275. <:<:στρ[ατηγὸς]|subst|<:σ.2[.?]|alt|γ.3[.?]:>:>=BL 19.2||ed||
<:<:στρ[ατηλάτης]|reg|ξ̣τ̣ρ[ατηλάτης]:>|alt|.1γρ[.?]:>=J. Cowey, ZPE 200 (2020) 321-323|
<:<:στρ[ατιώτης]|alt|στρ[ατηγία]:>|reg|στυ̣ρ[ατ][.?]:>=R. Ast, CdE 100 (2018) 13-15 (BL 14.5)|
<:Συρ[ίων](?)|reg|<:<:Σο̣υ̣ρ[ίων]||alt||Συ̣υ̣ρ[ίων]|Σω̣υ̣ρ[ίων]:>|subst|Σ.2ρ[ίων]:>:>=Original Edition:>

This means:
(i) at line 275 the DDbDP prints στρ[ατηγὸς], which the scribe himself corrected from either "σ . . [ca.?]" or "γ . . . [ca.?]", and which is recorded in BL vol.19 p.2

(ii) previously, Cowey had argued (in ZPE 200) for correcting the text to either στρ[ατηλάτης], which is a modern regularization of ξ̣τ̣ρ[ατηλάτης], or to ". γρ[ca.?]"

(iii) before Cowey, Ast had suggested (in CdÉ 100) that the papyrus reads στυ̣ρ[ατ- ca.?], which should be regularized either to στρ[ατιώτης] or to στρ[ατηγία]; this was subsequently picked up by BL 14.5

(iv) The original editors of the papyrus thought that the scribe had originally written "Σ . . ρ[ίων]", and then corrected it to either Σο̣υ̣ρ[ίων] or Συ̣υ̣ρ[ίων] or Σω̣υ̣ρ[ίων], any one of which should perhaps be regularized to Συρ[ίων]

The PN will display στρ[ατηγὸς] in the text and in the app: 275. corr. ex σ ̣ ̣[ -ca.?- ] (or γ ̣ ̣ ̣[ -ca.?- ]) BL 19.2 : ξ̣τ̣ρ[ατηλάτης] (l. στρ[ατηλάτης]) (or ̣γρ[ -ca.?- ]) J. Cowey, ZPE 200 (2020) 321-323 : στυ̣ρ[ατ -ca.?- ] (l. στρ[ατιώτης (or στρ[ατηγία])) R. Ast, CdE 100 (2018) 13-15 (BL 14.5) : Σο̣υ̣ρ[ίων] (or Συ̣υ̣ρ[ίων] or Σω̣υ̣ρ[ίων]) (corr. ex Σ ̣ ̣ρ[ίων]) (l. Συρ[ίων]) Original Edition. This is not *precisely* what you expect to find in a print publication, but it is full, clear, and quite unambiguous.

This is, we think, a huge achievement and a great good; we have Gabby Bodard at King’s College and Jon Fox at the University of Kentucky to thank!

The Leiden+ Documentation page should now reflect most/all of these improvements: (http://papyri.info/editor/documentation?docotype=text); please let us know if we have missed something. Also, please note that the Apparatus “Helpers” (for BL, Editorial, and SoSOL) are now buggy as a result of the changes but should be fixed before the new year. In the meantime, enter apparatus entries by hand.

(6) I shall also mention just briefly that thanks to the extraordinary generosity and collegiality of our colleagues in Brussels, Alain Martin, Paul Heilporn, and Alain Delattre, we have begun a process of surfacing Bibliographie Papyrologique data via the PN. Our Heidelberg colleagues James Cowey and Carmen Lanz did an amazing amount of work to make this happen. You will see that there are still many bugs in the conversion of the BP records to structured XML, but we are getting there one step at a time. From the navigation bar at the top of the PN select Search / Bibliography. This will take you to a very *simple* search screen, where you may enter, for example, “Hombert” (no quotation marks) (http://papyri.info/bibliosearch?q=Hombert); or you may constrain the search by BP fields. So, if you enter “author:Hombert; title:bibliographie; date:1932” (no quotation marks) you will get 2 records (http://papyri.info/bibliosearch?q=author%3AHombert%3B+title%3Abibliographie%3B+date%3A1932+). This works also for BP subject codes; so, for example, “index:146” will find 1106 records concerning Archives. The FileMaker version of the BP remains much more flexible, powerful, and manipulable, for those users who are so inclined. In the next few weeks we shall be able to link from BP records to DDbDP texts and from DDbDP texts to BP records. This is only a start; we expect this service to improve dramatically over the coming months, soon with the ability to add/correct records, and submit them for review by the BP editorial team.
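The fielded bibliography queries map directly onto the URLs quoted above, so they can also be built programmatically. The following Python sketch assumes nothing beyond what those two example URLs show (the /bibliosearch path and the q parameter); the function name is hypothetical.

```python
from urllib.parse import urlencode

BIBLIOSEARCH = "http://papyri.info/bibliosearch"  # path as in the examples above


def bibliosearch_url(**fields):
    """Build a bibliography search URL from field:value constraints."""
    query = "; ".join(f"{name}:{value}" for name, value in fields.items())
    return BIBLIOSEARCH + "?" + urlencode({"q": query})


print(bibliosearch_url(author="Hombert", title="bibliographie", date=1932))
```

The result matches the second example URL except for its final "+", which merely encodes a trailing space in the typed query.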

This message is already far too long. So, enough for now.

As always, please feel free to send questions, comments, complaints to Josh Sosin jds15@duke.edu, Rodney Ast ast@uni-heidelberg.de, and James Cowey james.cowey@urz.uni-heidelberg.de; same, with regard to PN performance to Hugh Cayless .

All best,
Rodney Ast
James Cowey
Josh Sosin

Wednesday, November 16, 2011

stable identifiers

Trismegistos now has stable identifiers for the following items:

- Texts: e.g. www.trismegistos.org/text/4563
- Archives: e.g. www.trismegistos.org/archive/364
- Collections: e.g. www.trismegistos.org/collection/234
- Places: e.g. www.trismegistos.org/place/264
- Names: e.g. www.trismegistos.org/name/1

We are still developing TM People and a stable identifier for individuals will be implemented.

For TM,

Mark

Sunday, September 25, 2011

Trismegistos People

Just posted to the PAPY-list:

Dear colleagues,

News travels fast in these days of Facebook and Twitter ...
Although we had hoped to develop the system for proposing corrections and adding names before sending this email, it seems better to announce the launch of a beta version of Trismegistos People now.

Trismegistos People consists of a complex set of prosopographical and onomastic databases, listing personal names of non-royal individuals in Trismegistos Texts (currently some 458,000 attestations).
It is very much a work in progress and needs to be perfected in many ways, as users will notice. For some of these we will develop, as said above, a system enabling you to help us.
In the meantime, I hope the tool proves useful and you enjoy its search facilities, limited as they currently are.

Please check out the website at http://www.trismegistos.org/ref/about.php.

For Trismegistos,

Mark Depauw

Friday, September 2, 2011

Updates to the DDbDP and PN (papyri.info)

Just posted to papylist:

Dear Colleagues:

I write with some updates concerning the DDbDP and PN.

First, we are most pleased to welcome Mark Depauw to the Editorial Boards of the DDbDP and HGV; and it is also our great pleasure to welcome W. Graham Claytor as Assistant Editor of the DDbDP. Both will be great assets to the team.

The latest release features a number of improvements to the PN search and display, many of which you will have seen already; these include line numbers in search returns and enhanced tabular display of search/browse hits. We look forward to releasing vastly improved and significantly redesigned search functionality in the coming weeks. You may also have noticed that we are starting to change the look of the search interface; we are in the process of tightening the integration of the navigator and editor and bits of the new appearance are already showing.

Frequent users of SoSOL/Papyrological Editor will be happy to know that we now offer syntax for multiple alternate readings, e.g. <:στρατ[ηγὸς]||alt||στρατ[ηλάτης]|στρατ[ηγήσαντα]:>. The editor also offers enhanced display controls for introductory and line-by-line commentary: bold, italics, underline, footnotes, embedded links to PN (DDbDP/HGV/APIS) and any other website; in the next couple months we shall also support controlled linking to bibliography.
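For anyone processing such strings outside the editor, the flat (non-nested) form of the alternate-readings syntax is easy to take apart. The Python sketch below is an illustration only: it assumes that the reading before ||alt|| is the one printed in the text and that the readings after it are the alternates, it does not handle nested tags, and the function name is hypothetical.

```python
def parse_alternates(markup):
    """Split '<:a||alt||b|c:>' into the first reading and the list of alternates."""
    if not (markup.startswith("<:") and markup.endswith(":>")):
        raise ValueError("expected <: ... :> delimiters")
    body = markup[2:-2]
    first, separator, rest = body.partition("||alt||")
    if not separator:
        raise ValueError("no ||alt|| separator found")
    return first, rest.split("|")


first, alternates = parse_alternates(
    "<:στρατ[ηγὸς]||alt||στρατ[ηλάτης]|στρατ[ηγήσαντα]:>"
)
print(first)
print(alternates)
```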

We have also very nearly finished entering CPR 25, CPR 30, P.Heid. 9, O. Stras. 2, and O.Abu Mina; the bugs that we encountered with P.Köln have now been fixed, so that we hope to move quickly to finish those volumes as well. We have also begun entering P.Count and continue (slowly) to enter SB 26. There is much to do, but since rolling out the new system we have entered well over 2000 texts. Joyous thanks to the many and good-spirited contributors who do this work without any pay and in service to our field.

I also want to alert the community to a development currently under way, whose fruits you will start to see in the DDbDP. It has long been DDbDP practice to place orthographically/grammatically normalized forms, and even outright *corrected* forms, in the text and the ancient reading in the apparatus, e.g. TEXT: ἀνωμολογήσαντο, APP: ανομολογησαντο. This is of course not in keeping with papyrological convention and we have heard (loud and clear) over the years that colleagues would like to see this practice ‘flipped’.

We have begun that process. This means putting accents, breathing marks, and capitalization (and in many many cases Leiden mark-up, including line-breaks) in the appropriate places in the original reading; on 70,000 unique strings (more than 90,000 cases). This is, as you might imagine, a big job and we can neither convert every single case nor verify autoptically every case that we can convert. This cannot be done by hand on any reasonable timeframe or budget. But thanks to generous assistance from the TLG and some excellent programming work done at the Department of Digital Humanities King’s College London, we now can convert over 75% of these cases to a fairly high degree of accuracy. For example, we can take the original reading as it currently stands, αροστοι, collate it against the regularized reading, ἄρρωστοι, and produce ἄροστοι, which we will soon display in the text (and “read ἄρρωστοι” in the app).

The process is driven by a large table of generalizable equivalencies; the script does not know rules for accenting non-standard forms; rather, it sees that an original form is γιτωνεις, sees also that the regularized form is γείτονες, and knows also that ει/ι, ω/ο, and ε/ει are often in exchange, and so it collates the two forms and moves the accent over from the εί to the ι. It works very well, but we know that we will be introducing some forms whose accents may/will be incorrect; intensive checking on a sample of 10,000 unique strings suggests that such will be quite few, but there will be mistakes. Expect some nouns to have incorrect accents where the regularization involved change of case; expect to see a few iota subscripts where they do not belong or omitted where they do. Some cases cannot be correctly converted automatically; where γυναικος is normalized to γυναῖκες the computer does not ‘know’ whether the normalization is phonetic (in which case → γυναῖκος) or morphological (in which case → γυναικὸς); some such cases may be converted where you would prefer them not to be and some left unconverted where you would prefer otherwise. There will also be a relatively small number of infelicitous conversions, e.g. αιλαεινα → αἰλάεϊνα (regularized: ἐλάϊνα). To avoid all of these errors would mean not (correctly) converting many many more clear cases. So, we tolerate some mess.
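The collation idea can be sketched in Python. This is emphatically not the script described here: instead of a table of equivalencies it uses the standard library's difflib to align the two forms, and it copies diacritics only onto letters that are identical in both. The function names are hypothetical, and the original reading is assumed to be unaccented.

```python
import difflib
import unicodedata


def split_marks(text):
    """Return (base_character, combining_marks) pairs for text."""
    pairs = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and pairs:
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def transfer_accents(original, regularized):
    """Copy diacritics from the regularized form onto aligned letters of the original."""
    reg_pairs = split_marks(regularized)
    reg_base = "".join(base for base, _ in reg_pairs)
    out = list(original)
    matcher = difflib.SequenceMatcher(None, original, reg_base, autojunk=False)
    for a, b, size in matcher.get_matching_blocks():
        for k in range(size):
            # Same letter in both forms: carry its marks across.
            out[a + k] = original[a + k] + reg_pairs[b + k][1]
    return unicodedata.normalize("NFC", "".join(out))


print(transfer_accents("αροστοι", "ἄρρωστοι"))  # ἄροστοι
```

A sketch like this drops any accent that sits on a letter with no identical counterpart in the original, and it does not transfer capitalization; handling those cases is the job of the equivalency table described above.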

Forms that we cannot convert by this collation process, we will pass through the TLG morphological engine; this will catch many but not all of the remaining examples. Names, for example, are less than fully controlled by TLG, so that in many cases we can only partially convert a string. So, where we have corrected to Ἐσμῖνι from εσμινιος, we cannot simply move the accent over (→ Ἐσμῖνιος); and since our system does not know accentuation rules, we have no automatic way to generate the form ᾿Εσμίνιος; but we do know that the regularized form begins with a capital letter and smooth breathing, so that we can output ᾿Εσμινιος, which at least indicates the class of noun and is better than nothing!

Errors, infelicities, and partial conversions can of course be corrected via SoSOL by anyone who wishes! Some degree of mess is simply the cost of delivering the type of text and apparatus that the community wants. And we will clean up the mess over time and, I hope, together. In any case, know that we have not changed the *spelling* of such original readings--only the accentuation; so, in this crucial way the integrity of the data remains intact. And some of the successes are pretty spectacular, thanks to the hard work of the team: for instance, the original επειγιμαινον, which was normalized to ἐπικείμενον, is ‘correctly’ converted to ἐπειγίμαινον!

In the next month or two we should have apparatus criticus syntax that is both much better than the status quo and deploys the new handling of such regularizations.

As always, please feel free to contact Rodney Ast (ast@uni-heidelberg.de), James Cowey (james.cowey@urz.uni-heidelberg.de), and myself with questions, comments, complaints about the DDbDP, and Hugh Cayless (hugh.cayless@nyu.edu) with the same regarding PN functionality.

Sincerely,
Josh Sosin