Thursday, March 1, 2012

IDP Updates

This just sent to papylist:
= = =
Colleagues:

I write with news of recent IDP updates. Frequent users will have noticed already that we have just released significant enhancements to PN search capabilities. Thanks go to our colleagues Rodney Ast and James Cowey, and especially to our fantastic programmers Hugh Cayless and Tim Hill.

Some of the changes are intuitive, others less so. If you visit our search checklist:
https://docs.google.com/spreadsheet/ccc?key=0Ajkz6D9lOd20dGtwczJSUXVhWUx6ZEJCUEhvR1lIYVE#gid=1
and click on the SearchDocumentation tab at the bottom, you will see a more clearly emerging set of search instructions. This will be a useful page to bookmark.

Note that Boolean operators are now deployable with buttons. You will note also that sometimes when you click on a particular operator PN will automatically open another search box for you. For example, if you search for πρωτ* THEN αρσιν* within 5 words, and then click ‘and’, PN will open a new box for you and insert the operator ‘AND’. Note also that PN greys out some operator buttons some of the time. IMPORTANT: both of these features are designed to help you avoid running searches that are impossible or so computationally intensive that they will cause disruptions for the wider community. You can type the operators into the search box and you can construct complicated searches in a single search box, but you run the very real risk of a timeout or failure. Please try to help us keep things running smoothly for everyone by using the buttons and not ‘hacking’ the search box!

We now support proximity searches in two forms:

(1) NEAR = word/string X within n words or characters of word/string Y in either direction
(2) THEN = word/string X followed by word/string Y, within n words or characters

Note: NEAR and AND are not the same thing. X AND Y finds documents containing X and Y anywhere within them. A search for X NEAR Y must specify within n characters/words and will only find documents in which X and Y co-occur within the specified distance.
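
The difference between the three operators can be sketched in a few lines of Python. This is only an illustration of the semantics described above, not PN's implementation: the function names are hypothetical, matching is on whole words only, and distance is counted in words, with adjacent words assumed to be 1 apart.

```python
def positions(tokens, term):
    """Indexes at which the exact word `term` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def and_match(tokens, x, y):
    """X AND Y: both words occur anywhere in the document."""
    return x in tokens and y in tokens


def near_match(tokens, x, y, n):
    """X NEAR Y: the words occur within n words of each other, in either order."""
    return any(
        0 < abs(i - j) <= n
        for i in positions(tokens, x)
        for j in positions(tokens, y)
    )


def then_match(tokens, x, y, n):
    """X THEN Y: Y occurs after X, no more than n words later."""
    return any(
        0 < j - i <= n
        for i in positions(tokens, x)
        for j in positions(tokens, y)
    )


# A made-up document, already split into words.
doc = "του νυν ομολογουμεν πεπρακεναι σοι την οικιαν".split()

print(and_match(doc, "οικιαν", "του"))      # both present, far apart
print(near_match(doc, "οικιαν", "του", 3))  # too far apart for NEAR within 3
print(near_match(doc, "νυν", "του", 1))     # adjacent, order does not matter
print(then_match(doc, "νυν", "του", 1))     # wrong order for THEN
```

In this sketch the first and third calls succeed and the other two fail: AND ignores distance, NEAR ignores order, and THEN requires both order and distance.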

So, if you want to find documents containing a string beginning σπεν- followed immediately by a string containing -χεσθ-, then enter σπεν* THEN *χεσθ* within 1 word. Note that * is a wildcard (it marks the absence of a word boundary). You will find one hit, P.Lond. VII 2193.9: σ̣π̣έν̣δοντ̣ες εὐχέσθωισαν. Note that the highlighting and hit-quotation do not work on this search. This is a *known bug* and we are working to fix it.

Note that PN treats strings and words very differently and this can affect the way you must run certain searches. Take the previous example. A search for σπεν THEN χεσθ within 10 characters will return one hit, P.Lond. VII 2193.9: σ̣π̣έν̣δοντ̣ες εὐχέσθωισαν (with the highlighting bug). However, a search for σπεν THEN χεσθ within 1 word WILL NOT WORK at all. This is because proximity searching looks by default for words, not strings. Thus, if you are running a proximity search constrained by words, you must include * to indicate the part of the word that you are looking for. Again, when you constrain a proximity search by characters you are in effect running a substring search; but when you constrain a proximity search by words you are searching for words and so must use asterisks to indicate that you have provided only a part of the word. If you constrain the search by characters you do not need asterisks, because PN automatically treats the queried terms as strings (so πεν finds σπενδ).
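
The word/string distinction can be illustrated with a small Python sketch. Everything here is an assumption made for illustration and not a description of how PN works internally: the function names are hypothetical, diacritics are stripped before matching, the word-based search matches whole words unless asterisks are supplied, and the character-based search counts the gap between the end of the first string and the start of the second.

```python
import unicodedata
from fnmatch import fnmatchcase


def normalize(text):
    """Lower-case the text and drop accents, breathings and underdots."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def word_then(text, x, y, n):
    """X THEN Y within n words: terms must match whole words unless they use *."""
    tokens = normalize(text).split()
    xs = [i for i, tok in enumerate(tokens) if fnmatchcase(tok, normalize(x))]
    ys = [i for i, tok in enumerate(tokens) if fnmatchcase(tok, normalize(y))]
    return any(0 < j - i <= n for i in xs for j in ys)


def char_then(text, x, y, n):
    """X THEN Y within n characters: terms are treated as plain substrings."""
    flat = normalize(text)
    x, y = normalize(x), normalize(y)
    start = flat.find(x)
    while start != -1:
        x_end = start + len(x)
        y_start = flat.find(y, x_end)
        if y_start != -1 and y_start - x_end <= n:
            return True
        start = flat.find(x, start + 1)
    return False


LINE = "σπένδοντες εὐχέσθωισαν"

print(word_then(LINE, "σπεν", "χεσθ", 1))      # no asterisks: no whole word matches
print(word_then(LINE, "σπεν*", "*χεσθ*", 1))   # asterisks mark partial words
print(char_then(LINE, "σπεν", "χεσθ", 10))     # substring search needs no asterisks
```

Under these assumptions only the first call fails, which mirrors the behaviour described above: by words, σπεν is not a complete word in the line, while by characters it is simply a substring.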

Sometimes you can reach the same or similar results via different search strategies. For example, say you want to find documents containing the precise phrase "ομολογουμεν πεπρακεναι" and the precise phrase "του νυν". You can do any of the following:

(1) enter "ομολογουμεν πεπρακεναι" AND "του νυν" in a single box
(2) enter "ομολογουμεν πεπρακεναι", click AND and in the second box enter "του νυν" after the AND, which PN will enter for you.
(3) enter "ομολογουμεν πεπρακεναι" then click search; then refine search with "του νυν"

Lexical searching can be combined with string searching. Let’s say you want to find documents containing any form of the word ἀνήκω followed by the string beginning υπηρετ- within 5 words. Enter LEX ἀνήκω NEAR υπηρετ* within 5 words (you will get 2 hits).

You can use the radio buttons to search Text, Metadata (=HGV and APIS), or Translations. But you can also restrict searches to either HGV or APIS. So a search for HGV:witnesses will return 5 hits; a search for APIS:witnesses will return 87 hits; a search for witnesses alone, with the ‘Metadata’ radio button selected, will return 92 hits.

Phrases of more than 2 words are now searchable: for example, enter "οἱ ἐκ τῆς".

For those of you who are comfortable with regular expressions, we now offer support for regex searches. If you want to find documents containing a string beginning αυτο- within 20 characters of a string beginning και-, but not within 20 words of a complete word αδριανου, enter: REGEX αυτο\b.{1,20}\bκαι\s(?!(\S+\s){1,20}αδριανου)
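
If you would like to try an expression like this offline before running it in PN, the following Python sketch applies the same pattern to a few made-up lines. It assumes that Python's `re` module and PN's regex engine agree on `\b`, `\s`, `\S` and negative lookahead, which may not hold in every detail, and the sample texts are invented and not taken from any papyrus. Read literally, `αυτο\b` requires a word boundary directly after αυτο, and `\bκαι\s` matches και as a complete word.

```python
import re

# The pattern from the example above, copied unchanged (without the REGEX prefix).
PATTERN = re.compile(r"αυτο\b.{1,20}\bκαι\s(?!(\S+\s){1,20}αδριανου)")

# Invented, unaccented sample lines.
samples = [
    "του αυτο ετους και μηνος φαωφι",                   # no αδριανου after και
    "του αυτο ετους και του κυριου αδριανου καισαρος",  # αδριανου follows και
    "αυτοκρατορος και θεου",                            # no word boundary after αυτο
]

for line in samples:
    hit = PATTERN.search(line)
    print("HIT " if hit else "MISS", line)
```

Only the first line is a hit in this sketch: the second is rejected by the negative lookahead, and the third never satisfies `αυτο\b`.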

Our best advice is to play around with the new interface and its capabilities, and to try searches from the SearchDocumentation list. Everything that we have marked as working should work. If something appears not to, let us know. If a particularly important combination of searches does not appear on the list, let us know; we can add it. Let us know also whether that particular combination works or does not, so that we can test and confirm. If anything is especially weird or puzzling, please be sure to give us as much information as possible (the precise steps you followed, the browser you are using, etc.); even attach screenshots if you like. The more information we have, the better we can address problems that you encounter.

Once you get used to the changes, we hope and think you will be as pleased with them as we are.

Best,
Josh Sosin
