The WORDS program, with its accompanying data files, should run on any machine for which it has been adapted, with any monitor. Simply download the self-extracting EXE file or the compressed file for the appropriate system and execute or decompress it in your chosen subdirectory on the hard disk, creating the necessary files. Then call/run WORDS.
See the particular page for each specific system.
Intel PC Systems
DOS
Windows 95/NT/98
Linux
This program (WORDS.EXE for the PC in its DOS, Windows 95/98/NT, or Linux console versions) takes keyboard input or a file of Latin text lines and provides an analysis of each word individually. It uses the files INFLECT.SEC, UNIQUES.LAT, ADDONS.LAT, STEMFILE.GEN, INDXFILE.GEN, and DICTFILE.GEN, and possibly .SPE and DICT.LOC.
The dictionary contains over 30000 entries, as would be counted
in an ordinary dictionary. This expands to almost twice that number
of individual stems (the count that the program may display at startup),
and, through additional word construction with hundreds of prefixes and
suffixes,
may generate more, leading to many hundreds of thousands of 'words' that
can be formed by declension and conjugation.
This version of WORDS provides a tool to help in translations for the Latin
student. It is now a large dictionary by any measure and can be helpful
to advanced users. The dictionary will continue to grow - slowly.
I am no expert in Latin; indeed, my training is limited to a couple of years in high school 50 years ago. But I always felt that Latin, as presented after two millennia, was a scientific language. It had the interesting property of inflection: words were constructed in a logical manner. I admired this feature, but could never remember the vocabulary well enough when it came time to exercise it on tests.
I decided to automate an elementary-level Latin vocabulary list. As a first stage, I produced a computer program that will analyze a Latin word and give the various possible interpretations (case, person, gender, tense, mood, etc.), within the limitations of its dictionary. This might be the first step to a full parsing system, but, although just a development tool, it is useful by itself.
Please remember that this is only a computer exercise in automating a Latin dictionary. I am not a Latin scholar and anything in the program or documentation is filtered by me from reading Latin dictionaries. Please let no one go to his teacher and cite me as an authority.
While developing this initial implementation, based on different sources, I learned (or re-learned) something that I had overlooked at the beginning. Latin courses, and even very large Latin dictionaries, are put together under very strict ground rules. Some dictionary might be based exclusively on 'Classical' (200 BC - 200 AD) texts; it might have every word that appears in every surviving writing of Cicero, but nothing much before or since. Such a dictionary will be inadequate for translating medieval theological or scientific texts. In another example, one textbook might use Caesar as its main source of readings (my high school texts did), while another might avoid Caesar and all military writings (either for pacifist reasons, or just because the author had taught Caesar for 30 years and had grown bored with going over the same material, year after year). One can imagine that the selection of words in such different texts would differ considerably; moreover, even with the same words, the meanings attached would be different. This presents a problem in the development of a dictionary for general use.
One could produce a separate dictionary for each era and application or a universal dictionary with tags to indicate the appropriate application and meaning for each word. With such a tag arrangement one would not be offered inappropriate or improbable interpretations. The present system has such a mechanism, but it is not yet exploited.
The Version 1.97 dictionary may be found to be of fairly general use for the student; it has the easy words that every text uses. It also has a goodly number of adverbs, prepositions, and conjunctions, which are not as sensitive to application as are the nouns and verbs. The system also tests a few hundred prefixes and suffixes, if the raw word cannot be found. This allows an interpretation of many words which would otherwise be marked unknown. The result of this analysis is fairly straightforward in most cases, accurate but esoteric in some others. Some constructions are recognized Latin words, and some are perfectly reasonable words which may never have been used by Cicero or Caesar but might have been used by Augustine or a monk of Jarrow. For about 1 in 10 constructed words the result has no relation to the normal dictionary meaning.
BE WARNED! The program will go to great lengths if all tricks are invoked. If you get a word formed with an enclitic, prefix, suffix, and syncope, be very suspicious! It could be right, but do not bet on it. (Try siquempiamque!)
With this facility, and a 30000 word dictionary, trials on some tested classical texts and the Vulgate Bible give hit rates of far better than 99%, excluding proper names (there are very few proper names in this dictionary). (I am an old soldier and seem to have in the dictionary every possible word for attack or destroy. The system is near perfect for Caesar.) The question arises: what hit rate can be expected for a general dictionary? Classical Latin dictionaries have no references to the terminology of Christian theology. The legal documents and deeds of the Middle Ages are a challenge of jargon and abbreviations. These areas require special knowledge and vocabulary, but even there the ability to handle the non-specialized words is a large part of the effort.
The development system allows the inclusion of specialized vocabulary (for instance a SPEcial dictionary for specialized words not wanted in most dictionaries), and the opportunity for the user to add additional words to a DICT.LOC.
It was initially expected that there would be special dictionaries for special applications. That is why there is the possibility of a SPECIAL dictionary. Now the general dictionary is coded by AGE and application AREA. Thus special words used initially/only by St Thomas Aquinas would be Medieval (AGE code F) and Ecclesiastical (AREA code E). Eventually there needs to be a filter that allows one, upon setting parameters for Medieval and Ecclesiastical, to push those words over others. Right now there is not enough non-classical vocabulary to support such a scheme. The problem is that one needs a really complete classical dictionary before one can assure that new entries are uniquely Medieval, that they are not just classical words that appear in a Medieval text. And the update has only reached the D's. So the situation is that the mechanism is there, but not sufficient data. Nevertheless that is exactly the application I had in mind when I set out to do the program.
The program is probably much larger than is necessary for the present application. It is still in development but some effort has now been put into optimization. Nevertheless there is lots of room for speeding it up.
This is a free Shareware program, which means it is proper to copy it and pass it on to your friends. Consider it a developmental item for which there is no charge. However, it is Copyrighted (c), so please don't sell it as your own without at least telling me.
This version is distributed without obligation, but the developer would appreciate comments and suggestions.
William A Whitaker
PO Box 3036
McLean VA 22103-3036
USA
whitaker@erols.com
This write up is rudimentary and assumes that the user is experienced with computers.
The WORDS program, Version 1.97, with its accompanying data files, should run on a PC in DOS/Windows 95/98/NT, with any monitor. Simply download the self-extracting EXE file and execute it in your chosen subdirectory to UNZIP the files into a subdirectory of a hard disk. Then call WORDS.
There are a number of files associated with the program. These must be in the subdirectory of the program, and the program must be run from that subdirectory.
All these files are necessary to run the program (except the optional dictionaries SPE and LOC). This excess of files is a consequence of the present developmental nature of the program. The files are very simple, almost human-readable. Presumably, a later version could condense and encode them. Nevertheless, beyond the original COPY, the user need not worry about them.
Additionally, there are files that the program may produce on request. All of these share the name WORD, with various extensions, and they are all ASCII text files which can be viewed and processed with an ordinary editor. The casual user probably does not want to get involved with these. WORD.OUT will record the whole output; WORD.UNK will list only words the program is unable to interpret. These outputs are turned on through the PARAMETERS mechanism.
PARAMETERS may be set while running the program by inputting a line containing a '#' mark as the only (or first) character. Alternatively, WORD.MOD contains the MODES that can be set by CHANGE_PARAMETERS. If this file does not exist, default modes will be used. The file may be produced or changed when changing parameters. It can also be modified, if the user is sufficiently confident, with an editor, or deleted, thereby reverting to defaults.
(There is another set of developer's parameters which may be set in some versions with the input of '!'. These MODES may be changed and saved in a file WORD.MDV. These are not normal user facilities; probably no one but the developer would be interested. In any specific release these facilities may, or may not, work. They are just mentioned here in case they ever come up accidentally, and to point out that there are other capabilities, actual and possible, which may be invoked if there is a special need.)
WORD.OUT is the file produced if the user requests, in CHANGE_PARAMETERS, output to a file. This output can be used for later manipulation with a text editor, especially when the input was a text file of some length. If the parameter UNKNOWNS_ONLY is set, the output serves as a sort of a Latin spell checker. Those words it cannot match may just not be in the dictionary, but alternatively they may be typos. A WORD.UNK file of unknowns can be generated.
To start the program, in the subdirectory that contains all the files, type WORDS. A setup procedure will execute, processing files. Then the program will ask for a word to be keyed in. Input the word and give a return (ENTER). Information about the word will be displayed.
One can input a whole line at a time, but only one line since the return at the end of line will start the processing. If the results would fill more than a computer screen, the output is halted until the user responds to the 'MORE' message with a return. A file containing a text, a series of lines, can be input by keying in the character '@', followed (with no spaces) by the DOS name of the file of text. This input file need not be in the program subdirectory, just use the full DOS path and name of the file. This is usually accompanied with the setting of the parameter switches to create and write to an output file, WORD.OUT.
One can have a comment in the file, a terminal portion of a line that is not parsed. This could be an English meaning, a source where the word was found, an indication that it may have been miscopied, etc. A comment begins with a double dash [--] and continues to the end of the line. The '--' and everything after on that line is ignored by the program.
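The comment rule can be illustrated with a minimal Python sketch. This is not the program's own code; the function name `strip_comment` is hypothetical, and the only behavior assumed is the one stated above: the first '--' and everything after it on the line is ignored.

```python
def strip_comment(line: str) -> str:
    """Return the part of an input line that precedes a '--' comment."""
    # partition() splits at the first '--'; the remainder of the line is dropped.
    text, _marker, _comment = line.partition("--")
    return text.rstrip()


print(strip_comment("arma virumque cano -- Aeneid 1.1"))
```

A line with no '--' passes through unchanged, and a line that is only a comment yields an empty string.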
A '#' character input will permit the user to set modes to prevent the process from trying prefixes and suffixes to get a match on an item unknown to the dictionary, put output to a file, etc. Going into the CHANGE_PARAMETERS, the '?' character calls help for each entry.
Two successive returns with no text will terminate the program (except in text being read from an @ disk file).
One can also call WORDS with the input on the command line:
WORDS amo amas amat
which will cause it to execute for that input and then terminate. This is for a quick word or two.
Another mode of operation is to provide an input and an output file:
WORDS INFILE OUTFILE
with names of your choice (full path names if not operating all in the same subdirectory). The program will read as input the INFILE and write the output to the OUTFILE (as though it were WORD.OUT). It will then await further input from the user. It terminates with a return. If the parameters are not legal file names, the program will assume they are Latin words to be processed as command line input.
Following are annotated examples of output. Examination of these will give a good idea of the system. The present version may not match these examples exactly - things are changing - but the principle is there. A recent modification is the output of dictionary forms or 'principal parts' (shown below for some examples).
=>agricolarum
agricol.arum N 1 1 GEN P M P
agricola, agricolae
farmer
This is a simple first declension noun, and a unique interpretation. The '1 1' means it is first declension, with variant 1. This is an internal coding of the program, and may not correspond exactly with the grammatical numbering. The 'N' means it is a noun. It is the form for genitive (GEN), plural (1st 'P'). The stem is masculine (M) and represents a person (2nd 'P'). The stem is given as 'agricol' and the ending is 'arum'. The stem is normal in this case, but is a product of the program, and may not always correspond to conventional usage.
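As a reading aid, the fields of the noun line can be pulled apart mechanically. The following Python sketch assumes the eight-field layout described above (stem.ending, part of speech, declension, variant, case, number, gender, kind); the names `NounParse` and `parse_noun_line` are hypothetical and are not part of WORDS.

```python
from typing import NamedTuple


class NounParse(NamedTuple):
    stem: str
    ending: str
    part: str        # 'N' for a noun
    declension: int
    variant: int
    case: str        # e.g. GEN
    number: str      # S or P
    gender: str      # M, F, N, C
    kind: str        # e.g. P (person), T (thing)


def parse_noun_line(line: str) -> NounParse:
    """Split one noun inflection line of WORDS output into named fields."""
    fields = line.split()
    if len(fields) != 8 or fields[1] != "N":
        raise ValueError("not a noun inflection line: " + line)
    stem, _dot, ending = fields[0].partition(".")
    return NounParse(stem, ending, fields[1], int(fields[2]), int(fields[3]),
                     fields[4], fields[5], fields[6], fields[7])


print(parse_noun_line("agricol.arum N 1 1 GEN P M P"))
```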
=>feminae
femin.ae N 1 1 GEN S F P
femin.ae N 1 1 DAT S F P
femin.ae N 1 1 NOM P F P
femin.ae N 1 1 VOC P F P
femina, feminae
woman
This word has several possible interpretations in case and number (Singular and Plural). The gender is Feminine. Presumably, the user can examine the adjoining words and reduce the set of possibilities. Maybe the program will take care of this in some future version.
=>cornu
corn.u N 4 2 NOM S N T
corn.u N 4 2 DAT S N T
corn.u N 4 2 ACC S N T
corn.u N 4 2 ABL S N T
cornu, cornus
horn (of an animal); horn, trumpet; wing of an attacking army
Here is an example of another declension and a second variant. The Masculine (-us) nouns of the declension (fructus) are '4 1' and the Neuter (-u) nouns are coded as '4 2'. This word is neuter (2nd N) and represents a thing (T).
=>ego
ego PRON 5 1 NOM S C PERS
I, me; myself
A pronoun is much like a noun. The gender is common (C), that is, it may be masculine or feminine. It is a personal (PERS) pronoun.
=>illud
ill.ud PRON 6 1 NOM S N ADJECT
ill.ud PRON 6 1 ACC S N ADJECT
that; those (pl.); also DEMONST
Here we have an adjectival (ADJECT) and demonstrative (DEMONST) pronoun.
=>hic
hic ADV POS
here, in this place
h.ic PRON 3 1 NOM S M ADJECT
this; these (pl.); also DEMONST
In this case there is an adjectival/demonstrative pronoun, or it may be an adverb. The POS means that the comparison of the adverb is positive.
=>bonum
bon.um N 2 2 NOM S N T
bon.um N 2 2 ACC S N T
good thing, profit, advantage; goods (pl.), possessions
bon.um ADJ 1 1 NOM S N POS
bon.um ADJ 1 1 ACC S M POS
bon.um ADJ 1 1 ACC S N POS
bon.um ADJ 1 1 VOC S N POS
good, honest, brave, noble; better; best
Here we have an adjective, but it might also be a noun. The interpretation of the adjective says that it is POSitive, but note that there are meanings for COMParative and SUPERlative also on the line. Check the comparison value before deciding.
=>facile
facile ADV POS
easily, readily
facil.e ADJ 3 2 NOM S N POS
facil.e ADJ 3 2 ACC S N POS
facil.e ADJ 3 2 VOC S N POS
easy, easy to do, without difficulty, ready, quick, good natured, courteous
Here is an adjective or an adverb. Although they are related in meaning, they are different words.
=>acerrimus
acerrim.us ADJ 3 2 NOM S M SUPER
sharp, bitter, pointed, piercing, shrill; sagacious, keen; severe, vigoro
Here we have an adjective in the SUPERlative. The meanings are all POSitive and the user must add the -est by himself.
=>optime
optim.e ADJ 1 1 VOC S M SUPER
good, honest, brave, noble; better; best
optime ADV SUPER
well, very, quite, rightly, agreeably, cheaply, in good, style; better; best
Here is an adjective or an adverb; both are SUPERlative.
=>monuissemus
monu.issemus V 2 1 PLUP ACTIVE SUB 1 P X
remind, advise, warn; teach; admonish; foretell
Here is a verb for which the form is PLUPerfect, ACTIVE, SUBjunctive, 1st person, Plural. It is 2nd conjugation, variant 1.
=>amat
am.at V 1 1 PRES ACTIVE IND 3 S X
amo, amare, amavi, amatus
love, like; fall in love with; be fond of; have a tendency to
Another regular verb, PRESent, ACTIVE, INDicative.
=>amatus
amat.us VPAR 1 1 NOM S M PERF PASSIVE PPL X
amo, amare, amavi, amatus
love, like; fall in love with; be fond of; have a tendency to
Here we have the PERFect, PASSIVE ParticiPLe, in the NOMinative, Singular, Masculine.
=>amatu
amat.u SUPINE 1 1 ABL S X
amo, amare, amavi, amatus
love, like; fall in love with; be fond of; have a tendency to
Here is the SUPINE of the verb in the ABLative Singular.
=>orietur
ori.etur V 3 4 FUT PASSIVE IND 3 S DEP
rise, arise; spring from, appear; be descended; begin, proceed, originate
For DEPonent verbs the passive form is to be translated as if it were active voice.
=>ab
ab PREP ABL
by, from, away from
Here is a PREPosition that takes an ABLative object.
=>sine
sin.e V 3 1 PRES ACTIVE IMP 2 S X
allow, permit
sine PREP ABL
without
Here is a PREPosition that might also be a Verb.
=>contra
contra PREP ACC
against, opposite; facing; contrary to, in reply to
contra ADV POS
in opposition, in turn; opposite, on the contrary
Here is a PREPosition that might also be an ADVerb. This is a very common situation, with the meanings being much the same.
=>et
et CONJ
and, and even; also, even; (et ... et = both ... and)
Here is a straight CONJunction.
=>vae
vae INTERJ
alas, woe, ah; oh dear; (Vae, puto deus fio.)
Here is a straight INTERJection.
=>septem
septem NUM 2 0 X X X CARD 7
seven
An additional provision is the attempt to recognize and display the value of Roman numerals, even combinations of appropriate letters that do not parse conventionally to a value but may be ill-formed Roman numerals.
=>VII
vii NUM 2 0 X X X CARD 7
7 as a ROMAN NUMERAL
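The tolerant treatment of Roman numerals can be sketched as follows. This Python sketch is an assumption about one way to do it, not the program's actual routine: any string made only of numeral letters is given a value by the usual subtractive rule, without checking that it is well formed. The name `roman_value` is hypothetical.

```python
from typing import Optional

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_value(word: str) -> Optional[int]:
    """Value of a string of Roman numeral letters, or None if other letters occur."""
    letters = word.lower()
    if not letters or any(ch not in ROMAN_VALUES for ch in letters):
        return None
    total = 0
    for i, ch in enumerate(letters):
        value = ROMAN_VALUES[ch]
        following = ROMAN_VALUES[letters[i + 1]] if i + 1 < len(letters) else 0
        # A smaller numeral standing before a larger one is subtracted.
        total += -value if value < following else value
    return total


print(roman_value("VII"))   # 7
print(roman_value("iiii"))  # 4, ill-formed but still given a value
```

Because no well-formedness check is made, a string such as 'mix' is also assigned a value, so a reading as a numeral would be offered alongside any reading as a word.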
Generally, the meaning is given for the base word, as is usual for dictionaries. For the verb, it will be a present meaning, even when the tense given is perfect. For an adjective, the positive meaning is given, even if a comparative or superlative form is shown. This is also so when a word is constructed with a suffix, thus an adverb constructed from its adjective will show the base adjective meaning and an indication of how to make the adverb in English. The user must make the proper interpretation.
In some cases an adjective will
be found that is a participle of a verb that is also found. The
participle meaning, as inferred by the user from the verb meaning, is
not superseded by the explicit adjective entry, but supplemented by it with
possible specialized meanings.
Signs and Abbreviations in Meaning
, [comma] is used to separate meanings that are similar. The philosophy
has been to list a number of synonyms just to key the reader in making
his translation. There is no rigor in this.
; [semicolon] is used to separate sets of meanings that differ in intent.
This is just a general tendency and is not rigorously enforced.
/ [solidus] means 'or' or gives an alternative word. It sometimes
replaces the comma and is often used to compress the meaning into a short line.
(...) [parentheses] set off an optional word or modifier, e.g.,
'(nearly) white' means 'white' or 'nearly white', and '(matter in) dispute' means
either the matter in dispute or the dispute itself. They are also used to
set off an explanation, further information about the word or meaning,
or an example of a translation or a word combination.
? [question mark] in a meaning implies a doubt about the
interpretation, or even about the existence of the word at all. For the
purposes of this program, it does not matter much. If the dubious word does
not exist, no one will ask for it. If it appears in his text, the reader is
warned that the interpretation may be questionable to some degree,
but is what is available. May indicate somewhat more doubt than (perh.).
~ [tilde] stands for the stem or word in question. Usually it
does not have an ending affixed, as is the convention in other dictionaries,
but represents the word with whatever ending is proper. It is just
a space saving shorthand or abbreviation.
=> in meaning this indicates a translation example.
abb. abbreviation.
(Dif) - [Deferrari] is used to indicate an additional meaning
taken from A Latin-English Dictionary of
St. Thomas Aquinas by Roy J. Deferrari.
This is singled out because of the importance of Aquinas.
The reference is to be applied from the last semicolon before the mark.
It is likely that the meaning diverges from the base by being medieval
and ecclesiastical, but not so overwhelming as to deserve a separate entry.
(Douay) is used to designate those words for which the meaning
has been derived or modified by examination of the Douay translation
of the Latin Vulgate Bible of St Jerome.
(eccl.) ecclesiastical - designating a special church meaning in
a list of conventional meanings, an additional meaning not sufficient
to justify a separate entry with an ecclesiastical code.
esp. [especially] - indicates a significant association,
but is only advisory.
(King James) or (KJames) is used to designate those words for
which the meaning has been derived or modified by examination of the
King James Bible in connection with the Latin Vulgate Bible of St Jerome.
(KLUDGE) This indicates that the particular form is distorted
in order to make it come out correctly. This usually takes the form
of a special conjugational form applied to a few words, not applicable
to other words of the same conjugation or declension. The user can
expect the form and meaning to be correct, but the numerical coding
will be odd.
(L+S) [Lewis and Short] is used to indicate that the meaning
starting from the previous semicolon is information from Lewis and Short
'A Latin Dictionary' that differs from, or significantly expands on, the
meaning in the 'Oxford Latin Dictionary' (OLD) which is the baseline for
this program. This is not to imply that the meaning listed is otherwise
taken directly from the OLD, just that it is not inconsistent with OLD.
The L+S information is either inconsistent with OLD (likely OLD knows better)
or Lewis and Short has included meanings appropriate for late Latin
writers beyond the scope of OLD. The program is
just warning the reader that there may be some difference.
There are cases in which this indication occurs in entries that have
Lewis and Short as the source. In those cases, the basic word is in
OLD but the entry is a variant form or spelling not cited there.
There are cases where OLD and L+S give somewhat different spellings
and meanings for the 'same' word (same in the sense that both
dictionaries point to the same citation). In these cases a combination
of meanings is given for both entries with the (L+S) code distinction
and the entries of different spelling or declension have the SOURCE coded.
(OLD) [Oxford Latin Dictionary] is used to indicate an additional
meaning taken from the Oxford Latin Dictionary in an entry that is
otherwise attributed.
While it is usually true that if a classical word has other than OLD
as the listed source then it does not appear in that form in OLD,
this is not always the case. On occasion some other dictionary gives a
much better or more complete and understandable definition and the
honor of source is thereto given.
(PASS) [passive] - indicates a special, unexpected meaning for the
passive form of the verb, not easily associated with the active meaning.
perh. [perhaps] - denotes an additional uncertainty,
but not as strong as (?).
(pl.) [plural] means that the Latin word is believed by scholars to be
used (almost) always in the plural form, with the meaning stated, even though
that meaning in English may be singular.
If it appears in the beginning of the meaning,
before the first comma, it applies to all the meanings. If it
appears later, it applies only to that and later meanings.
For the purpose of this program, this is only advisory.
While it is used by some tools to find the expected dictionary entry,
the program does not exclude a singular form in the output. While
it may be true that in good, classical Latin it is never used in
the singular, this does not mean that some text somewhere might not
use the singular, nor that it is uncommon in later Latin.
prob. [probably] - denotes some uncertainty,
but not as much as (perh.).
pure Latin ... indicates a pure Latin term for a word which
is derived from another language (almost certainly Greek).
(rude) - indicates that this meaning was used in a rude, vulgar,
coarse, or obscene manner, not what one should hear in polite company.
Such use is likely from graffiti or epigrams, or
in plays in which the dialogue is to indicate that the characters are
low or crude. Meanings given by the program for these words are more
polite, and the user is invited to substitute the current street language
or obscenity of his choice to get the flavor of text.
(sg.) [singular] means that the Latin word is believed by scholars to be used
always in the singular. If it appears in the beginning of the meaning,
before the first comma, it applies to all the meanings. If it
appears later, it applies only to that and later meanings.
For the purpose of this program, this is only advisory.
usu. [usually] is weakly advisory. (usu. pl.) is even weaker than pl.
and may imply that the pl. tendency occurred only during certain periods.
w/ means 'with'.
An effect of the program is to derive the structure and meaning of individual Latin words. A procedure was devised to:
With the input of a word, or several words in a line, the program returns information about the possible accidence, if it can find an agreeable stem in its dictionary.
=>amo
am.o V 1 1 PRES ACTIVE IND 1 S X
love, like; fall in love with; be fond of; have a tendency to
To support this method, an INFLECT.SEC data file was constructed containing possible Latin endings encoded by a structure that identifies the part of speech, declension, conjugation, gender, person, number, etc. This is a pure computer encoding for a 'brute force' search. No sophisticated knowledge of Latin is used at this point. Rules of thumb (e.g., the fact, always noted early in any Latin course, that a neuter noun has the same ending in the nominative and accusative, with a final -a in the plural) are not used in the search. However, it is convenient to combine several identical endings with a general encoding (e.g., the endings of the perfect tenses are the same for all verbs, and are so encoded, not replicated for every conjugation and variant).
Many of the distinguishing differences identifying conjugations come from the voiced length of stem vowels (e.g., between the present, imperfect and future tenses of a third conjugation I-stem verb and a fourth conjugation verb). These aural differences, the features that make Latin 'sound right' to one who speaks it, are lost entirely in the analysis of written endings.
The endings for the verb conjugations are the result of trying to minimize the number of individual endings records, while yet keeping the structure of the inflections data file fairly readable. There is no claim that the resulting arrangement is consonant with any grammarian's view of Latin, nor should it be examined from that viewpoint. While it started from the conjugations in text books, it can only be viewed as some fuzzy intermediate step along a path to a mathematically minimal number of encoded verb endings. Later versions of the program might improve the system.
There are some egregious liberties taken in the encoding. With the inclusion of two present stems, the third conjugation I-stem verbs may share the endings of the regular third conjugation. The fourth conjugation has disappeared altogether, and is represented as a somewhat modified variant of the third conjugation (3, 4)! There is an artificial fifth conjugation for esse and others, and a sixth for eo.
As an example, a verb ending record has the structure:
Thus, the entry for the ending appropriate to 'amo' is:
V 1 1 PRES IND ACTIVE 1 S X 1 o
KIND is not often used with the verb endings, but is part of the record for convenience elsewhere. For verbs, the KIND has not yet been exploited significantly, except for DEP and IMPERS.
The rest of the elements are straightforward and generally use the abbreviations that are common in any Latin text. An X or 0 represents the 'don't know' or 'don't care' for enumeration or numeric types. Details are documented below in the CODES section.
A verb dictionary record has the structure:
Thus, an entry corresponding to 'amo amare amavi amatus' is:
am am amav amat V 1 1 X X X X X X like, love
(The dangling X X X X X are used to encode information about the time in which this word is found and the subject area. There are not yet enough details in the dictionary to allow much exploitation of this information.)
Endings may not uniquely determine the stem, and therefore the right meaning. 'portas' could be the accusative plural of 'gate', or the second person, singular, present indicative active of 'carry'. In both cases the stem is 'port'. All possibilities are reported.
portas
port.as V 1 1 PRES IND ACTIVE 2 S X
carry, bring
port.as N 1 1 ACC P F T
gate, entrance; city gates; door; avenue;
And note that the same stem (port) has other uses, for 'portus', 'harbor'.
portum
port.um N 4 1 ACC S M T
port, harbor; refuge, haven, place of refuge
PLEASE NOTE: It is certainly possible for the program to find a valid Latin construction that fits the input word and to have that interpretation be entirely wrong in the context. It is even possible to interpret a number, in Roman numerals, as a word! (But the number would be reported also.)
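The brute-force matching of endings against dictionary stems described in this section can be sketched in Python. The two small tables below are toy stand-ins invented for illustration; they do not reproduce the record layout of INFLECT.SEC or of the dictionary files, and the function name `analyze` is hypothetical. The point is only that every ending that fits is stripped, and every dictionary stem that agrees in part of speech and declension or conjugation is reported.

```python
# Toy ending table: (ending, part of speech, declension/conjugation, form)
ENDINGS = [
    ("as", "V", 1, "PRES ACTIVE IND 2 S"),
    ("as", "N", 1, "ACC P"),
    ("um", "N", 4, "ACC S"),
    ("o", "V", 1, "PRES ACTIVE IND 1 S"),
]

# Toy stem dictionary: (stem, part of speech, declension/conjugation, meaning)
STEMS = [
    ("port", "V", 1, "carry, bring"),
    ("port", "N", 1, "gate, entrance"),
    ("port", "N", 4, "port, harbor"),
    ("am", "V", 1, "love, like"),
]


def analyze(word):
    """Return every (stem.ending, part, form, meaning) consistent with the tables."""
    results = []
    for ending, e_part, e_class, form in ENDINGS:
        if len(word) <= len(ending) or not word.endswith(ending):
            continue
        stem = word[: len(word) - len(ending)]
        for d_stem, d_part, d_class, meaning in STEMS:
            # The stem must match, and agree with the ending in part and class.
            if (d_stem, d_part, d_class) == (stem, e_part, e_class):
                results.append((stem + "." + ending, e_part, form, meaning))
    return results


for hit in analyze("portas"):
    print(hit)
```

With these tables 'portas' yields both the verb and the noun reading, while 'portum' matches only the fourth declension stem.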
For the case of defective verbs, the process does not necessarily have to be precise. Since the purpose is only to translate from Latin, even if there are unused forms included in the algorithm, these will not come up in any real Latin text. The endings for the verb conjugations are the result of trying to minimize the number of individual endings records, while keeping the structure of the base INFLECTIONS data file fairly readable.
In general the program will try to construct a match with the inflections and the dictionaries. There are a number of specific checks to reject certain mathematically correct combinations that do not appear in the language, but these checks are relatively few. The philosophy has been to allow a generous interpretation. A remark in a text or dictionary that a particular form does not exist must be tempered with the realization that the author probably means that it has not been observed in the surviving classical literature. This body of reference is minuscule compared to the total use of Latin, even limited to the classical period. Who is to say that further texts would not turn up such an example, even if it might not have been approved of by Cicero? It is also possible that such reasonable, if 'improper', constructs might occur in later writings by less educated, or just different, authors. Certainly English shows this sort of variation over time.
If the exact stem is not found in the dictionary, there are rules for the construction of words which any student would try. The simplest situation is a known stem to which a prefix or suffix has been attached. The method used by the program (if DO_FIXES is on) is to try any fixes that fit, to see if their removal results in an identifiable remainder. Then the meaning is mechanically constructed from the meaning of the fix and the stem. The user may need to interpret with a more conventional English usage. This technique improves the performance significantly. However, in about 40% of the instances in which there is a hit, the derivation is correct but the interpretation takes some imagination. In something less than 10% of the cases, the inferred fix is just wrong, so the user must take some care to see if the interpretation makes any sense.
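The prefix half of this procedure can be sketched in Python. The prefix table, the stem table, and the name `try_prefixes` are all assumptions made for illustration; the real program works from its own ADDONS data and also handles suffixes. The sketch shows only the mechanical step: remove a prefix that fits, look up the remainder, and glue the two meanings together.

```python
# Toy tables, invented for this sketch.
PREFIXES = {"re": "back, again", "trans": "across", "sub": "under"}
KNOWN_STEMS = {"port": "carry, bring", "mitt": "send"}


def try_prefixes(stem):
    """Return (prefix, base stem, constructed meaning) for each prefix that fits."""
    hits = []
    for prefix, prefix_meaning in PREFIXES.items():
        if not stem.startswith(prefix):
            continue
        base = stem[len(prefix):]
        if base in KNOWN_STEMS:
            # The meaning is built mechanically; the reader must still
            # turn it into conventional English.
            hits.append((prefix, base, prefix_meaning + " + " + KNOWN_STEMS[base]))
    return hits


print(try_prefixes("transport"))
```

A stem already in the table, such as 'port', returns no prefix hits, which matches the idea that fixes are tried only to explain an otherwise unknown stem.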
This method is complicated by the tendency for prefixes to be modified upon attachment (ab+fero => aufero, sub+fero => suffero). The program's 'tricks' take many such instances into account. Ideally, one should look inside the stem for identifiable fragments. One would like to start with the smallest possible stem, and that is most frequently the correct one. While it is mathematically possible that the stem of 'actorum' is 'actor' with the common inflection 'um', no intuitive first semester Latin student would fail to opt for the genitive plural 'orum', and probably be right. To first order, the procedure ignores such hints and reports this word in both forms, as well as a verb participle. However, it can use certain generally applicable rules, like the superlative characteristic 'issim', to further guess.
In addition, there is the capability to examine the word for such common techniques as syncope, the omission of the 've' or 'vi' in certain verb perfect forms (audivissem => audissem).
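A minimal Python sketch of the syncope check, assuming the simplest possible approach: reinsert 'vi' or 've' at each position and keep the candidates that are recognized. The set KNOWN_FORMS and the function expand_syncope are hypothetical stand-ins; in the program each candidate would go through the full inflection and dictionary process.

```python
# Hypothetical stand-in for a full inflection and dictionary lookup.
KNOWN_FORMS = {"audivissem", "amaverunt"}


def expand_syncope(word):
    """Try restoring a dropped 'vi' or 've' at every position; return recognized candidates."""
    candidates = []
    for i in range(1, len(word)):
        for dropped in ("vi", "ve"):
            candidate = word[:i] + dropped + word[i:]
            if candidate in KNOWN_FORMS and candidate not in candidates:
                candidates.append(candidate)
    return candidates


print(expand_syncope("audissem"))
print(expand_syncope("amarunt"))
```

Every candidate has to be checked in full, which is why this kind of search is comparatively expensive on a large input file.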
If the dictionary cannot identify a matching stem, it may
be possible to derive a stem from 'nearby' stems (an adverb
from an adjective is the most common example) and infer a
meaning. If all else fails, a portion of the possible dictionary
stems can be listed, from which the user can
draw in making a guess.
Trimming of uncommon results
Trimming now means something. If the TRIM_OUTPUT parameter is set, and specific parameters are set in the MDEV, the program will disparage those possible forms which come from archaic or medieval (non-classical) stems or inflections, as well as stems or inflections which are relatively uncommon. It will report such forms only if no classical/common solutions are found. The default is set for this, expecting that most users are students and unlikely to encounter rare forms. Other users can set the parameters appropriately for their situation.
This capability is preliminary. It
is just becoming useful in that the factors are set
for about half the dictionary entries. There are still a large number
of entries and inflections that are not set and will continue to
be reported until determination of rarity is made.
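The trimming rule described in this section can be sketched as follows. This is an illustrative Python fragment, not the program's code: the function trim, the keys "nonclassical" and "uncommon", and the form strings are all hypothetical.

```python
def trim(candidates, trim_output=True):
    """Prefer classical, common parses; fall back to the rest only if nothing else is left."""
    if not trim_output:
        return list(candidates)
    preferred = [c for c in candidates
                 if not c.get("nonclassical", False) and not c.get("uncommon", False)]
    return preferred if preferred else list(candidates)


parses = [
    {"form": "N 2 1 GEN S", "uncommon": False},
    {"form": "V 3 1 PRES", "nonclassical": True},
    {"form": "ADJ 1 1 VOC S"},  # rarity not yet determined, so it is still reported
]
print([p["form"] for p in trim(parses)])
```

An entry whose flags have not been set is treated as common in this sketch, which matches the remark that unset entries continue to be reported.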
Special Cases
Some adjectives have no conventional positive forms (either missing
or undeclined), or the POS forms have more than one COMP/SUPER.
In these few cases, the individual COMP or SUPER form is entered
separately. Since it is not directly connected with a POS form,
and only the POS forms have different numbered declensions, the
special form is given a declension of (0, 0). An additional consequence
is that the dictionary form in output is only for the COMP/SUPER,
and does not reflect all comparisons.
Uniques
There are some irregular situations which are not convenient to handle
through the general algorithms. For these a UNIQUES file and procedure
were established. The number of these special cases is less than one
hundred, but may increase as new situations arise, and decrease as algorithms
provide better coverage. The user will not see much difference, except in
that no dictionary forms are available for these unique words.
Tricks
There are a number of situations in Latin writing where certain modifications or conventions regularly are found. While often found, these are not the normal classical forms. If a conventional match is not found, the program may be instructed to TRY_TRICKS. Below is a partial list of current tricks.
Various manipulations of 'u' and 'v' are possible: 'v' could be replaced by 'u', like the new Oxford Latin Dictionary, leading 'U' could be replaced by 'V', checking capitalization, all 'U's could have been replaced by 'V', like stone cutting. Previous versions had various kludges attempting to calculate the correct interpretation. They were surprisingly good, but philosophically baseless and certainly failed in a number of cases. The present version simply considers 'u' and 'v' as the same letter in parsing the word. However, the dictionary entries make the distinction and this is reflected in the output.
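One way to picture "same letter in parsing, distinct in output" is to index the dictionary under a folded key while keeping the original spelling as the value. The Python below is a sketch under that assumption; fold_uv, ENTRIES, INDEX and lookup are hypothetical names and the five entries are illustrative only.

```python
from collections import defaultdict


def fold_uv(word):
    """Fold case and treat 'v' and 'u' as one letter for matching purposes."""
    return word.lower().replace("v", "u")


# Hypothetical entries; the dictionary spelling keeps the u/v distinction.
ENTRIES = ("uva", "vulgus", "servus", "volvi", "volui")
INDEX = defaultdict(list)
for entry in ENTRIES:
    INDEX[fold_uv(entry)].append(entry)


def lookup(word):
    """Return every dictionary spelling that matches once u and v are folded together."""
    return INDEX.get(fold_uv(word), [])


print(lookup("SERVVS"))
print(lookup("uolui"))
```

The second lookup returns both 'volvi' and 'volui', which shows why the output has to carry the dictionary's own spelling: folding alone cannot say which one the writer meant.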
Various combinations of these tricks are attempted, and each try that results in a possible hit is run against the full dictionary, which can make these efforts time consuming. That is a good reason to make the dictionary as large as possible, rather than counting on a smaller number of roots and doing the maximum word formation.
Finally, while the program can succeed on a word that requires two or three of these tricks to work in combination, there are limits. Some words for which all the modifications are supported will fail, if there are just too many. In fact, it is probably better that that be the case, otherwise one will generate too many false positives. Testing so far does not seem to show excessive zeal on the part of the program, but the user should examine the results, especially when several tricks are involved.
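The combination of tricks, and the limit on how many are stacked, can be sketched in Python. The replacement pairs are the ones quoted in the DO_TRICKS help text in this document; the direction in which they are applied, the limit of two, the set KNOWN and the function try_tricks are assumptions made for illustration.

```python
from itertools import combinations

# Pairs quoted in the DO_TRICKS help text; applying them left to right is an assumption.
TRICKS = [("cl", "cul"), ("vul", "vol"), ("ads", "ass"), ("inp", "imp")]
KNOWN = {"impono", "assiduus"}  # hypothetical stand-in for the full dictionary search


def try_tricks(word, max_tricks=2):
    """Apply up to max_tricks replacements in combination; return the recognized results."""
    hits = []
    for n in range(1, max_tricks + 1):
        for combo in combinations(TRICKS, n):
            candidate = word
            for old, new in combo:
                candidate = candidate.replace(old, new, 1)
            if candidate != word and candidate in KNOWN and candidate not in hits:
                hits.append(candidate)
    return hits


print(try_tricks("inpono"))
```

Raising max_tricks widens the search but multiplies both the number of dictionary lookups and the chance of a false positive, which is the trade-off discussed above.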
There is a basic conflict here. At the state of the 1.97 dictionary
there are so few words that both fail the main program and are caught by
tricks that this option has been defaulted to No. However, one could
argue that there will be very few occasions for trying TRICKS, so that
the cost is minimal. Unfortunately the degree of completeness of the
dictionary for classical Latin does not carry over to medieval Latin.
With the hope that the program will become more useful in that area,
the default has been changed back to Yes, reflecting the philosophy
early in the development for classical Latin.
Codes in Inflection Line
For completeness, the enumeration codes used in the output are listed here as Ada statements. Simple numbers are used for person, declension, conjugations, and their variants. Not all the facilities implied by these values are developed or used in the program or the dictionary. This list is only for Version 1.97. Other versions may be somewhat different. This may make their dictionaries incompatible with the present program.
  type PART_OF_SPEECH_TYPE is (
      X,         -- all, none, or unknown
      N,         -- Noun
      PRON,      -- PRONoun
      PACK,      -- PACKON -- artificial for code
      ADJ,       -- ADJective
      NUM,       -- NUMeral
      ADV,       -- ADVerb
      V,         -- Verb
      VPAR,      -- Verb PARticiple
      SUPINE,    -- SUPINE
      PREP,      -- PREPosition
      CONJ,      -- CONJunction
      INTERJ,    -- INTERJection
      TACKON,    -- TACKON -- artificial for code
      PREFIX,    -- PREFIX -- here artificial for code
      SUFFIX     -- SUFFIX -- here artificial for code
    );

  type GENDER_TYPE is (
      X,         -- all, none, or unknown
      M,         -- Masculine
      F,         -- Feminine
      N,         -- Neuter
      C          -- Common (masculine and/or feminine)
    );

  type CASE_TYPE is (
      X,         -- all, none, or unknown
      NOM,       -- NOMinative
      VOC,       -- VOCative
      GEN,       -- GENitive
      LOC,       -- LOCative
      DAT,       -- DATive
      ABL,       -- ABLative
      ACC        -- ACCusative
    );

  type NUMBER_TYPE is (
      X,         -- all, none, or unknown
      S,         -- Singular
      P          -- Plural
    );

  type COMPARISON_TYPE is (
      X,         -- all, none, or unknown
      POS,       -- POSitive
      COMP,      -- COMParative
      SUPER      -- SUPERlative
    );

  type TENSE_TYPE is (
      X,         -- all, none, or unknown
      PRES,      -- PRESent
      IMPF,      -- IMPerFect
      FUT,       -- FUTure
      PERF,      -- PERFect
      PLUP,      -- PLUPerfect
      FUTP       -- FUTure Perfect
    );

  type VOICE_TYPE is (
      X,         -- all, none, or unknown
      ACTIVE,    -- ACTIVE
      PASSIVE    -- PASSIVE
    );

  type MOOD_TYPE is (
      X,         -- all, none, or unknown
      IND,       -- INDicative
      SUB,       -- SUBjunctive
      IMP,       -- IMPerative
      INF,       -- INFinitive
      PPL        -- ParticiPLe
    );

  type NOUN_KIND_TYPE is (
      X,         -- unknown, nondescript
      S,         -- Singular 'only'
      M,         -- plural or Multiple 'only'
      A,         -- Abstract idea
      N,         -- proper Name
      L,         -- Locale, name of country/city
      P,         -- a Person
      T,         -- a Thing
      W          -- a place Where
    );

  type PRONOUN_KIND_TYPE is (
      X,         -- unknown, nondescript
      PERS,      -- PERSonal
      REL,       -- RELative
      REFLEX,    -- REFLEXive
      DEMONS,    -- DEMONStrative
      INTERR,    -- INTERRogative
      INDEF,     -- INDEFinite
      ADJECT     -- ADJECTival
    );

  type VERB_KIND_TYPE is (
      X,         -- all, none, or unknown
      TO_BE,     -- only the verb TO BE (esse)
      TO_BEING,  -- compounds of the verb to be (esse)
      GEN,       -- verb taking the GENitive
      DAT,       -- verb taking the DATive
      ABL,       -- verb taking the ABLative
      TRANS,     -- TRANSitive verb
      INTRANS,   -- INTRANSitive verb
      IMPERS,    -- IMPERSonal verb (implied subject 'it', 'they', 'God')
                 -- agent implied in action, subject in predicate
      DEP,       -- DEPonent verb
                 -- only passive form but with active meaning
      SEMIDEP,   -- SEMIDEPonent verb (forms perfect as deponent)
                 -- (perfect passive has active force)
      PERFDEF    -- PERFect DEFinite verb
                 -- having only perfect stem, but with present force
    );

  type NUMERAL_KIND_TYPE is (
      X,         -- all, none, or unknown
      CARD,      -- CARDinal
      ORD,       -- ORDinal
      DIST,      -- DISTributive
      ADVERB     -- numeral ADVERB
    );
The KIND_TYPEs represent various aspects of a word which may be useful to some program, not necessarily the present one. They were put in for various reasons, and later versions may change the selection and use. Some of the KIND flags are never used. In some cases more than one KIND flag might be appropriate, but only one is selected. Some seemed to be a good idea at one time, but have not since proved out. The lists above are just for completeness.
NOUN KIND is used in trimming (when set) the output and removing possibly spurious cases (locative for a person, but preserving the vocative).
VERB KIND allows examples (when set) to give a more reasonable meaning. A DEP flag allows the example to reflect active meaning for passive form. It also allows the dictionary form to be constructed properly from stems. TRANS/INTRANS were included to give a further program a hint as to what kind of object it should expect. This flag is only now being fixed during the update. There are some verbs which, although mostly used in one way, might be either. These are assigned X rather than being broken into two entries. Splitting them would be of no particular use at this point since it would not allow the object to be determined. GEN/DAT/ABL flags have a related function, but are almost absent. TO_BE is used to indicate that a form of esse may be part of a compound verb tense with a participle. TO_BEING indicates a verb related to esse (e.g., abesse) which has no object, nor is it used to form compounds. IMPERS is used to weed out persons and forms inappropriate to an impersonal verb, and to insert a special meaning distinct from a general form associated with the same verb stem.
NUMERAL KIND really is used by the program in constructing the meaning line.
Help for Parameters
One can CHANGE_PARAMETERS by inputting a '#' [number sign] character (ANSI 35) as the input word, followed by a return. (Note that this has changed from previous versions in which '?' was used.) Each parameter is listed and the user is offered the opportunity to change it from the current value by answering Y or N (any case). For each parameter there is some explanation or help. This is displayed by inputting a '?' [question mark], followed by a return. HINT: While going down the list, if one has made all the changes desired, one need not continue to the end. Just enter a space and then give a return. The program will interpret this as an illegal entry (not Y or N) and will cancel the rest of the list, while retaining any changes made to that point.
The various help displays are listed here:
HAVE_OUTPUT_FILE_HELP : This option instructs the program to create a file which can hold the output for later study; otherwise the results are just displayed on the screen. The output file is named WORD.OUT. This means that one run will necessarily overwrite a previous run, unless the previous results are renamed or copied to a file of another name. This is available if the METHOD is INTERACTIVE, no parameters. The default is N(o), since this prevents the program from overwriting previous work unintentionally. Y(es) creates the output file.
WRITE_OUTPUT_TO_FILE_HELP : This option instructs the program, when HAVE_OUTPUT_FILE is on, to write results to the file WORD.OUT. This option may be turned on and off during running of the program, thereby capturing only certain desired results. If the option HAVE_OUTPUT_FILE is off, the user will not be given a chance to turn this one on. Only for INTERACTIVE running. Default is N(o).
DO_UNKNOWNS_ONLY_HELP : This option instructs the program to only output those words that it cannot resolve. Of course, it has to do processing on all words, but those that are found (with prefix/suffix, if that option is on) will be ignored. The purpose of this option is to allow a quick look to determine if the dictionary and process is going to do an acceptable job on the current text. It also allows the user to assemble a list of unknown words to look up manually, and perhaps augment the system dictionary. For those purposes, the system is usually run with the MINIMIZE_OUTPUT option, just producing a list. Another use is to run without MINIMIZE to an output file. This gives a list of the input text with the unknown words, by line. This functions as a spelling checker for Latin. The default is N(o).
WRITE_UNKNOWNS_TO_FILE_HELP : This option instructs the program to write all unresolved words to an UNKNOWNS file named WORD.UNK. With this option on, the file of unknowns is written, even though the main output contains both known and unknown (unresolved) words. One may wish to save the unknowns for later analysis, testing, or to form the basis for dictionary additions. When this option is turned on, the UNKNOWNS file is written, destroying any file from a previous run. However, the write may be turned on and off during a single run without destroying the information written in that run. This option is for specialized use, so its default is N(o).
IGNORE_UNKNOWN_NAMES_HELP : This option instructs the program to assume that any capitalized word longer than three letters is a proper name. As no dictionary can be expected to account for many proper names, many such occur that would be called UNKNOWN. This contaminates the output in most cases, and it is often convenient to ignore these spurious UNKNOWN hits. This option implements that mode, and calls such words proper names. Any proper names that are in the dictionary are handled in the normal manner. The default is Y(es).
IGNORE_UNKNOWN_CAPS_HELP : This option instructs the program to assume that any all caps word is a proper name or similar designation. This convention is often used to designate speakers in a discussion or play. No dictionary can claim to be exhaustive on proper names, so many such occur that would be called UNKNOWN. This contaminates the output in most cases, and it is often convenient to ignore these spurious UNKNOWN hits. This option implements that mode, and calls such words names. Any similar designations that are in the dictionary are handled in the normal manner, as are normal words in all caps. The default is Y(es).
DO_COMPOUNDS_HELP : This option instructs the program to look ahead for the verb TO_BE (or iri) when it finds a verb participle, with the expectation of finding a compound perfect tense or periphrastic. This option can also act as a trimming of the output, in that VPAR forms that do not fit (not NOM) will be excluded, so possible interpretations may be lost. Default choice is Y(es). This processing is turned off with the choice of N(o).
DO_FIXES_HELP : This option instructs the program, when it is unable to find a proper match in the dictionary, to attach various prefixes and suffixes and try again. This effort is successful in about a quarter of the cases which would otherwise give UNKNOWN results, or so it seems in limited tests. For those cases in which a result is produced, about half give easily interpreted output; many of the rest are etymologically true, but not necessarily obvious; about a tenth give entirely spurious derivations. The user must proceed with caution. The default choice is Y(es), since the results are generally useful. This processing can be turned off with the choice of N(o).
DO_TRICKS_HELP : This option instructs the program, when it is unable to find a proper match in the dictionary, and after various prefixes and suffixes, to try every dirty Latin trick it can think of, mainly common letter replacements like cl -> cul, vul -> vol, ads -> ass, inp -> imp, etc. Together these tricks are useful, but may give false positives (>10%). They provide for recognized variants in classical spelling. Most of the texts with which this program will be used have been well edited and standardized in spelling. Now, moreover, the dictionary is being populated to such a state that the hit rate on tricks has fallen to a low level. It is very seldom productive, and it is always expensive. The only excuse for keeping it as default is that now the dictionary is quite extensive and misses are rare. Default is now Y(es).
DO_DICTIONARY_FORMS_HELP : This option instructs the program to output a line with the forms normally associated with a dictionary entry (NOM and GEN of a noun, the four principal parts of a verb, M-F-N NOM of an adjective, ...). This occurs when there is other output (i.e., not with UNKNOWNS_ONLY). The default choice is N(o), but it can be turned on with a Y(es).
DO_EXAMPLES_HELP : This option instructs the program to provide examples of usage of the cases/tenses/etc. that were constructed. The default choice is N(o). This produces lengthy output and is turned on with the choice Y(es).
SHOW_AGE_HELP : This option causes a flag, like '' to be put before the meaning in the output. The AGE is an indication of when this word/meaning came into use, at least from indications in dictionary citations. It is just an indication, not controlling, useful when there are choices. The default choice is N(o), but it can be turned on with a Y(es).
SHOW_FREQUENCY_HELP : This option causes a flag, like ' ' to be put before the meaning in the output. The FREQ is an indication of the relative usage of the word, at least from indications in dictionary citations. It is just an indication, not controlling, useful when there are choices. The default choice is N(o), but it can be turned on with a Y(es).
DO_ONLY_MEANINGS_HELP : This option instructs the program to only output the MEANING for a word, and omit the inflection details. This is primarily used in analyzing new dictionary material, comparing with the existing. However it may be of use for the translator who knows almost all of the words and just needs a little reminder for a few. The default choice is N(o), but it can be turned on with a Y(es).
DO_STEMS_FOR_UNKNOWN_HELP : This option instructs the program, when it is unable to find a proper match in the dictionary, and after various prefixes and suffixes, to try even dirtier tricks, specifically to try all the dictionary stems that it finds that fit the letters, independent of whether the endings match the parts of speech to which the stems are assigned. This will catch a substantive for which only the ADJ stem appears in the dictionary, an ADJ for which there is only a N stem, etc. It will also list the various endings that match the end of the input word. A certain amount of weeding has been done, so only reasonably common endings are quoted, and these are lumped together masking declension, etc. Only N, ADJ, and V endings are given, LOC and VOC omitted, etc. The user can then make his own judgement. This option should probably only be used with individual UNKNOWN words, and off-line from full translations, therefore the default choice is N(o). This processing can be turned on with the choice of Y(es).
TRIM_OUTPUT_HELP : This option instructs the program to remove from the output list of possible constructs those which are least likely. At the present stage, there is not much trimming except for removing Uncommon and non-classical (Archaic/Medieval) results when more common results are found and this action is requested (MDEV). However, if the program grows more powerful this may be a useful option. Nevertheless, there is no absolute assurance that the items removed are not correct, just that they are statistically less likely (e.g., vocatives or locatives in certain situations). Since little is now done, the default is Y(es).
SAVE_PARAMETERS_HELP : This option instructs the program to save the current parameters, as just established by the user, in a file WORD.MOD. If such a file exists, the program will load those parameters at the start. If no such file can be found in the current subdirectory, the program will start with a default set of parameters. Since this parameter file is human-readable ASCII, it may also be created with a text editor. If the file found has been improperly created, is in the wrong format, or is otherwise uninterpretable by the program, it will be ignored and the default parameters used, until a proper parameter file is written by the program. Since one may want to make temporary changes during a run, but revert to the usual set, the default is N(o).
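The two name-ignoring rules in the list above (IGNORE_UNKNOWN_NAMES and IGNORE_UNKNOWN_CAPS) are simple enough to sketch in Python. The function name and its keyword arguments are hypothetical, and the sketch applies only to words that have already failed the dictionary search.

```python
def is_ignorable_unknown(word, ignore_names=True, ignore_caps=True):
    """Decide whether an unresolved word should be passed over as a name."""
    # IGNORE_UNKNOWN_CAPS: any all caps word is taken as a name or similar designation.
    if ignore_caps and word.isupper():
        return True
    # IGNORE_UNKNOWN_NAMES: any capitalized word longer than three letters.
    if ignore_names and len(word) > 3 and word[0].isupper():
        return True
    return False


for unknown in ("Vercingetorix", "Rex", "REX", "puella"):
    print(unknown, is_ignorable_unknown(unknown))
```

Words that are found in the dictionary never reach this test, so a proper name with a dictionary entry is still reported in the normal manner.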
There is also a set of DEVELOPER_PARAMETERS that are unlikely to be of interest to the normal user. Some of these parameters may be disconnected or not work for other reasons. They are mostly for use in the development process. These may be changed or examined by a similar change procedure, by inputting a '!' [exclamation sign] character, followed by a return.
HAVE_DEBUG_FILE_HELP : This option instructs the program to create a file which can hold certain internal information about the current search. The file is overwritten for every word in order to prevent it from growing out of hand, so information about the last word searched is saved in case of failure. The name of the debug output file is given by DEBUG_FULL_NAME in the program. Use of this option, along with the WRITE_DEBUG_FILE option, may slow the program significantly. This information is usually only useful to the developer, so the default is N(o).
WRITE_DEBUG_FILE_HELP : This option instructs the program, when HAVE_DEBUG_FILE is on, to put some debug data to a file whose name is given by DEBUG_FULL_NAME in the program. This option may be turned on and off while running the program, thereby capturing only certain desired results. The file is reset and restarted after each word parsed, so that it does not get too big. If the option HAVE_DEBUG_FILE is off, the user will not be given a chance to turn this one on. Default is N(o).
HAVE_STATISTICS_FILE_HELP : This option instructs the program to create a file which can hold certain statistical information about the process. The file is overwritten for each new invocation of the program, so old data must be explicitly saved if it is to be retained. The statistics are in TEXT format. The name of the statistics file is given by STATS_FULL_NAME in the program. This information is only of development use, so the default is N(o).
WRITE_STATISTICS_FILE_HELP : This option instructs the program, with HAVE_STATISTICS_FILE, to put derived statistics in a file whose name is given by STATS_FULL_NAME in the program. This option may be turned on and off while running the program, thereby capturing only certain desired results. The file is reset at each invocation of the program, if HAVE_STATISTICS_FILE is set. If the option HAVE_STATISTICS_FILE is off, the user will not be given a chance to turn this one on. Default is N(o).
SHOW_DICTIONARY_HELP : This option causes a flag, like 'GEN>' to be put before the meaning in the output. While this is useful for certain development purposes, it forces off a few characters from the meaning, and is really of no interest to most users. The default choice is N(o), but it can be turned on with a Y(es).
SHOW_DICTIONARY_LINE_HELP : This option causes the number of the dictionary line for the current meaning to be output. This is of use to no one but the dictionary maintainer. The default choice is N(o). It is activated by Y(es).
SHOW_DICTIONARY_CODES_HELP : This option causes the codes for the dictionary entry for the current meaning to be output. This may not be useful to any but the most involved user. The default choice is N(o). It is activated by Y(es).
DO_PEARSE_CODES_HELP : This option causes special codes to be output flagging the different kinds of output lines: 01 for forms, 02 for dictionary forms, and 03 for meaning. The default choice is N(o). It is activated by Y(es).
DO_ONLY_INITIAL_WORD_HELP : This option instructs the program to only analyze the initial word on each line submitted. This is a tool for checking and integrating new dictionary input, and will be of no interest to the general user. The default choice is N(o), but it can be turned on with a Y(es).
FOR_WORD_LIST_CHECK_HELP : This option works in conjunction with DO_ONLY_INITIAL_WORD to allow the processing of scanned dictionaries or text word lists. It accepts only the forms common in dictionary entries, like NOM S for N or ADJ, or PRES ACTIVE IND 1 S for V. It is to be used only with DO_ONLY_INITIAL_WORD. The default choice is N(o), but it can be turned on with a Y(es).
UPDATE_LOCAL_DICTIONARY_HELP : This option instructs the program to invite the user to input a new word to the local dictionary on the fly. This is only active if the program is not using an (@) input file! If an UNKNOWN is discovered, the program asks for STEM, PART, and MEAN, the basic elements of a dictionary entry. These are put into the local dictionary right then, and are available for the rest of the session, and all later sessions. The use of this option requires a detailed knowledge of the structure of dictionary entries, and is not for the average user. If the entry is not valid, reloading the dictionary will raise an exception, and the invalid entry will be rejected, but the program will continue without that word. Any invalid entries can be corrected or deleted off-line with a text editor on the local dictionary file. If one does not want to enter a word when this option is on, a simple RETURN at the STEM=> prompt will ignore and continue the program. This option is only for very experienced users and should normally be off. The default is N(o). ------ NOT AVAILABLE IN THIS VERSION -------
UPDATE_MEANINGS_HELP : This option instructs the program to invite the user to modify the meaning displayed on a word translation. This is only active if the program is not using an (@) input file! These changes are put into the dictionary right then and permanently, and are available from then on, in this session, and all later sessions. Unfortunately, these changes will not survive the replacement of the dictionary by a new version from the developer. Changes can only be recovered by considerable processing by the developer, and should be left there. This option is only for experienced users and should remain off. The default is N(o). ------ NOT AVAILABLE IN THIS VERSION -------
DO_ONLY_FIXES_HELP : This option instructs the program to ignore the normal dictionary search and to go direct to attach various prefixes and suffixes before processing. This is a pure research tool. It allows one to examine the coverage of pure stems and dictionary primary compositions. This option is only available if DO_FIXES is turned on. This is entirely a development and research tool, not to be used in conventional translation situations, so the default choice is N(o). This processing can be turned on with the choice of Y(es).
DO_FIXES_ANYWAY_HELP : This option instructs the program to do both the normal dictionary search and then process for the various prefixes and suffixes too. This is a pure research tool allowing one to consider the possibility of strange constructions, even in the presence of conventional results, e.g., alte => deeply (ADV), but al+t+e => wing+ed (ADJ VOC). (If multiple suffixes were supported this could also be wing+ed+ly.) This option is only available if DO_FIXES is turned on. This is entirely a development and research tool, not to be used in conventional translation situations, so the default choice is N(o). This processing can be turned on with the choice of Y(es). ------ PRESENTLY NOT IMPLEMENTED ------
DO_MEDIEVAL_TRICKS_HELP : This option instructs the program, when it is unable to find a proper match in the dictionary, and after various prefixes and suffixes, and trying every Classical Latin trick it can think of, to go to a few that are usually only found in medieval Latin, replacements of caul -> col, st -> est, z -> di, ix -> is, nct -> nt. It also tries some things like replacing doubled consonants in classical with a single one. Together these tricks are useful, but may give false positives (>20%). This option is only available if the general DO_TRICKS is chosen. If the text is late or medieval, this option is much more useful than tricks for classical. The dictionary can never contain all spelling variations found in medieval Latin, but some constructs are common. The default choice is N(o), since the results are iffy, medieval only, and expensive. This processing is turned on with the choice of Y(es).
DO_SYNCOPE_HELP : This option instructs the program to postulate that syncope of perfect stem verbs may have occurred (e.g., aver -> ar in the perfect), and to try various possibilities for the insertion of a removed 'v'. To do this it has to fully process the modified candidates, which can have a considerable impact on the speed of processing a large file. However, this trick seldom produces a false positive, and syncope is very common in Latin (first year texts excepted). Default is Y(es). This lengthy processing is turned off with the choice of N(o).
INCLUDE_UNKNOWN_CONTEXT_HELP : This option instructs the program, when writing to an UNKNOWNS file, to put out the whole context of the UNKNOWN (the whole input line on which the UNKNOWN was found). This is appropriate for processing large text files in which it is expected that there will be relatively few UNKNOWNS. The main use at the moment is to provide display of the input line on the output file in the case of UNKNOWNS_ONLY.
MINIMIZE_OUTPUT_HELP : This option instructs the program to minimize the output. This is a somewhat flexible term, but the use of this option will probably lead to less output. The default is Y(es).
OMIT_ARCHAIC_HELP : THIS OPTION CAN ONLY BE ACTIVE IF WORDS_MODE(TRIM_OUTPUT) IS SET! This option instructs the program to omit inflections and dictionary entries with an AGE code of A (Archaic). Archaic results are rarely of interest in general use. If there is no other possible form, then the Archaic (roughly defined) will be reported. The default is Y(es).
OMIT_MEDIEVAL_HELP : THIS OPTION CAN ONLY BE ACTIVE IF WORDS_MODE(TRIM_OUTPUT) IS SET! This option instructs the program to omit inflections and dictionary entries with AGE codes of E or later, those not in use in Roman times. While later forms and words are a significant application, most users will not want them. If there is no other possible form, then the Medieval (roughly defined) will be reported. The default is Y(es).
OMIT_UNCOMMON_HELP : THIS OPTION IS CAN ONLY BE ACTIVE IF WORDS_MODE(TRIM_OUTPUT) IS SET! This option instructs the program to omit inflections and dictionary entries with FREQ codes indicating that the selection is uncommon. While these forms area significant feature of the program, many users will not want them. If there is no other possible form, then the uncommon (roughly defined) will be reported. The default is Y(es). DO_I_FOR_J_HELP : This option instructs the program to modify the output so that the j/J is represented as i/I. The consonant i was writen as j in cursive in Imperial times and called i longa, and often rendered as j in medieval times. The capital is usually rendered as I, as in inscriptions. If this is NO/FALSE, the output will have the same character as input. The program default, and the dictionary convention is to retain the j. Reset if this ia unsuitable for your application. The default is N(o). DO_U_FOR_V_HELP : This option instructs the program to modify the output so that the u is represented as v. The consonant u was writen sometimes as uu. The pronounciation was as current w, and important for poetic meter. With the printing press came the practice of distinguishing consonant u with the character v, and was common for centuries. The practice of using only u has been adopted in some 20th century publications (OLD), but it is confusing to many modern readers. The capital is commonly V in any case, as it was and is in inscriptions (easier to chisel). If this is NO/FALSE, the output will have the same character as input. The program default, and the dictionary convention is to retain the v. Reset If this ia unsuitable for your application. The default is N(o). PAUSE_IN_SCREEN_OUTPUT_HELP : This option instructs the program to pause in output on the screen after about 16 lines so that the user can read the output, otherwise it would just scroll off the top. A RETURN/ENTER gives another page. 
If the program is waiting for a return, it cannot take other input. This option is active only for keyboard entry or command line input, and only when there is no output file. It is moot if there is only single word input or brief output. The default is Y(es).
SAVE_PARAMETERS_HELP : This option instructs the program to save the current parameters, as just established by the user, in a file WORD.PAR. If such a file exists, the program will load those parameters at the start. If no such file can be found in the current subdirectory, the program will start with a default set of parameters. Since this parameter file is human-readable ASCII, it may also be created with a text editor. If the file found has been improperly created, is in the wrong format, or is otherwise uninterpretable by the program, it will be ignored and the default parameters used, until a proper parameter file is written by the program. Since one may want to make temporary changes during a run, but revert to the usual set, the default is N(o).
The program is written in Ada, and is machine independent.
Ada source code is available for compiling onto other machines.
Purpose
The dictionary is intended as a help to someone who knows roughly enough Latin for the document under study. It gives the accidence and meanings possible for an input Latin word. It is for someone reading Latin text. There is no English-to-Latin mode.
This is a translation dictionary. Mostly it provides individual words in English that correspond to, and might be used in a translation of, words in Latin text. The program assumes a fair command of English. This is in contrast to a conventional same-language desktop dictionary which would explain the meanings of words in the same language. The distinction may be obvious but it is important. A Latin dictionary in medieval times would have explanations in Latin of Latin words.
There are various approaches to the preparation of a dictionary. The most scholarly might be to select only proper and correct entries, only correct derivations, grammar, and spelling. This would be a dictionary for one who wished to write 'correct' Latin. (Correct being defined as the way Cicero, or your favorite writer or grammarian, used it.) The current project has a different goal. This program is successful if a word found in text is given an appropriate meaning, whether or not that word is spelled in the generally approved way, or is 'good Latin'. Thus the program includes various words and forms that may have been rejected by recent scholars, but still appear in some texts. Philosophically, this program deals with Latin as it was, not as it should have been. I make no corrections to Cicero, which some might have been tempted to do if producing an academic dictionary instead of a program. Moreover I make no corrections of St Jerome. If your copy of the Vulgate has a particular spelling, that may be recognized by the program, either through a TRICK or as a dictionary entry that I have generated.
A philosophical difference from many dictionary projects is that this one has no firm model of the user or application. It is not limited to classical Latin, or to 'good practice', or to common words, or to words appearing in certain texts. As a result there will be a lot of chaff in the output. Some of this may be trimmed out automatically if desired, but it is there and available.
However inadequately, I hope to document decisions that went into the arrangement of the program and dictionary. I am surprised that there is little or no such information to the user of published dictionaries. If others generate similar products, or use the data from this one, they can do so in knowledge of how and why processes and forms were constructed.
I make few value judgments and those are mechanical, not scholarly,
and are documented herein. Nevertheless some may be arbitrary, in spite of
good intentions.
Method
The program subtracts possible endings from an input word and searches a list of stems, trying to make a match. If no exact match is possible, it tries various modifications, beginning with prefixes and suffixes, and eventually involving various regular spelling variations (or 'tricks') common in classical and medieval Latin.
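The ending-subtraction step can be illustrated with a minimal Python sketch. It is not the program's Ada code: the tiny ENDINGS and STEMS tables and the analyze function are hypothetical stand-ins for the real inflection and stem files, and the prefix, suffix, and trick stages are left out.

```python
# Hypothetical miniature tables; the real program reads its inflections and
# stems from data files, and these few entries are for illustration only.
ENDINGS = {
    "a": "1st decl. nom. sing.",
    "am": "1st decl. acc. sing.",
    "ae": "1st decl. gen./dat. sing. or nom. pl.",
    "arum": "1st decl. gen. pl.",
    "is": "1st decl. dat./abl. pl.",
}
STEMS = {"puell": "girl", "aqu": "water", "silv": "wood/forest"}


def analyze(word):
    """Return (stem, ending, inflection, meaning) for every split that matches."""
    word = word.lower()
    results = []
    # Try every split point, longest candidate stem first.
    for cut in range(len(word) - 1, 0, -1):
        stem, ending = word[:cut], word[cut:]
        if ending in ENDINGS and stem in STEMS:
            results.append((stem, ending, ENDINGS[ending], STEMS[stem]))
    return results


print(analyze("puellarum"))
print(analyze("xyz"))
```

An empty result corresponds to the point at which the program would go on to try prefixes, suffixes, and tricks.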
The base was chosen to be classical Latin as defined by the Oxford Latin Dictionary (OLD), whose primary time period is, somewhat arbitrarily, roughly 100 BC to 100 AD.
The classical form of words is taken as the base. Modifications are made in such a way as to correct to this base. Further additions to local dictionaries should keep this in mind. Modifications are made to the input words, not to the dictionary stems. It could be done the other way, but the present approach was initially much easier. There are some consequences of this approach. For instance, it is easy to remove an 'h' from an input word to match with a stem. It is much more difficult (but not impossible) to add 'h' in all possible positions to check against stems.
It would be possible to match most words with a relatively smaller list of stems (or roots) and generous application of word construction. This approach is not followed. One difficulty is that while words may be constructed correctly, and the underlying meaning to be found from this construction, the common usage may be obscured by a formal interpretation of the parts. In practice this occurs in 20-40% of the cases. This method is still very useful in approaching a word for which there has been no dictionary interpretation, but it puts a considerable burden on the normal user. Further, in about 10% of constructions, the result is just wrong.
In normal usage, if the program finds a simple match, it does not go further and consider what constructed words might also be valid. (One can override and force prefix/suffix construction with a switch, but one might not want to force all possible tricks.)
For instance, if there is an adjective that matches, a corresponding identically spelled, logically valid noun will not be reported unless it is explicitly found in the dictionary, even though it could be constructed or inferred from the adjective or constructed with a suffix from a verb in the dictionary.
An exception to this is that enclitics (e.g., -que) are always considered. Coloque can be a verb or collo-que. The latter is in Virgil and should not be omitted. Verb syncope is also favored. In the vast majority of cases, if there is a possible syncope it is the correct parse. This is given preference over word construction with suffix. Audii is syncope of audivi, but it could also be aud-i-i. The latter is considered very unlikely.
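A minimal Python sketch of always considering an enclitic reading alongside the plain word follows. The list of enclitics (-que, -ne, -ve) and the function name are assumptions made for illustration; the text above names only -que, and the program's own handling may differ.

```python
# Assumed list of enclitics for illustration; the text above mentions only -que.
ENCLITICS = ("que", "ne", "ve")


def enclitic_splits(word):
    """Return the word as given plus each possible (base, enclitic) reading."""
    readings = [(word, "")]
    for tail in ENCLITICS:
        # Keep the split only if something is left to serve as the base word.
        if word.endswith(tail) and len(word) > len(tail):
            readings.append((word[: -len(tail)], tail))
    return readings


print(enclitic_splits("virumque"))
print(enclitic_splits("arma"))
```

Each reading would then go through the normal recognition process, so a split that leaves no valid base word simply produces no result.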
There are a large number of paths and possibilities. Choices have been made in the code that result in the exclusion of some. It is hoped that they were the best choices. The method was constructed by taking a number of primary procedures and combining/assembling them in such a way as to give reasonable parses for a number of test cases. Basically, this is hacking, but it might be considered an empirical starting point from which one could construct a logical rationale.
Therefore, the philosophy is to populate the stem list as densely as possible. Even easily resolved differences are included redundantly (adligo as well as alligo; ad- accounts for most of the duplicates). The advantage is that while regular single-letter modifications are fairly easy, and two-letter differences are possible (but more expensive), further deviations are problematical. The better populated the stem list, the better the chance of a result.
Even in easy cases the overpopulation is helpful. Antebasis is easily parsed as ante-basis ('pedestal before', which is reasonable), but inclusion as a separate word allows the additional information that it is the hindmost pillar of the pedestal of a ballista.
The stem list is also populated with variants suggested by different sources. The problem is that the remains of classical Latin have gone through many monks along the way. These copyists may have made simple mistakes (typos!), or have made what they thought were proper corrections (spell checkers!). And twenty centuries later scholars work hard to reassemble the best Latin to present in the dictionary. But a particular document in the form presented to the reader may have a variety of spellings for exactly the same word in the same referenced passage (Pliny's Natural History is often subject to this problem). (It may even be that modern texts and dictionaries have misprints!) All forms found in various dictionaries can be included, with the exception of those explicitly labeled 'misread' (and the argument probably could mandate their inclusion also). However, a single example of a variant in one case will not be included as a dictionary entry. If such a word is sufficiently important, if it is used frequently or by several authors, it will be entered as a UNIQUE.
Lewis and Short seem to be more willing than the more recent Oxford Latin Dictionary to raise a few examples of variation to an entry (at least an alternate). Generally, I make an entry if some dictionary does so. But within an entry I generate additional possible stems not noted elsewhere, e.g., I expand first declension verbs with '-av' perfect stems, even though no example exists in classical Latin. This is often the practice in other dictionaries also.
It is often the practice in paper dictionaries to double up on an entry that may be either adjective or noun, usually by leading with the adjective and mentioning its use as a noun. A much larger set of adjective/noun pairs is favored with separate entries. It is the philosophy of this program to make separate entries whenever there is an example in any reference dictionary. This might facilitate the task of a larger translation program which would handle phrases or sentences. However there has been no effort to explicitly generate such pair expansion if there is no precedent, and the user must still recognize the possibility of unexpanded multiple possibilities for substantives.
An argument against a large stem list is that it increases the storage required (but this is extremely modest by current standards) and increases processing time for search of the stems (this is far offset by the processing which would be required to construct or analyze words working from a smaller stem list).
Additional parts of verbs are included (first conjugation is easily filled out, even eccentric verbs if they are compounds of known parts), although they may not have been found in any well known texts. Cases can be logically constructed that are 'missing' in classical Latin. Verbs with prefix can be expanded when the base is known. That a form has not been found in surviving copies of classical documents does not mean that it was not on the lips of every centurion and his girl friend, or that it might not find its way into medieval texts.
In some cases there are good reasons not to do the mathematical expansion, and these are pointedly avoided. There is no mechanical generation of, for instance, conl- words for every coll- word, unless there is some citation or reasonable rationale. They may be paired in almost every case, but, for instance, collis and collyra are not. However, forms that are mentioned in dictionaries explicitly, or implicitly by being derived from words having variant forms, are included in order to reduce the dependence on 'tricks'. OLD has a conp- for almost every comp- (except derivatives from como). Rare exceptions seem to be rare words for which few examples (or only one) exist. Even in some of these cases, OLD (mechanically?) gives two forms. L+S follows the same pattern, except for words of late Latin (which would not be found in OLD). It is presumed that the general practice in later times was always to use comp-, and the program dictionary follows that. There are many acc-/adc- pairs, but OLD has a fair number of acc- words without mention of a corresponding adc-, and so the possible generation of these words has been resisted. If an example turns up in text, the appropriate trick procedure should suffice.
One suspects that some amount of analytical expansion is present even in the best dictionaries. Otherwise how can one explain four alternate spellings for a word which apparently only appears in a single inscription?
Adjectives from participles are included if an entry is found in some reference dictionary. In some cases the adjective has a special meaning not obvious from the verb. The program will return both the adjective and the participle with its verb meaning. The user should give some additional consideration to the adjective meaning in this case. If the adjective is marked rare while the verb is common, it is likely there is reference to a special meaning.
Tricks are expensive in processing time. Each possible modification is made, then the resulting word goes through the full recognition process. If it passes, that is reported as the answer. If it fails, another trick is tried. This is effective if very few words get this far. It is expected that application of single tricks will solve most of the resolvable difficulties. It would be impractical to mechanically apply several tricks in series to a word. If the dictionary is heavily and redundantly populated, tricks are rarely necessary (and therefore not an overall processing burden) and largely successful (if the input word is a valid, but unusual, variant/construction).
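The one-trick-at-a-time procedure can be sketched in Python as follows. Everything here is an illustrative assumption: the three tricks shown (dropping an 'h', ci for ti, e for ae) are ordinary medieval spelling variations chosen as examples, not the program's actual trick list, and recognize is a stand-in for the full recognition process.

```python
KNOWN_WORDS = {"carus", "gratia", "caelum"}  # stand-in for the real dictionary


def recognize(word):
    """Stand-in for the full recognition process (endings, stems, and so on)."""
    return word in KNOWN_WORDS


def drop_h(word):
    """Every variant made by removing a single 'h' from the input word."""
    return [word[:i] + word[i + 1:] for i, ch in enumerate(word) if ch == "h"]


def replace_each(word, old, new):
    """Every variant with exactly one occurrence of old replaced by new."""
    variants = []
    start = word.find(old)
    while start != -1:
        variants.append(word[:start] + new + word[start + len(old):])
        start = word.find(old, start + 1)
    return variants


# Hypothetical trick list; each trick modifies the input word, never the stems.
TRICKS = [
    ("drop h", drop_h),
    ("ci -> ti", lambda w: replace_each(w, "ci", "ti")),
    ("e -> ae", lambda w: replace_each(w, "e", "ae")),
]


def parse_with_tricks(word):
    """Return (recognized form, trick used); tricks are never chained."""
    if recognize(word):
        return word, None
    for name, trick in TRICKS:
        for candidate in trick(word):
            # Each candidate pays for a full recognition pass.
            if recognize(candidate):
                return candidate, name
    return None, None


print(parse_with_tricks("charus"))
print(parse_with_tricks("gracia"))
```

Because every candidate costs a full recognition pass, the sketch makes the expense visible: a word with several e's generates several candidates for the last trick alone.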
Further, a conventional dictionary, especially one that wishes to set
a standard for proper language, excludes words that may not
meet criteria of propriety, slang, misspellings, etc. This may
place the onus on the reader to convert words. A computer
dictionary ought to relieve the reader as much as possible.
The present program may be a long way from complete, but its
goal is to strive for that.
Word Meanings
The meanings listed are generally those in the literature/dictionaries. In the case of common words, there is general agreement among authors. Some uncommon words display convoluted interpretations.
Generally, the meaning is given for the base word, as is usual for dictionaries. For the verb, it will be a present meaning, even when the tense input is perfect. For an adjective, the positive meaning is given, even if a comparative or superlative form is shown. This is also so when a word is constructed with a suffix, thus an adverb constructed from its adjective will show the base adjective meaning and an indication of how to make the adverb in English.
For the level of usage for this program, and for convenience in coding, the meaning field has been fixed at 80 characters. It is possible to have multiple 80 character lines for an entry, but this is only necessary for the most common words. In order to conserve space, extraneous helpers like 'a', 'the', 'to', which sometimes appear in dictionary definitions, are generally omitted. The solidus ('/') is used both to separate equivalent English meanings and to conserve space.
I have taken it upon myself to add some interpretations and synonyms, and propose common usage for otherwise complex descriptive definitions. The idea is to prompt the reader, expecting that the text may not be that from which some dictionary copied the meaning (from some 18th century translator!).
Where available, the Linnean or 'scientific Latin' name is given in parentheses, mostly for plants. This is not a classical Latin name, but a modern designation. Similarity of this designation to some Latin word may not be historically significant.
The spelling of the English meanings is US (plow not plough, color not colour, and English corn is rendered as grain or wheat), in spite of the fact that most of the Latin dictionaries that I have are British and use British spelling. The reason for this is (besides uniformity in the program) that there is much computer processing and checking of the dictionary data, including spell-checking of the English. (This is not to say that everything is correct, but it is much better than it would be without the computer checking.) All my programs speak US English, so I can count on it. Only some are available in UK English, and I do not have all of those versions.
Latin dictionaries seem to be locked into the early 19th century. The English terms seem stilted, even by current British usage. This is probably because much work in translation was started then and later work tended to copy from the previous dictionaries. While this dictionary has done some modernization, some of the previous obscurities have been preserved. This was done in order that certain machine processes could compare the results of automatic translation with existing published work.
In addition, I have given US meanings to some terms that seem to be literally translated from the Latin (or German!) (a person who steals/drives off cattle is a rustler in the US).
Most dictionaries take an etymological approach: they are
driven by the derivation of words, and distinguish with separate entries
words that may be identical in spelling but of different derivations.
But they can lump entirely different, even contradictory, meanings
in a single entry if there is some common derivation. Philosophically,
this dictionary is usually not sensitive to derivations, but sometimes
supports multiple entries for vastly different meanings, application
areas, or eras.
Proper Names
Only a very few proper names are included, many just for test purposes, others that users have requested. The number of proper names is almost limitless but very few are applicable to a particular document, and if it is an obscure document it is unlikely that the names would be found in any dictionary.
Meaning for proper names may cite a likely example of a person with that name. This is just an example; there are lots of others with that name.
There is a switch (defaulted to Yes) that allows the program to
assume that any capitalized unknown word is a proper name, and to
ignore it. Also, one can make up a local dictionary of names
for one's particular application.
Letter Conventions
Strictly speaking, Latin did not have a V, just a consonant U, or a U character that was easier in capitals (the way Latin was written by the Romans) to write or chisel in stone as V. However, most modern texts and dictionaries (with the important exception of the OLD) make the distinction with two characters (u and v). It appeared most appropriate in a computer context (never destroy information) to make the distinction and follow the common practice. So all dictionary entries maintain the V/v. However, an input word following the U convention will be found. In an earlier version, an algorithm was kludged to convert where necessary. While this worked in most cases, there were difficulties. The present system processes the dictionary and the input word as though U and V were the same letter, although the basic dictionary maintains the distinction and the output reflects this. There is no longer any need for the user to set modes for this process.
A similar situation arises with I, and its consonant form, J. In this instance, the common practice is to use only I, but there are many counter-examples, in both texts and dictionaries. (Lewis + Short uses J, but OLD does not.) Because of common practice, the program started out as a pure-I dictionary with conversion of J-to-I on input. It remained that way through many versions, in spite of the logical inconsistency with U-V. The technique worked perfectly, but eventually the aesthetic of consistency won out and the U/V technique described above was extended to I/J. As yet, almost all dictionary entries are pure-I, but the mechanism is in place to use J in both dictionary and input.
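One way to treat U/V and I/J as the same letter for matching, while keeping the dictionary's own spelling for output, is to compare folded keys. The Python sketch below illustrates that idea only; the names fold, INDEX and lookup and the three sample entries are hypothetical, and the program's Ada implementation may work quite differently.

```python
# Fold v to u and j to i for comparison only; stored spellings are untouched.
_FOLD = str.maketrans("vVjJ", "uUiI")


def fold(word):
    """Comparison key in which U/V and I/J are indistinguishable."""
    return word.translate(_FOLD).lower()


# Hypothetical dictionary entries, kept with their V/v spelling.
DICTIONARY = ["vinum", "iuvenis", "uva"]
INDEX = {fold(entry): entry for entry in DICTIONARY}


def lookup(word):
    """Return the dictionary spelling matching the input, or None."""
    return INDEX.get(fold(word))


print(lookup("uinum"))
print(lookup("IVVENIS"))
print(lookup("juvenis"))
```

The input convention no longer matters, and the output still shows the dictionary form with its v.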
There are examples of W in some medieval Latin. I have not yet directly faced
this, and have no words in the dictionary with W. However, the W problem
is not analogous to U/V. While W sometimes could correspond to V or UU,
in most cases it is a valid letter, reflecting a Germanic origin of
the word. It will be treated as a real letter, and tricks employed where
useful.
Dictionary Codes
Several codes are associated with each dictionary entry (presently AGE, AREA, GEO, FREQ, SOURCE). These were provided against the possibility of the program using them to make a better interpretation. For the most part, this information is of little additional help to the reader, but it is carried in codes because it is not available to the program in any other way. Some of these codes, like the KIND code for nouns, may be used, others may not. The program is still in development and these are put in to experiment with a possible capability. Later versions may use them, omit them, or provide others.
The program covers a combination of time periods and application areas. This is certainly not the way in which dictionaries are usually prepared. Usually there is a clear limit to the time or area of coverage, and with good reason. A computer dictionary may have capabilities that mitigate those reasons. Time or area can be coded into each entry, so that one could return only classical words, even though matching medieval entries existed. (The program has that capability now, but it is not yet clear how to apply it.)
There is some measure of period and frequency that can be used to discriminate between identical forms, but if there is only one possible match to an input word, it will be displayed no matter its era or rarity. The user can choose to display age and frequency warnings associated with stems and meanings, but the present default is not to.
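The rule that trimming never leaves the user with nothing can be sketched in Python as below. This is an illustration under stated assumptions: the Parse record, the trim function, and in particular the set of FREQ codes treated as uncommon are hypothetical; only the AGE rule of 'A' for archaic and 'E or later' for medieval comes from the option descriptions.

```python
from dataclasses import dataclass


@dataclass
class Parse:
    meaning: str
    age: str = "X"   # AGE code of the entry
    freq: str = "X"  # FREQ code of the entry


LATER_AGES = set("EFGH")       # "E or later", as in the OMIT_MEDIEVAL option
UNCOMMON_FREQS = set("EFIMN")  # assumption: which FREQ codes count as uncommon


def trim(parses, omit_archaic=True, omit_medieval=True, omit_uncommon=True):
    """Drop unwanted parses, but never trim the list down to nothing."""
    def wanted(p):
        if omit_archaic and p.age == "A":
            return False
        if omit_medieval and p.age in LATER_AGES:
            return False
        if omit_uncommon and p.freq in UNCOMMON_FREQS:
            return False
        return True

    kept = [p for p in parses if wanted(p)]
    # If nothing else is possible, report what was found regardless of code.
    return kept if kept else list(parses)


both = [Parse("classical sense", "C", "A"), Parse("medieval sense", "F", "C")]
print(trim(both))
print(trim([Parse("archaic form only", "A", "E")]))
```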
So far these codes have not been of much use, especially since the only significant exercises have been with classical Latin. Other situations may change this. Perhaps the only impact now is for those words which have different meanings in different applications or periods. For these the warning may be useful. Otherwise, if there is only one interpretation for a word, that is given.
Rare and age specific inflection forms are also displayed,
but there is a warning associated with each such.
The designation of time period is very rough. It is presently based on
dictionary information. If the quotes cited are in the 4th century,
and none earlier, then the word is assumed to be late Latin, and one
might conclude that it was not current earlier. One flaw in this
argument could be that the citation given was just the best illustration
from a large number covering a wide period. On the other hand, the word
could have been well known in classical times but did not appear in
any surviving classical writings. In such a case, it is reasonable to
warn the reader of Cicero that this is not likely the correct
interpretation for his example. This capability is still developmental,
and its usefulness is still an open question.
If there is a classical citation, the word could be designated as classical,
but unless there is some reason to conclude otherwise, it is expected
that classical words are valid for use in all periods (X), are universal
for well considered (published) Latin.
A designation of Early (B) means that there are no classical citations,
except for poetry, in which the poet is invoking the past (or just straining
for meter). Obsolete words occur similarly in English literature and poetry.
Much which is designated late or medieval may be vulgar Latin, in common
use in classical times but not thought suitable for literary works.
In all periods the target is Latin. Archaic Latin, for purposes of the
program, is still Latin, not Etruscan or Greek. Medieval Latin is
that which was written by scholars as the universal Latin, not
versions of early French or Italian.
While the reader can make his own interpretation of the area of application
from the given meaning, there may be some cases in which the program can
also use that information (which it can only get from a direct coding).
This has not yet been used in the program, but the possibility exists.
If the reader were doing a medical text, then higher priority should
be given to words coded B, if a farming book, then A coded words should be
given preference.
The area need not apply to all the meanings, just that there
is some part of the meaning that is specialized to or applies specifically
to that area and so is called out.
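Since the program does not yet use the area code this way, the following Python sketch is purely hypothetical. It shows one simple way a preferred AREA code could move matching entries to the front without discarding the others; the function name and the sample meanings are invented for illustration.

```python
def prioritize(entries, preferred_area):
    """Stable sort: entries coded with the preferred AREA come first."""
    # False sorts before True, so matching entries lead; ties keep their order.
    return sorted(entries, key=lambda entry: entry[1] != preferred_area)


# Invented (meaning, AREA code) pairs for a single hypothetical input word.
candidates = [
    ("general sense", "X"),
    ("farming sense", "A"),
    ("medical sense", "B"),
]
print(prioritize(candidates, "B"))
```

A stable sort is used so that, apart from the promoted entries, the original order of the output is preserved.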
The GEO code was included to enable the program to distinguish between
different usages of a word depending on where it was used or what
country was the subject of the text.
This is a dual usage, origin or subject.
There is an indication of relative frequency for each entry. These
codes also apply to inflections, with somewhat different meaning. If there
were several matches to an input word, this key may be used to sort
the output, or to exclude rare interpretations. The first problem is to
provide the score. The initial method is to grade each word by how
much column space is allocated to it in the Oxford Latin Dictionary,
or the number of citations, on the assumption that many citations
mean a word is common. This is not the intent of the compilers of
existing dictionaries, but it is almost the only indication of frequency
that can be inferred from most dictionaries. In many cases it
seems to be a reasonable guess, certainly for those most common words,
and for those that are very rare.
With the understanding that adjustments can be made when additional
information is available, the initial numeric criteria are those listed
under FREQ below.
In the case of late Latin in Lewis and Short, these frequencies may
be significant underestimates, since the volume of applicable texts
considered seems to be much smaller than for classical Latin
resulting in fewer opportunities for citations. Nevertheless,
barring additional information, the system is generally followed.
For the situation where there are several slightly different spellings given
for a word, they all are given the same initial frequency. The theory is
that the spelling is author's choice while the frequency is attached
to the word no matter how it is spelled. I presume that for a specific
text the author always spells the word the same way, that there is
no distribution of spellings within an individual text. One exception
to this rule is the case where a variant spelling is cited only for
inscriptions. There may be some significance to this and a FREQ of I
is assigned. The logic of this choice is debatable. However, for some
variations there is clearly a difference in application and this can
be reflected in the frequency code. Likewise, there are situations
wherein words of the same spelling but different meanings may have different
frequencies. This may help to select the most likely interpretation.
One has a check against the frequency list of Diederich for the most
common, and those are probably the only ones that matter.
But the frequency depends on the application, and it should be possible
to run a new set of frequencies if one had a reasonable volume of
applicable text. The mechanical verification of word frequency codes
is a long-term goal of the development, but must wait until the
dictionary data is complete.
Inscription and Graffiti are designations of frequency only in that
the only citations found were of that nature. One might suppose that
if literary examples were known they would have been used. So one
might expect that such words would not be found in a student's text.
There is no implication that they were not common in the spoken language.
A very special case has been created for 'N' words, words for which the only
dictionary citation is Pliny's Natural History. It seems, from reading of
dictionaries, that this work may be the only source for these words,
that they do not appear in any other surviving texts. They are usually
names for animals, plants or stones, many without identification. Such words
may appear only in Lewis and Short and the Oxford Latin Dictionary,
the unabridged Latin classical dictionaries. These words are omitted
from most other Latin dictionaries and, although they fall in the classical
period and are from a very well known writer, there is no mention of
the omission. So there may be an argument to disparage these words,
unless one is reading Pliny.
Most of these words are of Greek origin (although that
is also true for much of Latin). For many, the dictionaries report different
forms or declensions for the word giving the same citation. Often one
dictionary will give a Greek-like form (-os, -on) where another gives
a Latinized form (-us). There is no consistency. OLD and L+S
disagree on Latin and Greek forms, with no overwhelming favoritism
to one form attached to either dictionary. This may be a reflection
of the fact that the dictionaries grew over a long time with several
editors, many workers, and no rigid enforcement of standards.
There is another problem that is found chiefly in connection with Pliny-type
words. Since the literature is very sparse on examples, it is often
uncertain whether a particular usage is appropriately listed as a noun,
as an adjective, or as adjective used as a substantive.
The present dictionary, in blessed innocence, records all forms without bias.
For inflections, the same type is used with different weights.
Source is the dictionary or grammar which is the source of the information,
not the Cicero or Caesar text in which it is found.
For a number of entries, X is now given as Source. This is primarily
for the vocabulary (about 13000 words) which was in place before the
Source parameter was put in, and which has not been updated.
In fact, they are from no particular Source,
just general vocabulary picked up in various texts and readings. Although,
during the dictionary update beginning in 1998, all entries are being checked
against sources, it may be improper to credit (blame?) a Source when that
was not the origin of the entry, remembering that the actual entries are
of my generation entirely and may not correspond exactly to any other view.
However, in the second pass (as far as it has progressed) all classical
entries have been verified with the Oxford Latin Dictionary (OLD). (By
that I mean that I have checked, not to imply that I have not made errors.)
This does not mean that the entry necessarily agrees with the OLD, but
that I read the OLD entry with great respect and put down what I did anyway.
Newer entries, added in this process, and those checked later in the process,
if found in the OLD, have the O code.
Words added from Lewis and Short, but not in OLD, have the S code, etc.
All entries for which there is a Source will be found in some form
in that Source,
but the details of the interpretation of declension and meaning are mine.
They may not necessarily be found as primary entries, or even directly
referenced, but they will have been constructed from information in
that source. (For instance, "adp see app" may generate more adp words
than are directly mentioned in the bulk of the dictionary.)
There should be no expectation, nor is there any claim, that the result
of the program is exactly that from the cited Source. Each entry is
my responsibility alone, and there are significant differences and
elaborations. However, in each case where there is a Source, the reader
can find the basis from which the program data was derived. If I have
done a proper job, he will not often be surprised.
The list of sources goes far beyond what has been directly used so far.
There should be no expectation at this point in the development that
all these sources have even been used. They are listed as I have copies
and as they might be consulted. They are encoded so that the program might
recognize and process the source should it come up.
I have sought and received permission for those which have been
extensively used. Others have only been used for an occasional check
(fair use) or have denied me permission (Brill for Niermeyer).
AGE
type AGE_TYPE is (
   X,   --              --  In use throughout the ages/unknown -- the default
   A,   --  archaic     --  Very early forms obsolete by classical times
   B,   --  early       --  Early Latin, pre-classical, used for effect/poetry
   C,   --  classical   --  Limited to classical (~200 BC - 200 AD)
   D,   --  late        --  Late, post-classical, early Christian (3-6)
   E,   --  later       --  Latin not in use in Classical/Roman times (7-10)
   F,   --  medieval    --  Spanning E and G, including late medieval (11-15)
   G,   --  modern      --  Latin not in use before 16th century (16-18)
   H    --  neo         --  Coined recently, words for new things (19-20)
);
AREA
type AREA_TYPE is (
X, -- All or none
A, -- Agriculture, Flora, Fauna, Land, Equipment, Rural
B, -- Biological, Medical, Body Parts
D, -- Drama, Music, Theater, Art, Painting, Sculpture
E, -- Ecclesiastic, Biblical, Religious
G, -- Grammar, Rhetoric, Logic, Literature, Schools
L, -- Legal, Government, Tax, Financial, Political, Titles
P, -- Poetic
S, -- Science, Philosophy, Mathematics, Units/Measures
T, -- Technical, Architecture, Topography, Surveying
W, -- War, Military, Naval, Armor
Y -- Mythology
);
GEO
type GEO_TYPE is (
X, -- All or none
A, -- Africa
B, -- Britain
C, -- China
D, -- Scandinavia
E, -- Egypt
F, -- France, Gaul
G, -- Germany
H, -- Greece
I, -- Italy, Rome
J, -- India
K, -- Balkans
N, -- Netherlands
P, -- Persia
Q, -- Near East
R, -- Russia
S, -- Spain, Iberia
U, -- Eastern Europe
Y -- Mythology
);
FREQ
A full column or more, more than 50 citations
B half column, more than 20 citations
C more than 5 citations
D 4-5 citations
E 2-3 citations
F only 1 citation
type FREQUENCY_TYPE is (
X, -- -- Unknown or unspecified
A, -- very freq -- Very frequent, in all Elementary Latin books
B, -- frequent -- Frequent, in top 10 percent
C, -- common -- For Dictionary, in top 10,000 words
D, -- lesser -- For Dictionary, in top 20,000 words
E, -- uncommon -- 2 or 3 citations
F, -- very rare -- Only one citation in OLD or L+S
I, -- inscription -- Presently not much used
M, -- graffiti -- Presently not much used
N -- Pliny -- Things that may appear only in Pliny
);
X, -- -- Unknown or unspecified
A, -- most freq -- Very frequent, the most common
B, -- sometimes -- sometimes, a not unusual variant
C, -- uncommon -- occasionally seen
D, -- infrequent -- recognizable variant, but unlikely
E, -- rare -- for a few cases, very unlikely
F, -- very rare -- singular examples,
I, -- -- Presently not used
M, -- -- Presently not used
N -- -- Presently not used
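The citation thresholds listed at the head of this section can be read as a simple rule. The following is a minimal sketch in Python (not the program's own Ada), assuming the count of citations is the only criterion; the function name freq_from_citations is hypothetical, and a count of zero is mapped to X as an assumption.

```python
def freq_from_citations(citations: int) -> str:
    """Map a dictionary citation count to a FREQ code (illustrative only)."""
    if citations <= 0:
        return "X"  # assumption: no citation information means unknown
    if citations > 50:
        return "A"  # full column or more
    if citations > 20:
        return "B"  # half column
    if citations > 5:
        return "C"
    if citations >= 4:
        return "D"  # 4-5 citations
    if citations >= 2:
        return "E"  # 2-3 citations
    return "F"      # only 1 citation


if __name__ == "__main__":
    for n in (1, 3, 5, 6, 21, 51):
        print(n, freq_from_citations(n))
```

The codes I, M and N are assigned by kind of source rather than by count, so they are outside this rule.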
SOURCE
type SOURCE_TYPE is (
X, -- General or unknown or too common to say
A, -- Allen + Greenough, New Latin Grammar, 1888 (A+G)
B, -- C.H.Beeson, A Primer of Medieval Latin, 1925
C, -- Charles Beard, Cassell's Latin Dictionary 1892 (CAS)
D, -- J.N.Adams, Latin Sexual Vocabulary, 1982
E, -- L.F.Stelten, Dictionary of Eccles. Latin, 1995
F, -- Roy J. Deferrari, Dictionary of St. Thomas Aquinas, 1960 (DeF)
G, -- Gildersleeve + Lodge, Latin Grammar 1895 (G+L)
H, -- Harrington/Pucci/Elliott, Medieval Latin 2nd Ed 1997
I, -- Leverett, F.P., Lexicon of the Latin Language, Boston 1845
J, -- C.C./C.L. Scanlon Latin Grammar/Second Latin, TAN 1976
K, -- W. M. Lindsay, Short Historical Latin Grammar, 1895
L, -- Lewis, C.T., Elementary Latin Dictionary 1891
M, -- Latham, Revised Medieval Word List, 1980
N, -- Lynn Nelson, Wordlist
O, -- Oxford Latin Dictionary, 1982 (OLD)
P, -- Souter, A Glossary of Later Latin to 600 A.D., Oxford 1949
Q, -- Other, unspecified dictionaries
R, -- Plater and White, A Grammar of the Vulgate, Oxford 1926
S, -- Lewis and Short, A Latin Dictionary, 1879 (L+S)
T, -- Found in a translation -- no dictionary reference
U, -- Du Cange
V, -- Vademecum in opus Saxonis - Franz Blatt
W, -- My personal guess
Y, -- Niermeyer, Mediae Latinitatis Lexicon Minus
Z -- Sent by user -- no dictionary reference
-- Consulted but used only indirectly
-- Liddell + Scott Greek-English Lexicon
);
Dictionary Codes
There are a few special conventions in setting codes.
Proper Names
Proper names are often identified by the AGE in which the person lived, not the age of the text in which he is referenced, the AREA of his fame or occupation, and the GEO from which he hailed. This refers to some most-likely person of this name. A name may be shared by others in different ages. Thus Jason, the Argonaut, is Archaic, Myth, Greek (A Y H). (It is not likely that a Latin text would refer to a TV star.) Tertullian, an early 3rd century Church Father from Carthage, author of the first Christian writings in Latin, is Late, Ecclesiastic, Africa (D E A). Jupiter is (A E I), which is a bit sloppy since he is present later. Today he may be a myth, but then he was a god. But even gods are not eternal (X) in language, and an initial place is found for them. Place names are likewise coded, although with less confidence.
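The triples quoted above, such as (A Y H), can be expanded mechanically from the AGE, AREA and GEO listings. The following Python sketch does only that lookup; the tables are abridged transcriptions of the lists in this document, and the function name describe is hypothetical.

```python
# Abridged labels transcribed from the AGE, AREA and GEO listings above.
AGE = {"X": "all ages/unknown", "A": "archaic", "B": "early", "C": "classical",
       "D": "late", "E": "later", "F": "medieval", "G": "modern", "H": "neo"}
AREA = {"X": "all or none", "A": "Agriculture", "B": "Biological",
        "D": "Drama", "E": "Ecclesiastic", "G": "Grammar", "L": "Legal",
        "P": "Poetic", "S": "Science", "T": "Technical", "W": "War",
        "Y": "Mythology"}
GEO = {"X": "all or none", "A": "Africa", "B": "Britain", "C": "China",
       "D": "Scandinavia", "E": "Egypt", "F": "France, Gaul", "G": "Germany",
       "H": "Greece", "I": "Italy, Rome", "J": "India", "K": "Balkans",
       "N": "Netherlands", "P": "Persia", "Q": "Near East", "R": "Russia",
       "S": "Spain, Iberia", "U": "Eastern Europe", "Y": "Mythology"}


def describe(age: str, area: str, geo: str) -> tuple:
    """Expand an (AGE, AREA, GEO) code triple into readable labels."""
    return (AGE[age.upper()], AREA[area.upper()], GEO[geo.upper()])


if __name__ == "__main__":
    print(describe("A", "Y", "H"))  # Jason
    print(describe("D", "E", "A"))  # Tertullian
```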
Vertical Bar
While not visible to the user, the dictionary contains certain
meanings starting with a vertical bar (|). This is a code used to
identify meanings that run beyond the conventional 80 characters.
One or more leading vertical bars allow tools to recognize that
these are additional meanings for an entry already encountered,
usually the entry immediately before when the file is sorted with that in mind.
This is only of concern to those dealing with the raw dictionary who
have asked.
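For those who do work with the raw dictionary, the convention might be handled as in the following Python sketch. It assumes the meanings arrive as a list in file order, with each continuation directly after the entry it extends; the function name merge_meanings and the sample strings are hypothetical, not taken from the dictionary.

```python
def merge_meanings(meanings):
    """Fold meanings that start with '|' into the meaning just before them."""
    merged = []
    for meaning in meanings:
        text = meaning.lstrip("|").strip()
        if meaning.startswith("|") and merged:
            # Continuation: append to the previously seen meaning.
            merged[-1] = merged[-1] + " " + text
        else:
            merged.append(text)
    return merged


if __name__ == "__main__":
    sample = ["first part of a long meaning;", "|second part;", "another word;"]
    print(merge_meanings(sample))
```

A tool that must keep the 80 character limit would do the reverse and leave the bars in place.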
Evolution of the Dictionary
The stem list was originally put together from what might be called 'common knowledge', those words that most Latin texts have. The first version had about 5000 dictionary entries, giving up to 95% coverage of simple classical texts. This grew to about 13000 entries with specific additions when gaps were found. With this number it was possible to get better than a 99% hit rate on Caesar (an area from which the dictionary was built). Parse of other works fell to 95-97%, which may be mathematically attractive but leaves a lot to be desired in a dictionary, since a translator is usually familiar with the vast bulk of the language and just needs help on the obscure words. Having just the common words is not enough, indeed not much help at all. So an attempt is made to make the dictionary as complete as possible. All possible spellings found in dictionaries are included.
Starting with the 13000, the expansion project beginning in 1998 sought to verify the existing words and supplement with any new found ones. Thus all classical Latin words are consistent with the OLD (not to say taken from, because most were not, but checked against). Any significant deviation is indicated, either as from another source, or in the definition itself.
L+S is used for later Latin and to check OLD work. This started with the thought that if a word was in L+S but not in OLD it must be later Latin, beyond the range of OLD. I was surprised at how many words with classical citations were in L+S but not in OLD, and how many are of different spelling.
The refinement is proceeding one letter at a time, as is the tradition
for all great dictionaries. First stage refinement has proceeded through COQ.
Testing
The program has been run against a few common classical texts. Initially this was mostly a check of the process and reliability of the program. It is now possible to run real texts and get valid statistics. The current program/dictionary has been run against 275,000 words of Caesar, Suetonius, Varro, and Virgil. For this set of texts only 0.2% of the words are reported as UNKNOWN (excluding proper names). While this is a mechanical test and does not assure that the form and meaning reported by the program is correct, the actual number of misses found by limited detailed examination is vanishingly small.
The hardest test is against another dictionary. While getting a 97+% hit rate on long classical texts, a run against a large dictionary might fall to 85-90%, the missing words being in those letters which the update has not reached. This is to be expected, since we both have the 10000 most common words and have made somewhat different additions beyond that. So large electronic wordlists are a check on the program, and are reserved for that purpose, not simply incorporated as such.
The Latin Word List of Lynn Nelson is an excellent benchmark,
more so because of its medieval content.
Current Status and Future Plans
The present phase of refinement has incorporated the Oxford Latin Dictionary and Lewis and Short entries into D (about a fourth). Periodically, when I need a change of task, I run a major author (primarily from the Packard Humanities Institute CD ROM) to check the effectiveness of the code. I may then include some words which turn up frequently as unknowns, but this is done as the spirit moves me. Smaller sections of later authors may also be processed, giving some growth in medieval Latin entries. Recently I have been working the Vulgate of St. Jerome.
I will continue to refine the dictionary and the program. The major goal is to complete the inclusion of OLD and L+S, and this may take years. Along the way, and later, I will expand to medieval Latin. I am not so unrealistic as to believe that I will 'finish', indeed, this is a hobby and there is no advantage to finishing.
An eventual outcome would be to have some institution, with real Latin
capability, provide an exhaustive and authoritative program of this nature.
Until then, I and other individuals will make available our programs.
To make the dictionary files used by the program is not difficult, but it takes several auxiliary programs for checking and ordering which are best handled by one center. These are available to anyone who needs them, but it is better that any general additions to the dictionary be handled centrally so that they can be included in the public release for everyone.
However,
it is possible for a user to enhance the dictionary for special situations.
This may be accomplished either by providing new dictionary entries in a
DICT.LOC file, to be processed in the regular manner, or by adding
a unique (single case/number/gender/...) form in a text file called UNIQUES.
DICT.LOC
A dictionary entry for WORDS (in the simplest, editable form as read in a DICT.LOC) is
aqu aqu
N 1 1 F T X X X X X
water;
For a noun there are two stems. The definition of "stem" is inherent in the coding of inflections in the program. Different grammars have different definitions. There is no formal connection with any other usage.
To these stems are applied, as appropriate, the endings
       S     P
NOM    a     ae
GEN    ae    arum
DAT    ae    is
ACC    am    as
ABL    a     is
Or rather, the input word is analyzed for possible endings, and when these are subtracted a match is sought with the dictionary stems. A file (INFLECTS.LAT) gives all the endings.
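The analysis just described, stripping each possible ending and looking up what remains, can be sketched as follows. This is a Python illustration, not the Ada code of the program; it assumes only the first declension endings from the table above and a one-entry stem list, and the names FIRST_DECL_ENDINGS, STEMS and analyze are hypothetical.

```python
# First declension endings from the table above, keyed by (case, number).
FIRST_DECL_ENDINGS = {
    ("NOM", "S"): "a",  ("NOM", "P"): "ae",
    ("GEN", "S"): "ae", ("GEN", "P"): "arum",
    ("DAT", "S"): "ae", ("DAT", "P"): "is",
    ("ACC", "S"): "am", ("ACC", "P"): "as",
    ("ABL", "S"): "a",  ("ABL", "P"): "is",
}

# Toy stem list; the real program reads its stems from the dictionary files.
STEMS = {"aqu": "water;"}


def analyze(word):
    """Return every (stem, ending, case, number, meaning) that fits the word."""
    word = word.lower()
    results = []
    for (case, number), ending in FIRST_DECL_ENDINGS.items():
        if word.endswith(ending):
            stem = word[: len(word) - len(ending)]
            if stem in STEMS:
                results.append((stem, ending, case, number, STEMS[stem]))
    return results


if __name__ == "__main__":
    for w in ("aquam", "aquae", "aquis"):
        print(w, analyze(w))
```

An ambiguous form such as aquae yields several interpretations, which is the behavior one sees in the program's output for such words.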
In the aqua example, the first line
aqu aqu
contains the two noun stems for the word found in printed dictionaries as
aqua, -ae
The second line
N 1 1 F T X X X X X
says it is a noun (N), of the first declension, first variant, is feminine (F), and is a thing (T), as opposed to a person, location, etc. The X X X X X represents coding about the age in which it is applicable, the geographic and application area of the word, its frequency of use, and the dictionary source of the entry. None of this is necessary in a DICT.LOC although something must be filled in and X X X X X is always satisfactory.
The last line is the English definition. It can be as long as 80 characters.
water;
The case and exact spacing of the stems and codes is unimportant, as long as they are separated by at least one blank.
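A reader preparing DICT.LOC entries may want to check them before handing them to the program. The following Python sketch reads one three-line noun entry under the rules just given (blank-separated fields, case unimportant, meaning of at most 80 characters). The function name parse_noun_entry and the returned field names are hypothetical; the order of the five codes (AGE, AREA, GEO, FREQ, SOURCE) is taken from the order of the code sections in this document.

```python
def parse_noun_entry(text):
    """Parse a three-line DICT.LOC noun entry into a dictionary of fields."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) != 3:
        raise ValueError("expected stems, codes and meaning lines")
    stems = lines[0].lower().split()   # case is unimportant
    codes = lines[1].upper().split()   # any run of blanks separates fields
    meaning = lines[2].strip()
    if len(stems) != 2:
        raise ValueError("a noun has two stems")
    if len(codes) != 10 or codes[0] != "N":
        raise ValueError("expected N, two integers, gender, kind, five codes")
    if len(meaning) > 80:
        raise ValueError("meaning is longer than 80 characters")
    return {
        "stems": stems,
        "decn": (int(codes[1]), int(codes[2])),
        "gender": codes[3],
        "kind": codes[4],
        "age": codes[5], "area": codes[6], "geo": codes[7],
        "freq": codes[8], "source": codes[9],
        "meaning": meaning,
    }


if __name__ == "__main__":
    print(parse_noun_entry("aqu aqu\nN 1 1 F T X X X X X\nwater;"))
```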
The PART_OF_SPEECH_TYPE values that you are most likely to be interested in are (X, N, ADJ, ADV, V). X is always a valid entry. It stands for none, or all, or unknown. 0 has the same function for numeric types.
The others in the type (PRON, PACK, VPAR, SUPINE, PREP, CONJ, INTERJ, NUM, TACKON, PREFIX, SUFFIX) are either less interesting or artificial, used only internally to the code.
A noun or a verb has a DECN_RECORD consisting of two small integers. The first is the declension/conjugation, and the second is a variant within that.
N 1 1 is the conventional first declension. But there are variants (6, 7, 8) which model Greek-like declensions. (Greek-like variants start at 6.)
N 2 1 is the regular -us, -i second declension.
N 2 2 is the regular -um, -i neuter form.
There is a N 2 3 for 'r' forms like puer, pueri. In this case there is the possibility of a difference in stems (ager, agri has stems coded as ager, agr).
Again there are Greek-like variants (6, 7, 8, 9).
N 3 1 is regular third declension (lex, legis -> lex, leg) for masculine and feminine.
N 3 2 is for neuter (iter, itineris -> iter, itiner).
Variants 3 and 4 are for I-stems. And so it goes.
Each noun has a GENDER_TYPE (X, M, F, N, C). X for unknown (something I avoid for gender - guess if you have to) or all genders (useful in the code but not in a dictionary), and C for common (M + F).
There is also a
NOUN_KIND_TYPE (X,  -- unknown, nondescript
                N,  -- proper Name
                L,  -- Locale, country, city
                W,  -- a place Where
                P,  -- a Person type
                T)  -- a Thing
which you probably do not care about either. Most entries will be Thing.
Other codes are enumerated in the body of this document.
Verbs are done likewise, but there are four stems, as described below. An example is
am am amav amat
V 1 1 X X X X A O
love;
Now comes the hard part. When starting from a dictionary one has all the information to decide the values. Just having a single instance of the word lacks a lot. Consider some examples from a user.
elytris is surely from the Greek for sheath. The question is how Latinized did it get. I suspect that by the 17th century it was completely Latinized. Even in classical times there was very little left in the way of Greek forms (< 1%). So let us guess elythris (or -es), elythris (N 3 3) but it could be a Greek-like form (N 3 9). I do not even know what case I started with, if NOM, then it must be -is, -is, if GEN then -es, -is is reasonable. Then again, if it is DAT P we might have a N 1 1.
All this seems very uncertain, and, in the absence of a real dictionary entry, it is. However you can make the choices such that the result (the output of the code) matches exactly what you have. If you have more information, lots of examples, the uncertainty shrinks. If you have just a single isolated example, there are limits. (But if you do 100 and have more information about some, you can make better guesses about the rest.)
Next we need a gender. It may not make much difference (if M or F, or C) in this case, but sometimes it matters. You might be able to figure that out from the text.
It is a thing (T), but X will work for your purposes. For the rest, X X X X X works fine.
So we have
elythris elythr
N 3 3 F T X X X X X
elytra, wing cover of beetles
sat, I happen to know is an abbreviated form of satis, so it is easy. If you want the adverb form, as you indicate:
sat
ADV POS X X X X X
sufficiently, adequately; quite, well enough; fairly, (moderately)
Adverbs have a comparison parameter (X, POS, COMP, SUPER). Most will be POS.
It also is an indeclinable (N 9 9) substantive:
sat
N 9 9 N T X X X X X
enough, sufficient; enough and some to spare; one of sufficient power
deplanata seems to be a 1-2 declension adjective, the -us, -a, -um form. It also seems to be derived from the verb deplanto (V 1 1) - break off/sever (branch/shoot).
deplanat deplanat
ADJ 1 1 POS X X X X X
broken off/severed (branch/shoot); (flattened)
Adjectives have a DECN and a comparison.
The following were not at the time in the dictionary, but were in the OLD.
alat alat
ADJ 1 1 POS X X X X X
winged, having wings; having a broad/expanded margin
(punct - ul - at -> hole/prick/puncture - small - having)
punctulat punctulat
ADJ 1 1 POS X X X X X
punctured; having small holes/pricks/stabs/punctures
appendiculat appendiculat
ADJ 1 1 POS X X X X X
appendiculate; having/fringed by small appendages/bodies
acetabul acetabul
N 2 2 N T X X X X X
small cup (vinegar), 1/8 pint; cupped part (plant); sucker; socket, (cavity)
ruf ruf
ADJ 1 1 POS X X X X X
red (various); tawny; red-haired (persons); strong yellow/moderate orange
testace testace
ADJ 1 1 POS X X X X X
bricks; resembling bricks (esp. color); having hard covering/shell (animals)
This one had no classical correspondence.
brunne brunne
ADJ 1 1 POS X X X X X
brown
There is one other remark. It is probably wise to include in the definition a more complete English meaning. Just saying appendiculat -> appendiculate is not as interesting as it might be.
All the inflections are in a file called INFLECTS.LAT, now a part of
the general distribution of source code and data files
http://www.erols.com/whitaker/wordsall.zip
Here is a quick reference for the most common types.
-- All first declension nouns - N 1 1
-- Ex: aqua aquae => aqu aqu
-- Second declension nouns in "us" - N 2 1
-- Ex: amicus amici => amic amic
-- Second declension neuter nouns - N 2 2
-- Ex: verbum verbi => verb verb
-- Second declension nouns in "er" whether or not the "er" in base - N 2 3
-- Ex: puer pueri => puer puer
-- Ex: ager agri => ager agr
-- Early (BC) 2nd declension nouns in ius/ium (not filius-like) - N 2 4
-- for the most part formed GEN S in 'i', not 'ii' -- G+L 33 R 1
-- Dictionaries often show as ...(i)i
-- N 2 4 uses GENDER discrimination to reduce to single VAR
-- Ex: radius rad(i)i => radi radi M
-- Ex: atrium atr(i)i => atri atri N
-- Third declension M or F nouns whose stems end in a consonant - N 3 1
-- Ex: miles militis => miles milit
-- Ex: lex legis => lex leg
-- Ex: frater fratris => frater fratr
-- Ex: soror sororis => soror soror
-- All third declension that have the endings -udo, -io, -tas, -x
-- Ex: pulcritudo pulcritudinis => pulcritudo pulcritudin
-- Ex: legio legionis => legio legion
-- Ex: varietas varietatis => varietas varietat
-- Ex: radix radicis => radix radic
-- Third declension N nouns with stems ending in a consonant - N 3 2
-- Ex: nomen nominis => nomen nomin
-- Ex: iter itineris => iter itiner
-- Ex: tempus temporis => tempus tempor
-- Third declension nouns I-stems (M + F) - N 3 3
-- Ex: hostis hostis => hostis host
-- Ex: finis finis => finis fin
-- Consonant i-stems
-- Ex: urbs urbis => urbs urb
-- Ex: mons montis => mons mont
-- Also use this for present participles (-ns) used as substantives in M + F
-- Third declension nouns I-stems (N) - N 3 4
-- Ex: mare maris => mare mar -- ending in "e"
-- Ex: animal animalis => animal animal -- ending in "al"
-- Ex: exemplar exemplaris => exemplar exemplar -- ending in "ar"
-- Also use this for present participles (-ns) used as substantives in N
-- Fourth declension nouns M + F in "us" - N 4 1
-- Ex: passus passus => pass pass
-- Ex: manus manus => man man
-- Fourth declension nouns N in "u" - N 4 2
-- Ex: genu genus => gen gen
-- Ex: cornu cornus => corn corn
-- All fifth declension nouns - N 5 1
-- Ex: dies diei => di di
-- Ex: res rei => r r
-- Adjectives will mostly only be POS and have only the first two stems
-- ADJ X have four stems, zzz stands for any unknown/non-existent stem
-- Adjectives of first and second declension (-us in NOM S M) - ADJ 1 1
-- Two stems for POS, third is for COMP, fourth for SUPER
-- Ex: malus mala malum => mal mal pei pessi
-- Ex: altus alta altum => alt alt alti altissi
-- Adjectives of first and second declension (-er) - ADJ 1 2
-- Ex: miser misera miserum => miser miser miseri miserri
-- Ex: sacer sacra sacrum => sacer sacr zzz sacerri -- no COMP
-- Ex: pulcher pulchri => pulcher pulchr pulchri pulcherri
-- Adjectives of third declension - one ending - ADJ 3 1
-- Ex: audax (gen) audacis => audax audac audaci audacissi
-- Ex: prudens prudentis => prudens prudent prudenti prudentissi
-- Adjectives of third declension - two endings - ADJ 3 2
-- Ex: brevis breve => brev brev brevi brevissi
-- Ex: facilis facile => facil facil facili facilli
-- Adjectives of third declension - three endings - ADJ 3 3
-- Ex: celer celeris celere => celer celer celeri celerri
-- Ex: acer acris acre => acer acr acri acerri
-- Verbs are mostly TRANS or INTRANS, but X works fine
-- Deponent verbs must have DEP
-- Verbs have four stems
-- The first stem is the first principal part (dictionary entry) - less 'o'
-- For 2nd conj, the 'e' is omitted, for 3rd conj i-stem, the 'i' is included
-- Third principal part always ends in 'i', this is omitted in stem
-- Fourth part in dictionary ends in -us (or -um), this is omitted
-- DEP verbs omit (have zzz) the third stem
-- Verbs of the first conjugation - V 1 1
-- Ex: voco vocare vocavi vocatus => voc voc vocav vocat
-- Ex: porto portare portavi portatus => port port portav portat
-- Verbs of the second conjugation - V 2 1
-- The characteristic 'e' is in the inflection, not carried in the stem
-- Ex: moneo monere monui monitum => mon mon monu monit
-- Ex: habeo habere habui habitus => hab hab habu habit
-- Ex: deleo delere delevi deletus => del del delev delet
-- Ex: iubeo iubere iussi iussus => iub iub iuss iuss
-- Ex: video videre vidi visus => vid vid vid vis
-- Verbs of the third conjugation, variant 1 - V 3 1
-- Ex: rego regere rexi rectum => reg reg rex rect
-- Ex: pono ponere posui positus => pon pon posu posit
-- Ex: capio capere cepi captus => capi cap cep capt -- I-stem too w/KEY
-- Verbs of the fourth conjugation are coded as a variant of third - V 3 4
-- Ex: audio audire audivi auditus => audi aud audiv audit
-- Verbs like to be - coded as V 5 1
-- Ex: sum esse fui futurus => s . fu fut
-- Ex: adsum adesse adfui adfuturus => ads ad adfu adfut
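For the two most regular types in the quick reference, the stems follow from the printed dictionary forms by plain suffix removal. The Python sketch below covers only N 1 1 and V 1 1 and rejects anything that does not fit; the function names n11_stems and v11_stems are hypothetical, and the other types need the case-by-case judgment shown in the examples above.

```python
def n11_stems(nominative, genitive):
    """Stems for a first declension noun, e.g. aqua aquae => aqu aqu."""
    if not (nominative.endswith("a") and genitive.endswith("ae")):
        raise ValueError("not a regular -a, -ae noun")
    return [nominative[:-1], genitive[:-2]]


def v11_stems(present, infinitive, perfect, participle):
    """Stems for a first conjugation verb from its four principal parts."""
    if not (present.endswith("o") and infinitive.endswith("are")
            and perfect.endswith("i")
            and participle.endswith(("us", "um"))):
        raise ValueError("not a regular first conjugation verb")
    return [
        present[:-1],      # first principal part less 'o'
        infinitive[:-3],   # infinitive less 'are'
        perfect[:-1],      # third principal part less 'i'
        participle[:-2],   # fourth part less 'us' or 'um'
    ]


if __name__ == "__main__":
    print(n11_stems("aqua", "aquae"))
    print(v11_stems("voco", "vocare", "vocavi", "vocatus"))
```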
There are a few Latin words that cannot be represented with the scheme of stems and endings used by the program. For these very few cases, the program invokes a unique procedure. The file UNIQUES. contains a list of such words and is read in at the loading of the program. This is a simple ASCII text file which the user can augment. It is expected that there will be very few occasions to do so, indeed, the tendency has been that better processing has allowed uniques to be removed. If a user finds an important word that should be included, please communicate that to the author.
The UNIQUES record is essentially the form as one might have it in output if the word was processed normally. In addition there are some additional fields that the program presently expects. While these could be eliminated, it is convenient for the program not to make the UNIQUES a special case. So a noun form
N 3 1 ACC S F T
is followed by two zeros and an X
N 3 1 ACC S F T 0 0 X
and then the "five X's" or, more properly, the dictionary codes.
N 3 1 ACC S F T 0 0 X X X X B O
These pro forma codes are absolutely necessary, but have no further impact.
The program is written in Ada and uses Ada techniques. Ada is designed for high reliability systems (there is no claim that WORDS was developed with all the other safeguards that that implies!) and as a consequence is unforgiving. The exact form is required. If you want to be sloppy you have to deliberately program that in.
The following examples, and an examination of the UNIQUES.LAT file, should allow the user to insert any unique necessary.
requiem
N 3 1 ACC S F T 0 0 X X X X B O
rest (from labor), respite; intermission, pause, break; amusement, hobby;
bobus
N 3 1 DAT P C T 0 0 X X X X C X
ox, bull; cow; cattle (pl.)
quicquid
PRON 1 6 NOM S N INDEF 0 0 X X X X B X
whatever, whatsoever; everything which; each one; each; everything; anything
mavis
V 6 2 PRES ACTIVE IND 2 S X 0 0 X X X X B X
prefer
cette
V 3 1 PRES ACTIVE IMP 2 P TRANS 0 0 X X X X B O
give/bring here!/hand over, come (now/here); tell/show us, out with it! behold!
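Because the form is exact, it may help to check a new record before adding it to the file. The Python sketch below takes a unique as a word, a code line and a meaning, and separates the grammatical form from the eight pro forma fields at the end (two zeros, an X, and the five dictionary codes), as they appear in the examples above. The function name parse_unique and the returned field names are hypothetical.

```python
def parse_unique(word, codes_line, meaning):
    """Split a UNIQUES code line into its form and its trailing pro forma codes."""
    tokens = codes_line.split()
    if len(tokens) < 9:
        raise ValueError("code line is too short")
    form, tail = tokens[:-8], tokens[-8:]
    if tail[:2] != ["0", "0"]:
        raise ValueError("expected two zeros before the final six codes")
    return {
        "word": word,
        "part_of_speech": form[0],
        "form": form,
        "filler": tail[2],             # the single X after the zeros
        "dictionary_codes": tail[3:],  # AGE AREA GEO FREQ SOURCE
        "meaning": meaning.strip(),
    }


if __name__ == "__main__":
    record = parse_unique("bobus", "N 3 1 DAT P C T 0 0 X X X X C X",
                          "ox, bull; cow; cattle (pl.)")
    print(record)
```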
There is a WORDSALL.ZIP archive of all the Ada source files for WORDS, together with support programs and data to generate the necessary dictionaries and inflections, for re-hosting the WORDS Latin-to-English word parsing/translation system on any machine with an Ada 95 compiler. (It can be made to work with Ada 83 also by replacing one routine.)
This is a console program (keyboard entry), without a fancy Windows GUI, and is thereby system independent.
WORDSALL contains the Ada source files for WORDS
strings_package.ads strings_package.adb latin_file_names.ads latin_file_names.adb config.ads preface.ads word_parameters.ads developer_parameters.ads preface.adb put_stat.adb word_parameters.adb inflections_package.adb inflections_package.ads dictionary_package.ads dictionary_package.adb addons_package.ads addons_package.adb uniques_package.ads word_support_package.ads latin_debug.ads word_support_package.adb latin_debug.adb word_package.ads line_stuff.ads line_stuff.adb developer_parameters.adb tricks_package.ads word_package.adb tricks_package.adb list_package.ads list_list.adb dictionary_form.adb put_example_line.adb list_package.adb parse.adb words.adb
three supporting programs
makedict.adb makestem.adb makeinfl.adb
and DOS ASCII data files for them to act upon to produce WORDS data files
DICTLINE.GEN STEMLIST.GEN INFLECTS.LAT
the other WORDS DOS ASCII supporting files
ADDONS. UNIQUES.
and a couple of DOS ASCII text files for testing
TEST500. 1000.
The process is to download the WORDSALL.ZIP and unzip into a suitable subdirectory. (If the zip form is unsuitable for your system, I can provide the files in an uncompressed form.) The wordy file names are for compliance with the restrictions of the GNAT system. They may be renamed, and I can provide an alternative. However, the long file names demand an UNZIP that preserves them, if GNAT is to be used. For example, in a GNAT environment (one would maximally optimize the main program):
gnatmake -O3 words
gnatmake makedict
gnatmake makestem
gnatmake makeinfl
This produces executables for WORDS, MAKEDICT, MAKESTEM, and MAKEINFL. Executing the latter three against the input respectively of
DICTLINE.GEN STEMLIST.GEN INFLECTS.LAT
(when they ask for DICTIONARY say G) produces
DICTFILE.GEN STEMFILE.GEN INDXFILE.GEN INFLECTS.SEC
Along with ADDONS.LAT and UNIQUES.LAT, this is the set of data for WORDS.
The only major problem that has appeared on porting so far is that one must be careful of file names. WORDS uses several files that it calls by name, among these are ADDONS. and UNIQUES. These names are set in LATIN_FILE_NAMES and can be changed, but they are at any one time fixed. There have been difficulties in the transfer to another system in losing the period. These have been problems in transfer, not in either source or target system, and have been easily rectified by inspection. All of my systems are case-independent on file names. If one is running in a case-dependent system (e.g., UNIX), this is a point to check. Note that the data files are capitalized, source files are not.
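On a case-dependent system, a quick check of the data directory can save some puzzlement. The following Python sketch is a hypothetical helper, not part of the distribution: it reports each expected file name that is not present exactly as spelled, together with any near misses that differ only in case or in a trailing period. The list of expected names is supplied by the caller and should match whatever is set in LATIN_FILE_NAMES.

```python
import os


def missing_data_files(directory, expected):
    """Map each expected name that is absent to the near-miss names found."""
    present = set(os.listdir(directory))
    report = {}
    for name in expected:
        if name in present:
            continue  # exact match, nothing to report
        key = name.lower().rstrip(".")
        near = [p for p in present if p.lower().rstrip(".") == key]
        report[name] = sorted(near)
    return report


if __name__ == "__main__":
    wanted = ["DICTFILE.GEN", "STEMFILE.GEN", "INDXFILE.GEN",
              "INFLECTS.SEC", "ADDONS.LAT", "UNIQUES.LAT"]
    for name, near in missing_data_files(".", wanted).items():
        print(name, "not found; similar names:", near)
```

An empty report means every expected name was found exactly as given.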
For comments mail to
whitaker@erols.com