Note: The Quoteriser's database compiler is somewhat experimental. I had originally planned it for version 3.00, but I fast-tracked development because I wanted to use it for something. Future versions may not be back-compatible with this version, so please don't get too carried away with the current version.
The Quoteriser's database compiler is a utility for compiling quote and author databases from text files, or de-compiling databases into a text file format. It was written with the intention of converting plain text quote dictionaries and similar things into Quoteriser databases. Some users may also find it easier to enter large amounts of data using the compiler rather than by using the GUI. If the Quoteriser has been compiled for a non-OS/2 system, of course, there will be no GUI at all (unless the user also has an OS/2 version somewhere).
The database compiler is probably the simplest program possible for the job. The author could have spent a lot of time writing a fantastically sophisticated compiler that could (say) compile the Bible into all its proper verses from some free-format text file. However, if the user could program such a sophisticated Quoteriser, the user can probably also use one of the many good text-manipulation languages (such as Perl) to convert his or her free-format Bible (or whatever) into the stricter format required by the Quoteriser.
The command line syntax for quoterc is:
quoterc <command1> [<command2>] [<command3>] ...
where the <commandN>'s are the names of command files, the format of which is described in the command files section below. Command files are usually given the extension .qc, which can be omitted from the command line, if desired.
The quoterc program processes each command file in order, aborting with a short error message if it runs into an error. Command files are interpreted, not compiled, and so anything occurring before an error will be executed (like BASIC).
If quoterc is run without any arguments, it will print out a brief help screen.
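The invocation rules above can be sketched in Python. This is only an illustration of the behaviour described here, not quoterc's actual code: the helper names resolve_command_file and run_all are hypothetical, and the sketch assumes that .qc is added only when the argument has no extension of its own.

```python
import os
from typing import Callable, Iterable, List


def resolve_command_file(name: str, default_ext: str = ".qc") -> str:
    """Return the command file name, adding the default extension if none was given.

    Assumption: an argument that already has an extension is left alone.
    """
    _root, ext = os.path.splitext(name)
    return name if ext else name + default_ext


def run_all(names: Iterable[str], run_one: Callable[[str], None]) -> List[str]:
    """Process command files in order, stopping at the first error.

    Work done before the error is not undone, mirroring the interpreted
    behaviour described above. Returns the files that completed.
    """
    done = []
    for name in names:
        path = resolve_command_file(name)
        try:
            run_one(path)
        except Exception as err:  # a real tool would use a narrower error type
            print(f"error in {path}: {err}")
            break
        done.append(path)
    return done
```

Here run_one stands in for whatever interprets a single command file; it is a placeholder, not part of the Quoteriser.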
A command file consists of one or more sections of the following form:

<mode> <type> <database> {
    <command1>;
    <command2>;
        :
    <commandN>;
}
(It is not necessary to indent the command lines; this author simply does so from long habit of programming). There can be as many such sections as the user wishes. Keywords are case-insensitive; that is, COMPILE is the same as compile, and so on (but the database name will be case-sensitive if the local file system is, of course).
<mode>
<type>
<database>
Each command is a line consisting of a keyword, one or more arguments and then a semi-colon. Possible commands are:
append <file> <format>;
<file> to the database. If de-compiling, append all the quotes/authors in the database to <file>, creating the file if it doesn't exist.
author <code>;
authors <database>;
<database>. The database name should have no extension.
create <file> <format>;
<file>. If de-compiling, create a file <file> containing all quotes/authors in the database.
source <title>;
stem <code>;
<code>.
The <format> string is detailed in the format string section below.
If one of the parameters contains a space, tab or semi-colon, this must be escaped by use of a percent sign to distinguish it from the word delimiters. That is, use '% ' for a space and '%;' for a semi-colon. Use '%%' for a percent sign.
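The escaping rule can be sketched as a pair of small Python helpers. The function names are hypothetical and this is not the compiler's own code; it simply applies the four escapes described above ('% ', '%' followed by a tab, '%;' and '%%'). It is meant for literal values such as titles and file names; whether the items inside a format string need the same treatment is not covered by this sketch.

```python
def escape_param(text: str) -> str:
    """Escape a literal parameter so it can be written in a command file."""
    out = []
    for ch in text:
        if ch in " \t;%":
            out.append("%")  # prefix each special character with a percent sign
        out.append(ch)
    return "".join(out)


def unescape_param(text: str) -> str:
    """Reverse escape_param for the four escapes described above.

    Any other sequence starting with % is copied through unchanged.
    """
    out = []
    i = 0
    while i < len(text):
        if text[i] == "%" and i + 1 < len(text) and text[i + 1] in " \t;%":
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

For instance, a title such as Twelfth Night would be written in a command file as: source Twelfth% Night;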
When compiling, if the format string does not contain a %a (authors) or %q (quotes) item, the author/quote code will be generated automatically using a counter and the current stem, as specified by the "stem" command. The stem defaults to an empty string. The first code will be 0000000000000000001, the second 0000000000000000002, and so on. When in append mode, codes that are already in the database will not be generated for the new data.
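A minimal Python sketch of this numbering scheme follows. It is an illustration rather than the compiler's actual algorithm: the name generate_codes is hypothetical, and the total code width of 19 characters is an assumption inferred from the example codes in this document.

```python
from typing import Iterator, Set

CODE_WIDTH = 19  # assumption: total code length, inferred from the examples


def generate_codes(stem: str, existing: Set[str], width: int = CODE_WIDTH) -> Iterator[str]:
    """Yield stem + zero-padded counter, skipping codes already in the database."""
    digits = width - len(stem)
    if digits < 1:
        raise ValueError("stem leaves no room for the counter")
    counter = 1
    while True:
        number = str(counter).zfill(digits)
        if len(number) > digits:
            raise ValueError("counter no longer fits in the code")
        code = stem + number
        if code not in existing:  # append mode: never reuse an existing code
            yield code
        counter += 1
```

With the stem DANISH and an empty database, the first value yielded is DANISH followed by a zero-padded 1, and every code has the same length regardless of the stem.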
The <format> string specifies the format of the text file. When de-compiling, it specifies what the output will look like, and when compiling it specifies what the Quoteriser should expect to find in its input files. Designing a good format string can be quite an art form, particularly for compilation.
A format string is a sequence of characters and special symbols a little like the format argument to the printf() function in C. Ordinary characters are written verbatim to the output when de-compiling, or are expected to exist verbatim in the input. Special symbols are used to indicate where the database data should be.
Each data item is represented by a % character followed by another character, as follows:
In addition to the data items, there are also some special symbols for producing and recognising characters that can't be inserted verbatim into the command file:
Note that the special symbols are case-sensitive, that is, "%A" is different from "%a". If a % character is followed by a character that does not form a valid special symbol, the Quoteriser will ignore the % character.
A format string must start with one of the data item symbols.
When de-compiling, the Quoteriser simply prints each character (replacing special symbols with database data) in sequence.
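De-compilation as described here can be sketched in a few lines of Python. This is a simplified illustration, not the Quoteriser's code: the function name render is hypothetical, a record is represented as a dictionary keyed by the data item letter, and the meanings of %_ (space) and %n (line break) are assumptions inferred from the examples in this document.

```python
from typing import Dict

# Assumption: these meanings are inferred from the examples in this document.
LITERAL_SYMBOLS = {"_": " ", "n": "\n"}


def render(fmt: str, record: Dict[str, str]) -> str:
    """Expand a format string for one record, character by character."""
    out = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "%" and i + 1 < len(fmt):
            key = fmt[i + 1]
            if key in record:  # data item, e.g. %f or %d
                out.append(record[key])
                i += 2
                continue
            if key in LITERAL_SYMBOLS:  # special character, e.g. %n
                out.append(LITERAL_SYMBOLS[key])
                i += 2
                continue
            i += 1  # not a valid symbol: ignore the % and carry on
            continue
        out.append(ch)
        i += 1
    return "".join(out)
```

Calling render with the format %f%_(%b-%x):%_%d%n and a record holding the f, b, x and d fields produces one line of text in that layout.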
When compiling, the Quoteriser uses the non-data items to delimit items that it reads into the database. There is, therefore, quite an art to selecting delimiting patterns that do not occur anywhere in the input data. For example, delimiting with the obvious punctuation symbols such as commas and semi-colons can be hazardous because quote texts and author descriptions generally contain such punctuation, and this will hopelessly confuse the compiler. Quote texts and author descriptions will need to be delimited by combinations of symbols that do not occur in ordinary text, such as multiple blank lines and conglomerations of punctuation marks.
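One way to picture delimiter-based reading is to turn the format string into a regular expression, with each data item becoming a shortest-possible match that ends at the next delimiter. The Python sketch below does this. It is not how the Quoteriser is implemented; the function names are hypothetical, and the meanings of %_ (space) and %n (line break) are assumptions inferred from the examples in this document.

```python
import re
from typing import Dict, Iterator

LITERAL_SYMBOLS = {"_": " ", "n": "\n"}  # assumption, inferred from the examples


def format_to_regex(fmt: str, data_items: str) -> "re.Pattern":
    """Translate a format string into a regex with one named group per data item."""
    parts = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "%" and i + 1 < len(fmt):
            key = fmt[i + 1]
            if key in data_items:
                parts.append(f"(?P<{key}>.*?)")  # read up to the next delimiter
                i += 2
                continue
            if key in LITERAL_SYMBOLS:
                parts.append(re.escape(LITERAL_SYMBOLS[key]))
                i += 2
                continue
            i += 1  # invalid symbol: ignore the %
            continue
        parts.append(re.escape(ch))  # ordinary characters must match verbatim
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def parse_records(text: str, fmt: str, data_items: str) -> Iterator[Dict[str, str]]:
    """Yield one dictionary of fields for each record found in the text."""
    for match in format_to_regex(fmt, data_items).finditer(text):
        yield match.groupdict()
```

The hazard described above shows up directly in such a scheme: if a delimiter also occurs inside a field, the shortest match ends too early and the remaining fields are misread.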
Examples
Example 1: an author database
Suppose we wished to build a biographical database of all the English monarchs. We will be a little cavalier with the meanings of the Quoteriser's fields and use the birth and death fields to represent the monarch's reign. Each monarch will have a given name (including his or her numeral), the limits of his or her reign and a one-line description. We will give a separate sequence of automatically-generated author codes to each house. Our database will be called england. We could do this with the following command file:
compile authors england {
    stem WESTSAXON;
    create westsaxon.txt %f%_(%b-%x):%_%d%n;
    stem DANISH;
    append danish.txt %f%_(%b-%x):%_%d%n;
    stem NORMAN;
    append norman.txt %f%_(%b-%x):%_%d%n;
        :
        :
        :
    stem WINDSOR;
    append windsor.txt %f%_(%b-%x):%_%d%n;
}
One way or another, we will have prepared a series of plain text files called westsaxon.txt, danish.txt, and so on, containing the data, all in the same format (though we could have specified different formats for each if we had wanted). For example, the Danish kings are in file danish.txt:
Canute (1016-1035): King of England, Denmark and Norway
Hardicanute (1035-1042): Absent in Denmark 1035-1037; restored 1040-1042
Harold I (1037-1040): Regent 1035-1037; king 1037-1040
Canute will be assigned code DANISH0000000000001, Hardicanute will be assigned code DANISH0000000000002 and Harold will be assigned code DANISH0000000000003. William the Conqueror will be NORMAN0000000000001 and Elizabeth II will be WINDSOR000000000005. As Elizabeth II is still reigning at the time of writing, the closing parenthesis will immediately follow the dash in the date section, and the compiler will read an empty entry for this field.
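If the monarch data starts out in some other form, a short script can produce text files in the layout used above. The following Python sketch is one possible way to do it, under the assumption that the data is available as (name, reign start, reign end, description) tuples; the function name format_author_lines is hypothetical. It refuses fields containing line breaks, since a line break is what delimits records in this layout.

```python
from typing import Iterable, Tuple

AuthorRow = Tuple[str, str, str, str]  # (name, reign start, reign end, description)


def format_author_lines(rows: Iterable[AuthorRow]) -> str:
    """Return the rows as text laid out as: name (start-end): description."""
    lines = []
    for name, start, end, description in rows:
        if "\n" in name + start + end + description:
            raise ValueError(f"a field of {name!r} contains a line break")
        # An empty end date gives "(start-)", which is read as an empty field.
        lines.append(f"{name} ({start}-{end}): {description}\n")
    return "".join(lines)
```

The returned text can then be written to a file such as danish.txt and named in an append or create command.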
Having compiled the database, we could dump all of our monarchs into a text file called monarchs.txt using the following command file:
decompile authors england {
    create monarchs.txt %a%_--%_%f%_(%b-%x)%>%d%n;
}
This will produce lines like:
PLANTAGENET00000002 -- Richard I (1189-1199) Absent for most of his reign
WINDSOR000000000003 -- Edward VIII (1936-1936) Abdicated due to proposed marriage
Note that the monarchs output in this way will not be in any recognisable order. This could be fixed, for example, by using the 'sort' program.
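A plain sort orders the lines by author code. If chronological order is wanted instead, a few lines of Python will do; this is just one possible ordering, and the function name sort_by_reign_start is hypothetical. It assumes each line contains the reign in the form (start-end), as in the dump above.

```python
import re
from typing import Iterable, List, Tuple

REIGN_START = re.compile(r"\((\d+)-")  # the year that follows an opening parenthesis


def sort_by_reign_start(lines: Iterable[str]) -> List[str]:
    """Sort dumped lines by the first year of the reign."""

    def key(line: str) -> Tuple[int, int]:
        match = REIGN_START.search(line)
        # Lines without a recognisable year sort after all dated lines.
        return (0, int(match.group(1))) if match else (1, 0)

    return sorted(lines, key=key)
```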
Example 2: a quote database
We are going to build a database of some famous quotes. Since William Shakespeare has provided so many, we will devote a whole file shakespeare.txt to him, and all the other authors will have to share a file quotes.txt.
compile quotes mydb {
    author SHAKESPEARE;
    create shakespeare.txt %t%n%n-%_%s%n%n;
    append quotes.txt %t%_(%a)%n;
}
Note that the %a item in the quotes.txt line over-rides the specification of SHAKESPEARE as the author. Without this %a, all these quotes would also be attributed to SHAKESPEARE.
Here are some cheery quotes from our shakespeare.txt file:
To be or not to be: that is the question:<br>
Whether 'tis nobler in the mind to suffer<br>
The slings and arrows of outrageous fortune,<br>
Or to take arms against a sea of troubles,<br>
And by opposing end them?

- Hamlet

Life's but a walking shadow, a poor player<br>
That struts and frets his hour upon the stage<br>
And then is heard no more: it is a tale<br>
Told by an idiot, full of sound and fury,<br>
Signifying nothing.

- Macbeth

Since there are line-breaks within the quotes themselves, we need to delimit them with two line breaks (i.e. a blank line). Note that we need an extra blank line at the end of the file to match the last two line breaks when reading the Macbeth quote.
The format of our other quote file is somewhat different:
No man is an island. (DONNE)
Ask not what your country can do for you, but what you can do for your country. (JFK)
I came, I saw, I conquered. (CAESAR)
It is assumed that all of our quotes here will be one-liners, and so we need only a single %n character to delimit them. Note that the item read by %a is the author's code in whatever author database is to be used with the compiled quote database, not the author's name.
The database compiler has a weird and wonderful array of terse and cryptic error messages. Here, we list them all and give long descriptions of what they mean and what may have caused them.