The Quoteriser, Version 2.10

Database Compiler

Contents:

  1. Introduction
  2. Using quoterc
  3. Command files
  4. Format strings
  5. Examples
    1. An author database
    2. A quote database
  6. Error messages

Introduction

Note: The Quoteriser's database compiler is somewhat experimental. I had originally planned it for version 3.00, but I fast-tracked development because I wanted to use it for something. Future versions may not be backward-compatible with this version, so please don't get too carried away with the current version.

The Quoteriser's database compiler is a utility for compiling quote and author databases from text files, or de-compiling databases into a text file format. It was written with the intention of converting plain text quote dictionaries and similar things into Quoteriser databases. Some users may also find it easier to enter large amounts of data using the compiler rather than by using the GUI. If the Quoteriser has been compiled for a non-OS/2 system, of course, there will be no GUI at all (unless the user also has an OS/2 version somewhere).

The database compiler is probably the simplest program possible for the job. The author could have spent a lot of time writing a fantastically sophisticated compiler that could (say) compile the Bible into all its proper verses from some free-format text file. However, a user who could program such a sophisticated compiler could probably also use one of the many good text-manipulation languages (such as Perl) to convert his or her free-format Bible (or whatever) into the stricter format required by the Quoteriser.


Using quoterc

The command line syntax for quoterc is:

quoterc <command1> [<command2>] [<command3>] ...

where each <commandN> is the name of a command file, the format of which is described in the command files section below. Command files are usually given the extension .qc, which can be omitted from the command line if desired.
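
The extension rule can be sketched in Python as follows. This is only an illustration: the helper name resolve_command_file is hypothetical, and it assumes that .qc is added whenever the name given has no extension at all, which this document does not spell out.

```python
import os

def resolve_command_file(name: str) -> str:
    """Return the file name quoterc might look for (hypothetical helper)."""
    # Assumption: the default extension is added only when none is present.
    _, ext = os.path.splitext(name)
    return name if ext else name + ".qc"

print(resolve_command_file("england"))     # england.qc
print(resolve_command_file("england.qc"))  # england.qc
```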

The quoterc program processes each command file in order, aborting with a short error message if it runs into an error. Command files are interpreted, not compiled, and so anything occurring before an error will be executed (like BASIC).

If quoterc is run without any arguments, it will print out a brief help screen.


Command files

Command files are plain text files made up of sequences of sections consisting of a header and some commands delimited by braces:

<mode> <type> <database> {
	<command1>;
	<command2>;
	   :
	<commandN>;
}

(It is not necessary to indent the command lines; this author simply does so from long habit of programming). There can be as many such sections as the user wishes. Keywords are case-insensitive; that is, COMPILE is the same as compile, and so on (but the database name will be case-sensitive if the local file system is, of course).

<mode>
Either the word "compile" or the word "decompile", specifying whether the database is being compiled from a text file, or de-compiled into one.
<type>
Either the word "authors" or the word "quotes", specifying what type of database is being used.
<database>
The name of the database being processed, without extension. If the database is to be de-compiled, it must exist or an error will be reported.

Each command is a line consisting of a keyword, one or more arguments and then a semi-colon. Possible commands are:

append <file> <format>;
If compiling, add all the quotes/authors in <file> to the database. If de-compiling, append all the quotes/authors in the database to <file>, creating the file if it doesn't exist.
author <code>;
(For compiling quotes only). Assign this author code to all the quotes.
authors <database>;
(For de-compiling quotes only). Obtain author information from the author database <database>. The database name should have no extension.
create <file> <format>;
If compiling, create a new database (deleting any old one of the same name) containing the quotes/authors in <file>. If de-compiling, create a file <file> containing all quotes/authors in the database.
source <title>;
(For compiling quotes only). Assign this source title to all the quotes.
stem <code>;
(For compiling only). Automatically-generated quote/author codes will begin with <code>.

The <format> string is detailed in the format string section below.

If one of the parameters contains a space, tab or semi-colon, this must be escaped by use of a percent sign to distinguish it from the word delimiters. That is, use '% ' for a space and '%;' for a semi-colon. Use '%%' for a percent sign.
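
The escaping rule can be illustrated with a small Python sketch. It is not the compiler's own code: the function names split_command and unescape are hypothetical. It also assumes that every %-pair is kept together while splitting (so that format strings pass through untouched), and that only plain parameters such as titles are unescaped afterwards.

```python
def split_command(text: str) -> list[str]:
    """Split one command into words, stopping at the first unescaped ';'."""
    words, current, i = [], "", 0
    while i < len(text):
        ch = text[i]
        if ch == "%" and i + 1 < len(text):
            # Keep the escape pair together so it is not seen as a delimiter.
            current += text[i:i + 2]
            i += 2
            continue
        if ch == ";":
            break
        if ch in " \t\r\n":
            if current:
                words.append(current)
                current = ""
        else:
            current += ch
        i += 1
    if current:
        words.append(current)
    return words

def unescape(word: str) -> str:
    """Turn '% ', '%;' and '%%' back into the plain characters."""
    out, i = "", 0
    while i < len(word):
        if word[i] == "%" and i + 1 < len(word) and word[i + 1] in " \t;%":
            out += word[i + 1]
            i += 2
        else:
            out += word[i]
            i += 1
    return out

words = split_command("source The% Tragedy% of% Hamlet;")
print([unescape(w) for w in words])  # ['source', 'The Tragedy of Hamlet']
```

A format string such as %f%_(%b-%x):%_%d%n comes out of split_command as a single word, ready to be interpreted by the format-string rules rather than by unescape.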

When compiling, if the format string does not contain a %a (authors) or %q (quotes) item, the author/quote code will be generated automatically using a counter, and the current stem as specified by the "stem" command. The stem defaults to an empty string. With an empty stem, the first code will be 0000000000000000001, the second 0000000000000000002, and so on. When in append mode, codes that are already in the database will not be generated for the new data.
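
As a rough Python sketch of this numbering scheme (not the compiler's actual code): the name next_code is hypothetical, and the fixed width of 19 characters is an assumption inferred from the codes shown in the examples section, such as DANISH0000000000001.

```python
CODE_WIDTH = 19  # assumption: inferred from the 19-character example codes

def next_code(stem: str, existing: set[str], start: int = 1) -> str:
    """Return the first stem+counter code that is not already in use."""
    counter = start
    while True:
        # Pad the counter with zeros so stem and digits fill the full width.
        code = stem + str(counter).zfill(CODE_WIDTH - len(stem))
        if code not in existing:
            return code
        counter += 1

in_database = {next_code("DANISH", set())}
print(next_code("DANISH", in_database))  # DANISH0000000000002
```

Passing the set of codes already in the database models append mode, where codes that exist are skipped rather than reused.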


Format Strings

The <format> string specifies the format of the text file. When de-compiling, it specifies what the output will look like, and when compiling it specifies what the Quoteriser should expect to find in its input files. Designing a good format string can be quite an art form, particularly for compilation.

A format string is a sequence of characters and special symbols a little like the format argument to the printf() function in C. Ordinary characters are written verbatim to the output when de-compiling, or are expected to exist verbatim in the input. Special symbols are used to indicate where the database data should be.

Each data item is represented by a % character followed by another character, as follows:

%a
Author code
%b
The author's birth date
%d
The author's description
%f
The author's given name
%g
The author's surname
%q
Quote code
%s
The quote's source
%t
The quote text
%x
The author's death date
%1, %2, %3, %4 and %5
Quote keywords

In addition to the data items, there are also some special symbols for producing and recognising characters that can't be inserted verbatim into the command file:

%n
A new line.
%%
A percent sign.
%_
When de-compiling, this prints a space character. When compiling, this symbol will match any white space character, that is, a tab, a space or a new line. Note that the CR/LF pair at the end of the line on PC systems is treated as a single white space character.
%;
A semi-colon.
%>
A tab character.

Note that the special symbols are case-sensitive, that is, "%A" is different from "%a". If a % character is followed by a character that does not form a valid special symbol, the Quoteriser will ignore the % character.

A format string must start with one of the data item symbols.

When de-compiling, the Quoteriser simply prints each character (replacing special symbols with database data) in sequence.
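
The de-compiling direction is simple enough to sketch in a few lines of Python. This is an illustration, not the Quoteriser's code: the function render and the dictionary keyed by item letter are hypothetical, and an invalid symbol is assumed to print the character that follows the ignored %.

```python
DATA_ITEMS = set("abdfgqstx12345")
LITERALS = {"n": "\n", "%": "%", "_": " ", ";": ";", ">": "\t"}

def render(fmt: str, record: dict[str, str]) -> str:
    """Expand a format string for one database record."""
    out, i = [], 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "%" and i + 1 < len(fmt):
            sym = fmt[i + 1]
            if sym in LITERALS:
                out.append(LITERALS[sym])
            elif sym in DATA_ITEMS:
                out.append(record.get(sym, ""))  # missing fields print as empty
            else:
                out.append(sym)  # assumption: drop the '%', keep the character
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

richard = {"a": "PLANTAGENET00000002", "f": "Richard I",
           "b": "1189", "x": "1199", "d": "Absent for most of his reign"}
print(render("%a%_--%_%f%_(%b-%x)%>%d%n", richard), end="")
```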

When compiling, the Quoteriser uses the non-data items to delimit items that it reads into the database. There is, therefore, quite an art to selecting delimiting patterns that do not occur anywhere in the input data. For example, delimiting with the obvious punctuation symbols such as commas and semi-colons can be hazardous because quote texts and author descriptions generally contain such punctuation, and this will hopelessly confuse the compiler. Quote texts and author descriptions will need to be delimited by combinations of symbols that do not occur in ordinary text, such as multiple blank lines and conglomerations of punctuation marks.
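
One way to picture the compiling direction is to translate a format string into a regular expression, as in the Python sketch below. This is a stand-in technique chosen for illustration. The compiler's real matching algorithm is not described in this document, and the names format_to_regex and parse are hypothetical. Each data item becomes a shortest-possible match that ends at the next delimiter, which is why a delimiter that also occurs inside the data cuts the item short.

```python
import re

DATA_ITEMS = set("abdfgqstx12345")

def format_to_regex(fmt: str):
    """Translate a format string into (compiled regex, list of item letters)."""
    if not (len(fmt) >= 2 and fmt[0] == "%" and fmt[1] in DATA_ITEMS):
        raise ValueError("format doesn't start with a data item")
    parts, items, i, prev_was_item = [], [], 0, False
    while i < len(fmt):
        ch = fmt[i]
        sym = fmt[i + 1] if ch == "%" and i + 1 < len(fmt) else None
        if sym in DATA_ITEMS:
            if prev_was_item:
                raise ValueError("undelimited data items")
            parts.append("(.*?)")  # shortest text up to the next delimiter
            items.append(sym)
            prev_was_item = True
            i += 2
            continue
        if sym == "_":
            parts.append(r"\s")    # any single white space character
        elif sym == "n":
            parts.append(r"\n")
        elif sym == ">":
            parts.append(r"\t")
        elif sym is not None:
            parts.append(re.escape(sym))  # %%, %; and invalid symbols
        else:
            parts.append(re.escape(ch))   # ordinary character, matched verbatim
        prev_was_item = False
        i += 2 if sym is not None else 1
    return re.compile("".join(parts), re.DOTALL), items

def parse(fmt: str, text: str) -> list[dict[str, str]]:
    """Read consecutive records from text according to fmt."""
    pattern, items = format_to_regex(fmt)
    text = text.replace("\r\n", "\n")  # CR/LF counts as one white space character
    records, pos = [], 0
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None or m.end() == pos:
            break  # input no longer fits the format
        records.append(dict(zip(items, m.groups())))
        pos = m.end()
    return records

line = "Harold I (1037-1040): Regent 1035-1037; king 1037-1040\n"
print(parse("%f%_(%b-%x):%_%d%n", line))
```

In this sketch the semi-colon inside the description is harmless because the description is delimited by the end of the line; had the format used a semi-colon as the delimiter instead, the description would have stopped at "Regent 1035-1037".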


Examples

Example 1: an author database

Suppose we wished to build a biographical database of all the English monarchs. We will be a little cavalier with the meanings of the Quoteriser's fields and use the birth and death fields to represent the monarch's reign. Each monarch will have a given name (including his or her numeral), the limits of his or her reign and a one-line description. We will give a separate sequence of automatically-generated author codes to each house. Our database will be called england. We could do this with the following command file:

compile authors england {
	stem WESTSAXON;
	create westsaxon.txt %f%_(%b-%x):%_%d%n;
	stem DANISH;
	append danish.txt %f%_(%b-%x):%_%d%n;
	stem NORMAN;
	append norman.txt %f%_(%b-%x):%_%d%n;
	   :       :             :
	stem WINDSOR;
	append windsor.txt %f%_(%b-%x):%_%d%n;
}

One way or another, we will have prepared a series of plain text files called westsaxon.txt, danish.txt, and so on, containing the data, all in the same format (though we could have specified different formats for each if we had wanted). For example, the Danish kings are in file danish.txt:

Canute (1016-1035): King of England, Denmark and Norway
Hardicanute (1035-1042): Absent in Denmark 1035-1037; restored 1040-1042
Harold I (1037-1040): Regent 1035-1037; king 1037-1040

Canute will be assigned code DANISH0000000000001, Hardicanute will be assigned code DANISH0000000000002 and Harold will be assigned code DANISH0000000000003. William the Conqueror will be NORMAN0000000000001 and Elizabeth II will be WINDSOR000000000005. As Elizabeth II is still reigning at the time of writing, the closing parenthesis will immediately follow the dash in the date section, and the compiler will read an empty entry for this field.

Having compiled the database, we could dump all of our monarchs into a text file called monarchs.txt using the following command file:

decompile authors england {
	create monarchs.txt %a%_--%_%f%_(%b-%x)%>%d%n;
}

This will produce lines like:

PLANTAGENET00000002 -- Richard I (1189-1199)	Absent for most of his reign
WINDSOR000000000003 -- Edward VIII (1936-1936)	Abdicated due to proposed marriage

Note that the monarchs output in this way will not be in any recognisable order. This could be fixed, for example, by using the 'sort' program.

Example 2: a quote database

We are going to build a database of some famous quotes. Since William Shakespeare has provided so many, we will devote a whole file shakespeare.txt to him, and all the other authors will have to share a file quotes.txt.

compile quotes mydb {
	author SHAKESPEARE;
	create shakespeare.txt %t%n%n-%_%s%n%n;
	append quotes.txt %t%_(%a)%n;
}

Note that the %a item in the quotes.txt line over-rides the specification of SHAKESPEARE as the author. Without this %a, all these quotes would also be attributed to SHAKESPEARE.

Here are some cheery quotes from our shakespeare.txt file:

To be or not to be: that is the question:<br>
Whether 'tis nobler in the mind to suffer<br>
The slings and arrows of outrageous fortune,<br>
Or to take arms against a sea of troubles,<br>
And by opposing end them?

- Hamlet

Life's but a walking shadow, a poor player<br>
That struts and frets his hour upon the stage<br>
And then is heard no more: it is a tale<br>
Told by an idiot, full of sound and fury,<br>
Signifying nothing.

- Macbeth

Since there are line-breaks within the quotes themselves, we need to delimit them with two line breaks (i.e. a blank line). Note that we need an extra blank line at the end of the file to match the last two line breaks when reading the Macbeth quote.
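
The need for the trailing blank line can be demonstrated with a short Python sketch. It uses a hand-written regular expression as a stand-in for the format string %t%n%n-%_%s%n%n; this is an assumption made for illustration, not the compiler's own matching code, and the name read_quotes is hypothetical.

```python
import re

# Hand-written stand-in for the format string %t%n%n-%_%s%n%n
RECORD = re.compile(r"(.*?)\n\n-\s(.*?)\n\n", re.DOTALL)

def read_quotes(text: str) -> list[tuple[str, str]]:
    """Return (quote text, source) pairs found in text."""
    return [(m.group(1), m.group(2)) for m in RECORD.finditer(text)]

body = (
    "And by opposing end them?\n\n- Hamlet\n\n"
    "Signifying nothing.\n\n- Macbeth\n"
)
print(len(read_quotes(body)))         # 1: the Macbeth record is never closed
print(len(read_quotes(body + "\n")))  # 2: the extra blank line completes it
```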

The format of our other quote file is somewhat different:

No man is an island. (DONNE)
Ask not what your country can do for you, but what you can do for your country. (JFK)
I came, I saw, I conquered. (CAESAR)

It is assumed that all of our quotes here will be one-liners, and so we need only a single %n character to delimit them. Note that the item read by %a is the author's code in whatever author database is to be used with the compiled quote database, not the author's name.


Error Messages

The database compiler has a weird and wonderful array of terse and cryptic error messages. Here, we list them all and give long descriptions of what they mean and what may have caused them.

out of memory
The program could not allocate enough memory to do whatever it was trying to do. On modern operating systems with virtual memory, this shouldn't happen unless you do something really silly (like try to read in a five-hundred-megabyte quote).
file error
There was some error while trying to read or write the file. I don't know what might cause this.
unrecognised command keyword
The compiler found a keyword it does not recognise - probably a typo.
database type missing
The "authors" or "quotes" word is missing from the header.
unrecognised database type
Something other than "authors" or "quotes" appears at this position in the header.
missing file name
The database name is missing from the header.
could not open file
The compiler could not open a file, probably because it doesn't exist or the compiler doesn't have permission to open it.
missing format string
The format string is missing from a "create" or "append" line.
invalid data item
There is a data item symbol in the format string that does not make sense for the task at hand - e.g. quote text (%t) when compiling authors.
permission denied
The compiler was denied permission to do something; probably open a file or database.
missing argument
An argument is missing from a command line.
format doesn't start with an data item
The format string doesn't begin with a data item (%a, %q, etc.)
undelimited data items
Two data items are adjacent to each other in the format string with nothing to separate them.
unknown error
Something unanticipated by the author has gone wrong. Please report all occurrences of unknown errors as bugs.