Book Master version 1 documentation

This specification is superseded by bookmaster-2.

This specification and the corresponding bixutls.py program evolved over a period of years as the technologies were being developed. Eventually, the limitations of the XML format as it had evolved became cumbersome, and the original formats from which the bookmaster XML format was derived (bookindex and bookix) had become extinct. At that point, it was no longer necessary to retain XML as a basis for the format. Hence this specification, developed as "bookmaster", has now become "bookmaster version 1", and bixutls.py was enhanced to convert the bookmaster-1 format to the bookmaster-2 format.

As soon as all extant bookmaster-1 files have been converted to bookmaster-2 format, and the bookmaster-2 format is fully documented, this specification will have only historical significance.

For the record, some of the cumbersome issues with XML were:
  • XML generators don't preserve the order of attributes for a tag. Due to the overuse of attributes on the <book> tag, order preservation became important, and generation of the format had to be done with custom code instead of just an XML dump.
  • Overuse of attributes also made it difficult to generate lists of key/value pairs. Over time, the importance of lists of key/value pairs increased, and various workarounds were applied. The straw that broke the camel's back was when key/multi-value tuples were needed. This became extremely cumbersome.
  • Standard XML parsers don't produce particularly useful error messages, although the one used here does pinpoint the exact line and character position of an error, which overcomes the deficiency of the messages somewhat. Errors in the source file always produced "interesting" stack traces.
  • Limitations on the character set used for attribute names required extra work and features to work around them.

The Book Master file format can be used to produce a collection of short items with simple formatting, such as hymns and poems, in various e-book formats usable by generic and customized e-book readers. Note that the goal of the .pdf and .txt formats is not to produce printable books (those are available from other sources) but to provide easily searchable books, so that particular hymns or poems can be located quickly from remembered phrases or words.

bixutls.py program

The bixutls.py program is a set of Python 3.1+ utility functions and file format conversion functions that can be used to manipulate and convert between bookmaster, bookix, and bookindex formats. It can also produce other formats such as plain text, PDF, epub, and mobi (the latter three being standardized publishing formats). In addition to the utility functions, which can be imported into other Python 3.2+ programs, bixutils.py makes most of its functionality available from the command line as a standalone program.

The program was originally created to manipulate bookindex and bookix files. It then seemed appropriate to have a master file format (.master.txt) from which the others could be created as needed, and it was later observed that some other formats could usefully be generated as well. Programmers may contact me for more details about the bixutils.py program.

Song Printer for Windows

One application of the Book Master format and bixutls.py is to convert to and from Song Printer for Windows format. Song Printer for Windows is a music typesetting program intended for creating hymnbooks with 4-part harmony.

File types

The details of the file formats are given below. The bookmaster format is the primary format. The bookindex and bookix formats are similar, so they are defined in terms of concepts defined for the bookmaster format.

All the file formats defined here use UTF-8 character encoding.

bookmaster (.master.txt) format

The bookmaster format is a variation of XML. A DTD has not been created for it, but probably could be. An XML parser will parse it successfully, although without validation. A listing of the tags is below, and for each tag, a listing of attributes that are defined, and their expected values. Programs that process bookmaster files must be prepared to do default handling of new attributes (possibly as simple as ignoring them) which may be defined in the future.
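The "default handling" requirement can be illustrated with a minimal sketch. It assumes Python's standard xml.etree.ElementTree parser rather than whatever parser bixutls.py actually uses. The set of known attribute names contains only the <book> attributes named in this specification, and the `futureattr` attribute in the sample is hypothetical.

```python
import xml.etree.ElementTree as ET

# <book> attribute names mentioned in this specification. The real list is
# longer; this subset is an assumption made for the sake of the example.
KNOWN_BOOK_ATTRS = {"sort", "ideograph", "whitechars", "otherchars"}

def read_book(text):
    """Parse bookmaster text; return (known attrs, ignored attr names, item texts)."""
    root = ET.fromstring(text)
    if root.tag != "book":
        raise ValueError("expected <book> as the enclosing tag")
    # Keep the attributes we understand and quietly set the others aside,
    # so that attributes defined in the future do not break the reader.
    known = {k: v for k, v in root.attrib.items() if k in KNOWN_BOOK_ATTRS}
    ignored = sorted(k for k in root.attrib if k not in KNOWN_BOOK_ATTRS)
    items = [item.text or "" for item in root.iter("item")]
    return known, ignored, items

# "futureattr" is a hypothetical attribute standing in for a later addition.
SAMPLE = (
    '<book sort="abc" futureattr="x">'
    "<info>Sample book</info>"
    "<item>first item</item>"
    "<item>second item</item>"
    "</book>"
)

known, ignored, items = read_book(SAMPLE)
```

Here the unknown attribute is collected in `ignored` instead of raising an error, which is the simplest form of default handling.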

Programs using the bookmaster format include bixutls.py.

Tags

book tag

Used to enclose a complete book. Attributes are:

(1) Note that all characters used in the items in the file must be found in one of sort, ideograph, whitechars, or otherchars.
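The rule in note (1) could be checked along the following lines. This is only a sketch: it assumes that each of the four attributes holds a plain string of literal characters (no ranges or patterns), which this specification does not confirm, and the function name is hypothetical.

```python
def unlisted_characters(item_text, book_attrs):
    """Return the sorted characters of item_text not covered by the book attributes."""
    allowed = set()
    # Assumption: each attribute value is a literal list of characters.
    for name in ("sort", "ideograph", "whitechars", "otherchars"):
        allowed.update(book_attrs.get(name, ""))
    return sorted(set(item_text) - allowed)

attrs = {
    "sort": "abcdefghijklmnopqrstuvwxyz",
    "whitechars": " \n",
    "otherchars": ",.",
}
missing = unlisted_characters("hello, world!\n", attrs)
```

With these sample values, `missing` contains only the exclamation mark, which appears in the item but in none of the attributes.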

info tag

No attributes. Must precede all <item> tags.

Content is descriptive text for the book, preferably in English, possibly also in the language of the book.

item tag

The content of each item in the book consists of the plain text of the item. Lines of text indented with one tab may be italicized in some output formats, and may be considered the chorus or refrain of a hymn. Lines of text beginning with ¤ followed by one tab will be elided in most output formats (used for echo voices for hymns for Song Printer for Windows). Song Printer for Windows needs syllabication, so additional ¤ characters have been used to divide syllables according to the rules for Song Printer for Windows (see its documentation) -- these are all stripped when exporting to other formats.
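The three line-level rules above (chorus lines, elided echo lines, stripped ¤ characters) can be sketched as a small export helper. The function name and return shape are hypothetical, and this is not the code used by bixutls.py; it only shows one way to apply the rules as stated.

```python
SYLLABLE_MARK = "\u00a4"            # the ¤ character
ECHO_PREFIX = SYLLABLE_MARK + "\t"  # ¤ followed by one tab marks an echo line

def export_lines(item_text):
    """Return (text, is_chorus) pairs for the lines kept in a plain export."""
    kept = []
    for line in item_text.splitlines():
        if line.startswith(ECHO_PREFIX):
            continue                          # echo voices are elided
        is_chorus = line.startswith("\t")     # one leading tab marks chorus text
        text = line.replace(SYLLABLE_MARK, "").strip()
        kept.append((text, is_chorus))
    return kept

sample = "A\u00a4maz\u00a4ing grace\n\tHal\u00a4le\u00a4lu\u00a4jah\n\u00a4\techo line\n"
lines = export_lines(sample)
```

An output format that supports italics could use the `is_chorus` flag to style the refrain; a plain-text export can ignore it.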

New rules for syllabication and note skipping were established in 2015. They can be described briefly.

There are 5 special characters: space, hyphen, ¤, ␣, and █. The text corresponding to each musical note (syllable) consists of other characters and ends with one of the first 3 special characters:
  • Space means end of word.
  • Hyphen means a syllable break between syllables of a compound word, where the hyphen is part of the spelling and must appear.
  • ¤ is used for other syllable breaks; a hyphen may optionally be generated if the adjacent syllables must be spaced apart during formatting.

A space is implied at the end of a line if no other syllable-end character is found. The remaining two special characters are:
  • ␣ is a placeholder between syllables, or at the end of a line, where there is a soprano note that has no corresponding text.
  • █ denotes a desired horizontal gap in the music and lyrics, which may alternatively be used as a line break for narrow formats.

These can be translated back to Song Printer for Windows notation if required.
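As an illustration of the 2015 rules, the following sketch splits one lyric line into syllables and rebuilds the plain text. The function names are hypothetical. One point is an assumption not settled by the text above: if ␣ or █ directly follows text with no terminator, the text is closed with an implied space.

```python
SYLLABLE_ENDS = {" ", "-", "\u00a4"}  # space, hyphen, ¤
SKIP = "\u2423"                       # ␣ : soprano note without text
GAP = "\u2588"                        # █ : horizontal gap or optional line break

def split_syllables(line):
    """Split a lyric line into (text, terminator) pairs.

    Placeholders and gaps are returned as (SKIP, "") and (GAP, "").
    """
    tokens = []
    current = ""
    for ch in line:
        if ch in SYLLABLE_ENDS:
            if current:
                tokens.append((current, ch))
                current = ""
        elif ch in (SKIP, GAP):
            if current:
                # Assumption: unterminated text before ␣ or █ ends a word.
                tokens.append((current, " "))
                current = ""
            tokens.append((ch, ""))
        else:
            current += ch
    if current:
        tokens.append((current, " "))  # implied space at end of line
    return tokens

def plain_text(tokens):
    """Rebuild readable text: keep spaces and real hyphens, drop ¤, ␣ and █."""
    parts = []
    for text, end in tokens:
        if text in (SKIP, GAP):
            continue
        parts.append(text + (end if end in (" ", "-") else ""))
    return "".join(parts).strip()

tokens = split_syllables("A\u00a4maz\u00a4ing well-known \u2423 grace")
text = plain_text(tokens)
```

Each pair in `tokens` corresponds to one note, so a music formatter could walk the list in step with the soprano line, while `plain_text` gives the searchable form of the same line.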

Programs that read bookmaster or bookix texts should be prepared to handle new attributes which may be added in the future.

Attributes fall into three categories: those used for Song Printer for Windows only, those used more generally, and those that can be calculated by bixutils.py.
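The naming conventions given in this section ("_SPW" for Song Printer for Windows attributes, two leading underscores for calculated attributes) are enough to sort an item's attributes into the three categories. The sketch below is illustrative only; the function name and the sample attribute names are hypothetical.

```python
def classify_attributes(attrs):
    """Group item attributes by the naming conventions of this specification."""
    groups = {"spw": {}, "general": {}, "calculated": {}}
    for name, value in attrs.items():
        if name.startswith("__"):
            groups["calculated"][name] = value   # can be recomputed
        elif name.startswith("_SPW"):
            groups["spw"][name] = value          # Song Printer for Windows only
        else:
            groups["general"][name] = value
    return groups

# Hypothetical attribute names, chosen only to exercise each prefix.
sample = {"_SPWkey": "G", "__lines": "4", "title": "First"}
groups = classify_attributes(sample)
```

Because unknown names fall through to the general group, attributes added in the future are still carried along rather than rejected.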

Attributes specifically for Song Printer for Windows all start with "_SPW":

General attributes:

Calculated attributes (names start with two underscore characters):

bookix format

New books should all be created as .master.txt files, and bookix should only be an output format from bixutls.py. The bixutls.py program can convert from bookix format to bookindex format or bookmaster format, but the latter needs to be done with care, and with additional information supplied manually.

The bookix format is a variation of XML. A DTD has not been created for it, but probably could be. An XML parser will parse it successfully, although without validation. A listing of the tags is below, and for each tag, a listing of attributes that are defined, and their expected values. Programs that process .master.txt files must be prepared to do default handling of new attributes (possibly as simple as ignoring them) which may be defined in the future.

The bookix format is used as an input format by Book Index version 2.

The bookindex format can be created by bixutls.py from the bookmaster format.

Tags

book tag

Used to enclose a complete book. Attributes are the same as for the .master.txt format book tag.

info tag

Same as for .master.txt format.

item tag

Content is the same as for .master.txt format. The attributes are also the same, but none of the attributes starting with "_SPW" are defined for bookix format.
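Since the only stated difference for the item tag is the absence of "_SPW" attributes, that one step of producing bookix-style items from bookmaster items might look like the sketch below. It assumes xml.etree.ElementTree, uses hypothetical sample attributes, and does not claim to reproduce the conversion that bixutls.py performs, which may involve other steps.

```python
import xml.etree.ElementTree as ET

def drop_spw_attributes(book):
    """Remove, in place, every item attribute whose name starts with "_SPW"."""
    for item in book.iter("item"):
        # Build the list first so the dict is not modified while iterating.
        for name in [n for n in item.attrib if n.startswith("_SPW")]:
            del item.attrib[name]
    return book

# "_SPWkey" and "title" are hypothetical attribute names.
book = ET.fromstring(
    '<book><info>Sample</info>'
    '<item _SPWkey="G" title="First">text</item></book>'
)
drop_spw_attributes(book)
remaining = dict(book.find("item").attrib)
```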

bookindex format

This format is officially unsupported and extinct, as is the Book Index program that used it. New books should all be created as .master.txt files, and bookindex should only be an output format from bixutls.py, with dwindling usage. The bixutls.py program can convert from bookindex format to bookix format or bookmaster format, but this needs to be done with care, and with additional information supplied manually.

The bookindex format is used as an input format by Book Index 2009-10-19.

The bookindex format can be created by bixutls.py from the bookmaster or bookix formats.

The bookindex format is a sequence of XML-like tags with no enclosing tag, so a file may not be well-formed XML. No attributes are used for any of the tags.
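One common way to feed such a tag sequence to a standard XML parser is to wrap it in a synthetic root element first. The sketch below assumes the file content is otherwise acceptable to a strict parser, which may not hold for every existing bookindex file; the root name `bookindex` is made up for the example.

```python
import xml.etree.ElementTree as ET

def parse_bookindex(text):
    """Parse bookindex text by wrapping it in a synthetic root element."""
    # "bookindex" is a hypothetical wrapper name; the format defines no root tag.
    return ET.fromstring("<bookindex>" + text + "</bookindex>")

SAMPLE = (
    "<info>Sample book</info>\n"
    "<item>1. First item</item>\n"
    "<item>2. Second item</item>\n"
)
root = parse_bookindex(SAMPLE)
tags = [child.tag for child in root]
```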

Tags

worddef tag

No attributes. Must be first if it exists.

The content is the same as the value of the sort attribute for the book tag of the .master.txt format, except that it uses Perl regular expression syntax for the list of characters, which requires metacharacters to be escaped. In practice, in existing books created in bookindex format, the only escaped character was "-", so it is the only one converted when reading bookindex format files.
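The conversion described here amounts to undoing a single escape. A minimal sketch, with a hypothetical function name and sample value:

```python
def worddef_to_sort(worddef):
    """Convert a worddef character list to a sort value by unescaping "-"."""
    # Only the escaped hyphen is handled, matching the behaviour described
    # in the text; other Perl escapes are left untouched.
    return worddef.replace("\\-", "-")

sort_value = worddef_to_sort(r"abcdefg\-'")
```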

info tag

No attributes. Must precede all <item> or <hymn> tags.

Content is descriptive text for the book.

item tag

No attributes.

Content is an optional item number (digits followed by a period and whitespace), followed by item text. If a number can be parsed, it is used (with a warning if non-sequential) and begins a new sequence; otherwise a sequence number is used, starting from 1 (or from the last parsed number). Duplicate numbers cause items to be combined; this is not recommended, and is not guaranteed to work in the future.
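The numbering rule can be sketched as follows. Several details are assumptions: leading whitespace before the number is tolerated, "non-sequential" is taken to mean "not one more than the previous number", and the combining of duplicate numbers is not implemented. The function name and return shape are hypothetical.

```python
import re

# Optional leading whitespace, digits, a period, then whitespace.
NUMBER_RE = re.compile(r"\s*(\d+)\.\s+")

def number_items(contents):
    """Return (number, text, warning) triples for the raw contents of item tags."""
    result = []
    last = 0
    for content in contents:
        expected = last + 1
        match = NUMBER_RE.match(content)
        if match:
            number = int(match.group(1))
            text = content[match.end():]
            warning = None
            if number != expected:
                warning = "non-sequential number %d (expected %d)" % (number, expected)
        else:
            # No parsable number: continue the current sequence.
            number, text, warning = expected, content, None
        result.append((number, text, warning))
        last = number
    return result

numbered = number_items(["1. First", "Second", "10. Tenth", "Eleventh"])
numbers = [n for n, _, _ in numbered]
```

In this sample the third item restarts the sequence at 10 and carries a warning, and the unnumbered fourth item continues from it as 11.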

hymn tag

An older form of the <item> tag. Use either one, but don't mix them in the same file.