Tuesday, March 3, 2015

WatSON Composite Ingredients

Simple ingredients represent a single value, like a number or a string. My last post covered the basic details for simple ingredients. Composite ingredients are designed to contain other ingredients or multiple values. Some change how things are written to the file, like the compressed ingredient. Others provide structure to the file, like the container and map ingredients.

For simplicity, I am listing only the 8-bit sizes for the Ingredients, but the structure is valid for any size type.

Byte Reduction Ingredients

The library and compressed ingredients are designed to reduce the number of bytes required by data stored in WatSON format.

Library ingredients contain strings that are used elsewhere in the document. This primarily applies to keys for the map ingredient, but also applies to bytes ingredients. Libraries use a zero-based index (the first element is index zero), but the string at index zero is always an empty string, because index zero is reserved. See the map and bytes ingredients for more details.

Library scope comes up in the descriptions of the bytes and map ingredients. For now I am defining it as the nearest library in a parent container, although scope should get its own dedicated post in the future.

<library-ingredient> ::= ‘L’ <8-bit-size> <empty-string-ingredient> <string-ingredient>*
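As a minimal sketch of the reserved index, the lookup side of a library could be modelled as below. The `Library` class and its method names are hypothetical helpers for illustration only; they are not part of the format or of the reference implementation.

```python
class Library:
    """Hypothetical in-memory view of the strings in a library ingredient."""

    def __init__(self, strings):
        # Index zero is reserved, so it always holds the empty string.
        self._strings = [""] + list(strings)

    def index_of(self, text):
        """Return the index that a map key or marshal hint would store."""
        if text == "":
            return 0
        # Start searching at 1 so the reserved slot is never matched.
        return self._strings.index(text, 1)

    def lookup(self, index):
        """Resolve a stored index back to its string."""
        return self._strings[index]


library = Library(["name", "age"])
```

With this model, `library.lookup(0)` is the empty string and the first real string, `"name"`, sits at index 1.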

Compressed ingredients contain a single ingredient that has been compressed. I toyed around with the idea of making them a container as well, but I felt the single ingredient child would make implementations more straightforward at the cost of a few extra bytes.

I am thinking of using Snappy for the compression method. I haven't decided how flexible that will be in the future.

<compressed-ingredient> ::= ‘Z’ <8-bit-size> <data>*
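A rough sketch of wrapping one already-encoded ingredient follows. It makes several assumptions: `zlib` from the Python standard library stands in for Snappy, which the standard library does not ship; the 8-bit size is taken to be the number of compressed bytes; and the function names are made up for this example.

```python
import zlib


def encode_compressed(ingredient_bytes):
    """Wrap a single encoded ingredient in a 'Z' ingredient."""
    payload = zlib.compress(ingredient_bytes)  # stand-in for Snappy
    if len(payload) > 0xFF:
        raise ValueError("payload does not fit an 8-bit size")
    # Assumption: the size byte counts the compressed payload bytes.
    return b"Z" + bytes([len(payload)]) + payload


def decode_compressed(blob):
    """Return the single child ingredient's bytes from a 'Z' ingredient."""
    if blob[:1] != b"Z":
        raise ValueError("not a compressed ingredient")
    size = blob[1]
    return zlib.decompress(blob[2:2 + size])
```

Because there is exactly one child, the decoder never has to split the decompressed bytes; it hands them straight back to the ingredient parser.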

Structure Ingredients

Container and map ingredients are designed for nesting and for providing structure to a WatSON document.

A container contains other ingredients. It is used for nesting and grouping ingredients. Think of it as a vector or list. Nothing inside a WatSON document references positions in a container. Order will probably be important, especially with regard to library and header ingredients.

<container-ingredient> ::= ‘C’ <8-bit-size> <ingredient>*
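The working definition of library scope given earlier (the nearest library in a parent container) can be sketched with a small tree of containers. Everything here is a hypothetical in-memory model: the `Container` and `Library` classes and `library_in_scope` are illustrative names, and the sketch assumes a library held directly by a container is in scope for that container's other children.

```python
class Library:
    """Hypothetical library node; index zero is the reserved empty string."""

    def __init__(self, strings):
        self.strings = [""] + list(strings)


class Container:
    """Hypothetical container node; children keep their document order."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child


def library_in_scope(container):
    """Walk outward from `container` and return the nearest library, or None."""
    node = container
    while node is not None:
        for child in node.children:
            if isinstance(child, Library):
                return child
        node = node.parent
    return None


root = Container()
outer_library = root.add(Library(["name"]))
inner = root.add(Container(parent=root))
leaf = inner.add(Container(parent=inner))
```

Here `library_in_scope(leaf)` walks past `inner`, which holds no library, and returns `outer_library`. Adding a library to `inner` would shadow the outer one for everything beneath it.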

A map ingredient is a key-value structure. The keys are 32-bit unsigned integers. Positive keys reference the in-scope library. A key of zero is reserved for an empty string key. I think I will be using that key for optional metadata, but I haven’t thought through what that metadata will be or how it will be used. That will probably get a dedicated post in the future.

<map-ingredient> ::= ‘M’ <8-bit-size> <map-data>*
<map-data> ::= <uint32> <ingredient>
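A minimal sketch of writing map entries and resolving their keys is shown below. The grammar does not state a byte order or what the size counts, so this assumes little-endian keys and a size equal to the number of payload bytes. Values are passed in as already-encoded ingredient bytes, and `encode_map` and `resolve_keys` are hypothetical names.

```python
import struct


def encode_map(entries):
    """Encode (library_index, encoded_ingredient_bytes) pairs as an 'M' ingredient."""
    payload = b"".join(
        struct.pack("<I", key) + child  # assumption: little-endian uint32 key
        for key, child in entries
    )
    if len(payload) > 0xFF:
        raise ValueError("payload does not fit an 8-bit size")
    # Assumption: the size byte counts the payload bytes.
    return b"M" + bytes([len(payload)]) + payload


def resolve_keys(decoded, library_strings):
    """Swap integer keys for the strings they reference in the in-scope library."""
    return {library_strings[key]: value for key, value in decoded.items()}
```

For example, `resolve_keys({0: "meta", 1: "Ada"}, ["", "name"])` maps key zero to the empty string and key one to `"name"`.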

Extension Ingredients

Binary and header ingredients change how a parser should interpret WatSON data that follows.

Binary ingredients store opaque binary data. A positive marshal hint is a reference into the in-scope library. A marshal hint of zero is reserved for an undefined marshal hint. I am thinking of the marshal hint as a place to store the run-time type. WatSON doesn’t specify anything about the data itself.

<bytes-ingredient> ::= ‘B’ <8-bit-size> <marshal-hint> <data>*
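The layout above can be sketched as an encode and decode pair. The width of the marshal hint is not given, so this assumes a little-endian 32-bit unsigned integer to match the map keys, and it assumes the size counts only the data bytes that follow the hint. The function names are hypothetical.

```python
import struct


def encode_bytes(data, marshal_hint=0):
    """Encode opaque data as a 'B' ingredient; hint 0 means undefined."""
    if len(data) > 0xFF:
        raise ValueError("data does not fit an 8-bit size")
    # Assumption: the hint is a little-endian uint32, like map keys.
    return b"B" + bytes([len(data)]) + struct.pack("<I", marshal_hint) + data


def decode_bytes(blob):
    """Return (marshal_hint, data) from a 'B' ingredient."""
    if blob[:1] != b"B":
        raise ValueError("not a bytes ingredient")
    size = blob[1]
    (hint,) = struct.unpack_from("<I", blob, 2)
    return hint, blob[6:6 + size]
```

A reader that does not recognise the hint can still skip or copy the data, since nothing about the payload depends on interpreting it.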

Headers are string-based maps that contain information about the file contents. They are optional ingredients, used to document requirements for parsing the rest of the file. An example would be the character encoding for strings, although I am leaning towards UTF-8 being mandatory. They can also be used to document schema information or metadata, like the program that generated the file.

<header-ingredient> ::= ‘H’ <8-bit-size> <header-data>*
<header-data> ::= <c-string> <ingredient>
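Unlike map keys, header keys are stored inline as C strings. A small sketch of writing and reading one key, assuming a NUL-terminated UTF-8 key followed directly by the encoded ingredient; both function names are hypothetical.

```python
def encode_header_entry(key, value_bytes):
    """Join a C-string key and an already-encoded ingredient."""
    raw = key.encode("utf-8")  # assumption: keys are UTF-8
    if b"\x00" in raw:
        raise ValueError("a C string cannot contain a NUL byte")
    return raw + b"\x00" + value_bytes


def read_c_string(blob, offset=0):
    """Return (text, offset just past the terminating NUL)."""
    end = blob.index(b"\x00", offset)
    return blob[offset:end].decode("utf-8"), end + 1
```

The returned offset is where the entry's ingredient begins, so a parser would call `read_c_string` and then hand the remaining bytes to the normal ingredient reader.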

I like how the format is coming together. The foundation of my reference implementation, incomplete and rough as it is, is checked in on GitHub.