Cinelab file format
-------------------

Introduction
============

The data model defined in the previous sections must be serialized in order to be exchanged. To this end, two serialization formats have been specified to improve interoperability between applications that use the Cinelab data model. The first format is based on XML, while the second one uses the JSON syntax. In addition, a Zip-based serialization has been specified to make it more convenient to store large, non-text resources.

File extensions
===============

To ease the identification of Cinelab packages, applications SHOULD use the following file extensions:

* ``.cxp`` for plain XML files (Cinelab XML Package),
* ``.czp`` for compressed files (Cinelab Zip Package),
* ``.cjp`` for JSON files (Cinelab JSON Package).

Cinelab Zip-Packages
====================

XML does not offer a standard way to handle large binary objects such as images or application files. Moreover, plain XML files can become very large. The same arguments apply to the JSON syntax. We thus use an OpenDocument-like format to store the XML representation of a package together with its associated binary files, and to compress this content.

The file is a standard Zip file, whose structure is described below. Information about the files present in the package is stored in an XML *manifest* file, which is always stored as ``META-INF/manifest.xml``. Its main data is:

* a list of all package files;
* the MIME type of each file;
* if one of the files is encrypted, the information necessary to decrypt it.

General layout
~~~~~~~~~~~~~~

The package is serialized as a .zip file, using the same layout and principles as the OpenDocument format (see pp. 684-692 of http://www.oasis-open.org/committees/download.php/12572/OpenDocument-v1.0-os.pdf).
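
As an illustration, the following Python sketch lists the files of a Zip package together with their MIME types, as recorded in ``META-INF/manifest.xml``. It assumes that the manifest follows the OpenDocument manifest vocabulary (namespace ``urn:oasis:names:tc:opendocument:xmlns:manifest:1.0``, ``file-entry`` elements carrying ``full-path`` and ``media-type`` attributes); the helper name ``list_manifest`` is hypothetical, and encryption data is ignored.

.. code-block:: python

   import zipfile
   import xml.etree.ElementTree as ET

   # Assumption: the manifest uses the OpenDocument manifest namespace.
   MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


   def list_manifest(source):
       """Return a {path: media type} dict read from META-INF/manifest.xml.

       source is a path or a binary file-like object holding the Zip package.
       Hypothetical helper: only file entries are read; any encryption
       information that the manifest may contain is ignored.
       """
       with zipfile.ZipFile(source) as archive:
           root = ET.fromstring(archive.read("META-INF/manifest.xml"))
       entries = {}
       for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
           path = entry.get(f"{{{MANIFEST_NS}}}full-path")
           entries[path] = entry.get(f"{{{MANIFEST_NS}}}media-type", "")
       return entries

For instance, ``list_manifest("example.czp")`` would return a dictionary mapping ``content.xml`` and the other listed package files to their declared MIME types.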

General layout:

* ``mimetype``: the MIME type (``application/x-advene-zip-package``)
* ``content.xml``: the plain XML package
* ``userfiles/``: a file hierarchy (accessible through the ``package/resources`` TALES path) containing any file (CSS, icons...) necessary to build views from the package. From the core model point of view, each file is a resource whose id has the ``:user_file:`` prefix and encodes the path by separating directory names with ``:``. NB: directories also appear as resources, of the specific type ``inode/directory``.
* ``data/``: internal data associated with the package (rich/externalized annotation content)
* ``preview.xml``: aggregated statistical data, to ease previews/searches

Contents (of annotations, relations, views, etc.) can either be stored directly in the XML file, or externalized in the ``data/`` directory (cf. OpenDocument p. 686).

In a given file contained in a package, relative URIs are used to reference other files of the same package, but also files of the filesystem. The following restrictions are imposed on internal references:

* only files of the same package can be referenced internally;
* URIs referencing another file of the same package MUST be relative and MUST NOT contain paths that are not part of the package. This notably means that files in a package MUST NOT be referenced through an absolute URI;
* a file in a package cannot be referenced from outside the package (either from the filesystem or from another package).

A relative path present in a file contained in a package must be parsed exactly as it would be if the package were uncompressed in a directory with the same basename as the package. The base URI of a relative path is the URI of the directory containing the file in which the relative path appears. For instance, in ``content.xml``, the relative path ``userfiles/foo.txt`` references a user file (package resource), while ``../file.txt`` references a file located in the same directory as the package.
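
The resolution rules above can be sketched in Python as follows. The function name ``classify_reference`` and its three result labels are hypothetical; paths are handled as POSIX-style strings relative to the package root, and query strings and fragments are ignored.

.. code-block:: python

   import posixpath
   from urllib.parse import urlsplit


   def classify_reference(referencing_file, ref):
       """Classify a URI reference found in a file stored inside a package.

       referencing_file is the path of that file relative to the package
       root (e.g. "content.xml" or "data/a1.xml"). Returns a (kind, target)
       tuple, where kind is one of these hypothetical labels:
         "package"    -> target is a path inside the package
         "filesystem" -> target is relative to the directory holding the package
         "unchanged"  -> the reference needs no specific processing
       """
       parts = urlsplit(ref)
       # Scheme (http:), authority (//) or absolute path (/): no processing.
       if parts.scheme or parts.netloc or parts.path.startswith("/"):
           return ("unchanged", ref)
       # The base is the directory containing the referencing file, as if the
       # package were uncompressed in a directory named after it.
       base_dir = posixpath.dirname(referencing_file)
       target = posixpath.normpath(posixpath.join(base_dir, parts.path))
       if target == "..":
           return ("filesystem", ".")
       if target.startswith("../"):
           # Leaving the package root by one level lands in the directory
           # that contains the package.
           return ("filesystem", target[3:])
       return ("package", target)

A reference that leaves the package and comes back into it (e.g. ``../example/userfiles/foo.txt``) is classified here as a filesystem reference, which is consistent with the rule that package files cannot be referenced from the outside.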

Any other URI reference, in particular one that specifies a scheme (e.g. ``http:``), an authority (i.e. ``//``) or an absolute path (i.e. ``/``), needs no specific processing. This means that absolute paths do not reference files inside the package, but inside the hierarchy (most of the time, the filesystem) containing the package.

Thumbnails
~~~~~~~~~~

A graphical, iconic representation of the document MAY be generated when the file is saved. It should represent the default view of the package, and should be generated without effects, frames or borders. The icon is saved as ``Thumbnails/thumbnail.png``. The file and its containing directory are not listed in the ``manifest.xml`` file, since they are not really part of the document.

In accordance with the *Thumbnail Managing Standard* (TMS) (cf. www.freedesktop.org), icons MUST be saved as 24-bit, non-interlaced PNG files with full alpha transparency. The required size is 128x128 pixels.

Manifest file
~~~~~~~~~~~~~

See the OpenDocument specification, p. 687.

XML serialization
=================

Encoding
~~~~~~~~

The encoding of the XML serialization MUST be UTF-8.

Metadata
~~~~~~~~

In accordance with the model, package metadata MUST contain the following keys: ``dc:creator``, ``dc:created``, ``dc:contributor``, ``dc:modified``. In package elements, these metadata may be omitted from the serialization; they are then inherited (since they must be available in the model) using the following rules:

* ``dc:creator``, ``dc:created``: the element inherits the value from its package;
* ``dc:contributor``: if ``dc:creator`` is explicitly specified for the element, its value is used; otherwise, the package's ``dc:contributor`` value is used;
* ``dc:modified``: if ``dc:created`` is explicitly specified for the element, its value is used; otherwise, the package's ``dc:modified`` value is used.

The `example XML file`_ illustrates several of these cases in comments.

.. _`example XML file`: http://advene.org/cinelab/example.cxp

Namespaces
~~~~~~~~~~

The package ``pm:namespaces`` metadata is processed specifically: it is encoded in the XML root element as ``xmlns`` attributes.

Type declaration
~~~~~~~~~~~~~~~~

To make the generated XML easier to read, some metadata specified in the applicative model are encoded as attributes instead of plain metadata (``type`` for annotations and relations), or as elements (*annotation-type*, *relation-type*, *schema*). See the RelaxNG schema below for more information.

RelaxNG schema
~~~~~~~~~~~~~~

The compact RelaxNG notation is used to specify the proposed format: cinelab.rnc_

.. _cinelab.rnc: http://advene.org/cinelab/cinelab.rnc

.. literalinclude:: ../cinelab.rnc

Example XML file
~~~~~~~~~~~~~~~~

An example of conforming XML is given below, and can be `downloaded here`_.

.. _`downloaded here`: http://advene.org/cinelab/example.cxp

.. literalinclude:: ../example.cxp
   :language: xml

JSON serialization
==================

The JSON serialization has been defined to facilitate the exchange of package information in web-based contexts.

Encoding
~~~~~~~~

The encoding of the JSON serialization MUST be UTF-8.

Type declaration
~~~~~~~~~~~~~~~~

To make the generated JSON easier to read, some metadata specified in the applicative model are encoded as properties instead of plain metadata (``type`` for annotations and relations), or as dedicated objects (*annotation-type*, *relation-type*, *schema*).
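
A minimal sketch of producing a UTF-8 JSON package skeleton with Python's standard ``json`` module follows. The ``format`` value and the top-level array names are those listed in the General layout subsection below; the helper names ``new_package`` and ``dump_package`` are hypothetical, and the annotation fields other than ``type`` and ``meta`` (``id``, ``media``, ``begin``, ``end``) are assumptions for illustration only.

.. code-block:: python

   import json

   CINELAB_FORMAT = "http://advene.org/ns/cinelab/"

   # Arrays of element objects held at the top level of a JSON package.
   ELEMENT_ARRAYS = (
       "imports", "medias", "annotations", "relations", "tags",
       "annotation_types", "relation_types", "lists", "schemas",
       "queries", "views", "resources",
   )


   def new_package(meta):
       """Return an empty package object (hypothetical helper)."""
       package = {"format": CINELAB_FORMAT, "meta": dict(meta)}
       for name in ELEMENT_ARRAYS:
           package[name] = []
       return package


   def dump_package(package):
       """Serialize a package as UTF-8 bytes, keeping non-ASCII text as-is."""
       return json.dumps(package, ensure_ascii=False, indent=2).encode("utf-8")


   pkg = new_package({"creator": "alice", "created": "2010-01-01",
                      "contributor": "alice", "modified": "2010-01-02"})
   pkg["annotation_types"].append({"id": "shot", "meta": {}})
   # The annotation type is given as a "type" property, not as metadata.
   # Field names other than "type" and "meta" are illustrative assumptions.
   pkg["annotations"].append({"id": "a1", "type": "shot", "media": "m1",
                              "begin": 0, "end": 1500,
                              "meta": {"creator": "Élodie"}})
   data = dump_package(pkg)

Passing ``ensure_ascii=False`` keeps accented characters readable instead of escaping them as ``\uXXXX``; both forms are valid JSON, and the final ``encode("utf-8")`` call produces the mandated UTF-8 bytes.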

General layout
~~~~~~~~~~~~~~

The package is represented by a JSON object with the following properties:

* ``format``: always ``"http://advene.org/ns/cinelab/"``;
* each of the following properties references an array of JSON objects: ``imports``, ``medias``, ``annotations``, ``relations``, ``tags``, ``annotation_types``, ``relation_types``, ``lists``, ``schemas``, ``queries``, ``views``, ``resources``;
* for every element (the top-level package and all defined model elements), an associated ``meta`` object holds its metadata, both model-defined and user-defined.

Metadata
~~~~~~~~

In accordance with the model, package metadata MUST contain the following keys: ``creator``, ``created``, ``contributor``, ``modified``. In package elements, these metadata may be omitted from the serialization; they are then inherited (since they must be available in the model) using the following rules:

* ``creator``, ``created``: the element inherits the value from its package;
* ``contributor``: if ``creator`` is explicitly specified for the element, its value is used; otherwise, the package's ``contributor`` value is used;
* ``modified``: if ``created`` is explicitly specified for the element, its value is used; otherwise, the package's ``modified`` value is used.

The `example JSON file`_ below provides a sample package.

.. _`example JSON file`: http://advene.org/cinelab/example.cjp

.. literalinclude:: ../example.cjp
   :language: json

JSON-Schema
~~~~~~~~~~~

Two `JSON-Schema`_ schemas are proposed: a `general schema`_ and a `stricter schema`_ that does not allow additional, undefined properties to be added to elements.

.. _`JSON-Schema`: http://json-schema.org/
.. _`general schema`: http://advene.org/cinelab/cinelab.jsons
.. _`stricter schema`: http://advene.org/cinelab/cinelab-strict.jsons

The more permissive schema is included below:

.. literalinclude:: ../cinelab.jsons
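
The metadata inheritance rules of the Metadata subsections above can be sketched as a small Python function operating on the JSON keys (the XML rules are identical, with the ``dc:`` prefix). The function name ``effective_meta`` is hypothetical; ``element_meta`` is assumed to contain only the values explicitly serialized for the element, which take precedence over any inherited value.

.. code-block:: python

   def effective_meta(element_meta, package_meta):
       """Return the creator/created/contributor/modified values of an element.

       element_meta holds what was explicitly serialized for the element;
       package_meta holds the package metadata, where all four keys are
       mandatory.
       """
       result = {}
       # creator, created: inherited from the package when omitted.
       for key in ("creator", "created"):
           result[key] = element_meta.get(key, package_meta[key])
       # contributor: the element's explicit creator, else the package value.
       if "contributor" in element_meta:
           result["contributor"] = element_meta["contributor"]
       elif "creator" in element_meta:
           result["contributor"] = element_meta["creator"]
       else:
           result["contributor"] = package_meta["contributor"]
       # modified: the element's explicit creation date, else the package value.
       if "modified" in element_meta:
           result["modified"] = element_meta["modified"]
       elif "created" in element_meta:
           result["modified"] = element_meta["created"]
       else:
           result["modified"] = package_meta["modified"]
       return result

For an element that only specifies ``creator`` and ``created``, this sketch reports the same person as its contributor and its creation date as its modification date, whereas an element with no metadata at all simply mirrors the package values.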