path: root/data/doc/manuals_generated/sisu_manual/sisu_introduction/sax.xml
Diffstat (limited to 'data/doc/manuals_generated/sisu_manual/sisu_introduction/sax.xml')
-rw-r--r-- data/doc/manuals_generated/sisu_manual/sisu_introduction/sax.xml 599
1 files changed, 0 insertions, 599 deletions
diff --git a/data/doc/manuals_generated/sisu_manual/sisu_introduction/sax.xml b/data/doc/manuals_generated/sisu_manual/sisu_introduction/sax.xml
deleted file mode 100644
index 4264c388..00000000
--- a/data/doc/manuals_generated/sisu_manual/sisu_introduction/sax.xml
+++ /dev/null
@@ -1,599 +0,0 @@
-<?xml version="1.0" encoding="UTF-8" standalone="no"?>
-<?xml-stylesheet type="text/css" href="../_sisu/css/sax.css"?>
-<!-- Document processing information:
- * Generated by: SiSU 0.59.1 of 2007w39/2 (2007-09-25)
- * Ruby version: ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]
- *
- * Last Generated on: Tue Sep 25 02:52:54 +0100 2007
- * SiSU http://www.jus.uio.no/sisu
--->
-
-<document>
-<head>
- <meta>Title:</meta>
- <title class="dc">
- SiSU - Commands
- </title>
- <br />
- <meta>Creator:</meta>
- <creator class="dc">
- Ralph Amissah
- </creator>
- <br />
- <meta>Rights:</meta>
- <rights class="dc">
- Copyright (C) Ralph Amissah 2007, part of SiSU documentation, License GPL 3
- </rights>
- <br />
- <meta>Type:</meta>
- <type class="dc">
- information
- </type>
- <br />
- <meta>Subject:</meta>
- <subject class="dc">
- ebook, epublishing, electronic book, electronic publishing, electronic document, electronic citation, data structure, citation systems, search
- </subject>
- <br />
- <meta>Date created:</meta>
- <date_created class="extra">
- 2002-08-28
- </date_created>
- <br />
- <meta>Date issued:</meta>
- <date_issued class="extra">
- 2002-08-28
- </date_issued>
- <br />
- <meta>Date available:</meta>
- <date_available class="extra">
- 2002-08-28
- </date_available>
- <br />
- <meta>Date modified:</meta>
- <date_modified class="extra">
- 2007-09-16
- </date_modified>
- <br />
- <meta>Date:</meta>
- <date class="dc">
- 2007-09-16
- </date>
- <br />
-</head>
-<body>
-<object id="1">
- <ocn>1</ocn>
- <text class="h1">
- SiSU - Commands,<br /> Ralph Amissah
- </text>
-</object>
-<object id="2">
- <ocn>2</ocn>
- <text class="h2">
- What is SiSU?
- </text>
-</object>
-<object id="3">
- <ocn>3</ocn>
- <text class="h3">
- Description
- </text>
-</object>
-<object id="4">
- <ocn>4</ocn>
- <text class="h4">
- 1. Introduction - What is SiSU?
- </text>
-</object>
-<object id="5">
- <ocn>5</ocn>
- <text class="norm">
- <b>SiSU</b> is a system for document markup, publishing (in multiple
-open standard formats) and search
- </text>
-</object>
-<object id="6">
- <ocn>6</ocn>
- <text class="norm">
- <b>SiSU</b><en>1</en> is a<en>2</en> framework for document
-structuring, publishing and search, comprising (a) a lightweight
-document structure and presentation markup syntax and (b) an
-accompanying engine that generates standard document format outputs
-from documents prepared in sisu markup syntax. The engine is able to
-produce multiple standard outputs that (can) share a common numbering
-system for the citation of text within a document.
- </text>
- <endnote notenumber="1">
- <number>1</number>
- <note>
- "<b>SiSU</b> information Structuring Universe" or "Structured
-information, Serialized Units".<br /> also chosen for the meaning of
-the Finnish term "sisu".
- </note>
- </endnote>
- <endnote notenumber="2">
- <number>2</number>
- <note>
- Unix command line oriented
- </note>
- </endnote>
-</object>
-<object id="7">
- <ocn>7</ocn>
- <text class="norm">
- <b>SiSU</b> is developed under an open source, software libre license
-(GPL3). It has been developed in the context of coping with large
-document sets with evolving markup related technologies, for which you
-want multiple output formats, a common mechanism for
-cross-output-format citation, and search.
- </text>
-</object>
-<object id="8">
- <ocn>8</ocn>
- <text class="norm">
- <b>SiSU</b> both defines a markup syntax and provides an engine that
-produces open standards format outputs from documents prepared with
-<b>SiSU</b> markup. From a single lightly prepared document sisu custom
-builds several standard output formats which share a common (text
-object) numbering system for citation of content within a document
-(that also has implications for search). The sisu engine works with an
-abstraction of the document's structure and content from which it is
-possible to generate different forms of representation of the document.
-Significantly, <b>SiSU</b> markup is sparser than html, and the outputs
-include html, LaTeX, landscape and portrait pdfs, and Open Document
-Format (ODF), all of which can be added to and updated. <b>SiSU</b> is
-also able to populate SQL type databases at an object level, which
-means that searches can be made with that degree of granularity.
-Matching objects (primarily paragraphs and headings) can be viewed
-directly in the database, or just the object numbers shown, indicating
-that your search criteria are met in these documents and at these
-locations within each document.
- </text>
-</object>
-<object id="9">
- <ocn>9</ocn>
- <text class="norm">
- Source document preparation and output generation are a two-step
-process: (i) document source is prepared, that is, marked up in sisu
-markup syntax and (ii) the desired output is subsequently generated by
-running the sisu engine against document source. Output representations
-if updated (in the sisu engine) can be generated by re-running the
-engine against the prepared source. Using <b>SiSU</b> markup applied to
-a document, <b>SiSU</b> custom builds various standard open output
-formats including plain text, HTML, XHTML, XML, OpenDocument, LaTeX or
-PDF files, and populates an SQL database with objects<en>3</en>
-(equating generally to paragraph-sized chunks) so searches may be
-performed and matches returned with that degree of granularity (e.g.
-your search criteria are met by these documents and at these locations
-within each document). Document output formats share a common object
-numbering system for locating content. This is particularly suitable
-for "published" works (finalized texts as opposed to works that are
-frequently changed or updated) for which it provides a fixed means of
-reference of content.
- </text>
- <endnote notenumber="3">
- <number>3</number>
- <note>
- objects include: headings, paragraphs, verse, tables, images, but not
-footnotes/endnotes which are numbered separately and tied to the object
-from which they are referenced.
- </note>
- </endnote>
-</object>
-<object id="10">
- <ocn>10</ocn>
- <text class="norm">
- In preparing a <b>SiSU</b> document you optionally provide semantic
-information related to the document in a document header, and in
-marking up the substantive text provide information on the structure of
-the document, primarily indicating heading levels and footnotes. You
-also provide information on basic text attributes where used. The rest
-is automatic: from this information sisu custom builds<en>4</en> the
-different forms of output requested.
- </text>
- <endnote notenumber="4">
- <number>4</number>
- <note>
- i.e. the html, pdf, odf outputs are each built individually and
-optimised for that form of presentation, rather than for example the
-html being a saved version of the odf, or the pdf being a saved version
-of the html.
- </note>
- </endnote>
-</object>
-<object id="11">
- <ocn>11</ocn>
- <text class="norm">
- <b>SiSU</b> works with an abstraction of the document based on its
-structure, which comprises its frame<en>5</en> and the
-objects<en>6</en> it contains. This enables <b>SiSU</b> to represent
-the document in many different ways, and to take advantage of the
-strengths of different ways of presenting documents. The objects are
-numbered, and these numbers can be used to provide a common base for
-citing material within a document across the different output format
-types. This is significant as page numbers are not suited to the
-digital age: in web publishing, changing a browser's default font or
-using a different browser means that text appears on different pages;
-and in publishing in different formats (html, landscape and portrait
-pdf etc.) page numbers are again of no use for citing text in a manner
-that is relevant across the different output types. Dealing with documents
-at an object level together with object numbering also has implications
-for search.
- </text>
- <endnote notenumber="5">
- <number>5</number>
- <note>
- the different heading levels
- </note>
- </endnote>
- <endnote notenumber="6">
- <number>6</number>
- <note>
- units of text, primarily paragraphs and headings, also any tables,
-poems, code-blocks
- </note>
- </endnote>
-</object>
-<object id="12">
- <ocn>12</ocn>
- <text class="norm">
- One of the challenges of maintaining documents is to keep them in a
-format that allows users to use them without depending on whatever
-proprietary software is popular at the time. Consider the ease of dealing
-with legacy proprietary formats today and what guarantee you have that
-old proprietary formats will remain (or can be read without proprietary
-software/equipment) in 15 years' time, or the way in which html
-has evolved over its relatively short span of existence. <b>SiSU</b>
-provides the flexibility of outputting documents in multiple
-non-proprietary open formats including html, pdf<en>7</en> and the ISO
-standard ODF.<en>8</en> Whilst <b>SiSU</b> relies on software, the
-markup is uncomplicated and minimalistic, which guarantees that future
-engines can be written to run against it. It is also easily converted
-to other formats, which means documents prepared in <b>SiSU</b> can be
-migrated to other document formats. Further security is provided by the
-fact that the software itself, <b>SiSU</b>, is available under GPL3, a
-licence that guarantees that the source code will always be open, and
-free as in libre, which means that the code base can be used, updated
-and further developed as required under the terms of its license.
-Another challenge is to keep up with a moving target. <b>SiSU</b>
-permits new forms of output to be added as they become important (Open
-Document Format text was added in 2006), and existing output to be
-updated (html has evolved and the related module has been updated
-repeatedly over the years; presumably when the World Wide Web
-Consortium (w3c) finalises html 5, which is currently under development,
-the html module will again be updated, allowing all existing documents
-to be regenerated as html 5).
- </text>
- <endnote notenumber="7">
- <number>7</number>
- <note>
- Specification submitted by Adobe to ISO to become a full open ISO
-specification <br /> &lt;<link
-xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple"
-xlink:href="http://www.linux-watch.com/news/NS7542722606.html">http://www.linux-watch.com/news/NS7542722606.html</link>&gt;
- </note>
- </endnote>
- <endnote notenumber="8">
- <number>8</number>
- <note>
- ISO/IEC 26300:2006
- </note>
- </endnote>
-</object>
-<object id="13">
- <ocn>13</ocn>
- <text class="norm">
- The document formats are written to the file-system and available for
-indexing by independent indexing tools, whether off the web like Google
-and Yahoo or on the site like Lucene and Hyperestraier.
- </text>
-</object>
-<object id="14">
- <ocn>14</ocn>
- <text class="norm">
- <b>SiSU</b> also provides other features such as concordance files and
-document content certificates. Working against an abstraction
-of document structure has further possibilities for the research and
-development of other document representations. The availability of
-objects is useful, for example, for topic maps and the commercial law
-thesaurus by Vikki Rogers and Al Kritzer, which together with the
-flexibility of <b>SiSU</b> offers great possibilities.
- </text>
-</object>
-<object id="15">
- <ocn>15</ocn>
- <text class="norm">
- <b>SiSU</b> is primarily for published works, which can take advantage
-of the citation system to reliably reference its documents. <b>SiSU</b>
-works well in a complementary manner with such collaborative
-technologies as Wikis, which can take advantage of and be used to
-discuss the substance of content prepared in <b>SiSU</b>.
- </text>
-</object>
-<object id="16">
- <ocn>16</ocn>
- <text class="norm">
- &lt;<link xmlns:xlink="http://www.w3.org/1999/xlink"
-xlink:type="simple"
-xlink:href="http://www.jus.uio.no/sisu">http://www.jus.uio.no/sisu</link>&gt;
- </text>
-</object>
-<object id="17">
- <ocn>17</ocn>
- <text class="h4">
- 2. How does sisu work?
- </text>
-</object>
-<object id="18">
- <ocn>18</ocn>
- <text class="norm">
- <b>SiSU</b> markup is fairly minimalistic. It consists of: a (largely
-optional) document header, made up of information about the document
-(such as when it was published, who authored it, and granting what
-rights) and any processing instructions; and markup within the
-substantive text of the document, which is related to document
-structure and typeface. <b>SiSU</b> must be able to discern the
-structure of a document, (text headings and their levels in relation to
-each other), either from information provided in the document header or
-from markup within the text (or from a combination of both). Processing
-is done against an abstraction of the document comprising
-information on the document's structure and its objects,[2] which the
-program serializes (providing the object numbers) and which are
-assigned hash sum values based on their content. This abstraction of
-information about document structure, objects, (and hash sums),
-provides considerable flexibility in representing documents in different
-ways and for different purposes (e.g. search, document layout,
-publishing, content certification, concordance etc.), and makes it
-possible to take advantage of some of the strengths of established ways
-of representing documents, (or indeed to create new ones).
- </text>
-</object>
-<object id="19">
- <ocn>19</ocn>
- <text class="h4">
- 3. Summary of features
- </text>
-</object>
-<object id="20">
- <ocn>20</ocn>
- <text class="indent_bullet">
- sparse/minimal markup (clean utf-8 source texts). Documents are
-prepared in a single UTF-8 file using a minimalistic mnemonic syntax.
-Typical literature (documents like "War and Peace") requires almost no
-markup, and most of the headers are optional.
- </text>
-</object>
-<object id="21">
- <ocn>21</ocn>
- <text class="indent_bullet">
- markup is easily readable/parsable by the human eye, (basic markup is
-simpler and more sparse than the most basic HTML), [this may also be
-converted to XML representations of the same input/source document].
- </text>
-</object>
-<object id="22">
- <ocn>22</ocn>
- <text class="indent_bullet">
- markup defines document structure (this may be done once in a header
-pattern-match description, or for heading levels individually); basic
-text attributes (bold, italics, underscore, strike-through etc.) as
-required; and semantic information related to the document (header
-information, extended beyond the Dublin core and easily further
-extended as required); the headers may also contain processing
-instructions. <b>SiSU</b> markup is primarily an abstraction of
-document structure and document metadata to permit taking advantage of
-the basic strengths of existing alternative practical standard ways of
-representing documents [be that browser viewing, paper publication, sql
-search etc.] (html, xml, odf, latex, pdf, sql)
- </text>
-</object>
-<object id="23">
- <ocn>23</ocn>
- <text class="indent_bullet">
- for output, produces reasonably elegant output in established industry
-and institutionally accepted open standard formats,[3] and takes advantage
-of the different strengths of various standard formats for representing
-documents. Amongst the output formats currently supported are:
- </text>
-</object>
-<object id="24">
- <ocn>24</ocn>
- <text class="indent_bullet1">
- html - both as a single scrollable text and a segmented document
- </text>
-</object>
-<object id="25">
- <ocn>25</ocn>
- <text class="indent_bullet1">
- xhtml
- </text>
-</object>
-<object id="26">
- <ocn>26</ocn>
- <text class="indent_bullet1">
- XML - both in sax and dom style xml structures for further
-development as required
- </text>
-</object>
-<object id="27">
- <ocn>27</ocn>
- <text class="indent_bullet1">
- ODF - open document format, the iso standard for document storage
- </text>
-</object>
-<object id="28">
- <ocn>28</ocn>
- <text class="indent_bullet1">
- LaTeX - used to generate pdf
- </text>
-</object>
-<object id="29">
- <ocn>29</ocn>
- <text class="indent_bullet1">
- pdf (via LaTeX)
- </text>
-</object>
-<object id="30">
- <ocn>30</ocn>
- <text class="indent_bullet1">
- sql - population of an sql database, (at the same object level
-that is used to cite text within a document)
- </text>
-</object>
-<object id="31">
- <ocn>31</ocn>
- <text class="norm">
- Also produces: concordance files; document content certificates (md5 or
-sha256 digests of headings, paragraphs, images etc.) and html manifests
-(and sitemaps of content). It also takes advantage of the strengths
-implicit in these very different output types (e.g. PDFs produced
-using the typesetting of LaTeX, and databases populated with documents
-at an individual object/paragraph level, making possible granular search
-and related possibilities).
- </text>
-</object>
-<object id="32">
- <ocn>32</ocn>
- <text class="indent_bullet">
- ensuring content can be cited in a meaningful way regardless of
-selected output format. Online publishing (and publishing in multiple
-document formats) lacks a useful way of citing text internally within
-documents (important to academics generally and to lawyers) as page
-numbers are meaningless across browsers and formats. sisu seeks to
-provide a common way of pinpointing the text within a document (which can
-be utilized for citation and by search engines). The outputs share a
-common numbering system that is meaningful (to man and machine) across
-all digital outputs whether paper, screen, or database oriented (pdf,
-HTML, xml, sqlite, postgresql); this numbering system can be used to
-reference content.
- </text>
-</object>
-<object id="33">
- <ocn>33</ocn>
- <text class="indent_bullet">
- Granular search within documents. SQL databases are populated at an
-object level (roughly headings, paragraphs, verse, tables) and become
-searchable with that degree of granularity; the output information
-provides the object/paragraph numbers which are relevant across all
-generated outputs; it is also possible to look at just the matching
-paragraphs of the documents in the database; [output indexing also works
-well with search indexing tools like hyperestraier].
- </text>
-</object>
-<object id="34">
- <ocn>34</ocn>
- <text class="indent_bullet">
- long term maintainability of document collections in a world of
-changing formats. With a very sparsely marked-up source document
-base there is a considerable degree of future-proofing: output
-representations are "upgradeable", and new document formats may be
-added (e.g. the addition of the odf (open document text) module in 2006,
-and html5 output at some point in future) without modification of
-existing prepared texts
- </text>
-</object>
-<object id="35">
- <ocn>35</ocn>
- <text class="indent_bullet">
- SQL search aside, documents are generated as required and static once
-generated.
- </text>
-</object>
-<object id="36">
- <ocn>36</ocn>
- <text class="indent_bullet">
- documents produced are static files, and may be batch processed; this
-needs to be done only once but may be repeated for various reasons as
-desired (updated content, addition of new output formats, updated
-technology document presentations/representations)
- </text>
-</object>
-<object id="37">
- <ocn>37</ocn>
- <text class="indent_bullet">
- document source (plaintext utf-8) if shared on the net may be used as
-input and processed locally to produce the different document outputs
- </text>
-</object>
-<object id="38">
- <ocn>38</ocn>
- <text class="indent_bullet">
- document source may be bundled together (automatically) with associated
-documents (multiple language versions or master document with
-inclusions) and images and sent as a zip file called a sisupod, if
-shared on the net these too may be processed locally to produce the
-desired document outputs
- </text>
-</object>
-<object id="39">
- <ocn>39</ocn>
- <text class="indent_bullet">
- generated document outputs may automatically be posted to remote sites.
- </text>
-</object>
-<object id="40">
- <ocn>40</ocn>
- <text class="indent_bullet">
- for basic document generation, the only software dependency is
-<b>Ruby</b>, and a few standard Unix tools (this covers plaintext,
-HTML, XML, ODF, LaTeX). To use a database you of course need that, and
-to convert the LaTeX generated to pdf, a latex processor like tetex or
-texlive.
- </text>
-</object>
-<object id="41">
- <ocn>41</ocn>
- <text class="indent_bullet">
- as a developer's tool it is flexible and extensible
- </text>
-</object>
-<object id="42">
- <ocn>42</ocn>
- <text class="norm">
- Syntax highlighting for <b>SiSU</b> markup is available for a number of
-text editors.
- </text>
-</object>
-<object id="43">
- <ocn>43</ocn>
- <text class="norm">
- <b>SiSU</b> is less about document layout than about finding a way with
-little markup to be able to construct an abstract representation of a
-document that makes it possible to produce multiple representations of
-it which may be rather different from each other and used for different
-purposes, whether layout and publishing, or search of content
- </text>
-</object>
-<object id="44">
- <ocn>44</ocn>
- <text class="norm">
- i.e. to be able to take advantage from this minimal preparation
-starting point of some of the strengths of rather different established
-ways of representing documents for different purposes, whether for
-search (relational database, or indexed flat files generated for that
-purpose whether of complete documents, or say of files made up of
-objects), online viewing (e.g. html, xml, pdf), or paper publication
-(e.g. pdf)...
- </text>
-</object>
-<object id="45">
- <ocn>45</ocn>
- <text class="norm">
- The solution arrived at is to extract structural information about
-the document (about headings within the document) and to track
-objects (which are serialized and also given hash values) in the manner
-described. It makes possible representations that are quite different
-from those offered at present. For example objects could be saved
-individually and identified by their hashes, with an index of how the
-objects relate to each other to form a document.
- </text>
-</object>
-<object id="0">
- <ocn>0</ocn>
- <text class="h4">
- Endnotes
- </text>
-</object>
-</body>
-</document>