From a72e66db913de3a2e508080c8b1fc8d1342a899b Mon Sep 17 00:00:00 2001 From: Ralph Amissah Date: Tue, 25 Sep 2007 23:23:03 +0100 Subject: remove generated output from main package --- .../sisu_manual/sisu_description/plain.txt | 1566 -------------------- 1 file changed, 1566 deletions(-) delete mode 100644 data/doc/manuals_generated/sisu_manual/sisu_description/plain.txt (limited to 'data/doc/manuals_generated/sisu_manual/sisu_description/plain.txt') diff --git a/data/doc/manuals_generated/sisu_manual/sisu_description/plain.txt b/data/doc/manuals_generated/sisu_manual/sisu_description/plain.txt deleted file mode 100644 index a2a490e2..00000000 --- a/data/doc/manuals_generated/sisu_manual/sisu_description/plain.txt +++ /dev/null @@ -1,1566 +0,0 @@ -SISU - DESCRIPTION, -RALPH AMISSAH -********************************** - -SISU AN ATTEMPT TO DESCRIBE -=========================== - -1. DESCRIPTION --------------- - -1.1 OUTLINE -........... - -*SiSU* is a flexible document preparation, generation publishing and search -system.[^1] - - -- [1]: This information was first placed on the web 12 November 2002; with - predating material taken from - part of a site started and - developed since 1993. See document metadata section - for information on this - version. Dates related to the development of *SiSU* are mostly contained - within the Chronology section of this document, e.g. - - -*SiSU* ("*SiSU* information Structuring Universe" or "Structured information, -Serialized Units"),[^2] is a Unix command line oriented framework for document -structuring, publishing and search. Featuring minimalistic markup, multiple -standard outputs, a common citation system, and granular search. - - -- [2]: also chosen for the meaning of the Finnish term "sisu". 
- -Using markup applied to a document, *SiSU* can produce plain text, HTML, XHTML, -XML, OpenDocument, LaTeX or PDF files, and populate an SQL database with -objects[^3] (equating generally to paragraph-sized chunks) so searches may be -performed and matches returned with that degree of granularity (e.g. your -search criteria is met by these documents and at these locations within each -document). Document output formats share a common object numbering system for -locating content. This is particularly suitable for "published" works -(finalized texts as opposed to works that are frequently changed or updated) -for which it provides a fixed means of reference of content. - - -- [3]: objects include: headings, paragraphs, verse, tables, images, but not - footnotes/endnotes which are numbered separately and tied to the object from - which they are referenced. - -*SiSU* is the data/information structuring and transforming tool, that has -resulted from work on one of the oldest law web projects. It makes possible the -one time, simple human readable markup of documents, that *SiSU* can then -publish in various forms, suitable for paper[^4], web[^5] and relational -database[^6] presentations, retaining common data-structure and -meta-information across the output/presentation formats. Several requirements -of legal and scholarly publication on the web have been addressed, including -the age old need to be able to reliably cite/pinpoint text within a document, -to easily make footnotes/endnotes, to allow for semantic document meta-tagging, -and to keep required markup to a minimum. These and other features of interest -are listed and described below. 
A few points are worth making early (and will -be repeated a number of times): - - -- [4]: pdf via LaTeX or lout - -- [5]: currently html (two forms of html presentation one based on css the other on - tables), and /PHP/; potentially structured XML - -- [6]: any SQL - currently PostgreSQL and /sqlite/ (for portability, testing and - development) - - (i) The *SiSU* document generator was the first to place material on the web - with a system that makes possible citation across different document types, - with paragraph, or rather object citation numbering[^7] a text positioning - system, available for the pinpointing of text, 1997, a simple idea from which - much benefit, and *SiSU* remains today, to the best of my knowledge, the only - multiple format e-book/ electronic-document system on the web that gives you - this possibility (including for relational databases). - - -- [7]: previously called "text object numbering" - - (ii) Markup is done once for the multiple formats produced. - - - (iii) Markup is simple, and human readable (with a little practice), in - almost all cases there is less and simpler markup required than basic html. - In any event the markup required is very much simpler than the html, LaTeX, - [lout], structured XML, ODF (OpenDocument), PostgreSQL or SQLite feed etc. - that you can have *SiSU* generate for you. - - - (iv) *SiSU* is a batch processor, dealing with as many files as you need to - generate at a time. - - - (v) Scalability is dependent on your file system (in my case Reiserfs), the - database (currently Postgresql and/or SQLite) and your hardware. - - -*SiSU* Sabaki[^8] (or just *SiSU*) is the provisional name given to the -software described here that helps structure documents for web and other -publication. 
The name *SiSU* is a loose acronym for something along the lines -of */"SiSU is structuring unit"/*, or /"*SiSU*, information structuring unit"/ -or */"simple - information structuring unit"/* or the more descriptive -/"Structured information, Serialized Units"/ or what it may be directed towards -/"*semantic* and *information structuring universe*"/,[^9] tongue in cheek, -only just. Guess I'll get away with */"Simple - information Structuring -Universe"/*. *SiSU* is also a Finnish word roughly meaning guts, inner strength -and perseverance.[^10] - - -- [8]: *SiSU* Sabaki, release version. Pre-release version *SiSU* Scribe, and - version prior to that *SiSU* nicknamed Scribbler. Pre-release versions go back - several years. Both Scribbler and Scribe (still maintained) made system calls - to *SiSU*'s various parts, instead of using libraries. - -- [9]: A little universe it may be, but semantic you may have a hard time getting - away with, given the meaning the word has taken on with markup. On a document - wide basis semantic information may be provided, which can be really useful, - (and meaningful, especially) if you have a large document set, and use this - with rss feeds or in an sql database etc. On a markup level, I have little - inclination to add semantic markup formally beyond references, title, author - [Dublin Core entities? addresses?] etc. Actually this deserves a bit of - thought: possibly use letter tags (including letter alias/synonyms for font - faces) to create a small set of default semantic tags, with the possibility - for per document adjustments. Will seek to permit XML entity tagging, within - *SiSU* markup and have that ignored/removed by the parts of the program that - have no use for it.
- -- [10]: "Sisu refers not to the courage of optimism, but to a concept of life that - says, 'I may not win, but I will gladly give my life for what I believe.'" - Aini Rajanen, Of Finnish Ways, 1981, p. 10. - -- - -- "Every Finn has his own pet definition. To me, sisu means patience without - passion. But there are many varieties of sisu. Sisu can be a sudden outburst - or it can be the kind that lasts. A man can have both kinds. It is outside - reason. It is something in the soul. It comes from oneself. For instance, it - makes a soldier do things because he himself must, not because he has been - told." Paavo Nurmi - -- - -*SiSU* was born of the need to find a way, with minimal effort, and for as wide -a range of document types as possible, to produce high quality publishing -output in a variety of document formats. As such it was necessary to find a -simple document representation that would work across a large number of -document types, and the most convenient way(s) to produce acceptable output -formats. The project leading to this program was started in 1993 (together with -the trade law project now known as Lex Mercatoria) as an investigation of how -to effectively/efficiently place documents on the web. The unified document -handling, together with features such as paragraph numbering, endnote handling -and tables... appeared in 1996/97. *SiSU* was originally written in Perl,[^11] -and converted to *Ruby*, [^12] in 2000, one of the most impressive programming -languages in existence! In its current form it has been written to run on the -*Gnu* /Linux platform, and in particular on *Debian*, [^13] taking advantage of -many of the wonderful projects that are available there. - - -- [11]: - -- [12]: - -- [13]: - -*SiSU* markup is based on requiring the minimum markup needed to determine the -structure of a document. (This can be as little as saying in a header to look -for the word Book at a specified level and the word Chapter at another level). 
-*SiSU* then breaks a document into its smallest parts (at a heading, and -paragraph level) while retaining all structural information. This break up of -the document and information on its structure is taken advantage of in the -transformations made in generating the very different output types that can be -created, and in providing as much as can be for what each output type is best -at doing, e.g. LaTeX (professional document typesetting, easy conversion to pdf -or Postscript), XML (in this case, structural representation), ODF -(OpenDocument [experimental]), SQL (e.g. document search; representing -constituent parts of documents based on their structure, headings, chapters, -paragraphs as required; user control).[^14] - - -- [14]: where explicit structure is provided through the use of tagging headings, - it could be reduced (still) further, for example by reducing the number of - characters used to identify heading levels; but in many cases even that - information is not required as regular expressions can be used to extract the - implicit structure. - -From markup that is simpler and more sparse than html you get: - - -* far greater output possibilities, including html, XML, ODF (OpenDocument), -LaTeX (pdf), and SQL; - - -* the advantages implicit in the very different output possibilities; - - -* a common citation system (for all outputs - including the relational -database, search results are relevant for all outputs); - - -For more see the short summary of features provided below. - - -*SiSU* processes files with minimal tagging to produce various document outputs -including html, LaTeX or lout (which is converted to pdf) and if required loads -the structured information into an SQL database (PostgreSQL and SQLite have -been used for this). 
*SiSU* produces an intermediate processing format.[^15] - - -- [15]: This proved to be the easiest way to develop syntax, changes could be made, - or alternatives provided for the markup syntax whilst the intermediate markup - syntax was largely held constant. There is actually an optional second - intermediate markup format in YAML - -*SiSU* is used in constructing Lex Mercatoria or - (one of the oldest law web sites), and considerable -thought went into producing output that would be suitable for legal and -academic writings (that do not have formulae) given the limitations of html, -and publication in a wide variety of "formats", in particular in relation to -the convenient and accurate citation of text. However, the construction of Lex -Mercatoria uses only a fraction of the features available from *SiSU* today, -/vis/ generation of flat file structures, rather than in addition the building -of ("granular") SQL database content, (at an object level with relevant -relational tables, and other outputs also available). - - -1.2 SHORT SUMMARY OF FEATURES -............................. - -*(i)* markup syntax: (a) simpler than html, (b) mnemonic, influenced by -mail/messaging/wiki markup practices, (c) human readable, and easily writable, - - -*(ii)* (a) minimal markup requirement, (b) single file marked up for multiple -outputs, - - -notes: - - -* documents are prepared in a single UTF-8 file using a minimalistic mnemonic -syntax. Typical literature, documents like "War and Peace" require almost no -markup, and most of the headers are optional. - - -* markup is easily readable/parsed by the human eye, (basic markup is simpler -and more sparse than the most basic html), [this may also be converted to XML -representations of the same input/source document]. - - -* markup defines document structure (this may be done once in a header -pattern-match description, or for heading levels individually); basic text -attributes (bold, italics, underscore, strike-through etc.) 
as required; and -semantic information related to the document (header information, extended -beyond the Dublin Core and easily further extended as required); the headers -may also contain processing instructions. - - -*(iii)* (a) multiple outputs, primarily industry established and institutionally -accepted open standard formats, including amongst others: plaintext (UTF-8); -html; (structured) XML; ODF (OpenDocument text); LaTeX; PDF (via LaTeX); SQL -type databases (currently PostgreSQL and SQLite). Also produces: concordance -files; document content certificates (md5 or sha256 digests of headings, -paragraphs, images etc.) and html manifests (and sitemaps of content). (b) -takes advantage of the strengths implicit in these very different output types, -(e.g. PDFs produced using typesetting of LaTeX, databases populated with -documents at an individual object/paragraph level, making possible granular -search (and related possibilities)) - - -*(iv)* outputs share a common numbering system (dubbed "object citation -numbering" (ocn)) that is meaningful (to man and machine) across various -digital outputs whether paper, screen, or database oriented, (PDF, html, XML, -sqlite, postgresql), this numbering system can be used to reference content. - - -*(v)* SQL databases are populated at an object level (roughly headings, -paragraphs, verse, tables) and become searchable with that degree of -granularity, the output information provides the object/paragraph numbers which -are relevant across all generated outputs; it is also possible to look at just -the matching paragraphs of the documents in the database; [output indexing also -works well with search indexing tools like hyperestraier].
- - -*(vi)* use of semantic meta-tags in headers permits the addition of semantic -information on documents, (the available fields are easily extended) - - -*(vii)* creates organised directory/file structure for (file-system) output, -easily mapped with its clearly defined structure, with all text objects -numbered, you know in advance where in each document output type, a bit of text -will be found (e.g. from an SQL search, you know where to go to find the -prepared html output or PDF etc.)... there is more; easy directory management -and document associations, the document preparation (sub-)directory may be used -to determine output (sub-)directory, the skin used, and the SQL database used, - - -*(viii)* "Concordance file" wordmap, consisting of all the words in a document -and their (text/ object) locations within the text, (and the possibility of -adding vocabularies), - - -*(ix)* document content certification and comparison considerations: (a) the -document and each object within it are stamped with an md5 hash making it possible -to easily check or guarantee that the substantive content of a document is -unchanged, (b) version control, documents integrated with time based source -control system, default RCS or CVS with use of $Id: sisu_description.sst,v 1.25 -2007/08/23 12:22:36 ralph Exp $ tag, which *SiSU* checks - - -*(x)* *SiSU*'s minimalist markup makes for meaningful "diffing" of the -substantive content of markup-files, - - -*(xi)* easily skinnable, document appearance on a project/site wide, directory -wide, or document instance level easily controlled/changed, - - -*(xii)* in many cases a regular expression may be used (once in the document -header) to define all or part of a document's structure obviating or reducing -the need to provide structural markup within the document, - - -*(xiii)* prepared files may be batch processed, documents produced are static -files so this needs to be done only once but may be repeated for various -reasons as desired
(updated content, addition of new output formats, updated -technology document presentations/representations) - - -*(xiv)* possible to pre-process, which permits: the easy creation of standard -form documents, and templates/term-sheets, or; building of composite documents -(master documents) from other sisu marked up documents, or marked up parts, -i.e. import documents or parts of text into a main document should this be -desired - - -*(xv)* there is a considerable degree of future-proofing, output -representations are "upgradeable", and new document formats may be added: (a) -modular, (thanks in no small part to *Ruby*) another output format required, -write another module.... (b) easy to update output formats (e.g. html, XHTML, -LaTeX/PDF produced can be updated in program and run against whole document -set), (c) easy to add, modify, or have alternative syntax rules for input, -should you need to, - - -*(xvi)* scalability, dependent on your file-system (ext3, Reiserfs, XFS, -whatever) and on the relational database used (currently PostgreSQL and -SQLite), and your hardware, - - -*(xvii)* only marked up files need be backed up, to secure the larger document -set produced, - - -*(xviii)* document management, - - -*(xix)* Syntax highlighting for *SiSU* markup is available for a number of text -editors. - - -*(xx)* remote operations: (a) run *SiSU* on a remote server, (having prepared -sisu markup documents locally or on that server, i.e. this solution where sisu -is installed on the remote server, would work whatever type of machine you -choose to prepare your markup documents on), (b) generated document outputs may -be posted by sisu to remote sites (using rsync/scp), (c) document source -(plaintext utf-8) if shared on the net may be identified by its url and -processed locally to produce the different document outputs.
- - -*(xxi)* document source may be bundled together (automatically) with associated -documents (multiple language versions or master document with inclusions) and -images and sent as a zip file called a sisupod, if shared on the net these too -may be processed locally to produce the desired document outputs, these may be -downloaded, shared as email attachments, or processed by running sisu against -them, either using a url or the filename. - - -*(xxii)* for basic document generation, the only software dependency is *Ruby*, -and a few standard Unix tools (this covers plaintext, html, XML, ODF, LaTeX). -To use a database you of course need that, and to convert the LaTeX generated -to PDF, a LaTeX processor like tetex or texlive. - - -as a developers tool it is flexible and extensible - - -*SiSU* was developed in relation to legal documents, and is strong across a -wide variety of texts (law, literature...). *SiSU* handles images but is not -suitable for formulae/ statistics, or for technical writing at this time. - - -*SiSU* has been developed and has been in use for several years. Requirements -to cover a wide range of documents within its use domain have been explored. - - -Some modules are more mature than others, the most mature being Html and LaTeX -/ pdf. PostgreSQL and search functions are useable and together with /ocn/ -unique (to the best of my knowledge). The XML output document set is "well -formed" but largely proof of concept. - - -1.3 HOW IT WORKS -................ - -*SiSU* markup is fairly minimalistic, it consists of: a (largely optional) -document header, made up of information about the document (such as when it was -published, who authored it, and granting what rights) and any processing -instructions; and markup within text which is related to document structure and -typeface. 
*SiSU* must be able to discern the structure of a document, (text -headings and their levels in relation to each other), either from information -provided in the instruction header or from markup within the text (or from a -combination of both). Processing is done against an abstraction of the document -comprising of information on the document's structure and its objects,[^16] -which the program serializes (providing the object numbers) and which are -assigned hash sum values based on their content. This abstraction of -information about document structure, objects, (and hash sums), provides -considerable flexibility in representing documents different ways and for -different purposes (e.g. search, document layout, publishing, content -certification, concordance etc.), and makes it possible to take advantage of -some of the strengths of established ways of representing documents, (or indeed -to create new ones). - - -- [16]: objects include: headings, paragraphs, verse, tables, images, but not - footnotes/endnotes which are numbered separately and tied to the object from - which they are referenced. - -1.4 SIMPLE MARKUP -................. - -*SiSU* markup is based on requiring the minimum markup needed to determine the -structure of a document. (This can be as little as saying in a header to look -for the word Book at a specified level and the word Chapter at another level). -*SiSU* then breaks a document into its smallest parts (at a heading, and -paragraph level) while retaining all structural information. This break up of -the document and information on its structure is taken advantage of in the -transformations made in generating the very different output types that can be -created, and in providing as much as can be for what each output type is best -at doing, e.g. LaTeX (professional document typesetting, easy conversion to pdf -or Postscript), XML (in this case, structural representation), ODF -(OpenDocument), SQL (e.g. 
document search; representing constituent parts of -documents based on their structure, headings, chapters, paragraphs as required; -user control).[^17] - - -- [17]: where explicit structure is provided through the use of tagging headings, - it could be reduced (still) further, for example by reducing the number of - characters used to identify heading levels; but in many cases even that - information is not required as regular expressions can be used to extract the - implicit structure. - -1.4.1 SPARSE MARKUP REQUIREMENT, TRY TO GET THE MOST OUT OF MARKUP -.................................................................. - -One of its strengths is that very small amounts of initial tagging is required -for the program to generate its output. - - -This is a basic markup example: - - -* basic markup example, text file - an international convention [link:] - -[^18] - - -- [18]: - output provided as example in the next section - -* view basic markup, as it would be highlighted by vim editor [link:] - -[^19] - - -- [19]: - as it would appear with syntax highlighting (by vim) - -Emphasis has been on simplicity and minimalism in markup requirements. Design -philosophy is to try keep the amount of markup required low, for whatever has -been determined to be acceptable output.[^20] - - -- [20]: seems there are several "smart ASCIIs" available, primarily for ascii to - html conversion, that make this, and reasonable looking ascii their goal - -- - -- - -- - -*SiSU*'s markup is more minimalistic and simpler than (the equivalent) html and -for it, you get considerably more than just html, as this preparation gives you -all available output formats, upon request. - - -1.4.2 SINGLE MARKUP FILE PROVIDES MULTIPLE OUTPUT FORMATS -......................................................... 
- -For each document, there is only one (input, minimalistically marked up) file -from which all the available output types are generated.[^21] - - -- [21]: These include richly laid out and linked html (table or css variants), - /PHP/, LaTeX (from which pdf portrait and landscape documents are produced), - texinfo (for info files etc.), and PostgreSQL and/or SQLite. And the - opportunity to fairly easily build additional modules, such as XML. See the - examples provided in this document. - -Eg. the markup example: - - -* original text file - an international convention [link:] - -[^22] - - -- [22]: - -* view as syntax would be highlighted by vim editor [link:] - -[^23] - - -- [23]: - -Produces the following output: - - -* Segmented html version of document [link:] - -[^24] - - -- [24]: - -* Full length html document [link:] - -[^25] - - -- [25]: - -* pdf landscape version of document [link:] - -[^26] - - -- [26]: - -* pdf portrait version of document [link:] - -[^27] - - -- [27]: - -* clean tex ascii version of document [link:] - -[^28] - - -- [28]: - -* /xml/ sax version of document [link:] - -[^29] - - -- [29]: - -* /xml/ dom version of document [link:] - -[^30] - - -- [30]: - -* Concordance [link:] - -[^31] - - -- [31]: - -(and in addition to these: PostgreSQL, SQLite, texinfo and YAML -[^32] versions if desired) - - -- [32]: discontinued for the time being - -1.4.3 SYNTAX RELATIVELY EASY TO READ AND REMEMBER -................................................. - -Syntax is kept simple and mnemonic.[^33] - - -- [33]: *SiSU* markup syntax, an incomplete summary: - - -- Visual check of elementary font face modifiers: *bold* *bold* - emphasis /italics/ _underscore_ strikethrough - ^superscript^ [subscript] - -1.4.4 KEPT SIMPLE BY HAVING A LIMITED PUBLISHING FEATURE SET, AND FEATURES -IDENTIFIED AS MOST IMPORTANT, ARE AVAILABLE ACROSS SEVERAL DOCUMENT TYPES -.............................................................................. 
- -To keep *SiSU* markup sparse and simple *SiSU* deliberately provides a limited -publishing feature set, including: indent levels; bold; italics; superscript; -subscript; simple tables; images; tables of contents; and endnotes, which in -most cases are available across the different output formats. - - -The publishing feature set may be expanded as required. - - -1.5 DESIGNED WITH USABILITY IN MIND -................................... - -Output is designed to be uniform, easy to read, navigate and cite. - - -1.6 CODE SEPARATE FROM CONTENT -.............................. - -Code[^34] is separated from content. This means that when changes are desired -in the output presentation, the code that produces them, and not the marked up -text data set (which could be thousands of documents) is modified. Separating -code from content makes large scale changes to output appearance trivial, and -permits the easy addition of new output modules. - - -- [34]: the program that generates the documents - -1.7 OBJECT CITATION NUMBERING, A TEXT OR OBJECT POSITIONING / CITATION SYSTEM - -"PARAGRAPH" (OR TEXT OBJECT) NUMBERING, THAT REMAINS SAME AND USABLE ACROSS ALL -OUTPUT FORMATS BY PEOPLE AND MACHINE -.............................................................................. - -Object citation numbering is a simple object (text) positioning and citation -system that is human relevant and machine useable, used by *SiSU* for all -manner of presentations, and that is available for use in all text mappings. It -is based on the automated sequential numbering of objects (roughly paragraphs, -(headings, tables, verse) or other blocks of text or images etc.).
The text -positioning system (in which I claim copyright) is invaluable for publishing -requiring the citing text across multiple output formats, and for the general -mapping of text within a document: - - -* in html, html not being easily citeable (change font size, or use a different -browser and the page on which specific text appears has changed), and - - -* across multiple formats being common to all output formats html/xml/pdf/sql -output, - - -* the results of an sql search can just be "live" citation references to the -documents in which the text is found, much like an index (see image examples -provided). [link:] [^35] - - -- [35]: - -I claim copyright on the system I use which is the most basic of all, numbering -all text in headings and paragraphs sequentially (with tables and images being -treated as a single paragraph) and only footnotes/endnotes not following this -numbering, as their position in text is not strictly determined, (a change from -footnotes to endnotes would change their numbering), footnotes instead "belong" -to the paragraph from which they are referenced, and have sequential numbers of -their own. - - -*SiSU* has a paragraph numbering system, that remains the same regardless of -the output format. This provides an effective means of citation, pinpointing -text accurately in all output formats, using the same reference. This is -particularly useful where text has to be located across different output -formats - for example once html is printed the number of pages and pages on -which given text is found will vary depending on the browser, its settings the -font size setting etc. Similarly *SiSU* produces pdf in different forms, eg. 
on -the example site Lex Mercatoria as portrait and landscape documents - here too -page numbering varies, but paragraph numbering is the same, /vis a vis/ all -versions of the text (portrait and landscape pdf and the html versions of the -text, and as stored (with "paragraphs" as records) to the PostgreSQL or SQLite -database). - - -These numbers are placed in the text margins and are intended to be independent -of and not to interfere with authors tagging. [The citation system (object -citation numbering system, automated "paragraph numbering") which is -automatically generated and is common and identical across all document -formats] The paragraph numbering system is more accurately described as an -(text) object numbering system, as headings are also numbered... all headings -and paragraphs are numbered sequentially. Endnotes are automatically numbered -independently and rather "belong" to the paragraph from which they are -referenced, as an endnote does not (necessarily) form a part of a documents -sequence, (they may be produced as either endnotes or footnotes (or both -depending on what output you choose to look at - if you take the segmented html -version document provided as an example, you will find that the endnotes are -placed both at the end of each section, and in a separate section of their own -called endnotes, and these are hyper-linked)). An attractive feature of -providing citation numbering in this way is that it is independent of the -document structure... it remains the same regardless of what is done about the -document structure. - - -The rules have been kept very simple, unique incremental object citation -numbers are assigned to headings, paragraphs, verse, tables and images. 
It is -possible to manually override this feature on a per heading or comment basis, -though this should be used exceptionally; it may be of use where there is a -substantive text, and a minor comment added by the publisher -should not be mapped as part of the text. - - -The object citation number markers contain additional numbering information -with regard to the document structure, that can be used for alternative -presentations, including such detail as the type of object (heading, paragraph, -table, image, etc.), numbered sequentially. - - -An advantage is that the numbering remains the same regardless of document -structure. - - -Text object ("paragraph") numbering is the same for all output versions of the -same document, vis html, pdf, pgsql, yaml etc. - - -In the relational database, as individual text objects of a document are stored -(and indexed) together with object numbers, and all versions of the document -have the same numbering, the results of searches may be tailored just to -provide the location of the search result in all available document formats. - - -/ Note: there is a bug in the released behaviour of object citation numbering, -(not certain when it was introduced) tables should be numbered, i.e. each table -gets an ocn, required amongst other things for the relational database. This will -be corrected in a future release. Citation numbering of existing documents that -contain tables will change. / - - -1.8 HANDLING OF DUBLIN CORE META-TAGS MAKING USE OF THE RESOURCE DESCRIPTION -FRAMEWORK -.............................................................................. - -*SiSU* is able to use meta tags based on the Dublin Core[^36] and Resource -Description Framework[^37] - - -- [36]: - -- [37]: - -This provides a means of supplying semantic information about a document, -both as computer processable meta-tags, and as human readable information that -may be of value for classification purposes.
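As an illustration of the computer processable side of this, the sketch below renders a few Dublin Core fields as html meta elements. It is a minimal sketch and not *SiSU*'s own code: the function name dublin_core_meta_tags, the sample field values, and the "DC." name prefix (a common convention for embedding Dublin Core in html) are assumptions made for the example.

```python
from html import escape


def dublin_core_meta_tags(metadata):
    """Render a dict of Dublin Core fields as html <meta> elements.

    Hypothetical helper for illustration only; the "DC." prefix is an
    assumed convention, not taken from the SiSU source.
    """
    tags = []
    for field, value in metadata.items():
        name = escape("DC." + field, quote=True)
        content = escape(value, quote=True)
        tags.append('<meta name="%s" content="%s" />' % (name, content))
    return "\n".join(tags)


if __name__ == "__main__":
    # Sample values chosen for the example.
    print(dublin_core_meta_tags({
        "title": "SiSU - Description",
        "creator": "Ralph Amissah",
    }))
```

The same dictionary could equally be used to produce the human readable "Document Information" listing, which is the point of keeping the semantic fields separate from any one output format.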
- - -This information is provided both in html metatags, and (where available) under -the section titled "Document Information - MetaData", near the end of a -document, for example in the segmented html version of this text at: - - - -1.9 EASY DIRECTORY MANAGEMENT -............................. - -1. Directory file association, skins and special image management, made -simpler.[^38] - - -- [38]: Previously, directory associations for file output were set up in - the configuration file. The present system is a more natural way to work, - requiring less configuration. - -The last part of the name of the work directory in which markup is being done, -or rather from where *SiSU* is run in order to generate document output, is -used in determining the name of the sub-directory for output files that is created in -the document output directory. This provides a rather easy way to associate -documents e.g. of a given subject, or by owner. - - - - /www/docs - /intellectual_property - /arbitration - /contract_law - /www/docs - /ralph - /sisu - -all are placed in their own directories within the directory structure created. -Similar rules are used in the creation of sql type databases (though they can -be overridden). - - -There are a couple of further associations with these directories. - - -Directory wide skins. - - -Directory specific images. - - -2. If there is a "directory skin", that is a skin of the same name as the -directory, it is used in the generation of the documents within it, rather than -the default skin, unless the document has a specific skin associated with it. - - - a. default skin (always available) - - - b. directory skin (precedence over default if exists) - - - c. document skin (takes precedence wherever document requests a specific - skin) - - -Skins are defined in the document skin directory and if a directory association -is desired a softlink is made to the relevant skin.
Skins (directory association -auto load): a skin is loaded automatically if a directory skin exists with the same name as the directory -stub (and there is no specific document skin). - - -3. If the working directory has within it a sub-directory called image_local, -the images within that directory are used for references to images that are -not part of the default site build. - - -1.10 DOCUMENT VERSION CONTROL INFORMATION -......................................... - -The possibility of citing an exact document version. - - -Permits the inclusion of document version control information in the document -body and metatags.[^39] This provides a much more certain method of referring -to the exact version of a particular document (assuming that the document is -from a trusted source that will retain earlier versions of a document).[^40] - - -- [39]: from a version control system such as CVS - -- [40]: The version control system must be run, so the version number is obtained, - prior to the *SiSU* document generation, and subsequent posting of the - document. - -This information (where available) is provided under the section of the -document titled "Document Information - MetaData", near the end of a document, -for example in the segmented html version of this text at: - - - -1.11 TABLE OF CONTENTS -...................... - -*SiSU* produces a rudimentary table of contents based on document headings. - - -1.12 AUTO-NUMBERING OF HEADINGS -............................... - -Headings can be automatically numbered (and automatically named for -hyper-linking). - - -1.13 NUMBERING AND CROSS-HYPERLINKING OF ENDNOTES -................................................. - -*SiSU* can automatically number footnotes/endnotes. This is the default -operation where no number is provided. - - -Footnotes/endnotes may also be manually numbered. Where a number, or numbers, -are provided for a footnote/endnote, this does not increment the automatic -footnote/endnote number counter.
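The interaction between automatic and manual numbering can be sketched as follows. This is a minimal illustration of the rule just described, assuming notes arrive as (manual number or None, text) pairs; the function name number_notes and that input shape are hypothetical and do not reflect the actual *SiSU* implementation.

```python
def number_notes(notes):
    """Assign numbers to footnotes/endnotes.

    `notes` is a list of (manual_number, text) pairs, where manual_number is
    None for notes that should be numbered automatically.
    Hypothetical sketch, not SiSU's own code.
    """
    counter = 0  # automatic footnote/endnote counter
    numbered = []
    for manual_number, text in notes:
        if manual_number is None:
            # default case: no number provided, so use and advance the counter
            counter += 1
            numbered.append((counter, text))
        else:
            # manually numbered note: keep its label, leave the counter untouched
            numbered.append((manual_number, text))
    return numbered

notes = [(None, "first note"), ("*", "publisher's note"), (None, "second note")]
for number, text in number_notes(notes):
    print(number, text)
```

Because the manually labelled note does not advance the counter, the automatically numbered notes on either side of it remain consecutive.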
- - -In the html output footnotes/endnotes are cross-hyper-linked (to their -reference point and vice versa). In the pdf output footnotes are linked from -their reference point only. - - -1.14 "SKINNABLE" -................ - -*SiSU* is skinnable, on a site-wide, directory-wide and per document basis, so -different looking versions of things may be produced with little difficulty. -There is a default skin which may be modified, as the background site skin, and -each working directory may have a skin associated with it, as may each -individual document. The hierarchy of application is document, directory, then -site... i.e. if a document skin exists it gets precedence. - - -Whilst it is skinnable, the default output styles are selected to work across -the widest possible range of document types. - - -1.15 MULTIPLE OUTPUTS -..................... - -From markup that is simpler and more sparse than html you get: - - -* far greater output possibilities, including multiple html types, XML -(different structured types), LaTeX (pdf landscape, portrait), and SQL -(PostgreSQL or SQLite or other); - - -* the advantages implicit in these very different output possibilities;[^41] - - -- [41]: e.g. LaTeX (professional document typesetting, easy conversion to pdf or - Postscript), XML (in this case, structural representation), SQL (e.g. document - set searches; representation of the constituent parts of documents based on - their structure, headings, chapters, paragraphs as desired; control of use) - -* a common citation system - - -As many output formats/presentations as one cares to write modules for - -several types of html (e.g. structure based on css, or structure based on -tables); /LaTeX/pdf/ and /Lout/pdf/; pgsql, with other databases easily added; -yaml... - - -1.15.1 HTML - SEVERAL PRESENTATIONS: FULL LENGTH & SEGMENTED; CSS & TABLE BASED -..............................................................................
- -Most documents are produced in single and segmented html versions, described -below: - - -*The Scroll (full length text presentations)* - - -The full length of the text in a single scrollable document.[^42] As a rule the -files they are saved in are named: /doc/ or more precisely /doc.html/ - - -- [42]: CISG - - -- The Unidroit Contract Principles - or - -- The Autonomous Contract - - -For various reasons texts may only be provided in this form (such as this one -which is short), though most are also provided as segmented texts. - - -"Scroll" is a reference to the historical scroll, a single long document/ -parchment, and also no doubt to what you will have to do to get to the bottom -of the text.[^43] - - -- [43]: Scrolling is not however necessarily confined to full length documents as - you will have to scroll to get to the bottom of any long segment (e.g. chapter) - of a segmented text. - -*The Segmented Text* - - -The text divided into segments (such as articles or chapters depending on the -text).[^44] As a rule the files they are saved in are named: /toc/ and /index/ -or more precisely /toc.html/ and /index.html/ - - -- [44]: CISG - - -- The Unidroit Principles - - -- The Autonomous Contract - or - -- WTA 1994 - -If you know exactly what you are looking for, loading a segment of text is -faster (the segments being smaller). Occasionally longer documents such as the -WTA 1994 are only provided in segmented -form. - - -*Cascading Style Sheet, and Table based html* - - -*SiSU* outputs html; the two standard forms currently available are: - - -css based [link:] - - -and - - -table based [largely discontinued ][^45] - - -- [45]: formatting possibility still exists in code tree but maintenance has been - largely discontinued. - -*The html is tested across several browsers* - - -I would like to remind you that there are other excellent browsers out there, many of -which have long supported practical features like tabbing.
- - -The html is tested across several browsers, including: - - -* *Firefox* (Mozilla-Firefox) [link:] - [^46] - - -- [46]: - -* Kazehakase [link:] [^47] - - -- [47]: - -* Konqueror [link:] [^48] - - -- [48]: - -* Mozilla [link:] [^49] - - -- [49]: - -* MS Internet Explorer [link:] - [^50] - - -- [50]: - -* Netscape [link:] - [^51] - - -- [51]: - -* Opera [link:] [^52] - - -- [52]: - -Also lighter weight graphical browsers: - - -* Dillo [link:] [^53] - - -- [53]: - -* *Epiphany* [link:] [^54] - - -- [54]: - -* *Galeon* [link:] [^55] - - -- [55]: - -And for console/text browsing: - - -* *elinks* [link:] [^56] - - -- [56]: - -* *links2* [link:] [^57] - - -- [57]: - -* *w3m* [link:] [^58] - - -- [58]: - -The html tables output is rendered more accurately across a wider variety of -browsers, including older versions (than the html css output). - - -1.15.2 XML -.......... - -*SiSU* generates well formed XML in multiple versions: an XML SAX version -with a flat/shallow structure, and an XML DOM version with a deeper (embedded) -structure. There is also a released working xhtml module. Examples of SAX and -DOM versions are provided within this document. - - -1.15.3 ODT:ODF, OPEN DOCUMENT FORMAT - ISO/IEC 26300:2006 -......................................................... - -*SiSU* generates Open Document Format output. - - -1.15.4 PDF - PORTRAIT AND LANDSCAPE, (THROUGH THE GENERATION OF LATEX OUTPUT -WHICH IS THEN TRANSFORMED TO PDF) -.............................................................................. - -*SiSU* outputs LaTeX if required, which is easily transformed to PDF.[^59] PDF -documents are generated on the site from the same source files and *Ruby* -program that produce html. Landscape oriented pdf has been introduced, providing easier -screen viewing; landscape documents are also (currently) formatted to have -fewer pages than their portrait equivalents, saving paper.
- - -- [59]: LaTeX and pdf features introduced 18^th^ June 2001; Landscape and portrait - pdfs introduced 7^th^ October 2001; Lout is a more recent addition, 22^nd^ - April 2003 - -* Adobe Reader [link:] -[^60] - - -- [60]: - -* *Evince* [link:] [^61] - - -- [61]: - -* xpdf [link:] [^62] - - -- [62]: - -1.15.5 SEARCH - LOADING/POPULATING OF RELATIONAL DATABASE WHILE RETAINING -DOCUMENT STRUCTURE INFORMATION, OBJECT CITATION NUMBERING AND OTHER FEATURES -(CURRENTLY POSTGRESQL AND/OR SQLITE) -.............................................................................. - -*SiSU* (from the same markup input file) automatically feeds into a -PostgreSQL[^63] and/or SQLite[^64] database (could be any other of the better -relational databases)[^65] - together with all additional information related -to document structure, and the alternative ways in which it is generated on the -site retained. As regards scaling of the database, it is as scalable as the -database (here PostgreSQL or SQLite) and hardware allow. I will prune the -images later. - - -- [63]: - -- - -- - -- [64]: - -- - -- [65]: Relational database features retaining document structure and citation - introduced 15^th^ July 2002 - -This is one of the more interesting output forms, as all the structural data -for the documents are retained (though can be ignored by the user of the -database should they so choose).
All site texts/documents are (currently) -streamed to four pgsql database tables: - - - * one containing semantic (and other) headers, including title, author, - subject, (the Dublin Core...); - - - * another the substantive texts by individual "paragraph" (or object) - along - with structural information, each paragraph being identifiable by its - paragraph number (if it has one which almost all of them do), and the - substantive text of each paragraph quite naturally being searchable (both in - formatted and clean text versions for searching); and - - - * a third containing endnotes cross-referenced back to the paragraph from - which they are referenced (both in formatted and clean text versions for - searching). - - - * a fourth table with a one to one relation with the headers table contains - full text versions of output, e.g. pdf, html, xml, and ascii. - - -There is of course the possibility to add further structures. - - -At this level *SiSU* loads a relational database with documents broken into -their smallest logical structurally constituent parts, as text objects, with -their object citation number and all other structural information needed to -construct the structured document. Text is stored (at this text object level) -with and without elementary markup tagging, the stripped version being kept to -facilitate searching. - - -Because the document structure of sites created is clearly defined, and the -text object citation system is available for all forms of output, it is -possible to search the sql database, and either read results from that -database, or just as simply map the results to the html output, which has -richer text markup. - - -The combination of the *SiSU* citation system with a relational database is -pretty powerful, giving rise to several possibilities.
As individual text -objects of a document are stored (and indexed) together with object numbers, and -all versions of the document have the same numbering, complex searches can be -tailored to return just the locations of the search results relevant for all -available output formats, with live links to the precise locations in the -database or in html/xml documents; or, the structural information provided -makes it possible to search the full contents of the database and return the headings -under which the search content appears, or to search only headings etc. (as the Dublin -Core is incorporated it is easy to make use of that as well). - - -This is a larger scale project, (with development of the front end -largely ignored), though the "infrastructure" has been in place since 2002. - - -1.15.6 SEARCH - DATABASE FRONTEND SAMPLE, UTILISING DATABASE AND SISU FEATURES, -INCLUDING OBJECT CITATION NUMBERING (BACKEND CURRENTLY POSTGRESQL) -.............................................................................. - -Sample search frontend [link:] [^66] A small -database and sample query front-end (search form) that makes use of the -citation system, _object citation numbering_, to demonstrate -functionality.[^67] - - -- [66]: - -- [67]: (which could be extended further with current back-end). As regards scaling - of the database, it is as scalable as the database (here PostgreSQL) and - hardware allow. - -*SiSU* can provide information on which documents are matched and at what -locations within each document the matches are found. These results are -relevant across all outputs using object citation numbering, which includes -html, XML, LaTeX, PDF and indeed the SQL database. You can then refer to one of -the other outputs or in the SQL database expand the text within the matched -objects (paragraphs) in the documents matched.
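The idea of returning matched documents together with the object numbers at which matches occur can be sketched with a small in-memory SQLite database. The table name objects, its columns (doc, ocn, clean) and the sample rows are all assumptions made for this illustration; they are not the schema that *SiSU* actually creates.

```python
import sqlite3

def find_locations(conn, term):
    """Return {document name: [object citation numbers]} for objects containing term.

    Hypothetical sketch: assumes a table objects(doc, ocn, clean) in which
    `clean` holds the text of each object stripped of markup.
    """
    rows = conn.execute(
        "SELECT doc, ocn FROM objects WHERE clean LIKE ? ORDER BY doc, ocn",
        ("%" + term + "%",),  # note: SQLite LIKE ignores ASCII case by default
    )
    hits = {}
    for doc, ocn in rows:
        hits.setdefault(doc, []).append(ocn)
    return hits

# Build a tiny illustrative database (invented sample data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (doc TEXT, ocn INTEGER, clean TEXT)")
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?)",
    [
        ("free_culture", 1, "Free Culture"),
        ("free_culture", 2, "copyright law and innovation"),
        ("gpl3", 5, "copyright holders may grant permissions"),
        ("gpl3", 6, "warranty disclaimer"),
    ],
)
print(find_locations(conn, "copyright"))
```

Since the object numbers are the same in every output format, the numbers returned here could be turned into links to the html, or used to cite the same locations in the pdf.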
- - -(further work needs to be done on the sample search form, which is rudimentary -and only passes simple booleans correctly at present to the SQL engine) - - -A few canned searches, showing object numbers. Search for: - - -English documents matching Linux OR Debian [link:] - - - -GPL OR Richard Stallman [link:] - - - -invention OR innovation in English language [link:] - - - -copyright in English language documents [link:] - - - -Note that the searches done in this form are case sensitive. - - -Expand those same searches, showing the matching text in each document: - - -English documents matching Linux OR Debian [link:] - - - -GPL OR Richard Stallman [link:] - - - -invention OR innovation in English language [link:] - - - -copyright in English language documents [link:] - - - -Note you may set results either for documents matched and object number -locations within each matched document meeting the search criteria; or display -the names of the documents matched along with the objects (paragraphs) that -meet the search criteria.[^68] - - -- [68]: of this feature when demonstrated to an IBM software innovations evaluator - in 2004 he said to paraphrase: this could be of interest to us. We have large - document management systems, you can search hundreds of thousands of documents - and we can tell you which documents meet your search criteria, but there is no - way we can tell you without opening each document where within each your - matches are found. - -*OCN index mode,* (object citation number) the numbers displayed are relevant -(and may be used to reference the match) in any sisu generated rendition of the -text[^69] the links provided are to the locations of matches within the html -generated by *SiSU*. - - -- [69]: OCN are provided for HTML, XML, pdf ... 
though currently omitted in - plain-text and opendocument format output - -*Paragraph mode,* you may alternatively display the text of each paragraph in -which the match was made, again the object/paragraph numbers are relevant to -any *SiSU* generated/published text. - - -Several options for output - select database to search, show results in index -view (links to locations within text), show results with text, echo search in -form, show what was searched, create and show a "canned url" for search, show -available search fields. Also shows counters: the number of documents in which found -and the number of locations within documents where found. [could consider sorting -by document with most occurrences of the search result]. - - -Earlier version of the search frontend - Simple search, results with files in -which search found, and locations where found within files. - - -Simple search, results with files in which search found, and text object -(paragraph or endnote) where found within files. - - -1.15.7 OTHER FORMS -.................. - -There are other forms as well: YAML file, *Ruby* Marshal dumps, document -pre-processing (processing of documents prior to the steps described here, to -produce input suitable for the program). Snap in a new module as -required/desired; well formed XML, no problem. - - -1.16 CONCORDANCE / WORD MAP OR RUDIMENTARY INDEX -................................................ - -Concordance /WordMaps:[^70] *SiSU* produces a rudimentary index based on the -words within the text, making use of paragraph numbers to identify text -locations. This is generated in html and hyper-linked but identifies the -locations of these words in the other document formats as well. Though it is possible to search -using a search engine, this is a means for browsing an alphabetical list of -words which may suggest other useful content.
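A concordance of this kind amounts to a mapping from each word to the object numbers in which it occurs. The sketch below shows the idea under simple assumptions (lower-cased, ASCII letters only, objects supplied as a dictionary keyed by object number); the function name concordance is hypothetical and the word-splitting rules *SiSU* actually uses are not described here.

```python
import re
from collections import defaultdict

def concordance(objects):
    """Build an alphabetical word index.

    `objects` maps object (paragraph) number -> text.
    Returns a dict of word -> sorted list of object numbers containing it.
    Hypothetical sketch, not SiSU's own code.
    """
    index = defaultdict(set)
    for ocn, text in objects.items():
        # crude tokenisation: runs of ASCII letters, case folded
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(ocn)
    # sort words alphabetically and object numbers numerically
    return {word: sorted(ocns) for word, ocns in sorted(index.items())}

sample = {1: "Free software", 2: "software freedom"}
for word, ocns in concordance(sample).items():
    print(word, ocns)
```

Because the index stores object numbers rather than page numbers, each entry identifies the same location in every output format.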
- - -- [70]: Concordance/ WordMaps introduced 15^th^ August 2002 - -1.17 MANAGED (DOCUMENT) DIRECTORY, DATABASE, OR SITE STRUCTURE -.............................................................. - -*SiSU* builds the web site (or more generically provides a suitable directory -structure) - placing various output texts in the hierarchy of the web-site (or -db), which (for directories) is a sub-directory with the name of the text file. - - -1.18 BATCH PROCESSING -..................... - -*SiSU* is a batch processing tool, handling and transforming multiple (or -individual) documents (in many ways) with a single instruction. - - -1.19 INTEGRATION TO SUPERIOR GNU/LINUX AND UNIX TOOLS -..................................................... - -As should have been noted from the above description of *SiSU*, it makes use of -existing programs found on *Gnu*/Linux and Unix; those already -mentioned include the LaTeX to pdf converters and the database PostgreSQL or -SQLite. - - -1.19.1 BACKUP AND VERSION CONTROL -................................. - -Unix provides many tools for version control. For documents Subversion, CVS and -even the old RCS are useful for the per-document histories they provide. - - -For writing code superior (more recent) version control systems exist. These can -also be used for documents though they tend to take stamps of changes across -the repository as a whole, rather than for each individual file that is -tracked (as CVS and RCS do). My personal preference is for distributed systems -such as Git, Mercurial or Darcs, of which I use Git for both code and -documents. - - -Several backup tools exist. At the base level I tend to use rdiff. - - -1.19.2 EDITOR SUPPORT -..................... - -*SiSU* documents are prepared / marked up in utf-8 text; _you are free to use -the text editor of your choice._ - - -Syntax highlighting is provided for a number of editors, amongst them Vim, -Kwrite, Kate, Gedit and diakonos.
These may be found with configuration -instructions at . Vim [link:] - [^71] as of version 7 has built in syntax highlighting for -*SiSU*. - - -- [71]: - -1.20 MODULAR DESIGN, NEED SOMETHING NEW ADD A MODULE -.................................................... - -Need a new output format that does not already exist? Write a new module. - - -Prefer a new input syntax? You could write a new syntax matching the existing -design, though my personal preference is some uniformity in entry appearance. -Where necessary it has been fairly easy to extend the design parameters. It is -intended to incorporate some additional basic semantic tagging (book, article, -author etc.). However, keeping the requirements for input minimal and -relatively simple has been a design goal. - - -DOCUMENT INFORMATION (METADATA) -******************************* - -METADATA --------- - -Document Manifest @ - - - -*Dublin Core* (DC) - - -/DC tags included with this document are provided here./ - - -DC Title: _SiSU - Description_ - - -DC Creator: _Ralph Amissah_ - - -DC Rights: _Copyright (C) Ralph Amissah 2007, part of SiSU documentation, -License GPL 3_ - - -DC Type: _information_ - - -DC Date created: _2002-11-12_ - - -DC Date issued: _2002-11-12_ - - -DC Date available: _2002-11-12_ - - -DC Date modified: _2007-08-30_ - - -DC Date: _2007-08-30_ - - -*Version Information* - - -Sourcefile: _sisu_description.sst_ - - -Filetype: _SiSU text 0.57_ - - -Sourcefile Digest, MD5(sisu_description.sst)= -_b89ccdad9f6d9c2260d8d383d6b35ccc_ - - -Skin_Digest: -MD5(/home/ralph/grotto/theatre/dbld/builds/sisu/sisu/data/doc/sisu/sisu_markup_samples/sisu_manual/_sisu/skin/doc/skin_sisu_manual.rb)= -_20fc43cf3eb6590bc3399a1aef65c5a9_ - - -*Generated* - - -Document (metaverse) last generated: _Tue Sep 25 02:54:06 +0100 2007_ - - -Generated by: _SiSU_ _0.59.1_ of 2007w39/2 (2007-09-25) - - -Ruby version: _ ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]_ - - - 
-============================================================================== - - title: SiSU - Description - - creator: Ralph Amissah - - rights: Copyright (C) Ralph Amissah 2007, part of SiSU documentation, - License GPL 3 - - type: information - - subject: ebook, epublishing, electronic book, electronic publishing, - electronic document, electronic citation, data structure, - citation systems, search - - date.created: 2002-11-12 - - date.issued: 2002-11-12 - - date.available: 2002-11-12 - - date.modified: 2007-08-30 - - date: 2007-08-30 - - - - - -============================================================================== -nil - -Other versions of this document: -manifest: - http://www.jus.uio.no/sisu/sisu_description/sisu_manifest.html -html: - http://www.jus.uio.no/sisu/sisu_description/toc.html -pdf: - http://www.jus.uio.no/sisu/sisu_description/portrait.pdf - http://www.jus.uio.no/sisu/sisu_description/landscape.pdf -plaintext (plain text): - http://www.jus.uio.no/sisu/sisu_description/plain.txt -at: - http://www.jus.uio.no/sisu -* Generated by: SiSU 0.59.1 of 2007w39/2 (2007-09-25) -* Ruby version: ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux] -* Last Generated on: Tue Sep 25 02:54:08 +0100 2007 -* SiSU http://www.jus.uio.no/sisu -- cgit v1.2.3