From a72e66db913de3a2e508080c8b1fc8d1342a899b Mon Sep 17 00:00:00 2001 From: Ralph Amissah Date: Tue, 25 Sep 2007 23:23:03 +0100 Subject: remove generated output from main package --- .../sisu_manual/sisu_introduction/plain.txt | 460 --------------------- 1 file changed, 460 deletions(-) delete mode 100644 data/doc/manuals_generated/sisu_manual/sisu_introduction/plain.txt (limited to 'data/doc/manuals_generated/sisu_manual/sisu_introduction/plain.txt') diff --git a/data/doc/manuals_generated/sisu_manual/sisu_introduction/plain.txt b/data/doc/manuals_generated/sisu_manual/sisu_introduction/plain.txt deleted file mode 100644 index 7b7933df..00000000 --- a/data/doc/manuals_generated/sisu_manual/sisu_introduction/plain.txt +++ /dev/null @@ -1,460 +0,0 @@ -SISU - COMMANDS, -RALPH AMISSAH -******************************* - -WHAT IS SISU? -============= - -DESCRIPTION -=========== - -1. INTRODUCTION - WHAT IS SISU? -------------------------------- - -*SiSU* is a system for document markup, publishing (in multiple open standard -formats) and search - - -*SiSU*[^1] is a[^2] framework for document structuring, publishing and search, -comprising of (a) a lightweight document structure and presentation markup -syntax and (b) an accompanying engine for generating standard document format -outputs from documents prepared in sisu markup syntax, which is able to produce -multiple standard outputs that (can) share a common numbering system for the -citation of text within a document. - - -- [1]: "*SiSU* information Structuring Universe" or "Structured information, - Serialized Units". - -- also chosen for the meaning of the Finnish term "sisu". - -- [2]: Unix command line oriented - -*SiSU* is developed under an open source, software libre license (GPL3). It has -been developed in the context of coping with large document sets with evolving -markup related technologies, for which you want multiple output formats, a -common mechanism for cross-output-format citation, and search. - - -*SiSU* both defines a markup syntax and provides an engine that produces open -standards format outputs from documents prepared with *SiSU* markup. From a -single lightly prepared document sisu custom builds several standard output -formats which share a common (text object) numbering system for citation of -content within a document (that also has implications for search). The sisu -engine works with an abstraction of the document's structure and content from -which it is possible to generate different forms of representation of the -document. Significantly *SiSU* markup is more sparse than html and outputs -which include html, LaTeX, landscape and portrait pdfs, Open Document Format -(ODF), all of which can be added to and updated. *SiSU* is also able to -populate SQL type databases at an object level, which means that searches can -be made with that degree of granularity. Results of objects (primarily -paragraphs and headings) can be viewed directly in the database, or just the -object numbers shown - your search criteria is met in these documents and at -these locations within each document. - - -Source document preparation and output generation is a two step process: (i) -document source is prepared, that is, marked up in sisu markup syntax and (ii) -the desired output subsequently generated by running the sisu engine against -document source. Output representations if updated (in the sisu engine) can be -generated by re-running the engine against the prepared source. Using *SiSU* -markup applied to a document, *SiSU* custom builds various standard open output -formats including plain text, HTML, XHTML, XML, OpenDocument, LaTeX or PDF -files, and populate an SQL database with objects[^3] (equating generally to -paragraph-sized chunks) so searches may be performed and matches returned with -that degree of granularity ( e.g. your search criteria is met by these -documents and at these locations within each document). Document output formats -share a common object numbering system for locating content. This is -particularly suitable for "published" works (finalized texts as opposed to -works that are frequently changed or updated) for which it provides a fixed -means of reference of content. - - -- [3]: objects include: headings, paragraphs, verse, tables, images, but not - footnotes/endnotes which are numbered separately and tied to the object from - which they are referenced. - -In preparing a *SiSU* document you optionally provide semantic information -related to the document in a document header, and in marking up the substantive -text provide information on the structure of the document, primarily indicating -heading levels and footnotes. You also provide information on basic text -attributes where used. The rest is automatic, sisu from this information custom -builds[^4] the different forms of output requested. - - -- [4]: i.e. the html, pdf, odf outputs are each built individually and optimised - for that form of presentation, rather than for example the html being a saved - version of the odf, or the pdf being a saved version of the html. - -*SiSU* works with an abstraction of the document based on its structure which -is comprised of its frame[^5] and the objects[^6] it contains, which enables -*SiSU* to represent the document in many different ways, and to take advantage -of the strengths of different ways of presenting documents. The objects are -numbered, and these numbers can be used to provide a common base for citing -material within a document across the different output format types. This is -significant as page numbers are not suited to the digital age, in web -publishing, changing a browser's default font or using a different browser -means that text appears on different pages; and in publishing in different -formats, html, landscape and portrait pdf etc. again page numbers are of no use -to cite text in a manner that is relevant against the different output types. -Dealing with documents at an object level together with object numbering also -has implications for search. - - -- [5]: the different heading levels - -- [6]: units of text, primarily paragraphs and headings, also any tables, poems, - code-blocks - -One of the challenges of maintaining documents is to keep them in a format that -would allow users to use them without depending on a proprietary software -popular at the time. Consider the ease of dealing with legacy proprietary -formats today and what guarantee you have that old proprietary formats will -remain (or can be read without proprietary software/equipment) in 15 years -time, or the way the way in which html has evolved over its relatively short -span of existence. *SiSU* provides the flexibility of outputing documents in -multiple non-proprietary open formats including html, pdf[^7] and the ISO -standard ODF.[^8] Whilst *SiSU* relies on software, the markup is uncomplicated -and minimalistic which guarantees that future engines can be written to run -against it. It is also easily converted to other formats, which means documents -prepared in *SiSU* can be migrated to other document formats. Further security -is provided by the fact that the software itself, *SiSU* is available under -GPL3 a licence that guarantees that the source code will always be open, and -free as in libre which means that that code base can be used updated and -further developed as required under the terms of its license. Another challenge -is to keep up with a moving target. *SiSU* permits new forms of output to be -added as they become important, (Open Document Format text was added in 2006), -and existing output to be updated (html has evolved and the related module has -been updated repeatedly over the years, presumably when the World Wide Web -Consortium (w3c) finalises html 5 which is currently under development, the -html module will again be updated allowing all existing documents to be -regenerated as html 5). - - -- [7]: Specification submitted by Adobe to ISO to become a full open ISO - specification - -- - -- [8]: ISO/IEC 26300:2006 - -The document formats are written to the file-system and available for indexing -by independent indexing tools, whether off the web like Google and Yahoo or on -the site like Lucene and Hyperestraier. - - -*SiSU* also provides other features such as concordance files and document -content certificates, and the working against an abstraction of document -structure has further possibilities for the research and development of other -document representations, the availability of objects is useful for example for -topic maps and the commercial law thesaurus by Vikki Rogers and Al Krtizer, -together with the flexibility of *SiSU* offers great possibilities. - - -*SiSU* is primarily for published works, which can take advantage of the -citation system to reliably reference its documents. *SiSU* works well in a -complementary manner with such collaborative technologies as Wikis, which can -take advantage of and be used to discuss the substance of content prepared in -*SiSU*. - - - - - -2. HOW DOES SISU WORK? ----------------------- - -*SiSU* markup is fairly minimalistic, it consists of: a (largely optional) -document header, made up of information about the document (such as when it was -published, who authored it, and granting what rights) and any processing -instructions; and markup within the substantive text of the document, which is -related to document structure and typeface. *SiSU* must be able to discern the -structure of a document, (text headings and their levels in relation to each -other), either from information provided in the document header or from markup -within the text (or from a combination of both). Processing is done against an -abstraction of the document comprising of information on the document's -structure and its objects,[2] which the program serializes (providing the -object numbers) and which are assigned hash sum values based on their content. -This abstraction of information about document structure, objects, (and hash -sums), provides considerable flexibility in representing documents different -ways and for different purposes (e.g. search, document layout, publishing, -content certification, concordance etc.), and makes it possible to take -advantage of some of the strengths of established ways of representing -documents, (or indeed to create new ones). - - -3. SUMMARY OF FEATURES ----------------------- - -* sparse/minimal markup (clean utf-8 source texts). Documents are prepared in a -single UTF-8 file using a minimalistic mnemonic syntax. Typical literature, -documents like "War and Peace" require almost no markup, and most of the -headers are optional. - - -* markup is easily readable/parsable by the human eye, (basic markup is simpler -and more sparse than the most basic HTML), [this may also be converted to XML -representations of the same input/source document]. - - -* markup defines document structure (this may be done once in a header -pattern-match description, or for heading levels individually); basic text -attributes (bold, italics, underscore, strike-through etc.) as required; and -semantic information related to the document (header information, extended -beyond the Dublin core and easily further extended as required); the headers -may also contain processing instructions. *SiSU* markup is primarily an -abstraction of document structure and document metadata to permit taking -advantage of the basic strengths of existing alternative practical standard -ways of representing documents [be that browser viewing, paper publication, sql -search etc.] (html, xml, odf, latex, pdf, sql) - - -* for output produces reasonably elegant output of established industry and -institutionally accepted open standard formats.[3] takes advantage of the -different strengths of various standard formats for representing documents, -amongst the output formats currently supported are: - - - * html - both as a single scrollable text and a segmented document - - - * xhtml - - - * XML - both in sax and dom style xml structures for further development as - required - - - * ODF - open document format, the iso standard for document storage - - - * LaTeX - used to generate pdf - - - * pdf (via LaTeX) - - - * sql - population of an sql database, (at the same object level that is used - to cite text within a document) - - -Also produces: concordance files; document content certificates (md5 or sha256 -digests of headings, paragraphs, images etc.) and html manifests (and sitemaps -of content). (b) takes advantage of the strengths implicit in these very -different output types, (e.g. PDFs produced using typesetting of LaTeX, -databases populated with documents at an individual object/paragraph level, -making possible granular search (and related possibilities)) - - -* ensuring content can be cited in a meaningful way regardless of selected -output format. Online publishing (and publishing in multiple document formats) -lacks a useful way of citing text internally within documents (important to -academics generally and to lawyers) as page numbers are meaningless across -browsers and formats. sisu seeks to provide a common way of pinpoint the text -within a document, (which can be utilized for citation and by search engines). -The outputs share a common numbering system that is meaningful (to man and -machine) across all digital outputs whether paper, screen, or database -oriented, (pdf, HTML, xml, sqlite, postgresql), this numbering system can be -used to reference content. - - -* Granular search within documents. SQL databases are populated at an object -level (roughly headings, paragraphs, verse, tables) and become searchable with -that degree of granularity, the output information provides the -object/paragraph numbers which are relevant across all generated outputs; it is -also possible to look at just the matching paragraphs of the documents in the -database; [output indexing also work well with search indexing tools like -hyperestraier]. - - -* long term maintainability of document collections in a world of changing -formats, having a very sparsely marked-up source document base. there is a -considerable degree of future-proofing, output representations are -"upgradeable", and new document formats may be added. e.g. addition of odf -(open document text) module in 2006 and in future html5 output sometime in -future, without modification of existing prepared texts - - -* SQL search aside, documents are generated as required and static once -generated. - - -* documents produced are static files, and may be batch processed, this needs -to be done only once but may be repeated for various reasons as desired -(updated content, addition of new output formats, updated technology document -presentations/representations) - - -* document source (plaintext utf-8) if shared on the net may be used as input -and processed locally to produce the different document outputs - - -* document source may be bundled together (automatically) with associated -documents (multiple language versions or master document with inclusions) and -images and sent as a zip file called a sisupod, if shared on the net these too -may be processed locally to produce the desired document outputs - - -* generated document outputs may automatically be posted to remote sites. - - -* for basic document generation, the only software dependency is *Ruby*, and a -few standard Unix tools (this covers plaintext, HTML, XML, ODF, LaTeX). To use -a database you of course need that, and to convert the LaTeX generated to pdf, -a latex processor like tetex or texlive. - - -* as a developers tool it is flexible and extensible - - -Syntax highlighting for *SiSU* markup is available for a number of text -editors. - - -*SiSU* is less about document layout than about finding a way with little -markup to be able to construct an abstract representation of a document that -makes it possible to produce multiple representations of it which may be rather -different from each other and used for different purposes, whether layout and -publishing, or search of content - - -i.e. to be able to take advantage from this minimal preparation starting point -of some of the strengths of rather different established ways of representing -documents for different purposes, whether for search (relational database, or -indexed flat files generated for that purpose whether of complete documents, or -say of files made up of objects), online viewing (e.g. html, xml, pdf), or -paper publication (e.g. pdf)... - - -the solution arrived at is by extracting structural information about the -document (about headings within the document) and by tracking objects (which -are serialized and also given hash values) in the manner described. It makes -possible representations that are quite different from those offered at -present. For example objects could be saved individually and identified by -their hashes, with an index of how the objects relate to each other to form a -document. - - -DOCUMENT INFORMATION (METADATA) -******************************* - -METADATA --------- - -Document Manifest @ - - - -*Dublin Core* (DC) - - -/DC tags included with this document are provided here./ - - -DC Title: _SiSU - Commands_ - - -DC Creator: _Ralph Amissah_ - - -DC Rights: _Copyright (C) Ralph Amissah 2007, part of SiSU documentation, -License GPL 3_ - - -DC Type: _information_ - - -DC Date created: _2002-08-28_ - - -DC Date issued: _2002-08-28_ - - -DC Date available: _2002-08-28_ - - -DC Date modified: _2007-09-16_ - - -DC Date: _2007-09-16_ - - -*Version Information* - - -Sourcefile: _sisu_introduction.sst_ - - -Filetype: _SiSU text 0.58_ - - -Sourcefile Digest, MD5(sisu_introduction.sst)= -_877333106803c1fc864bccdbd0c667e2_ - - -Skin_Digest: -MD5(/home/ralph/grotto/theatre/dbld/builds/sisu/sisu/data/doc/sisu/sisu_markup_samples/sisu_manual/_sisu/skin/doc/skin_sisu_manual.rb)= -_20fc43cf3eb6590bc3399a1aef65c5a9_ - - -*Generated* - - -Document (metaverse) last generated: _Tue Sep 25 02:52:52 +0100 2007_ - - -Generated by: _SiSU_ _0.59.1_ of 2007w39/2 (2007-09-25) - - -Ruby version: _ ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]_ - - - -============================================================================== - - title: SiSU - Commands - - creator: Ralph Amissah - - rights: Copyright (C) Ralph Amissah 2007, part of SiSU documentation, - License GPL 3 - - type: information - - subject: ebook, epublishing, electronic book, electronic publishing, - electronic document, electronic citation, data structure, - citation systems, search - - date.created: 2002-08-28 - - date.issued: 2002-08-28 - - date.available: 2002-08-28 - - date.modified: 2007-09-16 - - date: 2007-09-16 - - - - - -============================================================================== -nil - -Other versions of this document: -manifest: - http://www.jus.uio.no/sisu/sisu_introduction/sisu_manifest.html -html: - http://www.jus.uio.no/sisu/sisu_introduction/toc.html -pdf: - http://www.jus.uio.no/sisu/sisu_introduction/portrait.pdf - http://www.jus.uio.no/sisu/sisu_introduction/landscape.pdf -plaintext (plain text): - http://www.jus.uio.no/sisu/sisu_introduction/plain.txt -at: - http://www.jus.uio.no/sisu -* Generated by: SiSU 0.59.1 of 2007w39/2 (2007-09-25) -* Ruby version: ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux] -* Last Generated on: Tue Sep 25 02:52:53 +0100 2007 -* SiSU http://www.jus.uio.no/sisu -- cgit v1.2.3