From 7d9fced3aee0c451031e57ffd01086ed42d5e428 Mon Sep 17 00:00:00 2001 From: Ralph Amissah Date: Sat, 29 Sep 2007 10:10:13 +0100 Subject: sisu documentation related --- .../sisu/sisu_markup_samples/sisu_manual/sisu.ssm | 8 +-- .../sisu_manual/sisu_interesting_to_whom.ssi | 54 ++++++++++++++++ .../sisu_manual/sisu_introduction.ssi | 58 ----------------- .../sisu_manual/sisu_manual.ssm | 4 ++ .../sisu_manual/sisu_work_needed_and_wishlist.ssi | 75 ++++++++++++++++++++++ 5 files changed, 137 insertions(+), 62 deletions(-) create mode 100644 data/doc/sisu/sisu_markup_samples/sisu_manual/sisu_interesting_to_whom.ssi create mode 100644 data/doc/sisu/sisu_markup_samples/sisu_manual/sisu_work_needed_and_wishlist.ssi diff --git a/data/doc/sisu/sisu_markup_samples/sisu_manual/sisu.ssm b/data/doc/sisu/sisu_markup_samples/sisu_manual/sisu.ssm index 33fbc344..322b3620 100644 --- a/data/doc/sisu/sisu_markup_samples/sisu_manual/sisu.ssm +++ b/data/doc/sisu/sisu_markup_samples/sisu_manual/sisu.ssm @@ -18,9 +18,9 @@ @date.available: 2002-08-28 -@date.modified: 2007-08-30 +@date.modified: 2007-09-29 -@date: 2007-08-30 +@date: 2007-09-29 @level: new=C; break=1; num_top=1 @@ -51,12 +51,12 @@ sisu [-CcFLSVvW] << |sisu_introduction.ssi|@|^| -<< |sisu_help.sst|@|^| - % :B~? SiSU Commands << |sisu_commands.sst|@|^| +<< |sisu_help.sst|@|^| + % :B~? 
SiSU Markup << |sisu_markup.sst|@|^| diff --git a/data/doc/sisu/sisu_markup_samples/sisu_manual/sisu_interesting_to_whom.ssi b/data/doc/sisu/sisu_markup_samples/sisu_manual/sisu_interesting_to_whom.ssi new file mode 100644 index 00000000..b8e5a5d6 --- /dev/null +++ b/data/doc/sisu/sisu_markup_samples/sisu_manual/sisu_interesting_to_whom.ssi @@ -0,0 +1,54 @@ +% SiSU 0.58 + +@title: SiSU + +@subtitle: Who Might Be Interested + +@creator: Ralph Amissah + +@rights: Copyright (C) Ralph Amissah 2007, part of SiSU documentation, License GPL 3 + +@type: information + +@subject: ebook, epublishing, electronic book, electronic publishing, electronic document, electronic citation, data structure, citation systems, search + +@date.created: 2002-08-28 + +@date.issued: 2002-08-28 + +@date.available: 2002-08-28 + +@date.modified: 2007-09-16 + +@date: 2007-09-16 + +@level: new=C; break=1; num_top=1 + +@skin: skin_sisu_manual + +@bold: /Gnu|Debian|Ruby|SiSU/ + +@links: { SiSU Manual }http://www.jus.uio.no/sisu/sisu_manual/ +{ Book Samples and Markup Examples }http://www.jus.uio.no/sisu/SiSU/2.html +{ SiSU @ Wikipedia }http://en.wikipedia.org/wiki/SiSU +{ SiSU @ Freshmeat }http://freshmeat.net/projects/sisu/ +{ SiSU @ Ruby Application Archive }http://raa.ruby-lang.org/project/sisu/ +{ SiSU @ Debian }http://packages.qa.debian.org/s/sisu.html +{ SiSU Download }http://www.jus.uio.no/sisu/SiSU/download.html +{ SiSU Changelog }http://www.jus.uio.no/sisu/SiSU/changelog.html +{ SiSU help }http://www.jus.uio.no/sisu/sisu_manual/sisu_help/ +{ SiSU help sources }http://www.jus.uio.no/sisu/sisu_manual/sisu_help_sources/ + +:A~? @title @creator + +:B~? Who might SiSU interest? + +1~sisu_interest Who might be interested in the SiSU feature set? 
+ +SiSU is most likely to be of interest to people who are working with medium to large volumes of published texts that would like to have them presented in a uniform way that is searchable (either using sisu database integration or an appropriate indexing tool), with the possibility of multiple alternative output formats that may be added to and upgraded/updated over time. SiSU should be of interest to institutions/ organisations/ governments/ individuals with document collections and some technical knowhow that are interested in: + +_* long term maintenance and reducing downstream/future costs of maintaining those document sets for which SiSU is suited. + +_* the ability to output multiple standard format outputs for various purposes. + +_* the implications for search offered diff --git a/data/doc/sisu/sisu_markup_samples/sisu_manual/sisu_introduction.ssi b/data/doc/sisu/sisu_markup_samples/sisu_manual/sisu_introduction.ssi index 9a2e2ddd..53301848 100644 --- a/data/doc/sisu/sisu_markup_samples/sisu_manual/sisu_introduction.ssi +++ b/data/doc/sisu/sisu_markup_samples/sisu_manual/sisu_introduction.ssi @@ -73,61 +73,3 @@ http://www.jus.uio.no/sisu % SiSU is a way of preparing, publishing, managing and searching documents. -1~sisu_how How does sisu work? - -SiSU markup is fairly minimalistic, it consists of: a (largely optional) document header, made up of information about the document (such as when it was published, who authored it, and granting what rights) and any processing instructions; and markup within the substantive text of the document, which is related to document structure and typeface. SiSU must be able to discern the structure of a document, (text headings and their levels in relation to each other), either from information provided in the document header or from markup within the text (or from a combination of both). 
Processing is done against an abstraction of the document comprising of information on the document's structure and its objects,[2] which the program serializes (providing the object numbers) and which are assigned hash sum values based on their content. This abstraction of information about document structure, objects, (and hash sums), provides considerable flexibility in representing documents different ways and for different purposes (e.g. search, document layout, publishing, content certification, concordance etc.), and makes it possible to take advantage of some of the strengths of established ways of representing documents, (or indeed to create new ones). - -1~sisu_feature_summary Summary of features - -_* sparse/minimal markup (clean utf-8 source texts). Documents are prepared in a single UTF-8 file using a minimalistic mnemonic syntax. Typical literature, documents like "War and Peace" require almost no markup, and most of the headers are optional. - -_* markup is easily readable/parsable by the human eye, (basic markup is simpler and more sparse than the most basic HTML), [this may also be converted to XML representations of the same input/source document]. - -_* markup defines document structure (this may be done once in a header pattern-match description, or for heading levels individually); basic text attributes (bold, italics, underscore, strike-through etc.) as required; and semantic information related to the document (header information, extended beyond the Dublin core and easily further extended as required); the headers may also contain processing instructions. SiSU markup is primarily an abstraction of document structure and document metadata to permit taking advantage of the basic strengths of existing alternative practical standard ways of representing documents [be that browser viewing, paper publication, sql search etc.] 
(html, xml, odf, latex, pdf, sql) - -_* for output produces reasonably elegant output of established industry and institutionally accepted open standard formats.[3] takes advantage of the different strengths of various standard formats for representing documents, amongst the output formats currently supported are: - -_1* html - both as a single scrollable text and a segmented document - -_1* xhtml - -_1* XML - both in sax and dom style xml structures for further development as required - -_1* ODF - open document format, the iso standard for document storage - -_1* LaTeX - used to generate pdf - -_1* pdf (via LaTeX) - -_1* sql - population of an sql database, (at the same object level that is used to cite text within a document) - -Also produces: concordance files; document content certificates (md5 or sha256 digests of headings, paragraphs, images etc.) and html manifests (and sitemaps of content). (b) takes advantage of the strengths implicit in these very different output types, (e.g. PDFs produced using typesetting of LaTeX, databases populated with documents at an individual object/paragraph level, making possible granular search (and related possibilities)) - -_* ensuring content can be cited in a meaningful way regardless of selected output format. Online publishing (and publishing in multiple document formats) lacks a useful way of citing text internally within documents (important to academics generally and to lawyers) as page numbers are meaningless across browsers and formats. sisu seeks to provide a common way of pinpoint the text within a document, (which can be utilized for citation and by search engines). The outputs share a common numbering system that is meaningful (to man and machine) across all digital outputs whether paper, screen, or database oriented, (pdf, HTML, xml, sqlite, postgresql), this numbering system can be used to reference content. - -_* Granular search within documents. 
SQL databases are populated at an object level (roughly headings, paragraphs, verse, tables) and become searchable with that degree of granularity, the output information provides the object/paragraph numbers which are relevant across all generated outputs; it is also possible to look at just the matching paragraphs of the documents in the database; [output indexing also work well with search indexing tools like hyperestraier]. - -_* long term maintainability of document collections in a world of changing formats, having a very sparsely marked-up source document base. there is a considerable degree of future-proofing, output representations are "upgradeable", and new document formats may be added. e.g. addition of odf (open document text) module in 2006 and in future html5 output sometime in future, without modification of existing prepared texts - -_* SQL search aside, documents are generated as required and static once generated. - -_* documents produced are static files, and may be batch processed, this needs to be done only once but may be repeated for various reasons as desired (updated content, addition of new output formats, updated technology document presentations/representations) - -_* document source (plaintext utf-8) if shared on the net may be used as input and processed locally to produce the different document outputs - -_* document source may be bundled together (automatically) with associated documents (multiple language versions or master document with inclusions) and images and sent as a zip file called a sisupod, if shared on the net these too may be processed locally to produce the desired document outputs - -_* generated document outputs may automatically be posted to remote sites. - -_* for basic document generation, the only software dependency is Ruby, and a few standard Unix tools (this covers plaintext, HTML, XML, ODF, LaTeX). 
To use a database you of course need that, and to convert the LaTeX generated to pdf, a latex processor like tetex or texlive. - -_* as a developers tool it is flexible and extensible - -Syntax highlighting for SiSU markup is available for a number of text editors. - -SiSU is less about document layout than about finding a way with little markup to be able to construct an abstract representation of a document that makes it possible to produce multiple representations of it which may be rather different from each other and used for different purposes, whether layout and publishing, or search of content - -i.e. to be able to take advantage from this minimal preparation starting point of some of the strengths of rather different established ways of representing documents for different purposes, whether for search (relational database, or indexed flat files generated for that purpose whether of complete documents, or say of files made up of objects), online viewing (e.g. html, xml, pdf), or paper publication (e.g. pdf)... - -the solution arrived at is by extracting structural information about the document (about headings within the document) and by tracking objects (which are serialized and also given hash values) in the manner described. It makes possible representations that are quite different from those offered at present. For example objects could be saved individually and identified by their hashes, with an index of how the objects relate to each other to form a document. 
- diff --git a/data/doc/sisu/sisu_markup_samples/sisu_manual/sisu_manual.ssm b/data/doc/sisu/sisu_markup_samples/sisu_manual/sisu_manual.ssm index 4a4ecef8..0aab18c8 100644 --- a/data/doc/sisu/sisu_markup_samples/sisu_manual/sisu_manual.ssm +++ b/data/doc/sisu/sisu_markup_samples/sisu_manual/sisu_manual.ssm @@ -109,6 +109,10 @@ << |sisu_faq.sst|@|^| +<< |sisu_interesting_to_whom.ssi|@|^| + +<< |sisu_work_needed_and_wishlist.ssi|@|^| + << |sisu_syntax_highlighting.sst|@|^| << |sisu_help_sources.sst|@|^| diff --git a/data/doc/sisu/sisu_markup_samples/sisu_manual/sisu_work_needed_and_wishlist.ssi b/data/doc/sisu/sisu_markup_samples/sisu_manual/sisu_work_needed_and_wishlist.ssi new file mode 100644 index 00000000..de9033e8 --- /dev/null +++ b/data/doc/sisu/sisu_markup_samples/sisu_manual/sisu_work_needed_and_wishlist.ssi @@ -0,0 +1,75 @@ +% SiSU 0.58 + +@title: SiSU + +@subtitle: Work Needed and Wishlist + +@creator: Ralph Amissah + +@rights: Copyright (C) Ralph Amissah 2007, part of SiSU documentation, License GPL 3 + +@type: information + +@subject: ebook, epublishing, electronic book, electronic publishing, electronic document, electronic citation, data structure, citation systems, search + +@date.created: 2002-08-28 + +@date.issued: 2002-08-28 + +@date.available: 2002-08-28 + +@date.modified: 2007-09-16 + +@date: 2007-09-16 + +@level: new=C; break=1; num_top=1 + +@skin: skin_sisu_manual + +@bold: /Gnu|Debian|Ruby|SiSU/ + +@links: { SiSU Manual }http://www.jus.uio.no/sisu/sisu_manual/ +{ Book Samples and Markup Examples }http://www.jus.uio.no/sisu/SiSU/2.html +{ SiSU @ Wikipedia }http://en.wikipedia.org/wiki/SiSU +{ SiSU @ Freshmeat }http://freshmeat.net/projects/sisu/ +{ SiSU @ Ruby Application Archive }http://raa.ruby-lang.org/project/sisu/ +{ SiSU @ Debian }http://packages.qa.debian.org/s/sisu.html +{ SiSU Download }http://www.jus.uio.no/sisu/SiSU/download.html +{ SiSU Changelog }http://www.jus.uio.no/sisu/SiSU/changelog.html +{ SiSU help 
}http://www.jus.uio.no/sisu/sisu_manual/sisu_help/ +{ SiSU help sources }http://www.jus.uio.no/sisu/sisu_manual/sisu_help_sources/ + +:A~? @title @creator + +:B~? Work Needed and Wishlist + +1~sisu_work_needed Work Needed + +SiSU is fairly mature, and for most purposes the syntax and what it is supposed to do are clear. For the most part additions and changes are minor and backward compatible (though some features of interest will require additions to the syntax). + +_* Amongst the most requested features is a way to represent and extract bibliographies from scholarly and other writings. This involves an extension of sisu markup syntax and a new module to extract the bibliography. + +_* Integration of PostgreSQL tsearch2 / gin indexing (which currently needs to be done manually); this has been waiting for the integration of tsearch2 / gin into PostgreSQL main, which is supposed to occur in PostgreSQL 8.3. + +1~sisu_wishlist Wishlist + +SiSU provides a lot of "plumbing" and is readily usable as a tool by those comfortable with marking up documents with an editor. The syntax is fairly easy to learn, especially the subset required to start using SiSU effectively. + +SiSU might also be of interest to developers interested in: + +_* experimenting with the search implications offered + +_* producing additional output formats + +_* producing conversion tools + +_* producing input interfaces (experimenting with additional interfaces for producing sisu source documents) + +Several tools of interest would come under the heading of interface and conversion. Amongst others, the following are of interest: + +_* Converters from various document formats, such as Open Document Text (ODF), MS Word(TM) and Word Perfect(TM), even html. 
The problem here is that one of the most important things for SiSU is being able to recognise the structure of a document, and many documents prepared in other formats have been prepared with a view to representing appearance rather than structure, so heading levels may be "painted" to look right rather than have the correct structural representation. Even if conversion is not perfect, this may serve as a first step in converting documents to SiSU for those with legacy document sets that they would like to have in sisu format. (Once in SiSU it is easier to get out in various other formats, as this is what sisu does, within the constraints of the information that sisu uses to generate output.) + +_* The possibility to save directly from various word processors, and possibly templates within them to assist in making sure the document structure is "understood" by SiSU. + +_* Web interface/front-end, a form-like front end for the writing or submission of sisu documents to a server which uses SiSU to generate output. Headers could be made available as separate small entry forms, with help provided to explain where they might be used. Apart from the most important headers, such as title, author, date and possibly subject, the remainder of the header forms could be placed after the form for substantive content. This would offer a more Web 2.0-like approach to the use of SiSU and the possibility of using it for collaborative editing of content (possibly for documents that are to be finalised/published, as the citation system is most suited to published works). [Collaborative editing is currently possible through use of a collaborative editor such as Gobby, which makes use of the Obby protocol]. + -- cgit v1.2.3