| Commit message | Author | Age | Files | Lines |
|
test/test-abstraction-ssp.sh: shell script that generates
.ssp files for all sample documents and diffs against
committed reference files in test/reference/abstraction/.
Usage:
./test/test-abstraction-ssp.sh --generate ./bin/spine-ldc
./test/test-abstraction-ssp.sh ./bin/spine-ldc
Exits 0 if all .ssp files match, 1 if differences found.
Reports changed, missing, and new files.
test/reference/abstraction/: 35 reference .ssp files
generated from the current abstraction, covering all
sample documents including 9-language live-manual.
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
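The comparison the script performs can be sketched in Python as below. This is a minimal sketch, not the shell script itself: the function names `compare_ssp_dirs` and `exit_code` are hypothetical, and it assumes generated and reference files share the same relative paths.

```python
from pathlib import Path


def compare_ssp_dirs(generated: Path, reference: Path) -> dict:
    """Classify .ssp files as changed, missing, or new relative to the references."""
    # Key each file by its path relative to the directory root (assumed layout).
    gen = {p.relative_to(generated).as_posix(): p for p in generated.rglob("*.ssp")}
    ref = {p.relative_to(reference).as_posix(): p for p in reference.rglob("*.ssp")}
    common = gen.keys() & ref.keys()
    return {
        "changed": sorted(k for k in common if gen[k].read_bytes() != ref[k].read_bytes()),
        "missing": sorted(ref.keys() - gen.keys()),  # in reference, not generated
        "new": sorted(gen.keys() - ref.keys()),      # generated, no reference yet
    }


def exit_code(report: dict) -> int:
    """Return 0 if everything matches, 1 if any difference was found."""
    return 1 if any(report.values()) else 0
```

Keying on relative paths rather than bare file names avoids collisions between languages of a multilingual document, if those are stored in separate subdirectories.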
|
Finer-grained control over when .ssp files are produced:
--show-abstraction writes .ssp to OUTPUT/lang/abstraction/
independently of any pod flag
--pod builds pod without .ssp bundled
--pod2 builds pod with .ssp in media/abstraction/
Changes to spine.d:
- show_abstraction() now only responds to its own flag and
pod2, no longer triggered by source_or_pod
- Add pod2 to opts init, getopt, OptActions
- pod() returns true for both --pod and --pod2
- source_or_pod() includes pod2
Changes to source_pod.d:
- Remove per-document pod directory (rmdirRecurse) before
regeneration, ensuring clean slate on every run. This
prevents stale content from previous runs (e.g. a --pod2
run followed by --pod would otherwise leave an outdated
media/abstraction/ directory)
- Gate abstraction directory creation and .ssp bundling on
pod2 flag specifically
Tested: --pod (no .ssp), --pod2 (.ssp in pod + zip),
--show-abstraction (standalone .ssp), --pod after --pod2
(stale abstraction cleaned up). All 35 sample documents pass.
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
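The flag semantics described above can be modelled as follows. This is an illustrative Python sketch, not the D code in spine.d: the `Opts` dataclass and `bundle_ssp_in_pod` are hypothetical names, and treating `--source` as part of `source_or_pod()` is an assumption based on the function's name.

```python
from dataclasses import dataclass


@dataclass
class Opts:
    """Hypothetical stand-in for the parsed command-line flags."""
    show_abstraction: bool = False  # --show-abstraction
    source: bool = False            # --source (assumed to feed source_or_pod)
    pod: bool = False               # --pod
    pod2: bool = False              # --pod2


def pod(opts: Opts) -> bool:
    # A pod is built for both --pod and --pod2.
    return opts.pod or opts.pod2


def source_or_pod(opts: Opts) -> bool:
    return opts.source or opts.pod or opts.pod2


def show_abstraction(opts: Opts) -> bool:
    # Responds only to its own flag and pod2, no longer to source_or_pod.
    return opts.show_abstraction or opts.pod2


def bundle_ssp_in_pod(opts: Opts) -> bool:
    # Only --pod2 places the .ssp under media/abstraction/.
    return opts.pod2
```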
|
Add empty-string guards to array property loops
(.stow_link, .lev4_subtoc, .anchor_tag) so entries with
zero-length values are not emitted. Empty properties have
no value for PEG parsing - absent lines are faster to skip
than matching a property name to find an empty value.
Removes 1488 empty .anchor_tag: lines from Wealth of
Networks .ssp alone.
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
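The guard amounts to filtering zero-length values before writing each line. A minimal Python sketch of the idea, where `emit_array_property` is a hypothetical helper and the `.name: value` line shape is taken from the .ssp format description:

```python
def emit_array_property(name: str, values: list) -> list:
    """Return one '.name: value' line per non-empty value; skip empty strings."""
    return [f".{name}: {value}" for value in values if value]
```

For example, `emit_array_property("anchor_tag", ["", "a1", ""])` yields a single `.anchor_tag: a1` line instead of three lines, two of them empty.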
|
- Add explicit child heading OCN lists to heading objects,
pre-computed in a single O(n) pass over the body section
before serialization. This makes the document tree directly
navigable without scanning - each heading lists its direct
sub-heading OCNs.
- Example output for a chapter heading:
[10] heading :1
.last_descendant: 65
.children: 14 24 42 57
- Implementation: builds an int[][int] map (parent_ocn ->
child heading OCNs) from one pass over the body objects,
then emits .children: during serialization for headings
that have entries in the map.
- The tree was already reconstructable from parent_ocn +
last_descendant_ocn, but .children makes it immediate -
no scanning required to find a heading's sub-structure.
- Tested against all 35 sample documents - zero failures.
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
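The single-pass construction of the parent-to-children map can be sketched in Python as below. The dictionary-based object records and the function names are assumptions made for illustration; the actual implementation is the D `int[][int]` map described above.

```python
from collections import defaultdict


def build_children_map(body_objects: list) -> dict:
    """One O(n) pass: map each parent OCN to its direct child heading OCNs."""
    children = defaultdict(list)
    for obj in body_objects:
        # Only headings contribute; document order is preserved by appending.
        if obj["is_a"] == "heading" and obj.get("parent_ocn") is not None:
            children[obj["parent_ocn"]].append(obj["ocn"])
    return dict(children)


def children_line(ocn: int, children: dict):
    """Return the '.children:' line for a heading, or None if it has no entries."""
    if ocn not in children:
        return None
    return ".children: " + " ".join(str(c) for c in children[ocn])
```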
|
- Make the .ssp format a complete representation of the
document abstraction by serializing all remaining fields
from ObjGenericComposite (only omitting ptr.* runtime
indices which are meaningless outside the in-memory context).
- New fields added:
.ancestors_collapsed: - collapsed level ancestor chain
.dom_status: - DOM structure marked-up tags status[8]
.dom_status_collapsed: - DOM structure collapsed status[8]
.heading_lev_collapsed: - collapsed heading level
.parent_lev: - parent heading level (markup)
.o_n_type: - object numbering type (0=ocn, 1=non, 2=bkidx)
.is_of_type: - para/block type classification
.attrib: - general attributes string
.meta_lang: - block language (group/block/quote)
.meta_syntax: - codeblock syntax from metainfo
.sha256: - hex-encoded SHA-256 digest of object content
.has: images_no_dim - image without dimensions flag
.table_aligns: - column alignment array
.table_walls: - table walls/borders flag
.stow_link: - extracted URLs (one per line)
.heading_lev_anchor: - heading level anchor tag
.segment_epub: - EPUB segment anchor tag
.heading_ancestors_text: - pipe-separated ancestor headings
.lev4_subtoc: - sub-table-of-contents entries (one per line)
.anchor_tag: - additional anchor tags (one per line)
- Tested against all 35 sample documents - zero failures.
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
|
- For heading objects, the identifier was always emitted on the
declaration line (e.g. "[10] heading :1 10") even when it was
just the OCN repeated. Now only emits the identifier when it
differs from the OCN (i.e. when there is a named segment like
"acknowledgments" or "a1"), reducing redundancy.
Before: [10] heading :1 10
After: [10] heading :1
Named segments still appear: [0] heading :1 a1
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
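The rule for the declaration line can be sketched in Python as follows. `heading_declaration` is a hypothetical helper, and passing the `:1` part through as an opaque marker is an assumption based on the examples above.

```python
def heading_declaration(ocn: int, marker: str, identifier: str) -> str:
    """Build a heading declaration line, adding the identifier only if it differs from the OCN."""
    line = f"[{ocn}] heading {marker}"
    if identifier and identifier != str(ocn):
        line += f" {identifier}"  # named segment such as "a1"
    return line
```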
|
- When --source/--pod is used, automatically generate the .ssp
document abstraction and bundle it into the pod at
media/abstraction/{doc_uid}.{lang}.ssp
- This makes show_abstraction implicitly true when source_or_pod
is active, so the .ssp file is generated before the pod
assembler runs (abstraction runs before outputHub, and
source_or_pod is the first task in outputHub).
- Changes:
paths_source.d:
Add abstraction_root() path helper to _PodPaths struct,
following the same pattern as image_root(). Produces
paths like pod/media/abstraction/ for both zpod (inside
zip) and filesystem_open_zpod (open directory).
source_pod.d:
- Create media/abstraction/ directory in
podArchive_directory_tree
- Bundle .ssp file in pod_zipMakeReady: reads from the
abstraction output directory, copies to open pod
directory, adds to zip archive, computes SHA-256 digest
- Write .ssp digest in zipArchiveDigest alongside sstm
and ssi digests
spine.d:
Make show_abstraction() return true when source_or_pod is
active (previously only returned true for explicit
--show-abstraction flag).
- The .ssp is always included when building pods - no exclusion
flag for this experimental feature to keep things simple.
Not generated for non-pod outputs (--text, --html, etc.)
unless --show-abstraction is explicitly passed.
- Tested against all 35 sample documents - zero failures.
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
|
--show-abstraction-db flag to write per-document SQLite database
of document abstraction
(Claude-Code primary assist)
- Add a new output mode that serializes the in-memory document
abstraction to a per-document SQLite database. This complements
the .ssp text format (--show-abstraction) with a queryable
database representation of the same data.
- Schema:
metadata table - key/value pairs for document metadata
(title, creator, dates, rights, classify, identifiers,
language, notes, make settings, doc_has counts)
objects table - one row per document object with columns:
section, seq (position within section), ocn, is_a,
is_of_part, is_of_type, heading_level, identifier,
parent_ocn, last_descendant_ocn, ancestors,
indent/bullet/lang, has_* flags, segment/anchor tags,
table/code properties, text content
Indexed on: section, ocn, parent_ocn, is_a, heading_level
- Uses prepared statements via d2sqlite3 (existing dependency)
for safe and efficient insertion. Each document produces a
standalone .abstraction.db file in the abstraction/ output
directory.
- New files:
src/sisudoc/io_out/create_abstraction_db.d
Follows the same pattern as create_abstraction_txt.d.
Creates schema, populates metadata via key/value inserts,
then iterates all sections writing objects with prepared
statements within a single transaction.
- Changes to spine.d:
- Add "show-abstraction-db" to opts init, getopt, OptActions
- Add to abstraction(), require_processing_files(), and
meta_processing_general() gates
- Insert call at both spineAbstraction sites
- Tested against all 35 sample documents (including 9-language
live-manual) - zero failures. Works standalone or combined
with --show-abstraction and other output flags.
- Example queries the database supports:
SELECT ocn, heading_level, text FROM objects
WHERE is_a = 'heading' AND section = 'body';
SELECT * FROM objects WHERE parent_ocn = 10;
SELECT key, value FROM metadata WHERE key LIKE 'title.%';
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
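To try the example queries without a generated database, a reduced version of the schema can be set up with Python's standard `sqlite3` module. This is a sketch under assumptions: the column subset, the column types, the index names, the `title.main` key, and all sample rows are invented for illustration and are not taken from create_abstraction_db.d.

```python
import sqlite3

# Reduced schema: a subset of the columns listed above, with assumed types.
SCHEMA = """
CREATE TABLE metadata (key TEXT PRIMARY KEY, value TEXT);
CREATE TABLE objects (
    section TEXT NOT NULL,
    seq INTEGER NOT NULL,
    ocn INTEGER,
    is_a TEXT,
    heading_level INTEGER,
    parent_ocn INTEGER,
    last_descendant_ocn INTEGER,
    text TEXT
);
CREATE INDEX idx_objects_section ON objects(section);
CREATE INDEX idx_objects_ocn ON objects(ocn);
CREATE INDEX idx_objects_parent_ocn ON objects(parent_ocn);
"""

# Invented sample rows, for illustration only.
SAMPLE_OBJECTS = [
    ("body", 0, 10, "heading", 1, 0, 65, "Chapter One"),
    ("body", 1, 11, "para", None, 10, None, "First paragraph."),
    ("body", 2, 14, "heading", 2, 10, 23, "Section 1.1"),
]


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    with conn:  # single transaction, parameterized inserts
        conn.execute(
            "INSERT INTO metadata (key, value) VALUES (?, ?)",
            ("title.main", "Sample Document"),
        )
        conn.executemany(
            "INSERT INTO objects VALUES (?, ?, ?, ?, ?, ?, ?, ?)", SAMPLE_OBJECTS
        )
    return conn


def body_headings(conn: sqlite3.Connection) -> list:
    return conn.execute(
        "SELECT ocn, heading_level, text FROM objects "
        "WHERE is_a = 'heading' AND section = 'body' ORDER BY seq"
    ).fetchall()


def children_of(conn: sqlite3.Connection, parent_ocn: int) -> list:
    rows = conn.execute(
        "SELECT ocn FROM objects WHERE parent_ocn = ? ORDER BY seq", (parent_ocn,)
    ).fetchall()
    return [row[0] for row in rows]
```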
|
--show-abstraction flag to write .ssp document abstraction files
- Add a new output mode that serializes the in-memory document
abstraction (produced by spineAbstraction) to a human-readable,
line-oriented text format (.ssp). This captures the full object
model after parsing and abstraction but before output generation.
- The .ssp format uses unambiguous line prefixes:
@section { } - section boundaries (head/toc/body/endnotes/...)
[N] type - object declaration with OCN
.name: value - object properties (only non-defaults)
| content - text content lines
% comment - comments
- New files:
src/sisudoc/io_out/create_abstraction_txt.d
Serializer module following the same template pattern as
metadoc_show_summary.d. Walks doc.abstraction() section by
section, writing metadata preamble (@meta, @make, @doc_has)
then each object with its properties and text content.
Output goes to {output_path}/{lang}/abstraction/{doc}.ssp
- Changes to spine.d:
- Add "show-abstraction" to opts initialization, getopt, and
OptActions struct
- Add show_abstraction to abstraction(), require_processing_files(),
and meta_processing_general() so the flag triggers full document
processing
- Insert call at both spineAbstraction sites (parallel and serial
branches), gated by show_abstraction flag, following the same
pattern as show_config/show_summary/show_make
- Tested against all 35 sample documents (including multilingual
live-manual in 9 languages) - zero failures. Works standalone
(--show-abstraction) or combined with other output flags
(--show-abstraction --html --text). No effect on existing code
paths when the flag is not used.
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
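Because every line starts with an unambiguous prefix, a reader for the format can dispatch on the first character. The Python sketch below illustrates this; `classify_line` is a hypothetical helper, and it assumes lines carry no leading whitespace and that a property name consists of letters, digits, and underscores.

```python
import re

# "[N] type rest": object declaration with OCN.
OBJECT_RE = re.compile(r"^\[(\d+)\]\s+(\S+)(?:\s+(.*))?$")
# ".name: value": object property.
PROPERTY_RE = re.compile(r"^\.([A-Za-z0-9_]+):\s?(.*)$")


def classify_line(line: str) -> tuple:
    """Classify one .ssp line by its prefix and return a tagged tuple."""
    if line.startswith("@"):
        return ("section", line[1:].strip())
    if line.startswith("%"):
        return ("comment", line[1:].strip())
    if line.startswith("|"):
        # Drop the marker and a single following space, if present.
        return ("content", line[2:] if line.startswith("| ") else line[1:])
    match = OBJECT_RE.match(line)
    if match:
        return ("object", int(match.group(1)), match.group(2), match.group(3) or "")
    match = PROPERTY_RE.match(line)
    if match:
        return ("property", match.group(1), match.group(2))
    return ("other", line)
```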
|
- claude contributed src
- processes a zip from a URL, using the system-installed
curl for the download
|
- claude contributed src
- Opens the zip with std.zip.ZipArchive (reads the whole file into
memory)
- Locates pod.manifest inside the archive to discover document paths
and languages
- Extracts markup files (.sst/.ssm/.ssi) as in-memory strings
- Extracts images as in-memory byte arrays
- Extracts conf/dr_document_make if present
- Presents these to the existing pipeline as if they were read from
the filesystem
- Some security mitigations:
- Zip Slip / Path Traversal: Reject entries containing `..` or
starting with `/`; canonicalize resolved paths and verify they
fall within extraction root
- Zip Bomb: Check `ArchiveMember.size` before extracting; enforce
per-file (50MB) and total size limits (500MB)
- Entry Count: Limit number of entries (a pod should have at most
~100 files)
- Path Depth: Limit path depth (maximum 10 path components)
- Symlinks: Verify no symlinks in extracted content before
processing (post-extraction recursive scan)
- Filename Validation: Only allow expected characters; reject null
bytes
- Malformed Zips: Catch `ZipException` from `std.zip.ZipArchive`
constructor
- Cleanup on error
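Several of the pre-extraction checks listed above can be sketched with Python's standard `zipfile` module. This is an illustration of the checks, not the D implementation built on `std.zip`: the function names are hypothetical, the entry limit of 100 is taken from the "~100 files" estimate, and the symlink scan, character whitelist, and cleanup steps are not covered.

```python
import io
import zipfile
from pathlib import PurePosixPath

MAX_ENTRIES = 100                    # assumed from "at most ~100 files"
MAX_FILE_BYTES = 50 * 1024 * 1024    # per-file limit (50MB)
MAX_TOTAL_BYTES = 500 * 1024 * 1024  # total limit (500MB)
MAX_DEPTH = 10                       # maximum path components


def validate_entry_name(name: str) -> None:
    """Reject names that could escape the extraction root or are malformed."""
    if not name or "\x00" in name:
        raise ValueError(f"invalid entry name: {name!r}")
    if name.startswith("/") or ".." in name:
        raise ValueError(f"path traversal attempt: {name!r}")
    if len(PurePosixPath(name).parts) > MAX_DEPTH:
        raise ValueError(f"path too deep: {name!r}")


def validate_archive(data: bytes) -> list:
    """Check an in-memory zip against the limits; return entry names if it passes."""
    try:
        archive = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile as exc:
        raise ValueError("malformed zip archive") from exc
    with archive:
        infos = archive.infolist()
        if len(infos) > MAX_ENTRIES:
            raise ValueError("too many entries")
        total = 0
        for info in infos:
            validate_entry_name(info.filename)
            # Declared uncompressed size, checked before any extraction.
            if info.file_size > MAX_FILE_BYTES:
                raise ValueError(f"entry too large: {info.filename!r}")
            total += info.file_size
            if total > MAX_TOTAL_BYTES:
                raise ValueError("archive exceeds total size limit")
        return [info.filename for info in infos]
```

Checking the declared sizes before extraction only guards against archives that state their sizes honestly, so a real extractor would also need to cap the bytes actually read.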
|
- revert to using GCC 14: the current GCC 15
introduced nullptr in its headers, and DMD's
ImportC parser needs an update to handle it;
monitor and update
- (nix ldc overlay, minor cosmetic)
|
- fixes an issue with .tex files and xetex finding image paths when
run within the latex/ output directory
|
- markup samples pod dir: markup/pod-samples/pod
|
- revisit links (fix later)
|
- preferable, endnote parent object number
available for use (as here in text output,
compare "endnotes, add caller ocn" commit)
|
- spine --text [--output=output path] [markup source]
|
- appears to work, but needs review
|
- .env/ removed as unused & unmaintained
|
- related updates to files
- notes on updating these added (.org)
|
- plus minor housekeeping/tidy
|