ENTREACTE
Architecture

Building GNN glyph graphs from skeletal representations

After reversed binarization extracts the différance objects — the intervals between characters — the next stage of the Entreacte pipeline converts them into a graph. Each interval becomes a node. Spatial proximity between intervals becomes an edge. The result is a graph neural network input that encodes the relational geometry of the entire document.

On the Derrida benchmark page, this produces 1,675 nodes and 17,831 edges. The density of the graph, roughly 10.6 edges per node (17,831 / 1,675), reflects the regularity of typeset Latin text. A handwritten manuscript produces a sparser, more irregular graph; an Arabic newspaper produces a denser one, reflecting the cursive connectivity of Arabic script.

From skeleton to graph

The pipeline proceeds in three steps. First, the binary image is skeletonised using Zhang-Suen thinning — a topology-preserving algorithm that reduces each connected component to a single-pixel-wide medial axis. This skeleton captures the structural form of each interval without the noise of the original pixel boundary.
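For readers who want to see the thinning step concretely, here is a minimal pure-Python sketch of the textbook Zhang-Suen algorithm. It is not the Entreacte implementation: the function name `zhang_suen_thin` and the list-of-lists image format (1 = foreground) are assumptions made for illustration, and pixels outside the image are treated as background.

```python
from typing import List

Image = List[List[int]]


def zhang_suen_thin(image: Image) -> Image:
    """Thin a binary image (1 = foreground) with the Zhang-Suen algorithm.

    Illustrative sketch: out-of-bounds pixels count as background, and a new
    image is returned so the input is left unmodified.
    """
    img = [row[:] for row in image]
    rows = len(img)
    cols = len(img[0]) if rows else 0

    def px(r: int, c: int) -> int:
        # Neighbours outside the image are treated as background (0).
        return img[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    def neighbours(r: int, c: int) -> List[int]:
        # P2..P9, clockwise starting from the pixel directly above (north).
        return [px(r - 1, c), px(r - 1, c + 1), px(r, c + 1), px(r + 1, c + 1),
                px(r + 1, c), px(r + 1, c - 1), px(r, c - 1), px(r - 1, c - 1)]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):  # the two sub-iterations of each pass
            to_delete = []
            for r in range(rows):
                for c in range(cols):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    p2, _, p4, _, p6, _, p8, _ = p
                    b = sum(p)  # B: number of foreground neighbours
                    # A: number of 0 -> 1 transitions in the cycle P2..P9, P2
                    a = sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)
                    if not (2 <= b <= 6 and a == 1):
                        continue
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if ok:
                        to_delete.append((r, c))
            # Delete only after the whole sub-iteration has been scanned.
            for r, c in to_delete:
                img[r][c] = 0
            if to_delete:
                changed = True
    return img
```

The first sub-iteration strips south and east boundary pixels plus north-west corners; the second strips north and west boundary pixels plus south-east corners. The conditions B ≥ 2 and A = 1 protect line endpoints and pixels whose removal would split a component, which is what makes the thinning topology-preserving. For full-size pages, an optimised library routine (for example, scikit-image's `skimage.morphology.skeletonize`) would be the practical choice.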

Second, connected components of the skeleton are extracted and filtered. Components below a minimum area threshold are discarded as noise. Each surviving component becomes a différance object with a bounding box, centroid, and area measurement.
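The component-extraction step can be sketched the same way. This version assumes 8-connectivity (the usual choice for one-pixel-wide skeletons) and measures area as the number of skeleton pixels; the post does not say whether area is taken from the skeleton or from the original interval. `DifferanceObject`, `extract_objects`, and `min_area` are hypothetical names, and no particular threshold value is implied.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DifferanceObject:
    bbox: Tuple[int, int, int, int]  # (min_row, min_col, max_row, max_col), inclusive
    centroid: Tuple[float, float]    # (row, col): mean of the pixel coordinates
    area: int                        # number of skeleton pixels in the component


def extract_objects(skeleton: List[List[int]], min_area: int) -> List[DifferanceObject]:
    """Label 8-connected components and keep those with area >= min_area (hypothetical)."""
    rows = len(skeleton)
    cols = len(skeleton[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not skeleton[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill over the 8-neighbourhood.
            pixels = []
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and skeleton[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            if len(pixels) < min_area:
                continue  # below the area threshold: discard as noise
            rs = [p[0] for p in pixels]
            cs = [p[1] for p in pixels]
            objects.append(DifferanceObject(
                bbox=(min(rs), min(cs), max(rs), max(cs)),
                centroid=(sum(rs) / len(rs), sum(cs) / len(cs)),
                area=len(pixels),
            ))
    return objects
```

With `min_area=2`, an isolated single pixel is dropped while a four-pixel diagonal-connected stroke survives as one object with its bounding box, centroid, and area filled in.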

Third, a proximity graph is constructed. For each pair of objects within a spatial threshold, an edge is added. The threshold is adaptive — calibrated to the median object size on the page — so the graph density remains consistent across documents of different resolutions and type sizes.
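A minimal sketch of the adaptive proximity graph follows. The post does not specify the distance measure, the size measure, or the multiplier, so this version assumes Euclidean distance between centroids, a per-object `sizes` value (for example, the longer bounding-box side), and a hypothetical `scale` constant applied to the median size.

```python
import math
from statistics import median
from typing import List, Tuple

Point = Tuple[float, float]


def proximity_graph(centroids: List[Point], sizes: List[float],
                    scale: float = 2.0) -> List[Tuple[int, int]]:
    """Connect every pair of objects whose centroids lie within an adaptive radius.

    The radius is `scale` (a hypothetical tuning constant) times the median
    object size, so it grows and shrinks with resolution and type size.
    """
    if not centroids:
        return []
    radius = scale * median(sizes)
    edges = []
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if math.dist(centroids[i], centroids[j]) <= radius:
                edges.append((i, j))  # undirected edge, stored once
    return edges


def edges_per_node(edges: List[Tuple[int, int]], n_nodes: int) -> float:
    """Edge-to-node ratio, the density figure quoted for the benchmark page."""
    return len(edges) / n_nodes if n_nodes else 0.0
```

Because the radius scales with the median size, scanning the same page at twice the resolution doubles both the distances and the radius and leaves the edge set unchanged. The all-pairs loop is O(n²); for the 1,675 objects on the benchmark page that is about 1.4 million distance checks, and a spatial index such as a k-d tree would be the usual way to cut that down.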

What message passing reveals

The GNN layer applies two rounds of message passing over this graph. Each node aggregates information from its neighbours, producing an updated feature vector that encodes not just the object's own geometry but its local context. An interval that sits between two narrow character strokes receives different messages than one that sits in the white space between words.
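The post does not name the aggregation function, so the sketch below assumes a simple mean aggregator in the style of GraphSAGE: in each round, a node's new vector is ReLU(W_self · h_v + W_neigh · mean of its neighbours' vectors). The function names and weights are hypothetical, and for brevity the same square weight matrices are reused in both rounds, whereas a trained model would normally learn separate weights per round.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def message_passing(features: List[List[float]], edges: List[Tuple[int, int]],
                    w_self: Matrix, w_neigh: Matrix, rounds: int = 2) -> List[List[float]]:
    """Mean-aggregation message passing over an undirected graph.

    Each round: h_v <- ReLU(W_self h_v + W_neigh * mean(h_u for neighbours u)).
    A node with no neighbours receives a zero message.
    """
    n = len(features)
    adj: List[List[int]] = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    h = [list(f) for f in features]
    for _ in range(rounds):
        new_h = []
        for v in range(n):
            dim = len(h[v])
            if adj[v]:
                msg = [sum(h[u][k] for u in adj[v]) / len(adj[v]) for k in range(dim)]
            else:
                msg = [0.0] * dim
            z = [a + b for a, b in zip(matvec(w_self, h[v]), matvec(w_neigh, msg))]
            new_h.append([max(0.0, x) for x in z])  # ReLU
        h = new_h  # update all nodes synchronously
    return h
```

On a three-node path 0-1-2 with scalar features [1, 0, 0] and both weights set to 1, one round gives [1.0, 0.5, 0.0] and two rounds give [1.5, 1.0, 0.5]. Node 2 only picks up node 0's signal in the second round, which is why two rounds let each interval's embedding reflect context up to two hops away.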

After message passing, the node embeddings are aggregated into a document-level representation. This representation feeds the layout confidence score — a measure of how regularly the glyph graph is structured. Well-formed typeset documents score close to 1.0. Degraded or handwritten documents score lower, and the score itself is a useful quality signal for downstream processing.
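As a hedged illustration of the readout, the sketch assumes mean pooling followed by a logistic head, so the score lands in (0, 1) like the one described above. Neither choice is confirmed by the post, and `mean_readout`, `layout_confidence`, `weights`, and `bias` are hypothetical; in practice the head would be trained.

```python
import math
from typing import List


def mean_readout(embeddings: List[List[float]]) -> List[float]:
    """Average node embeddings into one document-level vector."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]


def layout_confidence(embeddings: List[List[float]], weights: List[float],
                      bias: float) -> float:
    """Map the pooled embedding to (0, 1) with a logistic head (hypothetical weights)."""
    g = mean_readout(embeddings)
    logit = sum(w * x for w, x in zip(weights, g)) + bias
    # Numerically stable sigmoid: avoid overflow in exp for large |logit|.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)
```

Mean pooling is invariant to node order, which matters here because the order in which components happen to be labelled carries no meaning about the page.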

Structural encoding without language knowledge

The key property of this representation is that it requires no language-specific knowledge. The graph is built entirely from geometric relationships between intervals. A model trained on the graph structure of Latin documents can be applied to Arabic documents, CJK documents, or ancient scripts without modification — because the graph encodes spatial relationships, not linguistic content.

This is the foundation for the IntervalGlyphNet classifier. By training a neural network on the graph structure of documents in different scripts, we achieve 97.1% classification accuracy on seven script families — without ever looking at a single character glyph. The void between characters is sufficient.