SEM — Mathematical Apparatus (Capability Catalog)
A public-facing catalog of the operators SEM offers, what each is for,
and which entry points of the sem_cython12 library back them.
This document describes WHAT the apparatus does and WHERE to use it.
It does not describe HOW any operator works internally — algorithms,
formulas, lemmas and proofs are intentionally not reproduced here.
Conventions
- "Item" / "world" / "observation": one row of input data. Items live
in some payload space (real numbers, vectors, matrices, sampled
functions, sampled manifolds, distributions, complex amplitudes,
time-series windows, recursive concept trees) — the apparatus
treats them uniformly via a small set of structural operators.
- "Concept": a subset of items that share structural meaning. The
apparatus can either be told the concepts (labelled mode) or
discover them from data (unsupervised mode).
- "Witness": an item whose structural position carries information
beyond merely belonging to one concept.
- "Verdict": the system's qualified output for a new observation, one
of confident, gap, incoherent (see §4.6).
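The apparatus is parameter-free and threshold-free for the caller:
there are no fitting parameters, no numeric cut-offs, no fidelity
knobs. The lam argument accepted by the §1 primitives is the kernel
scale, which §4.1 derives from the data instead of leaving it to
manual tuning.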
1. Structural similarity primitives
These are the lowest-level building blocks. Each is exposed directly
in sem_cython12.wrapper.
1.1 Pairwise similarity
| Purpose | Score how close a query item is to the most similar member of a reference set. |
| Output | A score in [0, 1] per query (1 = the query coincides with a reference member; values near 0 = far from every member). |
| Applications | Membership tests, retrieval, anomaly detection, k-nearest-neighbour pre-filtering, similarity-weighted aggregation. |
| Cython entry point | batch_max_similarity(X_query, X_members, lam) |
1.2 Multi-class similarity matrix
| Purpose | The same operation applied across K independent reference sets in one call, returning a (Q, K) score matrix. |
| Applications | Multi-class classification scoring, multi-criterion membership, class-confusion matrices, support-vector inputs to higher-level filters. |
| Cython entry point | concept_support_matrix(X_query, member_mats, lam) |
1.3 Pairwise distance matrix
| Purpose | Symmetric (N, N) distance matrix between rows of X. |
| Applications | Graph construction, clustering, scale estimation, downstream filtering and ranking. |
| Cython entry point | pairwise_distances(X) |
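The output contract above can be illustrated with a small pure-Python reference. This is only a sketch of the documented shape and symmetry, not the library's implementation: the name pairwise_distances_ref is hypothetical, and the Euclidean metric is an assumption suggested by the Euclidean embedding mentioned in §5.

```python
import math

def pairwise_distances_ref(X):
    """Reference sketch (hypothetical helper): symmetric (N, N) matrix of
    Euclidean distances between the rows of X. The metric is an assumption."""
    n = len(X)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(X[i], X[j])
            D[i][j] = d  # fill both triangles so the result is symmetric
            D[j][i] = d
    return D

X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = pairwise_distances_ref(X)
for row in D:
    print(row)
```

A reference like this is mainly useful as a cross-check on small inputs when wiring the real entry point into a pipeline.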
1.4 Nearest-neighbour distance vector
| Purpose | For each row, the minimum positive distance to any other row. Rows with no positive-distance neighbour receive inf. |
| Applications | Local-density estimation, intrinsic-scale derivation, duplicate detection, outlier identification. |
| Cython entry point | nn_distances(X) |
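The rule stated in the Purpose row (minimum positive distance, inf when none exists) can be written out as a brute-force reference. The sketch below assumes a Euclidean metric, and the name nn_distances_ref is hypothetical; it shows the documented contract only and says nothing about how the library computes it.

```python
import math

def nn_distances_ref(X):
    """Reference sketch (hypothetical helper): for each row, the smallest
    strictly positive Euclidean distance to any other row, or inf if there
    is no such neighbour. The metric is an assumption."""
    out = []
    for i, xi in enumerate(X):
        best = math.inf
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = math.dist(xi, xj)
            if 0.0 < d < best:  # exact duplicates (d == 0) are skipped
                best = d
        out.append(best)
    return out

# Rows 0 and 1 are duplicates, so each looks past the other to row 2.
nn = nn_distances_ref([[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]])
# With only duplicates present, no positive-distance neighbour exists.
nn_dupes = nn_distances_ref([[1.0, 1.0], [1.0, 1.0]])
print(nn, nn_dupes)
```

Skipping zero distances is what makes the result usable for duplicate detection: a duplicate row still reports the distance to its nearest distinct neighbour.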
2. Multi-criterion filtering primitives
Given a real-valued matrix S of shape (N, k) (rows are items,
columns are independent criteria — each in maximisation orientation),
these primitives identify structurally informative subsets of rows.
2.1 Best-tradeoff filter
| Purpose | Mask the rows that survive a multi-objective best-tradeoff filter (i.e. items that are not strictly worse than another item on every criterion). |
| Applications | Multi-objective optimisation frontier, concept-membership trade-off, candidate winnowing before further analysis. |
| Cython entry point | pareto_core_mask(S) |
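The survival rule in the Purpose row can be read literally as follows. This is a minimal sketch assuming that "strictly worse on every criterion" means another row is strictly larger in every column; the name pareto_core_mask_ref is hypothetical, and the handling of ties in the real entry point is not documented here.

```python
def pareto_core_mask_ref(S):
    """Reference sketch (hypothetical helper): True for every row of S that
    is NOT strictly worse than some other row on every criterion.
    Columns are assumed to be in maximisation orientation."""
    mask = []
    for row in S:
        dominated = any(
            all(o > r for o, r in zip(other, row))  # other beats row everywhere
            for other in S
        )
        mask.append(not dominated)
    return mask

S = [[0.9, 0.1], [0.5, 0.5], [0.4, 0.4], [0.1, 0.9]]
mask = pareto_core_mask_ref(S)
print(mask)
```

In this example only the third row is dropped, because the second row exceeds it on both criteria; the other three rows each win on at least one trade-off.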
2.2 One-sided peak flagging
| Purpose | Flag row/column pairs where the row is the column-wise winner but contributes nothing on the remaining columns, i.e. items that "peak" on a single criterion alone. |
| Applications | Removing items that are only locally informative; finding cross-criterion contributors; bridge identification. |
| Cython entry point | one_sided_mask(S) |
2.3 Non-redundant witness identification
| Purpose | The subset of rows that survive both 2.1 and 2.2: items that contribute meaningfully across multiple criteria, not just on one. |
| Applications | Bridge-witness selection between concept regions, structurally informative subset extraction, downstream gap analysis. |
| Cython entry point | non_redundant_witnesses(S) |
3. Incremental aggregation primitive
3.1 Fused centroid + radius update
| Purpose | One-pass bulk update for an incremental aggregation step. Given F reference items, each summarised by a centre vector and a radius (representing the dispersion of its cur_arity underlying points), and A candidate new contributions, produce all F * A updated (centre, radius) pairs that result from appending one candidate to one reference item. |
| Applications | Streaming centroid / radius maintenance, candidate-frontier expansion in multi-stage selection, online aggregation pipelines. |
| Cython entry point | extend_frontier_kernel(cur_centers, cur_radii, new_emb, cur_arity) |
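To make the F * A output shape concrete, the sketch below performs such an update under explicit assumptions that the catalog does not confirm: the radius is taken to be the root-mean-square distance of the underlying points to the centre, cur_arity is a single integer shared by all reference items, and results are ordered reference-major. The name extend_frontier_ref is hypothetical; the library's own radius definition may differ.

```python
import math

def extend_frontier_ref(cur_centers, cur_radii, new_emb, cur_arity):
    """Illustrative sketch (hypothetical helper, NOT the library kernel).
    ASSUMPTION: radius = RMS distance of the n underlying points to the centre.
    Returns F * A (centre, radius) results, one per (reference, candidate) pair."""
    n = cur_arity
    centers, radii = [], []
    for c, r in zip(cur_centers, cur_radii):
        for x in new_emb:
            # Running mean: fold one new point into a centre built from n points.
            new_c = [(n * ci + xi) / (n + 1) for ci, xi in zip(c, x)]
            # Welford-style update of the summed squared deviations (n * r^2).
            d2 = sum((xi - ci) ** 2 for ci, xi in zip(c, x))
            new_r2 = (n * r * r + n * d2 / (n + 1)) / (n + 1)
            centers.append(new_c)
            radii.append(math.sqrt(new_r2))
    return centers, radii

# Reference 0 summarises the points (0, 0) and (2, 0): centre (1, 0), RMS radius 1.
cur_centers = [[1.0, 0.0], [10.0, 10.0]]
cur_radii = [1.0, 0.0]
new_emb = [[1.0, 3.0], [1.0, 0.0]]
centers, radii = extend_frontier_ref(cur_centers, cur_radii, new_emb, cur_arity=2)
print(len(centers), centers[0], radii[0])
```

Computing every pairing in one pass keeps the update independent per (reference, candidate) pair, which is what allows a bulk kernel to fuse the centre and radius work.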
4. Higher-level apparatus
Built on the primitives in §1–§3. These are the operators that
distinguish SEM as a reasoning system rather than a computation
library. Their internal construction is not reproduced here; the
"Cython entry points used" row of each table lists the public
primitives the operator composes.
4.1 Intrinsic scale
| Purpose | Derive the kernel scale from the data's own structural geometry, so that no manual lam value is ever required. |
| Applications | Any pipeline that wants the scale property to be a function of the data, not a tuning knob; cross-application portability. |
| Cython entry points used | nn_distances, pairwise_distances |
4.2 Concept discovery
| Purpose | Group observations into structurally coherent regions without using labels, ML training, or numeric thresholds. Returns the concepts the data itself supports. |
| Applications | Unsupervised classification, regime identification, exploratory analysis, foundation for downstream operators. |
| Cython entry points used | pairwise_distances, nn_distances, pareto_core_mask |
4.3 Relational hypothesis generation
| Purpose | Enumerate candidate structural relationships between concepts (pair-wise and higher-arity) and rank them by support. |
| Applications | Discovering laws / regularities between groups, cross-concept analysis, scientific structure recovery. |
| Cython entry points used | concept_support_matrix, pareto_core_mask, extend_frontier_kernel |
4.4 Semantic gap detection
| Purpose | Identify positions in structural space where the data should produce a witness bridging two or more concepts but does not. |
| Applications | Detecting missing variables, hidden mediators, unobserved confounders; identifying where additional measurement would resolve ambiguity. |
| Cython entry points used | concept_support_matrix, non_redundant_witnesses |
4.5 Prototype construction
| Purpose | Predict the structural features of an item that should exist between known concepts but has not yet been observed. |
| Applications | Drug-candidate suggestion, missing-mediator prediction, "what if" scenario generation, hypothesis-driven data acquisition. |
| Cython entry points used | batch_max_similarity, concept_support_matrix |
4.6 Verdict-qualified inference
| Purpose | Decide which concept best explains a new observation, returning one of three outcomes: confident (a single concept dominates), gap (multiple concepts are equally admissible), incoherent (no concept admits the observation consistently). |
| Applications | Decision-support systems that must abstain when ambiguous, safety-critical classification, regime change detection, automated triage. |
| Cython entry points used | concept_support_matrix, pareto_core_mask, batch_max_similarity |
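The three-way outcome lends itself to a small consumer-side type. The sketch below is an assumption-laden illustration, not the SEM operator: it presumes some upstream stage has already produced the set of concepts that admit the observation, and it simply maps the size of that set onto the three verdict names. Verdict and qualify are hypothetical names.

```python
from enum import Enum

class Verdict(Enum):
    CONFIDENT = "confident"    # exactly one concept admits the observation
    GAP = "gap"                # several concepts are equally admissible
    INCOHERENT = "incoherent"  # no concept admits the observation

def qualify(admissible):
    """Hypothetical helper: map a set of admissible concept ids (assumed to
    come from an upstream stage) to (verdict, winning concept or None)."""
    if len(admissible) == 1:
        return Verdict.CONFIDENT, next(iter(admissible))
    if len(admissible) > 1:
        return Verdict.GAP, None
    return Verdict.INCOHERENT, None

# A decision-support caller acts only on a confident verdict and abstains otherwise.
for candidates in ({"A"}, {"A", "B"}, set()):
    verdict, concept = qualify(candidates)
    action = f"assign to {concept}" if verdict is Verdict.CONFIDENT else "abstain"
    print(verdict.value, "->", action)
```

Keeping gap and incoherent as distinct values lets a caller treat "needs more measurement" differently from "does not fit any known concept".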
4.7 Lifecycle / dominance verification
| Purpose | When a real observation arrives, decide whether it confirms, displaces, or co-exists with a previously predicted prototype. Maintains the prototype's status across its lifetime. |
| Applications | Continuous-learning pipelines, theory revision under new evidence, audit-trail-preserving inference. |
| Cython entry points used | pareto_core_mask |
4.8 Hierarchical recursion
| Purpose | Apply every operator above to recursive concept trees, i.e. concepts whose members are themselves concepts. Operators bubble through the hierarchy and remain mathematically consistent at every level. |
| Applications | Taxonomies, organisational hierarchies, multi-scale analysis (chemical → biological → organism, file → folder → project, etc.). |
| Cython entry points used | the operators above, recursively |
4.9 Streaming kNN graph maintenance
| Purpose | Maintain an exact k-nearest-neighbour graph as items are added or removed one at a time, without rebuilding from scratch on each update. |
| Applications | Online time-series ingest, sliding-window analytics, sensor-stream monitoring, real-time anomaly detection. |
| Cython entry points used | pairwise_distances, nn_distances (on the contiguous buffer); scipy.spatial.cKDTree is used internally above 1000 items for exact queries (typically O(log N) each in low dimensions), with no fidelity knob. |
4.10 Time-series streaming model
| Purpose | A complete reasoning model over sliding windows of a stream: state extraction, transition modelling, intrinsic-scale maintenance, and verdict-qualified prediction on novel windows. Can project high-dimensional windows to lower dimensions when configured to do so. |
| Applications | Multivariate time-series classification, regime detection, online anomaly identification, signal-quality forecasting. |
| Cython entry points used | nn_distances (intrinsic scale), concept_support_matrix (verdict), the streaming-kNN apparatus from 4.9 |
5. Composition properties
The operators in §1–§4 compose along several axes:
- Across payload types: the same operator works for scalars,
vectors, matrices, tensors, functions, manifolds, complex states,
distributions, time-series windows. The caller supplies the
appropriate distance function or, equivalently, an embedding into
Euclidean space.
- Across hierarchy levels: concepts can themselves be members of
parent concepts; operators recurse through the tree (§4.8).
- Under wrapping: stochastic and temporal extensions can be
layered over any base payload type. Triple compositions like
"hierarchy of stochastic time-series" are admissible and produce
consistent results at every level.
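The equivalence between supplying a distance function and supplying a Euclidean embedding, mentioned in the first bullet, can be checked on a simple payload type. The sketch below uses matrices: flattening a matrix into a vector makes the ordinary Euclidean distance coincide with the Frobenius distance between the matrices. The helper names are hypothetical and the choice of Frobenius distance is an illustrative assumption, not a statement about which distance SEM uses for matrix payloads.

```python
import math

def embed_matrix(M):
    """Hypothetical helper: flatten a matrix (list of rows) into one vector."""
    return [v for row in M for v in row]

def frobenius_distance(A, B):
    """Frobenius distance between two equally shaped matrices."""
    return math.sqrt(
        sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))
    )

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[1.0, 2.0], [3.0, 8.0]]

d_custom = frobenius_distance(A, B)                       # payload-specific distance
d_embedded = math.dist(embed_matrix(A), embed_matrix(B))  # Euclidean after embedding
print(d_custom, d_embedded)
```

Either route gives the same number here, so a caller can pick whichever is more convenient for the payload at hand.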
6. What the apparatus does NOT offer
Stated explicitly so users can plan around the limits:
- No probability distributions over outcomes. Verdicts are
structural, not Bayesian.
- No reward / objective optimisation. The apparatus does not learn
policies; it identifies structural relationships.
- No tuning knobs that trade fidelity for speed. Where some
alternatives expose epsilon, top_k, temperature, etc., the
apparatus uses data-derived structural boundaries instead.
- No approximate-mode kNN (HNSW / IVF / LSH / FAISS lossy modes).
Every kNN-related operator returns exact results.
7. Mapping summary
| Apparatus operator | Cython entry point(s) |
|---|---|
| Pairwise similarity | batch_max_similarity |
| Multi-class similarity | concept_support_matrix |
| Pairwise distance | pairwise_distances |
| Nearest-neighbour distance | nn_distances |
| Best-tradeoff filter | pareto_core_mask |
| One-sided peak flag | one_sided_mask |
| Non-redundant witness | non_redundant_witnesses |
| Fused centroid + radius update | extend_frontier_kernel |
| Intrinsic scale | composed of nn_distances, pairwise_distances |
| Concept discovery | composed of pairwise_distances, nn_distances, pareto_core_mask |
| Relational hypothesis generation | composed of concept_support_matrix, pareto_core_mask, extend_frontier_kernel |
| Semantic gap detection | composed of concept_support_matrix, non_redundant_witnesses |
| Prototype construction | composed of batch_max_similarity, concept_support_matrix |
| Verdict-qualified inference | composed of concept_support_matrix, pareto_core_mask, batch_max_similarity |
| Lifecycle / dominance verification | composed of pareto_core_mask |
| Hierarchical recursion | every operator above, recursively |
| Streaming kNN graph | pairwise_distances, nn_distances |
| Time-series streaming model | nn_distances, concept_support_matrix, streaming kNN |
8. Library availability
The Cython entry points in the right column of §7 are all in
sem_cython12.wrapper, distributed at
https://git.sevana.biz/vvs/sem_cython12.
Higher-level apparatus (composed operators in §4) is built on those
primitives and ships in the SEM foundation package, separate from
this library.