SEM — Mathematical Apparatus (Capability Catalog)

A non-internal catalog of the operators SEM offers, what each is for, and which entry points of the sem_cython12 library back them.

This document describes WHAT the apparatus does and WHERE to use it. It does not describe HOW any operator works internally — algorithms, formulas, lemmas and proofs are intentionally not reproduced here.


Conventions

  • "Item" / "world" / "observation": one row of input data. Items live in some payload space (real numbers, vectors, matrices, sampled functions, sampled manifolds, distributions, complex amplitudes, time-series windows, recursive concept trees) — the apparatus treats them uniformly via a small set of structural operators.
  • "Concept": a subset of items that share structural meaning. The apparatus can either be told the concepts (labelled mode) or discover them from data (unsupervised mode).
  • "Witness": an item whose structural position carries information beyond merely belonging to one concept.
  • "Verdict": the system's qualified output for a new observation, one of confident, gap, or incoherent (see §4.6).

All of the apparatus is parameter-free and threshold-free: there are no fitting parameters, no numeric cut-offs, no fidelity knobs.


1. Structural similarity primitives

These are the lowest-level building blocks. Each is exposed directly in sem_cython12.wrapper.

1.1 Pairwise similarity

Purpose: Score how close a query item is to the most similar member of a reference set.
Output: A score in [0, 1] per query (1 = the query coincides with a member of the reference set, 0 = effectively far from every member).
Applications: Membership tests, retrieval, anomaly detection, k-nearest-neighbour pre-filtering, similarity-weighted aggregation.
Cython entry point: batch_max_similarity(X_query, X_members, lam)
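The output contract above can be illustrated with a small NumPy stand-in. This is only a sketch: the function name batch_max_similarity_sketch is hypothetical, Euclidean distance is assumed, and the kernel exp(-lam * d) is a placeholder chosen because it yields 1 at a member and decays toward 0 far away. How the library actually maps distance to similarity, and how it interprets lam, is not documented here.

```python
import numpy as np

def batch_max_similarity_sketch(X_query, X_members, lam):
    """Illustrative stand-in: best similarity of each query row to any member row."""
    Xq = np.asarray(X_query, dtype=float)
    Xm = np.asarray(X_members, dtype=float)
    # Euclidean distance from every query to every member: shape (Q, M).
    d = np.linalg.norm(Xq[:, None, :] - Xm[None, :, :], axis=2)
    # ASSUMPTION: placeholder kernel, not the library's formula.
    sim = np.exp(-lam * d)
    # Keep only the most similar member per query: shape (Q,).
    return sim.max(axis=1)

members = np.array([[0.0, 0.0], [1.0, 0.0]])
queries = np.array([[1.0, 0.0], [50.0, 50.0]])
scores = batch_max_similarity_sketch(queries, members, lam=1.0)
```

Under these assumptions the first query, which coincides with a member, scores exactly 1, and the distant query scores close to 0.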

1.2 Multi-class similarity matrix

Purpose: The same operation applied across K independent reference sets in one call, returning a (Q, K) score matrix.
Applications: Multi-class classification scoring, multi-criterion membership, class-confusion matrices, support-vector inputs to higher-level filters.
Cython entry point: concept_support_matrix(X_query, member_mats, lam)
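The (Q, K) shape can be pictured as one column of scores per reference set. The sketch below is an illustration only: support_matrix_sketch and toy_score are hypothetical names, the scoring function is passed in explicitly, and toy_score (1 / (1 + nearest distance)) is a placeholder rather than the library's similarity. The real entry point takes lam instead of a callable.

```python
import numpy as np

def support_matrix_sketch(X_query, member_mats, score_fn):
    """Stack one score column per reference set into a (Q, K) matrix."""
    cols = [np.asarray(score_fn(X_query, M), dtype=float) for M in member_mats]
    return np.column_stack(cols)

def toy_score(X_query, X_members):
    # ASSUMPTION: placeholder similarity in (0, 1], equal to 1 at a member.
    Xq = np.asarray(X_query, dtype=float)
    Xm = np.asarray(X_members, dtype=float)
    d = np.linalg.norm(Xq[:, None, :] - Xm[None, :, :], axis=2)
    return 1.0 / (1.0 + d.min(axis=1))

concept_a = np.array([[0.0, 0.0], [0.0, 1.0]])
concept_b = np.array([[10.0, 10.0]])
queries = np.array([[0.0, 0.0], [10.0, 10.0], [5.0, 5.0]])

S = support_matrix_sketch(queries, [concept_a, concept_b], toy_score)
best_concept = S.argmax(axis=1)  # index of the best-supported concept per query
```

Reference sets may have different sizes, which is why they are passed as a list of matrices rather than one stacked array.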

1.3 Pairwise distance matrix

Purpose: Symmetric (N, N) distance matrix between rows of X.
Applications: Graph construction, clustering, scale estimation, downstream filtering and ranking.
Cython entry point: pairwise_distances(X)
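For orientation, a plain NumPy equivalent of the shape and symmetry described above might look as follows. The metric is assumed to be Euclidean, which this catalog does not state, and pairwise_distances_sketch is a hypothetical name.

```python
import numpy as np

def pairwise_distances_sketch(X):
    """Dense (N, N) Euclidean distance matrix between the rows of X."""
    X = np.asarray(X, dtype=float)
    # Broadcast to all row pairs: diff[i, j] = X[i] - X[j].
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = pairwise_distances_sketch(X)
```

The broadcasted form uses O(N² · d) memory, so it is suitable only for small inputs.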

1.4 Nearest-neighbour distance vector

Purpose: For each row, the minimum positive distance to any other row. Rows with no positive-distance neighbour receive inf.
Applications: Local-density estimation, intrinsic-scale derivation, duplicate detection, outlier identification.
Cython entry point: nn_distances(X)
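The "minimum positive distance" rule and the inf fallback can be sketched in NumPy as below. Euclidean distance is an assumption and nn_distances_sketch is a hypothetical name. Note that under this reading exact duplicates (distance 0) are skipped rather than reported as nearest neighbours.

```python
import numpy as np

def nn_distances_sketch(X):
    """Per-row minimum strictly positive distance; inf when none exists."""
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Zero distances (self and exact duplicates) are not positive: mask them.
    D = np.where(D > 0.0, D, np.inf)
    return D.min(axis=1)

with_duplicate = np.array([[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]])
all_identical = np.array([[1.0, 1.0], [1.0, 1.0]])

nn_a = nn_distances_sketch(with_duplicate)
nn_b = nn_distances_sketch(all_identical)
```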

2. Multi-criterion filtering primitives

Given a real-valued matrix S of shape (N, k) (rows are items, columns are independent criteria — each in maximisation orientation), these primitives identify structurally informative subsets of rows.

2.1 Best-tradeoff filter

Purpose: Mask the rows that survive a multi-objective best-tradeoff filter (i.e. items that are not strictly worse than another item on every criterion).
Applications: Multi-objective optimisation frontier, concept-membership trade-off, candidate winnowing before further analysis.
Cython entry point: pareto_core_mask(S)
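A literal NumPy reading of the sentence above, where a row is dropped only if some other row is strictly greater on every column, could be sketched like this. The name pareto_core_mask_sketch is hypothetical, and how the library treats ties or partial dominance is not documented here.

```python
import numpy as np

def pareto_core_mask_sketch(S):
    """True for rows not strictly beaten on every criterion by another row."""
    S = np.asarray(S, dtype=float)
    # beaten[i, j] is True when row j exceeds row i on every column.
    beaten = (S[None, :, :] > S[:, None, :]).all(axis=2)
    return ~beaten.any(axis=1)

S = np.array([
    [1.0, 0.0],   # best on criterion 0
    [0.0, 1.0],   # best on criterion 1
    [0.5, 0.5],   # balanced trade-off
    [0.2, 0.2],   # strictly worse than the balanced row on both criteria
])
mask = pareto_core_mask_sketch(S)
```

Only the last row is removed, because it is the only one that another row beats on both criteria at once.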

2.2 One-sided peak flagging

Purpose: Flag row/column pairs where the row is the column-wise winner but contributes nothing on the remaining columns, i.e. items that "peak" on a single criterion alone.
Applications: Removing items that are only locally informative; finding cross-criterion contributors; bridge identification.
Cython entry point: one_sided_mask(S)
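One possible reading of this rule is sketched below. It rests on assumptions the catalog does not confirm: the result is taken to be an (N, k) boolean array, "winner" means attaining the column maximum, and "contributes nothing" means being exactly zero on every other column. The name one_sided_mask_sketch is hypothetical.

```python
import numpy as np

def one_sided_mask_sketch(S):
    """Flag (row, column) pairs that win one column and are zero elsewhere."""
    S = np.asarray(S, dtype=float)
    # winner[i, c]: row i attains the maximum of column c.
    winner = S == S.max(axis=0, keepdims=True)
    # ASSUMPTION: "contributes nothing" is read as exactly zero.
    nonzero = (S != 0.0).astype(int)
    # Count of non-zero entries in row i outside column c.
    others_nonzero = nonzero.sum(axis=1, keepdims=True) - nonzero
    return winner & (others_nonzero == 0)

S = np.array([
    [0.9, 0.0],   # wins column 0 and is zero on column 1: flagged
    [0.4, 0.8],   # wins column 1 but also contributes on column 0
    [0.0, 0.0],   # wins nothing
])
flags = one_sided_mask_sketch(S)
```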

2.3 Non-redundant witness identification

Purpose: The subset of rows that survive both 2.1 and 2.2, i.e. items that contribute meaningfully across multiple criteria, not just on one.
Applications: Bridge-witness selection between concept regions, structurally informative subset extraction, downstream gap analysis.
Cython entry point: non_redundant_witnesses(S)
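Read as a composition, "survive both 2.1 and 2.2" amounts to combining two masks. The sketch below assumes a per-row boolean mask from the best-tradeoff filter and a per-(row, column) boolean flag array from the peak filter; both shapes and the name non_redundant_sketch are assumptions made for illustration.

```python
import numpy as np

def non_redundant_sketch(core_mask, one_sided_flags):
    """Rows kept by the trade-off filter that carry no one-sided flag."""
    core = np.asarray(core_mask, dtype=bool)        # shape (N,)
    flags = np.asarray(one_sided_flags, dtype=bool)  # shape (N, k)
    # A row is rejected if it is flagged as one-sided on any column.
    return core & ~flags.any(axis=1)

core_mask = np.array([True, True, False])
one_sided_flags = np.array([[True, False], [False, False], [False, False]])
witnesses = non_redundant_sketch(core_mask, one_sided_flags)
```

Here the first row is dropped for peaking on a single criterion and the third for failing the trade-off filter, leaving only the second.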

3. Incremental aggregation primitive

3.1 Fused centroid + radius update

Purpose: One-pass bulk update for an incremental aggregation step. Given F reference items, each summarised by a centre vector and a radius (representing the dispersion of cur_arity underlying points), and A candidate new contributions, produce all F × A updated (centre, radius) pairs that result from appending one candidate to one reference item.
Applications: Streaming centroid / radius maintenance, candidate-frontier expansion in multi-stage selection, online aggregation pipelines.
Cython entry point: extend_frontier_kernel(cur_centers, cur_radii, new_emb, cur_arity)
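To make the F × A fan-out concrete, here is a NumPy sketch under explicit assumptions: the centre is the arithmetic mean of the underlying points, the radius is their root-mean-square distance to that mean, cur_arity is a single integer shared by all F items, and the outputs are laid out as (F, A, D) and (F, A). None of these choices is confirmed by this catalog, and extend_frontier_sketch is a hypothetical name.

```python
import numpy as np

def extend_frontier_sketch(cur_centers, cur_radii, new_emb, cur_arity):
    """All F x A (centre, radius) pairs after appending one candidate each."""
    C = np.asarray(cur_centers, dtype=float)   # (F, D) current means
    r = np.asarray(cur_radii, dtype=float)     # (F,)  current RMS radii
    X = np.asarray(new_emb, dtype=float)       # (A, D) candidates
    n = float(cur_arity)                       # points behind each centre

    # Mean of n old points plus one new point.
    new_centers = (n * C[:, None, :] + X[None, :, :]) / (n + 1.0)
    # Squared distance from each candidate to each old centre: (F, A).
    d2 = ((X[None, :, :] - C[:, None, :]) ** 2).sum(axis=2)
    # RMS radius update: r'^2 = (n r^2 + n d^2 / (n + 1)) / (n + 1).
    new_r2 = (n * r[:, None] ** 2 + n * d2 / (n + 1.0)) / (n + 1.0)
    return new_centers, np.sqrt(new_r2)

# Two points (0, 0) and (2, 0): mean (1, 0), RMS radius 1.
centers, radii = extend_frontier_sketch([[1.0, 0.0]], [1.0], [[1.0, 3.0]], 2)
```

With these definitions the update is exact: the result equals what would be obtained by recomputing the mean and RMS radius from all three points.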

4. Higher-level apparatus

Built on the primitives in §1–§3. These are the operators that distinguish SEM as a reasoning system rather than a computation library. Their internal construction is not reproduced here; the "Cython entry points used" line of each entry lists the public primitives the operator composes.

4.1 Intrinsic scale

Purpose: Derive the kernel scale from the data's own structural geometry, so that no manual lam value is ever required.
Applications: Any pipeline that wants the scale property to be a function of the data, not a tuning knob; cross-application portability.
Cython entry points used: nn_distances, pairwise_distances

4.2 Concept discovery

Purpose: Group observations into structurally coherent regions without using labels, ML training, or numeric thresholds. Returns the concepts the data itself supports.
Applications: Unsupervised classification, regime identification, exploratory analysis, foundation for downstream operators.
Cython entry points used: pairwise_distances, nn_distances, pareto_core_mask

4.3 Relational hypothesis generation

Purpose: Enumerate candidate structural relationships between concepts (pair-wise and higher-arity) and rank them by support.
Applications: Discovering laws / regularities between groups, cross-concept analysis, scientific structure recovery.
Cython entry points used: concept_support_matrix, pareto_core_mask, extend_frontier_kernel

4.4 Semantic gap detection

Purpose: Identify positions in structural space where the data should produce a witness bridging two or more concepts but does not.
Applications: Detecting missing variables, hidden mediators, unobserved confounders; identifying where additional measurement would resolve ambiguity.
Cython entry points used: concept_support_matrix, non_redundant_witnesses

4.5 Prototype construction

Purpose: Predict the structural features of an item that should exist between known concepts but has not yet been observed.
Applications: Drug-candidate suggestion, missing-mediator prediction, "what if" scenario generation, hypothesis-driven data acquisition.
Cython entry points used: batch_max_similarity, concept_support_matrix

4.6 Verdict-qualified inference

Purpose: Decide which concept best explains a new observation, returning one of three outcomes: confident (a single concept dominates), gap (multiple concepts are equally admissible), or incoherent (no concept admits the observation consistently).
Applications: Decision-support systems that must abstain when ambiguous, safety-critical classification, regime change detection, automated triage.
Cython entry points used: concept_support_matrix, pareto_core_mask, batch_max_similarity

4.7 Lifecycle / dominance verification

Purpose: When a real observation arrives, decide whether it confirms, displaces, or co-exists with a previously predicted prototype. Maintains the prototype's status across its lifetime.
Applications: Continuous-learning pipelines, theory revision under new evidence, audit-trail-preserving inference.
Cython entry points used: pareto_core_mask

4.8 Hierarchical recursion

Purpose: Apply every operator above to recursive concept trees, i.e. concepts whose members are themselves concepts. Operators propagate through the hierarchy and remain mathematically consistent at every level.
Applications: Taxonomies, organisational hierarchies, multi-scale analysis (chemical → biological → organism, file → folder → project, etc.).
Cython entry points used: the operators above, recursively

4.9 Streaming kNN graph maintenance

Purpose: Maintain an exact k-nearest-neighbour graph as items are added or removed one at a time, without rebuilding from scratch on each update.
Applications: Online time-series ingest, sliding-window analytics, sensor-stream monitoring, real-time anomaly detection.
Cython entry points used: pairwise_distances, nn_distances (on the contiguous buffer). scipy.spatial.cKDTree is used internally above 1000 items for exact queries (roughly O(log N) each at low dimensionality); there is no fidelity knob.
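The idea of updating an exact kNN graph without a rebuild can be illustrated for the insertion case with a brute-force sketch. This is not the SEM implementation: the class IncrementalKNN is hypothetical, Euclidean distance is assumed, each insertion costs O(N) distance evaluations, and removal (which requires repairing the lists of affected neighbours) is deliberately left out.

```python
import numpy as np

class IncrementalKNN:
    """Exact k-nearest-neighbour lists maintained under insertion only."""

    def __init__(self, k):
        self.k = k
        self.points = []   # stored item vectors
        self.neigh = []    # per item: sorted list of (distance, index), length <= k

    def add(self, x):
        x = np.asarray(x, dtype=float)
        new_idx = len(self.points)
        if self.points:
            d = np.linalg.norm(np.asarray(self.points) - x, axis=1)
        else:
            d = np.empty(0)
        # The newcomer can only improve an existing list if it is closer
        # than that item's current k-th neighbour (or the list is short).
        for i, di in enumerate(d):
            lst = self.neigh[i]
            if len(lst) < self.k or di < lst[-1][0]:
                lst.append((float(di), new_idx))
                lst.sort()
                del lst[self.k:]
        # The newcomer's own neighbours are its k closest existing items.
        order = np.argsort(d, kind="stable")[: self.k]
        self.neigh.append([(float(d[j]), int(j)) for j in order])
        self.points.append(x)
        return new_idx

graph = IncrementalKNN(k=2)
for p in [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [1.5, 0.0]]:
    graph.add(p)
neighbours_of_first = [j for _, j in graph.neigh[0]]
```

Because every update compares the new item against all stored items, the lists stay exact; a spatial index such as a KD-tree changes the cost of each query, not the result.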

4.10 Time-series streaming model

Purpose: A complete reasoning model over sliding windows of a stream: state extraction, transition modelling, intrinsic-scale maintenance, and verdict-qualified prediction on novel windows. Optionally projects high-dimensional windows to lower dimensions when configured to do so.
Applications: Multivariate time-series classification, regime detection, online anomaly identification, signal-quality forecasting.
Cython entry points used: nn_distances (intrinsic scale), concept_support_matrix (verdict), the streaming-kNN apparatus from 4.9
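Since the primitives in §1 operate on rows, one plausible way to present a stream to them is to flatten each sliding window into one row. The sketch below shows only that preparation step, using NumPy's sliding_window_view; the helper name windows_as_items and the time-major flattening order are assumptions, and the model's own windowing is not documented here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windows_as_items(series, width):
    """Turn a (T, C) series into (T - width + 1, width * C) window rows."""
    series = np.asarray(series, dtype=float)
    if series.ndim == 1:
        series = series[:, None]          # treat a 1-D stream as one channel
    # Windows along the time axis: shape (T - width + 1, C, width).
    w = sliding_window_view(series, window_shape=width, axis=0)
    # Reorder to (window, time, channel) and flatten each window to a row.
    return w.transpose(0, 2, 1).reshape(w.shape[0], -1)

series = np.arange(10.0).reshape(5, 2)    # 5 time steps, 2 channels
items = windows_as_items(series, width=3)
```

Each resulting row can then be treated as one item in the sense of the Conventions section.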

5. Composition properties

The operators in §1–§4 compose along several axes:

  • Across payload types: the same operator works for scalars, vectors, matrices, tensors, functions, manifolds, complex states, distributions, time-series windows. The caller supplies the appropriate distance function or, equivalently, an embedding into Euclidean space.
  • Across hierarchy levels: concepts can themselves be members of parent concepts; operators recurse through the tree (§4.8).
  • Under wrapping: stochastic and temporal extensions can be layered over any base payload type. Triple compositions like "hierarchy of stochastic time-series" are admissible and produce consistent results at every level.
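As one concrete instance of "an embedding into Euclidean space", complex amplitudes can be mapped to real vectors by concatenating real and imaginary parts; Euclidean distance between the embedded rows then equals the norm of the complex difference. The helper name embed_complex is hypothetical, and this is only one of many admissible embeddings.

```python
import numpy as np

def embed_complex(Z):
    """Map complex rows of shape (N, d) to real rows of shape (N, 2 * d)."""
    Z = np.asarray(Z, dtype=complex)
    return np.concatenate([Z.real, Z.imag], axis=-1)

Z = np.array([[1 + 2j, 0 - 1j],
              [0 + 0j, 3 + 4j]])
E = embed_complex(Z)

# Distance in the embedding matches the distance between complex rows.
d_embedded = np.linalg.norm(E[0] - E[1])
d_complex = np.linalg.norm(Z[0] - Z[1])
```

The embedded rows are ordinary real vectors, so they can be passed to any row-based primitive without a custom distance function.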

6. What the apparatus does NOT offer

Stated explicitly so users can plan around the limits:

  • No probability distributions over outcomes. Verdicts are structural, not Bayesian.
  • No reward / objective optimisation. The apparatus does not learn policies; it identifies structural relationships.
  • No tuning knobs that trade fidelity for speed. Where some alternatives expose epsilon, top_k, temperature, etc., the apparatus uses data-derived structural boundaries instead.
  • No approximate-mode kNN (HNSW / IVF / LSH / FAISS lossy modes). Every kNN-related operator returns exact results.

7. Mapping summary

| Apparatus operator | Cython entry point(s) |
|---|---|
| Pairwise similarity | batch_max_similarity |
| Multi-class similarity | concept_support_matrix |
| Pairwise distance | pairwise_distances |
| Nearest-neighbour distance | nn_distances |
| Best-tradeoff filter | pareto_core_mask |
| One-sided peak flag | one_sided_mask |
| Non-redundant witness | non_redundant_witnesses |
| Fused centroid + radius update | extend_frontier_kernel |
| Intrinsic scale | composed of nn_distances, pairwise_distances |
| Concept discovery | composed of pairwise_distances, nn_distances, pareto_core_mask |
| Relational hypothesis generation | composed of concept_support_matrix, pareto_core_mask, extend_frontier_kernel |
| Semantic gap detection | composed of concept_support_matrix, non_redundant_witnesses |
| Prototype construction | composed of batch_max_similarity, concept_support_matrix |
| Verdict-qualified inference | composed of concept_support_matrix, pareto_core_mask, batch_max_similarity |
| Lifecycle / dominance verification | composed of pareto_core_mask |
| Hierarchical recursion | every operator above, recursively |
| Streaming kNN graph | pairwise_distances, nn_distances |
| Time-series streaming model | nn_distances, concept_support_matrix, streaming kNN |

8. Library availability

The Cython entry points in the right column of §7 are all in sem_cython12.wrapper, distributed at https://git.sevana.biz/vvs/sem_cython12. Higher-level apparatus (composed operators in §4) is built on those primitives and ships in the SEM foundation package, separate from this library.