# SEM: Mathematical Apparatus (Capability Catalog)

*A non-internal catalog of the operators SEM offers, what each is for, and which entry points of the `sem_cython12` library back them.*

This document describes WHAT the apparatus does and WHERE to use it. It does not describe HOW any operator works internally: algorithms, formulas, lemmas and proofs are intentionally not reproduced here.

---

## Conventions

- "Item" / "world" / "observation": one row of input data. Items live in some payload space (real numbers, vectors, matrices, sampled functions, sampled manifolds, distributions, complex amplitudes, time-series windows, recursive concept trees); the apparatus treats them uniformly via a small set of structural operators.
- "Concept": a subset of items that share structural meaning. The apparatus can either be told the concepts (labelled mode) or discover them from data (unsupervised mode).
- "Witness": an item whose structural position carries information beyond merely belonging to one concept.
- "Verdict": the system's qualified output for a new observation, one of `confident`, `gap`, `incoherent` (see §4.6).

All of the apparatus is parameter-free and threshold-free: there are no fitting parameters, no numeric cut-offs, no fidelity knobs. The `lam` argument accepted by the primitives in §1 is a kernel scale that §4.1 derives from the data, so it never has to be tuned by hand.

---

## 1. Structural similarity primitives

These are the lowest-level building blocks. Each is exposed directly in `sem_cython12.wrapper`.

### 1.1 Pairwise similarity

| | |
|---|---|
| Purpose | Score how close a query item is to the most similar member of a reference set. |
| Output | A score in `[0, 1]` per query (1 = coincides with a member of the reference set, 0 = effectively far from every member). |
| Applications | Membership tests, retrieval, anomaly detection, k-nearest-neighbour pre-filtering, similarity-weighted aggregation. |
| Cython entry point | `batch_max_similarity(X_query, X_members, lam)` |

### 1.2 Multi-class similarity matrix

| | |
|---|---|
| Purpose | The same operation applied across `K` independent reference sets in one call, returning a `(Q, K)` score matrix. |
| Applications | Multi-class classification scoring, multi-criterion membership, class-confusion matrices, support-vector inputs to higher-level filters. |
| Cython entry point | `concept_support_matrix(X_query, member_mats, lam)` |

### 1.3 Pairwise distance matrix

| | |
|---|---|
| Purpose | Symmetric `(N, N)` distance matrix between rows of `X`. |
| Applications | Graph construction, clustering, scale estimation, downstream filtering and ranking. |
| Cython entry point | `pairwise_distances(X)` |

### 1.4 Nearest-neighbour distance vector

| | |
|---|---|
| Purpose | For each row, the minimum positive distance to any other row. Rows with no positive-distance neighbour receive `inf`. |
| Applications | Local-density estimation, intrinsic-scale derivation, duplicate detection, outlier identification. |
| Cython entry point | `nn_distances(X)` |

---

## 2. Multi-criterion filtering primitives

Given a real-valued matrix `S` of shape `(N, k)` (rows are items, columns are independent criteria, each in maximisation orientation), these primitives identify structurally informative subsets of rows.

### 2.1 Best-tradeoff filter

| | |
|---|---|
| Purpose | Mask the rows that survive a multi-objective best-tradeoff filter (i.e. items that are not strictly worse than another item on every criterion). |
| Applications | Multi-objective optimisation frontier, concept-membership trade-off, candidate winnowing before further analysis. |
| Cython entry point | `pareto_core_mask(S)` |

### 2.2 One-sided peak flagging

| | |
|---|---|
| Purpose | Flag row/column pairs where the row is the column-wise winner but contributes nothing on the remaining columns, i.e. items that "peak" on a single criterion alone. |
| Applications | Removing items that are only locally informative; finding cross-criterion contributors; bridge identification. |
| Cython entry point | `one_sided_mask(S)` |

### 2.3 Non-redundant witness identification

| | |
|---|---|
| Purpose | The subset of rows that survive both §2.1 and §2.2, i.e. items that contribute meaningfully across multiple criteria, not just on one. |
| Applications | Bridge-witness selection between concept regions, structurally informative subset extraction, downstream gap analysis. |
| Cython entry point | `non_redundant_witnesses(S)` |

---

## 3. Incremental aggregation primitive

### 3.1 Fused centroid + radius update

| | |
|---|---|
| Purpose | One-pass bulk update for an incremental aggregation step. Given `F` reference items, each summarised by a centre vector and a radius (representing the dispersion of `cur_arity` underlying points), and `A` candidate new contributions, produce all `F * A` updated (centre, radius) pairs that result from appending one candidate to one reference item. |
| Applications | Streaming centroid / radius maintenance, candidate-frontier expansion in multi-stage selection, online aggregation pipelines. |
| Cython entry point | `extend_frontier_kernel(cur_centers, cur_radii, new_emb, cur_arity)` |

---

## 4. Higher-level apparatus

Built on the primitives in §1–§3. These are the operators that distinguish SEM as a reasoning system rather than a computation library. Their internal construction is not reproduced here; the "Cython entry points used" row of each table lists the public primitives the operator composes.

### 4.1 Intrinsic scale

| | |
|---|---|
| Purpose | Derive the kernel scale from the data's own structural geometry, so that no manual `lam` value is ever required. |
| Applications | Any pipeline that wants the scale property to be a function of the data, not a tuning knob; cross-application portability. |
| Cython entry points used | `nn_distances`, `pairwise_distances` |

### 4.2 Concept discovery

| | |
|---|---|
| Purpose | Group observations into structurally coherent regions without using labels, ML training, or numeric thresholds. Returns the concepts the data itself supports. |
| Applications | Unsupervised classification, regime identification, exploratory analysis, foundation for downstream operators. |
| Cython entry points used | `pairwise_distances`, `nn_distances`, `pareto_core_mask` |

### 4.3 Relational hypothesis generation

| | |
|---|---|
| Purpose | Enumerate candidate structural relationships between concepts (pair-wise and higher-arity) and rank them by support. |
| Applications | Discovering laws / regularities between groups, cross-concept analysis, scientific structure recovery. |
| Cython entry points used | `concept_support_matrix`, `pareto_core_mask`, `extend_frontier_kernel` |

### 4.4 Semantic gap detection

| | |
|---|---|
| Purpose | Identify positions in structural space where the data should produce a witness bridging two or more concepts but does not. |
| Applications | Detecting missing variables, hidden mediators, unobserved confounders; identifying where additional measurement would resolve ambiguity. |
| Cython entry points used | `concept_support_matrix`, `non_redundant_witnesses` |

### 4.5 Prototype construction

| | |
|---|---|
| Purpose | Predict the structural features of an item that should exist between known concepts but has not yet been observed. |
| Applications | Drug-candidate suggestion, missing-mediator prediction, "what if" scenario generation, hypothesis-driven data acquisition. |
| Cython entry points used | `batch_max_similarity`, `concept_support_matrix` |

### 4.6 Verdict-qualified inference

| | |
|---|---|
| Purpose | Decide which concept best explains a new observation, returning one of three outcomes: `confident` (a single concept dominates), `gap` (multiple concepts are equally admissible), `incoherent` (no concept admits the observation consistently). |
| Applications | Decision-support systems that must abstain when ambiguous, safety-critical classification, regime change detection, automated triage. |
| Cython entry points used | `concept_support_matrix`, `pareto_core_mask`, `batch_max_similarity` |

### 4.7 Lifecycle / dominance verification

| | |
|---|---|
| Purpose | When a real observation arrives, decide whether it confirms, displaces, or co-exists with a previously predicted prototype. Maintains the prototype's status across its lifetime. |
| Applications | Continuous-learning pipelines, theory revision under new evidence, audit-trail-preserving inference. |
| Cython entry points used | `pareto_core_mask` |

### 4.8 Hierarchical recursion

| | |
|---|---|
| Purpose | Apply every operator above to recursive concept trees, i.e. concepts whose members are themselves concepts. Operators propagate through the hierarchy and remain mathematically consistent at every level. |
| Applications | Taxonomies, organisational hierarchies, multi-scale analysis (chemical → biological → organism, file → folder → project, etc.). |
| Cython entry points used | the operators above, recursively |

### 4.9 Streaming kNN graph maintenance

| | |
|---|---|
| Purpose | Maintain an exact k-nearest-neighbour graph as items are added or removed one at a time, without rebuilding from scratch on each update. |
| Applications | Online time-series ingest, sliding-window analytics, sensor-stream monitoring, real-time anomaly detection. |
| Cython entry points used | `pairwise_distances`, `nn_distances` (on the contiguous buffer); `scipy.spatial.cKDTree` is used internally above 1000 items for exact queries, typically O(log N) each, with no fidelity knob. |

### 4.10 Time-series streaming model

| | |
|---|---|
| Purpose | A complete reasoning model over sliding windows of a stream: state extraction, transition modelling, intrinsic-scale maintenance, and verdict-qualified prediction on novel windows. Optionally projects high-dimensional windows to lower dimensions when configured to do so. |
| Applications | Multivariate time-series classification, regime detection, online anomaly identification, signal-quality forecasting. |
| Cython entry points used | `nn_distances` (intrinsic scale), `concept_support_matrix` (verdict), the streaming-kNN apparatus from §4.9 |

---

## 5. Composition properties

The operators in §1–§4 compose along several axes:

- **Across payload types**: the same operator works for scalars, vectors, matrices, tensors, functions, manifolds, complex states, distributions, time-series windows. The caller supplies the appropriate distance function or, equivalently, an embedding into Euclidean space.
- **Across hierarchy levels**: concepts can themselves be members of parent concepts; operators recurse through the tree (§4.8).
- **Under wrapping**: stochastic and temporal extensions can be layered over any base payload type. Triple compositions like "hierarchy of stochastic time-series" are admissible and produce consistent results at every level.

---

## 6. What the apparatus does NOT offer

Stated explicitly so users can plan around the limits:

- No probability distributions over outcomes. Verdicts are structural, not Bayesian.
- No reward / objective optimisation. The apparatus does not learn policies; it identifies structural relationships.
- No tuning knobs that trade fidelity for speed. Where some alternatives expose `epsilon`, `top_k`, `temperature`, etc., the apparatus uses data-derived structural boundaries instead.
- No approximate-mode kNN (HNSW / IVF / LSH / FAISS lossy modes). Every kNN-related operator returns exact results.

---

## 7. Mapping summary

| Apparatus operator | Cython entry point(s) |
|---|---|
| Pairwise similarity | `batch_max_similarity` |
| Multi-class similarity | `concept_support_matrix` |
| Pairwise distance | `pairwise_distances` |
| Nearest-neighbour distance | `nn_distances` |
| Best-tradeoff filter | `pareto_core_mask` |
| One-sided peak flag | `one_sided_mask` |
| Non-redundant witness | `non_redundant_witnesses` |
| Fused centroid + radius update | `extend_frontier_kernel` |
| Intrinsic scale | composed of `nn_distances`, `pairwise_distances` |
| Concept discovery | composed of `pairwise_distances`, `nn_distances`, `pareto_core_mask` |
| Relational hypothesis generation | composed of `concept_support_matrix`, `pareto_core_mask`, `extend_frontier_kernel` |
| Semantic gap detection | composed of `concept_support_matrix`, `non_redundant_witnesses` |
| Prototype construction | composed of `batch_max_similarity`, `concept_support_matrix` |
| Verdict-qualified inference | composed of `concept_support_matrix`, `pareto_core_mask`, `batch_max_similarity` |
| Lifecycle / dominance verification | composed of `pareto_core_mask` |
| Hierarchical recursion | every operator above, recursively |
| Streaming kNN graph | composed of `pairwise_distances`, `nn_distances` |
| Time-series streaming model | composed of `nn_distances`, `concept_support_matrix`, streaming kNN |

## 8. Library availability

The Cython entry points in the right column of §7 are all in `sem_cython12.wrapper`, distributed at [https://git.sevana.biz/vvs/sem_cython12](https://git.sevana.biz/vvs/sem_cython12). Higher-level apparatus (composed operators in §4) is built on those primitives and ships in the SEM foundation package, separate from this library.
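
For readers who want to check their reading of the input/output contracts before installing the library, the plain-Python sketch below mimics two of them as worded in this catalog: the nearest-neighbour distance vector (§1.4) and the best-tradeoff mask (§2.1). It is not the library's implementation and says nothing about how `sem_cython12` computes these results. The function names `nn_distances_sketch` and `pareto_mask_sketch` are hypothetical, Euclidean distance is an assumption (the catalog does not name the metric), and "strictly worse on every criterion" is taken literally.

```python
import math


def nn_distances_sketch(X):
    """Per row: smallest strictly positive distance to any other row.

    Rows with no positive-distance neighbour get inf, as stated in §1.4.
    Assumption: Euclidean distance.
    """
    out = []
    for i, xi in enumerate(X):
        best = math.inf
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = math.dist(xi, xj)
            if 0.0 < d < best:  # exact duplicates (d == 0) are skipped
                best = d
        out.append(best)
    return out


def pareto_mask_sketch(S):
    """Per row: True unless some other row is strictly larger on every column.

    Follows the wording of §2.1 with columns in maximisation orientation.
    """
    mask = []
    for i, si in enumerate(S):
        beaten = any(
            all(b > a for a, b in zip(si, sj))
            for j, sj in enumerate(S)
            if j != i
        )
        mask.append(not beaten)
    return mask


# Rows 0 and 2 are duplicates, so their nearest positive neighbour is row 1.
X = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]
print(nn_distances_sketch(X))   # [5.0, 5.0, 5.0]

# Row 0 loses to row 1 on both criteria; rows 1 and 2 each win somewhere.
S = [[1.0, 5.0], [2.0, 6.0], [3.0, 1.0]]
print(pareto_mask_sketch(S))    # [False, True, True]
```

Both functions are quadratic in the number of rows and are meant only for small sanity checks against the output of `nn_distances(X)` and `pareto_core_mask(S)`; any disagreement would indicate that one of the assumptions above does not match the library.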