Add SEM_Overview.md and SEM_Mathematical_Apparatus.md under docs/ and link from README
@@ -0,0 +1,270 @@
# SEM: Mathematical Apparatus (Capability Catalog)

*A non-internal catalog of the operators SEM offers, what each is for,
and which entry points of the `sem_cython12` library back them.*

This document describes WHAT the apparatus does and WHERE to use it.
It does not describe HOW any operator works internally: algorithms,
formulas, lemmas and proofs are intentionally not reproduced here.

---

## Conventions

- "Item" / "world" / "observation": one row of input data. Items live
  in some payload space (real numbers, vectors, matrices, sampled
  functions, sampled manifolds, distributions, complex amplitudes,
  time-series windows, recursive concept trees); the apparatus
  treats them uniformly via a small set of structural operators.
- "Concept": a subset of items that share structural meaning. The
  apparatus can either be told the concepts (labelled mode) or
  discover them from data (unsupervised mode).
- "Witness": an item whose structural position carries information
  beyond merely belonging to one concept.
- "Verdict": the system's qualified output for a new observation:
  one of `confident`, `gap`, `incoherent` (see §4.6).

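The apparatus is parameter-free and threshold-free from the caller's
point of view: there are no fitting parameters, no numeric cut-offs,
no fidelity knobs. The one scale argument the §1 primitives accept,
`lam`, is derived from the data itself (§4.1) rather than hand-tuned.

---
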
## 1. Structural similarity primitives

These are the lowest-level building blocks. Each is exposed directly
in `sem_cython12.wrapper`.

### 1.1 Pairwise similarity

| | |
|---|---|
| Purpose | Score how close a query item is to the most similar member of a reference set. |
| Output | A score in `[0, 1]` per query (1 = the query coincides with a reference member, 0 = effectively far from every member). |
| Applications | Membership tests, retrieval, anomaly detection, k-nearest-neighbour pre-filtering, similarity-weighted aggregation. |
| Cython entry point | `batch_max_similarity(X_query, X_members, lam)` |

### 1.2 Multi-class similarity matrix

| | |
|---|---|
| Purpose | The §1.1 operation applied across `K` independent reference sets in one call, returning a `(Q, K)` score matrix (one row per query, one column per reference set). |
| Applications | Multi-class classification scoring, multi-criterion membership, class-confusion matrices, per-query support vectors as inputs to higher-level filters. |
| Cython entry point | `concept_support_matrix(X_query, member_mats, lam)` |

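
The input/output contract of §1.1 and §1.2 can be illustrated with a small dependency-free stand-in. This is only a sketch, not the library's implementation: the kernel `exp(-lam * distance)`, the Euclidean metric, and the `*_sketch` function names are all assumptions made for illustration, since the catalog does not specify how the score is computed.

```python
import math

def batch_max_similarity_sketch(queries, members, lam):
    """For each query, the best score against any member (assumed kernel)."""
    # ASSUMPTION: exp(-lam * Euclidean distance); the real kernel is not documented.
    return [
        max(math.exp(-lam * math.dist(q, m)) for m in members)
        for q in queries
    ]

def concept_support_matrix_sketch(queries, member_sets, lam):
    """Stack per-set scores into a (Q, K) matrix stored as a list of rows."""
    columns = [batch_max_similarity_sketch(queries, ms, lam) for ms in member_sets]
    return [list(row) for row in zip(*columns)]

queries = [(0.0, 0.0), (10.0, 10.0)]
set_a = [(0.0, 0.0), (1.0, 0.0)]   # reference set for concept A
set_b = [(10.0, 9.0)]              # reference set for concept B
support = concept_support_matrix_sketch(queries, [set_a, set_b], lam=1.0)
# support[q][k] is the score of query q against reference set k, in [0, 1].
```

The first query coincides with a member of `set_a`, so its score in that column is exactly 1; every other entry falls strictly between 0 and 1.
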
### 1.3 Pairwise distance matrix

| | |
|---|---|
| Purpose | Symmetric `(N, N)` distance matrix between rows of `X`. |
| Applications | Graph construction, clustering, scale estimation, downstream filtering and ranking. |
| Cython entry point | `pairwise_distances(X)` |

### 1.4 Nearest-neighbour distance vector

| | |
|---|---|
| Purpose | For each row, the minimum positive distance to any other row. Rows with no positive-distance neighbour receive `inf`. |
| Applications | Local-density estimation, intrinsic-scale derivation, duplicate detection, outlier identification. |
| Cython entry point | `nn_distances(X)` |

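
The behaviour described in §1.3 and §1.4, including the `inf` case for rows that only have exact duplicates, can be mimicked in plain Python. The sketch below assumes a Euclidean metric (the catalog does not name one) and uses hypothetical `*_sketch` names; it illustrates the documented contract and is not the Cython code.

```python
import math

def pairwise_distances_sketch(rows):
    """Symmetric (N, N) Euclidean distance matrix as a list of lists."""
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(rows[i], rows[j])
            dist[i][j] = dist[j][i] = d   # fill both halves at once
    return dist

def nn_distances_sketch(rows):
    """Per row: smallest strictly positive distance to another row, else inf."""
    dist = pairwise_distances_sketch(rows)
    result = []
    for i, row in enumerate(dist):
        positive = [d for j, d in enumerate(row) if j != i and d > 0.0]
        result.append(min(positive, default=math.inf))
    return result

points = [(0.0, 0.0), (3.0, 4.0), (0.0, 0.0)]   # rows 0 and 2 are duplicates
nn = nn_distances_sketch(points)                 # [5.0, 5.0, 5.0]
nn_dupes = nn_distances_sketch([(1.0, 1.0), (1.0, 1.0)])  # [inf, inf]
```

Because zero distances are skipped, a duplicated row reports the distance to its nearest distinct neighbour, which is what makes the vector usable for scale estimation.
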
---

## 2. Multi-criterion filtering primitives

Given a real-valued matrix `S` of shape `(N, k)` (rows are items,
columns are independent criteria, each oriented so that larger is
better), these primitives identify structurally informative subsets
of rows.

### 2.1 Best-tradeoff filter

| | |
|---|---|
| Purpose | Mask the rows that survive a multi-objective best-tradeoff filter (i.e. items for which no other item is strictly better on every criterion). |
| Applications | Multi-objective optimisation frontier, concept-membership trade-off, candidate winnowing before further analysis. |
| Cython entry point | `pareto_core_mask(S)` |

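
The survival rule stated in the Purpose row can be written down directly. The following is a minimal sketch of that rule only (a row is dropped when some other row is strictly better on every criterion); the function name is hypothetical, and whether `pareto_core_mask` treats ties or partial dominance exactly this way is an assumption.

```python
def best_tradeoff_mask_sketch(scores):
    """True for rows that no other row beats strictly on every criterion."""
    def strictly_worse(row, other):
        # `row` loses to `other` on all criteria (maximisation orientation).
        return all(o > r for r, o in zip(row, other))

    return [
        not any(strictly_worse(row, other)
                for j, other in enumerate(scores) if j != i)
        for i, row in enumerate(scores)
    ]

S = [
    [0.9, 0.1],  # best on criterion 0
    [0.1, 0.9],  # best on criterion 1
    [0.5, 0.5],  # balanced trade-off
    [0.4, 0.4],  # strictly worse than the balanced row on both criteria
]
mask = best_tradeoff_mask_sketch(S)  # [True, True, True, False]
```

Only the last row is masked out: each of the first three is better than every other row on at least one criterion or ties with it, so none of them can be discarded without losing a trade-off.
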
### 2.2 One-sided peak flagging

| | |
|---|---|
| Purpose | Flag row/column pairs where the row is the column-wise winner but contributes nothing on the remaining columns, i.e. items that "peak" on a single criterion alone. |
| Applications | Removing items that are only locally informative; finding cross-criterion contributors; bridge identification. |
| Cython entry point | `one_sided_mask(S)` |

### 2.3 Non-redundant witness identification

| | |
|---|---|
| Purpose | The subset of rows that pass the §2.1 filter and are not flagged by §2.2: items that contribute meaningfully across multiple criteria, not just on one. |
| Applications | Bridge-witness selection between concept regions, structurally informative subset extraction, downstream gap analysis. |
| Cython entry point | `non_redundant_witnesses(S)` |

---

## 3. Incremental aggregation primitive

### 3.1 Fused centroid + radius update

| | |
|---|---|
| Purpose | One-pass bulk update for an incremental aggregation step. Given `F` reference items (each summarised by a centre vector and a radius representing the dispersion of `cur_arity` underlying points) and `A` candidate new contributions, produce all `F * A` updated (centre, radius) pairs that result from appending one candidate to one reference item. |
| Applications | Streaming centroid / radius maintenance, candidate-frontier expansion in multi-stage selection, online aggregation pipelines. |
| Cython entry point | `extend_frontier_kernel(cur_centers, cur_radii, new_emb, cur_arity)` |

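
The `F * A` shape of the update can be illustrated for the centre half of the result. This sketch assumes that each centre is the arithmetic mean of its underlying points and that `cur_arity` is one integer shared by all `F` reference items; neither is confirmed by the catalog. The radius update is deliberately left out because its definition is not given here, and the function name is hypothetical.

```python
def extend_centers_sketch(cur_centers, new_emb, cur_arity):
    """F x A grid of centres after appending one candidate to each reference item."""
    n = cur_arity
    return [
        [
            # ASSUMPTION: running-mean update, new = (n * old + candidate) / (n + 1).
            tuple((n * c + e) / (n + 1) for c, e in zip(center, candidate))
            for candidate in new_emb
        ]
        for center in cur_centers
    ]

centers = [(0.0, 0.0), (2.0, 2.0)]                  # F = 2 reference items
candidates = [(3.0, 0.0), (0.0, 6.0), (2.0, 2.0)]   # A = 3 candidates
grid = extend_centers_sketch(centers, candidates, cur_arity=2)
# grid[f][a] is the centre of reference item f after absorbing candidate a.
```

Appending a candidate that already sits on the centre, as in `grid[1][2]`, leaves the centre where it was, which is a quick sanity check on any implementation of the update.
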
---

## 4. Higher-level apparatus

Built on the primitives in §1–§3. These are the operators that
distinguish SEM as a reasoning system rather than a computation
library. Their internal construction is not reproduced here; the
"Cython entry points used" row of each table lists the public
primitives the operator composes.

### 4.1 Intrinsic scale

| | |
|---|---|
| Purpose | Derive the kernel scale from the data's own structural geometry, so that no manual `lam` value is ever required. |
| Applications | Any pipeline that wants the scale property to be a function of the data, not a tuning knob; cross-application portability. |
| Cython entry points used | `nn_distances`, `pairwise_distances` |

### 4.2 Concept discovery

| | |
|---|---|
| Purpose | Group observations into structurally coherent regions without using labels, ML training, or numeric thresholds. Returns the concepts the data itself supports. |
| Applications | Unsupervised classification, regime identification, exploratory analysis, foundation for downstream operators. |
| Cython entry points used | `pairwise_distances`, `nn_distances`, `pareto_core_mask` |

### 4.3 Relational hypothesis generation

| | |
|---|---|
| Purpose | Enumerate candidate structural relationships between concepts (pair-wise and higher-arity) and rank them by support. |
| Applications | Discovering laws / regularities between groups, cross-concept analysis, scientific structure recovery. |
| Cython entry points used | `concept_support_matrix`, `pareto_core_mask`, `extend_frontier_kernel` |

### 4.4 Semantic gap detection

| | |
|---|---|
| Purpose | Identify positions in structural space where the data should produce a witness bridging two or more concepts but does not. |
| Applications | Detecting missing variables, hidden mediators, unobserved confounders; identifying where additional measurement would resolve ambiguity. |
| Cython entry points used | `concept_support_matrix`, `non_redundant_witnesses` |

### 4.5 Prototype construction

| | |
|---|---|
| Purpose | Predict the structural features of an item that should exist between known concepts but has not yet been observed. |
| Applications | Drug-candidate suggestion, missing-mediator prediction, "what if" scenario generation, hypothesis-driven data acquisition. |
| Cython entry points used | `batch_max_similarity`, `concept_support_matrix` |

### 4.6 Verdict-qualified inference

| | |
|---|---|
| Purpose | Decide which concept best explains a new observation, returning one of three outcomes: `confident` (a single concept dominates), `gap` (multiple concepts are equally admissible), `incoherent` (no concept admits the observation consistently). |
| Applications | Decision-support systems that must abstain when ambiguous, safety-critical classification, regime change detection, automated triage. |
| Cython entry points used | `concept_support_matrix`, `pareto_core_mask`, `batch_max_similarity` |

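
On the consuming side, the three verdicts lend themselves to an explicit dispatch in which only `confident` produces a label. The sketch below shows one way a caller might do that; the `Verdict` enum, the `route_observation` helper, and the action names are hypothetical and are not part of SEM. Only the three verdict strings come from this catalog.

```python
from enum import Enum

class Verdict(Enum):
    CONFIDENT = "confident"
    GAP = "gap"
    INCOHERENT = "incoherent"

def route_observation(verdict, concept=None):
    """Map a verdict to a downstream action; only `confident` yields a label."""
    verdict = Verdict(verdict)  # raises ValueError on an unknown verdict string
    if verdict is Verdict.CONFIDENT:
        return ("accept", concept)
    if verdict is Verdict.GAP:
        return ("abstain", None)    # several concepts are equally admissible
    return ("escalate", None)       # no concept admits the observation

decision = route_observation("gap")  # ("abstain", None)
```

Validating the verdict through the enum means an unexpected value fails loudly instead of silently falling through to the last branch.
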
### 4.7 Lifecycle / dominance verification

| | |
|---|---|
| Purpose | When a real observation arrives, decide whether it confirms, displaces, or co-exists with a previously predicted prototype. Maintains the prototype's status across its lifetime. |
| Applications | Continuous-learning pipelines, theory revision under new evidence, audit-trail-preserving inference. |
| Cython entry points used | `pareto_core_mask` |

### 4.8 Hierarchical recursion

| | |
|---|---|
| Purpose | Apply every operator above to recursive concept trees, i.e. concepts whose members are themselves concepts. Operators propagate through the hierarchy and remain mathematically consistent at every level. |
| Applications | Taxonomies, organisational hierarchies, multi-scale analysis (chemical → biological → organism, file → folder → project, etc.). |
| Cython entry points used | the operators above, recursively |

### 4.9 Streaming kNN graph maintenance

| | |
|---|---|
| Purpose | Maintain an exact k-nearest-neighbour graph as items are added or removed one at a time, without rebuilding from scratch on each update. |
| Applications | Online time-series ingest, sliding-window analytics, sensor-stream monitoring, real-time anomaly detection. |
| Cython entry points used | `pairwise_distances`, `nn_distances` (on the contiguous buffer); `scipy.spatial.cKDTree` is used internally above 1000 items for exact nearest-neighbour queries (O(log N) per query on average in low dimensions), with no fidelity knob. |

### 4.10 Time-series streaming model

| | |
|---|---|
| Purpose | A complete reasoning model over sliding windows of a stream: state extraction, transition modelling, intrinsic-scale maintenance, and verdict-qualified prediction on novel windows. Optionally projects high-dimensional windows to lower dimensions when configured to do so. |
| Applications | Multivariate time-series classification, regime detection, online anomaly identification, signal-quality forecasting. |
| Cython entry points used | `nn_distances` (intrinsic scale), `concept_support_matrix` (verdict), the streaming-kNN apparatus from 4.9 |

---

## 5. Composition properties

The operators in §1–§4 compose along several axes:

- **Across payload types**: the same operator works for scalars,
  vectors, matrices, tensors, functions, manifolds, complex states,
  distributions, time-series windows. The caller supplies the
  appropriate distance function or, equivalently, an embedding into
  Euclidean space.
- **Across hierarchy levels**: concepts can themselves be members of
  parent concepts; operators recurse through the tree (§4.8).
- **Under wrapping**: stochastic and temporal extensions can be
  layered over any base payload type. Triple compositions like
  "hierarchy of stochastic time-series" are admissible and produce
  consistent results at every level.

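
As an illustration of the "embedding into Euclidean space" route for one payload type, a scalar time series can be turned into fixed-width windows that behave like ordinary vectors. This is a sketch under stated assumptions: the `sliding_windows` helper and the choice of overlapping, unit-stride windows are hypothetical and do not describe how SEM embeds time series.

```python
import math

def sliding_windows(series, width):
    """Embed a scalar series as overlapping fixed-width windows (one tuple each)."""
    if width <= 0 or width > len(series):
        raise ValueError("width must be between 1 and len(series)")
    return [tuple(series[i:i + width]) for i in range(len(series) - width + 1)]

series = [0.0, 1.0, 0.0, 1.0, 0.0, 5.0]
windows = sliding_windows(series, width=3)
# Each window is now a point in R^3, so a plain Euclidean distance applies.
d_same = math.dist(windows[0], windows[2])    # two windows with the same shape
d_spike = math.dist(windows[0], windows[3])   # window that contains the spike
```

Once items are rows of equal length like this, they can be passed to distance-based primitives in the same way as any other vector payload.
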
---

## 6. What the apparatus does NOT offer

Stated explicitly so users can plan around the limits:

- No probability distributions over outcomes. Verdicts are
  structural, not Bayesian.
- No reward / objective optimisation. The apparatus does not learn
  policies; it identifies structural relationships.
- No tuning knobs that trade fidelity for speed. Where some
  alternatives expose `epsilon`, `top_k`, `temperature`, etc., the
  apparatus uses data-derived structural boundaries instead.
- No approximate-mode kNN (HNSW / IVF / LSH / FAISS lossy modes).
  Every kNN-related operator returns exact results.

---

## 7. Mapping summary

| Apparatus operator | Cython entry point(s) |
|---|---|
| Pairwise similarity | `batch_max_similarity` |
| Multi-class similarity | `concept_support_matrix` |
| Pairwise distance | `pairwise_distances` |
| Nearest-neighbour distance | `nn_distances` |
| Best-tradeoff filter | `pareto_core_mask` |
| One-sided peak flag | `one_sided_mask` |
| Non-redundant witness | `non_redundant_witnesses` |
| Fused centroid + radius update | `extend_frontier_kernel` |
| Intrinsic scale | composed of `nn_distances`, `pairwise_distances` |
| Concept discovery | composed of `pairwise_distances`, `nn_distances`, `pareto_core_mask` |
| Relational hypothesis generation | composed of `concept_support_matrix`, `pareto_core_mask`, `extend_frontier_kernel` |
| Semantic gap detection | composed of `concept_support_matrix`, `non_redundant_witnesses` |
| Prototype construction | composed of `batch_max_similarity`, `concept_support_matrix` |
| Verdict-qualified inference | composed of `concept_support_matrix`, `pareto_core_mask`, `batch_max_similarity` |
| Lifecycle / dominance verification | composed of `pareto_core_mask` |
| Hierarchical recursion | every operator above, recursively |
| Streaming kNN graph | `pairwise_distances`, `nn_distances` |
| Time-series streaming model | `nn_distances`, `concept_support_matrix`, streaming kNN |

## 8. Library availability

The Cython entry points in the right column of §7 are all in
`sem_cython12.wrapper`, distributed at
[https://git.sevana.biz/vvs/sem_cython12](https://git.sevana.biz/vvs/sem_cython12).
Higher-level apparatus (composed operators in §4) is built on those
primitives and ships in the SEM foundation package, separate from
this library.