# sem_cython12

OpenMP-parallel numerical kernel library for Python. Pre-built Linux and Windows binaries are included; no compilation is required at install time.

## What is this for?

`sem_cython12` is a small, focused toolbox of fast C-level routines exposed through a thin numpy wrapper. It is not a general-purpose numerical library. It accelerates three specific jobs that become awkward or slow in pure numpy once `N` reaches the thousands:

1. **Similarity / distance over batches of vectors.** Full pairwise distance matrices, nearest-neighbour distances, and kernel-based similarity scores in `[0, 1]` of a query set against one or many reference sets. Useful for nearest-neighbour search, kernel-density-style scoring, and "how close is each query to this concept?" lookups.
2. **Multi-objective ("best-tradeoff") filtering of score matrices.** Given a matrix of `N` candidates × `k` criteria, select the rows on the Pareto frontier, isolate rows that spike on a single criterion only, and recover the rows that contribute meaningfully across several criteria. These are the candidates a naive sum-of-scores ranker would miss.
3. **An incremental aggregation primitive** for streaming clustering / frontier-expansion algorithms: a fused bulk update that, given `F` running summaries (centre + radius) and `A` new contributions, produces all `F·A` updated summaries in one parallel pass.

The kernels release the GIL, typically scale near-linearly up to about 8 OpenMP threads on commodity x86 hardware, and operate on numpy arrays in shared memory with no inter-process serialisation. The Python wrapper handles casting to contiguous `float64`. If the compiled extension cannot load on the host, the wrapper fails loudly: `available()` returns `False`, `backend()` reports `'python-fallback'`, and every numerical function raises `RuntimeError`. There is no slow pure-Python fallback path.
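The following plain-numpy sketch illustrates job 2. It computes a Pareto (non-dominated) row mask and assumes that a higher score is better on every criterion. It is meant for intuition only and is not the library's `pareto_core_mask`, whose exact filter is documented in the wrapper docstrings. The helper name `pareto_mask_reference` is hypothetical.

```python
import numpy as np


def pareto_mask_reference(S):
    """Hypothetical reference helper, not part of sem_cython12.

    Returns a boolean mask of the rows of S (N candidates x k criteria)
    that no other row dominates, assuming higher is better. Row j
    dominates row i if S[j] >= S[i] on every column and S[j] > S[i]
    on at least one column.
    """
    S = np.ascontiguousarray(S, dtype=np.float64)
    # ge[j, i]: row j is at least as good as row i on all columns.
    ge = (S[:, None, :] >= S[None, :, :]).all(axis=2)
    # gt[j, i]: row j is strictly better than row i on some column.
    gt = (S[:, None, :] > S[None, :, :]).any(axis=2)
    # Row i is dominated if any row j satisfies both conditions.
    dominated = (ge & gt).any(axis=0)
    return ~dominated


# Toy scores: 4 candidates x 2 criteria.
S = np.array([
    [0.9, 0.1],  # spikes on criterion 0
    [0.1, 0.9],  # spikes on criterion 1
    [0.6, 0.6],  # balanced
    [0.5, 0.5],  # balanced, but dominated by row 2
])
print(pareto_mask_reference(S))  # [ True  True  True False]
```

Rows 0 and 1 each spike on one criterion and row 2 is balanced, so all three survive. Row 3 is dominated by row 2. The broadcast comparison builds an `(N, N, k)` temporary, so this reference is only practical for small `N`. Larger `N` is the regime the compiled kernels target.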
The [`demos/`](./demos/) directory contains three runnable end-to-end examples: Iris boundary discovery, parameter-free anomaly detection, and multi-criteria candidate selection. They exercise the similarity and best-tradeoff kernels against well-known baselines.

## Contents

- `sem_cython12/sem_core12.cpython-3{10,11,12,13}-x86_64-linux-gnu.so` - compiled extensions (Linux, x86_64) for CPython 3.10 / 3.11 / 3.12 / 3.13.
- `sem_cython12/sem_core12.cp3{10,11,12,13}-win_amd64.pyd` - compiled extensions (Windows, AMD64) for CPython 3.10 / 3.11 / 3.12 / 3.13.
- `sem_cython12/wrapper.py` - Python API.
- `sem_cython12/__init__.py` - package entry point.

Python's import system automatically selects the correct binary for the running interpreter. Install the whole package, and the matching `.so` / `.pyd` is picked up by its ABI tag.

## Compatibility

| Platform | Architecture | Python | Runtime requirements |
|---|---|---|---|
| Linux | x86_64 | CPython 3.10 / 3.11 / 3.12 / 3.13 | glibc >= 2.31, libgomp |
| Windows 10/11 | AMD64 | CPython 3.10 / 3.11 / 3.12 / 3.13 | vcomp (`vcomp140.dll`, from the Microsoft Visual C++ Redistributable) |
| macOS | - | - | Not provided (contact sales@sevana.biz) |

Single Python dependency: `numpy >= 1.23` (see `requirements.txt`).

## How the binaries were built

- **Linux (`*.so`), cp312**: system gcc 13.3 on Ubuntu, OpenMP via `libgomp`, flags `-O3 -ffast-math -march=native -fopenmp`.
- **Linux (`*.so`), cp310 / cp311 / cp313**: conda-forge gcc inside isolated `python=3.10` / `3.11` / `3.13` environments (a clean build independent of the system Python), with the same OpenMP and optimisation flags.
- **Windows (`*.pyd`), all four versions**: MSVC v14.50 (Visual Studio Build Tools 2026), OpenMP via `vcomp`, flags `/O2 /openmp`. Each was built against the matching CPython interpreter installed via `winget`.

All eight binaries pass the same numerical smoke test (`batch_max_similarity` over fixed-seed data) and produce identical output to within float64 round-off.
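You can check which of the eight binaries the running interpreter will try to load by asking for its extension-module suffixes. This sketch uses only the standard library (`sysconfig`, `importlib.machinery`). The `sem_core12` base name comes from the Contents list, and the helper `expected_binary_names` is hypothetical.

```python
import importlib.machinery
import sysconfig


def expected_binary_names(module="sem_core12"):
    """Hypothetical helper: filenames this interpreter accepts for `module`.

    EXTENSION_SUFFIXES lists every suffix the import system tries, most
    specific (ABI-tagged) first, e.g. '.cpython-312-x86_64-linux-gnu.so'
    on Linux or '.cp312-win_amd64.pyd' on Windows.
    """
    return [module + suffix for suffix in importlib.machinery.EXTENSION_SUFFIXES]


# EXT_SUFFIX is the ABI-tagged suffix this interpreter prefers.
print("primary suffix:", sysconfig.get_config_var("EXT_SUFFIX"))
for name in expected_binary_names():
    print(name)
```

If none of the printed names matches a file shipped in `sem_cython12/`, the interpreter is not one of the supported CPython versions or platforms.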
## Install

```bash
git clone https://git.sevana.biz/vvs/sem_cython12.git
cd sem_cython12
pip install -r requirements.txt

# Make the package importable, either:
pip install -e .    # works once a pyproject.toml or setup.py is added
# or just put the package on PYTHONPATH:
export PYTHONPATH=$PWD:$PYTHONPATH
```

## Quick start

```python
import numpy as np
from sem_cython12 import wrapper as cy

# Sanity check: stop early if the compiled extension did not load.
assert cy.available(), "compiled extension did not load"
print("backend:", cy.backend())

# Thread count (defaults to ~50% of logical cores; set it explicitly via
# either the SEM_NUM_THREADS env var or set_num_threads()):
cy.set_num_threads(8)
print("threads:", cy.get_num_threads())

# Example workload
rng = np.random.default_rng(0)
Q = rng.standard_normal((1000, 32))   # 1000 queries, D = 32
M = rng.standard_normal((5000, 32))   # 5000 reference points, D = 32

# For each query: max similarity to any reference, with kernel scale lam.
sim = cy.batch_max_similarity(Q, M, lam=1.0)
print(sim.shape, sim.dtype)   # (1000,) float64
```

## API reference

All functions accept either Python lists or numpy arrays. Inputs are cast internally to contiguous `float64`, and outputs are numpy arrays.
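The cast described above is probably equivalent to `np.ascontiguousarray(x, dtype=np.float64)`, though this is an assumption: the wrapper's code is not shown here. The plain-numpy sketch below shows which inputs already meet that requirement and come back unchanged, and which ones force a converted copy. If you call the kernels repeatedly on a large array, casting it once beforehand avoids a repeated conversion. The helper `as_kernel_input` is hypothetical.

```python
import numpy as np


def as_kernel_input(x):
    """Hypothetical helper: cast to a C-contiguous float64 array.

    np.ascontiguousarray returns its input unchanged when it is already
    C-contiguous float64, and a converted copy otherwise.
    """
    return np.ascontiguousarray(x, dtype=np.float64)


a = np.zeros((1000, 32))                    # already C-contiguous float64
b = np.zeros((1000, 32), dtype=np.float32)  # wrong dtype
c = np.zeros((32, 1000)).T                  # float64, but a Fortran-ordered view
d = [[0.0, 1.0], [2.0, 3.0]]                # plain Python list

for name, x in [("float64 C", a), ("float32", b), ("transposed", c), ("list", d)]:
    y = as_kernel_input(x)
    print(name, "-> copied:", y is not x, "| C-contiguous:", y.flags.c_contiguous)
```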
### Configuration

| Function | Purpose |
|---|---|
| `available() -> bool` | `True` iff the compiled extension loaded |
| `backend() -> str` | `'cython12'` if the extension loaded, `'python-fallback'` otherwise |
| `get_num_threads() -> int` | Active OpenMP worker count |
| `set_num_threads(n: int)` | Set the OpenMP worker count (`n >= 1`) |

### Distance / similarity

| Function | Inputs | Output |
|---|---|---|
| `batch_max_similarity(X_query, X_members, lam)` | `(Q, D)`, `(M, D)`, `lam > 0` | `(Q,)` - per-query similarity score in `[0, 1]` against the closest member |
| `concept_support_matrix(X_query, member_mats, lam)` | `(Q, D)`, list of `K` arrays of shape `(M_k, D)`, `lam > 0` | `(Q, K)` - one similarity column per member set |
| `pairwise_distances(X)` | `(N, D)` | `(N, N)` - symmetric distance matrix between rows |
| `nn_distances(X)` | `(N, D)` | `(N,)` - minimum positive distance per row; `inf` if none |

### Best-tradeoff filtering

| Function | Inputs | Output |
|---|---|---|
| `pareto_core_mask(S)` | `(N, k)` | `(N,)` byte mask: rows that survive the multi-objective best-tradeoff filter |
| `one_sided_mask(S)` | `(N, k)` | `(N, k)` byte mask: rows contributing meaningfully on a single column only |
| `non_redundant_witnesses(S)` | `(N, k)` | `int32` array of row indices contributing meaningfully across multiple columns |

### Vector reduction

| Function | Inputs | Output |
|---|---|---|
| `extend_frontier_kernel(cur_centers, cur_radii, new_emb, cur_arity)` | `(F, D)`, `(F,)`, `(A, D)`, `int` | tuple `(flat_centers, flat_radii)` of shapes `(F*A, D)` and `(F*A,)` |

See the wrapper docstrings for the exact semantics of each function.

## Documentation

- [`docs/SEM_Overview.md`](./docs/SEM_Overview.md) - non-internal introduction to SEM (Similarity Energy Model): what it does and how the `sem_cython12` library fits in.
- [`docs/SEM_Mathematical_Apparatus.md`](./docs/SEM_Mathematical_Apparatus.md) - capabilities-level description of the operators and engines exposed by the library.

## Demos

Three runnable demos live in [`demos/`](./demos/):

1. [`01_iris_boundary.py`](./demos/01_iris_boundary.py) - rediscovers the well-known Iris versicolor/virginica boundary specimens with no training, using only `concept_support_matrix` and `pairwise_distances`.
2. [`02_anomaly_detection.py`](./demos/02_anomaly_detection.py) - parameter-free anomaly detection that matches IsolationForest's AUC of 1.0 on a synthetic benchmark, using only `batch_max_similarity`.
3. [`03_multicriteria_selection.py`](./demos/03_multicriteria_selection.py) - recovers all 5 hidden balanced candidates that naive sum-of-scores ranking misses, using `pareto_core_mask` and `non_redundant_witnesses`.

A standalone copy of the demos is also published as a separate repository at https://git.sevana.biz/vvs/sem_cython12-demos.

## Performance notes

Threads are configured globally per process: calling `set_num_threads(n)` updates the OpenMP team size for all subsequent calls. The default uses approximately 50% of the host's logical cores so that other processes are not starved on shared machines.

For workloads dominated by `pairwise_distances` and `pareto_core_mask`, near-linear scaling up to about 8 threads is typical on commodity x86 hardware. `batch_max_similarity` is BLAS-friendly and benefits most from a larger reference set (`M`) at fixed `D`.

## Memory / threading model

- All arrays are processed in shared memory; there is no inter-process serialisation.
- Each routine releases the GIL during its inner loops, so calling it concurrently from several Python threads is safe and the calls can run in parallel.
- The Linux builds link against GNU OpenMP (`libgomp`). If possible, avoid mixing them with conda's `intel-openmp` in the same process.

## Privacy / telemetry

`sem_cython12` performs **no network I/O**, opens no sockets, and writes no files outside the calling process's working directory. There is no telemetry, no usage reporting, and no licence-server check-in. All computation runs in-process on local arrays.
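The API reference lists `pairwise_distances` and `nn_distances`, and the Performance notes call the distance work BLAS-friendly. The plain-numpy sketch below shows one common reason a distance kernel can be BLAS-friendly. It assumes a Euclidean metric, which is an assumption since the table only says "distance". It uses the expansion ‖x_i − x_j‖² = ‖x_i‖² − 2 x_i·x_j + ‖x_j‖², so the O(N²·D) work becomes a single matrix product. The helpers are hypothetical references that follow the documented output shapes and the `inf`-if-none rule. They are not the library's kernels.

```python
import numpy as np


def pairwise_distances_reference(X):
    """Hypothetical pure-numpy reference; assumes Euclidean distance.

    ||x_i - x_j||^2 = ||x_i||^2 - 2 x_i.x_j + ||x_j||^2, so the heavy
    O(N^2 D) part is the single product X @ X.T, which numpy hands to BLAS.
    """
    X = np.ascontiguousarray(X, dtype=np.float64)
    sq = np.einsum("ij,ij->i", X, X)          # squared row norms, shape (N,)
    d2 = sq[:, None] - 2.0 * (X @ X.T) + sq[None, :]
    np.maximum(d2, 0.0, out=d2)               # clip small negatives from round-off
    np.fill_diagonal(d2, 0.0)                 # exact zeros on the diagonal
    return np.sqrt(d2)


def nn_distances_reference(X):
    """Minimum positive distance per row; inf if a row has none (sketch)."""
    D = pairwise_distances_reference(X)
    D[D <= 0.0] = np.inf                      # ignore self and exact duplicates
    return D.min(axis=1)


X = np.array([[0.0, 0.0], [3.0, 4.0], [3.0, 4.0], [6.0, 8.0]])
print(pairwise_distances_reference(X)[0])     # [ 0.  5.  5. 10.]
print(nn_distances_reference(X))              # [5. 5. 5. 5.]
```

The expansion trades a little accuracy for speed. Rows that are nearly identical, or identical but with less convenient values than in this toy example, can come out as tiny positive distances rather than exact zeros. A direct-difference computation avoids this at the cost of losing the matrix-product formulation.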
## Diagnostics

`backend()` returns `'python-fallback'` only when the compiled extension (`.so` on Linux, `.pyd` on Windows) failed to import. Typical causes are a wrong architecture, a glibc that is too old, or a missing libgomp. In that state every numerical function raises `RuntimeError`. Check `available()` before each batch so that a failed load is reported up front rather than partway through a run.

## Licence

The Software is licensed under the terms contained in the [LICENSE](./LICENSE) file in this repository. In short:

- **Research and non-commercial use**: granted free of charge under the conditions in section 2 of the LICENSE.
- **Commercial use**: requires a separate written commercial licence from the Licensor. Contact `sales@sevana.biz`.
- **No warranty**: the Software is provided strictly "AS IS", without warranty of any kind. The Licensor's total aggregate liability is limited to zero.

Please read the LICENSE file in full before using the Software.

## Support

Open an issue at https://git.sevana.biz/vvs/sem_cython12.