# sem_cython12

OpenMP-parallel numerical kernel library for Python. Pre-built
Linux and Windows binaries included; no compilation required at
install time.

## What is this for?

`sem_cython12` is a small, focused toolbox of fast C-level routines
exposed through a thin numpy wrapper. It is not a general-purpose
numerical library; it accelerates three specific jobs that are
awkward or slow to do in pure numpy once `N` reaches the thousands:

1. **Similarity / distance over batches of vectors.** Full
   pairwise distance matrices, nearest-neighbour distances, and
   kernel-based `[0, 1]` similarity scores of a query set against
   one or many reference sets. Useful for nearest-neighbour
   search, kernel-density-style scoring, and "how close is each
   query to this concept?" lookups.
2. **Multi-objective ("best-tradeoff") filtering of score matrices.**
   Given a matrix of `N` candidates × `k` criteria, select the
   rows on the Pareto frontier, isolate rows that only spike on a
   single criterion, and recover the rows that contribute
   meaningfully across several criteria - candidates a naive
   sum-of-scores ranker would miss.
3. **An incremental aggregation primitive** for streaming
   clustering / frontier-expansion algorithms: a fused bulk update
   that, given `F` running summaries (centre + radius) and `A`
   new contributions, produces all `F·A` updated summaries in one
   parallel pass.

The kernels release the GIL, scale near-linearly to ~8 OpenMP
threads on commodity x86, and operate on shared-memory numpy
arrays with no inter-process serialisation. The Python wrapper
handles contiguous-float64 casting and fails loudly (via
`available()` / `backend()` plus `RuntimeError`) when the compiled
extension cannot load on the host - there is no slow pure-Python
fallback path.

The [`demos/`](./demos/) directory contains three runnable
end-to-end examples (Iris boundary discovery, parameter-free
anomaly detection, multi-criteria candidate selection) that
exercise these three jobs against well-known baselines.

## Contents

- `sem_cython12/sem_core12.cpython-312-x86_64-linux-gnu.so` -
  compiled extension (Linux, CPython 3.12, x86_64).
- `sem_cython12/sem_core12.cp312-win_amd64.pyd` -
  compiled extension (Windows, CPython 3.12, AMD64).
- `sem_cython12/wrapper.py` - Python API.
- `sem_cython12/__init__.py` - package entry.

## Compatibility

| Platform      | Architecture | Python       | Runtime requirements |
|---------------|--------------|--------------|----------------------|
| Linux         | x86_64       | CPython 3.12 | glibc >= 2.31, libgomp |
| Windows 10/11 | AMD64        | CPython 3.12 | vcomp (OpenMP runtime from the Microsoft Visual C++ Redistributable) |
| macOS         | -            | -            | not provided (contact sales@sevana.biz) |

Single Python dependency: `numpy >= 1.23` (see `requirements.txt`).

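
The compatibility matrix can be checked programmatically before importing the package. The sketch below uses only the standard library; `is_supported_host` is a hypothetical helper, not part of `sem_cython12`, and it does not verify the glibc version or the presence of an OpenMP runtime.

```python
import platform
import sys

# (system, machine) pairs taken from the compatibility table.
SUPPORTED_HOSTS = {("Linux", "x86_64"), ("Windows", "AMD64")}


def is_supported_host(impl=None, version=None, system=None, machine=None):
    """Return True if the interpreter/host pair matches a shipped binary.

    Hypothetical helper, not part of sem_cython12. Arguments default to
    the running interpreter and exist mainly to make the check testable.
    """
    impl = sys.implementation.name if impl is None else impl
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return (
        impl == "cpython"
        and version == (3, 12)
        and (system, machine) in SUPPORTED_HOSTS
    )


print("supported host:", is_supported_host())
```

A `False` result means the bundled extension is not expected to load on this interpreter, so the import-time failure described under Diagnostics is the likely outcome.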
## How the binaries were built

- **Linux (`*.so`)**: gcc 13.3, OpenMP via `libgomp`, flags
  `-O3 -ffast-math -march=native -fopenmp`.
- **Windows (`*.pyd`)**: MSVC v14.50 (Visual Studio Build Tools 2026),
  OpenMP via `vcomp`, flags `/O2 /openmp`.

Both binaries target the CPython 3.12 (cp312) ABI. No other Python
version is supported in this release.

## Install

```bash
git clone https://git.sevana.biz/vvs/sem_cython12.git
cd sem_cython12
pip install -r requirements.txt
# Make the package importable, either:
pip install -e .            # if pyproject.toml/setup.py is added
# or just put the package on PYTHONPATH:
export PYTHONPATH="$PWD:$PYTHONPATH"
```

## Quick start

```python
import numpy as np
from sem_cython12 import wrapper as cy

# Sanity check
assert cy.available(), "compiled extension did not load"
print("backend:", cy.backend())

# Thread count (defaults to ~50% of logical cores; set explicitly via
# either the SEM_NUM_THREADS env var or set_num_threads()):
cy.set_num_threads(8)
print("threads:", cy.get_num_threads())

# Example workload
rng = np.random.default_rng(0)
Q = rng.standard_normal((1000, 32))     # 1000 queries
M = rng.standard_normal((5000, 32))     # 5000 reference points

# For each query: max similarity to any reference, with kernel scale lam.
sim = cy.batch_max_similarity(Q, M, lam=1.0)
print(sim.shape, sim.dtype)             # (1000,) float64
```

## API reference

All functions accept either Python lists or numpy arrays; inputs are
internally cast to contiguous `float64`. Outputs are numpy arrays.

### Configuration

| Function | Purpose |
|---|---|
| `available() -> bool` | True iff the compiled extension loaded |
| `backend() -> str` | `'cython12'`, or `'python-fallback'` if the extension failed to load (numerical functions then raise `RuntimeError`) |
| `get_num_threads() -> int` | Active OpenMP worker count |
| `set_num_threads(n: int)` | Set OpenMP worker count (n >= 1) |

### Distance / similarity

| Function | Inputs | Output |
|---|---|---|
| `batch_max_similarity(X_query, X_members, lam)` | `(Q, D)`, `(M, D)`, `lam > 0` | `(Q,)` - per-query similarity score in `[0, 1]` against the closest member |
| `concept_support_matrix(X_query, member_mats, lam)` | `(Q, D)`, list of `(M_k, D)`, `lam > 0` | `(Q, K)` - one similarity column per member set |
| `pairwise_distances(X)` | `(N, D)` | `(N, N)` - symmetric distance matrix between rows |
| `nn_distances(X)` | `(N, D)` | `(N,)` - min positive distance per row; `inf` if none |

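
For small inputs, the shapes and conventions in this table can be cross-checked against a plain NumPy reference. The sketch below assumes Euclidean distance, which this README does not state. The kernel behind `batch_max_similarity` is not specified here either, so `exp(-lam * d)` is used purely as a stand-in that maps distances into `(0, 1]`. All `ref_*` names are hypothetical and not part of the library.

```python
import numpy as np


def ref_pairwise_distances(X):
    """(N, D) -> (N, N) symmetric Euclidean distance matrix (assumed metric)."""
    X = np.ascontiguousarray(X, dtype=np.float64)
    diff = X[:, None, :] - X[None, :, :]          # (N, N, D) differences
    return np.sqrt((diff * diff).sum(axis=-1))


def ref_nn_distances(X):
    """(N, D) -> (N,) smallest strictly positive distance per row, inf if none."""
    d = ref_pairwise_distances(X)
    d = np.where(d > 0.0, d, np.inf)              # mask self and exact duplicates
    return d.min(axis=1)


def ref_batch_max_similarity(X_query, X_members, lam):
    """(Q, D), (M, D) -> (Q,) similarity to the closest member.

    ASSUMPTION: kernel exp(-lam * distance); the library's kernel may differ.
    """
    Q = np.ascontiguousarray(X_query, dtype=np.float64)
    M = np.ascontiguousarray(X_members, dtype=np.float64)
    diff = Q[:, None, :] - M[None, :, :]          # (Q, M, D) differences
    d_min = np.sqrt((diff * diff).sum(axis=-1)).min(axis=1)
    return np.exp(-lam * d_min)


X = np.array([[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]])
print(ref_pairwise_distances(X)[0, 1])   # 5.0
print(ref_nn_distances(X))               # [5. 5. 5.]
```

The reference materialises an `(N, N, D)` intermediate, so it is only practical for small arrays; it is meant for checking shapes and the "min positive distance" convention, not for benchmarking.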
### Best-tradeoff filtering

| Function | Inputs | Output |
|---|---|---|
| `pareto_core_mask(S)` | `(N, k)` | `(N,)` byte mask: rows that survive the multi-objective best-tradeoff filter |
| `one_sided_mask(S)` | `(N, k)` | `(N, k)` byte mask: rows contributing meaningfully on a single column only |
| `non_redundant_witnesses(S)` | `(N, k)` | int32 array of row indices contributing meaningfully across multiple columns |

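
As a point of reference for what "rows on the Pareto frontier" means, the sketch below computes a brute-force non-dominated mask in NumPy. It assumes larger scores are better and uses the textbook dominance definition; `pareto_core_mask` may apply additional criteria, so treat this as an illustration of the concept rather than a specification of the library's output. `ref_pareto_mask` is a hypothetical name.

```python
import numpy as np


def ref_pareto_mask(S):
    """(N, k) -> (N,) uint8 mask of non-dominated rows (larger is better).

    Row j dominates row i when S[j] >= S[i] in every column and
    S[j] > S[i] in at least one column.
    """
    S = np.ascontiguousarray(S, dtype=np.float64)
    ge = (S[:, None, :] >= S[None, :, :]).all(axis=-1)   # ge[j, i]: j >= i everywhere
    gt = (S[:, None, :] > S[None, :, :]).any(axis=-1)    # gt[j, i]: j > i somewhere
    dominated = (ge & gt).any(axis=0)                    # some j dominates i
    return (~dominated).astype(np.uint8)


S = np.array([
    [0.9, 0.1],   # best on criterion 0
    [0.1, 0.9],   # best on criterion 1
    [0.6, 0.6],   # balanced trade-off
    [0.5, 0.5],   # dominated by the row above
])
print(ref_pareto_mask(S))   # [1 1 1 0]
```

The comparison is O(N² · k) in time and memory, which is the kind of cost the compiled kernels are intended to absorb for larger `N`.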
### Vector reduction

| Function | Inputs | Output |
|---|---|---|
| `extend_frontier_kernel(cur_centers, cur_radii, new_emb, cur_arity)` | `(F, D)`, `(F,)`, `(A, D)`, `int` | `(flat_centers (F*A, D), flat_radii (F*A,))` |

See the wrapper docstrings for exact semantics of each function.

## Demos

Three runnable demos live in [`demos/`](./demos/):

1. [`01_iris_boundary.py`](./demos/01_iris_boundary.py) - rediscovers
   the famous Iris versicolor/virginica boundary specimens with no
   training, using only `concept_support_matrix` and `pairwise_distances`.
2. [`02_anomaly_detection.py`](./demos/02_anomaly_detection.py) -
   parameter-free anomaly detection that matches IsolationForest's
   AUC=1.0 on a synthetic benchmark, using only `batch_max_similarity`.
3. [`03_multicriteria_selection.py`](./demos/03_multicriteria_selection.py) -
   recovers 5/5 hidden balanced candidates that naive sum-of-scores
   ranking misses, using `pareto_core_mask` and `non_redundant_witnesses`.

A standalone copy of the demos repository is also published at
https://git.sevana.biz/vvs/sem_cython12-demos.

## Performance notes

Threads are configured globally per process; calling
`set_num_threads(n)` updates the OpenMP team size for all subsequent
calls. The default uses approximately 50% of the host's logical
cores so other processes are not starved on shared machines.

For workloads dominated by `pairwise_distances` and
`pareto_core_mask`, near-linear scaling up to ~8 threads is typical
on commodity x86 hardware. `batch_max_similarity` is BLAS-friendly
and benefits most from larger `M` (reference set) at fixed `D`.

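
One way to pin the thread count explicitly, rather than rely on the default, is sketched below. The helper `pick_thread_count` is hypothetical and lives on the caller's side: it honours `SEM_NUM_THREADS` when set and otherwise caps the count at 8, reflecting the scaling range quoted in this section. How the library itself parses the environment variable is not documented here, so the parsing shown is an assumption.

```python
import os


def pick_thread_count(env=None, logical_cores=None, cap=8):
    """Choose an OpenMP thread count to pass to set_num_threads().

    Hypothetical caller-side helper, not part of sem_cython12.
    """
    env = os.environ if env is None else env
    cores = (os.cpu_count() or 1) if logical_cores is None else logical_cores
    raw = env.get("SEM_NUM_THREADS", "").strip()
    if raw.isdigit() and int(raw) >= 1:
        return int(raw)                 # explicit user override wins
    return max(1, min(cap, cores))      # otherwise stay within the useful range


n = pick_thread_count()
print("requested threads:", n)
# With the package installed:
#   from sem_cython12 import wrapper as cy
#   cy.set_num_threads(n)
```

Because the setting is global per process, it is usually simplest to apply it once at startup rather than around individual calls.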
## Memory / threading model

- All arrays are processed in shared memory; no inter-process
  serialisation.
- Each routine releases the GIL during its inner loops, so calling
  it concurrently from Python threads is safe.
- The compiled extension links against the system OpenMP runtime
  (`libgomp` on Linux, `vcomp` on Windows); avoid mixing with conda's
  `intel-openmp` in the same process if possible.

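
Because the kernels release the GIL, a batch can be split across Python threads. The sketch below shows the pattern with a generic callable; `map_chunks` is a hypothetical helper, and `np.linalg.norm` stands in for a kernel so the example runs without the compiled extension. Each kernel call also starts its own OpenMP team, so combining many Python threads with a large `set_num_threads` value may oversubscribe the CPU.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def map_chunks(kernel, X, n_chunks, max_workers=None):
    """Apply kernel to row-chunks of X in parallel threads and concatenate.

    Hypothetical helper. kernel must map an (n, D) array to an (n, ...) array.
    """
    chunks = np.array_split(np.asarray(X), n_chunks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = list(pool.map(kernel, chunks))   # map() preserves chunk order
    return np.concatenate(parts)


def row_norms(chunk):
    # Stand-in for a per-row kernel such as
    #   lambda q: cy.batch_max_similarity(q, M, lam=1.0)
    return np.linalg.norm(chunk, axis=1)


rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 32))
out = map_chunks(row_norms, X, n_chunks=4)
print(out.shape)   # (1000,)
```

This only makes sense for row-independent kernels; `pairwise_distances` and the best-tradeoff filters compare rows against each other and cannot be chunked this way.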
## Privacy / telemetry

`sem_cython12` performs **no network I/O**, opens no sockets, and
writes no files outside the calling process's working directory.
There is no telemetry, no usage reporting, and no licence-server
check-in. All computation is in-process on local arrays.

## Diagnostics

`backend()` returns `'python-fallback'` only when the compiled
extension (`.so` on Linux, `.pyd` on Windows) failed to import, for
example because of a wrong architecture, a glibc that is too old, or
a missing OpenMP runtime. In that state, every numerical function
raises `RuntimeError`; there is no silent fallback. Check
`available()` before each batch so the failure surfaces early with a
clear message.

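
A startup guard along these lines surfaces a load failure once, with host details attached. It relies only on the documented `available()` and `backend()` calls; `require_compiled_backend` is a hypothetical helper, and the stub objects exist only so the sketch runs without the package installed.

```python
import platform
import sys
from types import SimpleNamespace


def require_compiled_backend(cy):
    """Raise RuntimeError with host details if the extension did not load.

    cy is the wrapper module (from sem_cython12 import wrapper as cy).
    Hypothetical helper, not part of sem_cython12.
    """
    if cy.available():
        return cy.backend()
    py = f"{sys.version_info[0]}.{sys.version_info[1]}"
    host = f"{platform.system()}/{platform.machine()}"
    raise RuntimeError(
        "sem_cython12 compiled extension unavailable "
        f"(backend={cy.backend()!r}, python={py}, platform={host})"
    )


# Stubs standing in for the wrapper module in its two possible states.
loaded = SimpleNamespace(available=lambda: True, backend=lambda: "cython12")
broken = SimpleNamespace(available=lambda: False, backend=lambda: "python-fallback")

print(require_compiled_backend(loaded))
try:
    require_compiled_backend(broken)
except RuntimeError as exc:
    print("load failure:", exc)
```

In real use the argument would be the imported wrapper module, and the reported Python version and platform can be compared directly against the Compatibility table.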
## Licence

The Software is licensed under the terms contained in the [LICENSE](./LICENSE)
file in this repository.

In short:
- **Research and non-commercial use**: granted free of charge under
  the conditions in section 2 of the LICENSE.
- **Commercial use**: requires a separate written commercial
  licence from the Licensor. Contact `sales@sevana.biz`.
- **No warranty**: the Software is provided strictly "AS IS",
  without warranty of any kind. The Licensor's total aggregate
  liability is limited to zero.

Please read the LICENSE file in full before using the Software.

## Support

Open an issue at https://git.sevana.biz/vvs/sem_cython12.