# sem_cython12

OpenMP-parallel numerical kernel library for Python. Pre-built Linux
shared object included; no compilation required at install time.

## Contents

- `sem_cython12/sem_core12.cpython-312-x86_64-linux-gnu.so` -
  compiled extension (Linux, CPython 3.12, x86_64).
- `sem_cython12/wrapper.py` - Python API.
- `sem_cython12/__init__.py` - package entry.

## Requirements

- Linux x86_64.
- CPython 3.12.
- numpy >= 1.23 (see `requirements.txt`).
- A modern glibc + libgomp. Both ship with Ubuntu 20.04 LTS and
  later. No other system libraries needed.

Windows and macOS binaries are not included in this distribution.

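The first two requirements can be checked from Python before attempting the import. The sketch below is a minimal pre-flight check; `platform_problems` is a hypothetical helper, not part of this package, and it does not probe glibc or libgomp.

```python
import platform
import sys


def platform_problems(system=None, machine=None, version=None):
    """Return a list of reasons the bundled .so cannot load on this host."""
    # Fall back to the running interpreter's values when none are supplied.
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)

    problems = []
    if system != "Linux":
        problems.append(f"needs Linux, found {system}")
    if machine != "x86_64":
        problems.append(f"needs x86_64, found {machine}")
    if version != (3, 12):
        problems.append(f"needs CPython 3.12, found {version[0]}.{version[1]}")
    return problems


# An empty list means the OS, architecture and Python version all match.
print(platform_problems() or "ok")
```

An empty result only rules out the most common mismatches; a missing `libgomp` would still surface at import time.
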
## Install

```bash
git clone https://git.sevana.biz/vvs/sem_cython12.git
cd sem_cython12
pip install -r requirements.txt
# Make the package importable, either:
pip install -e .    # only works once a pyproject.toml or setup.py is added
# or just put the package on PYTHONPATH:
export PYTHONPATH="$PWD${PYTHONPATH:+:$PYTHONPATH}"
```

## Quick start

```python
import numpy as np
from sem_cython12 import wrapper as cy

# Sanity check
assert cy.available(), "compiled extension did not load"
print("backend:", cy.backend())

# Thread count (defaults to ~50% of logical cores; set explicitly via
# either the SEM_NUM_THREADS env var or set_num_threads()):
cy.set_num_threads(8)
print("threads:", cy.get_num_threads())

# Example workload
rng = np.random.default_rng(0)
Q = rng.standard_normal((1000, 32))   # 1000 queries
M = rng.standard_normal((5000, 32))   # 5000 reference points

# For each query: max similarity to any reference, with kernel scale lam.
sim = cy.batch_max_similarity(Q, M, lam=1.0)
print(sim.shape, sim.dtype)           # (1000,) float64
```

## API reference

All functions accept either Python lists or numpy arrays; inputs are
internally cast to contiguous `float64`. Outputs are numpy arrays.

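The cast has a cost worth knowing about: an input that is already contiguous `float64` can be passed through as-is, while anything else needs a copy. The sketch below illustrates this with plain numpy, assuming the wrapper's cast behaves like `np.ascontiguousarray(x, dtype=np.float64)`. That equivalence is an assumption, since the wrapper source is not shown here, and `as_kernel_input` is a hypothetical name.

```python
import numpy as np


def as_kernel_input(x):
    """Return x as a C-contiguous float64 array, copying only if needed."""
    # Assumption: this mirrors the wrapper's internal cast.
    return np.ascontiguousarray(x, dtype=np.float64)


a = np.arange(6, dtype=np.float64).reshape(2, 3)  # float64, C-contiguous
b = a.astype(np.float32)                          # wrong dtype
c = a.T                                           # right dtype, not C-contiguous

print(np.shares_memory(a, as_kernel_input(a)))    # True: no copy made
print(np.shares_memory(b, as_kernel_input(b)))    # False: dtype conversion copies
print(np.shares_memory(c, as_kernel_input(c)))    # False: layout change copies
print(as_kernel_input([[1, 2], [3, 4]]).dtype)    # float64
```

For large arrays that are reused across many calls, converting once up front avoids paying for the copy on every call, if the cast works as assumed.
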
### Configuration

| Function | Purpose |
|---|---|
| `available() -> bool` | True iff the compiled extension loaded |
| `backend() -> str` | `'cython12'` or `'python-fallback'` |
| `get_num_threads() -> int` | Active OpenMP worker count |
| `set_num_threads(n: int)` | Set OpenMP worker count (n >= 1) |

### Distance / similarity

| Function | Inputs | Output |
|---|---|---|
| `batch_max_similarity(X_query, X_members, lam)` | `(Q, D)`, `(M, D)`, `lam > 0` | `(Q,)` - per-query similarity score in `[0, 1]` against the closest member |
| `concept_support_matrix(X_query, member_mats, lam)` | `(Q, D)`, list of `(M_k, D)`, `lam > 0` | `(Q, K)` - one similarity column per member set |
| `pairwise_distances(X)` | `(N, D)` | `(N, N)` - symmetric distance matrix between rows |
| `nn_distances(X)` | `(N, D)` | `(N,)` - min positive distance per row; `inf` if none |

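The "min positive distance per row; `inf` if none" rule can be illustrated without knowing the metric. The sketch below applies it to any precomputed `(N, N)` distance matrix in plain numpy. It reads the table row literally; whether the extension computes `nn_distances` this way, or materialises the full matrix at all, is not stated here, and `nn_from_distance_matrix` is a hypothetical name.

```python
import numpy as np


def nn_from_distance_matrix(D):
    """Per-row minimum of the strictly positive entries; inf if a row has none."""
    D = np.asarray(D, dtype=np.float64)
    # Zeros (self-distance, exact duplicates) are excluded by mapping them to inf.
    masked = np.where(D > 0.0, D, np.inf)
    return masked.min(axis=1)


D = np.array([[0.0, 2.0, 5.0],
              [2.0, 0.0, 3.0],
              [5.0, 3.0, 0.0]])
print(nn_from_distance_matrix(D))                  # [2. 2. 3.]
print(nn_from_distance_matrix(np.zeros((2, 2))))   # [inf inf]
```

The second call shows the `inf` case: when every row is a duplicate of every other, no positive distance exists.
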
### Best-tradeoff filtering

| Function | Inputs | Output |
|---|---|---|
| `pareto_core_mask(S)` | `(N, k)` | `(N,)` byte mask: rows that survive the multi-objective best-tradeoff filter |
| `one_sided_mask(S)` | `(N, k)` | `(N, k)` byte mask: rows contributing meaningfully on a single column only |
| `non_redundant_witnesses(S)` | `(N, k)` | int32 array of row indices contributing meaningfully across multiple columns |

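As a rough mental model for the `(N, k)` to `(N,)` mask shape, the sketch below implements a textbook Pareto non-dominance filter in plain numpy. It assumes larger scores are better, that a row is dropped only when another row is at least as good on every column and strictly better on one, and that "byte mask" means `uint8`. None of these details is confirmed here, so treat it as an illustration rather than a reference implementation of `pareto_core_mask`; `pareto_mask_reference` is a hypothetical name.

```python
import numpy as np


def pareto_mask_reference(S):
    """uint8 mask of rows not dominated by any other row (larger is better)."""
    S = np.asarray(S, dtype=np.float64)
    n = S.shape[0]
    keep = np.ones(n, dtype=np.uint8)
    for i in range(n):
        # Row j dominates row i if it is >= on every column and > on at least one.
        ge_all = np.all(S >= S[i], axis=1)
        gt_any = np.any(S > S[i], axis=1)
        if np.any(ge_all & gt_any):
            keep[i] = 0
    return keep


S = np.array([[1.0, 5.0],
              [3.0, 3.0],
              [2.0, 2.0],   # dominated by [3, 3]
              [5.0, 1.0]])
print(pareto_mask_reference(S))   # [1 1 0 1]
```

This reference runs in O(N² · k) time and is only practical for small `N`; it is meant for spot-checking intuition on toy inputs.
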
### Vector reduction

| Function | Inputs | Output |
|---|---|---|
| `extend_frontier_kernel(cur_centers, cur_radii, new_emb, cur_arity)` | `(F, D)`, `(F,)`, `(A, D)`, `int` | `(flat_centers (F*A, D), flat_radii (F*A,))` |

See the wrapper docstrings for exact semantics of each function.

## Performance notes

Threads are configured globally per process; calling
`set_num_threads(n)` updates the OpenMP team size for all subsequent
calls. The default uses approximately 50% of the host's logical
cores so other processes are not starved on shared machines.

For workloads dominated by `pairwise_distances` and
`pareto_core_mask`, near-linear scaling up to ~8 threads is typical
on commodity x86 hardware. `batch_max_similarity` is BLAS-friendly
and benefits most from larger `M` (reference set) at fixed `D`.

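When the thread count has to be chosen by the caller (for example inside a container with a CPU quota), a small policy function keeps the choice in one place. The sketch below is a caller-side helper under stated assumptions: it gives `SEM_NUM_THREADS` precedence and otherwise takes half the logical cores, rounded down. How the library itself resolves precedence and rounding is not documented here, and `pick_num_threads` is a hypothetical name.

```python
import os


def pick_num_threads(env=None, cpu_count=None):
    """SEM_NUM_THREADS if set and valid, else about half the logical cores."""
    env = os.environ if env is None else env
    cpu_count = (os.cpu_count() or 1) if cpu_count is None else cpu_count

    raw = env.get("SEM_NUM_THREADS", "").strip()
    if raw:
        n = int(raw)  # raises ValueError on non-numeric input
        if n < 1:
            raise ValueError("SEM_NUM_THREADS must be >= 1")
        return n
    # Assumed default policy: half the logical cores, never fewer than one.
    return max(1, cpu_count // 2)


print(pick_num_threads(env={}, cpu_count=16))                        # 8
print(pick_num_threads(env={"SEM_NUM_THREADS": "4"}, cpu_count=16))  # 4
```

The result would then be passed to `set_num_threads(n)` once at startup.
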
## Memory / threading model

- All arrays are processed in shared memory; no inter-process
  serialisation.
- Each routine releases the GIL during its inner loops, so calling
  it concurrently from Python threads is safe.
- The compiled extension links against the system OpenMP runtime
  (`libgomp`); avoid mixing with conda's `intel-openmp` in the same
  process if possible.

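Because the GIL is released, per-query work can be split across Python threads. The sketch below shows one way to do that with the standard library; `map_query_chunks` is a hypothetical helper, and `stand_in_kernel` is a placeholder so the example runs without the package installed.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def map_query_chunks(kernel, Q, n_chunks, max_workers=2):
    """Split Q row-wise, run kernel(chunk) on Python threads, concatenate results."""
    chunks = np.array_split(np.asarray(Q), n_chunks, axis=0)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = list(pool.map(kernel, chunks))  # map preserves chunk order
    return np.concatenate(parts, axis=0)


def stand_in_kernel(chunk):
    # Placeholder for a per-query routine; returns one value per row.
    return np.linalg.norm(chunk, axis=1)


Q = np.random.default_rng(0).standard_normal((1000, 32))
out = map_query_chunks(stand_in_kernel, Q, n_chunks=4)
print(out.shape)   # (1000,)
```

With the real library, `kernel` could be something like `lambda q: cy.batch_max_similarity(q, M, lam=1.0)`. Each concurrent call is likely to start its own OpenMP team, so it is probably worth lowering `set_num_threads` accordingly to avoid oversubscribing the cores.
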
## Diagnostics

`backend()` returns `'python-fallback'` only when the `.so` failed
to import (wrong architecture, glibc too old, missing libgomp). In
that state, every numerical function raises `RuntimeError`; check
`available()` before each batch so the failure is reported up front
rather than partway through a run.

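The check can be wrapped in a small guard that produces a readable error. The sketch below uses only `available()` and `backend()` from the Configuration table; `require_compiled` is a hypothetical helper, and the two `SimpleNamespace` objects are stand-ins for the wrapper module so the example runs without the package.

```python
from types import SimpleNamespace


def require_compiled(cy):
    """Raise RuntimeError with a readable message unless the extension loaded."""
    if not cy.available():
        raise RuntimeError(
            f"sem_cython12 backend is {cy.backend()!r}; the compiled extension "
            "did not load (check architecture, glibc and libgomp)"
        )


# Stand-ins for `sem_cython12.wrapper` in its two possible states.
loaded = SimpleNamespace(available=lambda: True, backend=lambda: "cython12")
broken = SimpleNamespace(available=lambda: False, backend=lambda: "python-fallback")

require_compiled(loaded)   # passes silently
try:
    require_compiled(broken)
except RuntimeError as exc:
    print(exc)
```

In real code the argument would be the imported wrapper module, called once before each batch is dispatched.
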
## Licence

Proprietary. Internal use only.

## Support

Open an issue at https://git.sevana.biz/vvs/sem_cython12.