The previous version exposed the following in both the Python docstrings and the README API tables:
- exp(-d/lam) as the literal similarity-kernel form
- 'Euclidean' as the literal distance metric
- the O1+O2 conditions of the one-sided-mask routine

These were replaced with operational descriptions: 'similarity score in [0,1] against the closest member', 'distance matrix between rows', etc. The library's behaviour and call signatures are unchanged.
sem_cython12
OpenMP-parallel numerical kernel library for Python. Pre-built Linux shared object included; no compilation required at install time.
Contents
- `sem_cython12/sem_core12.cpython-312-x86_64-linux-gnu.so` - compiled extension (Linux, CPython 3.12, x86_64).
- `sem_cython12/wrapper.py` - Python API.
- `sem_cython12/__init__.py` - package entry.
Requirements
- Linux x86_64.
- CPython 3.12.
- numpy >= 1.23 (see `requirements.txt`).
- A modern glibc + libgomp. Both ship with Ubuntu 20.04 LTS and later. No other system libraries needed.
Windows and macOS binaries are not included in this distribution.
Install
git clone https://git.sevana.biz/vvs/sem_cython12.git
cd sem_cython12
pip install -r requirements.txt
# Make the package importable, either:
pip install -e . # if pyproject.toml/setup.py is added
# or just put the package on PYTHONPATH:
export PYTHONPATH=$PWD:$PYTHONPATH
Quick start
import numpy as np
from sem_cython12 import wrapper as cy
# Sanity check
assert cy.available(), "compiled extension did not load"
print("backend:", cy.backend())
# Thread count (defaults to ~50% of logical cores; set explicitly via
# either the SEM_NUM_THREADS env var or set_num_threads()):
cy.set_num_threads(8)
print("threads:", cy.get_num_threads())
# Example workload
rng = np.random.default_rng(0)
Q = rng.standard_normal((1000, 32)) # 1000 queries
M = rng.standard_normal((5000, 32)) # 5000 reference points
# For each query: max similarity to any reference, with kernel scale lam.
sim = cy.batch_max_similarity(Q, M, lam=1.0)
print(sim.shape, sim.dtype) # (1000,) float64
API reference
All functions accept either Python lists or numpy arrays; inputs are
internally cast to contiguous float64. Outputs are numpy arrays.
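As an illustration of what "cast to contiguous float64" means in practice, the sketch below uses plain NumPy. The helper name `as_kernel_input` is hypothetical and is not part of the package; the wrapper may perform the conversion differently.

```python
import numpy as np

def as_kernel_input(x):
    """Return x as a C-contiguous float64 array (hypothetical helper, not the wrapper's code)."""
    return np.ascontiguousarray(x, dtype=np.float64)

# A plain Python list of ints becomes a float64 array.
rows = [[1, 2, 3], [4, 5, 6]]
a = as_kernel_input(rows)

# A strided float32 view is copied into a fresh contiguous float64 buffer.
view = np.arange(12, dtype=np.float32).reshape(3, 4)[:, ::2]
b = as_kernel_input(view)

# An array that already has the right dtype and layout needs no conversion.
already_ok = np.zeros((2, 2))
c = as_kernel_input(already_ok)

print(a.dtype, a.flags["C_CONTIGUOUS"])
print(view.flags["C_CONTIGUOUS"], b.flags["C_CONTIGUOUS"])
```

If the cast behaves like this, passing float32 or strided inputs costs one extra copy per call, so converting large arrays once up front may be worthwhile.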
Configuration
| Function | Purpose |
|---|---|
| `available() -> bool` | `True` iff the compiled extension loaded |
| `backend() -> str` | `'cython12'` or `'python-fallback'` |
| `get_num_threads() -> int` | Active OpenMP worker count |
| `set_num_threads(n: int)` | Set OpenMP worker count (`n >= 1`) |
Distance / similarity
| Function | Inputs | Output |
|---|---|---|
| `batch_max_similarity(X_query, X_members, lam)` | `(Q, D)`, `(M, D)`, `lam > 0` | `(Q,)` - per-query similarity score in [0, 1] against the closest member |
| `concept_support_matrix(X_query, member_mats, lam)` | `(Q, D)`, list of `(M_k, D)`, `lam > 0` | `(Q, K)` - one similarity column per member set |
| `pairwise_distances(X)` | `(N, D)` | `(N, N)` - symmetric distance matrix between rows |
| `nn_distances(X)` | `(N, D)` | `(N,)` - min positive distance per row; `inf` if none |
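The "min positive distance per row; inf if none" rule can be illustrated on any precomputed distance matrix. The sketch below is a NumPy reference, not the library's code: the name `nn_from_distance_matrix` is hypothetical, and it assumes "positive" means strictly greater than zero, so self-distances and exact duplicates are skipped.

```python
import numpy as np

def nn_from_distance_matrix(dist):
    """Per-row minimum of the strictly positive entries; inf where a row has none."""
    dist = np.asarray(dist, dtype=np.float64)
    # Replace zero (and any non-positive) entries with inf so they never win the min.
    masked = np.where(dist > 0.0, dist, np.inf)
    return masked.min(axis=1)

dist = np.array([
    [0.0, 2.0, 5.0],
    [2.0, 0.0, 3.0],
    [5.0, 3.0, 0.0],
])
print(nn_from_distance_matrix(dist))      # [2. 2. 3.]
print(nn_from_distance_matrix([[0.0]]))   # [inf]
```

The single-row case shows where `inf` comes from: a row with no other point at positive distance has nothing to take a minimum over.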
Best-tradeoff filtering
| Function | Inputs | Output |
|---|---|---|
| `pareto_core_mask(S)` | `(N, k)` | `(N,)` byte mask: rows that survive the multi-objective best-tradeoff filter |
| `one_sided_mask(S)` | `(N, k)` | `(N, k)` byte mask: rows contributing meaningfully on a single column only |
| `non_redundant_witnesses(S)` | `(N, k)` | int32 array of row indices contributing meaningfully across multiple columns |
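For readers unfamiliar with best-tradeoff filtering, the sketch below shows a textbook non-dominated (Pareto) mask in NumPy. It is only an illustration of the general idea: the name `pareto_mask_reference` is hypothetical, it assumes larger scores are better, and the library's `pareto_core_mask` may use a different orientation, tolerance or additional conditions.

```python
import numpy as np

def pareto_mask_reference(scores):
    """Mark rows not dominated by any other row (larger is assumed better)."""
    s = np.asarray(scores, dtype=np.float64)
    n = s.shape[0]
    keep = np.ones(n, dtype=np.uint8)
    for i in range(n):
        # Row j dominates row i if it is >= on every column and > on at least one.
        ge_all = np.all(s >= s[i], axis=1)
        gt_any = np.any(s > s[i], axis=1)
        if np.any(ge_all & gt_any):
            keep[i] = 0
    return keep

S = np.array([
    [0.9, 0.1],   # best on column 0
    [0.1, 0.9],   # best on column 1
    [0.5, 0.5],   # a balanced trade-off
    [0.4, 0.4],   # dominated by the row above
])
print(pareto_mask_reference(S))   # [1 1 1 0]
```

This reference is quadratic in the number of rows, which is acceptable for checking small cases but not a substitute for the compiled routine.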
Vector reduction
| Function | Inputs | Output |
|---|---|---|
| `extend_frontier_kernel(cur_centers, cur_radii, new_emb, cur_arity)` | `(F, D)`, `(F,)`, `(A, D)`, int | (`flat_centers` `(F*A, D)`, `flat_radii` `(F*A,)`) |
See the wrapper docstrings for exact semantics of each function.
Performance notes
Threads are configured globally per process; calling
set_num_threads(n) updates the OpenMP team size for all subsequent
calls. The default uses approximately 50% of the host's logical
cores so other processes are not starved on shared machines.
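The default described above can be approximated in application code, for example to log the thread count a job is expected to use. The sketch below is an assumption about how such a rule could look, not the library's resolution logic: the function name, the precedence of `SEM_NUM_THREADS` over the half-of-cores default, and the rounding are all hypothetical.

```python
import os

def resolve_thread_count(env=None, logical_cores=None):
    """Pick a worker count: explicit env override, else about half the logical cores."""
    env = os.environ if env is None else env
    cores = logical_cores if logical_cores is not None else (os.cpu_count() or 1)
    raw = env.get("SEM_NUM_THREADS", "").strip()
    if raw:
        n = int(raw)  # raises ValueError on non-numeric input
        if n < 1:
            raise ValueError("SEM_NUM_THREADS must be >= 1")
        return n
    # Never go below one worker, even on a single-core host.
    return max(1, cores // 2)

print(resolve_thread_count(env={}, logical_cores=16))                        # 8
print(resolve_thread_count(env={"SEM_NUM_THREADS": "4"}, logical_cores=16))  # 4
print(resolve_thread_count(env={}, logical_cores=1))                         # 1
```

The resulting value could then be passed to `set_num_threads()`, which makes the choice explicit instead of relying on the built-in default.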
For workloads dominated by pairwise_distances and
pareto_core_mask, near-linear scaling up to ~8 threads is typical
on commodity x86 hardware. batch_max_similarity is BLAS-friendly
and benefits most from larger M (reference set) at fixed D.
Memory / threading model
- All arrays are processed in shared memory; no inter-process serialisation.
- Each routine releases the GIL during its inner loops, so calling it concurrently from Python threads is safe.
- The compiled extension links against the system OpenMP runtime (`libgomp`); avoid mixing with conda's `intel-openmp` in the same process if possible.
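Because the routines release the GIL, a batch can be split into chunks and dispatched from a Python thread pool. The sketch below shows the pattern with a stand-in function so that it runs without the extension; `run_in_chunks` and `stand_in_kernel` are hypothetical names, and in real use the callable would be one of the wrapper functions.

```python
from concurrent.futures import ThreadPoolExecutor

def run_in_chunks(fn, rows, chunk_size, max_workers=2):
    """Apply fn to consecutive slices of rows from a thread pool, preserving order."""
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = list(pool.map(fn, chunks))  # map() yields results in input order
    return [value for part in parts for value in part]

def stand_in_kernel(chunk):
    # Placeholder for a GIL-releasing routine; returns one value per input row.
    return [sum(row) for row in chunk]

rows = [[float(i), float(i + 1)] for i in range(10)]
result = run_in_chunks(stand_in_kernel, rows, chunk_size=3)
print(result[:3])
```

Each call already runs on its own OpenMP team, so stacking Python threads on top may oversubscribe the machine; lowering the count with `set_num_threads()` before doing so is probably sensible.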
Diagnostics
`backend()` returns `'python-fallback'` only when the `.so` failed
to import (wrong architecture, glibc too old, missing libgomp). In
that state there is no pure-Python implementation to fall back on:
every numerical function raises `RuntimeError`. Check `available()`
before each batch so the failure is reported up front rather than
partway through a run.
Licence
Proprietary. Internal use only.
Support
Open an issue at https://git.sevana.biz/vvs/sem_cython12.