# sem_cython12

OpenMP-parallel numerical kernel library for Python. Pre-built Linux shared object included; no compilation required at install time.

## Contents

- `sem_cython12/sem_core12.cpython-312-x86_64-linux-gnu.so` - compiled extension (Linux, CPython 3.12, x86_64).
- `sem_cython12/wrapper.py` - Python API.
- `sem_cython12/__init__.py` - package entry.

## Requirements

- Linux x86_64.
- CPython 3.12.
- numpy >= 1.23 (see `requirements.txt`).
- A modern glibc + libgomp. Both ship with Ubuntu 20.04 LTS and later. No other system libraries needed.

The Windows / macOS binaries are not included in this distribution.

## Install

```bash
git clone https://git.sevana.biz/vvs/sem_cython12.git
cd sem_cython12
pip install -r requirements.txt

# Make the package importable, either:
pip install -e .            # if pyproject.toml/setup.py is added
# or just put the package on PYTHONPATH:
export PYTHONPATH=$PWD:$PYTHONPATH
```

## Quick start

```python
import numpy as np
from sem_cython12 import wrapper as cy

# Sanity check
assert cy.available(), "compiled extension did not load"
print("backend:", cy.backend())

# Thread count (defaults to ~50% of logical cores; set explicitly via
# either the SEM_NUM_THREADS env var or set_num_threads()):
cy.set_num_threads(8)
print("threads:", cy.get_num_threads())

# Example workload
rng = np.random.default_rng(0)
Q = rng.standard_normal((1000, 32))   # 1000 queries
M = rng.standard_normal((5000, 32))   # 5000 reference points

# For each query: max similarity to any reference, with kernel scale lam.
sim = cy.batch_max_similarity(Q, M, lam=1.0)
print(sim.shape, sim.dtype)   # (1000,) float64
```

## API reference

All functions accept either Python lists or numpy arrays; inputs are internally cast to contiguous `float64`. Outputs are numpy arrays.
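The input casting described above can be approximated in plain NumPy. This is a minimal sketch, assuming the wrapper's coercion is equivalent to `np.ascontiguousarray(..., dtype=np.float64)`; the wrapper's actual code is not shown in this README, and `as_kernel_input` is a hypothetical helper name used only for illustration.

```python
import numpy as np

def as_kernel_input(x):
    """Hypothetical helper: coerce a list or array to C-contiguous float64."""
    return np.ascontiguousarray(x, dtype=np.float64)

# A nested Python list becomes a 2-D float64 array.
from_list = as_kernel_input([[1, 2, 3], [4, 5, 6]])

# A strided float32 view is copied into a fresh contiguous float64 buffer.
strided = np.arange(12, dtype=np.float32).reshape(3, 4)[:, ::2]
from_view = as_kernel_input(strided)

# An array that is already contiguous float64 is passed through without a copy.
ready = np.zeros((2, 3))
no_copy = np.shares_memory(as_kernel_input(ready), ready)

print(from_list.dtype, from_list.flags["C_CONTIGUOUS"])
print(from_view.dtype, from_view.shape, from_view.flags["C_CONTIGUOUS"])
print("copy avoided:", no_copy)
```

If the wrapper does cast this way, preparing large inputs as C-contiguous `float64` up front would avoid a conversion copy on every call.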
### Configuration

| Function | Purpose |
|---|---|
| `available() -> bool` | True iff the compiled extension loaded |
| `backend() -> str` | `'cython12'` or `'python-fallback'` |
| `get_num_threads() -> int` | Active OpenMP worker count |
| `set_num_threads(n: int)` | Set OpenMP worker count (n >= 1) |

### Distance / similarity

| Function | Inputs | Output |
|---|---|---|
| `batch_max_similarity(X_query, X_members, lam)` | `(Q, D)`, `(M, D)`, `lam > 0` | `(Q,)` - per-query max of `exp(-d / lam)` |
| `concept_support_matrix(X_query, member_mats, lam)` | `(Q, D)`, list of `(M_k, D)`, `lam > 0` | `(Q, K)` - one column per member matrix |
| `pairwise_distances(X)` | `(N, D)` | `(N, N)` - symmetric Euclidean matrix |
| `nn_distances(X)` | `(N, D)` | `(N,)` - min positive distance per row; `inf` if none |

### Pareto / dominance

| Function | Inputs | Output |
|---|---|---|
| `pareto_core_mask(S)` | `(N, k)` | `(N,)` byte mask: `1` iff row not strictly dominated |
| `one_sided_mask(S)` | `(N, k)` | `(N, k)` byte mask: see docstring |
| `non_redundant_witnesses(S)` | `(N, k)` | int32 array of row indices |

### Vector reduction

| Function | Inputs | Output |
|---|---|---|
| `extend_frontier_kernel(cur_centers, cur_radii, new_emb, cur_arity)` | `(F, D)`, `(F,)`, `(A, D)`, `int` | `(flat_centers (F*A, D), flat_radii (F*A,))` |

See the wrapper docstrings for exact semantics of each function.

## Performance notes

Threads are configured globally per process; calling `set_num_threads(n)` updates the OpenMP team size for all subsequent calls. The default uses approximately 50% of the host's logical cores so other processes are not starved on shared machines.

For workloads dominated by `pairwise_distances` and `pareto_core_mask`, near-linear scaling up to ~8 threads is typical on commodity x86 hardware. `batch_max_similarity` is BLAS-friendly and benefits most from larger `M` (reference set) at fixed `D`.
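One way to see why a kernel like `batch_max_similarity` can be BLAS-friendly is that the squared distances between all queries and all members reduce to a single matrix product. The following is a NumPy reference sketch, not the extension's implementation. It assumes `d` is the Euclidean distance (the table above does not say which distance is used), and `batch_max_similarity_ref` is a hypothetical name.

```python
import numpy as np

def batch_max_similarity_ref(X_query, X_members, lam):
    """NumPy reference: per-query max of exp(-d / lam), with d assumed Euclidean."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    Q = np.ascontiguousarray(X_query, dtype=np.float64)
    M = np.ascontiguousarray(X_members, dtype=np.float64)
    # ||q - m||^2 = ||q||^2 + ||m||^2 - 2 q.m; the cross term is one matmul.
    sq = (Q * Q).sum(axis=1)[:, None] + (M * M).sum(axis=1)[None, :] - 2.0 * (Q @ M.T)
    np.maximum(sq, 0.0, out=sq)  # clamp tiny negatives caused by rounding
    # exp(-d / lam) decreases with d, so the max similarity sits at the min distance.
    d_min = np.sqrt(sq.min(axis=1))
    return np.exp(-d_min / lam)

rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 4))
M = rng.standard_normal((7, 4))
ref = batch_max_similarity_ref(Q, M, lam=1.0)

# Cross-check against the direct definition, which materialises a (Q, M, D) array.
direct = np.exp(-np.linalg.norm(Q[:, None, :] - M[None, :, :], axis=2)).max(axis=1)
print(ref.shape, np.allclose(ref, direct))
```

The matrix product grows with `M` at fixed `D`, which is consistent with the note that larger reference sets benefit most. A reference like this can also serve as a small-input correctness check against the compiled kernel, provided the Euclidean assumption matches the wrapper docstring.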
## Memory / threading model

- All arrays are processed in shared memory; no inter-process serialisation.
- Each routine releases the GIL during its inner loops, so calling it concurrently from Python threads is safe.
- The compiled extension links against the system OpenMP runtime (`libgomp`); avoid mixing with conda's `intel-openmp` in the same process if possible.

## Diagnostics

`backend()` returns `'python-fallback'` only when the `.so` failed to import (wrong architecture, glibc too old, missing libgomp). In that state, every numerical function raises `RuntimeError`; check `available()` before each batch so the failure surfaces up front with a clear message.

## Licence

Proprietary. Internal use only.

## Support

Open an issue at https://git.sevana.biz/vvs/sem_cython12.