vvs 3e588f8024 Sanitize wrapper docstrings + README: remove kernel formula and metric-specific exposure

The previous version exposed:
  - exp(-d/lam) as the literal similarity-kernel form
  - 'Euclidean' as the literal distance metric
  - the O1+O2 conditions of the one-sided-mask routine
in both the Python docstrings and the README API tables.

Replaced with operational descriptions: 'similarity score in [0,1]
against the closest member', 'distance matrix between rows', etc.
The library's behaviour and call signatures are unchanged.

2026-05-09 14:22:01 +01:00

sem_cython12

OpenMP-parallel numerical kernel library for Python. Pre-built Linux shared object included; no compilation required at install time.

sem_cython12/sem_core12.cpython-312-x86_64-linux-gnu.so - compiled extension (Linux, CPython 3.12, x86_64).
sem_cython12/wrapper.py - Python API.
sem_cython12/__init__.py - package entry.

Requirements

Linux x86_64.
CPython 3.12.
numpy >= 1.23 (see requirements.txt).
A reasonably recent glibc and libgomp; both ship with Ubuntu 20.04 LTS and later. No other system libraries are needed.

Windows and macOS binaries are not included in this distribution.
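The requirements above can be checked before attempting the import. The following is a minimal sketch using only the standard library; `check_platform` is a hypothetical helper, not part of the package, and it does not inspect glibc or libgomp.

```python
import platform
import sys


def check_platform(system=None, machine=None, version=None, implementation=None):
    """Return a list of human-readable problems; an empty list means a match.

    Arguments default to the values reported by the running interpreter and
    can be overridden for testing.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    version = tuple(version or sys.version_info[:2])
    implementation = implementation or platform.python_implementation()

    problems = []
    if system != "Linux":
        problems.append(f"OS is {system}, expected Linux")
    if machine != "x86_64":
        problems.append(f"architecture is {machine}, expected x86_64")
    if implementation != "CPython":
        problems.append(f"interpreter is {implementation}, expected CPython")
    if version != (3, 12):
        problems.append(f"Python is {version[0]}.{version[1]}, expected 3.12")
    return problems


for problem in check_platform():
    print("requirement not met:", problem)
```

An empty result only says the host matches the listed platform and interpreter; a missing libgomp would still surface at import time.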

Install

git clone https://git.sevana.biz/vvs/sem_cython12.git
cd sem_cython12
pip install -r requirements.txt
# Make the package importable, either:
pip install -e .                      # if pyproject.toml/setup.py is added
# or just put the package on PYTHONPATH:
export PYTHONPATH="$PWD${PYTHONPATH:+:$PYTHONPATH}"

Quick start

import numpy as np
from sem_cython12 import wrapper as cy

# Sanity check
assert cy.available(), "compiled extension did not load"
print("backend:", cy.backend())

# Thread count (defaults to ~50% of logical cores; set explicitly via
# either the SEM_NUM_THREADS env var or set_num_threads()):
cy.set_num_threads(8)
print("threads:", cy.get_num_threads())

# Example workload
rng = np.random.default_rng(0)
Q = rng.standard_normal((1000, 32))         # 1000 queries
M = rng.standard_normal((5000, 32))         # 5000 reference points

# For each query: max similarity to any reference, with kernel scale lam.
sim = cy.batch_max_similarity(Q, M, lam=1.0)
print(sim.shape, sim.dtype)                 # (1000,) float64

API reference

All numerical functions accept either Python lists or numpy arrays; inputs are internally cast to contiguous float64. Array outputs are numpy arrays.

Configuration

Function	Purpose
`available() -> bool`	True iff the compiled extension loaded
`backend() -> str`	`'cython12'` or `'python-fallback'`
`get_num_threads() -> int`	Active OpenMP worker count
`set_num_threads(n: int)`	Set OpenMP worker count (n >= 1)

Distance / similarity

Function	Inputs	Output
`batch_max_similarity(X_query, X_members, lam)`	`(Q, D)`, `(M, D)`, `lam > 0`	`(Q,)` - per-query similarity score in `[0, 1]` against the closest member
`concept_support_matrix(X_query, member_mats, lam)`	`(Q, D)`, list of `(M_k, D)`, `lam > 0`	`(Q, K)` - one similarity column per member set
`pairwise_distances(X)`	`(N, D)`	`(N, N)` - symmetric distance matrix between rows
`nn_distances(X)`	`(N, D)`	`(N,)` - min positive distance per row; `inf` if none
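To make the `nn_distances` output convention concrete ("min positive distance per row; `inf` if none"), here is a pure-Python sketch that applies the same rule to an already computed distance matrix. `min_positive_per_row` is a hypothetical helper for illustration; how the library handles near-zero values or tolerances is not stated in this README and is not modelled here.

```python
import math


def min_positive_per_row(dist):
    """For each row of a distance matrix, return the smallest entry > 0.

    Rows with no strictly positive entry (a single point, or only exact
    duplicates) yield math.inf.
    """
    out = []
    for row in dist:
        positives = [d for d in row if d > 0.0]
        out.append(min(positives) if positives else math.inf)
    return out


# Hypothetical 3x3 distance matrix: rows 0 and 1 are exact duplicates.
dist = [
    [0.0, 0.0, 2.5],
    [0.0, 0.0, 2.5],
    [2.5, 2.5, 0.0],
]
print(min_positive_per_row(dist))      # [2.5, 2.5, 2.5]
print(min_positive_per_row([[0.0]]))   # [inf]
```

The zero distance between the duplicate rows is skipped, so each duplicate reports its distance to the nearest distinct row.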

Best-tradeoff filtering

Function	Inputs	Output
`pareto_core_mask(S)`	`(N, k)`	`(N,)` byte mask: rows that survive the multi-objective best-tradeoff filter
`one_sided_mask(S)`	`(N, k)`	`(N, k)` byte mask: rows contributing meaningfully on a single column only
`non_redundant_witnesses(S)`	`(N, k)`	int32 array of row indices contributing meaningfully across multiple columns
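For readers unfamiliar with best-tradeoff filtering, the sketch below shows the textbook Pareto non-dominated test on a small score matrix. It is an assumption-laden illustration, not the library's routine: it assumes larger scores are better, keeps exact ties, and ignores whatever extra "core" criteria `pareto_core_mask` may apply. `non_dominated_mask` is a hypothetical name.

```python
def non_dominated_mask(scores):
    """Return a list of 0/1 flags, 1 where the row is not dominated.

    Row b dominates row a when b >= a in every column and b > a in at least
    one column (assumption: larger is better).
    """
    def dominates(b, a):
        pairs = list(zip(b, a))
        return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)

    return [
        0 if any(dominates(other, row) for other in scores) else 1
        for row in scores
    ]


# Hypothetical (N=4, k=2) score matrix.
S = [
    [0.9, 0.1],   # best on column 0
    [0.1, 0.9],   # best on column 1
    [0.5, 0.5],   # a trade-off no other row beats on both columns
    [0.4, 0.4],   # dominated by [0.5, 0.5]
]
print(non_dominated_mask(S))   # [1, 1, 1, 0]
```

This quadratic loop is only meant to show the shape of the `(N, k)` input and `(N,)` mask; consult the wrapper docstrings for the exact conditions the compiled routines apply.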

Vector reduction

Function	Inputs	Output
`extend_frontier_kernel(cur_centers, cur_radii, new_emb, cur_arity)`	`(F, D)`, `(F,)`, `(A, D)`, `int`	`(flat_centers (FA, D), flat_radii (FA,))`

See the wrapper docstrings for exact semantics of each function.

Performance notes

Threads are configured globally per process; calling set_num_threads(n) updates the OpenMP team size for all subsequent calls. The default uses approximately 50% of the host's logical cores so other processes are not starved on shared machines.
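The thread policy described above can be made explicit in application code. The sketch below mirrors it (use `SEM_NUM_THREADS` when set, otherwise about half the logical cores); `resolve_thread_count` is a hypothetical helper, and how the library itself parses the variable or rounds the default is an assumption.

```python
import os


def resolve_thread_count(env=None, logical_cores=None):
    """Pick a worker count: SEM_NUM_THREADS if valid, else about half the cores."""
    env = os.environ if env is None else env
    cores = logical_cores if logical_cores is not None else (os.cpu_count() or 1)

    raw = env.get("SEM_NUM_THREADS", "").strip()
    if raw:
        try:
            n = int(raw)
        except ValueError:
            n = 0                     # unparsable value: ignore it
        if n >= 1:
            return n
    return max(1, cores // 2)         # never go below one worker


print(resolve_thread_count({"SEM_NUM_THREADS": "8"}, logical_cores=16))   # 8
print(resolve_thread_count({}, logical_cores=16))                          # 8
print(resolve_thread_count({}, logical_cores=1))                           # 1
```

With the package imported as in the quick start, the result could be passed to `cy.set_num_threads(...)` once at start-up so the team size is visible in the application's own configuration.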

For workloads dominated by pairwise_distances and pareto_core_mask, near-linear scaling up to ~8 threads is typical on commodity x86 hardware. batch_max_similarity is BLAS-friendly and benefits most from larger M (reference set) at fixed D.

Memory / threading model

All arrays are processed in shared memory; no inter-process serialisation.
Each routine releases the GIL during its inner loops, so calling it concurrently from Python threads is safe.
The compiled extension links against the system OpenMP runtime (libgomp); avoid mixing with conda's intel-openmp in the same process if possible.
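Because the routines release the GIL, independent batches can be dispatched from a Python thread pool. The sketch below shows one way to do that with the standard library; `map_in_chunks` is a hypothetical helper, and `row_sums` is a stand-in for a real call such as `cy.batch_max_similarity` so the example runs without the package.

```python
from concurrent.futures import ThreadPoolExecutor


def map_in_chunks(fn, rows, chunk_size, max_workers=2):
    """Apply fn to consecutive chunks of rows on a thread pool; keep input order."""
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fn, chunks))   # map() preserves chunk order
    # Flatten the per-chunk results back into one list.
    return [value for part in results for value in part]


# Stand-in for a GIL-releasing routine (assumption: one output per input row).
def row_sums(chunk):
    return [sum(row) for row in chunk]


rows = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
print(map_in_chunks(row_sums, rows, chunk_size=2))   # [3, 7, 11, 15, 19]
```

Each call already runs on its own OpenMP team, so combining several Python threads with a large `set_num_threads` value may oversubscribe the cores; keeping the product of the two near the core count is a reasonable starting point.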

Diagnostics

backend() returns 'python-fallback' only when the .so failed to import (wrong architecture, glibc too old, or missing libgomp). In that state every numerical function raises RuntimeError; there is no slower pure-Python implementation to fall back on. Check available() before each batch so the failure surfaces early with a clear message.
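A small guard keeps that check in one place. This is a sketch only: `require_backend` is a hypothetical helper built on the documented `available()` and `backend()` calls, and a stub object stands in for the real wrapper module so the example runs anywhere.

```python
from types import SimpleNamespace


def require_backend(mod):
    """Raise a descriptive RuntimeError unless mod reports a loaded extension."""
    if not mod.available():
        raise RuntimeError(
            f"sem_cython12 backend is {mod.backend()!r}; the compiled extension "
            "did not load (check architecture, glibc version and libgomp)"
        )
    return mod


# Stub standing in for `from sem_cython12 import wrapper as cy`.
broken = SimpleNamespace(available=lambda: False, backend=lambda: "python-fallback")
try:
    require_backend(broken)
except RuntimeError as exc:
    print(exc)
```

In real code the call would be `require_backend(cy)` right after the import, and again before long-running batches if desired.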

Licence

Proprietary. Internal use only.

Support

Open an issue at https://git.sevana.biz/vvs/sem_cython12.
