- add 'what is this' section to README.md

2026-05-09 20:46:56 +03:00
parent c886ded981
commit 80f99d1d15
+38
@@ -4,6 +4,44 @@ OpenMP-parallel numerical kernel library for Python. Pre-built
Linux and Windows binaries included; no compilation required at
install time.
## What is this for?
`sem_cython12` is a small, focused toolbox of fast C-level routines
exposed through a thin numpy wrapper. It is not a general-purpose
numerical library; it accelerates three specific jobs that are
awkward or slow to do in pure numpy once `N` reaches the thousands:
1. **Similarity / distance over batches of vectors.** Full
pairwise distance matrices, nearest-neighbour distances, and
kernel-based `[0, 1]` similarity scores of a query set against
one or many reference sets. Useful for nearest-neighbour
search, kernel-density-style scoring, and "how close is each
query to this concept?" lookups.
2. **Multi-objective ("best-tradeoff") filtering of score matrices.**
Given a matrix of `N` candidates × `k` criteria, select the
rows on the Pareto frontier, isolate rows that only spike on a
single criterion, and recover the rows that contribute
meaningfully across several criteria - candidates a naive
sum-of-scores ranker would miss.
3. **An incremental aggregation primitive** for streaming
clustering / frontier-expansion algorithms: a fused bulk update
that, given `F` running summaries (centre + radius) and `A`
new contributions, produces all `F·A` updated summaries in one
parallel pass.
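
For checking results on small inputs, the first job can be written in plain numpy. The sketch below is an illustrative reference, not the `sem_cython12` API. The function names are hypothetical. The README also does not say which kernel turns distances into `[0, 1]` scores, so a Gaussian (RBF) kernel on the nearest-neighbour distance with bandwidth `sigma` is assumed. The `||q||^2 + ||r||^2 - 2 q.r` expansion computes all `n × m` distances with one matrix product instead of a Python loop.

```python
import numpy as np


def pairwise_distances(Q, R):
    """Euclidean distance between every row of Q (n, d) and every row of R (m, d)."""
    Q = np.ascontiguousarray(Q, dtype=np.float64)
    R = np.ascontiguousarray(R, dtype=np.float64)
    # ||q - r||^2 = ||q||^2 + ||r||^2 - 2 q.r, for all (n, m) pairs at once
    sq = (Q * Q).sum(axis=1)[:, None] + (R * R).sum(axis=1)[None, :] - 2.0 * (Q @ R.T)
    # Clip small negatives caused by floating-point cancellation before sqrt
    return np.sqrt(np.maximum(sq, 0.0))


def nearest_neighbour_distance(Q, R):
    """Distance from each query row to its closest reference row, shape (n,)."""
    return pairwise_distances(Q, R).min(axis=1)


def kernel_similarity(Q, R, sigma=1.0):
    """[0, 1] score per query: 1.0 for an exact match, decaying with distance.

    Assumed kernel: Gaussian on the nearest-neighbour distance,
    exp(-d^2 / (2 * sigma^2)).
    """
    d = nearest_neighbour_distance(Q, R)
    return np.exp(-(d ** 2) / (2.0 * sigma ** 2))


def similarity_to_sets(Q, reference_sets, sigma=1.0):
    """One column of scores per reference set, shape (n, len(reference_sets))."""
    return np.stack([kernel_similarity(Q, R, sigma) for R in reference_sets], axis=1)


R = np.array([[0.0, 0.0], [3.0, 4.0]])  # reference set ("concept")
Q = np.array([[0.0, 0.0], [3.0, 0.0]])  # queries
D = pairwise_distances(Q, R)            # [[0, 5], [3, 4]]
s = kernel_similarity(Q, R, sigma=1.0)  # [1.0, exp(-4.5)]
```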
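
The second job can be sketched in plain numpy as a slow reference for small matrices. The sketch assumes higher scores are better on every criterion. It uses the standard dominance test for the Pareto frontier. The names `pareto_mask` and `spike_and_broad` are hypothetical, as is the threshold `tau` that decides when a score counts as meaningful, because the README does not define those cut-offs. The pairwise comparison allocates an `N × N × k` boolean array, so it is only practical for small `N`.

```python
import numpy as np


def pareto_mask(S):
    """True for rows of S (N x k, higher is better) that no other row dominates.

    Row j dominates row i if S[j] >= S[i] on every criterion and
    S[j] > S[i] on at least one.
    """
    S = np.asarray(S, dtype=np.float64)
    # ge[j, i]: row j is at least as good as row i on all k criteria
    ge = (S[:, None, :] >= S[None, :, :]).all(axis=2)
    # gt[j, i]: row j is strictly better than row i on some criterion
    gt = (S[:, None, :] > S[None, :, :]).any(axis=2)
    dominated = (ge & gt).any(axis=0)  # some row j dominates row i
    return ~dominated


def spike_and_broad(S, tau=0.5):
    """Hypothetical split by how many criteria a row scores >= tau on."""
    strong = (np.asarray(S, dtype=np.float64) >= tau).sum(axis=1)
    return strong == 1, strong >= 2  # (single-criterion spikes, broad contributors)


S = np.array([
    [0.9, 0.1, 0.1],   # spikes on criterion 0
    [0.6, 0.6, 0.0],   # strong on two criteria
    [0.5, 0.5, 0.0],   # dominated by the row above
    [0.1, 0.1, 0.95],  # spikes on criterion 2
])
front = pareto_mask(S)              # [True, True, False, True]
spikes, broad = spike_and_broad(S)  # spikes: rows 0 and 3; broad: rows 1 and 2
```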
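
The README does not state the update rule behind the third job. As a stand-in, the sketch below treats each summary as a ball (centre, radius) and each contribution as a point. For every one of the `F·A` pairs, it returns the smallest ball enclosing both. That rule and the name `bulk_update` are assumptions for illustration. The sketch is meant to show the shape of a fused bulk update: `F` summaries and `A` contributions go in, and `(F, A, d)` centres and `(F, A)` radii come out, computed with broadcasting in one pass.

```python
import numpy as np


def bulk_update(centres, radii, points):
    """Update F ball summaries against A new points in one vectorised pass.

    centres: (F, d), radii: (F,), points: (A, d).
    Returns centres (F, A, d) and radii (F, A): entry [f, a] is the smallest
    ball enclosing both ball f and point a (assumed update rule).
    """
    C = np.asarray(centres, dtype=np.float64)
    r = np.asarray(radii, dtype=np.float64)[:, None]  # (F, 1)
    P = np.asarray(points, dtype=np.float64)
    diff = P[None, :, :] - C[:, None, :]               # (F, A, d)
    dist = np.linalg.norm(diff, axis=2)                # (F, A)
    grow = dist > r                                    # point lies outside the ball
    new_r = np.where(grow, 0.5 * (r + dist), r)        # (F, A)
    # Move the centre towards the point by (new_r - r); skip the division
    # where the ball does not grow, which also avoids dividing by zero.
    step = np.zeros_like(dist)
    np.divide(new_r - r, dist, out=step, where=grow)
    new_c = C[:, None, :] + step[:, :, None] * diff
    return new_c, new_r


c, rad = bulk_update([[0.0, 0.0]], [1.0], [[0.5, 0.0], [3.0, 0.0]])
# rad -> [[1.0, 2.0]]; c[0, 0] -> [0.0, 0.0]; c[0, 1] -> [1.0, 0.0]
```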

The kernels release the GIL, scale near-linearly to ~8 OpenMP
threads on commodity x86, and operate on shared-memory numpy
arrays with no inter-process serialisation. The Python wrapper
casts inputs to contiguous float64 and fails loudly when the
compiled extension cannot load on the host: `available()` /
`backend()` let callers check the load status, and calls into
the kernels raise `RuntimeError`. There is no slow pure-Python
fallback path.
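
Because there is no fallback, a caller may prefer to check the backend once at start-up rather than hit a `RuntimeError` mid-pipeline. The following is a minimal sketch. It assumes `available()` and `backend()` are exposed at the top level of the `sem_cython12` package; the README names the functions but not where they live. `load_compiled_backend` is a hypothetical helper.

```python
import importlib


def load_compiled_backend(module_name="sem_cython12"):
    """Return the module if its compiled extension is usable here, else None.

    Assumption: the package exposes available() at top level.
    """
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        # Package not installed, or the import itself failed.
        return None
    available = getattr(mod, "available", None)
    if available is None or not available():
        # Extension did not load: kernel calls would raise RuntimeError.
        return None
    return mod


sem = load_compiled_backend()
if sem is None:
    print("sem_cython12 compiled extension unavailable; no pure-Python fallback")
else:
    print("compiled backend:", sem.backend())  # backend() assumed top-level too
```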

The [`demos/`](./demos/) directory contains three runnable
end-to-end examples (Iris boundary discovery, parameter-free
anomaly detection, multi-criteria candidate selection) that
exercise these three jobs against well-known baselines.
## Contents
- `sem_cython12/sem_core12.cpython-312-x86_64-linux-gnu.so` -