Compare commits
2 Commits
| Author | SHA1 | Date |
|---|---|---|
| | 1f9cbe4a48 | |
| | a98c55cea7 | |
@@ -6,41 +6,14 @@ install time.
 
 ## What is this for?
 
-`sem_cython12` is a small, focused toolbox of fast C-level routines
-exposed through a thin numpy wrapper. It is not a general-purpose
-numerical library; it accelerates three specific jobs that are
-awkward or slow to do in pure numpy once `N` reaches the thousands:
+For an introduction to SEM (Similarity Energy Model) and how
+`sem_cython12` fits in, see:
 
-1. **Similarity / distance over batches of vectors.** Full
-pairwise distance matrices, nearest-neighbour distances, and
-kernel-based `[0, 1]` similarity scores of a query set against
-one or many reference sets. Useful for nearest-neighbour
-search, kernel-density-style scoring, and "how close is each
-query to this concept?" lookups.
-2. **Multi-objective ("best-tradeoff") filtering of score matrices.**
-Given a matrix of `N` candidates × `k` criteria, select the
-rows on the Pareto frontier, isolate rows that only spike on a
-single criterion, and recover the rows that contribute
-meaningfully across several criteria - candidates a naive
-sum-of-scores ranker would miss.
-3. **An incremental aggregation primitive** for streaming
-clustering / frontier-expansion algorithms: a fused bulk update
-that, given `F` running summaries (centre + radius) and `A`
-new contributions, produces all `F·A` updated summaries in one
-parallel pass.
-
-The kernels release the GIL, scale near-linearly to ~8 OpenMP
-threads on commodity x86, and operate on shared-memory numpy
-arrays with no inter-process serialisation. The Python wrapper
-handles contiguous-float64 casting and degrades loudly (via
-`available()` / `backend()` plus `RuntimeError`) when the compiled
-extension cannot load on the host - there is no slow pure-Python
-fallback path.
-
-The [`demos/`](./demos/) directory contains three runnable
-end-to-end examples (Iris boundary discovery, parameter-free
-anomaly detection, multi-criteria candidate selection) that
-exercise these three jobs against well-known baselines.
+- [`docs/SEM_Overview.md`](./docs/SEM_Overview.md) — non-internal
+introduction to SEM, what it does, and how this library fits in.
+- [`docs/SEM_Mathematical_Apparatus.md`](./docs/SEM_Mathematical_Apparatus.md)
+— capabilities-level description of the operators and engines
+exposed by the library.
 
 ## Contents
 
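Review note on the hunk above: the removed overview names pairwise distances, nearest-neighbour distances, and kernel-based `[0, 1]` similarity scores as the first job. For readers who lose that description with this change, here is a minimal pure-NumPy sketch of what such a job computes. It is not the `sem_cython12` API: the function names, the Gaussian kernel, and the `bandwidth` parameter are all assumptions chosen for illustration, and the diff does not show which kernel the library uses.

```python
import numpy as np


def pairwise_sq_dists(queries, refs):
    """Squared Euclidean distances, shape (n_queries, n_refs)."""
    # Cast to contiguous float64, mirroring what the README says the wrapper does.
    q = np.ascontiguousarray(queries, dtype=np.float64)
    r = np.ascontiguousarray(refs, dtype=np.float64)
    # ||q - r||^2 = ||q||^2 + ||r||^2 - 2 q.r
    d2 = (q * q).sum(axis=1)[:, None] + (r * r).sum(axis=1)[None, :] - 2.0 * (q @ r.T)
    # Clip tiny negative values caused by floating-point rounding.
    return np.maximum(d2, 0.0)


def nearest_neighbour_dist(queries, refs):
    """Distance from each query to its closest reference vector."""
    return np.sqrt(pairwise_sq_dists(queries, refs).min(axis=1))


def kernel_similarity(queries, refs, bandwidth=1.0):
    """Hypothetical Gaussian-kernel score in [0, 1]; 1.0 means zero distance."""
    d2 = pairwise_sq_dists(queries, refs).min(axis=1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


refs = np.array([[0.0, 0.0], [3.0, 4.0]])
queries = np.array([[0.0, 0.0], [3.0, 0.0]])

nn = nearest_neighbour_dist(queries, refs)   # distances 0 and 3
sim = kernel_similarity(queries, refs)       # 1.0 for the exact match, smaller otherwise
```

The full `(n_queries, n_refs)` matrix is materialised here, which is exactly the memory and speed cost that the removed text says becomes awkward in pure numpy once `N` reaches the thousands.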
@@ -157,15 +130,6 @@ internally cast to contiguous `float64`. Outputs are numpy arrays.
 
 See the wrapper docstrings for exact semantics of each function.
 
-## Documentation
-
-- [`docs/SEM_Overview.md`](./docs/SEM_Overview.md) — non-internal
-introduction to SEM (Similarity Energy Model), what it does, and
-how the `sem_cython12` library fits in.
-- [`docs/SEM_Mathematical_Apparatus.md`](./docs/SEM_Mathematical_Apparatus.md)
-— capabilities-level description of the operators and engines
-exposed by the library.
-
 ## Demos
 
 Three runnable demos live in [`demos/`](./demos/):
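Review note: the overview text removed in the first hunk describes selecting the rows on the Pareto frontier of an `N` candidates × `k` criteria score matrix. A minimal NumPy sketch of that selection follows. It assumes higher scores are better on every criterion, and `pareto_mask` is a hypothetical helper name, not a function confirmed to exist in `sem_cython12`.

```python
import numpy as np


def pareto_mask(scores):
    """Boolean mask of rows that no other row dominates (higher is better)."""
    s = np.ascontiguousarray(scores, dtype=np.float64)
    keep = np.ones(s.shape[0], dtype=bool)
    for i in range(s.shape[0]):
        # A row dominates row i if it is >= on every criterion
        # and strictly > on at least one.
        dominates_i = np.all(s >= s[i], axis=1) & np.any(s > s[i], axis=1)
        if dominates_i.any():
            keep[i] = False
    return keep


scores = np.array([
    [0.9, 0.1],   # best on criterion 0
    [0.6, 0.6],   # balanced trade-off
    [0.1, 0.9],   # best on criterion 1
    [0.5, 0.5],   # dominated by the balanced row
])

mask = pareto_mask(scores)
frontier = scores[mask]
```

This loop costs O(N²·k), which is fine for a sketch but is the kind of work the removed text says the compiled kernels are meant to accelerate.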