sem_cython12 - sample projects
Three short, runnable Python projects that demonstrate the sem_cython12
library on small but realistic problems. Each demo is a single file,
self-contained, and produces a clear printable result.
The demos use only sem_cython12.wrapper, numpy, and (for the
Iris and anomaly demos) scikit-learn.
What each demo shows
| File | Domain | "Wow" |
|---|---|---|
| 01_iris_boundary.py | The 1936 Iris dataset | Rediscovers the famous versicolor/virginica boundary specimens without training a classifier and without setting any threshold. |
| 02_anomaly_detection.py | Synthetic 5-D anomalies | Detects 10/10 injected anomalies with a single function call and matches/beats sklearn's IsolationForest on ROC AUC. |
| 03_multicriteria_selection.py | Multi-criteria candidate ranking | Identifies the hidden all-rounders that naive sum-of-scores ranking misses entirely. |
Install
# Get the library (private repo)
git clone https://git.sevana.biz/vvs/sem_cython12.git ../sem_cython12
export PYTHONPATH="$(pwd)/../sem_cython12:$PYTHONPATH"
# Demo dependencies
pip install -r requirements.txt
The pre-built Linux x86_64 / CPython 3.12 binary ships with the library; no compilation step is required.
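Because the shipped binary targets one specific platform and interpreter, a quick pre-flight check can save a confusing import error. The following is a minimal sketch using only the Python standard library; the function name `check_environment` is hypothetical and not part of sem_cython12, and it only checks that the top-level package can be found, not that the compiled extension actually loads.

```python
import importlib.util
import platform
import sys


def check_environment():
    """Return a list of human-readable problems; an empty list means nothing obvious is wrong."""
    problems = []

    # The pre-built binary is stated to target CPython 3.12.
    if sys.implementation.name != "cpython" or sys.version_info[:2] != (3, 12):
        problems.append(
            f"expected CPython 3.12, found {sys.implementation.name} "
            f"{sys.version_info.major}.{sys.version_info.minor}"
        )

    # ... and Linux on x86_64.
    if platform.system() != "Linux" or platform.machine() != "x86_64":
        problems.append(
            f"expected Linux x86_64, found {platform.system()} {platform.machine()}"
        )

    # find_spec returns None when the top-level package is not on sys.path.
    if importlib.util.find_spec("sem_cython12") is None:
        problems.append("sem_cython12 is not importable; check PYTHONPATH")

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Running it before the demos prints one warning per mismatch and nothing at all when the environment matches.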
Run
python 01_iris_boundary.py
python 02_anomaly_detection.py
python 03_multicriteria_selection.py
Each demo finishes in well under a second on a laptop.
What you'll see
01_iris_boundary.py
Auto-derived kernel scale lam = 3.4762
Top 10 most ambiguous specimens (highest cross-species score):
rank idx species sim->setosa sim->versic sim->virgin cross
1 138 virginica 0.2330 0.9096 1.0000 0.9096
2 70 versicolor 0.2396 1.0000 0.9096 0.9096
3 127 virginica 0.2222 0.8806 1.0000 0.8806
4 83 versicolor 0.2084 1.0000 0.8689 0.8689
5 133 virginica 0.2062 0.8689 1.0000 0.8689
...
Top 10 distribution by species:
setosa : 0 of 10
versicolor : 3 of 10
virginica : 7 of 10
*** Confirmed: zero setosa specimens; the top-10 boundary cases ***
*** all come from the famous versicolor/virginica overlap zone. ***
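Judging from the printed rows alone, the `cross` column appears to be the largest similarity to any species other than the specimen's own. That reading is inferred from the output above, not taken from the library's documentation. The short sketch below recomputes `cross` from the five printed rows under that assumption and checks it against the printed value.

```python
SPECIES = ["setosa", "versicolor", "virginica"]

# (idx, species, [sim->setosa, sim->versic, sim->virgin], printed cross)
# Values copied from the sample output above.
ROWS = [
    (138, "virginica",  [0.2330, 0.9096, 1.0000], 0.9096),
    (70,  "versicolor", [0.2396, 1.0000, 0.9096], 0.9096),
    (127, "virginica",  [0.2222, 0.8806, 1.0000], 0.8806),
    (83,  "versicolor", [0.2084, 1.0000, 0.8689], 0.8689),
    (133, "virginica",  [0.2062, 0.8689, 1.0000], 0.8689),
]


def cross_species_score(species, sims):
    """Largest similarity to a species other than the specimen's own (assumed definition)."""
    own = SPECIES.index(species)
    return max(s for j, s in enumerate(sims) if j != own)


for idx, species, sims, printed in ROWS:
    recomputed = cross_species_score(species, sims)
    status = "ok" if abs(recomputed - printed) < 1e-9 else "MISMATCH"
    print(f"{idx:4d} {species:<10s} cross={recomputed:.4f} ({status})")
```

All five rows agree with the printed column, which is consistent with the assumed definition but does not prove it for the rest of the dataset.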
02_anomaly_detection.py
SEM (sem_cython12 - one batch_max_similarity call)
Top-10 retrieved as anomalous: precision = 10/10
ROC AUC = 1.0000
Baseline: sklearn IsolationForest (default settings)
Top-10 retrieved as anomalous: precision = 10/10
ROC AUC = 1.0000
SEM matches IsolationForest within noise (+0.0000 AUC),
with one function call and zero tuning.
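For readers unfamiliar with the two figures reported for each detector, the sketch below shows one common way to compute them from a vector of anomaly scores (higher meaning more anomalous) and ground-truth labels. It uses numpy only; the function names and the toy scores are hypothetical illustrations, not the demo's code or data.

```python
import numpy as np


def precision_at_k(scores, labels, k):
    """Count true anomalies among the k highest-scoring points."""
    top = np.argsort(-scores, kind="stable")[:k]
    return int(labels[top].sum()), k


def roc_auc(scores, labels):
    """Probability that a random anomaly outscores a random normal point (ties count half)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]  # every anomaly/normal pair
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


# Toy example: two true anomalies (label 1) among five points.
scores = np.array([0.9, 0.8, 0.3, 0.2, 0.1])
labels = np.array([1, 0, 1, 0, 0])

hits, k = precision_at_k(scores, labels, 2)
print(f"precision = {hits}/{k}")                    # precision = 1/2
print(f"ROC AUC = {roc_auc(scores, labels):.4f}")   # ROC AUC = 0.8333
```

The pairwise form of ROC AUC is quadratic in the number of points, which is fine for a demo-sized dataset; for larger data, `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently.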
03_multicriteria_selection.py
Best-tradeoff frontier size : 35
Cross-criterion winners (NRW) : 31
Hidden all-rounders we injected : 5 (indices 0-4)
NRW recovered hidden all-rounders : 5/5 [0, 1, 2, 3, 4]
Naive top-10 found hidden all-rounders: 3/5 [1, 2, 3]
*** SEM's NRW filter recovered 5/5 hidden all-rounders. ***
*** Naive sum-of-scores top-10 found only 3/5. ***
*** SEM surfaces 2 candidates the naive ranking misses ***
*** because they don't peak on any single criterion. ***
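To make the contrast with naive ranking concrete, the sketch below compares a sum-of-scores top-k with a plain Pareto (non-dominated) filter on a tiny hand-made matrix. This is a stand-in technique: the source does not say how the "best-tradeoff frontier" or the NRW filter are computed, so the Pareto filter here is an assumption chosen for illustration, and the function names and scores are hypothetical.

```python
import numpy as np


def naive_top_k(scores, k):
    """Rank candidates by the plain sum of their criterion scores."""
    totals = scores.sum(axis=1)
    return np.argsort(-totals, kind="stable")[:k]


def pareto_front(scores):
    """Indices of candidates that no other candidate beats on every criterion."""
    keep = np.ones(scores.shape[0], dtype=bool)
    for i in range(scores.shape[0]):
        at_least_as_good = np.all(scores >= scores[i], axis=1)
        strictly_better = np.any(scores > scores[i], axis=1)
        if np.any(at_least_as_good & strictly_better):
            keep[i] = False  # candidate i is dominated
    return np.flatnonzero(keep)


# Rows are candidates, columns are criteria (higher is better).
scores = np.array([
    [10, 10, 0],   # 0: strong on two criteria, absent on the third
    [0, 10, 10],   # 1
    [6, 6, 6],     # 2: all-rounder, never the best on any single criterion
    [10, 0, 10],   # 3
    [5, 5, 5],     # 4: dominated by candidate 2
])

print("naive top-3 :", naive_top_k(scores, 3).tolist())   # [0, 1, 3]
print("pareto front:", pareto_front(scores).tolist())     # [0, 1, 2, 3]
```

Candidate 2 never enters the naive top-3 because its total is slightly lower, yet nothing dominates it, which is the kind of candidate the demo output describes as a hidden all-rounder.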
What to try next
- Replace the synthetic data in 02_* with your own observations and see what gets flagged.
- Replace the synthetic candidate matrix in 03_* with your real-world multi-criteria evaluation (job applicants, vendor proposals, product features, drug screens).
- Extend 01_* to your own classification problems: any time you have multiple classes with overlapping members, the NRW operator surfaces the structurally informative boundary cases.
The library has more capabilities than these three demos exercise.
See the sem_cython12.wrapper API for the full operator set
(pairwise distances, multi-class similarity matrix, incremental
aggregation, etc.).
Licence
The demos and the underlying sem_cython12 library are licensed
under the terms in the LICENSE file:
- Research and non-commercial use: free under the conditions stated in the licence.
- Commercial use: requires a separate written commercial licence. Contact sales@sevana.biz.
- The Software is provided strictly "AS IS", without warranty of any kind.
Please read the LICENSE file in full before using the demos or the underlying library.