sem_cython12 - sample projects

Three short, runnable Python projects that demonstrate the sem_cython12 library on small but realistic problems. Each demo is a single self-contained file that prints a clear result.

The demos use only sem_cython12.wrapper, numpy, and (for the Iris and anomaly demos) scikit-learn.

What each demo shows

| File | Domain | "Wow" |
|---|---|---|
| 01_iris_boundary.py | The 1936 Iris dataset | Rediscovers the famous versicolor/virginica boundary specimens without training a classifier and without setting any threshold. |
| 02_anomaly_detection.py | Synthetic 5-D anomalies | Detects 10/10 injected anomalies with a single function call and matches/beats sklearn's IsolationForest on ROC AUC. |
| 03_multicriteria_selection.py | Multi-criteria candidate ranking | Identifies the hidden all-rounders that naive sum-of-scores ranking misses entirely. |

Install

# Get the library (private repo)
git clone https://git.sevana.biz/vvs/sem_cython12.git ../sem_cython12
export PYTHONPATH="$(pwd)/../sem_cython12:$PYTHONPATH"

# Demo dependencies
pip install -r requirements.txt

The pre-built Linux x86_64 / CPython 3.12 binary ships with the library; no compilation step is required.
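
Because the shipped binary targets one platform and one interpreter version, it can help to confirm the environment before running the demos. The following is a minimal sketch using only the standard library; the function name prebuilt_binary_should_load is hypothetical and is not part of sem_cython12.

```python
import platform
import sys


def prebuilt_binary_should_load() -> bool:
    """Return True when the interpreter matches Linux x86_64 / CPython 3.12."""
    return (
        platform.system() == "Linux"
        and platform.machine() == "x86_64"
        and platform.python_implementation() == "CPython"
        and sys.version_info[:2] == (3, 12)
    )


# Report what was detected so a mismatch is easy to spot.
print(
    platform.system(),
    platform.machine(),
    platform.python_implementation(),
    "%d.%d" % sys.version_info[:2],
)
print("pre-built binary expected to load:", prebuilt_binary_should_load())
```

If the check prints False, the import of sem_cython12.wrapper is likely to fail with a binary-compatibility error rather than a missing-module error.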

Run

python 01_iris_boundary.py
python 02_anomaly_detection.py
python 03_multicriteria_selection.py

Each demo finishes in well under a second on a laptop.

What you'll see

01_iris_boundary.py

Auto-derived kernel scale lam = 3.4762

Top 10 most ambiguous specimens (highest cross-species score):

  rank  idx     species  sim->setosa  sim->versic  sim->virgin  cross
     1  138   virginica       0.2330       0.9096       1.0000  0.9096
     2   70  versicolor       0.2396       1.0000       0.9096  0.9096
     3  127   virginica       0.2222       0.8806       1.0000  0.8806
     4   83  versicolor       0.2084       1.0000       0.8689  0.8689
     5  133   virginica       0.2062       0.8689       1.0000  0.8689
     ...

Top 10 distribution by species:
  setosa      : 0 of 10
  versicolor  : 3 of 10
  virginica   : 7 of 10

*** Confirmed: zero setosa specimens; the top-10 boundary cases ***
*** all come from the famous versicolor/virginica overlap zone. ***
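
Judging from the rows printed above, the cross column appears to be the largest similarity to any species other than the specimen's own. That reading is inferred from the listing, not taken from the library's source, and the helper name cross_score is hypothetical. A small sketch that checks the inference against the five rows shown:

```python
# The five rows printed above: (idx, own species, similarities, printed cross).
rows = [
    (138, "virginica", {"setosa": 0.2330, "versicolor": 0.9096, "virginica": 1.0000}, 0.9096),
    (70, "versicolor", {"setosa": 0.2396, "versicolor": 1.0000, "virginica": 0.9096}, 0.9096),
    (127, "virginica", {"setosa": 0.2222, "versicolor": 0.8806, "virginica": 1.0000}, 0.8806),
    (83, "versicolor", {"setosa": 0.2084, "versicolor": 1.0000, "virginica": 0.8689}, 0.8689),
    (133, "virginica", {"setosa": 0.2062, "versicolor": 0.8689, "virginica": 1.0000}, 0.8689),
]


def cross_score(own_species, sims):
    """Assumed definition: highest similarity to any species except the specimen's own."""
    return max(value for species, value in sims.items() if species != own_species)


for idx, own, sims, printed in rows:
    print(idx, own, cross_score(own, sims), "printed:", printed)
```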

02_anomaly_detection.py

SEM  (sem_cython12 - one batch_max_similarity call)
  Top-10 retrieved as anomalous:  precision = 10/10
  ROC AUC                          = 1.0000

Baseline: sklearn IsolationForest (default settings)
  Top-10 retrieved as anomalous:  precision = 10/10
  ROC AUC                          = 1.0000

SEM matches IsolationForest within noise (+0.0000 AUC),
with one function call and zero tuning.
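
The two figures reported for each method are top-10 precision and ROC AUC. How the demo computes them is not shown here; the sketch below uses the standard definitions in plain Python, with made-up toy scores, to make clear what the numbers mean. The function names are illustrative only.

```python
def precision_at_k(scores, labels, k):
    """Fraction of the k highest-scoring points that are true anomalies (label 1)."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(labels[i] for i in top) / k


def roc_auc(scores, labels):
    """Probability that a random anomaly outscores a random normal point.

    Ties count as half a win (the rank-based definition of ROC AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (not the demo's data): higher score means "more anomalous".
toy_scores = [0.9, 0.2, 0.8, 0.1]
toy_labels = [1, 1, 0, 0]
print(precision_at_k(toy_scores, toy_labels, k=2))
print(roc_auc(toy_scores, toy_labels))
```

An AUC of 1.0000, as in the output above, means every injected anomaly scored higher than every normal point.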

03_multicriteria_selection.py

Best-tradeoff frontier size      : 35
Cross-criterion winners (NRW)    : 31
Hidden all-rounders we injected  : 5 (indices 0-4)

NRW recovered hidden all-rounders     : 5/5  [0, 1, 2, 3, 4]
Naive top-10 found hidden all-rounders: 3/5  [1, 2, 3]

*** SEM's NRW filter recovered 5/5 hidden all-rounders. ***
*** Naive sum-of-scores top-10 found only 3/5.          ***
*** SEM surfaces 2 candidates the naive ranking misses  ***
*** because they don't peak on any single criterion.    ***
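
To see why a sum-of-scores ranking can miss balanced candidates, the sketch below compares a naive top-k with a plain Pareto-dominance check on a made-up six-candidate matrix. It does not reproduce the library's NRW filter, whose definition is not given here, and reading "best-tradeoff frontier" as the Pareto frontier is an assumption.

```python
def naive_top_k(candidates, k):
    """Indices of the k candidates with the largest sum of scores."""
    order = sorted(range(len(candidates)), key=lambda i: -sum(candidates[i]))
    return order[:k]


def dominates(a, b):
    """a dominates b if it is at least as good everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(candidates):
    """Indices of candidates that no other candidate dominates."""
    return [
        i
        for i, c in enumerate(candidates)
        if not any(dominates(other, c) for j, other in enumerate(candidates) if j != i)
    ]


# Toy matrix (hypothetical data): rows are candidates, columns are criteria.
candidates = [
    (10, 1, 1),  # 0: specialist on criterion 1
    (1, 10, 1),  # 1: specialist on criterion 2
    (1, 1, 10),  # 2: specialist on criterion 3
    (4, 4, 3),   # 3: all-rounder, peaks on no criterion
    (3, 3, 3),   # 4: dominated by candidate 3
    (9, 1, 1),   # 5: dominated by candidate 0
]

print("naive top-3     :", naive_top_k(candidates, 3))
print("Pareto frontier :", pareto_frontier(candidates))
```

In this toy case the three specialists fill the naive top-3, while the all-rounder (index 3) survives only in the dominance-based frontier.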

What to try next

  • Replace the synthetic data in 02_* with your own observations and see what gets flagged.
  • Replace the synthetic candidate matrix in 03_* with your real-world multi-criteria evaluation (job applicants, vendor proposals, product features, drug screens).
  • Extend 01_* to your own classification problems: any time you have multiple classes with overlapping members, the NRW operator surfaces the structurally informative boundary cases.

The library has more capabilities than these three demos exercise. See the sem_cython12.wrapper API for the full operator set (pairwise distances, multi-class similarity matrix, incremental aggregation, etc.).

Licence

The demos and the underlying sem_cython12 library are licensed under the terms in the LICENSE file:

  • Research and non-commercial use: free under the conditions stated in the licence.
  • Commercial use: requires a separate written commercial licence. Contact sales@sevana.biz.
  • The Software is provided strictly "AS IS", without warranty of any kind.

Please read the LICENSE file in full before using the demos or the underlying library.
