Iris — Licensing & Commercialization Audit

Status: Iris is a private, soon-to-be-commercial product. This document inventories every third-party component, flags what blocks commercial use, and lays out the options (replace / license / self-train) with cost and effort.

Bottom line: Iris's own code is MIT (ours). Only three dependencies block commercial sale today: the object detector (YOLOv8, AGPL-3.0), the face models (InsightFace buffalo_l, non-commercial), and the re-ID weights (OSNet trained on MSMT17, research-only data). Everything else (STT, translation, both VAD engines, the web stack, FAISS, ONNX Runtime) is already commercial-clean. The detector and re-ID blockers have a free Apache-2.0 replacement path; face recognition is a license-or-defer decision (§3.2), so nothing has to be paid for if face-ID is deferred.


1. Deployment model decides which licenses bite

This matters before any line-item:

| Model | What it means | License impact |
| --- | --- | --- |
| SaaS / hosted | Customers use Iris over the network; you never ship them the software | GPL (ffmpeg) does not trigger (no distribution). AGPL still bites (its §13 network clause covers remote interaction). Non-commercial model/data licenses still bite (you're using them commercially regardless). |
| On-prem / shipped (Docker image to the customer) | Customer runs the container | GPL and AGPL bite, plus you redistribute ffmpeg/CUDA/NVENC → their redistribution terms apply. |

Either way, the three blockers below must be resolved. On-prem additionally needs an LGPL ffmpeg build or documented GPL compliance (§5).
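The deployment rule above can be written down as a small lookup. This is only an illustrative sketch: the `Deployment` enum, the `TRIGGERS` map, and `licenses_that_bite` are hypothetical names, not part of Iris, and the license classes are the coarse ones used in this audit.

```python
from enum import Enum


class Deployment(Enum):
    SAAS = "saas"        # customers reach Iris over the network only
    ON_PREM = "on_prem"  # a Docker image is shipped to the customer


# Coarse license classes from this audit, mapped to the deployments
# in which they create an obligation or a blocker.
TRIGGERS = {
    "GPL": {Deployment.ON_PREM},                     # triggered by distribution
    "AGPL": {Deployment.SAAS, Deployment.ON_PREM},   # section 13 covers network use
    "non-commercial": {Deployment.SAAS, Deployment.ON_PREM},
    "permissive": set(),  # MIT / BSD / Apache-2.0 (attribution notices still apply)
}


def licenses_that_bite(components: dict[str, str], deployment: Deployment) -> list[str]:
    """Return the names of components whose license class is triggered."""
    return sorted(
        name
        for name, license_class in components.items()
        if deployment in TRIGGERS[license_class]
    )
```

Called with a mapping such as `{"yolov8n": "AGPL", "ffmpeg (apt build)": "GPL"}`, the SaaS case returns only the AGPL entry, while the on-prem case returns both.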


2. Full component inventory

✅ = commercial-OK · ⛔ = blocks commercial use · ⚠️ = conditional

| Component | Role in Iris | License | Commercial? |
| --- | --- | --- | --- |
| Iris application code | everything we wrote | MIT (ours) | ✅ relicense freely |
| Ultralytics YOLOv8n | person/object detector | AGPL-3.0 | ⛔ blocker |
| InsightFace (code) | SCRFD + ArcFace runtime | MIT | ✅ |
| InsightFace (buffalo_l weights) | face detect + 512-d embedding | non-commercial research | ⛔ blocker |
| OSNet osnet_ain_x1_0_msmt17 | appearance re-ID embedding | code MIT, weights trained on MSMT17 (research-only) | ⛔ blocker (weights) |
| faster-whisper + CTranslate2 | speech-to-text | MIT | ✅ |
| OpenAI Whisper model | STT weights | MIT | ✅ |
| EuroLLM-9B | translation LLM | Apache-2.0 | ✅ |
| ollama | LLM server | MIT | ✅ |
| TEN VAD | live voice gate | Apache-2.0 | ✅ |
| Silero VAD | fallback VAD | MIT | ✅ |
| yt-dlp | web-link source resolver | Unlicense (public domain) | ✅ |
| mediamtx | RTSP fan-out sidecar | MIT | ✅ |
| ONNX Runtime (GPU) | inference | MIT | ✅ |
| FAISS | vector index | MIT | ✅ |
| OpenCV (headless) | frame I/O | Apache-2.0 | ✅ |
| NumPy / Pillow / SQLAlchemy / FastAPI / uvicorn / pydantic / Starlette | web + utils | BSD / MIT | ✅ |
| hls.js | browser HLS player | Apache-2.0 | ✅ |
| ffmpeg | decode/encode/segment | LGPL-2.1+ core; GPL if built --enable-gpl | ⚠️ see §5 |
| CUDA / cuDNN / NVENC / NVDEC | GPU runtime | NVIDIA proprietary EULA (redistributable for deployment) | ⚠️ keep the EULA notice |

3. The three blockers — options, cost, effort

3.1 Object detector — YOLOv8 (AGPL-3.0)

YOLOv8 is the easiest to fix: the detection task is commoditised and there are several Apache-2.0 detectors with COCO-pretrained weights that ship person/vehicle classes out of the box.

| Option | License cost | Eng. effort | Notes |
| --- | --- | --- | --- |
| A. Buy Ultralytics Enterprise license | Contact sales (custom quote); one publicly reported quote ≈ $5k/yr (older, unverified); expect low-to-mid five figures/yr, subscription or one-time | ~0 (keep current model) | Keeps YOLOv8 weights/accuracy; recurring cost; vendor lock-in |
| B. Swap to an Apache-2.0 detector | $0 | ~days | Re-export to ONNX, adapt the pre/post-process in app/detect/. Candidates: RTMDet (OpenMMLab), RT-DETR (Baidu original), RF-DETR (Roboflow), YOLOX, D-FINE; all Apache-2.0, COCO-pretrained, real-time. Avoid YOLO-NAS: its weights carry a restrictive custom license. |
| C. Train our own | $0 (COCO annotations CC-BY 4.0) | ~1–2 wks | Only worth it if we need custom classes; otherwise (B)'s pretrained weights are enough |

Recommendation: B — swap to RTMDet or RT-DETR (Apache-2.0). Free, removes the AGPL obligation entirely, comparable or better accuracy than YOLOv8n.

Status (1.55.0): resolved in code. The Apache-2.0 RT-DETR backend (app/detect/rtdetr_onnx.py, NMS-free) ships and is selectable per-camera or globally. The model_license_clean edition switch forces it in place of YOLOv8n, scripts/download_models.py --license-clean provisions only the Apache detector, and the settings API accepts detector_backend="rtdetr". The AGPL blocker is lifted for any build that enables the clean edition and drops a *rtdetr*.onnx in place of yolov8n.onnx.
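How the edition switch and the model-file swap interact can be sketched as follows. This is a minimal illustration under stated assumptions, not the shipped implementation: the setting names `model_license_clean` and `detector_backend` and the `"rtdetr"` value come from the status note above, while `Settings`, `resolve_detector_backend`, `clean_build_problems`, and the `"yolov8"` default label are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatch

CLEAN_DETECTOR = "rtdetr"  # Apache-2.0 backend named in the status note


@dataclass(frozen=True)
class Settings:
    model_license_clean: bool = False
    detector_backend: str = "yolov8"  # hypothetical default label


def resolve_detector_backend(settings: Settings) -> str:
    """The clean edition overrides any per-camera or global choice."""
    if settings.model_license_clean:
        return CLEAN_DETECTOR
    return settings.detector_backend


def clean_build_problems(model_files: list[str]) -> list[str]:
    """List reasons a model directory is not yet AGPL-free (empty means OK)."""
    problems = []
    if any(fnmatch(name, "yolov8*.onnx") for name in model_files):
        problems.append("AGPL YOLOv8 weights still present")
    if not any(fnmatch(name, "*rtdetr*.onnx") for name in model_files):
        problems.append("no RT-DETR ONNX model found")
    return problems
```

A check like `clean_build_problems` could run in CI against the image's model directory, so a clean-edition build fails if `yolov8n.onnx` is still shipped.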

3.2 Face recognition — InsightFace buffalo_l (non-commercial)

The InsightFace code is MIT (fine to keep); the pretrained packs are the problem — trained on datasets (MS1M / Glint360K) released for non-commercial research. This is the hardest blocker because commercially-licensed face data is scarce.

| Option | License cost | Eng. effort | Notes |
| --- | --- | --- | --- |
| A. Buy InsightFace commercial model license ⭐ (fastest) | Contact insightface.ai (not public); expect 4–5 figures per deployment/model | ~0 | Keeps current SCRFD+ArcFace accuracy; cleanest legal path that preserves the feature |
| B. Defer face-ID to an "enterprise add-on" | $0 | low | Ship v1 commercially with detection + appearance re-ID only; face recognition stays an opt-in module the customer enables under their own license. De-risks launch |
| C. Replace with a permissively-licensed face stack | $0 | high | Detector → YuNet (OpenCV Zoo, permissive) is easy; the recognition embedding is the hard part, because few ArcFace weights exist on commercially-usable data |
| D. Self-train ArcFace on a commercial dataset | dataset licensing cost | very high (wks–mos + data) | Realistic commercial face datasets must be licensed or collected with consent; rarely worth it vs (A) |

Recommendation: A or B. Buy the InsightFace commercial license if face-ID is a headline feature; otherwise ship without it first (B) and add it under license later. Self-training (D) is the worst ROI.

3.3 Appearance re-ID — OSNet on MSMT17 (research data)

OSNet code (torchreid) is MIT, but our weights are trained on MSMT17, an academic ReID dataset. Cross-camera re-ID is core to Iris, so this needs a clean embedding.

| Option | License cost | Eng. effort | Notes |
| --- | --- | --- | --- |
| A. Generic appearance embedding | $0 | ~moderate | Use a self-supervised backbone with a permissive license, DINOv2 (Apache-2.0) or an OpenCLIP image encoder, as the clothing/appearance feature instead of MSMT17-trained OSNet. No ReID-dataset provenance |
| B. Retrain OSNet on commercial data | dataset cost | high | Same data-licensing problem as faces |
| C. Enterprise add-on | $0 | low | Like 3.2-B: ship without cross-camera appearance re-ID at first |

Recommendation: A — DINOv2/OpenCLIP appearance features (Apache-2.0). Removes the dataset-provenance risk; needs re-tuning of the match thresholds in app/reid/.
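Re-tuning a match threshold can be done from labelled similarity pairs. The sketch below assumes a set of cosine similarities for same-person pairs and different-person pairs has been collected from real footage; `match_rate`, `tune_threshold`, and the false-match budget are hypothetical and do not describe the existing code in app/reid/.

```python
def match_rate(similarities: list[float], threshold: float) -> float:
    """Fraction of pairs that would be declared a match at this threshold."""
    return sum(s >= threshold for s in similarities) / len(similarities)


def tune_threshold(
    genuine: list[float],
    impostor: list[float],
    max_false_match_rate: float = 0.01,
) -> tuple[float, float]:
    """Pick the lowest threshold whose false-match rate stays within budget.

    genuine:  cosine similarities of same-person pairs
    impostor: cosine similarities of different-person pairs
    Returns (threshold, true-match rate at that threshold).
    """
    # The false-match rate only falls as the threshold rises, so the first
    # candidate that meets the budget also keeps the most genuine matches.
    for threshold in sorted(set(genuine) | set(impostor)):
        if match_rate(impostor, threshold) <= max_false_match_rate:
            return threshold, match_rate(genuine, threshold)
    raise ValueError("no observed similarity meets the false-match budget")
```

Running this separately on OSNet and DINOv2 similarities for the same pairs shows how far the operating point moves between backends.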

Status (1.56.0): wired, tuning-gated. reid_appearance_backend="dinov2" (forced by model_license_clean) loads an Apache-2.0 DINOv2 backbone in place of OSNet. The appearance embedding dimension is decoupled from the 512-d face gallery (reid_app_embedding_dim, default 384 for DINOv2-small) across serialization, the appearance store, and the maintenance centroid; stale-dim exemplars from a backend switch are quarantined and decay out. download_models.py --license-clean provisions the DINOv2 ONNX. Remaining before production trust: the reid_app_* cosine thresholds are OSNet-tuned — re-tune them on real footage (DINOv2 similarities distribute differently) before enabling a DINOv2 build on live cameras.
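The stale-dimension quarantine described above can be illustrated with a short sketch. It assumes exemplars are stored as plain vectors and that a backend switch changes the expected length (for example 512 to 384); `Exemplar` and `quarantine_stale` are hypothetical names, and the real store, serialization, and decay logic are not shown.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    track_id: str
    embedding: list[float]
    quarantined: bool = False


def quarantine_stale(exemplars: list[Exemplar], expected_dim: int) -> list[Exemplar]:
    """Flag exemplars whose vector length differs from the active backend's
    dimension and return only the ones that are still comparable."""
    active = []
    for exemplar in exemplars:
        exemplar.quarantined = len(exemplar.embedding) != expected_dim
        if not exemplar.quarantined:
            active.append(exemplar)
    return active
```

Keeping mismatched vectors flagged rather than deleting them means a rollback to the previous backend does not lose its gallery immediately.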


4. Can we just train our own models? (summary)

  • Detector: don't need to — Apache-2.0 COCO-pretrained weights already exist (RTMDet/RT-DETR). Train only for custom classes.
  • Face recognition: training is the expensive path — the blocker is data licensing, not compute. Buying the InsightFace commercial license is almost always cheaper than sourcing a clean face dataset.
  • Re-ID: swapping to a self-supervised Apache backbone (DINOv2) avoids training and the dataset problem.

So: swap models (free) for the detector and re-ID; license-or-defer for face recognition. No mandatory recurring fees if face-ID is deferred.


5. ffmpeg & GPU notes (on-prem only)

  • ffmpeg: the Ubuntu apt build is GPL (--enable-gpl, bundles x264/x265). Iris encodes via NVENC (h264_nvenc), not x264, so it does not need x264/x265 for encoding. For SaaS there is no distribution and therefore no GPL obligation. For on-prem shipping, rebuild ffmpeg as LGPL (drop --enable-gpl, no x264/x265) or document GPL compliance.
  • CUDA / cuDNN / NVENC: redistributable under the NVIDIA EULA for deployment; keep the NVIDIA notice in the image and product docs.
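Whether a given ffmpeg binary is a GPL build can be read from the `configuration:` line that `ffmpeg -version` prints. The helper below is a rough sketch: `classify_ffmpeg_build` and `installed_ffmpeg_license` are hypothetical names, and the result is a first-pass indicator rather than a legal determination.

```python
import subprocess


def classify_ffmpeg_build(version_output: str) -> str:
    """Classify a build from the text printed by `ffmpeg -version`."""
    flags = None
    for line in version_output.splitlines():
        if line.strip().startswith("configuration:"):
            flags = set(line.split(":", 1)[1].split())
    if flags is None:
        return "unknown"  # no configuration line found
    if "--enable-nonfree" in flags:
        return "nonfree"  # not redistributable
    if "--enable-gpl" in flags:
        return "GPL"
    return "LGPL"


def installed_ffmpeg_license(binary: str = "ffmpeg") -> str:
    """Run the binary and classify its build configuration."""
    result = subprocess.run(
        [binary, "-version"], capture_output=True, text=True, check=True
    )
    return classify_ffmpeg_build(result.stdout)
```

For an on-prem image, a build step could call `installed_ffmpeg_license()` and fail unless it returns `"LGPL"`.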

6. Recommended order of work

  1. Detector → RTMDet/RT-DETR (Apache-2.0). Free, removes AGPL. (do first) ✅ Done (1.55.0): RT-DETR backend + model_license_clean edition + provisioning.
  2. Re-ID → DINOv2/OpenCLIP appearance features (Apache-2.0). Free, removes the MSMT17 data risk. 🟡 Wired (1.56.0): DINOv2 backend + dim-decoupling + provisioning; thresholds still need real-footage tuning before production.
  3. Face recognition → decide: buy the InsightFace commercial license (keep accuracy) or ship v1 without face-ID (B) and add it later.
  4. On-prem only: LGPL ffmpeg build; carry the NVIDIA EULA notice.
  5. Keep STT/translate/VAD as-is: all already commercial-clean.

After steps 1–2 (free, ~1–2 weeks of engineering) Iris is sellable as a SaaS with no per-year model fees, with face recognition as the only license-or-defer decision left.


Pricing figures are indicative and were not publicly confirmed for 2026 — treat Ultralytics and InsightFace numbers as "contact sales", and re-verify before budgeting.