Architecture
Iris is a self-hosted Video Management System (VMS) that runs cross-camera, face-anchored re-identification on a single NVIDIA T4 (16 GB). This document describes how the system is structured: the process and threading model, the end-to-end lifecycle of a frame, the data model, how the GPU is shared, the adaptive work-shaping that keeps the hot path real-time, the recording subsystem, and how the frontend is served.
The intended audience is an engineer who needs to reason about, operate, or extend the system. Everything below is traceable to source; concrete file and function references are given throughout.
1. System overview
There is a single FastAPI application process (the API process) and one spawned subprocess per enabled camera (the camera workers). The API process owns the HTTP surface, the SQLite database, the authoritative in-memory galleries, and the worker lifecycle. Each camera worker owns exactly one RTSP stream and runs the full detect → track → re-ID pipeline on it.
flowchart TB
subgraph client["Browser (vanilla-JS SPA)"]
UI["Live grid · History · Identities · People · Faces"]
end
nginx["nginx<br/>TLS · cookie-SSO (auth_request)<br/>injects X-Email, strips client copy"]
client -->|HTTPS| nginx
nginx -->|"127.0.0.1:8120"| api
subgraph api["API process (FastAPI + uvicorn)"]
routers["Routers: cameras · live · events ·<br/>identities · people · face-groups · system"]
wm["WorkerManager<br/>(spawn / reconcile / status)"]
gal["Authoritative IdentityGallery<br/>FaceIndex (FAISS)"]
maint["ReID maintenance thread<br/>(decay · prune · merge)"]
hls["HLS manager (on-demand)"]
static["StaticFiles SPA mount"]
end
db[("SQLite (WAL)<br/>vms.db — single source of truth")]
shm[["/dev/shm/vms_frames<br/>one JPEG slot per camera"]]
seg[["data/segments/<cam><br/>rolling ffmpeg buffer"]]
api --- db
wm -.spawn.-> w1
wm -.spawn.-> w2
subgraph w1["Camera worker N (subprocess, spawn)"]
dec["decode thread<br/>(drop-to-latest)"]
loop["main loop<br/>detect → track → re-ID"]
clip["clip drain thread"]
pers["persist drain thread"]
ffmpeg["ffmpeg segmenter<br/>(-c:v copy)"]
dec --> loop
loop --> clip
loop --> pers
end
w2["Camera worker M ..."]
loop --> shm
ffmpeg --> seg
w1 --- db
routers -->|read JPEG slot| shm
routers -->|assemble manual clip| seg
routers --- db
Cross-process state is deliberately minimal and uses the lowest-friction channel for each kind of data:
- Metadata + media paths flow through the SQLite DB. The DB is the single source of truth (app/db/models.py); every in-memory structure (FAISS face index, identity gallery) is derived state rebuilt from it.
- Live preview frames are passed as whole JPEG files via a per-camera frame slot on tmpfs (/dev/shm/vms_frames/cam_<id>.jpg), written atomically. This avoids the fragility of fixed-size shared_memory blocks for variable-size JPEGs while staying effectively as fast as shared memory (WorkerManager._resolve_frames_dir, read_frame).
- Worker health (state, fps, last_seen, pid) is published into a multiprocessing.Manager dict-of-dicts (WorkerManager.status).
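The atomic frame-slot write described above can be sketched as follows. This is a minimal illustration, not the project's code: the function names write_frame_slot and read_frame_slot are hypothetical, and the only technique assumed is "write a temp file in the same directory, then os.replace it over the slot".

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def write_frame_slot(frames_dir: Path, camera_id: int, jpeg_bytes: bytes) -> Path:
    """Publish one JPEG so a reader never observes a half-written file."""
    frames_dir.mkdir(parents=True, exist_ok=True)
    slot = frames_dir / f"cam_{camera_id}.jpg"
    # The temp file lives in the SAME directory as the slot: os.replace is
    # only atomic when source and destination share a filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=frames_dir, prefix=slot.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(jpeg_bytes)
        os.replace(tmp_name, slot)  # readers see the old or the new file, never a mix
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return slot


def read_frame_slot(frames_dir: Path, camera_id: int) -> Optional[bytes]:
    """Return the latest JPEG for a camera, or None if nothing was published yet."""
    try:
        return (frames_dir / f"cam_{camera_id}.jpg").read_bytes()
    except FileNotFoundError:
        return None
```

Because each write replaces the whole file, the slot can hold JPEGs of any size without the reader and writer agreeing on a buffer length in advance.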
2. Process and threading model
This is the core of the design, so it gets the deepest treatment. The guiding
principle: the detection hot path must never block on I/O it does not strictly
need — RTSP decode latency, ffmpeg clip assembly, JPEG encode, or fsync.
2.1 One subprocess per camera (spawn)
WorkerManager (app/workers/manager.py) spawns one OS process per enabled
camera via multiprocessing.get_context("spawn"). Spawn — not fork — is
mandatory because CUDA and onnxruntime contexts are not fork-safe; a forked
child inherits a broken CUDA handle. The entrypoint _worker_entrypoint imports
cv2/onnxruntime inside the child so the parent process never loads those
heavy, GPU-touching libraries.
Process-per-camera buys:
- Fault isolation. A camera with a wedged decoder, a corrupt stream, or a segfault in a native dep takes down only its own worker. The manager's reconcile loop (_reconcile_loop, default every 20 s) and the cgroup OOM policy (see §9) restart it; the rest of the cameras and the API are unaffected.
- A clean lifecycle boundary. Deleting or disabling a camera stops exactly one process. WorkerManager.sync is the architectural guarantee that running workers == enabled cameras: it starts workers for enabled cameras and terminates any orphaned worker whose camera is gone, so a deleted camera can never keep generating data.
- True parallelism across cameras without the GIL: each worker is a separate interpreter.
Config crosses the spawn boundary as a plain, picklable dict
(WorkerManager._camera_config) — per-camera tunables are resolved against
global Settings defaults before the dict is built, so the worker needs the
settings object only for model paths, not for every threshold.
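A sketch of how per-camera tunables might be resolved into a plain dict before the spawn boundary. The Settings dataclass and camera_config function here are hypothetical stand-ins for the real Settings object and WorkerManager._camera_config; only the three defaults shown are taken from this document.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Settings:
    # Hypothetical subset of the global defaults.
    detect_interval: float = 0.0
    detect_interval_idle: float = 0.5
    max_reid_per_frame: int = 4


def camera_config(camera: Dict[str, Any], settings: Settings) -> Dict[str, Any]:
    """Build a picklable config dict; None on the camera row means 'use the global default'."""

    def pick(name: str) -> Any:
        value = camera.get(name)
        return getattr(settings, name) if value is None else value

    return {
        "camera_id": camera["id"],
        "rtsp_url": camera["rtsp_url"],
        "detect_interval": pick("detect_interval"),
        "detect_interval_idle": pick("detect_interval_idle"),
        "max_reid_per_frame": pick("max_reid_per_frame"),
    }
```

The result contains only built-in types, so it survives pickling into a spawned child without importing any application module first.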
2.2 Inside a worker: a three-thread pipeline
Each CameraWorker (app/workers/camera_worker.py) runs three cooperating
thread roles. The reasoning for splitting them is latency and back-pressure
control.
sequenceDiagram
participant RTSP as RTSP camera
participant Dec as Decode thread<br/>(_decode_loop)
participant Slot as latest-frame slot<br/>(Condition + seq)
participant Loop as Main loop<br/>(_loop)
participant GPU as GPU (YOLO/ArcFace/OSNet)
participant DB as SQLite
participant Clip as Clip drain<br/>(_clip_drain_loop)
participant Pers as Persist drain<br/>(_persist_drain_loop)
loop as fast as stream delivers
Dec->>RTSP: cap.read()
Dec->>Slot: overwrite latest_frame, ++seq, notify()
Note over Slot: an unconsumed frame is DROPPED here
end
loop every newest frame
Loop->>Slot: wait_for(seq != last_seq)
Slot-->>Loop: newest frame only (stale ones skipped)
Loop->>GPU: detect (adaptive cadence)
Loop->>GPU: embed faces/bodies (≤ max_reid_per_frame)
Loop->>DB: assign sightings, commit (cheap rows)
Loop->>Clip: enqueue clip job on track close (put_nowait)
Loop->>Pers: enqueue thumbnail/face-sample (put_nowait)
end
Clip->>DB: build_clip_from_track + update Event (off hot path)
Pers->>DB: imwrite crops + batched commit (off hot path)
(1) Decode thread: drop-to-latest (_decode_loop)
A dedicated producer thread owns the OpenCV/FFMPEG VideoCapture and does
nothing but cap.read() in a tight loop, plus reconnect handling with backoff.
Each frame it reads overwrites the single _latest_frame slot under a
threading.Condition, bumps _frame_seq, and notifies the consumer.
Why a separate decode thread that keeps only the newest frame: RTSP delivers
frames at the camera's wall-clock rate. If analysis is momentarily slower than
the stream (a crowd, a GC pause), a queue would grow and the worker would fall
progressively behind real time — fatal for a live monitoring system. By
keeping only the latest frame, a slow analysis pass simply drops the stale
frames it missed; the worker always processes the freshest available image and
stays anchored to real time. CAP_PROP_BUFFERSIZE=1 keeps the driver's own
buffer shallow for the same reason.
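The drop-to-latest handoff can be sketched with a single slot guarded by a threading.Condition. The class name LatestFrameSlot and its method names are illustrative assumptions; the real worker keeps the equivalent state in _latest_frame and _frame_seq.

```python
import threading
from typing import Any, Optional, Tuple


class LatestFrameSlot:
    """Single-slot mailbox: the producer overwrites, the consumer sees only the newest frame."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._frame: Any = None
        self._seq = 0

    def publish(self, frame: Any) -> None:
        # Overwriting drops any frame the consumer has not picked up yet.
        with self._cond:
            self._frame = frame
            self._seq += 1
            self._cond.notify_all()

    def wait_newer(self, last_seq: int, timeout: Optional[float] = None) -> Optional[Tuple[int, Any]]:
        """Block until a frame newer than last_seq exists; return (seq, frame) or None on timeout."""
        with self._cond:
            # wait_for takes a predicate callable and re-checks it after every wakeup.
            if not self._cond.wait_for(lambda: self._seq != last_seq, timeout=timeout):
                return None
            return self._seq, self._frame
```

A consumer that falls behind simply observes a jump in seq; the gap is the number of frames that were dropped while it was busy.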
(2) Main loop: synchronous detect → track → re-ID (_loop)
The consumer blocks on the condition until a frame newer than the one it last
processed arrives (wait_for(self._frame_seq != last_seq)), then runs the full
pipeline synchronously on that single frame:
- _safe_detect → YOLOv8n on the GPU, filtered to the camera's trigger_classes.
- _track_and_identify → advance the greedy IoU tracker (app/reid/tracker.py), finalize closed tracks, and (re-)embed/assign a bounded number of active tracks (§4).
- _handle_detections / _finalize_presence → birth events and enqueue clip jobs.
- _maybe_write_frame_slot → throttled JPEG encode of the annotated preview.
These steps are intentionally synchronous and single-threaded: detection, tracking, and identity assignment share frame state and ordering assumptions, and running them on one thread keeps the logic simple and the per-frame work bounded. The only thing the loop must never do is block on slow, non-essential I/O — which is what the drain threads are for.
(3) Off-loop drain threads: persist + clip assembly
Two daemon drain threads take the slow work off the hot path:
- Clip drain (_clip_drain_loop): on track close the loop only writes a cheap Event row and put_nowaits a small job dict. The drain thread calls build_clip_from_track (which waits for post-roll segments to finalize and shells out to ffmpeg concat) and then updates the Event row with the clip path. All clip-thread DB writes go through this one thread, serializing them to avoid SQLite write contention. Bounded queue (maxsize=64); a full queue drops the clip with a warning rather than stalling detection.
- Persist drain (_persist_drain_loop): all body-crop thumbnail and face-sample writes (cv2.imwrite JPEG encode plus the Sighting/FaceSample DB commits) run here, batched (up to 64 jobs) into one transaction per drain. The loop only enqueues a job carrying a .copy() of the crop (the frame buffer is overwritten by the next cap.read()). If the queue (maxsize=256) is saturated, _enqueue_persist falls back to an inline write so no data is lost.
Why this split: ffmpeg post-roll waits are seconds long; JPEG encode and
fsync are milliseconds but unbounded under disk pressure. Either, inline, would
let the worker drift behind the live stream and miss detections. Pushing them to
drain threads means the hot path's only synchronous DB work is small, indexed row
inserts/updates.
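The two back-pressure policies (drop for clips, inline fallback for persist jobs) can be sketched as below. The function names and the write_inline callback are hypothetical; only the put_nowait usage and the two policies come from the description above.

```python
import queue
from typing import Any, Callable, Dict


def enqueue_clip(q: "queue.Queue[Dict[str, Any]]", job: Dict[str, Any]) -> bool:
    """Clip policy: never stall detection. Returns False when the job was dropped."""
    try:
        q.put_nowait(job)
        return True
    except queue.Full:
        return False  # the caller is expected to log a warning


def enqueue_persist(
    q: "queue.Queue[Dict[str, Any]]",
    job: Dict[str, Any],
    write_inline: Callable[[Dict[str, Any]], None],
) -> bool:
    """Persist policy: never lose data. Returns False when the job was written inline."""
    try:
        q.put_nowait(job)
        return True
    except queue.Full:
        write_inline(job)  # slower, but the crop is not lost
        return False
```

The asymmetry is deliberate in this sketch: a missing clip is recoverable from the event row, while a lost sighting crop is not.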
Teardown ordering (_teardown)
Shutdown is ordered to avoid losing in-flight work or hanging on dead threads:
stop the decode producer first, flush() the tracker so open presences still
record, drain the clip queue while the segmenter is still alive, then the
persist queue, then close() the components. Joins on drain/persist threads only
occur when those threads are actually alive (else queue.join() would block
forever).
3. End-to-end frame lifecycle
A single frame's journey through one worker:
- Decode. The decode thread reads a frame from RTSP and publishes it as the sole latest frame (any unconsumed predecessor is dropped).
- Pickup. The main loop wakes, takes the latest frame + timestamp, ticks fps, and publishes an online heartbeat (~1 Hz throttle).
- Cadence gate. Adaptive cadence (§4) decides whether to detect this frame. If not, the loop reuses _last_boxes for the preview overlay and skips straight to the frame-slot write.
- Detect. YOLOv8n runs on the GPU (_safe_detect); boxes are filtered to trigger_classes. If any trigger object is present, _last_activity_ts is refreshed (keeping the camera in active mode).
- Track. ObjectTracker.update greedily associates this frame's boxes to existing tracks by same-class IoU, opens tracks for unmatched boxes, and closes tracks idle beyond track_gap_seconds.
- Finalize closed tracks. Each closed track births (in track mode) one Event for the whole presence and enqueues a clip job, then accrues its dwell time into a PresenceSegment and Identity.total_seconds (_finalize_presence).
- (Re-)identify active tracks. A bounded set of due tracks is embedded (IdentityPipeline.extract → ArcFace face vector + OSNet body vector), and IdentityManager.assign links each to an existing identity or mints a new one (§4, §6). Sighting rows are committed; thumbnails/face-samples are enqueued to the persist drain.
- Preview. _maybe_write_frame_slot re-encodes the annotated frame to JPEG at the active/idle preview fps and writes it atomically (tmp + replace) to the camera's frame slot, where the /api/live/{id}/stream MJPEG endpoint reads it.
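The Track step's same-class greedy IoU association can be illustrated with the sketch below. It is not ObjectTracker.update: the function names, the data shapes, and the min_iou threshold of 0.3 are assumptions chosen for the example.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(
    tracks: Dict[int, Tuple[str, Box]],
    detections: List[Tuple[str, Box]],
    min_iou: float = 0.3,  # hypothetical threshold
) -> Tuple[Dict[int, int], List[int]]:
    """Return ({track_id: detection_index}, [unmatched detection indices])."""
    candidates = []
    for track_id, (track_cls, track_box) in tracks.items():
        for det_idx, (det_cls, det_box) in enumerate(detections):
            if track_cls != det_cls:
                continue  # association is class-scoped
            score = iou(track_box, det_box)
            if score >= min_iou:
                candidates.append((score, track_id, det_idx))
    # Greedy: take the best-overlapping pair first, then the next best that
    # does not reuse a track or a detection.
    candidates.sort(reverse=True)
    matches: Dict[int, int] = {}
    used = set()
    for _score, track_id, det_idx in candidates:
        if track_id in matches or det_idx in used:
            continue
        matches[track_id] = det_idx
        used.add(det_idx)
    unmatched = [i for i in range(len(detections)) if i not in used]
    return matches, unmatched
```

Unmatched detections would open new tracks; tracks absent from the match dict keep aging until they exceed the idle gap and close.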
Meanwhile, entirely independent of this loop, the per-camera ffmpeg segmenter
(§5) is continuously writing 2-second -c:v copy segments to disk, so the
pre-roll for any event already exists the instant a track opens.
4. GPU sharing, adaptive cadence, and the per-frame re-ID cap
4.1 GPU sharing model
A single T4 (16 GB) is shared by all models across all camera workers. There is no explicit GPU scheduler; sharing is achieved by keeping each model small and each worker's GPU demand bounded:
- Detection: YOLOv8n exported to ONNX, run via onnxruntime-gpu (app/detect/yolo_onnx.py). Running detection on the GPU costs ~1 CPU core/camera versus ~7 on CPU, which is the dominant reason inference is offloaded.
- Faces: insightface buffalo_l (SCRFD-10G detector + ArcFace embedder), shared via one FaceRecognizer per worker. Face detection runs once per frame on the whole frame, then each face is assigned to the smallest containing person box (IdentityPipeline.extract), so there is no second face model and no per-crop re-detection.
- Appearance: OSNet-AIN x1.0 (MSMT17) exported to ONNX via ReIDEmbedder.
- Vehicle attributes: optional NVIDIA TAO make/body-type classifiers, only invoked for vehicle-class crops.
The Dockerfile takes deliberate care that the CUDAExecutionProvider is the
one that actually loads: insightface hard-depends on the CPU onnxruntime wheel,
which shadows onnxruntime-gpu in the same package dir; the build uninstalls the
CPU build and force-reinstalls the GPU build so inference lands on the T4.
Because every model is loaded lazily inside each child process
(_build_components), VRAM grows with the number of cameras. The two mechanisms
below are what keep aggregate GPU (and CPU, and disk) demand bounded as cameras
scale.
4.2 Adaptive detection cadence
Each worker tracks whether its scene is active. A scene is active for
active_grace_seconds after the last frame that contained a trigger object
(active = (now - self._last_activity_ts) < self.active_grace_seconds). The
detection interval follows that state:
- Active: detect every detect_interval seconds (default 0.0 = every frame).
- Idle: detect every detect_interval_idle seconds (default 0.5).
The live-preview JPEG encode rate follows the same active/idle state
(active_preview_fps vs idle_preview_fps), because encoding an empty scene at
full rate is pure waste. The net effect: a quiet camera consumes a fraction of
the GPU/CPU/disk of a busy one, yet the moment an object enters, the camera snaps
to full rate and never misses the entrance (the activity timestamp is set on the
very detection that first sees the object).
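The active/idle gate can be sketched as a small state holder. The Cadence class is illustrative only; detect_interval and detect_interval_idle use the defaults quoted above, while the 10-second active_grace_seconds value is an assumption because this document does not state its default.

```python
from dataclasses import dataclass


@dataclass
class Cadence:
    detect_interval: float = 0.0        # active: every frame
    detect_interval_idle: float = 0.5   # idle: twice a second
    active_grace_seconds: float = 10.0  # hypothetical default
    last_activity_ts: float = float("-inf")
    last_detect_ts: float = float("-inf")

    def is_active(self, now: float) -> bool:
        return (now - self.last_activity_ts) < self.active_grace_seconds

    def should_detect(self, now: float) -> bool:
        """Decide whether this frame gets a detection pass, and record it if so."""
        interval = self.detect_interval if self.is_active(now) else self.detect_interval_idle
        if now - self.last_detect_ts >= interval:
            self.last_detect_ts = now
            return True
        return False

    def note_detections(self, now: float, found_trigger: bool) -> None:
        # The activity clock is refreshed by the same detection that first
        # sees the object, so the switch to full rate is immediate.
        if found_trigger:
            self.last_activity_ts = now
```

The same is_active result could also select between the active and idle preview rates, since both follow one state.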
4.3 Per-frame re-ID cap
Re-ID embedding (ArcFace + OSNet + optional TAO) is far more expensive than
detection. Without a bound, a crowd of N people would force N embeddings per
frame and stall the loop. _track_and_identify therefore:
- Selects only tracks due for (re-)identification. Fresh tracks with no identity yet use the fast cadence (reid_sample_seconds, default 3 s); already-identified tracks refresh on the slower "confident" cadence (reid_confident_sample_seconds, default 9 s), because identity is sticky.
- Prioritizes unassigned tracks, then oldest-waiting first.
- Caps the number actually embedded this frame at max_reid_per_frame (default 4).
So per-frame re-ID work is constant regardless of crowd size; the backlog just
drains over subsequent frames. Identity stickiness (hysteresis in
IdentityManager, IoU + time window) means a continuous track keeps its identity
between embeds without re-evaluation.
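The selection rule (due tracks only, unassigned first, oldest-waiting next, then a hard cap) can be sketched as follows. The Track dataclass and select_for_reid are hypothetical; the three defaults are the ones quoted above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Track:
    track_id: int
    identity_id: Optional[int]  # None until the track has been assigned
    last_embed_ts: float        # float("-inf") if never embedded


def select_for_reid(
    tracks: List[Track],
    now: float,
    reid_sample_seconds: float = 3.0,
    reid_confident_sample_seconds: float = 9.0,
    max_reid_per_frame: int = 4,
) -> List[Track]:
    """Pick at most max_reid_per_frame tracks to embed on this frame."""
    due = []
    for t in tracks:
        period = reid_sample_seconds if t.identity_id is None else reid_confident_sample_seconds
        if now - t.last_embed_ts >= period:
            due.append(t)
    # False sorts before True, so unassigned tracks come first; within each
    # group the smallest last_embed_ts (longest wait) wins.
    due.sort(key=lambda t: (t.identity_id is not None, t.last_embed_ts))
    return due[:max_reid_per_frame]
```

Tracks that miss the cut stay due and move up the order on the next frame, which is how the backlog drains without any frame exceeding the cap.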
5. Recording
5.1 Warm rolling segment buffer
Each worker runs one long-lived ffmpeg process via Segmenter
(app/recording/segmenter.py) that continuously writes fixed-length
(segment_seconds, ~2 s) .mp4 segments named with a UTC strftime pattern
(seg_YYYYMMDDThhmmss.mp4) into data/segments/<camera_id>/. Key properties:
- Stream copy, no decode. -map 0:v:0 -c:v copy: ffmpeg never re-encodes, so CPU is negligible and GPU is zero. The worker decodes separately for detection; the segmenter is purely an I/O recorder.
- Audio dropped (-an). IP-camera audio (often G.711 with broken timestamps) periodically hung the segment muxer and silently stopped clips; dropping it makes the buffer rock-solid. (Live-with-sound is handled separately via on-demand HLS.)
- Anti-SSRF. -protocol_whitelist rtsp,rtsps,rtp,rtcp,udp,tcp,tls,crypto prevents a malicious RTSP URL from making ffmpeg read local files or reach internal HTTP.
- UTC-pinned filenames. The subprocess runs with TZ=UTC so segment timestamps match the UTC timestamps the worker writes to the DB; clip selection and pruning depend on this.
- Bounded retention + self-healing. A watchdog thread prunes segments older than retention_seconds (default 120), restarts ffmpeg with backoff if it dies, and detects a hung ffmpeg (alive but producing no new segments) by watching the newest segment's mtime, restarting it on stall. PR_SET_PDEATHSIG ensures ffmpeg is SIGKILLed if the worker dies even uncleanly.
Because a configurable amount of pre-roll is always already on disk, clips can include the seconds before an object appeared without buffering frames in memory.
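One plausible shape for the segmenter's command line, assembled from the properties listed above. This is a sketch under assumptions, not the contents of segmenter.py: the use of ffmpeg's segment muxer with -segment_time, -segment_format, -strftime and -reset_timestamps, the -rtsp_transport tcp choice, and the function name segmenter_command are all guesses; only the stream-copy, -an, whitelist, filename pattern, and TZ=UTC details come from this document.

```python
import os
from typing import Dict, List, Tuple


def segmenter_command(
    rtsp_url: str, out_dir: str, segment_seconds: int = 2
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for a stream-copy rolling segment recorder. Nothing is executed."""
    pattern = os.path.join(out_dir, "seg_%Y%m%dT%H%M%S.mp4")
    argv = [
        "ffmpeg", "-hide_banner", "-loglevel", "warning",
        # Input options must precede -i: restrict which protocols the input may use.
        "-protocol_whitelist", "rtsp,rtsps,rtp,rtcp,udp,tcp,tls,crypto",
        "-rtsp_transport", "tcp",  # assumption: TCP transport
        "-i", rtsp_url,
        # First video stream only, copied without re-encoding, audio dropped.
        "-map", "0:v:0", "-c:v", "copy", "-an",
        # Assumed segment-muxer settings producing strftime-named mp4 files.
        "-f", "segment",
        "-segment_time", str(segment_seconds),
        "-segment_format", "mp4",
        "-reset_timestamps", "1",
        "-strftime", "1",
        pattern,
    ]
    # Pin the child's timezone so strftime filenames are UTC.
    env = dict(os.environ, TZ="UTC")
    return argv, env
```

Passing argv as a list (rather than a shell string) keeps the RTSP URL from being interpreted by a shell, which matters because camera URLs are operator-supplied.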
5.2 Track-mode events
The default recording_mode is track: exactly one Event per object presence.
On track close, _finalize_presence enqueues a clip job; build_clip_from_track
(app/recording/clipper.py) assembles the clip spanning
[enter - pre_seconds, last + post_seconds]:
- Wait for the post-roll segment(s) covering the window end to finalize (_wait_for_post_roll). The segmenter writes a segment to its final name only when it finishes, so the presence of a newer segment proves the tail is complete.
- Select every segment overlapping the window, excluding the still-growing live tail (no moov atom yet → would fail concat) and any non-finalized file (_is_finalized probes via ffprobe).
- Concatenate via ffmpeg's concat demuxer with -c copy -movflags +faststart (no re-encode) into data/recordings/<camera_id>/<event_id>.mp4.
- Extract one thumbnail near the trigger instant.
The legacy fixed-window trigger mode (_trigger_event /
_record_and_persist) still exists but is disabled when recording_mode ==
"track" to avoid double-recording.
5.3 Manual recording
The operator can press ● REC in Live Monitoring. This is stateless: POST
/api/live/{id}/record/start returns a server-trusted start timestamp the client
echoes back to record/stop, which assembles [started_at, now] from the same
on-disk segment buffer (build_clip_from_track via a SimpleNamespace segmenter
shim) and persists a manual-labelled Event. No worker round-trip is needed;
the buffer is the shared substrate.
6. Re-ID and the identity model (summary)
Re-ID is documented in depth elsewhere; here is what the architecture needs to
hold. Identity is anchored on the face — the only cue stable across clothing
change, viewpoint, lighting, and days. IdentityManager.assign
(app/reid/manager.py) evaluates each sighting in order: sticky/hysteresis →
confident face (with a best-minus-second margin) → appearance within the
session's time window (with a face-contradiction veto and, for non-person
objects, a colour gate) → otherwise a new identity, gated by a face-quality
floor and a per-camera new-identity rate limit. A faceless back/side view never
spawns a new identity — it can only attach to an existing one by appearance, else
it is dropped. That gate is what stops a person seen from behind from exploding
into dozens of duplicates.
The in-memory IdentityGallery (app/reid/gallery.py) is derived state: a FAISS
IndexFlatIP over per-identity ArcFace exemplars (faces are time-stable, not
decayed) plus per-identity OSNet appearance exemplars (time-decayed). Each worker
holds its own gallery, rebuilt from the DB at startup and re-synced on a timer
(_maybe_reload_gallery, default 30 s), so identities created by other workers
converge. The API process holds the authoritative gallery and a background
maintenance thread (app/reid/maintenance.py) that decays/prunes exemplars,
recomputes centroids, deletes provisional noise identities, and performs
conservative face-only auto-merges.
7. Data model
Ten tables in app/db/models.py, all on a single SQLite database in WAL mode
with per-connection PRAGMAs (journal_mode=WAL, synchronous=NORMAL,
foreign_keys=ON, busy_timeout=5000 — app/db/database.py). Vectors are
512-d little-endian float32, L2-normalized, stored as BLOBs and
(de)serialized with numpy.frombuffer/tobytes (never pickle — no
deserialization RCE).
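The per-connection PRAGMAs can be applied with the standard library's sqlite3 module as in the sketch below. The connect helper is hypothetical (the project configures its connections in app/db/database.py, whose code is not shown here); the four PRAGMA values are the ones listed above.

```python
import sqlite3


def connect(db_path: str) -> sqlite3.Connection:
    """Open a connection with the PRAGMAs applied; most of them are per-connection settings."""
    conn = sqlite3.connect(db_path, timeout=5.0)
    # WAL is persisted in the database file; the other three must be set on
    # every new connection.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.execute("PRAGMA foreign_keys=ON")
    conn.execute("PRAGMA busy_timeout=5000")
    return conn
```

foreign_keys defaults to off in SQLite, so skipping it on even one connection would silently disable the cascades described in §7.1 for writes made through that connection.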
erDiagram
CAMERA ||--o{ EVENT : "has (cascade delete)"
CAMERA ||--o{ SIGHTING : "captured on (cascade)"
PERSON ||--o{ FACE_EMBEDDING : "enrolled (cascade)"
PERSON |o--o{ EVENT : "best face match (SET NULL)"
IDENTITY ||--o{ SIGHTING : "has (cascade)"
IDENTITY ||--o{ FACE_EXEMPLAR : "has (cascade)"
IDENTITY ||--o{ APPEARANCE_EXEMPLAR : "has (cascade)"
IDENTITY ||--o{ PRESENCE_SEGMENT : "dwell (cascade)"
IDENTITY |o--o{ EVENT : "auto identity (SET NULL / app-code)"
EVENT |o--o{ SIGHTING : "links (SET NULL)"
IDENTITY |o--|| SIGHTING : "rep_sighting (SET NULL, use_alter)"
CAMERA {
int id PK
string rtsp_url
bool enabled
string status
string trigger_classes "nullable per-cam tunables"
}
EVENT {
int id PK
int camera_id FK
datetime ts
string clip_path
string thumb_path
int person_id FK "manual match (SET NULL)"
int identity_id "auto identity (plain INT on SQLite)"
string label
}
PERSON {
int id PK
string name
}
FACE_EMBEDDING {
int id PK
int person_id FK
blob vector "512 f32"
}
IDENTITY {
int id PK
string name
bool is_named
string object_class "class-scoped matching"
float total_seconds
bool is_provisional
blob face_centroid
blob appearance_centroid
}
SIGHTING {
int id PK
int identity_id FK
int camera_id FK
int event_id FK
string match_kind "face|appearance|new"
string thumb_path
}
FACE_EXEMPLAR {
int id PK
int identity_id FK
blob vector
float pose "signed yaw"
}
APPEARANCE_EXEMPLAR {
int id PK
int identity_id FK
blob vector
datetime ts "decay clock"
}
PRESENCE_SEGMENT {
int id PK
int identity_id FK
float seconds
}
FACE_SAMPLE {
int id PK
blob vector "ArcFace"
blob app_vector "OSNet"
string label "named group"
}
The ten tables:
| Table | Role |
|---|---|
| Camera | An RTSP source + its per-camera tunables (nullable → fall back to global Settings). Heartbeat fields (status, last_seen) updated by the worker. |
| Event | One recorded presence: clip path, thumbnail, denormalized manual (person_*) and auto (identity_*) match snapshots, track metadata. |
| Person | A manually enrolled known person (the "People" layer). |
| FaceEmbedding | A 512-d ArcFace vector belonging to a Person; the FAISS FaceIndex is derived from these. |
| Identity | An auto-discovered person/object built online from sightings, with no enrollment. Carries object_class (matching is class-scoped), total_seconds dwell, derived centroids, and is_provisional/is_named flags. |
| Sighting | One identified detection: bbox, scores, match_kind, body-crop thumbnail. |
| FaceExemplar | A representative ArcFace vector for an identity (cap ~8/12), bucketed by signed-yaw pose for the multi-view gallery. |
| AppearanceExemplar | A per-identity OSNet vector with a capture ts for time-decay (cap ~16). |
| PresenceSegment | One continuous appearance of an identity at one camera; summed into Identity.total_seconds (the dwell audit trail). |
| FaceSample | A captured face crop + ArcFace vector (+ optional OSNet app_vector) for unsupervised face grouping, independent of the body-Re-ID identities. |
7.1 Cascade and delete behavior
FK cascades are declared at the ORM level with passive_deletes=True (the DB
enforces them, given foreign_keys=ON):
- Delete a Camera → its Events and Sightings cascade-delete.
- Delete a Person → its FaceEmbeddings cascade-delete; any Event.person_id referencing it is SET NULL (event history is preserved).
- Delete an Identity → its Sightings, FaceExemplars, AppearanceExemplars, and PresenceSegments cascade-delete. Event.identity_id is a denormalized link that is nulled in application code (on SQLite it is a plain INTEGER column materialised by the schema shim, not a real FK; see §7.2). The identities API delete_identity performs a recursive delete that also purges on-disk artifacts: the per-identity crop directory and the linked FaceSample rows + their crop files (FaceSample has no DB-level FK cascade).
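The Person delete rule (CASCADE for embeddings, SET NULL for events) can be reproduced in a throwaway in-memory database. The table and column names below are simplified stand-ins, not the project's actual schema, and the demo uses raw sqlite3 rather than the ORM.

```python
import sqlite3
from typing import List, Optional, Tuple


def demo_person_delete() -> Tuple[int, List[Tuple[int, Optional[int]]]]:
    """Delete a person and report (remaining embeddings, [(event_id, person_id)])."""
    conn = sqlite3.connect(":memory:")
    # Without this PRAGMA SQLite parses the FK clauses but does not enforce them.
    conn.execute("PRAGMA foreign_keys=ON")
    conn.executescript(
        """
        CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE face_embeddings (
            id INTEGER PRIMARY KEY,
            person_id INTEGER NOT NULL REFERENCES persons(id) ON DELETE CASCADE
        );
        CREATE TABLE events (
            id INTEGER PRIMARY KEY,
            person_id INTEGER REFERENCES persons(id) ON DELETE SET NULL
        );
        INSERT INTO persons (id, name) VALUES (1, 'alice');
        INSERT INTO face_embeddings (id, person_id) VALUES (10, 1), (11, 1);
        INSERT INTO events (id, person_id) VALUES (100, 1);
        """
    )
    conn.execute("DELETE FROM persons WHERE id = 1")
    remaining = conn.execute("SELECT COUNT(*) FROM face_embeddings").fetchone()[0]
    events = conn.execute("SELECT id, person_id FROM events").fetchall()
    conn.close()
    return remaining, events
```

The event row survives with a NULL person_id, which is the "event history is preserved" behavior; the embeddings are gone because they have no meaning without their owner.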
7.2 Schema management
There is no migration framework. init_db (app/db/database.py) calls
Base.metadata.create_all and then a set of idempotent shims
(ensure_reid_schema, ensure_camera_schema, ensure_identity_object_schema,
ensure_event_track_schema, ensure_face_pose_schema). These exist because
SQLite cannot ALTER a column with an inline FK onto an existing table, so
events.identity_id and friends are added as plain columns guarded by PRAGMA
table_info checks (no-ops on re-run). The shims run in the API process at
startup, before any worker writes a track-mode event. The
identities.rep_sighting_id ↔ sightings.identity_id FK cycle is broken at
create time with use_alter=True.
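An idempotent column shim of the kind described above can be sketched with PRAGMA table_info. The helper name ensure_column is hypothetical and is not one of the project's ensure_* functions; it only illustrates the check-then-add pattern.

```python
import sqlite3


def ensure_column(conn: sqlite3.Connection, table: str, column: str, ddl_type: str) -> bool:
    """Add a plain column if it is missing. Returns True if added, False if already present.

    table, column and ddl_type are interpolated into SQL, so they must be
    trusted constants from the codebase, never user input.
    """
    # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    conn.commit()
    return True
```

Running the shim twice is harmless: the second call sees the column in table_info and returns without touching the schema.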
8. Frontend and the live surface
The frontend is a vanilla-JS single-page app with no build step
(app/static/: index.html, app.js, identities.js, people.js,
faces.js, CSS, plus a vendored hls.min.js). It is mounted as the last route
in create_app (app/main.py):
The API routers (/api/*) and /health are registered before the mount, so
they take precedence; everything else falls through to static assets, with
index.html served at /.
Live viewing has three modes, all backed by the worker frame slots and segment buffer:
- Low-latency MJPEG grid. Each tile is an <img src="/api/live/{id}/stream?fps=…"> holding one long-lived multipart/x-mixed-replace connection (app/api/live.py). The generator pushes a part only when the frame slot changes and honors a ?fps= cap so the grid can request a lower rate (many tiles) while a focused viewer requests the full live_mjpeg_fps. nginx runs with proxy_buffering off / X-Accel-Buffering: no for per-frame flush.
- Single snapshot. /api/live/{id}/snapshot returns the latest annotated JPEG (or a "no signal" placeholder so the UI never breaks while a worker warms up).
- Live with sound. MJPEG carries no audio, so on demand the SPA requests /api/live/{id}/hls/index.m3u8, which starts an on-demand RTSP→HLS session (app/recording/hls.py) played by hls.js (or Safari native), with strict ^seg\d{5}\.ts$ segment-name validation.
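The segment-name check can be sketched with the pattern quoted above. The function name is hypothetical, and the use of fullmatch is a choice made for this example: with plain match, Python's $ would also accept a name that ends in a newline.

```python
import re

# Pattern taken from the text: "seg", exactly five digits, ".ts".
_SEGMENT_NAME = re.compile(r"^seg\d{5}\.ts$")


def is_valid_hls_segment(name: str) -> bool:
    """Accept only bare segment filenames; anything with a path component fails."""
    return _SEGMENT_NAME.fullmatch(name) is not None
```

Validating the name before joining it to the session directory means a request can never reference a file outside the set the HLS session itself produces.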
History/clip playback streams the recorded mp4 with full HTTP Range support
(app/api/events.py, _iter_file_range) so <video> seeking works, with
os.path.commonpath containment guarding against path traversal out of the data
root.
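The two guards on clip playback (path containment and Range parsing) can be sketched as below. Neither function is the project's code: resolve_inside and parse_range are hypothetical names, and the Range handling covers only a single bytes range.

```python
import os
from typing import Optional, Tuple


def resolve_inside(data_root: str, relative_path: str) -> Optional[str]:
    """Resolve relative_path under data_root; return None if it escapes the root."""
    root = os.path.realpath(data_root)
    candidate = os.path.realpath(os.path.join(root, relative_path))
    # commonpath compares whole path components, so /data-evil is not
    # mistaken for a child of /data the way a string prefix check would.
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate


def parse_range(header: str, file_size: int) -> Optional[Tuple[int, int]]:
    """Parse a single 'bytes=start-end' header into inclusive (start, end), or None."""
    if not header.startswith("bytes=") or "," in header:
        return None
    start_s, sep, end_s = header[len("bytes="):].partition("-")
    if not sep:
        return None
    try:
        if start_s == "":
            # Suffix form 'bytes=-N': the last N bytes of the file.
            length = int(end_s)
            if length <= 0:
                return None
            start, end = max(file_size - length, 0), file_size - 1
        else:
            start = int(start_s)
            end = int(end_s) if end_s else file_size - 1
    except ValueError:
        return None
    end = min(end, file_size - 1)
    if start < 0 or start > end:
        return None
    return start, end
```

A None from parse_range would map to a 416 response (or a full-body 200 when no Range header was sent), and a None from resolve_inside to a 404, so neither failure leaks filesystem details.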
The focused monitor adds operator ergonomics on the client: mouse-wheel and two-finger pinch/pan zoom, double-tap, orientation-aware fullscreen, and the manual record button. The SPA is mobile-first responsive.
9. Security and deployment posture (architectural)
The relevant invariants the architecture depends on:
- Trust boundary at nginx. The app binds 127.0.0.1:8120 only (compose publishes to loopback). nginx terminates TLS, performs cookie-SSO (auth_request), injects a trusted identity header (X-Email), and overrides any client-supplied copy (anti-spoof). require_user (app/auth.py) is fail-closed (auth_required defaults true) and accepts either the SSO header or an optional bearer API key compared in constant time (hmac.compare_digest).
- Hardened container. Runs as non-root uid 1000 with cap_drop: [ALL], no-new-privileges, and a read-only app-code mount; the GPU still works because /dev/nvidia* are world-accessible.
- Co-tenancy safety. A hard mem_limit (6 g) plus oom_score_adj: 800 make a runaway VMS the preferred OOM victim: a cgroup-scoped kill restarts one camera worker rather than letting the VMS push the shared host into a global OOM that would disrupt co-hosted VMs.
- No secrets in the repo. .env.example documents AUTH_REQUIRED and API-key generation; RTSP credentials are masked in every API response (the raw URL stays server-side).
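The fail-closed check in the first invariant can be sketched as a pure function. This is not require_user: the name is_authorized and its parameters are hypothetical, and the sketch assumes the X-Email value has already been set by nginx and cannot be supplied by the client.

```python
import hmac
from typing import Optional


def is_authorized(
    sso_email: Optional[str],
    bearer_token: Optional[str],
    api_key: Optional[str],
    auth_required: bool = True,
) -> bool:
    """Fail closed: with auth required, only a trusted header or a matching key passes."""
    if not auth_required:
        return True
    if sso_email:
        # Trusted only because the proxy overwrites any client-sent copy.
        return True
    if api_key and bearer_token:
        # compare_digest takes time independent of where the inputs differ,
        # so response timing does not reveal how much of the key matched.
        return hmac.compare_digest(bearer_token.encode("utf-8"), api_key.encode("utf-8"))
    return False
```

An unset api_key disables the bearer path entirely instead of matching an empty token, which keeps a missing configuration value from becoming an open door.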
10. Where to start reading the code
| Concern | Entry point |
|---|---|
| App wiring, lifespan, router mounts, SPA | app/main.py |
| Worker lifecycle, spawn, reconcile, frame slots | app/workers/manager.py |
| The three-thread pipeline (decode / loop / drains) | app/workers/camera_worker.py |
| Identity assignment (face → appearance → new) | app/reid/manager.py |
| Feature extraction (ArcFace + OSNet, pose) | app/reid/pipeline.py |
| Derived gallery (FAISS + appearance store) | app/reid/gallery.py |
| Tracker (greedy IoU, dwell timing) | app/reid/tracker.py |
| Rolling segment buffer / clip assembly | app/recording/segmenter.py, clipper.py |
| Data model + cascades | app/db/models.py, app/db/database.py |
| Auth / trust boundary | app/auth.py, app/config.py |
| Live MJPEG / HLS / manual record | app/api/live.py |