Cross-Camera Re-Identification
Design deep-dive for the identity engine under app/reid/. Source of truth is the code; every threshold, gate and behaviour described here maps to a named symbol in app/reid/manager.py, gallery.py, pipeline.py, decay.py, maintenance.py and app/faces/recognizer.py. Where a default is quoted, it is the value in app/config.py (Settings) or MatchConfig.
Re-identification is the part of Iris that turns a stream of disconnected detections into a small, stable library of people and objects that persists across cameras and across days. A person who walks past the front camera at 09:00, the yard camera at 12:00 and the door camera the next morning should resolve to one identity — "Person 14" — not three, and not fourteen.
This document explains how that is done, why the obvious approaches fail, and the exact decision procedure the online matcher runs for every sighting.
1. The problem
Cross-camera identity is hard for reasons that have nothing to do with model accuracy:
- Viewpoint. The same person is seen frontal on one camera, in profile on the next, and from directly behind on a third. A body crop from behind and a body crop from the front of the same person can be less similar than two different people seen from the same angle.
- Clothing across days. Body/appearance re-ID models (OSNet, etc.) key heavily on clothing and silhouette. That is a strong cue within a session and a worthless one across days — the same person in a different coat looks like a stranger, and two strangers in the same uniform look identical.
- Lighting and camera response. Colour and texture statistics shift between a daylight yard camera and an IR-lit corridor.
The naive design — embed every person crop with a body re-ID model, match by cosine, mint a new identity when nothing is close enough — fails in a specific and visible way: duplicate explosion. A person seen from behind produces a body embedding that doesn't match their own frontal embedding, so the matcher creates a new identity. Walk a single person across a camera's field of view and you get a dozen "people". Appearance alone has no anchor that survives a viewpoint change, so it cannot decide that two dissimilar-looking crops are the same person.
The design choice that makes the rest of the system work is to pick a cue that is stable across clothing, viewpoint, lighting and time, and to anchor identity on it.
2. The face anchor
Identity is anchored on the face/head — the only signal that is stable across a change of clothes, a change of viewpoint (within the face's own pose range), a change of lighting, and the passage of days.
Extraction
Face features come from the insightface buffalo_l pack via app/faces/recognizer.py:
- Detect with SCRFD-10G (640×640 input, det_thresh default 0.5), producing a bounding box, a detection score, and 5 facial landmarks (DetectedFace.kps).
- Align and embed with ArcFace (r50), producing a 512-d, L2-normalized embedding. Because vectors are unit-norm, an inner-product index gives cosine similarity directly.
Face detection and ArcFace embedding run once over the whole trigger frame rather than being re-run inside every person crop. IdentityPipeline.extract() (app/reid/pipeline.py) detects all faces, then assigns each face to the smallest person box that contains the face centroid. Choosing the smallest containing box correctly handles overlapping people, where one face can fall inside several boxes. This is cheaper than per-crop re-detection and keeps a single face model resident on the GPU.
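The smallest-containing-box rule can be sketched as a short pure function. This is a hedged illustration, not IdentityPipeline.extract() itself: the (x1, y1, x2, y2) box layout and the assign_faces_to_people helper are assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical box layout: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def assign_faces_to_people(face_boxes: Sequence[Box],
                           person_boxes: Sequence[Box]) -> List[Optional[int]]:
    """Return, per face, the index of the smallest person box containing the face
    centroid, or None when no person box contains it (hypothetical helper)."""
    owners: List[Optional[int]] = []
    for fx1, fy1, fx2, fy2 in face_boxes:
        cx, cy = (fx1 + fx2) / 2.0, (fy1 + fy2) / 2.0
        best_idx: Optional[int] = None
        best_area = float("inf")
        for idx, pbox in enumerate(person_boxes):
            px1, py1, px2, py2 = pbox
            contains = px1 <= cx <= px2 and py1 <= cy <= py2
            if contains and box_area(pbox) < best_area:  # smallest containing box wins
                best_idx, best_area = idx, box_area(pbox)
        owners.append(best_idx)
    return owners


# Person 1 stands behind person 0: their face centroid lies inside both boxes.
people = [(0, 0, 400, 600), (100, 50, 220, 400)]
faces = [(140, 60, 180, 110), (500, 500, 520, 520)]
print(assign_faces_to_people(faces, people))  # → [1, None]
```

The second face falls outside every person box, so it is left unassigned rather than forced onto the nearest person.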
The "decent face" quality gate
Not every detected face is worth trusting. A tiny, blurry, or extreme-profile face yields an ArcFace vector that is garbage — close to no one in particular, or spuriously close to the wrong person. Storing it pollutes the gallery forever.
So each assigned face carries a pose-aware quality score, face_quality = det_score × frontalness: the detector's confidence scaled by how frontal the face is.
frontalness() (pipeline.py) is a cheap landmark-geometry score: a frontal face has the nose horizontally centred between the eyes, while a profile pushes it toward one side, driving the score toward 0. A face only counts as decent when face_quality ≥ face_exemplar_min_quality (default 0.35).
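A minimal sketch of the score, assuming a simple linear frontalness: the nose offset from the eye midpoint, normalised by the eye span, mapped to 1.0 when centred and 0.0 when the nose reaches an eye. The exact formula inside frontalness() is not given here, and the assumed landmark order (eyes first, nose third, as in insightface's 5-point convention) is also an assumption.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]

FACE_EXEMPLAR_MIN_QUALITY = 0.35  # face_exemplar_min_quality default


def frontalness(kps: Sequence[Point]) -> float:
    """Hypothetical frontalness in [0, 1] from 5 landmarks (eyes at kps[0], kps[1]; nose at kps[2]).

    1.0 = nose exactly centred between the eyes; 0.0 = nose level with an eye or beyond.
    """
    (lx, _), (rx, _), (nx, _) = kps[0], kps[1], kps[2]
    eye_span = abs(rx - lx)
    if eye_span < 1e-6:  # degenerate landmarks: treat as an extreme profile
        return 0.0
    offset = (nx - (lx + rx) / 2.0) / eye_span  # roughly in [-0.5, +0.5]
    return max(0.0, min(1.0, 1.0 - 2.0 * abs(offset)))


def face_quality(det_score: float, kps: Sequence[Point]) -> float:
    return det_score * frontalness(kps)


def is_decent(det_score: float, kps: Sequence[Point]) -> bool:
    return face_quality(det_score, kps) >= FACE_EXEMPLAR_MIN_QUALITY


frontal = [(40, 50), (80, 50), (60, 70), (45, 90), (75, 90)]
profile = [(40, 50), (80, 50), (76, 70), (60, 90), (78, 90)]
print(round(face_quality(0.9, frontal), 3), is_decent(0.9, frontal))  # → 0.9 True
print(round(face_quality(0.9, profile), 3), is_decent(0.9, profile))  # → 0.18 False
```

Multiplying the two factors means a confident detection cannot rescue an extreme profile, and a perfectly frontal but low-confidence detection is penalised too.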
This single gate is load-bearing in two places:
- A new identity is created from a face only if the face is decent (_quality_ok_for_new).
- A face is stored as a gallery exemplar only if it is decent (_maybe_add_face_exemplar).
Why a faceless view never mints an identity
This is the rule that defeats duplicate explosion, controlled by the require_face_for_new_person flag in MatchConfig.
In _quality_ok_for_new():
- A decent face qualifies a new identity (faces are the strong, rare evidence).
- For a person with no decent face (a back, a far side view, a head crop with no landmarks), require_face_for_new_person forces a False: such a view is never allowed to create a new identity.
A faceless person crop has exactly two fates: it attaches to an existing
identity by appearance within a session (Section 4), or it is dropped
(match_kind="dropped", no DB row written). It can never spawn a person. That is
precisely what stops one person seen from behind from fragmenting into dozens of
duplicates — a back-of-head carries no identity, so the system refuses to invent one
from it.
Non-person objects (cars, animals) have no face, so they are held to a different,
appearance-based quality bar rather than the face requirement
(require_quality_for_new, min_app_box_area_frac, min_crop_quality_for_new).
A faceless person that does survive on first sighting is marked
is_provisional=True and is deleted by maintenance if it never accrues a second
sighting or a face (Section 6).
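The decision order above can be sketched as a pure function. This is a hedged illustration, not the real _quality_ok_for_new(): the field names come from this document, but the dataclass, the function signature and the way an object crop is scored (an appearance vector, a box-area fraction and a sharpness score) are assumptions. The two object thresholds have no documented default here, so the example passes illustrative values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NewIdentityGate:
    """Hypothetical stand-in for the MatchConfig fields used by the new-identity gate."""
    min_app_box_area_frac: float        # no default quoted in the text; caller must choose
    min_crop_quality_for_new: float     # no default quoted in the text; caller must choose
    require_face_for_new_person: bool = True
    require_quality_for_new: bool = True
    face_exemplar_min_quality: float = 0.35


def quality_ok_for_new(gate: NewIdentityGate, object_class: str,
                       face_quality: Optional[float], has_appearance: bool,
                       box_area_frac: float, crop_quality: float) -> bool:
    # 1. A decent face is strong, rare evidence: it always qualifies.
    if face_quality is not None and face_quality >= gate.face_exemplar_min_quality:
        return True
    # 2. A person without a decent face may never mint an identity.
    if object_class == "person" and gate.require_face_for_new_person:
        return False
    # 3. Everything else is held to an appearance-based bar: vector + size + sharpness.
    if not gate.require_quality_for_new:
        return True
    return (has_appearance
            and box_area_frac >= gate.min_app_box_area_frac
            and crop_quality >= gate.min_crop_quality_for_new)


gate = NewIdentityGate(min_app_box_area_frac=0.01, min_crop_quality_for_new=0.3)  # illustrative
print(quality_ok_for_new(gate, "person", 0.8, True, 0.05, 0.9))   # → True (decent face)
print(quality_ok_for_new(gate, "person", None, True, 0.05, 0.9))  # → False (back of head)
print(quality_ok_for_new(gate, "car", None, True, 0.05, 0.9))     # → True (large, sharp crop)
print(quality_ok_for_new(gate, "car", None, True, 0.001, 0.9))    # → False (too small)
```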
3. The multi-view, pose-bucketed face gallery
A frontal-only gallery still fails on viewpoint: a profile query against only frontal exemplars scores low even for the right person. ArcFace is robust to some pose, not arbitrary pose. The fix is to make the gallery itself angle-invariant by storing exemplars across the pose range.
Signed yaw → pose buckets
face_pose() (pipeline.py) computes a signed yaw in roughly [-0.5, +0.5]
from the 5 landmarks (nose offset relative to the eye span): 0 ≈ frontal, negative
and positive = head turned to each side. It is persisted on every FaceExemplar.pose.
_face_pose_bucket() (manager.py) quantizes it into three buckets:
| Bucket | Condition | Meaning |
|---|---|---|
| -1 | yaw ≤ -0.12 | turned one way |
| 0 | -0.12 < yaw < 0.12 | frontal |
| +1 | yaw ≥ 0.12 | turned the other way |
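A minimal sketch of the yaw estimate and its quantisation, assuming the yaw is the nose's x-offset from the eye midpoint divided by the eye span, as described above, with the eyes at kps[0], kps[1] and the nose at kps[2]. The function names mirror face_pose() and _face_pose_bucket() but are illustrative, not the real implementations.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
POSE_BUCKET_EDGE = 0.12  # boundary from the bucket table


def face_pose(kps: Sequence[Point]) -> float:
    """Hypothetical signed yaw: nose x-offset from the eye midpoint over the eye span.

    About 0 for a frontal face, roughly within [-0.5, +0.5]; the sign gives the turn direction.
    """
    (lx, _), (rx, _), (nx, _) = kps[0], kps[1], kps[2]
    span = abs(rx - lx)
    if span < 1e-6:  # degenerate landmarks: report frontal rather than divide by zero (assumption)
        return 0.0
    return (nx - (lx + rx) / 2.0) / span


def face_pose_bucket(yaw: float) -> int:
    """Quantise signed yaw into -1 (turned one way), 0 (frontal), +1 (turned the other way)."""
    if yaw <= -POSE_BUCKET_EDGE:
        return -1
    if yaw >= POSE_BUCKET_EDGE:
        return 1
    return 0


frontal = [(40, 50), (80, 50), (60, 70), (45, 90), (75, 90)]
turned = [(40, 50), (80, 50), (76, 70), (60, 90), (78, 90)]
print(face_pose(frontal), face_pose_bucket(face_pose(frontal)))  # → 0.0 0
print(face_pose(turned), face_pose_bucket(face_pose(turned)))    # → 0.4 1
print([face_pose_bucket(y) for y in (-0.3, -0.12, 0.0, 0.11, 0.12)])  # → [-1, -1, 0, 0, 1]
```

Note the inclusive boundaries: a yaw of exactly ±0.12 already belongs to a side bucket, matching the ≤ / ≥ conditions in the table.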
Pose-diverse acceptance
When a new decent face arrives for an existing identity, the old instinct — "skip it,
we already have a face for this person" — is wrong: it keeps the gallery
frontal-only. Instead _maybe_add_face_exemplar() only rejects a face that is a
true near-duplicate of one already stored (cosine > face_exemplar_hi, default
0.90). A face that scores low against existing exemplars is not noise — it is a
new pose of the same tracked person, and it is exactly the left/right view the
gallery needs to recognise this person from that angle later. So it is kept.
Pose-aware eviction
The per-identity cap is max_face_exemplars (default 12, sized for frontal + left + right). When full, eviction must not starve a pose bucket; otherwise the gallery drifts back to frontal-only under churn. So eviction:
- counts exemplars per pose bucket,
- picks the most-represented bucket (over),
- deletes the lowest-det_score exemplar within that bucket.
The under-represented profiles are protected; the redundant frontals are trimmed. The net effect is angle-invariance at the gallery level: a profile query lands near the stored profile exemplar of the same person, not just their frontals.
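A sketch of the eviction policy, assuming a minimal exemplar record with a pose bucket and a detection score. FaceExemplarRow and enforce_cap are hypothetical names, and ties between equally full buckets are broken arbitrarily here because the real tie-break is not documented.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class FaceExemplarRow:       # hypothetical minimal stand-in for a FaceExemplar row
    exemplar_id: int
    pose_bucket: int         # -1, 0 or +1
    det_score: float


def pick_eviction_victim(exemplars: List[FaceExemplarRow]) -> FaceExemplarRow:
    counts = Counter(e.pose_bucket for e in exemplars)
    over = max(counts, key=lambda b: counts[b])          # most-represented bucket
    in_bucket = [e for e in exemplars if e.pose_bucket == over]
    return min(in_bucket, key=lambda e: e.det_score)     # weakest exemplar in that bucket


def enforce_cap(exemplars: List[FaceExemplarRow], cap: int = 12) -> List[FaceExemplarRow]:
    kept = list(exemplars)
    while len(kept) > cap:
        kept.remove(pick_eviction_victim(kept))
    return kept


# 9 frontals, 2 left profiles, 1 right profile (the right profile has the lowest score overall).
gallery = [FaceExemplarRow(i, 0, 0.80 + 0.01 * i) for i in range(9)]
gallery += [FaceExemplarRow(9, -1, 0.55), FaceExemplarRow(10, -1, 0.60),
            FaceExemplarRow(11, 1, 0.50)]
print(pick_eviction_victim(gallery).exemplar_id)  # → 0 (weakest frontal; the 0.50 profile survives)
```

A global "drop the lowest score" rule would have evicted the only right-profile exemplar; bucket-first eviction trims a redundant frontal instead.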
The gallery (IdentityGallery, gallery.py) holds every face exemplar in a FAISS
IndexFlatIP (inner product = cosine on unit vectors). best_face_per_identity()
collapses the raw exemplar hits to the best hit per identity, so the margin test
(Section 5) compares the two best distinct people, not two exemplars of one person.
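Why the collapse matters for the margin can be shown with raw (identity_id, cosine) hits. This sketch is pure Python rather than FAISS, and its signature is an assumption; it only illustrates the per-identity reduction.

```python
from typing import Dict, Iterable, List, Tuple


def best_face_per_identity(hits: Iterable[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Collapse raw (identity_id, cosine) exemplar hits to the best score per identity,
    sorted best-first (hypothetical signature)."""
    best: Dict[int, float] = {}
    for identity_id, score in hits:
        if score > best.get(identity_id, float("-inf")):
            best[identity_id] = score
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


# Raw hits: three exemplars of identity 7, two of identity 3.
hits = [(7, 0.58), (7, 0.51), (3, 0.47), (7, 0.49), (3, 0.40)]
per_id = best_face_per_identity(hits)
print(per_id)                                   # → [(7, 0.58), (3, 0.47)]
print(round(per_id[0][1] - per_id[1][1], 2))    # → 0.11 (margin between two distinct people)
```

Without the collapse, the top two raw hits would both belong to identity 7, and the "margin" of 0.07 would measure one person against themselves.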
4. Appearance as a within-session helper only
Body appearance (OSNet) is deliberately demoted to a within-session, time-decayed assist — never a cross-day identity cue.
- Model. OSNet (reid_model default osnet_ain_x1_0_msmt17.onnx, the AIN x1.0 domain-generalizable variant, chosen for the best cross-camera/cross-angle behaviour; osnet_x0_25_msmt17.onnx is the documented lighter CPU fallback), exported to ONNX and run under onnxruntime-gpu. Input 128×256, ImageNet-normalized; output a 512-d, L2-normalized vector (app/reid/embedder.py).
- Always computed. A back-turned or masked person still yields a body vector even when face_vec is None. That is the point: appearance is what lets a faceless crop attach to a person established by their face.
Time decay
app/reid/decay.py weights every appearance match by the age Δt of the exemplar it matched: w(Δt) = exp(-Δt / τ), with τ = app_decay_tau_seconds (default 43 200 s = 12 h).
A candidate identity's effective appearance score is max_i(cosine_i × w(Δt_i)): a single recent, similar outfit is enough to link, while a 12-hour-old outfit is worth ~e^{-1} (≈ 0.37) of its cosine and a 2-day-old one only ~e^{-4} (≈ 0.02), essentially nothing. This is the literal encoding of "the outfit is stale by tomorrow".
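A minimal sketch of the decay weight and the max-over-exemplars score. The (cosine, timestamp) input shape and the clamping of negative ages are assumptions for illustration, not the decay.py API.

```python
import math
from typing import Iterable, Tuple

APP_DECAY_TAU_SECONDS = 43_200.0  # app_decay_tau_seconds default (12 h)


def decay_weight(age_seconds: float, tau: float = APP_DECAY_TAU_SECONDS) -> float:
    """w(Δt) = exp(-Δt / τ); negative ages (clock skew) are clamped to 0 (an assumption)."""
    return math.exp(-max(0.0, age_seconds) / tau)


def effective_appearance_score(matches: Iterable[Tuple[float, float]], now: float) -> float:
    """matches: (cosine, exemplar_timestamp) pairs for one candidate identity."""
    return max((cos * decay_weight(now - ts) for cos, ts in matches), default=0.0)


now = 1_000_000.0
matches = [(0.70, now - 60.0),        # similar outfit seen a minute ago
           (0.90, now - 43_200.0)]    # very similar outfit, but 12 h old
print(round(effective_appearance_score(matches, now), 3))  # → 0.699
print(round(decay_weight(43_200.0), 4), round(decay_weight(2 * 86_400.0), 4))  # → 0.3679 0.0183
```

Taking the maximum rather than an average means one fresh, close exemplar decides the score; stale exemplars neither help nor drag it down.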
Time/space window gate
appearance_candidates() only considers an identity whose freshest appearance
exemplar is still within app_window_seconds (default 600 s) of the query — the
hard window is translated into a min decay-weight floor
exp(-window / TAU). This bounds "teleportation": an identity last seen as a body
hours ago is not an appearance candidate now. Faces carry no such window — they are
time-stable and are never decayed.
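The window-to-floor translation can be sketched as follows: with the defaults, the floor is exp(-600 / 43 200) ≈ 0.986, so any identity whose freshest appearance exemplar has decayed below that weight is simply not a candidate. The dictionary input shape and helper names are hypothetical, and class scoping is included because candidates are class-scoped (Section 5).

```python
import math
from typing import Dict, List, Tuple

APP_WINDOW_SECONDS = 600.0
APP_DECAY_TAU_SECONDS = 43_200.0
# The hard window expressed as a minimum decay weight: exp(-600 / 43200) ≈ 0.986.
MIN_WINDOW_WEIGHT = math.exp(-APP_WINDOW_SECONDS / APP_DECAY_TAU_SECONDS)


def in_window(freshest_ts: float, now: float) -> bool:
    weight = math.exp(-max(0.0, now - freshest_ts) / APP_DECAY_TAU_SECONDS)
    return weight >= MIN_WINDOW_WEIGHT


def appearance_candidate_ids(freshest: Dict[int, Tuple[str, float]],
                             object_class: str, now: float) -> List[int]:
    """freshest: identity_id -> (object_class, timestamp of freshest appearance exemplar)."""
    return sorted(i for i, (cls, ts) in freshest.items()
                  if cls == object_class and in_window(ts, now))


now = 1_000_000.0
freshest = {1: ("person", now - 30.0), 2: ("person", now - 599.0),
            3: ("person", now - 3_600.0), 4: ("car", now - 10.0)}
print(appearance_candidate_ids(freshest, "person", now))  # → [1, 2]
```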
5. The matching algorithm: assign()
IdentityManager.assign() (manager.py) decides, per sighting, MATCH (which
identity) vs NEW vs DROP. It is the single source of truth; camera-worker subprocesses
each hold a derived gallery and write through to the shared SQLite DB so all workers
converge. The order is strongest-evidence-first.
```mermaid
flowchart TD
A["SightingFeature<br/>(box, face_vec?, appearance_vec?,<br/>face_quality, pose, color_hist, object_class)"] --> B{"Sticky / hysteresis?<br/>IoU >= sticky_iou & same class<br/>within sticky_seconds"}
B -- yes --> STK["MATCH: keep last identity<br/>(match_kind = sticky)<br/>skip exemplar churn"]
B -- no --> C{"Face present?"}
C -- yes --> D["best_face_per_identity()<br/>best, second = top-2 distinct"]
D --> E{"best >= FACE_STRONG<br/>(0.55)?"}
E -- yes --> FM["MATCH by face<br/>(authoritative; ignore margin<br/>& appearance)"]
E -- no --> F{"best >= FACE_MATCH (0.42)<br/>AND margin >= 0.06?"}
F -- yes --> FM
F -- no --> G["set veto_identity if<br/>best >= FACE_MATCH<br/>(forbid appearance->other id)"]
C -- no --> H["appearance step"]
G --> H
H --> I{"appearance_vec present<br/>AND candidates in window?"}
I -- no --> N
I -- yes --> J["appearance_candidates()<br/>(decay-weighted, class-scoped,<br/>in time window)"]
J --> K{"Face contradiction?<br/>veto_identity set &<br/>best != veto"}
K -- yes --> K2["drop to vetoed candidate<br/>or fail appearance"]
K -- no --> L
K2 --> L{"Borderline-face corroborates<br/>best & best >= APP_GATE (0.50)<br/>& margin >= 0.05?"}
L -- yes --> AM["MATCH by appearance"]
L -- no --> M{"Colour gate ok (non-person)<br/>AND best >= bar<br/>(same-cam 0.62 / cross-cam 0.66)<br/>AND margin >= 0.05?"}
M -- yes --> AM
M -- no --> N
N{"Quality OK for NEW?<br/>person: decent face REQUIRED<br/>else: app + size + sharpness"}
N -- no --> DROP["DROP (match_kind = dropped)<br/>no identity, no row"]
N -- yes --> O{"New-identity rate<br/>below new_identity_rate_per_min?"}
O -- no --> DROP
O -- yes --> NEW["CREATE identity 'Class N'<br/>(provisional if person w/o face)"]
FM --> FIN["finalize: write Sighting,<br/>update exemplars, counters,<br/>refresh sticky"]
AM --> FIN
NEW --> FIN
STK --> FIN
```
Step 0: Sticky / hysteresis (_sticky_lookup)
A continuous track on one camera should not be re-evaluated frame by frame (that
risks an identity flicker and burns GPU). If the current box overlaps a recent box
on the same camera and same object class with IoU ≥ sticky_iou (0.30)
within sticky_seconds (2.0 s), the sighting inherits that identity directly
(match_kind="sticky", persisted as appearance). Exemplar updates are skipped on
this hot path — the streak already enrolled. The same-class check stops a car
inheriting a person's identity just because their boxes briefly overlap.
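The sticky check is an IoU lookup against a short-lived per-camera cache. The sketch below assumes the cache is a list of recent (identity, class, box, timestamp) entries for one camera; StickyEntry and sticky_lookup are illustrative names, not the real _sticky_lookup signature.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
STICKY_IOU = 0.30
STICKY_SECONDS = 2.0


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class StickyEntry:  # hypothetical cache entry for ONE camera
    identity_id: int
    object_class: str
    box: Box
    ts: float


def sticky_lookup(cache: List[StickyEntry], box: Box, object_class: str,
                  now: float) -> Optional[int]:
    best_id, best_iou = None, STICKY_IOU
    for e in cache:
        if e.object_class != object_class or now - e.ts > STICKY_SECONDS:
            continue  # other classes never inherit; stale entries have expired
        overlap = iou(e.box, box)
        if overlap >= best_iou:
            best_id, best_iou = e.identity_id, overlap
    return best_id


now = 100.0
cache = [StickyEntry(14, "person", (100, 100, 200, 300), now - 0.5),
         StickyEntry(3, "car", (110, 110, 210, 310), now - 0.5)]
print(sticky_lookup(cache, (105, 105, 205, 305), "person", now))        # → 14
print(sticky_lookup(cache, (105, 105, 205, 305), "person", now + 5.0))  # → None (expired)
```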
Step 1: Face (_decide_by_face)
Query the FAISS face index; take the best hit per distinct identity and the second-best for the margin.
- best ≥ face_strong (0.55): authoritative. A strong face is conclusive even among look-alikes; appearance is ignored, and so is a thin margin.
- best ≥ face_match (0.42) AND margin ≥ match_margin_face (0.06): match. The best-minus-second-best margin rejects the ambiguous case where two different people both score moderately: if the top two are close, the face is not discriminative enough, and the matcher falls through rather than guess.
- Otherwise, fall through to appearance. If the best face is still ≥ face_match, it sets a veto identity (_face_veto_identity): appearance is then forbidden from assigning this sighting to a different identity.
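The three face outcomes can be sketched over the per-identity ranking. Returning a (match, veto) pair and treating a missing runner-up as a score of 0.0 are assumptions for this illustration, not the real _decide_by_face contract.

```python
from typing import List, Optional, Tuple

FACE_STRONG = 0.55
FACE_MATCH = 0.42
MATCH_MARGIN_FACE = 0.06


def decide_by_face(per_identity: List[Tuple[int, float]]) -> Tuple[Optional[int], Optional[int]]:
    """per_identity: best score per distinct identity, sorted best-first.

    Returns (matched_identity, veto_identity); at most one is not None.
    """
    if not per_identity:
        return None, None
    best_id, best = per_identity[0]
    second = per_identity[1][1] if len(per_identity) > 1 else 0.0  # assumption when alone
    if best >= FACE_STRONG:
        return best_id, None                  # authoritative: margin ignored
    if best >= FACE_MATCH and best - second >= MATCH_MARGIN_FACE:
        return best_id, None                  # clear winner among moderate scores
    if best >= FACE_MATCH:
        return None, best_id                  # ambiguous: forbid appearance -> someone else
    return None, None                         # too weak: appearance decides alone


print(decide_by_face([(7, 0.61), (3, 0.59)]))  # → (7, None)
print(decide_by_face([(7, 0.48), (3, 0.30)]))  # → (7, None)
print(decide_by_face([(7, 0.48), (3, 0.45)]))  # → (None, 7)
print(decide_by_face([(7, 0.38)]))             # → (None, None)
```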
Step 2: Appearance (_decide_by_appearance)
Only reached when the face was absent or inconclusive.
- Class scoping. appearance_candidates() only returns identities of the same object_class: a car only matches cars, a person only persons. Class is enforced in the gallery, so different object types can never merge into one identity.
- Face-contradiction veto. If a veto identity is set and the top appearance candidate is a different identity, the match is refused; the matcher will only accept the vetoed identity itself (the one the borderline face pointed at).
- Borderline-face fusion (_face_corroborator). If a face in [face_reject_new (0.32), face_match (0.42)) weakly corroborates the same candidate the appearance picked, the bar is lowered from the full app_match to app_gate (0.50): weak face plus decent appearance jointly clear a bar neither clears alone.
- Colour gate (non-person). For objects, a hue-histogram intersection below color_gate (0.35) forbids the match, so a red car never becomes a blue car (app/reid/attributes.py). People are exempt (clothing colour is not identity).
- Same- vs cross-camera bar. A pure appearance link must clear app_match (0.62) on the same camera, or the stricter app_match_cross (0.66) when the candidate has only ever been seen on other cameras: cross-camera appearance links are held to a higher standard. Both also require margin ≥ match_margin_app (0.05).
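The appearance checks can be combined into one sketch. Everything structural here is an assumption: the AppCandidate shape, the inputs arriving already class-scoped, window-gated and ranked one-per-identity, and the handling of a veto by simply restricting the list to the vetoed identity (how the real code computes the margin in that case is not documented).

```python
from dataclasses import dataclass
from typing import List, Optional

APP_MATCH, APP_MATCH_CROSS, APP_GATE = 0.62, 0.66, 0.50
MATCH_MARGIN_APP, COLOR_GATE = 0.05, 0.35


@dataclass
class AppCandidate:                 # hypothetical shape of one appearance candidate
    identity_id: int
    score: float                    # decay-weighted appearance score
    seen_on_this_camera: bool       # False = only ever seen on other cameras
    color_sim: float = 1.0          # hue-histogram intersection with the query


def decide_by_appearance(ranked: List[AppCandidate], is_person: bool,
                         veto_id: Optional[int] = None,
                         corroborated_id: Optional[int] = None) -> Optional[int]:
    """ranked: one entry per identity, class-scoped and window-gated, best score first."""
    if veto_id is not None:
        # The face pointed at veto_id: no other identity may be chosen.
        ranked = [c for c in ranked if c.identity_id == veto_id]
    if not ranked:
        return None
    best = ranked[0]
    runner_up = ranked[1].score if len(ranked) > 1 else 0.0
    if best.score - runner_up < MATCH_MARGIN_APP:
        return None                 # two candidates too close: decline to guess
    if not is_person and best.color_sim < COLOR_GATE:
        return None                 # a red car never becomes a blue car
    if corroborated_id == best.identity_id:
        bar = APP_GATE              # weak face + decent appearance
    elif best.seen_on_this_camera:
        bar = APP_MATCH
    else:
        bar = APP_MATCH_CROSS       # cross-camera links face a stricter bar
    return best.identity_id if best.score >= bar else None


same_cam = [AppCandidate(14, 0.64, True), AppCandidate(9, 0.55, True)]
cross_cam = [AppCandidate(14, 0.64, False), AppCandidate(9, 0.55, False)]
print(decide_by_appearance(same_cam, is_person=True))   # → 14
print(decide_by_appearance(cross_cam, is_person=True))  # → None (0.64 < 0.66)
```

The same 0.64 score links on the camera where the identity was seen before but not across cameras, which is exactly the asymmetry the two bars encode.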
Step 3: New identity (_create_new)
Reached only when neither face nor appearance produced a match. Two gates stand in front of creation:
- Quality gate (_quality_ok_for_new): a person needs a decent face; otherwise the crop is dropped, not promoted (Section 2). Objects need a sufficiently large, sharp crop with an appearance vector.
- New-identity rate limit (_rate_limit_ok): at most new_identity_rate_per_min (default 30) new identities per camera per rolling minute. A burst of unmatched crops (a crowd, a glitch) cannot flood the library; the excess is dropped.
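A per-camera rolling-minute limit is naturally a sliding window of creation timestamps. This is a hypothetical sketch; the real _rate_limit_ok may track time differently, and denied attempts are deliberately not recorded here.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque

NEW_IDENTITY_RATE_PER_MIN = 30


class NewIdentityRateLimiter:
    """Hypothetical per-camera sliding-window limiter for new-identity creation."""

    def __init__(self, limit: int = NEW_IDENTITY_RATE_PER_MIN, window_s: float = 60.0):
        self.limit = limit
        self.window_s = window_s
        self._events: DefaultDict[str, Deque[float]] = defaultdict(deque)  # camera -> timestamps

    def allow(self, camera_id: str, now: float) -> bool:
        events = self._events[camera_id]
        while events and now - events[0] >= self.window_s:
            events.popleft()            # forget creations older than the rolling minute
        if len(events) >= self.limit:
            return False                # burst: drop the excess
        events.append(now)
        return True


rl = NewIdentityRateLimiter()
allowed = [rl.allow("yard", float(t)) for t in range(35)]  # one unmatched crop per second
print(sum(allowed), rl.allow("door", 34.0), rl.allow("yard", 60.0))  # → 30 True True
```

The limit is per camera, so a crowd on the yard camera cannot starve identity creation on the door camera.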
On success an Identity is created, auto-named "Person N" / "Car N" / "Dog N" from its class and id, registered in the gallery, and seeded with its colour histogram. A non-person object, or a person with a face, is created as a regular identity; a faceless person is created provisional and stays on probation until it graduates (second sighting or first face, see Finalize) or maintenance deletes it (Section 6).
Finalize (_finalize)
For every accepted sighting (match or new): write the Sighting row (bbox, scores,
match_kind, thumb path, optional event_id); conditionally add a face exemplar
(decent-quality, pose-bucketed) and an appearance exemplar (size-gated, timestamped);
bump num_sightings, first/last_seen, rep_sighting_id; graduate the identity out
of provisional on its 2nd sighting or first face; merge in visual attributes (colour
once; vehicle make/body-type keeping the highest-confidence value); and refresh the
sticky cache. The caller commits, so a whole trigger-frame batch persists atomically.
Per-frame work bound
The matcher cannot stall the camera loop on a crowd. The worker
(camera_worker._reidentify) selects which tracks are due this frame —
unidentified tracks first (fast cadence), already-identified tracks on the slower
confident cadence — then caps the number actually embedded at max_reid_per_frame
(default 4). Identity is sticky, so a track that waits a frame keeps its identity
meanwhile.
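The selection step can be sketched as "due unidentified tracks first, then due identified tracks, then truncate". The Track shape, the two cadence parameters and the oldest-first ordering within each group are all assumptions; the text names only the two cadences and the max_reid_per_frame cap.

```python
from dataclasses import dataclass
from typing import List

MAX_REID_PER_FRAME = 4


@dataclass
class Track:                 # hypothetical tracker state
    track_id: int
    identified: bool
    last_reid_ts: float


def select_tracks_for_reid(tracks: List[Track], now: float, fast_interval_s: float,
                           confident_interval_s: float,
                           cap: int = MAX_REID_PER_FRAME) -> List[Track]:
    due_new = [t for t in tracks
               if not t.identified and now - t.last_reid_ts >= fast_interval_s]
    due_known = [t for t in tracks
                 if t.identified and now - t.last_reid_ts >= confident_interval_s]
    # Oldest first within each group so no track starves (ordering is an assumption).
    due_new.sort(key=lambda t: t.last_reid_ts)
    due_known.sort(key=lambda t: t.last_reid_ts)
    return (due_new + due_known)[:cap]  # unidentified tracks get priority


tracks = [Track(1, True, 0.0), Track(2, False, 9.5), Track(3, False, 8.0),
          Track(4, True, 9.9), Track(5, False, 9.0), Track(6, False, 7.0)]
picked = select_tracks_for_reid(tracks, now=10.0, fast_interval_s=0.5, confident_interval_s=5.0)
print([t.track_id for t in picked])  # → [6, 3, 5, 2] (track 1 waits and keeps its identity)
```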
6. The maintenance pass
A daemon thread (app/reid/maintenance.py, every reid_maintenance_interval_seconds,
default 120 s) runs run_once() to keep the library small, accurate and cheap.
One pass, in order:
- Provisional cleanup. An un-named identity with no face evidence and ≤ 1 sighting, older than provisional_grace_seconds (600 s), is deleted: detector noise, not a person.
- Conservative face-only auto-merge. Two un-named identities whose face centroids have cosine ≥ face_merge_threshold (0.60), and which were never seen on different cameras at the same instant (_temporally_conflicting, ±5 s: you can't be in two places at once), are merged, repairing over-segmentation. The lower id survives, so "Person N" stays stable. Appearance never merges (clothing is shared and ambiguous), and is_named identities are frozen.
- Per-identity prune + recompute.
  - Appearance: drop exemplars whose decay weight has fallen below the prune floor (exp(-2)) or that exceed the hard age cap (default 7 days); then cap at max_app_exemplars (16) by combined quality × decay.
  - Faces: never decayed; cap at max_face_exemplars (evict lowest det_score).
  - Recompute the face_centroid (mean of face exemplars) and appearance_centroid (decay-weighted mean); refresh rep_sighting_id to the best face-bearing, high-score, recent thumbnail; keep num_sightings and first/last_seen honest.
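The auto-merge eligibility test can be sketched as follows, assuming each identity is summarised by its face centroid and its (camera_id, timestamp) sightings. IdentitySummary, can_auto_merge and survivor_id are hypothetical names; the centroid is re-normalised inside the cosine because a mean of unit vectors is generally not unit-length.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

FACE_MERGE_THRESHOLD = 0.60
CONFLICT_WINDOW_S = 5.0


@dataclass
class IdentitySummary:                       # hypothetical per-identity summary
    identity_id: int
    is_named: bool
    face_centroid: Optional[List[float]]     # mean of face exemplars (not unit-norm)
    sightings: List[Tuple[str, float]] = field(default_factory=list)  # (camera_id, ts)


def cosine(a: List[float], b: List[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def temporally_conflicting(a: IdentitySummary, b: IdentitySummary,
                           window_s: float = CONFLICT_WINDOW_S) -> bool:
    """True if a and b were seen on different cameras within window_s of each other."""
    return any(cam_a != cam_b and abs(ts_a - ts_b) <= window_s
               for cam_a, ts_a in a.sightings for cam_b, ts_b in b.sightings)


def can_auto_merge(a: IdentitySummary, b: IdentitySummary) -> bool:
    if a.is_named or b.is_named:
        return False                         # operator-named identities are frozen
    if a.face_centroid is None or b.face_centroid is None:
        return False                         # appearance alone never merges
    if cosine(a.face_centroid, b.face_centroid) < FACE_MERGE_THRESHOLD:
        return False
    return not temporally_conflicting(a, b)  # nobody is in two places at once


def survivor_id(a: IdentitySummary, b: IdentitySummary) -> int:
    return min(a.identity_id, b.identity_id)  # lower id survives, so "Person N" stays stable


p14 = IdentitySummary(14, False, [0.90, 0.10, 0.00], [("front", 100.0), ("yard", 400.0)])
p31 = IdentitySummary(31, False, [0.85, 0.20, 0.05], [("door", 900.0)])
p40 = IdentitySummary(40, False, [0.88, 0.12, 0.00], [("door", 402.0)])
print(can_auto_merge(p14, p31), survivor_id(p14, p31))  # → True 14
print(can_auto_merge(p14, p40))                         # → False (yard vs door, 2 s apart)
```

The pairwise sighting scan is quadratic, which is acceptable for a sketch; a production check would more likely sort sightings by time.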
After a pass, the API-process gallery is asked to reload so operator UIs and
in-process matching see the compacted state. Operators can also merge and
split identities by hand via the identities API; split() can auto-recluster an
identity's exemplars (2-means on faces, falling back to appearance) when no explicit
sighting list is given.
7. The honest limit
A camera that only ever sees the back of a head carries no biometric that any method can use. ArcFace needs a face; OSNet sees only a coat that will be different tomorrow. There is no algorithm that can turn a back-of-head into a reliable cross-day identity — and any system that claims to is fabricating one.
Iris is designed to be correct under that constraint rather than to pretend it away:
- A faceless person view never mints a new identity (require_face_for_new_person). It can only attach to a person already established by their face, and only within a live session window; otherwise it is dropped. The failure mode is "we didn't identify this back-of-head", not "we invented twelve people".
- Appearance links are time-decayed and window-gated, so a faceless body match cannot silently bridge across days on clothing alone.
- The margin and new-identity-rate gates make the system conservative under ambiguity: when two candidates are close, or evidence is weak, it declines to guess.
The practical consequence is a deliberate bias: Iris under-claims rather than over-claims. It would rather leave a poorly-seen person un-identified than split a real person into duplicates or merge two strangers. For a surveillance system whose output operators act on, refusing to fabricate identity from a non-identifiable view is not a missing feature — it is the correct behaviour, and it is enforced in code, not left to a threshold someone might lower.
Appendix: key thresholds
All overridable via Settings / env (app/config.py); MatchConfig holds the
matcher-side defaults.
| Symbol | Default | Role |
|---|---|---|
| face_strong | 0.55 | authoritative face match (ignores margin/appearance) |
| face_match | 0.42 | face match (with margin); also sets the appearance veto |
| face_reject_new | 0.32 | floor for borderline-face corroboration |
| match_margin_face | 0.06 | best − 2nd-best required for a non-strong face match |
| face_exemplar_min_quality | 0.35 | decent-face gate (det_score × frontalness) for new id + exemplars |
| face_exemplar_hi | 0.90 | above this a face is a near-duplicate, not stored |
| face_merge_threshold | 0.60 | conservative face-only auto-merge (un-named only) |
| app_match / app_match_cross | 0.62 / 0.66 | appearance bar, same- vs cross-camera |
| app_gate | 0.50 | lowered appearance bar when a borderline face corroborates |
| match_margin_app | 0.05 | best − 2nd-best required for an appearance match |
| color_gate | 0.35 | hue-histogram floor for non-person appearance matches |
| app_window_seconds | 600 s | appearance candidacy time window |
| app_decay_tau_seconds | 43 200 s (12 h) | appearance time-decay constant |
| max_face_exemplars / max_app_exemplars | 12 / 16 | per-identity gallery caps |
| new_identity_rate_per_min | 30 | per-camera anti-explosion cap |
| sticky_iou / sticky_seconds | 0.30 / 2.0 s | continuous-track hysteresis |
| max_reid_per_frame | 4 | per-frame re-ID work bound (worker) |