Cross-Camera Re-Identification

Design deep-dive for the identity engine under app/reid/. Source of truth is the code; every threshold, gate and behaviour described here maps to a named symbol in app/reid/manager.py, gallery.py, pipeline.py, decay.py, maintenance.py and app/faces/recognizer.py. Where a default is quoted it is the value in app/config.py (Settings) or MatchConfig.

Re-identification is the part of Iris that turns a stream of disconnected detections into a small, stable library of people and objects that persists across cameras and across days. A person who walks past the front camera at 09:00, the yard camera at 12:00 and the door camera the next morning should resolve to one identity — "Person 14" — not three, and not fourteen.

This document explains how that is done, why the obvious approaches fail, and the exact decision procedure the online matcher runs for every sighting.


1. The problem

Cross-camera identity is hard for reasons that have nothing to do with model accuracy:

  • Viewpoint. The same person is seen frontal on one camera, in profile on the next, and from directly behind on a third. A body crop from behind and a body crop from the front of the same person can be less similar than two different people seen from the same angle.
  • Clothing across days. Body/appearance re-ID models (OSNet, etc.) key heavily on clothing and silhouette. That is a strong cue within a session and a worthless one across days — the same person in a different coat looks like a stranger, and two strangers in the same uniform look identical.
  • Lighting and camera response. Colour and texture statistics shift between a daylight yard camera and an IR-lit corridor.

The naive design — embed every person crop with a body re-ID model, match by cosine, mint a new identity when nothing is close enough — fails in a specific and visible way: duplicate explosion. A person seen from behind produces a body embedding that doesn't match their own frontal embedding, so the matcher creates a new identity. Walk a single person across a camera's field of view and you get a dozen "people". Appearance alone has no anchor that survives a viewpoint change, so it cannot decide that two dissimilar-looking crops are the same person.

The design choice that makes the rest of the system work is to pick a cue that is stable across clothing, viewpoint, lighting and time, and to anchor identity on it.


2. The face anchor

Identity is anchored on the face/head — the only signal that is stable across a change of clothes, a change of viewpoint (within the face's own pose range), a change of lighting, and the passage of days.

Extraction

Face features come from the insightface buffalo_l pack via app/faces/recognizer.py:

  1. Detect with SCRFD-10G (640×640 input, det_thresh default 0.5), producing a bounding box, a detection score, and 5 facial landmarks (DetectedFace.kps).
  2. Align + embed with ArcFace (r50), producing a 512-d, L2-normalized embedding. Because vectors are unit-norm, an inner-product index gives cosine similarity directly.

ArcFace is run once on the whole trigger frame, not re-run inside every person crop. IdentityPipeline.extract() (app/reid/pipeline.py) detects all faces, then assigns each face to the smallest person box that contains the face centroid (smallest-area containment correctly handles overlapping people). This is cheaper than per-crop re-detection and keeps a single face model resident on the GPU.
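The containment rule can be sketched as follows. This is a minimal illustration rather than the code in pipeline.py: the function names, the (x1, y1, x2, y2) box layout and the use of the face-box centre as the centroid are assumptions.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def box_area(box: Box) -> float:
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def assign_face_to_person(face_box: Box, person_boxes: Sequence[Box]) -> Optional[int]:
    """Return the index of the smallest person box containing the face centre."""
    cx = (face_box[0] + face_box[2]) / 2.0
    cy = (face_box[1] + face_box[3]) / 2.0
    best_idx, best_area = None, float("inf")
    for idx, box in enumerate(person_boxes):
        x1, y1, x2, y2 = box
        if x1 <= cx <= x2 and y1 <= cy <= y2:
            area = box_area(box)
            if area < best_area:  # prefer the tightest containing box
                best_idx, best_area = idx, area
    return best_idx  # None: the face belongs to no detected person
```

When one person stands in front of another, the face centre can fall inside both boxes; picking the smaller box keeps the face with the nearer, tighter detection.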

The "decent face" quality gate

Not every detected face is worth trusting. A tiny, blurry, or extreme-profile face yields an ArcFace vector that is garbage — close to no one in particular, or spuriously close to the wrong person. Storing it pollutes the gallery forever.

So each assigned face carries a pose-aware quality score:

face_quality = det_score × frontalness(landmarks)

frontalness() (pipeline.py) is a cheap landmark-geometry heuristic: a frontal face has the nose horizontally centred between the eyes; a profile pushes it toward one side, driving the score toward 0. A face only counts as decent when face_quality ≥ face_exemplar_min_quality (default 0.35).
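One way such a score could be computed is sketched below. The exact formula in pipeline.py is not quoted here, so the linear mapping 1 - 2·|offset|, the helper names and the assumed landmark order (two eyes first, then the nose) are all illustrative assumptions.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]

FACE_EXEMPLAR_MIN_QUALITY = 0.35  # default quoted in the text


def nose_offset(kps: Sequence[Point]) -> float:
    """Signed nose offset from the eye midpoint, as a fraction of the eye span.

    Assumes kps[0] and kps[1] are the eyes and kps[2] is the nose.
    """
    eye_a_x, eye_b_x, nose_x = kps[0][0], kps[1][0], kps[2][0]
    span = abs(eye_b_x - eye_a_x)
    if span < 1e-6:  # eyes coincide: treat as a full profile
        return 0.5
    offset = (nose_x - (eye_a_x + eye_b_x) / 2.0) / span
    return max(-0.5, min(0.5, offset))


def frontalness(kps: Sequence[Point]) -> float:
    """1.0 when the nose is centred between the eyes, 0.0 at an extreme profile."""
    return 1.0 - 2.0 * abs(nose_offset(kps))


def face_quality(det_score: float, kps: Sequence[Point]) -> float:
    return det_score * frontalness(kps)


def is_decent_face(det_score: float, kps: Sequence[Point],
                   min_quality: float = FACE_EXEMPLAR_MIN_QUALITY) -> bool:
    return face_quality(det_score, kps) >= min_quality
```

Under this sketch a confidently detected face (det_score 0.9) with the nose almost over one eye scores about 0.09 and fails the gate, while the same detection seen frontally scores 0.9 and passes.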

This single gate is load-bearing in two places:

  • A new identity is created from a face only if the face is decent (_quality_ok_for_new).
  • A face is stored as a gallery exemplar only if it is decent (_maybe_add_face_exemplar).

Why a faceless view never mints an identity

This is the rule that defeats duplicate explosion, in MatchConfig:

require_face_for_new_person = True

In _quality_ok_for_new():

  • A decent face qualifies a new identity (faces are the strong, rare evidence).
  • For a person with no decent face — a back, a far side view, a head crop with no landmarks — require_face_for_new_person forces a False. It is never allowed to create a new identity.

A faceless person crop has exactly two fates: it attaches to an existing identity by appearance within a session (Section 4), or it is dropped (match_kind="dropped", no DB row written). It can never spawn a person. That is precisely what stops one person seen from behind from fragmenting into dozens of duplicates — a back-of-head carries no identity, so the system refuses to invent one from it.

Non-person objects (cars, animals) have no face, so they are held to a different, appearance-based quality bar rather than the face requirement (require_quality_for_new, min_app_box_area_frac, min_crop_quality_for_new). A faceless person that does survive on first sighting is marked is_provisional=True and is deleted by maintenance if it never accrues a second sighting or a face (Section 6).


3. The pose-diverse gallery

A frontal-only gallery still fails on viewpoint: a profile query against only frontal exemplars scores low even for the right person. ArcFace is robust to some pose, not arbitrary pose. The fix is to make the gallery itself angle-invariant by storing exemplars across the pose range.

Signed yaw → pose buckets

face_pose() (pipeline.py) computes a signed yaw in roughly [-0.5, +0.5] from the 5 landmarks (nose offset relative to the eye span): 0 ≈ frontal, while negative and positive values mean the head is turned to one side or the other. It is persisted as FaceExemplar.pose on every exemplar.

_face_pose_bucket() (manager.py) quantizes it into three buckets:

| Bucket | Condition | Meaning |
|---|---|---|
| -1 | yaw ≤ -0.12 | turned one way |
| 0 | -0.12 < yaw < 0.12 | frontal |
| +1 | yaw ≥ 0.12 | turned the other way |
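The quantization is small enough to write out. The sketch below follows the bucket boundaries listed above; the function and constant names are illustrative, not copied from manager.py.

```python
POSE_BUCKET_EDGE = 0.12  # yaw boundary from the bucket table


def face_pose_bucket(yaw: float) -> int:
    """Quantize a signed yaw into -1 (one side), 0 (frontal) or +1 (other side)."""
    if yaw <= -POSE_BUCKET_EDGE:
        return -1
    if yaw >= POSE_BUCKET_EDGE:
        return 1
    return 0
```

Note that both boundaries are inclusive on the profile side, so a yaw of exactly ±0.12 already counts as turned.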

Pose-diverse acceptance

When a new decent face arrives for an existing identity, the old instinct — "skip it, we already have a face for this person" — is wrong: it keeps the gallery frontal-only. Instead _maybe_add_face_exemplar() only rejects a face that is a true near-duplicate of one already stored (cosine > face_exemplar_hi, default 0.90). A face that scores low against existing exemplars is not noise — it is a new pose of the same tracked person, and it is exactly the left/right view the gallery needs to recognise this person from that angle later. So it is kept.
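The near-duplicate test reduces to a comparison against every stored exemplar. A minimal sketch, assuming unit-norm vectors held as plain sequences and a hypothetical helper name; the decent-face gate is applied separately and is not shown here.

```python
from typing import Sequence

FACE_EXEMPLAR_HI = 0.90  # default quoted in the text


def cosine_unit(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product; equals cosine similarity when both vectors are unit-norm."""
    return sum(x * y for x, y in zip(a, b))


def is_new_view(new_vec: Sequence[float],
                stored: Sequence[Sequence[float]],
                hi: float = FACE_EXEMPLAR_HI) -> bool:
    """True unless the new face is a near-duplicate of a stored exemplar."""
    return all(cosine_unit(new_vec, s) <= hi for s in stored)
```

There is deliberately no lower bound: a low score against the stored exemplars is treated as a new pose of the same tracked person, not as a reason to reject.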

Pose-aware eviction

The per-identity cap is max_face_exemplars (default 12, sized for frontal + left + right). When full, eviction must not starve a pose bucket — otherwise the gallery drifts back to frontal-only under churn. So eviction:

  1. counts exemplars per pose bucket,
  2. picks the most-represented bucket (over),
  3. deletes the lowest det_score exemplar within that bucket.

The under-represented profiles are protected; the redundant frontals are trimmed. The net effect is angle-invariance at the gallery level: a profile query lands near the stored profile exemplar of the same person, not just their frontals.
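The three eviction steps can be sketched as below. The Exemplar record, the helper names and the tie-breaking between equally full buckets are assumptions made for illustration; only the bucket edges, the cap and the lowest-det_score rule come from the text.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Exemplar:  # hypothetical stand-in for a stored face exemplar
    exemplar_id: int
    det_score: float
    pose: float  # signed yaw


def pose_bucket(yaw: float, edge: float = 0.12) -> int:
    return -1 if yaw <= -edge else (1 if yaw >= edge else 0)


def pick_eviction(exemplars: List[Exemplar], cap: int = 12) -> Optional[Exemplar]:
    """Return the exemplar to delete, or None while the gallery is within its cap."""
    if len(exemplars) <= cap:
        return None
    # 1. count exemplars per pose bucket
    counts = Counter(pose_bucket(e.pose) for e in exemplars)
    # 2. pick the most-represented bucket (ties: first bucket encountered)
    over = max(counts, key=lambda b: counts[b])
    # 3. delete the lowest det_score exemplar within that bucket
    crowded = [e for e in exemplars if pose_bucket(e.pose) == over]
    return min(crowded, key=lambda e: e.det_score)
```

With three frontals and one weak profile over a cap of three, the victim is the weakest frontal even though the profile has the lowest det_score overall, which is the protection the text describes.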

The gallery (IdentityGallery, gallery.py) holds every face exemplar in a FAISS IndexFlatIP (inner product = cosine on unit vectors). best_face_per_identity() collapses the raw exemplar hits to the best hit per identity, so the margin test (Section 5) compares the two best distinct people, not two exemplars of one person.
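The collapse step matters because several exemplars of one person usually dominate the raw hit list. The sketch below shows the idea on (identity_id, score) pairs; the function name and the input shape are assumptions, and the real method works on FAISS search results.

```python
from typing import Dict, Iterable, List, Tuple


def collapse_to_best_per_identity(
    hits: Iterable[Tuple[int, float]],
) -> List[Tuple[int, float]]:
    """Keep the best score per identity and return them ranked, best first."""
    best: Dict[int, float] = {}
    for identity_id, score in hits:
        if identity_id not in best or score > best[identity_id]:
            best[identity_id] = score
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```

For raw hits of 0.71 and 0.69 for identity 14 and 0.40 for identity 9, the uncollapsed top-2 margin would be 0.02; after collapsing, the margin is taken between 0.71 and 0.40.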


4. Appearance as a within-session helper only

Body appearance (OSNet) is deliberately demoted to a within-session, time-decayed assist — never a cross-day identity cue.

  • Model. OSNet (reid_model default osnet_ain_x1_0_msmt17.onnx — the AIN x1.0 domain-generalizable variant for best cross-camera/angle behaviour; osnet_x0_25_msmt17.onnx is the documented lighter CPU fallback), exported to ONNX, run under onnxruntime-gpu. Input 128×256, ImageNet-normalized, 512-d L2-normalized output. app/reid/embedder.py.
  • Always computed. A back-turned or masked person still yields a body vector even when face_vec is None — that is the point: appearance is what lets a faceless crop attach to a person established by their face.

Time decay

app/reid/decay.py weights every appearance match by

w(Δt) = exp(-Δt / TAU),   TAU = app_decay_tau_seconds (default 43 200 s = 12 h)

A candidate identity's effective appearance score is max_i(cosine_i × w(Δt_i)): a single recent, similar outfit is enough to link, while a 12-hour-old outfit is worth e^-1 ≈ 0.37 of its cosine and a 2-day-old one e^-4 ≈ 0.018, essentially nothing. This is the literal encoding of "the outfit is stale by tomorrow".
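The weighting and the max-over-exemplars rule translate directly into code. This is a sketch under the stated default TAU; the function names and the (cosine, age) pair representation are illustrative.

```python
import math
from typing import Iterable, Tuple

APP_DECAY_TAU_SECONDS = 43200.0  # 12 h, default quoted in the text


def decay_weight(age_seconds: float, tau: float = APP_DECAY_TAU_SECONDS) -> float:
    """w(dt) = exp(-dt / TAU); negative ages are clamped to zero."""
    return math.exp(-max(0.0, age_seconds) / tau)


def effective_appearance_score(
    exemplars: Iterable[Tuple[float, float]],
    tau: float = APP_DECAY_TAU_SECONDS,
) -> float:
    """Max over exemplars of cosine * decay weight; 0.0 if there are none.

    Each exemplar is a (cosine, age_seconds) pair.
    """
    return max((cos * decay_weight(age, tau) for cos, age in exemplars), default=0.0)
```

A one-minute-old exemplar at cosine 0.80 beats a day-old exemplar at cosine 0.90, because the older one is scaled by exp(-2) ≈ 0.135.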

Time/space window gate

appearance_candidates() only considers an identity whose freshest appearance exemplar is still within app_window_seconds (default 600 s) of the query — the hard window is translated into a min decay-weight floor exp(-window / TAU). This bounds "teleportation": an identity last seen as a body hours ago is not an appearance candidate now. Faces carry no such window — they are time-stable and are never decayed.
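Translating the hard window into a weight floor is a one-line computation. The sketch below uses the quoted defaults; the function names are hypothetical.

```python
import math


def min_decay_weight(window_seconds: float = 600.0, tau: float = 43200.0) -> float:
    """Decay weight of an exemplar that sits exactly at the edge of the window."""
    return math.exp(-window_seconds / tau)


def in_appearance_window(freshest_age_seconds: float,
                         window_seconds: float = 600.0,
                         tau: float = 43200.0) -> bool:
    """An identity is a candidate only if its freshest exemplar clears the floor."""
    weight = math.exp(-freshest_age_seconds / tau)
    return weight >= min_decay_weight(window_seconds, tau)
```

With the defaults the floor is exp(-600 / 43200) ≈ 0.986, so inside the window the decay barely changes the cosine; the window, not the decay, is what excludes older candidates.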


5. The matching algorithm — assign()

IdentityManager.assign() (manager.py) decides, per sighting, MATCH (which identity) vs NEW vs DROP. It is the single source of truth; camera-worker subprocesses each hold a derived gallery and write through to the shared SQLite DB so all workers converge. The order is strongest-evidence-first.

flowchart TD
    A["SightingFeature<br/>(box, face_vec?, appearance_vec?,<br/>face_quality, pose, color_hist, object_class)"] --> B{Sticky / hysteresis?<br/>IoU >= sticky_iou & same class<br/>within sticky_seconds}
    B -- yes --> STK["MATCH: keep last identity<br/>(match_kind = sticky)<br/>skip exemplar churn"]
    B -- no --> C{Face present?}

    C -- yes --> D["best_face_per_identity()<br/>best, second = top-2 distinct"]
    D --> E{best >= FACE_STRONG<br/>(0.55)?}
    E -- yes --> FM["MATCH by face<br/>(authoritative; ignore margin<br/>& appearance)"]
    E -- no --> F{best >= FACE_MATCH (0.42)<br/>AND margin >= 0.06?}
    F -- yes --> FM
    F -- no --> G["set veto_identity if<br/>best >= FACE_MATCH<br/>(forbid appearance->other id)"]

    C -- no --> H[appearance step]
    G --> H

    H --> I{appearance_vec present<br/>AND candidates in window?}
    I -- no --> N[new-identity step]
    I -- yes --> J["appearance_candidates()<br/>(decay-weighted, class-scoped,<br/>in time window)"]
    J --> K{Face contradiction?<br/>veto_identity set &<br/>best != veto}
    K -- yes --> K2["drop to vetoed candidate<br/>or fail appearance"]
    K -- no --> L
    K2 --> L{Borderline-face corroborates<br/>best & best >= APP_GATE (0.50)<br/>& margin >= 0.05?}
    L -- yes --> AM["MATCH by appearance"]
    L -- no --> M{Colour gate ok (non-person)<br/>AND best >= bar<br/>(same-cam 0.62 / cross-cam 0.66)<br/>AND margin >= 0.05?}
    M -- yes --> AM
    M -- no --> N

    N{Quality OK for NEW?<br/>person: decent face REQUIRED<br/>else: app + size + sharpness}
    N -- no --> DROP["DROP (match_kind = dropped)<br/>no identity, no row"]
    N -- yes --> O{New-identity rate<br/>< new_identity_rate_per_min?}
    O -- no --> DROP
    O -- yes --> NEW["CREATE identity 'Class N'<br/>(provisional if person w/o face)"]

    FM --> FIN[finalize: write Sighting,<br/>update exemplars, counters,<br/>refresh sticky]
    AM --> FIN
    NEW --> FIN
    STK --> FIN

Step 0 — Sticky / hysteresis (_sticky_lookup)

A continuous track on one camera should not be re-evaluated frame by frame (that risks an identity flicker and burns GPU). If the current box overlaps a recent box on the same camera and same object class with IoU ≥ sticky_iou (0.30) within sticky_seconds (2.0 s), the sighting inherits that identity directly (match_kind="sticky", persisted as appearance). Exemplar updates are skipped on this hot path — the streak already enrolled. The same-class check stops a car inheriting a person's identity just because their boxes briefly overlap.
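A sketch of the sticky check follows. The cache record, its field names and the choice to return the highest-overlap entry when several qualify are assumptions; the IoU and time thresholds are the quoted defaults.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


@dataclass
class StickyEntry:  # hypothetical cache record
    camera_id: str
    object_class: str
    box: Box
    identity_id: int
    seen_at: float


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def sticky_lookup(camera_id: str, object_class: str, box: Box, now: float,
                  cache: Iterable[StickyEntry],
                  sticky_iou: float = 0.30,
                  sticky_seconds: float = 2.0) -> Optional[int]:
    """Return the identity to inherit, or None if no recent box overlaps enough."""
    best_id, best_overlap = None, sticky_iou
    for entry in cache:
        if entry.camera_id != camera_id or entry.object_class != object_class:
            continue  # other camera, or a car overlapping a person
        if now - entry.seen_at > sticky_seconds:
            continue  # too old to count as the same continuous track
        overlap = iou(box, entry.box)
        if overlap >= best_overlap:
            best_id, best_overlap = entry.identity_id, overlap
    return best_id
```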

Step 1 — Face (_decide_by_face)

Query the FAISS face index; take the best hit per distinct identity and the second-best for the margin.

  • best ≥ face_strong (0.55): authoritative. A strong face is conclusive even among look-alikes — appearance is ignored, and a thin margin is ignored.
  • best ≥ face_match (0.42) AND margin ≥ match_margin_face (0.06): match. The best-minus-second-best margin rejects the ambiguous case where two different people both score moderately — if the top two are close, the face is not discriminative enough and we fall through rather than guess.
  • Otherwise fall through to appearance. If the best face is still ≥ face_match, it sets a veto identity (_face_veto_identity): appearance is now forbidden from assigning this sighting to a different identity.
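The three face outcomes can be written as one small function. This is a sketch of the decision described above, not the body of _decide_by_face: the return shape is illustrative, and treating a missing second candidate as a score of 0.0 is an assumption.

```python
from typing import List, Optional, Tuple


def decide_by_face(
    ranked: List[Tuple[int, float]],
    face_strong: float = 0.55,
    face_match: float = 0.42,
    match_margin_face: float = 0.06,
) -> Tuple[Optional[int], Optional[int]]:
    """Return (matched_identity, veto_identity).

    `ranked` holds (identity_id, score) for distinct identities, best first.
    """
    if not ranked:
        return None, None
    best_id, best = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0.0  # assumption: no rival = 0.0
    if best >= face_strong:
        return best_id, None  # authoritative: margin is ignored
    if best >= face_match:
        if best - second >= match_margin_face:
            return best_id, None  # moderate score, clearly ahead of the runner-up
        return None, best_id  # ambiguous: fall through, but veto other identities
    return None, None  # too weak to match or to veto
```

Scores of 0.50 and 0.47 for two different people therefore produce no match but do set a veto on the first, so appearance can only confirm that identity and never contradict it.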

Step 2 — Appearance (_decide_by_appearance)

Only reached when the face was absent or inconclusive.

  • Class scoping. appearance_candidates() only returns identities of the same object_class — a car only matches cars, a person only persons. Class is enforced in the gallery, so object types can never merge into one identity.
  • Face-contradiction veto. If a veto identity is set and the top appearance candidate is a different identity, the match is refused; the matcher will only accept the vetoed identity itself (the one the borderline face pointed at).
  • Borderline-face fusion (_face_corroborator). If a face in [face_reject_new (0.32), face_match (0.42)) weakly corroborates the same candidate the appearance picked, the bar is lowered from full app_match to app_gate (0.50) — weak face + decent appearance jointly clear a bar neither clears alone.
  • Colour gate (non-person). For objects, a hue-histogram intersection below color_gate (0.35) forbids the match — a red car never becomes a blue car (app/reid/attributes.py). People are exempt (clothing colour is not identity).
  • Same- vs cross-camera bar. A pure appearance link clears app_match (0.62) on the same camera, or the stricter app_match_cross (0.66) when the candidate has only ever been seen on other cameras — cross-camera appearance links are held to a higher standard. Both also require margin ≥ match_margin_app (0.05).
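The gates above compose roughly as in the sketch below. It is a simplification under stated assumptions: the candidate record and parameter names are hypothetical, a veto that disagrees with the top candidate simply refuses the match (the conservative reading), and the colour-gate result is passed in as a flag that the caller sets to True for people.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AppCandidate:  # hypothetical candidate record
    identity_id: int
    score: float  # decay-weighted cosine
    seen_on_query_camera: bool


def decide_by_appearance(
    candidates: List[AppCandidate],
    *,
    veto_identity: Optional[int] = None,
    face_corroborates: bool = False,
    colour_ok: bool = True,
    app_match: float = 0.62,
    app_match_cross: float = 0.66,
    app_gate: float = 0.50,
    match_margin_app: float = 0.05,
) -> Optional[int]:
    if not candidates:
        return None
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    best = ranked[0]
    second = ranked[1].score if len(ranked) > 1 else 0.0
    if veto_identity is not None and best.identity_id != veto_identity:
        return None  # a borderline face pointed at someone else
    if not colour_ok:
        return None  # non-person colour mismatch
    if best.score - second < match_margin_app:
        return None  # top two too close to call
    if face_corroborates:
        bar = app_gate  # weak face + decent appearance
    elif best.seen_on_query_camera:
        bar = app_match
    else:
        bar = app_match_cross  # stricter for cross-camera links
    return best.identity_id if best.score >= bar else None
```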

Step 3 — New identity (_create_new)

Reached only when neither face nor appearance produced a match. Two gates stand in front of creation:

  • Quality gate (_quality_ok_for_new): a person needs a decent face; otherwise the crop is dropped, not promoted (Section 2). Objects need a sufficiently large, sharp crop with an appearance vector.
  • New-identity rate limit (_rate_limit_ok): at most new_identity_rate_per_min (default 30) new identities per camera per rolling minute. A burst of unmatched crops (a crowd, a glitch) cannot flood the library — excess is dropped.
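A rolling per-camera limit of this kind is commonly built on a deque of creation timestamps. The class below is a sketch of that approach, not the code behind _rate_limit_ok; the class and method names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class NewIdentityRateLimiter:
    """Allow at most `per_min` new identities per camera per rolling window."""

    def __init__(self, per_min: int = 30, window_seconds: float = 60.0) -> None:
        self.per_min = per_min
        self.window_seconds = window_seconds
        self._created: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, camera_id: str, now: float) -> bool:
        recent = self._created[camera_id]
        # forget creations that have left the rolling window
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.per_min:
            return False  # over budget: the caller drops the sighting
        recent.append(now)
        return True
```

Because the budget is kept per camera, a glitching camera exhausts only its own allowance and cannot block identity creation elsewhere.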

On success an Identity is created, auto-named "Person N" / "Car N" / "Dog N" from its class and id, registered in the gallery, and seeded with its colour histogram. A faceless non-person object, or a person with a face, is real; a faceless person is created provisional and is on probation until maintenance.

Finalize (_finalize)

For every accepted sighting (match or new), _finalize:

  • writes the Sighting row (bbox, scores, match_kind, thumb path, optional event_id);
  • conditionally adds a face exemplar (decent-quality, pose-bucketed) and an appearance exemplar (size-gated, timestamped);
  • bumps num_sightings, first/last_seen and rep_sighting_id;
  • graduates the identity out of provisional on its 2nd sighting or first face;
  • merges in visual attributes (colour once; vehicle make/body-type keeping the highest-confidence value);
  • refreshes the sticky cache.

The caller commits, so a whole trigger-frame batch persists atomically.

Per-frame work bound

The matcher cannot stall the camera loop on a crowd. The worker (camera_worker._reidentify) selects which tracks are due this frame — unidentified tracks first (fast cadence), already-identified tracks on the slower confident cadence — then caps the number actually embedded at max_reid_per_frame (default 4). Identity is sticky, so a track that waits a frame keeps its identity meanwhile.
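The selection can be sketched as a filter, a sort and a cap. The Track record and the cadence parameters are hypothetical (the text does not quote cadence values, so they are required arguments here), and ordering by longest wait within each group is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Track:  # hypothetical tracker record
    track_id: int
    identity_id: Optional[int]  # None while the track is unidentified
    last_reid_at: float


def select_due_tracks(tracks: List[Track], now: float,
                      fast_interval: float, slow_interval: float,
                      max_reid_per_frame: int = 4) -> List[Track]:
    """Pick the tracks to embed this frame, unidentified tracks first."""

    def is_due(track: Track) -> bool:
        interval = fast_interval if track.identity_id is None else slow_interval
        return now - track.last_reid_at >= interval

    due = [t for t in tracks if is_due(t)]
    # unidentified before identified; within each group, longest-waiting first
    due.sort(key=lambda t: (t.identity_id is not None, t.last_reid_at))
    return due[:max_reid_per_frame]
```

Tracks that miss the cut are not lost: they stay due and are picked up on a later frame, keeping their sticky identity in the meantime.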


6. The maintenance pass

A daemon thread (app/reid/maintenance.py, every reid_maintenance_interval_seconds, default 120 s) runs run_once() to keep the library small, accurate and cheap. One pass, in order:

  1. Provisional cleanup. An un-named identity with no face evidence and ≤ 1 sighting, older than provisional_grace_seconds (600 s), is deleted — detector noise, not a person.
  2. Conservative face-only auto-merge. Two un-named identities whose face centroids are cosine ≥ face_merge_threshold (0.60) and which were never seen on different cameras at the same instant (_temporally_conflicting, ±5 s — you can't be in two places at once) are merged, repairing over-segmentation. The lower id survives ("Person N" stays stable). Appearance never merges (clothing is shared/ambiguous) and is_named identities are frozen.
  3. Per-identity prune + recompute.
     • Appearance: drop exemplars whose decay weight has fallen below the prune floor (exp(-2)) or that exceed the hard age cap (default 7 days); then cap at max_app_exemplars (16) by combined quality × decay.
     • Faces: never decayed; cap at max_face_exemplars (evict lowest det_score).
     • Recompute the face_centroid (mean of face exemplars) and appearance_centroid (decay-weighted mean); refresh rep_sighting_id to the best face-bearing, high-score, recent thumbnail; keep num_sightings / first/last_seen honest.
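The auto-merge eligibility test from the pass above can be sketched as follows. The function names, the (camera_id, timestamp) sighting representation and the plain-sequence centroids are assumptions; the threshold, the ±5 s tolerance and the lower-id-survives rule come from the text.

```python
import math
from typing import Sequence, Tuple

Sighting = Tuple[str, float]  # assumed shape: (camera_id, unix timestamp)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; centroids are means, so they are not unit-norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def temporally_conflicting(a: Sequence[Sighting], b: Sequence[Sighting],
                           tolerance: float = 5.0) -> bool:
    """True if the two identities were on different cameras at the same instant."""
    return any(
        cam_a != cam_b and abs(ts_a - ts_b) <= tolerance
        for cam_a, ts_a in a
        for cam_b, ts_b in b
    )


def may_auto_merge(centroid_a: Sequence[float], centroid_b: Sequence[float],
                   sightings_a: Sequence[Sighting], sightings_b: Sequence[Sighting],
                   named_a: bool, named_b: bool,
                   face_merge_threshold: float = 0.60) -> bool:
    if named_a or named_b:
        return False  # named identities are frozen
    if cosine(centroid_a, centroid_b) < face_merge_threshold:
        return False
    return not temporally_conflicting(sightings_a, sightings_b)


def surviving_id(id_a: int, id_b: int) -> int:
    return min(id_a, id_b)  # the lower id survives the merge
```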

After a pass, the API-process gallery is asked to reload so operator UIs and in-process matching see the compacted state. Operators can also merge and split identities by hand via the identities API; split() can auto-recluster an identity's exemplars (2-means on faces, falling back to appearance) when no explicit sighting list is given.


7. The honest limit

A camera that only ever sees the back of a head carries no biometric that any method can use. ArcFace needs a face; OSNet sees only a coat that will be different tomorrow. There is no algorithm that can turn a back-of-head into a reliable cross-day identity — and any system that claims to is fabricating one.

Iris is designed to be correct under that constraint rather than to pretend it away:

  • A faceless person view never mints a new identity (require_face_for_new_person). It can only attach to a person already established by their face, and only within a live session window, otherwise it is dropped. The failure mode is "we didn't identify this back-of-head", not "we invented twelve people".
  • Appearance links are time-decayed and window-gated, so a faceless body match cannot silently bridge across days on clothing alone.
  • The margin and new-identity-rate gates make the system conservative under ambiguity: when two candidates are close, or evidence is weak, it declines to guess.

The practical consequence is a deliberate bias: Iris under-claims rather than over-claims. It would rather leave a poorly-seen person un-identified than split a real person into duplicates or merge two strangers. For a surveillance system whose output operators act on, refusing to fabricate identity from a non-identifiable view is not a missing feature — it is the correct behaviour, and it is enforced in code, not left to a threshold someone might lower.


Appendix — key thresholds

All overridable via Settings / env (app/config.py); MatchConfig holds the matcher-side defaults.

| Symbol | Default | Role |
|---|---|---|
| face_strong | 0.55 | authoritative face match (ignores margin/appearance) |
| face_match | 0.42 | face match (with margin); also sets the appearance veto |
| face_reject_new | 0.32 | floor for borderline-face corroboration |
| match_margin_face | 0.06 | best − 2nd-best required for a non-strong face match |
| face_exemplar_min_quality | 0.35 | decent-face gate (det_score × frontalness) for new id + exemplars |
| face_exemplar_hi | 0.90 | above this a face is a near-duplicate, not stored |
| face_merge_threshold | 0.60 | conservative face-only auto-merge (un-named only) |
| app_match / app_match_cross | 0.62 / 0.66 | appearance bar, same- vs cross-camera |
| app_gate | 0.50 | lowered appearance bar when a borderline face corroborates |
| match_margin_app | 0.05 | best − 2nd-best required for an appearance match |
| color_gate | 0.35 | hue-histogram floor for non-person appearance matches |
| app_window_seconds | 600 | appearance candidacy time window |
| app_decay_tau_seconds | 43 200 (12 h) | appearance time-decay constant |
| max_face_exemplars / max_app_exemplars | 12 / 16 | per-identity gallery caps |
| new_identity_rate_per_min | 30 | per-camera anti-explosion cap |
| sticky_iou / sticky_seconds | 0.30 / 2.0 | continuous-track hysteresis |
| max_reid_per_frame | 4 | per-frame re-ID work bound (worker) |