Object tracking

Central to AXIS Scene Metadata is object tracking data and, more specifically, the concept of Multi-Object Tracking (MOT). This section gives a brief description of object tracking.

In an object tracking system, sensors generate measurements, or detections, of objects in an environment. A tracker uses these measurements to identify and track objects over time.

One way to gather measurements for an object tracker is to use computer vision, where an AI algorithm finds and classifies objects in each frame. By feeding this data into a tracker, the system can both maintain the objects' identities and estimate their states over time.
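As a rough sketch of this flow, the Python snippet below feeds per-frame detections into a tracker that keeps object identities across frames. The Detection structure and the greedy IoU-based association are illustrative assumptions, not a description of any particular AXIS tracker.

```python
from dataclasses import dataclass
from itertools import count

@dataclass
class Detection:
    box: tuple         # (x, y, width, height) in normalized image coordinates
    label: str         # e.g. "Human", "Car"
    confidence: float  # detector score in [0, 1]

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

class GreedyIouTracker:
    """Toy tracker: a detection that overlaps an existing track keeps
    that track's identity, all other detections start new tracks."""
    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}         # track id -> latest matched Detection
        self._ids = count(1)

    def update(self, detections):
        assigned = {}
        for det in detections:
            best_id, best_iou = None, self.iou_threshold
            for track_id, last in self.tracks.items():
                overlap = iou(det.box, last.box)
                if overlap > best_iou and track_id not in assigned:
                    best_id, best_iou = track_id, overlap
            if best_id is None:
                best_id = next(self._ids)  # new object -> new track identity
            assigned[best_id] = det
        self.tracks = assigned
        return assigned                    # track id -> current detection
```

In practice the association step is usually combined with motion prediction and appearance matching, but the sketch shows the core idea: detections come in per frame, and the tracker assigns them to persistent identities.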

An object's state encapsulates the current attributes of the object being tracked, typically including position, classification, and appearance. Exactly which attributes the estimated state includes depends on the type of tracker and its capabilities. The tracker continuously updates the object state to reflect the latest observations and predictions.
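A minimal sketch of such a state is shown below. The exact attribute set, including the velocity field, is an assumption for illustration; a real tracker may expose more, fewer, or different attributes.

```python
from dataclasses import dataclass

@dataclass
class ObjectState:
    """Estimated state of one tracked object at a given point in time.
    The attribute set here is illustrative, not a fixed schema."""
    position: tuple      # e.g. (x, y) in normalized image or world coordinates
    velocity: tuple      # e.g. (vx, vy), if the tracker predicts motion
    classification: str  # e.g. "Human", "Car"
    appearance: dict     # e.g. {"upper_clothing_color": "red"}
```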

A continuous sequence of estimated object states for one object forms a track. A track is initialized when a new object is first detected, updated each time the estimated object state is updated, and terminated when the tracker can no longer estimate the object's state.
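The sketch below illustrates this lifecycle with a simple Track container. Terminating a track after a fixed number of missed updates is an assumption made for the example; real trackers apply their own termination criteria.

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    """One object's track: a sequence of estimated states with a simple
    initialize / update / terminate lifecycle."""
    track_id: int
    states: list = field(default_factory=list)  # history of estimated states
    missed: int = 0
    active: bool = True

    def initialize(self, first_state):
        """Called when a new object is first detected."""
        self.states = [first_state]
        self.missed = 0
        self.active = True

    def update(self, new_state=None, max_missed=5):
        """Called on every tracker update cycle."""
        if new_state is not None:
            self.states.append(new_state)  # state estimate refreshed
            self.missed = 0
        else:
            self.missed += 1               # no estimate this cycle
            if self.missed > max_missed:
                self.active = False        # track is terminated
```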

Sensors other than video can also be used for object tracking; one such example is radar. In the context of radar, object tracking means following multiple detected objects over a series of radar scans.

Combining the results from multiple sensors and tracking algorithms can provide a more comprehensive and accurate understanding of the objects in the environment. This approach, known as sensor fusion, leverages the strengths of each sensor type and algorithm by combining their data.
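One simple way to illustrate the idea is inverse-variance weighting, where the estimate with the lower uncertainty gets the higher weight. The snippet below is a minimal sketch of that principle with assumed example numbers, not a full fusion filter such as a Kalman filter.

```python
def fuse_positions(video_pos, video_var, radar_pos, radar_var):
    """Fuse two independent position estimates by inverse-variance weighting."""
    w_video = 1.0 / video_var
    w_radar = 1.0 / radar_var
    fused = tuple(
        (w_video * v + w_radar * r) / (w_video + w_radar)
        for v, r in zip(video_pos, radar_pos)
    )
    fused_var = 1.0 / (w_video + w_radar)  # fused estimate is more certain than either input
    return fused, fused_var

# Example with made-up values: the radar estimate is trusted more
# because its variance is lower.
position, variance = fuse_positions(
    video_pos=(10.2, 4.1), video_var=0.5,
    radar_pos=(10.8, 4.0), radar_var=0.2,
)
```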