Fusion Tracker
The Fusion Tracker module fuses the output from several more specialized trackers into a single, more comprehensive and accurate tracking data output. The module is also responsible for additional features such as producing object snapshots and object geographic coordinates.
The data processing pipeline that includes a Fusion Tracker instance can be visualized as follows.
A single Fusion Tracker module instance depends on data from module instances of the following types:
- Video Object Detection Tracker
- Video Motion Tracker
- Radar Motion Tracker (if Radar Video Fusion Feature is supported)
Since this module builds on data from previous modules, the configuration of the input modules will also affect the fusion module instance.
Instances
A device is preconfigured with a fixed number of instances of this module type.
In general there is one instance of this module type for each physical video channel, corresponding to each image sensor on the device. The exception is multisensor cameras that produce a panorama image stitched from several image sensors.
- Devices with one image sensor will have one tracker instance.
- Multidirectional cameras will have one tracker instance for each image sensor.
- Multisensor cameras that produce a panorama image that is stitched together from several image sensors will have one instance of the tracker corresponding to the view area of the full stitched image.
Function
The Fusion Tracker module tracks and estimates the state of objects in the scene. The estimated state of each tracked object is continually updated to provide up-to-date estimates for every object currently present in the scene.
- Update frequency, approx. 10 Hz – on multidirectional cameras, when multiple instances are producing data, the frame rate may be lower.
- Latency, approx. 1 s – the latency of the output varies but is generally around 1 second.
- Heartbeat, every 2 s – if no objects are detected in the scene, a heartbeat message (an empty scene message) is sent every 2 s.
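Because an empty scene message is sent every 2 s, a consumer can treat prolonged silence as a connectivity problem rather than an empty scene. Below is a minimal Python sketch of such a watchdog; the `receive_message` helper and the `"objects"` key are hypothetical stand-ins for whatever your chosen output protocol provides:

```python
import time

HEARTBEAT_INTERVAL_S = 2.0              # heartbeat cadence documented above
TIMEOUT_S = 3 * HEARTBEAT_INTERVAL_S    # tolerate a couple of missed heartbeats

def watch(receive_message):
    """Consume scene messages; raise if the stream goes silent.

    `receive_message` is a hypothetical non-blocking poll that returns the
    next scene message (possibly with an empty object list) or None.
    """
    last_seen = time.monotonic()
    while True:
        message = receive_message()
        if message is not None:
            # A heartbeat (empty scene) counts as liveness just like real data.
            last_seen = time.monotonic()
            if message.get("objects"):
                print(f"{len(message['objects'])} objects in scene")
        elif time.monotonic() - last_seen > TIMEOUT_S:
            raise TimeoutError("no scene message (heartbeat) within timeout")
        time.sleep(0.05)
```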
To indicate that an object's track has been terminated (never to be updated again), a delete track operation is sent. Tracked objects that are classified as humans may be re-identified (ReID); this is represented by sending a rename track operation saying that an id has been renamed. For more information on ReID, refer to the ReID concept page.
Object state
Depending on how the module is configured, what sensors are considered, and other factors, the exact estimated state might differ. An overview of the state estimated for each object is found below.
General:
Class
- Object classification. One of:
  - Human
  - Human Face
  - Vehicle - Car
  - Vehicle - Bus
  - Vehicle - Truck
  - Bike
  - License Plate

Vehicle Color
- Vehicle color

Human Upper Clothing Color
- Human upper clothing color

Human Lower Clothing Color
- Human lower clothing color

License Plate Text
- License plate text (requires license plate text feature)

Image
- The current snapshot image of the object (requires best snapshot feature)
2D Image Perspective:
Bounding Box
- Object position as bounding box in the image

CenterOfGravity/Centroid
- Position of the object as a point in the image
3D Device Perspective:
Spherical Coordinate
- Position of the object in spherical coordinates (requires radar-video fusion feature)

Speed
- Speed of the object (requires radar-video fusion feature)

Direction
- Direction of the object (requires radar-video fusion feature)
3D World Perspective:
Geolocation
- Geographic coordinates of the object (requires geographic coordinates feature)
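Taken together, the perspectives above can be mirrored in a consumer-side record type. The sketch below is illustrative only; the field names, types, and tuple layouts are assumptions for this sketch, not the wire format of any output protocol:

```python
from dataclasses import dataclass

@dataclass
class ObjectState:
    """Illustrative consumer-side view of a tracked object's state.

    Field names are assumptions; optional fields are None when the
    corresponding feature is disabled or no sensor produced the value.
    """
    track_id: str
    # General
    object_class: str | None = None          # e.g. "Human", "Vehicle - Car"
    vehicle_color: str | None = None
    upper_clothing_color: str | None = None
    lower_clothing_color: str | None = None
    license_plate_text: str | None = None    # license plate text feature
    image: bytes | None = None               # best snapshot feature (decoded)
    # 2D image perspective
    bounding_box: tuple[float, float, float, float] | None = None  # x, y, w, h
    centroid: tuple[float, float] | None = None
    # 3D device perspective (radar-video fusion feature)
    spherical_coordinate: tuple[float, float, float] | None = None
    speed: float | None = None
    direction: float | None = None
    # 3D world perspective (geographic coordinates feature)
    geolocation: tuple[float, float] | None = None  # latitude, longitude (decimal degrees)
```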
Track operations
Delete
- Marks track end

Rename
- Renames one track id to another id

Merge
- Marks a merger of two ids into one (only on M1135 Mk II, M1137 Mk II and Q3819-PVE)

Split
- Marks a split of one id into two (only on M1135 Mk II, M1137 Mk II and Q3819-PVE)
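A consumer typically keeps a table of live tracks and applies these operations as they arrive. The sketch below illustrates the bookkeeping; the operation payload shape (keys like `"type"`, `"from"`, `"into"`) is a hypothetical example, since the real message schema is defined by the chosen output protocol:

```python
def apply_track_operation(tracks: dict, op: dict) -> None:
    """Apply a track operation to a dict mapping track id -> state.

    `op` is an illustrative payload, e.g. {"type": "rename",
    "from": "17", "to": "42"}.
    """
    kind = op["type"]
    if kind == "delete":        # track ended, never updated again
        tracks.pop(op["id"], None)
    elif kind == "rename":      # e.g. after ReID of a human
        if op["from"] in tracks:
            tracks[op["to"]] = tracks.pop(op["from"])
    elif kind == "merge":       # two ids merged into one
        merged = {**tracks.pop(op["from"], {}), **tracks.get(op["to"], {})}
        tracks[op["to"]] = merged
    elif kind == "split":       # one id split into two
        base = tracks.pop(op["id"], {})
        tracks[op["into"][0]] = dict(base)
        tracks[op["into"][1]] = dict(base)
```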
Output protocols
This module's output can be retrieved using a variety of methods. Depending on the method used, the data can be sent in different formats. Below is a full list of ways to receive the output.
Protocol | Name/Address | Format | Guide |
---|---|---|---|
RTSP | Address: rtsp://ip-address/axis-media/media.amp?analytics=polygon Source: AnalyticsSceneDescription | ONVIF tt:Frame | Configure scene metadata over RTSP |
MQTT | com.axis.analytics_scene_description.v0.beta | ADF Frame | scene metadata over MQTT |
Message Broker | com.axis.analytics_scene_description.v0.beta | ADF Frame | ACAP example (consume-scene-metadata) |
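As an example of the MQTT route, the sketch below subscribes to the topic listed in the table using paho-mqtt. The broker address, port, and the lack of credentials/TLS are assumptions for this sketch; the device must already be configured to publish scene metadata to your broker (see the guide), and the published topic may carry a device-specific prefix:

```python
import json
import paho.mqtt.client as mqtt

TOPIC = "com.axis.analytics_scene_description.v0.beta"  # from the table above
BROKER = "broker.example.local"   # assumption: address of your MQTT broker
PORT = 1883                       # assumption: plain, unauthenticated MQTT

def on_message(client, userdata, msg):
    frame = json.loads(msg.payload)        # ADF Frame carried as JSON
    print(f"frame on {msg.topic}: {frame}")

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)  # paho-mqtt >= 2.0
client.on_message = on_message
client.connect(BROKER, PORT)
client.subscribe(TOPIC)
client.loop_forever()
```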
Configuration
There are a number of configuration options affecting the output from this module. Unless otherwise stated, each of the configuration options below affects all instances.
Some configuration options require a device restart to take effect. Best practice is to restart the device whenever a configuration option is changed, to be sure it has taken effect.
Not all configuration options are available on all devices.
Description | Method | Default value | Note | Configuration guide |
---|---|---|---|---|
Best snapshot feature | REST API: http://<camera-ip>/config/rest/best-snapshot/v1/enabled | OFF | See details below. | Enable and Retrieve Object Snapshots |
Geographic coordinates feature | Configure device geolocation and device geoorientation using the designated CGIs | OFF | This feature is implicitly enabled by configuring the device geolocation and geoorientation. Once enabled, it cannot be turned off other than by a device reset. Only available on some devices, see more details below. | Enable Geographic Coordinates |
License plate text feature | Install AXIS License Plate Verifier ACAP | OFF | This feature is implicitly enabled by installing the AXIS License Plate Verifier ACAP. Only available on some devices, see more details below. | Install ACAP |
Image rotation | Image rotation CGI | Device dependent | Object state expressed in the 2D image perspective is rotated according to the input video rotation. This configuration is per instance. | Image source rotation |
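As an example, the best snapshot row above exposes a REST endpoint. The sketch below toggles it with Python's requests library; the PATCH verb, the `{"data": true}` body shape, and the credentials are assumptions, so verify them against the configuration guide before relying on this:

```python
import requests
from requests.auth import HTTPDigestAuth

CAMERA = "192.168.0.90"                  # assumption: camera address
AUTH = HTTPDigestAuth("root", "secret")  # assumption: device credentials

# Assumption: the endpoint accepts a JSON body of the form {"data": <value>}.
resp = requests.patch(
    f"http://{CAMERA}/config/rest/best-snapshot/v1/enabled",
    json={"data": True},
    auth=AUTH,
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```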
Features behind feature flag
These are experimental features not ready for use in production. No support is provided for issues at this stage.
Description | Feature flag name | Default value | Note |
---|---|---|---|
Experimental classes feature | metadata_fusion_experimental_classes | OFF | See details below. |
Best snapshot feature
A complete list of devices that support this feature can be found on the AXIS Scene Metadata product page by filtering on the functionality "Best snapshot".
The Best Snapshot feature enables capturing snapshots of objects in a scene. When enabled, this feature adds a Base64-encoded cropped image to both classified and unclassified objects. The current implementation selects snapshots based heavily on bounding box size during the track lifetime. Snapshots may be sent more than once for each object if better alternatives are found.
This feature interfaces with the Fusion Tracker module as shown below.
To enable this feature, see the guide.
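Since the snapshot arrives as a Base64-encoded cropped image in the object state, decoding it on the consumer side is straightforward. A minimal sketch, where the `"image"` and `"id"` keys and the JPEG extension are assumptions for illustration:

```python
import base64
from pathlib import Path

def save_snapshot(object_state: dict, out_dir: str = ".") -> Path:
    """Decode a Base64 snapshot from an object state and write it to disk.

    The keys "image" and "id" and the .jpg extension are assumptions;
    check the output format of your chosen protocol.
    """
    raw = base64.b64decode(object_state["image"])
    path = Path(out_dir) / f"snapshot_{object_state['id']}.jpg"
    path.write_bytes(raw)
    return path
```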
Radar video fusion feature
The radar video fusion feature enables the Fusion Tracker module to also fuse tracking data generated from radar scans.
This feature adds additional information to the object state for objects detected by the radar. The additional information includes:
Spherical Coordinate
Speed
Direction
As the radar sensor is designed to detect objects far away, using this feature will also improve detection range.
This feature can also be used in combination with the Geographic Coordinates feature to receive geolocation data.
License plate verifier integration feature
This feature is only available on Q1686-DLE.
The AXIS License Plate Verifier ACAP enables capturing and recognizing license plate text. By integrating this data into the Fusion Tracker module through this feature, the same information can be provided in the Fusion Tracker module output.
The AXIS License Plate Verifier ACAP can be visualized as a processing module together with the Fusion Tracker module as follows.
Geographic coordinates
The geographic coordinates feature allows the object state in the output to include global geographic coordinates in decimal degrees. For the device to be able to calculate correct coordinates, it has to be correctly and precisely configured. Coordinates are calculated using the lowest, middle point of each bounding box, so scenarios where a human's feet are not visible will result in miscalculated coordinates.
As this feature is based on data from the Radar Motion Tracker module, only objects detected by the radar in a specific frame will receive geolocation data.
Read the how-to guide on enabling this feature.
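The anchor point used for the calculation, the lowest middle point of the bounding box, can be reproduced as follows. The box layout (x, y as top-left corner, w, h as extents, y axis growing downward, all in normalized image coordinates) is an assumption for this sketch:

```python
def geolocation_anchor(x: float, y: float, w: float, h: float) -> tuple[float, float]:
    """Bottom-center point of a bounding box, i.e. the point the module
    projects to geographic coordinates.

    Assumes (x, y) is the top-left corner with y growing downward,
    all in normalized image coordinates.
    """
    return (x + w / 2.0, y + h)
```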
Experimental classes
Some classes are considered experimental and can be turned on by enabling the experimental classes feature flag, see feature flags.
Classes:
Bag
Backpack
Suitcase
Other
Animal
Attributes:
Hat (on head/face)
Hard hat
Other
Limitations
- Mirroring is not supported.
  - If the object tracking should be visualized on a mirrored image, the user must apply the mirroring themselves (see the sketch after this list).
- Objects must be moving to be considered by this tracker.
- This tracker only supports the video stream's default aspect ratios.
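Because mirroring is not supported by the tracker itself, a client that renders onto a horizontally mirrored image has to mirror the 2D coordinates before drawing. A minimal sketch, assuming normalized coordinates in [0, 1] with (x, y) as the top-left corner:

```python
def mirror_bounding_box(x: float, y: float, w: float, h: float) -> tuple[float, float, float, float]:
    """Mirror a normalized bounding box horizontally.

    Only x changes: the left edge of the mirrored box is the original
    right edge reflected around the image's vertical center line.
    """
    return (1.0 - x - w, y, w, h)

def mirror_point(px: float, py: float) -> tuple[float, float]:
    """Mirror a normalized point (e.g. a centroid) horizontally."""
    return (1.0 - px, py)
```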