liblarod  3.6.5
Neural Network Inference

Neural network models

Neural network models are provided as binary blobs to the backends. These blobs are generally produced by backend-specific toolchains and are provided through an open file descriptor in the larodLoadModel()/larodLoadModelAsync() functions. Each backend is then responsible for unpacking, parsing and loading the binary model using its specific runtime calls.
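As a brief sketch (error handling abbreviated; consult larod.h in the ACAP SDK for the exact signatures), loading a model blob through an open file descriptor might look like the following, here using the "cpu-tflite" device described later in this document:

```c
#include <fcntl.h>
#include <stddef.h>
#include <larod.h>

int main(void) {
    larodError* error = NULL;
    larodConnection* conn = NULL;

    // Connect to the larod service.
    if (!larodConnect(&conn, &error)) {
        return 1;
    }

    // Retrieve a handle to the desired backend device.
    const larodDevice* dev = larodGetDevice(conn, "cpu-tflite", 0, &error);

    // The model binary blob is handed over as an open file descriptor;
    // the backend unpacks, parses and loads it with its own runtime calls.
    int fd = open("/path/to/model.tflite", O_RDONLY);
    larodModel* model =
        larodLoadModel(conn, fd, dev, LAROD_ACCESS_PRIVATE, "my-model",
                       NULL, &error);

    // ... allocate tensors and run jobs against the model ...

    larodDestroyModel(&model);
    larodDisconnect(&conn, NULL);
    return 0;
}
```

The model path and name above are illustrative; a real application would also check the returned pointers and inspect the larodError on failure.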

Supported backends

Currently the following neural network inference backends are supported by larod.

TFLite CPU

This backend executes TFLite models on the SoC CPU(s). Note that the CPU subsystem generally has limited performance compared to dedicated hardware accelerators.

The device name of this backend is "cpu-tflite"; a device handle can be retrieved by providing this device name to larodGetDevice().

Supported format of model data

A TensorFlow Lite model file (.tflite).

Supported buffer properties for running jobs

This backend only supports the fd access types LAROD_FD_PROP_MAP and LAROD_FD_PROP_READWRITE.

As the name indicates, the former fd prop allows the backend to map (using mmap) the tensor's file descriptor instead of reading from or writing to it. Combined with tensor tracking (e.g. using larodTrackTensor()) the backend may be able to cache a tensor's mapping and thus allow for a very efficient zero-copy map-once memory access pattern.
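The map-once pattern can be sketched as follows. This is a hedged illustration only: the fd prop setter and larodTrackTensor() are assumed here to follow the usual larod convention of a status return plus a larodError out-parameter, and the variables tensor, conn, fd and size are hypothetical; check larod.h for the exact signatures.

```c
// Sketch: reuse one mmap:ed buffer across many jobs (zero-copy).
larodError* error = NULL;

// Indicate that only mapping is required for this tensor's fd.
larodSetTensorFdProps(tensor, LAROD_FD_PROP_MAP, &error);

// Track the tensor so the backend may cache its mapping between jobs.
larodTrackTensor(conn, tensor, &error);

// The application maps the same fd once and writes input data into it
// before each job; no per-job read()/write() copies are needed.
uint8_t* data = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
```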

The access type LAROD_FD_PROP_READWRITE will introduce a memory copy and read()/write() calls for each input and output tensor buffer - these extra operations will degrade performance.

Allocation support

Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend will have file descriptors that are readable, writable and mappable. Accordingly the tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.

TFLite EdgeTPU

This backend executes TFLite models on the Google EdgeTPU accelerator.

The device name of this backend is "google-edge-tpu-tflite"; a device handle can be retrieved by providing this device name to larodGetDevice().

Supported format of model data

A TensorFlow Lite model file (.tflite), compiled for the EdgeTPU.

Supported buffer properties for running jobs

This backend supports the fd access types LAROD_FD_PROP_DMABUF, LAROD_FD_PROP_MAP and LAROD_FD_PROP_READWRITE for input tensors, but only the latter two for output tensors.

The access type LAROD_FD_PROP_DMABUF provides less overhead since the buffer fd will be passed directly to the EdgeTPU library through TFLite without extra copies in larod. The fd offsets for input tensors must be 0 when using the LAROD_FD_PROP_DMABUF access method. The application is responsible for ensuring that the external RAM is up to date with CPU cache before the inference is started (the service will not initiate any cache flush operations). Refer to About dma-buf for more info about dma-buf and user space synchronization.

The access type LAROD_FD_PROP_MAP allows the backend to map (using mmap) the tensor's file descriptor instead of reading from or writing to it. Combined with tensor tracking (e.g. using larodTrackTensor()) the backend may be able to cache a tensor's mapping and thus allow for a very efficient zero-copy map-once memory access pattern.

The access type LAROD_FD_PROP_READWRITE will introduce a memory copy and read()/write() calls for each tensor buffer - these extra operations will degrade performance. Note that LAROD_FD_PROP_DMABUF is not supported for output tensors; only LAROD_FD_PROP_MAP and LAROD_FD_PROP_READWRITE can be used for them.

Allocation support

Two kinds of tensor buffers can be allocated by calling larodAllocModelInputs() with a model loaded to this backend. If LAROD_FD_PROP_READWRITE is not set as required in the call, tensors with mappable file descriptors based on ARTPEC VMEM dma-bufs will be returned; these tensors will have the fd props LAROD_FD_PROP_MAP and LAROD_FD_PROP_DMABUF set. If LAROD_FD_PROP_READWRITE is required, tensors with readable, writable and mappable file descriptors will be returned; these tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.

If the user application wants to pass an ARTPEC VMEM buffer to larod, it first needs to convert the buffer to a dma-buf. This is done by passing the fd of the application's ARTPEC VMEM buffer, along with its offset, to larodConvertVmemFdtoDmabuf(); a new fd for the converted dma-buf will be returned.
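As a purely illustrative sketch, assuming larodConvertVmemFdtoDmabuf() follows the common larod pattern of a status return plus a larodError out-parameter (the parameter list here is an assumption, and vmemFd/offset are hypothetical variables; consult larod.h for the real signature):

```c
larodError* error = NULL;
int dmabufFd = -1;

// Convert the application's ARTPEC VMEM buffer fd (at the given offset)
// into a dma-buf fd that larod can consume.
if (!larodConvertVmemFdtoDmabuf(conn, vmemFd, offset, &dmabufFd, &error)) {
    // handle error
}

// dmabufFd can now back tensors using the LAROD_FD_PROP_DMABUF access type.
```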

Tensors allocated using larodAllocModelOutputs() and with a model loaded to this backend will have file descriptors that are readable, writable and mappable. Accordingly the tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.

Ambarella CVFlowNN

This backend executes neural network models on the VP (Vector Processor) in Ambarella chips. The models need to be converted to a suitable format using the Ambarella toolchain before deployment on this backend.

The device name of this backend is "ambarella-cvflow"; a device handle can be retrieved by providing this device name to larodGetDevice().

Supported format of model data

A model converted to the CVFlow format using the Ambarella toolchain.

Supported buffer properties

This backend supports the fd access types LAROD_FD_PROP_DMABUF and LAROD_FD_PROP_READWRITE.

The access type LAROD_FD_PROP_DMABUF provides less overhead since the buffer will be passed directly to the underlying inference framework without extra copies in larod. The client is responsible for cache maintenance of the buffers when using LAROD_FD_PROP_DMABUF. Refer to About dma-buf for more info about dma-buf and user space synchronization. Note that the supplied dma-bufs must be allocated by the Ambarella platform.

The access type LAROD_FD_PROP_READWRITE will introduce a memory copy and read()/write() calls for each input and output tensor buffer - these extra operations will degrade performance.

Allocation support

Two kinds of tensor buffers can be allocated by calling larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend. If LAROD_FD_PROP_READWRITE is not set as required in the call, tensors with mappable file descriptors based on Cavalry Mem dma-bufs will be returned; these tensors will have the fd props LAROD_FD_PROP_MAP and LAROD_FD_PROP_DMABUF set. If LAROD_FD_PROP_READWRITE is required, tensors with readable, writable and mappable file descriptors will be returned; these tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.

TFLite GPU

This backend executes TFLite models on an OpenGL-capable hardware accelerator.

The device name of this backend is "gpu-tflite"; a device handle can be retrieved by providing this device name to larodGetDevice().

NOTE This is an experimental feature and could be removed or changed at any time.

Supported format of model data

A TensorFlow Lite model file (.tflite).

Supported buffer properties for running jobs

This backend only supports the fd access types LAROD_FD_PROP_MAP and LAROD_FD_PROP_READWRITE.

As the name indicates, the former fd prop allows the backend to map (using mmap) the tensor's file descriptor instead of reading from or writing to it. Combined with tensor tracking (e.g. using larodTrackTensor()) the backend may be able to cache a tensor's mapping and thus allow for a very efficient zero-copy map-once memory access pattern.

The access type LAROD_FD_PROP_READWRITE will introduce a memory copy and read()/write() calls for each input and output tensor buffer - these extra operations will degrade performance.

Allocation support

Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend will have file descriptors that are readable, writable and mappable. Accordingly the tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.

TFLite ARTPEC-7 GPU

This backend executes TFLite models on the GPU in ARTPEC-7.

The device name of this backend is "axis-a7-gpu-tflite"; a device handle can be retrieved by providing this device name to larodGetDevice().

Supported format of model data

A TensorFlow Lite model file (.tflite).

Supported buffer properties for running jobs

This backend only supports the fd access types LAROD_FD_PROP_MAP and LAROD_FD_PROP_READWRITE.

As the name indicates, the former fd prop allows the backend to map (using mmap) the tensor's file descriptor instead of reading from or writing to it. Combined with tensor tracking (e.g. using larodTrackTensor()) the backend may be able to cache a tensor's mapping and thus allow for a very efficient zero-copy map-once memory access pattern.

The access type LAROD_FD_PROP_READWRITE will introduce a memory copy and read()/write() calls for each input and output tensor buffer - these extra operations will degrade performance.

Allocation support

Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend will have file descriptors that are readable, writable and mappable. Accordingly the tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.

Model compilation and caching

When a model is loaded on the ARTPEC-7 GPU backend it is compiled into a different format native to the accelerator, the .nb format, also referred to as NBG. The .nb files are cached in flash after the compilation is complete. Once a file is cached, the corresponding .tflite model will load faster, as the entire compilation step is skipped.

The cached NBG files are stored in /var/lib/larod/nbg-cache. Whenever a new model file is compiled, a new NBG file will be created in this location. If the maximum number of models is exceeded, or if the accumulated size of all the cached models exceeds the storage limit, older models will be removed to make space for the new one.

Native ARTPEC-7 GPU

This backend executes models compiled to run natively on the GPU accelerator in ARTPEC-7.

The device name of this backend is "a7-gpu-native"; a device handle can be retrieved by providing this device name to larodGetDevice().

Supported format of model data

A compiled model format native to the ARTPEC-7 GPU, usually identified by a .nb or .nbg file extension. This is the same model file that is created by the backend device "axis-a7-gpu-tflite"; see the corresponding compilation and caching section for more information.

Supported buffer properties for running jobs

This backend only supports the fd access types LAROD_FD_PROP_MAP and LAROD_FD_PROP_READWRITE.

As the name indicates, the former fd prop allows the backend to map (using mmap) the tensor's file descriptor instead of reading from or writing to it. Combined with tensor tracking (e.g. using larodTrackTensor()) the backend may be able to cache a tensor's mapping and thus allow for a very efficient map-once memory access pattern.

The access type LAROD_FD_PROP_READWRITE will introduce a memory copy and read()/write() calls for each input and output tensor buffer - these extra operations will degrade performance.

Allocation support

Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend will have file descriptors that are readable, writable and mappable. Accordingly the tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.

Optional parameter support

This backend supports the skip-input-dma-sync and skip-output-dma-sync parameters, see the documentation for dma-buf for more details.
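The parameters above are passed in a larodMap supplied when creating the job request. A minimal sketch, assuming integer-valued boolean options as with other larod map parameters (model, inputs, outputs and conn are hypothetical variables from earlier setup):

```c
larodError* error = NULL;
larodMap* jobParams = larodCreateMap(&error);

// Skip the service's dma-buf cache synchronization; the application then
// takes full responsibility for cache maintenance of the buffers.
larodMapSetInt(jobParams, "skip-input-dma-sync", 1, &error);
larodMapSetInt(jobParams, "skip-output-dma-sync", 1, &error);

larodJobRequest* req =
    larodCreateJobRequest(model, inputs, numInputs, outputs, numOutputs,
                          jobParams, &error);
larodRunJob(conn, req, &error);
larodDestroyMap(&jobParams);
```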

TFLite ARTPEC-9 GPU

This backend executes TFLite models on the GPU in ARTPEC-9.

The device name of this backend is "armnn-gpu-tflite"; a device handle can be retrieved by providing this device name to larodGetDevice().

Supported format of model data

A TensorFlow Lite model file (.tflite).

Supported buffer properties for running jobs

This backend only supports the fd access types LAROD_FD_PROP_MAP and LAROD_FD_PROP_READWRITE.

As the name indicates, the former fd prop allows the backend to map (using mmap) the tensor's file descriptor instead of reading from or writing to it. Combined with tensor tracking (e.g. using larodTrackTensor()) the backend may be able to cache a tensor's mapping and thus allow for a very efficient zero-copy map-once memory access pattern.

The access type LAROD_FD_PROP_READWRITE will introduce a memory copy and read()/write() calls for each input and output tensor buffer - these extra operations will degrade performance.

Allocation support

Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend will have file descriptors that are readable, writable and mappable. Accordingly the tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.

Model compilation and caching

When a model is loaded on a GPU via Arm NN, it is compiled into a different format native to the accelerator. Since this conversion takes a lot of time, sometimes up to several minutes depending on the model, the compiled files are cached in flash after the compilation is complete. Once a file is cached the corresponding .tflite model will be loaded substantially faster, as the entire compilation step will be skipped.

The cache is located at /var/lib/larod/gpu-model-cache. Whenever a new model file is compiled, a new cache file will be created in this cache. If the maximum number of models is exceeded, the cache file with the longest time since last access will be removed to make space for the new one.

TFLite ARTPEC-8 DLPU

This backend executes TFLite models on the DLPU accelerator in ARTPEC-8.

The device name of this backend is "axis-a8-dlpu-tflite"; a device handle can be retrieved by providing this device name to larodGetDevice().

Supported format of model data

A TensorFlow Lite model file (.tflite).

Supported buffer properties for running jobs

This backend only supports the fd access types LAROD_FD_PROP_MAP and LAROD_FD_PROP_READWRITE.

As the name indicates, the former fd prop allows the backend to map (using mmap) the tensor's file descriptor instead of reading from or writing to it. Combined with tensor tracking (e.g. using larodTrackTensor()) the backend may be able to cache a tensor's mapping and thus allow for a very efficient zero-copy map-once memory access pattern.

The access type LAROD_FD_PROP_READWRITE will introduce a memory copy and read()/write() calls for each input and output tensor buffer - these extra operations will degrade performance.

Allocation support

Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend will have file descriptors that are readable, writable and mappable. Accordingly the tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.

Model compilation and caching

When a model is loaded on the ARTPEC-8 DLPU it is compiled into a different format native to the accelerator, the .nb format, also referred to as NBG. Since this conversion takes a lot of time, sometimes up to several minutes depending on the model, the .nb files are cached in flash after the compilation is complete. Once a file is cached, the corresponding .tflite model will load substantially faster, as the entire compilation step is skipped.

The cached NBG files are stored in /var/lib/larod/nbg-cache. This cache storage is limited to 64 MiB and 16 files. Whenever a new model file is compiled a new NBG file will be created in this location. If the maximum number of models is exceeded, or if the accumulated size of all the cached models exceeds the storage limit, older models will be removed to make space for the new one.

Native ARTPEC-8 DLPU

This backend executes models compiled to run natively on the DLPU accelerator in ARTPEC-8.

The device name of this backend is "axis-a8-dlpu-native"; a device handle can be retrieved by providing this device name to larodGetDevice().

Supported format of model data

A compiled model format native to the ARTPEC-8 DLPU, usually identified by a .nb or .nbg file extension. This is the same model file that is created by the backend device "axis-a8-dlpu-tflite"; see the corresponding compilation and caching section for more information.

Supported buffer properties for running jobs

This backend only supports the fd access types LAROD_FD_PROP_MAP and LAROD_FD_PROP_READWRITE.

As the name indicates, the former fd prop allows the backend to map (using mmap) the tensor's file descriptor instead of reading from or writing to it. Combined with tensor tracking (e.g. using larodTrackTensor()) the backend may be able to cache a tensor's mapping and thus allow for a very efficient map-once memory access pattern.

The access type LAROD_FD_PROP_READWRITE will introduce a memory copy and read()/write() calls for each input and output tensor buffer - these extra operations will degrade performance.

Allocation support

Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend will have file descriptors that are readable, writable and mappable. Accordingly the tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.

Optional parameter support

This backend supports the skip-input-dma-sync and skip-output-dma-sync parameters, see the documentation for dma-buf for more details.

TFLite ARTPEC-9 DLPU

This backend executes TFLite models on the DLPU accelerator in ARTPEC-9.

The device name of this backend is "a9-dlpu-tflite"; a device handle can be retrieved by providing this device name to larodGetDevice().

Supported format of model data

A TensorFlow Lite model file (.tflite).

Supported buffer properties for running jobs

This backend only supports the fd access types LAROD_FD_PROP_MAP and LAROD_FD_PROP_READWRITE.

As the name indicates, the former fd prop allows the backend to map (using mmap) the tensor's file descriptor instead of reading from or writing to it. Combined with tensor tracking (e.g. using larodTrackTensor()) the backend may be able to cache a tensor's mapping and thus allow for a very efficient map-once memory access pattern.

The access type LAROD_FD_PROP_READWRITE will introduce a memory copy and read()/write() calls for each input and output tensor buffer - these extra operations will degrade performance.

Allocation support

Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend will have file descriptors that are readable, writable and mappable. Accordingly the tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.

Optional parameter support

This backend supports the boolean model parameter disable-winograd. When the option is set to a non-zero value, Winograd optimization is disabled for applicable convolutional layers executed on the DLPU. Winograd convolution is in practice a trade-off between inference time and numerical accuracy: it improves the former at the cost of the latter.

If you experience problems with numerical accuracy, it might be worth trying to run with disable-winograd:1.
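Since disable-winograd is a model parameter, it is supplied in a larodMap at model load time. A hedged sketch (fd, dev and conn are hypothetical variables from earlier setup; see larod.h for exact signatures):

```c
larodError* error = NULL;
larodMap* modelParams = larodCreateMap(&error);

// Disable Winograd convolution to prioritize numerical accuracy over speed.
larodMapSetInt(modelParams, "disable-winograd", 1, &error);

larodModel* model =
    larodLoadModel(conn, fd, dev, LAROD_ACCESS_PRIVATE, "my-model",
                   modelParams, &error);
larodDestroyMap(&modelParams);
```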

Model compilation and caching

When a model is loaded on the ARTPEC-9 DLPU it is compiled into a different format native to the accelerator. Since this conversion takes a lot of time, sometimes up to several minutes depending on the model, the compiled files are cached in flash after the compilation is complete. Once a file is cached, the corresponding .tflite model will load substantially faster, as the entire compilation step is skipped.

The cache is located at /var/lib/larod/dlpu-model-cache. Whenever a new model file is compiled, a new cache file will be created in this cache. If the maximum number of models is exceeded, the cache file with the longest time since last access will be removed to make space for the new one.

TFLite Arm NN CPU

This backend executes TFLite models on Arm NN Neon (CpuAcc).

The device name of this backend is "armnn-cpu-tflite"; a device handle can be retrieved by providing this device name to larodGetDevice().

Supported format of model data

A TensorFlow Lite model file (.tflite).

Supported buffer properties for running jobs

This backend only supports the fd access types LAROD_FD_PROP_MAP and LAROD_FD_PROP_READWRITE.

As the name indicates, the former fd prop allows the backend to map (using mmap) the tensor's file descriptor instead of reading from or writing to it. Combined with tensor tracking (e.g. using larodTrackTensor()) the backend may be able to cache a tensor's mapping and thus allow for a very efficient zero-copy map-once memory access pattern.

The access type LAROD_FD_PROP_READWRITE will introduce a memory copy and read()/write() calls for each input and output tensor buffer - these extra operations will degrade performance.

Allocation support

Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend will have file descriptors that are readable, writable and mappable. Accordingly the tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.

Remote backends

The remote backends are used in devices built with multiple ARTPEC chips. Such multi-ARTPEC devices are composed of one ARTPEC chip (the "primary system") running the bulk of the system logic, and another ARTPEC chip ("secondary system") which can perform tasks on behalf of the primary system. The remote backends support executing jobs on the secondary ARTPEC chip. They function the same as their corresponding non-remote counterparts, but are instead run on the secondary ARTPEC system. See also the corresponding documentation for the respective non-remote backends.

Available remote inference backends

ARTPEC-8 remote backends
remote-axis-a8-dlpu-native
remote-axis-a8-dlpu-tflite
remote-cpu-tflite

Supported buffer properties for running jobs

All fd props are supported. The user is encouraged to use LAROD_FD_PROP_DMABUF for remote backends as it will yield optimal performance in terms of buffer handling.

Allocation support

Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to either of these backends will have file descriptors that are mappable. These file descriptors follow the rules and conventions of Linux's dma-buf API. As such, these tensors will have the fd props LAROD_FD_PROP_MAP and LAROD_FD_PROP_DMABUF set.

On ARTPEC-8 systems for the remote backends, it is possible to select which system larodAllocModelInputs() and larodAllocModelOutputs() should allocate the buffer(s) on; to do this, pass a larodMap containing an integer value with the key "vmem-pinning-device" to the allocation function. Possible values are 1, which represents allocating the buffer on the primary system, and 2, representing the secondary system. If this option is left unset, the allocation will default to allocating buffers on the secondary system.
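The pinning option above is passed as an entry in the larodMap given to the allocation call. A minimal sketch (model and conn are hypothetical variables from earlier setup):

```c
larodError* error = NULL;
larodMap* allocParams = larodCreateMap(&error);

// 1 = allocate on the primary system, 2 = secondary system (the default).
larodMapSetInt(allocParams, "vmem-pinning-device", 1, &error);

size_t numInputs = 0;
larodTensor** inputs =
    larodAllocModelInputs(conn, model, 0, &numInputs, allocParams, &error);
larodDestroyMap(&allocParams);
```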

About dma-buf

The Linux dma-buf framework provides a mechanism for userspace programs to indicate the beginning and end of a buffer access from userspace. This allows the buffer exporter in kernel space to take appropriate actions, e.g. flushing or invalidating CPU caches depending on the type of userspace access. This synchronization is performed by calling ioctl(DMA_BUF_IOCTL_SYNC) on the dma-buf file descriptor. The arguments to the ioctl indicate start/stop and the type of userspace access. Details can be found in the Linux kernel's dma-buf documentation.
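The synchronization described above takes a struct dma_buf_sync whose flags encode start/end and read/write access. For example, bracketing a CPU write to a mapped input buffer looks like:

```c
#include <linux/dma-buf.h>
#include <sys/ioctl.h>
#include <stddef.h>
#include <string.h>

// Bracket CPU write access to a dma-buf so the exporter can manage caches.
static void cpuWriteToDmabuf(int dmabufFd, void* mapped, const void* src,
                             size_t size) {
    struct dma_buf_sync sync = {0};

    // Begin CPU write access.
    sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE;
    ioctl(dmabufFd, DMA_BUF_IOCTL_SYNC, &sync);

    memcpy(mapped, src, size);

    // End CPU write access; the exporter flushes caches as needed.
    sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE;
    ioctl(dmabufFd, DMA_BUF_IOCTL_SYNC, &sync);
}
```

For reads, the same pattern applies with DMA_BUF_SYNC_READ (or DMA_BUF_SYNC_RW for combined access).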

As mentioned previously, on backends that support dma-bufs, the user application is responsible for this synchronization. However, if a dma-buf is supplied to a backend that doesn't support dma-bufs directly (i.e. cannot forward the dma-buf to the underlying runtime API) but does support mapping it (i.e. LAROD_FD_PROP_MAP is set), the backend will take care of the synchronization when accessing the buffer through the CPU.

For example, let's say we supply a LAROD_FD_TYPE_DMA fd (i.e. LAROD_FD_PROP_MAP and LAROD_FD_PROP_DMABUF are set) to "cpu-tflite". This backend doesn't support dma-bufs, but it does support mapping fds (i.e. LAROD_FD_PROP_MAP). It will therefore call ioctl(DMA_BUF_IOCTL_SYNC) with appropriate arguments whenever it accesses the mapped dma-buf fd through the CPU. Specifically, it will invalidate the CPU cache before reading the input tensors for the job request, and it will flush the CPU cache after writing to the output tensors.

Supported buffer properties

This is an overview of which file descriptor properties are supported by the various neural network backends. Note that the LAROD_FD_PROP_ prefix has been omitted from the table headers in the interest of brevity. Please see larod.h for more info about the LAROD_FD_PROP_* flags.

When running jobs

Please note that though several properties may be supported by a backend, a tensor buffer supplied for running a job only needs at least one of the backend's supported properties to be usable for the job. That said, each property has different implications for memory access performance.

Input tensors

Backend                      READWRITE  MAP  DMABUF
TFLite CPU                   Yes        Yes  No
TFLite EdgeTPU               Yes        Yes  Yes
CVFlowNN                     Yes        No   Yes
TFLite GPU                   Yes        Yes  No
TFLite ARTPEC-7 GPU          Yes        Yes  No
TFLite ARTPEC-9 GPU          Yes        Yes  No
TFLite ARTPEC-8 DLPU         Yes        Yes  No
Native ARTPEC-8 DLPU         Yes        Yes  No
TFLite ARTPEC-9 DLPU         Yes        Yes  No
TFLite Arm NN CPU            Yes        Yes  No
Remote TFLite CPU            Yes        Yes  Yes
Remote TFLite ARTPEC-8 DLPU  Yes        Yes  Yes
Remote Native ARTPEC-8 DLPU  Yes        Yes  Yes

Output tensors

Backend                      READWRITE  MAP  DMABUF
TFLite CPU                   Yes        Yes  No
TFLite EdgeTPU               Yes        Yes  No
CVFlowNN                     Yes        No   Yes
TFLite GPU                   Yes        Yes  No
TFLite ARTPEC-7 GPU          Yes        Yes  No
TFLite ARTPEC-9 GPU          Yes        Yes  No
TFLite ARTPEC-8 DLPU         Yes        Yes  No
Native ARTPEC-8 DLPU         Yes        Yes  No
TFLite ARTPEC-9 DLPU         Yes        Yes  No
TFLite Arm NN CPU            Yes        Yes  No
Remote TFLite CPU            Yes        Yes  Yes
Remote TFLite ARTPEC-8 DLPU  Yes        Yes  Yes
Remote Native ARTPEC-8 DLPU  Yes        Yes  Yes

When allocating tensors

Please note that though several properties may be supported by a backend, it may not be possible to allocate buffers having all the properties at the same time.

Input tensors

Backend                      READWRITE  MAP  DMABUF
TFLite CPU                   Yes        Yes  No
TFLite EdgeTPU               Yes        Yes  Yes
CVFlowNN                     Yes        Yes  Yes
TFLite GPU                   Yes        Yes  No
TFLite ARTPEC-7 GPU          Yes        Yes  No
TFLite ARTPEC-9 GPU          Yes        Yes  No
TFLite ARTPEC-8 DLPU         Yes        Yes  No
Native ARTPEC-8 DLPU         Yes        Yes  No
TFLite ARTPEC-9 DLPU         Yes        Yes  No
TFLite Arm NN CPU            Yes        Yes  No
Remote TFLite CPU            Yes        Yes  Yes
Remote TFLite ARTPEC-8 DLPU  Yes        Yes  Yes
Remote Native ARTPEC-8 DLPU  Yes        Yes  Yes

Output tensors

Backend                      READWRITE  MAP  DMABUF
TFLite CPU                   Yes        Yes  No
TFLite EdgeTPU               Yes        Yes  No
CVFlowNN                     Yes        Yes  Yes
TFLite GPU                   Yes        Yes  No
TFLite ARTPEC-7 GPU          Yes        Yes  No
TFLite ARTPEC-9 GPU          Yes        Yes  No
TFLite ARTPEC-8 DLPU         Yes        Yes  No
Native ARTPEC-8 DLPU         Yes        Yes  No
TFLite ARTPEC-9 DLPU         Yes        Yes  No
TFLite Arm NN CPU            Yes        Yes  No
Remote TFLite CPU            Yes        Yes  Yes
Remote TFLite ARTPEC-8 DLPU  Yes        Yes  Yes
Remote Native ARTPEC-8 DLPU  Yes        Yes  Yes

Supported backend options

This is a summary of which backend options are supported for each backend.

ARTPEC-7

Backend                 Model Options  Job Options
cpu-tflite              -              -
axis-a7-gpu-tflite      -              -
google-edge-tpu-tflite  -              -
a7-gpu-native           -              skip-input-dma-sync, skip-output-dma-sync

ARTPEC-8

Backend              Model Options                                               Job Options
cpu-tflite           -                                                           -
axis-a8-dlpu-tflite  force-ptq, use-tp-reorder-opt, use-nn-first-pixel-pooling   -
axis-a8-dlpu-native  -                                                           skip-input-dma-sync, skip-output-dma-sync

ARTPEC-9

Backend           Model Options     Job Options
cpu-tflite        -                 -
armnn-cpu-tflite  -                 -
armnn-gpu-tflite  -                 -
a9-dlpu-tflite    disable-winograd  -

Ambarella CV25

No backend options supported.