liblarod
3.6.5
Preprocessing in larod can be used to process input data so that it has the format, size and shape a neural network expects. For optimal performance the processing operations can be offloaded to specialized preprocessing hardware. Each supported preprocessing hardware accelerator is exposed to applications through a larodDevice struct in larod.
Currently only image processing operations are supported.
Preprocessing jobs are configured with key-value parameters in a larodMap. The configuration describes the data that you have and how you want the data to be. The selected backend will crop, scale and convert according to what the description requires.
Below is an example job configuration. It describes a job that takes a 1280x720 input image in NV12 format, crops out 200x200 from the center of the image (X offset 540 and Y offset 260), converts it to the RGB interleaved format and scales it down to 48x48. In this case the libyuv backend will be used to perform these operations. In the interest of brevity error handling has been omitted.
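The crop offsets in the example follow from centering the window: (1280 - 200) / 2 = 540 and (720 - 200) / 2 = 260. Below is a minimal C sketch of that arithmetic. `CropWindow` and `centered_crop()` are hypothetical helpers for illustration, not part of the larod API. The four fields are in the same order as the image.input.crop tuple.

```c
#include <assert.h>

/* Hypothetical helper type (not larod API): a crop window in the same
 * order as the "image.input.crop" tuple. */
typedef struct {
    int x;       /* X offset in the input image */
    int y;       /* Y offset in the input image */
    int width;   /* crop window width */
    int height;  /* crop window height */
} CropWindow;

/* Returns a cropWidth x cropHeight window centered in an
 * inputWidth x inputHeight image. Integer division rounds the offsets
 * down when the leftover margin is odd. */
static CropWindow centered_crop(int inputWidth, int inputHeight,
                                int cropWidth, int cropHeight) {
    assert(cropWidth > 0 && cropWidth <= inputWidth);
    assert(cropHeight > 0 && cropHeight <= inputHeight);
    CropWindow w = {
        .x = (inputWidth - cropWidth) / 2,
        .y = (inputHeight - cropHeight) / 2,
        .width = cropWidth,
        .height = cropHeight,
    };
    return w;
}
```

For the example above, `centered_crop(1280, 720, 200, 200)` yields the window (540, 260, 200, 200).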
Note that if you only want to scale the original image (from 1280x720 to 48x48) without cropping it first, you can simply omit the larodMap from the larodJobRequest altogether.
Image preprocessing backends may support the following common image processing operations. Backends are not required to support all operations and image formats.
| Operation | Description |
|---|---|
| Image crop | Crop out a part of an image. |
| Image scale | Scale up or down an image. |
| Image convert | Convert an image between two color formats. |
Image preprocessing backends may support the common image processing parameters in the tables below to describe processing jobs. Backends are not required to support all parameters and values.
The following are parameters that can be set on a larodMap provided when loading a model using e.g. larodLoadModel.
| Key | Value |
|---|---|
| image.input.format* | String, describing input image format. |
| image.input.size* | 2-integer-tuple, describing input image width and height. |
| image.input.row-pitch | Integer, describing input image width including padding, in bytes. Inferred if not explicitly given. |
| image.output.format* | String, describing output image format. |
| image.output.size* | 2-integer-tuple, describing output image width and height. |
| image.output.row-pitch | Integer, describing output image width including padding, in bytes. Inferred if not explicitly given. |
*: This parameter is mandatory for all preprocessing backends outlined in this document.
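The row pitches are inferred when they are not given. As a rough guide, the sketch below computes the tightly packed (unpadded) row pitch and total buffer size for the three formats used in this document. It assumes 8-bit samples and, for nv12, a full-height Y plane followed by a half-height interleaved UV plane. The helper names are hypothetical, and larod's actual inference rules may differ; an explicit row pitch larger than the packed value means that each row carries padding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers (not larod API), assuming unpadded rows. */

/* Bytes per row of the first (or only) plane; 0 for an unknown format. */
static size_t packed_row_pitch(const char* format, size_t width) {
    if (strcmp(format, "nv12") == 0) {
        return width;      /* Y plane: 1 byte per pixel */
    }
    if (strcmp(format, "rgb-interleaved") == 0) {
        return 3 * width;  /* R, G and B bytes side by side per pixel */
    }
    if (strcmp(format, "rgb-planar") == 0) {
        return width;      /* each of the R, G and B planes: 1 byte per pixel */
    }
    return 0;
}

/* Total buffer size in bytes for a row pitch and height; 0 if unknown. */
static size_t image_buffer_size(const char* format, size_t rowPitch,
                                size_t height) {
    if (strcmp(format, "nv12") == 0) {
        /* Y plane plus a UV plane with half as many rows (rounded up). */
        return rowPitch * height + rowPitch * ((height + 1) / 2);
    }
    if (strcmp(format, "rgb-interleaved") == 0) {
        return rowPitch * height;
    }
    if (strcmp(format, "rgb-planar") == 0) {
        return 3 * rowPitch * height;  /* three full-size planes */
    }
    return 0;
}
```

With these assumptions, the 1280x720 nv12 input from the example has a row pitch of 1280 bytes and occupies 1382400 bytes, while the 48x48 rgb-interleaved output has a row pitch of 144 bytes and occupies 6912 bytes.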
The following are parameters that can be set on a larodMap of a larodJobRequest. The map can be attached to a job request upon its creation (larodCreateJobRequest) or later using larodSetJobRequestParams. The parameters of the map will then be used in a subsequent call to e.g. larodRunJob using this job request.
Since these parameters are not attached to a model it's possible to send job requests having larodMaps with different values for these parameters to the same model.
| Key | Value |
|---|---|
| image.input.crop | 4-integer-tuple, describing the crop window. The elements in the tuple are: X offset in input image, Y offset in input image, crop window width, crop window height. |
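The crop window has to describe a region of the input image. Before building a job request, an application could sanity-check the tuple with something like the hypothetical `crop_fits()` below. larod and the backend perform their own validation; this sketch only assumes that a valid window has a positive size and lies entirely inside the input image.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pre-check (not larod API). Arguments follow the order of
 * "image.input.crop": X offset, Y offset, width, height. 64-bit arithmetic
 * keeps x + width and y + height from overflowing. */
static bool crop_fits(long long x, long long y, long long width,
                      long long height, long long imageWidth,
                      long long imageHeight) {
    if (x < 0 || y < 0 || width <= 0 || height <= 0) {
        return false;  /* negative offsets or empty windows are rejected */
    }
    return x + width <= imageWidth && y + height <= imageHeight;
}
```

The window from the example, (540, 260, 200, 200), fits in a 1280x720 image, whereas (1200, 0, 200, 200) would extend past the right edge.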
Currently the following image preprocessing backends/devices are supported by larod.
The libyuv backend uses the open source library libyuv. It runs on most CPUs and in particular uses the SIMD technology Neon on Arm architectures to accelerate parallelizable computation. It supports image crop, scale and format conversion.
The device name of this backend is "cpu-proc"; a device handle can be retrieved by providing this device name to larodGetDevice().
This backend only supports the fd access types LAROD_FD_PROP_MAP and LAROD_FD_PROP_READWRITE.
As the name indicates, the former fd prop allows the libyuv backend to map (using mmap) the tensor's file descriptor instead of reading from or writing to it. Combined with tensor tracking (e.g. using larodTrackTensor()), the libyuv backend may be able to cache a tensor's mapping and thus allow a very efficient zero-copy, map-once memory access pattern.
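The map-once idea can be sketched in plain POSIX C: the file descriptor is mapped on first use and the pointer is cached, so later jobs on the same buffer skip both mmap() and any copy. `MappedBuffer`, `mapped_buffer_get()`, `mapped_buffer_release()` and `open_scratch_fd()` are hypothetical names for illustration; this is not how larod implements tensor tracking internally.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical map-once cache for one buffer (not larod's implementation). */
typedef struct {
    int fd;       /* file descriptor backing the buffer */
    size_t size;  /* buffer size in bytes */
    void* addr;   /* cached mapping; NULL until first use */
} MappedBuffer;

/* Maps the fd on the first call and returns the cached pointer afterwards.
 * Returns NULL if mmap() fails. */
static void* mapped_buffer_get(MappedBuffer* buf) {
    if (buf->addr == NULL) {
        void* p = mmap(NULL, buf->size, PROT_READ | PROT_WRITE, MAP_SHARED,
                       buf->fd, 0);
        if (p == MAP_FAILED) {
            return NULL;
        }
        buf->addr = p;
    }
    return buf->addr;
}

/* Drops the cached mapping, e.g. when the buffer is no longer tracked. */
static void mapped_buffer_release(MappedBuffer* buf) {
    if (buf->addr != NULL) {
        munmap(buf->addr, buf->size);
        buf->addr = NULL;
    }
}

/* Scratch fd for trying the pattern without a real tensor: an unlinked
 * temporary file of the given size. Returns -1 on failure. */
static int open_scratch_fd(size_t size) {
    char path[] = "/tmp/mapped-buffer-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    unlink(path);
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Repeated calls to `mapped_buffer_get()` on the same buffer return the same pointer, so the cost of mapping is paid once per buffer rather than once per job.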
The access type LAROD_FD_PROP_READWRITE will introduce a memory copy and read()/write() calls for each input and output tensor buffer - these extra operations will degrade performance.
Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend will have file descriptors that are readable, writable and mappable. Accordingly the tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.
This backend supports the skip-input-dma-sync and skip-output-dma-sync parameters, see the documentation for dma-buf for more details.
The ACE backend uses Axis Compute Engine in Axis ARTPEC series chips. It only supports image format conversion.
The device name of this backend is "axis-ace-proc"; a device handle can be retrieved by providing this device name to larodGetDevice().
This backend only supports the fd access types LAROD_FD_PROP_MAP and LAROD_FD_PROP_READWRITE.
As the name indicates, the former fd prop allows the ACE backend to map (using mmap) the tensor's file descriptor instead of reading from or writing to it directly. Combined with tensor tracking (e.g. using larodTrackTensor()), the ACE backend may be able to cache a tensor's mapping. The backend does not support zero copy, meaning that data is still copied from the memory mapping to the actual buffer for the job.
The access type LAROD_FD_PROP_READWRITE will do a memory copy through read()/write() calls for each input and output tensor buffer - these extra operations will degrade performance.
Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend will have file descriptors that are readable, writable and mappable. Accordingly the tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.
This backend supports the skip-input-dma-sync and skip-output-dma-sync parameters, see the documentation for dma-buf for more details.
The VProc backend uses VPROC in Ambarella CV series chips. It supports image crop, scale and format conversion.
The device name of this backend is "ambarella-cvflow-proc"; a device handle can be retrieved by providing this device name to larodGetDevice().
If the input and output formats do not match, the operation will be significantly faster if the output width and height are both even. The larger the difference between the input's and the output's width and height, the larger the performance gain.
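A hypothetical pair of helpers for the VProc hint above (not larod API). Rounding the output size down is only an option when the size is yours to choose; if a neural network requires an exact odd input size, the output size cannot be changed.

```c
#include <assert.h>
#include <stdbool.h>

/* True if both output dimensions are even, which the VProc backend
 * processes faster when the formats differ. */
static bool dims_are_even(int width, int height) {
    return width % 2 == 0 && height % 2 == 0;
}

/* Rounds a positive dimension down to the nearest even value. A value of 1
 * is returned unchanged, since 0 is not a valid size. */
static int round_down_to_even(int value) {
    assert(value > 0);
    return value > 1 ? (value & ~1) : value;
}
```

For example, a 49x49 output could become 48x48 if the consumer of the output accepts that size.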
This backend supports the fd access types LAROD_FD_PROP_DMABUF, LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP.
The access type LAROD_FD_PROP_DMABUF has less overhead since the buffer is passed directly to the underlying processing framework without extra copies in larod. When using LAROD_FD_PROP_DMABUF, the application is responsible for ensuring that the external RAM is up to date with the CPU cache before the job is started (the service will not initiate any cache flush operations). Refer to About dma-buf for more information about dma-buf and user space synchronization.
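On Linux, user space typically keeps a dma-buf coherent with the CPU cache by bracketing CPU access with the DMA_BUF_IOCTL_SYNC ioctl from <linux/dma-buf.h>. The sketch below shows that pattern for writing input data. The helper names are hypothetical, and whether this is all a given platform needs is an assumption; consult About dma-buf for the recommended procedure.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Issues DMA_BUF_IOCTL_SYNC with the given flags. The kernel may ask the
 * caller to retry, so EINTR and EAGAIN are retried. Returns 0 on success,
 * or -1 with errno set on failure. */
static int dmabuf_sync(int fd, uint64_t flags) {
    struct dma_buf_sync sync = { .flags = flags };
    int ret;
    do {
        ret = ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));
    return ret;
}

/* Call before the CPU writes input data into the dma-buf... */
static int dmabuf_begin_cpu_write(int fd) {
    return dmabuf_sync(fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE);
}

/* ...and after writing, before the job using the buffer is started. */
static int dmabuf_end_cpu_write(int fd) {
    return dmabuf_sync(fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE);
}
```

In use, the tensor's dma-buf fd would be passed to `dmabuf_begin_cpu_write()`, the CPU would fill the mapped buffer, and `dmabuf_end_cpu_write()` would be called before running the job.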
The access type LAROD_FD_PROP_READWRITE will introduce a memory copy and read()/write() calls for each input and output tensor buffer - these extra operations will degrade performance.
As the name indicates, LAROD_FD_PROP_MAP allows the VProc backend to map (using mmap) the tensor's file descriptor instead of reading from or writing to it directly. Combined with tensor tracking (e.g. using larodTrackTensor()), the VProc backend may be able to cache a tensor's mapping. The backend does not support zero copy, meaning that data is still copied from the memory mapping to the actual buffer for the job.
When larodAllocModelInputs() and larodAllocModelOutputs() are used with a model loaded to this backend, two kinds of tensor buffers can be allocated. If LAROD_FD_PROP_READWRITE is not among the fd props required in the call, tensors with mappable file descriptors based on Cavalry Mem dma-bufs are returned; these tensors have the fd props LAROD_FD_PROP_MAP and LAROD_FD_PROP_DMABUF set. If LAROD_FD_PROP_READWRITE is required, tensors with readable, writable and mappable file descriptors are returned; these tensors have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.
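The VProc allocation rule can be modeled as a small function. The `FD_PROP_*` constants below are hypothetical stand-ins for the LAROD_FD_PROP_* flags in larod.h (their real values are not reproduced here). Returning 0 for a requirement that neither kind of buffer satisfies, such as READWRITE together with DMABUF, is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for LAROD_FD_PROP_* (real values are in larod.h). */
enum {
    FD_PROP_READWRITE = 1u << 0,
    FD_PROP_MAP       = 1u << 1,
    FD_PROP_DMABUF    = 1u << 2,
};

/* Models the fd props of tensors allocated for a VProc model, given the
 * fd props required in the allocation call. */
static uint32_t vproc_alloc_props(uint32_t requiredProps) {
    uint32_t props;
    if (requiredProps & FD_PROP_READWRITE) {
        props = FD_PROP_READWRITE | FD_PROP_MAP;  /* readable, writable, mappable fd */
    } else {
        props = FD_PROP_MAP | FD_PROP_DMABUF;     /* Cavalry Mem dma-buf */
    }
    /* Assumption: a requirement neither kind satisfies yields no buffer. */
    if (requiredProps & ~props) {
        return 0;
    }
    return props;
}
```

This also illustrates why a backend that supports several properties may not be able to allocate a single buffer having all of them at once.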
OpenCL is a compute framework that enables programmers to write programs that execute across heterogeneous platforms such as CPUs, GPUs and more. larod contains predefined OpenCL programs that let larod users conveniently run image crop, scale and format conversion through its OpenCL backends.
The platform larod runs on may have several devices supporting the OpenCL framework. larod can run its operations on any of these devices; each OpenCL device has a unique device name.
On ARTPEC-8 there are currently two available OpenCL backends; these are "axis-a8-dlpu-proc", which runs on the DLPU, and "axis-a8-gpu-proc", which runs on the GPU.
On ARTPEC-9 there is only one OpenCL backend: "a9-gpu-proc", which runs on the GPU.
Other platforms do not have any OpenCL backends.
A device handle to one of these backends can be retrieved by providing the respective device name to larodGetDevice().
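An application that runs on several platforms could keep the OpenCL device names in a lookup table and pass the chosen name to larodGetDevice(). The platform labels in the hypothetical sketch below are arbitrary strings chosen for illustration; how an application detects which chip it is running on is outside its scope.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* OpenCL preprocessing device names per platform, as listed above. The
 * platform labels are hypothetical strings chosen for this sketch. */
typedef struct {
    const char* platform;
    const char* deviceNames[2];
    size_t count;
} OpenClDevices;

static const OpenClDevices kOpenClDevices[] = {
    { "ARTPEC-8", { "axis-a8-dlpu-proc", "axis-a8-gpu-proc" }, 2 },
    { "ARTPEC-9", { "a9-gpu-proc", NULL }, 1 },
};

/* Returns the entry for a platform, or NULL if it has no OpenCL backends. */
static const OpenClDevices* opencl_devices_for(const char* platform) {
    size_t n = sizeof(kOpenClDevices) / sizeof(kOpenClDevices[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(kOpenClDevices[i].platform, platform) == 0) {
            return &kOpenClDevices[i];
        }
    }
    return NULL;
}
```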
These backends support the fd access types LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP.
Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to one of these backends will have file descriptors that are readable, writable and mappable. Accordingly the tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.
The remote backends are used in devices built with multiple ARTPEC chips. Such multi-ARTPEC devices are composed of one ARTPEC chip (the "primary system") running the bulk of the system logic, and another ARTPEC chip (the "secondary system") which can perform tasks on behalf of the primary system. The remote backends support executing preprocessing jobs on the secondary ARTPEC chip. They function the same as their corresponding non-remote counterparts, but run on the secondary ARTPEC system instead. See also the documentation for the respective non-remote backends.
| ARTPEC-8 |
|---|
| remote-cpu-proc |
| remote-axis-a8-gpu-proc |
All fd props are supported. The user is encouraged to use LAROD_FD_PROP_DMABUF for remote backends as it will yield optimal performance in terms of buffer handling.
Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to either of these backends will have file descriptors that are mappable. These file descriptors follow the rules and conventions of Linux's dma-buf API. As such, these tensors will have the fd props LAROD_FD_PROP_MAP and LAROD_FD_PROP_DMABUF set.
The following table describes supported operations for each backend.
| Backend | crop | convert | scale |
|---|---|---|---|
| libyuv | Yes | Yes | Yes |
| ACE | | Yes | |
| VProc | Yes | Yes | Yes |
| OpenCL | Yes | Yes | Yes |
| Remote libyuv | Yes | Yes | Yes |
| Remote OpenCL | Yes | Yes | Yes |
The following table describes supported input formats for operations requiring a color format conversion.
| Backend | nv12 | rgb-interleaved | rgb-planar |
|---|---|---|---|
| libyuv | Yes | Yes | Yes |
| ACE | Yes | ||
| VProc | Yes | ||
| OpenCL | Yes | ||
| Remote libyuv | Yes | Yes | Yes |
| Remote OpenCL | Yes |
The following table describes supported output formats for operations requiring a color format conversion.
| Backend | nv12 | rgb-interleaved | rgb-planar |
|---|---|---|---|
| libyuv | Yes | Yes | Yes |
| ACE | Yes | ||
| VProc | Yes | ||
| OpenCL | Yes | ||
| Remote libyuv | Yes | Yes | Yes |
| Remote OpenCL | Yes |
The following table describes supported image formats for operations not requiring a color format conversion, i.e. the input and output formats are identical. This could be e.g. a pure scaling operation.
| Backend | nv12 | rgb-interleaved | rgb-planar |
|---|---|---|---|
| libyuv | Yes | Yes | Yes |
| VProc | Yes | Yes | |
| OpenCL | Yes | Yes | |
| Remote libyuv | Yes | Yes | Yes |
| Remote OpenCL | Yes | Yes |
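To choose a backend at runtime, the table above can be encoded as data. The sketch below is a hypothetical illustration that uses the table's row labels (not larod device names) as keys; ACE is absent because it only performs format conversion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The "input and output formats identical" table, encoded as data. */
typedef struct {
    const char* backend;
    bool nv12;
    bool rgbInterleaved;
    bool rgbPlanar;
} SameFormatSupport;

static const SameFormatSupport kSameFormat[] = {
    { "libyuv",        true, true, true  },
    { "VProc",         true, true, false },
    { "OpenCL",        true, true, false },
    { "Remote libyuv", true, true, true  },
    { "Remote OpenCL", true, true, false },
};

/* True if the backend can process a format without converting it (e.g. a
 * pure scale or crop). Unknown backends or formats yield false. */
static bool supports_same_format(const char* backend, const char* format) {
    size_t n = sizeof(kSameFormat) / sizeof(kSameFormat[0]);
    for (size_t i = 0; i < n; i++) {
        const SameFormatSupport* s = &kSameFormat[i];
        if (strcmp(s->backend, backend) != 0) {
            continue;
        }
        if (strcmp(format, "nv12") == 0) return s->nv12;
        if (strcmp(format, "rgb-interleaved") == 0) return s->rgbInterleaved;
        if (strcmp(format, "rgb-planar") == 0) return s->rgbPlanar;
        return false;
    }
    return false;
}
```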
This is an overview of what file descriptor properties are supported by the various preprocessing backends. Note that the LAROD_FD_PROP_ prefix has been omitted from the table headers in the interest of brevity. Please see larod.h for more info about the LAROD_FD_PROP_* flags.
Please note that although several properties may be supported by a backend, a tensor buffer supplied for running a job only needs to have at least one of the backend's supported properties to be usable for the job. That said, each property has different implications for memory access performance.
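The rule above amounts to a bitmask intersection. In the sketch below, the `FD_PROP_*` constants are hypothetical stand-ins for the LAROD_FD_PROP_* flags in larod.h, and the preference order (DMABUF, then MAP, then READWRITE) is an assumption based on the performance notes for the individual backends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for LAROD_FD_PROP_* (real values are in larod.h). */
enum {
    FD_PROP_READWRITE = 1u << 0,
    FD_PROP_MAP       = 1u << 1,
    FD_PROP_DMABUF    = 1u << 2,
};

/* A tensor is usable if it shares at least one fd prop with the backend. */
static bool tensor_usable(uint32_t tensorProps, uint32_t backendProps) {
    return (tensorProps & backendProps) != 0;
}

/* Picks the shared prop assumed to be cheapest (DMABUF, then MAP, then
 * READWRITE); returns 0 if the tensor and backend share no prop. */
static uint32_t preferred_prop(uint32_t tensorProps, uint32_t backendProps) {
    uint32_t shared = tensorProps & backendProps;
    if (shared & FD_PROP_DMABUF) return FD_PROP_DMABUF;
    if (shared & FD_PROP_MAP) return FD_PROP_MAP;
    if (shared & FD_PROP_READWRITE) return FD_PROP_READWRITE;
    return 0;
}
```

For example, a tensor with MAP and DMABUF set is usable with libyuv (READWRITE and MAP) through MAP, while with VProc (all three) the DMABUF path would be preferred.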
| Backend | READWRITE | MAP | DMABUF |
|---|---|---|---|
| libyuv | Yes | Yes | |
| ACE | Yes | Yes | |
| VProc | Yes | Yes | Yes |
| OpenCL | Yes | Yes | |
| Remote libyuv | Yes | Yes | Yes |
| Remote OpenCL | Yes | Yes | Yes |
| Backend | READWRITE | MAP | DMABUF |
|---|---|---|---|
| libyuv | Yes | Yes | |
| ACE | Yes | Yes | |
| VProc | Yes | Yes | Yes |
| OpenCL | Yes | Yes | |
| Remote libyuv | Yes | Yes | Yes |
| Remote OpenCL | Yes | Yes | Yes |
Please note that though several properties may be supported by a backend, it may not be possible to allocate buffers having all the properties at the same time.
| Backend | READWRITE | MAP | DMABUF |
|---|---|---|---|
| libyuv | Yes | Yes | |
| ACE | Yes | Yes | |
| VProc | Yes | Yes | Yes |
| OpenCL | Yes | Yes | |
| Remote libyuv | Yes | Yes | Yes |
| Remote OpenCL | Yes | Yes | Yes |
| Backend | READWRITE | MAP | DMABUF |
|---|---|---|---|
| libyuv | Yes | Yes | |
| ACE | Yes | Yes | |
| VProc | Yes | Yes | Yes |
| OpenCL | Yes | Yes | |
| Remote libyuv | Yes | Yes | Yes |
| Remote OpenCL | Yes | Yes | Yes |
This is a summary of which backend-specific options are supported by each backend.
| Backend | Model Options | Job Options |
|---|---|---|
| cpu-proc | skip-input-dma-sync skip-output-dma-sync | |
| axis-ace-proc | skip-input-dma-sync skip-output-dma-sync | |
| axis-a7-gpu-proc | skip-input-dma-sync skip-output-dma-sync |
| Backend | Model Options | Job Options |
|---|---|---|
| cpu-proc | skip-input-dma-sync skip-output-dma-sync | |
| axis-ace-proc | skip-input-dma-sync skip-output-dma-sync | |
| axis-a8-gpu-proc | skip-input-dma-sync skip-output-dma-sync | |
| axis-a8-dlpu-proc | skip-input-dma-sync skip-output-dma-sync |
No backend specific options supported.
No backend specific options supported.