AI Inference on Qualcomm Platforms

This document provides an overview of the AI inference feature of the AVT SDK and a guide to configuring it. It covers the parallel inference architecture, the configuration file structure, and the supported models, to help developers build their own AI applications on Qualcomm-based AVerMedia devices.

Parallel Inference Architecture

As described in the AVT SDK Multimedia Framework section, the AVT SDK multimedia framework is built on top of Qualcomm GStreamer plugins. The AI inference component uses a GStreamer pipeline that runs multiple AI models in parallel on the same source image. The structure of the pipeline is shown below:

---
title: Pipeline Structure in AI Inference Component
---
flowchart LR
    A[Source Image] --> B[qtivtransform]
    B --> C[tee]
    C --> D1[1st inference pipeline] --> E[qtivcomposer]
    C --> D2[2nd inference pipeline] --> E[qtivcomposer]
    C --> D3[3rd inference pipeline] --> E[qtivcomposer]
    C --> D4[4th inference pipeline] --> E[qtivcomposer]

The inference results are sent to the same qtivcomposer and displayed in a split-screen layout.


2-model parallel inference

4-model parallel inference
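
The tee-and-composer structure shown above can be written as a gst-launch style description string. The sketch below is only an illustration, not the pipeline the SDK actually builds. The source, sink, and inference branch strings are hypothetical placeholders, the queue element after each tee branch is an assumption (a common GStreamer practice), and the element names split and mixer are arbitrary. Only qtivtransform, tee, and qtivcomposer come from the diagram.

```python
from typing import Sequence


def build_parallel_pipeline(source: str, branches: Sequence[str], sink: str) -> str:
    """Return a gst-launch style description with one tee branch per model.

    `source`, `sink`, and every entry of `branches` are placeholder strings
    supplied by the caller; this function does not know any real element
    properties and only wires the pieces together.
    """
    if not branches:
        raise ValueError("at least one inference branch is required")

    # The source image is transformed once, then fanned out by the tee.
    parts = [f"{source} ! qtivtransform ! tee name=split"]

    # The composer is declared once and referenced by name from every branch.
    parts.append(f"qtivcomposer name=mixer ! {sink}")

    # Each branch takes a copy of the frame from the tee and feeds the composer.
    # The queue is an assumption, added so that branches do not block each other.
    for branch in branches:
        parts.append(f"split. ! queue ! {branch} ! mixer.")

    return " ".join(parts)
```

Calling `build_parallel_pipeline("SOURCE", ["BRANCH_1", "BRANCH_2"], "SINK")` yields a description with two `split.` branches that both end in `mixer.`, which mirrors the 2-model case. Passing four branch strings gives the 4-model case.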


Configuration files

As shown above, the AI inference component supports running multiple inference tasks in parallel. Each inference task is configured with a separate configuration file. The format of each configuration file is as follows:

det_yolov8_q_tfl.txt
modelPath=/opt/demo/model/yolov8_quantized.tflite
labelPath=/opt/demo/label/coco_labels.labels
module=yolov8
constant=YOLOv8,q-offsets=<21.0, 0.0, 0.0>,q-scales=<3.093529462814331, 0.00390625, 1.0>;
threshold=40.0
numResult=10
postprocess=object_detection
framework=tflite
device=external
  • modelPath: The complete path of the AI model being used.
  • labelPath: The complete path of the label file for the selected AI model.
  • module: There are several dedicated modules for specific models within Qualcomm's post-processing plugins. You must choose the correct module corresponding to the model in use. See the table in the Supported Models section for more details.
  • constant: If the model is quantized, the parameters for offsets and scales are required in this field.

    Check quantization parameters with Netron

    1. Visit Netron and drag a TensorFlow Lite (.tflite) model into the window to view the model structure.
    2. Click the top node, such as the node named "image" in Figure 1. A "GRAPH PROPERTIES" panel then appears on the right side of the window.

      Figure 1: Model structure in Netron.

    3. In the "OUTPUTS" section of the panel, as shown in Figure 2, check the "quantization" part of each output.

      Figure 2: Outputs section in the model (graph) properties panel.

      Figure 2 gives the following scale and offset for each output:

      | Output | Scale | Offset |
      | --- | --- | --- |
      | boxes | 3.093529462814331 | 21.0 |
      | scores | 0.00390625 | 0.0 |
      | class_idx | 1.0 | 0.0 |

      Note: All the values for scales and offsets should be written as floats.

    4. In the configuration file, write the constant field in the following format:

      (arbitrary name),q-offsets=<values>,q-scales=<values>;
      

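      For example, for the model in Figure 2, the constant field is written as follows, with the values listed in the same order as the outputs in the table above (boxes, scores, class_idx):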

      YOLOv8,q-offsets=<21.0, 0.0, 0.0>,q-scales=<3.093529462814331, 0.00390625, 1.0>;
      
  • threshold: A value, between 0.0 and 100.0, used to filter out low-confidence detections.

  • numResult: Decides the number of results to display.

    Note

    For segmentation, neither threshold nor numResult is used, but both must still be included in the config file.

  • postprocess: Specifies the task type. The matching post-processing plugin is then selected automatically. The supported task types and their post-processing plugins are listed in the table below.

    | Task type | Post-processing plugin |
    | --- | --- |
    | object_detection | qtimlvdetection |
    | classification | qtimlvclassification |
    | monodepth | qtimlvsegmentation |
    | segmentation | qtimlvsegmentation |
    | pose_detection | qtimlvpose |

    Supported task types and corresponding post-processing plugins. Note that the depth estimation task (monodepth) also uses the qtimlvsegmentation plugin.

  • framework and device: Specify the AI framework and the target hardware device. The currently supported combinations are listed below.

    • Currently, htp is the only device option available for QNN.
    • For the best speed-accuracy tradeoff, choose the external device for tflite models and the dsp device for snpe models.
    • Other device options are available but may not provide optimal performance. You can still try them by modifying the config file.

    | Framework | Available devices |
    | --- | --- |
    | tflite | none, gpu, external |
    | qnn | htp |
    | snpe | cpu, gpu, dsp |
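
The field rules above can be checked before a configuration file is deployed. The following is a minimal sketch, not part of the AVT SDK: the function names are hypothetical, and treating every field in the sample except constant as required is an assumption based on the sample file. It builds the constant field from the values read in Netron, parses key=value lines, and checks the threshold range, the task type, and the framework and device combination against the tables above.

```python
# Allowed values copied from the tables in this document.
ALLOWED_DEVICES = {
    "tflite": {"none", "gpu", "external"},
    "qnn": {"htp"},
    "snpe": {"cpu", "gpu", "dsp"},
}
TASK_TYPES = {"object_detection", "classification", "monodepth",
              "segmentation", "pose_detection"}
# Assumption: all sample fields except "constant" are treated as required.
REQUIRED_KEYS = ("modelPath", "labelPath", "module", "threshold",
                 "numResult", "postprocess", "framework", "device")


def build_constant(name, offsets, scales):
    """Format the constant field; every value is written as a float."""
    if len(offsets) != len(scales):
        raise ValueError("offsets and scales need one entry per output")
    offsets_text = ", ".join(repr(float(v)) for v in offsets)
    scales_text = ", ".join(repr(float(v)) for v in scales)
    return f"{name},q-offsets=<{offsets_text}>,q-scales=<{scales_text}>;"


def parse_config(text):
    """Parse key=value lines into a dict, skipping lines without '='."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        # Split on the first '=' only: the constant value contains '=' itself.
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config


def validate_config(config):
    """Return a list of problems; an empty list means all checks passed."""
    problems = [f"missing field: {key}" for key in REQUIRED_KEYS
                if key not in config]
    if problems:
        return problems
    try:
        threshold_ok = 0.0 <= float(config["threshold"]) <= 100.0
    except ValueError:
        threshold_ok = False
    if not threshold_ok:
        problems.append("threshold must be a number between 0.0 and 100.0")
    if config["postprocess"] not in TASK_TYPES:
        problems.append(f"unknown postprocess: {config['postprocess']}")
    allowed = ALLOWED_DEVICES.get(config["framework"], set())
    if config["device"] not in allowed:
        problems.append(
            f"device {config['device']} is not listed for "
            f"framework {config['framework']}")
    return problems
```

For the sample file, `build_constant("YOLOv8", [21, 0, 0], [3.093529462814331, 0.00390625, 1])` reproduces the constant line shown earlier, and `validate_config(parse_config(text))` returns an empty list. The checks only cover what this document states; they do not confirm that the model or label paths exist on the device.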

Supported Models

(This list was last updated on 2025/3/20)

All the models listed below have been verified to be compatible with Qualcomm GStreamer post-processing plugins and are therefore compatible with the AVT SDK. You can find them in Qualcomm AI Hub. Using .tflite models is highly recommended because, unlike .bin models, they are not affected by QNN version compatibility issues.

| Task Type | Model Name | Module |
| --- | --- | --- |
| Object detection | Yolo-v7-Quantized | yolov8 |
| Object detection | YOLOv8-Detection-Quantized | yolov8 |
| Object detection | Yolo-NAS-Quantized | yolo-nas |
| Object detection | YOLOv11-Detection-Quantized | yolov8 |
| Object detection | Person-Foot-Detection-Quantized | qpd |
| Depth estimation | Midas-V2-Quantized | midas-v2 |
| Segmentation | DeepLabV3-Plus-MobileNet-Quantized | deeplab-argmax |
| Segmentation | FFNet-40S-Quantized | deeplab-argmax |
| Segmentation | FFNet-54S-Quantized | deeplab-argmax |
| Segmentation | FFNet-78S-Quantized | deeplab-argmax |
| Pose estimation | HRNetPoseQuantized | hrnet |

Supported models for each task type, and the corresponding module name to be used in the module field of the configuration file.

Models not listed above may still work, but they are not guaranteed to be compatible with the AVT SDK.
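
When generating configuration files, the table above can be kept as a small lookup so that the module and postprocess fields stay consistent with the model in use. This is a minimal sketch with a hypothetical helper name. The mapping from the table's task types to postprocess values (for example, "Pose estimation" to pose_detection and "Depth estimation" to monodepth) is an assumption drawn from the task type table in the Configuration files section.

```python
# Model name -> (postprocess value, module), copied from the table above.
SUPPORTED_MODELS = {
    "Yolo-v7-Quantized": ("object_detection", "yolov8"),
    "YOLOv8-Detection-Quantized": ("object_detection", "yolov8"),
    "Yolo-NAS-Quantized": ("object_detection", "yolo-nas"),
    "YOLOv11-Detection-Quantized": ("object_detection", "yolov8"),
    "Person-Foot-Detection-Quantized": ("object_detection", "qpd"),
    "Midas-V2-Quantized": ("monodepth", "midas-v2"),
    "DeepLabV3-Plus-MobileNet-Quantized": ("segmentation", "deeplab-argmax"),
    "FFNet-40S-Quantized": ("segmentation", "deeplab-argmax"),
    "FFNet-54S-Quantized": ("segmentation", "deeplab-argmax"),
    "FFNet-78S-Quantized": ("segmentation", "deeplab-argmax"),
    "HRNetPoseQuantized": ("pose_detection", "hrnet"),
}


def fields_for_model(model_name):
    """Return the module and postprocess fields for a verified model."""
    try:
        postprocess, module = SUPPORTED_MODELS[model_name]
    except KeyError:
        # Unlisted models may still work, but they have not been verified.
        raise KeyError(
            f"{model_name!r} is not in the verified list; "
            "compatibility is not guaranteed") from None
    return {"module": module, "postprocess": postprocess}
```

Note that several models share a module: Yolo-v7-Quantized and YOLOv11-Detection-Quantized both use yolov8, so the module cannot be derived from the model name alone.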

Further Reading