Tutorial

AI on the QL601: Bringing Your Models to Life Fast with LiteRT on Python


Imagine you’ve trained an amazing AI model on your workstation. Now, it’s time for it to shine on the QL601 edge device. Whether you want HTP acceleration, GPU power, or just CPU execution, LiteRT on Python makes this transition seamless.

Deploying AI models to edge devices often means rewriting your entire pipeline, learning new frameworks, or accepting significant performance compromises. But what if you could keep your Python workflow intact while unlocking 10-20x performance gains?

LiteRT acts as the bridge between your existing Python workflow and Qualcomm’s optimized hardware. Instead of rewriting your pipeline, you convert your model to TFLite, attach the LiteRT delegate, and keep your preprocessing, postprocessing, and business logic intact. With only minor changes to the inference step, your Python app can continue running on your laptop or in the cloud while the QL601 handles fast, edge-ready inference.

LiteRT doesn’t rewrite your story—it simply makes your model run faster in the real world.
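
As a rough illustration of the "attach the delegate, keep everything else" idea, here is a minimal sketch. It assumes the `ai_edge_litert` Python package is installed on the device, that the Qualcomm delegate ships as `libQnnTFLiteDelegate.so`, and that it accepts a `backend_type` option; the library name, the option key, and the helper names are assumptions for illustration, so check them against your QL601 image.

```python
# Assumed delegate library name; verify it on your QL601 image.
QNN_DELEGATE_LIB = "libQnnTFLiteDelegate.so"


def delegate_options(backend):
    """Map a backend name to delegate options; None means plain CPU execution."""
    backend = backend.lower()
    if backend == "cpu":
        return None
    if backend in ("htp", "gpu"):
        return {"backend_type": backend}  # assumed option key
    raise ValueError(f"unknown backend: {backend}")


def make_interpreter(model_path, backend="htp"):
    """Build an interpreter for a .tflite file, attaching the delegate if requested."""
    # Imported lazily so the rest of the app still runs where LiteRT is not installed.
    from ai_edge_litert.interpreter import Interpreter, load_delegate

    options = delegate_options(backend)
    delegates = [] if options is None else [load_delegate(QNN_DELEGATE_LIB, options)]
    interpreter = Interpreter(model_path=model_path, experimental_delegates=delegates)
    interpreter.allocate_tensors()
    return interpreter


def run_once(interpreter, input_array):
    """Run one inference; pre- and postprocessing stay in your own code."""
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    interpreter.set_tensor(inp["index"], input_array)
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])
```

Only `make_interpreter` knows about the hardware. Switching between HTP, GPU, and CPU is then a one-argument change, and the surrounding pipeline is untouched.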

AI Fusion Kit Quick Start Guide

AVerMedia AI Fusion Kit is an all-in-one solution for LLM/VLM developers. It consists of a powerful AI box PC, a 4K camera, and an AI speakerphone, allowing you to easily build your own multimodal AI applications. This guide will walk you through the steps to get started with the AI Fusion Kit.

Time to First Token

Time to First Token (TTFT) is the latency between the moment a user presses the Enter key and the moment the first character of the response appears on the screen. Excessive TTFT can greatly diminish the overall user experience.

TTFT is a crucial responsiveness indicator for online interactive applications powered by a large language model (LLM), as it reflects how quickly users see the first character from the model in a web page.

Here, we will explore two simple ways to measure the first-token latency of a language model.
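
To make the definition concrete, the sketch below times how long a streaming response takes to yield its first token. It is not one of the two methods from the guide; `fake_stream` is a hypothetical stand-in for a real streaming LLM client, and the sketch assumes the request is only sent when the stream is first read.

```python
import time


def measure_ttft(token_stream):
    """Return (first_token, seconds_elapsed) for the first item the stream yields."""
    start = time.perf_counter()
    first = next(iter(token_stream))
    return first, time.perf_counter() - start


def fake_stream(delay_s=0.05):
    # Hypothetical stand-in for a streaming LLM client: the delay models
    # prompt processing before the first token is produced.
    time.sleep(delay_s)
    yield "Hello"
    yield " world"


token, ttft = measure_ttft(fake_stream())
print(f"first token {token!r} after {ttft * 1000:.1f} ms")
```

With a real client, pass its streaming iterator to `measure_ttft` in place of `fake_stream()`. If the client sends the request before you start iterating, start the timer before that call instead.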

How to Set Up the QL601 Development Environment

The QL601 is a powerful single-board computer equipped with the Qualcomm® QCS6490 chipset and the AVerMedia software stack, helping developers build AI-powered multimedia applications.

In this tutorial, we will guide you through setting up the QL601 development environment so that you can get started quickly.

How to Download Qualcomm AI Hub Models

Qualcomm AI Hub provides various AI models optimized for Qualcomm devices. This guide introduces two methods for downloading these models:

  • Through the Qualcomm AI Hub website.
  • Using the Python package qai-hub-models.

You'll learn how to access all the Qualcomm-provided models, including those with licensing restrictions like YOLOv8 and YOLOv11.