AI Fusion Kit Quick Start Guide
AVerMedia AI Fusion Kit is an all-in-one solution for LLM/VLM developers. It consists of a powerful AI box PC, a 4K camera, and an AI speakerphone, allowing you to easily build your own multimodal AI applications. This guide will walk you through the steps to get started with the AI Fusion Kit.
Note
If not explicitly mentioned, all the commands are expected to be run on the box PC (the Jetson device).
Run the Setup Script
The AI Fusion Kit comes with a pre-installed NVIDIA Jetson Linux (L4T 36.4.3 or L4T 36.4.4) operating system, based on Ubuntu 22.04 LTS. You can directly use the GUI by connecting keyboard, mouse, and monitor to the box PC, or you can also get access to the box PC via SSH if you prefer a headless setup.
To quickly get started, we recommend that you download our quick start scripts from GitHub:
git clone https://github.com/AVerMedia-Technologies-Inc/ai-fusion-kit-quick-start.git
cd ai-fusion-kit-quick-start
In the directory, you will find the following files:
- setup.sh: The setup script for the AI Fusion Kit.
- run_demo.sh: The convenience script to run the demo applications with Docker Compose.
- stop_demo.sh: The convenience script to stop the demo applications.
- compose.yaml: The Docker Compose file for the demo applications.
We will first focus on the setup script, which automates the setup process for you, including:
- Configure the NVIDIA Jetson power mode to the best performance (MAXN or MAXN Super)
- Configure udev rules for the AVerMedia HID devices
- Install NVIDIA JetPack SDK components (full or necessary components)
- Install Docker and Docker Compose
- Install jetson-containers
- Install AI Fusion Kit demo applications (optional)
All these steps can be done with a single command:
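Based on the file listing above, this is presumably the setup.sh script; the exact invocation (working directory, sudo, arguments) may differ, so check the repository README if in doubt:
./setup.sh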
Although the script completes most of the setup work automatically, it will prompt you to make some choices. Please follow the instructions to complete the setup.
The setup script will prompt you to make some choices.
Install NVIDIA Riva (Optional)
NVIDIA Riva is a GPU-accelerated SDK for speech AI applications. With Riva microservice containers, developers can easily integrate features like ASR and TTS into their applications. Although AI Fusion Kit demo applications can definitely work without Riva, installing Riva will unlock the full power of them, enabling the speech AI features.
NVIDIA Riva SDK is hosted on NVIDIA GPU Cloud (NGC), which requires an NVIDIA Cloud Account (NCA) to access. Therefore, we are not able to install it for you, but we will guide you through the process.
Access to NVIDIA Riva
At the time of writing (2025-08-29), NVIDIA Developer Program provides free access to NVIDIA Riva software for non-production purposes including internal evaluation, development, and testing. However, both the access to Riva and the process of getting access to Riva might change over time. Please refer to the official documentation for the latest information.
Create/Get your NVIDIA Cloud Account
NCA is NVIDIA's system for managing how different people in an organization access NVIDIA cloud services. However, individual users can also create an NCA to access NGC.
- Go to NGC Catalog
- Click on "Sign In / Sign Up" in the menu at the top right corner
- Sign in or create your NVIDIA account
- The website will ask you to create an NCA. Feel free to name it whatever you like.
The NCA creation page
- If you just created a new NVIDIA account, you might be asked to fill in your personal information.
If your organization already has an NCA, please contact its administrator, who should be able to add you to the NCA. For how to add users to an NCA, please refer to the NGC User Guide.
Install NGC CLI on Jetson
NGC CLI is a command-line tool that allows you to access NGC resources. It is the recommended way to install Riva.
Download NGC CLI
- Go to the NGC CLI Download Page.
  - If the link is not working, you can search for "NGC CLI" in the NGC Catalog.
- Find the arm64 Linux version of NGC CLI. The file name should be like ngccli_arm64.zip.
- Click the "..." button in the "Actions" column and copy the curl or wget command.
Get the download command for NGC CLI from NGC Catalog
- Download the NGC CLI by running the command on the box PC. For example:
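The copied command will look roughly like the following (the URL here is only a placeholder; use the exact command you copied in the previous step):
wget --content-disposition "<URL copied from the NGC Catalog>" -O ngccli_arm64.zip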
- Unzip and install the NGC CLI package to the desired location, for example /opt, and then add the installation location to your PATH environment variable.
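A minimal sketch, assuming the archive unpacks into an ngc-cli directory (the exact layout may vary between CLI releases):
sudo unzip ngccli_arm64.zip -d /opt              # unpacks to /opt/ngc-cli (assumed layout)
sudo chmod u+x /opt/ngc-cli/ngc                  # make sure the binary is executable
echo 'export PATH="$PATH:/opt/ngc-cli"' >> ~/.bashrc
source ~/.bashrc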
- Test if the installation is successful.
You should see the version of NGC CLI like this:
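(The version number below is illustrative; yours will match the release you downloaded.)
$ ngc --version
NGC CLI 3.x.x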
Set up NGC CLI
Detailed instructions for generating API key
You can also refer to the NGC User Guide for the detailed instructions with screenshots.
To make the NGC CLI work, you need to set up the authentication.
- Go to NGC Catalog, sign in, and click "Setup" in the menu at the top right corner
- In the "Setup" page, click "Generate API Key".
- In the "API Keys" page, click "Generate API Key" and generate a new API key (or use an existing one). Note that the key permissions should include "NGC Catalog".
- Run the following command on your Jetson device:
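This is presumably the standard NGC CLI configuration command, which starts an interactive prompt:
ngc config set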
Follow the NGC CLI prompts to finish the setup. It will ask for the API key, organization name, etc. (The organization name might be an auto-generated long number; NGC CLI will list the possible choices in the prompt, so you can just copy and paste it.)
Install NVIDIA Riva
Here we use version 2.19.0 (the latest version at the time of writing) of NVIDIA Riva as an example.
- Download the Riva quick start scripts. You can download them to the home directory to avoid permission issues.
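Presumably an NGC CLI download along these lines (the resource name follows NVIDIA's Riva quick start for arm64; double-check the exact command on the Riva quick start page in the NGC Catalog):
cd ~
ngc registry resource download-version "nvidia/riva/riva_quickstart_arm64:2.19.0"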
- Take a look at the config.sh file in the riva_quickstart_arm64_v2.19.0 directory and modify the configuration if needed. By default, the ASR and TTS services are enabled for the en-US language, and the models will be stored in riva_quickstart_arm64_v2.19.0/model_repository.
Enter the
riva_quickstart_arm64_v2.19.0
directory, run theriva_init.sh
script to install the Riva containers and models. After the script finishes, you should be able to start the Riva services by running theriva_start.sh
script. -
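A minimal sketch of that step (the script names come from the Riva quick start; riva_init.sh can take a long time to download the containers and models):
cd riva_quickstart_arm64_v2.19.0
bash riva_init.sh    # installs the Riva containers and models
bash riva_start.sh   # starts the Riva services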
By default, the
riva_start.sh
script will open a terminal inside the container for you to test the Riva services. However, the example scripts inside may not function as expected on Jetson. For Python examples, we recommend you to run theriva-client:python
container provided byjetson-containers
. See Jetson Containers section for more details.
Run the Demo Applications
Basic Usage
About echo cancellation on the speakerphone
The AI speakerphone is equipped with a nice echo cancellation feature. However, the algorithm only starts working after a few seconds of sound have been played from the speakerphone. Therefore, you may need to play some sound before running the demo, or the speech AI may hear itself.
Currently, only one demo application is available for the AI Fusion Kit. It is a multi-container application that consists of:
- A local vLLM server for LLM/VLM inference
- (Optional) A local NVIDIA Riva server providing ASR and TTS services
- A GUI application built with AVerMedia software stack, handling the multimedia stream and providing the user interface
If you have chosen to install the demo applications during the setup process, the containers and the VLM model (by default, Qwen/Qwen2.5-VL-3B-Instruct-AWQ) should have been installed automatically. You can launch the demo application by running the following command:
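Presumably this is the run_demo.sh convenience script from the repository root:
./run_demo.sh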
What does the run_demo.sh script do?
If you take a look at the script, you will find that it basically does the following:
- Detect the video devices and HID devices connected to the box PC
- Find the installation path of jetson-containers, which should contain the VLM model
- Find the installation path of Riva models
- Write the above information to the compose.override.yaml or the .env files
- Start the containers with Docker Compose
The script automates the configuration that can vary between launches, such as device paths (which may change based on USB connection order) and installation directories (which depend on user preferences or system configuration).
Troubleshooting
After you run the script, you should see the Docker Compose status like:
[+] Running 3/3
⠏ Container demo-llm-server-1 Waiting 5.0s
✔ Container demo-riva-server-1 Healthy 2.3s
⠏ Container demo-app-1 Waiting 4.9s
where demo is the name of this Docker Compose project, and llm-server, riva-server, and app are the names of the "services" in the compose.yaml file.
If any of the services fails to start, you can check the logs by running the following command:
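This is most likely a docker compose logs invocation, run from the ai-fusion-kit-quick-start directory (the service name below is a placeholder):
docker compose logs <service-name>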
For example, if the llm-server service fails to start, you can run:
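Assuming the project's compose file is picked up from the current directory:
docker compose logs llm-server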
The GUI application should launch shortly. You will find the live camera feed and the VLM output in the left column, with various configuration controls in the right column.
In the "Devices" tab, you can select the video and audio devices and start the camera.
The AI features are located in the "Models" tab. Here you can check the status of the AI microservices, set the VLM prompt, and start the VLM inference. The demo is a live image captioning application that describes the live camera feed in real time. Typically, it takes several minutes for vLLM to start the server, so you will have to wait a while before the LLM service is reported as healthy.
The prompt is usually a question for the model to answer based on the live camera feed. You can update the prompt at any time, either by typing or speaking, even while the inference is running.
- Type the prompt in the text box
- Click the "Update" button for the prompt to take effect
NVIDIA Riva Required
NVIDIA Riva is required for the speech AI features. You will not be able to click the "Listen" button if you have not installed Riva.
- Click the "Listen" button, and you should hear the speech AI saying "I'm listening..."
- Speak the prompt
- The speech AI will repeat the prompt back to you and automatically update the prompt (you don't need to click the "Update" button)
In the "Models" tab, you can check the status of the AI microservices and control the inference. The live camera feed and the VLM output are displayed in the left.
Customize the Configuration
As mentioned earlier, the demo application is managed with Docker Compose. Currently, except for those settings dynamically generated by the run_demo.sh script, all the configuration is done in the compose.yaml file. It is a standard Docker Compose file, written in YAML and following the Compose Specification. You can refer to the Docker Compose File Reference for more details.
Below we will introduce the most common configuration changes you might want to make. After you make any changes, if the demo application is running, you can apply them by running the following command in the ai-fusion-kit-quick-start directory:
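Most likely this is docker compose up (Docker Compose will recreate only the services whose configuration changed); the exact flags the project expects may differ:
docker compose up -d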
Docker Compose is smart enough to only restart the services that have changed, so you don't need to worry about restarting the whole application. If the demo application is not running, you can simply run the run_demo.sh script.
Change the VLM Model
The model is directly specified in the command attribute of the llm-server service. By default, it is:
command: vllm serve Qwen/Qwen2.5-VL-3B-Instruct-AWQ
--host=0.0.0.0
--port=9000
--max-num-seqs=1
--max-model-len=2048
--trust-remote-code
--chat-template-content-format=openai
--gpu-memory-utilization=0.35
--uvicorn-log-level=debug
--limit-mm-per-prompt='{"image":1,"video":0}'
--mm-processor-kwargs='{"max_pixels":200704,"size":{"shortest_edge":3136,"longest_edge":200704}}'
You can replace the model name (Qwen/Qwen2.5-VL-3B-Instruct-AWQ) with your desired model. Here are some considerations:
Model Selection:
- Popular alternatives include llava-hf/llava-v1.6-mistral-7b-hf, microsoft/Phi-3.5-vision-instruct, or other vision-language models
- Check Hugging Face Model Hub for available VLM models
- You'll have to download the model in advance, because the TRANSFORMERS_OFFLINE environment variable is set in compose.yaml by default. You can download the model with a command like the one shown right after this list; it will download the model to the data directory in the installation path of jetson-containers.
Configuration Adjustments:
- Memory Allocation: Adjust --gpu-memory-utilization based on model size
- Processor Parameters: Update or remove --mm-processor-kwargs for the specific model's processor requirements
- Quantization: Qwen2.5-VL-3B-Instruct-AWQ is already quantized to AWQ format. For unquantized models, you'll probably need to quantize them on the fly with --quantization=fp8 or --quantization=bitsandbytes.
You can also check the model cards on Jetson AI Lab as a reference.
Memory Planning:
- Operating system (with GUI): ~2-3 GB
- Riva service (ASR/TTS for en-US): ~2-2.5 GB
- VLM model: varies by model size and quantization method
Be especially careful with memory usage if you are using an Orin NX based platform!
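To keep an eye on memory usage while everything is running, the standard tools are enough (tegrastats ships with JetPack; this is just a suggestion, not part of the kit's scripts):
free -h           # overall memory usage
sudo tegrastats   # Jetson-specific stats, including RAM and GPU load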
Change the Container Images
In some cases, you may want to use a different container image for a service in the demo, such as:
- Using a newer version of vLLM or Riva
- Switching to a different LLM inference engine like Ollama
- Using a custom image built by yourself
Steps to change an image:
- Download the new image with docker pull or build it yourself
- Find the service (llm-server or riva-server) you want to change in the compose.yaml file
- Change the image attribute of the service to the new image
- If you change the inference engine for llm-server, you may also need to change the command attribute
API Compatibility Requirements
To work with the demo application, the LLM inference engine must support these OpenAI-compatible API endpoints:
- /v1/models: List available models
- /v1/chat/completions: Handle chat completion requests
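A quick sanity check for a replacement engine is to hit one of those endpoints directly, assuming it listens on port 9000 like the default configuration:
curl http://0.0.0.0:9000/v1/models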
More Development Resources
Jetson Containers
jetson-containers is a great open-source project for Jetson developers. It provides a set of pre-built Docker images for Jetson devices on Docker Hub, and tools to help you build the images you need. These images are built for general development purposes, so typically they are big in size and contain many tools, whether you need them or not. We highly encourage you to use the actively maintained images provided by jetson-containers in the early stage of your development, like dustynv/vllm, dustynv/ollama, etc., for they save you a lot of time, especially if you are not familiar with either NVIDIA Jetson or Docker.
Actually, the demo application is also built with dustynv/vllm, a pre-built image for vLLM. You can also directly run the container without launching the demo application. For example, you can start the vLLM server by running the following command:
jetson-containers run $(autotag vllm) vllm serve Qwen/Qwen2.5-VL-3B-Instruct-AWQ \
--host=0.0.0.0 \
--port=9000 \
--max-num-seqs 1 \
--disable-mm-preprocessor-cache \
--trust-remote-code \
--max-model-len=2048 \
--gpu-memory-utilization=0.35 \
--limit-mm-per-prompt='{"image":1,"video":0}' \
--mm-processor-kwargs='{"max_pixels":200704,"size":{"shortest_edge":3136,"longest_edge":200704}}' \
--uvicorn-log-level=debug
jetson-containers will handle all the options you need to pass to docker run for you, and autotag is a tool provided by jetson-containers to help you find the latest image tag on your device.
After the server is ready, you can test it by either sending a request with curl or using the openai Python package. It takes a while for the first inference of the server to finish, but the subsequent inferences should be much faster.
Example provided by Jetson AI Lab.
curl http://0.0.0.0:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{
      "role": "user",
      "content": [{
        "type": "text",
        "text": "What is in this image?"
      },
      {
        "type": "image_url",
        "image_url": {
          "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
        }
      }]
    }],
    "max_tokens": 300
  }'
The openai Python package is a popular library for interacting with the OpenAI API. Again, the following example is provided by Jetson AI Lab.
import base64
import requests
import openai

# Connect to the local vLLM server (no real API key is needed)
client = openai.OpenAI(
    base_url='http://0.0.0.0:9000/v1',
    api_key='dummy',  # Local server doesn't require an API key
)

models = [model.id for model in client.models.list()]
print(f"Models from server {client.base_url}: {models}")

url = "https://raw.githubusercontent.com/dusty-nv/jetson-containers/refs/heads/dev/data/images/dogs.jpg"
txt = "What kind of dogs are these?"  # the image shows a husky and golden retriever

img = requests.get(url)

messages = [{
    'role': 'user',
    'content': [
        { 'type': 'text', 'text': txt },
        {
            'type': 'image_url',
            'image_url': {
                'url': 'data:' + img.headers['Content-Type'] + ';' + 'base64,' + base64.b64encode(img.content).decode()
            },
        },
    ],
}]

stream = client.chat.completions.create(
    model=models[0],
    messages=messages,
    stream=True
)

for chunk in stream:
    print(chunk.choices[0].delta.content, end='', flush=True)
print()
Jetson AI Lab
Jetson AI Lab examples might be deprecated
Some examples in Jetson AI Lab are problematic and do not work as expected at the moment. We plan to document the known issues in the future.
Jetson AI Lab is a great website for Jetson developers, providing rich examples and tutorials. Nevertheless, the examples and tutorials are not actively maintained at the moment, and one of the main contributors of jetson-containers has stated that the examples are outdated (on Jun 27, 2025).
Many examples in Jetson AI Lab are based on the NanoLLM library, which is also inactive now. Therefore, we recommend using other popular third-party libraries instead, like vLLM, Ollama, SGLang, etc. The model cards in Jetson AI Lab can still be very helpful for running VLM models with vLLM or Ollama, but be aware that the default model saving location (the "Cache Dir" option in the card) is different from the one used by jetson-containers.
Ready to Start Your AI Journey?
If you've made it this far through our comprehensive guide, you've seen the incredible potential of the AVerMedia AI Fusion Kit. Whether you're a researcher, developer, or AI enthusiast, this all-in-one solution provides everything you need to bring your AI applications to life.
Why Choose the AI Fusion Kit?
All-in-One AI Solution
- Powerful AI Box PC: Pre-configured NVIDIA® Jetson with optimized performance
- Professional 4K Camera: High-quality video input for vision applications
- Smart AI Speakerphone: Built-in noise reduction and echo cancellation for seamless voice interactions
- Ready-to-Use Software: Automatic setup script and containerized demo applications
Accelerate Your Development
- Skip weeks of hardware integration and software setup
- Focus on your AI logic instead of infrastructure challenges
- Leverage our tested configurations and optimizations
- Get professional support from our team
Don't let complex setup hold back your AI innovations. The AI Fusion Kit eliminates the barriers between your ideas and reality, providing a professional-grade platform that's ready to deploy.