AI Fusion Kit Quick Start Guide
AVerMedia AI Fusion Kit is an all-in-one solution for LLM/VLM developers. It consists of a powerful AI box PC, a 4K camera, and an AI speakerphone, allowing you to easily build your own multimodal AI applications. This guide will walk you through the steps to get started with the AI Fusion Kit.
Note
If not explicitly mentioned, all the commands are expected to be run on the box PC (the Jetson device).
Run the Setup Script
The AI Fusion Kit comes with a pre-installed NVIDIA Jetson Linux (L4T 36.4.3 or L4T 36.4.4) operating system, based on Ubuntu 22.04 LTS. You can directly use the GUI by connecting keyboard, mouse, and monitor to the box PC, or you can also get access to the box PC via SSH if you prefer a headless setup.
To quickly get started, we recommend that you download our quick start scripts from GitHub:
git clone https://github.com/AVerMedia-Technologies-Inc/ai-fusion-kit-quick-start.git
cd ai-fusion-kit-quick-start
In the directory, you will find the following files:
- setup.sh: The setup script for the AI Fusion Kit.
- run_demo.sh: The convenience script to run the demo applications with Docker Compose.
- stop_demo.sh: The convenience script to stop the demo applications.
- compose.yaml: The Docker Compose file for the demo applications.
We will first focus on the setup script, which automates the setup process for you, including:
- Configure the NVIDIA Jetson power mode to the best performance (MAXN or MAXN Super)
- Configure udev rules for the AVerMedia HID devices
- Install NVIDIA JetPack SDK components (full or necessary components)
- Install Docker and Docker Compose
- Install jetson-containers
- Install AI Fusion Kit demo applications (optional)
All these steps can be done with a single command:
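Based on the file listing above, this is presumably the setup.sh script; the exact invocation (working directory, sudo, arguments) may differ, so check the repository README if in doubt:
./setup.sh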
Although the script completes most of the setup work automatically, it will prompt you to make some choices. Please follow the instructions to complete the setup.
The setup script will prompt you to make some choices.
Install NVIDIA Riva (Optional)
NVIDIA Riva is a GPU-accelerated SDK for speech AI applications. With Riva microservice containers, developers can easily integrate features like ASR and TTS into their applications. Although AI Fusion Kit demo applications can definitely work without Riva, installing Riva will unlock the full power of them, enabling the speech AI features.
NVIDIA Riva SDK is hosted on NVIDIA GPU Cloud (NGC), which requires an NVIDIA Cloud Account (NCA) to access. Therefore, we are not able to install it for you, but we will guide you through the process.
Access to NVIDIA Riva
At the time of writing (2025-08-29), NVIDIA Developer Program provides free access to NVIDIA Riva software for non-production purposes including internal evaluation, development, and testing. However, both the access to Riva and the process of getting access to Riva might change over time. Please refer to the official documentation for the latest information.
Create/Get your NVIDIA Cloud Account
NCA is NVIDIA's system for managing how different people in an organization access NVIDIA cloud services. However, individual users can also create an NCA to access NGC.
- Go to NGC Catalog
- Click on "Sign In / Sign Up" in the menu at the top right corner
- Sign in or create your NVIDIA account
- The website will ask you to create an NCA. Feel free to name it whatever you like.
The NCA creation page
- If you just created a new NVIDIA account, you might be asked to fill in your personal information.
If your organization already has an NCA, please contact its administrator, who should be able to add you to the NCA. For how to add users to an NCA, please refer to the NGC User Guide.
Install NGC CLI on Jetson
NGC CLI is a command-line tool that allows you to access NGC resources. It is the recommended way to install Riva.
Download NGC CLI
- Go to the NGC CLI Download Page.
  - If the link is not working, you can search for "NGC CLI" in the NGC Catalog.
- Find the arm64 Linux version of NGC CLI. The file name should be like ngccli_arm64.zip.
- Click the "..." button in the "Actions" column and copy the curl or wget command.
Get the download command for NGC CLI from NGC Catalog
- Download the NGC CLI by running the command on the box PC. For example:
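The copied command will look roughly like the following (the URL here is only a placeholder; use the exact command you copied in the previous step):
wget --content-disposition "<URL copied from the NGC Catalog>" -O ngccli_arm64.zip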
- Unzip and install the NGC CLI package to the desired location, for example /opt, and then add the installation location to your PATH environment variable.
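A minimal sketch, assuming the archive unpacks into an ngc-cli directory (the exact layout may vary between CLI releases):
sudo unzip ngccli_arm64.zip -d /opt              # unpacks to /opt/ngc-cli (assumed layout)
sudo chmod u+x /opt/ngc-cli/ngc                  # make sure the binary is executable
echo 'export PATH="$PATH:/opt/ngc-cli"' >> ~/.bashrc
source ~/.bashrc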
- Test if the installation is successful.
You should see the version of NGC CLI like this:
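(The version number below is illustrative; yours will match the release you downloaded.)
$ ngc --version
NGC CLI 3.x.x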
Set up NGC CLI
Detailed instructions for generating API key
You can also refer to the NGC User Guide for the detailed instructions with screenshots.
To make the NGC CLI work, you need to set up the authentication.
- Go to NGC Catalog, sign in, and click "Setup" in the menu at the top right corner
- In the "Setup" page, click "Generate API Key".
- In the "API Keys" page, click "Generate API Key" and generate a new API key (or use an existing one). Note that the key permissions should include "NGC Catalog".
- Run the following command on your Jetson device:
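This is presumably the standard NGC CLI configuration command, which starts an interactive prompt:
ngc config set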
Follow the NGC CLI prompts to finish the setup. It will ask for the API key, organization name, etc. (The organization name might be an auto-generated long number; NGC CLI will list the possible choices in the prompt, so you can just copy and paste it.)
Install NVIDIA Riva
Here we use version 2.19.0 (the latest version at the time of writing) of NVIDIA Riva as an example.
- Download the Riva quick start scripts. You can download them to the home directory to avoid permission issues.
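Presumably an NGC CLI download along these lines (the resource name follows NVIDIA's Riva quick start for arm64; double-check the exact command on the Riva quick start page in the NGC Catalog):
cd ~
ngc registry resource download-version "nvidia/riva/riva_quickstart_arm64:2.19.0"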
- Take a look at the config.sh file in the riva_quickstart_arm64_v2.19.0 directory and modify the configuration if needed. By default, the ASR and TTS services are enabled for the en-US language, and the models will be stored in riva_quickstart_arm64_v2.19.0/model_repository.
Enter the
riva_quickstart_arm64_v2.19.0
directory, run theriva_init.sh
script to install the Riva containers and models. After the script finishes, you should be able to start the Riva services by running theriva_start.sh
script. -
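A minimal sketch of that step (the script names come from the Riva quick start; riva_init.sh can take a long time to download the containers and models):
cd riva_quickstart_arm64_v2.19.0
bash riva_init.sh    # installs the Riva containers and models
bash riva_start.sh   # starts the Riva services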
By default, the
riva_start.sh
script will open a terminal inside the container for you to test the Riva services. However, the example scripts inside may not function as expected on Jetson. For Python examples, we recommend you to run theriva-client:python
container provided byjetson-containers
. See Jetson Containers section for more details.
Run the Demo Applications
Basic Usage
About echo cancellation on the speakerphone
The AI speakerphone is equipped with a nice echo cancellation feature. However, the algorithm only starts working after a few seconds of sound have been played from the speakerphone. Therefore, you may need to play some sound before running the demo, or the speech AI may hear itself.
Currently, only one demo application is available for the AI Fusion Kit. It is a multi-container application that consists of:
- A local vLLM server for LLM/VLM inference
- (Optional) A local NVIDIA Riva server providing ASR and TTS services
- A GUI application built with AVerMedia software stack, handling the multimedia stream and providing the user interface
If you have chosen to install the demo applications during the setup process, the containers and the VLM model (by default, Qwen/Qwen2.5-VL-3B-Instruct-AWQ) should have been installed automatically. You can launch the demo application by running the following command:
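Presumably this is the run_demo.sh convenience script from the repository root:
./run_demo.sh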
What does the run_demo.sh script do?
If you take a look at the script, you will find that it basically does the following:
- Detect the video devices and HID devices connected to the box PC
- Find the installation path of jetson-containers, which should contain the VLM model
- Find the installation path of Riva models
- Write the above information to the compose.override.yaml or the .env files
- Start the containers with Docker Compose
The script automates the configuration that can vary between launches, such as device paths (which may change based on USB connection order) and installation directories (which depend on user preferences or system configuration).
Troubleshooting
After you run the script, you should see the Docker Compose status like:
[+] Running 3/3
⠏ Container demo-llm-server-1 Waiting 5.0s
✔ Container demo-riva-server-1 Healthy 2.3s
⠏ Container demo-app-1 Waiting 4.9s
where demo is the name of this Docker Compose project, and llm-server, riva-server, and app are the names of the "services" in the compose.yaml file.
If any of the services fails to start, you can check the logs by running the following command:
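This is most likely a docker compose logs invocation, run from the ai-fusion-kit-quick-start directory (the service name below is a placeholder):
docker compose logs <service-name>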
For example, if the llm-server service fails to start, you can run:
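Assuming the project's compose file is picked up from the current directory:
docker compose logs llm-server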
The GUI application should launch shortly. You will find the live camera feed and the VLM output in the left column, with various configuration controls in the right column.
In the "Devices" tab, you can select the video and audio devices and start the camera.
The AI features are located in the "Models" tab. Here you can check the status of the AI microservices, set the VLM prompt, and start the VLM inference. The demo is a live image captioning application that describes the live camera feed in real time. Typically, it takes several minutes for vLLM to start the server, so you will have to wait a while before the LLM service is reported as healthy.
The prompt is usually a question for the model to answer based on the live camera feed. You can update the prompt at any time, either by typing or speaking, even while the inference is running.
- Type the prompt in the text box
- Click the "Update" button for the prompt to take effect
NVIDIA Riva Required
NVIDIA Riva is required for the speech AI features. You will not be able to click the "Listen" button if you have not installed Riva.
- Click the "Listen" button, and you should hear the speech AI saying "I'm listening..."
- Speak the prompt
- The speech AI will repeat the prompt back to you and automatically update the prompt (you don't need to click the "Update" button)
In the "Models" tab, you can check the status of the AI microservices and control the inference. The live camera feed and the VLM output are displayed in the left.
Customize the Configuration
As mentioned earlier, the demo application is managed with Docker Compose. Currently, except for those settings dynamically generated by the run_demo.sh script, all the configuration is done in the compose.yaml file. It is a standard Docker Compose file, written in YAML and following the Compose Specification. You can refer to the Docker Compose File Reference for more details.
Below we will introduce the most common configuration changes you might want to make. After you make any changes, if the demo application is running, you can apply them by running the following command in the ai-fusion-kit-quick-start directory:
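Most likely this is docker compose up (Docker Compose will recreate only the services whose configuration changed); the exact flags the project expects may differ:
docker compose up -d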
Docker Compose is smart enough to only restart the services that have changed, so you don't need to worry about restarting the whole application. If the demo application is not running, you can simply run the run_demo.sh script.
Change the VLM Model
The model is directly specified in the command attribute of the llm-server service. By default, it is:
command: vllm serve Qwen/Qwen2.5-VL-3B-Instruct-AWQ
--host=0.0.0.0
--port=9000
--max-num-seqs=1
--max-model-len=2048
--trust-remote-code
--chat-template-content-format=openai
--gpu-memory-utilization=0.35
--uvicorn-log-level=debug
--limit-mm-per-prompt='{"image":1,"video":0}'
--mm-processor-kwargs='{"max_pixels":200704,"size":{"shortest_edge":3136,"longest_edge":200704}}'
You can replace the model name (Qwen/Qwen2.5-VL-3B-Instruct-AWQ) with your desired model. Here are some considerations:
Model Selection:
- Popular alternatives include llava-hf/llava-v1.6-mistral-7b-hf, microsoft/Phi-3.5-vision-instruct, or other vision-language models
- Check Hugging Face Model Hub for available VLM models
- You'll have to download the model in advance, because the TRANSFORMERS_OFFLINE environment variable is set in compose.yaml by default. You can download the model with a command like the one shown right after this list; it will download the model to the data directory in the installation path of jetson-containers.
Configuration Adjustments:
- Memory Allocation: Adjust --gpu-memory-utilization based on model size
- Processor Parameters: Update or remove --mm-processor-kwargs for the specific model's processor requirements
- Quantization: Qwen2.5-VL-3B-Instruct-AWQ is already quantized to AWQ format. For unquantized models, you'll probably need to quantize them on the fly with --quantization=fp8 or --quantization=bitsandbytes.
You can also check the model cards on Jetson AI Lab as a reference.
Memory Planning:
- Operating system (with GUI): ~2-3 GB
- Riva service (ASR/TTS for en-US): ~2-2.5 GB
- VLM model: varies by model size and quantization method
Be especially careful with memory usage if you are using an Orin NX based platform!
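To keep an eye on memory usage while everything is running, the standard tools are enough (tegrastats ships with JetPack; this is just a suggestion, not part of the kit's scripts):
free -h           # overall memory usage
sudo tegrastats   # Jetson-specific stats, including RAM and GPU load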
Change the Container Images
In some cases, you may want to use a different container image for a service in the demo, such as:
- Using a newer version of vLLM or Riva
- Switching to a different LLM inference engine like Ollama
- Using a custom image built by yourself
Steps to change an image:
- Download the new image with docker pull or build it yourself
- Find the service (llm-server or riva-server) you want to change in the compose.yaml file
- Change the image attribute of the service to the new image
- If you change the inference engine for llm-server, you may also need to change the command attribute
API Compatibility Requirements
To work with the demo application, the LLM inference engine must support these OpenAI-compatible API endpoints:
- /v1/models: List available models
- /v1/chat/completions: Handle chat completion requests
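A quick sanity check for a replacement engine is to hit one of those endpoints directly, assuming it listens on port 9000 like the default configuration:
curl http://0.0.0.0:9000/v1/models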
More Development Resources
Jetson Containers
jetson-containers is a great open-source project for Jetson developers. It provides a set of pre-built Docker images for Jetson devices on Docker Hub, and tools to help you build the images you need. These images are built for general development purposes, so typically they are big in size and contain many tools, whether you need them or not. We highly encourage you to use the actively maintained images provided by jetson-containers in the early stage of your development, like dustynv/vllm, dustynv/ollama, etc., for they save you a lot of time, especially if you are not familiar with either NVIDIA Jetson or Docker.
Actually, the demo application is also built with dustynv/vllm, a pre-built image for vLLM. You can also directly run the container without launching the demo application. For example, you can start the vLLM server by running the following command:
jetson-containers run $(autotag vllm) vllm serve Qwen/Qwen2.5-VL-3B-Instruct-AWQ \
--host=0.0.0.0 \
--port=9000 \
--max-num-seqs 1 \
--disable-mm-preprocessor-cache \
--trust-remote-code \
--max-model-len=2048 \
--gpu-memory-utilization=0.35 \
--limit-mm-per-prompt='{"image":1,"video":0}' \
--mm-processor-kwargs='{"max_pixels":200704,"size":{"shortest_edge":3136,"longest_edge":200704}}' \
--uvicorn-log-level=debug
jetson-containers will handle all the options you need to pass to docker run for you, and autotag is a tool provided by jetson-containers to help you find the latest image tag on your device.
After the server is ready, you can test it by either sending a request with curl or using the openai Python package. It takes a while for the first inference of the server to finish, but the subsequent inferences should be much faster.
Example provided by Jetson AI Lab.
curl http://0.0.0.0:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{
      "role": "user",
      "content": [{
        "type": "text",
        "text": "What is in this image?"
      },
      {
        "type": "image_url",
        "image_url": {
          "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
        }
      }]
    }],
    "max_tokens": 300
  }'
The openai Python package is a popular library for interacting with the OpenAI API. Again, the following example is provided by Jetson AI Lab.
import base64
import requests
import openai

# Connect to the local vLLM server (no real API key is needed)
client = openai.OpenAI(
    base_url='http://0.0.0.0:9000/v1',
    api_key='dummy',  # Local server doesn't require an API key
)

models = [model.id for model in client.models.list()]
print(f"Models from server {client.base_url}: {models}")

url = "https://raw.githubusercontent.com/dusty-nv/jetson-containers/refs/heads/dev/data/images/dogs.jpg"
txt = "What kind of dogs are these?"  # the image shows a husky and golden retriever

img = requests.get(url)

messages = [{
    'role': 'user',
    'content': [
        { 'type': 'text', 'text': txt },
        {
            'type': 'image_url',
            'image_url': {
                'url': 'data:' + img.headers['Content-Type'] + ';' + 'base64,' + base64.b64encode(img.content).decode()
            },
        },
    ],
}]

stream = client.chat.completions.create(
    model=models[0],
    messages=messages,
    stream=True
)

for chunk in stream:
    print(chunk.choices[0].delta.content, end='', flush=True)
print()
Jetson AI Lab
Jetson AI Lab examples might be deprecated
Some examples in Jetson AI Lab are problematic and do not work as expected at the moment. We plan to document the known issues in the future.
Jetson AI Lab is a great website for Jetson developers, providing rich examples and tutorials. Nevertheless, the examples and tutorials are not actively maintained at the moment, and one of the main contributors of jetson-containers has stated that the examples are outdated (on Jun 27, 2025).
Many examples in Jetson AI Lab are based on the NanoLLM library, which is also inactive now. Therefore, we recommend using other popular third-party libraries instead, like vLLM, Ollama, SGLang, etc. The model cards in Jetson AI Lab can still be very helpful for running VLM models with vLLM or Ollama, but be aware that the default model saving location (the "Cache Dir" option in the card) is different from the one used by jetson-containers.
Ready to Start Your AI Journey?
If you've made it this far through our comprehensive guide, you've seen the incredible potential of the AVerMedia AI Fusion Kit. Whether you're a researcher, developer, or AI enthusiast, this all-in-one solution provides everything you need to bring your AI applications to life.
Why Choose the AI Fusion Kit?
All-in-One AI Solution
- Powerful AI Box PC: Pre-configured NVIDIA® Jetson with optimized performance
- Professional 4K Camera: High-quality video input for vision applications
- Smart AI Speakerphone: Built-in noise reduction and echo cancellation for seamless voice interactions
- Ready-to-Use Software: Automatic setup script and containerized demo applications
Accelerate Your Development
- Skip weeks of hardware integration and software setup
- Focus on your AI logic instead of infrastructure challenges
- Leverage our tested configurations and optimizations
- Get professional support from our team
Don't let complex setup hold back your AI innovations. The AI Fusion Kit eliminates the barriers between your ideas and reality, providing a professional-grade platform that's ready to deploy.