Benchmark SUPER mode of NVIDIA Jetson Orin NX

In 2025, AI is undoubtedly the hottest topic today, and there are many application scenarios that require localized deployment, such as smart surveillance systems, intelligent retail stores, and small-scale robots with LLM/VLM.

The NVIDIA Jetson Orin NX is a compact, high-performance AI computing module designed for edge applications such as robotics, smart cameras, and industrial automation. It delivers up to 157 TOPS of AI performance using the NVIDIA Ampere architecture, making it ideal for running complex AI models locally with low latency and high efficiency.

To unlock its full potential, the AVerMedia D133S Carrier Board provides a robust and versatile platform tailored for the Orin NX. It supports super mode for enhanced performance and offers a rich set of I/O.

Here, we are going to introduce and benchmark these two powerful standard carrier boards(D133 and D133S) from AVerMedia, which offer rich I/O options such as camera inputs, multiple Ethernet ports and a GPU, making them especially suitable for AI edge computing applications.

D133 and D133S from AVerMedia

The following table highlights the differences in main features between the D133 and D133S.

Device	D133	D133S
Platform Compatibility	NVIDIA® Jetson Orin NX	NVIDIA® Jetson Orin NX
Super Mode Support	No	Yes
OOB Functionality	Not available	1 x OOB (Optional, for power cycling and cloud serial console)
5G Daughter Board	Not available	1 x 5G daughter board (Optional, supports 4G/5G functions)

Power Mode

The D133S supports a new power mode that utilizes a higher 40W reference power, as well as a MAXN SUPER mode.

Device	Power Mode
D133	10W, 15W, 25W, MAXN
D133S	10W, 15W, 25W, 40W, MAXN SUPER

Suitable Scenarios

For D133

Academic Development: Ideal for academic research, AI model development, and prototyping.
Basic Edge Computing: For tasks like smart cameras, simple object detection, or image processing.
Cost-Sensitive Projects: Great for startups or small businesses buildings.

For D133S

Smart City & Industrial Applications: Such as traffic monitoring, smart lighting, or factory equipment supervision.
Remote Management Needs: OOB functionality enables remote reboot and diagnostics, perfect for unattended environments.
High-Performance Edge AI: Super Mode supports demanding AI inference tasks.
5G-Enabled Deployments: Suitable for mobile systems, in-vehicle platforms, or remote video streaming.

Info

We provide a Quick Start Guide for VLM Development with D133S series.

Key Advantages of VLM Over LLM

Real-World Contextual Awareness

VLM can "see" the world, giving them a more grounded understanding of real-world situations.

Identifying objects in photos, detecting emotions in facial expressions, reading product labels or signs.
Better Performance in Tasks Requiring Visual Context

VLM outperform LLM in tasks where visual input is crucial.

Diagnosing issues from a photo, interpreting data from a graph or table image, OCR + comprehension.
Real-World Applications
- Medical imaging analysis
- Retail (visual search, product tagging)
- Accessibility tools (describing images for visually impaired users)
- Autonomous driving (object detection + decision-making)

VLM Benchmark

From the key advantages, we can know that VLM can indeed offer more advantages than LLM. Additionally, there is a lack of information regarding the performance comparison of VLM on the D133 and D133S. Therefore, we will compare the performance differences of various VLM models on the d133s and d133.

The following table compares the computational performance of D133 and D133S, representing normal mode(MAXN) and super mode(MAXN SUPER) respectively.

It is evident from the table that both TTFT and TPOT show significant speed improvements in D133S.

In particular, the improvement in TTFT is especially notable, indicating that D133S offers greater potential for accelerating AI model inference compared to D133.

We used the open dataset MMStar to measure the average values of TTFT and TPOT. At the same time, it also allows us to evaluate the accuracy performance of various VLM.

alt text

VLM Model	Framework	TTFT Speed	TPOT Speed	MMStar	TTFT(s) D133	TTFT(s) D133S	TPOT(s) D133	TPOT(s) D133S
Qwen/Qwen2.5-VL-3B-Instruct	vLLM	+15%	+5%	0.539	0.69	0.599	0.084	0.08
Qwen/Qwen2.5-VL-3B-Instruct-AWQ	vLLM	+15%	+6%	0.525	0.544	0.471	0.034	0.032
AIDC-AI/Ovis2-4B-GPTQ-Int4	vLLM	+23%	+3%	0.609	1.68	1.367	0.034	0.033
microsoft/Phi-3.5-vision-instruct	vLLM	+24%	+7%	0.467	1.33	1.075	0.111	0.104
HuggingFaceTB/SmolVLM2-2.2B-Instruct	vLLM	+29%	+8%	0.447	2.202	1.709	0.056	0.052
qwen2.5vl:3b	Ollama	+16%	+22%	0.509	2.119	1.82	0.062	0.051
qwen2.5vl:7b	Ollama	+18%	+21%	0.577	2.652	2.249	0.099	0.082
minicpm-v:8b	Ollama	+17%	+19%	0.477	1.98	1.688	0.082	0.069
llava:7b	Ollama	+22%	+14%	0.317	2.497	2.043	0.079	0.069
llava:13b	Ollama	+20%	+5%	0.364	4.31	3.578	0.166	0.158
gemma3:4b	Ollama	+16%	+20%	0.384	5.407	4.655	0.071	0.059

Info

MMStar is a multi-modal benchmark designed to evaluate the capabilities of Large Vision-Language Model (VLM). You can refer to Lin-Chen/MMStar for more details.

More competitive choice for AI deployment.

Increased computation capability

D133S can now provide higher power mode that can increase more AI inference speed.

Enhanced mobility and portability

D133S can support OOB and 4G/5G communication that especially for mobile and remote application.