Building a Resilient Smart Traffic Monitoring System on Jetson: Recovery via ERMI Virtual Media and API

Edge AI devices are increasingly used in smart traffic applications such as license plate recognition, vehicle flow analysis, and anomaly detection. However, real-world deployment challenges such as configuration corruption, system failure, or file damage can severely degrade device functionality. These issues are further complicated by the physical inaccessibility of many deployed systems.

We outline a practical, developer-oriented solution that uses the ERMI (Edge Remote Management Interface) API and its virtual media capabilities to enable remote recovery and minimize system downtime, set against the broader industry adoption of edge AI and remote management standards.

1. Edge AI for Intelligent Transportation

Jetson platforms such as the Orin NX deliver powerful AI capabilities at the edge, reducing latency and offloading central server infrastructure. Their compact size and performance make them ideal for deployment in citywide traffic environments. However, resilience is critical to ensuring continuity and responsiveness in the face of real-world operating conditions.

2. What Breaks in the Field: Config, Network, Model

A typical edge device deployed for traffic monitoring may fail due to:

  • Configuration File Corruption: Damaged system parameters or model pipeline configurations result in AI models failing to load.
  • Network Disconnection: Remote servers become unreachable, halting updates and diagnostics.
  • Incomplete or Damaged AI Models: Missing model weights or runtime artifacts prevent task execution.

These issues may be triggered by power fluctuations, security threats, or harsh climate conditions such as extreme cold or summer lightning storms. When devices are mounted in remote or elevated locations, field service is time-consuming and costly.

3. Solution Overview: ERMI and Virtual Media

By integrating a Redfish-compliant ERMI module, administrators can execute remote diagnostics and recovery without relying on the host OS.

Redfish: A Modern, Open Standard for Out-of-Band Management

Redfish is a unified open industry standard developed by the DMTF (Distributed Management Task Force) for managing and monitoring systems at scale. Unlike traditional system management interfaces such as IPMI — which are outdated, binary-based, and harder to integrate into modern automation pipelines — Redfish adopts widely used web technologies: RESTful APIs, JSON payloads, and HTTPS. This makes it accessible to developers familiar with web development, DevOps, and infrastructure-as-code practices.

From a developer’s perspective, Redfish offers:

  • API-Native Operations: Use common tools like curl, Postman, Python requests, or JavaScript fetch to interact with endpoints.
  • Self-Describing Schemas: Machine-readable, versioned schemas that support client auto-generation and validation.
  • Built-in Security: HTTPS transport and role-based access control as standard.
  • Easy Integration: Seamless fit into CI/CD, IaC workflows, and automation pipelines for tasks such as remote recovery, firmware updates, and system monitoring.
  • Cross-Vendor Compatibility: Consistent schema across vendors, reducing the need for custom logic per hardware type.

Redfish is not just a management interface; it is a scalable, programmable layer for infrastructure automation. For operators managing thousands of edge devices, it enables robust recovery workflows with minimal overhead and consistent behavior across the fleet. Operationally, this approach can eliminate most onsite technician visits, shorten recovery from days to minutes, and provide auditability and version control over recovery images and configuration.

More information and specifications can be found at the official Redfish documentation site: https://www.dmtf.org/standards/redfish

Virtual Media

Virtual Media presents a remote ISO/IMG to the host as a read‑only removable device (e.g., /dev/sr0 or /dev/sdX). In this design, Virtual Media is used as a file transport to deliver an offline bundle—image tars, compose files, configs, models, seed scripts, and a manifest/checksums.

How we use it

  • Insert the ISO/IMG via Redfish; the host sees a removable disk.
  • Mount the device read‑only and copy the bundle to the host (e.g., /opt/ermi_bundle).
  • Verify checksums, then run the seed script, which loads the images (docker load) and refreshes services (docker compose up -d); update configs/models as needed.
  • Eject the media once validation passes.

Operational advantages (why it matters)

  • Lower MTTR, fewer truck rolls: Remote “insert” + local docker load avoids WAN pulls and on‑site USB handling on weak/air‑gapped links.
  • Deterministic artifact delivery: Digest‑pinned images, manifests, and checksums ensure correctness and reduce configuration drift.
  • Lower risk & stronger governance: Read‑only media plus API‑logged operations keep changes auditable and scoped via RBAC.
  • Safe rollback: Keep the previous stack/engines under /opt/ermi_backup; if validation fails, stop the new stack and resume the prior compose.
  • Bandwidth efficiency: Move only what matters (configs/models/scripts and selected image tars), not multi‑GB base layers over the WAN.

Operator tips

  • Treat Virtual Media as a courier for offline bundles.
  • Pre‑install a systemd one‑shot that auto‑applies bundles placed via Virtual Media.
  • Prefer digest‑pinned images in compose and keep bundles minimal to speed transfers.
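
The one-shot mentioned in the tips above could look roughly like the sketch below. It only writes a unit file; the unit name ermi-seed.service and the seed script path are taken from the runbook in this article, while the helper name write_ermi_unit and the choice of ConditionPathExists as the trigger are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: write a systemd oneshot unit that applies a staged ERMI bundle.
# Assumption: the helper name write_ermi_unit is hypothetical, and using
# ConditionPathExists as the "bundle is staged" signal is one possible design.
write_ermi_unit() {
  # $1 = directory to write the unit into (/etc/systemd/system on the device)
  local unit_dir="$1"
  mkdir -p "$unit_dir"
  cat > "$unit_dir/ermi-seed.service" <<'UNIT'
[Unit]
Description=Apply ERMI offline bundle delivered via Virtual Media
After=docker.service
Requires=docker.service
# Skip the run unless a bundle has been staged
ConditionPathExists=/opt/ermi_bundle/scripts/seed_and_start.sh

[Service]
Type=oneshot
ExecStart=/opt/ermi_bundle/scripts/seed_and_start.sh

[Install]
WantedBy=multi-user.target
UNIT
}
```

On a device this would typically be called as write_ermi_unit /etc/systemd/system, followed by systemctl daemon-reload and systemctl enable ermi-seed.service. Writing to a scratch directory first makes the unit easy to review before it is installed.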

Benefits & Impact (Detailed)

Operational

  • Lower MTTR, fewer truck rolls: API-driven ERMI flows restore service in minutes rather than hours/days by avoiding on-site dispatch.
  • Deterministic rebuilds: Golden, signed recovery images eliminate snowflake drift; every node returns to a known-good baseline.
  • Safer rollbacks: One-time UEFI override + image ejection provides a reversible path that leaves the disk state intact until validation.
  • Air-gapped continuity: Offline bundles (images/configs/scripts) enable full container replacement with zero external connectivity.

Security & Governance

  • Standards-based control plane: HTTPS/JSON + RBAC via Redfish centralizes access control and reduces bespoke tooling risks.
  • Auditable changes: Every step is a versioned artifact and an API call, improving compliance posture and incident forensics.
  • Supply-chain integrity: Digest-pinned images (and optional signing) prevent tag drift and mitigate image tampering.

Scalability & Engineering Velocity

  • API-first automation: CI/CD + IaC encode recovery as idempotent runbooks; teams test, simulate, and ship with confidence.
  • Fleet consistency: Schema-driven, cross-vendor Redfish removes vendor-specific snowflakes and simplifies large-scale ops.
  • Faster iterations: Blue/green swaps and offline seeding enable rapid, low-risk upgrades even on constrained links.

Performance & Cost

  • Bandwidth efficiency: Move only what matters (models/configs/scripts) instead of full multi-GB base images on weak links.
  • Operational expenses reduction: Less travel, shorter outages, and fewer manual interventions cut operating costs at scale.
  • Higher uptime: Consistent recovery reduces SLA breaches and downstream business interruption.

Environmental & Business Continuity

  • Lower carbon footprint: Fewer truck rolls reduce emissions for distributed deployments.
  • Predictable RTO: Deterministic flows turn emergency work into scheduled, measurable procedures.

How to Measure (suggested KPIs)

  • MTTR (median minutes to restore service) before/after ERMI adoption.
  • Truck rolls per 100 nodes/quarter and average on-site time saved.
  • Recovery success rate on first attempt; rollback frequency.
  • Bandwidth per update (MB/event) vs. legacy OTA pulls.
  • Config drift rate (nodes deviating from baseline) over time.
  • SLA impact: incident count and duration tied to container/OS failures.
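
As a small illustration of the first KPI, the median restore time can be computed from a plain list of per-incident durations. This is a sketch: the helper name median_minutes and the one-value-per-line input format are assumptions, not part of any ERMI tooling.

```shell
#!/usr/bin/env bash
# Sketch: median of restore times in minutes, one value per line on stdin.
# Assumption: the helper name median_minutes is hypothetical.
median_minutes() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1                       # no incidents recorded
      if (NR % 2) print v[(NR + 1) / 2]         # odd count: middle value
      else        print (v[NR / 2] + v[NR / 2 + 1]) / 2   # even count: mean of the middle pair
    }'
}
```

For example, feeding it one file of durations from before ERMI adoption and another from after gives the two medians to compare.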

4. Implementation Reference

| Component | Example |
| --- | --- |
| Edge Device | Jetson Orin NX with ERMI module |
| ERMI Middleware | ERMI module with ERMI firmware that supports the ERMI API |
| Remote File Server | Secure HTTP/NFS endpoint for model/image hosting |
| AI Framework | TensorRT / ONNX models for vehicle detection and analysis |

4.1 Reference ERMI Runbook (Air‑Gapped: Virtual Media as File Transport)

Scenario: The site is offline or on a weak WAN. The device cannot reach any registry, but the host OS is running. Use Virtual Media as a file transport to deliver an offline bundle (images/configs/models/scripts), then replace/update the container locally.

0) Roles & Prerequisites

  • Build Host (with Internet): Pull and export images; assemble an ERMI bundle (images + compose + configs + models + seed scripts + manifest + checksums). Pin by digest.
  • Virtual Media ISO/IMG: Contains the bundle; formatted as ISO9660/UDF for read‑only mounting.
  • Target Jetson (OS running): You have access to a local shell via SoL (Serial over LAN) or an existing management user. The device will see the Virtual Media as a removable disk (e.g., /dev/sr0 or /dev/sdX).

Bundle layout (suggested):

bundle/
    images/               # *.tar (exported OCI/Docker images, digest-pinned)
    compose/              # compose_*.yaml (image@sha256 references)
    configs/              # DeepStream/JPS configs
    models/               # YOLOv8 / TRT engines (optional if prebuilt on host)
    scripts/              # seed_and_start.sh, health_check.sh, etc.
    manifest.yaml         # versions, digests, paths
    SHA256SUMS            # checksums (optional: signatures)
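
Before packaging, it can help to confirm that a bundle directory actually follows this layout. The sketch below treats models/ as optional, as the layout comment suggests; the helper name check_bundle_layout is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: check that a bundle directory has the suggested layout.
# Assumption: the helper name check_bundle_layout is hypothetical; models/ is
# treated as optional because engines may be prebuilt on the host.
check_bundle_layout() {
  local root="$1" missing=0 entry
  for entry in images compose configs scripts; do
    [ -d "$root/$entry" ] || { echo "missing directory: $entry" >&2; missing=1; }
  done
  for entry in manifest.yaml SHA256SUMS; do
    [ -f "$root/$entry" ] || { echo "missing file: $entry" >&2; missing=1; }
  done
  return "$missing"
}
```

Running check_bundle_layout bundle on the build host returns non-zero and names each missing entry, so it can gate the packaging step.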

A) Build Host: Prepare the Offline ERMI Bundle

  1. Pull, lock by digest, export

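    # Pull by tag, then resolve the immutable repo digest (repo@sha256:...)
    IMG="nvcr.io/nvidia/jps/deepstream:7.0-jps-v1.1.1"
    docker pull "$IMG"
    DIGEST=$(docker inspect --format='{{index .RepoDigests 0}}' "$IMG")
    # Export the image by digest reference for offline transport
    mkdir -p bundle/images && docker save -o bundle/images/deepstream.tar "$DIGEST"
    # Note: depending on the Docker image store, docker load may not restore the
    # repo digest on the target; confirm the compose reference resolves there.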
    
  2. Compose + assets (ensure digest references)

    mkdir -p bundle/{compose,configs,models,scripts}
    sed "s#nvcr.io/nvidia/jps/deepstream:.*#$DIGEST#g" \
      compose_nx16_yolov8s.yaml > bundle/compose/compose_nx16_yolov8s.yaml
    cp -r ds-config-files/* bundle/configs/
    cp -r yolov8s/* bundle/models/
    
  3. Manifest + checksums

    cat > bundle/manifest.yaml <<YAML
    version: 1
    images:
      - name: nvcr.io/nvidia/jps/deepstream
        digest: ${DIGEST#*@}
        file: images/deepstream.tar
    compose: compose/compose_nx16_yolov8s.yaml
    seed_script: scripts/seed_and_start.sh
    YAML
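    # Run this after step 4 so the seed script is covered; SHA256SUMS excludes itself
    ( cd bundle && find . -type f ! -name SHA256SUMS -print0 | sort -z | xargs -0 sha256sum > SHA256SUMS )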
    
  4. Seed script (to run locally on the Jetson)

    cat > bundle/scripts/seed_and_start.sh <<'SH'
    #!/usr/bin/env bash
    set -euo pipefail
    # Load images from offline tar
    for f in /opt/ermi_bundle/images/*.tar; do docker load -i "$f"; done
    # Start/refresh stack (-f is required because the file name is not a compose default)
    cd /opt/ermi_bundle/compose
    DOCKER_DEFAULT_PLATFORM=linux/arm64 docker compose -f compose_nx16_yolov8s.yaml up -d --force-recreate
    # Optional: smoke check
    docker ps --format 'table {{.Names}}\t{{.Status}}'
    SH
    chmod +x bundle/scripts/seed_and_start.sh
    
  5. Package

    tar czf ermi-bundle-jetson-nx-$(date +%F).tgz -C bundle .
    
  6. Embed the bundle into the ISO/IMG (or place alongside) to be delivered via Virtual Media.
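
Step 6 leaves the ISO authoring tool open. One possible approach, assuming xorriso or genisoimage is installed on the build host, is sketched below. The helper name make_bundle_iso, the staging directory, and the ERMI_BUNDLE volume label are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: wrap a staging directory (containing the .tgz bundle) in an ISO image.
# Assumptions: the helper name make_bundle_iso and the ERMI_BUNDLE volume label
# are hypothetical; xorriso or genisoimage must be installed on the build host.
make_bundle_iso() {
  local src_dir="$1" out_iso="$2"
  # -R adds Rock Ridge, -J adds Joliet, -V sets the volume label
  local opts=(-o "$out_iso" -R -J -V ERMI_BUNDLE "$src_dir")
  if command -v xorriso >/dev/null 2>&1; then
    xorriso -as mkisofs "${opts[@]}"
  elif command -v genisoimage >/dev/null 2>&1; then
    genisoimage "${opts[@]}"
  else
    echo "no ISO authoring tool found (xorriso or genisoimage)" >&2
    return 127
  fi
}
```

A typical call would be make_bundle_iso iso_root ermi-recovery.iso after copying the .tgz into iso_root. Single files of 4 GiB or more usually need extra options (for example a higher ISO level or UDF), so check the tool's manual if the image tars are that large.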

B) ERMI API: Insert Virtual Media

  1. InsertMedia via ERMI API (CIFS/SMB or HTTP URL as supported by your ERMI module)

    # -k skips TLS certificate verification; prefer --cacert <ca.pem> in production
    curl -k -u ADMIN:PASSWORD -H "Content-Type: application/json" -X POST \
      -d '{
            "Image":"smb://10.0.0.10/share/ermi-recovery.iso",
            "TransferProtocolType":"CIFS",
            "Inserted":true,
            "UserName":"user",
            "Password":"pass"
          }' \
      https://<bmc>/redfish/v1/Systems/1/VirtualMedia/USB/Actions/VirtualMedia.InsertMedia
    
  2. Detect on Jetson: The device sees a new removable disk. If it doesn’t auto‑mount, identify and mount manually:

    dmesg | tail -n 20
    # Likely /dev/sr0 or /dev/sdX1 (replace sdX1 with the device reported by dmesg)
    sudo mkdir -p /mnt/ermivm
    sudo mount -o ro /dev/sr0 /mnt/ermivm || sudo mount -o ro /dev/sdX1 /mnt/ermivm
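
If dmesg output is noisy, filtering lsblk for removable devices is another way to spot the Virtual Media disk. This is a sketch: the helper name removable_devices is an assumption, and some Virtual Media emulations may not set the removable flag, in which case dmesg remains the fallback.

```shell
#!/usr/bin/env bash
# Sketch: list removable disks and optical drives from lsblk output on stdin.
# Expected input: the output of `lsblk -rno NAME,RM,TYPE` (columns NAME RM TYPE).
# Assumption: the helper name removable_devices is hypothetical.
removable_devices() {
  # RM=1 marks removable media; partitions (TYPE=part) are skipped
  awk '$2 == 1 && ($3 == "disk" || $3 == "rom") { print "/dev/" $1 }'
}
```

Usage on the Jetson: lsblk -rno NAME,RM,TYPE | removable_devices. Each printed path is a candidate to mount read-only as shown above.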
    

C) Stage and Replace Containers (On the Running OS)

  1. Copy & unpack bundle locally

    sudo mkdir -p /opt/ermi_bundle
    sudo cp /mnt/ermivm/ermi-bundle-*.tgz /opt/ermi_bundle/
    cd /opt/ermi_bundle && sudo tar xzf ermi-bundle-*.tgz
    sudo sha256sum -c SHA256SUMS || { echo "ERROR: checksum mismatch; do not continue" >&2; false; }
    
  2. Load images & (blue/green) refresh services

    # Load images without network
    for f in /opt/ermi_bundle/images/*.tar; do sudo docker load -i "$f"; done
    # Option A: in-place refresh
    cd /opt/ermi_bundle/compose && sudo docker compose -f compose_nx16_yolov8s.yaml up -d --force-recreate
    # Option B: blue/green (start new stack, validate, then stop old)
    # sudo docker compose -f compose_nx16_yolov8s_blue.yaml up -d --force-recreate
    # Health check (a failure here should prompt a rollback, not be ignored)
    curl -fsS http://127.0.0.1:30080/healthz || echo "WARN: health check failed" >&2
    
  3. Update configs/models only (if applicable)

    sudo rsync -a /opt/ermi_bundle/configs/ /opt/nv_ai/config/deepstream/
    sudo rsync -a /opt/ermi_bundle/models/  /opt/nv_ai/yolov8s/
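
Containers often need a few seconds before a health endpoint responds, so a single probe right after compose up can fail spuriously. A small retry wrapper, sketched below, is one way to handle that. The helper name retry is an assumption; the health URL in the usage note is the one used in step 2.

```shell
#!/usr/bin/env bash
# Sketch: retry a command a fixed number of times with a delay between attempts.
# Assumption: the helper name retry is hypothetical.
retry() {
  # Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0                              # success: stop retrying
    [ "$i" -lt "$attempts" ] && sleep "$delay"    # no sleep after the last attempt
  done
  return 1
}
```

For example, retry 10 3 curl -fsS http://127.0.0.1:30080/healthz probes up to ten times, three seconds apart, and returns non-zero only if every attempt fails.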
    

D) Optional Hardening

  • Create a systemd oneshot (ermi-seed.service) to auto‑apply bundles delivered via Virtual Media in the future.
  • Store previous compose & engines under /opt/ermi_backup to enable one‑command rollback.
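
The rollback idea can be sketched as a pair of helpers that snapshot the live compose directory before a bundle is applied and restore it if validation fails. The default paths are the ones named in this runbook; the helper names and the "previous" subdirectory are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: snapshot and restore the compose directory around a bundle update.
# Assumptions: the helper names and the "previous" subdirectory are hypothetical;
# the default paths are the ones used elsewhere in this runbook.
backup_stack() {
  local live="${1:-/opt/ermi_bundle/compose}" backup="${2:-/opt/ermi_backup}"
  mkdir -p "$backup"
  rm -rf "$backup/previous"          # keep only the most recent snapshot
  cp -a "$live" "$backup/previous"
}

rollback_stack() {
  local live="${1:-/opt/ermi_bundle/compose}" backup="${2:-/opt/ermi_backup}"
  [ -d "$backup/previous" ] || { echo "no backup to restore" >&2; return 1; }
  rm -rf "$live"
  cp -a "$backup/previous" "$live"
}
```

Call backup_stack before unpacking a new bundle. If the health check fails, rollback_stack restores the prior files, after which the previous stack can be restarted with docker compose up -d against the restored compose file. The older images must still be present locally for that restart to work offline.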

E) Eject Virtual Media (Clean Up)

curl -k -u ADMIN:PASSWORD -H "Content-Type: application/json" -X POST -d '{}' \
  https://<bmc>/redfish/v1/Systems/1/VirtualMedia/USB/Actions/VirtualMedia.EjectMedia
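
To confirm the eject took effect, the VirtualMedia resource can be read back and its Inserted property checked. The sketch below uses plain text matching so it has no dependencies; a JSON parser such as jq is more robust where available. The helper name media_inserted is an assumption, as is the resource URI in the usage note (taken to be the parent of the action URI above).

```shell
#!/usr/bin/env bash
# Sketch: succeed when a Redfish VirtualMedia JSON document on stdin reports
# "Inserted": true. Crude text matching for illustration only.
# Assumption: the helper name media_inserted is hypothetical.
media_inserted() {
  grep -Eq '"Inserted"[[:space:]]*:[[:space:]]*true'
}
```

Usage: curl -k -sS -u ADMIN:PASSWORD https://<bmc>/redfish/v1/Systems/1/VirtualMedia/USB | media_inserted && echo "media still inserted". A non-zero result after EjectMedia suggests the slot is empty.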

Why this works in air-gapped sites

  • All artifacts arrive over Virtual Media as a read‑only disk; the OS loads images locally (docker load) and restarts services—no registry pulls.
  • The flow is deterministic (manifest + checksums), and reversible (you can keep the previous stack for rollback).

4.2 Architecture Sketch

[Figure: ERMI with Virtual Media architecture sketch]

5. Future Considerations

As Jetson-based deployments increasingly adopt NVIDIA's DeepStream SDK or Metropolis Microservices, developers may find it hard to keep containers lightweight and maintainable. For example, DeepStream 7.1 containers can exceed 10 GB on Jetson platforms and 22 GB on dGPU systems. Images of this size create friction during OTA updates or ERMI recovery, especially when full container redeployment is required.

Furthermore, Metropolis Microservices, while modular and cloud-native, may bundle inference pipelines, streaming services, and storage components into large image layers. Critical updates or reconfiguration often require pulling or replacing these container images, resulting in considerable downtime and bandwidth usage.

To mitigate these risks, developers should consider:

  • Layered Image Design: Separate custom application logic and models into thin layers above the base image. Use multi-stage builds and modular Dockerfiles to keep updates small.
  • Incremental Config Deployment: Where possible, update only the configuration and model artifacts via virtual media rather than the entire container.
  • Container Trimming: Regularly audit and remove unused packages, plugins, or libraries that contribute to image bloat.
  • Automation Integration: Adopt infrastructure-as-code and CI/CD pipelines to manage container versioning and deployment consistency.

These practices complement ERMI recovery workflows and help keep smart traffic systems maintainable at scale. Further directions to consider:

  • Preventative OTA Updates to reduce reactive recovery
  • Health Monitoring via Redfish Metrics
  • MLOps Workflow Integration for scalable model and config rollout

6. From Playbook to Pilot

Jetson-based traffic monitoring systems must remain operational under adverse conditions. By combining Redfish-based ERMI management and virtual media recovery, operators can restore critical functions quickly and securely without physical intervention. This approach supports the long-term goal of autonomous, resilient, and cost-efficient smart infrastructure.


If you want more information or support, contact AVerMedia or visit https://ermi-open.org.