@DoctorFogarty DoctorFogarty commented Jan 20, 2026

Description

Add NPU-accelerated, AI-assisted AprilTag detection to PhotonVision using a hybrid two-stage approach:

  1. Stage 1 (NPU): YOLO model detects AprilTag bounding boxes via neural network inference
  2. Stage 2 (CPU): WPILib AprilTag detector runs only within detected ROIs for accurate corner detection and ID decoding

This leverages existing ML infrastructure for supported NPU platforms:

| Platform | Hardware | Backend | Model Format |
| --- | --- | --- | --- |
| RK3588 | Orange Pi 5, Rock 5C, CoolPi 4B | RKNN | .rknn |
| QCS6490 | Rubik Pi 3 | TFLite | .tflite |

Key Features

  • Toggle in AprilTag pipeline settings: traditional vs AI-assisted detection
  • Graceful fallback to traditional detection when ML is unavailable or finds no tags
  • Full coordinate transformation from ROI space to full-frame space (corners, center, and homography)
  • Platform auto-detection enables ML only on supported hardware
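The fallback behavior described above can be sketched as a small selection function. This is only an illustration of the control flow, not the code in AprilTagPipeline.java: the class name `HybridDetection`, the method `detect`, and the two `Supplier` parameters are hypothetical stand-ins for the ML-assisted and traditional detection paths.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch of the "ML first, traditional as fallback" decision. */
final class HybridDetection {
    /**
     * Runs the ML-assisted path only when it is enabled and available.
     * Falls back to the traditional path when ML is off, unavailable,
     * or returns no detections.
     */
    static <T> List<T> detect(
            boolean useMlDetection,
            boolean mlAvailable,
            Supplier<List<T>> mlPath,
            Supplier<List<T>> traditionalPath) {
        if (useMlDetection && mlAvailable) {
            List<T> fromMl = mlPath.get();
            if (!fromMl.isEmpty()) {
                return fromMl;
            }
        }
        // ML disabled, unsupported platform, or no tags found in any ROI.
        return traditionalPath.get();
    }
}
```

Passing the two paths as suppliers keeps the traditional detector from running at all when the ML path already produced results.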

Coordinate Transformation

Both the corner coordinates and the homography matrix are transformed from ROI to full-frame coordinates. The homography is transformed as $H_{full} = T \cdot H_{roi}$, where $T$ is a translation matrix:

$$
T = \begin{bmatrix} 1 & 0 & \text{offsetX} \\ 0 & 1 & \text{offsetY} \\ 0 & 0 & 1 \end{bmatrix}
$$

This ensures pose estimation works correctly regardless of where the tag was detected within the frame. Unit tests verify point projection consistency between full-frame and ROI-based detection.
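As a rough illustration of the transformation above, the sketch below applies the translation to a row-major 3x3 homography and checks that a projected point shifts by exactly the ROI offset. It uses plain `double[][]` arrays rather than any OpenCV or WPILib type, and the class and method names (`RoiToFrame`, `translateHomography`, `project`, `shiftPoint`) are hypothetical, not taken from the PR.

```java
/** Hypothetical sketch: map ROI-space detection results into full-frame coordinates. */
final class RoiToFrame {
    /** Returns T * hRoi, where T translates by (offsetX, offsetY). Matrices are row-major 3x3. */
    static double[][] translateHomography(double[][] hRoi, double offsetX, double offsetY) {
        double[][] out = new double[3][3];
        for (int c = 0; c < 3; c++) {
            // Rows of T are [1 0 offsetX], [0 1 offsetY], [0 0 1].
            out[0][c] = hRoi[0][c] + offsetX * hRoi[2][c];
            out[1][c] = hRoi[1][c] + offsetY * hRoi[2][c];
            out[2][c] = hRoi[2][c];
        }
        return out;
    }

    /** Projects (x, y) through homography h and returns {x', y'} after the perspective divide. */
    static double[] project(double[][] h, double x, double y) {
        double w = h[2][0] * x + h[2][1] * y + h[2][2];
        return new double[] {
            (h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w
        };
    }

    /** Shifts a corner or center point {x, y} from ROI space to full-frame space. */
    static double[] shiftPoint(double[] p, double offsetX, double offsetY) {
        return new double[] {p[0] + offsetX, p[1] + offsetY};
    }
}
```

Because the third row of the homography is unchanged, the perspective divisor is the same in both spaces, so every projected point moves by (offsetX, offsetY) and stays consistent with the shifted corners.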

Architecture

  • AprilTagROIDetectionPipe - Runs YOLO inference on NPU to find tag bounding boxes
  • AprilTagROIDecodePipe - Extracts ROI submats, runs traditional detector, transforms coordinates (including homography) back to full frame
  • Existing pose estimation pipes unchanged
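Before the decode pipe can extract a submat, the detected box has to be grown by the ROI expansion setting and kept inside the frame. A minimal sketch of that step follows, assuming the expansion is a fraction of the box size added on each side; the PR does not state the exact definition, so that is an assumption, and `RoiExpansion.expandAndClamp` is a hypothetical name.

```java
/** Hypothetical sketch: grow a detected box and clamp it to the frame bounds. */
final class RoiExpansion {
    /**
     * Returns {x, y, width, height} of the expanded, clamped ROI.
     * `expansion` is assumed to be a fraction of the box size added per side.
     */
    static int[] expandAndClamp(
            int x, int y, int w, int h, double expansion, int frameW, int frameH) {
        int padX = (int) Math.round(w * expansion);
        int padY = (int) Math.round(h * expansion);
        // Clamp both corners so the ROI never leaves the frame.
        int x0 = Math.max(0, x - padX);
        int y0 = Math.max(0, y - padY);
        int x1 = Math.min(frameW, x + w + padX);
        int y1 = Math.min(frameH, y + h + padY);
        return new int[] {x0, y0, Math.max(0, x1 - x0), Math.max(0, y1 - y0)};
    }
}
```

In this sketch the clamped top-left corner (x0, y0), not the raw detection corner, is what would serve as the offset when mapping results back to full-frame coordinates.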

Files Changed

| File | Change |
| --- | --- |
| AprilTagPipelineSettings.java | Add ML settings (useMLDetection, confidence, NMS, ROI expansion) |
| AprilTagPipeline.java | Add hybrid detection path with fallback logic |
| AprilTagROIDetectionPipe.java | NEW: ML inference pipe |
| AprilTagROIDecodePipe.java | NEW: ROI decode + coordinate/homography transformation |
| NeuralNetworkModelManager.java | Add model lookup methods |
| PipelineTypes.ts | Add TypeScript types for ML settings |
| AprilTagTab.vue | Add UI controls for ML detection toggle |
| AprilTagROIDecodePipeTest.java | NEW: Unit tests for coordinate mapping and homography transformation |
| AprilTagMLPipelineIntegrationTest.java | NEW: Integration tests comparing ML vs traditional detection |
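The confidence and NMS settings listed for AprilTagPipelineSettings.java typically control a filtering step like the one sketched below: drop low-confidence boxes, then greedily suppress boxes that overlap a higher-confidence one. This is a generic greedy non-maximum suppression, shown only to illustrate what the two thresholds mean. In practice this step may already be handled by the existing ML backend, and the `BoxFilter` class and `Box` record are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of confidence filtering followed by greedy NMS. */
final class BoxFilter {
    /** Illustrative box type: top-left corner, size, and model confidence. */
    record Box(double x, double y, double w, double h, double conf) {}

    /** Intersection over union of two axis-aligned boxes. */
    static double iou(Box a, Box b) {
        double ix = Math.max(0, Math.min(a.x() + a.w(), b.x() + b.w()) - Math.max(a.x(), b.x()));
        double iy = Math.max(0, Math.min(a.y() + a.h(), b.y() + b.h()) - Math.max(a.y(), b.y()));
        double inter = ix * iy;
        double union = a.w() * a.h() + b.w() * b.h() - inter;
        return union <= 0 ? 0 : inter / union;
    }

    /** Keeps boxes at or above confThreshold, suppressing any whose IoU with a kept box exceeds nmsThreshold. */
    static List<Box> filter(List<Box> boxes, double confThreshold, double nmsThreshold) {
        List<Box> sorted = new ArrayList<>();
        for (Box b : boxes) {
            if (b.conf() >= confThreshold) {
                sorted.add(b);
            }
        }
        // Highest confidence first, so stronger detections win overlaps.
        sorted.sort(Comparator.comparingDouble(Box::conf).reversed());
        List<Box> kept = new ArrayList<>();
        for (Box candidate : sorted) {
            boolean suppressed = false;
            for (Box k : kept) {
                if (iou(candidate, k) > nmsThreshold) {
                    suppressed = true;
                    break;
                }
            }
            if (!suppressed) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```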

Model

  • YOLOv8n trained on AprilTag dataset
  • INT8 quantized (w8a8), 640x640 input
  • Converted via Qualcomm AI Hub for TFLite format

Meta

Merge checklist:

  • Pull Request title is short, imperative summary of proposed changes
  • The description documents the what and why
  • If this PR changes behavior or adds a feature, user documentation is updated
  • If this PR touches photon-serde, all messages have been regenerated and hashes have not changed unexpectedly
  • If this PR touches configuration, this is backwards compatible with settings back to v2025.3.2
  • If this PR touches pipeline settings or anything related to data exchange, the frontend typing is updated
  • If this PR addresses a bug, a regression test for it is added

@DoctorFogarty DoctorFogarty requested a review from a team as a code owner January 20, 2026 05:30
@github-actions github-actions bot added frontend Having to do with PhotonClient and its related items backend Things relating to photon-core and photon-server labels Jan 20, 2026
@samfreund samfreund marked this pull request as draft January 20, 2026 15:54
@srimanachanta (Member)

Can you include metrics comparing the traditional CPU-only detection with ML-based bounding box detection? I am curious to see the differences in speed and accuracy between the two.
