Train the Species Model
This guide walks you through training, quantizing, and deploying a species detection model to the ESP32-P4 in the submerged unit. By the end, you’ll have a working model on an SD card that the trap firmware can load and run.
Overview
Here’s what you’ll do in this guide:
- Collect underwater training images under IR illumination
- Label images with bounding boxes by species
- Train a YOLO-class detection model via transfer learning (PyTorch)
- Export to INT8-quantized ESP-DL format for the ESP32-P4
- Deploy the model to the ESP32-P4’s SD card
All tooling lives in ml-pipeline/ — a Python project managed with uv that wraps PyTorch and timm.
Data Collection Strategy
Before training begins, you need a dataset that actually represents what the camera will see at depth. Stock photos of crabs on a white background will not cut it: the model needs to learn what a crab looks like through wire mesh, in murky water, under IR light.
Underwater Photography Setup
The training images should match deployment conditions as closely as possible:
- Primary illumination: 850nm IR LED array (same as the submerged unit). This is what the camera will actually see in the field.
- Backup illumination: White LED panel for reference shots and ground truth labeling. Useful during data collection, but don’t rely on it for training — the deployed system runs IR only.
- Camera housing: Any waterproof housing that can mount inside or adjacent to a crab pot. Match the camera angle and field of view to the OV5647 NoIR lens as closely as you can.
Image Sources
You don’t need to collect every image yourself. Combine sources to build a diverse dataset faster:
- Personal trap deployments — the best source. Mount a camera in your test pots and pull the SD card after soak periods. This gives you real-world conditions that no other source can match.
- State fisheries departments — many publish survey photos from trawl and pot surveys. Contact your state’s marine resources division directly; they’re often willing to share data for conservation-aligned projects.
- iNaturalist marine observations — searchable by species and location. Quality varies, but useful for building out species diversity. Filter for “research grade” observations.
- NOAA fisheries survey databases — the NOAA Fisheries Photo Gallery and regional science center publications contain species reference images.
Dataset Size
| Stage | Images per class | Expected accuracy |
|---|---|---|
| Initial prototype | 300-500 | ~80-85% (enough to validate the pipeline works) |
| Field-ready | 1,000+ | ~90-95% (production threshold) |
| Mature system | 3,000+ | 95%+ (with continued field data feedback) |
Start small and iterate. A prototype model trained on 300 images per class is enough to prove the hardware and firmware work end-to-end. Accuracy improves as you feed real field captures back into the training set.
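
To see where a dataset currently sits against the table above, a small counting script can help. The following is a minimal sketch, not part of ml-pipeline/: it assumes one sub-folder per class containing JPEG files, and the function names are hypothetical. The thresholds are the ones from the table, and the dataset is rated by its smallest class.

```python
from pathlib import Path

# Thresholds taken from the table above (images per class).
STAGES = [(3000, "mature system"), (1000, "field-ready"), (300, "initial prototype")]
IMAGE_SUFFIXES = {".jpg", ".jpeg"}


def count_images(dataset_root):
    """Return {class_name: image_count} for each sub-folder of dataset_root."""
    root = Path(dataset_root)
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def stage_for(count):
    """Map an image count to the highest stage whose threshold it meets."""
    for minimum, name in STAGES:
        if count >= minimum:
            return name
    return "below prototype"


def dataset_stage(counts):
    """Rate the whole dataset by its smallest class."""
    return stage_for(min(counts.values())) if counts else "below prototype"
```

Calling dataset_stage(count_images("dataset")) gives a quick read on whether the weakest class still needs more images before you retrain.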
Image Diversity
Each class needs images across these axes of variation:
- Angle: Top-down, side profile, angled — crabs don’t pose for the camera
- Lighting: Full IR, partial shadow, backlit by ambient light leaking into the pot
- Occlusion: Partially hidden behind mesh wires, stacked on top of other crabs, half-buried in bait
- Size range: Span the full range from clearly undersized to clearly legal and everything in between
- Water clarity: Clear, silty, algae-tinged — turbidity changes the apparent contrast and color cast
Negative Examples
These are just as important as positives. The model needs to learn what is not a catch event:
- Empty trap interior (different lighting conditions, angles)
- Debris: seaweed, shells, rope fragments, bait remnants
- Non-target species you expect to encounter in your fishery
- Murky water with no visible objects (the camera will see this often)
Labeling
Use CVAT for a browser-based collaborative workflow or LabelImg for a lightweight desktop tool. Both support bounding box annotation.
- Label consistently: draw tight bounding boxes around each animal. Species label only — size is measured from the box, not from the label.
- Have a second person verify a random sample of labels. Labeling errors propagate directly into model errors.
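
The second-person check can be made repeatable with a seeded random sample. The sketch below is one possible approach rather than a feature of CVAT or LabelImg: sample_for_review and iou are hypothetical helpers, the 10% default is an assumption, and boxes are assumed to be (x_min, y_min, x_max, y_max) tuples in pixels. The reviewer redraws boxes on the sampled images, and a low overlap with the original box flags a label worth a second look.

```python
import random


def sample_for_review(image_names, fraction=0.1, seed=0):
    """Pick a reproducible random subset of labeled images for a second reviewer."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    names = sorted(image_names)  # sort so the result does not depend on input order
    k = max(1, round(len(names) * fraction)) if names else 0
    return random.Random(seed).sample(names, k)


def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0
```

Fixing the seed means both labelers can regenerate the same review list later without storing it.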
Data Augmentation
Beyond the automatic augmentations applied during training (see below), consider generating augmented copies of your raw dataset:
- Rotation: 0-360 degrees (crabs have no preferred orientation in a pot)
- Horizontal and vertical flip
- Brightness and contrast variation: +/- 30% to simulate changing ambient light
- Synthetic blur: Gaussian blur at varying radii to simulate water turbidity and camera focus drift
- Color channel shifts: Simulate the spectral differences between IR illumination sources
A 500-image raw dataset with aggressive augmentation can effectively behave like a 2,000-3,000 image dataset during training.
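
One way to organize offline augmentation is to draw the parameters for each copy first and apply them with an imaging library afterwards. The sketch below only covers the parameter-drawing step. It is an illustration under assumptions: augmentation_plan is a hypothetical helper, the blur radius range is a guess, and color channel shifts are left out. The rotation and ±30% ranges come from the list above.

```python
import random


def augmentation_plan(image_names, copies_per_image=4, seed=0):
    """Yield (source_name, params) pairs describing each augmented copy to generate."""
    rng = random.Random(seed)
    for name in image_names:
        for _ in range(copies_per_image):
            yield name, {
                "rotation_deg": rng.uniform(0.0, 360.0),  # no preferred orientation
                "flip_horizontal": rng.random() < 0.5,
                "flip_vertical": rng.random() < 0.5,
                "brightness": rng.uniform(0.7, 1.3),      # +/- 30%
                "contrast": rng.uniform(0.7, 1.3),        # +/- 30%
                "blur_radius_px": rng.uniform(0.0, 2.0),  # assumed range
            }
```

With 500 raw images and four copies each, the plan describes 2,000 augmented images, or 2,500 once the originals are included.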
Training Environment
The ml-pipeline/ directory contains the complete training pipeline. Install with uv:

    cd ml-pipeline
    uv sync --extra dev --extra export-p4

This installs PyTorch, timm, and ESP-DL export dependencies. GPU acceleration is optional: the model is small enough to train on CPU in minutes for prototype datasets.
Dataset Preparation
Image Requirements
| Parameter | Value |
|---|---|
| Resolution | 320x320 px (model input size) |
| Format | RGB JPEG |
| Illumination | 850nm IR (matches deployment) |
| Background | Inside a crab pot (wire mesh, sediment) |
| Minimum per class | 200 images (500+ recommended) |
Target Classes
The detection model uses simplified classes. No size-split variants are needed, since size is measured directly from bounding boxes:

    dataset/
    ├── blue_crab/        # All blue crabs regardless of size (index 0)
    ├── finfish/          # Any non-crab species (index 1)
    ├── horseshoe_crab/   # (index 2)

Toy Dataset (Pipeline Validation)
Before collecting real images, validate the pipeline end-to-end with synthetic data:

    uv run python scripts/generate_toy_dataset.py --output dataset --per-class 10

This creates synthetic images with distinct color palettes. The toy model should reach ~100% validation accuracy; it just proves the plumbing works.
Data Augmentation
Training automatically applies:
- Random rotation (±15 degrees)
- Random brightness adjustment (±20%)
- Random horizontal flip
- Mild Gaussian blur (simulates water turbidity)
Training

    uv run smartpot-train \
      --dataset dataset \
      --epochs 20 \
      --batch-size 32 \
      --lr 1e-3 \
      --output models/checkpoints \
      --device auto

The training loop:
- Uses Adam optimizer with CrossEntropyLoss
- Checkpoints the best model by validation accuracy
- Prints per-epoch train/val loss and accuracy
- Outputs a confusion matrix and classification report at the end
Output is saved to models/checkpoints/best_model.pth.
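
The "best model by validation accuracy" rule is simple bookkeeping. Here is a minimal sketch of the idea, not the actual smartpot-train code: BestCheckpointTracker is a hypothetical class, the accuracy values are made up for illustration, and keeping the earlier epoch on a tie is an assumption.

```python
class BestCheckpointTracker:
    """Remember the epoch with the highest validation accuracy seen so far."""

    def __init__(self):
        self.best_accuracy = float("-inf")
        self.best_epoch = None

    def update(self, epoch, val_accuracy):
        """Return True when this epoch beats the previous best and should be saved."""
        if val_accuracy > self.best_accuracy:
            self.best_accuracy = val_accuracy
            self.best_epoch = epoch
            return True
        return False


history = [0.62, 0.71, 0.70, 0.78, 0.78]  # made-up validation accuracies
tracker = BestCheckpointTracker()
saved_epochs = [e for e, acc in enumerate(history, start=1) if tracker.update(e, acc)]
```

In a real loop, the True branch is where the model weights would be written to best_model.pth, overwriting the previous best.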
Export
ESP32-P4 (ESP-DL)
Export the trained detection model for ESP32-P4 deployment:

    uv run smartpot-export-p4 \
      --checkpoint models/checkpoints/best_model.pth \
      --output models/species_detect

The export pipeline:
- PyTorch → ONNX export with detection head
- ONNX → ESP-PPQ quantization (INT8, calibrated for P4 memory layout)
- ESP-PPQ → ESP-DL format (.espdl)
- Output validation
If esp-ppq is not installed, the tool exports ONNX and prints manual conversion instructions.
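
For intuition about what INT8 quantization does to the weights, the sketch below shows generic symmetric per-tensor quantization in plain Python. It is a simplification and not ESP-PPQ's actual scheme, which chooses its own scales during calibration; the function names and sample values are illustrative only.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from INT8 codes."""
    return [q * scale for q in quantized]


weights = [-1.27, -0.5, 0.0, 0.25, 1.0]  # illustrative values
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
```

Each restored value differs from the original by at most half a quantization step (scale / 2), which is the precision traded away for the smaller model.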
ESP32-CAM (TFLite — Legacy)
The original classification export path is preserved:

    uv run smartpot-export \
      --checkpoint models/checkpoints/best_model.pth \
      --output models/species_model

See the archived training guide for full ESP32-CAM deployment details.
Deployment
ESP32-P4
- Generate calibration data for your pot geometry:

      python ml-pipeline/scripts/generate_calibration.py \
        --focal-length 500.0 \
        --pot-distance 350.0 \
        --output calibration.bin

- Copy species_detect.espdl and calibration.bin to the MicroSD card root
- Flash the firmware:

      cd firmware
      idf.py set-target esp32p4
      idf.py build
      idf.py flash monitor

- Verify with the serial monitor:

      SmartPot Trap Unit v0.2-p4
      Camera initialized: OV5647 NoIR (MIPI CSI)
      Calibration loaded: focal=500.0px dist=350mm img=320x320
      ESP-DL model ready
      Door default: UNLOCKED
      Tether UART initialized
      Waiting for tether connection...
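
The calibration values above are what let a bounding box be turned into a physical size. The guide does not show the firmware's calculation, so the following is a sketch of the standard pinhole-camera relation under stated assumptions: the animal sits at the calibrated distance in a plane parallel to the sensor, and refraction at the housing window and lens distortion are ignored. box_size_mm is a hypothetical helper.

```python
def box_size_mm(box_px, focal_length_px=500.0, distance_mm=350.0):
    """Estimate real-world width and height of a bounding box with the pinhole model.

    box_px is (x_min, y_min, x_max, y_max) in pixels at the model input resolution.
    """
    x_min, y_min, x_max, y_max = box_px
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("box must have positive width and height")
    mm_per_px = distance_mm / focal_length_px  # 350 / 500 = 0.7 mm per pixel
    return (x_max - x_min) * mm_per_px, (y_max - y_min) * mm_per_px


# A 200 x 90 pixel box maps to roughly 140 x 63 mm with the defaults.
width_mm, height_mm = box_size_mm((60, 100, 260, 190))
```

At 0.7 mm per pixel, a ±10mm size tolerance corresponds to about 14 pixels of box error, which is one reason tight, consistent bounding boxes matter during labeling.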
Testing
Run the full test suite:

    cd ml-pipeline
    uv run pytest -v

Tests cover:
- Dataset loading and augmentation (test_dataset.py)
- Model architecture and output shapes (test_model.py)
- ONNX export, validation, and C header generation (test_export.py)
Lint with ruff:

    uv run ruff check src/ tests/ scripts/

Accuracy Targets
| Metric | Target | Notes |
|---|---|---|
| Detection (mAP@50) | >85% | Bounding box accuracy |
| Size measurement | ±10mm | At typical pot dimensions |
| Species classification | >90% | Important for data logging |
| False release rate | <2% | Keepers incorrectly released |
| False retention rate | <5% | Bycatch incorrectly kept |
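
The two release-related rates can be computed from field logs once you have ground truth for each animal. The sketch below assumes each rate is taken over its own population (keepers for false release, bycatch for false retention); the table does not spell out the denominators, so treat that as an assumption. The function names and example counts are hypothetical.

```python
def release_error_rates(keepers_released, keepers_total, bycatch_kept, bycatch_total):
    """Return the two door-decision error rates as fractions between 0 and 1."""
    if keepers_total <= 0 or bycatch_total <= 0:
        raise ValueError("totals must be positive")
    return {
        "false_release": keepers_released / keepers_total,
        "false_retention": bycatch_kept / bycatch_total,
    }


def meets_targets(rates, max_false_release=0.02, max_false_retention=0.05):
    """Check the rates against the <2% and <5% targets from the table."""
    return (rates["false_release"] < max_false_release
            and rates["false_retention"] < max_false_retention)


# Example counts, for illustration only.
rates = release_error_rates(keepers_released=3, keepers_total=200,
                            bycatch_kept=4, bycatch_total=100)
```

Here 3 of 200 keepers released is 1.5% and 4 of 100 bycatch kept is 4%, so both targets are met.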