This guide covers the training pipeline. Data collection is ongoing — dataset sizes and accuracy targets will be updated as field trials progress.

Train the Species Model

This guide walks you through training, quantizing, and deploying a species detection model to the ESP32-P4 in the submerged unit. By the end, you’ll have a working model on an SD card that the trap firmware can load and run.

Here’s what you’ll do in this guide:

  1. Collect underwater training images under IR illumination
  2. Label images with bounding boxes by species
  3. Train a YOLO-class detection model via transfer learning (PyTorch)
  4. Export to INT8-quantized ESP-DL format for the ESP32-P4
  5. Deploy the model to the ESP32-P4’s SD card

All tooling lives in ml-pipeline/ — a Python project managed with uv that wraps PyTorch and timm.

Before training begins, you need a dataset that actually represents what the camera will see at depth. Stock photos of crabs on a white background will not cut it — the model needs to learn what a crab looks like through wire mesh, in murky water, under IR light.

The training images should match deployment conditions as closely as possible:

  • Primary illumination: 850nm IR LED array (same as the submerged unit). This is what the camera will actually see in the field.
  • Backup illumination: White LED panel for reference shots and ground truth labeling. Useful during data collection, but don’t rely on it for training — the deployed system runs IR only.
  • Camera housing: Any waterproof housing that can mount inside or adjacent to a crab pot. Match the camera angle and field of view to the OV5647 NoIR lens as closely as you can.

You don’t need to collect every image yourself. Combine sources to build a diverse dataset faster:

  • Personal trap deployments — the best source. Mount a camera in your test pots and pull the SD card after soak periods. This gives you real-world conditions that no other source can match.
  • State fisheries departments — many publish survey photos from trawl and pot surveys. Contact your state’s marine resources division directly; they’re often willing to share data for conservation-aligned projects.
  • iNaturalist marine observations — searchable by species and location. Quality varies, but useful for building out species diversity. Filter for “research grade” observations.
  • NOAA fisheries survey databases — the NOAA Fisheries Photo Gallery and regional science center publications contain species reference images.

Dataset size targets by stage:

| Stage | Images per class | Expected accuracy |
| --- | --- | --- |
| Initial prototype | 300-500 | ~80-85% (enough to validate that the pipeline works) |
| Field-ready | 1,000+ | ~90-95% (production threshold) |
| Mature system | 3,000+ | 95%+ (with continued field data feedback) |

Start small and iterate. A prototype model trained on 300 images per class is enough to prove the hardware and firmware work end-to-end. Accuracy improves as you feed real field captures back into the training set.

Each class needs images across these axes of variation:

  • Angle: Top-down, side profile, angled — crabs don’t pose for the camera
  • Lighting: Full IR, partial shadow, backlit by ambient light leaking into the pot
  • Occlusion: Partially hidden behind mesh wires, stacked on top of other crabs, half-buried in bait
  • Size range: Span the full range from clearly undersized to clearly legal and everything in between
  • Water clarity: Clear, silty, algae-tinged — turbidity changes the apparent contrast and color cast

Negative examples are just as important as positives. The model needs to learn what is not a catch event:

  • Empty trap interior (different lighting conditions, angles)
  • Debris: seaweed, shells, rope fragments, bait remnants
  • Non-target species you expect to encounter in your fishery
  • Murky water with no visible objects (the camera will see this often)

Use CVAT for a browser-based collaborative workflow or LabelImg for a lightweight desktop tool. Both support bounding box annotation.

  • Label consistently: draw tight bounding boxes around each animal. Species label only — size is measured from the box, not from the label.
  • Have a second person verify a random sample of labels. Labeling errors propagate directly into model errors.
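
One way to make the second-person check concrete is to compare the two annotators' boxes using intersection-over-union (IoU). The sketch below assumes boxes are exported as (x_min, y_min, x_max, y_max) pixel tuples. The function names and the 0.8 agreement threshold are illustrative choices, not part of CVAT, LabelImg, or ml-pipeline/.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the overlap, clamped to zero when boxes are disjoint
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def labels_agree(first: Box, second: Box, threshold: float = 0.8) -> bool:
    """True when two annotators' boxes overlap at least `threshold` IoU."""
    return iou(first, second) >= threshold
```

Pairs that fall below the threshold are good candidates for a closer look, since loose or shifted boxes would also distort any size estimate derived from box dimensions.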

Beyond the automatic augmentations applied during training (see below), consider generating augmented copies of your raw dataset:

  • Rotation: 0-360 degrees (crabs have no preferred orientation in a pot)
  • Horizontal and vertical flip
  • Brightness and contrast variation: +/- 30% to simulate changing ambient light
  • Synthetic blur: Gaussian blur at varying radii to simulate water turbidity and camera focus drift
  • Color channel shifts: Simulate the spectral differences between IR illumination sources

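With aggressive augmentation, a 500-image raw dataset can behave roughly like a 2,000-3,000 image dataset during training. Augmented copies are still correlated with their source images, so they supplement new field captures rather than replace them.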
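
As a rough illustration of offline augmentation, the sketch below produces one augmented copy of an image held as an HxWx3 uint8 NumPy array. It is a simplification under stated assumptions: rotation is limited to 90-degree steps, and blur and channel shifts are left out because they would need an imaging library. The `augment` function is hypothetical and not part of ml-pipeline/.

```python
import numpy as np


def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return one randomly augmented copy of an HxWx3 uint8 image."""
    # Rotation in 90-degree steps only (arbitrary angles need interpolation)
    out = np.rot90(image, k=int(rng.integers(0, 4)))
    if rng.random() < 0.5:
        out = np.flip(out, axis=1)  # horizontal flip
    if rng.random() < 0.5:
        out = np.flip(out, axis=0)  # vertical flip

    # Brightness and contrast variation of +/- 30%
    brightness = rng.uniform(0.7, 1.3)
    contrast = rng.uniform(0.7, 1.3)
    pixels = out.astype(np.float32)
    mean = pixels.mean()
    pixels = (pixels - mean) * contrast + mean * brightness
    return np.clip(pixels, 0, 255).astype(np.uint8)


rng = np.random.default_rng(seed=0)
frame = np.full((320, 320, 3), 128, dtype=np.uint8)  # stand-in for a real capture
copies = [augment(frame, rng) for _ in range(4)]
```

Because the labels are bounding boxes, the same rotation and flips would also have to be applied to the box coordinates. That step is omitted here.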

The ml-pipeline/ directory contains the complete training pipeline. Install with uv:

cd ml-pipeline
uv sync --extra dev --extra export-p4

This installs PyTorch, timm, and ESP-DL export dependencies. GPU acceleration is optional — the model is small enough to train on CPU in minutes for prototype datasets.

| Parameter | Value |
| --- | --- |
| Resolution | 320x320 px (model input size) |
| Format | RGB JPEG |
| Illumination | 850nm IR (matches deployment) |
| Background | Inside a crab pot (wire mesh, sediment) |
| Minimum per class | 200 images (500+ recommended) |

The detection model uses simplified classes — no size-split variants needed since size is measured directly from bounding boxes:

dataset/
├── blue_crab/ # All blue crabs regardless of size (index 0)
├── finfish/ # Any non-crab species (index 1)
├── horseshoe_crab/ # (index 2)
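
Before a long training run, it can help to check the layout above against the per-class minimum. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and it assumes every image sits directly inside its class directory as a .jpg or .jpeg file.

```python
from pathlib import Path
from typing import Dict

IMAGE_SUFFIXES = {".jpg", ".jpeg"}
MIN_PER_CLASS = 200  # minimum per class from the image requirements


def count_images(dataset_root: Path) -> Dict[str, int]:
    """Map each class directory name to the number of JPEG files it holds."""
    counts: Dict[str, int] = {}
    class_dirs = sorted(p for p in dataset_root.iterdir() if p.is_dir())
    for class_dir in class_dirs:
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def underfilled(counts: Dict[str, int], minimum: int = MIN_PER_CLASS) -> Dict[str, int]:
    """Return only the classes that have fewer than `minimum` images."""
    return {name: n for name, n in counts.items() if n < minimum}
```

Calling `underfilled(count_images(Path("dataset")))` would list the classes that still need more images. Sorting the directory names alphabetically happens to reproduce the index order shown above, but whether the pipeline assigns indices that way is an assumption.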

Before collecting real images, validate the pipeline end-to-end with synthetic data:

uv run python scripts/generate_toy_dataset.py --output dataset --per-class 10

This creates synthetic images with distinct color palettes. The toy model should reach ~100% validation accuracy — it just proves the plumbing works.

Training automatically applies:

  • Random rotation (±15 degrees)
  • Random brightness adjustment (±20%)
  • Random horizontal flip
  • Mild Gaussian blur (simulates water turbidity)

Start a training run:

uv run smartpot-train \
--dataset dataset \
--epochs 20 \
--batch-size 32 \
--lr 1e-3 \
--output models/checkpoints \
--device auto

The training loop:

  • Uses Adam optimizer with CrossEntropyLoss
  • Checkpoints the best model by validation accuracy
  • Prints per-epoch train/val loss and accuracy
  • Outputs a confusion matrix and classification report at the end
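
For readers new to confusion matrices, the sketch below shows how one is built from true and predicted class indices, with rows as the true class and columns as the predicted class. It is a plain-Python illustration, not the report code used by smartpot-train, and the function names are hypothetical.

```python
from typing import List, Sequence


def confusion_matrix(
    y_true: Sequence[int], y_pred: Sequence[int], num_classes: int
) -> List[List[int]]:
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_recall(matrix: List[List[int]]) -> List[float]:
    """Fraction of each true class that was predicted correctly."""
    recalls = []
    for i, row in enumerate(matrix):
        total = sum(row)
        recalls.append(row[i] / total if total else 0.0)
    return recalls
```

Off-diagonal cells show which classes are confused with each other, which indicates where more training images are most needed.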

Output is saved to models/checkpoints/best_model.pth.

Export the trained detection model for ESP32-P4 deployment:

uv run smartpot-export-p4 \
--checkpoint models/checkpoints/best_model.pth \
--output models/species_detect

The export pipeline:

  1. PyTorch → ONNX export with detection head
  2. ONNX → ESP-PPQ quantization (INT8, calibrated for P4 memory layout)
  3. ESP-PPQ → ESP-DL format (.espdl)
  4. Output validation

If esp-ppq is not installed, the tool exports ONNX and prints manual conversion instructions.
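
To give a feel for what the INT8 quantization step does, here is a generic symmetric per-tensor quantization sketch in plain Python. It is an illustration only: the scheme ESP-PPQ actually applies (for example, how scales are chosen or constrained) may differ, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize_int8(values: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric quantization: real value ~= q * scale, with q in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # The scale maps the largest magnitude seen onto the INT8 limit
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: Sequence[int], scale: float) -> List[float]:
    """Recover approximate real values from INT8 codes."""
    return [q * scale for q in quantized]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
```

The scale depends on the range of values observed, so the choice of calibration data matters: a single outlier widens the range and coarsens the resolution for every other value.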

The original classification export path, used for the earlier ESP32-CAM build, is preserved:

uv run smartpot-export \
--checkpoint models/checkpoints/best_model.pth \
--output models/species_model

See the archived training guide for full ESP32-CAM deployment details.

  1. Generate calibration data for your pot geometry:
    python ml-pipeline/scripts/generate_calibration.py \
    --focal-length 500.0 \
    --pot-distance 350.0 \
    --output calibration.bin
  2. Copy species_detect.espdl and calibration.bin to the MicroSD card root
  3. Flash the firmware:
    cd firmware
    idf.py set-target esp32p4
    idf.py build
    idf.py flash monitor
  4. Verify with the serial monitor:
    SmartPot Trap Unit v0.2-p4
    Camera initialized: OV5647 NoIR (MIPI CSI)
    Calibration loaded: focal=500.0px dist=350mm img=320x320
    ESP-DL model ready
    Door default: UNLOCKED
    Tether UART initialized
    Waiting for tether connection...
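
The calibration values in the log above (focal length in pixels, pot distance in millimetres) are the inputs a pinhole-camera model needs to convert a bounding-box dimension into a physical size. The sketch below shows that conversion as an assumption about how such a measurement could work. The firmware's actual method is not shown in this guide, and `pixels_to_mm` is a hypothetical helper.

```python
def pixels_to_mm(box_px: float, focal_length_px: float, distance_mm: float) -> float:
    """Pinhole model: real size = pixel size * distance / focal length."""
    if focal_length_px <= 0:
        raise ValueError("focal length must be positive")
    return box_px * distance_mm / focal_length_px


# Values from the serial log: focal=500.0px, dist=350mm
width_mm = pixels_to_mm(180.0, focal_length_px=500.0, distance_mm=350.0)
print(f"{width_mm:.1f} mm")  # 126.0 mm
```

With these values one pixel spans 0.7 mm at the assumed distance, so an animal sitting nearer to or farther from the camera than 350 mm would be over- or under-measured in proportion.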

Run the full test suite:

cd ml-pipeline
uv run pytest -v

Tests cover:

  • Dataset loading and augmentation (test_dataset.py)
  • Model architecture and output shapes (test_model.py)
  • ONNX export, validation, and C header generation (test_export.py)

Lint with ruff:

uv run ruff check src/ tests/ scripts/

Target metrics for the deployed model:

| Metric | Target | Notes |
| --- | --- | --- |
| Detection (mAP@50) | >85% | Bounding box accuracy |
| Size measurement | ±10 mm | At typical pot dimensions |
| Species classification | >90% | Important for data logging |
| False release rate | <2% | Keepers incorrectly released |
| False retention rate | <5% | Bycatch incorrectly kept |
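
The two error rates in the table can be computed from field-trial tallies. The sketch below assumes each rate is measured against its own population: false releases as a fraction of all keepers, and false retentions as a fraction of all bycatch. The guide does not define the denominators, so treat them, along with the `TrialCounts` structure, as assumptions.

```python
from dataclasses import dataclass


@dataclass
class TrialCounts:
    keepers_total: int     # legal catch that entered the trap
    keepers_released: int  # keepers the trap let go (false release)
    bycatch_total: int     # bycatch that entered the trap
    bycatch_retained: int  # bycatch the trap kept (false retention)


def false_release_rate(c: TrialCounts) -> float:
    return c.keepers_released / c.keepers_total if c.keepers_total else 0.0


def false_retention_rate(c: TrialCounts) -> float:
    return c.bycatch_retained / c.bycatch_total if c.bycatch_total else 0.0


def meets_targets(c: TrialCounts, max_release: float = 0.02,
                  max_retention: float = 0.05) -> bool:
    """Check both rates against the <2% and <5% targets."""
    return (false_release_rate(c) < max_release
            and false_retention_rate(c) < max_retention)
```

For example, 3 false releases out of 200 keepers (1.5%) and 4 false retentions out of 100 bycatch animals (4%) would meet both targets.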