This guide covers the training pipeline. Data collection is ongoing — dataset sizes and accuracy targets will be updated as field trials progress.

Train the Species Model

This guide covers training a TFLite Micro classification model for on-device species identification on the ESP32-CAM.

The submerged unit runs a quantized MobileNet v2 model to classify catch by species. The pipeline:

  1. Collect training images (underwater crab photos, IR-illuminated)
  2. Label images by species and size
  3. Train a MobileNet v2 classifier (transfer learning)
  4. Quantize the model to INT8 for ESP32 deployment
  5. Convert to TFLite Micro format
  6. Deploy to the ESP32-CAM’s SD card

Before training begins, you need a dataset that actually represents what the camera will see at depth. Stock photos of crabs on a white background will not cut it — the model needs to learn what a crab looks like through wire mesh, in murky water, under IR light.

The training images should match deployment conditions as closely as possible:

  • Primary illumination: 850nm IR LED array (same as the submerged unit). This is what the camera will actually see in the field.
  • Backup illumination: White LED panel for reference shots and ground truth labeling. Useful during data collection, but don’t rely on it for training — the deployed system runs IR only.
  • Camera housing: Any waterproof housing that can mount inside or adjacent to a crab pot. A GoPro in a dive housing works for initial collection. Match the camera angle and field of view to the ESP32-CAM’s OV2640 lens as closely as you can.

You don’t need to collect every image yourself. Combine sources to build a diverse dataset faster:

  • Personal trap deployments — the best source. Mount a camera in your test pots and pull the SD card after soak periods. This gives you real-world conditions that no other source can match.
  • State fisheries departments — many publish survey photos from trawl and pot surveys. Contact your state’s marine resources division directly; they’re often willing to share data for conservation-aligned projects.
  • iNaturalist marine observations — searchable by species and location. Quality varies, but useful for building out species diversity. Filter for “research grade” observations.
  • NOAA fisheries survey databases — the NOAA Fisheries Photo Gallery and regional science center publications contain species reference images.
How much data you need depends on where you are in development:

Stage               Images per class   Expected accuracy
Initial prototype   300-500            ~80-85% (enough to validate the pipeline works)
Field-ready         1,000+             ~90-95% (production threshold)
Mature system       3,000+             95%+ (with continued field data feedback)

Start small and iterate. A prototype model trained on 300 images per class is enough to prove the hardware and firmware work end-to-end. Accuracy improves as you feed real field captures back into the training set.

Each class needs images across these axes of variation:

  • Angle: Top-down, side profile, angled — crabs don’t pose for the camera
  • Lighting: Full IR, partial shadow, backlit by ambient light leaking into the pot
  • Occlusion: Partially hidden behind mesh wires, stacked on top of other crabs, half-buried in bait
  • Size classes: Span the full range from clearly undersized to clearly legal and everything in between
  • Water clarity: Clear, silty, algae-tinged — turbidity changes the apparent contrast and color cast

Negative examples are just as important as positives. The model needs to learn what is not a catch event:

  • Empty trap interior (different lighting conditions, angles)
  • Debris: seaweed, shells, rope fragments, bait remnants
  • Non-target species you expect to encounter in your fishery
  • Murky water with no visible objects (the camera will see this often)

Use LabelImg (a lightweight desktop tool) or CVAT (a browser-based, collaborative workflow) to annotate. Both support bounding box annotation.

  • Export as Pascal VOC (XML per image) or COCO format (single JSON). The classifier later in this guide trains from per-class image folders, so crop the labeled boxes into class directories first (see the sketch after this list)
  • Label consistently: decide on class boundaries before you start and document them. “Undersized” means below the legal minimum for your jurisdiction, measured point-to-point.
  • Have a second person verify a random sample of labels. Labeling errors propagate directly into model errors.
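If you label with bounding boxes but train from class folders, one option is to crop each labeled box into a per-class directory. A minimal sketch assuming Pascal VOC XML exports; the labels/, raw_images/, and dataset/ paths are placeholders to adjust for your labeling tool:

# Sketch: crop Pascal VOC boxes into per-class folders for DataLoader.from_folder().
import os
import xml.etree.ElementTree as ET
from PIL import Image

ANNOTATIONS = 'labels/'    # one VOC XML per image
IMAGES = 'raw_images/'
OUTPUT = 'dataset/'        # crops land in dataset/<class_name>/

for xml_file in os.listdir(ANNOTATIONS):
    if not xml_file.endswith('.xml'):
        continue
    root = ET.parse(os.path.join(ANNOTATIONS, xml_file)).getroot()
    image = Image.open(os.path.join(IMAGES, root.findtext('filename')))
    for i, obj in enumerate(root.iter('object')):
        name = obj.findtext('name')          # class label assigned during annotation
        box = obj.find('bndbox')
        crop = image.crop((
            int(float(box.findtext('xmin'))), int(float(box.findtext('ymin'))),
            int(float(box.findtext('xmax'))), int(float(box.findtext('ymax'))),
        ))
        os.makedirs(os.path.join(OUTPUT, name), exist_ok=True)
        crop.convert('RGB').save(
            os.path.join(OUTPUT, name, f'{os.path.splitext(xml_file)[0]}_{i}.jpg'))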

Beyond the automatic augmentations applied during training (see below), consider generating augmented copies of your raw dataset:

  • Rotation: 0-360 degrees (crabs have no preferred orientation in a pot)
  • Horizontal and vertical flip
  • Brightness and contrast variation: +/- 30% to simulate changing ambient light
  • Synthetic blur: Gaussian blur at varying radii to simulate water turbidity and camera focus drift
  • Color channel shifts: Simulate the spectral differences between IR illumination sources

A 500-image raw dataset with aggressive augmentation can effectively behave like a 2,000-3,000 image dataset during training.
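A minimal offline augmentation sketch with Pillow (installed in the environment setup below); the copy count, brightness range, and blur radii are illustrative values to tune against your own captures:

# Sketch: write augmented copies alongside each raw image in dataset/.
# The transforms mirror the bullet list above; color channel shifts are omitted here.
import random
from pathlib import Path
from PIL import Image, ImageEnhance, ImageFilter, ImageOps

def augment_once(img):
    img = img.rotate(random.uniform(0, 360))                              # no preferred orientation
    if random.random() < 0.5:
        img = ImageOps.mirror(img)                                        # horizontal flip
    if random.random() < 0.5:
        img = ImageOps.flip(img)                                          # vertical flip
    img = ImageEnhance.Brightness(img).enhance(random.uniform(0.7, 1.3))  # +/- 30% brightness
    img = ImageEnhance.Contrast(img).enhance(random.uniform(0.7, 1.3))    # +/- 30% contrast
    return img.filter(ImageFilter.GaussianBlur(random.uniform(0, 2.0)))   # turbidity / focus drift

for src in list(Path('dataset/').rglob('*.jpg')):
    if '_aug' in src.stem:
        continue                                                          # skip copies from earlier runs
    original = Image.open(src).convert('RGB')
    for i in range(4):                                                    # ~5x the raw image count
        augment_once(original).save(src.with_name(f'{src.stem}_aug{i}.jpg'))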

Set up the Python training environment:
# Create a virtual environment
python -m venv smartpot-ml
source smartpot-ml/bin/activate
# Install dependencies
pip install tensorflow tflite-model-maker pillow numpy
Training images should meet the following requirements:

Parameter            Value
Resolution           96x96 px (model input size)
Format               RGB JPEG
Illumination         850nm IR (matches deployment)
Background           Inside a crab pot (wire mesh, sediment)
Minimum per class    200 images (500+ recommended)
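Model Maker resizes images to the model input size when it loads them, so matching 96x96 up front is optional; if you want to normalize raw captures to the spec above anyway, here is a minimal Pillow sketch (the raw_images/ and resized/ paths are placeholders):

# Sketch: convert raw captures to 96x96 RGB JPEGs, mirroring the folder layout.
from pathlib import Path
from PIL import Image

for src in sorted(Path('raw_images/').rglob('*')):
    if src.suffix.lower() not in ('.jpg', '.jpeg', '.png'):
        continue
    out = Path('resized/') / src.relative_to('raw_images/')
    out.parent.mkdir(parents=True, exist_ok=True)
    Image.open(src).convert('RGB').resize((96, 96)).save(out.with_suffix('.jpg'), 'JPEG', quality=95)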

Organize the dataset into one folder per class, and adapt the class list to your fishery. Example for Chesapeake Bay blue crab:

dataset/
├── blue_crab_keeper/ # ≥ 5" point-to-point
├── blue_crab_undersized/
├── blue_crab_female_sook/
├── finfish/ # Any non-crab species
├── horseshoe_crab/
└── empty/ # No animal present
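A quick way to check class balance against the size targets above is to count files per class folder. A minimal sketch, assuming the dataset/ layout shown here:

# Sketch: count images per class and flag any class below the 200-image minimum.
from pathlib import Path

MINIMUM = 200
for class_dir in sorted(p for p in Path('dataset/').iterdir() if p.is_dir()):
    count = sum(1 for f in class_dir.iterdir() if f.suffix.lower() in ('.jpg', '.jpeg', '.png'))
    flag = '' if count >= MINIMUM else '  <-- below minimum'
    print(f'{class_dir.name:28s}{count:6d}{flag}')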

Training automatically applies:

  • Random rotation (+/- 15 degrees)
  • Random brightness adjustment (+/- 20%)
  • Random horizontal flip
  • Mild Gaussian blur (simulates water turbidity)
from tflite_model_maker import image_classifier
from tflite_model_maker.config import QuantizationConfig
# Load dataset from per-class folders, holding out 20% for evaluation
data = image_classifier.DataLoader.from_folder('dataset/')
train_data, test_data = data.split(0.8)
# Train (transfer learning from MobileNet v2)
model = image_classifier.create(
    train_data,
    model_spec='mobilenet_v2',
    epochs=50,
    batch_size=32
)
# Evaluate on the held-out split
loss, accuracy = model.evaluate(test_data)
print(f'Test accuracy: {accuracy:.2%}')
# Export as full-integer (INT8) quantized TFLite, using the test split
# as representative data for calibration
model.export(
    export_dir='output/',
    tflite_filename='species_v1.tflite',
    quantization_config=QuantizationConfig.for_int8(representative_data=test_data)
)
  1. Copy species_v1.tflite to the ESP32-CAM’s MicroSD card
  2. The firmware loads the model on boot
  3. Verify with the serial monitor:
    TFLite model loaded: species_v1.tflite
    Input: 96x96x3 INT8
    Output: 6 classes
    Arena size: 186KB
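Before copying the file to the SD card, it can help to sanity-check the export on the desktop with the standard TFLite interpreter; a minimal sketch, using the output path from the export step above:

# Sketch: confirm the exported model's input/output shapes and dtypes match
# what the firmware expects, and that one inference runs end to end.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='output/species_v1.tflite')
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
print('Input :', inp['shape'], inp['dtype'])    # expect [1 96 96 3], int8 or uint8
print('Output:', out['shape'], out['dtype'])    # expect [1 6] for six classes

dummy = np.zeros(inp['shape'], dtype=inp['dtype'])
interpreter.set_tensor(inp['index'], dummy)
interpreter.invoke()
print('Scores:', interpreter.get_tensor(out['index']))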
Track the deployed model against these targets:

Metric                    Target   Notes
Keeper vs. undersized     >95%     Critical — determines door action
Species classification    >90%     Important for data logging
False release rate        <2%      Keepers incorrectly released
False retention rate      <5%      Bycatch incorrectly kept
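model.evaluate() reports overall accuracy only; the door-action rates in the table fall out of per-image predictions on the test split. A minimal sketch, assuming you have arrays of true and predicted class indices and the class folders from the example dataset above:

# Sketch: derive false release / false retention rates from predictions.
# y_true and y_pred are integer class indices; class_names must match the
# loader's label ordering (alphabetical folder names here; verify against yours).
import numpy as np

class_names = ['blue_crab_female_sook', 'blue_crab_keeper', 'blue_crab_undersized',
               'empty', 'finfish', 'horseshoe_crab']
KEEPER = class_names.index('blue_crab_keeper')

def door_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    is_keeper = y_true == KEEPER
    false_release = float(np.mean(y_pred[is_keeper] != KEEPER))     # keepers the door would release
    false_retention = float(np.mean(y_pred[~is_keeper] == KEEPER))  # bycatch the door would keep
    return false_release, false_retention

# Toy example: three keepers (one misclassified) and two non-keepers (one kept)
fr, ft = door_metrics([1, 1, 1, 3, 4], [1, 2, 1, 1, 4])
print(f'False release: {fr:.1%}   False retention: {ft:.1%}')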