Train the Species Model
This guide covers training a TFLite Micro classification model for on-device species identification on the ESP32-CAM.
Overview
The submerged unit runs a quantized MobileNet v2 model to classify catch by species. The pipeline:
- Collect training images (underwater crab photos, IR-illuminated)
- Label images by species and size
- Train a MobileNet v2 classifier (transfer learning)
- Quantize the model to INT8 for ESP32 deployment
- Convert to TFLite Micro format
- Deploy to the ESP32-CAM’s SD card
Data Collection Strategy
Before training begins, you need a dataset that actually represents what the camera will see at depth. Stock photos of crabs on a white background will not cut it — the model needs to learn what a crab looks like through wire mesh, in murky water, under IR light.
Underwater Photography Setup
The training images should match deployment conditions as closely as possible:
- Primary illumination: 850nm IR LED array (same as the submerged unit). This is what the camera will actually see in the field.
- Backup illumination: White LED panel for reference shots and ground truth labeling. Useful during data collection, but don’t rely on it for training — the deployed system runs IR only.
- Camera housing: Any waterproof housing that can mount inside or adjacent to a crab pot. A GoPro in a dive housing works for initial collection. Match the camera angle and field of view to the ESP32-CAM’s OV2640 lens as closely as you can.
Image Sources
You don’t need to collect every image yourself. Combine sources to build a diverse dataset faster:
- Personal trap deployments — the best source. Mount a camera in your test pots and pull the SD card after soak periods. This gives you real-world conditions that no other source can match.
- State fisheries departments — many publish survey photos from trawl and pot surveys. Contact your state’s marine resources division directly; they’re often willing to share data for conservation-aligned projects.
- iNaturalist marine observations — searchable by species and location. Quality varies, but useful for building out species diversity. Filter for “research grade” observations.
- NOAA fisheries survey databases — the NOAA Fisheries Photo Gallery and regional science center publications contain species reference images.
Dataset Size
| Stage | Images per class | Expected accuracy |
|---|---|---|
| Initial prototype | 300-500 | ~80-85% (enough to validate the pipeline works) |
| Field-ready | 1,000+ | ~90-95% (production threshold) |
| Mature system | 3,000+ | 95%+ (with continued field data feedback) |
Start small and iterate. A prototype model trained on 300 images per class is enough to prove the hardware and firmware work end-to-end. Accuracy improves as you feed real field captures back into the training set.
Image Diversity
Each class needs images across these axes of variation:
- Angle: Top-down, side profile, angled — crabs don’t pose for the camera
- Lighting: Full IR, partial shadow, backlit by ambient light leaking into the pot
- Occlusion: Partially hidden behind mesh wires, stacked on top of other crabs, half-buried in bait
- Size classes: Span the full range from clearly undersized to clearly legal and everything in between
- Water clarity: Clear, silty, algae-tinged — turbidity changes the apparent contrast and color cast
Negative Examples
These are just as important as positives. The model needs to learn what is not a catch event:
- Empty trap interior (different lighting conditions, angles)
- Debris: seaweed, shells, rope fragments, bait remnants
- Non-target species you expect to encounter in your fishery
- Murky water with no visible objects (the camera will see this often)
Labeling
Use LabelImg (a lightweight desktop tool) or CVAT (a browser-based, collaborative workflow). Both support bounding-box annotation.
- Export as Pascal VOC (XML per image) or COCO format (single JSON) — both work with TFLite Model Maker’s data pipeline. For the classification model, crop each box into per-class folders first (see the sketch after this list).
- Label consistently: decide on class boundaries before you start and document them. “Undersized” means below the legal minimum for your jurisdiction, measured point-to-point.
- Have a second person verify a random sample of labels. Labeling errors propagate directly into model errors.
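Since the annotations are bounding boxes but the classifier trains on per-class image folders, the boxes need to be cropped out first. A minimal sketch, assuming LabelImg-style Pascal VOC XML and hypothetical `images/`, `annotations/`, and `dataset/` directories:

```python
# Sketch: crop Pascal VOC boxes into per-class folders for classification.
# Paths and the file-naming scheme are illustrative assumptions.
import os
import xml.etree.ElementTree as ET
from PIL import Image

IMAGES_DIR = 'images'
ANNOTATIONS_DIR = 'annotations'
OUTPUT_DIR = 'dataset'

for xml_name in os.listdir(ANNOTATIONS_DIR):
    if not xml_name.endswith('.xml'):
        continue
    root = ET.parse(os.path.join(ANNOTATIONS_DIR, xml_name)).getroot()
    image = Image.open(os.path.join(IMAGES_DIR, root.findtext('filename'))).convert('RGB')

    for i, obj in enumerate(root.iter('object')):
        label = obj.findtext('name')  # e.g. 'blue_crab_keeper'
        box = obj.find('bndbox')
        crop = image.crop((
            int(float(box.findtext('xmin'))), int(float(box.findtext('ymin'))),
            int(float(box.findtext('xmax'))), int(float(box.findtext('ymax'))),
        ))
        class_dir = os.path.join(OUTPUT_DIR, label)
        os.makedirs(class_dir, exist_ok=True)
        crop.save(os.path.join(class_dir, f'{os.path.splitext(xml_name)[0]}_{i}.jpg'))
```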
Data Augmentation
Beyond the automatic augmentations applied during training (see below), consider generating augmented copies of your raw dataset:
- Rotation: 0-360 degrees (crabs have no preferred orientation in a pot)
- Horizontal and vertical flip
- Brightness and contrast variation: ±30% to simulate changing ambient light
- Synthetic blur: Gaussian blur at varying radii to simulate water turbidity and camera focus drift
- Color channel shifts: Simulate the spectral differences between IR illumination sources
A 500-image raw dataset with aggressive augmentation can effectively behave like a 2,000-3,000 image dataset during training.
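A minimal offline augmentation sketch using Pillow (already in the dependency list below). The directory names and the four-copies-per-image multiplier are assumptions, and the IR channel shifts are left out for brevity:

```python
# Sketch: write augmented copies of each raw image to a parallel folder tree.
import os
import random
from PIL import Image, ImageEnhance, ImageFilter, ImageOps

RAW_DIR = 'dataset'            # assumed input layout: one folder per class
AUG_DIR = 'dataset_augmented'
COPIES_PER_IMAGE = 4           # assumed augmentation multiplier

def augment(image):
    # Random rotation: crabs have no preferred orientation in a pot
    image = image.rotate(random.uniform(0, 360))
    # Random horizontal / vertical flips
    if random.random() < 0.5:
        image = ImageOps.mirror(image)
    if random.random() < 0.5:
        image = ImageOps.flip(image)
    # Brightness and contrast variation, roughly +/- 30%
    image = ImageEnhance.Brightness(image).enhance(random.uniform(0.7, 1.3))
    image = ImageEnhance.Contrast(image).enhance(random.uniform(0.7, 1.3))
    # Gaussian blur to simulate turbidity and focus drift
    return image.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.0, 2.0)))

for class_name in os.listdir(RAW_DIR):
    src, dst = os.path.join(RAW_DIR, class_name), os.path.join(AUG_DIR, class_name)
    if not os.path.isdir(src):
        continue
    os.makedirs(dst, exist_ok=True)
    for file_name in os.listdir(src):
        image = Image.open(os.path.join(src, file_name)).convert('RGB')
        stem = os.path.splitext(file_name)[0]
        for i in range(COPIES_PER_IMAGE):
            augment(image).save(os.path.join(dst, f'{stem}_aug{i}.jpg'))
```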
Training Environment
Section titled “Training Environment”# Create a virtual environmentpython -m venv smartpot-mlsource smartpot-ml/bin/activate
# Install dependenciespip install tensorflow tflite-model-maker pillow numpyDataset Preparation
Image Requirements
| Parameter | Value |
|---|---|
| Resolution | 96x96 px (model input size) |
| Format | RGB JPEG |
| Illumination | 850nm IR (matches deployment) |
| Background | Inside a crab pot (wire mesh, sediment) |
| Minimum per class | 200 images (500+ recommended) |
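A short preprocessing sketch that brings labeled crops to the 96x96 RGB JPEG format in the table above. The source and destination directory names are assumptions:

```python
# Sketch: normalize every image to 96x96 RGB JPEG before training.
import os
from PIL import Image

SRC_DIR = 'dataset_augmented'   # assumed input layout: one folder per class
DST_DIR = 'dataset_96x96'
TARGET_SIZE = (96, 96)          # matches the model input size

for class_name in os.listdir(SRC_DIR):
    src = os.path.join(SRC_DIR, class_name)
    if not os.path.isdir(src):
        continue
    dst = os.path.join(DST_DIR, class_name)
    os.makedirs(dst, exist_ok=True)
    for file_name in os.listdir(src):
        image = Image.open(os.path.join(src, file_name)).convert('RGB')
        image = image.resize(TARGET_SIZE)
        image.save(os.path.join(dst, os.path.splitext(file_name)[0] + '.jpg'), 'JPEG', quality=95)
```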
Target Classes
Adapt these to your fishery. Example for Chesapeake Bay blue crab:
```
dataset/
├── blue_crab_keeper/       # ≥ 5" point-to-point
├── blue_crab_undersized/
├── blue_crab_female_sook/
├── finfish/                # Any non-crab species
├── horseshoe_crab/
├── empty/                  # No animal present
```
Data Augmentation
Training automatically applies:
- Random rotation (±15 degrees)
- Random brightness adjustment (±20%)
- Random horizontal flip
- Mild Gaussian blur (simulates water turbidity)
Training
```python
import tensorflow as tf
from tflite_model_maker import image_classifier
from tflite_model_maker.config import QuantizationConfig

# Load dataset
data = image_classifier.DataLoader.from_folder('dataset/')
train_data, test_data = data.split(0.8)

# Train (transfer learning from MobileNet v2)
model = image_classifier.create(
    train_data,
    model_spec='mobilenet_v2',
    epochs=50,
    batch_size=32
)

# Evaluate
loss, accuracy = model.evaluate(test_data)
print(f'Test accuracy: {accuracy:.2%}')

# Export as INT8 quantized TFLite (full-integer quantization needs a
# representative dataset for calibration; the test split works)
model.export(
    export_dir='output/',
    tflite_filename='species_v1.tflite',
    quantization_config=QuantizationConfig.for_int8(representative_data=test_data)
)
```
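Before moving the file to the device, a quick desktop check with the standard TFLite interpreter confirms the exported graph runs and reports the quantized input/output types (the exact dtype, int8 vs. uint8, depends on the quantization config). A minimal sketch assuming the `output/species_v1.tflite` path from the export step above:

```python
# Sketch: desktop sanity check of the exported model before deployment.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='output/species_v1.tflite')
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
print('Input :', inp['shape'], inp['dtype'])    # expect (1, 96, 96, 3), integer dtype
print('Output:', out['shape'], out['dtype'])    # expect (1, num_classes)

# Run one inference on dummy data just to confirm the graph executes
dummy = np.zeros(inp['shape'], dtype=inp['dtype'])
interpreter.set_tensor(inp['index'], dummy)
interpreter.invoke()
print('Scores:', interpreter.get_tensor(out['index'])[0])
```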
Section titled “Deployment”- Copy
species_v1.tfliteto the ESP32-CAM’s MicroSD card - The firmware loads the model on boot
- Verify with the serial monitor:
```
TFLite model loaded: species_v1.tflite
Input: 96x96x3 INT8
Output: 6 classes
Arena size: 186KB
```
Accuracy Targets
| Metric | Target | Notes |
|---|---|---|
| Keeper vs. undersized | >95% | Critical — determines door action |
| Species classification | >90% | Important for data logging |
| False release rate | <2% | Keepers incorrectly released |
| False retention rate | <5% | Bycatch incorrectly kept |
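A rough way to measure the release/retention rows from held-out predictions is sketched below; the class names and the release policy (which classes trigger the door) are assumptions that should be aligned with your firmware:

```python
# Sketch: false release / false retention rates from per-image predictions.
# y_true / y_pred are lists of class-name strings from the held-out set.
KEEPER_CLASSES = {'blue_crab_keeper'}                       # assumed: door stays closed
RELEASE_CLASSES = {'blue_crab_undersized', 'blue_crab_female_sook',
                   'finfish', 'horseshoe_crab', 'empty'}    # assumed: door opens

def release_metrics(y_true, y_pred):
    keepers = [(t, p) for t, p in zip(y_true, y_pred) if t in KEEPER_CLASSES]
    bycatch = [(t, p) for t, p in zip(y_true, y_pred) if t in RELEASE_CLASSES]
    # Keepers the model would incorrectly release
    false_release = sum(p in RELEASE_CLASSES for _, p in keepers) / max(len(keepers), 1)
    # Bycatch the model would incorrectly keep
    false_retention = sum(p in KEEPER_CLASSES for _, p in bycatch) / max(len(bycatch), 1)
    return false_release, false_retention

false_release, false_retention = release_metrics(
    y_true=['blue_crab_keeper', 'blue_crab_undersized', 'finfish'],
    y_pred=['blue_crab_keeper', 'blue_crab_keeper', 'finfish'],
)
print(f'False release: {false_release:.1%}  False retention: {false_retention:.1%}')
```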