Train the Species Model
This guide covers training a TFLite Micro classification model for on-device species identification on the ESP32-CAM.
Overview
The submerged unit runs a quantized MobileNet v2 model to classify catch by species. The pipeline:
- Collect training images (underwater crab photos, IR-illuminated)
- Label images by species and size
- Train a MobileNet v2 classifier (transfer learning)
- Quantize the model to INT8 for ESP32 deployment
- Convert to TFLite Micro format
- Deploy to the ESP32-CAM’s SD card
Data Collection Strategy
Before training begins, you need a dataset that actually represents what the camera will see at depth. Stock photos of crabs on a white background will not cut it — the model needs to learn what a crab looks like through wire mesh, in murky water, under IR light.
Underwater Photography Setup
The training images should match deployment conditions as closely as possible:
- Primary illumination: 850nm IR LED array (same as the submerged unit). This is what the camera will actually see in the field.
- Backup illumination: White LED panel for reference shots and ground truth labeling. Useful during data collection, but don’t rely on it for training — the deployed system runs IR only.
- Camera housing: Any waterproof housing that can mount inside or adjacent to a crab pot. A GoPro in a dive housing works for initial collection. Match the camera angle and field of view to the ESP32-CAM’s OV2640 lens as closely as you can.
Image Sources
You don’t need to collect every image yourself. Combine sources to build a diverse dataset faster:
- Personal trap deployments — the best source. Mount a camera in your test pots and pull the SD card after soak periods. This gives you real-world conditions that no other source can match.
- State fisheries departments — many publish survey photos from trawl and pot surveys. Contact your state’s marine resources division directly; they’re often willing to share data for conservation-aligned projects.
- iNaturalist marine observations — searchable by species and location. Quality varies, but useful for building out species diversity. Filter for “research grade” observations.
- NOAA fisheries survey databases — the NOAA Fisheries Photo Gallery and regional science center publications contain species reference images.
Dataset Size
| Stage | Images per class | Expected accuracy |
|---|---|---|
| Initial prototype | 300-500 | ~80-85% (enough to validate the pipeline works) |
| Field-ready | 1,000+ | ~90-95% (production threshold) |
| Mature system | 3,000+ | 95%+ (with continued field data feedback) |
Start small and iterate. A prototype model trained on 300 images per class is enough to prove the hardware and firmware work end-to-end. Accuracy improves as you feed real field captures back into the training set.
Image Diversity
Each class needs images across these axes of variation:
- Angle: Top-down, side profile, angled — crabs don’t pose for the camera
- Lighting: Full IR, partial shadow, backlit by ambient light leaking into the pot
- Occlusion: Partially hidden behind mesh wires, stacked on top of other crabs, half-buried in bait
- Size classes: Span the full range from clearly undersized to clearly legal and everything in between
- Water clarity: Clear, silty, algae-tinged — turbidity changes the apparent contrast and color cast
Negative Examples
These are just as important as positives. The model needs to learn what is not a catch event:
- Empty trap interior (different lighting conditions, angles)
- Debris: seaweed, shells, rope fragments, bait remnants
- Non-target species you expect to encounter in your fishery
- Murky water with no visible objects (the camera will see this often)
Labeling
Use LabelImg (a lightweight desktop tool) or CVAT (a browser-based, collaborative workflow). Both support bounding-box annotation.
- Export as Pascal VOC (XML per image) or COCO format (single JSON) — both work with TFLite Model Maker’s data pipeline. For the classification model, crop each box into per-class folders first (see the sketch after this list).
- Label consistently: decide on class boundaries before you start and document them. “Undersized” means below the legal minimum for your jurisdiction, measured point-to-point.
- Have a second person verify a random sample of labels. Labeling errors propagate directly into model errors.
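Since the annotations are bounding boxes but the classifier trains on per-class image folders, the boxes need to be cropped out first. A minimal sketch, assuming LabelImg-style Pascal VOC XML and hypothetical `images/`, `annotations/`, and `dataset/` directories:

```python
# Sketch: crop Pascal VOC boxes into per-class folders for classification.
# Paths and the file-naming scheme are illustrative assumptions.
import os
import xml.etree.ElementTree as ET
from PIL import Image

IMAGES_DIR = 'images'
ANNOTATIONS_DIR = 'annotations'
OUTPUT_DIR = 'dataset'

for xml_name in os.listdir(ANNOTATIONS_DIR):
    if not xml_name.endswith('.xml'):
        continue
    root = ET.parse(os.path.join(ANNOTATIONS_DIR, xml_name)).getroot()
    image = Image.open(os.path.join(IMAGES_DIR, root.findtext('filename'))).convert('RGB')

    for i, obj in enumerate(root.iter('object')):
        label = obj.findtext('name')  # e.g. 'blue_crab_keeper'
        box = obj.find('bndbox')
        crop = image.crop((
            int(float(box.findtext('xmin'))), int(float(box.findtext('ymin'))),
            int(float(box.findtext('xmax'))), int(float(box.findtext('ymax'))),
        ))
        class_dir = os.path.join(OUTPUT_DIR, label)
        os.makedirs(class_dir, exist_ok=True)
        crop.save(os.path.join(class_dir, f'{os.path.splitext(xml_name)[0]}_{i}.jpg'))
```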
Data Augmentation
Beyond the automatic augmentations applied during training (see below), consider generating augmented copies of your raw dataset:
- Rotation: 0-360 degrees (crabs have no preferred orientation in a pot)
- Horizontal and vertical flip
- Brightness and contrast variation: ±30% to simulate changing ambient light
- Synthetic blur: Gaussian blur at varying radii to simulate water turbidity and camera focus drift
- Color channel shifts: Simulate the spectral differences between IR illumination sources
A 500-image raw dataset with aggressive augmentation can effectively behave like a 2,000-3,000 image dataset during training.
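A minimal offline augmentation sketch using Pillow (already in the dependency list below). The directory names and the four-copies-per-image multiplier are assumptions, and the IR channel shifts are left out for brevity:

```python
# Sketch: write augmented copies of each raw image to a parallel folder tree.
import os
import random
from PIL import Image, ImageEnhance, ImageFilter, ImageOps

RAW_DIR = 'dataset'            # assumed input layout: one folder per class
AUG_DIR = 'dataset_augmented'
COPIES_PER_IMAGE = 4           # assumed augmentation multiplier

def augment(image):
    # Random rotation: crabs have no preferred orientation in a pot
    image = image.rotate(random.uniform(0, 360))
    # Random horizontal / vertical flips
    if random.random() < 0.5:
        image = ImageOps.mirror(image)
    if random.random() < 0.5:
        image = ImageOps.flip(image)
    # Brightness and contrast variation, roughly +/- 30%
    image = ImageEnhance.Brightness(image).enhance(random.uniform(0.7, 1.3))
    image = ImageEnhance.Contrast(image).enhance(random.uniform(0.7, 1.3))
    # Gaussian blur to simulate turbidity and focus drift
    return image.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.0, 2.0)))

for class_name in os.listdir(RAW_DIR):
    src, dst = os.path.join(RAW_DIR, class_name), os.path.join(AUG_DIR, class_name)
    if not os.path.isdir(src):
        continue
    os.makedirs(dst, exist_ok=True)
    for file_name in os.listdir(src):
        image = Image.open(os.path.join(src, file_name)).convert('RGB')
        stem = os.path.splitext(file_name)[0]
        for i in range(COPIES_PER_IMAGE):
            augment(image).save(os.path.join(dst, f'{stem}_aug{i}.jpg'))
```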
Training Environment
Section titled “Training Environment”# Create a virtual environmentpython -m venv smartpot-mlsource smartpot-ml/bin/activate
# Install dependenciespip install tensorflow tflite-model-maker pillow numpyDataset Preparation
Image Requirements
| Parameter | Value |
|---|---|
| Resolution | 96x96 px (model input size) |
| Format | RGB JPEG |
| Illumination | 850nm IR (matches deployment) |
| Background | Inside a crab pot (wire mesh, sediment) |
| Minimum per class | 200 images (500+ recommended) |
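A short preprocessing sketch that brings labeled crops to the 96x96 RGB JPEG format in the table above. The source and destination directory names are assumptions:

```python
# Sketch: normalize every image to 96x96 RGB JPEG before training.
import os
from PIL import Image

SRC_DIR = 'dataset_augmented'   # assumed input layout: one folder per class
DST_DIR = 'dataset_96x96'
TARGET_SIZE = (96, 96)          # matches the model input size

for class_name in os.listdir(SRC_DIR):
    src = os.path.join(SRC_DIR, class_name)
    if not os.path.isdir(src):
        continue
    dst = os.path.join(DST_DIR, class_name)
    os.makedirs(dst, exist_ok=True)
    for file_name in os.listdir(src):
        image = Image.open(os.path.join(src, file_name)).convert('RGB')
        image = image.resize(TARGET_SIZE)
        image.save(os.path.join(dst, os.path.splitext(file_name)[0] + '.jpg'), 'JPEG', quality=95)
```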
Target Classes
Adapt these to your fishery. Example for Chesapeake Bay blue crab:
```
dataset/
├── blue_crab_keeper/       # ≥ 5" point-to-point
├── blue_crab_undersized/
├── blue_crab_female_sook/
├── finfish/                # Any non-crab species
├── horseshoe_crab/
├── empty/                  # No animal present
```
Data Augmentation
Training automatically applies:
- Random rotation (±15 degrees)
- Random brightness adjustment (±20%)
- Random horizontal flip
- Mild Gaussian blur (simulates water turbidity)
Training
```python
import tensorflow as tf
from tflite_model_maker import image_classifier
from tflite_model_maker.config import QuantizationConfig

# Load dataset
data = image_classifier.DataLoader.from_folder('dataset/')
train_data, test_data = data.split(0.8)

# Train (transfer learning from MobileNet v2)
model = image_classifier.create(
    train_data,
    model_spec='mobilenet_v2',
    epochs=50,
    batch_size=32
)

# Evaluate
loss, accuracy = model.evaluate(test_data)
print(f'Test accuracy: {accuracy:.2%}')

# Export as INT8 quantized TFLite (full-integer quantization needs a
# representative dataset for calibration; the test split works)
model.export(
    export_dir='output/',
    tflite_filename='species_v1.tflite',
    quantization_config=QuantizationConfig.for_int8(representative_data=test_data)
)
```
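Before moving the file to the device, a quick desktop check with the standard TFLite interpreter confirms the exported graph runs and reports the quantized input/output types (the exact dtype, int8 vs. uint8, depends on the quantization config). A minimal sketch assuming the `output/species_v1.tflite` path from the export step above:

```python
# Sketch: desktop sanity check of the exported model before deployment.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='output/species_v1.tflite')
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
print('Input :', inp['shape'], inp['dtype'])    # expect (1, 96, 96, 3), integer dtype
print('Output:', out['shape'], out['dtype'])    # expect (1, num_classes)

# Run one inference on dummy data just to confirm the graph executes
dummy = np.zeros(inp['shape'], dtype=inp['dtype'])
interpreter.set_tensor(inp['index'], dummy)
interpreter.invoke()
print('Scores:', interpreter.get_tensor(out['index'])[0])
```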
Section titled “Deployment”- Copy
species_v1.tfliteto the ESP32-CAM’s MicroSD card - The firmware loads the model on boot
- Verify with the serial monitor:
```
TFLite model loaded: species_v1.tflite
Input: 96x96x3 INT8
Output: 6 classes
Arena size: 186KB
```
Accuracy Targets
| Metric | Target | Notes |
|---|---|---|
| Keeper vs. undersized | >95% | Critical — determines door action |
| Species classification | >90% | Important for data logging |
| False release rate | <2% | Keepers incorrectly released |
| False retention rate | <5% | Bycatch incorrectly kept |
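A rough way to measure the release/retention rows from held-out predictions is sketched below; the class names and the release policy (which classes trigger the door) are assumptions that should be aligned with your firmware:

```python
# Sketch: false release / false retention rates from per-image predictions.
# y_true / y_pred are lists of class-name strings from the held-out set.
KEEPER_CLASSES = {'blue_crab_keeper'}                       # assumed: door stays closed
RELEASE_CLASSES = {'blue_crab_undersized', 'blue_crab_female_sook',
                   'finfish', 'horseshoe_crab', 'empty'}    # assumed: door opens

def release_metrics(y_true, y_pred):
    keepers = [(t, p) for t, p in zip(y_true, y_pred) if t in KEEPER_CLASSES]
    bycatch = [(t, p) for t, p in zip(y_true, y_pred) if t in RELEASE_CLASSES]
    # Keepers the model would incorrectly release
    false_release = sum(p in RELEASE_CLASSES for _, p in keepers) / max(len(keepers), 1)
    # Bycatch the model would incorrectly keep
    false_retention = sum(p in KEEPER_CLASSES for _, p in bycatch) / max(len(bycatch), 1)
    return false_release, false_retention

false_release, false_retention = release_metrics(
    y_true=['blue_crab_keeper', 'blue_crab_undersized', 'finfish'],
    y_pred=['blue_crab_keeper', 'blue_crab_keeper', 'finfish'],
)
print(f'False release: {false_release:.1%}  False retention: {false_retention:.1%}')
```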