This page preserves the ESP32-CAM-specific export and deployment steps. For the current ESP32-P4 pipeline, see [Train the Species Model](/how-to/train-species-model/).

Species Model — ESP32-CAM Export (Legacy)

The data collection, labeling, and PyTorch training steps are platform-independent — see the current training guide for those sections. This page preserves only the ESP32-CAM-specific export and deployment workflow.

Export the trained model for ESP32-CAM deployment:

# Export to ONNX with INT8 quantization
uv run smartpot-export \
  --checkpoint models/checkpoints/best_model.pth \
  --output models/species_model

# Optionally generate a C header for firmware embedding
uv run smartpot-export \
  --checkpoint models/checkpoints/best_model.pth \
  --output models/species_model \
  --generate-header

The export pipeline:

  1. Tries litert-torch for direct PyTorch → TFLite conversion
  2. Falls back to PyTorch → ONNX → INT8 quantized ONNX
  3. Attempts ONNX → TFLite conversion via onnx2tf subprocess (if available)
  4. Validates output shapes and runs a test inference
  5. Optionally generates species_model.h with the model as a C byte array
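
Step 5 embeds the model file as a C byte array. The sketch below shows roughly what such a generator does, in the style of `xxd -i`. The function name `bytes_to_c_header`, the default array name, and the header layout are assumptions for illustration; they are not taken from `smartpot-export`, whose actual output may differ.

```python
def bytes_to_c_header(data: bytes, array_name: str = "species_model",
                      bytes_per_line: int = 12) -> str:
    """Render raw bytes as a C header holding a byte array and its length.

    The layout mimics `xxd -i`; the real species_model.h may look different.
    """
    guard = array_name.upper() + "_H"
    rows = []
    for offset in range(0, len(data), bytes_per_line):
        chunk = data[offset:offset + bytes_per_line]
        rows.append("  " + ", ".join(f"0x{b:02x}" for b in chunk))
    body = ",\n".join(rows)
    return (
        f"#ifndef {guard}\n"
        f"#define {guard}\n\n"
        f"const unsigned char {array_name}[] = {{\n{body}\n}};\n"
        f"const unsigned int {array_name}_len = {len(data)};\n\n"
        f"#endif  // {guard}\n"
    )


if __name__ == "__main__":
    # Four placeholder bytes stand in for the real .tflite file contents.
    print(bytes_to_c_header(b"TFL3"))
```

In practice the input would be read with something like `Path("models/species_model.tflite").read_bytes()`; that path is assumed here from the `--output` prefix used above.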

If the automatic TFLite conversion isn’t available, you can convert manually:

# Install onnx2tf in an isolated environment and convert
uvx onnx2tf -i models/species_model.onnx -o tflite_output -oiqt
# Output: tflite_output/model_integer_quant.tflite

Deploy the model to the ESP32-CAM:

  1. Copy the .tflite model file to the ESP32-CAM’s MicroSD card as /species_model.tflite
  2. On boot, the firmware loads the model from the SD card into PSRAM
  3. Verify the load in the serial monitor:
    Model loaded: 923648 bytes from SD
    TFLite ready | input: [1,96,96,3] type=9 | arena: 186432/262144 bytes
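
The `arena` field in the second log line reports how much of the tensor arena the interpreter allocated, and `type=9` is the `TfLiteType` enum value for INT8. The sketch below reads that field to check headroom. The helper name `arena_usage` is hypothetical, and the regular expression assumes the log format is exactly as shown above.

```python
import re

# Matches the "arena: <used>/<total> bytes" field of the log line shown above.
ARENA_RE = re.compile(r"arena:\s*(\d+)/(\d+)\s*bytes")


def arena_usage(line: str) -> tuple[int, int, float]:
    """Return (used_bytes, total_bytes, used_fraction) from a log line."""
    match = ARENA_RE.search(line)
    if match is None:
        raise ValueError("no arena field found in line")
    used, total = int(match.group(1)), int(match.group(2))
    return used, total, used / total


if __name__ == "__main__":
    line = "TFLite ready | input: [1,96,96,3] type=9 | arena: 186432/262144 bytes"
    used, total, fraction = arena_usage(line)
    print(f"{used} of {total} bytes used ({fraction:.1%}), {total - used} free")
    # prints: 186432 of 262144 bytes used (71.1%), 75712 free
```

For the values shown, the model uses about 71% of the 256 KiB arena, leaving 75,712 bytes free.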

The model runs inside classify_catch() in firmware/src/trap_cam/main.cpp:

  1. Camera switches from QVGA grayscale (motion detection) to 96×96 RGB565 (classification)
  2. RGB565 pixels are unpacked, ImageNet-normalized, and quantized to INT8
  3. TFLite Micro runs inference (~200–300 ms on ESP32)
  4. Output tensor is dequantized, softmax applied, argmax selects species
  5. Camera switches back to grayscale for the next motion detection cycle
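
Steps 2 and 4 can be sketched on the host in Python. This is a reference for the arithmetic, not a copy of the C++ in `classify_catch()`. All helper names are hypothetical, the mean and standard deviation are assumed to be the standard ImageNet values, each pixel is assumed to arrive as an already assembled 16-bit value, and `scale` and `zero_point` stand for the quantization parameters stored in the model's input and output tensors.

```python
import math

# Standard ImageNet per-channel statistics for inputs scaled to [0, 1]
# (assumed; the firmware's constants are not shown in this page).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def unpack_rgb565(pixel: int) -> tuple[int, int, int]:
    """Expand one 16-bit RGB565 value to 8-bit (R, G, B)."""
    r5 = (pixel >> 11) & 0x1F
    g6 = (pixel >> 5) & 0x3F
    b5 = pixel & 0x1F
    # Replicate the high bits into the low bits so full scale maps to 255.
    return (r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2)


def quantize(value: float, scale: float, zero_point: int) -> int:
    """Affine INT8 quantization: q = round(x / scale) + zero_point, clamped.

    Python's round() rounds halves to even; firmware rounding may differ.
    """
    q = round(value / scale) + zero_point
    return max(-128, min(127, q))


def preprocess_pixel(pixel: int, scale: float, zero_point: int) -> list[int]:
    """RGB565 -> normalized floats -> INT8 values for one pixel."""
    rgb = unpack_rgb565(pixel)
    return [
        quantize((c / 255.0 - mean) / std, scale, zero_point)
        for c, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    ]


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Inverse of the affine mapping: x = (q - zero_point) * scale."""
    return (q - zero_point) * scale


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax (subtract the maximum before exp)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(output_q: list[int], scale: float, zero_point: int) -> tuple[int, float]:
    """Dequantize the output tensor, apply softmax, return (index, probability)."""
    probs = softmax([dequantize(q, scale, zero_point) for q in output_q])
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


if __name__ == "__main__":
    # Placeholder quantization parameters, for illustration only.
    print(preprocess_pixel(0xF800, scale=0.02, zero_point=0))
    print(classify([-10, 50, 0], scale=0.1, zero_point=0))
```

Because softmax is monotonic, the argmax of the dequantized logits gives the same class index; the softmax step matters only when a confidence value is needed.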

Build and flash the firmware, then open the serial monitor:

cd firmware
pio run -e trap_cam --target upload
pio device monitor -e trap_cam