# Rowing Stats
Extract workout data from photos of Concept2 PM5 rowing machine displays using computer vision and OCR, with Claude's vision API as an optional, more expensive cross-check.
## How It Works
Photos go through a three-stage pipeline:
```
photos/ → crop_to_screen.py → screen_classifier.py → extract_screen_data.py → rowing_results.csv
```
1. **Screen Detection** (`crop_to_screen.py`): Finds the LCD screen region and corrects its perspective using OpenCV edge detection, contour filtering, and morphological operations. Candidates are scored by `edge_density × area × rectangularity`.
2. **Classification** (`screen_classifier.py`): Filters out non-rowing images. Supports a rule-based feature scorer (no training needed) and a 4-layer CNN with batch normalization (`--mode cnn`).
3. **Data Extraction** (`extract_screen_data.py`): Reads time and distance from the cropped screens with Tesseract OCR. Each crop is run through several preprocessing variants (CLAHE, thresholding, scaling), and a majority vote across the variants' readings picks the final values.

`extract_rowing_data.py` is an alternative extractor that sends images to Claude Haiku's vision API instead of running Tesseract. It serves as a reference for validating OCR accuracy, but every image is a paid API call, so it costs more to run.
The screen-detection parameters can be tuned with the Optuna-based `optimize_crop.py`.
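
The majority-vote step in stage 3 can be sketched in plain Python: parse the OCR text from each preprocessing variant, then keep the value that the most variants agree on. The regular expressions, the `m:ss.t` and `h:mm:ss` time formats, the `2000m`-style distance format, and the tie-breaking rule are assumptions for illustration; `extract_screen_data.py` may parse and vote differently.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Hypothetical patterns: times like "7:32.4" or "1:02:03", distances like "2000m".
TIME_RE = re.compile(r"(?:(\d+):)?(\d{1,2}):(\d{2})(?:\.(\d))?")
# The lookbehind skips pace units such as "/500m".
DIST_RE = re.compile(r"(?<![/\d])(\d{3,6})\s*m\b")


def parse_time(text: str) -> Optional[float]:
    """Return the first time found in `text`, in seconds, or None."""
    match = TIME_RE.search(text)
    if match is None:
        return None
    hours, minutes, seconds, tenths = match.groups()
    total = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
    return total + int(tenths or 0) / 10


def parse_distance(text: str) -> Optional[int]:
    """Return the first distance in metres found in `text`, or None."""
    match = DIST_RE.search(text)
    return int(match.group(1)) if match else None


def majority_vote(readings: Iterable[Optional[float]]) -> Optional[float]:
    """Most common non-None reading; ties go to the value seen first."""
    counts = Counter(r for r in readings if r is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Dropping `None` before counting means a variant that read nothing cannot outvote the variants that agree with each other.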
## Setup
### Dependencies
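```bash
pip install anthropic torch torchvision opencv-python Pillow numpy optuna
```
Data extraction also needs the Tesseract OCR engine, which is a separate program installed through your operating system's package manager rather than pip.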
### API Key
The Claude-based extractor (`extract_rowing_data.py`) needs an Anthropic API key. Create a `.env` file containing it:
```
ANTHROPIC_API_KEY=sk-ant-...
```
## Usage
### Full pipeline
```bash
# 1. Crop screens from photos
python crop_to_screen.py photos/ cropped/
# 2. Classify — keep only rowing displays
python screen_classifier.py predict --dir cropped/
# 3. Extract workout data via Tesseract OCR
python extract_screen_data.py --dir cropped/
# 3b. (Test) Extract via Claude API — more expensive, useful for validating OCR accuracy
python extract_rowing_data.py --dir photos/
```
### Individual commands
```bash
# Classify a single image (feature-based or CNN)
python screen_classifier.py predict --image path/to/img.jpg
python screen_classifier.py predict --image path/to/img.jpg --mode cnn
# Extract data from a single image (Tesseract OCR)
python extract_screen_data.py --image path/to/img.jpg
# Extract data from a single image (Claude API — for testing/validation)
python extract_rowing_data.py --image path/to/img.jpg
# Train the CNN classifier
python screen_classifier.py train --data-dir train/
# Optimize crop detection parameters
python optimize_crop.py --n-trials 300 --photos-dir photos/
```
## Training Data
The CNN classifier trains on labeled images in `train/`:
- `train/0/` — non-rowing images (negatives)
- `train/1/` — rowing display images (positives)

The trained model is saved as `screen_classifier_model.pth`.
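
A sketch of turning the `train/0` and `train/1` layout into labeled samples, with the folder name doubling as the label. This layout also fits torchvision's `ImageFolder` convention (class folders are sorted by name, so `0` maps to index 0 and `1` to index 1), though whether `screen_classifier.py` uses `ImageFolder` is not stated here. The helper names and the accepted file extensions are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image extensions; adjust to match your photos.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def list_labeled_images(data_dir: str) -> list[tuple[Path, int]]:
    """Return (path, label) pairs: 0 = non-rowing, 1 = rowing display.

    Raises FileNotFoundError if either class folder is missing.
    """
    samples = []
    for label in (0, 1):
        class_dir = Path(data_dir) / str(label)
        for path in sorted(class_dir.iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
                samples.append((path, label))
    return samples


def class_counts(samples) -> Counter:
    """Count samples per label, e.g. to spot class imbalance before training."""
    return Counter(label for _, label in samples)
```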
## Validation
Extracted metrics are checked against these plausibility bounds:
| Metric | Min | Max |
| -------- | --------- | --------- |
| Distance | 100 m | 100,000 m |
| Time | 30 s | 2 hrs |
| Pace | 1:20/500m | 2:30/500m |
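
A sketch of a bounds check built from the table above. Because stage 3 reads only time and distance, this sketch assumes pace is derived from them (split per 500 m = time * 500 / distance). The `BOUNDS` names, the inclusive comparisons, and the `format_pace` helper are hypothetical, not taken from the scripts.

```python
# Bounds from the README table, in base units (metres, seconds, seconds per 500 m).
BOUNDS = {
    "distance_m": (100, 100_000),
    "time_s": (30, 2 * 60 * 60),
    "pace_s_per_500m": (80, 150),  # 1:20/500m to 2:30/500m
}


def pace_per_500m(time_s: float, distance_m: float) -> float:
    """Average split: seconds per 500 m at the workout's mean speed."""
    return time_s * 500 / distance_m


def format_pace(seconds: float) -> str:
    """Format seconds as m:ss.t (rounded to tenths)."""
    tenths = round(seconds * 10)
    minutes, rem = divmod(tenths, 600)
    return f"{minutes}:{rem // 10:02d}.{rem % 10}"


def out_of_bounds(time_s: float, distance_m: float) -> list[str]:
    """Return the names of metrics outside BOUNDS (empty list = plausible)."""
    values = {"distance_m": distance_m, "time_s": time_s}
    if distance_m > 0:  # pace is undefined for a zero distance
        values["pace_s_per_500m"] = pace_per_500m(time_s, distance_m)
    failures = []
    for name, value in values.items():
        low, high = BOUNDS[name]
        if not low <= value <= high:
            failures.append(name)
    return failures
```

For example, a 2000 m piece in 7:32.4 gives a 1:53.1 split, inside the table's pace range.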