Optimise parameters
92
README.md
Normal file
@@ -0,0 +1,92 @@
# Rowing Stats

Extract workout data from photos of Concept2 PM5 rowing machine displays using computer vision, Tesseract OCR, and (optionally) Claude's vision API.

## How It Works

Photos go through a three-stage pipeline:

```
photos/ → crop_to_screen.py → screen_classifier.py → extract_screen_data.py → rowing_results.csv
```

1. **Screen Detection** (`crop_to_screen.py`): Finds and perspective-corrects the LCD screen region using OpenCV edge detection, contour filtering, and morphological operations. Candidates are scored by `edge_density × area × rectangularity`.
2. **Classification** (`screen_classifier.py`): Filters out non-rowing images. Supports two modes: a rule-based feature scorer (no training needed) and a 4-layer CNN with batch normalization.
3. **Data Extraction** (`extract_screen_data.py`): Extracts time and distance from cropped screen images using Tesseract OCR. Each image is run through several preprocessing variants (CLAHE, thresholding, scaling), and the final value is chosen by majority vote across the variants.

An alternative extractor, `extract_rowing_data.py`, uses Claude Haiku's vision API instead of Tesseract. It serves as a reference for validating OCR accuracy, but it is more expensive to run because of API costs.

An Optuna-based hyperparameter tuner (`optimize_crop.py`) is also included for the screen detection parameters.
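The candidate score from stage 1 (`edge_density × area × rectangularity`) can be illustrated with a small standard-library sketch. This is not the code in `crop_to_screen.py`: the function names are hypothetical, `edge_density` is assumed to be a precomputed fraction between 0 and 1, and rectangularity is assumed here to be polygon area divided by the axis-aligned bounding-box area (the real script may use a rotated rectangle instead).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(points: Sequence[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def rectangularity(points: Sequence[Point]) -> float:
    """Polygon area / axis-aligned bounding-box area (assumed definition)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    box_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return polygon_area(points) / box_area if box_area > 0 else 0.0


def candidate_score(points: Sequence[Point], edge_density: float) -> float:
    """edge_density × area × rectangularity, as described in the README."""
    return edge_density * polygon_area(points) * rectangularity(points)


def best_candidate(candidates: List[Tuple[Sequence[Point], float]]):
    """Return the (points, edge_density) pair with the highest score."""
    return max(candidates, key=lambda c: candidate_score(c[0], c[1]))
```

With equal area and edge density, an upright rectangle outscores a triangle because its rectangularity is 1.0 rather than 0.5, which is the behaviour one would want when looking for an LCD panel.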
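The majority vote in stage 3 can be sketched as follows. This is an illustration only: `majority_vote` and its `min_votes` threshold are hypothetical names, and the readings are assumed to be the raw strings that each preprocessing variant produced (or `None` when OCR found nothing).

```python
from collections import Counter
from typing import Iterable, Optional


def majority_vote(readings: Iterable[Optional[str]], min_votes: int = 2) -> Optional[str]:
    """Return the most common non-empty reading, or None without enough agreement."""
    # Ignore variants that produced nothing; normalise surrounding whitespace.
    votes = Counter(r.strip() for r in readings if r and r.strip())
    if not votes:
        return None
    value, count = votes.most_common(1)[0]
    # Require at least `min_votes` variants to agree (assumed threshold).
    return value if count >= min_votes else None


# Three variants read the distance; one misreads the last digit.
distance = majority_vote(["2000", "2000", "2008"])  # "2000"
```

Requiring a minimum number of agreeing variants means a single noisy reading is discarded rather than reported as a result.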
## Setup

### Dependencies

```bash
pip install anthropic torch torchvision opencv-python Pillow numpy optuna
```

### API Key

Create a `.env` file with your Anthropic API key:

```
ANTHROPIC_API_KEY=sk-ant-...
```
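How the scripts read this file is not shown here. As a minimal sketch, assuming plain `KEY=VALUE` lines, a file like this can be loaded into the process environment with the standard library alone (`load_env_file` is a hypothetical helper, not part of this repository). The `anthropic` client picks up `ANTHROPIC_API_KEY` from the environment by default.

```python
import os
from pathlib import Path
from typing import Dict, Union


def load_env_file(path: Union[str, Path] = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines and export them without overriding existing variables."""
    loaded: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not a KEY=VALUE pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded
```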
## Usage

### Full pipeline

```bash
# 1. Crop screens from photos
python crop_to_screen.py photos/ cropped/

# 2. Classify: keep only rowing displays
python screen_classifier.py predict --dir cropped/

# 3. Extract workout data via Tesseract OCR
python extract_screen_data.py --dir cropped/

# 3b. (Test) Extract via Claude API: more expensive, useful for validating OCR accuracy
python extract_rowing_data.py --dir photos/
```

### Individual commands

```bash
# Classify a single image (feature-based or CNN)
python screen_classifier.py predict --image path/to/img.jpg
python screen_classifier.py predict --image path/to/img.jpg --mode cnn

# Extract data from a single image (Tesseract OCR)
python extract_screen_data.py --image path/to/img.jpg

# Extract data from a single image (Claude API, for testing/validation)
python extract_rowing_data.py --image path/to/img.jpg

# Train the CNN classifier
python screen_classifier.py train --data-dir train/

# Optimize crop detection parameters
python optimize_crop.py --n-trials 300 --photos-dir photos/
```

## Training Data

The CNN classifier is trained on labeled images in `train/`:

- `train/0/`: non-rowing images (negatives)
- `train/1/`: rowing display images (positives)

The trained model is saved as `screen_classifier_model.pth`.
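The directory layout above maps directly to `(path, label)` pairs. A minimal sketch of that mapping, assuming the folder name is the integer label and that only common image extensions count (`list_labeled_images` and `IMAGE_SUFFIXES` are hypothetical and not taken from `screen_classifier.py`):

```python
from pathlib import Path
from typing import List, Tuple, Union

# Assumed set of accepted extensions; the real script may differ.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_labeled_images(data_dir: Union[str, Path] = "train") -> List[Tuple[Path, int]]:
    """Collect (image_path, label) pairs from train/0 (negatives) and train/1 (positives)."""
    samples: List[Tuple[Path, int]] = []
    for label in (0, 1):
        class_dir = Path(data_dir) / str(label)
        if not class_dir.is_dir():
            continue
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label))
    return samples
```

Counting the labels in the returned list is a quick way to check class balance before training.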
## Validation

Extracted metrics are validated against sensible bounds:

| Metric   | Min       | Max       |
| -------- | --------- | --------- |
| Distance | 100 m     | 100,000 m |
| Time     | 30 s      | 2 h       |
| Pace     | 1:20/500m | 2:30/500m |

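The bounds in the table can be expressed as a short check. This is a sketch under stated assumptions rather than the project's validation code: the names are hypothetical, bounds are treated as inclusive, and pace is derived from time and distance instead of being read from the display.

```python
from typing import List

# Bounds from the table, converted to metres and seconds.
DISTANCE_M = (100, 100_000)
TIME_S = (30, 2 * 60 * 60)
PACE_S_PER_500M = (80, 150)  # 1:20 and 2:30 per 500 m


def pace_seconds_per_500m(distance_m: float, time_s: float) -> float:
    """Average pace in seconds per 500 m."""
    return time_s / (distance_m / 500)


def validation_errors(distance_m: float, time_s: float) -> List[str]:
    """Return the names of metrics that fall outside the bounds (empty if all pass)."""
    errors: List[str] = []
    if not DISTANCE_M[0] <= distance_m <= DISTANCE_M[1]:
        errors.append("distance")
    if not TIME_S[0] <= time_s <= TIME_S[1]:
        errors.append("time")
    if distance_m > 0:
        pace = pace_seconds_per_500m(distance_m, time_s)
        if not PACE_S_PER_500M[0] <= pace <= PACE_S_PER_500M[1]:
            errors.append("pace")
    return errors
```

For example, 2,000 m in 8:00 (480 s) gives a pace of 2:00/500m and passes all three checks, while 2,000 m in 200 s fails only the pace check, which usually indicates a misread digit.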