Log 1: DeepForest Approach
Research Question
How can we reliably detect tree mortality across time using aerial imagery? Can pretrained object detection models, like DeepForest, give us useful indicators of tree health or death?
Phase 1 Approach — Using DeepForest for Crown Detection
DeepForest is a state-of-the-art deep learning model trained on RGB imagery to detect individual tree crowns. It outputs bounding boxes around tree-like objects, which initially seemed promising for analyzing:
- Tree count changes over time
- Tree canopy shrinkage
- Mortality via absence or degradation of detected crowns
Thought Process
Use a pretrained model to:
- Quickly extract structured detections
- Use bounding box count or size as a proxy for forest density
- Detect tree loss trends over time without training from scratch
Methodology
- Imagery Source: NAIP (2014–2022), 512×512 RGB tiles, at 1.0 m resolution in earlier years and 0.6 m in later ones
- Tool: DeepForest
- Workflow:
  - Load a NAIP tile
  - Predict bounding boxes with the pretrained model
  - Compare box count and coverage across years
```python
from deepforest import main

# Load the pretrained DeepForest crown-detection model
model = main.deepforest()
model.use_release()

# Returns a DataFrame of detections (xmin, ymin, xmax, ymax, label, score);
# return_plot=True would return an annotated image instead of the boxes
boxes = model.predict_image(path="naip_tile.png")
```
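Below is a minimal sketch of the cross-year comparison step. The per-year file names are hypothetical, and total box area is used only as a rough canopy-coverage proxy.

```python
import pandas as pd
from deepforest import main

model = main.deepforest()
model.use_release()

# Hypothetical per-year tiles covering the same footprint
records = []
for year in [2014, 2016, 2018, 2020, 2022]:
    boxes = model.predict_image(path=f"naip_tile_{year}.png")
    if boxes is None:  # DeepForest returns None when nothing is detected
        boxes = pd.DataFrame(columns=["xmin", "ymin", "xmax", "ymax"])
    area = ((boxes.xmax - boxes.xmin) * (boxes.ymax - boxes.ymin)).sum()
    records.append({"year": year, "n_trees": len(boxes), "box_area_px": area})

print(pd.DataFrame(records))  # these counts fluctuated with shadows and season
```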
Sample Outputs
Challenges Encountered
1. Spatial Resolution Mismatch
- DeepForest was trained on ~0.1m resolution. NAIP tiles at 1.0m/0.6m were too coarse.
- Crowns were blurry, often undetected or merged into clusters.
2. Bounding Boxes ≠ Tree Area
- Boxes were not calibrated to actual canopy area.
- Detection size and count varied erratically due to visual artifacts.
3. Noisy Temporal Consistency
- Detections fluctuated due to shadows, season, sun angle — not ecological change.
- Trees “disappeared” or “reappeared” randomly between years.
Summary
| Metric | Result |
|---|---|
| Visual Interpretability | ❌ Low |
| Spatial Precision | ❌ Weak |
| Temporal Consistency | ❌ Unusable |
Key Takeaways
- Pretrained models are domain-constrained
- Tree detection ≠ mortality inference
- RGB-only features are too volatile over time
Transition to Next Phase
We needed a strategy focused on semantic labeling (LIVE / DEAD / BARE), not detection. This led to the 5×5 patch-based classification approach — covered next in Log 2.
Log 2: Patch-Based Classification
Research Goal
Label small regions of aerial imagery based on ecological intuition, using spatial patches instead of pixel-level or bounding box classification.
Phase 2 Approach — 5×5 Patch Labeling Using Human Intuition
Each 512×512 tile was divided into small 5×5-pixel patches, each identified by its pixel coordinates (e.g., r329_c58). Each patch was visually labeled based on its dominant appearance: LIVE, DEAD, or BARE.
Thought Process
- Less noisy than pixel-level classification
- Easier than labeling full tiles
- Provides enough context for human labeling (e.g., sparse vs dense canopy)
Methodology
1. Patch Generation
- Each tile → ~100+ 5×5 patches
- Recorded in `valid_patches.csv` (one row per patch; see the extraction sketch below):

```
r329_c58_y2020.png, 329, 58, 2020, , possibly BARE or DEAD
```
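A sketch of the patch-generation step under the assumptions above (5×5-pixel patches named by their top-left pixel coordinates; the file path and helper name are illustrative):

```python
import numpy as np
from PIL import Image

PATCH = 5  # patch edge length in pixels

def extract_patches(tile_path, year):
    """Slice an RGB tile into 5x5-pixel patches keyed like r{row}_c{col}_y{year}."""
    tile = np.asarray(Image.open(tile_path))  # (H, W, 3) uint8 array
    patches = {}
    for r in range(0, tile.shape[0] - PATCH + 1, PATCH):
        for c in range(0, tile.shape[1] - PATCH + 1, PATCH):
            patches[f"r{r}_c{c}_y{year}"] = tile[r:r + PATCH, c:c + PATCH]
    return patches

# Raw patches before validity filtering; a 512x512 tile yields ~10k of them
patches = extract_patches("naip_tile_2020.png", year=2020)
```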
2. Visual Labeling
- No NDVI or thresholding
- Labels were assigned via:
  - Manual inspection across years
  - Texture, color, and shape
  - “Hint” labels updated during review
3. Model Training
- Feature extraction:
  - RGB stats (mean, std)
  - Optional raw pixel flattening
- Classifier:
  - Random Forest (baseline)
  - Small MLP (later)
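A sketch of the baseline classifier given those feature choices. The toy patches and labels below are random stand-ins for the real labeled data; the split and hyperparameters are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def patch_features(patch):
    """Per-channel mean and std of a (5, 5, 3) RGB patch -> 6 features."""
    return np.concatenate([patch.mean(axis=(0, 1)), patch.std(axis=(0, 1))])

# Toy stand-ins for the labeled patches (replace with real data)
rng = np.random.default_rng(0)
patches = [rng.integers(0, 256, size=(5, 5, 3)).astype(np.uint8) for _ in range(300)]
labels = rng.choice(["LIVE", "DEAD", "BARE"], size=300)

X = np.stack([patch_features(p) for p in patches])
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, stratify=labels, random_state=0
)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # with real patches, in-year accuracy sat around 65-70%
```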
Sample Outputs
Challenges Encountered
1. Labeling Conflicts (Majority Dilemma)
- Mixed-content patches (e.g., half LIVE, half DEAD)
- Label assignment was subjective → high variance
2. Spectral Inconsistency
- Same class in different years looked different (due to sensors, lighting)
- Model trained on one year couldn’t generalize to another
3. Spatial Drift
- Misalignments between years → same patch ID pointed to different physical areas
- Resolution changes (1m vs 0.6m) broke equivalence
Summary
| Metric | Result |
|---|---|
| In-year Accuracy | ~65–70% |
| Cross-year Generalization | ❌ Failed (<50%) |
| Label Noise | High |
Key Takeaways
- Visual labeling ≠ repeatable at scale
- Patches blur meaningful distinctions
- Spectral models must be anchored to spatially precise, stable locations
- RGB features alone cannot reliably detect long-term changes
Transition to Next Phase
To resolve both spectral and spatial instability, we transitioned to pixel-level temporal modeling — tracking individual pixels across all years. That’s the focus of Log 3.
Log 3: Single Pixel Temporal Classification
Research Goal
Track individual pixels over time and use their temporal NDVI trajectories to classify them into ecological categories (LIVE, DEAD, BARE).
Phase 3 Approach — Per-Pixel Temporal NDVI Classifier
We moved away from patches and bounding boxes entirely. Each pixel became its own data point, tracked across all available years.
Thought Process
- Control exact spatial location
- Use temporal signal as primary feature (e.g., NDVI over time)
- Detect transitions like LIVE → DEAD or DEAD → BARE
- Label using human-in-the-loop hints supported by NDVI
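NDVI itself does not come from the RGB tiles; it needs NAIP's near-infrared band. A minimal sketch of the computation, assuming the common 4-band NAIP layout (R, G, B, NIR) and a hypothetical GeoTIFF path:

```python
import numpy as np
import rasterio

# Hypothetical 4-band NAIP tile; band order R, G, B, NIR (verify per dataset)
with rasterio.open("naip_tile_2020.tif") as src:
    red = src.read(1).astype("float32")
    nir = src.read(4).astype("float32")

# NDVI = (NIR - Red) / (NIR + Red), in [-1, 1]; healthy canopy scores high
ndvi = (nir - red) / np.maximum(nir + red, 1e-6)
```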
Methodology
1. Pixel Extraction
- Spatially aligned tiles across years (2014–2022)
- Pixels indexed by `row`, `col`
- Each pixel assigned time-series NDVI values:

```
filename,row,col,year,ndvi,label_hint
r327_c59_y2020.png,327,59,2020,0.162,"possibly BARE or DEAD"
```
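A sketch of assembling per-pixel trajectories from those records (the merged CSV path is hypothetical; a pandas pivot is one assumed way to do it):

```python
import pandas as pd

# Hypothetical file concatenating the per-year rows shown above
df = pd.read_csv("pixel_ndvi.csv")

# One row per (row, col) pixel, one NDVI column per year
series = df.pivot_table(index=["row", "col"], columns="year", values="ndvi")
print(series.loc[(327, 59)])  # NDVI trajectory for pixel r327_c59
```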
2. Labeling Strategy
- Combined:
  - Visual inspection across years
  - NDVI pattern (e.g., drop then flatten)
  - Human-annotated hints like “likely DEAD”
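To make the “drop then flatten” pattern concrete, here is a simple rule-based flag; the thresholds are invented for illustration, not tuned project values.

```python
import numpy as np

def drop_then_flatten(ndvi, drop=0.15, flat_tol=0.05):
    """Flag trajectories that fall sharply and then stay low and stable."""
    diffs = np.diff(ndvi)
    k = int(np.argmin(diffs))        # steepest single-step decline
    big_drop = diffs[k] <= -drop     # fell by at least `drop` NDVI units
    tail = ndvi[k + 1:]
    flat_tail = tail.size > 0 and np.ptp(tail) <= flat_tol
    return bool(big_drop and flat_tail)

print(drop_then_flatten(np.array([0.58, 0.56, 0.44, 0.23, 0.22])))  # True -> DEAD-like
```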
3. Model Input
- Each training sample: an NDVI vector across years, e.g.

```
[0.58 (2014), 0.56 (2016), 0.44 (2018), 0.23 (2020), 0.22 (2022)] → label: DEAD
```
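A minimal sketch of training on those vectors. A Random Forest stands in for the temporal classifier still being tuned, and the toy trajectories below are illustrative, not project data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy NDVI trajectories over [2014, 2016, 2018, 2020, 2022]
X = np.array([
    [0.58, 0.56, 0.44, 0.23, 0.22],  # drop then flatten -> DEAD
    [0.61, 0.63, 0.60, 0.62, 0.59],  # stable high NDVI  -> LIVE
    [0.12, 0.10, 0.14, 0.11, 0.09],  # persistently low  -> BARE
])
y = np.array(["DEAD", "LIVE", "BARE"])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.predict([[0.57, 0.55, 0.30, 0.21, 0.20]]))  # a DEAD-like trajectory
```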
Sample Outputs
Key Strengths
- True temporal stability — fixed pixel over time
- Model can learn transitions, not just snapshot classes
- Eliminates resolution-induced mapping drift
Experimental Insights
- Temporal modeling removes much spatial noise
- NDVI change patterns (drop + flatten) correlate well with true mortality
- Model interpretability improves (you can “see” what happened)
Current Status
| Metric | Result |
|---|---|
| Temporal NDVI consistency | ✅ Strong |
| Human label quality | ✅ High confidence |
| Class balance | ⚠️ Still tuning |
| Next step | Expand labeled samples & train temporal classifier |