Depth Estimation (Scientific Paper · In Progress)

Tags: PyTorch · CUDA · Computer Vision · Stereo Matching

Links: GitHub · Full Documentation


The Problem

Accurate depth estimation from monocular images remains a fundamental challenge in computer vision. While stereo matching systems achieve high accuracy, they require expensive dual-camera setups. Monocular depth estimation methods are more practical but often lack the precision needed for downstream tasks. The gap between monocular and stereo approaches motivates a hybrid strategy.

The Approach

This project develops a pipeline that combines compact monocular depth estimation with synthetic stereo generation, so that a stereo matcher can refine the initial monocular estimate. The system:

  1. Estimates depth from a single image using compact, efficient neural networks
  2. Synthesizes a stereo pair by forward-warping the input image using the predicted depth map
  3. Applies stereo matching on the synthetic pair to produce refined disparity maps
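Step 2 above, forward warping, can be sketched as follows. This is a minimal NumPy illustration, not the project's actual implementation; the baseline and focal length are hypothetical KITTI-like values, and the standard relation disparity = baseline × focal / depth is assumed.

```python
import numpy as np

def synthesize_right_view(left, depth, baseline=0.54, focal=721.5):
    """Forward-warp a left image into a synthetic right view.

    Uses disparity = baseline * focal / depth. Pixels are splatted to
    x - disparity with a z-buffer so nearer surfaces win occlusions.
    Unfilled pixels are returned as a hole mask for later inpainting.
    """
    h, w = depth.shape
    disparity = baseline * focal / np.maximum(depth, 1e-6)
    right = np.zeros_like(left)
    filled = np.zeros((h, w), dtype=bool)
    zbuf = np.full((h, w), np.inf)  # depth of the pixel currently written
    for y in range(h):
        for x in range(w):
            tx = int(round(x - disparity[y, x]))
            if 0 <= tx < w and depth[y, x] < zbuf[y, tx]:
                zbuf[y, tx] = depth[y, x]
                right[y, tx] = left[y, x]
                filled[y, tx] = True
    return right, filled  # holes (~filled) would be inpainted downstream
```

With a constant depth the warp reduces to a uniform horizontal shift, which is a handy sanity check before moving to real depth maps.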

The approach benchmarks multiple depth estimation architectures and stereo matching algorithms, evaluating trade-offs between model size, inference speed, and accuracy across standard datasets.

Key Features

  • Multiple depth model architectures compared for accuracy vs. efficiency
  • Forward warping with inpainting for realistic synthetic stereo generation
  • Cost volume-based stereo matching on synthetic pairs
  • Cross-dataset evaluation for generalization analysis
  • Loss function ablation studies to identify optimal training objectives
  • Comprehensive evaluation metrics — AbsRel, RMSE, δ thresholds, stereo quality
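The evaluation metrics listed above (AbsRel, RMSE, δ thresholds) follow standard definitions in the depth-estimation literature; a sketch of how they are typically computed is shown below. This is an illustrative helper, not the project's exact evaluation code.

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-6):
    """Standard monocular depth metrics over valid ground-truth pixels.

    AbsRel: mean |pred - gt| / gt
    RMSE:   root mean squared error
    d1..d3: fraction of pixels with max(pred/gt, gt/pred) < 1.25**i
    """
    pred, gt = np.ravel(pred), np.ravel(gt)
    mask = gt > eps            # ignore pixels without valid ground truth
    pred, gt = pred[mask], gt[mask]
    abs_rel = float(np.mean(np.abs(pred - gt) / gt))
    rmse = float(np.sqrt(np.mean((pred - gt) ** 2)))
    ratio = np.maximum(pred / gt, gt / pred)
    deltas = {f"d{i}": float(np.mean(ratio < 1.25 ** i)) for i in (1, 2, 3)}
    return {"abs_rel": abs_rel, "rmse": rmse, **deltas}
```

A perfect prediction yields AbsRel = RMSE = 0 and δ accuracies of 1.0, which makes the helper easy to sanity-check.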

Architecture

```mermaid
graph LR
    A[Monocular Image] --> B[Depth Estimation]
    B --> C[Depth Map]
    C --> D[Forward Warping]
    A --> D
    D --> E[Synthetic Stereo Pair]
    E --> F[Stereo Matching]
    F --> G[Refined Disparity]
```
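The final stereo-matching stage in the diagram can be illustrated with a toy winner-take-all matcher over a cost volume. This is a deliberately simplified stand-in (per-pixel absolute difference, no aggregation or regularization) for whatever cost-volume method the project actually uses.

```python
import numpy as np

def match_stereo(left, right, max_disp=16):
    """Winner-take-all stereo matching via an absolute-difference cost volume.

    For each candidate disparity d, left[y, x] is compared against
    right[y, x - d]; the cheapest d per pixel becomes the disparity.
    Inputs are single-channel (grayscale) images of equal shape.
    """
    h, w = left.shape
    cost = np.full((max_disp, h, w), np.inf)
    for d in range(max_disp):
        # Valid comparisons exist only where x - d >= 0
        cost[d, :, d:] = np.abs(left[:, d:] - right[:, : w - d])
    return np.argmin(cost, axis=0)  # disparity map, in pixels
```

Shifting an image horizontally by a known amount and recovering that shift is a quick correctness check; real matchers add window aggregation and smoothness terms on top of this skeleton.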

Results

Detailed benchmarks, ablation studies, and cross-dataset evaluations are available on the full documentation site. Key experiments include:

  • Depth model comparison — accuracy vs. inference speed across architectures
  • Loss function ablation — identifying optimal training objectives
  • Stereo quality assessment — measuring improvement from synthetic stereo augmentation
  • Cross-dataset generalization — KITTI ↔ NYU Depth V2 transfer performance

Tech Stack

Component               Technology
Framework               PyTorch
GPU Acceleration        CUDA
Datasets                KITTI, NYU Depth V2
Experiment Management   Structured configs + results tracking
Documentation           Zensical