Monocular Depth Estimation
The MonocularDepthEstimator class estimates depth from a single RGB image using a pre-trained deep learning model (Depth Anything V2).
No Stereo Required
Unlike stereo depth estimation, monocular depth estimation needs only a single image. The trade-off is that the depth values are relative (not metric), and their quality depends on how well the model generalizes to the scene.
Class: MonocularDepthEstimator
Constructor
MonocularDepthEstimator(
    model_path: str,
    device: Literal['cpu', 'cuda'] = 'cpu',
    downscale_factor: float = 1.0
)
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| model_path | str | Required | Path to the pre-trained model directory |
| device | str | 'cpu' | Computation device: 'cpu' or 'cuda' (GPU) |
| downscale_factor | float | 1.0 | Scale factor for image resizing (0 < factor ≤ 1.0) |
Requirements
- PyTorch must be installed
- For device='cuda', a CUDA-enabled build of PyTorch is required
- Model files must be downloaded separately
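The requirements above can be checked before constructing the estimator. The following is a minimal sketch, not part of the library: `torch_installed` and `pick_device` are hypothetical helper names, and the only PyTorch call used is `torch.cuda.is_available()`.

```python
import importlib.util


def torch_installed() -> bool:
    """Return True if a PyTorch package can be found on the import path."""
    return importlib.util.find_spec("torch") is not None


def pick_device(preferred: str = "cuda") -> str:
    """Return 'cuda' only when a CUDA-enabled PyTorch is usable, else 'cpu'.

    Hypothetical helper: it only chooses a device string. PyTorch itself is
    still required to run the estimator on either device.
    """
    if preferred not in ("cpu", "cuda"):
        raise ValueError(f"unsupported device: {preferred!r}")
    if preferred == "cuda" and torch_installed():
        try:
            import torch  # imported lazily so the helper loads without PyTorch
        except ImportError:
            return "cpu"
        if torch.cuda.is_available():
            return "cuda"
    return "cpu"


print(f"PyTorch installed: {torch_installed()}")
print(f"Selected device: {pick_device('cuda')}")
```

The returned string can then be passed as the `device` argument of the constructor.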
Example:
model_path = "models/hub/models--depth-anything--Depth-Anything-V2-Base-hf/snapshots/b1958afc..."
estimator = MonocularDepthEstimator(
    model_path=model_path,
    device='cuda',  # Use GPU
    downscale_factor=0.5
)
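To get a feel for what a given `downscale_factor` means for the input size, here is a small sketch. It assumes the factor is applied equally to width and height and that results are rounded to whole pixels; the library's exact resizing behavior is not documented here, and `scaled_size` is a hypothetical helper.

```python
from typing import Tuple


def scaled_size(width: int, height: int, downscale_factor: float = 1.0) -> Tuple[int, int]:
    """Return the (width, height) after applying a downscale factor.

    Assumption: the factor scales both dimensions uniformly.
    """
    if not 0 < downscale_factor <= 1.0:
        raise ValueError("downscale_factor must satisfy 0 < factor <= 1.0")
    new_width = max(1, round(width * downscale_factor))
    new_height = max(1, round(height * downscale_factor))
    return new_width, new_height


print(scaled_size(1920, 1080, 0.5))  # (960, 540)
```

Under this assumption, a factor of 0.5 leaves one quarter of the original pixel count, which is where most of the speedup would come from.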
Methods
estimate_depth()
Estimate relative depth from a single image.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| image_path | str | Path to the input RGB image |
Returns:
| Return Value | Type | Description |
|---|---|---|
| depth_map | np.ndarray | Relative depth map (higher values = closer) |
Depth Values
The returned depth values are inverted for visualization purposes:
- Higher values = closer objects
- Lower values = farther objects
Values are relative, not metric (not in meters).
Example:
depth_map = estimator.estimate_depth(image_path='./image.png')
print(f"Depth map shape: {depth_map.shape}")
print(f"Value range: {depth_map.min():.2f} - {depth_map.max():.2f}")
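Because the values are relative, their absolute range carries no physical meaning, and it is often convenient to rescale a depth map before saving or comparing it. The sketch below uses only NumPy; `to_uint8` is a hypothetical helper, and the small array stands in for the output of `estimate_depth()`.

```python
import numpy as np


def to_uint8(depth_map: np.ndarray) -> np.ndarray:
    """Min-max normalize a relative depth map to the 0..255 range."""
    depth = depth_map.astype(np.float32)
    lo, hi = float(depth.min()), float(depth.max())
    if hi - lo < 1e-12:
        # A constant map has no depth variation to display
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)  # now in [0, 1]
    return (scaled * 255.0).round().astype(np.uint8)


# Stand-in for a real depth map returned by estimate_depth()
demo = np.array([[2.0, 4.0], [6.0, 10.0]])
gray = to_uint8(demo)
print(gray.dtype, gray.min(), gray.max())  # uint8 0 255
```

After normalization, 255 marks the closest pixel and 0 the farthest within that one image. Values from different images are still not comparable.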
visualize_depth()
Display the estimated depth map using Matplotlib.
Prerequisites
Call estimate_depth() before visualize_depth(); otherwise, a RuntimeError is raised.
Example:
estimator.estimate_depth(image_path='./image.png')  # must run first
estimator.visualize_depth()
load_model()
Load or reload the pre-trained model.
Automatic Loading
This method is called automatically during initialization. You only need to call it manually if you want to reload the model.
warmup()
Run a warmup inference so that one-time startup cost is not paid on the first real call.
Automatic Warmup
This method is called automatically during initialization.
Model Setup
Supported Models
The library supports Depth Anything V2 models from Hugging Face:
| Model | Size | Quality | Speed |
|---|---|---|---|
| Depth-Anything-V2-Small-hf | ~98MB | Good | Fast |
| Depth-Anything-V2-Base-hf | ~390MB | Better | Medium |
| Depth-Anything-V2-Large-hf | ~1.4GB | Best | Slow |
Download Model
Download the model from Hugging Face Hub:
# Using git-lfs
git lfs install
git clone https://huggingface.co/depth-anything/Depth-Anything-V2-Base-hf
# Or using huggingface_hub
pip install huggingface_hub
huggingface-cli download depth-anything/Depth-Anything-V2-Base-hf
Model Directory Structure
models/hub/models--depth-anything--Depth-Anything-V2-Base-hf/
└── snapshots/
    └── b1958afc87fb45a9e3746cb387596094de553ed8/
        ├── config.json
        ├── model.safetensors
        └── preprocessor_config.json
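Since the snapshot folder name is a long hash, it can be easier to locate it programmatically than to hard-code it. The following sketch uses only `pathlib` and assumes the layout shown above; `find_snapshot_dir` is a hypothetical helper, not a library function.

```python
from pathlib import Path


def find_snapshot_dir(repo_cache_dir: str) -> Path:
    """Return the snapshot folder that contains config.json.

    Assumes the layout <repo_cache_dir>/snapshots/<hash>/config.json.
    If several snapshots exist, the most recently modified one is returned.
    """
    snapshots = Path(repo_cache_dir) / "snapshots"
    if not snapshots.is_dir():
        raise FileNotFoundError(f"no snapshots/ directory under {repo_cache_dir}")
    candidates = [
        entry for entry in snapshots.iterdir()
        if (entry / "config.json").is_file()
    ]
    if not candidates:
        raise FileNotFoundError(f"no snapshot with config.json under {snapshots}")
    return max(candidates, key=lambda entry: entry.stat().st_mtime)


if __name__ == "__main__":
    repo_dir = "models/hub/models--depth-anything--Depth-Anything-V2-Base-hf"
    try:
        print(find_snapshot_dir(repo_dir))
    except FileNotFoundError as exc:
        print(f"Model not downloaded yet: {exc}")
```

The returned path, converted with `str()`, can be used as `model_path`.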
Complete Example
import depthlib
import time
# Model path
model_path = "models/hub/models--depth-anything--Depth-Anything-V2-Base-hf/snapshots/b1958afc87fb45a9e3746cb387596094de553ed8"
# Initialize estimator
estimator = depthlib.MonocularDepthEstimator(
    model_path=model_path,
    device='cuda',  # Use GPU for faster inference
    downscale_factor=0.5
)
# Estimate depth
image_path = './assets/image.png'
start_time = time.perf_counter()  # monotonic clock, better suited to timing
depth_map = estimator.estimate_depth(image_path=image_path)
latency_ms = (time.perf_counter() - start_time) * 1000
print(f"Depth estimation completed in {latency_ms:.2f} ms")
print(f"Depth map shape: {depth_map.shape}")
print(f"Value range: {depth_map.min():.2f} - {depth_map.max():.2f}")
# Visualize
estimator.visualize_depth()
Error Handling
Common Errors
PyTorch Not Installed:
# Raises ImportError
ImportError: PyTorch is not installed. Please install the cpu or cuda version of PyTorch.
CUDA Not Available:
# Raises EnvironmentError when device='cuda' but CUDA is not available
EnvironmentError: CUDA is not available. Please check if you have torch cuda version or use device='cpu'.
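One way to handle the CUDA error above is to retry on the CPU. The sketch below is an illustration rather than library behavior: `build_with_fallback` is a hypothetical helper that takes the estimator class (or any factory with the same keyword arguments) as a parameter, so it can be exercised here with a stand-in factory.

```python
from typing import Any, Callable


def build_with_fallback(factory: Callable[..., Any], model_path: str, **kwargs: Any) -> Any:
    """Try device='cuda' first and fall back to 'cpu' on EnvironmentError.

    Note: in Python 3, EnvironmentError is an alias of OSError, so other
    OS-level failures (such as a missing file) are caught as well. Those
    will fail again on the CPU attempt and propagate to the caller.
    ImportError is deliberately not caught: without PyTorch neither
    device can work.
    """
    try:
        return factory(model_path=model_path, device="cuda", **kwargs)
    except EnvironmentError as exc:
        print(f"CUDA unavailable ({exc}); falling back to CPU")
        return factory(model_path=model_path, device="cpu", **kwargs)


def fake_factory(model_path: str, device: str, **kwargs: Any) -> dict:
    """Stand-in for the real class: pretends CUDA is never available."""
    if device == "cuda":
        raise EnvironmentError("CUDA is not available.")
    return {"model_path": model_path, "device": device, **kwargs}


estimator = build_with_fallback(fake_factory, "path/to/model", downscale_factor=0.5)
print(estimator["device"])  # cpu
```

With the real library, the call would presumably look like `build_with_fallback(depthlib.MonocularDepthEstimator, model_path)`.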
Model Not Found:
No Model Path:
Performance Tips
- Use GPU: Set device='cuda' for significantly faster inference
- Downscale Images: Use downscale_factor=0.5 or lower for faster processing
- Batch Processing: The model performs a warmup on first run; subsequent calls are faster
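To compare these settings fairly, it helps to exclude warmup calls from the measurement and to look at several runs rather than one. The following is a generic timing sketch using only the standard library; `time_call` is a hypothetical helper, and the `sum` call stands in for `estimator.estimate_depth`.

```python
import time
from statistics import median
from typing import Any, Callable


def time_call(fn: Callable[..., Any], *args: Any,
              warmup: int = 1, repeats: int = 5, **kwargs: Any) -> float:
    """Return the median latency of fn in milliseconds.

    The first `warmup` calls run untimed, so one-time startup cost
    does not skew the result.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        fn(*args, **kwargs)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        samples.append((time.perf_counter() - start) * 1000.0)
    return median(samples)


# Stand-in workload; with the library this could be
# time_call(estimator.estimate_depth, image_path='./image.png')
latency_ms = time_call(sum, range(100_000), warmup=1, repeats=5)
print(f"Median latency: {latency_ms:.3f} ms")
```

The median is used instead of the mean so that a single slow run has little effect on the reported figure.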
Monocular vs Stereo Depth
| Feature | Monocular | Stereo |
|---|---|---|
| Input | Single image | Image pair |
| Output | Relative depth | Metric depth (meters) |
| Calibration | Not required | Required |
| Accuracy | Depends on scene | Geometric precision |
| Speed | Model-dependent | Fast (CPU-based) |
See Also
- Stereo Depth Images - Metric depth from stereo pairs
- Stereo Depth Video - Real-time video depth
- Visualization Utilities - Custom visualization options