## Overview

This repository contains code to train a U-Net model with a ResNet-34 backbone for detecting lanes, using a small sample (~3k images) from the BDD-Lane-Detection dataset. The codebase has been refactored and improved for better maintainability, performance, and extensibility.
## Code Structure

- The input data files and trained models are saved as a Kaggle Dataset. They may be downloaded and placed in the `data` folder to reproduce the results.
- The Python files in the `src` folder contain the implementations of the model, loss, training loop, data loaders, etc., now with improved type hints, documentation, and error handling.
- Jupyter notebooks call the classes and functions implemented in the source files for execution.
## New Features

### Configuration Management

- YAML-based configuration (`config.yaml`) for easy parameter management
- Organized into data, model, training, augmentation, and inference sections
### Pipeline Architecture
- End-to-end pipeline for training, evaluation, and inference
- Modular components for better code organization
- Automatic data splitting if predefined splits aren't available
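The automatic fallback split could look like the following sketch. The 90/10 ratio, the seed, and the function name are illustrative assumptions, not necessarily what this repository uses:

```python
import random

def split_dataset(image_names, val_fraction=0.1, seed=42):
    """Deterministically shuffle file names and split them into train/val lists."""
    names = sorted(image_names)           # stable order before shuffling
    rng = random.Random(seed)             # seeded RNG so the split is reproducible
    rng.shuffle(names)
    n_val = max(1, int(len(names) * val_fraction))
    return names[n_val:], names[:n_val]   # (train_names, val_names)

train_names, val_names = split_dataset([f"img_{i:04d}.jpg" for i in range(100)])
```

Sorting before shuffling makes the split independent of filesystem ordering, so the same seed always yields the same train/val partition.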
### Improved Training
- Better checkpointing with optimizer state
- Early stopping to prevent overfitting
- Proper learning rate scheduling
- Enhanced logging and progress tracking
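Checkpointing with optimizer state typically means saving a dict such as `{"model": model.state_dict(), "optimizer": optimizer.state_dict(), "epoch": epoch}`. The early-stopping logic can be sketched independently; the class name, `patience`, and `min_delta` defaults below are assumptions, not the repository's actual values:

```python
class EarlyStopping:
    """Stop training when the monitored validation loss stops improving."""

    def __init__(self, patience=5, min_delta=1e-4):
        self.patience = patience          # epochs to wait without improvement
        self.min_delta = min_delta        # minimum change that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss          # improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1          # no improvement this epoch
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=2)
flags = [stopper.step(l) for l in [0.9, 0.8, 0.81, 0.82, 0.79]]
# flags[3] is True: two consecutive epochs without improvement over 0.8
```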
### Advanced Visualization
- Tools for visualizing predictions and training history
- Overlay visualization of segmentation masks
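A green mask overlay usually comes down to alpha blending. This NumPy sketch shows the idea; the function name, blend factor, and color default are assumptions:

```python
import numpy as np

def overlay_mask(image, mask, color=(0, 255, 0), alpha=0.4):
    """Blend a binary lane mask over an RGB uint8 image as a translucent overlay."""
    out = image.astype(np.float32).copy()
    m = mask.astype(bool)
    # Blend only the masked pixels: (1 - alpha) * image + alpha * color
    out[m] = (1.0 - alpha) * out[m] + alpha * np.array(color, dtype=np.float32)
    return out.astype(np.uint8)

img = np.zeros((4, 4, 3), dtype=np.uint8)
msk = np.ones((4, 4), dtype=np.uint8)
blended = overlay_mask(img, msk)
```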
### Command-line Interface
- Train, evaluate, and perform inference from the command line
- Flexible arguments for different workflows
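The commands shown under Usage imply an `argparse` parser roughly like this sketch (the exact argument set and help strings are assumptions):

```python
import argparse

def build_parser():
    """Sketch of the CLI implied by the usage examples in this README."""
    p = argparse.ArgumentParser(description="Lane segmentation pipeline")
    p.add_argument("--mode", required=True, choices=["train", "evaluate", "predict"],
                   help="which pipeline stage to run")
    p.add_argument("--config", default="config.yaml", help="path to the YAML config")
    p.add_argument("--checkpoint", help="checkpoint to evaluate")
    p.add_argument("--resume", help="checkpoint to resume training from")
    p.add_argument("--input", help="image file or directory for prediction")
    p.add_argument("--output", help="directory for prediction outputs")
    return p

args = build_parser().parse_args(["--mode", "train", "--config", "config.yaml"])
```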
## Notebooks

| Notebook | Description | Link |
|---|---|---|
| 01-data.ipynb | Contains information on datasets, image sizes, and labels | Link |
| 02-transform.ipynb | Experiments with augmentations such as RandomCrop and horizontal flips | Link |
| 03-model.ipynb | Trains the U-Net model | Link |
| 04-evaluate.ipynb | Evaluates model performance on random images from the validation set | Link |
## Solution Approach

The solution involves training a U-Net segmentation model with a ResNet-34 backbone. Dice + BCE is used as the loss function, and evaluation is done with the IoU metric. The final model performance and metrics can be seen below:

The output from scoring the model looks as follows:

The detailed PDF report is available here.
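The loss and metric above can be written out as a NumPy sketch. The repository's version is presumably implemented in PyTorch and may differ in smoothing constants and reductions:

```python
import numpy as np

def dice_bce_loss(prob, target, eps=1e-7):
    """Binary cross-entropy plus soft Dice loss for probability maps in [0, 1]."""
    prob = np.clip(prob, eps, 1.0 - eps)          # avoid log(0)
    bce = -np.mean(target * np.log(prob) + (1 - target) * np.log(1 - prob))
    inter = (prob * target).sum()
    dice = 1.0 - (2.0 * inter + eps) / (prob.sum() + target.sum() + eps)
    return bce + dice

def iou_score(pred_mask, target_mask, eps=1e-7):
    """Intersection over union for binary masks."""
    inter = np.logical_and(pred_mask, target_mask).sum()
    union = np.logical_or(pred_mask, target_mask).sum()
    return (inter + eps) / (union + eps)
```

Combining Dice with BCE is a common choice for thin structures like lane markings: BCE gives stable per-pixel gradients while Dice counteracts the heavy foreground/background imbalance.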
## Usage

### Configuration

The project now uses a YAML configuration file (`config.yaml`) for managing parameters:
```yaml
# Example configuration
data:
  path: "./data"
  img_size: 720
  batch_size: 8
model:
  encoder_name: "resnet34"
  encoder_weights: "imagenet"
training:
  epochs: 50
  learning_rate: 0.0001
```
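Loading this configuration file can be as simple as the following PyYAML sketch (the loader function name is an assumption):

```python
import yaml

def load_config(path="config.yaml"):
    """Read the YAML config into a plain nested dict."""
    with open(path) as f:
        return yaml.safe_load(f)

# Values are then looked up by section, e.g.:
# cfg = load_config()
# cfg["data"]["img_size"]    -> 720
# cfg["training"]["epochs"]  -> 50
```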
### Training

Train the model using the command-line interface:

```bash
python main.py --mode train --config config.yaml
```

To resume training from a checkpoint:

```bash
python main.py --mode train --config config.yaml --resume checkpoints/model.pt
```
### Evaluation

Evaluate model performance on the validation set:

```bash
python main.py --mode evaluate --config config.yaml --checkpoint checkpoints/model.pt
```
### Prediction

Make predictions on a single image:

```bash
python main.py --mode predict --config config.yaml --input test_image.jpg --output predictions/
```

Process a directory of images:

```bash
python main.py --mode predict --config config.yaml --input test_images/ --output predictions/
```
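Accepting either a single image or a directory for `--input` usually comes down to a small helper like this sketch (the helper name and extension set are assumptions):

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def collect_images(input_path):
    """Return a sorted list of image paths, whether given one file or a directory."""
    p = Path(input_path)
    if p.is_dir():
        return sorted(f for f in p.iterdir() if f.suffix.lower() in IMAGE_EXTS)
    return [p]
```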
## Serving

### Web Interface

The model can be served via a Gradio interface; the code for this is in the `app.py` file. Below is a screenshot of the demo, which is hosted on Huggingface Spaces.

```bash
python app.py
```

The improved Gradio interface now provides three outputs:
- Original image
- Lane mask (green overlay)
- Combined visualization
## Requirements
Major dependencies include:
- torch
- torchvision
- numpy
- pandas
- opencv-python
- albumentations
- segmentation-models-pytorch
- gradio
- matplotlib
- PyYAML
See `requirements.txt` for the complete list with version specifications.