Real-time Object Detection System
Overview
A deep learning-based system for real-time object detection and tracking, built on YOLOv8 and custom-trained models.
Technologies Used
- Python
- PyTorch
- YOLOv8
- OpenCV
- CUDA
- TensorRT
Project Overview
This project implements a real-time object detection system capable of identifying and tracking multiple objects in video streams. The system is optimized for edge deployment and can run on resource-constrained devices.
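The core detection loop can be sketched in a few lines. The snippet below is a minimal illustration rather than the project's actual code, assuming the ultralytics package and OpenCV; the weights file name is a placeholder.

```python
# Minimal sketch of a real-time detection loop (assumes ultralytics + OpenCV;
# "yolov8n.pt" is a placeholder for the custom-trained weights).
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
cap = cv2.VideoCapture(0)          # 0 = default webcam

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)   # run detection on a single frame
    annotated = results[0].plot()           # draw boxes and labels onto the frame
    cv2.imshow("detections", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```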
Key Features
- Real-time Performance: 30+ FPS on consumer GPUs (35 FPS on an RTX 3060; see Performance Metrics)
- Multi-object Tracking: Track multiple objects across frames
- Custom Training Pipeline: Easy-to-use training scripts for custom datasets
- Edge Optimization: TensorRT optimization for deployment on edge devices
- Multiple Input Sources: Support for webcams, video files, and RTSP streams (see the input-handling sketch after this list)
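As a rough illustration of the multiple-input-source feature, the helper below (a hypothetical function, not part of the project) shows how OpenCV's VideoCapture can cover all three source types; the file path and RTSP URL are placeholders.

```python
# Hypothetical helper: unify webcam index, video file, and RTSP URL behind
# cv2.VideoCapture (names/URLs below are placeholders).
import cv2

def open_source(source):
    """Accept a webcam index (int), a video file path, or an RTSP URL."""
    if isinstance(source, int):
        return cv2.VideoCapture(source)      # e.g. 0 for the default webcam
    return cv2.VideoCapture(str(source))     # file path or "rtsp://..." stream

cap = open_source(0)                                   # webcam
# cap = open_source("clip.mp4")                        # video file (placeholder)
# cap = open_source("rtsp://camera.local/stream1")     # RTSP stream (placeholder)
```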
Model Architecture
The system uses YOLOv8 as the base architecture with several optimizations:
- Custom anchor boxes tuned for specific use cases
- Modified backbone for better feature extraction
- Post-processing optimizations for faster inference (a sketch follows this list)
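To make the post-processing point concrete, here is a minimal sketch of confidence filtering followed by NMS using torchvision; the thresholds are illustrative defaults, not the project's tuned values.

```python
# Sketch of a post-processing step: confidence filtering + NMS.
# Assumes predictions are already decoded into xyxy boxes and scores.
import torch
from torchvision.ops import nms

def postprocess(boxes: torch.Tensor, scores: torch.Tensor,
                conf_thres: float = 0.25, iou_thres: float = 0.45):
    """boxes: (N, 4) in xyxy format, scores: (N,). Returns kept boxes/scores."""
    keep = scores > conf_thres                 # drop low-confidence detections early
    boxes, scores = boxes[keep], scores[keep]
    idx = nms(boxes, scores, iou_thres)        # suppress overlapping boxes
    return boxes[idx], scores[idx]
```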
Training Process
Dataset Preparation
- Collected and annotated 10,000+ images
- Implemented a data augmentation pipeline (rotation, scaling, color jittering; see the sketch after this list)
- Split dataset: 80% training, 10% validation, 10% testing
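A pipeline covering the listed transforms might look like the sketch below; it assumes albumentations and YOLO-format labels, which is an assumption for illustration rather than a statement about the project's actual tooling.

```python
# Sketch of a bbox-aware augmentation pipeline (assumes albumentations;
# parameter values are illustrative).
import albumentations as A

augment = A.Compose(
    [
        A.Rotate(limit=15, p=0.5),               # random rotation
        A.RandomScale(scale_limit=0.2, p=0.5),   # random scaling
        A.ColorJitter(p=0.5),                    # color jittering
    ],
    # keep YOLO-format bounding boxes consistent with the transformed image
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)
```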
Model Training
- Transfer learning from pre-trained YOLOv8 weights
- Fine-tuned on the custom dataset for 100 epochs (see the training sketch below)
- Achieved 95% mAP@0.5 on the validation set
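A minimal version of this transfer-learning setup, using the ultralytics training API, is sketched below; the dataset YAML name is a placeholder.

```python
# Sketch of transfer learning from pre-trained YOLOv8 weights
# ("custom_dataset.yaml" is a placeholder dataset config).
from ultralytics import YOLO

model = YOLO("yolov8n.pt")          # start from pre-trained weights
model.train(
    data="custom_dataset.yaml",     # paths to train/val splits and class names
    epochs=100,                     # fine-tune for 100 epochs
    imgsz=640,
)
metrics = model.val()               # evaluate mAP on the validation split
```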
Optimization Techniques
Inference Speed
- Quantization to reduce model size
- TensorRT optimization for NVIDIA GPUs (export sketch after this list)
- Batch processing for multiple frames
- Asynchronous inference pipeline
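For the TensorRT step specifically, the ultralytics export API provides a one-line path to an FP16 engine; the sketch below is illustrative and uses placeholder file names.

```python
# Sketch of TensorRT export with FP16 quantization via the ultralytics API
# ("best.pt" / "best.engine" are placeholder file names).
from ultralytics import YOLO

model = YOLO("best.pt")
model.export(format="engine", half=True)   # build a TensorRT engine in FP16

# The exported engine can be loaded back for inference the same way:
trt_model = YOLO("best.engine")
results = trt_model("sample.jpg")          # placeholder input image
```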
Accuracy Improvements
- Non-Maximum Suppression (NMS) tuning
- Confidence threshold optimization (see the inference sketch after this list)
- Multi-scale testing during inference
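Threshold tuning can be exercised directly at prediction time; the values below are illustrative, not the project's tuned settings.

```python
# Sketch of tuning confidence and NMS IoU thresholds at inference time
# (weights path, input, and threshold values are placeholders).
from ultralytics import YOLO

model = YOLO("best.pt")
results = model.predict(
    source="sample.jpg",
    conf=0.4,    # confidence threshold
    iou=0.5,     # NMS IoU threshold
)
```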
Deployment
The system can be deployed in multiple configurations:
- Desktop Application: Real-time detection from webcam
- Server API: REST API for batch processing (a minimal sketch follows this list)
- Edge Device: Optimized for Jetson Nano/Xavier
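As a rough sketch of the server-API configuration, the snippet below exposes a single detection endpoint with FastAPI; the framework choice, endpoint name, and weights path are assumptions for illustration.

```python
# Sketch of a REST detection endpoint (assumes FastAPI, OpenCV, ultralytics;
# "/detect" and "best.pt" are placeholders).
import cv2
import numpy as np
from fastapi import FastAPI, File, UploadFile
from ultralytics import YOLO

app = FastAPI()
model = YOLO("best.pt")

@app.post("/detect")
async def detect(file: UploadFile = File(...)):
    data = np.frombuffer(await file.read(), dtype=np.uint8)
    image = cv2.imdecode(data, cv2.IMREAD_COLOR)     # decode upload to a BGR image
    result = model(image, verbose=False)[0]
    return {
        "boxes": result.boxes.xyxy.tolist(),         # [x1, y1, x2, y2] per detection
        "classes": result.boxes.cls.tolist(),
        "scores": result.boxes.conf.tolist(),
    }
```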
Performance Metrics
- Speed: 35 FPS on RTX 3060, 15 FPS on Jetson Nano
- Accuracy: 95% mAP@0.5, 78% mAP@0.5:0.95
- Latency: < 30 ms per frame on GPU (see the timing sketch after this list)
- Model Size: 25MB (quantized)
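Figures like the latency and FPS numbers above can be reproduced with a simple timing loop; the sketch below uses placeholder inputs, and the warm-up and iteration counts are arbitrary choices.

```python
# Sketch of a per-frame latency / FPS measurement (placeholder weights and input;
# warm-up and iteration counts are arbitrary choices).
import time
from ultralytics import YOLO

model = YOLO("best.pt")
frame = "sample.jpg"                 # a real benchmark would iterate over video frames

for _ in range(10):                  # warm-up so one-time initialization is excluded
    model(frame, verbose=False)

n = 100
start = time.perf_counter()
for _ in range(n):
    model(frame, verbose=False)
elapsed = time.perf_counter() - start
print(f"latency: {1000 * elapsed / n:.1f} ms/frame, throughput: {n / elapsed:.1f} FPS")
```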
Use Cases
This system has been successfully applied to:
- Retail analytics (customer counting, behavior analysis)
- Security surveillance (intrusion detection)
- Industrial inspection (defect detection)
- Traffic monitoring (vehicle counting)
Challenges Overcome
- Occlusion Handling: Implemented robust tracking to handle occluded objects (a tracking sketch follows this list)
- Lighting Variations: Data augmentation and normalization improved robustness
- Real-time Constraints: Optimized inference pipeline for low latency
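The occlusion-handling approach can be illustrated with the ultralytics tracking API and its built-in ByteTrack tracker; this is a sketch under those assumptions, with placeholder file names, not the project's exact tracker configuration.

```python
# Sketch of multi-object tracking with persistent IDs across frames
# (assumes ultralytics' built-in ByteTrack config; file names are placeholders).
import cv2
from ultralytics import YOLO

model = YOLO("best.pt")
cap = cv2.VideoCapture("crowded_scene.mp4")

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # persist=True carries tracker state between frames, so IDs can survive
    # short occlusions instead of being reassigned.
    results = model.track(frame, persist=True, tracker="bytetrack.yaml", verbose=False)
    track_ids = results[0].boxes.id        # per-object track IDs (None if nothing tracked)

cap.release()
```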
Future Work
- Integration with transformer-based architectures
- Support for 3D object detection
- Federated learning for privacy-preserving training
- Mobile deployment (iOS/Android)