Real-time Object Detection System

ML Personal Project

Overview

Deep learning-based system for real-time object detection and tracking using YOLOv8 and custom-trained models.

Technologies Used

  • Python
  • PyTorch
  • YOLOv8
  • OpenCV
  • CUDA
  • TensorRT

Project Overview

This project implements a real-time object detection system capable of identifying and tracking multiple objects in video streams. The system is optimized for edge deployment and can run on resource-constrained devices.

Key Features

  • Real-time Performance: 30+ FPS on standard GPUs
  • Multi-object Tracking: Track multiple objects across frames
  • Custom Training Pipeline: Easy-to-use training scripts for custom datasets
  • Edge Optimization: TensorRT optimization for deployment on edge devices
  • Multiple Input Sources: Support for webcams, video files, and RTSP streams (see the inference sketch after this list)
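
A minimal sketch of what the real-time inference loop might look like, assuming the ultralytics YOLOv8 Python API and an OpenCV display loop; the weights path and video source below are placeholders:

```python
import cv2
from ultralytics import YOLO

# Load detection weights (placeholder path; any YOLOv8 .pt checkpoint works).
model = YOLO("yolov8n.pt")

# Source can be a webcam index (0), a video file path, or an RTSP URL.
source = 0  # e.g. "video.mp4" or "rtsp://camera/stream"

cap = cv2.VideoCapture(source)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Run inference on the current frame and draw the detections.
    results = model(frame, verbose=False)
    annotated = results[0].plot()
    cv2.imshow("detections", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```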

Model Architecture

The system uses YOLOv8 as the base architecture with several optimizations (a configuration sketch follows the list):

  • Custom anchor boxes tuned for specific use cases
  • Modified backbone for better feature extraction
  • Post-processing optimizations for faster inference
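
The modified configuration itself is not included here; as a rough illustration, the ultralytics API lets you build the network from a YAML architecture file and then load compatible pre-trained weights into it. The config file name below is hypothetical:

```python
from ultralytics import YOLO

# Build the network from a (hypothetical) modified architecture file,
# then transfer compatible weights from a stock YOLOv8 checkpoint.
model = YOLO("custom_yolov8.yaml").load("yolov8s.pt")
```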

Training Process

Dataset Preparation

  • Collected and annotated 10,000+ images
  • Implemented data augmentation pipeline (rotation, scaling, color jittering); an augmentation sketch follows this list
  • Split dataset: 80% training, 10% validation, 10% testing
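
The augmentation library is not named above; as one possible implementation, an Albumentations pipeline covering the listed transforms with YOLO-format bounding boxes might look like this (the specific parameter values are assumptions):

```python
import albumentations as A

# Augmentations applied to each training image; bounding boxes stay in
# YOLO format (normalized cx, cy, w, h) and are transformed alongside.
train_transform = A.Compose(
    [
        A.Rotate(limit=15, p=0.5),               # small random rotations
        A.RandomScale(scale_limit=0.2, p=0.5),   # random up/down scaling
        A.ColorJitter(brightness=0.2, contrast=0.2,
                      saturation=0.2, hue=0.1, p=0.5),
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

# Usage: augmented = train_transform(image=img, bboxes=boxes, class_labels=labels)
```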

Model Training

  • Transfer learning from pre-trained YOLOv8 weights
  • Fine-tuned on the custom dataset for 100 epochs (see the training sketch after this list)
  • Achieved 95% mAP@0.5 on the validation set
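
A minimal sketch of the transfer-learning step using the ultralytics trainer; the dataset YAML path and all hyperparameters other than the epoch count are assumptions:

```python
from ultralytics import YOLO

# Start from pre-trained YOLOv8 weights and fine-tune on the custom dataset.
model = YOLO("yolov8s.pt")

model.train(
    data="custom_dataset.yaml",  # hypothetical dataset config (paths + class names)
    epochs=100,                  # matches the fine-tuning schedule above
    imgsz=640,
    batch=16,
)

# Evaluate on the validation split to obtain the mAP metrics.
metrics = model.val()
print(metrics.box.map50, metrics.box.map)  # mAP@0.5 and mAP@0.5:0.95
```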

Optimization Techniques

Inference Speed

  • Quantization to reduce model size
  • TensorRT optimization for NVIDIA GPUs (an export sketch follows this list)
  • Batch processing for multiple frames
  • Asynchronous inference pipeline
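
A sketch of the export step, assuming the ultralytics export API; `half=True` gives FP16 quantization, and the resulting engine can be loaded back through the same interface:

```python
from ultralytics import YOLO

model = YOLO("best.pt")  # fine-tuned weights (placeholder path)

# Export a TensorRT engine with FP16 quantization for NVIDIA GPUs.
model.export(format="engine", half=True, imgsz=640)

# The engine file can then be used exactly like a .pt checkpoint.
trt_model = YOLO("best.engine")
results = trt_model("sample.jpg")
```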

Accuracy Improvements

  • Non-Maximum Suppression (NMS) tuning
  • Confidence threshold optimization (see the prediction sketch after this list)
  • Multi-scale testing during inference
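
These knobs are exposed directly at prediction time in the ultralytics API; the values below are illustrative placeholders, not the tuned settings:

```python
from ultralytics import YOLO

model = YOLO("best.pt")  # placeholder path

results = model.predict(
    "sample.jpg",
    conf=0.4,      # confidence threshold (tuned per deployment)
    iou=0.5,       # NMS IoU threshold
    augment=True,  # test-time augmentation, including multi-scale inference
)
```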

Deployment

The system can be deployed in multiple configurations:

  1. Desktop Application: Real-time detection from webcam
  2. Server API: REST API for batch processing (a minimal endpoint sketch follows the list)
  3. Edge Device: Optimized for Jetson Nano/Xavier
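
The server configuration is not described in detail here; a minimal sketch of what a detection endpoint might look like, assuming FastAPI and the same ultralytics model (the route and field names are hypothetical):

```python
import io

from fastapi import FastAPI, File, UploadFile
from PIL import Image
from ultralytics import YOLO

app = FastAPI()
model = YOLO("best.pt")  # placeholder path to the fine-tuned weights

@app.post("/detect")
async def detect(file: UploadFile = File(...)):
    # Decode the uploaded image and run a single inference pass.
    image = Image.open(io.BytesIO(await file.read())).convert("RGB")
    result = model(image, verbose=False)[0]
    # Return boxes as [x1, y1, x2, y2, confidence, class_id] rows.
    return {"detections": result.boxes.data.tolist()}
```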

Performance Metrics

  • Speed: 35 FPS on RTX 3060, 15 FPS on Jetson Nano
  • Accuracy: 95% mAP@0.5, 78% mAP@0.5:0.95
  • Latency: <30ms per frame on GPU (a timing sketch follows this list)
  • Model Size: 25MB (quantized)
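
Per-frame latency numbers like these can be obtained with a simple timing loop; a rough sketch (the warm-up count, run count, and sample image are assumptions, and the measurement includes pre- and post-processing):

```python
import time

from ultralytics import YOLO

model = YOLO("best.engine")  # or best.pt; placeholder path

# Warm up so CUDA/TensorRT initialization doesn't skew the measurement.
for _ in range(10):
    model("sample.jpg", verbose=False)

runs = 100
start = time.perf_counter()
for _ in range(runs):
    model("sample.jpg", verbose=False)
elapsed = time.perf_counter() - start

print(f"mean latency: {1000 * elapsed / runs:.1f} ms "
      f"({runs / elapsed:.1f} FPS)")
```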

Use Cases

This system has been successfully applied to:

  • Retail analytics (customer counting, behavior analysis)
  • Security surveillance (intrusion detection)
  • Industrial inspection (defect detection)
  • Traffic monitoring (vehicle counting)

Challenges Overcome

  • Occlusion Handling: Implemented robust tracking to handle occluded objects (see the tracking sketch after this list)
  • Lighting Variations: Data augmentation and normalization improved robustness
  • Real-time Constraints: Optimized inference pipeline for low latency
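
Tracking is available through the ultralytics track interface; a minimal sketch, assuming the built-in ByteTrack tracker (the tracker choice and video path are assumptions, not stated above):

```python
from ultralytics import YOLO

model = YOLO("best.pt")  # placeholder path

# Stream tracking over a video; persist=True keeps track IDs across frames,
# which helps re-associate objects after short occlusions.
for result in model.track(source="video.mp4", persist=True,
                          tracker="bytetrack.yaml", stream=True):
    if result.boxes.id is not None:
        print(result.boxes.id.tolist())  # per-frame track IDs
```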

Future Work

  • Integration with transformer-based architectures
  • Support for 3D object detection
  • Federated learning for privacy-preserving training
  • Mobile deployment (iOS/Android)