Investigating Data Challenges in Deep Learning for Computer Vision Tasks

Date: 25th Sep 2023

Time: 03:00 PM

Venue: ESB 244

Details

Deep learning (DL) based approaches are the current state of the art for most computer vision tasks, and the datasets used for training are a crucial component of their success. In this presentation, we discuss the challenges of generating training data for two highly relevant computer vision tasks and propose approaches to address them.

First, we discuss ball trajectory generation and spin estimation in sports, targeting in particular the practice and coaching sessions of up-and-coming players with limited resources. Existing ball detection and tracking datasets are abundant for broadcast videos or close-range captures, but they lack diversity in viewpoint and scale and do not distinguish between moving and static balls, even though the moving ball is the primary object of interest in practice sessions. Moreover, no existing dataset provides an estimate of the 3D spin of a ball in flight. To address these issues, we propose a low-cost, easy-to-set-up pipeline based on traditional computer vision for ball detection and tracking, and for recovering 3D ball trajectories and spin axes from two static, unconstrained GoPro cameras. Our approach detects and tracks a moving ball in 2D with the ball color given as a prior; it can therefore automatically annotate the moving ball across different scales and viewpoints, and these annotations can in turn be used to train neural networks with better generalization. The estimated 3D spin axes can likewise serve as annotations to train networks that predict 3D spin. We validate our approach on real data and perform extensive experiments to establish its effectiveness.
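As a rough illustration of the idea (not the pipeline presented in the talk, whose details are not given here), the sketch below combines a color prior with simple frame differencing so that only a moving ball of the known color is detected; the HSV bounds, thresholds, and minimum blob area are all assumed placeholders.

```python
import cv2
import numpy as np

def detect_moving_ball(prev_gray, frame, hsv_lo, hsv_hi, min_area=20):
    """Hypothetical sketch: locate a *moving* ball using a known color
    prior plus frame differencing, so static balls are ignored.
    Returns (current grayscale frame, centroid or None)."""
    # Pixels matching the ball's color range (the color prior).
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    color_mask = cv2.inRange(hsv, hsv_lo, hsv_hi)

    # Pixels that changed since the previous frame (motion cue).
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(prev_gray, gray)
    _, motion_mask = cv2.threshold(diff, 15, 255, cv2.THRESH_BINARY)

    # Keep only pixels that are both ball-colored and moving.
    mask = cv2.bitwise_and(color_mask, motion_mask)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    candidates = [c for c in contours if cv2.contourArea(c) >= min_area]
    if not candidates:
        return gray, None
    m = cv2.moments(max(candidates, key=cv2.contourArea))
    return gray, (m["m10"] / m["m00"], m["m01"] / m["m00"])
```

Run per frame, such a detector yields 2D ball positions that can serve as automatic annotations; triangulating detections from two calibrated views would then give the 3D trajectory.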

Second, we discuss the problem of motion blur in the context of semantic segmentation. With increasingly compact and lightweight cameras, blur caused by motion during capture has become unavoidable. Most DL models focus on improving segmentation performance on clean images, and performance drops significantly in the presence of blur. To address the scarcity of motion-blurred images for semantic segmentation, we propose leveraging the available segmentation annotations to synthetically generate a space-variant motion-blur dataset. We introduce an augmentation strategy called Class-Centric Motion-Blur Augmentation (CCMBA), which enables the network to simultaneously learn semantic segmentation for clean images and for images with different types of motion blur. Our approach outperforms all baselines and shows promising results on real-world data, without requiring any new annotations to achieve this robustness.
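A minimal sketch of the class-centric idea is shown below, assuming the segmentation mask is used to blur only the pixels of a chosen class while the rest of the scene stays sharp; the kernel construction, blending, and parameter choices are assumptions, and the actual CCMBA recipe may differ.

```python
import cv2
import numpy as np

def motion_kernel(length=15, angle_deg=30.0):
    """Linear motion-blur PSF of a given length and orientation."""
    k = np.zeros((length, length), np.float32)
    k[length // 2, :] = 1.0  # horizontal line, then rotate it
    rot = cv2.getRotationMatrix2D((length / 2 - 0.5, length / 2 - 0.5),
                                  angle_deg, 1.0)
    k = cv2.warpAffine(k, rot, (length, length))
    return k / k.sum()

def class_centric_blur(image, seg_mask, class_id, length=15, angle_deg=30.0):
    """Hypothetical CCMBA-style augmentation: apply space-variant motion
    blur only to pixels labeled `class_id` in the segmentation mask."""
    blurred = cv2.filter2D(image, -1, motion_kernel(length, angle_deg))
    region = (seg_mask == class_id).astype(np.float32)
    # Feather the class boundary so blurred and sharp regions blend smoothly.
    region = cv2.GaussianBlur(region, (0, 0), sigmaX=3)[..., None]
    out = region * blurred + (1.0 - region) * image.astype(np.float32)
    return out.astype(image.dtype)
```

Because the blur is driven by existing segmentation labels, such an augmentation produces space-variant blurred training images with no new annotation effort, which matches the motivation described above.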

Speakers

Aakanksha (EE18D405)

Electrical Engineering