BEV-Based Multi-Camera Detection for Sports Player Localization
(2025) In Master’s Theses in Mathematical Sciences FMAM05 20251
Mathematics (Faculty of Engineering)
- Abstract
- Accurate detection of player positions is essential for sports analysis. Traditional single-camera methods generally suffer from occlusion and from diminishing resolution at long distances. This work proposes using multi-camera Bird's Eye View (BEV) detection to overcome these issues. Synthetically generated data was used to train and test the detection model, which uses images from six cameras placed around a handball court and pre-trained segmentation networks as feature extractors. Player masks are generated, projected to the ground plane, stacked, and used as input to a detection network that retrieves player positions.
Segmentation networks were used to generate segmentation masks of the players. Two such networks, Mask R-CNN and YOLOv11x-seg, were tested. The player locations were found from the projected segmentation masks using a detection network in the form of a U-Net, a neural network originally designed for segmentation but here adapted to produce heat maps from which the player locations are extracted. Two differently sized U-Nets were tested.
Experiments evaluated the effect of varying camera count, placement, and input order. The results show that a model consisting of Mask R-CNN and a Deep U-Net outperforms all other models, especially any single-viewpoint structure. Detection accuracy improves significantly with additional cameras, especially when increasing from 2 to 3 cameras, thereby going from 1 to 2 viewpoints. When the stacked images were shuffled, performance decreased significantly, indicating that the model would benefit from using the camera calibration as input. Using multiple cameras in a BEV detector shows great promise for accurate player detection, and with a more case-specific feature extractor the performance could likely be improved further.
- Popular Abstract (Swedish, translated to English)
- Can player detection on a court be improved by increasing the number of cameras used for detection? With the development of artificial intelligence (AI) and machine learning, the possibilities in a range of fields have grown markedly, for example in sports technology. There, AI is used to automatically detect players on a court from camera images. This work investigates how an AI model can improve detection by combining several cameras and transforming their images into a top-down view, known as Bird's Eye View projection. It also investigates how the number of cameras, and information about their positions, affects the result.
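The pipeline described in the abstract (project per-camera player masks to the ground plane, stack them, then read player positions off a heat map) can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the function names, the 3x3 homography `H` mapping image pixels to ground-plane metres, the `cell_size` grid resolution, and the 3x3 local-maximum rule with a fixed threshold are all assumptions made for illustration. The sketch forward-projects mask pixels, which can leave holes in the BEV grid; an inverse warp such as OpenCV's `cv2.warpPerspective` is a common alternative. In the thesis the heat map comes from a U-Net, which is omitted here.

```python
import numpy as np


def project_mask_to_bev(mask, H, bev_shape, cell_size):
    """Project foreground pixels of a binary image mask onto a ground-plane grid.

    H is a hypothetical 3x3 homography taking homogeneous pixel coordinates
    (u, v, 1) to homogeneous ground-plane coordinates (x, y, w) in metres.
    """
    v, u = np.nonzero(mask)  # row and column indices of mask pixels
    pixels = np.stack([u, v, np.ones_like(u)]).astype(float)  # shape (3, N)
    ground = H @ pixels
    w = ground[2]
    valid = np.abs(w) > 1e-9  # skip points that map to infinity
    x = ground[0, valid] / w[valid]
    y = ground[1, valid] / w[valid]

    # Convert metric coordinates to grid cells and drop points off the grid.
    col = np.floor(x / cell_size).astype(int)
    row = np.floor(y / cell_size).astype(int)
    rows, cols = bev_shape
    inside = (row >= 0) & (row < rows) & (col >= 0) & (col < cols)

    bev = np.zeros(bev_shape, dtype=np.float32)
    bev[row[inside], col[inside]] = 1.0
    return bev


def stack_views(masks, homographies, bev_shape, cell_size):
    """Stack one projected mask per camera along a leading channel axis."""
    return np.stack([
        project_mask_to_bev(m, H, bev_shape, cell_size)
        for m, H in zip(masks, homographies)
    ])


def heatmap_peaks(heatmap, threshold=0.5):
    """Return (row, col) cells that are 3x3 local maxima above the threshold.

    Cells on a flat plateau of equal values are all reported.
    """
    heatmap = np.asarray(heatmap, dtype=float)
    rows, cols = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    neighbourhood = np.stack([
        padded[dr:dr + rows, dc:dc + cols]
        for dr in range(3) for dc in range(3)
    ])
    is_peak = (heatmap >= neighbourhood.max(axis=0)) & (heatmap > threshold)
    return [(int(r), int(c)) for r, c in np.argwhere(is_peak)]
```

Because the channel order of the stacked array follows the camera order, a network trained on it can implicitly learn which channel belongs to which viewpoint. This is consistent with the abstract's observation that shuffling the stacked images degrades performance.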
Please use this URL to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9198037
- author
- Sjöholm, Hampus (LU) and Glatz, Johanna
- supervisor
- organization
- course
- FMAM05 20251
- year
- 2025
- type
- H2 - Master's Degree (Two Years)
- subject
- keywords
- Bird’s Eye View (BEV), Player Detection, Multi-Camera, Object Segmentation, YOLO, Mask R-CNN, U-Net, Computer Vision, Machine Learning, Synthetic Data, Sports
- publication/series
- Master’s Theses in Mathematical Sciences
- report number
- 2025:E39
- ISSN
- 1404-6342
- other publication id
- LUTFMA-3587-2025
- language
- English
- id
- 9198037
- date added to LUP
- 2025-09-15 11:11:35
- date last changed
- 2025-09-15 11:11:35
@misc{9198037,
  abstract = {{Accurate detection of player positions is essential for sports analysis. Traditional single-camera methods generally suffer from occlusion and from diminishing resolution at long distances. This work proposes using multi-camera Bird's Eye View (BEV) detection to overcome these issues. Synthetically generated data was used to train and test the detection model, which uses images from six cameras placed around a handball court and pre-trained segmentation networks as feature extractors. Player masks are generated, projected to the ground plane, stacked, and used as input to a detection network that retrieves player positions. Segmentation networks were used to generate segmentation masks of the players. Two such networks, Mask R-CNN and YOLOv11x-seg, were tested. The player locations were found from the projected segmentation masks using a detection network in the form of a U-Net, a neural network originally designed for segmentation but here adapted to produce heat maps from which the player locations are extracted. Two differently sized U-Nets were tested. Experiments evaluated the effect of varying camera count, placement, and input order. The results show that a model consisting of Mask R-CNN and a Deep U-Net outperforms all other models, especially any single-viewpoint structure. Detection accuracy improves significantly with additional cameras, especially when increasing from 2 to 3 cameras, thereby going from 1 to 2 viewpoints. When the stacked images were shuffled, performance decreased significantly, indicating that the model would benefit from using the camera calibration as input. Using multiple cameras in a BEV detector shows great promise for accurate player detection, and with a more case-specific feature extractor the performance could likely be improved further.}},
  author = {{Sjöholm, Hampus and Glatz, Johanna}},
  issn = {{1404-6342}},
  language = {{eng}},
  note = {{Student Paper}},
  series = {{Master’s Theses in Mathematical Sciences}},
  title = {{BEV-Based Multi-Camera Detection for Sports Player Localization}},
  year = {{2025}},
}