
LUP Student Papers

LUND UNIVERSITY LIBRARIES

BEV-Based Multi-Camera Detection for Sports Player Localization

Sjöholm, Hampus LU and Glatz, Johanna (2025) In Master’s Theses in Mathematical Sciences FMAM05 20251
Mathematics (Faculty of Engineering)
Abstract
Accurate detection of player positions is essential for sports analysis. Traditional single-camera methods generally suffer from occlusion and from diminishing resolution at long range. This work proposes using multi-camera Bird's Eye View (BEV) detection to overcome these issues. Synthetically generated data was used to train and test the detection model, which uses images from six cameras placed around a handball court and pre-trained segmentation networks as feature extractors. Player masks are generated, projected to the ground plane, stacked, and used as input to a detection network that retrieves player positions.
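The projection-and-stacking step described above can be illustrated with a minimal, dependency-free sketch. It is not the thesis implementation: the function names, the 1 m grid cell, and the use of a per-camera 3x3 homography `H` that maps image pixels to ground-plane metres are all assumptions made for illustration. The 40 m by 20 m extent is the standard handball court size.

```python
def apply_homography(H, u, v):
    """Map image pixel (u, v) to ground-plane (x, y); None if the point is at infinity."""
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        return None
    x = (H[0][0] * u + H[0][1] * v + H[0][2]) / w
    y = (H[1][0] * u + H[1][1] * v + H[1][2]) / w
    return x, y


def project_mask_to_bev(mask, H, court_size=(40.0, 20.0), cell=1.0):
    """Forward-project the foreground pixels of a binary mask onto a BEV grid.

    mask is a list of rows (0/1 values); court_size is (length, width) in metres;
    cell is the side of one BEV grid cell in metres (hypothetical choice).
    """
    cols = int(round(court_size[0] / cell))
    rows = int(round(court_size[1] / cell))
    bev = [[0] * cols for _ in range(rows)]
    for v, image_row in enumerate(mask):
        for u, value in enumerate(image_row):
            if not value:
                continue
            point = apply_homography(H, u, v)
            if point is None:
                continue
            c, r = int(point[0] // cell), int(point[1] // cell)
            if 0 <= r < rows and 0 <= c < cols:  # drop points outside the court
                bev[r][c] = 1
    return bev


def stack_views(masks, homographies, court_size=(40.0, 20.0), cell=1.0):
    """Return one BEV channel per camera, in camera order."""
    return [project_mask_to_bev(m, H, court_size, cell)
            for m, H in zip(masks, homographies)]
```

Forward projection is used here only for brevity and can leave holes in the BEV image; a practical pipeline would more likely inverse-warp each mask with an image library. The stacked list keeps one channel per camera, so the channel order carries implicit information about which camera produced which projection.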

Segmentation networks were used to generate segmentation masks of the players. Two such networks, Mask R-CNN and YOLOv11x-seg, were tested. The player locations were found from the projected segmentation masks using a detection network in the form of a U-Net, a neural network originally designed for segmentation but here adapted to produce heat maps from which the player locations are extracted. Two differently sized U-Nets were tested.
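The abstract does not state how positions are read off the heat maps. One common approach, shown here purely as an assumed illustration, is to keep every cell that exceeds a confidence threshold and is strictly larger than its 3x3 neighbourhood. The function name and the threshold value are hypothetical.

```python
def extract_peaks(heatmap, threshold=0.5):
    """Return (row, col, score) for each strict local maximum at or above threshold.

    heatmap is a list of rows of floats. A cell counts as a peak only if it is
    strictly greater than all of its (up to 8) neighbours, so flat plateaus
    produce no peak in this simplified sketch.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = heatmap[r][c]
            if value < threshold:
                continue
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c, value))
    return peaks
```

Multiplying the returned row and column indices by the BEV cell size would convert the peaks back to ground-plane coordinates in metres.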

Experiments evaluated the effect of varying camera count, placement, and input order. The results show that a model consisting of Mask R-CNN and a Deep U-Net outperforms all other models, especially single-viewpoint configurations. Detection accuracy improves significantly with additional cameras, especially when increasing from 2 to 3 cameras, which raises the number of viewpoints from 1 to 2. Shuffling the stacked images decreased performance significantly, indicating that the model would benefit from using the camera calibration as input. Using multiple cameras in a BEV detector shows great promise for accurate player detection, and with a more case-specific feature extractor the performance could likely be improved further.
Popular Abstract (translated from Swedish)
Can player detection on a court be improved by increasing the number of cameras used for detection? With the development of artificial intelligence (AI) and machine learning, the possibilities in a range of fields have grown markedly, sports technology among them. There, AI is used to automatically detect players on a court from camera images. This work investigates how an AI model can improve detection by combining several cameras and transforming their images into a view from above, known as Bird's Eye View projection. It also investigates how the number of cameras, and information about their positions, affects the result.
author
Sjöholm, Hampus LU and Glatz, Johanna
course
FMAM05 20251
year
2025
type
H2 - Master's Degree (Two Years)
keywords
Bird’s Eye View (BEV), Player Detection, Multi-Camera, Object Segmentation, YOLO, Mask R-CNN, U-Net, Computer Vision, Machine Learning, Synthetic Data, Sports
publication/series
Master’s Theses in Mathematical Sciences
report number
2025:E39
ISSN
1404-6342
other publication id
LUTFMA-3587-2025
language
English
id
9198037
date added to LUP
2025-09-15 11:11:35
date last changed
2025-09-15 11:11:35
@misc{9198037,
  abstract     = {{Accurate detection of player positions is essential for sports analysis. Traditional single-camera methods generally suffer from occlusion and from diminishing resolution at long range. This work proposes using multi-camera Bird's Eye View (BEV) detection to overcome these issues. Synthetically generated data was used to train and test the detection model, which uses images from six cameras placed around a handball court and pre-trained segmentation networks as feature extractors. Player masks are generated, projected to the ground plane, stacked, and used as input to a detection network that retrieves player positions.

Segmentation networks were used to generate segmentation masks of the players. Two such networks, Mask R-CNN and YOLOv11x-seg, were tested. The player locations were found from the projected segmentation masks using a detection network in the form of a U-Net, a neural network originally designed for segmentation but here adapted to produce heat maps from which the player locations are extracted. Two differently sized U-Nets were tested.

Experiments evaluated the effect of varying camera count, placement, and input order. The results show that a model consisting of Mask R-CNN and a Deep U-Net outperforms all other models, especially single-viewpoint configurations. Detection accuracy improves significantly with additional cameras, especially when increasing from 2 to 3 cameras, which raises the number of viewpoints from 1 to 2. Shuffling the stacked images decreased performance significantly, indicating that the model would benefit from using the camera calibration as input. Using multiple cameras in a BEV detector shows great promise for accurate player detection, and with a more case-specific feature extractor the performance could likely be improved further.}},
  author       = {{Sjöholm, Hampus and Glatz, Johanna}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master’s Theses in Mathematical Sciences}},
  title        = {{BEV-Based Multi-Camera Detection for Sports Player Localization}},
  year         = {{2025}},
}