BEV-Based Multi-Camera Detection for Sports Player Localization

Sjöholm, Hampus; Glatz, Johanna

Sjöholm, Hampus (LU) and Glatz, Johanna (2025). In Master’s Theses in Mathematical Sciences, FMAM05 20251
Mathematics (Faculty of Engineering)

Abstract: Accurate detection of player positions is essential for sports analysis. Traditional single-camera methods generally suffer from occlusion and from reduced resolution for players far from the camera. This work proposes using multi-camera Bird's Eye View (BEV) detection to overcome these issues. Synthetically generated data was used to train and test the detection model, which uses images from six cameras placed around a handball court and pre-trained segmentation networks as feature extractors. Player masks are generated, projected to the ground plane, stacked, and fed into a detection network to retrieve player positions.
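The projection and stacking step described above can be illustrated with a minimal sketch. It assumes each camera is described by a 3x3 homography from image pixels to ground-plane grid cells; the function names, the nearest-neighbour sampling, and the grid size are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np


def warp_mask_to_bev(mask, H_img_to_bev, bev_shape):
    """Warp a binary image mask onto a BEV grid by inverse mapping.

    H_img_to_bev is a hypothetical 3x3 homography that maps image pixel
    coordinates (x, y, 1) to BEV grid coordinates (assumed, not from the thesis).
    """
    bev_h, bev_w = bev_shape
    # Homogeneous coordinates of every BEV cell.
    ys, xs = np.mgrid[0:bev_h, 0:bev_w]
    bev_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    # Map each BEV cell back into the image with the inverse homography.
    img_pts = np.linalg.inv(H_img_to_bev) @ bev_pts
    w = img_pts[2]
    valid = np.abs(w) > 1e-9
    safe_w = np.where(valid, w, 1.0)
    u = np.where(valid, img_pts[0] / safe_w, -1.0)
    v = np.where(valid, img_pts[1] / safe_w, -1.0)
    # Nearest-neighbour sampling; cells that fall outside the image stay 0.
    ui = np.rint(u).astype(int)
    vi = np.rint(v).astype(int)
    inside = valid & (ui >= 0) & (ui < mask.shape[1]) & (vi >= 0) & (vi < mask.shape[0])
    bev = np.zeros(bev_h * bev_w, dtype=mask.dtype)
    bev[inside] = mask[vi[inside], ui[inside]]
    return bev.reshape(bev_shape)


def stack_bev_masks(masks, homographies, bev_shape):
    """Stack the projected masks of all cameras along a leading channel axis."""
    return np.stack(
        [warp_mask_to_bev(m, H, bev_shape) for m, H in zip(masks, homographies)],
        axis=0,
    )
```

With six cameras, `stack_bev_masks` would return an array of shape `(6, bev_h, bev_w)`, one channel per camera, which is the kind of stacked input a detection network could consume.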

A segmentation network, a type of neural network, was used to generate segmentation masks of the players. Two segmentation networks, Mask R-CNN and YOLOv11x-seg, were tested. The player locations were found from the projected segmentation masks using a detection network in the form of a U-Net, a neural network originally designed for segmentation but here adapted to produce heat maps from which the player locations are extracted. Two differently sized U-Nets were tested.
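The abstract does not say how locations are read off the heat maps. One common approach, shown here purely as an assumption, is to keep cells that exceed a threshold and are the maximum of their 3x3 neighbourhood. The function name `heatmap_peaks` and the default threshold are hypothetical.

```python
import numpy as np


def heatmap_peaks(heatmap, threshold=0.5):
    """Return (row, col) cells that are above threshold and are 3x3 local maxima.

    This is an assumed post-processing step, not the method from the thesis.
    Cells on a flat plateau of equal values are all reported as peaks.
    """
    heatmap = np.asarray(heatmap, dtype=float)
    h, w = heatmap.shape
    # Pad with -inf so border cells only compete with real neighbours.
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Collect the nine shifted views that make up each 3x3 neighbourhood.
    neighbourhood = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    )
    local_max = neighbourhood.max(axis=0)
    is_peak = (heatmap >= local_max) & (heatmap > threshold)
    return [(int(r), int(c)) for r, c in np.argwhere(is_peak)]
```

The returned grid cells would still need to be converted to court coordinates using the known scale of the BEV grid.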

Experiments evaluated the effect of varying camera count, placement, and input order. The results show that a model consisting of Mask R-CNN and a Deep U-Net outperforms all other models, especially any single-viewpoint setup. Detection accuracy improves significantly with additional cameras, especially when increasing from 2 to 3 cameras, thereby going from 1 to 2 viewpoints. When the stacked images were shuffled, performance decreased significantly, indicating that the model would benefit from using the camera calibration as input. Using multiple cameras in a BEV detector shows great promise for accurate player detection, and with a more case-specific feature extractor the performance could likely be improved even further.
Popular Abstract (translated from Swedish): Can player detection on a court be improved by increasing the number of cameras used for detection? With the development of artificial intelligence (AI) and machine learning, the possibilities in a range of fields have increased markedly, sports technology among them. There, AI is used to automatically detect players on a court from camera images. This work investigates how an AI model can improve detection by combining several cameras and transforming their images into a view from above, known as Bird's Eye View projection. It also investigates how the number of cameras, and information about their positions, affects the result.

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9198037

author: Sjöholm, Hampus (LU) and Glatz, Johanna
supervisor: Mikael Nilsson (LU)
organization: Mathematics (Faculty of Engineering)
course: FMAM05 20251
year: 2025
type: H2 - Master's Degree (Two Years)
subject: Mathematics and Statistics
keywords: Bird’s Eye View (BEV), Player Detection, Multi-Camera, Object Segmentation, YOLO, Mask R-CNN, U-Net, Computer Vision, Machine Learning, Synthetic Data, Sports
publication/series: Master’s Theses in Mathematical Sciences
report number: 2025:E39
ISSN: 1404-6342
other publication id: LUTFMA-3587-2025
language: English
id: 9198037
date added to LUP: 2025-09-15 11:11:35
date last changed: 2025-09-15 11:11:35

@misc{9198037,
  abstract     = {{Accurate detection of player positions is essential for sports analysis. Traditional single-camera methods generally suffer from occlusion and diminishing resolution far away. This work proposes using multi-camera Bird's Eye View (BEV) detection to overcome these issues. Synthetically generated data was used to train and test the detection model, which uses images from six cameras placed around a handball court and pre-trained segmentation networks as feature extractors. Player masks are generated, projected to the ground plane, stacked and used as input into a detection network to retrieve player positions.

A type of neural network called segmentation network was used to generate segmentation masks of the players. Two types of segmentation networks, Mask R-CNN and YOLOv11x-seg, were tested. The player locations were found from the projected segmentation masks using a detection network in the form of a U-Net, a neural network originally used as a segmentation network, but here adapted to produce heat maps from which the player locations are extracted. Two differently sized U-Nets were tested.

Experiments evaluated the effect of varying camera count, placement, and input order. The results show that a model consisting of Mask R-CNN and a Deep U-Net outperform all other models, especially any single viewpoint structures. Detection accuracy improves significantly with additional cameras, especially when increasing from 2 to 3 cameras, thereby going from 1 to 2 view points. When shuffling the stacked images the performance decreased significantly, indicating that the model would benefit from using the camera calibration as input. Using multiple cameras in a BEV detector shows great promise in accurate player detection and with a more case-specific feature extractor the performance could likely be improved even more.}},
  author       = {{Sjöholm, Hampus and Glatz, Johanna}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master’s Theses in Mathematical Sciences}},
  title        = {{BEV-Based Multi-Camera Detection for Sports Player Localization}},
  year         = {{2025}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
