Analyzing a player's technique in table tennis requires knowledge of the ball's 3D trajectory and spin. While the spin is not directly observable in standard broadcast videos, we show that it can be inferred from the ball's trajectory in the video. We present a novel method to infer the initial spin and 3D trajectory from the corresponding 2D trajectory in a video. Since no ground-truth labels exist for broadcast videos, we train a neural network solely on synthetic data. Thanks to our choice of input data representation, physically correct synthetic training data, and targeted augmentations, the network naturally generalizes to real data; these simple techniques are sufficient, and no real data at all is required for training. To the best of our knowledge, we are the first to present a method for spin and trajectory prediction in simple monocular broadcast videos, achieving an accuracy of 92% in spin classification and a 2D reprojection error of 0.19% of the image diagonal.
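To illustrate the idea of physically grounded synthetic data, here is a minimal sketch of how spin-dependent ball trajectories could be simulated. The structure (gravity, air drag, and the spin-induced Magnus force, integrated with forward Euler) reflects standard ball-flight physics; all constants and function names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def simulate_trajectory(p0, v0, spin, dt=0.001, steps=500):
    """Sketch: integrate a table-tennis ball's 3D flight under
    gravity, air drag, and the Magnus force (forward Euler).
    All physical constants below are assumed, not from the paper."""
    m = 2.7e-3        # ball mass [kg]
    r = 0.02          # ball radius [m]
    rho = 1.2         # air density [kg/m^3]
    A = np.pi * r**2  # cross-sectional area [m^2]
    cd = 0.4          # drag coefficient (assumed)
    cm = 0.6          # Magnus/lift coefficient (assumed)
    g = np.array([0.0, 0.0, -9.81])

    p = np.array(p0, dtype=float)
    v = np.array(v0, dtype=float)
    omega = np.array(spin, dtype=float)  # angular velocity [rad/s]
    traj = [p.copy()]
    for _ in range(steps):
        speed = np.linalg.norm(v)
        f_drag = -0.5 * rho * cd * A * speed * v        # opposes motion
        f_magnus = 0.5 * rho * cm * A * r * np.cross(omega, v)  # spin lift
        a = g + (f_drag + f_magnus) / m
        v = v + a * dt
        p = p + v * dt
        traj.append(p.copy())
    return np.array(traj)

# For a ball flying in +x, topspin (omega along +y) produces a downward
# Magnus force, so the topspin trajectory dips below the no-spin one.
flat = simulate_trajectory([0, 0, 0.3], [5, 0, 1], [0, 0, 0])
topspin = simulate_trajectory([0, 0, 0.3], [5, 0, 1], [0, 150, 0])
```

Projecting such simulated 3D trajectories into the image plane yields 2D trajectories with known spin labels, which is the kind of synthetic supervision the abstract describes.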
Our paper was accepted at the 11th International Workshop on Computer Vision in Sports (CVsports), held at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025.
Links to the paper, the poster, and the code will be added soon.
If you find this paper helpful, please cite it:
@article{kienzle2025,
  author  = {Kienzle, Daniel and Sch{\"o}n, Robin and Lienhart, Rainer and Satoh, Shin'Ichi},
  title   = {Towards Ball Spin and Trajectory Analysis in Table Tennis Broadcast Videos via Physically Grounded Synthetic-to-Real Transfer},
  journal = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year    = {2025},
}
The structure of this page is adapted from nvlabs.github.io/eg3d, which was published under the Creative Commons CC BY-NC 4.0 license.