Two-sample goodness-of-fit tests on the flat torus based on Wasserstein distance and their relevance to structural biology
Appears in the collection: GDR ISIS - Transport Optimal et Apprentissage Statistique
This work is motivated by the study of local protein structure, which is defined by two variable dihedral angles that take values from probability distributions on the flat torus. Our goal is to provide the space $\mathcal{P}(\mathbb{R}^2/\mathbb{Z}^2)$ with a metric that quantifies local structural modifications due to changes in the protein sequence, and to define associated two-sample goodness-of-fit testing approaches. Because it adapts to the geometry of the underlying space, we focus on the Wasserstein distance as a metric between distributions. We extend existing results of the theory of Optimal Transport to the $d$-dimensional flat torus $\mathbb{T}^d=\mathbb{R}^d/\mathbb{Z}^d$, in particular a Central Limit Theorem. Moreover, we assess different techniques for two-sample goodness-of-fit testing in the two-dimensional case, based on the Wasserstein distance. We provide an implementation of these approaches in \textsf{R}. Their performance is illustrated by numerical experiments on synthetic data and protein structure data. The full work is available at https://arxiv.org/pdf/2108.00165.pdf.
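To make the idea concrete, here is a minimal Python sketch of a Wasserstein-based two-sample test on the flat torus. It is not the authors' \textsf{R} implementation and does not use the Central Limit Theorem mentioned above. It assumes two samples of equal size with uniform weights, so that the empirical optimal transport problem reduces to an assignment problem, and it calibrates the statistic with a plain permutation test, which is a generic technique swapped in for illustration. The function names `torus_sq_dist`, `wasserstein2_sq`, and `permutation_test` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def torus_sq_dist(x, y):
    """Pairwise squared geodesic distances on the flat torus R^d/Z^d.

    x has shape (n, d) and y has shape (m, d), with coordinates in [0, 1).
    """
    diff = np.abs(x[:, None, :] - y[None, :, :]) % 1.0
    # On each circle factor, the shortest path may wrap around the boundary.
    diff = np.minimum(diff, 1.0 - diff)
    return (diff ** 2).sum(axis=-1)


def wasserstein2_sq(x, y):
    """Squared empirical 2-Wasserstein distance between equal-size samples.

    Assumption: len(x) == len(y) and uniform weights, so an optimal
    coupling can be taken to be a permutation (assignment problem).
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    cost = torus_sq_dist(x, y)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()


def permutation_test(x, y, n_perm=200, seed=0):
    """Permutation p-value for H0: both samples share one distribution."""
    rng = np.random.default_rng(seed)
    observed = wasserstein2_sq(x, y)
    pooled = np.vstack([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        stat = wasserstein2_sq(pooled[perm[:n]], pooled[perm[n:]])
        count += stat >= observed
    # The +1 correction keeps the p-value strictly positive.
    return observed, (count + 1) / (n_perm + 1)
```

For dihedral angles given in radians or degrees, one would first rescale each angle to $[0, 1)$ before calling these functions. The assignment step costs roughly cubic time in the sample size, so this sketch is only practical for samples of moderate size.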