Rik van Gijn, Barbara Sonnenhauser, Anastasia Escher, Robert Weibel, Pieter Muysken and Balthasar Bickel introduces a novel algorithm for Bayesian clustering in the presence of confounding effects, and shows how it can be used to trace areas of possible spatial contact between different languages. https://doi.org/10.1098/rsif.2020.1031
When speakers of different languages interact, they are likely to influence each other: contact leaves traces in the linguistic record, which in turn can reveal geographical areas of past human interaction and migration. However, other factors may contribute to similarities between languages. Inheritance from a shared ancestral language and universal preference for a linguistic property may both overshadow contact signals. How can we find geographical contact areas in language data, while accounting for the confounding effects of inheritance and universal preference? We present sBayes, an algorithm for Bayesian clustering in the presence of confounding effects. The algorithm learns which similarities are better explained by confounders, and which are due to contact effects. Contact areas are free to take any shape or size, but an explicit geographical prior ensures their spatial coherence. We test sBayes on simulated data and apply it in two case studies to reveal language contact in South America and the Balkans. Our results are supported by findings from previous studies. While we focus on detecting language contact, the method can also be used to uncover other traces of shared history in cultural evolution, and more generally, to reveal latent spatial clusters in the presence of confounders.