Bayesian model choice for testing the existence of language universals

Probability and Statistics

Seminar room M3-324
Robin Ryder
Université Paris Dauphine
Wednesday, 21 February 2018, 10:30 - 11:30
In the past 15 years, statistical methods and models have gained traction in historical linguistics. In this work, we turn to a major conjecture in linguistics dating to Greenberg's (1963) foundational work: the existence of so-called language universals for morphosyntactic features of human languages, in particular word order (e.g. whether the adjective goes before or after the noun) and how various orders may co-evolve. We develop a statistical framework to choose between models of co-evolution and models of independence of features. We model syntactic change by a multidimensional diffusion along a tree, where the only observations are whether the value of the diffusion at each leaf of the tree is above or below a threshold. We base our inference on a sample of phylogenetic trees from several language families, obtained from a model of lexical data. In this talk, I shall present applied results on the testing of language universals, describe the computational challenges in estimating the Bayes factors involved, and discuss our ongoing work to validate our model and address model misspecification.
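
As a rough illustration of the observation model described in the abstract, here is a minimal sketch in Python. It is not the speaker's implementation: the abstract does not specify the diffusion, so the sketch assumes a driftless two-dimensional Brownian motion with unit variance per unit of branch length. The toy tree, the function name `simulate_leaf_states`, the correlation parameter `rho` and the threshold value are all hypothetical choices made for illustration. Setting `rho = 0` corresponds to independent evolution of the two features, while a nonzero `rho` corresponds to co-evolution.

```python
import math
import random


def simulate_leaf_states(tree, root_value, rho, threshold=0.0, rng=None):
    """Simulate a 2-D correlated Brownian diffusion down a tree.

    Assumptions (not from the talk): driftless Brownian motion, unit
    variance per unit branch length, one shared threshold for both features.

    tree       -- dict mapping a node name to a list of (child, branch_length)
    root_value -- (x, y) value of the diffusion at the node named "root"
    rho        -- correlation between the two coordinates' increments
    Returns a dict: leaf name -> (x > threshold, y > threshold).
    """
    rng = rng or random.Random()
    leaves = {}
    stack = [("root", root_value)]
    while stack:
        node, (x, y) = stack.pop()
        children = tree.get(node, [])
        if not children:
            # Only the side of the threshold is observed at a leaf.
            leaves[node] = (x > threshold, y > threshold)
            continue
        for child, length in children:
            sd = math.sqrt(length)
            z1 = rng.gauss(0.0, 1.0)
            z2 = rng.gauss(0.0, 1.0)
            # Correlated Gaussian increments built from two independent draws.
            dx = sd * z1
            dy = sd * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
            stack.append((child, (x + dx, y + dy)))
    return leaves


# Hypothetical three-leaf tree; node names and branch lengths are made up.
TOY_TREE = {
    "root": [("A", 1.0), ("inner", 0.5)],
    "inner": [("B", 0.5), ("C", 0.5)],
}

states = simulate_leaf_states(TOY_TREE, (0.0, 0.0), rho=0.8, rng=random.Random(1))
for leaf in sorted(states):
    print(leaf, states[leaf])
```

Each leaf ends up with a pair of binary values, for instance "adjective before noun" and "object before verb". Comparing the marginal likelihood of such data under a model with free `rho` against a model with `rho = 0` would be one way to frame the model choice; estimating those marginal likelihoods is the hard part and is not attempted here.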