PLM2016 Thematic session: A New Era for Cross-Linguistic Databases

Conveners: Harald Hammarström (MPI for Psycholinguistics, Nijmegen), Cormac Anderson and Paul Heggarty (MPI for the Science of Human History, Jena)

The contact person for this thematic session is Cormac Anderson.

The diversity of the world’s c. 6,500 languages represents an abundant and irreplaceable resource for understanding the unique communication system of our species. By comparing many languages, rather than studying just one such as English, we are better equipped to trace the (pre)history of the populations that speak them and to understand the processing machinery of our brains (Evans and Levinson, 2009).

This workshop focuses on the curation, dissemination and applications of massively cross-linguistic databases. Existing databases, e.g. PHOIBLE (Moran et al., 2015), ASJP (Wichmann et al., 2013), WALS (Dryer and Haspelmath, 2013) and ABVD (Greenhill et al., 2008), are not without their limitations, prompting the active development of larger databases under the names GramBank and LexiBank at the MPI for SHH in Jena. The present workshop aims to cover topics such as:
•   Data curation procedures and protocols (e.g. Forkel 2014)
•   Models and algorithms for inferring historical (areal or genealogical) relationships between languages (e.g. Muysken et al. 2015, Michael et al. 2014, Dunn 2014, Longobardi et al. 2013, Heggarty 2012, List 2014, Gray et al. 2013)
•   Models and algorithms for inferring non-historical (functional, universal) principles from linguistic data (e.g. Symonds and Blomberg 2014, Dediu and Cysouw 2013, Hammarström and O’Connor 2013, Dunn et al. 2011)
•   Visualization techniques for cross-linguistic data (e.g. McElvenny 2015, Moran and McNew 2015)
Linguistic data of any kind (sociolinguistic, lexical, phonological, grammatical) is welcome so long as it offers significant cross-linguistic coverage.


Dediu, Dan & Michael Cysouw. 2013. Some structural aspects of language are more stable than others: A comparison of seven methods. PLoS One 8(1). e55009.

Dryer, Matthew S. & Martin Haspelmath. 2013. The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology (Available online at, Accessed on 2015-10-01.).

Dunn, Michael. 2014. Language phylogenies. In Claire Bowern & Bethwyn Evans (eds.), The Routledge Handbook of Historical Linguistics, 190–211. New York: Routledge.

Dunn, Michael, Simon J. Greenhill, Stephen C. Levinson & Russell D. Gray. 2011. Evolved structure of language shows lineage-specific trends in word-order universals. Nature 473. 79–82.

Evans, Nicholas & Stephen Levinson. 2009. The Myth of Language Universals: Language diversity and its importance for cognitive science. Behavioral and Brain Sciences 32(5). 429–492.

Forkel, Robert. 2014. The Cross-Linguistic Linked Data project. In Christian Chiarcos, John Philip McCrae, Petya Osenova & Cristina Vertan (eds.), 3rd Workshop on Linked Data in Linguistics: Multilingual Knowledge Resources and Natural Language Processing, 60–66. Reykjavik, Iceland: European Language Resources Association (ELRA).

Gray, Russell D., Simon J. Greenhill & Quentin D. Atkinson. 2013. Phylogenetic Models of Language Change: Three New Questions. In Peter J. Richerson & Morten H. Christiansen (eds.), Cultural Evolution: Society, Technology, Language, and Religion (Strüngmann Forum Reports 12), 285–300. Cambridge, MA: MIT Press.

Greenhill, Simon J., Robert Blust & Russell D. Gray. 2008. The Austronesian Basic Vocabulary Database: From Bioinformatics to Lexomics. Evolutionary Bioinformatics 4. 271–283.

Hammarström, Harald & Loretta O’Connor. 2013. Dependency Sensitive Typological Distance. In Lars Borin & Anju Saxena (eds.), Approaches to measuring linguistic differences, 337–360. Berlin: Mouton.

Heggarty, Paul. 2012. Beyond lexicostatistics: How to get more out of ‘word list’ comparisons. Diachronica 27(2). 113–137.

List, Johann-Mattis. 2014. Sequence comparison in historical linguistics. Düsseldorf: Heinrich Heine University doctoral dissertation.

Longobardi, Giuseppe, Cristina Guardiano, Giuseppina Silvestri, Alessio Boattini & Andrea Ceolin. 2013. Toward a syntactic phylogeny of modern Indo-European languages. Journal of Historical Linguistics 3(1). 122–152.

McElvenny, James. 2015. Visualization and user interface design in the World Phonotactics Database. Paper presented at the Language Comparison with Linguistic Databases (LanCLiD 2) Workshop, 30 April, 2015.

Michael, Lev, Will Chang & Tammy Stark. 2014. Exploring phonological areality in the circum-Andean region using a Naive Bayes Classifier. Language Dynamics and Change 4(1). 27–86.

Moran, Steven, Daniel McCloy & Richard Wright. 2015. PHOIBLE Online. Leipzig: Max Planck Institute for Evolutionary Anthropology (Available online at, Accessed on 2015-10-01.).

Moran, Steven & Garland McNew. 2015. Visualizing WALS data. Paper presented at the Language Comparison with Linguistic Databases (LanCLiD 2) Workshop, 30 April, 2015.

Muysken, Pieter, Harald Hammarström, Joshua Birchall, Rik van Gijn, Olga Krasnoukhova & Neele Müller. 2015. Linguistic Areas, bottom up or top down? The case of the Guaporé-Mamoré region. In Bernard Comrie & Lucía Golluscio (eds.), Language Contact and Documentation, 205–238. Berlin: De Gruyter Mouton.

Symonds, Matthew R. E. & Simon P. Blomberg. 2014. A Primer on Phylogenetic Generalised Least Squares. In László Zsolt Garamszegi (ed.), Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology, 105–130. Berlin: Springer.

Wichmann, Søren, André Müller, Annkathrin Wett, Viveka Velupillai, Julia Bischoffberger, Cecil H. Brown, Eric W. Holman, Sebastian Sauppe, Zarina Molochieva, Pamela Brown, Harald Hammarström, Oleg Belyaev, Johann-Mattis List, Dik Bakker, Dmitry Egorov, Matthias Urban, Robert Mailhammer, Agustina Carrizo, Matthew S. Dryer, Evgenia Korovina, David Beck, Helen Geyer, Pattie Epps, Anthony Grant & Pilar Valenzuela. 2013. The ASJP Database (Version 16). Available at (accessed 1 Oct 2015).