Enigmas in the Origins of the Andean Languages: 
 Applying New Techniques to the Unanswered Questions




This summary is intended for specialists in (historical) linguistics. Readers with a more general interest in the Andean languages, particularly their history and that of their speakers, should consult the separate summary oriented more to those interests.


*   *   *   *   *


This article outlines a new method for measuring and comparing how similar languages are in their lexical semantics, developed over the course of a three-year multidisciplinary research project into Quantitative Methods in Language Classification.  As a test case, we apply this method to our own extensive new data set on twenty varieties of Aymara and Quechua, to investigate fundamental and still unresolved issues in the historical linguistics of these, the two main surviving language families of the Andes.

We approach these questions also with the aid of phylogenetic analysis programmes drawn from the biological sciences. Recent years have seen growing interest in how such analyses might usefully be applied to investigate the family trees not just of species and populations, but also of languages. Much of the work in this new synthesis between genetics and historical linguistics has been undermined, however, by serious reservations about the ‘encodings’ used to convert language data into a format suitable as input to phylogenetic analyses: it remains open to question whether they can really be considered meaningful measurements and representations of the relationships between real languages.

Hence the need for a radical departure from lexicostatistics, the traditional method most widely used for this encoding. We put forward a new method specifically designed to go beyond the many idealisations inherent in the all-or-nothing approach of lexicostatistics (not least its insistence on ‘one meaning, one lexeme’) in order to model relationships in lexical semantics at a greater level of detail and sensitivity. Furthermore, we attach particular importance to a thorny but crucial issue in lexical comparison, one that has all too often been brushed aside: we propose an explicit and novel approach to the question of how to distinguish cognacy from borrowing as alternative explanations for correlations observed between languages.

We obtain a matrix of measurements of how similar all of the languages covered are to each other.  This matrix contains within it what are often complex signals of the relationships between those languages.  We therefore use the latest phylogenetic analysis programmes, particularly NeighborNet, to synthesise those signals and to represent them graphically, in order to help us interpret what our quantifications of similarity really mean for the key questions about the history and divergence of the languages concerned.
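The contrast between an all-or-nothing score and a graded one, and the step from pairwise scores to a matrix, can be illustrated with a minimal Python sketch. It is not the method developed in the article, whose details this summary does not give: the graded score here is a plain Jaccard overlap chosen only for illustration, the variety and meaning names are hypothetical, and the lexeme labels "A" to "E" are invented placeholders rather than real Aymara or Quechua forms.

```python
from itertools import combinations

# Hypothetical toy data: for each variety, each meaning maps to the set of
# lexeme labels recorded for it. All names and labels are placeholders.
TOY_DATA = {
    "variety_1": {"meaning_1": {"A"}, "meaning_2": {"C", "D"}},
    "variety_2": {"meaning_1": {"A", "B"}, "meaning_2": {"C"}},
    "variety_3": {"meaning_1": {"B"}, "meaning_2": {"E"}},
}


def binary_score(x, y):
    """All-or-nothing score: 1.0 if the two sets share any lexeme, else 0.0."""
    return 1.0 if x & y else 0.0


def graded_score(x, y):
    """Graded score (illustrative assumption): Jaccard overlap of the sets."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def similarity(data, a, b, score):
    """Mean per-meaning score over the meanings recorded for both varieties."""
    shared = sorted(set(data[a]) & set(data[b]))
    if not shared:
        return 0.0
    return sum(score(data[a][m], data[b][m]) for m in shared) / len(shared)


def similarity_matrix(data, score):
    """Symmetric matrix of pairwise similarities, keyed by variety pair."""
    matrix = {(v, v): 1.0 for v in data}
    for a, b in combinations(sorted(data), 2):
        matrix[(a, b)] = matrix[(b, a)] = similarity(data, a, b, score)
    return matrix


def to_distances(matrix):
    """Convert similarities in [0, 1] to distances as 1 - similarity."""
    return {pair: 1.0 - value for pair, value in matrix.items()}


if __name__ == "__main__":
    graded = similarity_matrix(TOY_DATA, graded_score)
    for pair, distance in sorted(to_distances(graded).items()):
        print(pair, round(distance, 3))
```

On the toy data, the binary score treats variety_1 and variety_2 as identical, while the graded score registers that they only partly overlap in each meaning. A distance matrix of this general kind is the sort of input that distance-based network methods such as NeighborNet take.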

All these new tools can be expected to be of broad methodological interest to historical and comparative linguists, and to illustrate their potential we present a case study in which we apply them to a number of issues of precisely the types most commonly disputed in research into the history and classification of language families.  Together, the Aymara and Quechua families provide an ideal test-bed:  specialists have still not come to a consensus even on whether the two are ultimately related, nor on equally fundamental questions about the internal classification of the Quechua family.

Now that new methods are available, the time is ripe for recruiting them to contribute to a definitive resolution of these questions. However, our more sensitive quantification method requires data sets in which comparisons in lexical semantics are taken to a level of detail beyond that of existing databases and dictionary resources for the Andean languages. We therefore also present our own major new comparative database, collected in fieldwork in Ecuador, Peru and Bolivia, and downloadable from our website.

Our results could hardly be of greater import for the historical linguistics of the Andean languages, in that they constitute powerful new data on the two most fundamental outstanding issues, and weigh strongly in favour of one particular resolution to each. On the long-running ‘Quechumara’ debate, our novel approach to teasing apart the signals of common origin and of intense contact, which have for so long muddied the waters, yields results that clearly back the now majority stance that the two language families do not demonstrably stem from a common source.

As for the internal structure of the Quechua family, recent work has convincingly challenged the mostly morphological and phonological criteria on which the traditional classification of Quechua dialects has long been based. Our new lexical data fully support that challenge, and indeed take it further still, by offering strong evidence that even the putative highest-order split between Quechua I (Central) and II (North/South) branches is a misleading idealisation. We illustrate, then, how the results from our new analysis methods can in some scenarios emerge as clearly incompatible with a discrete two-way branching at a given stage in the history of a family (in this case, the early history of Quechua), and argue instead for a dialect continuum.

The Andean languages allow us to illustrate also how our methods can offer new data and insights to help resolve other, more specific conundrums in historical linguistics, of a range of different types. We close, for instance, with a discussion of what the methods we propose can and cannot reliably tell us about the vexed question of dating language splits.



