A Comparative Study of Andean Languages


Research Questions, Methods and Data




Research Questions:  Looking Into the Pre-History of the Andes

Our Methodological Approach:  Measuring Language Similarity

Which Data?  Lexis, Phonetics and Morphosyntax

Methods:  Producing and Analysing Measures of Similarity

References and Bibliography


Our study involves two different types of comparison, addressing two different types of research question. 


Firstly, we aim to provide additional data for questions of the internal classification of each of the main two language families of the Andes, Quechua and Aymara.  This involves making comparisons between language varieties which we are already sure share a common origin, so we have two questions to address here:

   a comparison of all the varieties of Quechua amongst themselves;  

   a separate comparison of all the varieties of Aymara amongst themselves (i.e. including Jaqaru/Kawki).


Secondly, we aim to look into what quantified comparative data can tell us about the thorny issue of the nature of the relationship between the two language families of Quechua and Aymara:  the famous Quechumara question of whether they are ultimately related ‘genealogically’, or whether the striking parallels between them go back only to convergence through intense and prolonged contact throughout their histories.  This involves making comparisons between language varieties that we are not sure share a common origin in the first place:  comparisons of any variety of Quechua against any variety of Aymara.


While these are essentially linguistic issues, the language data are of particular importance well beyond linguistics, since we have no written history of the Andes before the Spanish conquest in the 1530s.  A better understanding of the relationships within and between the Quechua and Aymara language families is potentially of great value in helping to reconstruct the pre-history of the peoples of the Andes.  In our articles we specifically address issues of interest outside linguistics, in archaeology and genetics:

   What are the most plausible ranges of dates for when each family first began to break up?

   In what stages through history did they expand to reach the regions where they are now spoken?

   Where are the most plausible locations for the original Quechua and Aymara ‘homelands’?



Our Methodological Approach:  Measuring Language Similarity


This comparative study of the Andean languages is part of a larger research project, Quantitative Methods in Language Classification.  As its title suggests, our approach to such issues of classification is to measure the similarity and relationships between languages:  that is, we seek to quantify just how similar or how different various language varieties are relative to each other.

The Quechua and Aymara families together present a continuum of degrees of difference/similarity between language varieties, from certain only minimally different regional ‘accents’ of Quechua, to entirely different languages which may not even be related to each other.  We produce quantified comparisons over various spans of this continuum – that is, comparisons between pairs of language varieties showing all the various possible degrees of difference.  Naturally this is a gradual scale of degree of difference, though in the familiar terminology one might talk of the following four distinct levels, i.e. we make comparisons between:

   ‘Accents’, or regional variation within the same ‘dialect’:  for example between the Quechua spoken in the Cochabamba region and that of Sucre, both forms of the Bolivian, or indeed the wider Cuzco-Collao, ‘dialect’.

   Varieties which, while markedly different, are generally still considered ‘dialects’ belonging to the ‘same language’:  for example Ayacucho, Bolivian, and the even more different Ecuadoran Quechua.

   Different languages, though clearly genealogically (‘genetically’) related ones, i.e. from the same family (Quechua or Aymara):  for example the mutually unintelligible Quechua of Cuzco, and that of Huancayo.

   Quite different languages, where it is not yet clear whether they share an ultimate common origin within a single family or not, or show similarities only due to prolonged and deep contact:  Aymara and Quechua.


Which Data?  Lexis, Phonetics and Morphosyntax

Data have been collected in order to make a detailed comparison of these varieties in two aspects: 

   In their basic lexicon, based on a list of 150 word-meanings adapted to the cultural and linguistic context of the Andes (and to a certain extent also Amazonia):  see our meaning list.  We have selected our list of meanings with various criteria in mind.  There is considerable overlap with similar lists already well known and used in studies on various language families around the world, including the 100- and 200-word lists first drawn up by Swadesh (1952), and the modified 200-meaning version drawn up by Dyen, Kruskal and Black (1992).  The meanings also include many which have been identified by Lohr (1999) and Yakhontov (as reported in Starostin (1991:59-60)) as those that appear to be generally resistant to being borrowed from one language to another, as well as many other meanings known, on the contrary, to be more susceptible to borrowing.  We use these particular characteristics of our list to focus on the issue of possible cases of word borrowing, and on how borrowing might be identified by specific techniques for processing and analysing the comparative data, including statistical ones developed initially in genetics.


   In their phonetics, based on the pronunciation of a sample list of some 100 ‘pan-Quechua’ cognates (and a different 100 ‘pan-Aymara’ ones), many of which overlap with cognates found for the 150 meanings in the lexical comparison. 


A further possibility, for which we have developed a method but have not yet collected data for the Andean languages, is to measure their similarity in certain aspects of their basic inflectional morphosyntax, which in these highly agglutinating languages principally means their morphology.


Methods:  Producing and Analysing Measures of Similarity

All the methods we use to produce quantifications of similarity in these three fields of language (lexis, phonetics and basic morphosyntax) are set out in full in the book Measured Language, Heggarty (in preparation), to be published by Blackwell in late 2005.  (This is a full revision and expansion based on Heggarty’s Ph.D. thesis:  click on these links for either a brief abstract or a fuller description).

The method for similarity in lexis, and its specific application to the Andean languages using the data in our study, are due to appear in January 2005 in Heggarty (forthcoming).
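
As a rough illustration only of what a quantified lexical comparison can look like, the sketch below computes the plain proportion of meanings for which two varieties use words of the same cognate class, as in traditional lexicostatistics.  This is not the method of Heggarty (forthcoming), which is not described on this page.  The function name, the variety names and the cognate-class codes are all hypothetical, invented for the example.

```python
def lexical_similarity(variety_a, variety_b):
    """Proportion of jointly attested meanings that share a cognate class.

    Each variety is a dict mapping a meaning to a cognate-class code,
    or to None where no word was recorded for that meaning.
    """
    shared = [
        meaning
        for meaning in variety_a
        if variety_a[meaning] is not None and variety_b.get(meaning) is not None
    ]
    if not shared:
        raise ValueError("no meanings attested in both varieties")
    matches = sum(1 for meaning in shared if variety_a[meaning] == variety_b[meaning])
    return matches / len(shared)


# Hypothetical toy data: integers stand for cognate classes, not real word forms.
variety_x = {"water": 1, "sun": 1, "stone": 2, "dog": 1}
variety_y = {"water": 1, "sun": 2, "stone": 2, "dog": None}

score = lexical_similarity(variety_x, variety_y)
print(f"lexical similarity: {score:.3f}")
```

Meanings missing from either variety are skipped rather than counted as differences, which is one possible design choice among several.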

Details on the method we use to produce quantifications of phonetic similarity, and examples of the results it produces for Romance varieties and a set of Indo-European languages, have already been published in Heggarty (2000) and will also appear in McMahon, Heggarty, McMahon & Slaska (in press).
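
To give a concrete sense of what a number for ‘phonetic similarity’ between two pronunciations of a cognate might be, the sketch below uses a deliberately simple stand-in:  a normalised edit distance over sequences of segments, averaged over a list of cognate pairs.  This is a substitute technique chosen for brevity, not the method published in Heggarty (2000), and it treats every pair of different sounds as equally different.  All function names and segment strings here are hypothetical.

```python
def edit_distance(source, target):
    """Levenshtein distance between two sequences of segments."""
    previous = list(range(len(target) + 1))
    for i, seg_a in enumerate(source, start=1):
        current = [i]
        for j, seg_b in enumerate(target, start=1):
            cost = 0 if seg_a == seg_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def segment_similarity(source, target):
    """Similarity in [0, 1]: 1 minus the length-normalised edit distance."""
    longest = max(len(source), len(target))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(source, target) / longest


def mean_similarity(cognate_pairs):
    """Average similarity over a list of (form_a, form_b) cognate pairs."""
    scores = [segment_similarity(a, b) for a, b in cognate_pairs]
    return sum(scores) / len(scores)


# Hypothetical segment sequences, invented for illustration only.
pairs = [
    (["p", "a", "t", "a"], ["p", "a", "t", "a"]),
    (["k", "u", "n"], ["k", "o", "n"]),
]
overall = mean_similarity(pairs)
print(f"mean similarity: {overall:.3f}")
```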


Having produced our quantifications of language similarity, stage two in our research approach is to process these figures using various ‘family tree‑drawing’ programmes, initially devised for similar uses in biology, particularly genetics.  These include Phylip by Felsenstein (2001), Network by Bandelt et al. (1995), and especially the very recent NeighbourNet by Bryant & Moulton (2002).  These are explained in McMahon & McMahon (in preparation).
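
The hand-over between the two stages can be sketched as follows, under two assumptions of ours that the text above does not state:  that similarities are scaled between 0 and 1, and that they are converted to distances as 1 minus similarity.  The sketch writes the result in the square distance-matrix layout read by Phylip's distance-based programs, as we understand that format (a first line giving the number of taxa, then one line per taxon with its name in the first ten characters followed by the distances).  The function name and the variety labels are hypothetical.

```python
def to_phylip_distances(names, similarity):
    """Format a square similarity matrix as a PHYLIP-style distance matrix.

    Assumes similarities lie in [0, 1] and that distance = 1 - similarity.
    """
    n = len(names)
    if len(similarity) != n or any(len(row) != n for row in similarity):
        raise ValueError("similarity matrix must be square and match the names")
    lines = [f"{n:5d}"]
    for name, row in zip(names, similarity):
        if len(name) > 10:
            raise ValueError(f"name longer than 10 characters: {name!r}")
        distances = " ".join(f"{1.0 - value:.4f}" for value in row)
        # The name is padded to exactly ten characters before the distances.
        lines.append(f"{name:<10}{distances}")
    return "\n".join(lines) + "\n"


# Hypothetical similarity scores between three unnamed varieties.
names = ["VarietyA", "VarietyB", "VarietyC"]
similarity = [
    [1.00, 0.75, 0.50],
    [0.75, 1.00, 0.50],
    [0.50, 0.50, 1.00],
]
matrix_text = to_phylip_distances(names, similarity)
print(matrix_text)
```

The resulting text could then be saved to a file and passed to whichever tree-drawing or network programme is being used, subject to that programme's own input requirements.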

Our first publications specifically on our results for the Andean languages, starting with the lexical data, will appear in January 2005 in Heggarty (forthcoming), which shows how we make use of these combined techniques to bring new insights to the analysis of linguistic data in problematic cases such as those of the Andes.

In the meantime, a full list of the papers already published by our research group can be found by clicking here.


