A Comparative Study of Andean Languages


Research Questions, Methods and Data




Research Questions:  Looking Into the Pre-History of the Andes

Our Methodological Approach:  Measuring Language Similarity

Which Data?  Lexis, Phonetics and Morphosyntax

Methods:  Producing and Analysing Measures of Similarity

References and Bibliography







Research Questions:  Looking Into the Pre-History of the Andes

Our study involves two different types of comparison, addressing two different types of research question. 


Firstly, we aim to provide additional data on the internal classification of each of the two main language families of the Andes, Quechua and Aymara.  This involves making comparisons between language varieties which we are already sure share a common origin, so we have two questions to address here:

   a comparison of all the varieties of Quechua amongst themselves;  

   a separate comparison of all varieties of Aymara amongst themselves (i.e. including Jaqaru/Kawki).


Secondly, we aim to look into what quantified comparative data can tell us about the thorny issue of the nature of the relationship between the two language families of Quechua and Aymara:  the famous Quechumara question of whether they are ultimately related ‘genealogically’, or whether the striking parallels between them go back only to convergence through intense and prolonged contact throughout their histories.  This involves making comparisons between language varieties that we are not sure share a common origin in the first place, namely comparisons of any variety of Quechua against any variety of Aymara.


While these issues are essentially linguistic, we have no written history of the Andes before the Spanish conquest in the 1530s, so these language data are of particular importance well beyond linguistics.  A better understanding of the relationships within and between the Quechua and Aymara language families is potentially of great value in helping reconstruct the pre-history of the peoples of the Andes.  In our articles we specifically address issues of interest outside linguistics, in archaeology and genetics:

   What are the most plausible ranges of dates for when each family first began to break up?

   In what stages through history did they expand to reach the regions where they are now spoken?

   Where are the most plausible locations for the original Quechua and Aymara ‘homelands’?





Our Methodological Approach:  Measuring Language Similarity


This comparative study of the Andean languages is part of a larger research project, Quantitative Methods in Language Classification, and as that title suggests, our approach to such issues of classification is to measure the similarity and relationships between languages.  That is, we seek to quantify just how similar or how different various language varieties are relative to each other. 

The Quechua and Aymara families together present a continuum of degrees of difference/similarity between language varieties, from certain only minimally different regional ‘accents’ of Quechua, to entirely different languages which may not even be related to each other.  We produce quantified comparisons over various spans of this continuum – that is, comparisons between pairs of language varieties showing all the various possible degrees of difference.  Naturally this is a gradual scale of degree of difference, though in the familiar terminology one might talk of the following four distinct levels, i.e. we make comparisons between:

   ‘Accents’, or regional variation within the same ‘dialect’:  for example between the Quechua spoken in the Cochabamba region and that of Sucre, both forms of the Bolivian, or indeed the wider Cuzco-Collao, ‘dialect’.

   Varieties which, while markedly different, are generally still considered ‘dialects’ belonging to the ‘same language’:  for example Ayacucho, Bolivian, and the even more different Ecuadoran Quechua.

   Different languages, though clearly genealogically (‘genetically’) related ones, i.e. from the same family (Quechua or Aymara):  for example the mutually unintelligible Quechua of Cuzco and that of Huancayo.

   Quite different languages, where it is not yet clear whether they share an ultimate common origin within a single family or not, or show similarities only due to prolonged and deep contact:  Aymara and Quechua.




Which Data?  Lexis, Phonetics and Morphosyntax

Data have been collected in order to make a detailed comparison of these varieties in two aspects: 

   In their basic lexicon, based on a list of 150 word-meanings adapted to the cultural and linguistic context of the Andes (and to a certain extent also Amazonia):  click here to see our meaning list.  We have selected our list of meanings with various criteria in mind.  There is considerable overlap with similar lists already well known and used in studies on various language families around the world, including the 100- and 200-word lists first drawn up by Swadesh (1952), and the modified 200-meaning version drawn up by Dyen, Kruskal and Black (1992).  The meanings also include many which have been identified by Lohr (1999) and Yakhontov (as reported in Starostin (1991:59-60)) as those that appear to be generally resistant to being borrowed from one language to another, as well as many other meanings known, on the contrary, to be more susceptible to borrowing.  We use particular characteristics of our lists to focus on the issue of possible cases of word borrowing, and how these might be identified by specific techniques for processing and analysing the comparative data, including statistical ones developed initially in genetics.


   In their phonetics, based on the pronunciation of a sample list of some 100 ‘pan-Quechua’ cognates (and a different 100 ‘pan-Aymara’ ones), many of which overlap with cognates found for the 150 meanings in the lexical comparison. 
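
To make the idea of a quantified lexical comparison over a meaning list more concrete, the sketch below computes one very simple measure: the proportion of shared meanings for which two varieties have forms in the same cognate class. This is a generic lexicostatistical illustration only, and is not the method used in this study. The variety names, meaning labels and cognate-class codes are hypothetical placeholders, not real data on any Andean variety.

```python
from itertools import combinations

# Hypothetical toy data: for each variety, meaning -> cognate-class code.
# These are placeholders for illustration, not real cognate judgements.
TOY_COGNATE_CLASSES = {
    "variety_A": {"meaning_1": "c1", "meaning_2": "c1", "meaning_3": "c1", "meaning_4": "c1"},
    "variety_B": {"meaning_1": "c1", "meaning_2": "c1", "meaning_3": "c2", "meaning_4": "c1"},
    "variety_C": {"meaning_1": "c2", "meaning_2": "c1", "meaning_3": "c3"},
}


def shared_cognate_proportion(a, b):
    """Proportion of meanings attested in both varieties whose forms
    fall in the same cognate class."""
    common = set(a) & set(b)
    if not common:
        raise ValueError("the two varieties have no meanings in common")
    matches = sum(1 for meaning in common if a[meaning] == b[meaning])
    return matches / len(common)


def pairwise_similarity(data):
    """Similarity score for every unordered pair of varieties."""
    return {
        (x, y): shared_cognate_proportion(data[x], data[y])
        for x, y in combinations(sorted(data), 2)
    }


if __name__ == "__main__":
    for pair, score in pairwise_similarity(TOY_COGNATE_CLASSES).items():
        print(pair, round(score, 3))
```

Meanings missing from either variety are simply left out of that pair's comparison, which is one of several possible ways of handling gaps in the data.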


A further possibility for which we have also developed a method, though not yet collected data for the Andean languages, is to measure their similarity in certain aspects of their basic inflectional morphosyntax, which in these highly agglutinating languages principally means their morphology. 




Methods:  Producing and Analysing Measures of Similarity

All the methods we use to produce quantifications of similarity in these three fields of language (lexis, phonetics and basic morphosyntax) are set out in full in the book Measured Language, Heggarty (in preparation), to be published by Blackwell in late 2005.  (This is a full revision and expansion based on Heggarty’s Ph.D. thesis:  click on these links for either a brief abstract or a fuller description). 

The method for similarity in lexis, and its specific application to the Andean languages using the data in our study, are due to appear in January 2005 in Heggarty (forthcoming).

Details on the method we use to produce quantifications of phonetic similarity, and examples of the results it produces for Romance varieties and a set of Indo-European languages, have already been published in Heggarty (2000) and will also appear in McMahon, Heggarty, McMahon & Slaska (in press).


Having produced our quantifications of language similarity, stage two in our research approach is to process these figures using various ‘family tree‑drawing’ programmes, initially devised for similar uses in biology, particularly genetics.  These include PHYLIP by Felsenstein (2001), Network by Bandelt et al. (1995), and especially the very recent NeighborNet by Bryant & Moulton (2002).  These are explained in McMahon & McMahon (in preparation).
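
As an illustration of how similarity figures might be handed on to such programmes, the sketch below converts pairwise similarity scores into distances and writes them out as a square distance matrix in the plain-text layout read by PHYLIP's distance-based programmes (a first line giving the number of taxa, then one row per taxon with a 10-character label). The conversion distance = 1 - similarity, the function name and the toy values are all assumptions made for this example, not part of the procedure described here.

```python
def to_phylip_distance_matrix(names, similarity):
    """Return a square distance matrix as PHYLIP-style text.

    names: list of variety labels, in the desired row order.
    similarity: dict mapping frozenset({a, b}) -> score in [0, 1].
    Assumption for this sketch: distance = 1 - similarity.
    """
    # PHYLIP labels occupy the first 10 characters of each row.
    labels = [name[:10].ljust(10) for name in names]
    if len(set(labels)) != len(labels):
        raise ValueError("labels are not unique after truncation to 10 characters")

    lines = [f"{len(names):5d}"]
    for a, label in zip(names, labels):
        row = []
        for b in names:
            distance = 0.0 if a == b else 1.0 - similarity[frozenset((a, b))]
            row.append(f"{distance:.4f}")
        lines.append(label + " " + " ".join(row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical similarity score, for illustration only.
    toy = {frozenset(("variety_A", "variety_B")): 0.75}
    print(to_phylip_distance_matrix(["variety_A", "variety_B"], toy), end="")
```

Keying the scores by `frozenset` means a pair can be looked up in either order, so the matrix comes out symmetric without storing each value twice.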

Our first publication specifically on our results for the Andean languages, starting with the lexical data, will appear in January 2005 in Heggarty (forthcoming).  This shows how we make use of these combined techniques to bring new insights to the analysis of linguistic data in problematic cases such as those of the Andes.

In the meantime, a full list of the papers already published by our research group can be found by clicking here.




References and Bibliography

Any work cited on these webpages that forms part of our main online bibliography for the Andean languages appears as a clickable link that takes you to the full bibliographical entry for it on our bibliography webpage.  (We plan to replace this system later with a frames version so that the entry appears in a window on the page you clicked from.)  The references given below are for other more general linguistics works we cite that are not in our Andean bibliography.



Bandelt, H-J., P. Forster, B. C. Sykes & M. B. Richards (1995)  Mitochondrial portraits of human populations using median networks
 in: Genetics - 141: 743-753


Bryant, David & V. Moulton (2002)  NeighborNet:  an agglomerative method for the construction of planar phylogenetic networks
 in: Proceedings of the Workshop in Algorithms for Bioinformatics
programme can be downloaded data available at: 


Dyen, Isidore & Joseph B. Kruskal & Paul Black (1992)  An Indoeuropean classification: a lexicostatistical experiment 
in: Transactions of the American Philosophical Society - 82

data available at: www.ldc.upenn.edu


Embleton, Sheila M. (1986) 
  Brockmeyer: Bochum


Felsenstein, J. (2001)  PHYLIP: Phylogeny Inference Package. Version 3.6
 Department of Genetics, University of Washington: 


Forster, Peter & Alfred Toth (2003)  Toward a phylogenetic chronology of ancient Gaulish, Celtic, and Indo-European
 in: Proceedings of the National Academy of Sciences - 100:15: 9079-9084 


Heggarty, P.A. (2000)  Quantifying Change Over Time in Phonetics
 in: Renfrew, C., A. McMahon & L. Trask (Eds): Time-Depth in Historical Linguistics - 2: 531-562
 McDonald Institute for Archaeological Research: Cambridge


Heggarty, Paul A.  (forthcoming) 
Enigmas en los orígenes de los idiomas andinos:  aplicando nuevos métodos a las preguntas aún no resueltas
Revista Andina, 40
Cuzco, Peru:  Centro Bartolomé de las Casas


Heggarty, Paul A.  (in preparation) 
Measured Language:  From First Principles to New Techniques for Putting Numbers on Language Similarity
Oxford:  Blackwell


McMahon, April & Robert McMahon (in preparation)  Language Classification by Numbers
Oxford:  Oxford University Press


McMahon, April, Paul Heggarty, Robert McMahon & Natalia Slaska (2005) 
Swadesh sublists and the benefits of borrowing:  an Andean case study
in:  McMahon, April (ed.):  Quantitative Methods in Language Comparison
Transactions of the Philological Society, 103.2   Oxford:  Blackwell


Lohr, Marisa (1999)  Methods for the Genetic Classification of Languages
 Unpublished PhD thesis, University of Cambridge


Starostin, Sergei A. (1991)   Altaiskaia problema i proiskhozhdenie iaponskogo iazyka
  Nauka, Glavnaia redaktsiia vostochnoi literatury:  Moscow


Swadesh, Morris (1952)  Lexico-statistical dating of prehistoric ethnic contacts: With special reference to North American Indians and Eskimos.
 in: Proceedings of the American Philosophical Society - 96: 452-463 



