Comparative Study of Andean Language Families

This webpage presents a comparative linguistic study of the Andean languages, particularly the Quechua and Aymara language families (the latter also known as the Jaqi or Aru family; see the section below on language names and their spelling). This study is part of a larger research project, Quantitative Methods in Language Classification, in the Department of Linguistics at the University of Sheffield, U.K.; for more general information on that project, see its website. The researcher responsible for the study of the Andean languages is Dr Paul Heggarty, author of this Quechua Language and Linguistics website.

Much more information on the origins, history and regional variation of Quechua can be found elsewhere on this website.


Measures of Similarity for Accents, Dialects, Related and Unrelated Languages

The Various Degrees of Similarity/Difference between the Language Varieties Covered in the Study

More specifically, we intend (among other things) to measure, to quantify, just how similar or different various language varieties are relative to each other. Together, the Quechua and Aymara families present a continuum of degrees of similarity/difference between language varieties, from only minimally different regional ‘accents’ of Quechua to entirely different languages which may not even be related to each other. This study will produce quantified comparisons across this continuum, that is, comparisons between pairs of language varieties showing all the various possible degrees of difference. Degree of difference is of course a gradual scale, but in familiar terminology one might distinguish the following four levels, i.e. comparisons between:

• ‘Accents’, or regional variation within the same ‘dialect’: for example between the Quechua spoken in the Cochabamba region and that of Potosí and Uyuni, all forms of the Bolivian or, more broadly, the Cuzco-Collao ‘dialect’.

• Varieties which, while markedly different, are generally still considered ‘dialects’ of the ‘same language’: for example Ecuadoran, Ayacucho and Bolivian Quechua.

• Different languages, though clearly genealogically (‘genetically’) related ones, i.e. from the same family (Quechua or Aymara): for example the Quechua of Cuzco and that of Huancayo.

• Quite different languages, where it is not yet clear whether they share an ultimate common origin within a single family, or whether their similarities are due only to prolonged and intense contact: for example Aymara and Quechua.

Indeed, at this last and highest level, we also aim to explore what quantified comparative data can tell us about the thorny issue of the nature of the relationship between the two families: common origin, or just prolonged contact.

Two Types of Comparison

This study therefore involves two different types of comparison:

1. Between varieties that we are sure share a common origin, i.e.:

(a) on the one hand, a comparison of all the varieties of Quechua amongst themselves;

(b) and on the other hand, a separate comparison of all the varieties of Aymara amongst themselves (i.e. including Jaqaru and Kawki).

2. Between varieties that we are not sure share a common origin, i.e. a comparison of any variety of Quechua against any variety of Aymara.


Which Andean Languages are Covered?

For the full list of fieldwork locations for which data have already been collected, including photos of each area and of some of the speakers of these languages, see my index page of fieldwork locations.

The map and the ‘family tree’ below currently present only the details of the Quechua family. Further details on the Aymara family (much smaller, at least in terms of the surviving varieties for which we have evidence), including their specification on the map and ‘family tree’, will be added to this page in due course.

For Quechua, some twenty varieties will be studied, from Ecuador, Peru, Bolivia and Argentina. The varieties have been selected according to three criteria, namely to offer:

• coverage of all the various degrees of difference between varieties within the Quechua family (accents, dialects, closely related languages) – see above;

• coverage of all the main varieties within all the main branches of the ‘family tree’ of the language (or rather, the family of closely related languages) that is Quechua; for more details, see the map and the ‘family tree’ table below, and the brief note on how different the varieties are from each other;

• the most intensive coverage of the areas considered most significant for a better understanding of the history, origins and development of the Quechua family (and its early contact with Aymara), in particular the areas whose varieties of Quechua are in some senses ‘intermediate’ between the two principal branches of the family: Pacaraos, Yauyos, etc.

Applying the same principles to Aymara, the study will cover:

• at least three forms of southern (or ‘Altiplano’) Aymara, one for each of its principal varieties;

• for central (or ‘Tupino’) Aymara: Jaqaru and, to the extent that it is still possible to obtain reliable data for this all but extinct variety, Kawki.

For more information on Jaqaru and Kawki, particularly an in-depth look at their endangerment and, for Jaqaru, its chances of long-term survival (Kawki is sadly already doomed), see the following article, in Spanish, by Dante Oliva León: Jacaru y Cauqui, al Borde del Silencio (‘Jacaru and Cauqui, on the Brink of Silence’).

We also aim, if possible, to collect data for the Bolivian Andean language Uru‑Chipaya, apparently unrelated to either Quechua or Aymara.


Map of Andean Language Varieties Covered in this Study

[Map (provisional): many of the varieties covered in the study are shown in the yellow boxes. The map is based mostly on Cerrón-Palomino's book Lingüística Quechua (1987); for more details on its sources, see my dialect variation page. The map will eventually be revised, and its colour scheme for dialects matched to that of the Quechua ‘family tree’ table below.]


The Quechua ‘Family Tree’

The tree below is based on the one in the book Lingüística Quechua (Cerrón-Palomino 2003), which appears in turn to have been based on the first two main works on the Quechua family tree, namely Torero (1964) and Parker (1963). These two linguists reached very similar conclusions, apparently independently and at around the same time.

However, it should be noted that this is not the only view of the relationships between Quechua dialects. The Ethnologue classification puts Pacaraos Quechua in the QII, not the QI group, for instance. Indeed, in his doctoral thesis, Landerman (1991) fairly convincingly calls into question even the fundamental distinction between the two main branches of the family tree, QI and QII. Once we have our own results from this comparative study, we hope to be able to contribute significantly to the debate ourselves.

The varieties that we propose to include in the lexical and phonetic comparisons are shown underlined.
Where more than one sub-variety is to be covered, this is indicated by a number in square brackets, e.g. [3].

For where these varieties are spoken, see the dialect map above.

Those varieties for which reliable descriptive grammars exist are shown in italics.
These are the ones I would propose to cover in the morphosyntactic comparisons.

PROTO-QUECHUA
    HUAIHUASH (QI)
        CENTRAL
            Huailay: Huailas; Conchucos
            AP-AM-AH: Alto Pativilca; Alto Marañón; Alto Huallaga
            Huancay: Yaru; Jauja & Huanca; Huangáscar & Topará
        PACARAOS
            Pacaraos
    HUAMPUY (QII)
        YUNGAY (QIIA)
            Central: Laraos; Lincha; Apurí; Chocos; Madeán
            Cañaris & Incahuasi; Cajamarca
        CHINCHAY (QIIB-C)
            Northern: Amazonas; San Martín; Loreto; Ecuador [3] (Sierra and Selva); Colombia
            Southern: Ayacucho; Cuzco, Puno & Bolivia [3]; Argentina


Which Data? How are Our Measures of Similarity Produced?

Data are being collected in order to make a detailed comparison of these varieties in three aspects:

• In their basic lexicon, based on a list of some 300 word-meanings (a preliminary version of our full meaning list is available on this website). We have deliberately chosen our meanings to be as compatible as possible with similar lists already well known and used in studies of language families around the world, including the 100- and 200-word lists first drawn up by Swadesh (1952) and the modified 200-meaning version drawn up by Dyen, Kruskal and Black (1992). We are also adapting these lists as appropriate to the cultural and linguistic context of the Andes (and, to a certain extent, Amazonia). Our list also includes many meanings identified by Lohr (1999) and by Yakhontov (as reported in Starostin 1991: 59-60) as generally resistant to being borrowed from one language into another, as well as many others known to be more susceptible to borrowing. Particular attention will be paid to possible cases of word borrowing, and to how these might be identified by specific techniques for processing and analysing the comparative data, including statistical ones developed initially in genetics.

• In their phonetics, based on the pronunciation of a sample list of some 100 ‘pan-Quechua’ cognates (and a different 100 ‘pan-Aymara’ ones) from among the 200 in the lexical comparison. For details on the method being used to produce quantifications of phonetic similarity, and examples of the results it produces for Romance varieties and a set of Indo-European languages, see Heggarty (2000).

• In certain aspects of their basic inflectional morphosyntax, for both nouns and verbs, which in these highly agglutinating languages principally means their morphology. The method being used to quantify similarity in basic morphosyntax is set out in Paul Heggarty's Ph.D. thesis (a brief abstract and a fuller description are available on this website); the thesis will be made available on this website in 2004, and later published with major revisions in 2004.
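The lexical and phonetic comparisons described above each reduce a pair of varieties to a single similarity figure. The project's own phonetic method is the one set out in Heggarty (2000), and it is not reproduced here. Instead, the following is a minimal sketch under simple assumptions of two much cruder, generic stand-ins. The lexical score is the proportion of shared cognates over a meaning list. The phonetic score is Levenshtein edit distance between transcriptions, normalised by word length. The meaning labels and cognate-class codes used here are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical coding: for each meaning, an arbitrary label naming the
# cognate class of the word a variety uses (None = no data for that meaning).
Coding = Dict[str, Optional[str]]


def shared_cognate_score(a: Coding, b: Coding) -> float:
    """Proportion of meanings, among those attested in both varieties,
    for which the two varieties use cognate words (same class label)."""
    common = [m for m in a if m in b and a[m] is not None and b[m] is not None]
    if not common:
        raise ValueError("no meanings attested in both varieties")
    same = sum(1 for m in common if a[m] == b[m])
    return same / len(common)


def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-symbol insertions, deletions and
    substitutions needed to turn s into t (dynamic programming, row by row)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete cs
                            curr[j - 1] + 1,            # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute or match
        prev = curr
    return prev[-1]


def phonetic_distance(s: str, t: str) -> float:
    """Edit distance divided by the length of the longer form:
    0.0 means identical, 1.0 means no symbol can be kept in place."""
    if not s and not t:
        return 0.0
    return levenshtein(s, t) / max(len(s), len(t))


if __name__ == "__main__":
    # Invented codings for two varieties over four meanings.
    variety_a = {"water": "A", "house": "A", "dog": "B", "sun": None}
    variety_b = {"water": "A", "house": "A", "dog": "C", "sun": "A"}
    print(shared_cognate_score(variety_a, variety_b))  # 2 of 3 shared meanings
    print(phonetic_distance("kitten", "sitting"))       # 3 edits / 7 symbols
```

Note that "sun" is excluded from the lexical score because one variety has no data for it. A plain edit distance treats every sound substitution as equally costly, which is one reason why more phonetically informed measures, such as that of Heggarty (2000), are used in practice.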


Data Sources

The morphological comparison will be made for those varieties for which good descriptive grammars are available. This means at least six Peruvian varieties of Quechua, shown in italics in the family tree table above. The main sources for these comparisons (full bibliographical details and short reviews are given in my bibliography file) are: for Junín-Huanca Quechua, Cerrón-Palomino (1976a) and Weber (1989); for San Martín, Coombs et al. (1976a); for Cuzco-Collao, Cusihuamán (2001a); for Ancash-Huailas, Parker (1976a); for Cajamarca-Cañaris, Quesada C. (1976a); and for Ayacucho, Soto Ruiz (1976a).

For the Aymara family, this study will cover at least one variety of southern Aymara, based on Hardman et al. (1988) and Briggs (1993); and, for central Aymara, the Jaqaru language, based on Hardman (1966, 1983, 2000).

The comparisons in basic lexicon and phonetics will be made for all the varieties in the study, using data collected on fieldwork trips to villages where each of the varieties is spoken, and reference to the main existing dictionaries and phonological descriptions for the varieties (where available).

Our definitive data and final results of our comparisons and quantifications will be posted on this website around September 2004.


Data Processing and Analysis

Once the data have been collected, during early 2004 we will be producing quantifications of similarity between the language varieties in each of the fields mentioned here, using the techniques set out in Heggarty (2000). We shall then process these data using various ‘family tree-drawing’ algorithms originally devised for similar uses in biology, particularly genetics, such as Network by Bandelt et al. (1995) and Phylip by Felsenstein (2001). For more details on how we use these techniques, and on what they can bring to the analysis of linguistic data in problematic cases such as those of the Andes, see the full list of our research group's articles, oral papers and abstracts on this website.
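The step from a table of pairwise similarity figures to a ‘family tree’ can be illustrated with a much simpler algorithm than those used in the project. The following is a minimal sketch of UPGMA (average-linkage clustering), one of the distance-based methods offered by Phylip's Neighbor program. It is not a reimplementation of either Network or Phylip. The variety names (A to D) and distance values in the example are invented, and the function names `upgma` and `to_newick` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple, Union

# A tree is either a leaf (variety name) or a pair of subtrees joined at a node.
Tree = Union[str, Tuple["Tree", "Tree"]]


def upgma(names: List[str], dist: Dict[FrozenSet[str], float]) -> Tree:
    """Build a rooted tree by UPGMA: repeatedly merge the two closest
    clusters; the distance from the merged cluster to any other cluster is
    the size-weighted average of the distances from its two parts."""
    if not names:
        raise ValueError("need at least one variety")
    # Each cluster id maps to (subtree, number of leaves in it).
    clusters: Dict[int, Tuple[Tree, int]] = {i: (n, 1) for i, n in enumerate(names)}
    d: Dict[FrozenSet[int], float] = {
        frozenset((i, j)): dist[frozenset((names[i], names[j]))]
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of current clusters
        i, j = tuple(pair)
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        del d[pair]
        # Replace distances to i and j by distances to the merged cluster.
        for k in clusters:
            dik = d.pop(frozenset((i, k)))
            djk = d.pop(frozenset((j, k)))
            d[frozenset((next_id, k))] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = ((ti, tj), ni + nj)
        next_id += 1
    return next(iter(clusters.values()))[0]


def to_newick(t: Tree) -> str:
    """Render a tree in Newick bracket notation (branch lengths omitted)."""
    if isinstance(t, str):
        return t
    return f"({to_newick(t[0])},{to_newick(t[1])})"


if __name__ == "__main__":
    names = ["A", "B", "C", "D"]
    pairs = {("A", "B"): 0.10, ("C", "D"): 0.20, ("A", "C"): 0.50,
             ("A", "D"): 0.60, ("B", "C"): 0.55, ("B", "D"): 0.65}
    dist = {frozenset(p): v for p, v in pairs.items()}
    # Groups A with B and C with D; the order of sister branches may vary.
    print(to_newick(upgma(names, dist)) + ";")
```

UPGMA assumes a roughly constant rate of change along all branches, and it always forces the data into a strict tree. That is one reason why methods such as neighbour-joining or networks are often preferred for real data, especially where borrowing and contact make the signal less tree-like.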


Project Timetable and Progress

In early March 2004 I returned from my main period of fieldwork in the Andes, during which I collected all my phonetic and lexical data. During March and April 2004 I will be processing and analysing these data to see what can be learned from them. I hope they will contribute to the debate on these significant questions in Andean comparative and historical linguistics:

• whether or not the Quechua and Aymara families are ultimately related

• the classification of Quechua dialects, and of Aymara dialects

• the most plausible range of dates for the initial divergence of the Quechua language family, and of the Aymara language family

• the most plausible location of the original Quechua ‘homeland’

During April and May 2004 I will then write up my results and conclusions in a major article due to be published by September 2004. I will also be making available as much of my data as possible on this website, as and when it is processed and I have time to present it appropriately on webpages. This too should be completed in time for publication of my article in September 2004.

Eventually further sections will be added to this webpage including a discussion of previous estimates and quantifications of the degree of diversity within the Quechua and Aymara families, and of the proposed pan-Andean orthography which will be used as the common reference orthography for the data to be posted on this site.


Language Names

On these pages, each language name is written in the form most appropriate to the orthography of the language of the text in which it appears. Thus on the Spanish version of this page the spellings used are quechua, aimara, jacaru and cauqui, even though each language's own orthography gives qhichwa, aymara, jaqaru and kawki. The somewhat anarchic orthography of English, meanwhile, generally accepts spellings as in the original language unless an established English form already exists; hence the spellings used here are Quechua, Aymara, Jaqaru and Kawki.

The family termed Aymara by Rodolfo Cerrón-Palomino (in his Spanish spelling, aimara) is also known as Jaqi or Aru by other linguists. This family includes not only the language best known by the name Aymara or Aymará (for Cerrón-Palomino, more specifically southern or Altiplano Aymara), but also central or Tupino Aymara, that is, the varieties Jaqaru and Kawki, spoken in a few mountain villages in the district of Tupe, province of Yauyos, in the department of Lima, Peru. The terminology followed on these pages is that of Cerrón-Palomino.


References

Bandelt, H.-J., P. Forster, B. C. Sykes & M. B. Richards (1995) Mitochondrial portraits of human populations using median networks
in: Genetics 141: 743-753

Dyen, Isidore, Joseph B. Kruskal & Paul Black (1992) An Indoeuropean classification: a lexicostatistical experiment
in: Transactions of the American Philosophical Society 82
data available at: www.ldc.upenn.edu

Embleton, Sheila M. (1986)
Brockmeyer: Bochum

Felsenstein, J. (2001) PHYLIP: Phylogeny Inference Package, Version 3.6
Department of Genetics, University of Washington

Forster, Peter & Alfred Toth (2003) Toward a phylogenetic chronology of ancient Gaulish, Celtic, and Indo-European
in: Proceedings of the National Academy of Sciences 100(15): 9079-9084

Heggarty, P. A. (2000) Quantifying Change Over Time in Phonetics
in: Renfrew, C., A. McMahon & L. Trask (Eds): Time-Depth in Historical Linguistics, vol. 2: 531-562
McDonald Institute for Archaeological Research: Cambridge

Lohr, Marisa (1999) Methods for the Genetic Classification of Languages
Unpublished PhD thesis, University of Cambridge

Starostin, Sergei A. (1991) Altaiskaia problema i proiskhozhdenie iaponskogo iazyka
Nauka, Glavnaia redaktsiia vostochnoi literatury: Moscow

Swadesh, Morris (1952) Lexico-statistical dating of prehistoric ethnic contacts: With special reference to North American Indians and Eskimos
in: Proceedings of the American Philosophical Society 96: 452-463

