Spanish Learner Corpus Research
Current trends and future perspectives
Margarita Alonso-Ramos, Universidade da Coruña
doi: 10.1075/scl.78
ISBN: 978 90 272 6624 8 (ebook)
2016 John Benjamins B.V.
No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.
John Benjamins Publishing Company https://benjamins.com
Amsterdam/Philadelphia
Table of contents
Spanish learner corpus research: Achievements and challenges
Margarita Alonso-Ramos
What is missing in learner corpus design?
Yukio Tono
Learner Spanish on computer: The CAES Corpus de Aprendices de Español project
Guillermo Rojo and Ignacio Palacios
PoS-tagging a Spanish oral learner corpus: Criteria, procedure, and a sample analysis
Leonardo Campillos Llanos
The LANGSNAP longitudinal learner corpus: Design and use
Nicole Tracy-Ventura, Rosamond Mitchell and Kevin McManus
The Aprescrilov corpus, or broadening the horizon of Spanish language learning in Flanders
Kris Buyse, Lydia Fernández Pereda and Katrien Verveckken
Spanish Corpus Proficiency Level Training Website and Corpus: An open-source, online resource for corpus linguistics studies
Dale Koike and Jennifer Witte
Factors that can have an impact on the processes of perceiving Spanish/L2
Ana Blanco Canales
Pragmatic principles in anaphora resolution at the syntax-discourse interface: Advanced English learners of Spanish in the CEDEL2 corpus
Cristóbal Lozano
Discourse markers in CEDEL2 and SPLLOC corpora of learner Spanish: Analysis of some lexical-pragmatic failures
Nancy Vázquez Veiga
A corpus study of Spanish as a foreign language learners' collocation production
Orsolya Vincze, Marcos García-Salido, Ana Orol and Margarita Alonso-Ramos
Section 1 Introduction
Chapter 1 Spanish learner corpus research: Achievements and challenges
Margarita Alonso-Ramos, Universidade da Coruña
This chapter presents a state-of-the-art overview of Spanish learner corpus research (SLCR). It starts by emphasizing the uniqueness of a monograph focusing on research dealing with learners of a language other than English. The next section is concerned with the status of Spanish as a foreign language in the world, and as a pluricentric language made up of a set of American and Spanish varieties. After that, the main features of learner corpus design and analysis in SLCR are reviewed. Besides providing an overview of the main Spanish learner corpora, this chapter directs attention towards some of the challenges that this field of research will have to face. The last section briefly reviews the contributions to this volume.
Keywords
- Spanish learner corpora
- Spanish as a foreign language
- pluricentric Spanish
- Contrastive Interlanguage Analysis
- Computer-aided Error Analysis
- learning resources
Introduction
It is usual to mention that learner corpus research (LCR) is a young field, as most researchers place its beginnings in the late 1980s.
Over the last few years, publications of learner corpus research on languages other than English have proliferated. Without intending to be exhaustive, several projects should be mentioned: the Falko corpus (
The status of Spanish as a Foreign Language
More than 21 million Spanish learners have been reported, a figure produced by adding together the number of students of Spanish in 106 countries where Spanish is not an official language. Although the data are approximate, the Instituto Cervantes estimates that the actual demand for Spanish is at least 25% higher than this figure. The countries with the largest number of students of Spanish are the United States and Brazil, with more than 7 million and 6 million learners, respectively. The number of registrations in the Instituto Cervantes centers increased thirteenfold between 1993 and 2014. Another indication of the status of Spanish as a foreign language is the increasing number of students taking the Diploma de Español como Lengua Extranjera (DELE, Diploma of Spanish as a Foreign Language), an official multi-level qualification awarded by the Spanish Ministry of Education that has attracted around 60,000 candidates in recent years.
Spanish is spoken as a first language in 21 countries in several geographic varieties. This variability can pose the problem of which variety of Spanish should be chosen as the target language (see
Learner corpus design and analysis in SLCR: Features and problems
Spanish learner corpora have the same biases as English learner corpora. Following the major categories proposed for the design of learner corpora, the main features of Spanish learner corpora can be summarized in the following way:
language-related criteria:
- mode: many more written than spoken or multimodal corpora
- genre: letters and essays are the most common
- style: narration, argumentation
- topic: generally related to the learners' personal knowledge, such as leisure, family, work. As yet, there is no Spanish academic learner corpus comparable to CALE
task-related criteria:
- data collection: many more cross-sectional than longitudinal corpora
- data elicitation: most are written compositions based on a topic proposed by the teacher/researcher
- use of references: most of the time this is not indicated
- time limitations: most of the time none. An exception is LANGSNAP, where learners were given 15 minutes to complete the writing task
learner-related criteria:
- age: mostly young adults
- motivation and attitude: voluntary participation, a positive attitude towards Spanish
- learning context: Spanish as a Foreign Language in a school/university context
- L1 background: English is the most common L1 in Spanish learner corpora
- L2 proficiency: Common European Framework of Reference for Languages (CEFR, Council of Europe 2011) levels are usually indicated, but the proficiency level is not always well specified
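The three groups of design criteria listed above can be thought of as the fields of a per-text metadata record. The following is only a minimal sketch of that idea in Python: the class name `LearnerTextMetadata`, its field names, the helper `distribution`, and the three sample records are all hypothetical and invented for illustration; they do not describe the metadata scheme of any actual Spanish learner corpus.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LearnerTextMetadata:
    """Hypothetical metadata record for one text in a learner corpus."""
    # Language-related criteria
    mode: str                        # "written", "spoken" or "multimodal"
    genre: str                       # e.g. "essay", "letter"
    style: str                       # e.g. "narration", "argumentation"
    topic: str
    # Task-related criteria
    collection: str                  # "cross-sectional" or "longitudinal"
    references_used: Optional[bool]  # None when not indicated
    time_limit_min: Optional[int]    # None when there was no time limit
    # Learner-related criteria
    age: Optional[int]
    learning_context: str
    l1: str
    cefr_level: Optional[str]        # None when proficiency is unspecified


def distribution(records, field_name):
    """Count how often each value of one metadata field occurs."""
    return Counter(getattr(record, field_name) for record in records)


# Invented sample records, used only to exercise the sketch.
SAMPLE = [
    LearnerTextMetadata("written", "essay", "argumentation", "leisure",
                        "cross-sectional", None, None, 21,
                        "university", "English", "B1"),
    LearnerTextMetadata("written", "letter", "narration", "family",
                        "cross-sectional", None, None, 19,
                        "university", "English", None),
    LearnerTextMetadata("spoken", "interview", "narration", "work",
                        "longitudinal", False, 15, 22,
                        "university", "French", "B2"),
]

if __name__ == "__main__":
    for name in ("mode", "collection", "l1", "cefr_level"):
        print(name, dict(distribution(SAMPLE, name)))
```

Recording "not indicated" explicitly as `None`, rather than leaving a field out, makes it possible to count how often a criterion such as proficiency level or use of references is missing from the documentation.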
Language proficiency, which is not always optimally identified in English LCR, is also an issue in SLCR (see below).
In order to analyze the data, two methods are usually employed in SLCR, as they are in LCR in general: Contrastive Interlanguage Analysis (CIA) and Computer-aided Error Analysis (CEA).
The CEA approach is not very frequently used in SLCR (see, however, Campillos 2014). As has been pointed out, corpus data must be interpreted in order to be useful: the correct version against which a learner utterance is evaluated is simply a necessary methodological step in identifying an error. We also chose to include the target hypothesis in our annotation of collocations, even though at times it was not totally clear (for example, the erroneous collocation in the context lograr un gol could be interpreted as meter un gol 'score a goal (in sport)' or as lograr un objetivo 'achieve an aim').