Hanne Martine Eckhoff, Silvia Luraghi & Marco Passarotti
University of Oxford | University of Pavia | Catholic University of the Sacred Heart, Milan
Ancient languages and digital sources
Over the last few decades, the widespread diffusion of digital technology and the growing ease of transferring information via the Internet have made an enormous amount of textual data available to scholars. The vastly increased availability of primary sources has radically changed the everyday life of scholars in the humanities, who are now able to access, query and process a wealth of empirical evidence in ways not possible before.
This development also encompasses ancient languages. The first aim in the eighties and the nineties was to digitize textual data and make them available on CD-ROM and online. Later, the need for linguistic annotation gave rise to projects aimed at building corpora enhanced with increasingly complex layers of metalinguistic information, such as part-of-speech (PoS) tagging and syntactic annotation, opening the field to precise queries for particular linguistic phenomena. We are now at a stage where several of these syntactically annotated corpora, or treebanks, have reached a mature state, providing representative selections of texts for several diachronic stages of a given language. These new resources allow for a new approach to diachronic studies of syntactic phenomena where scholars previously had to content themselves with empirical work on a much smaller scale.
This volume brings together a set of papers that report research on various diachronic matters supported by evidence from diachronic treebanks for different languages, i.e., treebanks that provide data for a language across several historical stages. We show that diachronic treebanks offer considerable methodological advances in terms of greater transparency and better ways of exploiting frequently problematic source material, thus allowing us to shed new light on vexed questions.
What is a treebank?
In linguistics and philology, the term corpus has traditionally been used simply to denote a set of texts used to explore some linguistic phenomenon. Many types of digital text resources are now also referred to as corpora. A much stricter definition has also been proposed: "a collection of pieces of language text in electronic form, selected according to external criteria to represent, as far as possible, a language or language variety as a source of data for linguistic research". However, not even the strictest definitions include linguistic annotation of any kind among the criteria. There is thus a great deal of variation both in the amount of work that has gone into building and processing corpora and in the usefulness of the resource for linguists researching particular phenomena in a given corpus. A corpus may be anything from a digitized, machine-readable text collection that only allows queries for text strings, to a sophisticated, multi-layered text resource with several types of linguistic markup, queryable by a dedicated query engine. In this volume, we concern ourselves with one of the most labor-intensive corpus types of all: the treebank.
A treebank is a text corpus with exhaustive syntactic annotation, typically applied on top of lemmatization, PoS tagging and morphological annotation. Each of these annotation layers adds to the precision of queries. Lemmatization allows for queries for all word forms subsumed under a single lemma, eliminating the need to use regular expressions. Part-of-speech and morphological tags allow for queries for specific combinations of linguistic features at the word level, without having to refer to the word form. Syntactic tagging makes it possible to search for groups of words that are syntactically related, regardless of whether they are adjacent to each other or not. Since syntactic queries are mostly multi-word queries, and are typically combined with features from other layers, they can quickly become quite complex and require either a good query engine or that users master a query language. However, given such facilities, a treebank allows queries of great precision: if the annotation is good enough, it is possible to make queries almost entirely free of noise in terms of false positives and false negatives. For example, in a given language one may find all infinitives with preverbal pronominal direct objects in a single query.
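The layered queries described above can be sketched in a few lines of code. The following is a minimal illustration, not a real query engine: the CoNLL-U-style token fields, the sample sentence, its tags, and the function name are all invented for the example, and real treebanks use far richer tagsets and dedicated query languages.

```python
# Sketch of how annotation layers (lemma, PoS, morphology, syntax) combine
# into one precise query: infinitives with a preverbal pronominal object.

from typing import NamedTuple

class Token(NamedTuple):
    idx: int      # 1-based position in the sentence (word order layer)
    form: str     # surface word form
    lemma: str    # lemmatization layer
    pos: str      # part-of-speech layer
    feats: str    # morphological layer, e.g. "VerbForm=Inf"
    head: int     # syntactic layer: index of the governing word (0 = root)
    rel: str      # dependency relation to the head

# Hypothetical hand-annotated sentence: "She wants to see him."
sentence = [
    Token(1, "She",   "she",  "PRON", "Case=Nom",     2, "nsubj"),
    Token(2, "wants", "want", "VERB", "Tense=Pres",   0, "root"),
    Token(3, "to",    "to",   "PART", "_",            4, "mark"),
    Token(4, "see",   "see",  "VERB", "VerbForm=Inf", 2, "xcomp"),
    Token(5, "him",   "he",   "PRON", "Case=Acc",     4, "obj"),
]

def infinitives_with_preverbal_pron_obj(sent):
    """Find infinitives whose direct object is a pronoun preceding them."""
    hits = []
    for t in sent:
        if t.pos == "VERB" and "VerbForm=Inf" in t.feats:
            for dep in sent:
                if (dep.head == t.idx and dep.rel == "obj"
                        and dep.pos == "PRON" and dep.idx < t.idx):
                    hits.append((dep.form, t.form))
    return hits

# In this English sample the pronoun follows the verb, so nothing matches;
# on, say, Latin or Old French data the same query would return all hits.
print(infinitives_with_preverbal_pron_obj(sentence))  # -> []
```

Note how each condition draws on a different annotation layer: the PoS tag, the morphological features, the dependency relation, and the word-order index. Without the syntactic layer, the verb–object pairing could only be approximated by adjacency heuristics.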
Although some treebanks are annotated in accordance with the formalism of a particular syntactic framework, most strive to be relatively theory-neutral. There are two major groups of annotation schemes: phrase-structure-based schemes and dependency-based schemes. The first major treebank to be released, the Penn Treebank (1989–1996), uses a phrase-structure scheme, whereas many more recent treebanks are dependency-based; the Universal Dependencies initiative has developed a universal consensus-based scheme and works to convert as many treebanks as possible into it.
The two main treebank styles are based on two different syntactic notions, both of which clearly have some psychological reality. Phrase-structure treebanks are based on the idea that words are organized into groups (constituents) with certain properties; for example, an entire constituent can be substituted by a pro-word and will normally move together. Dependency treebanks, on the other hand, are based on the idea that every word in a sentence has one and only one syntactic head. As a brief illustration of the differences between these two main treebank styles, consider the two syntactic trees in Figures 1 and 2 below. The tree in Figure 3 presents the same dependency analysis in linear order.
Figure 1. A Penn-style phrase structure tree
Figure 2. A Prague dependency treebank tree
Figure 3. A Prague dependency treebank tree in linear order
Here we see that the phrase-structure analysis in the Penn-style tree is fairly flat, which brings the two analyses closer than they might have been if the Penn scheme had been binary-branching. The most striking difference in these examples is that the Penn analysis cannot have crossing branches, and therefore it deals with split coordination (the topic of Taylor and Pintzuk's paper) with a trace (the *ICH*-1). The index of the trace is then picked up again in the CONJP-1, the second part of the coordination, which is represented in its linear place in the sentence. In the dependency analysis, the fact that the coordination is split is not represented at all and can only be retrieved by combining the dependency analysis with word order information stored in a different layer (visualized in Figure 3). However, this analysis is computationally simpler, since every node in the tree corresponds to a lexical item.
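The interplay of the two layers just described can be made concrete with a small sketch. Each word points to exactly one head, word order is kept as a separate layer (the positional indices), and split coordination surfaces as crossing arcs only when the two layers are combined. The sentence, its Prague-style-like analysis, and the function name below are invented for illustration; the root is treated as a virtual leftmost node so that root attachments participate in crossing detection.

```python
# Detect crossing dependency arcs (non-projectivity), the dependency-side
# counterpart of what a Penn-style tree must encode with a trace.

def crossing_arcs(heads):
    """Return pairs of arcs that cross in linear order.

    `heads` maps each 1-based word position to the position of its head;
    0 marks the root, represented here as a virtual node at position 0.
    """
    arcs = sorted((min(i, h), max(i, h)) for i, h in heads.items())
    crossings = []
    for a in arcs:
        for b in arcs:
            # Two arcs cross iff exactly one endpoint of b lies strictly
            # inside a's span: a[0] < b[0] < a[1] < b[1].
            if a < b and a[0] < b[0] < a[1] < b[1]:
                crossings.append((a, b))
    return crossings

# Hypothetical split coordination "Men came yesterday and women":
# 1 Men -> 4 and (coordination member), 2 came -> root,
# 3 yesterday -> 2 came, 4 and -> 2 came, 5 women -> 4 and.
split_coord = {1: 4, 2: 0, 3: 2, 4: 2, 5: 4}
print(crossing_arcs(split_coord))  # -> [((0, 2), (1, 4))]

# A simple projective sentence ("She sees him") has no crossings.
print(crossing_arcs({1: 2, 2: 0, 3: 2}))  # -> []
```

The single crossing found in the first example shows why the split is invisible in the pure dependency tree: it emerges only once the head assignments are projected onto the linear order, exactly as in the comparison of Figures 2 and 3.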
Historical corpora and treebanks
Historical linguistics necessarily relies on corpora. This observation is captured by the German term Korpussprachen ('corpus languages') for historical languages. Indeed, with most historical languages, all we have is a more or less extended corpus of written texts. This constitutes a limitation (one cannot tell whether something absent from the corpus is missing because it is ungrammatical or by mere accident), but it also enables linguists working on these languages to base their assumptions on all attested forms. Extended corpora, even if finite, often exceed the linguist's ability to check all occurrences: for this reason, the introduction of digitized corpora has been a welcome addition to historical linguistics, as it has been to research on spoken language. Parsed corpora have the further advantage of providing information at various levels of linguistic analysis through metadata. Among these, treebanks have become an increasingly useful resource for the data-driven study of linguistic structures at various levels.