Abstract
The DiGreC (DIachrony of GREek Case) treebank is a corpus of selected sentences from Greek texts, ranging from Homer to Modern Greek, which have been annotated morphosyntactically and semantically. The corpus comprises excerpts from 655 texts, for a total of 3385 sentences and 56,440 word tokens; automated tagging and lemmatisation have been supplemented with manual review to ensure accuracy. The data exist in XML and CSV formats, which can be manipulated and converted automatically to other schemata. A website has also been created to allow users to interact with the data more easily, and to provide specialised functionality for searching and visualisation. This corpus was created to inform theoretical debates regarding the role of case in grammar, and may be of use to researchers searching for specific attestations of a range of different constructions in Greek.
Original language | English |
---|---|
Pages (from-to) | 1-12 |
Number of pages | 12 |
Journal | Research Data Journal for the Humanities and Social Sciences |
Volume | 6 |
Issue number | 1 |
Early online date | 6 Dec 2021 |
DOIs | |
Publication status | Published online - 6 Dec 2021 |
Bibliographical note
Funding Information: The funding for this work was provided by the Arts & Humanities Research Council, grant AH/P006612/1.
Funding Information: The DiGreC (DIachrony of GREek Case) treebank has been created as part of the project “Investigating Variation and Change: Case in Diachrony”, funded by the Arts & Humanities Research Council (AH/P006612/1). The goal of this project has been to use the Greek language, which furnishes a large quantity of linguistic data over an unusually long span of time, to investigate syntactic phenomena and to provide a clearer picture of the Greek case system and its changes over time; such a picture has the potential to inform theoretical discussions on the nature of linguistic case. We have chosen to make the data used in this project available to the public in the form of a morphosyntactically and semantically annotated treebank. This article describes the features of this treebank, as well as the data selection principles and methodology involved in its construction.
Publisher Copyright:
© Morgan Macleod et al., 2021.
Keywords
- Greek
- Classics
- Semantics
- Linguistics
- Syntax
- Corpus
Datasets
- DiGreC (Diachrony of Greek Case) treebank
Anagnostopoulou, E. (Creator), Macleod, M. (Creator), Mertyris, D. (Creator) & Sevdali, C. (Creator), Ulster University, 16 Nov 2020
DOI: 10.21251/59fd3210-83fe-4d1c-8d18-f2cd1168ccd6, http://cid.ulster.ac.uk