Guest lecture by Robbie Love (Lancaster University, UK)

I hereby invite all interested staff members and guests of the Institut für Deutsche Sprache. Robbie Love, Lancaster University, UK, will give a talk entitled: The Spoken BNC2014: project overview and methodological issues.
Time: Thursday, 3 December 2015, 10:15 a.m.
Venue: Institut für Deutsche Sprache, Mannheim, Room 128

The Centre for Corpus Approaches to Social Science (CASS) at Lancaster University and Cambridge University Press (CUP) are collaborating on a new corpus of spoken British English, known as the Spoken British National Corpus 2014 (Spoken BNC2014). This will be the first publicly accessible corpus of its kind since the spoken component of the original British National Corpus (Leech 1993), henceforth Spoken BNC1994. In this talk I will describe the methodology employed to collect recordings from project participants, and outline the successes and difficulties we have experienced in using this method since the project began. It should be noted that the Spoken BNC1994 consisted of two parts: demographic recordings of spontaneous natural conversations, and context-governed recordings made at specific types of meetings and events (see e.g. Aston & Burnard 1997: 31 for further discussion of these). The data collection and methodology outlined here relate only to the collection of demographic data.

Following this, I will discuss two methodological issues relating mainly to the transcription of the corpus recordings. The first is the development of a new transcription scheme: we could in theory have reused, unedited, the same scheme as Spoken BNC1994 (Crowdy 1994); however, we argue that this scheme is insufficiently detailed to minimise ambiguity in transcription, as too much is left to the discretion of individual transcribers. The second issue is a new one: speaker identification, that is, the accuracy with which the transcriber identifies the speaker who produced each transcribed utterance, as opposed to the accuracy of the transcribed linguistic content itself. I will briefly summarise the findings of an investigation into this topic and their potential ramifications for spoken corpora in general.

References:

Aston, G., & Burnard, L. (1997). The BNC handbook: Exploring the BNC with SARA. Edinburgh: Edinburgh University Press.

Crowdy, S. (1994). Spoken Corpus Transcription. Literary and Linguistic Computing, 9(1), 25-28.

Leech, G. (1993). 100 million words of English. English Today, 9-15. doi:10.1017/S0266078400006854