Digital Linguistics

Contact:
    <korpuslinguistik@ids-...>
 
Head:
    Dr. Marc Kupietz <kupietz@ids-...>
 
Research staff:
    Cyril Belica <belica@ids-...>
    Dr. Harald Lüngen <luengen@ids-...>
    Rainer Perkuhn <perkuhn@ids-...>
 
Collaborations:
    see here
 
Former IDS staff involved in corpus development:
    see here
 
Student assistants:

  • Theresa Sick
  • Daniel Wachter

Corpora of Written Language

Protocols of Plenary Sessions

 

This collection of corpora has been available since 2013 and contains the plenary protocols of all German parliaments, i.e., of the German Bundestag, the Federal Council of Germany (Bundesrat) and all German state parliaments, from (at least) 2000 until mid-2012.

Germany: The corpus “protocols of plenary sessions” of German parliaments was developed in 2011-2012 in the project “PolMine - Corpus-based Political Research” (Prof. Dr. Andreas Blätte and Silvia Berenz M.A.) at the Junior Professorship of Political Science of the Stiftung Zukunft NRW at the University of Duisburg-Essen. Within PolMine, the protocols were obtained as PDF documents and converted into the PolMine XML format with a conversion programme.

In the IDS project Corpus Development, this corpus was converted into the IDS text model: each parliament corresponds to one DeReKo corpus, each parliamentary term to one document, and each protocol to one text.
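The mapping of parliaments, terms and protocols onto corpus, document and text forms a three-level hierarchy. The Python sketch below models that nesting with dataclasses; the class and method names (`Corpus`, `Document`, `Text`, `add_protocol`) are hypothetical and do not reproduce the actual IDS text model or its file format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Text:
    """One plenary protocol (hypothetical minimal record)."""
    session: int


@dataclass
class Document:
    """One parliamentary term of a parliament."""
    term: int
    texts: list[Text] = field(default_factory=list)


@dataclass
class Corpus:
    """One parliament, identified by its DeReKo corpus sigle."""
    sigle: str
    parliament: str
    documents: list[Document] = field(default_factory=list)

    def add_protocol(self, term: int, session: int) -> None:
        """File a protocol as a text under the document for its term."""
        doc = next((d for d in self.documents if d.term == term), None)
        if doc is None:
            # First protocol of this term: open a new document for it.
            doc = Document(term)
            self.documents.append(doc)
        doc.texts.append(Text(session))

    def count_texts(self) -> int:
        """Number of protocols across all parliamentary terms."""
        return sum(len(d.texts) for d in self.documents)


# Usage with placeholder session numbers:
pbt = Corpus("pbt", "German Bundestag")
pbt.add_protocol(term=14, session=1)
pbt.add_protocol(term=14, session=2)
pbt.add_protocol(term=15, session=1)
print(len(pbt.documents), pbt.count_texts())  # 2 3
```

Three protocols from two terms yield one corpus with two documents holding three texts in total.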

Because the texts were extracted from PDF and converted fully automatically, the quality of word and structure recognition varies, although most texts are of very good quality. The PolMine documentation on quality control gives the average quality value for each parliament and parliamentary term.

DeReKo corpora of German parliaments (source: PolMine project, Data and Analyses; as of 2013-02-02)

sigle  parliament                                  since term  since date  until date      texts  running word forms
pbt    German Bundestag                            14          26.10.1998  about mid 2012  872    51 139 236
pbr    Federal Council of Germany                  --          04.02.2000  about mid 2012  155    3 352 274
pbw    state parliament of Baden-Württemberg       12          11.06.1996  about mid 2012  378    18 730 308
pby    Bavarian state parliament                   14          28.09.1998  about mid 2012  359    15 452 256
pbe    Berlin House of Representatives             14          18.11.1999  about mid 2012  228    12 433 700
pbb    state parliament of Brandenburg             3           29.09.1999  about mid 2012  254    11 826 395
phb    Bremische Bürgerschaft                      15          07.07.1999  about mid 2012  264    11 549 459
phh    Hamburgische Bürgerschaft                   16          08.10.1997  about mid 2012  363    13 532 044
phe    state parliament of Hesse                   15          07.04.1999  about mid 2012  413    19 491 715
pmv    state parliament of Mecklenburg-Vorpommern  3           26.10.1998  about mid 2012  317    16 345 331
pni    state parliament of Lower Saxony            14          09.04.1998  about mid 2012  370    21 798 168
pnw    state parliament of North Rhine-Westphalia  12          01.06.1995  about mid 2012  486    25 443 901
prp    state parliament of Rhineland-Palatinate    13          20.05.1996  about mid 2012  383    14 260 320
psl    state parliament of the Saarland            12          29.09.1999  about mid 2012  172    7 878 814
psn    state parliament of Saxony                  3           13.10.1999  about mid 2012  318    17 920 528
pst    state parliament of Saxony-Anhalt           3           25.05.1998  about mid 2012  267    10 683 645
psh    state parliament of Schleswig-Holstein      14          23.04.1996  about mid 2012  458    19 329 586
pth    state parliament of Thuringia               3           01.10.1999  about mid 2012  322    18 190 222

total parliamentary terms: 65
total number of texts: 6422
total number of running word forms: 309 357 902
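The per-corpus counts in the table are written with spaces as thousands separators, so re-adding them requires a small parsing step. The following Python sketch sums the word-form column and formats the result in the same style; the dictionary and helper names are illustrative and not part of any DeReKo tool.

```python
# Word-form counts per DeReKo corpus, copied from the table of German
# parliaments (as of 2013-02-02), with spaces as thousands separators.
WORD_FORMS = {
    "pbt": "51 139 236", "pbr": "3 352 274", "pbw": "18 730 308",
    "pby": "15 452 256", "pbe": "12 433 700", "pbb": "11 826 395",
    "phb": "11 549 459", "phh": "13 532 044", "phe": "19 491 715",
    "pmv": "16 345 331", "pni": "21 798 168", "pnw": "25 443 901",
    "prp": "14 260 320", "psl": "7 878 814", "psn": "17 920 528",
    "pst": "10 683 645", "psh": "19 329 586", "pth": "18 190 222",
}


def parse_count(value: str) -> int:
    """Turn a count such as "51 139 236" into the integer 51139236."""
    return int(value.replace(" ", ""))


def format_count(n: int) -> str:
    """Format an integer with spaces as thousands separators."""
    return f"{n:,}".replace(",", " ")


total = sum(parse_count(v) for v in WORD_FORMS.values())
print(format_count(total))  # 309 357 902
```

The sum of the 18 corpora reproduces the stated total of 309 357 902 running word forms.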

 

Austria: Since the DeReKo-2014-I release, DeReKo also contains the protocols of the state parliament of Lower Austria (edited by the IDS).

DeReKo corpus of the state parliament of Lower Austria

sigle  parliament                         since term  since date  until date        texts  running word forms
pno    state parliament of Lower Austria  14          07.06.1993  2013 (inclusive)  220    12 786 782

Not contained: sessions 2, 3, 5-9, 13-15 of the 14th parliamentary term; special sessions.

total parliamentary terms: 5
total number of texts: 220
total number of running word forms: 12 786 782
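The list of sessions not contained uses a compact notation. Reading it as comma-separated items, each either a single session number or an inclusive range (an assumption about the notation), it expands as in the following Python sketch; the function name `expand_sessions` is hypothetical.

```python
from __future__ import annotations


def expand_sessions(spec: str) -> list[int]:
    """Expand a list such as "2,3,5-9,13-15" into individual session numbers.

    Assumes comma-separated items, each either a single number or an
    inclusive range written as "start-end".
    """
    sessions: list[int] = []
    for item in spec.split(","):
        item = item.strip()
        if "-" in item:
            start, end = map(int, item.split("-"))
            sessions.extend(range(start, end + 1))  # +1 keeps the end inclusive
        else:
            sessions.append(int(item))
    return sessions


missing = expand_sessions("2,3,5-9,13-15")
print(missing)       # [2, 3, 5, 6, 7, 8, 9, 13, 14, 15]
print(len(missing))  # 10
```

Under that reading, ten regular sessions of the 14th parliamentary term are missing, in addition to the special sessions.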