Directorate and Central Research

Contact:
    <korpuslinguistik@ids-...>
 
Head:
    Dr. Marc Kupietz <kupietz@ids-...>
 
Research staff:
    Cyril Belica <belica@ids-...>
    Dr. Harald Lüngen <luengen@ids-...>
    Rainer Perkuhn <perkuhn@ids-...>
 
Cooperations:
    see here
 
Former IDS staff involved in corpus building:
    see here
 
Student assistants:

  • Anna Konovalova
  • Theresa Sick

 

 

Corpora of Written Language


Availability

By far the largest part of DeReKo can be searched and analysed with COSMAS II free of charge for non-commercial purposes. Due to copyright regulations and contractual agreements with rights holders, however, we are permitted to offer only a few sub-corpora.

For more information see FAQ: "Are there terms and conditions that allow exceptions?"

With Licence Agreement

If you sign a licence agreement, the IDS is permitted to give you free access to the following corpora of written language for scientific use:

If you are interested, please send an email to Mrs. Petra Brecht (brecht@ids-mannheim.de).

Download Server

In addition, the following corpora are available for download, each under a CC BY-SA licence:

  • Corpora of Speeches and Interviews (rei)
  • Wikipedia Corpus
      • 2011 version, prepared in collaboration with the EuroGr@mm project [1]
      • 2013 and 2015 versions, prepared in collaboration with the Programme Area Research Infrastructures [2]

Year  WP-Sub-Corpus               I5                       WikiXML                                TreeTagger Standoff
2011  Article                     wpd11.xces.bz2           -/-                                    -/-
2011  Article Discussions         wdd11.xces.bz2           -/-                                    -/-
2013  Article                     wpd13.i5.xml.bz2         dewikixml-20130728-articles.tar.gz     wpd13.tt.xml.bz2
2013  Article Discussions         wdd13.i5.xml.bz2         dewikixml-20130728-discussions.tar.gz  wdd13.tt.xml.bz2
2013  Article Sample              wpd13_sample.i5.xml.bz2  -/-                                    -/-
2013  Article Discussions Sample  wdd13_sample.i5.xml.bz2  -/-                                    -/-
2015  Article                     wpd15.i5.xml.bz2         wpd15.wikixml.tar.gz                   wpd15.tt.xml.bz2
2015  Article Discussions         wdd15.i5.xml.bz2         wdd15.wikixml.tar.gz                   wdd15.tt.xml.bz2
2015  User Discussions            wud15.i5.xml.bz2         wud15.wikixml.tar.gz                   wud15.tt.xml.bz2
2015  Article Sample              wpd15_sample.i5.xml.bz2  -/-                                    -/-
2015  Article Discussions Sample  wdd15_sample.i5.xml.bz2  -/-                                    -/-
2015  User Discussions Sample     wud15_sample.i5.xml.bz2  -/-                                    -/-
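
The I5 and TreeTagger files listed above are bz2-compressed XML, and the full corpora can be large, so it may be preferable to stream them rather than decompress them to disk first. The following is a minimal sketch using only the Python standard library. It assumes that a file has already been downloaded to a local path and that it parses as standalone well-formed XML (files that rely on an external DTD or on custom entities may need additional handling). The function name `count_elements` and the `limit` parameter are illustrative and not part of any IDS tool.

```python
import bz2
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(path, limit=None):
    """Stream a bz2-compressed XML file and count how often each tag occurs.

    path  -- local path to a downloaded file (assumed, e.g. a *.i5.xml.bz2 file)
    limit -- optional number of closed elements after which to stop (hypothetical
             convenience parameter for taking a quick look at a large file)
    """
    counts = Counter()
    # bz2.open decompresses on the fly; iterparse reads the stream incrementally.
    with bz2.open(path, "rb") as stream:
        events = ET.iterparse(stream, events=("end",))
        for seen, (_event, elem) in enumerate(events, start=1):
            counts[elem.tag] += 1
            # Drop the element's text and children to keep memory use low.
            elem.clear()
            if limit is not None and seen >= limit:
                break
    return counts
```

Calling `count_elements("wpd15_sample.i5.xml.bz2", limit=100000)` on a downloaded sample file would give a rough overview of the element inventory without loading the whole document into memory.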



Language   Article                          Article Discussions
French     frwiki-20130904-articles.i5.bz2  frwiki-20130904-discussions.i5.bz2
Hungarian  huwiki-20140503-articles.i5.bz2  huwiki-20140503-discussions.i5.bz2
Norwegian  nowiki-20140512-articles.i5.bz2  nowiki-20140512-discussions.i5.bz2
Italian    itwiki-20130508-articles.i5.bz2  itwiki-20130508-discussions.i5.bz2
Polish     plwiki-20140503-articles.i5.bz2  plwiki-20140503-discussions.i5.bz2



Language   Article                                  Article Discussions                   User Discussions
English    enwiki-20150808-article.i5.utf8.xml.bz2  enwiki-20150808-talk.i5.utf8.xml.bz2  enwiki-20150808-user-talk.i5.utf8.xml.bz2
French     frwiki-20150808-article.i5.utf8.xml.bz2  frwiki-20150808-talk.i5.utf8.xml.bz2  frwiki-20150808-user-talk.i5.utf8.xml.bz2
Hungarian  huwiki-20150807-article.i5.utf8.xml.bz2  huwiki-20150807-talk.i5.utf8.xml.bz2  huwiki-20150807-user-talk.i5.utf8.xml.bz2
Norwegian  nowiki-20150807-article.i5.utf8.xml.bz2  nowiki-20150807-talk.i5.utf8.xml.bz2  nowiki-20150807-user-talk.i5.utf8.xml.bz2
Spanish    eswiki-20150808-article.i5.utf8.xml.bz2  eswiki-20150808-talk.i5.utf8.xml.bz2  eswiki-20150808-user-talk.i5.utf8.xml.bz2
Croatian   hrwiki-20150807-article.i5.utf8.xml.bz2  hrwiki-20150807-talk.i5.utf8.xml.bz2  hrwiki-20150807-user-talk.i5.utf8.xml.bz2
Italian    itwiki-20150808-article.i5.utf8.xml.bz2  itwiki-20150808-talk.i5.utf8.xml.bz2  itwiki-20150808-user-talk.i5.utf8.xml.bz2
Polish     plwiki-20150808-article.i5.utf8.xml.bz2  plwiki-20150808-talk.i5.utf8.xml.bz2  plwiki-20150808-user-talk.i5.utf8.xml.bz2
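
The 2015 file names above follow a regular pattern: a language code followed by `wiki`, the dump date, and the sub-corpus type (`article`, `talk`, or `user-talk`). When handling many downloads it can help to recover these fields from the name. The sketch below is inferred purely from the names listed on this page and is not an official naming specification; `DumpFile`, `parse_dump_name`, and `SUB_CORPUS_LABELS` are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple, Optional


class DumpFile(NamedTuple):
    language: str    # language code, e.g. "hr"
    dump_date: str   # dump date as YYYYMMDD, e.g. "20150807"
    sub_corpus: str  # "article", "talk", or "user-talk"


# Column headings used on this page for each file-name suffix.
SUB_CORPUS_LABELS = {
    "article": "Article",
    "talk": "Article Discussions",
    "user-talk": "User Discussions",
}

# Pattern inferred from the 2015 file names listed above (an assumption).
_NAME = re.compile(
    r"^(?P<lang>[a-z]+)wiki-(?P<date>\d{8})-"
    r"(?P<kind>article|talk|user-talk)\.i5\.utf8\.xml\.bz2$"
)


def parse_dump_name(name: str) -> Optional[DumpFile]:
    """Split a 2015-style file name into its parts, or return None if it differs."""
    match = _NAME.match(name)
    if match is None:
        return None
    return DumpFile(match["lang"], match["date"], match["kind"])
```

The older 2013 and 2014 files use a different scheme (for example `frwiki-20130904-articles.i5.bz2`), so `parse_dump_name` returns `None` for them instead of guessing.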

 

Literature

[1] Noah Bubenhofer, Stefanie Haupt, Horst Schwinn (2011): A Comparable Corpus of the Wikipedia: From Wiki Syntax to POS Tagged XML. Hamburg Working Paper in Multilingualism, 96 B

[2] Eliza Margaretha, Harald Lüngen (2014): Building linguistic corpora from Wikipedia articles and discussions. In: Journal for Language Technology and Computational Linguistics (JLCL) 2/2014

Tools

see Overview

                               
