Former IDS staff members who contributed to building the corpora: see here
Corpora of Written Language
Current Corpus Archive
Size and Extent
The IDS began constructing electronic text corpora in the mid-1960s. The corpora have grown from about 28 million text words in 1992 to 28 billion text words in 2015 (equivalent to about 70 million book pages, assuming an average of 400 words per page). Many staff members have contributed to creating the largest collection of its kind worldwide. The corpus archive is continually extended, and existing corpus material is revised on an ongoing basis as part of quality management. The results of this work are published regularly through the COSMAS II project (see Release-Chronicle).
Geographic Origin of the DeReKo Newspaper Sources
Unfortunately, a small part of the archived corpora is not accessible from outside the IDS for copyright and licensing reasons. In recent years, this share has been reduced to under 5%. In general, the IDS corpora may be used for scientific, non-commercial purposes only. For more details about the options available for using the IDS corpora, see: Information regarding the availability