Directorate and Central Research

Head of Project:
    Cyril Belica <belica@ids-...>
Scientific Assistants:
    Dr. Marc Kupietz <kupietz@ids-...>
    Dr. Harald Lüngen <luengen@ids-...>
    Rainer Perkuhn <perkuhn@ids-...>
Student Assistants:
    Anna Schächtele

DeReWo – Corpus-Based Lemma and Word Form Lists

In this subproject we develop methods for creating frequency-based ranking lists of lemmata and word forms from random virtual corpora. Applying these methods to the Mannheim German Reference Corpus DeReKo yields various lists of lemmata and word forms of German language usage, for example the lemma candidate list with 350,000 entries for elexiko, the online dictionary of contemporary German.
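To illustrate what a frequency-based ranking list is in the simplest case, here is a minimal sketch in Python. It is not the method used in the project: the function name `rank_by_frequency`, the tie-breaking rule, and the sample sentence are assumptions made purely for illustration, and the sketch ignores lemmatization and corpus sampling entirely.

```python
from collections import Counter


def rank_by_frequency(tokens):
    """Return (form, count) pairs, most frequent first.

    Ties are broken alphabetically; this is an arbitrary choice for the sketch.
    """
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical toy "corpus": a single whitespace-tokenized sentence.
    sample = "der Hund und die Katze und der Vogel".split()
    for rank, (form, count) in enumerate(rank_by_frequency(sample), start=1):
        print(rank, form, count)
```

A real list additionally depends on how the corpus is sampled and on how word forms are assigned to lemmata, which is what the documentation of each list describes.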

Current Main Subjects

  • spelling classification
  • paradigmatic classification
  • temporal / regional / text typological and similar differentiation
  • exceptions
  • quality management

DeReWo Lemma and Word Form Lists Currently Available for Download

The Institute for the German Language regularly receives queries about the “most common German words”, often on the assumption that such a request is unambiguous and therefore easy to answer. With the DeReWo lemma and word form lists we try to find a compromise between the fascinating diversity of our linguistic reality and the justified desire for a description of it that is as compact as possible, even if partially simplifying. The general annotations give an overview of the issues that are relevant to creating and using such lists and that we have dealt with. They are included in each archive in the matching version. You can download the current version directly here. In addition, each DeReWo list comes with detailed product-specific documentation whose structure follows that of the general annotations. It is intended to help you understand the particular view of the language that the list takes, the resulting simplifications, and their consequences for interpreting and using the list.



List                                       Number of Entries   Published on

Word Form + Lemma + POS Frequency List                         December 31, 2014
Lemma List                                                     December 31, 2012
Lemma List                                                     December 31, 2011
Lemma List                                                     December 31, 2009
Word Form List                                                 May 12, 2009
Lemma List                                                     December 31, 2007


  • Using the DeReWo lists without consulting the corresponding documentation is scientifically dubious.
  • Referencing or passing on the DeReWo lists without the corresponding documentation is not allowed.
  • Commercial use of the DeReWo lists is prohibited.
  • If you have problems downloading the lists, please proceed as follows:
    • first, download the archive and save it locally
    • then, unpack the archive (usually possible by double-clicking); a new folder will be created
    • start an application (word processor, spreadsheet or the like)
    • load the file that does not have a PDF file extension from the new folder into the application
    • if required, specify the character encoding ISO-8859-15 (if necessary, look it up in the documentation)
    • if this does not lead to the desired results, please send an email to the address listed below
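The encoding step above can also be handled in a script rather than in a word processor or spreadsheet. The following is a minimal sketch in Python, assuming only that the unpacked list is a plain text file with one entry per line; the function name `read_list_lines` and the demo file are hypothetical, and the actual column layout of each list has to be taken from its documentation.

```python
import tempfile
from pathlib import Path


def read_list_lines(path, encoding="iso-8859-15"):
    """Read a text file with the given encoding and return its non-empty lines."""
    text = Path(path).read_text(encoding=encoding)
    return [line for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Hypothetical demo file standing in for an unpacked list.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo_list.txt"
        demo.write_bytes("Straße\nüber\n".encode("iso-8859-15"))
        print(read_list_lines(demo))
```

Passing the encoding explicitly avoids garbled umlauts and "ß" on systems whose default encoding is not ISO-8859-15.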

