Dr. Lydia-Mai Ho-Dac (Université de Toulouse 2)

will give a talk on

The WikiDisc corpus: In the backstage of Wikipedia

Tuesday, 18 June 2019, 10:00 a.m., IDS lecture hall


Wikipedia is a popular and extremely useful resource for studies in both linguistics and natural language processing. This presentation introduces a language resource based on the online discussion pages of the French Wikipedia: the WikiDisc corpus. The corpus includes 439,638 talk pages. Each talk page works as a kind of discussion forum attached to an article, where contributors may discuss, interact, and sometimes negotiate, thereby collaboratively improving the article. In total, the corpus comprises more than 210 million words, structured in more than 3 million posts and more than 1 million threads (thematic sections). This talk will describe how the WikiDisc corpus was built and what it contains. The corpus is publicly available at https://www.ortolang.fr/market/corpora/wikidisc