We are holding a kick-off meeting to introduce and share a task of categorizing Wikipedia pages in 30 languages (*1) into 219 fine-grained named entity (NE) categories defined by the Extended Named Entity hierarchy. We prepare the training data using already-categorized Japanese Wikipedia data and the language link information. For some languages there is a large amount of training data (100k), and the task is to categorize the remaining Wikipedia pages. We will conduct this project as a “Resource by Collaborative Contribution” project, i.e. your system output will be used to build the resource in a collaborative fashion. Anyone can participate in the task.
At this meeting, we will have an invited talk by Prof. Le-Hong Phuong of VNU University of Science, Hanoi. He led the group that achieved the best performance in the SHINRA2020-ML task.
Data release: Mar 2021
Kick-off meeting & Leaderboard open: May 25, 2021
Result submission deadline: Oct 15, 2021
Evaluation results due back to participants: Nov 15, 2021
Final report & meeting: Dec 2021
*1: The 30 target languages are: English, Spanish, French, German, Chinese, Russian, Portuguese, Italian, Arabic, Indonesian, Turkish, Dutch, Polish, Persian, Swedish, Vietnamese, Korean, Hebrew, Romanian, Norwegian, Czech, Ukrainian, Hindi, Finnish, Hungarian, Danish, Thai, Catalan, Greek, Bulgarian.
|Date and time||2021/05/25 (Tue) 17:00 - 18:30|