The TrustML Young Scientist Seminars (TrustML YSS) began on January 28, 2022.
The TrustML YSS is a video series featuring young scientists who present their research and discoveries related to Trustworthy Machine Learning.
For more information please see the following site.
This network is funded by a RIKEN-AIP subsidy and by JST ACT-X Grant Number JPMJAX21AF, Japan.
【The 29th Seminar】
Date and Time: September 2nd, 10:00 am – 11:00 am (JST)
Venue: Zoom webinar
Speaker: Kawin Ethayarajh (Stanford University)
Title: Understanding Dataset Difficulty with V-Usable Information
Estimating the difficulty of a dataset typically involves comparing state-of-the-art models to humans; the bigger the performance gap, the harder the dataset is said to be. However, this comparison provides little understanding of how difficult each instance in a given distribution is, or what attributes make the dataset difficult for a given model. To address these questions, we frame dataset difficulty — w.r.t. a model — as the lack of V-usable information (Xu et al., 2019), where a lower value indicates a more difficult dataset for the model family V. Our framework allows for many types of comparisons under the same umbrella: not only can we compare different model families, but also different datasets, different slices of the same dataset, different instances in a distribution, and different input attributes. We apply our framework to discover annotation artefacts in widely-used NLP benchmarks, such as SNLI and CoLA.
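As a rough illustration of the quantity named in the abstract, and not the speaker's implementation: V-usable information is commonly written as I_V(X → Y) = H_V(Y) − H_V(Y | X), and it can be estimated on held-out data by averaging a per-instance score, log2 g[x](y) − log2 g'[∅](y), where g is a model from V trained with the inputs and g' is one trained with empty inputs. The sketch below assumes those two models already exist and takes only the probabilities they assign to the gold labels. The function names and the probability values are hypothetical.

```python
import math


def pvi(p_with_input: float, p_null: float) -> float:
    """Per-instance score in bits: log2 g[x](y) - log2 g'[null](y).

    p_with_input: probability of the gold label from the model that saw the input.
    p_null: probability of the gold label from the model trained on empty inputs.
    """
    return math.log2(p_with_input) - math.log2(p_null)


def v_information(pairs):
    """Average the per-instance scores to estimate I_V(X -> Y) on held-out data."""
    return sum(pvi(p, q) for p, q in pairs) / len(pairs)


# Hypothetical (p_with_input, p_null) pairs for three held-out instances.
examples = [(0.9, 0.5), (0.6, 0.5), (0.2, 0.5)]

scores = [pvi(p, q) for p, q in examples]
estimate = v_information(examples)

for (p, q), s in zip(examples, scores):
    print(f"p_with_input={p:.2f} p_null={q:.2f} score={s:+.3f} bits")
print(f"estimated V-usable information: {estimate:+.3f} bits")
```

Under this reading, a lower average means the dataset is harder for the model family, and a negative per-instance score flags an instance where seeing the input made the gold label less likely than the label prior alone.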
All participants are required to agree to the AIP Seminar Series Code of Conduct.
Please see the URL below.
RIKEN AIP will expect adherence to this code throughout the event. We expect cooperation from all participants to help ensure a safe environment for everybody.
Date: September 2, 2022 (Fri), 10:00 - 11:00