Speaker: Kai Ming Ting (Professor at Federation University Australia, Australia)
Title: Lowest Probability Mass Neighbour Algorithms: Breaking loose from the metric constraint in distance-based neighbourhood algorithms
The use of distance metrics such as the Euclidean or Manhattan distance for nearest neighbour algorithms allows for interpretation as a geometric model, and it has been widely assumed that the metric axioms are a necessary condition for all data mining tasks. We show that this assumption is in fact an impediment to producing effective nearest neighbour models.
We propose to use mass-based dissimilarity, which employs estimates of the probability mass to measure dissimilarity, to replace the distance metric. This substitution effectively converts nearest neighbour (NN) algorithms into Lowest Probability Mass Neighbour (LMN) algorithms. Both types of algorithms employ exactly the same algorithmic procedures, except for the substitution of the dissimilarity measure. We show that LMN algorithms overcome key shortcomings of NN algorithms in classification and clustering tasks.
Unlike existing generalised data independent metrics (e.g., quasi-metric, meta-metric, semi-metric, peri-metric) and data dependent metrics, the proposed mass-based dissimilarity is unique because its self-dissimilarity is data dependent and non-constant.
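The idea of replacing a geometric distance with an estimate of probability mass can be sketched with a toy 1-D implementation. This is a minimal illustration under stated assumptions, not the authors' code: it assumes an iForest-style random-partition estimator of the mass of the smallest region covering two points (as in the KDD2016 paper), and all names (`build_tree`, `lowest_common_mass`, `mass_dissimilarity`, the two-cluster sample) are hypothetical.

```python
import random

def build_tree(data, max_depth, depth=0):
    """Recursively partition a 1-D sample at random cut points (iForest-style)."""
    lo, hi = min(data), max(data)
    if depth >= max_depth or lo == hi:
        return {"mass": len(data)}          # leaf: record how many points fall here
    cut = random.uniform(lo, hi)
    left = [v for v in data if v < cut]
    right = [v for v in data if v >= cut]
    if not left or not right:
        return {"mass": len(data)}
    return {"mass": len(data), "cut": cut,
            "left": build_tree(left, max_depth, depth + 1),
            "right": build_tree(right, max_depth, depth + 1)}

def lowest_common_mass(node, x, y):
    """Mass of the deepest node whose region covers both x and y."""
    while "cut" in node:
        if x < node["cut"] and y < node["cut"]:
            node = node["left"]
        elif x >= node["cut"] and y >= node["cut"]:
            node = node["right"]
        else:
            break                           # x and y are split apart here
    return node["mass"]

def mass_dissimilarity(forest, n, x, y):
    """Expected relative mass of the smallest region covering x and y,
    averaged over random partitionings."""
    return sum(lowest_common_mass(t, x, y) for t in forest) / (len(forest) * n)

random.seed(0)
# Toy sample: a dense cluster near 0 and a sparse cluster near 5.
data = ([random.gauss(0, 0.1) for _ in range(500)]
        + [random.gauss(5, 1.0) for _ in range(500)])
forest = [build_tree(data, max_depth=8) for _ in range(100)]
n = len(data)

# Self-dissimilarity is data dependent and non-constant: it reflects the
# local mass around each point rather than being identically zero.
print(mass_dissimilarity(forest, n, 0.0, 0.0))
print(mass_dissimilarity(forest, n, 5.0, 5.0))
# Points in different clusters are covered only by a large, heavy region,
# so their dissimilarity is much higher.
print(mass_dissimilarity(forest, n, 0.0, 5.0))
```

An NN algorithm becomes an LMN algorithm simply by ranking neighbours with `mass_dissimilarity` instead of a metric; note that, unlike any metric, the self-dissimilarity values printed above are neither zero nor constant across the data space.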
This talk presents the latest results since the paper on mass-based dissimilarity was published in KDD2016: www.kdd.org/kdd2016/subtopic/view/overcoming-key-weaknesses-of-distance-based-neighbourhood-methods-using-a-d
After receiving his PhD from the University of Sydney, Kai Ming Ting worked at the University of Waikato, Deakin University and Monash University. He joined Federation University Australia in 2014. He has previously held visiting positions at Osaka University, Nanjing University, and the Chinese University of Hong Kong. His current research interests are in the areas of mass estimation, mass-based dissimilarity, anomaly detection, ensemble approaches, data streams, and data mining and machine learning in general. He served as program committee co-chair for the Twelfth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-2008), and has been a program committee member for a number of international conferences, including the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining and the International Conference on Machine Learning. He has received research funding from the Australian Research Council, the US Air Force Office of Scientific Research (AFOSR/AOARD), Toyota InfoTechnology Center, and the Australian Institute of Sport. His awards include the Runner-up Best Paper Award at IEEE ICDM 2008 (for Isolation Forest) and the Best Paper Award at PAKDD 2006. He is the creator of isolation techniques, mass estimation and mass-based dissimilarity.
Date: July 26, 2017 (Wed) 11:00 - 12:30