05/24 2023 Seminar
- Title: Connecting Communities: Semi-supervised Node Classification On Disconnected and Ill-labelled Graphs
- Speaker: Tianyi Li (The Chinese University of Hong Kong)
- Date: May 24, 2023, 10:00
- Venue: South Building, Room 6420
Limited labels are a serious challenge for business practices that rely on data-driven classification. Collecting response data can be expensive, time-delayed, or simply impossible in many contexts, which leads to semi-supervised classification settings where traditional machine learning classifiers may perform unsatisfactorily. When data subjects are connected on a graph, this connectivity can be exploited to mitigate the shortage of labels: unlabeled subjects (nodes) are classified based on information from their neighbors. Graph neural networks (GNNs) are graph-based message-passing architectures of this kind, and they have excelled in many semi-supervised tasks, particularly at low labeling rates. Beyond the common limitations of GNNs, we identify a specific problem setting in which naive GNNs may fail: response data (labels) are often collected not only partially, but also from only a subset of communities, which are disjoint components of the entire graph. This compounded limited-labels problem poses a great challenge to GNN applications on real social networks.
In this study, we investigate GNNs' performance at semi-supervised node classification in the face of the limited-labels problem. To improve GNNs' degraded performance, we first select the node features that help classification and drop the rest from the classifier input. More importantly, we address the disjoint-community issue at its root by augmenting the original network, in particular with inter-community edges. We experiment with three ideas for suggesting new connectivity: (1) link prediction, (2) eigenvalue elevation, and (3) strategic random edge addition. We apply these techniques to two well-studied social networks and evaluate their effects on GNN performance and on network topology. Through step-by-step analysis, we formulate a practical recipe for improving GNNs' effectiveness at semi-supervised node classification on real social networks, and we discuss the conditions and limitations of this enhancement. Using GNNs efficiently to solve real-world problems involving network data holds rich application potential.
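As a rough illustration of the graph-augmentation idea, the sketch below finds the disjoint components of a small graph and adds random edges between each pair of components, so that messages could in principle flow from labelled communities to unlabelled ones. It is only a minimal sketch under stated assumptions: the function names `connected_components` and `add_inter_community_edges`, the parameter `edges_per_pair`, and the uniform random choice of endpoints are all hypothetical, and the abstract does not specify how the "strategic" edge selection in the talk actually works.

```python
import random
from collections import deque


def connected_components(num_nodes, edges):
    """Return the connected components as a list of sorted node-id lists."""
    adjacency = {node: set() for node in range(num_nodes)}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in range(num_nodes):
        if start in seen:
            continue
        # Breadth-first search from each node not yet visited.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


def add_inter_community_edges(num_nodes, edges, edges_per_pair=1, seed=0):
    """Add up to `edges_per_pair` random edges between each pair of components.

    Assumption: endpoints are drawn uniformly at random. This is a plain
    random baseline, not the selection rule used in the talk.
    """
    rng = random.Random(seed)
    components = connected_components(num_nodes, edges)
    augmented = list(edges)
    added = set()
    for i in range(len(components)):
        for j in range(i + 1, len(components)):
            for _ in range(edges_per_pair):
                u = rng.choice(components[i])
                v = rng.choice(components[j])
                if (u, v) not in added:  # skip repeated draws
                    added.add((u, v))
                    augmented.append((u, v))
    return augmented


# Toy graph with three disjoint communities: {0, 1, 2}, {3, 4}, {5, 6}.
toy_edges = [(0, 1), (1, 2), (3, 4), (5, 6)]
augmented_edges = add_inter_community_edges(7, toy_edges)
print(len(connected_components(7, toy_edges)), "components before")
print(len(connected_components(7, augmented_edges)), "components after")
```

The augmented edge list could then be passed to any GNN library in place of the original one. Whether such added edges help or hurt classification depends on which nodes they connect, which is the kind of condition the talk sets out to discuss.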
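Professor Tianyi Li received a bachelor's degree from Peking University, a master's degree from Princeton University, and a PhD from the Massachusetts Institute of Technology, and joined the Department of Decision Sciences and Managerial Economics at The Chinese University of Hong Kong as an Assistant Professor in 2021. Li is the first Chinese scholar to receive the Dana Meadows Award of the International System Dynamics Society as a sole awardee.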