Monday, May 18, 2020

Open Graph Benchmark: Datasets for Machine Learning on Graphs

Hu et al. have assembled a diverse collection of datasets for machine learning on graphs. The benchmark is intuitively structured and includes evaluation protocols and metrics (e.g., ROC-AUC, PRC-AUC, hits, or accuracy). Furthermore, the authors report the measured performance of a few popular approaches on each dataset. There are several datasets in each of the three classes of tasks: node property prediction (ogbn-), link property prediction (ogbl-), and graph property prediction (ogbg-).
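Since the name prefix encodes the task class, it can also be used to pick the matching loader. The following is only a minimal sketch: the helper names `task_category` and `load_dataset` are made up for this post, and `load_dataset` assumes the `ogb` Python package is installed (it is imported only when that function is called).

```python
# Map each OGB name prefix to its task class, following the naming scheme above.
PREFIX_TO_TASK = {
    "ogbn-": "node",
    "ogbl-": "link",
    "ogbg-": "graph",
}


def task_category(dataset_name: str) -> str:
    """Return 'node', 'link', or 'graph' based on the dataset name prefix."""
    for prefix, task in PREFIX_TO_TASK.items():
        if dataset_name.startswith(prefix):
            return task
    raise ValueError(f"Unrecognized OGB dataset name: {dataset_name!r}")


def load_dataset(dataset_name: str):
    """Hypothetical convenience wrapper around the ogb dataset classes.

    Assumption: the `ogb` package is installed. The import happens here,
    so the rest of this snippet works without it.
    """
    task = task_category(dataset_name)
    if task == "node":
        from ogb.nodeproppred import NodePropPredDataset as dataset_cls
    elif task == "link":
        from ogb.linkproppred import LinkPropPredDataset as dataset_cls
    else:
        from ogb.graphproppred import GraphPropPredDataset as dataset_cls
    return dataset_cls(name=dataset_name)


if __name__ == "__main__":
    for name in ("ogbn-proteins", "ogbl-ppa", "ogbg-molhiv"):
        print(name, "->", task_category(name))
```

With `ogb` installed, calling `load_dataset("ogbg-molhiv")` should return the dataset object; expect a download the first time a dataset is requested.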

Of particular interest to those of us who work in biochemistry, broadly defined, are the SMILES-derived molecular graphs adapted from MoleculeNet [2], such as ogbg-molhiv (HIV) and ogbg-molpcba (PubChem BioAssay); ogbl-ppa and ogbn-proteins (both protein-protein association networks) are also of interest. Note that MoleculeNet is not included in its entirety - far from it. So that resource is also well worth a close look if you have not already explored it.

If you are the competitive type, you can submit your results to the leaderboards on the hosting website: https://ogb.stanford.edu