"Interactive Curation of Datasets for Training and Refining Generative Models"
Wenjie Ye, Yue Dong, and Pieter Peers

Computer Graphics Forum, Volume 38, Issue 7, October 2019
Abstract
We present a novel interactive learning-based method for curating datasets using user-defined criteria for training and refining Generative Adversarial Networks. We employ a novel batch-mode active learning strategy to progressively select small batches of candidate exemplars for which the user is asked to indicate whether they match the (possibly subjective) selection criteria. After each batch, a classifier that models the user's intent is refined and subsequently used to select the next batch of candidates. After the selection process ends, the final classifier, trained with limited but adaptively selected training data, is used to sift through the large collection of input exemplars and extract a sufficiently large subset that matches the user's selection criteria, which is then used to train or refine the generative model. A key distinguishing feature of our system is that we do not assume that the user can always make a firm binary decision (i.e., "meets" or "does not meet" the selection criteria) for each candidate exemplar, and we allow the user to label an exemplar as "undecided". We rely on a non-binary query-by-committee strategy to distinguish between the user's uncertainty and the trained classifier's uncertainty, and develop a novel disagreement distance metric to encourage a diverse candidate set. In addition, we employ a number of optimization strategies to keep the experience interactive. We demonstrate our interactive curation system on several applications related to training or refining generative models: training a Generative Adversarial Network that meets user-defined criteria, adjusting the output distribution of an existing generative model, and removing unwanted samples from a generative model.
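The batch-selection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The three-way label set, the committee-disagreement score (mean KL divergence of each member's prediction from the committee consensus, a standard query-by-committee measure), the hypothetical `vote_distance` used to keep a batch diverse, and the `min_dist`/`min_score` thresholds are all assumptions made for illustration; the paper's own disagreement distance metric may differ.

```python
import math
from typing import List, Sequence

# Assumed label order for this sketch: meets, does not meet, undecided.
LABELS = ("meets", "does_not_meet", "undecided")
Dist = Sequence[float]  # one committee member's probability distribution over LABELS


def consensus(votes: Sequence[Dist]) -> List[float]:
    """Average the committee members' distributions for one candidate."""
    n = len(votes)
    return [sum(v[k] for v in votes) / n for k in range(len(LABELS))]


def kl(p: Dist, q: Dist) -> float:
    """KL(p || q); terms with p_k == 0 contribute nothing."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def disagreement(votes: Sequence[Dist]) -> float:
    """Mean KL divergence of each member from the consensus.

    If all members agree (even when they all predict "undecided"), the score
    is 0: such a candidate reflects the user's ambiguity, not the classifier's.
    """
    c = consensus(votes)
    return sum(kl(v, c) for v in votes) / len(votes)


def vote_distance(a: Sequence[Dist], b: Sequence[Dist]) -> float:
    """Hypothetical candidate distance: mean total-variation distance between
    each member's prediction on candidate a and its prediction on candidate b."""
    tv = [0.5 * sum(abs(x - y) for x, y in zip(pa, pb)) for pa, pb in zip(a, b)]
    return sum(tv) / len(tv)


def select_batch(committee_votes: Sequence[Sequence[Dist]], batch_size: int,
                 min_dist: float = 0.1, min_score: float = 1e-6) -> List[int]:
    """Greedy batch: most-disputed candidates first, skipping candidates the
    committee agrees on and near-duplicates of already chosen candidates."""
    order = sorted(range(len(committee_votes)),
                   key=lambda i: disagreement(committee_votes[i]), reverse=True)
    batch: List[int] = []
    for i in order:
        if len(batch) == batch_size:
            break
        if disagreement(committee_votes[i]) < min_score:
            continue  # committee agrees: querying the user adds little
        if all(vote_distance(committee_votes[i], committee_votes[j]) >= min_dist
               for j in batch):
            batch.append(i)
    return batch


if __name__ == "__main__":
    # Two committee members, four unlabeled candidates (made-up numbers).
    votes = [
        [(0.90, 0.05, 0.05), (0.05, 0.90, 0.05)],  # strong disagreement
        [(0.85, 0.10, 0.05), (0.10, 0.85, 0.05)],  # near-duplicate of candidate 0
        [(0.10, 0.10, 0.80), (0.10, 0.10, 0.80)],  # all members say "undecided"
        [(0.60, 0.30, 0.10), (0.30, 0.60, 0.10)],  # mild disagreement
    ]
    print(select_batch(votes, batch_size=2))  # → [0, 3]
```

In this toy example, candidate 2, on which every committee member predicts "undecided", scores zero and is never queried, which loosely mirrors the abstract's distinction between user uncertainty and classifier uncertainty. Candidate 1 is skipped because its vote pattern nearly duplicates candidate 0, so one batch does not spend the user's effort on the same question twice.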


BibTeX
@article{Ye:2019:ICD,
author = {Ye, Wenjie and Dong, Yue and Peers, Pieter},
title = {Interactive Curation of Datasets for Training and Refining Generative Models},
month = {October},
year = {2019},
journal = {Computer Graphics Forum},
volume = {38},
number = {7},
doi = {10.1111/cgf.13844},
}