Yang's research group wiki

Transfer Learning

Introduction

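In practice, machine learning and artificial intelligence systems often suffer from a shortage of manually labeled samples. Transfer learning is one way to make the most of the labeled data that already exists: a model trained on labeled data is transferred to unlabeled data, which reduces the amount of manual labeling that training requires. Most existing transfer learning methods require the distributions of the labeled and unlabeled data to be very close. Real-world data, however, comes from different sources and is affected by its environment, so it can differ substantially in structure from the labeled data. For example, many image recognition models are trained on product pictures with no background, whereas most unlabeled real-world photos have complex and highly variable backgrounds and content. As a result, the benefit of transfer remains quite limited. This project studies transfer learning algorithms for heterogeneous data, using the correlations among the original features to find the best representation of the data being transferred and thereby improve the effectiveness of transfer learning.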

Feature augmentation

In supervised learning, we often use a large number of features (random variables) to learn a predictor for the target task (e.g. classification). These random variables often have some natural redundancy. For instance, in an image represented by pixel values, nearby pixels are often highly correlated. Even in hand-crafted features, such as Visual Bag-of-Words, the feature components are not completely independent. The figure below shows the distribution of pairwise correlation coefficients among the 800 features in the Office+Caltech dataset. The largest pairwise coefficient is 0.623.
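The kind of statistic described above can be computed with a few lines of NumPy. The following is a minimal sketch, not the code used to produce the figure: the data is synthetic, and the names `pairwise_correlations`, `X`, and `shared` are hypothetical. It collects the Pearson correlation coefficient of every pair of distinct features and bins them into a histogram.

```python
import numpy as np

def pairwise_correlations(X):
    """Return the Pearson correlation of every pair of distinct columns of X."""
    corr = np.corrcoef(X, rowvar=False)      # (d, d) correlation matrix; columns are features
    iu = np.triu_indices_from(corr, k=1)     # strictly upper triangle: each pair once, no diagonal
    return corr[iu]

# Synthetic stand-in for a real feature matrix (assumption: NOT the Office+Caltech data).
rng = np.random.default_rng(0)
n_samples, n_features = 500, 20
shared = rng.normal(size=(n_samples, 1))     # hidden factor shared by all features
X = 0.5 * shared + rng.normal(size=(n_samples, n_features))

coeffs = pairwise_correlations(X)
hist, bin_edges = np.histogram(coeffs, bins=10, range=(-1.0, 1.0))
max_abs = np.abs(coeffs).max()

print("number of feature pairs:", coeffs.size)
print("largest |coefficient|:", max_abs)
```

With d features there are d(d-1)/2 distinct pairs, so a real 800-feature matrix would yield 319,600 coefficients. In this toy data the shared hidden factor is what pushes the coefficients above zero.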

While traditional machine learning algorithms tend to discard the dependencies among features, we believe that there is useful information in the covariance of the features. In particular, the information that features have in common encodes hidden shared structures in the data. Our goal is to find the feature representation that best reveals such shared structures. We hypothesize that these new features will transfer better across different data distributions.

Unsupervised domain adaptation

Domain adaptation is a type of transfer learning problem in which a model is learned from a source data distribution (the source domain) so that it performs well on a different but related target data distribution (the target domain). In particular, we are interested in the unsupervised domain adaptation problem, where the target domain has no training labels.
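To make the problem setup concrete, here is a small sketch on synthetic data. Everything in it is an assumption made for illustration: the two-class Gaussian data, the shift applied to the target domain, and the nearest-centroid classifier are not the group's data or method. It shows only the data layout (labeled source, unlabeled target) and how a model trained on the source alone can degrade on a shifted target. The target labels are generated solely to measure that degradation and are never used for training.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_domain(n_per_class, shift):
    """Draw a two-class 2-D Gaussian dataset whose class means are moved by `shift`."""
    means = np.array([[-2.0, 0.0], [2.0, 0.0]]) + shift
    X = np.vstack([rng.normal(loc=m, scale=1.0, size=(n_per_class, 2)) for m in means])
    y = np.repeat([0, 1], n_per_class)
    return X, y

# Source domain: features AND labels are available for training.
X_src, y_src = sample_domain(500, shift=np.array([0.0, 0.0]))
# Target domain: only features are available; labels are kept for evaluation only.
X_tgt, y_tgt_hidden = sample_domain(500, shift=np.array([2.0, 0.0]))

# Source-only baseline: nearest-centroid classifier fitted on the source domain.
centroids = np.stack([X_src[y_src == c].mean(axis=0) for c in (0, 1)])

def predict(X):
    """Assign each row of X to the class of the closest source centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

acc_src = (predict(X_src) == y_src).mean()
acc_tgt = (predict(X_tgt) == y_tgt_hidden).mean()
print("accuracy on source:", acc_src)
print("accuracy on shifted target:", acc_tgt)
```

An unsupervised domain adaptation method would additionally use `X_tgt` (but not `y_tgt_hidden`) during training to close the gap between the two accuracies.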