Yang's research group wiki


Transfer Learning | Introduction to Transfer Learning

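In practice, machine learning and artificial intelligence often face a shortage of manually labeled samples. Transfer learning is one way to make the most of the labeled data that already exists: it transfers a model from existing labeled data to unlabeled data, which reduces the amount of manual labeling that model training requires. Most existing transfer learning methods require the distributions of the labeled and unlabeled data to be very close. Real-world data, however, comes from different sources and is shaped by its environment, so it often differs considerably in structure from the labeled data. For example, many image recognition models are trained on product images with no background, whereas most unlabeled real-world photos have complex and varied backgrounds, so the benefit of transfer remains quite limited. This project studies transfer learning algorithms between heterogeneous data, using the correlations among the original features to find the optimal representation of the transferred data and thereby improve the effectiveness of transfer learning.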

The unsupervised domain adaptation problem

Feature augmentation

In supervised learning, we often use a large number of features to represent each training example. Many of these feature random variables are correlated. For instance, in an image represented by pixel values, nearby pixels are often highly correlated. Even in hand-crafted features, such as bag-of-visual-words, the feature components are not completely independent. See Figure 1 for an example.
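The claim about nearby pixels can be checked numerically. The following is a minimal sketch, not part of the project's method: it uses synthetic "images" built by averaging white noise along each row, so that neighboring pixels share most of their noise terms. The function names `make_smooth_images` and `neighbor_correlation`, the window size, and the image dimensions are all illustrative assumptions.

```python
import numpy as np


def make_smooth_images(n_images=200, size=32, window=5, seed=0):
    """Synthetic stand-in for natural images (an assumption for illustration).

    Each row is a moving average of white noise, so pixels that are fewer
    than `window` columns apart share some of the same noise terms.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n_images, size, size + window - 1))
    # Sum `window` shifted copies of the noise, then divide to get the average.
    smooth = sum(noise[:, :, k:k + size] for k in range(window)) / window
    return smooth


def neighbor_correlation(images, offset):
    """Pearson correlation between each pixel and the pixel `offset`
    columns to its right, pooled over all images and positions."""
    left = images[:, :, :-offset].ravel()
    right = images[:, :, offset:].ravel()
    return float(np.corrcoef(left, right)[0, 1])


images = make_smooth_images()
near = neighbor_correlation(images, offset=1)   # adjacent pixels
far = neighbor_correlation(images, offset=10)   # pixels beyond the window

print(f"correlation at offset 1:  {near:.3f}")
print(f"correlation at offset 10: {far:.3f}")
```

For this synthetic model the correlation at an offset smaller than the window is (window - offset) / window in expectation, and zero beyond the window, so adjacent pixels come out strongly correlated while distant ones do not. Real images will show a different, usually slower, decay with distance.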

While traditional machine learning algorithms try to discard the dependencies among features, we believe that there is useful information in the correlated features. In particular, correlation indicates that certain variations in the data are stronger than others. In this project, we try to utilize the information that lies within the correlation.