Dataset & Project Ideas

Application problems

Below are some datasets owned by TBSI lab2c. Please contact the course staff if you have questions about these datasets.

Construction worker dataset

Download

RGB videos of construction workers performing seven different activities.

Project ideas:

  • Visual understanding of the scene: e.g. worker segmentation, hard hat detection

  • Detection of worker movement (skeleton extraction)

  • Classification of worker activity and environment

Multi-body sensor dataset for human activities

Download

A collection of multi-body sensor recordings that capture body movement during various activities (e.g. walking, running). Motion is captured using 5 IMU sensors attached to the body.

Project ideas:

  • Few-shot classification of unseen activities: given full training data for a number of known activities and only a small number of training samples for new activities, how can we learn a classifier for the new activities?

  • Low-cost activity recognition: how should we select which sensors to use if we are only allowed k sensors to track a person?
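
One possible starting point for the sensor-selection idea is greedy forward selection: add one sensor at a time, keeping the one that most improves validation accuracy, until k sensors are chosen. The sketch below is only an illustration. It assumes each sample is a list of per-sensor feature vectors (the real dataset format may differ), uses a nearest-centroid classifier as a stand-in for whatever model you train, and generates synthetic data with a hypothetical helper `make_synthetic` in which only sensors 1 and 3 carry class information.

```python
import random

def centroid(vectors):
    # Component-wise mean of a list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def flatten(sample, sensors):
    # Concatenate the feature vectors of the chosen sensors.
    return [v for s in sensors for v in sample[s]]

def accuracy(train, val, sensors):
    # Nearest-centroid accuracy on `val`, using only the given sensors.
    by_class = {}
    for sample, label in train:
        by_class.setdefault(label, []).append(flatten(sample, sensors))
    centroids = {label: centroid(vs) for label, vs in by_class.items()}
    correct = 0
    for sample, label in val:
        x = flatten(sample, sensors)
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )
        correct += pred == label
    return correct / len(val)

def greedy_select(train, val, num_sensors, k):
    # Forward selection: repeatedly add the sensor that helps the most.
    chosen = []
    for _ in range(k):
        remaining = [s for s in range(num_sensors) if s not in chosen]
        best = max(remaining, key=lambda s: accuracy(train, val, chosen + [s]))
        chosen.append(best)
    return chosen

def make_synthetic(n, rng, informative=(1, 3), num_sensors=5, dim=3):
    # Hypothetical data: two classes, 5 sensors with 3 features each.
    # Only the `informative` sensors shift their mean with the class label.
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        sample = []
        for s in range(num_sensors):
            shift = float(label) if s in informative else 0.0
            sample.append([rng.gauss(shift, 1.0) for _ in range(dim)])
        data.append((sample, label))
    return data
```

Calling `greedy_select(train, val, num_sensors=5, k=2)` on data from `make_synthetic` returns a list of two sensor indices. Greedy selection is not guaranteed to find the best subset of size k, so comparing it against exhaustive search for small k could be part of the project.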

Shenzhen volunteer activity dataset

Download

Volunteer participation records from the 战役先锋APP platform, covering Feb 2020 to Jan 2021. Each record contains a user ID and a volunteer task description, including location, date and time, activity type, etc.

Project ideas:

  • Find patterns in volunteer activities in response to the COVID-19 outbreak in Shenzhen

  • Predict the retention of a volunteer based on past volunteering history

  • Causality analysis on volunteer behavior

Chinese resume search dataset

A dataset of parsed sentences from Chinese resumes with labels, and a collection of (synthesized) job description search queries.

Download

Project ideas:

  • Multi-label classification of resume text

  • Extract semantic keywords from resume text based on the job description

  • Design a matching algorithm based on the job description and resume text
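
As a simple baseline for the matching idea, one could rank resume sentences by the cosine similarity of their character n-gram counts with the job description. The sketch below assumes character bigrams, which avoids the need for a Chinese word segmenter; the function names and the example strings are made up for illustration and are not taken from the dataset.

```python
import math
from collections import Counter

def char_ngrams(text, n=2):
    # Count overlapping character n-grams after removing whitespace.
    text = "".join(text.split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_resumes(job_description, resumes, n=2):
    # Return (score, index) pairs, best match first; ties keep input order.
    query = char_ngrams(job_description, n)
    scored = [(cosine(query, char_ngrams(r, n)), i) for i, r in enumerate(resumes)]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))

# Made-up example sentences (not from the dataset).
resumes = ["熟悉Python和机器学习", "五年财务会计经验", "负责前端页面开发"]
ranked = rank_resumes("招聘机器学习工程师", resumes)
```

A baseline like this only captures surface overlap, so it gives a reference point against which a learned matching model can be compared.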

Text analysis for employee appraisal report

A dataset of employee appraisal reports and a collection of labeled training sentences.

Download

Project ideas:

  • Generate summaries based on the report

  • Extract key phrases corresponding to different semantic attributes

This is part of a government-funded project. A prize (2000) will be awarded if your proposed model is adopted in the real system.

Quadruped robot tactile sensing

This dataset contains time series (3D_Forces vs. time) from the four self-designed foot tactile sensors and from the IMU sensors (pose and torques, angular positions, and angular velocities of the 12 joint motors) of a quadruped robot.

Download (Please ask the course staff for the password.)

Project ideas:

  • Pose estimation based on the four foot tactile sensors.

  • Motor torque or angle estimation based on the four foot tactile sensors.

3D reconstruction of the Gelsight

Images of the Gelsight tactile sensor deforming under a certain light field, with ground-truth deformation measured by the tactile sensor.

Download (Please ask the course staff for the password.)

Project ideas:

  • Reconstruct the 3D deformation of the tactile sensor based on the captured images.

Analytical and Theoretical problems

You need to show some original analysis of the problem you choose (it's OK if the problem has been solved before). Simple numerical experiments are also encouraged to verify your findings.

  • Derive generalization bounds for classic learning algorithms or their variations (e.g. algorithms we learned in class or in homework)

  • Give a theoretical explanation for some empirical findings in machine learning (e.g. how do data augmentation and bootstrapping help reduce variance?)

  • Why do simple classifiers such as nearest neighbor perform so well in few-shot learning? (related paper)

  • When can we use self-training to generalize a model to a new domain? (related paper)

  • Investigate the robustness of existing learning algorithms under noise or perturbation

  • Estimate the performance of pre-trained features on unseen data from a different domain or a different task (e.g. transferability estimation)

  • Predict generalization in deep learning (related paper)

Hint: Try to simplify the problem, e.g. by making some assumptions on the model or data distribution. Try the 1D or 2D case before moving to higher dimensions.
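
To illustrate what a simple 1D numerical experiment might look like, the sketch below measures the generalization gap (true error minus training error) of empirical risk minimization over threshold classifiers h_t(x) = 1 if x > t else 0. Everything about the setup is an assumption chosen for simplicity: inputs are uniform on [0, 1], the clean label is 1 when x > 0.5, and each label is flipped independently with probability `eta`. Under these assumptions the true error of h_t has the closed form eta + (1 - 2 eta) |t - 0.5|, so no test set is needed.

```python
import random

def erm_threshold(xs, ys):
    # Return (threshold, training error) of the best rule h_t(x) = 1 if x > t else 0,
    # searching t over 0 and the sample points with a single sorted sweep.
    pairs = sorted(zip(xs, ys))
    errors = sum(1 for _, y in pairs if y == 0)  # t = 0: every point predicted 1
    best_t, best_errors = 0.0, errors
    for x, y in pairs:
        # Raising the threshold to x flips the prediction at x from 1 to 0.
        errors += 1 if y == 1 else -1
        if errors < best_errors:
            best_t, best_errors = x, errors
    return best_t, best_errors / len(pairs)

def true_error(t, eta):
    # Exact error of h_t under the assumed distribution, for t in [0, 1].
    return eta + (1 - 2 * eta) * abs(t - 0.5)

def mean_gap(n, trials, eta, rng):
    # Average of (true error - training error) over independent samples of size n.
    total = 0.0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = [int(x > 0.5) ^ (rng.random() < eta) for x in xs]
        t, train_err = erm_threshold(xs, ys)
        total += true_error(t, eta) - train_err
    return total / trials
```

Evaluating `mean_gap` for a range of sample sizes, for example n = 10, 50, 500 with `random.Random(0)` as the generator, gives an empirical curve that can be plotted against a bound you derive to check how tight it is.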