Dataset & Project Ideas
Application problems
Below are some datasets owned by TBSI lab2c. Please contact the course staff if you have questions about these datasets.
Construction worker dataset
RGB videos of construction workers performing seven different activities.
Project ideas:
Visual scene understanding, e.g. worker segmentation and hard hat detection
Detection of worker movement (skeleton extraction)
Classification of worker activity and environment
Multi-body sensor dataset for human activities
A collection of multi-body sensor data measuring body movement during various activities (e.g. walking, running). Motion is captured using 5 IMU sensors attached to the body.
Project ideas:
Few-shot classification for unseen activities: given full training data for a number of known activities and only a few training examples of a new activity, how can we learn a classifier for the new activity?
Low-cost activity recognition: if we are allowed to use only k sensors to track a person, which sensors should we select?
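Both ideas above have simple baselines. The sketch below assumes that each IMU's readings have already been converted into a fixed-length feature vector per time window, and that sensor `s` owns a contiguous block of columns in the feature matrix. The feature extraction, the column layout, and all function names are hypothetical. It uses a nearest-centroid (prototype) classifier, which needs only a few examples per class, so a new activity can be added by averaging its few labeled windows. It then wraps this classifier in greedy forward selection to pick k sensors by validation accuracy. Greedy selection is a heuristic and is not guaranteed to find the best subset of k sensors.

```python
import numpy as np

def fit_centroids(X, y):
    """Nearest-centroid (prototype) classifier: one mean feature vector per class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(classes, centroids, X):
    # Squared Euclidean distance from every sample to every class centroid.
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

def sensor_columns(sensors, feats_per_sensor):
    # Assumption (hypothetical layout): sensor s owns columns [s*F, (s+1)*F).
    return [s * feats_per_sensor + j for s in sensors for j in range(feats_per_sensor)]

def greedy_select_sensors(X_tr, y_tr, X_val, y_val, n_sensors, feats_per_sensor, k):
    """Greedy forward selection: repeatedly add the sensor that gives the
    highest validation accuracy together with the sensors already chosen."""
    chosen = []
    for _ in range(k):
        best_s, best_acc = None, -1.0
        for s in range(n_sensors):
            if s in chosen:
                continue
            cols = sensor_columns(chosen + [s], feats_per_sensor)
            classes, cents = fit_centroids(X_tr[:, cols], y_tr)
            acc = (predict(classes, cents, X_val[:, cols]) == y_val).mean()
            if acc > best_acc:
                best_s, best_acc = s, acc
        chosen.append(best_s)
    return chosen
```

For the few-shot case, you can call `fit_centroids` on the known-class data plus the handful of examples of the new activity. This adds a prototype for the new class without retraining anything else. In practice, the features would likely come from an encoder trained on the known activities, which is not shown here.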
Shenzhen volunteer activity dataset
Volunteer participation records from the 战役先锋APP platform from Feb 2020 to Jan 2021. Each record contains a user ID and a description of the volunteer task, including location, date and time, activity type, etc.
Project ideas:
Find patterns in volunteer activities in response to the COVID-19 outbreak in Shenzhen
Predict the retention of a volunteer based on past volunteering history
Causal analysis of volunteer behavior
Chinese resume search dataset
A dataset of labeled sentences parsed from Chinese resumes, and a collection of synthesized job-description search queries.
Project ideas:
Multi-label classification of resume text
Extract semantic keywords from resume text based on the job description
Design a matching algorithm based on the job description and resume text
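One simple, language-agnostic baseline for the matching idea scores each resume by the cosine similarity between the character-bigram count vectors of the job description and the resume text. This is an illustrative starting point, not the dataset's intended method, and the function names are hypothetical. Character n-grams avoid the need for Chinese word segmentation. A stronger system might add IDF weighting or learned embeddings.

```python
from collections import Counter
import math

def char_ngrams(text, n=2):
    # Drop whitespace, then count overlapping character n-grams.
    # No word segmenter is needed for Chinese text.
    text = "".join(text.split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_resumes(query, resumes, n=2):
    """Return resume indices sorted from best to worst match (ties keep input order)."""
    q = char_ngrams(query, n)
    scores = [cosine(q, char_ngrams(r, n)) for r in resumes]
    return sorted(range(len(resumes)), key=lambda i: scores[i], reverse=True)
```

For example, `rank_resumes("Python 机器学习 工程师", ["会计 财务 报表", "熟悉 Python 与 机器学习"])` ranks the second resume first, because it shares bigrams such as "机器" and "学习" with the query.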
Text analysis for employee appraisal reports
A dataset of employee appraisal reports and a collection of labeled training sentences
Project ideas:
Generate summaries of the reports
Extract key phrases corresponding to different semantic attributes
This is part of a government-funded project. A prize of 2000 will be awarded if your proposed model is adopted in the real system.
Quadruped robot tactile sensing
This dataset contains time series (3D_Forces vs. time) from the quadruped's four self-designed foot tactile sensors, along with IMU sensor data (pose, and the torques, angular positions, and angular velocities of the 12 joint motors).
Download (please ask the course staff for the password).
Project ideas:
Pose estimation based on the four foot tactile sensors.
Motor torque or angle estimation based on the four foot tactile sensors.
3D reconstruction from GelSight images
Images of the deformation of a GelSight tactile sensor captured under a given light field, with ground-truth deformation measured by the tactile sensor.
Download (please ask the course staff for the password).
Project ideas:
Reconstruct the 3D deformation of the tactile sensor based on the captured images.
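One common recipe for GelSight-style sensors is photometric stereo. It maps each pixel's color or intensity to a surface gradient, using a calibration lookup table or a learned regressor (not shown). It then integrates the gradient field into a height map. The sketch below shows only the integration step, using the Frankot–Chellappa FFT method. It assumes unit pixel spacing and periodic boundaries, and the function name is hypothetical. The method cannot recover the mean height, so it returns a zero-mean surface.

```python
import numpy as np

def frankot_chellappa(p, q):
    """Least-squares integration of a gradient field into a height map z.

    p = dz/dx (along columns), q = dz/dy (along rows), both of shape (rows, cols).
    Assumes periodic boundaries; the DC (mean height) term is set to 0.
    """
    rows, cols = p.shape
    wx = 2 * np.pi * np.fft.fftfreq(cols)  # angular frequency along x (columns)
    wy = 2 * np.pi * np.fft.fftfreq(rows)  # angular frequency along y (rows)
    WX, WY = np.meshgrid(wx, wy)           # both have shape (rows, cols)
    P, Q = np.fft.fft2(p), np.fft.fft2(q)
    denom = WX ** 2 + WY ** 2
    denom[0, 0] = 1.0                      # avoid division by zero at DC
    # Minimizes |i*wx*Z - P|^2 + |i*wy*Z - Q|^2 for each frequency.
    Z = (-1j * WX * P - 1j * WY * Q) / denom
    Z[0, 0] = 0.0                          # mean height is unrecoverable
    return np.real(np.fft.ifft2(Z))
```

Real sensor images are not periodic, so in practice you might pad the gradient maps or use a Poisson solver with explicit boundary conditions instead.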
Analytical and theoretical problems
You need to present some original analysis of the problem you choose (it's OK if the problem has been solved before). Simple numerical experiments to verify your findings are also encouraged.
Derive generalization bounds for classic learning algorithms or their variants (e.g. algorithms covered in class or in the homework).
Give theoretical explanations for empirical findings in machine learning (e.g. how do data augmentation and bootstrapping help reduce variance?)
Why do simple classifiers such as nearest neighbor perform so well in few-shot learning? (related paper)
When can we use self-training to generalize a model to a new domain? (related paper)
Investigate the robustness of existing learning algorithms under noise or perturbation
Estimate the performance of pre-trained features on unseen data from a different domain or a different task (e.g. transferability estimation)
Predict generalization in deep learning (related paper)
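For the generalization-bound idea above, the textbook case of a finite hypothesis class shows the typical shape of such a derivation. Assume a loss bounded in [0, 1] and m i.i.d. training samples. Let R(h) be the true risk and \hat R(h) the empirical risk on the sample. Hoeffding's inequality for a single fixed h, combined with a union bound over the class \mathcal H, gives:

```latex
\begin{align*}
\Pr\bigl(|\hat R(h) - R(h)| > \epsilon\bigr) &\le 2e^{-2m\epsilon^2}
  && \text{(Hoeffding, fixed } h\text{)}\\
\Pr\bigl(\exists h \in \mathcal H : |\hat R(h) - R(h)| > \epsilon\bigr) &\le 2|\mathcal H|\, e^{-2m\epsilon^2}
  && \text{(union bound)}\\
2|\mathcal H|\, e^{-2m\epsilon^2} = \delta
  \;&\Longrightarrow\; \epsilon = \sqrt{\frac{\ln|\mathcal H| + \ln(2/\delta)}{2m}}
\end{align*}
\text{With probability at least } 1-\delta, \text{ for all } h \in \mathcal H:\quad
R(h) \le \hat R(h) + \sqrt{\frac{\ln|\mathcal H| + \ln(2/\delta)}{2m}}.
```

Variants of classic algorithms usually require replacing \ln|\mathcal H| with a capacity measure for infinite classes, such as the VC dimension or Rademacher complexity.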
Hint: Try to simplify the problem, e.g. by making assumptions about the model or the data distribution. Try the 1D or 2D case before moving to higher dimensions.
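As an example of a simplified 1D numerical experiment for the variance question listed above, the sketch below repeatedly draws a noisy training set from an assumed target (a sine curve plus Gaussian noise). It fits a 1-nearest-neighbor regressor with and without bagging (averaging predictions over bootstrap resamples). It then compares the variance of the prediction at a single test point across training sets. The target, noise level, and function names are illustrative assumptions. The bagged variance should come out noticeably smaller, which is the kind of finding a theoretical analysis would then explain.

```python
import numpy as np

def nn1_predict(x_train, y_train, x0):
    # 1-nearest-neighbor regression: a low-bias, high-variance estimator.
    return y_train[np.argmin(np.abs(x_train - x0))]

def bagged_predict(x_train, y_train, x0, n_boot, rng):
    """Average 1-NN predictions over n_boot bootstrap resamples."""
    n = len(x_train)
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample with replacement
        preds.append(nn1_predict(x_train[idx], y_train[idx], x0))
    return np.mean(preds)

def prediction_variance(n_trials=500, n=50, noise=1.0, x0=0.5, n_boot=50, seed=0):
    """Variance of the prediction at x0 over independent training sets,
    for the single model and for the bagged model."""
    rng = np.random.default_rng(seed)
    single, bagged = [], []
    for _ in range(n_trials):
        x = rng.uniform(0, 1, size=n)
        y = np.sin(2 * np.pi * x) + noise * rng.normal(size=n)  # assumed target
        single.append(nn1_predict(x, y, x0))
        bagged.append(bagged_predict(x, y, x0, n_boot, rng))
    return np.var(single), np.var(bagged)
```

A natural next step is to derive the observed ratio analytically. For example, you could compute how often the bootstrap's nearest neighbor is the 1st, 2nd, ... nearest point of the original sample, which turns bagged 1-NN into a weighted average of nearby labels.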