CS 289A: Machine Learning Project

CS 289A: Intro to Machine Learning
Grad Final Project

For graduate students, the final project is worth 20% of your final grade. The project should be done in teams of 2–3 students, so please find a partner. After finding a partner, please schedule a meeting with one of the project TAs (listed below) using this sheet. Please be sure to do this before submitting your initial proposal. Feel free to reach out to any of the project TAs if you have questions!

The Project TAs for this semester and their emails are listed below:

Sara Pohland, spohland@berkeley.edu
Fred Shentu, fredshentu@berkeley.edu
Grace Luo, graceluo@berkeley.edu

Overview

The project theme may be anything related to machine learning techniques discussed in this class. We hope that this project will be useful to you, so you are encouraged to design a project that is related to your research interests (or the research interests of your teammate(s)). However, please be honorable and do not suggest a project that you have already fully completed as part of your research.

You may consider the following types of projects:

Critically revisit a published paper (i.e., reproduce the methods, experiments, and analysis of an existing paper).
Write a literature review of work in a specific domain of ML and make a critical comparison (ideally on a standard benchmark dataset).
Conduct an original theoretical research project based on open problems in ML.
Conduct an original practical research project by applying ML methods to a specific problem/dataset.

We also provide some project ideas at the bottom of this page for inspiration.

Deliverables

There are three assignments that you will need to submit (on Gradescope):

Initial Proposal (due Friday, April 11)
Project Video (due Monday, May 12)
Final Report (due Tuesday, May 13)

Each of these deliverables is described in more detail in the following sections.

Initial Proposal

Please write a 1-2 page proposal for your project. You may consider including the following information:

Background: What is the application domain or field of research? Why is the problem important? What specific questions will you try to answer?
Data: What data do you intend to work with? Describe the features, labels, number of samples, etc. Is this data publicly available? How will you access it?
Preliminary work: If you have done preliminary work, explain what you have already done.
Proposed work: What will be the core of the work for your project? How do you expect to spend most of your time? Do you expect to be judged primarily on your writing, your implementation of existing methods, your implementation of new ideas, your exhaustive exploration of methods and data, or something else? What will your contributions be?

Once you have written your proposal, all team members should submit it to the corresponding Gradescope assignment.

Project Video

As a part of your final project, you will be submitting a project video. Please be sure that you have followed the guidelines below before making your submission:

The video should be clear and understandable, describing everything you think is important about your project (motivation, background, techniques, results, etc.).
The video needs to be self-contained: any CS 289A student should be able to understand what you did (at least at a high level) without consulting any other materials.
There is no requirement on the format of your video. You can make the video as simple as slides with a voiceover or as fancy as you want. As long as it is clear and understandable, you will not be graded on the fanciness of your video.
The video can be at most 3 minutes long. Note that this is a very strict requirement; we will not grade a video that is more than 3 minutes long.

Once you have completed your video, you must upload the video to YouTube. You may choose to keep the video private so that only those with the link can view it, in which case only the instructors will view it. After uploading your video, all team members should provide the link in the corresponding Gradescope assignment.

Final Report

The second part of your final project is the final report. Please be sure that you have followed the guidelines below before making your submission:

You must format your report using this template, which follows the ICML (International Conference on Machine Learning) guidelines. Please do not modify this template in any way; we will not consider final reports that do not follow these formatting guidelines.
There is no minimum length requirement, but the maximum length is 8 pages. As with the project video, this is a strict requirement; we will not grade a final report that is more than 8 pages long.

Once you have completed your report, all team members should submit it to the corresponding Gradescope assignment.

Grading Criteria

The initial proposal is not graded. Half of your grade will be based on the project video, and the other half will be based on the final report. The video and the final report will be graded with 4 criteria:

Relevance: Your project should be related to machine learning.
Usefulness: Your work should seek to answer an interesting question.
Soundness: You should employ good ML and academic practices throughout your work.
Clarity: The material you present should be clear and well-organized.

Project Ideas

The ideas in this section are meant to provide inspiration; you can take these ideas, derive similar projects from them, or do something completely different (while following the guidelines discussed previously). The ideas in this list fall mainly under the fourth category of practical research. If you prefer to critically revisit a published paper, simply pick a paper of interest to you. If you prefer to conduct a literature review, simply pick a machine learning topic that interests you. If you prefer to conduct theoretical research, you'd better already know what you're doing.

Sara's suggestions:

Find a dataset that interests you from Kaggle or the UC Irvine Machine Learning Repository and compare the performance and computational requirements of the ML models discussed in this class. You could also use these datasets to conduct a project on feature engineering.
Explore methods to make the models discussed in this course more explainable, interpretable, trustworthy, or robust. There is much work on Explainable Artificial Intelligence (XAI) models, calibrating Neural Network (NN) models, responding to distributional shift, etc., which may be interesting to explore.

Fred's suggestions:

Apply LoRA Fine-Tuning to Your Favorite LLM from Hugging Face. Low-Rank Adaptation (LoRA) is an efficient technique for fine-tuning by introducing trainable low-rank matrices, significantly reducing the number of trainable parameters. Your task is to apply LoRA fine-tuning to a preferred LLM from Hugging Face's model hub. Experiment with different LoRA ranks and analyze their impact on performance, speed, and memory usage. This will help you understand the trade-offs involved in parameter-efficient fine-tuning.
Train a VQ-VAE on a Simple Dataset Such as CIFAR and Explore the Advantage of Adding Adversarial Loss. Vector Quantized Variational Autoencoders (VQ-VAEs) are powerful models for learning discrete latent representations. Benchmark the model's performance using both the training objective and the FID score to evaluate the quality of generated images.
Quantize One of Your Favorite Models to 8-bit or Even 4-bit and Compare the Performance Difference Between the Quantized Model and the Original Model. Model quantization reduces the numerical precision of model weights, leading to smaller model sizes and faster inference times. Choose a model you frequently use and quantize it to 8-bit and 4-bit representations. Compare the performance, accuracy, and resource utilization between the quantized models and the original.
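Two of the trade-offs mentioned in the ideas above can be estimated on paper before touching a real model. The sketch below is only an illustration in plain Python, not tied to any particular library: `lora_trainable_params` counts the parameters of a rank-r update to a single weight matrix, and `quantize_symmetric` is one simple (assumed) uniform, per-tensor quantization scheme. The function names, layer sizes, and toy weights are all hypothetical; real toolkits typically use more refined schemes.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in a low-rank update B @ A, with B: d_out x rank and A: rank x d_in."""
    return rank * (d_out + d_in)


def quantize_symmetric(weights, bits):
    """Uniform symmetric quantization to signed integer codes in [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


def mean_abs_error(a, b):
    """Average absolute difference between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Hypothetical 4096 x 4096 projection matrix.
    full = 4096 * 4096
    for rank in (4, 8, 16, 64):
        lora = lora_trainable_params(4096, 4096, rank)
        print(f"rank={rank}: {lora} trainable params ({lora / full:.2%} of full)")

    # Toy weights standing in for a real tensor.
    weights = [(-1) ** i * (i + 1) / 10 for i in range(10)]
    for bits in (8, 4):
        codes, scale = quantize_symmetric(weights, bits)
        err = mean_abs_error(weights, dequantize(codes, scale))
        print(f"{bits}-bit: mean abs reconstruction error = {err:.5f}")
```

For an actual project you would measure task accuracy, latency, and memory on the real model; counts and reconstruction errors like these are only a first sanity check.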

Grace's suggestions:

Train a tiny generative pre-trained transformer (GPT). You can follow Karpathy's Zero to Hero tutorial.
Investigate scaling laws. You can try your hand at reproducing compute vs. loss curves with this open-source implementation. You can also investigate scaling laws for different relationships, like number of samples from a diffusion model vs. FID. Fit a curve to your data and assess how well it predicts held-out points.
Train a sparse autoencoder (SAE). Run your SAE on a huge dataset and check what interesting clusters emerge. You can also check out already trained SAEs for Gemma 2.
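For the scaling-law idea above, the "fit a curve and check held-out points" step might look like the following minimal sketch. It assumes a pure power law y = a * x^b and fits it by least squares in log-log space; the data are synthetic and the function names are hypothetical. Published scaling-law fits often add an irreducible-loss term, which this sketch leaves out.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b in log-log space; returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                       # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)   # intercept in log-log space = log(a)
    return a, b


def predict(a, b, x):
    """Evaluate the fitted power law at x."""
    return a * x ** b


if __name__ == "__main__":
    # Synthetic (compute, loss) pairs generated from a known power law.
    train_x = [1.0, 10.0, 100.0, 1000.0]
    train_y = [3.0 * x ** -0.5 for x in train_x]
    a, b = fit_power_law(train_x, train_y)

    # Held-out point at a larger scale than anything used in the fit.
    held_out_x = 10000.0
    held_out_y = 3.0 * held_out_x ** -0.5
    rel_err = abs(predict(a, b, held_out_x) - held_out_y) / held_out_y
    print(f"a={a:.4f}, b={b:.4f}, held-out relative error={rel_err:.2e}")
```

With real, noisy measurements the held-out error will not be near zero, and how quickly it grows as you extrapolate further is itself a useful result to report.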

Other (potentially outdated) ideas:

Try your hand at the distribution shift challenge WILDS. Can you improve upon the basic ERM framework by leveraging diverse datasets?
The method of integrated gradients is an attribution technique that attempts to attribute some fraction of the decision made by a neural network to individual features of the input. Implement integrated gradients and briefly evaluate it.
Businesses love to promote themselves on Yelp by leaving fake positive reviews. With this dataset, predict which reviews are likely to have been written by the business itself.
Browsers typically download files to the Downloads folder (or another fixed, set folder). Develop a method for automatically placing files in the appropriate folder. You can constrain yourself to text documents, or to images.
Image captioning using RNN/LSTM is also an important topic. Try to generate meaningful text from images/videos. For example, take a look at the COCO Captioning Challenge.
Can you train a fine-grained image classifier? There are many specialized datasets, like this one for dog breeds. This can be applied with the project above or in isolation. For architectures, see this paper.
Reinforcement Learning tackles sequential decision-making problems where the only information about the system is received through interaction. A great resource to understand RL algorithms is Spinning Up. Pick your favorite algorithm from there and apply it to a more complex simulated environment (e.g., MuJoCo, which is free for students).
If you want harder problems, involving various aspects of ML, you can also check this: OpenAI Requests for Research
Fake news 1. Can you train a system to decide if two articles are related and agree? The Fake News Challenge gives access to a dataset for this. You can propose your own solution and see if you get close to the winners!
Fake news 2. Can you classify articles by bias and factuality? There is a dataset (and SVM algorithm) built for this purpose. Their code is also available, so you should attempt a substantial innovation.
Can you fool a classifier with a real object? There are works that make traffic sign classification systems (trained on the LISA dataset) predict that a stop sign is a 45 mph speed limit sign. Others make a classifier predict that a 3D-printed turtle is a rifle.
Can you visualize the features learned by a Deep Neural Network? When training Deep Neural Networks, the hidden state at each layer can be understood as features or representations useful to perform the desired task. One tool for visualizing these features is guided backpropagation, and more recently other dataset-wide visualizations have been proposed. See this blog post.
Dependently typed programming languages, such as Idris, provide safety guarantees that are not available in the more mainstream languages. Can any of these features be used to improve the data analytic stack, or developer ergonomics?
Try your hand at a computer vision challenge with one of these satellite image datasets. Can you predict building footprints, segment out roads, or even generate a map from a satellite image?
Ever wish you could read someone's mind? OpenNeuro hosts a variety of fMRI, EEG, ECoG, etc. data.
Can you transcribe music directly from audio files? MusicNet provides a curated collection of labeled classical music.
The price of Bitcoin has been erratically rising and falling over the past couple of years. Can you build a model to predict the price of Bitcoin? Try your hand at this dataset.
Can you predict a person’s Myers–Briggs personality type from the content they post on social media? This dataset makes for a fun challenge.
Can you identify questions that have the same intent? Try out this dataset of questions posted on Quora.
Can you identify the genre of a song from its spectrogram or other audio features? This dataset provides labeled audio tracks for classification.
Build and test a computer vision model that performs image-to-image translation with adversarial networks.
Play with offline reinforcement learning and the dataset d4rl: A benchmark for offline reinforcement learning.
Sentiment analysis of written sentences, for example on movie reviews from the Rotten Tomatoes dataset.
Play with robotic imitation learning in simulation, for example with robosuite: A Modular Simulation Framework and Benchmark for Robot Learning.
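As a starting point for the integrated gradients idea in the list above, here is a minimal sketch under simplifying assumptions: the "network" is a single logistic unit with a hand-written gradient, the baseline is the all-zeros input, and the path integral is approximated with a midpoint Riemann sum. All names, weights, and inputs are hypothetical; a real project would use a framework's automatic differentiation on an actual model.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy model: a single logistic unit F(x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def model_grad(x, w, b):
    """Analytic gradient of F with respect to the input x."""
    s = model(x, w, b)
    return [s * (1.0 - s) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate IG_i = (x_i - x'_i) * integral over alpha in [0, 1] of dF/dx_i
    evaluated along the straight line from the baseline x' to the input x."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = model_grad(point, w, b)
        total = [t + g for t, g in zip(total, grad)]
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]


if __name__ == "__main__":
    w, b = [2.0, -1.0, 0.5], 0.1
    x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
    attributions = integrated_gradients(x, baseline, w, b)
    print("attributions:", attributions)
    # Completeness check: attributions should sum to F(x) - F(baseline).
    print("sum:", sum(attributions))
    print("F(x) - F(baseline):", model(x, w, b) - model(baseline, w, b))
```

The completeness check at the end is a cheap way to validate an implementation before evaluating it on a real network: if the attributions do not sum to roughly F(x) - F(baseline), either the gradient or the integration step is off.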

More inspiration:

Also, for inspiration, here are some of the final projects from a Neural Networks class at Stanford.