Clustering-Based Active Learning for CPSGrader

Garvit Juniwal, Sakshi Jain, Alexandre Donzé, and Sanjit A. Seshia. Clustering-Based Active Learning for CPSGrader. In Proceedings of the Second ACM Conference on Learning @ Scale (L@S), pp. 399–403, March 2015.

Abstract

In this work, we propose and evaluate an active learning algorithm in the context of CPSGrader, an automatic grading and feedback generation tool for laboratory-based courses in cyber-physical systems. CPSGrader detects certain classes of mistakes using test benches that are generated in part via machine learning from solutions that have the fault and solutions that do not (positive and negative examples, respectively). We develop a clustering-based active learning technique that selects, from a large database of unlabeled solutions, a small number of reference solutions for an expert to label; these labeled solutions are then used as training data. The goal is to achieve better fault-identification accuracy with fewer reference solutions than random selection. We demonstrate the effectiveness of our algorithm using data from an on-campus laboratory-based course at UC Berkeley.
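
The general idea of clustering-based selection can be illustrated with a minimal sketch. This is not the algorithm from the paper: it assumes each unlabeled solution has already been reduced to a numeric feature vector, uses plain k-means as a stand-in clustering method, and picks the member closest to each centroid as the solution to hand to the expert. The names `kmeans`, `select_for_labeling`, and `budget` are hypothetical and chosen only for this illustration.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length float tuples.

    Assumption: k-means stands in for whatever clustering the real tool uses.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, clusters


def select_for_labeling(points, budget, seed=0):
    """Pick at most `budget` representatives: the member closest to each centroid."""
    centroids, clusters = kmeans(points, budget, seed=seed)
    return [
        min(members, key=lambda p: math.dist(p, centroid))
        for centroid, members in zip(centroids, clusters)
        if members
    ]


# Hypothetical feature vectors for eight unlabeled solutions.
unlabeled = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
]
to_label = select_for_labeling(unlabeled, budget=2)
print(to_label)
```

With a labeling budget of two, the sketch returns one solution from each of the two well-separated groups, whereas two randomly drawn solutions could both come from the same group. That difference is the intuition behind comparing clustering-based selection against random selection.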

BibTeX

@inproceedings{juniwal-las15,
  author    = {Garvit Juniwal and
               Sakshi Jain and
               Alexandre Donz{\'{e}} and
               Sanjit A. Seshia},
  title     = {Clustering-Based Active Learning for {CPSGrader}},
  booktitle = {Proceedings of the Second {ACM} Conference on Learning @ Scale (L@S)},
  pages     = {399--403},
  month     = {March},
  year      = {2015},
  abstract  = {In this work, we propose and evaluate an active
               learning algorithm in the context of CPSGrader, an
               automatic grading and feedback generation tool for
               laboratory-based courses in cyber-physical systems.
               CPSGrader detects certain classes of mistakes using
               test benches that are generated in part via machine
               learning from solutions that have the fault and
               solutions that do not (positive and negative examples,
               respectively). We develop a clustering-based active
               learning technique that selects, from a large database
               of unlabeled solutions, a small number of reference
               solutions for an expert to label; these labeled
               solutions are then used as training data. The goal is
               to achieve better fault-identification accuracy with
               fewer reference solutions than random selection. We
               demonstrate the effectiveness of our algorithm using
               data from an on-campus laboratory-based course at
               UC Berkeley.},
}