Complex controls are increasingly common in power systems, and reinforcement learning (RL) has emerged as a strong candidate for implementing them. One common use of RL in this context is prosumer pricing aggregation: supply and demand data collected from many microgrid controllers are continually aggregated by a central provider, which uses online RL to learn an optimal pricing policy.
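To make this setup concrete, the following is a minimal sketch of such an aggregation loop. The toy environment, the candidate price grid, and the epsilon-greedy bandit learner are illustrative assumptions (including the names `N_CONTROLLERS`, `PRICES`, and `controller_reports`), not any actual provider's system: each controller reports supply and demand under a posted price, and the aggregator learns online which price minimizes grid imbalance.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CONTROLLERS = 50                     # hypothetical fleet size
PRICES = np.linspace(0.05, 0.50, 10)   # assumed candidate price levels ($/kWh)

def controller_reports(price):
    """Toy model of each controller's (supply, demand) report:
    higher prices raise supply and suppress demand."""
    supply = rng.normal(1.0 + 2.0 * price, 0.1, N_CONTROLLERS)
    demand = rng.normal(2.0 - 2.0 * price, 0.1, N_CONTROLLERS)
    return supply, demand

def reward(price):
    """Aggregator reward: negative squared supply-demand imbalance."""
    supply, demand = controller_reports(price)
    imbalance = supply.mean() - demand.mean()
    return -imbalance ** 2

# Online learning: epsilon-greedy bandit over the candidate prices.
q = np.zeros(len(PRICES))       # running reward estimate per price
counts = np.zeros(len(PRICES))  # pulls per price
for t in range(1, 5001):
    eps = 1.0 / np.sqrt(t)      # decaying exploration rate
    a = rng.integers(len(PRICES)) if rng.random() < eps else int(q.argmax())
    r = reward(PRICES[a])
    counts[a] += 1
    q[a] += (r - q[a]) / counts[a]  # incremental mean update

print(f"learned price: {PRICES[q.argmax()]:.2f} $/kWh")
```

In this toy model the market clears at the price where aggregate supply equals aggregate demand, and the learner converges to it; the key point for what follows is that the learner's input is an average over all controllers' reports.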
While RL is known to be effective for this task, it comes with potential vulnerabilities. What happens when some of the microgrid controllers are compromised by a malicious entity? We demonstrate that if the data from even a small fraction of microgrid controllers are adversarially perturbed, the RL aggregator's learning can be significantly slowed; with larger perturbations, the aggregator can be steered toward a catastrophic pricing policy. We complement these findings with a robust RL algorithm that remains optimal even in the presence of such adversaries.
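The following sketch, in the same toy setting as above, illustrates both the threat model and the flavor of a robust defense. The perturbation model (a fraction `ALPHA` of controllers over-reporting demand) and the trimmed-mean aggregation are illustrative stand-ins chosen for clarity; they are not the specific attack or the robust RL algorithm studied in this work.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 50
ALPHA = 0.1  # assumed fraction of compromised controllers

def honest_demand(price):
    """Honest demand reports under the posted price (toy model)."""
    return rng.normal(2.0 - 2.0 * price, 0.1, N)

def perturbed_demand(price, scale=5.0):
    """An adversary controlling an ALPHA-fraction of controllers
    inflates their demand reports to bias the aggregate upward."""
    d = honest_demand(price)
    k = int(ALPHA * N)
    d[:k] += scale  # compromised controllers over-report demand
    return d

def trimmed_mean(x, trim=0.2):
    """Robust aggregation: discard the extreme `trim` fraction
    at each tail before averaging."""
    x = np.sort(x)
    k = int(trim * len(x))
    return x[k:len(x) - k].mean()

price = 0.25
d = perturbed_demand(price)
print(f"naive mean:   {d.mean():.3f}")        # pulled toward the adversary
print(f"trimmed mean: {trimmed_mean(d):.3f}")  # close to the honest value
```

Because the naive aggregator averages all reports, even a 10% compromised fraction shifts its demand estimate, and hence the price the learner converges to; a robust aggregation rule bounds that shift, which is the intuition behind the guarantees we establish for our robust RL algorithm.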