## Toward the Fundamental Limits of Imitation LearningWith In sequential decision-making problems, Imitation learning (IL) posits to learn from the behavior of an expert given a collection of demonstrations, in the absence of reward feedback. In the past, several works have studied the performance of supervised learning approaches such as behavior cloning (Ross and Bagnell 2010), interactive approaches such as DAgger (Ross and Bagnell 2011) among others. However, from a theoretical point of view the statistical rates in even the simplest setting of episodic and tabular MDPs was not well understood. In this paper, we show that when the learner is provided a dataset of N expert trajectories ahead of time, and cannot interact with the MDP, behavior cloning under log-loss is in fact \(O( S H^2 \log(N) / N)\) suboptimal compared to the value collected by the expert, even when the expert follows an arbitrary stochastic policy. Here, \(S\) is the state space and \(H\) is the length of the horizon. If the expert is constrained to be deterministic, this rate is in fact optimal up to logarithmic factors, The quadratic dependence on the horizon \(H\) is known as error-compounding in the literature and is often attributed to the problem of At first glance, knowledge of the MDP transition may not seem helpful as the learner does not know the actions played by the expert at states unvisited in the dataset. At best the learner can now distinguish between the distributions induced under different actions. However, we show that this view is rather myopic - in the presence of a good model, learners can abundantly generate artificial training data by rolling out the expert policy on known states. This motivates our novel algorithm, Mimic-MD which incurs a suboptimality \(O(S H^{3/2}/N)\) when the model is known and the expert is deterministic. This breaks the error-compounding barrier even in the worst case and shows a provable benefit to interacting with the MDP in terms of reducing the sample complexity requirement of the number of expert demonstrations. In the backdrop of this algorithm, we introduce a formal reduction of Imitation Learning with a known model to the problem of |