Control and Reinforcement
Is Q-learning provably efficient?
C. Jin, Z. Allen-Zhu, S. Bubeck, and M. I. Jordan.
arxiv.org/abs/1807.03765, 2018.
Ray: A distributed framework for emerging AI applications.
P. Moritz, R. Nishihara, S. Wang, A. Tumanov, R. Liaw, E. Liang,
W. Paul, M. I. Jordan, and I. Stoica.
arxiv.org/abs/1712.05889, 2017.
High-dimensional continuous control using generalized advantage estimation.
J. Schulman, P. Moritz, S. Levine, M. I. Jordan, and P. Abbeel.
International Conference on Learning Representations (ICLR),
Puerto Rico, 2016.
Trust region policy optimization.
J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and P. Abbeel.
In F. Bach and D. Blei (Eds.),
International Conference on Machine Learning (ICML),
New York: ACM Press, 2015.
Optimism-driven exploration for nonlinear systems.
T. Moldovan, S. Levine, M. I. Jordan, and P. Abbeel.
In IEEE International Conference on Robotics and Automation (ICRA),
Seattle, WA, 2015.
Mixed membership models for time series.
E. Fox and M. I. Jordan.
arXiv:1309.3533, 2013.
Particle Gibbs with ancestor sampling.
F. Lindsten, M. I. Jordan, and T. Schön.
Journal of Machine Learning Research,
15, 2145-2184, 2014.
Bayesian semiparametric Wiener system identification.
F. Lindsten, T. Schön, and M. I. Jordan.
Automatica, 49, 2053-2063, 2013.
The SCADS Director: Scaling a distributed storage system under stringent
performance requirements.
B. Trushkowsky, P. Bodik, A. Fox, M. Franklin, M. I. Jordan, and D. Patterson.
In 9th USENIX Conference on File and Storage Technologies (FAST '11),
San Jose, CA, 2011.
Bayesian inference for queueing networks and modeling of Internet services.
C. Sutton and M. I. Jordan.
Annals of Applied Statistics, 5, 254-282, 2011.
Bayesian nonparametric inference of switching linear dynamical models.
E. Fox, E. Sudderth, M. I. Jordan, and A. Willsky.
IEEE Transactions on Signal Processing, 59, 1569-1585, 2011.
Bayesian nonparametric methods for learning Markov switching processes.
E. Fox, E. Sudderth, M. I. Jordan, and A. Willsky.
IEEE Signal Processing Magazine, 27, 43-54, 2010.
Nonparametric Bayesian identification of jump systems with sparse dependencies.
E. Fox, E. Sudderth, M. I. Jordan, and A. Willsky.
15th IFAC Symposium on System Identification (SYSID), St. Malo, France, 2009.
Nonparametric Bayesian learning of switching linear dynamical systems.
E. B. Fox, E. Sudderth, M. I. Jordan, and A. S. Willsky.
In D. Koller, Y. Bengio, D. Schuurmans, and L. Bottou (Eds.),
Advances in Neural Information Processing Systems (NIPS) 22, 2009.
Kalman filtering with intermittent observations.
B. Sinopoli, L. Schenato, M. Franceschetti, K. Poolla,
M. I. Jordan, and S. Sastry.
IEEE Transactions on Automatic Control, 49, 1453-1464, 2004.
Autonomous helicopter flight via reinforcement learning.
A. Y. Ng, H. J. Kim, M. I. Jordan, and S. Sastry.
In S. Thrun, L. Saul, and B. Schölkopf (Eds.),
Advances in Neural Information Processing Systems (NIPS) 17, 2004.
A minimal intervention principle for coordinated movement.
E. Todorov and M. I. Jordan.
In S. Becker, S. Thrun, and K. Obermayer (Eds.),
Advances in Neural Information Processing Systems (NIPS) 16, 2003.
Optimal feedback control as a theory of motor coordination.
E. Todorov and M. I. Jordan.
Nature Neuroscience, 5, 1226-1235, 2002.
Random sampling of a continuous-time stochastic dynamical system.
M. Micheli and M. I. Jordan.
Proceedings of the Fifteenth International Symposium on Mathematical Theory
of Networks and Systems, 2002.
PEGASUS: A policy search method for large MDPs and POMDPs.
A. Y. Ng and M. I. Jordan.
Uncertainty in Artificial Intelligence (UAI),
Proceedings of the Sixteenth Conference, 2000.
Computational motor control.
M. I. Jordan and D. M. Wolpert.
In M. Gazzaniga (Ed.), The Cognitive Neurosciences, 2nd edition,
Cambridge: MIT Press, 1999.
Computational aspects of motor control and motor learning.
M. I. Jordan.
In H. Heuer and S. Keele (Eds.), Handbook of Perception and Action:
Motor Skills, New York: Academic Press, 1996.
Reinforcement learning by probability matching.
P. N. Sabes and M. I. Jordan.
In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo (Eds.),
Advances in Neural Information Processing Systems (NIPS) 9,
Cambridge, MA: MIT Press, 1996.
Markov mixtures of experts.
M. Meila and M. I. Jordan.
In D. Touretzky, M. Mozer, and M. Hasselmo (Eds.), Advances in Neural
Information Processing Systems (NIPS) 9, MIT Press, 1996.
Reinforcement learning algorithm for partially observable Markov
decision problems.
T. S. Jaakkola, S. P. Singh, and M. I. Jordan.
In G. Tesauro, D. S. Touretzky, and T. K. Leen (Eds.),
Advances in Neural Information Processing Systems (NIPS) 8,
Cambridge, MA: MIT Press, 1995.
An internal forward model for sensorimotor integration.
D. M. Wolpert, Z. Ghahramani, and M. I. Jordan.
Science, 269, 1880--1882, 1995.
Reinforcement learning with soft state aggregation.
S. P. Singh, T. S. Jaakkola, and M. I. Jordan.
In G. Tesauro, D. S. Touretzky, and T. K. Leen (Eds.),
Advances in Neural Information Processing Systems (NIPS) 8,
Cambridge, MA: MIT Press, 1995.
The moving basin: Effective action-search in adaptive control.
W. Fun and M. I. Jordan.
Proceedings of the World Conference on Neural Networks,
Washington, DC, 1995.
Learning without state estimation in partially observable Markovian decision
processes.
S. P. Singh, T. S. Jaakkola, and M. I. Jordan.
Machine Learning: Proceedings of the Eleventh International Conference,
San Mateo, CA: Morgan Kaufmann, 284--292, 1994.
On the convergence of stochastic iterative dynamic programming algorithms.
T. Jaakkola, M. I. Jordan, and S. Singh.
Neural Computation, 6, 1183--1190, 1994.
Forward models: Supervised learning with a distal teacher.
M. I. Jordan and D. E. Rumelhart. Cognitive Science, 16, 307-354, 1992.
Learning piecewise control strategies in a modular neural network architecture.
R. A. Jacobs and M. I. Jordan.
IEEE Transactions on Systems, Man, and Cybernetics, 23,
337--345, 1993.
Optimal control: A foundation for intelligent control.
D. A. White and M. I. Jordan.
In D. A. White and D. A. Sofge (Eds.), Handbook of Intelligent Control,
Amsterdam: Van Nostrand, 1992.
A modular connectionist architecture for learning piecewise control strategies.
R. A. Jacobs and M. I. Jordan.
Proceedings of the 1991 American Control Conference,
Boston, MA, pp. 343--351, 1991.
Learning to control an unstable system with forward modeling.
M. I. Jordan and R. A. Jacobs.
In D. Touretzky (Ed.),
Advances in Neural Information Processing Systems (NIPS) 3,
San Mateo, CA: Morgan Kaufmann, pp. 324--331, 1990.
Learning inverse mappings with forward models.
M. I. Jordan.
In K. S. Narendra (Ed.), Proceedings of the Sixth Yale Workshop
on Adaptive and Learning Systems, New York: Plenum Press, 1990.