Transfer Learning: List of possibly relevant papers

[Ando and Zhang, 2004] Rie K. Ando and Tong Zhang (2004). A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data. Technical Report RC23462, IBM T.J. Watson Research Center.
[Andre and Russell, 2002] Andre, D. and Russell, S. J. (2002). State abstraction for programmable reinforcement learning agents. In Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI-02), pages 119-125, Edmonton, Alberta. AAAI Press.
[Baxter, 1997] Baxter, J. (1997). A Bayesian/information theoretic model of learning to learn via multiple task sampling. Machine Learning, 28:7.
[Baxter, 2000] Baxter, J. (2000). A model of inductive bias learning. JAIR, 12:149-198.
[Ben-David and Schuller, 2003] Ben-David, S. and Schuller, R. (2003). Exploiting task relatedness for multiple task learning. In Proc. COLT, pages 567-580.
[Berger, 1985] Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis. Springer-Verlag, New York.
[Boutilier et al., 2001] Craig Boutilier, Ray Reiter and Bob Price (2001). Symbolic Dynamic Programming for First-order MDPs. In Proc. Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-01), Seattle, pp. 690-697.
[Carota and Parmigiani, 2002] Carota, C. and Parmigiani, G. (2002). Semiparametric regression for count data. Biometrika, 89, 265-281. (A 1997 technical report version is also available.)
[Caruana, 1997] Caruana, R. (1997). Multitask learning. Machine Learning, 28:41-75.
[Dietterich, 2000] Dietterich, T. (2000). Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13:227-303.
[Dietterich et al., 2002] Dietterich, T. G., Busquets, D., Lopez de Mantaras, R., and Sierra, C. (2002). Action refinement in reinforcement learning by probability smoothing. In Proc. ICML, pages 107-114.
[Dzeroski et al., 2001] S. Dzeroski, L. De Raedt, and K. Driessens (2001). Relational reinforcement learning. Machine Learning, 43, 7-52.
[Evgeniou and Pontil, 2004] Theodoros Evgeniou and Massimiliano Pontil (2004). Regularized multi-task learning. In Proc. 17th SIGKDD Conf. on Knowledge Discovery and Data Mining.
[Evgeniou et al., 2005] Theodoros Evgeniou, Charles Micchelli, and Massimiliano Pontil (2005). Learning multiple tasks with kernel methods. J. Machine Learning Research, 6:615-637.
[Gelman et al., 1995] Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (1995). Bayesian Data Analysis. Chapman and Hall.
[George and McCulloch, 1993] Ed George and Robert McCulloch (1993). Variable selection by Gibbs sampling. JASA, 88, 881-889.
[Griffin and Brown, 2005] J.E. Griffin and P.J. Brown (2005). Alternative prior distributions for variable selection with very many more variables than observations. Technical report, Dept. of Statistics, University of Warwick.
[Hengst, 2002] Hengst, B. (2002). Discovering Hierarchy in Reinforcement Learning with HEXQ. In Proceedings of the 19th International Conference on Machine Learning, pages 243-250.
[Heskes, 2000] Heskes, T. (2000). Empirical Bayes for learning to learn. In Proc. ICML, pages 367-374.
[Hofmann and Puzicha, 1999] Hofmann, T. and Puzicha, J. (1999). Latent class models for collaborative filtering. In Proc. IJCAI, pages 688-693.
[Intrator and Edelman, 1996] Intrator, N. and Edelman, S. (1996). How to make a low-dimensional representation suitable for diverse tasks. Connection Science, 8:205-224.
[Kass and Steffey, 1989] Kass, R. and Steffey, D. (1989). Approximate Bayesian inference in conditionally independent hierarchical models. JASA, 84, 717-726.
[MacEachern, 1999] MacEachern, S.N. (1999). Dependent Nonparametric Processes. In ASA Proceedings of the Section on Bayesian Statistical Science, Alexandria, VA: American Statistical Association, pp. 50-55.
[MacEachern et al., 2001] MacEachern, S., Kottas, A., and Gelfand, A. (2001). Spatial Nonparametric Bayesian Models. Technical Report 01-10, Institute of Statistics and Decision Sciences, Duke University.
[Maurer, 2005a] Andreas Maurer (2005). Algorithmic stability and meta-learning. JMLR, 6:967-994.
[Maurer, 2005b] Andreas Maurer (2005). Bounds for linear multi-task learning. Technical report.
[McGovern and Barto, 2001] McGovern, A. and Barto, A. G. (2001). Automatic discovery of subgoals in reinforcement learning using diverse density. In Proc. ICML, pages 361-368.
[Micchelli and Pontil, 2004] Charles Micchelli and Massimiliano Pontil (2004). Kernels for multi-task learning. Proc. NIPS 18.
[Muller et al., 2004] Muller, P., Quintana, F. and Rosner, G. (2004). A method for combining inference across related nonparametric Bayesian models. Journal of the Royal Statistical Society, Series B, 66(3), 735-749.
[Ng et al., 2004] Ng, A. Y., Coates, A., Diel, M., Ganapathi, V., Schulte, J., Tse, B., Berger, E., and Liang, E. (2004). Inverted autonomous helicopter flight via reinforcement learning. In International Symposium on Experimental Robotics.
[O'Sullivan et al., 1997] O'Sullivan, J., Mitchell, T., and Thrun, S. (1997). Explanation-based neural network learning for mobile robot perception. In Symbolic Visual Learning. Oxford University Press.
[Parr and Russell, 1998] Parr, R. and Russell, S. J. (1998). Reinforcement learning with hierarchies of machines. In Jordan, M. I., Kearns, M., and Solla, S. A., editors, Advances in Neural Information Processing Systems 10. MIT Press, Cambridge, Massachusetts.
[Sammut and Banerji, 1986] Sammut, C. A. and Banerji, R. B. (1986). Learning concepts by asking questions. In Machine Learning: An AI Approach, volume 2, pages 167-192.
[Silver and Mercer, 1998] Silver, D. and Mercer, R. (1998). The parallel transfer of task knowledge using dynamic learning rates based on a measure of relatedness. Connection Science, 8(2):277-294.
[Silver and Mercer, 2001] Silver, D. and Mercer, R. (2001). Selective Functional Transfer: Inductive Bias from Related Tasks. In Proceedings of the IASTED International Conference on Artificial Intelligence and Soft Computing (ASC2001), Cancun, Mexico, May 2001, M.H. Hamza (Ed.), ACTA Press, pp. 182-189.
[Simard et al., 1992] Simard, P., Victorri, B., Le Cun, Y., and Denker, J. (1992). Tangent Prop: A formalism for specifying selected invariances in an adaptive network. In Advances in Neural Information Processing Systems 4, pages 895-903.
[Singh and Cohn, 1998] Singh, S. and Cohn, D. (1998). How to dynamically merge Markov decision processes. In Advances in Neural Information Processing Systems, volume 10.
[Sudderth et al., 2005a] E. Sudderth, A. Torralba, W. Freeman, & A. Willsky (2005). Learning hierarchical models of scenes, objects, and parts. Proc. ICCV-05.
[Sudderth et al., 2005b] E. Sudderth, A. Torralba, W. Freeman, & A. Willsky (2005). Describing visual scenes using transformed Dirichlet processes. Proc. NIPS-05.
[Sutton and McCallum, 2005] Sutton, C. and McCallum, A. (2005). Composition of conditional random fields for transfer learning. Technical report, University of Massachusetts, Amherst, MA.
[Sutton et al., 1999] Sutton, R., Precup, D., and Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181-211.
[Tadepalli et al., 2004] Prasad Tadepalli, Robert Givan, and Kurt Driessens (2004). Relational Reinforcement Learning: An Overview. In Proc. ICML-04 Workshop on Relational Reinforcement Learning.
[Taskar et al., 2001] Taskar, B., Segal, E., and Koller, D. (2001). Probabilistic clustering in relational data. In Proc. IJCAI, pages 870-876.
[Teh et al., 2005] Y. W. Teh, M. I. Jordan, M. J. Beal and D. M. Blei (2005). Hierarchical Dirichlet processes. In Advances in Neural Information Processing Systems (NIPS) 17. (See also the extended technical report.)
[Thrun, 1996] Thrun, S. (1996). Is learning the n-th thing any easier than learning the first? In Advances in Neural Information Processing Systems, volume 8, pages 640-646.
[Wang and George, 2004] Xinlei Wang and Edward I. George (2004). A Hierarchical Bayes Approach to Variable Selection for Generalized Linear Models. Technical report SMU-TR-321, Department of Statistics, Southern Methodist University.
[Wu and Dietterich, 2004] Wu, P. and Dietterich, T. G. (2004). Improving SVM accuracy by training on auxiliary data sources. In Proc. ICML, pages 871-878.