TY - GEN

T1 - Decentralized Bayesian Reinforcement Learning for Online Agent Collaboration

AU - Teacy, W. T. L.

AU - Chalkiadakis, G.

AU - Farinelli, A.

AU - Rogers, A.

AU - Jennings, N. R.

AU - McClean, S.

AU - Parr, G.

N1 - Reference text: [1] S. M. Aji and R. J. McEliece. The generalized distributive law. IEEE Transactions on Information Theory, 46(2):325–343, 2000.
[2] D. S. Bernstein, S. Zilberstein, and N. Immerman. The complexity of decentralized control of Markov decision processes. In Proceedings of UAI-2000, pages 32–37, 2000.
[3] C. Boutilier. Sequential optimality and coordination in multiagent systems. In Proceedings of IJCAI-99, pages 478–485, 1999.
[4] C. Boutilier, T. Dean, and S. Hanks. Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11:1–94, 1999.
[5] G. Chalkiadakis and C. Boutilier. Sequentially optimal repeated coalition formation under uncertainty. Autonomous Agents and Multi-Agent Systems, 24(3):441–484, 2012.
[6] R. Dearden, N. Friedman, and D. Andre. Model-based Bayesian exploration. In Proceedings of UAI-99, 1999.
[7] R. Dearden, N. Friedman, and S. Russell. Bayesian Q-learning. In Proceedings of AAAI-98, 1998.
[8] M. DeGroot and M. Schervish. Probability & Statistics. Pearson Education, 3rd edition, 2002.
[9] A. Farinelli, A. Rogers, A. Petcu, and N. R. Jennings. Decentralised coordination of low-power embedded devices using the max-sum algorithm. In Proceedings of AAMAS 2008, pages 639–646, 2008.
[10] C. Guestrin, D. Koller, and R. Parr. Max-norm projections for factored MDPs. In Proceedings of AAAI-01, pages 673–680, 2001.
[11] C. Guestrin, M. Lagoudakis, and R. Parr. Coordinated reinforcement learning. In Proceedings of ICML-02, pages 227–234, 2002.
[12] H. J. Kim. Moments of truncated Student-t distribution. Journal of the Korean Statistical Society, 37:81–87, 2008.
[13] J. R. Kok and N. Vlassis. Collaborative multiagent reinforcement learning by payoff propagation. Journal of Machine Learning Research, 7:1789–1828, 2006.
[14] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2):498–519, 2001.
[15] J. Martin. Bayesian Decision Problems and Markov Chains. Wiley, 1967.
[16] R. Nair, P. Varakantham, M. Tambe, and M. Yokoo. Networked distributed POMDPs: A synthesis of distributed constraint optimization and POMDPs. In Proceedings of AAAI-05, pages 133–139, 2005.
[17] D. Silver and J. Veness. Monte-Carlo planning in large POMDPs. In Advances in Neural Information Processing Systems 23, pages 2164–2172, 2010.
[18] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[19] Y. Weiss and W. T. Freeman. On the optimality of solutions of the max-product belief propagation algorithm in arbitrary graphs. IEEE Transactions on Information Theory, 47(2):723–735, 2001.

PY - 2012/6/4

Y1 - 2012/6/4

AB - Solving complex but structured problems in a decentralized manner via multiagent collaboration has received much attention in recent years. This is natural: on the one hand, multiagent systems usually possess a structure that determines the allowable interactions among the agents; on the other hand, the single most pressing need in a cooperative multiagent system is to coordinate the local policies of autonomous agents with restricted capabilities to serve a system-wide goal. The presence of uncertainty makes this even more challenging, as the agents face the additional need to learn the unknown environment parameters while forming (and following) local policies in an online fashion. In this paper, we provide the first Bayesian reinforcement learning (BRL) approach for distributed coordination and learning in a cooperative multiagent system, devising two solutions to this type of problem. More specifically, we show how the Value of Perfect Information (VPI) can be used to perform efficient decentralized exploration in both model-based and model-free BRL, and, in the latter case, provide a closed-form solution for VPI, correcting a decade-old result by Dearden, Friedman and Russell. To evaluate these solutions, we present experimental results comparing their relative merits, and demonstrate empirically that both solutions outperform an existing multiagent learning method that is representative of the state of the art.

KW - multiagent learning

KW - Bayesian techniques

KW - uncertainty

UR - http://www.ifaamas.org/

M3 - Conference contribution

BT - Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2012)

PB - International Foundation for Autonomous Agents and Multiagent Systems

ER -