DOI: 10.3724/SP.J.1004.2013.02021

Acta Automatica Sinica (自动化学报) 2013/39:12 PP.2021-2031

Tracking Learning Based on Gaussian Regression for Multi-agent Systems in Continuous Space

Improving adaptation, realizing generalization in continuous spaces, and reducing dimensionality are key issues in implementing multi-agent reinforcement learning (MARL) in continuous systems. To tackle them, this paper presents a learning mechanism and algorithm named model-based reinforcement learning with companions' policy tracking for multi-agent systems (MAS MBRL-CPT). Starting from the idea of making best responses to companions, a new expected immediate reward is defined that merges observations of the companions' policies into the payoff fed back from the environment; its value is estimated online by stochastic approximation. A dimension-reduced Q value function is then developed to set up a Markov decision process (MDP) for strategy learning in the multi-agent environment. Based on a model of the state transitions built by Gaussian regression, the Q values at the state-action samples used for generalization are solved by dynamic programming; these then serve as the basic samples for generalizing value functions and learned strategies. In simulations of a multi-cart-pole system in continuous space, even when the dynamics and the companions' strategies are unknown a priori, MBRL-CPT enables the learning agent to learn a tracking strategy for cooperating with its companions. The performance of MBRL-CPT shows its high efficiency and good generalization ability.
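The abstract's two core numerical ingredients, the online stochastic-approximation estimate of the expected immediate reward and the Gaussian-regression model of state transitions, can be sketched as follows. This is a minimal illustration only: the function names, step-size schedule, kernel choice, and toy data are assumptions for exposition, not the paper's implementation.

```python
import numpy as np

def sa_update(r_hat, r, alpha):
    """One Robbins-Monro stochastic-approximation step:
    r_hat <- r_hat + alpha * (r - r_hat)."""
    return r_hat + alpha * (r - r_hat)

def gp_posterior_mean(X, y, X_star, length_scale=1.0, noise=1e-2):
    """Posterior mean of 1-D Gaussian-process regression with a
    squared-exponential kernel (an assumed stand-in for the paper's
    Gaussian-regression transition model)."""
    def k(A, B):
        d = A[:, None] - B[None, :]
        return np.exp(-0.5 * (d / length_scale) ** 2)
    K = k(X, X) + noise * np.eye(len(X))
    return k(X_star, X) @ np.linalg.solve(K, y)

# Online estimate of an expected immediate reward from noisy payoffs.
rng = np.random.default_rng(0)
r_hat = 0.0
for t in range(1, 2001):
    r = 1.0 + rng.normal(0.0, 0.5)              # noisy payoff, true mean 1.0
    r_hat = sa_update(r_hat, r, alpha=1.0 / t)  # decreasing step size

# Gaussian-regression model of a toy one-dimensional state transition.
X = np.linspace(0.0, 3.0, 30)
y = np.sin(X)                                   # stand-in transition function
mean_pred = gp_posterior_mean(X, y, np.array([1.5]))[0]
```

With the 1/t step size, the stochastic-approximation estimate converges to the sample mean of the observed payoffs; the Gaussian-regression posterior mean provides the smooth transition model on which Q values at the sample points could then be solved by dynamic programming.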

Key words: Continuous state space, multi-agent systems (MAS), model-based reinforcement learning (MBRL), Gaussian regression (GR)

ReleaseDate:2014-07-21 17:04:34

Funds:National Natural Science Foundation of China (61074058)