Multi-Agent Coordination
Contents
Amit Konar. Multi-Agent Coordination
Table of Contents
List of Tables
List of Illustrations
Guide
Pages
Multi‐Agent Coordination. A Reinforcement Learning Approach
Preface
Acknowledgments
About the Authors
1 Introduction: Multi‐agent Coordination by Reinforcement Learning and Evolutionary Algorithms
1.1 Introduction
1.2 Single Agent Planning
1.2.1 Terminologies Used in Single Agent Planning
Definition 1.1
Definition 1.2
Definition 1.3
Definition 1.4
Definition 1.5
Definition 1.6
Definition 1.7
Definition 1.8
Example 1.1
1.2.2 Single Agent Search‐Based Planning Algorithms
1.2.2.1 Dijkstra's Algorithm
1.2.2.2 A* (A‐star) Algorithm
Definition 1.9
Definition 1.10
Algorithm 1.1 Dijkstra's Algorithm
Algorithm 1.2 A* Algorithm
Example 1.2
1.2.2.3 D* (D‐star) Algorithm
1.2.2.4 Planning by STRIPS‐Like Language
1.2.3 Single Agent RL
1.2.3.1 Multiarmed Bandit Problem
1.2.3.2 DP and Bellman Equation
1.2.3.3 Correlation Between RL and DP
1.2.3.4 Single Agent Q‐Learning
Definition 1.11
Algorithm 1.3 Single Agent Q‐Learning
1.2.3.5 Single Agent Planning Using Q‐Learning
1.3 Multi‐agent Planning and Coordination
1.3.1 Terminologies Related to Multi‐agent Coordination
Definition 1.12
Definition 1.13
1.3.2 Classification of MAS
1.3.3 Game Theory for Multi‐agent Coordination
Definition 1.14
Definition 1.15
Definition 1.16
Example 1.3
1.3.3.1 Nash Equilibrium
Definition 1.17
Pure Strategy NE
Mixed Strategy NE
Example 1.4
Example 1.5
1.3.3.2 Correlated Equilibrium
Definition 1.18
Example 1.6
1.3.3.3 Static Game Examples
Example 1.7
Example 1.8
1.3.4 Correlation Among RL, DP, and GT
1.3.5 Classification of MARL
1.3.5.1 Cooperative MARL
Static
Independent Learner and Joint Action Learner
Algorithm 1.4 Independent Learners
Algorithm 1.5 Joint Action Learners
Frequency Maximum Q‐Value heuristic
Algorithm 1.6 FMQ heuristic
Dynamic
Team‐Q
Algorithm 1.7 Team‐Q
Distributed‐Q
Algorithm 1.8 Distributed Q‐learning
Example 1.9
Example 1.10
Optimal Adaptive Learning
Algorithm 1.9 Optimal Adaptive Learning
Sparse Cooperative Q‐learning
Sequential Q‐learning
Algorithm 1.10 Joint Action Formation in SQL
Frequency of the Maximum Reward Q‐learning
Algorithm 1.11 FMRQ for an Agent i in Repeated Games
1.3.5.2 Competitive MARL
Minimax‐Q Learning
Algorithm 1.12 Minimax Q‐learning
Heuristically Accelerated Multi‐agent Reinforcement Learning
Algorithm 1.13 HAMRL for Zero‐sum Game
1.3.5.3 Mixed MARL
Static
Belief‐Based Learning Rule
Algorithm 1.14 Bully Algorithm
Algorithm 1.15 Meta Strategy Algorithm
Algorithm 1.16 AWESOME Algorithm
Direct Policy Search Based Algorithm
Dynamic
Equilibrium Dependent
Algorithm 1.17 NQL in General‐sum Game
Algorithm 1.18 Correlated‐Q Learning
Algorithm 1.19 Asymmetric‐Q Learning for the Leader
Algorithm 1.20 Asymmetric‐Q Learning for the Follower
Algorithm 1.21 Friend‐or‐Foe Q‐Learning
Definition 1.19
Definition 1.20
Algorithm 1.22 Negotiation to Evaluate the PSNE for Agent i in a Normal‐form Game
Algorithm 1.23 Negotiation to Evaluate the Nonstrict EDSP for Agent i in a Normal‐form Game
Algorithm 1.24 Negotiation‐Q Learning for Agent i in a Markov Game
Algorithm 1.25 Equilibrium Transfer‐Based MAQL
Equilibrium Independent
Algorithm 1.26 Policy Hill‐Climbing (PHC)
Algorithm 1.27 Win or Learn Fast‐PHC (WoLF‐PHC)
Algorithm 1.28 Non‐Stationary Converging Policies
Algorithm 1.29 EXORL for Agent i
1.3.6 Coordination and Planning by MAQL
1.3.7 Performance Analysis of MAQL and MAQL‐Based Coordination
1.4 Coordination by Optimization Algorithm
1.4.1 PSO Algorithm
Algorithm 1.30 Particle Swarm Optimization (PSO)
1.4.2 Firefly Algorithm
1.4.2.1 Initialization
1.4.2.2 Attraction to Brighter Fireflies
1.4.2.3 Movement of Fireflies
1.4.3 Imperialist Competitive Algorithm
Algorithm 1.31 Traditional Firefly Algorithm (FA)
1.4.3.1 Initialization
1.4.3.2 Selection of Imperialists and Colonies
1.4.3.3 Formation of Empires
1.4.3.4 Assimilation of Colonies
1.4.3.5 Revolution
1.4.3.6 Imperialistic Competition
Total Empire Power Evaluation
Reassignment of Colonies and Removal of Empire
Union of Empires
1.4.4 Differential Evolution Algorithm
1.4.4.1 Initialization
1.4.4.2 Mutation
1.4.4.3 Recombination
1.4.4.4 Selection
1.4.5 Off‐line Optimization
1.4.6 Performance Analysis of Optimization Algorithms
1.4.6.1 Friedman Test
1.4.6.2 Iman–Davenport Test
1.5 Summary
References
2 Improve Convergence Speed of Multi‐Agent Q‐Learning for Cooperative Task Planning
2.1 Introduction
2.2 Literature Review
2.3 Preliminaries
2.3.1 Single Agent Q‐learning
2.3.2 Multi‐agent Q‐learning
Definition 2.1
Definition 2.2
Definition 2.3
Algorithm 2.1 NE/CE‐Based Multi‐agent‐Q Learning
2.4 Proposed MAQL
Definition 2.4
2.4.1 Two Useful Properties
Statute 2.1
Property 2.1
Proof:
Definition 2.5
Property 2.2
Proof
2.5 Proposed FCMQL Algorithms and Their Convergence Analysis
2.5.1 Proposed FCMQL Algorithms
Algorithm 2.2 Fast Cooperative Multi‐agent Q‐Learning (FCMQL)
2.5.2 Convergence Analysis of the Proposed FCMQL Algorithms
Theorem 2.1
Proof
2.6 FCMQL‐Based Cooperative Multi‐agent Planning
Definition 2.6
Theorem 2.2
Proof
Algorithm 2.3 FCMQL‐Based Cooperative Multi‐agent Planning
2.7 Experiments and Results
Experiment 2.1 Study of Convergence Speed
Experiment 2.2 Planning Performance
Experiment 2.3 Run‐Time Complexity
Experiment 2.4 Real‐Time Planning
2.8 Conclusions
2.9 Summary
2.A More Details on Experimental Results
2.A.1 Additional Details of Experiment 2.1
2.A.2 Additional Details of Experiment 2.2
2.A.3 Additional Details of Experiment 2.4
References
3 Consensus Q‐Learning for Multi‐agent Cooperative Planning
3.1 Introduction
3.2 Preliminaries
3.2.1 Single Agent Q‐Learning
3.2.2 Equilibrium‐Based Multi‐agent Q‐Learning
Definition 3.1
Definition 3.2
Algorithm 3.1 Equilibrium‐Based MAQL
3.3 Consensus
Definition 3.3
Definition 3.4
Definition 3.5
3.4 Proposed CoQL and Planning
3.4.1 Consensus Q‐Learning
Theorem 3.1
Proof
Algorithm 3.2 Consensus Q‐Learning (CoQL)
3.4.2 Consensus‐Based Multi‐robot Planning
Algorithm 3.3 Consensus‐Based Planning
3.5 Experiments and Results
3.5.1 Experimental Setup
3.5.2 Experiments for CoQL
3.5.3 Experiments for Consensus‐Based Planning
3.6 Conclusions
3.7 Summary
References
4 An Efficient Computing of Correlated Equilibrium for Cooperative Q‐Learning‐Based Multi‐Robot Planning
4.1 Introduction
4.2 Single‐Agent Q‐Learning and Equilibrium‐Based MAQL
4.2.1 Single Agent Q‐Learning
4.2.2 Equilibrium‐Based MAQL
Definition 4.1
4.3 Proposed Cooperative MAQL and Planning
4.3.1 Proposed Schemes with Their Applicability
4.3.2 Immediate Rewards in Scheme‐I and ‐II
Definition 4.2
4.3.3 Scheme‐I‐Induced MAQL
Note 4.1
Note 4.2
Note 4.3
Theorem 4.1
Proof
Theorem 4.2
Proof
4.3.4 Scheme‐II‐Induced MAQL
Definition 4.3
Note 4.4
Lemma 4.1
Proof
Lemma 4.2
Proof
Lemma 4.3
Proof
Lemma 4.4
Proof
Lemma 4.5
Proof
Lemma 4.6
Proof
Theorem 4.3
Proof
Theorem 4.4
Proof
4.3.5 Algorithms for Scheme‐I and II
Algorithm 4.1 Scheme‐I and ‐II‐Induced ΩQL (ΩQL‐I and ΩQL‐II)
4.3.6 Constraint ΩQL‐I/ΩQL‐II (CΩQL‐I/CΩQL‐II)
4.3.7 Convergence
Algorithm 4.2 Constraint ΩQL‐I/ΩQL‐II (CΩQL‐I/CΩQL‐II)
Lemma 4.7
Proof
Lemma 4.8
Proof
Lemma 4.9
Proof
Theorem 4.5
Proof
Theorem 4.6
Proof
4.3.8 Multi‐agent Planning
Algorithm 4.3 Ω Multi‐agent Planning (ΩMP)
Algorithm 4.4 Constraint Ω Multi‐agent Planning (CΩMP)
4.4 Complexity Analysis
4.4.1 Complexity of CQL
4.4.1.1 Space Complexity
Learning
Planning
4.4.1.2 Time Complexity
Learning
Planning
4.4.2 Complexity of the Proposed Algorithms
4.4.2.1 Space Complexity
Learning
Planning
4.4.2.2 Time Complexity
Learning
Planning
Space‐complexity Analysis of Stick‐Carrying and Triangle‐Carrying Problem
4.4.3 Complexity Comparison
4.4.3.1 Space Complexity
4.4.3.2 Time Complexity
4.5 Simulation and Experimental Results
4.5.1 Experimental Platform
4.5.1.1 Simulation
4.5.1.2 Hardware
4.5.2 Experimental Approach
4.5.2.1 Learning Phase
4.5.2.2 Planning Phase
4.5.3 Experimental Results
Experiment 4.1 Performance Test of the Proposed Learning Algorithms
Experiment 4.2 Object (Box)‐Carrying By Scheme‐I‐Based Proposed Planning Algorithms
Experiment 4.3 Stick‐ or Triangle‐Carrying By Model‐II‐Based Proposed Planning Algorithms
4.6 Conclusion
4.7 Summary
Appendix 4.A Supporting Algorithm and Mathematical Analysis
Algorithm 4.A.1 Correlated Q‐Learning (CQL)
References
5 A Modified Imperialist Competitive Algorithm for Multi‐Robot Stick‐Carrying Application
5.1 Introduction
5.2 Problem Formulation for Multi‐Robot Stick‐Carrying
5.3 Proposed Hybrid Algorithm
5.3.1 An Overview of ICA
5.3.1.1 Initialization
5.3.1.2 Selection of Imperialists and Colonies
5.3.1.3 Formation of Empires
5.3.1.4 Assimilation of Colonies
5.3.1.5 Revolution
5.3.1.6 Imperialistic Competition
5.3.1.6.1 Total Empire Power Evaluation
5.3.1.6.2 Reassignment of Colonies and Removal of Empire
5.3.1.6.3 Union of Empires
5.4 An Overview of FA
5.4.1 Initialization
5.4.2 Attraction to Brighter Fireflies
5.4.3 Movement of Fireflies
5.5 Proposed ICFA
5.5.1 Assimilation of Colonies
5.5.1.1 Attraction to Powerful Colonies
5.5.1.2 Modification of Empire Behavior
5.5.1.3 Union of Empires
Algorithm 5.1 Imperialist Competitive Firefly Algorithm (ICFA)
5.6 Simulation Results
5.6.1 Comparative Framework
5.6.2 Parameter Settings
5.6.3 Analysis on Explorative Power of ICFA
5.6.4 Comparison of Quality of the Final Solution
5.6.5 Performance Analysis
5.7 Computer Simulation and Experiment
5.7.1 Average Total Path Deviation (ATPD)
5.7.2 Average Uncovered Target Distance (AUTD)
5.7.3 Experimental Setup in Simulation Environment
5.7.4 Experimental Results in Simulation Environment
5.7.5 Experimental Setup with Khepera Robots
5.7.6 Experimental Results with Khepera Robots
5.8 Conclusion
5.9 Summary
Appendix 5.A Additional Comparison of ICFA
References
6 Conclusions and Future Directions
6.1 Conclusions
6.2 Future Directions
Index
WILEY END USER LICENSE AGREEMENT
Excerpt from the book
IEEE Press 445 Hoes Lane Piscataway, NJ 08854
.....
Unlike CQL, Chapter 4 proposes an attractive approach that adapts the composite rewards of all the agents in a single Q‐table over the joint state–action space during learning; these rewards are subsequently employed to compute the CE in the planning phase. Two separate models of multi‐agent Q‐learning are proposed. If the success of only one agent is enough to make the team successful, model‐I is employed; if an agent's success is contingent upon the other agents and the agents must succeed simultaneously, model‐II is employed. It is also shown that the CE obtained by the proposed algorithms is identical to that obtained by the traditional CQL. To restrict exploration to the feasible joint states, constrained versions of these algorithms are also proposed. Complexity analysis and experiments validate the performance of the proposed algorithms for multi‐robot planning on both simulated and real platforms.
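To make the idea of a single Q‐table over the joint state–action space concrete, the following minimal Python sketch shows one learning step in which the agents' individual rewards are folded into a composite target. The epsilon‐greedy joint‐action selection, the summed reward, the greedy bootstrap, and all parameter values are illustrative assumptions only; they are not the book's Scheme‐I/Scheme‐II update rules.

```python
# Minimal sketch (assumptions noted above): one Q-table keyed by joint state-action pairs.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2          # learning rate, discount, exploration rate

Q = defaultdict(float)                          # (joint_state, joint_action) -> composite Q-value

def select_joint_action(joint_state, joint_actions):
    """Epsilon-greedy choice over joint actions (placeholder exploration policy)."""
    if random.random() < EPSILON:
        return random.choice(joint_actions)
    return max(joint_actions, key=lambda a: Q[(joint_state, a)])

def update(joint_state, joint_action, rewards, next_state, joint_actions):
    """One learning step: fold all agents' rewards into a single composite target."""
    composite_reward = sum(rewards)                                  # e.g. r_robot1 + r_robot2
    best_next = max(Q[(next_state, a)] for a in joint_actions)       # greedy bootstrap (assumption)
    target = composite_reward + GAMMA * best_next
    Q[(joint_state, joint_action)] += ALPHA * (target - Q[(joint_state, joint_action)])

# Minimal usage with hypothetical joint states and actions:
joint_actions = [("left", "left"), ("left", "right"), ("right", "left"), ("right", "right")]
a = select_joint_action("s0", joint_actions)
update("s0", a, rewards=(1.0, 0.5), next_state="s1", joint_actions=joint_actions)
```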
Chapter 5 hybridizes the Firefly Algorithm (FA) and the Imperialist Competitive Algorithm (ICA). The resulting Imperialist Competitive Firefly Algorithm (ICFA) is employed to determine the time‐optimal trajectory of a stick, carried by two robots, from a given starting position to a predefined goal position amidst static obstacles in a robot world map. The motion dynamics of the fireflies in the FA are embedded into the sociopolitical‐evolution‐based metaheuristic ICA. In addition, the trade‐off between exploration and exploitation is balanced by modifying the random‐walk strategy based on the positions of the candidate solutions in the search space. The superiority of the proposed ICFA is studied with run‐time and accuracy as the performance metrics. Finally, the proposed algorithm is verified on a real‐time multi‐robot stick‐carrying problem.
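The firefly‐style movement that ICFA embeds into ICA's assimilation step can be pictured with a short sketch: a colony (candidate solution) is drawn toward a brighter, i.e. better, solution with distance‐dependent attractiveness plus a small random walk for exploration. The parameter values, the function name, and the setup below are assumptions chosen purely for illustration; they are not the ICFA pseudocode of Algorithm 5.1.

```python
# Minimal sketch (assumptions noted above): firefly-style attraction used as assimilation.
import numpy as np

BETA0, GAMMA_ABS, ALPHA_RW = 1.0, 1.0, 0.2     # base attractiveness, absorption, random-walk scale

def firefly_move(colony, brighter, rng):
    """Pull `colony` toward the better solution `brighter` (both 1-D numpy arrays)."""
    r2 = float(np.sum((colony - brighter) ** 2))                    # squared Euclidean distance
    beta = BETA0 * np.exp(-GAMMA_ABS * r2)                          # attractiveness decays with distance
    attraction = beta * (brighter - colony)                         # move toward the brighter solution
    random_walk = ALPHA_RW * (rng.random(colony.shape) - 0.5)       # exploration term
    return colony + attraction + random_walk

# Minimal usage: assimilate a colony toward its imperialist (hypothetical 2-D search space).
rng = np.random.default_rng(0)
colony = rng.random(2)                 # candidate solution
imperialist = np.array([0.9, 0.1])     # current best ("brightest") solution of the empire
colony = firefly_move(colony, imperialist, rng)
```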
.....