
List of Illustrations


Chapter 1
Figure 1.1 Single agent system.
Figure 1.2 Three discrete states in an environment.
Figure 1.3 Robot executing action Right (R) at state s1 and moves to the nex...
Figure 1.4 Deterministic state‐transition.
Figure 1.5 Stochastic state‐transition.
Figure 1.6 Two‐dimensional 5 × 5 grid environment.
Figure 1.7 Refinement approach in robotics.
Figure 1.8 Hierarchical tree.
Figure 1.9 Hierarchical model.
Figure 1.10 Two‐dimensional 3 × 3 grid environment.
Figure 1.11 Corresponding graph of Figure 1.10.
Figure 1.12 Two‐dimensional 3 × 3 grid environment with an obstacle.
Figure 1.13 Structure of reinforcement learning.
Figure 1.14 Variation of average reward with the number of trials for differe...
Figure 1.15 Correlation between the RL and DP.
Figure 1.16 Single agent Q‐learning.
Figure 1.17 Possible next state in stochastic situation.
Figure 1.18 Single agent planning.
Figure 1.19 Multi‐agent system with m agents.
Figure 1.20 Robots executing joint action <R, L> at joint state <1, 8> and m...
Figure 1.21 Classification of multi‐robot systems.
Figure 1.22 Hand gestures in rock‐paper‐scissor game: (a) rock, (b) paper, ...
Figure 1.23 Rock‐paper‐scissor game.
Figure 1.24 Reward mapping from joint Q‐table to reward matrix.
Figure 1.25 Pure strategy Nash equilibrium evaluation. (a) Fix A1 = L and A2...
Figure 1.26 Evaluation of mixed strategy Nash equilibrium.
Figure 1.27 Reward matrix for tennis game.
Figure 1.28 Reward matrix in a common reward two‐agent static game.
Figure 1.29 Pure strategy Egalitarian equilibrium, which is one variant of C...
Figure 1.30 Game of chicken.
Figure 1.31 Reward matrix in the game of chicken.
Figure 1.32 Constant‐sum game.
Figure 1.33 Matching pennies.
Figure 1.34 Reward matrix in Prisoner's Dilemma game.
Figure 1.35 Correlation among the MARL, DP, and GT.
Figure 1.36 Classification of multi‐agent reinforcement learning.
Figure 1.37 The climbing game reward matrix.
Figure 1.38 The penalty game reward matrix.
Figure 1.39 The penalty game reward matrix.
Figure 1.40 Individual Q‐values obtained in the climbing game reward matrix ...
Figure 1.41 The penalty game reward matrix.
Figure 1.42 Individual Q‐values obtained in the penalty game reward matrix b...
Figure 1.43 Reward matrix of a three‐player coordination game.
Figure 1.44 Reward matrix in a two‐player two‐agent game.
Figure 1.45 Nonstrict EDNP in normal‐form game.
Figure 1.46 Multistep negotiation process between agents A and B.
Figure 1.47 Multi‐robot coordination for the well‐known stick‐carrying probl...
Figure 1.48 Multi‐robot local planning by swarm/evolutionary algorithm.
Figure 1.49 Surface plot of (1.97).
Figure 1.50 Surface plot of (1.98).
Figure 1.51 Steps of Differential Evolution (DE) algorithm [132].

Chapter 2
Figure 2.1 Block diagram of reinforcement learning (RL).
Figure 2.2 Experimental workspace for two agents during the learning phase....
Figure 2.3 Convergence plot of NQLP12 and reference algorithms for two agent...
Figure 2.4 Average of average reward (AAR) plot of NQLP12 and reference algo...
Figure 2.5 Joint action selection strategy in EQLP12 and reference algorithm...
Figure 2.6 Cooperative path planning to carry a triangle by three robots in ...
Figure 2.7 Cooperative path planning to carry a stick by two Khepera‐II mobi...
Figure 2.8 Cooperative path planning to carry a stick by two Khepera‐II mobi...
Figure 2.A.1 Convergence plot of FCMQL and reference algorithms for two agen...
Figure 2.A.2 Convergence plot of EQLP12 and reference algorithms for three a...
Figure 2.A.3 Convergence plot of EQLP12 and reference algorithms for four ag...
Figure 2.A.4 CR versus learning epoch plot for FCMQL and reference algorithm...
Figure 2.A.5 Average of average reward (AAR) plot of FCMQL and reference alg...
Figure 2.A.6 Average of average reward (AAR) plot of EQLP12 and reference al...
Figure 2.A.7 Average of average reward (AAR) plot of EQLP12 and reference al...
Figure 2.A.8 Joint action selection strategy in EQLP12 and reference algorit...
Figure 2.A.9 Joint action selection strategy in EQLP12 and reference algorit...
Figure 2.A.10 Path planning with stick in deterministic situation by: (a) NQ...
Figure 2.A.11 Path planning with stick in stochastic situation by: (a) NQIMP...
Figure 2.A.12 Path planning with triangle in stochastic situation by: (a) NQ...
Figure 2.A.13 Path planning with square in stochastic situation by: (a) NQIM...
Figure 2.A.14 Path planning with square in deterministic situation by: (a) N...

Chapter 3
Figure 3.1 Equilibrium selection in multi‐agent system. (a) Two UE (ax and b...
Figure 3.2 AAR versus learning epoch for two‐agent system.
Figure 3.3 AAR versus learning epoch for three‐agent system.
Figure 3.4 Planning path offered by the consensus‐based multi‐agent planning...
Figure 3.5 Planning path offered by the Nash Q‐learning‐based planning algor...

Chapter 4
Figure 4.1 Corner cell, boundary cell, and other cell.
Figure 4.2 Feasible joint states for two‐agent systems in stick‐carrying pro...
Figure 4.3 Convergence comparison of ΩQL, CΩQL, NQL, FQL, and CQL algorithms...
Figure 4.4 Convergence comparison of ΩQL, CΩQL, NQL, FQL, and CQL algorithms...
Figure 4.5 Convergence comparison of ΩQL, CΩQL, NQL, FQL, and CQL algorithms...
Figure 4.6 (Map 4.1) Planning with box by CQIP, CΩMP, and ΩMP algorithms.
Figure 4.7 (Map 4.1) Planning using Khepera‐II mobile robot by CQIP, CΩMP, a...
Figure 4.8 (Map 4.2) Planning with stick by CQIP, CΩMP, and ΩMP algorithms....
Figure 4.9 (Map 4.2) Path planning using Khepera‐II mobile robot by CQIP, CΩ...
Figure 4.10 (Map 4.3) Path planning with triangle employing CQIP, CΩMP, and ...

Chapter 5
Figure 5.1 Diagram illustrating the calculation of d.
Figure 5.2 Evolution of the expected population variance.
Figure 5.3 Relative performance in mean best objective function versus funct...
Figure 5.4 Relative performance in mean best objective function versus funct...
Figure 5.5 Relative performance in mean best objective function versus funct...
Figure 5.6 Relative performance in mean best objective function versus funct...
Figure 5.7 Relative performance in accuracy versus function evaluation for I...
Figure 5.8 Variation of FEs required for convergence to predefined threshold...
Figure 5.9 Graphical representation of Bonferroni–Dunn's procedure consideri...
Figure 5.10 Initial (a) and final configuration of the world map after execu...
Figure 5.11 Average total path traversed versus number of obstacles.
Figure 5.12 Average total path deviation versus number of obstacles.
Figure 5.13 Average uncovered target distance versus number of steps with nu...
Figure 5.14 Final configuration of the world map after experiment using Khep...

