Artificial Intelligent Techniques for Wireless Communication and Networking
1.3.2 Policy-Based Method
In real-world settings, the number of possible actions may be very large or unknown. For instance, a robot learning to move across open terrain may have millions of possible actions within a single minute. Under these conditions, estimating a Q-value for each action is not practical. Policy-based approaches learn the policy function directly, without computing a value function for each action. An example of a policy-based algorithm is Policy Gradient (Figure 1.5).
Policy Gradient, simplified, works as follows:
1 Takes a state as input and outputs the probability of each action, based on prior experience
2 Chooses the most probable action
3 Repeats until the end of the game and evaluates the total reward
4 Uses backpropagation to update the connection weights based on the rewards.
Figure 1.5 Policy-based learning.
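The four Policy Gradient steps listed above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the book: the toy game (three steps per episode, where action 1 earns a reward of 1), the names `run_episode`, `update` and `train`, and the learning rate. Two simplifications are also swapped in. The policy is a small table of softmax preferences rather than a neural network, so the gradient is written out by hand instead of being obtained through backpropagation. And the action is sampled from the probabilities rather than always taking the most probable one, which is the usual way to keep exploring in REINFORCE-style methods.

```python
import math
import random

N_STATES = 3    # hypothetical toy game: 3 steps per episode, state = step index
N_ACTIONS = 2   # assumption: action 1 earns reward 1, action 0 earns nothing


def softmax(logits):
    """Turn raw action preferences into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_episode(theta, rng):
    """Steps 1-3: play one game and record (state, action, reward)."""
    trajectory = []
    for state in range(N_STATES):
        probs = softmax(theta[state])                              # step 1
        action = rng.choices(range(N_ACTIONS), weights=probs)[0]   # step 2 (sampled)
        reward = 1.0 if action == 1 else 0.0
        trajectory.append((state, action, reward))
    return trajectory


def update(theta, trajectory, lr=0.02):
    """Step 4: raise the log-probability of each action in proportion
    to the reward collected from that step onward."""
    ret = 0.0
    # Walk backwards so `ret` is the reward-to-go from each step.
    for state, action, reward in reversed(trajectory):
        ret += reward
        probs = softmax(theta[state])
        for a in range(N_ACTIONS):
            # Gradient of log pi(action | state) w.r.t. theta[state][a]
            grad = (1.0 if a == action else 0.0) - probs[a]
            theta[state][a] += lr * ret * grad


def train(episodes=20000, seed=0):
    """Repeat play-then-update; returns the learned preference table."""
    rng = random.Random(seed)
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        update(theta, run_episode(theta, rng))
    return theta


if __name__ == "__main__":
    learned = train()
    for s in range(N_STATES):
        print(s, softmax(learned[s]))
```

Running the script prints the learned action probabilities for each state; in this toy game they should lean strongly toward action 1. With a neural-network policy, the hand-written gradient in `update` would typically be replaced by backpropagation through the network, as step 4 describes.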