1.2 Comprehensive Study
1.2.1 Introduction
In most areas of Artificial Intelligence (AI), we build mathematical frameworks to tackle problems. For RL, that framework is the Markov Decision Process (MDP). It sounds complicated, but it provides a basic structure for modelling a complex problem. An agent (e.g. a human) observes the world and performs actions; rewards are given out, but they may be rare and delayed. These long-delayed rewards often make it very difficult to untangle the data and trace which sequence of actions led to the reward [11].
A Markov decision process (MDP), shown in Figure 1.2, is composed of the following components:
State: the state in an MDP can be represented as raw images, or, for robotic control, as sensor readings such as joint angles, velocities, and the pose of the end effector.
Action: a move in a chess game, or a command that pushes a robotic arm or a joystick, is an action.
Reward: the reward is very sparse in a Go match: +1 if we win, or −1 if we lose. In other tasks rewards come more often; in the Atari Seaquest game, for example, we score whenever we hit a shark (Figure 1.3).
Discount factor: a discount factor smaller than one discounts future rewards, just as money received in the future has a smaller present value; it is also needed for a purely technical reason, to make the solution converge.
Horizon: we can roll out behaviour indefinitely or limit the experience to N time steps; this limit is called the horizon.
Figure 1.2 Markov process.
Figure 1.3 Raw images used as the state.
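To make the components listed above concrete, here is a minimal sketch in Python of a tiny, invented toy problem; the states, actions, rewards, and numbers are purely illustrative and are not taken from the book.

import random

states = ["s0", "s1", "s2"]   # state: what the agent observes
actions = ["left", "right"]   # action: what the agent can do
gamma = 0.9                   # discount factor (< 1)
horizon = 5                   # horizon: limit the rollout to N steps

# Toy transition function: given a state and an action, return the next state.
def step(state, action):
    idx = states.index(state)
    idx = max(0, idx - 1) if action == "left" else min(len(states) - 1, idx + 1)
    return states[idx]

# Sparse reward: +1 only when the goal state is reached.
def reward(state, action, next_state):
    return 1.0 if next_state == "s2" else 0.0

# Roll out a random policy for `horizon` steps and accumulate the
# discounted return G = sum over t of gamma^t * r_t.
state, G = "s0", 0.0
for t in range(horizon):
    action = random.choice(actions)
    next_state = step(state, action)
    G += (gamma ** t) * reward(state, action, next_state)
    state = next_state
print("discounted return:", G)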
System dynamics is the transition function: given a state and an action, it predicts the next state. When we address model-based RL later, this is the model that plays a significant role. RL draws its ideas from many fields of study, including control theory, so different notations can be used in a particular setting: the state may be written as s or x, and the action as a or u. An action is the same as a control. We may maximize rewards or minimize costs, which are simply the negatives of each other [10].
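As a rough sketch of what such a dynamics model might look like, the snippet below represents a stochastic transition function p(s' | s, a) as a lookup table and samples a next state from it; the table entries are invented for illustration and are not from the book.

import random

# For each (state, action) pair, a list of (next_state, probability) pairs.
dynamics = {
    ("s0", "right"): [("s1", 0.8), ("s0", 0.2)],
    ("s1", "right"): [("s2", 0.9), ("s1", 0.1)],
}

def sample_next_state(state, action):
    """Sample s' from the toy model p(s' | s, a)."""
    next_states, probs = zip(*dynamics[(state, action)])
    return random.choices(next_states, weights=probs, k=1)[0]

# Rewards and costs are interchangeable: minimizing cost = maximizing reward.
def cost(reward_value):
    return -reward_value

print(sample_next_state("s0", "right"))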