Читать книгу The Smart Cyber Ecosystem for Sustainable Development - Группа авторов - Страница 46

2.4.3.4 Reinforcement Learning

Similar to unsupervised learning in the sense that the machine has to learn by itself. However, a reward mechanism is applied to tune the algorithm based on observation of performance, enabling continuous self-update of the machine. Reinforcement learning algorithms try to define a model of the environment by determining the dynamics of the environment. The algorithm uses an agent which interacts with a dynamic environment in a trial-and-error manner. It provides feedback to the algorithm. The agent makes decisions on what actions to be performed to optimize the reward. A policy determines how the agent should behave at a given time. Thus, the algorithm learns by exploring the environment and exploiting the knowledge. The feedback from the environment is used to learn the best policy to optimize the cumulative reward.

The most commonly known reinforcement algorithm is the Q-Learning. The RL algorithm interacts with the environment to learn Q values. The Q value is initialized. The machine observes the current state, chooses an action from a set of possible actions, and performs the action. The algorithm observes the reward and the new state. The Q-value is updated based on the new state and the reward. Then, the state is set to the new state and the process repeats until a terminal state is reached.

The Smart Cyber Ecosystem for Sustainable Development

Подняться наверх