2.2.3 Reinforcement Learning
The goal of reinforcement learning is to learn the best sequence of actions (policy) in a given environment to maximize the cumulative reward. Figure 2.3 shows an overview of the reinforcement learning model. Here, the reinforcement learning model acts as a decision‐making agent, taking actions in an environment and receiving rewards or penalties while trying to solve a problem. In reinforcement learning problems, the environment is in a certain state (from a set of possible states) at any given time. The state information may be complete (Markov) or incomplete (non‐Markov). The agent has a set of actions (from a set of possible actions), and when an action is taken, the state of the environment changes.

Thus, unlike unsupervised or supervised learning, reinforcement learning explicitly interacts with the “task”: the model is built through interaction with the task, not independently of it. At each time step, a reward signal is typically assumed, where the reward might be nothing more than “you have not failed.” Indeed, there might never be any “ultimate reward” other than maximizing the duration between failures or maximizing the number of packets routed. In supervised learning, the data label explicitly tells us what to do. Conversely, reinforcement learning models might attempt to learn a function describing the relative “value” of being in each state. Decision‐making then simplifies to identifying the action that moves the current state to the next state with the most “value.”

Reinforcement learning is therefore also explicitly engaged in establishing the order in which it is exposed to states from the task. This is again distinct from supervised and unsupervised learning, in which the data is generally assumed to conform to the independent and identically distributed (i.i.d.) assumption. Moreover, when complete state information is available, a reinforcement learning agent may make optimal decisions from the current state alone.1 However, when complete state information is not available, the agent would additionally have to develop internal models of state that extend the current state with previously visited values. Needless to say, this requirement has implications for the representation adopted as well as for the process of credit assignment.

Reinforcement learning algorithms have a wider spectrum of applications than supervised learning algorithms; however, they might take longer to converge, given that the feedback is less explicit than in supervised and unsupervised learning. It should be noted that the application of reinforcement learning in network and service management is developing rapidly, and we see more and more impressive results in the field [14–16].
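To make the value‐based view above more concrete, the following minimal sketch shows a tabular Q‐learning agent interacting with a hypothetical “chain” environment, where the agent moves left or right and is rewarded only on reaching the rightmost state. The environment, its size, and all hyperparameters are illustrative assumptions for this sketch and are not taken from the chapter; the point is only to show the interaction loop (state, action, reward, next state) and a learned value estimate that guides decision‐making.

```python
# Minimal sketch of value-based reinforcement learning (tabular Q-learning)
# on a hypothetical toy chain environment. The environment, its size, and
# all hyperparameters are illustrative assumptions, not the book's method.

import random

N_STATES = 6            # states 0..5; state 5 is terminal and rewarding
ACTIONS = [0, 1]        # 0 = move left, 1 = move right
ALPHA = 0.1             # learning rate
GAMMA = 0.9             # discount factor
EPSILON = 0.1           # exploration rate
EPISODES = 500

def step(state, action):
    """Assumed environment dynamics: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # elsewhere the reward is just "you have not failed"
    return next_state, reward, done

# Q[s][a] estimates the relative "value" of taking action a in state s.
Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]

for _ in range(EPISODES):
    state = 0
    done = False
    while not done:
        # Epsilon-greedy: mostly pick the action leading to the highest-value estimate.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[state][a])
        next_state, reward, done = step(state, action)
        # Temporal-difference update of the value estimate from the observed reward.
        target = reward + GAMMA * max(Q[next_state])
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = next_state

# After training, the greedy policy should prefer "move right" (action 1) in every state.
print([max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)])
```

Note how the sketch never receives a labeled “correct action,” only a scalar reward after each step; the ordering of the states the agent visits is determined by its own actions, which is precisely what distinguishes this setting from the i.i.d. assumption of supervised and unsupervised learning.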