Читать книгу Artificial Intelligent Techniques for Wireless Communication and Networking - Группа авторов - Страница 28
Partial Observability
ОглавлениеIt is partly measurable for almost all real systems where we would like to incorporate reinforcement learning. For example, the efficiency of mechanical parts may deteriorate over time, ‘identical’ widgets may exhibit performance variations provided the same control inputs, or it may simply be unknown the condition of certain parts of the system (e.g. the mental state of users of a suggested system).
Two common strategies to dealing with partial observability, including input history, and modelling history using repeated networks in the model. In addition, Robust MDP formalisms provide clear mechanisms to ensure that sensor and action noise and delays are robust to agents. If a given deployment setting may have initially unknown but learnable noise sources, then techniques for device detection may be used to train a policy that can learn in which environment it operates.