1.4.2 Challenges of Off-Line Learning
In many real systems, training cannot be performed directly online; instead, learning happens offline, using logs recorded by a previous iteration of the management system. Broadly speaking, we would like the new version of the policy to perform better than the old one, and that implies we need off-policy evaluation: predicting the new policy's performance before running it on the actual system. Several approaches exist for doing this, including importance sampling. The introduction of the first RL version (the initial policy) is a special case to consider, since a minimum performance requirement must be met before deployment is allowed. The warm-start performance of a policy is therefore another important quantity to be able to assess.
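As a rough illustration of off-policy evaluation from logged data, the sketch below shows a per-trajectory importance sampling estimator. It is not taken from the book: the data format (trajectories of state, action, reward, and the logging policy's action probability) and the function names are assumptions made for illustration only.

import numpy as np

# Minimal sketch (illustrative, not the book's method): estimate the expected
# return of a candidate new policy from trajectories logged by an older policy,
# using per-trajectory importance sampling.

def importance_sampling_estimate(trajectories, new_policy_prob, gamma=0.99):
    """Off-policy value estimate for a new policy from logged trajectories.

    trajectories: list of trajectories; each trajectory is a list of
        (state, action, reward, behaviour_prob) tuples, where behaviour_prob is
        the probability the logging (old) policy assigned to the action taken.
    new_policy_prob: function (state, action) -> probability under the new policy.
    gamma: discount factor.
    """
    estimates = []
    for traj in trajectories:
        weight = 1.0   # cumulative importance weight for this trajectory
        ret = 0.0      # discounted return actually observed in the log
        for t, (state, action, reward, behaviour_prob) in enumerate(traj):
            weight *= new_policy_prob(state, action) / behaviour_prob
            ret += (gamma ** t) * reward
        estimates.append(weight * ret)
    return float(np.mean(estimates))

In practice, plain importance sampling can have very high variance when the new policy differs substantially from the logging policy, which is one reason a minimum performance requirement and careful warm-start assessment matter before deploying the first learned policy.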