Читать книгу Artificial Intelligent Techniques for Wireless Communication and Networking - Группа авторов - Страница 25
Learning From Limited Samples
ОглавлениеThere are no different training and assessment environments for many actual systems. All training knowledge comes from the real system, and during training, the agent does not have a separate exploration policy as its exploratory acts do not come for free. Given this greater exploration expense, and the fact that very little of the state space is likely to be explored by logs for learning from, policy learning needs to be data-efficient. Control frequencies may be 1 h or even multi-month time steps (opportunities to take action) and even longer incentive horizons. One easy way to measure a model’s data efficiency is to look at the amount of data needed to meet a certain output threshold.