Читать книгу Artificial Intelligent Techniques for Wireless Communication and Networking - Группа авторов - Страница 20

ii) Tuning the discount factor

Оглавление

When the model available to the agent is predicted from data, the policy discovered using a short iterative horizon will probably be better than a policy discovered with the true horizon. On the one hand, since the objective function is revised, artificially decreasing the planning horizon contributes to a bias. If a long planning horizon is focused, there is a greater chance of over fitting (the discount factor is close to 1). This over fitting can be conceptually interpreted as related to the aggregation of errors in the transformations and rewards derived from data in relation to the real transformation and reward chances [4].

Artificial Intelligent Techniques for Wireless Communication and Networking

Подняться наверх