Artificial Intelligent Techniques for Wireless Communication and Networking - Group of authors - Page 29

Reward Functions

System or product owners do not always have a clear picture of what they want to optimize. The reward function is often multidimensional and has to balance several sub-goals. Another great insight here (which reminds me of discussions of system latency) is that 'average performance' (i.e. the expectation) is often an inadequate measure: the system needs to perform well on all task instances, not just on average. A common approach to evaluating the full distribution of rewards across groups is to use a Conditional Value at Risk (CVaR) objective, which looks at the tail of the reward distribution below a given percentile rather than at the expected reward.
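To make the difference between expected reward and a CVaR-style measure concrete, here is a minimal sketch of an empirical CVaR computed from sampled episode rewards. The function name `empirical_cvar`, the parameter `alpha`, and the two example reward lists are assumptions made for illustration and do not come from the book; the estimator shown (the mean of the worst `alpha` fraction of samples) is one common way to approximate CVaR, not necessarily the formulation the authors have in mind.

```python
import math
from statistics import mean


def empirical_cvar(rewards, alpha):
    """Return the mean of the worst alpha-fraction of the observed rewards.

    Illustrative estimator only: `alpha` is the tail fraction, so
    alpha=0.2 averages the lowest 20% of samples, and alpha=1.0
    reduces to the ordinary mean.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(rewards)  # ascending, so the worst rewards come first
    k = max(1, math.ceil(alpha * len(ordered)))  # number of tail samples
    return mean(ordered[:k])


# Hypothetical per-episode rewards for two policies with the same average.
steady_policy = [5.0] * 10
risky_policy = [10.0] * 8 + [-15.0] * 2

for name, rewards in (("steady", steady_policy), ("risky", risky_policy)):
    print(name, "mean:", mean(rewards), "CVaR(0.2):", empirical_cvar(rewards, 0.2))
```

Both hypothetical policies have a mean reward of 5.0, but the worst 20% of episodes average 5.0 for the steady policy and -15.0 for the risky one. A criterion based only on the expectation cannot tell them apart, while the tail measure can.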
