Intelligent Security Management and Control in the IoT - Mohamed-Aymen Chalouf

2.6. Performance evaluation


Having described our proposed access controller, in this section we evaluate its performance using a simulation environment built with SimPy (2020).

We consider an NB-IoT antenna at which access requests arrive according to a Poisson process with a mean inter-arrival time of 0.018 s. The number of preambles N is 16, with access opportunities occurring every 0.1 s. Each device attempting access may do so at most 16 times; beyond this limit, the terminal abandons transmission.
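As an illustration, the slotted random-access model above can be sketched in plain Python (the authors use SimPy; this standalone version, including all function and parameter names, is our own assumption):

```python
import random

# Parameters taken from the text; the names are ours.
N_PREAMBLES = 16           # preambles available per access opportunity
SLOT_PERIOD = 0.1          # access opportunities every 0.1 s
MEAN_INTERARRIVAL = 0.018  # Poisson arrivals: mean time between requests (s)
MAX_ATTEMPTS = 16          # a terminal abandons after 16 failed attempts

def simulate(duration_s, p_access=1.0, seed=0):
    """Slotted random-access sketch: devices present in a slot each pick one
    of N_PREAMBLES at random; a preamble chosen by exactly one device succeeds,
    otherwise all its users collide and retry (up to MAX_ATTEMPTS)."""
    rng = random.Random(seed)
    backlog = []  # failed-attempt count of each device still waiting
    stats = {"success": 0, "abandoned": 0, "attempts": 0}
    t = 0.0
    next_arrival = rng.expovariate(1 / MEAN_INTERARRIVAL)
    while t < duration_s:
        # Poisson arrivals during this slot join the backlog with 0 attempts.
        while next_arrival < t + SLOT_PERIOD:
            backlog.append(0)
            next_arrival += rng.expovariate(1 / MEAN_INTERARRIVAL)
        # Access barring: each backlogged device transmits with prob. p_access.
        contenders = [i for i in range(len(backlog)) if rng.random() < p_access]
        choices = {}
        for i in contenders:
            choices.setdefault(rng.randrange(N_PREAMBLES), []).append(i)
        done = set()
        for users in choices.values():
            stats["attempts"] += len(users)
            if len(users) == 1:      # unique preamble: access succeeds
                stats["success"] += 1
                done.add(users[0])
            else:                    # collision: every user retries later
                for i in users:
                    backlog[i] += 1
        # Drop successes and devices that exhausted their attempts.
        kept = []
        for i, attempts in enumerate(backlog):
            if i in done:
                continue
            if attempts >= MAX_ATTEMPTS:
                stats["abandoned"] += 1
            else:
                kept.append(attempts)
        backlog = kept
        t += SLOT_PERIOD
    return stats
```

With a mean inter-arrival of 0.018 s, roughly 5.5 devices arrive per 0.1 s slot, so contention over the 16 preambles is frequent and the backlog builds up, as in the scenario studied in the figures.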

The performance of our controller, which is based on the TD3 technique, is compared with an adaptive approach. We use a measurement horizon H of 10. A larger measurement window does not yield a significant improvement in performance, which indicates that a window of 10 measurements is sufficient to reflect the real state of the network.
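The measurement horizon can be represented as a simple sliding buffer whose contents form the controller's state; in this sketch, the stored feature (e.g. the attempt count observed at each access opportunity) and the zero-padding before H measurements are available are our assumptions:

```python
from collections import deque

H = 10  # measurement horizon from the text

class MeasurementWindow:
    """Sliding window of the last H network measurements, used as the
    state observed by the controller."""
    def __init__(self, horizon=H):
        self.buf = deque(maxlen=horizon)

    def push(self, measurement):
        # Oldest measurement is evicted automatically once the window is full.
        self.buf.append(measurement)

    def state(self):
        # Fixed-length state vector: pad with zeros until H values exist.
        pad = [0.0] * (self.buf.maxlen - len(self.buf))
        return pad + list(self.buf)
```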

The adaptive approach gradually increases the blocking probability when the measured number of attempts exceeds the optimal value by more than a predefined threshold. Conversely, when the number of attempts falls below the optimal value by more than a predefined threshold, the blocking probability is gradually reduced, allowing more terminals to attempt access.
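This rule can be sketched as a single update step; the threshold margins and step size below are illustrative assumptions, not values from the text:

```python
def adaptive_step(p_access, measured_attempts, optimum,
                  margin=2.0, step=0.05):
    """One update of the adaptive barring rule: block more (lower access
    probability) when attempts sit well above the optimum, block less
    when they sit well below it, and hold steady in between."""
    if measured_attempts > optimum + margin:
        p_access = max(0.0, p_access - step)   # too much contention
    elif measured_attempts < optimum - margin:
        p_access = min(1.0, p_access + step)   # capacity going unused
    return p_access
```

Because each update moves the probability by a small fixed step from its previous value, successive actions are strongly correlated, which is the contrast with TD3 noted below.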

Figures 2.7 and 2.8 show the access probabilities for the two strategies considered. The adaptive technique (Figure 2.7) starts with an access probability of 1 and adapts to the traffic conditions, which evolve following the Poisson process. The strategy based on the TD3 algorithm begins with an initial stage lasting 200 s, during which the algorithm explores the action space according to a uniform law (Figure 2.8). Only after this stage does the algorithm begin to exploit its learning, which is refined with experience.

We can note that under TD3 (Figure 2.8), future actions are not tied to past actions, unlike in the adaptive case. The action values can change completely from one step to the next, because they depend only on the state of the network, which can change very quickly.

Figures 2.9 and 2.10 show the impact of the control laws described previously on the average latency of the access attempts. These plots exclude the terminals that abandoned transmission after reaching the maximum number of attempts. Although some terminals in Figure 2.10 show slightly higher latencies than in Figure 2.9, the latency is globally of the same order; that is, the TD3 algorithm offers no particular advantage in terms of latency.


Figure 2.7. Access probability with the adaptive controller


Figure 2.8. Access probability with the controller using TD3

Figure 2.9. Average latency of the terminals with the adaptive controller


Figure 2.10. Average latency of the terminals with the controller using TD3

Even though TD3 shows no particular advantage in terms of latency, Figure 2.12 shows that after the exploration stage, the reward improves very significantly. It is clearly higher than that of the adaptive controller, which obtains a low and highly variable reward (Figure 2.11). The average reward under TD3 is on the order of 13.91%, against 3.6% for the adaptive controller. This reflects the fact that under TD3, the average number of terminals attempting access gets closer to the optimum. The same result appears in Figures 2.13 and 2.14: the average number of attempts is 30.12 with the adaptive controller (Figure 2.13), against 19.6 with our approach, much closer to the optimum of 15.49.
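The text does not give the exact reward function; as a purely hypothetical illustration, a reward that penalizes the deviation of the average number of attempts from the optimum reproduces the ordering reported above:

```python
def reward(avg_attempts, optimum=15.49):
    """Hypothetical reward shaping consistent with the text: highest when
    the average number of access attempts equals the optimum, decreasing
    as it deviates (the authors' actual function is not given here)."""
    return max(0.0, 1.0 - abs(avg_attempts - optimum) / optimum)
```

With the averages reported in the text, the TD3 controller (19.6 attempts) obtains a clearly higher value than the adaptive controller (30.12 attempts) under this shaping.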


Figure 2.11. The average reward with the adaptive controller


Figure 2.12. The average reward with the controller using TD3

Figure 2.13 shows that the adaptive technique does not properly control the number of attempts: it very often exceeds the optimum significantly, which triggers many collisions at access and, in turn, new access attempts. The number of abandonments also remains relatively high compared to the TD3 controller (Figure 2.14). The latter, after the exploration stage, succeeds in significantly reducing the number of abandonments, which demonstrates the effectiveness of the proposed approach.


Figure 2.13. Access attempts (blue) and abandonments (red) with the adaptive controller. For a color version of this figure, see www.iste.co.uk/chalouf/intelligent.zip


Figure 2.14. Access attempts (blue) and abandonments (red) with the controller using TD3. For a color version of this figure, see www.iste.co.uk/chalouf/intelligent.zip

It should be noted that with our reinforcement-learning-based approach, performance improves with each access attempt. The limiting factor is the estimation errors, which propagate into the computation of the reward, hence the importance of having precise estimators.

