3.4 Simulators

Now that we know the fields and tasks in which embodied AI can shine, the question is how our agents should be trained. One may argue it is best to train them directly in the physical world and expose them to its richness. Although a valid option, this choice comes with several drawbacks. First, training in the real world is slow, and the process cannot be sped up or parallelized. Second, it is very hard to control the environment and create custom scenarios. Third, it is expensive, both in terms of power and time. Fourth, it is not safe: improperly or incompletely trained robots can hurt themselves, humans, animals, and other assets. Fifth, for the agent to generalize, it must be trained in a wide variety of environments, which is not feasible in the physical world.

Our next choice is simulators, which deal with all of the aforementioned problems well. In the shift from Internet AI to embodied AI, simulators take over the role that was previously played by traditional datasets. An additional advantage of simulators is that the physics of the environment can be tweaked. For instance, some traditional approaches in this field [64] are sensitive to noise; as a remedy, sensor noise can simply be turned off for such tasks.

As a result, agents nowadays are often developed and benchmarked in simulators [65, 66], and once a promising model has been trained and tested, it can then be transferred to the physical world [67, 68].

House3D [69], AI2‐THOR [70], Gibson [71], CHALET [72], MINOS [73], and Habitat [74] are some of the popular simulators for embodied AI studies. These platforms vary with respect to the 3D environments they use, the tasks they can handle, and the evaluation protocols they provide, and they support different sensors such as vision, depth, touch, and semantic segmentation.

In this chapter, we mainly focus on MINOS and Habitat since they provide more customization options (the number of sensors, their positions, and their parameters) and are implemented in a loosely coupled manner that generalizes well to new multisensory tasks and environments. Their APIs can be used to define arbitrary high‐level tasks, and environment properties such as materials and object clutter variation can be configured programmatically. Both support navigation with continuous as well as discrete state spaces. Also, for the purpose of their benchmarks, all actuators are noiseless, but both simulators can enable noise if desired [75]. A minimal sketch of this kind of configuration follows.
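As an illustration, the sketch below configures an agent with RGB and depth sensors in habitat-sim, Habitat's Python backend. It is a minimal example rather than the benchmark setup discussed in this chapter: the scene path is a placeholder, and exact class names vary slightly across habitat-sim versions.

import habitat_sim

# Backend configuration: which 3D scene to load (placeholder path).
backend_cfg = habitat_sim.SimulatorConfiguration()
backend_cfg.scene_id = "data/scenes/example_scene.glb"

# RGB camera mounted 1.5 m above the agent's base.
rgb_spec = habitat_sim.CameraSensorSpec()
rgb_spec.uuid = "color_sensor"
rgb_spec.sensor_type = habitat_sim.SensorType.COLOR
rgb_spec.resolution = [480, 640]
rgb_spec.position = [0.0, 1.5, 0.0]

# Depth camera at the same mount point; sensors are noiseless by
# default, and a noise model can be attached here if desired.
depth_spec = habitat_sim.CameraSensorSpec()
depth_spec.uuid = "depth_sensor"
depth_spec.sensor_type = habitat_sim.SensorType.DEPTH
depth_spec.resolution = [480, 640]
depth_spec.position = [0.0, 1.5, 0.0]

# Attach both sensors to a single agent.
agent_cfg = habitat_sim.agent.AgentConfiguration()
agent_cfg.sensor_specifications = [rgb_spec, depth_spec]

sim = habitat_sim.Simulator(habitat_sim.Configuration(backend_cfg, [agent_cfg]))

# Discrete navigation: the default action space includes
# "move_forward", "turn_left", and "turn_right".
observations = sim.step("move_forward")
rgb_frame = observations["color_sensor"]     # (480, 640, 4) RGBA array
depth_frame = observations["depth_sensor"]   # (480, 640) depth map in meters
sim.close()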

In the last section, we saw numerous task definitions and how each can be tackled by agents. So, before jumping into the MINOS and Habitat simulators and reviewing them, let us first become more familiar with the three main goal‐directed navigation tasks, namely, PointGoal Navigation, ObjectGoal Navigation, and RoomGoal Navigation.

In PointGoal Navigation, an agent is spawned at a random starting position and orientation in a 3D environment and is asked to navigate to target coordinates given relative to the agent's position. The agent can access its own position via an indoor GPS. There is no ground‐truth map, and the agent must rely only on its sensors to complete the task. The scenarios for ObjectGoal Navigation and RoomGoal Navigation start the same way; however, instead of coordinates, the agent is asked to find an object or to go to a specific room.
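To make the PointGoal setting concrete, the helper below (a hypothetical illustration, not the sensor implementation of any of the cited simulators) computes an idealized indoor‐GPS reading: the target, given in world coordinates, is expressed as a distance and a heading‐relative angle in the agent's egocentric frame.

import numpy as np

def relative_pointgoal(agent_xy, agent_heading, goal_xy):
    """Idealized indoor-GPS reading for PointGoal navigation: the goal
    expressed in polar coordinates (rho, phi) relative to the agent.

    agent_xy, goal_xy: 2D positions on the ground plane.
    agent_heading: agent orientation in radians (0 = facing the +x axis).
    """
    delta = np.asarray(goal_xy, dtype=float) - np.asarray(agent_xy, dtype=float)
    rho = float(np.linalg.norm(delta))                 # distance to the goal
    phi = np.arctan2(delta[1], delta[0]) - agent_heading
    phi = (phi + np.pi) % (2.0 * np.pi) - np.pi        # wrap angle to [-pi, pi]
    return rho, phi

# Example: a goal 3 m ahead and 4 m to the left of an agent facing +x
print(relative_pointgoal((0.0, 0.0), 0.0, (3.0, 4.0)))  # -> (5.0, ~0.927)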
