Читать книгу Machine Learning For Dummies - John Paul Mueller, John Mueller Paul, Luca Massaron - Страница 30

Defining Big Data

Оглавление

Big data is substantially different from being just a large database. Yes, big data implies lots of data, but it also includes the idea of complexity and depth. A big data source describes something in enough detail that you can begin working with that data to solve problems for which general programming proves inadequate.

As an example of big data complexity, consider Google’s self-driving cars (https://waymo.com/). The car must consider not only the mechanics of the car’s hardware and position with space but also the effects of human decisions, road conditions, environmental conditions, and other vehicles on the road, which is why our roads aren’t crowded with them yet (see https://www.vox.com/future-perfect/2020/2/14/21063487/self-driving-cars-autonomous-vehicles-waymo-cruise-uber). It’s not hard to imagine some of the human-specific issues that self-driving cars will need to address, such as people taking a nap when they should be watching the road even with the self-driving car in control (https://robbreport.com/motors/cars/canadian-police-arrest-sleeping-driver-tesla-autopilot-1234570071/).

The data source for a self-driving car (or any other complex endeavor for that matter) contains many variables — all of which affect the vehicle in some way. Traditional programming might be able to crunch all the numbers, but not in real time. You don’t want the car to crash into a wall and have the computer finally decide five minutes later that the car is going to crash into a wall. The processing must prove timely so that the car can avoid the wall.

The acquisition of big data can also prove daunting. The sheer bulk of the dataset isn’t the only problem to consider — also essential is to consider how the dataset is stored and transferred so that the system can process it. In most cases, developers try to store the dataset in memory to allow fast processing. Using a hard drive to store the data would prove too costly, time-wise.

Machine Learning For Dummies

Подняться наверх