
JUST HOW BIG IS BIG?


Big data can really become quite big. For example, suppose that your Google self-driving car has a few HD cameras and a couple hundred sensors that provide information at a rate of 100 times per second. What you might end up with is a raw dataset with input that exceeds 100 Mbps. Processing that much data is incredibly hard.
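To get a feel for where a figure like 100 Mbps comes from, here is a minimal back-of-the-envelope sketch in Python. The per-camera bitrate and per-sample size are illustrative assumptions, not figures from the book; plug in your own numbers to see how quickly the raw input rate grows.

# Rough estimate of the raw input rate for a sensor-laden vehicle.
# The camera bitrate and sample size are assumed values for illustration.

NUM_CAMERAS = 4            # "a few HD cameras"
CAMERA_MBPS = 25           # assumed bitrate of one HD video stream, Mbit/s

NUM_SENSORS = 200          # "a couple hundred sensors"
SAMPLES_PER_SECOND = 100   # each sensor reports 100 times per second
BYTES_PER_SAMPLE = 8       # assumed size of one sensor reading, bytes

camera_mbps = NUM_CAMERAS * CAMERA_MBPS
sensor_mbps = NUM_SENSORS * SAMPLES_PER_SECOND * BYTES_PER_SAMPLE * 8 / 1e6

print(f"Cameras: {camera_mbps:.1f} Mbit/s")
print(f"Sensors: {sensor_mbps:.2f} Mbit/s")
print(f"Total raw input: {camera_mbps + sensor_mbps:.1f} Mbit/s")

With these assumptions, the cameras alone account for about 100 Mbit/s, and the sensors add a bit more on top, which is how the total ends up exceeding 100 Mbps.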

Part of the problem right now is determining how to control big data. Currently, the attempt is to log everything, which produces a massive, detailed dataset. However, this dataset isn’t well formatted, again making it quite hard to use. As this book progresses, you discover techniques that help control both the size and the organization of big data so that the data becomes useful in making predictions.

When thinking about big data, you also need to consider anonymity. Big data presents privacy concerns. However, because of the way machine learning works, knowing specifics about individuals isn’t particularly helpful anyway. Machine learning is all about determining patterns — analyzing training data in such a manner that the trained algorithm can perform tasks that the developer didn’t originally program it to do. Personal data has no place in such an environment.

Finally, big data is so large that humans can’t reasonably visualize it without help. Part of what defines big data as big is the fact that a human can learn something from it, but the sheer magnitude of the dataset makes recognizing the patterns impossible (or at least means doing so would take a really long time). Machine learning helps humans make sense of and use big data.

