Data Lakes For Dummies - Alan R. Simon

Providing a path of least resistance

Business users around your organization build new stand-alone data marts because that’s what they’ve done for a long, long time. They realize that the best way to bring data-driven insights into the way they do business is to take charge of their own fate and build an end-to-end solution. Old habits are extremely difficult to break!

Beyond a blockade on new data mart development, your data lake can give these business users a path of least resistance. Make it easier for them to go to the data lake for the data they need instead of doing everything on their own.

Suppose that a new chief people officer (CPO) is hired to lead your company’s HR organization. Jan, the new CPO, is a big believer in applying super-advanced analytics, such as machine learning and artificial intelligence, to numerous HR functions: employee evaluations, salary adjustments and promotions, succession planning, and more.

Jan appoints an analytics team within HR and tells them that, within the next three months, they need to have some initial machine learning models built in time for the semiannual employee evaluation cycle. Raul, the analytics team leader, has been with your company for 15 years and has built several HR-specific data marts in the past for similar needs.

Raul assigns two of the team members, Julia and Dhiraj, to analyze the HR data in Workday (a cloud-based HR and financial management system) to figure out what data needs to be brought into the machine learning model. Raul also assigns another team member, Tamara, to start designing an Amazon Redshift database to store the HR data and support the machine learning algorithms.

Not so fast, Raul!

Raul submits his budget request for the new HR employee incentive evaluation and involvement operations (EIEIO) data mart and is surprised to learn that he needs to present his business case to the company’s new data mart exception board. Raul starts preparing his PowerPoint slides and comes across item number 2: “State why your analytical needs cannot be met through existing data lake content.”

“Hmm … a data lake,” Raul thinks. “I wonder if the data we need is already in there?”

Sure enough, Raul goes browsing through the data lake catalog and finds that the data lake already has a ton of HR data from Workday that is regularly refreshed. He asks Julia and Dhiraj to match up the work that they’ve done so far with what the data lake catalog shows. Within two hours, they report back with the fantastic news: “Everything we need is in the data lake already!”

A well-constructed data lake offers business users a path of least resistance when it comes to gathering the data they need for analytics. Raul’s team will still need to build the machine learning models to produce the analytics that Jan, your CPO, wants to apply to the next evaluation cycle. But they no longer need to proceed with analytics on a business-as-usual basis, constantly acquiring and storing the same data over and over in different data marts.

Over time, as familiarity with the data lake spreads throughout your organization, fewer unnecessary data mart requests such as Raul’s will need to be redirected back to the data lake. Raul wasn’t deliberately trying to do everything on his own; he just wasn’t familiar enough with what the data lake provided, not only to HR but to your company as a whole.
