Читать книгу Applied Microsoft Business Intelligence - Sarka Dejan - Страница 16
Part I
Overview of the Microsoft Business Intelligence Toolset
Chapter 2
Designing an Effective Business Intelligence Architecture
Planning an Analytical Model
ОглавлениеThe next logical step in architecting a business intelligence solution is deciding whether an analytical model is necessary. Prior to the release of SQL Server 2012, making that decision was a much simpler process. However, considering the added and enhanced features in that release as well as the latest release, as of this writing, making that determination is now a little more complicated. Why, and what has complicated this process?
In most cases, the driving factors behind developing analytical models is improving query performance time. Developers have come up with some workarounds that circumvent the need for an analytical model, such as building aggregate tables, using indexed views, or adding more hardware. Although these solutions work, they usually only temporarily fix the problem. As data needs grow, the return on either of the aforementioned solutions becomes too expensive or simply just does not work. Recognizing this, Microsoft included a new column-based index within the database engine. By implementing these index types, you can increase query performance several magnitudes over, possibly eliminating the need to develop an analytical model entirely.
Although the new column-based index may improve query performance, when it comes to addressing analytical needs such as complex time-based analysis, key performance indicators, hierarchies, unary operators, and other capabilities that are beyond the scope of a relational model, the only choice is to develop an analytical model. Prior to SQL Server 2008R2, you only had a single type to choose from when developing a business intelligence solution based on the Microsoft stack. However, with later releases of SQL Server, you have three choices: Power Pivot, tabular, and multidimensional. Chapter 5 provides a detailed explanation that will assist you in choosing the right one. However, regardless of which model you select, the planning process is very similar across all three.
Prior to building the model, you should consider the following;
● Make a concerted effort to ensure that the data warehouse schema is almost 100 percent complete, including the data governance and load process. The model development could actually begin without the data; however, little can be done in regards to validating the code without data.
● Ensure that the model requirements have been scoped. This may seem like an obvious step, but often developers start building the model without consulting any stakeholders or end users. Typically, the result is something irrelevant or wrong.
● Decide on data latency. In other words, can the data in the model be 15 minutes behind, 1 hour behind, 1 day behind, or 1 week behind. This decision is ultimately based on end-user needs.
If a decision was made not to develop a data warehouse and instead obtain the data directly from the source, it is possible to have almost real-time access to data by implementing either a Direct Query tabular model or Real-time Online Analytical Process (ROLAP) multidimensional model. Note that this feature is not available when developing a Power Pivot model. These methods can also be implemented if a data warehouse is implemented, but the data will be as fresh as the data in the warehouse. For example, if the data is loaded into the warehouse nightly, then you'll have a 24-hour difference between the data in the source and what you'll see in the model.
On the other hand, a period of latency may be acceptable. If that is the case, you should add a step after the data warehouse is loaded and prior to anyone accessing the data. This step will load or process the data into the model. For tabular data it would be an In-Memory model, and for multidimensional data it would be a Multidimensional Online Analytical Processing (MOLAP) model. In addition, adding this step also opens up the possibility of developing a Power Pivot model, which was not available before.
One more thing to consider is a hybrid scenario where some data can afford latency and some may not. In that case, one solution would be to use a Hybrid Online Analytical Processing (HOLAP) model, in which some objects access data real-time and some require the processing of data.