Data Reading from Data Warehouse in Cube Processing

Overview

As most of us know, SSAS Cube should be processed before you create any analytic reports on that. However, we might oversight what cube processing does and how to read the data from data warehouse database (DWH).

This write-up will give you a clear picture what is happening behind this mysterious cube processing and what detailed queries are submitted to DWH to fetch data.

Cube Processing

Processing is a series of steps, in which Analysis Services loads data from a relational data source into a multidimensional model, aggregates measures, saves the data into xml files system and builds some indexes for internal use to achieve the quick response on coming queries. For objects that use MOLAP storage, data is saved on disk in the database file folder. For ROLAP storage, processing occurs on demand, in response to an MDX query on an object. For objects that use ROLAP storage, processing refers to updating the cache before returning query results.

By examining the data folder of the cube database, you can find a list of XML files and folders as below. Every xml file with the corresponding folder store the information of the data source (*.ds.xml), data source view (*.dsv.xml), cube and dimensions.

 Image

(Figure 1: SSAS database file structure)

Before saving the structure and data information into cube database files, the most important step for SSAS is to read the data from data warehouse database.

Reading Data from Data Warehouse Database

In order to clearly present the queries that cube processing submits to DWH, I designed one cube and captured the queries to illustrate how exactly the processing works.

Figure 2 shows the data modeling of this cube.

 Image

(Figure2: Cube Modeling)

As it can be seen, in this cube, we have dimensions including DimCurrency, DimCustomer, DimStore, DimDate, DimProduct, DimPromotion and one measure group “FactonlineSales”.

When processing the cube, every dimension will be processed, that is to say, read data from dimension tables from data warehouse database. Take the dimension currency as an example; dimension currency is designed as “figure 3”, CurrencyKey as the dimension key and other three attributes are “Description”, “Label” and “Name”. The key and all attributes are correspondingly to be read from DWH by queries. Table 1 list the queries how this cube processing reads data for the dimension “currency”.

Image 

(Figure 3: Dimension Currency)

 Image

(Figure 4: Dimension Processing)

–Dim Currency

–Key reading

 SELECT DISTINCT

 [dbo_DimCurrency].[CurrencyKey] AS [dbo_DimCurrencyCurrencyKey0_0],

 [dbo_DimCurrency].[CurrencyLabel] AS [dbo_DimCurrencyCurrencyLabel0_1],

 [dbo_DimCurrency].[CurrencyName] AS [dbo_DimCurrencyCurrencyName0_2],

 [dbo_DimCurrency].[CurrencyDescription] AS [dbo_DimCurrencyCurrencyDescription0_3]

 FROM [dbo].[DimCurrency] AS [dbo_DimCurrency]

 

–Attribute: [CurrencyDescription]

SELECT DISTINCT [dbo_DimCurrency].[CurrencyDescription] AS [dbo_DimCurrencyCurrencyDescription0_0]

FROM [dbo].[DimCurrency] AS [dbo_DimCurrency]

 

–Attribute: [CurrencyLabel]

SELECT  DISTINCT [dbo_DimCurrency].[CurrencyLabel] AS [dbo_DimCurrencyCurrencyLabel0_0]

FROM [dbo].[DimCurrency] AS [dbo_DimCurrency]

 

–Attribute: [CurrencyName]

SELECT DISTINCT [dbo_DimCurrency].[CurrencyName] AS [dbo_DimCurrencyCurrencyName0_0]

FROM [dbo].[DimCurrency] AS [dbo_DimCurrency]

 

(Table1: Queries reading currency dimension data)

 

After processing all dimensions, Cube processing also needs to handle the measure group. In this case, the measure group is “FactOnlineSales”. (By default, every fact forms one measure group.) During the time processing measure group, SSAS engine sends the query to DWH to read data intensively from the corresponding fact table. In this case, the query reading on fact table shows in the “table 2”. By this query, Cube tries to get the answer of every measure broken down by all possible keys combination.  

Image 

(Figure 5: Measure Group Processing)

 

–Processing measure group

SELECT [dbo_FactOnlineSales].[dbo_FactOnlineSalesSalesQuantity0_0] AS [dbo_FactOnlineSalesSalesQuantity0_0],

[dbo_FactOnlineSales].[dbo_FactOnlineSalesSalesAmount0_1] AS [dbo_FactOnlineSalesSalesAmount0_1],

[dbo_FactOnlineSales].[dbo_FactOnlineSalesReturnQuantity0_2] AS [dbo_FactOnlineSalesReturnQuantity0_2],

[dbo_FactOnlineSales].[dbo_FactOnlineSalesReturnAmount0_3] AS [dbo_FactOnlineSalesReturnAmount0_3],

[dbo_FactOnlineSales].[dbo_FactOnlineSalesDiscountQuantity0_4] AS [dbo_FactOnlineSalesDiscountQuantity0_4],

[dbo_FactOnlineSales].[dbo_FactOnlineSalesDiscountAmount0_5] AS [dbo_FactOnlineSalesDiscountAmount0_5],

[dbo_FactOnlineSales].[dbo_FactOnlineSalesTotalCost0_6] AS [dbo_FactOnlineSalesTotalCost0_6],

[dbo_FactOnlineSales].[dbo_FactOnlineSalesUnitCost0_7] AS [dbo_FactOnlineSalesUnitCost0_7],

[dbo_FactOnlineSales].[dbo_FactOnlineSalesUnitPrice0_8] AS [dbo_FactOnlineSalesUnitPrice0_8],

[dbo_FactOnlineSales].[dbo_FactOnlineSales0_9] AS [dbo_FactOnlineSales0_9],

[dbo_FactOnlineSales].[dbo_FactOnlineSalesOnlineSalesKey0_10] AS [dbo_FactOnlineSalesOnlineSalesKey0_10],

[dbo_FactOnlineSales].[dbo_FactOnlineSalesDateKey0_11] AS [dbo_FactOnlineSalesDateKey0_11],

[dbo_FactOnlineSales].[dbo_FactOnlineSalesProductKey0_12] AS [dbo_FactOnlineSalesProductKey0_12],

[dbo_FactOnlineSales].[dbo_FactOnlineSalesCustomerKey0_13] AS [dbo_FactOnlineSalesCustomerKey0_13],

[dbo_FactOnlineSales].[dbo_FactOnlineSalesStoreKey0_14] AS [dbo_FactOnlineSalesStoreKey0_14],

[dbo_FactOnlineSales].[dbo_FactOnlineSalesPromotionKey0_15] AS [dbo_FactOnlineSalesPromotionKey0_15],

[dbo_FactOnlineSales].[dbo_FactOnlineSalesCurrencyKey0_16] AS [dbo_FactOnlineSalesCurrencyKey0_16]

  FROM

(

SELECT [SalesQuantity] AS [dbo_FactOnlineSalesSalesQuantity0_0],

[SalesAmount] AS [dbo_FactOnlineSalesSalesAmount0_1],

[ReturnQuantity] AS [dbo_FactOnlineSalesReturnQuantity0_2],

[ReturnAmount] AS [dbo_FactOnlineSalesReturnAmount0_3],

[DiscountQuantity] AS [dbo_FactOnlineSalesDiscountQuantity0_4],

[DiscountAmount] AS [dbo_FactOnlineSalesDiscountAmount0_5],

[TotalCost] AS [dbo_FactOnlineSalesTotalCost0_6],

[UnitCost] AS [dbo_FactOnlineSalesUnitCost0_7],

[UnitPrice] AS [dbo_FactOnlineSalesUnitPrice0_8],

1   AS [dbo_FactOnlineSales0_9],

[OnlineSalesKey] AS [dbo_FactOnlineSalesOnlineSalesKey0_10],

[DateKey] AS [dbo_FactOnlineSalesDateKey0_11],

[ProductKey] AS [dbo_FactOnlineSalesProductKey0_12],

[CustomerKey] AS [dbo_FactOnlineSalesCustomerKey0_13],

[StoreKey] AS [dbo_FactOnlineSalesStoreKey0_14],

[PromotionKey] AS [dbo_FactOnlineSalesPromotionKey0_15],

[CurrencyKey] AS [dbo_FactOnlineSalesCurrencyKey0_16]

FROM

   (

       SELECT OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey,

                     SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity,

                     ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice

       FROM    FactOnlineSales

   ) AS [FactOnlineSales]

)    AS [dbo_FactOnlineSales]

 

(Table2: Processing Measure Group)

Conclusion

Although the queries reading data from DWH could be slightly differently according to the different cube storages (MOLAP, HOLAP and ROLAP) and processing options, this article still depicts the essence of the cube processing.  Hope this brings you more ideas on cube processing internals. See you next time.

Advertisements

Published by

Derek Dai

focusing DB, Big Data and BI tech.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s