Introduction on SQL Server 2012 “ColumnStore” Index (Part II)

Overview

Last time, we introduced the new index structure in SQL 2012 “ColumnStore” index, examined the storage mechanism and illustrated how this index helped us to boost the query performance in data warehouse. (http://blog.gnetgroup.com/bi/2012/10/22/introduction-to-sql-server-2012-columnstore-index-part-i/)  Today, some related topics will be covered furthermore including considerations in loading data into the “ColumnStore” index-enabled table, typical scenarios using and not using “ColumnStore” and some constraints in “ColumnStore” index in this release.

Considerations for Loading Data

According to the design of “ColumnStore” index, it makes the table read-only, which means you cannot write data into this table in a normal way including insertion, updating and deleting. Well, this sounds not good but it is fine for tables in data warehouse. As we all know, one of characteristics in data warehouse is that tables are mainly for reading for end users and data loading happens usually in a fix time window, for example nightly or once a week. Even though, how do we load data into a read-only table?

Don’t worry! There are basically two approaches to help us walk around this issue.

Disabling Index and re-enabling Index

The first approach is frequently used in many ETL design. Before loading the data, disable the index and after that, re-enable the index, which even is deemed as a best practice for loading data into a big table because it can speed up the whole ETL process and the implementation is as simple as ABC. The detailed steps can follow as disable “ColumnStore” index, Load data and re-enable the index.  

Partition Switching

Comparing with the first approach, partition switching especially fits for the big data loading, which has outstanding performance benefits.

The steps are design as below,

  • Load data into a staging table
  • Create “ColumnStore” Index on stage table
  • Split one empty partition on main table
  • Switch stage table into the empty partition

 Also you can refer the scripts created in table 1. (Assuming your main table has been designed as a partition table)

 

insert into  [stg].[FactOnlineSales]

select * from [dbo].[FactOnlineSales] where OnlineSalesKey>=31000000

 

 

–Create CS index on partition table

select * from [STG].[FactOnlineSales]

–Step 2, Create CS Index on stg

 

CREATE NONCLUSTERED COLUMNSTORE INDEX [NonClusteredColumnStoreIndex_Demo_STG] ON [STG].[FactOnlineSales]

(

       [OnlineSalesKey],

       [DateKey],

       [StoreKey],

       [ProductKey],

       [PromotionKey],

       [CurrencyKey],

       [CustomerKey],

       [SalesOrderNumber],

       [SalesOrderLineNumber],

       [SalesQuantity],

       [SalesAmount],

       [ReturnQuantity],

       [ReturnAmount],

       [DiscountQuantity],

       [DiscountAmount],

       [TotalCost],

       [UnitCost],

       [UnitPrice],

       [ETLLoadID],

       [LoadDate],

       [UpdateDate]

)WITH (DROP_EXISTING = Off)

 

 

—Create one more parition on the base table

ALTER PARTITION SCHEME [ps_fact_sales]

NEXT USED [Primary]

go

 

/*

ALTER PARTITION FUNCTION [pf_range_fact]()

MERGE RANGE (30500000)

go

*/

 

 

ALTER PARTITION FUNCTION [pf_range_fact]()

SPLIT RANGE (31000000)

go

 

–View  partition

SELECT

partition = $PARTITION.[pf_range_fact]([OnlineSalesKey])

,rows      = COUNT(*)

,min    = MIN([OnlineSalesKey])

,max    = MAX([OnlineSalesKey])

FROM [dbo].[FactOnlineSales_Prtition]

GROUP BY $PARTITION.[pf_range_fact]([OnlineSalesKey])

ORDER BY PARTITION

 

–Switch the data into the new parition

ALTER TABLE [STG].[FactOnlineSales]

SWITCH TO [dbo].[FactOnlineSales_Prtition] PARTITION 6

 

(Table1: Partition Switching)

Typical Scenarios to or not to Consider Using “ColumnStore” Index

Generally speaking, “ColumnStore” index especially fits for the big tables in data warehouse. The main reason, which has been mentioned before, is that tables in data warehouse system for end users are read-only (query intensive). On the contrary, if you system is mainly for read-and-write, then this should not be the choice for you.

Specifically, we can consider using “ColumnStore” index in following scenarios. First of all, if you have the report query running on data warehouse system directly and the performance is good enough for end users, probably plugging this magic index on the main fact tables is a good idea. Secondly, if you are experiencing the long time for cube processing, enabling this magic index on the fact table a good panacea for you. Then, in your EDW system, if your cube is designed as ROLAP mode, in order to accelerate the data analysis for your business users, “ColumnStore” index is a smart choice. Next, if you have deployed the Tabular BI sematic model with a “Direct Query” mode, you should try this magic index. Maybe more, on the top of head, these are the typical cases we can consider using “ColumnStore” index.

On the other hand, in some cases, “ColumnStore” index does not fit. For example, if your fact table is required to be frequently updated on (if your cube enables the feature of writing back), as we said the fact table with a “ColumnStore” index is read-only, so we cannot use “ColumnStore” index on this kind of fact tables. Additionally, if your table cannot be implemented by partition switching or index rebuilding somehow, you also cannot consider using this index. Last but not least, if your query actually is high-selective enough, then you common index helps and “ColumnStore” is not necessary.

Constraints

In this release, SQL Server 2012, there are still some constraints in “ColumnStore” index. So far, you only can create one and only one “ColumnStore” index per table; and only non-clustered “ColumnStore” index is supported and you cannot create a clustered one. “ColumnStore” index also cannot be created on indexed view. Moreover, you cannot blend the “ColumnStore” with a “filter” index. Lastly, not every column type is supported to be included into “ColumnStore” index. The unsupported data types are Binary, BLOB, UniqueIdentifier, CLR, decimal(>18 digits), (n)varchar(max), datetime2.

Conclusion

In this paper (Part II), it has shared two best practices how to move data into the fact table with a “ColumnStore” index. Also, it points out the typical scenarios for using and not suing “ColumnStore” index. In the end, we also examine some constraints for “ColumnStore” in this release.