Data Quality Service (DQS) Introduction
The data-quality solution provided by Data Quality Services (DQS), which is a native component in SQL server 2012, enables a data steward or IT professional to maintain the quality of their data and ensure that the data is suited for its business usage. DQS is a knowledge-driven solution that provides both computer-assisted and interactive ways to manage the integrity and quality of your data sources. DQS enables you to discover, build, and manage knowledge about your data. You can then use that knowledge to perform data cleansing, matching, and profiling. You can also leverage the cloud-based services of reference data providers in a DQS data-quality project.
As more enterprises get aware of the importance of data quality, it is high time to start using DQS in SQL Server 2012 (DQS is a bundle-ship component, which means there is no extra charge for you). However, in this version, installation and configuration of DQS is not yet a GUI-oriented tool, bringing some challenges to embrace this useful tool. This paper helps to go through the entire process how to install DQS, configure it and use it.
Before installing DQS, it is necessary to select the Data Quality Services check box during the SQL Server setup. As a result, SQL Server Setup will copy an installer file, DQSInstaller.exe, under the SQL Server instance directory on your computer. You must run this file after you have completed the SQL Server Setup to complete the Data Quality Server installation. Let’s start from here. I assume you have installed your SQL server 2012 and checked DQS option on during the setup.
Find out your DQS install tool, MenuàAll ProgramsàMicrosoft SQL Server 2012àData Quality ServicesàData Quality Server Installer, as shown in below picture.
(Figure 1: DQS Installer)
Click to run this executable program, a DOS executable. In the beginning, it asks for the password for creating a certificate file. Remember to provide a password meeting high complexity. After providing the password and confirming the password, the installation file will help you to generate a certification file and create three databases respectively DQS_MAIN, DQS_PROJECTS and DQS_STAGING_DATA. So far, there is no chance to customize your database name. And please notice, if this is not the first time you run this installation, you have to make sure there are no legacy data files and log files belonging to these three databases under the default SQL Server installation data file folder and log file folder otherwise it complains installation failure because of failing to create databases. In a normal case, it takes about 5 to 10 minutes to finish the installation and it popups the windows as the figure 2. If the window report failure, you have to check DQS.log for error reasons.
(Figure 2: DQS Installer finished successfully)
Check your database instance, you can find three new databases created for DQS internal use as shown in figure 3. Meanwhile, if you browse to the database data file folder, you can find a certification file generated as shown in figure 4.
(Figure 3, DQS Databases)
(Figure 4, DQS Certification file)
Better together—DQS and MDS (Master Data Service)
Once we configure the DQS working with MDS, it provides you the opportunity to match records to eliminate the duplicates visually, which is greatly helpful for master data management.
Here, this article assumes that you have well installed and configured MDS in your server. To enable DQS working with DMS, you need to launch your MDS configuration as this menu sequence, StartàAll ProgramsàMicrosoft SQL Server 2012àMaster Data ServicesàMaster Data services Configuration Manager as below figure 5 shows.
(Figure 5, DMS Configuration Manager)
Launch DMS manager and switch to the tab “Web Configuration” in the left pane then hit the button “Enable Integration with Data Quality Service” in the bottom right (Refer Figure 6). Instantly, it pops up the window to report integration successfully done (See Figure 7).
(Figure 6, MDS Configuration Manager)
(Figure 7, DQS Integration)
Start your journey of DQS
Till now, you are ready to use every feature of DQS. Around SQL Server, you have a couple of weapons in term of DQS. First of all, the main battle field is “Data Quality Client”, a fancy client tool. Likewise, you can find this tool in the start menu as it shows in figure 8. And the fancy UI is shown as figure 9. This tool can help you to create knowledge base, new a data quality project, monitor activity and fulfill some DQS management work.
(Figure 8, DQ Client Menu Item)
(Figure 9, DQ Client home Screen)
In addition to Data Quality Client, the MDS add-in also changes after you enable DQS integration with DMS. Two new buttons show up in “the Master Data” ribbon (See figure 10), which can help to identify the duplicate records before you publish them to the MDS.
(Figure 10, “Match Data” in MDS Excel Add-in)
Last but not least, there has been a native component in SSIS 2012 named “DQS Cleansing” (See figure 11). This ETL dataflow component can help you clean data in your ETL workflow.
(Figure 11, DQS Cleansing Component in SSIS)
Now, as all these done, you are ready to start using this powerful tool “Data Quality Service” to improve your enterprise data quality. Start your DQS journey with luck.