Earlier this week I posted about having a new challenge – how to use SSIS to dynamically define and generate rules for data cleansing. This week I discovered that the way to do this was NOT to do it! Instead of writing a custom set of code, I will be using SQL Server features Data Quality Service and Master Data Services to help import our data, and I will have SSIS call those packages to actually process the data. While I had known about these two features in SQL Server, I hadn’t had an opportunity to dig deep into what they do. I read about it this week and started playing with the tools, and I will be using them to help develop a non-code-driven data cleansing engine for my customer. By the end next week I should be an expert 😉
My customer aggregates data from various sources, and has no standard input format. Each customer defines their own data structures, and my customer has to quickly onboard these data feeds. They don’t want these processes to require developers, and they don’t want them to be so difficult that business analysts can’t do the work. DQS and MDS are both new ways of thinking about data for my customer, but I think the combination of these tools will work for them.
For those who haven’t used them yet, Data Quality Services is an “add-on” to SQL Server that allows a user to define data cleansing “rules” (for example: “Field must be 5 characters”, “field can’t be empty”, “Address must be valid”, “Code must match a value in a list of codes”, etc etc). I can then create SSIS packages that will read the data files, and “cleanse” the file according to the rules. The rules engine in DQS is pretty dynamic and will be easy to teach to the business analysts. It even works with external data applications (Address cleansing, Business validation, etc). You can create some pretty complex rules for validation and formatting, and this will help me NOT have to code a dynamic rules engine for the majority of the business rules. There will be some complex rules that DQS can’t handle, but as of this point my plan is to use SSIS/DQS/MDS to manage the data cleansing and validation process, and only use SSIS/Coding for the really tough rules. Whoo hoo!! Crisis avoided!
I’ll be posting more on DQS an MDS and how they help me with the data cleansing/standardization process over the next few weeks.
Picture of the Day: My photo from Saltwater State park that was featured as a “Photo of the Week” by the Washington State Parks “Adventure Awaits” webpage!! See more pictures from that beautiful evening at my SmugMug photo website: http://mskathyv.smugmug.com/LocalShoots/Summertime2013.