The latest release of SAP Information Steward includes the Data Cleansing Advisor, which has some great new features that help organizations get a jump-start on their Information Steward implementation. Today I’ll show an example process for using Data Cleansing Advisor to suggest Information Steward validation rules and create a cleansing solution.
Profiling the Data
The first step in generating cleansing rules is to have Information Steward profile the data. Start by creating a column profiling task (select table →Profile→Columns), and make sure to select the advanced profiling options “Median & Distribution” and “Word distribution”.
Next, create a Content Type profiling task (select table →Profile→Content Type).
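To make the advanced profiling options more concrete, here is a minimal Python sketch of the kind of statistics they refer to: a median, a value distribution, and a word distribution for a single column. It only illustrates the concepts. It is not how Information Steward computes its profile results, and the function name `profile_column` and the returned keys are hypothetical.

```python
from collections import Counter
from statistics import median


def profile_column(values):
    """Return simple profile statistics for one column.

    Illustrative only: the keys below are made-up names, not
    Information Steward attributes.
    """
    # Treat None and empty strings as missing values.
    non_null = [v for v in values if v is not None and v != ""]

    # Collect the values that can be read as numbers, for the median.
    numeric = []
    for v in non_null:
        try:
            numeric.append(float(v))
        except (ValueError, TypeError):
            pass

    # Word distribution: how often each whitespace-separated token appears.
    words = Counter(w.lower() for v in non_null for w in str(v).split())

    return {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "median": median(numeric) if numeric else None,
        "value_distribution": Counter(non_null),
        "word_distribution": words,
    }
```

Calling `profile_column` on an address column, for example, would show which tokens (such as street suffixes) dominate the data, which is the sort of signal a word distribution is meant to surface.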
Data Validation Advisor Suggestions
Once these tasks have run, return to the Workspace Home→Profile Results→Advanced tab. Icons in the “Advisor” column indicate whether any validation or data cleansing rules have been suggested. As shown in the figure below, the golden gear icon indicates suggested validation rules, and the broom icon indicates cleansing advisor rules.
Clicking on the Data Validation Advisor (golden gear icon) opens the list of suggested validation rules, as shown in the image below. These rules typically flag columns that should be non-null or non-zero, and sometimes pick up patterns (for example, that a column should contain only Y or N). Since this is a small set of fake data, the rules are relatively simplistic, but potentially still useful.
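As a rough illustration of how suggestions like these could be derived from profiled values, the Python sketch below proposes a non-null rule, a non-zero rule, and a small allowed-value list such as Y/N. This is an assumption-laden toy, not SAP's algorithm or rule syntax. The function `suggest_rules`, the `max_distinct` threshold, and the rule wording are all hypothetical.

```python
def suggest_rules(values, max_distinct=2):
    """Suggest simple validation rules from observed column values.

    Hypothetical sketch: rule texts are plain descriptions, not
    Information Steward rule expressions.
    """
    suggestions = []
    if not values:
        return suggestions

    # Non-null rule: every observed value is present.
    if all(v is not None and v != "" for v in values):
        suggestions.append("value is not null")

    present = [v for v in values if v is not None and v != ""]
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]

    # Non-zero rule: the column is entirely numeric and never zero.
    if present and len(numeric) == len(present) and all(v != 0 for v in numeric):
        suggestions.append("value is not zero")

    # Pattern rule: a non-numeric column with very few distinct values.
    distinct = sorted({str(v) for v in present})
    if present and not numeric and len(distinct) <= max_distinct:
        suggestions.append("value in (" + ", ".join(distinct) + ")")

    return suggestions
```

For a flag column holding only Y and N, this sketch would propose both a non-null rule and an allowed-value rule. On a small sample, such suggestions can easily be too strict or too loose, so they are best treated as starting points to review.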
Hillary Bliss is the Analytics Practice Lead at Decision First Technologies and specializes in data warehouse design, ETL development, statistical analysis, and predictive modeling. She works with clients and vendors to integrate business analysis and predictive modeling solutions into the organizational data warehouse and business intelligence environments based on their specific operational and strategic business needs. She has a master’s degree in statistics and an MBA from Georgia Tech.