Skip to main content

Aperture Data Studio use cases

Validate and clean

Goal

I want my new marketing campaign to only target customers with deliverable postal and email addresses.

Task

Validate customer addresses and emails, clean them and remove duplicate records.

Prerequisites

  • You're licensed to use:
    • Experian Match, Experian Batch and Email Validate
    • the following workflow steps: Find duplicates, Validate addresses and Validate emails 
  • Experian Match and Experian Batch have been configured according to your license
  • The following sample data file is available in Data Studio: Find Duplicates Demo Data.csv

Steps

1) Tag your data

Note that tagging data in preview allows the workflow steps to automatically pick up the relevant columns.

  1. In Data Explorer click Sample data source.
  2. Right-click on Find Duplicates Demo Data.csv and select Preview and configure.
  3. Open the Headings tab.
  4. Click Multi select.
  5. Select the following headings: Address1Address2 and Address3.
  6. Right-click and select Tag columns. Click Yes to confirm that you want to tag multiple headings.
  7. Click Edit. Under System, select Address then click Tag.

    The name of the tag will be shown under the column heading on the left-hand side.

  8. Tag the remaining headings one by one by right-clicking and selecting Tag column
    HeadingTag 
    Name Name
    Town Locality
    County Province
    Postcode Postal Code
    Email Email
    Dob Date
  9. Click Apply in the top menu to save changes. 

2) Create the workflow 

The workflow will validate postal addresses and emails and then remove duplicate records.

2.1) Validate addresses

  1. Go to Workflow Designer and click Create a new workflow.
  2. Enter the name of the workflow (e.g. Validate and clean) and click Submit.
  3. Click Sample data source then drag and drop Find Duplicates Demo Data file.
  4. From the Workflow steps tab drag the Validate addresses step and connect it to the data source. Because we've tagged the data already, the step will automatically pick up the address columns: Selected columns 6/10.
  5. Click Selected columns to confirm that the the correct columns are selected. Click  to confirm.
  6. Click Select country and pick United Kingdom from the list.
  7. Click Show data. Scroll to the right to see the validation results. Click Close when done.

Filter out unwanted rows:

  1. Drag the Split step and connect it to the Validate addresses one.
  2. Click Filter then Create. We're interested in the following results only: Verified Correct, Good Full Match and Tentative Full Match.
  3. Search for Equals under Compare and drag that in.
  4. Click <Input value> and select Address Validate: MatchResult.
  5. Click <Comparison value> , type in Verified Correct and enter.
  6. Repeat steps 4 and 5 with the <Comparison value> for Good Full Match and Tentative Full Match. Three separate and unconnected Compare filters should be created.
  7. Search for OR and drag it in.
  8. Now connect it to all three Compare dialogs. By default, only two <Input> fields are available, so click  to add a third node and connect it:


  9. Click Apply in the top menu to save changes.
  10. In the Split step, click Show passing rows to see the 62 rows that passed the filter. Click Close to get back to your workflow.

Tidy up results:

  1. Drag the Transform step and connect it to the Show passing rows node in the Split step.
  2. Click Columns.
  3. Click Multi select.
  4. Select
    Address1
    Address2
    A
    ddress3
    T
    own
    County and
    Postcode then right-click and Hide. Click Yes to confirm. 
  5. Tag validated address columns below, one at a time. Right-click on the column, select Tag column then Edit to add a tag:
    Column Tag
    Address Validate: addressLine1 Address
    Address Validate: addressLine2 Address
    Address Validate: addressLine3 Address
    Address Validate: locality Locality
    Address Validate: province Province
    Address Validate: postalCode Postal Code
    Address Validate: country Country
  6. Click  to save changes. The workflow should look like this:

 

2.2) Validate emails

Check that you can access the Google DNS (8.8.8.8) before proceeding. If you can't, use nslookup to find a DNS server that can be used on your network and add it: Configuration > Step settings > Validate emails > DNS servers.

  1. Drag Validate emails step and connect it to the Transform one. The previously tagged Email field should be picked up automatically.
  2. Click <Select validation type> and select Domain level.
  3. Click Show data. Scroll to the right to see the validation results. Click Close when done.

Filter out unwanted rows:

  1. Drag the Split step and connect it to the Validate emails one.
  2. Click Filter then Create. We're interested in validated emails only.
  3. Search for True and drag that in.
  4. Click <Input value> and select Email Domain: Result.
  5. Click Apply in the top menu to save changes.
  6. In the Split step, click Show passing rows to see the 45 rows that passed the filter.
  7. Click Close to get back to your workflow. It should look like this:


2.3) Remove duplicate records

  1. Drag Find duplicates step and connect it to the Show passing rows in the Split step. The Selected columns should show 10/15.
  2. Click Undefined blocking keys and select GBR_Individual_Default.
  3. Click Undefined ruleset and select GBR_Individual_Default.
  4. Click Show data to start the matching process. Note that this might take a minute. Scroll to the right to see the match status results. Click Close when done.

Filter out unwanted rows:

  1. Drag the Split step and connect it to the Find duplicates one.
  2. Click Filter then Create. We're interested in exact matches only.
  3. Search for Equals under Compare and drag that. 
  4. Click <Input value> and select Duplicates: Match Status.
  5. Click <Comparison value>, type in EXACT and enter.
  6. Click Apply in the top menu to save changes.
  7. Drag the Group step and connect it to the Show passing rows in the Split step. 
  8. Click Columns.
  9. Right-click on Duplicates: Cluster ID and select Group by.
  10. Click  to save changes.
  11. Click Show data to see the 10 exact records. Click Close when done.
  12. Before exporting the results, we want to combine these exact matches with other records for comparison.
  13. Drag the Union step and connect it to the Group one.
  14. Click Undefined source and connect it to the Show failing rows in the Split step.
  15. Click Show data in the Union step to see the 29 records we have now identified as deliverable from the initial 103. 
     

2.4) Export data

  1. We want to export data into two files: one with validated and cleaned results, and another with the rest of the records.
  2. Drag the Export step and connect it to the Union one.
  3. Click Settings to specify the file type (.csv will be selected by default).
  4. Click Apply
  5. To export all the remaining results, we need to combine them first.
  6. Drag the Union step and connect it to Show failing rows in the first two Split steps.
  7. Drag the Export step and connect it to this new Union one.

The final workflow should look like this:

Combine data sources

Goal

I want to combine several data sources into one to have a global view of my data.

Task

Combine three source files containing customer information (contact/order details) based on the customer ID and order number.

Prerequisites

  • The following sample data files are available in Data Studio:
    • Customer V1.csv
    • Purchase Order Header.csv
    • Purchase Order Detail.csv

Steps

  1. In the Workflow Designer, click Create a new workflow.
  2. Enter a name (e.g. Combine data sources) and click Submit.
  3. Expand Sample Data source in the left-hand side menu.
  4. Drag and drop the following files into the workspace:
    • Customer V1

    • Purchase Order Header

    • Purchase Order Detail

  5. Drag one of the purchase order files on top of the other one and select Join left. This will connect the files in a Join step. The Undefined Column error will appear for both of the purchase order files.
  6. In the Join step, click  to see the suggested joins. Only exact column name matches will be suggested.
  7. Click on the suggested Order Id ⇐⇒ Order Id join to select these columns.
  8. Click Show data in this step to view the join then click Close to return to the workflow. 
  9. The next step is to join this result to the Customer V1 file. From the Workflow steps tab drag and drop the Join step into the workspace.
  10. Link this Join step to the first one as well as the Customer V1 file.
  11. In the Join step, click  to see the suggested joins. 
  12. Click on the suggested Customer Id ⇐⇒ Customer Id join to apply it.
  13. Click Show data to view the results of this join. You will see that the join returns 0 rows. Click Close to return to the workflow.
  14. Click Keys in the last Join step. This will open up a drilldown of the two join columns side by side. This will show that we don’t need the characters preceding the ‘‘ from Customer V1 file.
  15. Click Close to return to the workflow.
  16. In the Workflow steps tab, drag the Transform step into the workflow.
  17. Connect the Transform step to both Customer V1 and the last Join step.
  18. In the Transform step, click Show data.
  19. Right-click on the Customer Id column header and select Add transformation.
  20. Drag the After function from the left-hand side into the workspace. Enter ‘‘ as the Suffix Value argument and click ApplyYou will now see the changes reflected in the data and the transformed icon  will appear in the Customer Id column header.
  21. Click Close to return to the workflow.
  22. In the Transform step, click Undefined column and select Customer Id.
  23. Click Show data on the second Join step to view the final result. You should now see the combination of your customer contact details and order details in one view.

Optional steps:

  • Click Arrange in the top menu.
  • You can also edit the names of each step to reflect their purpose. For example: double-click on the Transform title and enter Remove '-'.  Rename the first Join step to say 'Join customer IDs'.

The complete workflow will look like this: 

Analyze trends

Goal

I want to see how the quality of my customer data changes over time.

Task

Analyze validation results using snapshots.

Prerequisites

  • You're licensed to use the Analyze trends workflow step 
  • The following sample .csv files are available: Customer V1 and Customer V3.

Steps

1) Create a validation workflow

  1. Go to Workflow Designer and click Create a new workflow.
  2. Enter a name (e.g. Analyze trends) and click Submit.
  3. From Sources tab, drag and drop the Customer V1 file.
  4. Connect a Transform step followed by a Validate step.
  5. Create the following rules:
    - First order date is date
    - Customer ID not NULL
    - Email syntax is valid
  6. Click Results by rule to view the results.

2) Save results as a snapshot

  1. To monitor these results over time, we have to save them as a snapshot.
  2. Connect Take snapshot step to the Results for analysis node of the Validate step. This is essentially a ‘pivoted’ view of the results dashboard designed for this purpose.
  3. Name the snapshot. 
  4. Click Execute from the top menu to execute the workflow (this will also create the snapshot).

3) Replace the source file

  1. (mocking up a new version of the source file – every week etc.)
  2. Delete the customer v1 file and replace it with customer v3
  3. Click ‘show rule data’ to see that the results have changed (slightly, might be worth choosing better rules for the actual one)
  4. Execute the workflow again to save a snapshot of these results

4) Analyze the trends

  1. (either on this workflow or on another workflow)
  2. Drag on a ‘use snapshot range step’ – Point it at the snapshot that we built earlier:
  3. Connect this up to an analyse trends step
  4. It will have automatically picked up the columns from the snapshot based on data tags (the ones that start with trend – ‘trend metric’ etc) – This will only work if you have saved the ‘Show data’ node from the validation step
  5. Choose the metric that you are interested in (i.e. Failed rows, Passed rows etc) then click show data to see:
  6. Can change the metric in the side menu to passed rows etc
  7. Return to the workflow and click ‘show chart’ to see these results shown as a chart