Skip to main content

Aperture Data Studio use cases

Once you've installed Aperture Data Studio, you can try these use cases to get more familiar with the product.

Analyze trends

Goal

I want to see how the quality of my customers' data changes over time.

Task

Create a snapshot of data and analyze the trends.

Prerequisites

  • You're licensed to use the Analyze trends workflow step 
  • The following sample data files are available in Data Studio:
    • Customer V1.csv
    • Customer V3.csv

Steps

1) Create a validation workflow

  1. Go to Workflow Designer and click Create a new workflow.
  2. Enter a name (e.g. Analyze trends) and click Submit.
  3. In Available data sources tab on the left-hand side click Sample Data Source.
  4. Drag and drop the Customer V1 file.
  5. Open the Workflow steps (second) tab.
  6. Drag the Transform step and connect it to the source file.
  7. Drag and connect the Validate step to the Transform one. The workflow should look like this:

  8. We want to create three separate validation rules to check that:
    - First Order Date is a date
    - Customer Id is not null and
    - The email syntax is valid
  9. Click Configure Rules in the Validate step.
  10. Click Add new rule, give it a name (e.g. First order date is date) and click Create.
  11. Search for Date and drag that in.
  12. Click <Input value> and select First Order Date to check that values in this column are dates.
  13. Click Apply in the top menu to save changes.
  14. Create the second rule. Click Configure Rules again and then Add new rule.
  15. Give it a name (e.g. Customer Id is not null) and click Create.
  16. Search for Null and drag that in.
  17. Click <Input value> and select Customer Id to check that there's a value.
  18. Click on (this will change the value to ). 
  19. Click Apply in the top menu to save changes.
  20. Create the last rule. Click Configure Rules and then Add new rule.
  21. Give it a name (e.g. Email syntax is valid) and click Create.
  22. Search for Matches expression (under Compare) and drag that in.
  23. Click <Input value> and select Email.
  24. Click <Comparison value> and search for Email Address (under Email Expressions). This will check that the email syntax is valid.
  25. Click Apply in the top menu to save changes. The workflow will look like this:
  26. In the Validate step, click Results by rule to view the results:
  27. Click Close in the top menu to go back to the workflow.

2) Save results as a snapshot

  1. To see how these results change over time, we have to first create a copy of the current view by taking a snapshot.
  2. Open the Workflow steps (second) tab.
  3. Drag Take snapshot step and connect it to the Results for analysis node in the Validate step.
  4. Click on the auto-generated snapshot name (Snapshot01), enter a name (e.g. Validation Results) and press enter. The workflow will look like this:
  5. Click Execute in the top menu to execute the workflow (this will also create the snapshot).
  6. By default, the Scheduled start will be set to Now, so just click Execute
  7. Click Dismiss in the Job Completed dialog to get back to the workflow.

3) Replace the source file

  1. To simulate a different version of the source file, we'll use a different sample file. In Available data sources (first) tab on the left-hand side the Sample Data Source should be visible.
  2. Drag and drop the Customer V3 file.
  3. Remove the original file (Rows for Customer V1) by clicking X.
  4. Connect Rows for Customer V3 to the Transform step. The workflow should now look like this:
  5. In the Validate step, click Results by rule to see how the results have changed:
  6. Click Close in the top menu to get back to the workflow.
  7. Click Execute in the top menu to execute the workflow (this will also create the snapshot).
  8. By default, the Scheduled start will be set to Now, so just click Execute
  9. Click Dismiss in the Job Completed dialog to get back to the workflow.

4) Analyze the trends

  1. To analyze trends using the two snapshots we've created, you can either create a new workflow entirely or use the one we've just created. We'll use the same one. Open the Workflow steps (second) tab.
  2. Drag the Use snapshot range step in.
  3. Click Undefined Workflow and select Analyze trends (or the name of the workflow we've just created).
  4. Click Undefined Snapshot and select Validation Results (or the name of the first snapshot we've created).
  5. From the Workflow steps (second) tab, drag the Analyze trends step and connect it to the Use Snapshot Range one.
  6. The tagged columns from the snapshot will be automatically picked up. By default, the Failed Rows metric will be selected.
  7. Click Show data in the Analyze trends step to see the results:
  8. Click Close in the top menu to get back to the workflow.

    To see other metrics (such as passed rows), click on Failed Rows and select the required one. To view results as a chart, click Show Chart.

Combine data sources

Goal

I want to combine several data sources into one to have a global view of my customers' data.

Task

Combine data from three source files containing customer information (contact/order details) based on the customer ID and order number.

Prerequisites

The following sample data files are available in Data Studio:

  • Customer V1.csv
  • Purchase Order Header.csv
  • Purchase Order Detail.csv

Steps

  1. Go to Workflow Designer and click Create a new workflow.
  2. Enter a name (e.g. Combine data sources) and click Submit.
  3. In Available data sources tab on the left-hand side click Sample Data Source.
  4. Drag and drop the following files, one at a time:
    - Customer V1
    - Purchase Order Header
    - Purchase Order Detail
  5. Drag one of the purchase order files on top of the other one and select Join left in the Drop actions dialog that appears. This will connect the files in a Join step. 
  6. In the Join step, click  to see the suggested joins. Only exact column name matches will be suggested.
  7. Click on the suggested Order Id ⇐⇒ Order Id join to select these columns.
  8. Click Show data in the Join step to view the joined data. Click Close in the top menu to return to the workflow. 
  9. The next step is to join this result to the Customer V1 file. 
  10. Open the Workflow steps (second) tab on the left-hand side.
  11. Drag and drop another Join step.
  12. Connect this Join step to Rows for Customer V1 and the first Join step:


  13. In the second Join step, click  to see the suggested joins. 
  14. Click on the suggested Customer Id ⇐⇒ Customer Id join to apply it.
  15. Click Show data in the last step to view the results of this join. You will see that the join returns 0 rows. Click Close to return to the workflow.
  16. Click Keys in the last Join step. Viewing the two columns side by side, we can see that we don’t need the characters preceding the ‘‘ from the Customer V1 file. Click Close to return to the workflow.
  17. From the Workflow steps (second) tab on the left-hand side, drag the Transform step. We now have to connect the Transform step to both the Customer V1 and the last Join step manually.
  18. Click on the Undefined source node in the Transform step and drag it to the Rows for Customer V1.
  19. Now click on the end node in the Transform step and drag it to the top node in the last Join step.
  20. Click Arrange in the top menu to auto-arrange the steps. The workflow should look like this:


  21. In the Transform step, click Show data.
  22. Right-click on the Customer Id column header and select Add transformation.
  23. Search for After and drag it in.
  24. Click Suffix value,  type in  then press enter.
  25. Click Apply in the top menu. You will see that the hyphen has been removed from the Customer Id values and the  icon appears in next to the header to indicate a transformed column).
  26. Click Close in the top menu to return to the workflow.
  27. In the Transform step, click Undefined column and select Customer Id.
  28. Click Show data on the second Join step to view the final result. You should now see the combination of your customer contact details and order details in one view.

You can also change the title of each step and give it a more descriptive name. For example: double-click on the Transform title, type in Remove '-' and press enter; rename the first Join step to say 'Join customer IDs':

Validate and clean

Goal

I want my new marketing campaign to only target customers with deliverable postal and email addresses.

Task

Validate customer addresses and emails, clean them and remove duplicate records.

Prerequisites

  • You're licensed to use:
  • Experian Match and Experian Batch have been configured according to your license
  • The following sample data file is available in Data Studio: Find Duplicates Demo Data.csv

Steps

1) Tag your data

Note that tagging data in preview allows the workflow steps to automatically pick up the relevant columns.

  1. In Data Explorer click Sample Data Source.
  2. Right-click on Find Duplicates Demo Data.csv and select Preview and configure.
  3. Open the Headings tab.
  4. Click Multi select.
  5. Select the following headings: Address1Address2 and Address3.
  6. Right-click and select Tag columns. Click Yes to confirm that you want to tag multiple headings.
  7. Click Edit. Under System, select Address then click Tag.

    The name of the tag will be shown under the column heading on the left-hand side.

  8. Tag the remaining headings one by one by right-clicking and selecting Tag column
    HeadingTag 
    Name Name
    Town Locality
    County Province
    Postcode Postal Code
    Email Email
    Dob Date
  9. Click Apply in the top menu to save changes. 

2) Create the workflow 

The workflow will validate postal addresses and emails and then remove duplicate records.

2.1) Validate addresses

  1. Go to Workflow Designer and click Create a new workflow.
  2. Enter a name (e.g. Validate and clean) and click Submit.
  3. In Available data sources tab on the left-hand side click Sample Data Source.
  4. Drag and drop the Find Duplicates Demo Data file.
  5. Open the Workflow steps (second) tab.
  6. Drag the Validate addresses step and connect it to the data source. Because we've tagged the data already, the step will automatically pick up the address columns: Selected columns 6/10.
  7. To confirm that the correct columns have been auto-selected, click Selected columns. In the Validated list you should see Address1, Address2, Address3, Town, County and Postcode. Click  to confirm.
  8. Click Select country and pick United Kingdom from the list.
  9. Click Show data. Scroll to the right to see the validation results. Click Close when done.

Filter out unwanted rows:

  1. Drag the Split step and connect it to the Validate addresses one.
  2. Click Filter then Create. We're interested in the following results only: Verified Correct, Good Full Match and Tentative Full Match.
  3. Search for Equals under Compare and drag that in.
  4. Click <Input value> and select Address Validate: MatchResult.
  5. Click <Comparison value>, type in Verified Correct and enter.
  6. Repeat steps 3-5 with the <Comparison value> for Good Full Match and Tentative Full Match. Three separate and unconnected Compare filters should be created.
  7. Search for OR and drag it in.
  8. Now connect it to all three Compare dialogs. By default, only two <Input> fields are available, so click  to add a third node and connect it:


  9. Click Apply in the top menu to save changes.
  10. In the Split step, click Show passing rows to see the 62 rows that passed the filter. Click Close to get back to your workflow.

Tidy up results:

  1. Drag the Transform step and connect it to the Show passing rows node in the Split step.
  2. Click Columns.
  3. Click Multi select.
  4. Select
    Address1
    Address2
    A
    ddress3
    T
    own
    County and
    Postcode then right-click and Hide. Click Yes to confirm. 
  5. Tag validated address columns below, one at a time. Right-click on the column, select Tag column then Edit to add a tag:
    Column Tag
    Address Validate: addressLine1 Address
    Address Validate: addressLine2 Address
    Address Validate: addressLine3 Address
    Address Validate: locality Locality
    Address Validate: province Province
    Address Validate: postalCode Postal Code
    Address Validate: country Country
  6. Click  to save changes. The workflow should look like this:

 

2.2) Validate emails

We use a Domain Name System (DNS) lookup to check whether the email’s domain exists. Confirm that you can access the Google DNS (8.8.8.8) before proceeding. If you can't, use nslookup to find a DNS server that can be used on your network and add it: Configuration > Step settings > Validate emails > DNS servers.

  1. Drag Validate emails step and connect it to the Transform one. The previously tagged Email field should be picked up automatically.
  2. Click <Select validation type> and select Domain level.
  3. Click Show data. Scroll to the right to see the validation results. Click Close when done.

Filter out unwanted rows:

  1. Drag the Split step and connect it to the Validate emails one.
  2. Click Filter then Create. We're interested in validated emails only.
  3. Search for True and drag that in.
  4. Click <Input value> and select Email Domain: Result.
  5. Click Apply in the top menu to save changes.
  6. In the Split step, click Show passing rows to see the 45 rows that passed the filter.
  7. Click Close to get back to your workflow. It should look like this:


2.3) Remove duplicate records

  1. Drag Find duplicates step and connect it to the Show passing rows in the Split step. The Selected columns should show 10/15.
  2. Click Undefined blocking keys and select GBR_Individual_Default.
  3. Click Undefined ruleset and select GBR_Individual_Default.
  4. Click Show data to start the matching process. Note that this might take a minute. Scroll to the right to see the match status results. Click Close when done.

Filter out unwanted rows:

  1. Drag the Split step and connect it to the Find duplicates one.
  2. Click Filter then Create. We're interested in exact matches only.
  3. Search for Equals under Compare and drag that. 
  4. Click <Input value> and select Duplicates: Match Status.
  5. Click <Comparison value>, type in EXACT and enter.
  6. Click Apply in the top menu to save changes.
  7. Drag the Group step and connect it to the Show passing rows in the Split step. 
  8. Click Columns.
  9. Right-click on Duplicates: Cluster ID and select Group by.
  10. Click  to save changes.
  11. Click Show data to see the 10 exact records. Click Close when done.
  12. Before exporting the results, we want to combine these exact matches with other records for comparison.
  13. Drag the Union step and connect it to the Group one.
  14. Click Undefined source and connect it to the Show failing rows in the Split step.
  15. Click Show data in the Union step to see the 29 records we have now identified as deliverable from the initial 103. 
     

2.4) Export data

  1. We want to export data into two files: one with validated and cleaned results, and another with the rest of the records.
  2. Drag the Export step and connect it to the Union one.
  3. Click Settings to specify the file type (.csv will be selected by default) and enter a file name.
  4. Click Apply
  5. To export all the remaining results, we need to combine them first.
  6. Drag the Union step and connect it to Show failing rows in the first two Split steps.
  7. Drag the Export step and connect it to this new Union one.
  8. Click Execute in the top menu to execute the workflow.

The final workflow should look like this: