Skip to main content

Address cleaning and data validation

Address cleaning

Address cleaning provides functionality which enables you to add columns with cleaned address fields to your existing tables as well as data related to the quality of the cleaned addresses and the cleaning action that was taken.  Additionally, if you have purchased the data, you can enhance your records with DataPlus to add a variety of fields including geographic coordinates, Experian Mosaic demographic information, and local authority data.

Setting up data

Address cleaning functionality in Experian Pandora is provided by Experian Batch, which is automatically installed alongside your Pandora installation.

To make use of the data cleaning and validation functionality, you must install any licensed datasets and tell Batch where those data files are located by modifying configuration files.

1. Install data

Install any of the datasets you have received.

Experian data files have a vintage and built-in expiry date. You will need to keep your data updated on a regular basis for address cleaning to continue to work.

2. Configure data

In the Experian Pandora installation directory, edit the configuration settings in the following files. You must specify fully qualified paths.

qawserve.ini

The qawserve.ini file tells the Batch integration where to find the data files, and how the data is mapped to a country.

Installed data directory

Under the [QADefault] section, add a line to the InstalledData setting, specifying the location where the data is installed. 

If you have more than one dataset, each one must be on its own line preceded by a '+' sign. The format of the setting is: InstalledData={ISO},{Data Directory}

Windows example:

InstalledData=GBR,C:\pandora\batch data\GBR
+USA,C:\pandora\batch data\USA

Unix example:

InstalledData=GBR,/pandora/batchdata/GBR
+USA,/pandora/batchdata/USA

Data mapping

In the same section, add at least one line to the DataMappings setting, to specify the datasets you wish to use. If you have data for more than one country, each set of data must be on its own line preceded by a ‘+’ sign. The format of the setting is:
DataMappings={data mapping identifier},{dataset/group name},{dataset+additional datasets}

Example:

DataMappings=GBR,Great Britain,GBR
+USA,USA,USA

USA data only

If you are using USA data, you must also specify the location of the supplementary USA Batch data and libraries. In the same section of qawserve.ini file, set the path using the CorrectADataLocUSA setting.  

Windows example: 

CorrectADataLocUSA=C:\pandora\batch data\USA\CorrectAddress 

Unix example: 

CorrectADataLocUSA=/pandora/batchdata/USA/CorrectAddress 

Canada data only

If you are using Canada data, you must also specify the location of the supplementary Canada Batch data and libraries. In the same section of qawserve.ini file, set the path using the CorrectADataLocCAN setting.  

Windows example: 

CorrectADataLocCAN=C:\pandora\batch data\CAN 

Unix example: 

CorrectADataLocCAN=/pandora/batchdata/CAN 
qaworld.ini

If you are using USA or Canada data, you must update the CorrectAApiLoc setting to point to the Experian Pandora installation directory. This is the location where the supplementary library used for USA and Canada address matching is installed alongside the Pandora program files.

Windows example:

CorrectAApiLoc= C:\Program Files\Experian\Pandora 5.8.0

Unix example:

CorrectAApiLoc=/opt/pserver
qalicn.ini

In order for the address cleaning in Pandora to work, you must have a valid license for each dataset you wish to use. Add the license keys you have been provided with to the qalicn.ini file, one per line.

Clean Address Settings

You can view existing layouts as well as add, edit or delete layouts on the Clean Address Settings dialog. To open this dialog, right-click System > click Clean Address Settings

Creating and editing layouts

Click the New button to create a new layout. Give your new layout a name and a description and choose which dataset you would like your new layout to use. You will only be able to select the datasets that you have installed.

Output Lines

From this tab you can choose how many output address lines you would like your new layout to have. Clicking the Add Address Line button will add a new address line.

Clicking on the plus icon for each address line allows you to select the address elements you wish to be output in that address line, as well as setting the line width.

Input Lines

From this tab you can choose the input address lines you would like your new layout to have. The configuration is performed in the same way as adding output address lines. 

This feature allows you to specify which data element is to appear on which line in the input address. If you know that a line in your database always contains the same type of data element, (for example, if line 4 always contains a country), you can add the Country element to the address line 4 input. This improves the speed and quality of matching.

The available input line elements are Name, Organisation and Country.

Properties

The properties tab contains two properties that you can change.

Batch Timeout

This property allows you to set the length of time in milliseconds that Batch spends on a clean before it will timeout. A longer timeout will improve the address matching for very dirty/incomplete addresses, however it may lead to slower performance as the application spends more time trying to clean.

Cache Memory

This property allows you to specify the amount of system memory (in MB) that Batch can use for data caching. This can be used to increase cleaning performance by allowing it to use additional system memory.

Cleaning Addresses

Cleaning your address data is done in the drilldown view. Your chosen selected address output columns will be added to the drilldown.

Validate and Enrich

To clean your address data, right-click the drilldown column heading to the left of where you would like your cleaned data columns to be placed, select Validate and Enrich, and select the layout you would like to use.

Input Columns

Within the dialog window, you will be shown the Available Input Columns.This will contain all the columns in your table. You can select the columns that you wish to clean by double clicking on each one or by selecting them and clicking the Add button. 

Your selected input columns will be moved over to Selected Input Columns. You can remove the columns by double clicking on them or selecting them and clicking the Remove button. You can also move the columns up and down the list by selecting them and clicking the arrow icons.

Output Columns

Below Available Input Columns you will be shown the Available Output Columns. If Address Element is selected in the dropdown menu, you will be shown the address element output columns from your selected layout. If Cleaning Result is selected, you will be shown a number of available columns related to the results of the cleaning. You can select the columns that you wish to include in the output by double clicking on each one or by selecting them and clicking the Add button. 

Your selected output columns will be moved over to Selected Output Columns. You can remove the columns by double clicking them or selecting them and clicking the Remove button. You can also move the columns up and down the list by selecting them and clicking the arrow icons.

Once you have your desired input and output columns and they are in the correct order, click Ok.

Cleaned Address Data

Once you start a clean, your cleaned address data columns along with any result data columns will be added to your table. You will see a progress indicator, showing you the progress of your clean.

If you wish to edit the address cleaning in any way, you can do so by bringing up the drilldown column heading menu and selecting Validate and Enrich.

Data Transformations

As well as cleaning address data it is also possible to perform a Generic Info Test transformation on the cleaned data. The transformation can be selected from the custom transformation window. It allows you to test your cleaned data for changes, errors or other such information. The results will be in a true or false format.

Address validation

You can validate the correctness of the addresses in your data using the Is Valid Address or Is Not Valid Address functions.

The address validation functionality is performed by Batch, so you must have a valid license key for Batch, and must have the address layouts set up correctly. When you use the address validation functions, Batch will perform a verification of your addresses and the validation results (true/false) are displayed in a new column.

To do this, follow these steps:

  1. Open the Expression Editor dialog. To open the dialog, right-click on the table column header and from the drop down menu, select Insert Column > Custom.
  2. On the right-hand panel, locate the Is Valid Address or Is Not Valid Address function, and drag it onto the canvas:
     
  3. Click the <The Address> field, and select the column that contains your address fields. If your address is split into multiple columns (such as Addressline 1, postcode, town), you must first combine the addresses into a single column
  4. Select the verification level that you would like to validate your address against.
    For Is Valid Address, the result returns 'true' for the selected verification level and above. For Is Not Valid Address, the result returns 'true' for the selected verification level and below.
    The verification levels are:
    Verification levelMeaning
    Verified Correct Batch verified the input address as a good-quality match to a complete address. No corrections or formatting changes were necessary.
    Good Match Batch verified the input address as a good-quality match to a complete address, although minor corrections or formatting changes may have been applied.
    Good Premise Partial Batch was not able to find a full match to a correct address, but found a good match to premise level by excluding organisation or sub-premise details.
    Tentative Match Batch found a match to a complete address, but the overall differences between the input and Cleaned addresses are significant enough to reduce Batch's confidence in the match.
    Multiple Matches Batch found more than one correct address which matched the input address. This generally means that no single address could be matched with high confidence.
    Poor Match Batch found a match to an address, but with low confidence. This often means that the Cleaned address is not deliverable.
    Partial Match Batch was unable to find a full correct address which matched the input address. This often occurs when the property number is missing from the input address.
    Foreign Address Batch could not find a matching address because the input address referred to a country other than that which you ran your clean against.
    Unmatched Batch was unable to match the input address to any correct address. For more detail, see the Match Success Indicator assigned to this address.
  5. Select the layout to use. 
  6. Click Create. Your results are displayed in a new column.

Combine Address Fields

If the addresses in your table are split across multiple columns, you can combine them into a single column by using the Combine Address Fields function.

To do this, follow these steps:

  1. Open the Expression Editor dialog. To open the dialog, right-click on the table column header and from the drop down menu, select Insert Column > Custom.
  2. On the right-hand panel, locate the Combine Address Fields function, and drag it onto the canvas:
  3. Click the <Address Element> field, and select the first column that contains address elements.
  4. Click the + sign to add a new <Address Element> field. Map the rest of the address elements onto their own lines:
     
  5. Click Create. The combined address is displayed in a new column.
    Alternatively, you can combine this directly with the Is Valid Address or Is Not Valid Address function if you do not want the combined address fields to be included in your table as a new column:

Data validation

You can also validate the format of certain fields in your data by creating a custom expression in the Expression Editor. To open the dialog, right-click on the column header and from the drop down menu, select Insert Column > Custom. The Expression Editor dialog will open, with the title New Transformed Column with Custom Rule.  

Email format validation

Note that this functionality is only available with Experian Pandora v5.8.0 and later.

You can validate if a field value is or is not a valid email format by using the Check If... Valid Email Format or Check If Not... Valid Email Format functions.

Note that this does not validate if the email address itself is a valid, deliverable address.

National Insurance number validation (UK only)

Note that this functionality is only available with Experian Pandora v5.8.0 and later.

You can validate if a field value is in a valid National Insurance number format by using the Check If... National Insurance Number (UK) or Check If Not... National Insurance Number (UK) functions.