We strongly recommend using Chrome to access the application.
Aperture Data Studio allows you to:
We recommend that you change your password before uploading your files. Click on your username and select Change password.
Preview the first 1,000 rows and check that your data is in order. If required, make changes: right-click on your file/data source and select Preview and configure.
You can move, rename and exclude column headings and change character sets/delimiters/quotation marks. Click Apply in the top-right corner when you’re happy with the changes. The file will now appear as 'Configured'.
Larger files might take a while to load. A percentage progress bar will be shown over the file name in the explorer view (on the left-hand side).
Data Explorer allows you to manage your data sources, prepare and configure your data.
This menu lists all your data sources: both raw files and database connections.
My files contains files uploaded by you only; Data sources lists all the available connected/configured database connections.
To upload/connect to your data, do one of the following:
The supported file formats are:
Note that .dat files have to be uploaded together with the associated metadata files such as .cob or .ddl.
The supported database drivers/services are:
You can create JDBC connections for the following databases:
Find out how to add a custom JDBC driver.
For each data source, the administrator can specify individual login credentials for each user/user group to ensure that restricted data is not accessible to anyone without the appropriate permissions.
To create and assign access credentials:
To remove access from the user/team, simply right-click on them and select Remove access.
To ensure your data has been interpreted correctly and is displayed as expected, you can preview it: right-click on your source data and select Preview and configure. A preview of the first 1,000 rows will be shown.
We strongly recommend that you preview your data before loading it. This ensures that the data we cache is as expected.
You can move, rename and exclude column headings and change character sets/delimiters/quotation marks.
To configure how the data should be interpreted, use the Data tab.
Depending on the file type, you can specify various settings. For example, for .csv files:
The Headings tab allows you to manipulate the column headings: you can rename, re-order, remove and exclude them. You can also tag data.
Alternatively, modify columns by right-clicking on the column header, selecting the required option and clicking Apply in the top-right corner to save your changes.
You can standardize your data easily before loading it into Data Studio: right-click on the column header and select Edit.
You can now apply one or more standardization options:
Note that if the Treat blanks as null option is turned off, and both Remove leading blanks and Remove trailing blanks are on, Data Studio will ensure that at least a single space is retained for entirely blank values.
If you want to keep null and blank values separate, we recommend that you turn off Remove leading blanks and Remove trailing blanks. Otherwise, Data Studio will strip the surrounding spaces on load, so values that differ only in leading or trailing blanks will not be treated as different values.
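The interplay of these options can be sketched as follows. This is an illustrative Python sketch of the described behavior, not Data Studio's actual implementation:

```python
def standardize(value, treat_blanks_as_null=True,
                remove_leading=True, remove_trailing=True):
    """Sketch of how the blank-handling options might combine."""
    result = value
    if remove_leading:
        result = result.lstrip(" ")
    if remove_trailing:
        result = result.rstrip(" ")
    if result == "":
        if treat_blanks_as_null:
            return None  # blank collapses to null
        if remove_leading and remove_trailing and value != "":
            return " "   # a single space is retained for entirely blank values
    return result

print(standardize("   "))                              # None
print(standardize("   ", treat_blanks_as_null=False))  # ' '
print(standardize("  a  "))                            # 'a'
```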
By default, all of the standardization options above are enabled (except for Auto-parse scientific numbers). To change the enabled/disabled options for all files:
In preview, you can only move headings, not data.
To rearrange columns, you can simply drag and drop the column headings. Click Apply in the top-right corner when you’re happy with the changes. The file will now appear as 'Configured'.
To revert all the changes made to your data:
Data tags allow you to enrich source data by specifying additional information about what each column contains. These are then used to improve the user experience by providing sensible defaults and suggestions. Tags are used in various workflow steps such as Find duplicates and Validate addresses.
Note that if you upload a file with the same name, the tags will be retained.
There are two types of data tags:
Tags in Data Studio are hierarchical: a parent tag (e.g. address) can have multiple child ones (e.g. street, city).
While you can't modify system defined tags, you can create a child tag (e.g. PO box) and assign it to the system one (e.g. Address). Once created, it will appear in the User defined list under that system parent tag.
To create new and manage existing tags, go to Glossary > Data tags. To add a new one, click Create a new data tag.
Note that Consumer users will not be able to access Glossary. Find out about user roles.
The best way to manually tag data is before it has been loaded, in preview:
If you tag columns in a workflow step, the tags will be applied to that column in all the subsequent steps for that workflow only.
View data and transform it as required.
Changes made when viewing files in Data Explorer are temporary and will not persist. However, the operations you make can be saved as a workflow and the data changes can be saved to a file.
Once the data is loaded you can:
Dependencies are normally used for table normalization – the process of decomposing a table into smaller tables based on sets of attributes.
Keys are used to identify records in a table and are therefore crucial when joining tables in cross-domain queries. The analysis of keys depends on the results of dependency analysis.
Once the table is loaded, right-click on it and select Dependency analysis.
This option will be disabled if the table hasn’t already been profiled.
Using the Analyse dependencies dialog, specify the settings for the analysis, such as the maximum number of columns that will be considered as a multi-column key and the minimum correlation percentage to be considered a dependency.
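To make the idea of a correlation percentage concrete, here is a simplified Python sketch of measuring how strongly one column determines another. This is an assumption-laden illustration, not Data Studio's actual dependency algorithm:

```python
from collections import defaultdict

def dependency_strength(rows, determinant, dependent):
    """Fraction of rows whose dependent value matches the most common
    dependent value seen for their determinant value."""
    groups = defaultdict(lambda: defaultdict(int))
    for row in rows:
        groups[row[determinant]][row[dependent]] += 1
    consistent = sum(max(counts.values()) for counts in groups.values())
    return consistent / len(rows)

rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "94105", "city": "San Francisco"},
    {"zip": "94105", "city": "Oakland"},  # one inconsistent row
]
print(dependency_strength(rows, "zip", "city"))  # 0.75
```

A minimum correlation of, say, 90% would reject `zip → city` on this sample.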
The analysis result will be shown under the table name, following the Configured and Loaded status.
To see the generated results, right-click on the table and select Dependencies or Keys.
This menu lists all your data sources: both raw files and database connections.
My files contains files available to you only; Data sources lists all the available database connections.
Transforming your data is easy with Aperture Data Studio. Visual workflows are built using a variety of drag-and-drop steps.
When creating/modifying workflows, the top menu allows you to:
By default, the auto-connect for the workflow steps is enabled, allowing two steps with input/output nodes to be automatically linked when moved close to each other. You can disable this by clicking Disable snap; click Enable snap to turn the auto-connect back on.
To import one or more workflows (.wfl), click in the top menu.
Once added to the workspace, each workflow step can be renamed: double-click on the title of the step and enter the required name.
This step allows you to see data quality trends over time.
Note that connecting from the Results for analysis node will mean that the output columns are automatically tagged for trend analysis.
In the step dialog, specify the following:
There are two options for displaying the trend results:
You can also use this step to create regular reports, providing insight into the quality of your data.
The branch step allows you to duplicate data from one input so that it can be used for further workflow steps.
The rows from the source data will be duplicated (branched) into two outputs. To create more than two duplicates of the data, you can link several branch steps.
Click Show data in the step dialog to view the results.
The chart step allows you to create interactive charts of your data at any point in your workflow.
Including this step within the workflow will not affect the data connected to it. Connect this step to one input to create various types of charts. Click Bar chart (2D) in the step dialog to change the type of chart.
To add more columns to the chart (up to 10), click Add data series column. To further customize the chart, click Additional options.
| Option | Description |
|---|---|
| Chart type | The type of chart you wish to draw (e.g. bar, line, pie). |
| Data label column | The column for labeling the x-axis. For example, if your data is split by year, use this setting to label the bars with their respective year. |
| Split series | Specify whether to display all the data series on the same chart or split them up. |
| Title | The title of your graph, displayed at the top. |
| Description | The description of your graph, displayed under the title. |
| X-axis label | Labels the X-axis (at the bottom). |
| Y-axis label | Labels the Y-axis (on the left). |
| Show legend | Specify whether to display the legend at the bottom. |
| Show trendline | Specify whether to calculate and draw a trendline on the graph - a straight line of best fit for your data. |
| Show X/Y zero line | Specify whether to draw a line at the origin for the axis (useful if you have negative data). |
| Logarithmic X/Y axis | Specify whether to convert the selected axis to a logarithmic scale. |
| Show horizontal/vertical grid lines | Specify whether to draw grid lines for the respective axis. |
| Stacked? | Specify whether to stack your data series on top of each other. |
| Show section labels | (Pie chart only) Specify whether to show or hide the labels on segments. |
Click Show chart in the step dialog to view your data as an interactive chart. Click View data to also show the data alongside the chart.
The chart will be based on the top 1,000 rows.
The export step allows you to export a workflow's output data.
Each workflow can contain multiple export steps, so you can export data at any point. Each export step can be configured individually.
Click Settings in the step dialog to specify the export file type (.csv, .txt, .xls, .xlsx, .xml, .dat or .json) and location (the server's export directory or a data source).
When executing a workflow with multiple outputs, the export settings defined in each export step will be used.
Exporting to a DBMS
When the Workflow output type is DBMS, use Mode to determine how the data will be inserted into the target system:
`INSERT INTO tablename (column1, …) VALUES (value1, …)`
`UPDATE tablename SET column1=value1, … WHERE columnName=value, …`
`DELETE FROM tablename WHERE columnName=value, …`
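The way each Mode maps to a SQL statement can be sketched in Python. This is a hypothetical illustration only; Data Studio builds the real statements internally, and the `build_statement` helper below is an assumption:

```python
def build_statement(mode, table, row, key):
    """Sketch of the SQL shape each export mode might generate,
    using ? placeholders for the row's values."""
    cols = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    if mode == "insert":
        return f"INSERT INTO {table} ({cols}) VALUES ({placeholders})"
    if mode == "update":
        sets = ", ".join(f"{c} = ?" for c in row if c != key)
        return f"UPDATE {table} SET {sets} WHERE {key} = ?"
    if mode == "delete":
        return f"DELETE FROM {table} WHERE {key} = ?"
    raise ValueError(f"unknown mode: {mode}")

row = {"id": 1, "name": "Ann"}
print(build_statement("insert", "customers", row, "id"))
# INSERT INTO customers (id, name) VALUES (?, ?)
```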
Grouping allows you to group values from one input into one or more columns and aggregate the remaining columns, if required.
Click Grouping in the step dialog then drag and drop the required columns up, into the Grouped list. Remove columns from this list by simply dragging them down.
Click to apply the changes and Show data in the step dialog to view the results.
You can also configure grouped columns and use aggregate functions on the grouped values when in Show data view: right-click on the grouped column heading and select Add aggregate. In the dialog that appears, specify the name/description for the new column, the source column to be used for aggregation and the required function.
For some aggregation functions, you can choose to only process unique values by enabling the Distinct option.
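Grouping with an aggregate (and the Distinct option) can be sketched like this. The helper below is an illustrative assumption, not Data Studio's implementation:

```python
from collections import defaultdict

def group_and_aggregate(rows, group_col, agg_col, func, distinct=False):
    """Group rows by one column and aggregate another, mirroring the
    Grouping step's Add aggregate option."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[agg_col])
    result = {}
    for key, values in groups.items():
        if distinct:
            values = list(dict.fromkeys(values))  # drop duplicate values
        result[key] = func(values)
    return result

orders = [
    {"region": "North", "amount": 10},
    {"region": "North", "amount": 10},
    {"region": "South", "amount": 5},
]
print(group_and_aggregate(orders, "region", "amount", sum))
# {'North': 20, 'South': 5}
print(group_and_aggregate(orders, "region", "amount", sum, distinct=True))
# {'North': 10, 'South': 5}
```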
The join step allows you to join two inputs into one based on a column from each input.
To specify the columns you want to join with, click Undefined column and select the required one. Do this for the second data source.
The type of join can be configured by clicking on the Venn diagram. There are four types of join: left, right, inner or full.
Use the All matches menus to specify different combinations of join. The top-left menu applies to the first (top) join table and the bottom-right one to the second join table.
In the join dialog, click Show data to view the results of the join. Specify which values from the joined columns to use in any following steps: all the matches, only duplicates or only singletons.
Click to see the suggested joins. Only exact matches from the two sources will be suggested. Click on the suggested join to apply it.
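The four join types can be sketched in Python. This is an illustrative model of the behavior, not Data Studio's implementation:

```python
def join(left, right, key, how="inner"):
    """Sketch of left, right, inner and full joins on a shared key column."""
    right_index = {}
    for row in right:
        right_index.setdefault(row[key], []).append(row)
    matched_keys = set()
    out = []
    for lrow in left:
        matches = right_index.get(lrow[key], [])
        if matches:
            matched_keys.add(lrow[key])
            for rrow in matches:
                out.append({**lrow, **rrow})  # merge matching rows
        elif how in ("left", "full"):
            out.append(dict(lrow))            # keep unmatched left rows
    if how in ("right", "full"):
        for rrow in right:
            if rrow[key] not in matched_keys:
                out.append(dict(rrow))        # keep unmatched right rows
    return out

left = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
right = [{"id": 2, "dept": "Sales"}, {"id": 3, "dept": "IT"}]
print(join(left, right, "id", "inner"))
# [{'id': 2, 'name': 'Bob', 'dept': 'Sales'}]
print(len(join(left, right, "id", "full")))  # 3
```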
The multi view step lets you view your data side by side, either horizontally or vertically.
You can link this step to 2-3 data sources to view and compare the data. By default, the horizontal view will be selected. To change to the vertical view, click Horizontal split.
Click Show view to display the results. For more complex views, you can also link this step to another multi-view step.
Only the top input will be passed on to a subsequent non-multi view step.
This step allows you to profile data at any point in a workflow. We examine each value and identify various attributes (such as type, min/max values, count) and determine its uniqueness and completeness.
Click on Profiled columns to select the column(s) you want to profile. In the Profiled columns list on the left, select the required ones and click . To see the results, click Show data in the step dialog.
Note that results might take some time to load.
You can now export or take a snapshot of your profiled data to track data uniqueness over time.
This will allow you to use the script as a workflow step to transform your data.
Click Show data in the step dialog to view the results. The output can then be used as input to any further step(s).
To use this step, you have to have at least one R script available in the script library: Glossary > Scripts.
This will allow you to use the script as a workflow step to transform your data.
Click Show data in the step dialog to view the results.
A snapshot is a versioned copy of data taken at a certain point in time (and saved internally in Data Studio as a table with any transformations). Each snapshot is automatically assigned a unique name which you can modify.
Once created, a snapshot will appear in the list of Available data sources and can be used in any workflow.
Take snapshot step
This step will create a snapshot (once the workflow is executed). Each time you execute a workflow with this step, a new version of the snapshot will be created.
Use latest snapshot step
This step allows you to use the latest available snapshot. First, choose the workflow it belongs to then select the required snapshot.
Use snapshot range step
This step allows you to use several snapshots at a time (it's the union of all the snapshots in the range). If your data changes over time, you can compare the current version to the previously created snapshots. To choose the required snapshot range, find the workflow it belongs to then select the snapshot range.
Click Show data to see the details of all your snapshots in one table.
Find out how to configure an ODBC connection.
Splicing allows you to combine two inputs into one by presenting the values next to each other.
Data in the columns is not combined or merged - duplicate column headings are renamed with an appended numeric value. The output – combined values from two sources – can then be used as input to any further step(s).
Click Show data in the step dialog to view the results.
The split step allows you to apply a true/false filter to one input and split the resulting data into two outputs based on these values.
To create a filter, click Filter in the step dialog and Create in the left-hand menu. Specify the necessary filters and click Apply to save the changes.
Click Show passing/Failing rows to view the values that passed/failed the filter. The passing and failing results can be individually linked to any further step(s).
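Conceptually, the split step partitions one input into two outputs based on a true/false filter. A minimal Python sketch of this behavior (illustrative only):

```python
def split(rows, predicate):
    """One input, two outputs: rows that pass the filter and rows that fail."""
    passing, failing = [], []
    for row in rows:
        (passing if predicate(row) else failing).append(row)
    return passing, failing

rows = [{"age": 17}, {"age": 30}, {"age": 45}]
adults, minors = split(rows, lambda r: r["age"] >= 18)
print(len(adults), len(minors))  # 2 1
```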
The union step allows you to combine two inputs into one output by presenting the values on top of each other. The source columns with matching headings are combined.
The output – combined values from two sources – can then be used as input to any further step(s).
Click Show data in the step dialog to view the results.
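Stacking two inputs and combining columns with matching headings can be sketched as follows. This is an illustration of the described behavior; how Data Studio fills unmatched columns is an assumption here (shown as blanks):

```python
def union(top, bottom):
    """Sketch of the Union step: stack two inputs, combining columns
    with matching headings and leaving unmatched columns blank."""
    headings = list(dict.fromkeys(
        h for row in top + bottom for h in row))
    return [{h: row.get(h, "") for h in headings} for row in top + bottom]

a = [{"name": "Ann", "city": "Leeds"}]
b = [{"name": "Bob", "email": "b@example.com"}]
for row in union(a, b):
    print(row)
# {'name': 'Ann', 'city': 'Leeds', 'email': ''}
# {'name': 'Bob', 'city': '', 'email': 'b@example.com'}
```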
The validation step allows you to identify data that matches the specified rule(s). The results will either be true or false.
To perform validation, you have to set validation rule(s):
You will be taken to the Expression editor, allowing you to define the required rule(s). Once created, apply the rule(s) using the Edit validation configuration dialog.
Click View or sort rows to preview the data that will be validated.
To view validation results, click to see the required data: passing, failing and ignored rows. Each of these results can be further analyzed by linking them to other workflow steps.
To view information on each row, click Show data. You can right-click on a failing row and select Reasons for failed rules to see even further detail.
Click Results by rule to view the overall summary of validation results.
Use this step to validate emails based on the format or domain address.
Two validation types are available:
You can customize the way emails are validated:
We use a Domain Name System (DNS) lookup to check whether the email’s domain exists. If not specified, the DNS server will default to Google (8.8.8.8) which may be blocked by your firewall. If blocked, use nslookup to find a DNS server that can be used on your network and add it by clicking Add a new DNS server.
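The two checks can be sketched in Python. This is a simplified illustration: the regular expression below is a deliberately loose assumption, and the sketch resolves the domain with an address lookup rather than the DNS query Data Studio actually performs:

```python
import re
import socket

EMAIL_FORMAT = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # simplified pattern

def validate_email(address):
    """Check the format first, then whether the domain resolves via DNS."""
    if not EMAIL_FORMAT.match(address):
        return "invalid format"
    domain = address.rsplit("@", 1)[1]
    try:
        socket.getaddrinfo(domain, None)  # simple resolution check
        return "domain resolves"
    except socket.gaierror:
        return "domain not found"

print(validate_email("not-an-email"))  # invalid format
```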
By default, the validation results are cached with results refreshed every 30 days. To change this, go to Configuration > Step settings > Email step > Email validation cache validity.
This tab lists all the outputs in your workflow, providing a list view which is useful for larger/complex workflows. To quickly find a specific output, right-click on it and select Locate.
Workflows can be shared with other users connected to the same instance of Data Studio. If you share a workflow, other users will be able to access and modify it.
To share a workflow, go to Workflow Designer, select the required one and click Share.
Note that multiple users can edit the same shared workflow simultaneously, and therefore potentially overwrite each other's changes.
All the shared workflows available to you will appear in the Shared workflows list.
To revoke access to a workflow, select it and click Unshare. Note that administrators are able to unshare any workflow.
A workflow can be executed on an ad-hoc basis or configured to be executed in the future (and at defined intervals, if required).
Go to Workflow Designer and click Execute on the workflow you want to execute now or later.
In the Schedule workflow execution dialog specify when and how often you want to execute the workflow and whether to export the output table.
When scheduling the workflow to be executed later, use the End time type to specify when the End action will be applied. This is the action that will be performed after the workflow is executed. The options are:
You can also choose to export the workflow as a file or a script.
Refreshing data sources for workflow execution
When the workflow is re-executed, the data that was most recently read from the data source will be used.
To force all data sources used in a workflow to be automatically refreshed before each execution, the following conditions have to be met:
By default, the refresh setting is disabled for all workflows. To enable it for all new workflows, go to Configuration > All server settings and enable Default workflow refresh on execution.
Define your business terminology, create data tags/scripts and other objects used by workflow steps. You can also view and edit the blocking keys and rulesets for the Find duplicates step.
Note that Consumer users will not be able to see this area. Find out about user roles.
The constants area shows the existing constant libraries. You can view the standard business constants installed with Data Studio or create your own. The business constants supplied are regular expressions that can be used in transformations to verify the format of commonly found data fields. You can also create and define your own business constant libraries to use in transformations or with custom steps.
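For example, the kind of regular expression a business constant library might hold can be applied like this. The constant names and patterns below are hypothetical illustrations, not Data Studio's supplied definitions:

```python
import re

# Hypothetical business constants for common data field formats.
BUSINESS_CONSTANTS = {
    "UK_POSTCODE": re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$"),
    "US_ZIP": re.compile(r"^\d{5}(-\d{4})?$"),
}

def matches_constant(value, name):
    """Verify a value against a named format constant."""
    return bool(BUSINESS_CONSTANTS[name].match(value))

print(matches_constant("SW1A 1AA", "UK_POSTCODE"))  # True
print(matches_constant("1234", "US_ZIP"))           # False
```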
You can create new and manage existing data tags using Glossary.
Experian Match blocking keys
This contains the default blocking keys used by the Find duplicates step. You can't change the default set but administrators can create their own keys to customize the behavior of the step.
Experian Match rulesets
This contains the default rulesets used by the Find duplicates step. You can't change the default ones but administrators can create their own rulesets to customize the behavior of the step.
The scripts section allows you to define scripts that can be used by the Script steps in a workflow. Scripts have to be added to a script library.
To add a script library, click Create a new script library, enter a name for the library and an optional description, and click Okay. Note that script libraries can be deleted only when they are empty.
All scripts have to be added to an existing library. To add a new script, click on the required library then click Create a new script. Scripts require a unique name, the area of use, indication of whether they can output logging to a file, the language type and the script code. The script definition can optionally also have parameters to change the script’s behavior.
Manage users, configure workflow steps and specify product settings.
Note that only administrators will be able to see this area. Find out about user roles.
These allow you to customize Data Studio. You can specify:
Configure certain workflow steps and their behavior.
All users in Data Studio have to have a role assigned to them.
The following roles are available:
Note that these roles can’t be modified.
| Capability | Consumer | Designer and developer | Administrator |
|---|---|---|---|
| Access to Data Explorer | Yes | Yes | Yes |
| Access to Workflow Designer | Yes | Yes | Yes |
| Access to Monitoring | Yes | Yes | Yes |
| Access to Configuration | | | Yes |
| Access to Glossary | | Yes | Yes |
To create a new user:
Note that you can add as many users as you want but the number of active users you can have will depend on your licence. Once your user limit is reached, you won't be able to set users to Enabled.
You can also group users for easier management by assigning them to teams.
Users with the Administrator role will be automatically assigned to the Administrators team. By default, there’s a team called Initial group which you can modify.
This is the method of authentication that will be used. The following options are available:
To manage user details, go to Configuration > Users. Right-click on the required user to edit their details/change password or disable them.
View logs, manage scheduled and running jobs.
A list of all audit events such as user logins, viewed data and workflow operations (create, delete, share).
A list of running, queued and completed jobs (e.g. workflow executions).
A list of detailed system logs. To download logs as a .txt file, click on your username in the top menu and select Download log files. Note that this option is available to administrators only.
A list of temporary processes that may run to support various one-off system operations.
A list of all repository objects (e.g. users, business constants, data tags, regular expressions, data sources).
A list of various checks that the system will run periodically (e.g. checking disk space).
A list of permanent processes that support various system operations (e.g. job scheduling, housekeeping, loading, alert notification).
A list of user and server sessions.
A list of all snapshots. This includes information such as when snapshots were created and how much disk space is currently used. Also, administrators have the option to delete individual snapshots.
A list of the key sub-directories that form the database.
Various statistics relating to the running of the system (e.g. number of CPUs, free memory, network information).
A list of running and completed data operation tasks (e.g. load, profile, find duplicates).
Data Studio allows you to audit (record and report) user and system activities.
The audits are encrypted and stored in \ApertureDataStudio\data\alerts\audits.txt.
Enabling and tuning
Auditing can produce lengthy logs, not all of which may be relevant to you. To help tailor and target your auditing, you can enable auditing for specific events only.
You can view and change audit options in Configuration > Security > Audit types.
Once an audit is enabled, every time the event occurs, an entry is created in the audit log, detailing the cause, user information and when it happened.
Reporting and monitoring
The events generated by active audits are visible in Monitoring > Audits.
The resulting view shows all the events generated by Data Studio with the most recent events at the top. The view can be sorted, filtered, or exported to a .csv file. This view will also update itself periodically and display any events that have happened in the meantime.
When you right-click on a column heading and select Add transformation, you will be taken to the expression editor.
You can try various transformations without affecting data. Your changes will not be saved until you click Apply in the top menu.
The left-hand menu contains two tabs: one for transformation functions and the other for connections.
You can either scroll down to find the required function or use the search box to find it, then drag and drop it into the area on the right.
Example: remove special characters from phone numbers.
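An equivalent of this transformation can be sketched in Python (an illustration of the result, not the expression editor's actual function):

```python
import re

def remove_special_characters(phone):
    """Strip everything except digits from a phone number."""
    return re.sub(r"\D", "", phone)

print(remove_special_characters("+44 (0)20-7946 0958"))  # 4402079460958
```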
The Expression preview in the bottom-left corner allows you to see how the data will be transformed if you save your changes. You can also see how a transformation will behave by using sample/dummy data: click Single at the bottom and enter the values you want to test.
To revert the transformation, right-click on the column header and select Remove transformation.
You can either continue building on this transformation using the Workflow Designer by clicking Save as workflow or save the data as a .csv file by clicking Download as .CSV.