Aperture Data Studio user guide

Overview

We strongly recommend using Chrome to access the application.

If you have any issues or questions, try troubleshooting or contact support.

Find out about the main features and benefits of Aperture Data Studio.

Aperture Data Studio allows you to:

  • Get started quickly with an easy installation and an intuitive browser-based interface
  • Save time and money by creating operational data quality workflows
  • Improve transparency with audit reports
  • Increase efficiency of your data migrations
  • Standardise and improve quality of your data with powerful data transformations
  • Encourage collaboration with reusable workflows

Quick start

Once you've installed the application, we recommend that you change your password before uploading your files. Click on your username and select Change password.

To get started, see an overview of the functionality or try out use cases.

Functionality overview

  1. Upload your data using the Data Explorer. You can either:
    • upload your source files: right-click on My files > Upload file or
    • connect to your database using a JDBC driver: Data sources > Click here to create a new data source.

  2. Preview, configure and tag your data.

    Preview the first 1,000 rows and check that your data is in order. If required, make changes: right-click on your file/data source and select Preview and configure.
    You can move, rename and exclude column headings and change character sets/delimiters/quotation marks. Click Apply in the top-right corner when you’re happy with the changes. The file will now appear as 'Configured'.

  3. Load and explore your data.
    Right-click on the file and select View data. The previewed file will now appear as Loaded. To profile the data once the file is loaded, click View Profile/Outliers. It will then appear as Profiled.

    Larger files might take a while to load. A percentage progress bar will be shown over the file name in the explorer view (on the left-hand side).

    You can now understand your data: filter, sort, view the outliers and discover inconsistencies. Ensure the loading has completed before you start manipulating the data.

  4. Create workflows.
    In the Workflow Designer, click Create a new workflow, name it and click Submit. Add your loaded file(s) and open the Workflow steps tab to manipulate your data as required in order to:
    • create a new and improved version of your data,
    • structure and prepare it for migration,
    • validate against user defined rules,
    • find matching/duplicated values,
    • cleanse records and more.

  5. Report/export your data.
    In the Workflow Designer, you can create a .pdf report for each workflow: click Report in the top menu. 
    You can also save the workflow output as a .csv, .txt, .xls, .xml, .dat, or a .json file by clicking Execute in the top menu. To re-use a workflow, save it as a .wfl file by clicking Export in the top menu.

Data Explorer

Data Explorer allows you to manage your data sources, prepare and configure your data.

Sources

This menu lists all your data sources: both raw files and database connections.

My files contains files uploaded by you only; Data sources lists all the available connected/configured database connections.

To upload/connect to your data, do one of the following:

  • drag and drop files into the window or
  • go to My files > Click here to upload a file or
  • click upload in the top menu or
  • connect to your database using a JDBC driver: Data sources > Click here to create a new data source.

The supported file formats are:

  • .csv
  • .txt
  • .xls
  • .xlsx
  • .sas7bdat
  • .psv
  • .dat 

Note that .dat files have to be uploaded together with the associated metadata files such as .cob or .ddl.

You can create JDBC connections to the following databases and services (an example connection URL is shown after this list):

  • Amazon Redshift
  • Apache Hive which supports:
    • Apache Hadoop Hive
    • HortonWorks
    • Cloudera CDH Hive
    • Amazon EMR Hive
    • IBM BigInsights Hive
    • MapR Hive
    • Pivotal HD Hive
  • DB2
  • Greenplum which supports:
    • Pivotal Greenplum
    • Pivotal HAWQ
  • IBM Informix
  • MongoDB
  • MySQL
  • Oracle Database
  • Oracle Service Cloud
  • PostgreSQL
  • Progress OpenEdge
  • Salesforce which supports:
    • Salesforce.com
    • Veeva CRM
    • Force.com Applications
    • Financial Force
  • Spark SQL
  • Microsoft SQL which supports:
    • Microsoft SQL Server
    • Microsoft SQL Azure
  • SAP Sybase

Find out how to add a custom JDBC driver
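
A JDBC connection is typically defined by a connection URL plus credentials. For example, a PostgreSQL JDBC URL generally takes the following form (the host, port and database name below are placeholders):

    jdbc:postgresql://dbserver.example.com:5432/customerdb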

Preview and configure

To ensure your data has been interpreted correctly and is displayed as expected, you can preview it: right-click on your source data and select Preview and configure. A preview of the first 1,000 rows will be shown. 

We strongly recommend that you preview your data before loading it. This ensures that the data we cache is as expected.

You can move, rename and exclude column headings and change character sets/delimiters/quotation marks. 

To configure how the data should be interpreted, use the Data tab.

Depending on the file type, you can specify various settings. For example, for .csv files:

  • Character set: the character set for the data.
  • Language and region: the formatting region (e.g. whether to use UK or US format for dates).
  • End of line delimiter: the sequence of characters denoting the end of the line.
  • Delimiter: the delimiter to be used for separating fields in this file.
  • Quote: the delimiter to be used when loading quoted values from delimited files.
  • Headings on first line: include or exclude column headings in the first row.
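
For instance, a comma-delimited file with double-quoted values and headings on the first line might look like this (sample values only):

    "Name","City","Joined"
    "Jane Doe","Cardiff","01/02/2018"
    "John O'Brien","Leeds","23/11/2019"

Here the delimiter is a comma, the quote character is a double quote, Headings on first line should be enabled, and the dates use the UK day/month/year format, so Language and region should be set accordingly.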

The Headings tab allows you to manipulate the column headings: you can rename, re-order, remove and exclude them. You can also tag data.

Alternatively, modify columns by right-clicking on the column header, selecting the required option and clicking Apply in the top-right corner to save your changes. 

Standardize data

You can standardize your data easily before loading it into Data Studio: right-click on the column header and select Edit.

You can now apply one or more standardisation options: 

    • Auto-parse integers: convert numeric values with no fractional part to integers. We don’t recommend disabling this option if the column values are integer numbers.
    • Auto-parse decimals: convert numeric values with fractional parts to decimals. We don’t recommend disabling this option if the column values are decimal numbers.
    • Auto-parse dates: parse dates based on the language for the file.
    • Auto-parse scientific numbers: convert values in scientific notation (e.g. ‘1E47’) into numbers.
    • Treat blanks as null: we recommend keeping this option enabled so that all blank values in the column are treated as empty/null.
    • Remove leading blanks: we recommend keeping this option enabled to remove leading white space from the column values.
    • Remove trailing blanks: remove the trailing white space from the column values.

      Note that if the Treat blanks as null option is turned off, and both Remove leading blanks and Remove trailing blanks are on, Data Studio will ensure that at least a single space is retained for entirely blank values.

If you want to keep null and blank values distinct, we recommend that you turn off Remove leading blanks and Remove trailing blanks. Otherwise, values that differ only in leading or trailing spaces will be loaded as the same value rather than treated as different values.

By default, all of the standardization options above are enabled (except for Auto-parse scientific numbers). To change the enabled/disabled options for all files:

  1. Go to Configuration > All server settings.
  2. Find Default standardisation rules.
  3. Enable/disable options as required and click Apply.

In preview, you can only move headings, not data. 

To rearrange columns, you can simply drag and drop the column headings. Click Apply in the top-right corner when you’re happy with the changes. The file will now appear as 'Configured'.

To revert all the changes made to your data:

  • click Cancel in the top menu to remove changes made since the last configuration or
  • when not previewing the file, right-click on it and select Reset configuration to completely reset the configuration.

Data tagging

Data tags allow you to enrich source data by specifying what each column contains. Tags are also used in various workflow steps such as Find duplicates and Validate addresses.

Note that if you upload a file with the same name, the tags will be retained.

There are two types of data tags:

  • System defined (cannot be modified, used in the workflow steps)
  • User defined


Tags in Data Studio are hierarchical: a parent tag (e.g. address) can have multiple child tags (e.g. street, city).

While you can't modify system defined tags, you can create a child tag (e.g. PO box) and assign it to the system one (e.g. Address). Once created, it will appear in the User defined list under that system parent tag.

To create new and manage existing tags, go to Glossary > Data tags. To add a new one, click Create new data tag.

The best way to tag data is before it has been loaded, in preview:

  1. Go to Data Explorer.
  2. Right-click on the required file and select Preview and configure.
  3. Open the Headings tab to see all your column headings.
  4. Click Multi select and choose one or more headings of the columns you want to tag (e.g. City, Country).
  5. Right-click and select Tag columns.
  6. In the Tag column dialog, click Edit.
  7. Apply the necessary tag(s) (e.g. Address) and click away for changes to be saved.
    The Selected tab will show all the tags applied to this column.
  8. Click Tag to save changes.


If you tag columns in a workflow step, the tags will be applied to that column in all the subsequent steps for that workflow only. 

View data

View data and transform it as required. 

Changes made when viewing files in Data Explorer are temporary and will not persist. However, the operations you make can be saved as a workflow and the data changes can be saved to a file.

Once the data is loaded you can: 

  • Configure and transform your data.
  • Save as workflow: saves your transformation/configuration as a workflow in 'My workflows' list in the Workflow Designer.
  • Download as .CSV: saves your transformed data as a .csv file in your browser’s default download location.

Dependency/Key analysis

Dependencies are normally used for table normalisation – the process of decomposing a table into smaller tables based on sets of related attributes (columns).

Keys are used to identify records in a table and are therefore crucial when joining tables in cross-domain queries. The analysis of keys depends on the results of dependency analysis.

Once the table is loaded, right-click on it and select Dependency analysis.

This option will be disabled if the table hasn’t already been profiled.

Using the Analyse dependencies dialog, specify the settings for the analysis, such as the maximum number of columns that can be considered as a multi-column key and the minimum correlation percentage to be considered a dependency.

The analysis result will be shown under the table name, following the Configured and Loaded status.

To see the generated results, right-click on the table and select Dependencies or Keys.

 

Workflow Designer

Transform, manipulate, analyze and export your data with re-usable graphical workflows. 

If you have issues or questions with workflows, try troubleshooting.

Available sources

This menu lists all your data sources: both raw files and database connections.

My files contains files available to you only; Data sources lists all the available database connections.

Workflow steps

Transforming your data is easy with Aperture Data Studio. Visual workflows are built using a variety of drag-and-drop steps.

Some steps (e.g. Find duplicates or Validate addresses) will have to be configured and licensed before you can use them. Find out about licensing.

When creating/modifying workflows, the top menu allows you to:

  • create a .pdf report on each workflow by clicking Report,
  • re-use a workflow by saving it as a .wfl file: click Export,
  • save the workflow output as a .csv, .txt, .xls, .xlsx, .xml, .dat or .json file by clicking Execute (single output only),
  • auto-arrange the workflow steps by clicking Arrange,
  • change the cursor to be able to select multiple steps by clicking Select. To switch back, click Move.

By default, the auto-connect for the workflow steps is enabled, allowing two steps with input/output nodes to be automatically linked when moved close to each other. You can disable this by clicking Disable snap; click Enable snap to turn the auto-connect back on.

To import one or more workflows (.wfl), click  in the top menu.

Once added to the workspace, each workflow step can be renamed: double-click on the title of the step and enter the required name.

Analyze trends step

This step allows you to see data quality trends over time. 

It works best when combined with the Use snapshot range step, using snapshots taken from the Validate step (and connected to the final output Show data node). 

Note that connecting from the Show data node will mean that the output columns are automatically tagged for trend analysis. 

In the step dialog, specify the following:

  • Category - the column that determines the version of results (e.g. execution time)
  • Series - the column that determines the data series (e.g. rule name)
  • Metric - the column containing the names of the metrics (e.g. processed rows)
  • Value - the column containing the values for the metrics (e.g. 90)
  • Metric (for display) - which metric (e.g. passed rows) you wish to view across each Category and Series. Options will be populated from the Metric column. This affects both the data and chart outputs.
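
For example, snapshot data from a Validate step might look like this (illustrative values), where Execution time would be the Category, Rule name the Series, Metric the metric names and Value their values:

    Execution time      Rule name      Metric        Value
    01/02/2019 09:00    Valid email    Passed rows   950
    01/02/2019 09:00    Valid email    Failed rows   50
    01/03/2019 09:00    Valid email    Passed rows   980
    01/03/2019 09:00    Valid email    Failed rows   20

Choosing Passed rows as the Metric (for display) would then show how that metric changes across execution times for each rule.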

There are two options for displaying the trend results:

  • Show data - view as a grid with a visual indication (red or green triangle) for each value of the selected metric
  • Show chart - creates a line graph showing changes over time of the selected metric, for each series

You can also use this step to create regular reports, providing insight into the quality of your data.

Branch step

The branch step allows you to duplicate data from one input so that it can be used for further workflow steps.

The rows from the source data will be duplicated (branched) into two outputs. To create more than two duplicates of the data, you can link several branch steps.

Click Show data in the step dialog to view the results.

Chart step

The chart step allows you to create interactive charts of your data at any point in your workflow.

Including this step within the workflow will not affect the data connected to it. Connect this step to one input to create various types of charts. Click Bar chart (2D) in the step dialog to change the type of chart.

To add more columns to the chart (up to 10), click Add data series column. To further customize the chart, click Additional options.

The following options are available:

  • Chart type: the type of chart you wish to draw (e.g. bar, line, pie).
  • Data label column: the column for labeling the x-axis. For example, if your data is split by year, use this setting to label the bars with their respective year.
  • Split series: specify whether to display all the data series on the same chart or split them up.
  • Title: the title of your graph, displayed at the top.
  • Description: the description of your graph, displayed under the title.
  • X-axis label: labels the X-axis (at the bottom).
  • Y-axis label: labels the Y-axis (on the left).
  • Show legend: specify whether to display the legend at the bottom.
  • Show trendline: specify whether to calculate and draw a trendline on the graph - a straight line of best fit for your data.
  • Show X/Y zero line: specify whether to draw a line at the origin for the axis (useful if you have negative data).
  • Logarithmic X/Y axis: specify whether to convert the selected axis to a logarithmic scale.
  • Show horizontal/vertical grid lines: specify whether to draw grid lines for the respective axis.
  • Stacked?: specify whether to stack your data series on top of each other.
  • Show section labels (pie chart only): specify whether to show or hide the labels on segments.

Click Show chart in the step dialog to view your data as an interactive chart. Click View data to also show the data alongside the chart.

The chart will be based on the top 1,000 rows.

Export step

The export step allows you to export a workflow's output data.

Each workflow can contain multiple export steps, so you can export data at any point. Each export step can be configured individually.

Click Settings in the step dialog to specify the export file type (.csv, .txt, .xls, .xlsx, .xml, .dat or .json) and location (the server's export directory or a data source).

When executing a workflow with multiple outputs, the export settings defined in each Export step will be used.


Exporting to a DBMS

When the Workflow output type is DBMS, use Mode to determine how the data will be inserted into the target system:

  • Insert - rows are inserted into the table. The equivalent SQL statement is:
    INSERT INTO tablename (column1, …) VALUES (value1, …)
  • Update - rows are updated. The mapping dialog will show which source columns are mapped to which target columns and which ones have been specified as keys. You can change the mapping and keys at this stage. The keys referred to are those columns that will appear in the equivalent SQL statement's WHERE clause:
    UPDATE tablename SET column1 = value1, … WHERE columnName = value AND …
  • Delete - rows are deleted. The keys are the column names used in the WHERE clause:
    DELETE FROM tablename WHERE columnName = value AND …
  • Insert or update (if insert fails) - if a row insert fails (there is a key collision), then the statement will be retried as an update.
  • Update or insert (if update fails) - if the update fails (no record found with the given key), then the statement will be retried as an insert.
  • Create table - selecting this option will automatically create a table in the database. You will have to specify a name for the new table which will be validated against the data source. 
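
To make the retry behaviour concrete, a hypothetical Insert or update of a single row (the table, columns and values here are illustrative, not generated by Data Studio) would be equivalent to:

    INSERT INTO customers (customer_id, name) VALUES (42, 'Jane Doe');
    -- if the insert fails with a key collision on customer_id,
    -- the same row is retried as an update:
    UPDATE customers SET name = 'Jane Doe' WHERE customer_id = 42;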

Group step

Grouping allows you to group values from one input into one or more columns and aggregate the remaining columns, if required.

Click Grouping in the step dialog then drag and drop the required columns up, into the Grouped list. Remove columns from this list by simply dragging them down.

Click  to apply the changes and Show data in the step dialog to view the results.

You can also configure grouped columns and use aggregate functions on the grouped values when in Show data view: right-click on the grouped column heading and select Add aggregate. In the dialog that appears, specify the name/description for the new column, the source column to be used for aggregation and the required function.

For some aggregation functions, you can choose to only process unique values by enabling the Distinct option.
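
Conceptually, a grouped column with an aggregate behaves like a SQL GROUP BY. For example (hypothetical table and columns), counting unique customers per country with the Distinct option enabled corresponds to:

    SELECT country, COUNT(DISTINCT customer_id) AS unique_customers
    FROM customers
    GROUP BY country;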

Join step

The join step allows you to join two inputs into one based on a column from each input.

To specify the columns you want to join with, click Undefined column and select the required one. Do this for the second data source.

The type of join can be configured by clicking on the Venn diagram. There are four types of join: left, right, inner or full.

Use the All matches menus to specify different combinations of join. The top left menu is for the top join table and the bottom right one for the second join table.

  • All matches - performs a standard join
  • Singletons only - returns the first match of a one-to-many match and any matches that occur only once (a normal relational join with duplicates removed)
  • Duplicates only - returns all the matches that are one-to-many only (a normal relational join excluding rows that match only once)

In the join dialog, click Show data to view the results of the join. Specify which values from the joined columns to use in any following steps: all the matches, only duplicates or only singletons.

Click  to see the suggested joins. Only exact matches from the two sources will be suggested. Click on the suggested join to apply it. 
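
If you are familiar with SQL, the four join types correspond to LEFT, RIGHT, INNER and FULL OUTER joins. For example (hypothetical tables and columns), a left join keeps every row from the first input and attaches matching rows from the second:

    SELECT c.customer_id, c.name, o.order_id
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.customer_id;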

Multi view step

The multi view step allows you to view your data side by side (horizontally or vertically).

You can link this step to 2-3 data sources to view and compare the data. By default, the horizontal view will be selected. To change to the vertical view, click Horizontal split.

Click Show view to display the results. For more complex views, you can also link this step to another multi-view step.

Only the top input will be passed on to a subsequent non-multi view step.

Profile step

This step allows you to profile data at any point in a workflow. Each value is examined to identify various attributes (such as type, min/max values and count) and to determine uniqueness and completeness.

Click on Profiled columns to select the column(s) you want to profile. In the Profiled columns list on the left, select the required ones and click . To see the results, click Show data in the step dialog. 

Note that results might take some time to load.

You can now export or take a snapshot of your profiled data to track data uniqueness over time. 

Script (JavaScript) step

To use this step, you have to have at least one JavaScript script available in the script library: Glossary > Scripts.

This will allow you to use the script as a workflow step to transform your data.

Click Show data in the step dialog to view the results. The output can then be used as input to any further step(s).

Script (R) step

To use this step, you have to have at least one R script available in the script library: Glossary > Scripts.

This will allow you to use the script as a workflow step to transform your data.

Click Show data in the step dialog to view the results. 

Snapshot steps

A snapshot is a versioned copy of data taken at a certain point in time (and saved internally in Data Studio as a table with any transformations). Each snapshot is automatically assigned a unique name which you can modify.

Once created, a snapshot will appear in the list of Available data sources and can be used in any workflow. 

Take snapshot step
This step will create a snapshot (once the workflow is executed). Each time you execute a workflow with this step, a new version of the snapshot will be created.

Use latest snapshot step
This step allows you to use the latest available snapshot. First, choose the workflow it belongs to then select the required snapshot.

Use snapshot range step
This step allows you to use several snapshots at a time (it's the union of all the snapshots in the range). If your data changes over time, you can compare the current version to the previously created snapshots. To choose the required snapshot range, find the workflow it belongs to then select the snapshot range.

Click Show data to see the details of all your snapshots in one table.

Find out how to configure an ODBC connection.

Splice step

Splicing allows you to combine two inputs into one by presenting the values next to each other.

Data in the columns is not combined or merged - duplicated column headings are renamed with an appended numeric value. The output – combined values from two sources – can then be used as input to any further step(s).

Click Show data in the step dialog to view the results.

Split step

The split step allows you to apply a true/false filter to one input and split the resulting data into two outputs based on these values.

To create a filter, click Filter in the step dialog and Create in the left-hand menu. Specify the necessary filters and click Apply to save the changes.

Click Show passing/Failing rows to view the values that passed/failed the filter. The passing and failing results can be individually linked to any further step(s). 
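
In SQL terms (hypothetical table and filter), the split is roughly equivalent to evaluating a single condition and routing each row to one of two outputs:

    -- rows where the filter evaluates to true go to the passing output
    SELECT * FROM customers WHERE balance > 0;
    -- all remaining rows go to the failing output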

Transform step

This step allows you to transform data in numerous ways. See Expression editor for more details.

Union step

The union step allows you to combine two inputs into one output by presenting the values on top of each other. The source columns with matching headings are combined.

The output – combined values from two sources – can then be used as input to any further step(s).

Click Show data in the step dialog to view the results.
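
This is similar in spirit to a SQL UNION ALL, except that Data Studio matches columns by heading rather than by position. For example (hypothetical tables):

    SELECT id, name, city FROM customers_2018
    UNION ALL
    SELECT id, name, city FROM customers_2019;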

Validate step

The validation step allows you to identify data that matches the specified rule(s). The results will either be true or false.

To perform validation, you have to set validation rule(s):

  • select from the list of suggested rules (picked up from the input columns) or
  • click Configure rules then add a new rule in the step dialog.

You will be taken to the Expression editor, allowing you to define the required rule(s). Once created, apply the rule(s) using the Edit validation configuration dialog.

Click View or sort rows to preview the data that will be validated.

To view validation results, click to see the required data: passing, failing and ignored rows. Each of these results can be further analysed by linking them to other workflow steps.

To view information on each row, click Show data. You can right-click on a failing row and select Reasons for failed rules to see even further detail.

Click Results by rule to view the overall summary of validation results. 

Validate emails step

Use this step to validate emails based on the format or domain address.

Two validation types are available:

  • Format check - whether the email is in a valid format (a syntax check against the RegEx values as defined in the Glossary)
  • Domain level - whether the email domain is valid and exists (a DNS server check to see if the domain exists and has a mail server)
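
For illustration only (this is not the expression shipped in the Glossary), a very simple format check could use a regular expression such as:

    ^[^@\s]+@[^@\s]+\.[^@\s]+$

which accepts values of the form name@domain.tld and rejects values containing spaces or additional @ signs.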


You can customize the way emails are validated:

  • specify your company's DNS servers: go to Configuration > Step settings > Validate emails > DNS servers.

    We use a Domain Name System (DNS) lookup to check whether the email's domain exists. If not specified, the DNS server will default to Google (8.8.8.8), which may be blocked by your firewall. If blocked, use nslookup to find a DNS server that can be used on your network (see the example command after this list) and add it by clicking Add a new DNS server.

  • define email address format: go to Glossary > Constants > Email Expressions > Email Address
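
For example, to check whether a particular DNS server (a hypothetical internal server at 10.0.0.53 here) can resolve mail records for a domain, you could run:

    nslookup -type=mx example.com 10.0.0.53

If the command returns MX records, that server can be added as a DNS server in Data Studio.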


By default, the validation results are cached with results refreshed every 30 days. To change this, go to Configuration > Step settings > Email step > Email validation cache validity.

Workflow outputs

This tab lists all the outputs in your workflow, providing a list view which is useful for larger/complex workflows. To quickly find a specific output, right-click on it and select Locate.

Share workflows

Workflows can be shared with other users connected to the same instance of Data Studio. If you share a workflow, other users will be able to access and modify it.

To share a workflow, go to Workflow Designer, select the required one and click Share.

Note that multiple users can edit the same shared workflow simultaneously, and therefore potentially overwrite each other's changes.

All the shared workflows available to you will appear in the Shared workflows list.

To revoke access to a workflow, select it and click Unshare. Note that administrators are able to unshare any workflow.

Execute workflows

A workflow can be executed on an ad-hoc basis or configured to be executed in the future (and at defined intervals, if required).

Go to Workflow Designer and click Execute on the workflow you want to execute now or later.

In the Schedule workflow execution dialog specify when and how often you want to execute the workflow and whether to export the output table.  

When scheduling the workflow to be executed later, use the End time type to specify when the End action will be applied. This is the action that will be performed after the workflow is executed. The options are:

  • None – no action performed after execution
  • Kill – if the end time has been specified and the execution hasn’t yet finished, workflow execution will be stopped
  • Reschedule – when the process completes or the end time has been reached, the execution will be rescheduled to run again according to the specified time intervals

You can also choose to export the workflow as a file or a script.

 

Refreshing data sources for workflow execution

When the workflow is re-executed, the data that was most recently read from the data source will be used.

To force all data sources used in a workflow to be automatically refreshed before each execution, the following conditions have to be met:

  • The refresh setting is enabled: select the required workflow and click Edit details. Enable Refresh on execution and click Apply.
  • If the data source is a file, it has to be modified more recently than the last time the data was read.
  • No other workflow that would cause the data source to be refreshed is being executed. This prevents two workflows attempting to refresh the same data source concurrently.

 

By default, the refresh setting is disabled for all workflows. To enable it for all new workflows, go to Configuration > All server settings and enable Default workflow refresh on execution.

Glossary

Define your business terminology, create data tags/scripts and other objects used by workflow steps. You can also view and edit the blocking keys and rulesets for the Find duplicates step.

Constants
The constants area shows the existing constant libraries. You can view the standard business constants installed with Data Studio or create your own. The business constants supplied are regular expressions that can be used in transformations to verify the format of commonly found data fields. You can also create and define your own business constant libraries to use in transformations or with custom steps.

Data tags
You can create new and manage existing data tags using Glossary.

Experian Match blocking keys
This contains the default blocking keys used by the Find duplicates step. You can't change the default set but administrators can create their own keys to customize the behavior of the step.

Experian Match rulesets
This contains the default rulesets used by the Find duplicates step. You can't change the default ones but administrators can create their own rulesets to customize the behavior of the step. 

Scripts
The scripts section allows you to define scripts that can be used by the Script steps in a workflow. Scripts have to be added to a script library.

To add a script library, click Create a new script library, enter a name of the library, a description (optional) and click Okay. Note that script libraries can be deleted only when they are empty.

All scripts have to be added to an existing library. To add a new script, click on the required library, then click Create a new script. Scripts require a unique name, the area of use, an indication of whether they can output logging to a file, the language type and the script code. The script definition can optionally also have parameters to change the script's behavior.

Configuration

Manage users, configure workflow steps and specify product settings.  

Note that only administrators will be able to see this area. Find out about user roles.

System settings

These allow you to customize Data Studio. You can specify:

  • All server settings
  • Communication
  • Data handling
  • Loading
  • Security
  • Storage
  • System

Find out how to change the database location or add an SSL certificate.

Step settings

Configure certain workflow steps and their behavior.

  • Enable steps
  • Find duplicates
  • Validate addresses
  • Validate emails

Manage users

All users in Data Studio have to have a role assigned to them.

The following roles are available:

  • Consumer
  • Designer and developer
  • Administrator

Note that these roles can’t be modified.

Capability            Consumer    Designer and developer    Administrator
Create workflow                   Yes                       Yes
Delete workflow                   Yes                       Yes
Edit workflow                     Yes                       Yes
Execute workflow      Yes         Yes                       Yes
Manage Security                                             Yes
Update licenses                                             Yes

To create a new user:

  1. Go to Configuration and click Users.
  2. Click Create a new user.
  3. You have to enter the following:
    - Username (can contain alphanumeric characters and underscores only)
    - Display name (no restrictions)
    - Role (choose from one of the three roles)
    - Enabled (specify whether this user is active, i.e. able to use Data Studio)

Note that you can add as many users as you want but the number of active users you can have will depend on your licence. Once your user limit is reached, you won't be able to set users to Enabled.

You can also group users for easier management by assigning them to teams.

Users with the Administrator role will be automatically assigned to the Administrators team. By default, there’s a team called Initial group which you can modify.

Authentication mechanism
This is the method of authentication that will be used. The following options are available:

  • Aperture Data Studio internal – use this for users requiring access to Data Studio only, without any additional authentication.
  • LDAP and MS Active Directory (AD) server – use this to allow users to log in with their LAN IDs (fill in the LDAP username field). If this option is not available, go to Configuration > All server settings and enable LDAP properties. Find out how to configure LDAP.
  • Kerberos server – use the Kerberos username for the login.

To manage user details, go to Configuration > Users. Right-click on the required user to edit their details/change password or disable them.

Monitoring

View logs, manage scheduled and running jobs.

Audits 
A list of all audit events such as user logins, viewed data and workflow operations (create, delete, share). 

Jobs
A list of running, queued and completed jobs (e.g. workflow executions).

Log messages
A list of detailed system logs. 

Processes 
A list of temporary processes that may run to support various one-off system operations.

Repository objects
A list of all repository objects (e.g. users, business constants, data tags, regular expressions, data sources).

Restart log 
A list of various checks that the system will run periodically (e.g. checking disk space).

Services
A list of permanent processes that support various system operations (e.g. job scheduling, housekeeping, loading, alert notification).

Sessions
A list of user and server sessions.

Snapshots
A list of all snapshots. This includes information such as when snapshots were created and how much disk space is currently used. Also, administrators have the option to delete individual snapshots.

Storage
A list of the key sub-directories that form the database.

System
Various statistics relating to the running of the system (e.g. number of CPUs, free memory, network information).

Tasks
A list of running and completed data operation tasks (e.g. load, profile, find duplicates).

Auditing

Data Studio allows you to audit (record and report) user and system activities.

The audits are encrypted and stored in \ApertureDataStudio\data\alerts\audits.txt.

Enabling and tuning
Auditing can produce lengthy logs, not all of which may be relevant to your requirements. To help tailor and target your auditing, you can enable auditing for specific events.

You can view and change audit options in Configuration > Security > Audit types.

Once an audit is enabled, every time the event occurs, an entry is created in the audit log, detailing the cause, user information and when it happened.

Audit events

  • Create a workflow: tracks when a workflow is created.
  • Data exported: tracks when data is exported, either via a workflow step or any other method.
  • Data viewed: tracks all attempts to view data in a drilldown.
  • Delete workflow: tracks when a workflow is deleted.
  • Edit workflow design: tracks when a workflow’s design is changed.
  • Edit workflow details: tracks when a workflow’s name or description is changed.
  • Execute Workflow: tracks when a workflow is executed.
  • Repository change: tracks when a repository object is created, edited or deleted. This includes tracking changes to users, datastores, entities, glossary objects and more.
  • Script execution: tracks when a script is executed.
  • User access: tracks user login, logout and login failure events.
  • Workflow shared: tracks when a workflow is shared or unshared.

 

Reporting and monitoring
The events generated by active audits are visible in Monitoring > Audits.

The resulting view shows all the events generated by Data Studio with the most recent events at the top. The view can be sorted, filtered, or exported to a .csv file. This view will also update itself periodically and display any events that have happened in the meantime.

Expression editor

When you right-click on a column heading and select Add transformation, you will be taken to the expression editor.

You can try various transformations without affecting data. Your changes will not be saved until you click Apply in the top menu.

The left-hand menu contains two tabs: one for transformation functions and the other for connections.

You can either scroll down to find the required function or use the search box to find it, then drag and drop it into the area on the right.

Example: remove special characters from phone numbers.

  1. Go to Data Explorer > Sample data source.
  2. Drag and drop the Customer V1 file into the workspace.
  3. Right-click on the Telephone column heading and select Add transformation.
  4. In the search field on the left, type in Remove noise and click on the function when it’s shown. It will appear on the right-hand side (the workspace).
  5. Tick the following boxes: Remove vowels and Remove whitespace.
  6. Click Apply in the top-right corner. You will be taken back to the file view. The  icon will appear next to the Telephone header indicating that values in this column have been transformed.

The Expression preview in the bottom-left corner allows you to see how the data will be transformed if you save your changes. You can also see how a transformation will behave by using sample/dummy data: click Single at the bottom and enter the values you want to test.

To revert the transformation, right-click on the column header and select Remove transformation.

You can either continue building on this transformation using the Workflow Designer by clicking Save as workflow or save the data as a .csv file by clicking Download as .CSV.