
Aperture Data Studio - Workflow trigger execution

Overview

Workflow trigger execution is the automatic execution of a workflow when a 'watched' file changes. This allows workflows to run unattended whenever a new version of a file is copied or uploaded.

This process is driven by a YAML configuration file which, when uploaded to Data Studio, causes certain source files to be monitored so that the specified workflows are executed when those files change.

Note that only Administrator users can upload YAML files. 

The configuration file (YAML)

The following sample YAML will be used to illustrate the intended behavior:

---
workflows:
  - name: mountains
    sourceTriggers:
      - source: mountains
        location: test yaml
        filenamePattern: mountains.csv
        appendExtension: .tmp.#

The file must start with three dashes (---) followed by the keyword workflows:. Next, indented with two spaces and preceded by a dash, is the keyword name:, which gives the (case-sensitive) workflow name. Lastly, the indented keyword sourceTriggers: starts a new section that defines the parameters for one or more source triggers.

Multiple source triggers are supported; however, the trigger is the logical OR of the data source changes (not AND), meaning that only one data source file has to change in order to trigger the workflow execution (see the My Workflow example below, which defines two source triggers).

The parameters for the sourceTriggers: keyword are:

  • source (required)

    The name of the workflow source node that defines the data source (as shown in the Available data sources tab in the Workflow Designer). 

  • location (required)

    The name of a file data store defined in filedatastores.properties. The location may also be given as a device:\directory\ path (e.g. d:\data\import\5) on which the watched file resides. It must be accessible to the server, i.e. it must be a directory named in the filedatastores.properties file or an import directory. When the YAML file is processed, this location is resolved to an absolute location on disk.

  • filenamePattern (required) 

    Defines the name of the file(s) to be watched. You may use a regular expression, which must match a file used as a data source input (see the sketch after this list). Note that on Linux, file names are case-sensitive.

  • appendExtension (optional)

    An optional parameter indicating that the file has to be renamed prior to initiating the workflow to ensure that it has a unique name (and therefore won’t be overwritten before it's fully processed). If the extension contains a #, the # will be replaced by a number to ensure the uniqueness of the renamed files.
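
For instance, a trigger that watches dated export files might look like this (the source name, path, and file names are illustrative only, not part of the samples in this article):

      - source: Sales Data
        location: c:\data\sales
        filenamePattern: sales_\d{8}\.csv
        appendExtension: .tmp.#

The pattern sales_\d{8}\.csv matches files such as sales_20240101.csv (the dot is escaped here because an unescaped dot matches any single character). When a matching file changes, it would first be renamed to, for example, sales_20240101.csv.tmp.1, and the workflow would then be initiated.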

This example defines multiple workflows with multiple sources; the first time the customer file changes, it will be renamed to customer.csv.tmp.1; the next time it's updated, it will be renamed to customer.csv.tmp.2, and so on:

---
workflows:
  - name: My Workflow
    sourceTriggers:
      - source: Customer Data
        location: c:\data\customer
        filenamePattern: customer.csv
        appendExtension: .tmp.#
      - source: Product Data
        location: c:\data\product
        filenamePattern: product_\d+.csv
        appendExtension: .tmp.#
  - name: mountains
    sourceTriggers:
      - source: mountains
Etc.

A typical filedatastores.properties should look something like this:

Pserver\u0020Demo\u0020Data=d\:/pserver demo data; flatten=false
Saturn\u0020Demo\u0020Data=s\:/data/Training and Demo Data; flatten=false
Test2=d\:/data; flatten=true
test\u0020yaml=c\:/test yaml
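
In a Java properties file, spaces in a key are escaped as \u0020 and the colon after the drive letter as \:. The last entry above therefore defines a data store named test yaml, which is the location referenced in the first YAML sample.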

Note that the same source file may be used to trigger more than one workflow at the same time; the workflows are then executed sequentially.
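
For example, the following sketch (the workflow and source names are illustrative only) uses the same customer file to trigger two workflows, which would run one after the other:

---
workflows:
  - name: Validate Customers
    sourceTriggers:
      - source: Customer Data
        location: c:\data\customer
        filenamePattern: customer.csv
  - name: Enrich Customers
    sourceTriggers:
      - source: Customer Data
        location: c:\data\customer
        filenamePattern: customer.csv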

Uploading the YAML file

Only Administrator users can upload YAML files. 

You upload the YAML file in the same way as any other data file. The only requirement is that the file must have the .yaml extension.

When a YAML file is uploaded, the FileUploadHandler will parse the file and report any parse errors to the user. The contents of the YAML file replace those of any previous upload.

Therefore, to delete a workflow's trigger, remove the workflow name from the YAML file and re-upload it. To add a trigger for a workflow, add it to the YAML file and re-upload it. To modify a trigger, change the details in the YAML file and re-upload it.

All existing triggers can be removed by uploading a YAML file with no workflows:

---
workflows:

Note that it's the contents of the YAML file that are relevant, not the YAML filename. If you upload a completely different YAML file, the previously watched workflows will be deleted.

The workflow entries in the YAML file will be checked for:

  • The name matches a known workflow (case-sensitive).
  • The source matches a known source name.
  • The location matches a data store name or directory that is known to the server and exists on disk.
  • The filenamePattern is a valid regular expression.
  • There are no invalid keywords in the YAML file.


The file is also loaded at server startup, so if the server is shut down and restarted, all previously watched workflows are reloaded just as if the user had re-uploaded the YAML file.

All YAML file uploads and the resulting parse actions are reported back to the user in the UI and in the server’s log file. The uploads and all workflow executions are audited as usual.

The administrator may verify that the correct workflow triggers have been loaded by clicking on the username in the top menu and selecting Show Workflow Triggers.

Uploading the 'watched' file

When uploading a file with the same name as any of the 'watched' files, you will be given the option to either overwrite the file or create a new version of it.

To ensure the defined trigger continues to work, you have to overwrite the existing file. 

A dialog will appear when the job has completed successfully.

Notifications

You can set up notifications to report on the state of the triggered workflow. 

Workflows are executed asynchronously; if the user is currently logged in, they will see the Job Completed dialog once a workflow has finished executing.