Microsoft Research Trident Workbench

Category Cross-Omics>Workflow Knowledge Bases/Systems/Tools

Abstract Trident is a scientific workflow workbench that is built on top of a commercial workflow system to leverage existing functionality.

Trident was developed in collaboration with the scientific community for use in a number of ongoing e-science projects that make use of scientific workflows.

Trident is implemented on top of ‘Windows Workflow’ (WF), a ‘workflow enactment engine’ included at no additional cost in the Windows operating system.

The Windows WF extensible development model enables the creation of domain-specific activities, which can then be used to compose workflows that are useful to, and understandable by, domain scientists.
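As a rough illustration of composing a workflow from domain-specific activities, the sketch below uses Python rather than the WF API. The `Activity` and `SequentialWorkflow` classes and the two sample activities are hypothetical names invented for this example; they are not part of Trident or WF.

```python
from typing import Callable, Dict, List


class Activity:
    """Hypothetical stand-in for a domain-specific activity: a named step
    that reads from and writes to a shared context dictionary."""

    def __init__(self, name: str, run: Callable[[Dict], None]):
        self.name = name
        self.run = run


class SequentialWorkflow:
    """Runs activities in order, passing one shared context between them."""

    def __init__(self, activities: List[Activity]):
        self.activities = activities

    def execute(self, context: Dict) -> Dict:
        for activity in self.activities:
            activity.run(context)
        return context


# Illustrative "domain" activities, invented for this example.
def load_readings(ctx: Dict) -> None:
    ctx["readings"] = [12.0, 15.5, 11.5]


def compute_mean(ctx: Dict) -> None:
    ctx["mean"] = sum(ctx["readings"]) / len(ctx["readings"])


workflow = SequentialWorkflow([
    Activity("LoadReadings", load_readings),
    Activity("ComputeMean", compute_mean),
])
result = workflow.execute({})
print(result["mean"])
```

The point of the pattern is that a scientist only picks and orders named activities; the code inside each activity can be written once by a developer and reused across workflows.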

The key elements of the Trident architecture include a ‘visual composer’ and library that enable scientists to visually author a workflow using a catalog of existing activities and complete workflows.

The Trident Registry serves as a catalog of known data sets, services, workflows, activities, and compute resources, and maintains state for all active workflows.

An execution engine supports launching workflows remotely and according to a schedule.

Administrative tools allow users to register and manage computational resources, publish workflows for external use, and track all workflows currently running or recently completed.

Users can also schedule and queue workflow execution based on time, resource availability, etc.

A set of community tools includes a web service that enables users to launch workflows from any web browser, and a repository that facilitates publishing and sharing workflows and workflow results with other scientists. These tools integrate with MyExperiment (see G6G Abstract Number 20513).

Trident Runtime services –

WF provides several runtime services which can be used as required by attaching the service implementation to the workflow runtime.

Two (2) of the most useful for the manufacturer's implementation of Trident are:

1) Tracking service - This service enables event-based tracking of a running workflow through the use of extensible tracking profiles.

2) Persistence service - This service allows the workflow executor to serialize and restore the entire working state of an in-progress workflow, allowing the executor to pause and resume workflows and archive intermediate state to any capable storage device.

Additional services can be constructed to run alongside these basic services.
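The idea of attaching tracking and persistence services to a workflow runtime can be sketched as follows. This is a minimal Python illustration, not the WF runtime API: `WorkflowRuntime`, `TrackingService`, `PersistenceService`, and the `max_steps` pause mechanism are all assumptions made for the example, and `pickle` stands in for whatever serialization the real persistence service uses.

```python
import pickle
from typing import Any, Callable, Dict, List, Optional


class TrackingService:
    """Records events emitted while a workflow runs."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def on_event(self, event: str) -> None:
        self.events.append(event)


class PersistenceService:
    """Serializes workflow state to bytes and restores it later."""

    def save(self, state: Dict[str, Any]) -> bytes:
        return pickle.dumps(state)

    def load(self, blob: bytes) -> Dict[str, Any]:
        return pickle.loads(blob)


class WorkflowRuntime:
    """Hypothetical runtime; services are attached only as needed."""

    def __init__(self) -> None:
        self.services: List[Any] = []

    def add_service(self, service: Any) -> None:
        self.services.append(service)

    def emit(self, event: str) -> None:
        # Deliver the event to every attached service that tracks events.
        for service in self.services:
            if hasattr(service, "on_event"):
                service.on_event(event)

    def run(self, steps: List[Callable[[Dict], None]], state: Dict[str, Any],
            max_steps: Optional[int] = None) -> Dict[str, Any]:
        # Continue from the step index stored in the state, so a restored
        # state resumes where the earlier run paused.
        executed = 0
        while state["next_step"] < len(steps):
            if max_steps is not None and executed >= max_steps:
                self.emit("paused")
                return state
            step = steps[state["next_step"]]
            step(state["data"])
            self.emit(f"completed:{step.__name__}")
            state["next_step"] += 1
            executed += 1
        self.emit("finished")
        return state


def fetch(data: Dict) -> None:
    data["raw"] = [1, 2, 3]


def total(data: Dict) -> None:
    data["sum"] = sum(data["raw"])


runtime = WorkflowRuntime()
tracker = TrackingService()
store = PersistenceService()
runtime.add_service(tracker)
runtime.add_service(store)

paused = runtime.run([fetch, total], {"next_step": 0, "data": {}}, max_steps=1)
blob = store.save(paused)        # archive the intermediate state
restored = store.load(blob)      # later, possibly in another process
final = runtime.run([fetch, total], restored)
print(tracker.events)
```

Because the saved state carries both the data and the position in the workflow, the second `run` call picks up at the `total` step instead of repeating `fetch`.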

The manufacturer's current implementation of Trident includes a service for automatic ‘provenance capture’; a monitoring service that listens for events pertaining to ‘machine utilization’, available resources, etc.; a service that schedules workflows on High-Performance Computing (HPC) clusters; and a fault-tolerance and recovery service for workflows.

Features of the Trident Registry –

The Trident Registry consists of a series of modules and abstractions to provide flexibility to the scientists as to where they actually store their data.

1) Trident allows the user to dynamically select where to store data (results) output from a workflow, such as SQL Server, Amazon S3, SSDS, etc.

2) A data provider abstraction allows the actual data contents to reside with external entities and be referenced from the Registry, so scientists can host their data anywhere (external data stores, community databases or servers, etc.).

3) Strong typing of objects referenced and stored by the Registry reduces runtime issues common to software development, leading to a more robust system.

4) Programming APIs that allow workflows being executed to record experiment results in the Registry in a consistent and organized way. Scientists or services can later navigate through this data to implement new functionality.
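The Registry features listed above (pluggable storage, a data provider abstraction, typed references, and an API for recording results) can be sketched in Python as follows. Every name here (`DataProvider`, `InMemoryProvider`, `DataReference`, `Registry`) is hypothetical and chosen for illustration; the in-memory provider merely stands in for a real back end such as a database or a cloud store.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict


class DataProvider(ABC):
    """Abstract storage back end; the registry never stores contents itself."""

    @abstractmethod
    def write(self, key: str, content: bytes) -> None: ...

    @abstractmethod
    def read(self, key: str) -> bytes: ...


class InMemoryProvider(DataProvider):
    """Stand-in for an external store; keeps blobs in a dictionary."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def write(self, key: str, content: bytes) -> None:
        self._blobs[key] = content

    def read(self, key: str) -> bytes:
        return self._blobs[key]


@dataclass(frozen=True)
class DataReference:
    """Typed registry record that points at data held by a provider."""
    provider: str
    key: str
    content_type: str


class Registry:
    def __init__(self) -> None:
        self._providers: Dict[str, DataProvider] = {}
        self._records: Dict[str, DataReference] = {}

    def register_provider(self, name: str, provider: DataProvider) -> None:
        # Reject objects that do not implement the provider interface.
        if not isinstance(provider, DataProvider):
            raise TypeError("provider must implement DataProvider")
        self._providers[name] = provider

    def record_result(self, name: str, provider: str, content: bytes,
                      content_type: str) -> DataReference:
        # Contents go to the chosen provider; only a reference is kept here.
        self._providers[provider].write(name, content)
        ref = DataReference(provider, name, content_type)
        self._records[name] = ref
        return ref

    def fetch(self, name: str) -> bytes:
        ref = self._records[name]
        return self._providers[ref.provider].read(ref.key)


registry = Registry()
registry.register_provider("local", InMemoryProvider())
ref = registry.record_result("run-1/output.csv", "local", b"a,b\n1,2\n", "text/csv")
print(ref, registry.fetch("run-1/output.csv"))
```

Choosing the provider by name at `record_result` time mirrors the idea that the storage location is selected per workflow output rather than fixed in the registry.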

Scheduling Workflows in Trident -- Trident provides:

1) The ability to schedule workflows to run on any machine, or collection of machines, from a single, easy-to-use console (local or web-based).

2) Ability to schedule entire workflows and individual activities on an HPC cluster.

3) Scheduling that takes workflow compute and data requirements into consideration, and is aware of resource utilization (CPU, databases, disks, I/O, memory) within a cluster to optimize scheduling and support job priorities.

4) Ability to pause, resume, stop and restart specific workflows and entire queues on specific machines.

5) Ability to recover from failures and take corrective actions when workflow execution does not go as expected.
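Resource-aware, priority-based scheduling of the kind described in item 3 can be sketched with a simple greedy policy. This is only an assumed illustration, not Trident's scheduler: the `Machine` and `Job` types, the machine and job names, and the "highest priority first, most free CPUs wins" rule are all invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Machine:
    name: str
    free_cpus: int
    free_memory_gb: int


@dataclass
class Job:
    name: str
    priority: int      # larger values are scheduled first
    cpus: int
    memory_gb: int


def schedule(jobs: List[Job], machines: List[Machine]) -> Dict[str, Optional[str]]:
    """Greedy placement: visit jobs by descending priority and put each on
    the fitting machine with the most free CPUs. A job that fits nowhere
    maps to None, meaning it stays queued."""
    placement: Dict[str, Optional[str]] = {}
    for job in sorted(jobs, key=lambda j: -j.priority):
        candidates = [
            m for m in machines
            if m.free_cpus >= job.cpus and m.free_memory_gb >= job.memory_gb
        ]
        if not candidates:
            placement[job.name] = None
            continue
        target = max(candidates, key=lambda m: m.free_cpus)
        # Reserve the resources so later jobs see the updated utilization.
        target.free_cpus -= job.cpus
        target.free_memory_gb -= job.memory_gb
        placement[job.name] = target.name
    return placement


machines = [Machine("node-a", 8, 32), Machine("node-b", 4, 64)]
jobs = [
    Job("align", priority=1, cpus=4, memory_gb=48),
    Job("assemble", priority=5, cpus=6, memory_gb=16),
    Job("report", priority=3, cpus=4, memory_gb=8),
]
placement = schedule(jobs, machines)
print(placement)
```

In this run the low-priority `align` job is left queued because the higher-priority jobs have already consumed the CPUs it needs; a real scheduler would retry it as resources are released.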

Provenance and Monitoring --

Trident adds a publication/subscription mechanism called the Blackboard that utilizes custom and built-in WF tracking services to provide extensible workflow monitoring and provenance support.

This model allows for both evolutionary and runtime provenance and enables:

1) Customizable logging for analysis and recovery.

2) Reporting and visualizations of intermediate data products from a running workflow.

3) Provenance record capture either locally or in the cloud.

4) Fault tolerance messaging and repair.

5) Workflow execution monitoring with resource usage analysis and intelligent completion estimates.
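A publication/subscription mechanism like the Blackboard can be sketched in a few lines of Python. The class below, its topic names, and the message fields are assumptions made for illustration; the source does not describe the Blackboard's actual interface.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class Blackboard:
    """Minimal publish/subscribe hub keyed by topic name."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Deliver to every handler registered for this topic; topics with
        # no subscribers are silently ignored.
        for handler in self._subscribers.get(topic, []):
            handler(message)


provenance_log: List[dict] = []   # stands in for a provenance store
alerts: List[dict] = []           # stands in for a fault-handling service

board = Blackboard()
board.subscribe("activity.completed", provenance_log.append)
board.subscribe("activity.failed", provenance_log.append)
board.subscribe("activity.failed", alerts.append)

board.publish("activity.completed",
              {"activity": "LoadReadings", "output": "readings.csv"})
board.publish("activity.failed",
              {"activity": "ComputeMean", "error": "empty input"})
board.publish("activity.started", {"activity": "Unwatched"})  # no subscribers

print(len(provenance_log), len(alerts))
```

Because publishers and subscribers only share topic names, a new consumer (for example a visualization of intermediate results) can be added without changing the workflow that emits the events.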

Web Services and Portal --

In addition to client application tools that facilitate scientific workflows, Trident includes a library of web services that provides access to its key features, including access to repositories of workflows, the ability to launch and monitor workflows remotely, and integration with repositories and scientific networking sites outside of Trident.

While workflow execution must still be done in a .NET-capable environment, these web services allow access to the features of Trident from any platform connected to the internet.

Trident includes a web portal written in Silverlight (an additional Microsoft product…) that allows scientists to launch and manage workflows from any internet location.

The portal works with a variety of browsers running on Windows, Mac OS, or Linux.

System Requirements

Web-based. The portal works with a variety of browsers running on Windows, Mac OS, or Linux.

Manufacturer

Manufacturer Web Site Microsoft Research Trident Workbench

Price Contact manufacturer.

G6G Abstract Number 20515

G6G Manufacturer Number 101785