Publication Date



Technical Report: UTEP-CS-09-24


Capturing provenance about artifacts produced by distributed scientific processes is a challenging task. For example, one approach to facilitating the execution of a scientific process in distributed environments is to break the process down into components and to create workflow specifications that orchestrate the execution of these components. However, capturing provenance in such an environment, even with the guidance of orchestration logic, is difficult because the component abstractions may hide important details. In this paper, we show how to use abstract workflows to systematically enhance scientific processes so that they capture provenance at appropriate levels of detail. Abstract workflows do not specify the orchestration logic needed to execute a scientific process; instead, they are intended to document scientific processes as understood by scientists. Hence, abstract workflows can be designed specifically to capture the details of scientific processes that are relevant to the scientist with respect to provenance. In addition, abstract workflows are coupled with a representation of provenance that can accommodate distributed provenance-generating source code. We also show how the approach described in this paper has been used to capture provenance for scientific processes in the Earth science, environmental science, and solar physics domains.