Repository: Freie Universität Berlin, Math Department

CuttleFlow: Infrastructure-Specific Workflow Adaption for Improved Reusability

Mecquenem, Ninon De and Bosse, Simon and Bountris, Vasilis and Mohammadi, Somayeh and Reinert, Knut and Leser, Ulf (2024) CuttleFlow: Infrastructure-Specific Workflow Adaption for Improved Reusability. In: 2024 IEEE 20th International Conference on e-Science (e-Science), 16-20 September 2024, Osaka, Japan.

Full text not available from this repository.

Official URL: https://doi.org/10.1109/e-Science62913.2024.106787...

Abstract

Scientific workflows have gained popularity for large-scale data analysis due to their potential to improve the reproducibility, scalability and documentation of complex multistep scientific analysis pipelines. However, their reusability is currently limited in practice, as a workflow is typically developed for a specific infrastructure. This is reflected in the choice of tools (e.g. lower or higher memory requirements), their configuration (e.g. number of threads) and the workflow topology (e.g. data parallel scatter/gather). Re-running such a workflow requires access to the same, or at least a highly similar, computing environment, effectively reducing its use by other groups. To address this challenge, we present CuttleFlow, a novel method for adapting and rewriting scientific workflows given a description of an infrastructure and its inputs. CuttleFlow starts from an abstract workflow description and compiles it into an infrastructure-specific logical workflow using three types of rewriting operations, namely tool replacement, tool reconfiguration, and data scattering/gathering for task parallelization. We implement a prototype based on Nextflow and evaluate it for two important bioinformatics data analysis problems, namely RNAseq and metagenomics, on a distributed infrastructure. We demonstrate the large impact that the rewriting of CuttleFlow can have on runtime, achieving a reduction in makespan of up to 71%. We also demonstrate a significant reduction in resource usage through our rewriting approach.

Item Type: Conference or Workshop Item (Paper)
Subjects: Mathematical and Computer Sciences > Computer Science
Divisions: Department of Mathematics and Computer Science > Institute of Computer Science > Algorithmic Bioinformatics Group
ID Code: 3243
Deposited By: Anja Kasseckert
Deposited On: 29 Jan 2025 15:26
Last Modified: 29 Jan 2025 15:26