Mecquenem, Ninon De and Bosse, Simon and Bountris, Vasilis and Mohammadi, Somayeh and Reinert, Knut and Leser, Ulf (2024) CuttleFlow: Infrastructure-Specific Workflow Adaption for Improved Reusability. In: 2024 IEEE 20th International Conference on e-Science (e-Science), 16-20 September 2024, Osaka, Japan.
Full text not available from this repository.
Official URL: https://doi.org/10.1109/e-Science62913.2024.106787...
Abstract
Scientific workflows have gained popularity for large-scale data analysis due to their potential to improve the reproducibility, scalability and documentation of complex multistep scientific analysis pipelines. However, their reusability is currently limited in practice, as a workflow is typically developed for a specific infrastructure. This is reflected in the choice of tools (e.g. lower or higher memory requirements), their configuration (e.g. number of threads) and the workflow topology (e.g. data-parallel scatter/gather). Re-running such a workflow requires access to the same, or at least a highly similar, computing environment, effectively reducing its use by other groups. To address this challenge, we present CuttleFlow, a novel method for adapting and rewriting scientific workflows given a description of an infrastructure and its inputs. CuttleFlow starts from an abstract workflow description and compiles it into an infrastructure-specific logical workflow using three types of rewriting operations, namely tool replacement, tool reconfiguration, and data scattering/gathering for task parallelization. We implement a prototype based on Nextflow and evaluate it for two important bioinformatics data analysis problems, namely RNAseq and metagenomics, on a distributed infrastructure. We demonstrate the large impact that the rewriting of CuttleFlow can have on runtime, achieving a reduction in makespan of up to 71%. We also demonstrate a significant reduction in resource usage through our rewriting approach.
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Subjects: | Mathematical and Computer Sciences > Computer Science |
Divisions: | Department of Mathematics and Computer Science > Institute of Computer Science > Algorithmic Bioinformatics Group |
ID Code: | 3243 |
Deposited By: | Anja Kasseckert |
Deposited On: | 29 Jan 2025 15:26 |
Last Modified: | 29 Jan 2025 15:26 |