You, Xintian Arthur (2015) Hastings: An R pipeline for large-scale RNA-Seq data analysis. Technical Report. (Unpublished)
PDF (Draft Version), restricted to registered users only, 789kB
Abstract
Motivation: With the advance of high-throughput sequencing technologies, large-scale datasets are becoming increasingly common, from thousands of patients in a cohort study to thousands of single cells from the same tissue. Despite major advances in understanding the molecular mechanisms that govern biological processes from development to disease, heterogeneity, whether between two individuals or between two cells from the same tissue, has yet to be scrutinized. Several studies have revealed new insights from such large-scale datasets [1, 2], yet their analysis protocols are not always straightforward to adopt or extend. In particular, selecting the various parameters along the analysis procedure is often more of an art than a science, and visualizing high-dimensional data can be very challenging.

Results: Here, we present Hastings to meet the demand for large-scale analysis and visualization of RNA-Seq gene expression data. As demonstrated in three examples, Hastings can efficiently identify sub-groups in an unsupervised manner, identify potential marker genes, and generate clear 2D visualizations. Hastings could be widely applied from bench to clinic.
| Item Type: | Article |
|---|---|
| Subjects: | Mathematical and Computer Sciences > Mathematics > Applied Mathematics; Biological Sciences > Others in Biological Sciences > Applied Biological Sciences |
| Divisions: | Department of Mathematics and Computer Science > Institute of Mathematics > Comp. Proteomics Group; Department of Mathematics and Computer Science > Institute of Mathematics; Department of Mathematics and Computer Science > Institute of Mathematics > BioComputing Group |
| ID Code: | 1755 |
| Deposited By: | Admin Administrator |
| Deposited On: | 12 Nov 2015 10:12 |
| Last Modified: | 03 Mar 2017 14:41 |