Abstract
Original language | English |
---|---|
Title of host publication | Unknown Host Publication |
Publisher | IEEE |
Pages | 890-895 |
Number of pages | 6 |
ISBN (Print) | 978-1-5090-1611-2 |
DOIs | 10.1109/BIBM.2016.7822643 |
Publication status | Published - 19 Jan 2017 |
Event | The Proceedings of the 2016 IEEE International Conference on Bioinformatics and Biomedicine: The 3rd Workshop on High Performance Computing on Bioinformatics (HPCB 2016) - Shenzhen, China. Duration: 19 Jan 2017 → … |
Workshop
Workshop | The Proceedings of the 2016 IEEE International Conference on Bioinformatics and Biomedicine: The 3rd Workshop on High Performance Computing on Bioinformatics (HPCB 2016) |
---|---|
Period | 19/01/17 → … |
Keywords
- high-throughput sequencing
- grid
- cloud
- personalised medicine
Cite this
The Role of High Performance, Grid and Cloud Computing in High-Throughput Sequencing. / Lightbody, Gaye; Browne, Fiona; Zheng, Huiru; Haberland, Valeriia; Blayney, Jaine.
Unknown Host Publication. IEEE, 2017. p. 890-895.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review
TY - GEN
T1 - The Role of High Performance, Grid and Cloud Computing in High-Throughput Sequencing
AU - Lightbody, Gaye
AU - Browne, Fiona
AU - Zheng, Huiru
AU - Haberland, Valeriia
AU - Blayney, Jaine
PY - 2017/1/19
Y1 - 2017/1/19
N2 - We have reached the era of full genome sequencing, with high-throughput sequencing technologies pouring out gigabases of reads in a day. To fully benefit from such a profusion of data, high-performance tools and systems are needed to extract the information lying within the sequences. This paper provides an overview of the evolution of high-throughput sequencing and of the tools, infrastructure and data management being developed in this space to support a key area of personalized medicine. The paper concludes by providing an outlook on the future of such technologies and their applications, and on how they might shape clinical governance.
AB - We have reached the era of full genome sequencing, with high-throughput sequencing technologies pouring out gigabases of reads in a day. To fully benefit from such a profusion of data, high-performance tools and systems are needed to extract the information lying within the sequences. This paper provides an overview of the evolution of high-throughput sequencing and of the tools, infrastructure and data management being developed in this space to support a key area of personalized medicine. The paper concludes by providing an outlook on the future of such technologies and their applications, and on how they might shape clinical governance.
KW - high-throughput sequencing
KW - grid
KW - cloud
KW - personalised medicine
U2 - 10.1109/BIBM.2016.7822643
DO - 10.1109/BIBM.2016.7822643
M3 - Conference contribution
SN - 978-1-5090-1611-2
SP - 890
EP - 895
BT - Unknown Host Publication
PB - IEEE
T2 - The Proceedings of the 2016 IEEE International Conference on Bioinformatics and Biomedicine: The 3rd Workshop on High Performance Computing on Bioinformatics (HPCB 2016)
Y2 - 19 January 2017
ER -