The Role of High Performance, Grid and Cloud Computing in High-Throughput Sequencing

Gaye Lightbody, Fiona Browne, Huiru Zheng, Valeriia Haberland, Jaine Blayney

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We have reached the era of full genome sequencing, with high-throughput sequencing technologies producing gigabases of reads per day. To benefit fully from such a profusion of data, high-performance tools and systems are needed to extract the information held within the sequences. This paper provides an overview of the evolution of high-throughput sequencing and of the tools, infrastructure and data-management approaches being developed in this space to support this key area of personalized medicine. The paper concludes with an outlook on the future of these technologies and their applications, and on how they might shape clinical governance.

Workshop

Workshop: The Proceedings of the 2016 IEEE International Conference on Bioinformatics and Biomedicine: The 3rd Workshop on High Performance Computing on Bioinformatics (HPCB 2016)
Period: 19/01/17 → …

Keywords

  • high-throughput sequencing
  • grid
  • cloud
  • personalised medicine

Cite this

Lightbody, Gaye; Browne, Fiona; Zheng, Huiru; Haberland, Valeriia; Blayney, Jaine. / The Role of High Performance, Grid and Cloud Computing in High-Throughput Sequencing. Proceedings of the 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2017. pp. 890-895

@inproceedings{38241f9ec60549a090db9806f9ee9f63,
title = "The Role of High Performance, Grid and Cloud Computing in High-Throughput Sequencing",
abstract = "We have reached the era of full genome sequencing, with high-throughput sequencing technologies producing gigabases of reads per day. To benefit fully from such a profusion of data, high-performance tools and systems are needed to extract the information held within the sequences. This paper provides an overview of the evolution of high-throughput sequencing and of the tools, infrastructure and data-management approaches being developed in this space to support this key area of personalized medicine. The paper concludes with an outlook on the future of these technologies and their applications, and on how they might shape clinical governance.",
keywords = "high-throughput sequencing, grid, cloud, personalised medicine",
author = "Gaye Lightbody and Fiona Browne and Huiru Zheng and Valeriia Haberland and Jaine Blayney",
year = "2017",
month = "1",
day = "19",
doi = "10.1109/BIBM.2016.7822643",
language = "English",
isbn = "978-1-5090-1611-2",
pages = "890--895",
booktitle = "Proceedings of the 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)",

}

Lightbody, G, Browne, F, Zheng, H, Haberland, V & Blayney, J 2017, The Role of High Performance, Grid and Cloud Computing in High-Throughput Sequencing. in Proceedings of the 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). pp. 890-895, The Proceedings of the 2016 IEEE International Conference on Bioinformatics and Biomedicine: The 3rd Workshop on High Performance Computing on Bioinformatics (HPCB 2016), 19/01/17. https://doi.org/10.1109/BIBM.2016.7822643

The Role of High Performance, Grid and Cloud Computing in High-Throughput Sequencing. / Lightbody, Gaye; Browne, Fiona; Zheng, Huiru; Haberland, Valeriia; Blayney, Jaine.

Proceedings of the 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2017. pp. 890-895.

TY  - GEN

T1  - The Role of High Performance, Grid and Cloud Computing in High-Throughput Sequencing

AU  - Lightbody, Gaye

AU  - Browne, Fiona

AU  - Zheng, Huiru

AU  - Haberland, Valeriia

AU  - Blayney, Jaine

PY  - 2017/1/19

Y1  - 2017/1/19

AB  - We have reached the era of full genome sequencing, with high-throughput sequencing technologies producing gigabases of reads per day. To benefit fully from such a profusion of data, high-performance tools and systems are needed to extract the information held within the sequences. This paper provides an overview of the evolution of high-throughput sequencing and of the tools, infrastructure and data-management approaches being developed in this space to support this key area of personalized medicine. The paper concludes with an outlook on the future of these technologies and their applications, and on how they might shape clinical governance.

KW  - high-throughput sequencing

KW  - grid

KW  - cloud

KW  - personalised medicine

U2  - 10.1109/BIBM.2016.7822643

DO  - 10.1109/BIBM.2016.7822643

M3  - Conference contribution

SN  - 978-1-5090-1611-2

SP  - 890

EP  - 895

BT  - Proceedings of the 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

ER  - 