The first continuous DNA read longer than 2 Mb has been reported using Oxford Nanopore sequencing technology, which is regarded as another giant leap in the development of Nanopore sequencing. The result was published on bioRxiv by Alexander Payne and colleagues, a team at the University of Nottingham.
Nanopore MinION sequencing outputs data in fast5 format, which is converted into fastq format by base-calling. MinKNOW has been the conventional choice for MinION base-calling, but Payne and colleagues found that MinKNOW can split long reads in error. Once they corrected for this “bug”, read lengths exceeded 2 Mb.
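Once reads are in fastq format, their lengths are easy to inspect. The following is a minimal sketch, assuming the common layout of four lines per record (header, sequence, "+" separator, quality string) with no blank lines; the function name and the toy reads are made up for illustration and are not taken from the paper.

```python
import io

def read_lengths(handle):
    """Yield (read_id, length) for each record of a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header.strip():
            break  # end of input (this sketch also stops at a blank line)
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality line, ignored
        # Drop the leading '@' and keep the first whitespace-separated token.
        yield header[1:].split()[0], len(sequence)

# Toy FASTQ text, invented for this example only.
toy_fastq = "@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nACG\n+\nIII\n"
lengths = dict(read_lengths(io.StringIO(toy_fastq)))
longest_id = max(lengths, key=lengths.get)
print(longest_id, lengths[longest_id])  # read1 8
```

For a real file, the same function can be applied to an opened text file handle instead of the in-memory string.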
Why long reads?
Traditional short-read DNA sequencing technologies (also known as next-generation sequencing, NGS) yield data on only small fragments of genomic DNA, and assembling this large number of small pieces into a complete genome or dataset is a great challenge for researchers.
Nanopore sequencing has a significant advantage in read length: long reads greatly improve the contiguity of genome assemblies and overcome problems caused by complex repeat sequences or structural variation that are beyond the capabilities of short-read sequencing. A recent article in Nature Biotechnology on human Y chromosome centromere sequences obtained by Nanopore sequencing highlighted its ability to resolve complex repetitive regions. The identification of complex tandem rearrangements in a nematode genome and of structural variations in the Drosophila genome are further outstanding applications of Nanopore long-read sequencing. (For more details, see the “extended reading” at the end of this article.)
As the first third-generation sequencing company in China with the Nanopore sequencing platform, NextOmics Biosciences has obtained a large amount of excellent data. Here we present some of the results.
Case 1: Assembly of an insect genome by Nanopore sequencing
The size of the insect genome was estimated to be ~330 Mb based on k-mer analysis.
Fig. 1 K-mer analysis
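The article does not say which tool or formula produced the ~330 Mb estimate. A common approach, sketched below under that assumption, divides the total number of k-mers by the depth of the main peak in the k-mer depth histogram, after discarding low-depth k-mers that mostly come from sequencing errors. The function name, the min_depth cutoff, and the toy histogram are hypothetical.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size as (total k-mers) / (depth of the main peak).

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors and ignored.
    """
    solid = {depth: count for depth, count in histogram.items() if depth >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(solid, key=solid.get)           # depth with the most distinct k-mers
    total_kmers = sum(depth * count for depth, count in solid.items())
    return total_kmers / peak_depth

# Toy histogram, invented for illustration: an error tail at depths 1-2
# and a main peak centred on depth 10.
toy_histogram = {1: 900, 2: 100, 8: 10, 9: 40, 10: 100, 11: 40, 12: 10}
print(estimate_genome_size(toy_histogram))  # 200.0
```

Real histograms also contain heterozygous and repeat peaks, which dedicated tools model explicitly; this sketch only captures the basic arithmetic.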
Qualified sample DNA was extracted, and 30 Gb of third-generation sequencing data were generated on the Oxford Nanopore GridION X5 platform, with a maximum read length of 270 kb and a read N50 of 26.8 kb. Long reads are a prerequisite for more accurate genome assembly.
Fig. 2 Distribution of read length
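For readers unfamiliar with the N50 statistic quoted for reads and contigs: it is the length L such that sequences of length L or longer contain at least half of all bases. A minimal sketch of the usual definition follows; the toy lengths are made up and are not the data behind Fig. 2.

```python
def n50(lengths):
    """Return the length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")

# Toy lengths: total is 30, so the running sum first reaches 15 at the read of length 6.
toy_lengths = [2, 3, 4, 5, 6, 10]
print(n50(toy_lengths))  # 6
```

The same function applies unchanged to contig lengths, which is how a contig N50 is computed for an assembly.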
The genome was assembled with a variety of software, and the optimal scheme was selected. Combining ultra-long Nanopore reads with a super-computing platform makes genome assembly more contiguous and faster. In this case, the contig N50 exceeded 7 Mb, a level comparable to the high-quality assembly of the fruit fly, the model insect.
Table 1 The assembly results
The assembled genome was compared against the insect database with BUSCO to assess how completely conserved genes were assembled, which serves as an indirect measure of the completeness of the whole genome. The results show that after Nanopolish + Pilon (×2) correction, the BUSCO score reaches ~98%, indicating good assembly completeness.
Table 2 BUSCO assessment
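BUSCO reports its result as a one-line summary in which C (complete) is the sum of S (single-copy) and D (duplicated), alongside F (fragmented), M (missing) and n (number of genes searched). The sketch below parses such a line, assuming that notation; the example values are invented for illustration and are not the figures from Table 2.

```python
import re

# Pattern for BUSCO's one-line notation, e.g. C:98.0%[S:97.0%,D:1.0%],F:1.2%,M:0.8%,n:1000
SUMMARY_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)

def parse_busco_summary(line):
    """Return the percentages (floats) and gene count n (int) from a BUSCO summary line."""
    match = SUMMARY_PATTERN.search(line)
    if match is None:
        raise ValueError("not a BUSCO one-line summary")
    result = {key: float(value) for key, value in match.groupdict().items()}
    result["n"] = int(result["n"])
    return result

# Hypothetical summary line, not taken from the article's data.
example = "C:98.0%[S:97.0%,D:1.0%],F:1.2%,M:0.8%,n:1000"
scores = parse_busco_summary(example)
print(scores["C"], scores["S"] + scores["D"])
```

Comparing the C value before and after each correction round is a simple way to see whether polishing improved the assembly.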
Case 2: Ultra-long sequencing data from an animal
Nanopore ultra-long sequencing achieves ultra-long read lengths. Its transposase-based library preparation yields a DNA sequencing library containing ultra-long fragments, from which ultra-long reads are obtained by Nanopore sequencing. Ultra-long reads greatly facilitate de novo genome assembly and the identification of complex chromosomal structural variations (SVs).
Fig. 3 Process of Ultra-long library construction
NextOmics Biosciences constructed ultra-long libraries from the blood of a mammal and sequenced them on the Nanopore sequencing platform. The read N50 of multiple libraries exceeded 70 kb, and the longest read was more than 1 Mb.
Fig. 4 Library reads N50 (partial)
Fig. 5 Distribution of read-length in a library
Payne, A., Holmes, N., Rakyan, V. & Loose, M. Whale watching with BulkVis: A graphical viewer for Oxford Nanopore bulk fast5 files. bioRxiv, doi:10.1101/312256 (2018).