Global profiling of 6mA sites at single-nucleotide resolution in the genome of Arabidopsis thaliana using PacBio sequencing Nextomics program

In April 2018, Nextomics, together with the research group of Xiaofeng Gu (Biotechnology Research Institute, Chinese Academy of Agricultural Sciences) and the research group of Hao Yu (Department of Biological Sciences and Temasek Life Sciences Laboratory, National University of Singapore), published a paper titled “DNA N6-Adenine Methylation in Arabidopsis thaliana” in the journal Developmental Cell. The study reports global profiling of 6mA sites at single-nucleotide resolution in the genome of A. thaliana at different developmental stages and in different ecotypes using single-molecule real-time (SMRT) sequencing.

This is the first research article to map the distribution of eukaryotic DNA methylation using PacBio SMRT sequencing.

Highlights

  • DNA methylation on N6-adenine (6mA) occurs widely in the Arabidopsis genome;
  • 6mA is more enriched on gene bodies than intergenic regions;
  • 6mA is a dynamic DNA modification during Arabidopsis development;
  • 6mA is associated with actively expressed genes in Arabidopsis.

Results

6mA widely occurs in the Arabidopsis genome

Dot blot analysis of gDNA extracted from 3- to 21-day-old Col wild-type plants revealed a gradual increase in the 6mA signal intensity from vegetative to reproductive stage (Fig.1).

Fig. 1 6mA Occurs in Arabidopsis Genomic DNA

Genome-Wide Mapping of 6mA in Arabidopsis sequenced by PacBio SMRT

By comparing the dynamics and distribution of 6mA in 9-day-old (D9) and 21-day-old (D21) plants and combining the results with transcriptome data, the researchers investigated the function of 6mA during Arabidopsis development (Fig. 2).

Fig.2 Circos plots of 6mA DNA methylation profiles of 9- and 21-day-old Col wild-type Arabidopsis.

6mA is more enriched on gene bodies than intergenic regions

The distribution of 6mA was examined across Col genomic regions divided into gene bodies, promoters, 5′ intergenic, 3′ intergenic, and other intergenic regions. The results showed that 32% of 6mA sites were located within gene bodies, and protein-coding genes contained more than half of the 6mA sites in all methylated genes (Fig. 3).

Fig. 3 Distribution Pattern of 6mA Methylation in Genomic DNA of 9-Day-Old Col Plants

6mA is a dynamic DNA modification during Arabidopsis development

Genome-wide mapping of 6mA by SMRT sequencing in gDNA from 21-day-old Col wild-type plants at the reproductive stage identified 184,633 6mA sites, of which 9,909 overlapped with those found in 9-day-old plants (Fig. 4). This suggests that 6mA DNA methylation positively correlates with the transition from vegetative to reproductive growth.

Fig.4 Venn diagram comparison of the numbers of 6mA sites in D9 and D21.
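The D9/D21 comparison above amounts to intersecting two sets of site coordinates. The following is a minimal sketch of that idea, assuming each 6mA site is keyed by chromosome, position and strand. The function name and the toy coordinates are illustrative only and are not taken from the paper or its pipeline.

```python
# Toy illustration: each 6mA site is keyed by (chromosome, position, strand).
# The coordinates below are made up; they are not data from the paper.

def compare_site_sets(sites_a, sites_b):
    """Return the counts for the three regions of a two-set Venn diagram."""
    a, b = set(sites_a), set(sites_b)
    return {
        "only_a": len(a - b),   # sites found only in the first sample
        "shared": len(a & b),   # sites found in both samples
        "only_b": len(b - a),   # sites found only in the second sample
    }

d9_sites = [("Chr1", 1050, "+"), ("Chr1", 2210, "-"), ("Chr3", 775, "+")]
d21_sites = [("Chr1", 1050, "+"), ("Chr2", 9100, "+"),
             ("Chr3", 775, "+"), ("Chr5", 42, "-")]

venn = compare_site_sets(d9_sites, d21_sites)
print(venn)  # {'only_a': 1, 'shared': 2, 'only_b': 2}

# Share of D21 sites that were also seen in D9, using the counts quoted above.
shared_fraction_d21 = 9909 / 184633
print(f"{shared_fraction_d21:.1%}")  # 5.4%
```

With the reported counts, only about 5.4% of the D21 sites were already present at D9, which is what makes the mark look dynamic across development.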

6mA is associated with actively expressed genes in Arabidopsis

Analysis of 6mA methylomes and RNA sequencing data demonstrates that 6mA frequency positively correlates with the gene expression level and the transition from vegetative to reproductive growth in Arabidopsis (Fig. 5).

Fig. 5 Correlation between 6mA and Gene Expression in Col Plants.

This research uncovers 6mA as a DNA mark associated with actively expressed genes in Arabidopsis, suggesting that 6mA serves as a hitherto unknown epigenetic mark in land plants. Nextomics took part in the program and provided technical support.

Reference

Liang et al., DNA N6-Adenine Methylation in Arabidopsis thaliana, Developmental Cell (2018), https://doi.org/10.1016/j.devcel.2018.03.012

Other published papers on eukaryotic 6mA methylation

[1] Greer, E.L. et al. DNA methylation on N6-adenine in C. elegans. Cell 161, 868–878 (2015).

[2] Wu, T.P. et al. DNA methylation on N(6)-adenine in mammalian embryonic stem cells. Nature 532, 329–333 (2016).

[3] Mondo, S.J. et al. Widespread adenine N6-methylation of active genes in fungi. Nature Genetics (2017).

Genome sequence of the progenitor of wheat A subgenome Triticum urartu

Nature                 Published: 09 May 2018

Abstract: Triticum urartu (diploid, AA) is the progenitor of the A subgenome of tetraploid (Triticum turgidum, AABB) and hexaploid (Triticum aestivum, AABBDD) wheat. Genomic studies of T. urartu have been useful for investigating the structure, function and evolution of polyploid wheat genomes. Here we report the generation of a high-quality genome sequence of T. urartu by combining bacterial artificial chromosome (BAC)-by-BAC sequencing, single molecule real-time whole-genome shotgun sequencing, linked reads and optical mapping. We assembled seven chromosome-scale pseudomolecules and identified protein-coding genes, and we suggest a model for the evolution of T. urartu chromosomes. Comparative analyses with genomes of other grasses showed gene loss and amplification in the numbers of transposable elements in the T. urartu genome. Population genomics analysis of 147 T. urartu accessions from across the Fertile Crescent showed clustering of three groups, with differences in altitude and biostress, such as powdery mildew disease. The T. urartu genome assembly provides a valuable resource for studying genetic variation in wheat and related grasses, and promises to facilitate the discovery of genes that could be useful for wheat improvement.

Read the original article: https://www.nature.com/articles/s41586-018-0108-0

2.2 Mb! Nanopore sequencing sets a new record for the length of a single continuous read!

The first continuous DNA read longer than 2 Mb has been reported using Oxford Nanopore sequencing technology, which is regarded as another giant leap in the development of Nanopore sequencing. The result was published on bioRxiv by Alexander Payne and colleagues at the University of Nottingham [1].

The raw output of Nanopore MinION sequencing is in fast5 format, which is converted into fastq format by base-calling. Until now, MinKNOW has been the conventional choice for MinION base-calling. However, Payne and colleagues found that MinKNOW can erroneously split long reads. Once this “bug” was corrected for, read lengths exceeded 2 Mb.

Why long reads?

Traditional short-read DNA sequencing technologies (also known as next-generation sequencing, NGS) provide data on small fragments of genomic DNA, and assembling this large number of small pieces into a complete genome is a great challenge for researchers.

Nanopore sequencing technology has a significant advantage in read length: it greatly improves the contiguity of genome assemblies and overcomes problems caused by complex repeat sequences or structural variation that are beyond the capabilities of short-read sequencing. A recent Nature Biotechnology article on the human Y chromosome centromere sequence, obtained by Nanopore sequencing, highlighted the ability of Nanopore sequencing to resolve complex repetitive regions. In addition, the identification of complex tandem rearrangements in the nematode genome and of structural variations in the Drosophila genome are outstanding applications of Nanopore long-read sequencing. (For more details, see the “extended reading” at the end of this article.)

As the first third-generation sequencing company in China with a Nanopore sequencing platform, NextOmics Biosciences has obtained a large amount of excellent data. Here we present some of the results.

Case 1: Assembly of an insect genome by Nanopore sequencing

The size of the insect genome was estimated to be ~330 Mb based on k-mer analysis.

Fig. 1 K-mer analysis
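A k-mer based size estimate usually rests on one relation: genome size is roughly the total number of k-mers divided by the depth at the main peak of the k-mer histogram. The sketch below illustrates that relation on a made-up histogram. The function name, the `min_depth` cutoff for discarding error k-mers, and all counts are assumptions for illustration. It is not the pipeline used in this case, and dedicated tools fit a fuller statistical model.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors and ignored
    (the cutoff of 5 is an illustrative assumption).
    """
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # The main peak is the depth shared by the largest number of distinct k-mers.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences = sum over depths of (depth * distinct k-mers).
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth, peak_depth

# Made-up histogram: a low-depth error tail plus one peak around depth 30.
toy_histogram = {
    1: 5_000_000, 2: 800_000,
    28: 2_000_000, 29: 3_000_000, 30: 4_000_000, 31: 3_000_000, 32: 2_000_000,
}

size, peak = estimate_genome_size(toy_histogram)
print(peak)  # 30
print(size)  # 14000000.0
```

In the toy data the error tail at depths 1 and 2 is excluded, the peak sits at depth 30, and 420 million k-mer occurrences divided by 30 give an estimate of 14 Mb.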

Qualified DNA was extracted from the sample, and 30 Gb of third-generation data were generated on the Oxford Nanopore GridION X5 platform, with a maximum read length of 270 kb and a read N50 of 26.8 kb. Long reads are a prerequisite for more accurate genome assembly.

Fig. 2 Distribution of read length
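For readers unfamiliar with the read N50 statistic quoted above: it is the length L such that reads of length L or longer contain at least half of all sequenced bases. The following minimal sketch computes it on toy read lengths, and also works out the sequencing depth implied by the figures in this case (30 Gb of data on a ~330 Mb genome). The function name and toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of read (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the bases.
        if running >= half_total:
            return length
    raise ValueError("no lengths supplied")

# Toy read lengths in bp (made up for illustration).
toy_reads = [10_000, 6_000, 5_000, 4_000, 3_000, 2_000]
print(n50(toy_reads))  # 6000

# Depth implied by the figures quoted in this case.
total_bases = 30e9    # 30 Gb of Nanopore data
genome_size = 330e6   # ~330 Mb k-mer based estimate
depth = total_bases / genome_size
print(f"{depth:.0f}x")  # 91x
```

In the toy set the total is 30 kb, and the two longest reads (10 kb + 6 kb) already cover half of it, so the N50 is 6 kb. The same function applies unchanged to contig lengths when reporting a contig N50.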

The genome was assembled with a variety of software tools, and the optimal scheme was selected. Ultra-long Nanopore reads, combined with a super-computing platform, make genome assembly more contiguous and faster. In this case, the contig N50 exceeded 7 Mb, reaching the high-quality assembly level of the model insect, the fruit fly.

Table 1 The assembly results

The assembled genome was compared with the insect gene database using BUSCO to assess how completely conserved genes were assembled, which indirectly reflects the completeness of the whole genome assembly. The results show that after Nanopolish + Pilon (×2) correction, the BUSCO score reaches ~98%, indicating good assembly completeness.

Table 2 BUSCO assessment
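A BUSCO score of this kind is the share of conserved single-copy genes found complete in the assembly, where "complete" counts both single-copy and duplicated hits. The sketch below shows that arithmetic. The function name and the counts are hypothetical, chosen only so that the result lands near the ~98% reported here. They are not the values from Table 2.

```python
def busco_percentages(single, duplicated, fragmented, missing):
    """Convert BUSCO category counts into percentages of all genes searched."""
    total = single + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("no BUSCO genes counted")
    complete = single + duplicated  # complete = single-copy + duplicated
    return {
        "complete": 100 * complete / total,
        "fragmented": 100 * fragmented / total,
        "missing": 100 * missing / total,
    }

# Hypothetical counts for illustration only.
result = busco_percentages(single=1600, duplicated=25, fragmented=15, missing=18)
print(f"{result['complete']:.1f}%")  # 98.0%
```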

Case 2: Ultra-long sequencing data from an animal

Nanopore sequencing can achieve ultra-long read lengths. With its transposase-based library preparation, a sequencing library containing ultra-long DNA fragments can be constructed, and ultra-long reads can then be obtained by Nanopore sequencing. Ultra-long reads greatly facilitate de novo genome assembly and the identification of complex chromosomal structural variations (SVs).

Fig. 3 Process of Ultra-long library construction

NextOmics Bioscience carried out ultra-long library construction and sequencing on a mammalian blood sample using the Nanopore sequencing platform. The read N50 of multiple libraries exceeded 70 kb, and the longest read was more than 1 Mb.

Fig. 4 Library reads N50 (partial)

Fig. 5 Distribution of read length in a library

[1] Payne, A., Holmes, N., Rakyan, V. & Loose, M. Whale watching with BulkVis: A graphical viewer for Oxford Nanopore bulk fast5 files. bioRxiv, doi:10.1101/312256 (2018).

Long-read sequencing technology generates a heated discussion at The Jackson Laboratory

Following the publication of a large number of high-impact articles based on long-read sequencing technology, The Jackson Laboratory for Genomic Medicine, one of the world’s top medical genomics laboratories, held a long-read sequencing workshop from April 23 to 25, 2018. Hundreds of experts and scholars in biology, bioinformatics and medical genetics from across the globe gathered in Farmington, CT to discuss the technology and molecular biology driving each sequencing platform, including those from Pacific Biosciences, 10X Genomics and Oxford Nanopore. Prof. Kai Wang, chief scientist of NextOmics Bioscience (Wuhan, China), gave a well-received presentation on detecting structural variations (SVs) in the human genome with different long-read sequencing platforms.

The symposium focused mainly on the latest technologies, such as PacBio, Oxford Nanopore, 10X Genomics, Bionano and Hi-C. Prof. Wang was invited to give a talk titled “Long-Read Sequencing Meets the Human Genomics”. He described the characteristics and advantages of Nanopore, PacBio and Bionano technologies for detecting SVs in the human genome. He then introduced RepeatHMM, a new algorithm developed by his team that overcomes current technical bottlenecks in identifying microsatellite repeat units from PacBio sequencing data. He also presented a detection method for facioscapulohumeral muscular dystrophy (FSHD) based on the Bionano Saphyr™ platform, which addresses existing problems in FSHD diagnosis.

It is worth mentioning that Prof. Wang’s focus on long-read sequencing techniques for studying SVs coincides with that of Michael Schatz, a renowned scientist in computer science and biology at Johns Hopkins University. Dr. Yijun Ruan, director of the genomic science department at The Jackson Laboratory, spoke about Hi-C technology, focusing on its applications in the analysis of genomic SVs, transcriptome interactions and genome modification, and offered new insight into genome research from a three-dimensional perspective. Beyond that, many experts and scholars introduced new bioinformatics algorithms to participants. For example, Winston Timp, associate professor in the Department of Biomedical Engineering at Johns Hopkins University, presented his group’s latest Nanopore-based tools for characterizing the genome and epigenome.

The Jackson Laboratory

The Jackson Laboratory is an independent, nonprofit biomedical research institution dedicated to the discovery of precise genomic solutions for human disease.

For more details: https://www.jax.org/

Nextomics Biosciences

Nextomics Biosciences, founded in 2011 in Biolake, Wuhan, is a worldwide leader in third-generation sequencing (TGS), with branches in Beijing Life Science Park and Philadelphia, USA. It owns the largest TGS center in China and the US, operates more than 20 TGS sequencers, and can generate 8 Tb of TGS data per month.

As the first provider of third-generation sequencing services in China, Nextomics has developed bioinformatics analysis pipelines based on PacBio SMRT sequencing technology since 2013 and launched its Sequel genomic center in 2016. In 2017, Nextomics became the first certified Nanopore service provider in China and one of the first in the world. Through MinION, GridION and PromethION, Nanopore sequencing provides extremely long reads that make genome assembly simpler and more accurate, and it also offers real-time direct RNA sequencing. Nextomics has deep TGS experience and has completed hundreds of genome projects, ranging from de novo genome assembly to full-length transcriptome analysis and metagenomics, with a constant focus on meeting customer demands and keeping its TGS technologies up to date.

NextOmics has established a comprehensive and cutting-edge omics research center by using a variety of technologies, such as the Oxford Nanopore, optical mapping (BioNano), PacBio sequencing and High-throughput chromosome conformation capture (Hi-C). In 2018, NextOmics will also leverage Nanopore Technology in areas such as Ultra Long Reads, direct RNA sequencing and DNA/RNA modifications. We sincerely welcome worldwide customers and collaborators to work with NextOmics and enjoy the unique advantages of long read sequencing. Let’s embrace the long-read sequencing era!

Piercing the dark matter: bioinformatics of long-range sequencing and mapping

Nature Reviews Genetics     Published: 29 March 2018

Abstract: Several new genomics technologies have become available that offer long-read sequencing or long-range mapping with higher throughput and higher resolution analysis than ever before. These long-range technologies are rapidly advancing the field with improved reference genomes, more comprehensive variant identification and more complete views of transcriptomes and epigenomes. However, they also require new bioinformatics approaches to take full advantage of their unique characteristics while overcoming their complex errors and modalities. Here, we discuss several of the most important applications of the new technologies, focusing on both the currently available bioinformatics tools and opportunities for future research.

Read the original article: https://www.nature.com/articles/s41576-018-0003-4

Linear assembly of a human centromere on the Y chromosome

Nature Biotechnology     Published: 19 March 2018

Abstract: The human genome reference sequence remains incomplete owing to the challenge of assembling long tracts of near-identical tandem repeats in centromeres. We implemented a nanopore sequencing strategy to generate high-quality reads that span hundreds of kilobases of highly repetitive DNA in a human Y chromosome centromere. Combining these data with short-read variant validation, we assembled and characterized the centromeric region of a human Y chromosome.

Read the original article: https://www.nature.com/articles/nbt.4109

PacBio also Updated at last! Both Software and Reagent, for Higher Throughput and Longer Reads!

Pacific Biosciences of California, Inc. formally announced a new version of Sequel® Software (V5.1) and a new polymerase on Mar. 7, 2018. Combined, these enhancements increase throughput and the overall performance of Single Molecule, Real-Time (SMRT®) Sequencing for key applications such as de novo assembly, structural variant detection, targeted sequencing, and RNA sequencing (Iso-Seq® method), making genomic research more economical.

With this release the Sequel System can achieve up to 10 Gb per SMRT Cell for genomic libraries, effectively doubling the throughput when using ultra-long inserts (>40 kb) for de novo genome assembly. For targeted and RNA sequencing, customers can achieve up to 20 Gb per SMRT Cell.

For human whole genome sequencing (WGS) studies, the new improvements support sensitive detection of structural variants with as little as 5- to 10-fold coverage per individual. As a result, customers can now complete low-cost WGS studies in thousands of individuals using fewer SMRT Cells.

For long amplicons (>3 kb), the new polymerase increases the number of high-quality sequences per SMRT Cell, reducing costs for HLA sequencing and other targeted applications. Further, software enhancements for multiplexed samples simplify the analytical workflow.

Appendix: Based on PacBio SMRT technology, NextOmics has assembled hundreds of genomes and established a mature workflow for full-length transcriptome analysis. Some of the published papers are as follows:

Classic NextOmics genome-assembling projects

Classic NextOmics Full-length Transcriptome projects based on the Third Generation Sequencing Technology

High contiguity Arabidopsis thaliana genome assembly with a single nanopore flow cell

Nature Communications      Published: 07 February 2018

Abstract: The handheld Oxford Nanopore MinION sequencer generates ultra-long reads with minimal cost and time requirements, which makes sequencing genomes at the bench feasible. Here, we sequence the gold standard Arabidopsis thaliana genome (KBS-Mac-74 accession) on the bench with the MinION sequencer, and assemble the genome using typical consumer computing hardware (4 Cores, 16 Gb RAM) into chromosome arms (62 contigs with an N50 length of 12.3 Mb). We validate the contiguity and quality of the assembly with two independent single-molecule technologies, Bionano optical genome maps and Pacific Biosciences Sequel sequencing. The new A. thaliana KBS-Mac-74 genome enables resolution of a quantitative trait locus that had previously been recalcitrant to a Sanger-based BAC sequencing approach. In summary, we demonstrate that even when the purpose is to understand complex structural variation at a single region of the genome, complete genome assembly is becoming the simplest way to achieve this goal.

Read the original article: https://www.nature.com/articles/s41467-018-03016-2

Nanopore sequencing and assembly of a human genome with ultra-long reads

Nature Biotechnology    Published: 29 January 2018

Abstract: We report the sequencing and assembly of a reference genome for the human GM12878 Utah/Ceph cell line using the MinION (Oxford Nanopore Technologies) nanopore sequencer. 91.2 Gb of sequence data, representing ~30× theoretical coverage, were produced. Reference-based alignment enabled detection of large structural variants and epigenetic modifications. De novo assembly of nanopore reads alone yielded a contiguous assembly (NG50 ~3 Mb). We developed a protocol to generate ultra-long reads (N50 > 100 kb, read lengths up to 882 kb). Incorporating an additional 5× coverage of these ultra-long reads more than doubled the assembly contiguity (NG50 ~6.4 Mb). The final assembled genome was 2,867 million bases in size, covering 85.8% of the reference. Assembly accuracy, after incorporating complementary short-read sequencing data, exceeded 99.8%. Ultra-long reads enabled assembly and phasing of the 4-Mb major histocompatibility complex (MHC) locus in its entirety, measurement of telomere repeat length, and closure of gaps in the reference human genome assembly GRCh38.

Read the original article: https://www.nature.com/articles/nbt.4060

The axolotl genome and the evolution of key tissue formation regulators

Nature          Published: 24 January 2018

Abstract: Salamanders serve as important tetrapod models for developmental, regeneration and evolutionary studies. An extensive molecular toolkit makes the Mexican axolotl (Ambystoma mexicanum) a key representative salamander for molecular investigations. Here we report the sequencing and assembly of the 32-gigabase-pair axolotl genome using an approach that combined long-read sequencing, optical mapping and development of a new genome assembler (MARVEL). We observed a size expansion of introns and intergenic regions, largely attributable to multiplication of long terminal repeat retroelements. We provide evidence that intron size in developmental genes is under constraint and that species-restricted genes may contribute to limb regeneration. The axolotl genome assembly does not contain the essential developmental gene Pax3. However, mutation of the axolotl Pax3 paralogue Pax7 resulted in an axolotl phenotype that was similar to those seen in Pax3−/− and Pax7−/− mutant mice. The axolotl genome provides a rich biological resource for developmental and evolutionary studies.

Read the original article: https://www.nature.com/articles/nature25458/