Compared to short-read sequencing technologies
- Long read length: the longest reads can reach 2 Mb, easily spanning most repetitive or heterozygous regions.
- No PCR step during library construction or sequencing, so there is no GC bias and genome coverage is more complete and uniform.
- Larger insertions are easily detected together with their sequence, so the inserted sequence can be classified, for example to determine whether it derives from complex repeats such as transposons.
- More complex structural variations can also be detected, with higher identification accuracy.
Compared to other long-read sequencing technologies
- Higher sequencing throughput, shorter research cycles
Other long-read sequencing platforms require multiple cells to produce the amount of data needed for SV calling, and thus cannot meet the requirements for sequencing many animal and plant genome samples simultaneously. By contrast, the ONT PromethION 48 (P48) has a clear turnaround advantage in population genome research. The P48 can run up to 48 samples in parallel, with the output of a single flow cell reaching nearly 30× depth, which meets the data volume requirements of most species and greatly reduces the time required for population sequencing.
- Longer read length, higher detection sensitivity
The average read N50 of the ONT P48 is 25-40 kb, while the read N50 of the ultra-long read library method can exceed 100 kb. The ONT P48 therefore has higher detection capability in repetitive regions and can span more complete SVs, improving detection sensitivity for large SVs (>10 kb) and rare SVs.
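For readers unfamiliar with the metric, read N50 is the length L such that reads of length L or longer contain at least half of all sequenced bases. The following is a minimal sketch of that calculation; the function name `read_n50` and the toy read lengths are illustrative assumptions, not part of any ONT software.

```python
def read_n50(lengths):
    """Return the N50 of a collection of read lengths (in bases).

    N50 is the length of the read at which the running total of bases,
    accumulated from the longest read downward, first reaches half of
    the total number of bases.
    """
    if not lengths:
        raise ValueError("at least one read length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy read lengths in bases (made-up values for illustration only).
toy_reads = [10_000, 20_000, 30_000, 40_000, 50_000]
print(read_n50(toy_reads))
```

Because N50 is weighted by bases and not by read count, a small number of very long reads can raise it substantially, which is why ultra-long libraries report much higher N50 values.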
Structural Variation Detection
Each sample is compared against the reference genome sequence, and structural variation detection and analysis software then identifies all potential SVs across its whole genome.
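The text does not name the detection software. As a rough sketch of how its output might be summarized, the example below assumes the caller writes a VCF file whose INFO column carries the `SVTYPE` and `SVLEN` keys defined in the VCF specification. The helper names and the toy records are hypothetical, and the 10 kb cutoff simply mirrors the "large SV" threshold mentioned earlier.

```python
from collections import Counter


def parse_info(info):
    """Turn a VCF INFO string such as 'SVTYPE=DEL;SVLEN=-1200' into a dict."""
    fields = {}
    for item in info.split(";"):
        if item:
            key, _, value = item.partition("=")
            fields[key] = value  # flag-style keys get an empty string
    return fields


def summarize_svs(vcf_lines, large_threshold=10_000):
    """Count SVs per type and count those longer than large_threshold bases."""
    counts = Counter()
    large = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        columns = line.rstrip("\n").split("\t")
        info = parse_info(columns[7])  # INFO is the 8th VCF column
        counts[info.get("SVTYPE", "UNKNOWN")] += 1
        try:
            length = abs(int(info["SVLEN"]))  # deletions carry a negative SVLEN
        except (KeyError, ValueError):
            continue  # no usable length on this record
        if length > large_threshold:
            large += 1
    return counts, large


# Toy records (hypothetical, for illustration only).
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t10500\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-1200;END=11700",
    "chr1\t90000\tsv2\tN\t<INS>\t.\tPASS\tSVTYPE=INS;SVLEN=15000",
    "chr2\t4000\tsv3\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-25000;END=29000",
    "chr2\t70000\tsv4\tN\t<INV>\t.\tPASS\tSVTYPE=INV;END=72000",
]
type_counts, large_count = summarize_svs(toy_vcf)
print(dict(type_counts), large_count)
```

Per-chromosome tallies of the kind plotted in a distribution map could be obtained the same way by keying the counter on the first column.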
Figure 1 SV chromosome distribution map
Enrichment analysis of important SV-related genes
SVs that occur in gene exons or in upstream or downstream regulatory regions are more likely to have a substantial impact on genes. The following shows the enrichment analysis of SV-related genes whose SVs fall in exon, upstream, or downstream regions according to the annotation results.
Figure 2 GO analysis statistics. The x-axis is the Rich Factor, the ratio of enriched genes to all annotated genes in the term; the y-axis lists the enriched terms; dot size represents the number of enriched genes, and color represents the Q value (the lower the Q value, the more significant the result).
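The quantities in the figure can be illustrated with a small sketch. The source does not state which statistical test or correction is used; the example below assumes a one-sided hypergeometric test and Benjamini-Hochberg adjustment, which are common choices for GO enrichment. All function names and numbers are illustrative.

```python
from math import comb


def rich_factor(enriched_in_term, annotated_in_term):
    """Ratio of SV-related genes in a term to all genes annotated to that term."""
    return enriched_in_term / annotated_in_term


def hypergeom_pvalue(k, n, m, N):
    """P(X >= k) for a hypergeometric draw.

    N: background genes, m: genes annotated to the term,
    n: SV-related genes, k: SV-related genes annotated to the term.
    """
    upper = min(n, m)
    numerator = sum(comb(m, i) * comb(N - m, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted q values in the same order as the input p values."""
    count = len(pvalues)
    order = sorted(range(count), key=lambda i: pvalues[i])
    qvalues = [0.0] * count
    running_min = 1.0
    # Walk from the largest p value down so each q value is monotone.
    for rank in range(count, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * count / rank)
        qvalues[index] = running_min
    return qvalues


# Hypothetical term: 50 annotated genes, 5 of them SV-related.
print(rich_factor(5, 50))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

A term with a high Rich Factor but few genes can still have a large Q value, which is why the figure encodes gene count and significance alongside the ratio.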
Population structure is an intuitive manifestation of differences between groups. The greater the difference between groups, the more obvious the structure displayed.
Figure 3 Integration diagram of group structure.
Each vertical subgraph shows the grouping result for a specific K; the horizontal axis lists the samples, which correspond one-to-one across all values of K. Colors denote the populations inferred under each K, but colors are not comparable between values of K: for example, the yellow in K=4 and the yellow in K=5 are entirely independent and do not represent the same group.
Identification of SVs that differ between groups
Fst (fixation index) is a commonly used measure of the degree of population differentiation caused by differences in genetic structure. Calculating Fst requires at least two groups; the larger the Fst value (0 ≤ Fst ≤ 1), the greater the differentiation between groups.
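To make the index concrete, here is a minimal sketch for a single biallelic SV in two groups, using the heterozygosity-based definition Fst = (HT - HS) / HT. The source does not say which estimator the analysis uses (dedicated tools often apply Weir and Cockerham's estimator instead), so the function name, the equal default weighting, and the example frequencies are assumptions.

```python
def fst_two_groups(p1, p2, n1=1, n2=1):
    """Heterozygosity-based Fst for one biallelic SV in two groups.

    p1, p2: frequency of the SV allele in each group (0 to 1).
    n1, n2: relative group sizes used as weights (equal by default).
    """
    total = n1 + n2
    p_bar = (n1 * p1 + n2 * p2) / total
    # Expected heterozygosity in the pooled population.
    h_total = 2 * p_bar * (1 - p_bar)
    # Weighted mean expected heterozygosity within groups.
    h_within = (n1 * 2 * p1 * (1 - p1) + n2 * 2 * p2 * (1 - p2)) / total
    if h_total == 0:
        return 0.0  # SV allele fixed or absent in both groups: no differentiation
    return (h_total - h_within) / h_total


print(fst_two_groups(0.8, 0.2))  # strongly differentiated SV
print(fst_two_groups(0.3, 0.3))  # identical frequencies
```

An SV fixed in one group and absent in the other gives Fst = 1, while identical frequencies give Fst = 0, matching the stated range.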
Figure 4 Fst and highlighted key genes.
In the figure, each color represents one chromosome and each dot represents the Fst value of one SV; the black dots mark the Fst values of the highlighted SVs, and the gene labeled next to each is the gene annotated to that SV.