In a study published in Scientific Reports, researchers compared the DNBSEQ platform with the Illumina HiSeq 2000 for bacterial genome assembly. The study reported that the DNBSEQ platform, specifically the BGISEQ-500 sequencer, had the lowest sequencing error rates among short-read technologies, making it a potential substitute for Illumina platforms in meeting the growing demand for sequencing of cultivated bacterial genomes.
The comparison used sequencing data from both the BGISEQ-500 and the Illumina HiSeq 2000 and covered several aspects of genome assembly: genome quality assessment, genome alignment, functional annotation, mutation detection, and metagenome mapping. To account for potential contamination during sequencing and for insert size bias in DNA nanoball (DNB) technology, the researchers also simulated sequencing reads and analyzed how sequence contamination and insert size affect genome assembly.
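The simulation step can be pictured with a small sketch. The article does not name the simulator or the parameters the researchers used, so everything below is an assumption made for illustration: the function name simulate_paired_reads, the normally distributed insert size, the read length, and the 5% contamination fraction are all hypothetical. The sketch draws paired-end reads from a target genome and, with a chosen probability, from a contaminant genome instead.

```python
import random


def simulate_paired_reads(genome, contaminant, n_pairs, read_len=100,
                          insert_mean=350, insert_sd=30,
                          contamination=0.05, seed=0):
    """Sample paired-end reads; a fraction comes from a contaminant genome.

    Hypothetical helper for illustration, not the simulator used in the study.
    Both sequences must be at least read_len bases long.
    """
    rng = random.Random(seed)
    complement = str.maketrans("ACGT", "TGCA")
    pairs = []
    for _ in range(n_pairs):
        from_contaminant = rng.random() < contamination
        source = contaminant if from_contaminant else genome
        # Draw an insert (fragment) size and clamp it to what fits in the source.
        insert = int(round(rng.gauss(insert_mean, insert_sd)))
        insert = max(read_len, min(insert, len(source)))
        start = rng.randint(0, len(source) - insert)
        fragment = source[start:start + insert]
        read1 = fragment[:read_len]
        # Read 2 is the reverse complement of the other end of the fragment.
        read2 = fragment[-read_len:].translate(complement)[::-1]
        pairs.append((read1, read2, from_contaminant))
    return pairs


# Random sequences stand in for real genomes here.
rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(5000))
contaminant = "".join(rng.choice("ACGT") for _ in range(5000))
pairs = simulate_paired_reads(genome, contaminant, n_pairs=1000)
observed = sum(flag for _, _, flag in pairs) / len(pairs)
print(f"contaminating read pairs: {observed:.1%}")
```

Varying contamination or insert_mean and assembling each simulated dataset would then show how sensitive the assembly is to either factor.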
The study included 76 bacterial strains, encompassing 64 unique species from the Culturable Genome Reference version two (CGR2) project. These strains were sequenced using both the BGISEQ-500 and Illumina HiSeq 2000, resulting in 152 shotgun sequencing datasets. Through genome assembly and taxonomic annotation, the strains were classified into five phyla, 34 genera, and 64 species, covering the main phyla of the human gut microbiota.
The genome assemblies from both platforms were of high quality, with completeness exceeding 93% and contamination below 5%. Statistical analyses showed that the completeness of genome assemblies from the BGISEQ-500 was significantly higher than that of assemblies from the HiSeq 2000. Genome lengths derived from the two platforms were also highly consistent.
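A paired comparison of this kind can be sketched in a few lines. The article does not say which statistical test the researchers used, so the exact two-sided sign test below is a stand-in chosen because it needs only the standard library. The function names are hypothetical, and the completeness values are made up for illustration and are not taken from the study.

```python
from math import comb


def passes_quality(completeness, contamination):
    """Apply the quality bounds quoted above (values in percent)."""
    return completeness > 93.0 and contamination < 5.0


def sign_test(a, b):
    """Exact two-sided sign test on paired values; ties are dropped."""
    wins_a = sum(x > y for x, y in zip(a, b))
    wins_b = sum(x < y for x, y in zip(a, b))
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up completeness values (%) for five strains, for illustration only.
bgiseq = [99.1, 98.7, 99.5, 97.9, 99.3]
hiseq = [98.8, 98.7, 99.0, 97.5, 99.1]
print(sign_test(bgiseq, hiseq))
```

Because each strain was sequenced on both platforms, a paired test is the natural choice: it compares the two assemblies of the same strain rather than pooling all assemblies per platform.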
To further evaluate the genome assemblies, several analyses were carried out, including Principal Coordinates Analysis (PCoA) and comparisons of 16S rRNA gene sequences. The PCoA results indicated that assemblies from the same strain grouped closely together, irrespective of the sequencing platform. Moreover, the paired genome assemblies were highly similar in their 16S rRNA gene sequences.
Several measures used to delineate species at the genome level, namely average amino acid identity (AAI), average nucleotide identity (ANI), tetra-nucleotide signature (Tetra) correlation, and Mash distance, were calculated for each pair of genome assemblies from the two platforms. The comparisons indicated that the assemblies from the two platforms were extremely close and did not differ significantly.
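Two of these measures are simple enough to sketch directly. The code below is a simplified illustration, not the tools used in the study: the Tetra correlation here is a Pearson correlation of raw tetranucleotide frequencies (established implementations typically correlate z-scores instead), and the Mash distance is computed from the exact k-mer Jaccard index j as D = -ln(2j / (1 + j)) / k, whereas Mash itself estimates j from MinHash sketches. Function names and the default k are assumptions, and both sequences must be at least k bases long.

```python
import random
from itertools import product
from math import log, sqrt


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tetra_profile(seq):
    """Relative frequency of each of the 256 tetranucleotides."""
    keys = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = dict.fromkeys(keys, 0)
    for kmer in kmers(seq, 4):
        if kmer in counts:  # skip k-mers containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[key] / total for key in keys]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def mash_distance(seq_a, seq_b, k=21):
    """Mash distance computed from the exact k-mer Jaccard index."""
    a, b = set(kmers(seq_a, k)), set(kmers(seq_b, k))
    j = len(a & b) / len(a | b)
    if j == 0:
        return 1.0
    return max(0.0, -log(2 * j / (1 + j)) / k)


# A random sequence and a copy with a substitution every 100 bases.
rng = random.Random(7)
original = "".join(rng.choice("ACGT") for _ in range(3000))
swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
mutated = "".join(swap[base] if i % 100 == 50 else base
                  for i, base in enumerate(original))

print(pearson(tetra_profile(original), tetra_profile(mutated)))
print(mash_distance(original, mutated))
```

For two assemblies of the same strain, the expectation is a Tetra correlation near 1 and a Mash distance near 0, which is the pattern the comparison above describes.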
The platforms were also compared in terms of single nucleotide variants (SNVs), insertions, and deletions. The numbers of SNVs called by different programs were similar for both platforms, whereas indel counts differed: insertions were more frequent in assemblies from one platform and deletions more frequent in assemblies from the other.
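How the three variant classes are told apart can be shown with a minimal sketch. It assumes a pairwise alignment is already available as two gapped strings of equal length and counts each run of consecutive gap columns as one indel event. The function name is hypothetical, and this is not one of the variant-calling programs used in the study.

```python
def count_variants(ref_aln, qry_aln):
    """Count SNVs, insertions and deletions in a gapped pairwise alignment.

    Insertions and deletions are defined relative to ref_aln.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    counts = {"snv": 0, "insertion": 0, "deletion": 0}
    previous = None
    for r, q in zip(ref_aln, qry_aln):
        if r == "-" and q == "-":
            state = None          # empty column, carries no information
        elif r == "-":
            state = "insertion"   # base present only in the query
        elif q == "-":
            state = "deletion"    # base present only in the reference
        elif r != q:
            state = "snv"
        else:
            state = None
        # Every mismatching column is an SNV; a gap run counts once.
        if state == "snv" or (state is not None and state != previous):
            counts[state] += 1
        previous = state
    return counts


print(count_variants("ACGT--ACGTA", "ACCTGGAC--A"))
```

In the example alignment, the third column is a substitution, the two-column gap in the reference is a single insertion, and the two-column gap in the query is a single deletion.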
To evaluate the impact of the sequencing platform on metagenomic read mapping, the researchers analyzed the distribution of genome assemblies from the BGISEQ-500 and HiSeq 2000 in a cohort of healthy Chinese individuals. The results showed no significant difference between the assemblies from the two platforms, indicating that the choice of platform does not significantly affect sequence mapping in metagenomic data analysis.
In conclusion, the study indicates that the DNBSEQ platform, particularly the BGISEQ-500 sequencer, is a viable substitute for Illumina platforms in bacterial genome assembly. Across the comparisons, genome assemblies from the two platforms were closely similar, supporting the use of the DNBSEQ platform to meet the increasing demand for sequencing of cultivated bacterial genomes.
These findings give researchers an alternative platform that offers low sequencing error rates and high-quality assemblies. As the demand for bacterial genome sequencing continues to rise, the DNBSEQ platform could support further work in microbiology and human health.