Three Papers, Including One in Nature, Report New Applications of SMRT Sequencing
生物通 · 2015/11/18
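Pacific Biosciences has been riding high lately: it launched a new system, which sent its share price sharply higher. Meanwhile, its single-molecule real-time (SMRT) sequencing technology has contributed to several plant and animal genome studies. These results, recently published in several journals, showcase the distinctive strengths of SMRT sequencing.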


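The latest issue of Nature published a near-complete draft genome of Oropetium thomaeum, a drought-tolerant grass that can resume growth when it receives water after extreme drought. Researchers at the Donald Danforth Plant Science Center and their collaborators used the PacBio RS II sequencing system to sequence its 245 Mb genome at 72-fold coverage.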

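According to the report, the sequencing took less than one week and reagent costs were under US$10,000. The resulting assembly covers 99% of the genome and includes telomere and centromere sequences, long terminal repeat retrotransposons, tandemly duplicated genes, and other hard-to-access genomic elements, with an accuracy above 99.999%. The plant was sequenced through PacBio's "Most Interesting Genome in the World" grant program, to help scientists determine the biological mechanisms behind the extreme drought tolerance of Oropetium thomaeum.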

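The authors note that the contiguity of the assembled sequence sets it apart from draft genomes produced by short-read sequencers. "Most NGS-based genomes contain tens of thousands of short contigs distributed across thousands of scaffolds." Such assemblies lack biologically meaningful sequences, including regulatory regions and transposons. By contrast, SMRT sequencing yielded a near-complete Oropetium genome.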
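
Assembly contiguity of this kind is commonly summarized with the N50 statistic: the contig length at which contigs of that length or longer account for at least half of the total assembly. The Nature abstract reports an N50 of 2.4 Mb across 625 contigs. As a minimal sketch, assuming hypothetical toy contig lengths (not the actual Oropetium data) and an illustrative function name `n50`, the statistic could be computed in Python roughly like this:

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (in bases)."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(contig_lengths) / 2
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assemblies, for illustration only (not real data).
fragmented = [2_000, 3_000, 4_000, 5_000, 6_000]
contiguous = [1_000, 1_000, 18_000]

print(n50(fragmented))
print(n50(contiguous))
```

Both toy assemblies total 20 kb, but the second has a much higher N50 because most of its sequence sits in a single long contig. That is the sense in which a long-read assembly is described as more contiguous than a fragmented short-read draft.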

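In another paper, published in Proceedings of the National Academy of Sciences (PNAS), researchers used SMRT sequencing to study a previously intractable region of the Drosophila melanogaster Y chromosome. The authors discovered a new gene, FDY, which originated from a different chromosome. The region containing this gene spans 55 kb and contains pseudogenes, transposons, and highly repetitive sequences. "PacBio produced a nearly error-free assembly of the FDY region, something that years of our hard work had failed to achieve," they wrote.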

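In Genome Biology, another paper reports on the Iso-Seq sequencing method. The method produces full-length transcripts that can be used to predict and validate gene models and to annotate genomes. Using sugar beet as an example, the researchers validated more than 2,000 existing genes and identified 665 novel gene structures. Gene model prediction based on Iso-Seq data improved the sugar beet genome annotation by 17%.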

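"These papers add further evidence that SMRT sequencing can produce high-quality, complete genomes," said Jonas Korlach, chief scientific officer of PacBio. "As valuable resources for the plant and animal research communities, these genomes are rapidly advancing our understanding of important mechanisms in complex organisms."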

Related papers
  • Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum

    Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a ‘near-complete’ draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. The Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.

  • Birth of a new gene on the Y chromosome of Drosophila melanogaster

    Contrary to the pattern seen in mammalian sex chromosomes, where most Y-linked genes have X-linked homologs, the Drosophila X and Y chromosomes appear to be unrelated. Most of the Y-linked genes have autosomal paralogs, so autosome-to-Y transposition must be the main source of Drosophila Y-linked genes. Here we show how these genes were acquired. We found a previously unidentified gene (flagrante delicto Y, FDY) that originated from a recent duplication of the autosomal gene vig2 to the Y chromosome of Drosophila melanogaster. Four contiguous genes were duplicated along with vig2, but they became pseudogenes through the accumulation of deletions and transposable element insertions, whereas FDY remained functional, acquired testis-specific expression, and now accounts for ∼20% of the vig2-like mRNA in testis. FDY is absent in the closest relatives of D. melanogaster, and DNA sequence divergence indicates that the duplication to the Y chromosome occurred ∼2 million years ago. Thus, FDY provides a snapshot of the early stages of the establishment of a Y-linked gene and demonstrates how the Drosophila Y has been accumulating autosomal genes.

  • Exploiting single-molecule transcript sequencing for eukaryotic gene prediction

    We develop a method to predict and validate gene models using PacBio single-molecule, real-time (SMRT) cDNA reads. Ninety-eight percent of full-insert SMRT reads span complete open reading frames. Gene model validation using SMRT reads is developed as an automated process. Optimized training and prediction settings and mRNA-seq noise reduction of assisting Illumina reads result in increased gene prediction sensitivity and precision. Additionally, we present an improved gene set for sugar beet (Beta vulgaris) and the first genome-wide gene set for spinach (Spinacia oleracea). The workflow and guidelines are a valuable resource to obtain comprehensive gene sets for newly sequenced genomes of non-model eukaryotes.
