
CN117940582A - Double-end sequencing method and composition - Google Patents


Info

Publication number
CN117940582A
Authority
CN
China
Prior art keywords
sequencing
primer
strand
nucleic acid
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280059251.XA
Other languages
Chinese (zh)
Inventor
J·哈尼斯
R·H·瑞美
王琳
F·布洛克
K·帕特森
B·A·罗曼
A·斯帕克斯
E·霍罗维兹
J·威尔森
M-J·R·沈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pacific Biosciences of California Inc
Original Assignee
Pacific Biosciences of California Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pacific Biosciences of California Inc
Priority claimed from PCT/US2022/036374 (WO2023283347A1)
Publication of CN117940582A
Legal status: Pending

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention provides methods and compositions for performing nucleic acid sequencing, particularly paired-end sequencing. The methods use a concatemeric sequencing template that can be generated by rolling circle amplification of an asymmetric circular nucleic acid having a central double-stranded region that comprises the target nucleic acid sequence and whose strands are joined at each end to form a circular construct.

Description

Paired-End Sequencing Methods and Compositions

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a non-provisional utility patent application claiming priority to and benefit of the following prior provisional patent applications: USSN 63/246,188, entitled “PAIRED-END SEQUENCING METHODS AND COMPOSITIONS,” filed by Jeremiah Hanes et al. on September 20, 2021, and USSN 63/219,738, entitled “PAIRED-END SEQUENCING METHODS AND COMPOSITIONS,” filed by Jeremiah Hanes et al. on July 8, 2021. Each of these applications is incorporated herein by reference in its entirety for all purposes.

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT

Not applicable.

BACKGROUND

Advances in nucleic acid sequencing technology have driven progress in numerous fields. The ability to determine the sequence of DNA and RNA molecules quickly and reliably has enabled many advances in molecular biology, evolutionary biology, medical diagnostics, and molecular medicine, among many other fields.

However, with certain sequencing-by-synthesis or sequencing-by-binding technologies, the amount of sequence data that can be reliably obtained may be limited to a relatively small number of bases. Although short sequence reads can be very useful in applications such as SNP analysis and genotyping, in many cases it is advantageous to reliably obtain further sequence data from the same template molecule. To this end, paired-end (or pairwise) sequencing techniques have been used, particularly in the context of whole-genome shotgun sequencing. Paired-end sequencing allows two sequence "reads" to be determined from two positions on a single polynucleotide duplex. Knowing that the paired-end sequences occur on a single duplex, and are therefore linked or paired in the genome, can greatly assist in assembling whole-genome sequence into a consensus sequence. The additional information obtained from paired-end sequencing can also benefit other applications, for example those involving sequencing of cell-free DNA, such as detection of circulating tumor DNA and prenatal cell-free DNA screening.

Provided herein are novel and useful compositions and methods for performing paired-end sequencing. These compositions and methods provide advantages for a variety of sequencing methods, e.g., stepwise sequencing methods.

SUMMARY OF THE INVENTION

One general class of embodiments provides methods for paired-end sequencing. In the methods, a nucleic acid concatemer is provided that comprises multiple sequential copies of: a first adapter region, a forward strand of a target nucleic acid sequence, a second adapter region that differs from the first adapter region, and a reverse strand of the target nucleic acid sequence that is complementary to the forward strand. A sequencing process is performed to produce a paired-end read of the target nucleic acid sequence by hybridizing a first sequencing primer to the first adapter region and obtaining a first read of a first portion of the target nucleic acid sequence by sequencing from the first sequencing primer, and by hybridizing a second sequencing primer to the second adapter region and obtaining a second read of a second portion of the target nucleic acid sequence by sequencing from the second sequencing primer. The first read and the second read together constitute the paired-end read of the target nucleic acid sequence. Typically, sequencing from the first sequencing primer is completed before sequencing from the second sequencing primer begins, but in some embodiments sequencing from the first and second primers is alternated or simultaneous.
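The read geometry described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the method as claimed: it assumes the elements of the repeat unit are listed 5′ to 3′, that a primer annealed to an adapter is extended along the template toward the template's 5′ end, and that neither adapter sequence occurs within the target. The function name `paired_end_reads` and all sequences are hypothetical.

```python
# Hypothetical sketch: paired-end reads primed from the two adapter regions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(adapter1, forward, adapter2, read_len, copies=3):
    """Return (first_read, second_read) primed from adapter1 and adapter2."""
    reverse = revcomp(forward)
    # Assumed 5'->3' layout of one repeat unit: A1 - forward - A2 - reverse.
    concatemer = (adapter1 + forward + adapter2 + reverse) * copies

    def read_from(adapter):
        # Use the last copy of the adapter so template exists on its 5' side.
        site = concatemer.rfind(adapter)
        template = concatemer[site - read_len:site]
        # The nascent strand is the reverse complement of the template it copies.
        return revcomp(template)

    return read_from(adapter1), read_from(adapter2)


first, second = paired_end_reads(
    adapter1="TTGACTCAGG", forward="ATGGCGTACGTTAGC",
    adapter2="CCATCGTAAC", read_len=5,
)
print(first, second)
```

Under these assumptions the first read matches the start of the forward strand and the second read matches the start of the reverse strand, so the two reads face inward from opposite ends of the target.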

In some embodiments, the nucleic acid concatemer is produced by providing a circular nucleic acid molecule and performing rolling circle amplification using the circular nucleic acid molecule as a template. The circular nucleic acid molecule comprises a central region that includes the forward strand and the complementary reverse strand of the target nucleic acid sequence. The central region, which is typically double-stranded, has two ends. The forward strand is joined to the reverse strand at one end by a first linking region and at the other end by a second linking region. The first and second linking regions differ from each other and are the complements of the adapter regions in the resulting concatemer. The circular nucleic acid molecule and the nucleic acid concatemer are typically, but not necessarily, DNA molecules.
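The relationship between the circular template and the concatemer can be checked with a short Python sketch. It is a simplified string model, assuming an idealized polymerase that copies the circle an exact number of times starting at an arbitrary fixed point; the function names and the toy sequences are hypothetical.

```python
# Hypothetical sketch: rolling circle amplification of an asymmetric circle.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def circular_template(forward, link1, link2):
    """Linearized 5'->3' view of the circle: forward - link1 - reverse - link2."""
    return forward + link1 + revcomp(forward) + link2


def rolling_circle_product(circle, copies):
    """Copying the circle `copies` times yields repeats of its reverse complement."""
    return revcomp(circle * copies)


forward, link1, link2 = "ATGGCC", "AAGT", "CCTA"
circle = circular_template(forward, link1, link2)
concatemer = rolling_circle_product(circle, copies=3)

# Each repeat unit carries the complements of the two linking regions
# (the adapter regions) separated by the forward and reverse strands.
repeat_unit = revcomp(link2) + forward + revcomp(link1) + revcomp(forward)
print(concatemer == repeat_unit * 3)
```

In this model the two adapter regions of the concatemer are exactly the reverse complements of the two linking regions, and they differ from each other because the linking regions differ.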

In one class of embodiments, rolling circle amplification is performed in solution. The resulting concatemer can then be bound to a surface, e.g., within an ordered or disordered array of other concatemers produced using the methods. The position of a given concatemer in the array can be predetermined or random. Optionally, the primer that is extended in the rolling circle amplification reaction comprises a first member of a binding pair (e.g., biotin), and the surface bears a second member of the binding pair (e.g., avidin or streptavidin). In another class of embodiments, the rolling circle amplification step is performed on the surface of a solid support. For example, the primer that is extended in the rolling circle amplification reaction can be bound to the surface, e.g., within an ordered or disordered array of identical primers, before or after it binds the circular nucleic acid molecule and before the amplification reaction begins. The primer can be covalently or non-covalently bound to the surface (e.g., through a binding pair as described above).

In one class of embodiments, sequencing from the first sequencing primer and from the second sequencing primer is performed simultaneously. A first set of detectable signals produced by sequencing from the first sequencing primer can be distinguished from a second set of detectable signals produced by sequencing from the second sequencing primer, e.g., based on their intensities. A difference in signal intensity between the first and second sequencing processes can be produced, for example, by providing the first and second sequencing primers at different concentrations, by providing one of the sequencing primers as a mixture of extendable and non-extendable oligonucleotides, and/or by using first and second sequencing primers that anneal to their respective adapter regions with significantly different efficiencies. Mapping to a known reference sequence can facilitate determination of the first read and the second read.
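How two simultaneous reads might be separated by intensity can be illustrated with a toy Python model. It assumes idealized, noise-free per-base signals in which one primer contributes a strong signal and the other a weaker one at every cycle; the intensity values, the tolerance, and the function names are hypothetical and are not taken from the source.

```python
# Hypothetical sketch: separating two simultaneous reads by signal intensity.
BASES = "ACGT"


def simulate_cycles(read1, read2, high=1.0, low=0.4):
    """Build per-cycle base intensities as the sum of a strong and a weak read."""
    cycles = []
    for base1, base2 in zip(read1, read2):
        signal = dict.fromkeys(BASES, 0.0)
        signal[base1] += high
        signal[base2] += low
        cycles.append(signal)
    return cycles


def split_reads(cycles, high=1.0, low=0.4, tol=0.15):
    """Assign the strongest base to the strong read and infer the weak read."""
    strong, weak = [], []
    for signal in cycles:
        ranked = sorted(signal, key=signal.get, reverse=True)
        top = ranked[0]
        strong.append(top)
        if abs(signal[top] - (high + low)) <= tol:
            # Both primers reported the same base, so the intensities add up.
            weak.append(top)
        else:
            weak.append(ranked[1])
    return "".join(strong), "".join(weak)


reads = split_reads(simulate_cycles("ACGT", "AGGA"))
print(reads)
```

Real signals are noisy, which is one reason mapping candidate reads to a known reference sequence can help resolve ambiguous cycles.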

Although in some embodiments sequencing from the first and second sequencing primers is alternated or simultaneous, more typically sequencing from the first sequencing primer is completed before sequencing from the second sequencing primer begins. In some embodiments in which sequencing from the second primer follows sequencing from the first primer, the nascent strands formed by sequencing from the first sequencing primer are removed after the first read is obtained, e.g., before the second sequencing primer is hybridized to the second adapter region on the concatemer. The nascent strands can be removed by any suitable process, such as cleavage and washing, exonuclease digestion, or denaturation. In other embodiments in which sequencing from the second primer follows sequencing from the first primer, the nascent strands formed by sequencing from the first primer are not removed. For example, the nascent strands can be blocked so that they do not interfere with sequencing from the second primer. Thus, in some embodiments, the 3′ ends of the nascent strands formed by sequencing from the first primer are blocked before sequencing from the second sequencing primer is performed, typically before the second sequencing primer is hybridized to the second adapter region.

Many suitable sequencing processes are known in the art and can be applied to the practice of the methods. For example, sequencing from the first and second sequencing primers can involve sequencing by incorporation, sequencing by ligation, or sequencing by hybridization techniques. In a preferred class of embodiments, sequencing from the first and second sequencing primers comprises sequencing by binding. During sequencing, the first and second sequencing primers are optionally extended using a strand-displacing polymerase. Sequencing can be performed in the presence of a single-stranded binding protein.

In some embodiments, e.g., those in which a polymerase lacking strand displacement activity is used, a first masking strand complementary to the forward strand is synthesized before sequencing from the first sequencing primer, and/or a second masking strand complementary to the reverse strand is synthesized before sequencing from the second sequencing primer. Typically, the first masking strand complementary to the forward strand is produced before the first sequencing primer is hybridized to the first adapter region, and the second masking strand complementary to the reverse strand is produced before the second sequencing primer is hybridized to the second adapter region. Alternatively, the first and second sequencing primers can be blocked at their 3′ ends with a reversible terminator; the first primer can then be hybridized to the first adapter region before synthesis of the first masking strand, and the second primer can be hybridized to the second adapter region before synthesis of the second masking strand. The reversible terminator is removed before sequencing from the primer begins.

The nucleic acid concatemer typically comprises many copies of its repeat unit, e.g., at least the number of copies needed to provide a detectable signal in the sequencing technology to be used. For example, the nucleic acid concatemer can comprise at least 10 sequential copies of the first adapter region, the forward strand, the second adapter region, and the reverse strand, e.g., at least 50, at least 100, at least 500, at least 1000, at least 5000, or at least 10,000 sequential copies. In some embodiments, the nucleic acid concatemer can comprise 50 to 20,000 sequential copies of the first adapter region, the forward strand, the second adapter region, and the reverse strand, e.g., 50 to 10,000 or 100 to 5,000 sequential copies. Those of skill in the art will understand that the number of copies used can vary, e.g., depending on the type of analysis being performed. Where the repeat unit is short (e.g., fewer than a few hundred bases), the copy number can be higher than where the repeat unit is long (e.g., thousands to tens of thousands of bases). One consideration is the total molecular weight of the concatemer produced. Those of skill in the art will understand how to control the copy number and total molecular weight of the concatemer for the analysis they are performing.
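The trade-off between repeat unit length, copy number, and total molecular weight can be made concrete with a rough calculation. The Python sketch below assumes an average mass of about 330 g/mol per nucleotide of single-stranded DNA, a common rule of thumb rather than a value from the source; the function names and example lengths are hypothetical.

```python
# Hypothetical sketch: relating copy number to concatemer size and mass.
AVG_NT_MASS = 330.0  # g/mol per nucleotide; rule-of-thumb assumption


def concatemer_size(insert_len, adapter1_len, adapter2_len, copies):
    """Return (repeat unit length, total nucleotides, approximate mass in g/mol)."""
    # One repeat unit holds both adapters plus the forward and reverse strands.
    unit_len = adapter1_len + insert_len + adapter2_len + insert_len
    total_nt = unit_len * copies
    return unit_len, total_nt, total_nt * AVG_NT_MASS


def copies_for_mass(insert_len, adapter1_len, adapter2_len, target_mass):
    """Return the largest whole copy number that stays within target_mass."""
    unit_len = adapter1_len + insert_len + adapter2_len + insert_len
    return int(target_mass // (unit_len * AVG_NT_MASS))


short_unit = concatemer_size(200, 40, 40, copies=1000)
long_copies = copies_for_mass(5000, 40, 40, target_mass=short_unit[2])
print(short_unit, long_copies)
```

At a fixed total mass, the 5,000-base insert in this example supports far fewer copies than the 200-base insert, which mirrors the inverse relationship between repeat unit length and copy number described above.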

Another general class of embodiments provides methods for nucleic acid sequencing. In the methods, a nucleic acid concatemer is provided that comprises multiple sequential copies of: a first adapter region, a forward strand of a target nucleic acid sequence, a second adapter region that differs from the first adapter region, and a reverse strand of the target nucleic acid sequence that is complementary to the forward strand. A sequencing process is performed to determine the sequence of at least a portion of the target nucleic acid (e.g., a portion of the forward strand, a portion of the reverse strand, or both). In one class of embodiments, a masking primer is hybridized to the second adapter region and extended (e.g., with a strand-displacing polymerase) to produce a first masking strand complementary to the forward strand. The first masking strand is not also complementary to the entire first adapter region. A first sequencing primer is hybridized to the first adapter region, and a first read of a first portion of the target nucleic acid sequence is obtained by sequencing from the first sequencing primer. Sequencing from the first sequencing primer is performed after the first masking strand is produced. Synthesis of the first masking strand typically, but not necessarily, occurs before the first sequencing primer is hybridized to the first adapter region.

A variety of strategies can be used to ensure that the first masking strand is not complementary to the entire first adapter region, by stopping extension at a suitable position. In one class of embodiments, an oligonucleotide that blocks strand displacement is hybridized to the first adapter region before the masking primer is extended. In one class of embodiments, the first adapter region comprises at least one non-natural nucleotide, and the masking primer is extended under conditions that exclude the complement of the at least one non-natural nucleotide. In one class of embodiments, a nick is introduced into the first adapter region before the masking primer is extended. The resulting fragments can be kept in proximity, for example, by hybridizing one end of a staple oligonucleotide to the first adapter region and the other end of the staple oligonucleotide to the second adapter region. In certain embodiments, a single oligonucleotide can serve as both a sequencing primer and a staple oligonucleotide. Thus, optionally, the 5′ end of the first sequencing primer hybridizes to the second adapter region and/or the 5′ end of the masking primer hybridizes to the first adapter region.

Where a paired-end read of the target nucleic acid sequence is desired, any of a variety of approaches can be used to obtain the second read. The second read can be obtained from the other strand of the target within the concatemer. Thus, in one class of embodiments, after sequencing from the first sequencing primer is complete, the nascent strand produced by sequencing from the first sequencing primer is further extended to produce a second masking strand. The second masking strand is complementary to the reverse strand but not to the entire second adapter region. The first masking strand is removed. A second sequencing primer is hybridized to the second adapter region, and a second read of a second portion of the target nucleic acid sequence is obtained by sequencing from the second sequencing primer. In some embodiments, the masking primer comprises a 5′ phosphate group, and the first masking strand is removed by digestion with lambda exonuclease.

Alternatively, the second read can be obtained from the strand produced by extending the first sequencing primer rather than directly from the concatemer. Thus, in one class of embodiments, the first sequencing primer comprises a 5′ region and a 3′ region that each hybridize to the first adapter region and that flank a central region that does not hybridize to the first adapter region. A portion of the first adapter region remains single-stranded when the first sequencing primer is hybridized to the first adapter region. The nascent strand produced by sequencing from the first sequencing primer is further extended to produce a first extended strand complementary to the reverse strand. This first extended strand typically is not also complementary to the entire second adapter region. A displacement primer is hybridized to the single-stranded portion of the first adapter region and extended with a strand-displacing polymerase to displace the first extended strand from the reverse strand. The 5′ region of the first extended strand remains hybridized to the first adapter region after the remainder of the extended strand is displaced. A second sequencing primer is hybridized to the first extended strand, and a second read of a second portion of the target nucleic acid sequence is obtained by sequencing from the second sequencing primer.

Essentially all of the features described above apply to these embodiments as well, as relevant, e.g., with respect to the copy number of the repeat unit in the concatemer and the like.

In some embodiments, the nucleic acid concatemer is produced by providing a circular nucleic acid molecule and performing rolling circle amplification using the circular nucleic acid molecule as a template. The circular nucleic acid molecule comprises a central region that includes the forward strand and the complementary reverse strand of the target nucleic acid sequence. The central region, which is typically double-stranded, has two ends. The forward strand is joined to the reverse strand at one end by a first linking region and at the other end by a second linking region. The first and second linking regions differ from each other and are the complements of the adapter regions in the resulting concatemer. The circular nucleic acid molecule and the nucleic acid concatemer are typically, but not necessarily, DNA molecules.

In one class of embodiments, rolling circle amplification is performed in solution. The resulting concatemer can then be bound to a surface, e.g., within an ordered or disordered array of other concatemers produced using the methods. The position of a given concatemer in the array can be predetermined or random. Optionally, the primer that is extended in the rolling circle amplification reaction comprises a first member of a binding pair (e.g., biotin), and the surface bears a second member of the binding pair (e.g., avidin or streptavidin). In another class of embodiments, the rolling circle amplification step is performed on the surface of a solid support. For example, the primer that is extended in the rolling circle amplification reaction can be bound to the surface, e.g., within an ordered or disordered array of identical primers, before or after it binds the circular nucleic acid molecule and before the amplification reaction begins. The primer can be covalently or non-covalently bound to the surface (e.g., through a binding pair as described above).

Many suitable sequencing processes are known in the art and can be applied to the practice of the methods. For example, sequencing from the first sequencing primer and optionally from the second sequencing primer can involve sequencing by incorporation, sequencing by ligation, or sequencing by hybridization techniques. In a preferred class of embodiments, sequencing from the first sequencing primer and optionally from the second sequencing primer comprises sequencing by binding. During sequencing, the first and second sequencing primers are optionally extended using a polymerase that lacks strand displacement activity or has weak strand displacement activity.

Compositions, systems, and kits related to, produced by, or used in the methods are also features of the invention. For example, one general class of embodiments provides a composition comprising an array of nucleic acid concatemers. The concatemers are bound to a surface, e.g., with different concatemers at different sites in an ordered arrangement, or with different concatemers at different, randomly distributed sites in a disordered array. The position of a given concatemer in the array can be predetermined or random. Each concatemer comprises multiple copies of: a first adapter region that includes a first sequencing primer binding site, a forward strand of a target nucleic acid sequence, a second adapter region that differs from the first adapter region, and a reverse strand of the target nucleic acid sequence that is complementary to the forward strand. The second adapter region can include a second sequencing primer binding site whose sequence differs from that of the first sequencing primer binding site.

The concatemers can be covalently or non-covalently bound to the surface. For example, each concatemer can include a first member of a binding pair (e.g., biotin) that is bound to a second member of the binding pair (e.g., avidin or streptavidin), which in turn is bound to the surface.

In some embodiments, a first sequencing primer is hybridized to the first sequencing primer binding site. In some embodiments, a second sequencing primer is hybridized to the second sequencing primer binding site. The composition can include nascent strands produced by extension of the first sequencing primer and/or the second sequencing primer. The nascent strands produced by extension of the first sequencing primer are optionally blocked. The composition optionally also includes: a polymerase (e.g., a strand-displacing polymerase or a polymerase lacking strand displacement activity), one or more nucleotides (e.g., naturally occurring nucleotides, non-natural nucleotides, labeled nucleotides, reversible terminator nucleotides, and/or chain-terminating nucleotides), a masking primer, a blocking oligonucleotide, a displacement primer, a masking strand, a displaced strand, one or more staple oligonucleotides, and/or other reagents used in the sequencing process. The composition is optionally present in a nucleic acid sequencing system.

Essentially all of the features described above apply to these embodiments as well, as relevant, e.g., with respect to the number of concatemers in the array, the copy number of the repeat unit in the concatemers, inclusion in the composition of a nuclease for removing nascent or masking strands, inclusion in the composition of a single-stranded binding protein, suitable array substrates, and the like.

Another general class of embodiments provides a kit that includes a solid support configured to bind a plurality of nucleic acid concatemers, a first stem-loop adapter, a second stem-loop adapter that differs from the first stem-loop adapter, reagents for performing rolling circle amplification (e.g., a rolling circle amplification primer, a strand-displacing polymerase, and one or more nucleotides), a first sequencing primer, optionally a second sequencing primer, and reagents for performing nucleic acid sequencing (e.g., a polymerase and one or more nucleotides, typically including one, two, three, or four labeled nucleotides). The polymerases used for rolling circle amplification and for sequencing are typically different polymerases but can be the same in some embodiments. The kit can also include additional reagents for producing the circular nucleic acid molecule, performing rolling circle amplification to produce the concatemers, and performing nucleic acid sequencing, including but not limited to buffered reaction solutions, a masking primer, a blocking oligonucleotide, a displacement primer, one or more staple oligonucleotides, a site-specific endonuclease, and/or an exonuclease. The kit typically also includes instructions for using the components, e.g., for producing the circular nucleic acid molecule, performing rolling circle amplification to produce the concatemers, and performing nucleic acid sequencing. The components of the kit are packaged in one or more containers.

Essentially all of the features described above apply to these embodiments as well, as relevant, e.g., with respect to suitable array substrates, inclusion of a single-stranded binding protein, and the like.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 schematically illustrates an embodiment of a method for preparing nucleic acid concatemers of the invention and using them for paired-end sequencing. In this embodiment, the nucleic acid fragments produced during sequencing to generate the first read are removed before sequencing is performed to generate the second read.

Figure 2 schematically illustrates an embodiment of a method for preparing nucleic acid concatemers of the invention and using them for paired-end sequencing. In this embodiment, the nucleic acid fragments produced during sequencing to generate the first read are blocked and left in place while sequencing is performed to generate the second read.

Figure 3 schematically illustrates an embodiment of a method for preparing nucleic acid concatemers of the invention and using them for paired-end sequencing. In this embodiment, a statistical mixture of symmetric and asymmetric circular nucleic acid constructs is used to produce the nucleic acid concatemers that are sequenced.

Figure 4 schematically illustrates the preparation of circular nucleic acid molecules and their use as rolling circle amplification templates to produce nucleic acid concatemers useful in the methods.

Figure 5 schematically illustrates an embodiment of a method for preparing nucleic acid concatemers of the invention and using them for nucleic acid sequencing. In this embodiment, a masking strand is synthesized to facilitate generation of the sequence read.

Figures 6A and 6B schematically illustrate an embodiment of a method for preparing nucleic acid concatemers of the invention and using them for paired-end sequencing. In this embodiment, masking strands are used to facilitate generation of the first read and the second read.

Figures 7A and 7B schematically illustrate an embodiment of a method for preparing nucleic acid concatemers of the invention and using them for paired-end sequencing. In this embodiment, the nascent strand produced during the first sequencing process is extended and displaced, and the second read is obtained from this extended strand.

The schematic figures are not necessarily drawn to scale.

DETAILED DESCRIPTION

Unless otherwise indicated, the practice of the present invention may employ conventional techniques and descriptions of organic chemistry, polymer technology, molecular biology (including recombinant techniques), cell biology, biochemistry, and immunology that are within the skill of the art. Such conventional techniques include polymer array synthesis, hybridization, ligation, phage display, and detection of hybridization using a label. Specific illustrations of suitable techniques can be found in the examples below. Other equivalent conventional procedures can, of course, also be used. Such conventional techniques and descriptions can be found in standard laboratory manuals such as Genome Analysis: A Laboratory Manual Series (Vols. I to IV), Using Antibodies: A Laboratory Manual, Cells: A Laboratory Manual, PCR Primer: A Laboratory Manual, and Molecular Cloning: A Laboratory Manual (all from Cold Spring Harbor Laboratory Press); Stryer, L. (1995) Biochemistry (4th Ed.), Freeman, New York; Gait, Oligonucleotide Synthesis: A Practical Approach, 1984, IRL Press, London; Nelson and Cox (2000), Lehninger Principles of Biochemistry, 3rd Ed., W.H. Freeman Pub., New York, NY; and Berg et al. (2002) Biochemistry, 5th Ed., W.H. Freeman Pub., New York, NY, all of which are incorporated herein by reference in their entirety for all purposes.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. The following definitions supplement those in the art and are directed to the current application; they are not to be imputed to any related or unrelated case, e.g., to any commonly owned patent or application. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred materials and methods are described herein. Accordingly, the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. Various additional terms are defined or otherwise characterized herein.

Note that, as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "a polymerase" refers to one agent or a mixture of such agents, reference to "the method" includes reference to equivalent steps and methods known to those skilled in the art, and so forth.

Where a range of values is provided, it is understood that each intervening value between the upper and lower limits of that range (to the tenth of the unit of the lower limit, unless the context clearly dictates otherwise), and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.

As used herein, the term "about" indicates that the value of a given quantity varies by +/-10% of the stated value, or optionally by +/-5% of the stated value, or in some embodiments by +/-1% of the stated value.
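The definition of "about" can be read as a simple tolerance check. The following Python sketch is illustrative only; the function name `within_about` and its default tolerance are assumptions introduced here and do not appear in the specification.

```python
def within_about(measured: float, stated: float, tolerance: float = 0.10) -> bool:
    """Return True if `measured` lies within +/- `tolerance` of `stated`.

    `tolerance` is a fraction of the stated value: 0.10 for +/-10%,
    0.05 for +/-5%, 0.01 for +/-1% (hypothetical parameter name).
    """
    return abs(measured - stated) <= tolerance * abs(stated)


# A measured value of 108 checked against "about 100" at two tolerances.
print(within_about(108, 100))        # checked at +/-10%
print(within_about(108, 100, 0.05))  # checked at +/-5%
```

Which tolerance applies in a given embodiment is determined by the text, not by this sketch.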

As used herein, the term "comprising" is intended to mean that the compositions and methods include the recited elements but do not exclude others. "Consisting essentially of," when used to define compositions and methods, shall mean excluding other elements of any essential significance to the compositions and methods. "Consisting of" shall mean excluding more than trace elements of other ingredients for the claimed compositions and substantial method steps. Embodiments defined by each of these transition terms are within the scope of this invention. Accordingly, it is intended that the methods and compositions can include additional steps and components (comprising), or alternatively include steps and components of no significance (consisting essentially of), or alternatively include only the stated method steps or compositions (consisting of).

"Nucleic acid," "polynucleotide," "oligonucleotide," or grammatical equivalents herein mean at least two nucleotides covalently linked together. A nucleic acid of the present invention will generally contain phosphodiester bonds, although in some cases nucleic acid analogs are included that may have alternative backbones, including, for example, phosphoramide, phosphorothioate, phosphorodithioate, and peptide nucleic acid (PNA) backbones and linkages. Other nucleic acid analogs include those with positive backbones, non-ionic backbones, and non-ribose backbones, including those described in U.S. Patent Nos. 5,235,033 and 5,034,506. Nucleic acids can also have other modifications, such as the inclusion of heteroatoms, the attachment of labels (such as dyes), or substitution with functional groups that still allow base pairing and recognition by polymerases or other enzymes. Thus, the term "nucleic acid" encompasses any physical string of monomer units that can correspond to a string of nucleotides, including a polymer of nucleotides (e.g., a typical DNA or RNA polymer), modified oligonucleotides (e.g., oligonucleotides comprising nucleotides that are not typical of biological RNA or DNA, such as 2′-O-methylated oligonucleotides), and the like.

The term "single-stranded" can refer, depending on context, to a single polymeric nucleic acid strand, or to a region within a nucleic acid polymer strand that is not base paired with a complementary region in the same strand or in a different strand. Similarly, the term "double-stranded" can refer to two polymer strands hybridized to each other, or to a double-helical region in which at least one of the participating strands also includes other portions that are not base paired, as will be clear from the context.

A "nucleotide sequence" is a polymer of nucleotides (an oligonucleotide, a DNA, a nucleic acid, etc.) or a character string representing a nucleotide polymer, depending on context. In some cases the term "nucleotide sequence" refers to the actual sequence of bases in a nucleic acid, and in other cases it refers to a measured or determined sequence.

"Sequencing" a strand of a target nucleic acid means determining the order and identity of the nucleotides comprising the strand and/or its complement, e.g., by generating a read of the strand using any of the various nucleic acid sequencing techniques known in the art. Because many such techniques employ a primer complementary to the strand to be sequenced (e.g., a primer that is extended as the target sequence is determined), sequencing is said to be performed "from" the primer.

A "target nucleic acid sequence" is a nucleic acid, or a region thereof, whose nucleotide sequence is to be determined in whole or in part. As is well known in the art, because a particular nucleic acid sequence encodes its complementary sequence, any target nucleic acid sequence can be expressed as a "forward" sequence (or strand) or as its complementary "reverse" sequence (or strand). Certain methods of the invention determine the nucleotide sequence of a portion of the forward strand and of a portion of the reverse strand (typically different portions); because the two strands are complementary, this process generally determines the nucleotide sequence of two portions of the forward strand and of two portions of the reverse strand (typically the two opposite ends of each). The terms "forward" and "reverse" are used herein in a relative rather than an absolute sense, to convey that one strand or sequence is the complement of the other; they are not intended to imply function (e.g., coding versus non-coding), location within a chromosome, order of amplification, or other such information.

A "read" of a nucleic acid sequence is a measured or inferred sequence of nucleotides or base pairs (or of nucleotide or base pair probabilities). A read can correspond to all or part of a single DNA fragment. A read typically provides the order and identity of the measured or inferred bases in a region of DNA, and can also include other information such as quality scores, probabilities, and the like.

Oligonucleotides used in the present disclosure that are designed to anneal specifically, under suitably stringent hybridization conditions, to a sufficiently complementary nucleic acid sequence in a target nucleic acid are sometimes referred to herein as "primers," and the sequences to which they anneal are referred to as "primer binding sites" (or "priming sites"). Thus, a primer that is "specific" for a primer binding site in a nucleic acid (e.g., a primer binding site in an adapter region) is a primer that includes a nucleic acid sequence that preferentially hybridizes to the primer binding site under suitably stringent hybridization conditions. Suitably stringent hybridization conditions are routinely determined by those of ordinary skill in the art. The nucleic acid sequence in a primer that hybridizes to a primer binding site is sometimes referred to herein as a "primer region" or a similar phrase. Primers often include regions that do not participate directly in hybridization to the primer binding site and that can have particular uses in the methods described herein. Primers can serve different functions, including but not limited to nucleic acid synthesis and/or capture of nucleic acids that contain their cognate primer binding sites. For example, a "capture primer" (or "capture oligonucleotide") can be used to isolate nucleic acids that contain a specific capture primer binding site (e.g., one present in an adapter or adapter region), and typically either includes a first member of a binding pair (e.g., a nucleic acid sequence, biotin, avidin, an antigen, an antibody or binding fragment thereof, etc.) or is attached directly to a solid support. As another example, a "synthesis primer" is a primer that can be used to prime nucleic acid synthesis by a nucleic acid polymerase (under nucleic acid synthesis conditions) and, in certain specific applications, can serve as a sequencing primer in sequencing by synthesis (SBS) or sequencing by binding (SBB) applications, or as a primer in a nucleic acid amplification reaction (e.g., a rolling circle amplification reaction).

As used herein, a "substantially identical" nucleic acid is one that has at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity to a reference nucleic acid sequence. The length of comparison is preferably the full length of the nucleic acid, but is generally at least 20 nucleotides, 30 nucleotides, 40 nucleotides, 50 nucleotides, 75 nucleotides, 100 nucleotides, 125 nucleotides, or more.
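As a rough illustration of how a percent identity threshold might be applied, the Python sketch below compares two sequences position by position. It assumes the sequences are already aligned, equal in length, and free of gaps; real identity calculations normally follow an alignment step, which is not shown. The function names and the default threshold are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions at which two pre-aligned, equal-length sequences match."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def substantially_identical(seq_a: str, seq_b: str, threshold: float = 80.0) -> bool:
    """Apply an identity threshold (e.g., 80, 90, or 95 percent) to the comparison."""
    return percent_identity(seq_a, seq_b) >= threshold


reference = "ACGTACGTAC"
candidate = "ACGTACGTAA"  # differs from the reference at the last position only
print(percent_identity(reference, candidate))
print(substantially_identical(reference, candidate, threshold=95.0))
```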

An "asymmetric nucleic acid" means a nucleic acid that has a different nucleic acid composition at its first end than at its second end. Asymmetric nucleic acids that are particularly useful in the context of the present invention are asymmetrically tagged, i.e., they have a first adapter attached to the first end and a second adapter attached to the second end, where the first adapter and the second adapter have at least one difference in nucleic acid composition. The difference in nucleic acid composition between the first adapter and the second adapter can be any desired difference, including but not limited to one or more nucleic acid sequence differences (e.g., substitutions, deletions, insertions, inversions, rearrangements (e.g., a different order of functional domains), or any combination thereof). In certain embodiments, and as detailed herein, the asymmetric nucleic acid includes a primer binding site for an amplification primer at one end but not at the second end.

As used herein, the term "nucleic acid concatemer" refers to a contiguous nucleic acid polymer strand (e.g., a single DNA strand) that contains multiple copies of the same nucleic acid sequence linked in tandem. Concatemers that are particularly useful in the context of the present invention include multiple tandem copies of a sequence that comprises a first adapter region, one strand of a target nucleic acid sequence, another adapter region that differs from the first adapter region, and the complementary strand of the target nucleic acid sequence. An "adapter region" is a nucleic acid sequence within the concatemer that links the 3′ end of a given strand of the target nucleic acid sequence to the 5′ end of the other strand of the target nucleic acid sequence. In embodiments in which stem-loop adapters are ligated to the ends of a double-stranded fragment to form the circular nucleic acid from which the concatemer is produced (e.g., by rolling circle amplification), the adapter regions are complementary to the sequences provided by the adapters (i.e., each adapter region is complementary to one of the "connecting regions" in the circular nucleic acid, which link the 3′ end of one strand of the target nucleic acid sequence to the 5′ end of the other strand within the circular nucleic acid).

A "binding pair" means any two moieties that bind specifically to each other under at least one binding condition. Members of binding pairs include, but are not limited to, complementary single-stranded nucleic acid sequences, biotin/avidin or biotin/streptavidin (as well as biotin/neutravidin, biotin/traptavidin, etc.), antigen/antibody, hapten/antibody (e.g., digoxigenin/anti-digoxigenin antibody), ligand/receptor, and the like. (Note that an antigen- or hapten-binding fragment of an antibody can be used in place of the whole antibody.)

A "linker" means a moiety that serves to attach a first functional element or moiety to another functional element or moiety. With respect to attaching nucleic acid domains to each other or to different moieties, a linker can be additional nucleotide residues (DNA, RNA, PNA, etc.), a peptide, a carbon chain, a polyethylene glycol spacer, and the like. The attachment of functional elements or moieties to the linker can be covalent or non-covalent. No limitation is intended in this regard.

A "strand-displacing nucleic acid polymerase," a "strand-displacing polymerase," and equivalents thereof mean a nucleic acid polymerase that has 5′ to 3′ template-dependent nucleic acid synthesis activity and 5′ to 3′ strand displacement activity. Thus, when such a polymerase encounters a double-stranded region of the template during nucleic acid synthesis, it displaces the non-template strand while continuing nucleic acid synthesis on the template strand. On a circular template (e.g., a template having a double-stranded insert with hairpin adapters at both ends, as shown in the figures), such a polymerase can enter rolling circle replication under suitable nucleic acid synthesis conditions. Although any suitable strand-displacing nucleic acid polymerase can be used, in certain embodiments the polymerase is a phi29 (Φ29) DNA polymerase or a modified version thereof. Where a modified recombinant Φ29 DNA polymerase is used, it can be homologous to a wild-type or exonuclease-deficient Φ29 DNA polymerase, e.g., as described in U.S. Patent Nos. 5,001,050, 5,198,543, or 5,576,204, the full disclosures of which are incorporated herein by reference in their entirety for all purposes. Alternatively, the modified recombinant DNA polymerase can be homologous to other Φ29-type DNA polymerases, such as B103, GA-1, PZA, Φ15, BS32, M2Y, Nf, G1, Cp-1, PRD1, PZE, SF5, Cp-5, Cp-7, PR4, PR5, PR722, L17, and the like. For nomenclature, see also Meijer et al. (2001) "Φ29 Family of Phages," Microbiology and Molecular Biology Reviews, 65(2):261-287. Exemplary suitable polymerases are described, for example, in U.S. Patent Nos. 8,420,366 and 8,257,954, both entitled "Generation of modified polymerases for improved accuracy in single molecule sequencing"; U.S. Patent Application Publication Nos. 2007-0196846, 2008-0108082, 2010-0075332, 2010-0093555, 2012-0034602, 2013-0217007, 2014-0094374, and 2014-0094375; and International Patent Application Nos. WO 2007/075987, WO 2007/075873, and WO 2007/076057, each of which is incorporated herein by reference in its entirety for all purposes. Many additional suitable strand-displacing polymerases are known in the art or are commercially available.

In the following description, numerous specific details are set forth to provide a more thorough understanding of the present invention. It will be apparent to one of skill in the art, however, that the present invention may be practiced without one or more of these specific details. In other instances, features and procedures well known to those skilled in the art have not been described, in order to avoid obscuring the invention.

Methods of paired-end sequencing

The present disclosure is generally directed to improved methods for nucleic acid sequencing, and in particular to improved methods for performing paired-end sequencing. Paired-end (or paired) sequencing generally involves determining the nucleotide sequence of a first region and a second region of a target sequence of interest (e.g., determining the sequence of the two opposite ends of a nucleic acid fragment). In some embodiments, the first and second regions that are sequenced are separated by a known or unknown stretch of nucleic acid that is not sequenced. In other embodiments, the first and second regions overlap, and the sequence of the full-length target region can be determined.

In some aspects, the invention provides methods for producing sequencing templates and for performing paired-end sequencing on those templates. The sequencing templates can be single nucleic acid strands, each having the following repeating structure: adapter region 1, forward nucleic acid strand of the target, adapter region 2, reverse nucleic acid strand of the target. These nucleic acids are concatemeric molecules. The repeating structure can occur, for example, hundreds to thousands of times in a concatemer. The concatemer is single-stranded in the sense that it is a single polymer strand; it will be evident that, under certain conditions, the strand can have secondary structure, e.g., self-complementary portions of the strand (e.g., the forward and reverse regions) can base pair to form double-stranded regions separated by regions that remain unpaired.
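The repeating structure of such a template can be sketched with plain strings. The Python example below is a toy model, not a description of any actual construct: the adapter and target sequences are invented placeholders, all sequences are written 5′ to 3′, and the function names are hypothetical. Because the template is a single strand, the reverse strand of the target appears in it as the reverse complement of the forward strand.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def repeat_unit(adapter1: str, forward: str, adapter2: str) -> str:
    """One repeat: adapter region 1, forward strand, adapter region 2, reverse strand."""
    return adapter1 + forward + adapter2 + reverse_complement(forward)


def concatemer(adapter1: str, forward: str, adapter2: str, copies: int) -> str:
    """A single strand carrying `copies` tandem repeats of the unit."""
    return repeat_unit(adapter1, forward, adapter2) * copies


# Placeholder sequences chosen only to make the structure easy to see.
unit = repeat_unit("AAAA", "ACGGT", "CCCC")
template = concatemer("AAAA", "ACGGT", "CCCC", copies=3)
print(unit)
print(len(template))
```

In this model the forward and reverse regions of each repeat are self-complementary, which is why the single strand can fold into double-stranded regions separated by unpaired adapter regions.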

Concatemeric nucleic acid sequencing templates for use in the invention can be produced by rolling circle amplification of molecules that have a central double-stranded region joined at each of its ends by a single-stranded loop, so as to form a circular molecule. As described herein, these types of topologically circular molecules can be produced, for example, by ligating two different hairpin (i.e., stem-loop) adapters to the two ends of a double-stranded DNA fragment. The hairpin regions at either end of the central self-complementary double-stranded region (e.g., the regions provided by the hairpin adapters) can be referred to as connector regions. For the present invention, the molecule preferably has a connector at one end of the central region that differs from the connector at the other end. In other words, the molecule is asymmetric. This asymmetry allows the molecules to form the preferred sequencing templates of the invention, in which adapter region 1 differs from adapter region 2 after, for example, rolling circle amplification of the circular molecule using an amplification primer complementary to one of the two connector regions. Concatemeric nucleic acid sequencing templates can also be produced by other techniques known in the art, e.g., by multiple displacement amplification (MDA) using random or construct-specific primers. For example, concatemers produced by rolling circle amplification of asymmetric circular nucleic acids can be used in a multiple displacement amplification reaction employing primers specific for the adapter-derived regions of the concatemer, to produce concatemers that can serve as sequencing templates.

The invention provides methods of sequencing concatemeric templates to obtain paired-end reads of a target sequence of interest. Sequencing can be performed in any suitable manner. The sequencing method can be a stepwise sequencing method, for example sequencing by synthesis (SBS), sequencing by binding (SBB), or sequencing by ligation. Because a concatemeric template of the invention contains a clonal population of the target nucleic acid sequence, the concatemeric structure provides multiple sites at which stepwise sequencing can proceed. A single concatemeric molecule carries a clonal population of the forward and reverse strands of a particular target region of interest. Sequencing can be performed by first introducing a sequencing primer that hybridizes to adapter region 1. Stepwise sequencing from this primer provides a sequence read of the first strand. The sequencing process produces a nascent nucleic acid strand complementary to the first strand, and can be carried out for a desired number of nucleotides. Typically the read includes a portion of the sequence of the first strand, but it can include the entire length of the first strand. After the first strand has been sequenced in this way, a second sequencing primer is introduced. The second primer hybridizes to adapter region 2, and stepwise sequencing is performed to provide a sequence read of part or all of the complementary second strand. Because of the topology of the sequencing template, the first read corresponds to a read from a first end of the target sequence and the second read corresponds to a read from the second end of the target sequence, thereby providing the paired-end sequence of the target sequence of interest. If the first read is of one end of the "sense" strand of the target nucleic acid, the second read will be of the opposite end of the "antisense" strand of the target nucleic acid (and vice versa). The rules of base complementarity can be used to convert a sequence or sequence region to its complement for sequence analysis. Such analysis can use any desired combination of sequences, e.g., the first read and the second read, the complements of the first read and the second read, or one read and the complement of the other read, since all of these provide equivalent information.
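The relationship between the two reads and the two ends of the target can be illustrated with a toy string model in Python. This is a sketch under stated assumptions rather than a description of the actual chemistry: sequences are written 5′ to 3′, the adapter and target sequences are invented placeholders, the function names are hypothetical, and a primer is represented simply by the adapter region it binds. The model also assumes that an extending primer copies the template segment lying immediately 5′ of its binding site, so the read is the reverse complement of that segment.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_concatemer(adapter1: str, forward: str, adapter2: str, copies: int) -> str:
    """Tandem repeats of: adapter region 1, forward strand, adapter region 2, reverse strand."""
    unit = adapter1 + forward + adapter2 + reverse_complement(forward)
    return unit * copies


def read_from_primer(template: str, adapter_region: str, read_length: int) -> str:
    """Model a read primed at `adapter_region` and extended for `read_length` bases.

    The nascent strand is complementary to the template bases immediately 5'
    of the primer binding site, so the read equals the reverse complement of
    that upstream segment.
    """
    # Skip any site that lacks enough upstream template to support the read.
    site = template.find(adapter_region, read_length)
    if site == -1:
        raise ValueError("no primer binding site with enough upstream template")
    upstream = template[site - read_length:site]
    return reverse_complement(upstream)


# Placeholder sequences (assumptions for illustration only).
ADAPTER_1 = "AAAAAAAAAA"
ADAPTER_2 = "CCCCCCCCCC"
FORWARD = "GCTTAGCCATGGACTTCAGT"

template = build_concatemer(ADAPTER_1, FORWARD, ADAPTER_2, copies=3)
read_1 = read_from_primer(template, ADAPTER_1, read_length=5)
read_2 = read_from_primer(template, ADAPTER_2, read_length=5)

print(read_1, FORWARD[:5])                       # read 1 versus one end of the forward strand
print(reverse_complement(read_2), FORWARD[-5:])  # complement of read 2 versus the other end
```

In this model the first read matches one end of the forward strand and the complement of the second read matches the opposite end, which is the paired-end relationship described above. Which physical strand is labeled "first" depends on the construct and is not fixed by this sketch.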

When the second sequencing process is carried out to generate the second read, confounding signal from the extended primer of the first sequencing process is undesirable. Therefore, in some cases, the first primer and the nascent strand produced in the first sequencing process are removed or blocked before the second sequencing process is performed.

The nascent strands from the first sequencing process can be removed by any suitable process. For example, they can be degraded, or they can be removed by denaturation. Degradation can be performed enzymatically, for example by exonuclease treatment. In one example, the 5′ end of the concatemer is protected (for example, by a biotin or bis-biotin moiety through which the concatemer is bound to the surface), and a 5′ to 3′ exonuclease is used to remove the nascent strands. As another example, 3′ to 5′ exonuclease digestion can be used to remove the nascent strands; the digestion time is typically limited so that the short nascent strands are removed. Although a small portion of the concatemer may also be removed during this time, the concatemer typically contains so many copies of the repeat unit that removing a small portion has little effect on the subsequent sequencing process. The nascent strands can also be cut, for example by endonuclease treatment, into shorter fragments that can be washed away. Degradation can also be achieved by including nucleotides that are subsequently cleaved, for example by incorporating uracil residues into the nascent strand during sequencing and cleaving with a mixture of uracil DNA glycosylase (UDG) and the DNA glycosylase-lyase endonuclease VIII, endonuclease IV, or APE1, optionally followed by washing to remove the resulting fragments. Denaturation can be performed by changing any suitable condition, such as ionic strength and/or temperature. A combination of degradation and denaturation can be used.

In some cases, before the second sequencing process is performed, the first primer and the nascent strand produced in the first sequencing process are blocked from further sequencing. There are many ways to block a DNA strand at its 3′ end so that it does not generate signal that interferes with the second sequencing process. For example, a dideoxynucleotide, a 3′-blocked nucleotide (e.g., a 3′-O-azido dNTP), or another group that blocks 3′ extension can be incorporated; many such groups are known in the art. See, for example, U.S. Patent Application Publication US2020/0032322, which describes ternary complex inhibitor moieties that can be used to block the 3′ end of a nascent strand and which is incorporated herein by reference in its entirety for all purposes. A 3′ blocking group can be referred to as a terminator. In some cases, a reversible terminator can be used. Examples of suitable terminators are provided in, for example, Chen et al. (2013) "The history and advances of reversible terminators used in new generations of sequencing technology" Genomics Proteomics Bioinformatics 11(1):34-40. In some cases, the blocking can be irreversible. The blocking can be tailored to the type of sequencing being performed. For example, if the sequencing method used in the second sequencing process is sequencing by binding, it may be desirable to provide a blocking group that prevents not only extension but also the binding of cognate nucleotides, which could otherwise produce a false background signal.

The methods of the present invention are optionally used for highly multiplexed sequencing. The concatemeric sequencing templates are typically immobilized, for example on a substrate. The substrate will carry many individually resolvable concatemeric sequencing templates. In some cases, optical detection of the sequencing reactions is used. Other detection methods, such as electronic detection methods, can be used. The number of sequencing templates can be selected for the particular application. In some cases, millions to billions of concatemeric sequencing templates can be sequenced in parallel on the same substrate (for example, at least one million, at least ten million, at least one hundred million, at least one billion, at least two billion, or at least three billion concatemeric templates). Typically, for such parallel sequencing, each concatemeric sequencing template will have the same first adapter region and second adapter region as the other concatemeric sequencing templates, while the forward and reverse regions in different concatemeric sequencing templates will represent different target molecules. A library of such different target molecules can be produced, for example, by fragmenting DNA from an organism of interest. Many methods of forming libraries are known in the art. Because the different concatemers contain the same adapter regions, a single pair of primers can be used to sequence the different targets.

In describing various aspects of the methods disclosed herein, reference will be made to the accompanying drawings. It should be understood that the drawings merely illustrate specific embodiments of the disclosed methods and are not intended to be limiting.

Figure 1 schematically illustrates an embodiment of the method of the present invention. An asymmetric rolling circle amplification template 101 is provided. The circular template 101 comprises a central target region comprising self-complementary forward (111) and reverse (112) regions. One end of the template is capped by a first linking region 113, and the other end is capped by a second linking region 114. The first linking region 113 has a nucleic acid sequence different from that of the second linking region 114. The linking regions can carry sequences with various functions, such as barcodes, priming sites, restriction sites, recognition sites, and the like. Note that although the linkers or linking regions are shown as corresponding to the single-stranded loop regions of the template 101, in many cases a linker will also have a self-complementary region that extends into the double-stranded portion of the circular nucleic acid 101, for example where hairpin adapters, each having a single-stranded loop and a double-stranded stem, are ligated to a double-stranded fragment to form the circular nucleic acid. (See, e.g., the example illustrated in Figure 4.) One or more priming sites, barcodes, and other useful sequences (or their complements) can be included in the single-stranded or double-stranded portion (or both) of such adapters, and thus in the resulting linking regions. One of the two linkers (113 in this example) will have a rolling circle amplification initiation site, such as a rolling circle amplification primer binding site, to which primer 115 hybridizes. The amplification primer binding site is typically, but not necessarily, located in the single-stranded loop region of the linking region.

Here, in step I, the rolling circle amplification primer 115 is extended using a strand-displacing polymerase such as phi29. The polymerase travels repeatedly around the circular template 101 in the direction indicated by the small arrow, forming a concatemer 102 having the following repeating structure: first adapter region 123, forward strand 121, second adapter region 124, reverse strand 122 (read 5′ to 3′). The figure shows two copies of the concatemer unit. The arrow at the end of the concatemer 102 indicates that many more copies of the unit are typically present, for example hundreds to thousands. The copy number can be any suitable number, for example tens, hundreds, thousands, or tens of thousands of copies. Here, the first adapter region 123 is complementary to the first linking region 113; the forward strand 121 is complementary to the reverse strand 112 in the rolling circle template 101 and therefore has substantially the same sequence as the forward strand 111 of 101; the second adapter region 124 is complementary to the second linking region 114; and the reverse strand 122 is complementary to the forward strand 111 in the rolling circle template 101 and therefore has substantially the same sequence as the reverse strand 112 of 101. Although the target segments are shown as directly joined to the segments complementary to the single-stranded loop regions of the circular nucleic acid, it will be understood that there can be, and typically will be, other intervening sequences between these segments. For example, where stem-loop adapters are used to construct the template 101, the complement of one strand of the adapter stem will be present between the complement of the loop and the forward target region in the resulting concatemer. As described above for the rolling circle template, these intervening sequences can serve a structural role (e.g., self-complementary regions), or they can have a specific function, such as cleavage sites, primer binding sites, recognition sites, or barcodes, including molecular barcodes or unique molecular identifiers (UMIs).
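The relationship between the circular template and the repeat unit of the concatemer can be checked with a small string model. The sketch below is only an illustration: the function names and toy sequences are assumptions, the circle is treated as loop-target-loop-target with no stem or other intervening sequences, and the product is rotated to start at the first adapter region rather than at the actual 5′ end of the extended primer.

```python
# Illustrative sketch; all names and toy sequences are assumptions, not from the source.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def concatemer_repeat_unit(link1: str, forward: str, link2: str) -> str:
    """Return one 5'->3' repeat unit of the rolling circle product.

    link1 and link2 stand for linking regions 113 and 114; forward stands
    for target strand 111. Strand 112 is the reverse complement of 111.
    """
    reverse = reverse_complement(forward)        # strand 112 of the circle
    circle = link1 + forward + link2 + reverse   # circle 101, opened at link1
    copied = reverse_complement(circle)          # strand the polymerase synthesizes
    adapter1 = reverse_complement(link1)         # adapter region 123
    # Rotate the copy so the unit reads 123-121-124-122.
    return adapter1 + copied[: len(copied) - len(adapter1)]


link1, forward, link2 = "AAAC", "GATTACA", "CCCT"  # toy sequences
unit = concatemer_repeat_unit(link1, forward, link2)
concatemer = unit * 3  # three copies; real concatemers carry many more
```

In this model the unit is the complement of linking region 113, the forward target sequence, the complement of linking region 114, and the reverse target sequence, in that order, which matches the 123-121-124-122 structure described above.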

This rolling circle amplification (RCA) process can be carried out in solution, or it can be carried out by extending an RCA primer that is bound to a substrate. Where the RCA primer is bound to the substrate, the product is a concatemer bound to the substrate through the primer. The primer can be bound to the surface covalently or non-covalently, for example as described below or using techniques well known in the art. If the RCA process is carried out in solution, the resulting concatemers can be deposited and immobilized on a substrate, for example as described below or using techniques well known in the art.

Step II shows the first sequencing process, which uses a first primer (131, dashed line) that hybridizes to a primer binding site in the first adapter region 123. This primer binding site in the first adapter region 123 is complementary to a sequence that is contained in the first linking region 113 but is not present in the second linking region 114. The sequencing process can be performed on one concatemer or on multiple concatemers immobilized on a substrate. The sequencing process can be a stepwise sequencing process such as sequencing by synthesis (SBS), sequencing by binding (SBB), or sequencing by ligation. In the exemplary embodiment illustrated in Figure 1, the first sequencing primer 131 is extended to form nascent strand 132, with extension proceeding in the direction indicated by the small arrow. The sequencing process proceeds from the sequencing primer 131 into the target sequence 122 until the first sequencing process is stopped. This process produces the first read.

In this embodiment of the method, in step III, the nascent strand 132 produced by extending primer 131 in the first sequencing process is removed. As described above, the removal can be performed by, for example, denaturation, degradation (e.g., by an exonuclease), or a combination of such techniques.

In step IV, the second sequencing process is carried out from a second sequencing primer 133 (bold line) that hybridizes to a primer binding site in the second adapter region 124. This primer binding site in the second adapter region 124 is complementary to a sequence that is contained in the second linker 114 but is not present in the first linker 113. The second sequencing process is typically performed in the same manner as the first sequencing process, but a different process can be used. In the exemplary embodiment illustrated in Figure 1, the second sequencing primer 133 is extended to form nascent strand 134, with extension proceeding in the direction indicated by the small arrow. The second sequencing process is carried out until it is stopped. This produces the second read. The second read comes from the strand complementary to the strand from which the first read was obtained. Compared with the first read, the second read extends from the opposite end of the target nucleic acid of interest. Together, the first read and the second read provide a paired-end sequence of the target sequence of interest.

The length of the target sequence of interest will vary with the application. One example uses a library of DNA target fragments from an organism of interest. Asymmetric hairpins are added to the ends of the target fragments to produce a library of asymmetric rolling circle amplification templates. One of the hairpins has an RCA primer binding site. The library is subjected to rolling circle amplification to form a set of concatemeric sequencing templates of the present invention. RCA can be performed in solution, after which the set of concatemeric sequencing templates is immobilized on a substrate, or RCA can be performed by extending surface-bound primers. The set of concatemeric sequencing templates is immobilized on the surface such that at least a subset of the immobilized sequencing templates is individually resolvable. The first and second sequencing processes are performed on the set of individually resolvable sequencing templates using the stepwise sequencing methods described herein, producing first reads and second reads.

In some embodiments, the first read and the second read do not overlap. As an example, the target fragment can be about 500 bases long, and each of the first and second sequencing processes can extend, for example, 150 bases. For a sequencing template with a target region about 500 bases long, the first 150 bases and the last 150 bases of the target will be identified, while the 200 bases in the central region will not. In some embodiments, for example with shorter target regions, there can be overlap between the first read and the second read. For example, if the target fragment is 200 bases long and the first and second reads are each 150 bases, a 100-base portion in the middle of the target will be sequenced twice, providing higher accuracy in that region.
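The arithmetic behind these two examples can be written as a short helper. This is a sketch under the assumption that each read begins at the first base of its end of the target (adapter or barcode bases read before the target are ignored); the function name is hypothetical.

```python
from typing import Tuple


def paired_end_coverage(target_len: int, read1_len: int, read2_len: int) -> Tuple[int, int]:
    """Return (bases read twice, bases not read) for two reads that start
    at opposite ends of a target of the given length."""
    # A read cannot cover more target bases than the target contains.
    r1 = min(read1_len, target_len)
    r2 = min(read2_len, target_len)
    read_twice = max(0, r1 + r2 - target_len)
    not_read = max(0, target_len - r1 - r2)
    return read_twice, not_read


long_target = paired_end_coverage(500, 150, 150)   # about 500-base target
short_target = paired_end_coverage(200, 150, 150)  # 200-base target
```

For the 500-base target, `long_target` is `(0, 200)`: no overlap and a 200-base unread center. For the 200-base target, `short_target` is `(100, 0)`: the central 100 bases are read twice.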

The binding sites for the first and second sequencing primers are typically, but not necessarily, located in the single-stranded loop regions of their respective adapter regions. A sequencing primer binding site is optionally located as close as possible to the 5′ end of the adapter region (e.g., as close as possible to the 5′ end of the single-stranded loop portion of the adapter region), so that sequencing proceeds into the target region as early in the process as possible. Optionally, a barcode or other sequence tag is located between the 5′ end of the adapter region and the sequencing primer binding site, so that sequence information is produced from the tag as well as from the target region. One of the sequencing primer binding sites can, but need not, overlap with the complement of the rolling circle amplification primer binding site.

Figure 2 schematically illustrates another embodiment of the method of the present invention. The steps are similar to those described for Figure 1, but here, instead of removing the nascent strand 132 from the first sequencing process, the nascent strand 132 is blocked from providing further sequencing signal, allowing the second sequencing process to proceed without removal of that strand. Steps I and II are performed substantially as described above for Figure 1. In step III, after the first sequencing process, which extends from primer 131 into reverse strand 122, has stopped, a blocker X is added that prevents the blocked nascent strand 242 from giving a false background signal during the subsequent second sequencing process. Step IV is performed substantially as described above for Figure 1, except that the blocked nascent strand 242 remains in place during the second sequencing process. Many methods of blocking a nascent strand in this way are known in the art. There are irreversible blockers that, for example, react with the 3′ hydroxyl to prevent further extension. A dideoxynucleotide, a 3′-blocked nucleotide (e.g., a 3′-O-azido dNTP), or another group that blocks 3′ extension can be incorporated to terminate and block the nascent strand. A binding agent (e.g., a binding protein or an antibody) can be used to prevent extension. In some cases, the sequencing polymerase can be disabled in a manner that prevents it from dissociating and stops extension, so that it acts as the blocker. Although irreversible blockers are used in many embodiments, in other embodiments the blocker can be a reversible blocker. As noted above, examples of suitable reversible terminators are provided in, for example, Chen et al. (supra). For processes such as SBB, in addition to preventing extension, the blocker can also be used to prevent labeled nucleotide analogs from sampling the end of the blocked nascent strand and producing background signal. This can be achieved using, for example, a binding moiety or a bulky terminating group. See, for example, U.S. Patent Application Publication US2020/0032322 (which is incorporated herein by reference in its entirety for all purposes), which describes ternary complex inhibitor moieties for blocking the 3′ end of a nascent strand and for reducing background signal during the examination phase of sequencing by binding by inhibiting ternary complex formation.

Typically, one sequencing process is first carried out from the first primer to produce the first read, and a second sequencing process is then carried out from the second primer to produce the second read. In some cases, however, reading from the first primer and reading from the second primer can be alternated. For example, different reversible blockers can be used for the first and second sequencing processes, so that either process can be stopped and started as desired. The method can be used, for example, to produce a first read and a second read in which each read is determined over several sequencing processes rather than a single one. If desired, the alternation between sequencing from the first primer and sequencing from the second primer can occur as often as every nucleotide. In other cases, sequencing from the first primer and from the second primer can be performed simultaneously.

In some embodiments in which sequencing from the first and second sequencing primers is performed simultaneously, the signals detected during the sequencing process (e.g., optical signals observed from the binding of fluorescently labeled cognate nucleotide analogs during an SBB reaction) are distinguished on the basis of their intensity. For example, the signal produced by sequencing from the first primer can be stronger than the signal produced by sequencing from the second primer (or vice versa). This can be accomplished, for example, by providing the first and second sequencing primers at different concentrations, by providing one of the primers as a mixture of extendable and non-extendable oligonucleotides (thereby effectively reducing the concentration of primer that can participate in the sequencing reaction), and/or by using first and second primers that anneal to their respective adapter regions with significantly different efficiencies (e.g., through the choice of primer sequence, through inclusion in the primer of 2′-O-methyl nucleotides or other modifications that result in tighter or weaker primer binding, and/or by varying the size of the single-stranded loop in the adapter region, since smaller loops generally reduce primer binding efficiency). Because the identities of the nucleotides in the first and second reads are determined from the signals observed during the sequencing process, mapping to a known reference sequence can help determine whether a particular signal was generated by sequencing from the first primer or from the second primer (and therefore whether a particular nucleotide should be assigned to the first read or to the second read). For example, a set of possible sequence reads can be generated from the observed signals and compared with a known reference sequence to obtain the reads. In a stepwise sequencing process, the nucleotide occupying each position is identified in turn. When the forward and reverse strands are sequenced simultaneously, two nucleotides are identified for each position (or one nucleotide, if the same nucleotide occupies the next position on both strands). Thus, for a read of length n, at most 2^n possible sequences are generated for comparison with the reference. Such mapping is useful both in embodiments in which the intensities observed for the first and second sequencing processes differ, and in embodiments in which intensity differences are not used and the assignment of bases to the first or second read is determined by mapping alone rather than by a combination of mapping and observed intensity.
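The candidate-read enumeration described above can be illustrated as follows. With up to two bases observed per position, n positions give up to 2^n combinations. The sketch makes several assumptions that are not in the source: signals are already base-called into a set of one or two bases per cycle, mapping is reduced to exact substring matching against the reference and its reverse complement, and all names and sequences are hypothetical. A practical pipeline would use an aligner and longer reads.

```python
from itertools import product

# Illustrative sketch; names, toy reference, and exact matching are assumptions.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def candidate_reads(observed):
    """Expand per-cycle observations (a set of 1 or 2 bases each) into
    every possible read; at most 2**n reads for n cycles."""
    choices = (sorted(bases) for bases in observed)
    return {"".join(combo) for combo in product(*choices)}


def map_to_reference(observed, reference):
    """Keep the candidates found in the reference or its reverse complement."""
    searchable = (reference, reverse_complement(reference))
    return sorted(
        read for read in candidate_reads(observed)
        if any(read in strand for strand in searchable)
    )


reference = "GATTACAGGCTTCA"                      # toy reference
n = 4
read1 = reference[:n]                             # true read from one end
read2 = reverse_complement(reference)[:n]         # true read from the other end
observed = [{a, b} for a, b in zip(read1, read2)] # superimposed signals per cycle
hits = map_to_reference(observed, reference)
```

Both true reads appear among `hits`. With reads this short, additional chance matches are expected; longer reads, or intensity information as described above, would narrow the candidates.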

Figure 3 shows how the paired-end sequencing process of the present invention can be used with a statistical mixture of symmetric and asymmetric rolling circle amplification templates. In some cases it is desirable to form the asymmetric circular RCA templates using a simplified process that results in both symmetric and asymmetric molecules. Many methods of producing such a statistical mixture are known in the art. One approach is to start with a target DNA fragment, or a library of target DNA fragments, and ligate a mixture of two different hairpin adapters to the ends. Doing so produces a statistical mixture of products. For example, if one hairpin adapter comprises linking region 113 and the other comprises linking region 114, then when approximately equal amounts of the two adapters are provided, the result will be a mixture of about 50% asymmetric circular nucleic acids 101, having both linkers 113 and 114, and about 50% symmetric circular nucleic acids (303, with two copies of linking region 113, and 304, with two copies of linking region 114). Figure 3 shows these constructs. Although this approach reduces yield, using such a statistical mixture can be useful, for example where a process for obtaining a high percentage of asymmetric constructs is complex, inconvenient, undesirable, or unnecessary.
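The roughly 50/50 split follows from simple probability if each fragment end receives an adapter independently and both adapters ligate with equal efficiency. Those two conditions are assumptions made for this sketch, as are the function and key names; the split of the symmetric half into equal 303 and 304 fractions is a consequence of the same assumptions rather than a figure stated in the text.

```python
def construct_fractions(p_adapter_113: float) -> dict:
    """Expected fractions of circular constructs when each fragment end
    independently receives adapter 113 with probability p_adapter_113
    and adapter 114 otherwise (an idealized binomial model)."""
    if not 0.0 <= p_adapter_113 <= 1.0:
        raise ValueError("p_adapter_113 must be between 0 and 1")
    p = p_adapter_113
    q = 1.0 - p
    return {
        "asymmetric_101": 2 * p * q,  # one end 113, the other end 114
        "symmetric_303": p * p,       # 113 at both ends
        "symmetric_304": q * q,       # 114 at both ends
    }


equal_mix = construct_fractions(0.5)
# Only constructs carrying linking region 113 have the RCA priming site.
amplifiable = equal_mix["asymmetric_101"] + equal_mix["symmetric_303"]
```

For an equal mixture this gives 0.5 asymmetric and 0.25 for each symmetric form, so under this model about three quarters of the constructs can be amplified and about half yield paired-end data.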

Figure 3 shows that the asymmetric rolling circle amplification template 101 works as described above. The exemplary embodiment shown in the figure illustrates removal of the nascent strand from the first sequencing process. The other methods described herein will also work in this context, for example blocking rather than removing the nascent strand. In the embodiment shown in Figure 3, linking region 113 has a rolling circle amplification priming site and linking region 114 does not. Symmetric constructs 304, which contain only linking region 114, therefore do not undergo rolling circle amplification and do not produce concatemeric sequencing templates. Symmetric constructs 303 have linking region 113 and thus have RCA priming sites at both ends. These constructs can undergo RCA as shown in step IA to produce concatemer 305. Because concatemer 305 is produced from symmetric construct 303, it contains only the first adapter region 123. It therefore contains binding sites for the first sequencing primer 131 from which extension proceeds into reverse region 122, and it also contains binding sites for the first sequencing primer 131 from which extension proceeds into forward region 121. Consequently, when the first sequencing process is carried out on concatemer 305, the signal will be convoluted: extension of the first sequencing primer 131 adds bases stepwise to two different sequences (nascent strands 132 and 352) at the same time, so it will be difficult or even impossible to obtain sequence information from this concatemer without additional information, as described below. When the second sequencing process is carried out, there will be no signal from concatemeric sequencing template 305, because it contains no binding site for the second sequencing primer 133. Although these concatemeric templates do not produce useful signal without additional manipulation, it is straightforward to identify them by their characteristics (e.g., a convoluted signal in the first sequencing process and/or the absence of signal in the second sequencing process) and to remove them from the analysis. Thus, although the presence of symmetric RCA templates reduces the yield of desired concatemers and of usable sequence data, useful paired-end sequencing data will still be obtained from those concatemeric templates 102 that are produced from asymmetric constructs 101. In some cases, however, the convoluted signal from templates such as 305, produced from symmetric constructs, can be used for sequence determination. The mapping to a known reference sequence detailed above can be used to obtain the first and second reads (e.g., from a set of possible reads containing all possible combinations of the one or two nucleotides identified at each position during the stepwise sequencing process).

Clearly, although using asymmetric constructs to achieve the paired-end sequencing detailed herein is preferred in many cases, asymmetric constructs are not required in every case. For example, where a reference sequence is available (e.g., in targeted sequencing methods in which the target or pool of target nucleic acid sequences represents a particular region or regions of interest), symmetric constructs can be used. In such embodiments, a symmetric construct is produced, rolling circle amplification is performed to produce a concatemer similar to 305, and sequencing is performed from a single primer complementary to the single adapter region present in the concatemer. Mapping to the known reference sequence can then yield the desired sequence reads.

Although symmetric constructs, or mixtures of symmetric and asymmetric constructs, can be used in certain embodiments, asymmetric constructs are preferred in many embodiments. A suitable method for preparing an asymmetric nucleic acid for use as a rolling circle amplification template in the methods of the invention is schematically illustrated in FIG. 4. To produce circular nucleic acid molecule 401, two different stem-loop adapters (463 and 464) are provided and ligated to opposite ends of a double-stranded target nucleic acid fragment 466 comprising forward strand 411 and reverse strand 412. Asymmetric circular nucleic acid 401 thus comprises a first connection region 413 (provided by adapter 463) and a second connection region 414 (provided by adapter 464) that join the ends of forward target strand 411 and reverse target strand 412 to each other. In this example, the central region 467 (which comprises forward strand 411 and reverse strand 412 of the target nucleic acid sequence) is initially double-stranded. Connection regions 413 and 414 each comprise a single-stranded loop and a double-stranded portion adjacent to the central target region. As detailed above, various useful sequences (or their complements) can be included in the connection regions, such as primer binding sites, barcodes or unique molecular identifiers (UMIs), restriction sites, recognition sites, and the like.

In this example, a rolling circle amplification primer is hybridized to a primer binding site in connection region 413. Primer extension by strand-displacing polymerase 465 converts template 401 to an open circular form and produces nucleic acid concatemer 402, as shown in FIG. 4. Concatemer 402 comprises multiple sequential copies of: first adapter region 423 (which is complementary to first connection region 413), forward strand 421 of the target nucleic acid sequence (which is complementary to reverse strand 412 of circular template 401), second adapter region 424 (which is complementary to second connection region 414 and therefore differs from first adapter region 423, because connection regions 414 and 413 differ from each other), and reverse strand 422 of the target nucleic acid sequence (which is complementary to forward strand 411 of 401 and to forward strand 421).
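The relationship between the circular template and the repeat unit of the concatemer can be made concrete with a short sketch. It is offered only as an illustration under simplifying assumptions: the connection regions are modeled as bare loop sequences (the double-stranded stems are omitted), the toy sequences are invented, and the helper names `revcomp` and `rca_repeat_unit` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def rca_repeat_unit(link1, forward, link2):
    """Return one repeat unit of the concatemer, written 5' to 3'.

    The circular template is modeled, 5' to 3', as
    link1 - forward - link2 - reverse, where reverse is the reverse
    complement of forward. Rolling circle amplification copies the
    circle, so the product is the reverse complement of the template.
    """
    reverse = revcomp(forward)
    template = link1 + forward + link2 + reverse
    unit = revcomp(template)  # forward + rc(link2) + reverse + rc(link1)
    # Rotate so the unit starts with the first adapter region, rc(link1).
    start = len(unit) - len(link1)
    return unit[start:] + unit[:start]


unit = rca_repeat_unit("AAAC", "GATTACA", "TTTG")
adapter1, adapter2 = revcomp("AAAC"), revcomp("TTTG")
```

In this toy case the unit reads first adapter region, forward strand, second adapter region, reverse strand, matching the order 423, 421, 424, 422 described for concatemer 402.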

In the exemplary embodiment illustrated in FIG. 4, the rolling circle amplification primer is hybridized to a primer binding site in connection region 413. A portion of first adapter region 423 (the size of which depends on where the rolling circle amplification primer binding site lies within the single-stranded loop of connection region 413), followed by forward strand 421, therefore appears at the 5′ end of the resulting concatemer. Clearly, the primer binding site can instead be located in connection region 414. In such embodiments, a portion of second adapter region 424, followed by reverse strand 422, appears at the 5′ end of the resulting concatemer. Because these are followed in turn by first adapter region 423, forward strand 421, second adapter region 424, reverse strand 422, and so on, the two concatemers can be regarded as containing the same internal repeat unit. That is, regardless of which connection region the rolling circle amplification primer binding site is designed into (and therefore which target strand is produced first), the resulting concatemer still comprises multiple sequential copies of a repeat unit comprising the first adapter region, the forward strand, the second adapter region, and the reverse strand. As noted above, designating the target strands as forward and reverse is purely for convenience in referring to the two complementary strands; there is no restriction on which strand is amplified first. Designating the adapter regions, connection regions, and the like as first and second is likewise purely for convenience in referring to these elements.
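The point that the choice of priming site changes only the 5′ end of the concatemer, not its internal repeat unit, can be checked with a symbolic sketch. The region labels `A1`, `F`, `A2`, and `R` (first adapter region, forward strand, second adapter region, reverse strand) and the function names are hypothetical shorthand introduced only for this illustration, and partial adapter regions at the 5′ end are ignored.

```python
from itertools import cycle, islice

# One repeat unit, in the order it appears in the concatemer.
UNIT = ["A1", "F", "A2", "R"]


def concatemer(first_region, n_regions):
    """List the first n_regions of a concatemer that begins at first_region."""
    start = UNIT.index(first_region)
    rotated = UNIT[start:] + UNIT[:start]
    return list(islice(cycle(rotated), n_regions))


def internal_unit(regions):
    """Return the four-region repeat beginning at the first 'A1'."""
    i = regions.index("A1")
    return regions[i:i + len(UNIT)]


primed_in_413 = concatemer("A1", 12)  # primer site in the first connection region
primed_in_414 = concatemer("A2", 12)  # primer site in the second connection region
```

Both products yield the same internal unit once the leading partial repeat is skipped, which is the sense in which the two concatemers are equivalent.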

Concatemers can be immobilized on a surface, for example the surface of a slide, chip, flow cell, or other suitable substrate, e.g., at resolvable spots or positions in an ordered array or in a disordered distribution of such concatemers. The position of a given concatemer in an ordered or disordered array can be predetermined or random. Concatemers can be immobilized, for example, on a planar surface of a substrate, on a non-planar surface of a substrate, or in three dimensions within a substrate, such as within a gel matrix. Immobilization on the substrate can be accomplished in any suitable manner. In some embodiments, the concatemers are produced on the surface. For example, the rolling circle amplification primer can be immobilized on the surface, e.g., covalently, by binding a biotin or bis-biotin group on the primer via avidin or streptavidin to surface-bound biotin or bis-biotin, by hybridization to a complementary oligonucleotide immobilized on the surface, or by any other technique known in the art for producing arrays of oligonucleotides or otherwise immobilizing oligonucleotides. (See, e.g., U.S. Patent No. 6,274,320, which describes techniques for producing an array of attachment sites on a solid support and attaching primers thereto.) Primer extension then results in a concatemer immobilized at that position. In other embodiments, the concatemers are produced in solution and then immobilized by any technique known in the art for producing arrays of nucleic acids or otherwise immobilizing nucleic acids. For example, a biotinylated rolling circle amplification primer can be extended in solution, and the resulting biotinylated concatemer can then be immobilized on a biotinylated surface via avidin or streptavidin. As another example, the concatemer can be immobilized by hybridization to one or more oligonucleotides bound to the surface. As yet another example, the concatemer can be immobilized by electrostatic interaction with a positively charged surface. As yet another example, the concatemer can be covalently attached to the surface, e.g., through one or more functional groups included in the rolling circle amplification primer and/or in the concatemer itself; for example, click chemistry groups can be included on one or more nucleotides in the primer and/or on one or more nucleotides incorporated into the concatemer during rolling circle amplification. See, e.g., U.S. Patent Application Serial No. 17/575,094, which describes exemplary suitable surfaces and concatemers attached thereto, and which is incorporated herein by reference in its entirety for all purposes. Where a population of concatemers is immobilized on a surface, their positions can be preselected such that they are far enough apart that the signal generated for a given concatemer during the sequencing process can be resolved from the signals generated by other concatemers. Similarly, where a population of concatemers is immobilized at randomly determined positions on a surface, their density can be controlled such that at least some of the concatemers are far enough apart that the signal generated for a given concatemer during the sequencing process can be resolved from the signals generated by other concatemers (e.g., by diluting the concatemers prior to deposition and/or by limiting the density of available attachment sites on the surface).
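The effect of deposition density on resolvability can be illustrated with a small geometric sketch. It is not drawn from the disclosure: the function name `resolvable`, the two-dimensional point model, and the single `min_dist` threshold (standing in for the optical resolution limit) are all assumptions made for the example.

```python
import math


def resolvable(points, min_dist):
    """Return the points whose nearest neighbor is at least min_dist away.

    points: list of (x, y) positions of immobilized concatemers.
    min_dist: assumed minimum separation at which two signals can
    still be told apart.
    """
    keep = []
    for i, p in enumerate(points):
        others = (q for j, q in enumerate(points) if j != i)
        if all(math.dist(p, q) >= min_dist for q in others):
            keep.append(p)
    return keep


# Two crowded concatemers and one isolated concatemer.
positions = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0)]
usable = resolvable(positions, min_dist=1.0)
```

Only the isolated position survives the filter here. Lowering the density (fewer points per unit area) raises the fraction of positions that pass, which is the rationale for diluting concatemers or limiting attachment sites.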

Under certain conditions, the forward and reverse strands in the concatemer hybridize to each other as they are produced, such that the concatemer comprises double-stranded regions separated by single-stranded regions that are complementary to the single-stranded loops of the rolling circle amplification template. It can therefore be advantageous to use a strand-displacing polymerase in the first and second sequencing processes. (This polymerase can be the same as or different from any strand-displacing polymerase used in the preceding rolling circle amplification reaction.) Techniques that reduce secondary structure formation by preventing or eliminating hybridization of the forward and reverse strands in the concatemer can also be used, in place of or in combination with a strand-displacing polymerase. For example, a single-stranded DNA binding protein (e.g., E. coli single-stranded DNA binding protein) can be provided, e.g., in an amount sufficient to maintain the forward and reverse strands in single-stranded form during rolling circle amplification and/or during the sequencing processes. A variety of suitable single-stranded binding proteins (SSBs) are known in the art, as are techniques for their purification. In addition, single-stranded DNA binding proteins, such as that from E. coli, are commercially available from suppliers such as Thermo Fisher Scientific and AS One International.

As another example, a masking strand complementary to one of the target strands can be produced to ensure that the other strand is in single-stranded form and is therefore readily sequenced with a polymerase that lacks strand displacement activity or has only weak strand displacement activity. FIG. 5 schematically illustrates an embodiment of the methods of the invention in which a masking strand is used to facilitate sequencing with a polymerase that lacks strong strand displacement activity. An asymmetric circular rolling circle amplification template 501 is provided. Circular template 501 comprises a central target region that includes self-complementary forward (511) and reverse (512) regions. One end of the template is capped with first connection region 513, and the other end is capped with second connection region 514. First connection region 513 has a nucleic acid sequence different from that of second connection region 514. The connection regions can have sequences serving various functions, such as barcodes, priming sites, restriction sites, recognition sites, and the like. Note that although the linkers or connection regions are shown as corresponding to the single-stranded loop regions of template 501, in many cases the linkers will also have self-complementary regions that extend into the double-stranded portion of circular nucleic acid 501, for example where hairpin adapters, each having a single-stranded loop and a double-stranded stem, are ligated to a double-stranded fragment to form the circular nucleic acid. (See, e.g., the example illustrated in FIG. 4.) One or more priming sites, barcodes, and other useful sequences (or their complements) can be included in the single-stranded or double-stranded portion (or both) of such adapters, and therefore in the resulting connection regions. One of the two linkers (513 in this example) has a rolling circle amplification initiation site, such as a rolling circle amplification primer binding site, to which primer 515 hybridizes. The amplification primer binding site is typically, but not necessarily, located in the single-stranded loop region of the connection region.

Here, in step I, rolling circle amplification primer 515 is extended by a strand-displacing polymerase such as phi29. The polymerase travels repeatedly around circular template 501 in the direction indicated by the small arrow, forming concatemer 502 having the following repeating structure: first adapter region 523, forward strand 521, second adapter region 524, reverse strand 522. Only a small portion from the middle of the concatemer is shown in the figure; the arrows at either end of concatemer 502 indicate that many copies of the repeat unit are typically present, e.g., hundreds to thousands. The copy number can be any suitable number, e.g., tens, hundreds, thousands, or tens of thousands of copies. Here, first adapter region 523 is complementary to first connection region 513; forward strand 521 is complementary to reverse strand 512 in rolling circle template 501 and is therefore substantially identical in sequence to forward strand 511 of 501; second adapter region 524 is complementary to second connection region 514; and reverse strand 522 is complementary to forward strand 511 in rolling circle template 501 and is therefore substantially identical in sequence to reverse strand 512 of 501. Although the target segments are shown as connected directly to the segments complementary to the single-stranded loop regions of the circular nucleic acid, it will be understood that there can be, and typically will be, other intervening sequences between these segments. For example, where stem-loop adapters are used to construct template 501, the complement of one strand of the adapter's stem will be present between the complement of the loop and the forward target region in the resulting concatemer. As described above for the rolling circle template, these intervening sequences can serve a structural purpose (e.g., self-complementary regions), or they can have a specific function, e.g., as cleavage sites, primer binding sites, recognition sites, or barcodes, including molecular barcodes or unique molecular identifiers (UMIs).

This rolling circle amplification (RCA) process can be carried out in solution, or it can be carried out by extending an RCA primer bound to a substrate. Where the RCA primer is bound to the substrate, the product is a concatemer bound to the substrate through the primer. The primer can be bound to the surface covalently or non-covalently, e.g., as described herein or using techniques well known in the art. If the RCA process is carried out in solution, the resulting concatemers can be deposited and immobilized on a substrate, e.g., as described herein or using techniques well known in the art.

In step II, masking primer 575 hybridizes to a binding site in second adapter region 524. This primer binding site on second adapter region 524 is complementary to a sequence that is included in second connection region 514 but not in first connection region 513. In step III, masking primer 575 is extended, typically by a strand-displacing polymerase, to form first masking strand 576, which is complementary to forward strand 521. Extension of masking primer 575 stops before it has proceeded through the entire first adapter region 523, such that the resulting first masking strand 576 is not complementary to the entire first adapter region 523. In the scheme illustrated in FIG. 5, extension stops at the position indicated by an X in first adapter region 523. Any of a variety of methods can be used to block extension, and they can be combined if desired. Exemplary methods are detailed below. In addition, for any of these exemplary methods, extension can optionally be performed at low temperature and/or with a polymerase having relatively weak strand displacement activity, to facilitate stopping extension at the desired point.

In one exemplary method, extension can be stopped by hybridizing an oligonucleotide that blocks strand displacement to the single-stranded loop region of the first adapter region. Exemplary oligonucleotides capable of blocking strand displacement by hybridizing strongly to their binding site include, e.g., oligomers comprising locked nucleic acid (LNA) and/or peptide nucleic acid (PNA) residues. Other exemplary blocking oligonucleotides can form one or more interstrand crosslinks with the first adapter region. For example, the blocking oligonucleotide can include at least one 5-bromo-deoxyuridine (available from, e.g., Integrated DNA Technologies, Inc.), which forms a crosslink with the first adapter region upon exposure to ultraviolet light.

As another example, extension can be stopped by including at least one non-natural nucleotide in the first adapter region and extending the masking primer under conditions that exclude the complement of the at least one non-natural nucleotide. A variety of non-natural bases that do not base pair efficiently with the natural bases are known in the art and can be used in the practice of the invention. For example, one or more nucleotide residues comprising isocytosine (isoC) can be included in the loop region of one of the stem-loop adapters used to prepare the rolling circle amplification template, such that the first connection region of the resulting rolling circle amplification template includes one or more isoC. The rolling circle amplification primer is extended in a mixture of nucleotides that includes isoguanine (isoG), such that the resulting concatemer includes one or more isoG. When the masking primer is extended, isoC is not provided in the reaction mixture, so extension cannot proceed past the template position occupied by isoG.
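The logic of stopping extension at a non-natural base can be summarized in a brief sketch. This is a deliberately simplified model, not a description of polymerase behavior: the pairing table, the function name `extend`, and the convention of listing template bases in the order the polymerase encounters them are assumptions made for illustration, and misincorporation opposite isoG is ignored.

```python
# Watson-Crick pairs plus the isoC:isoG pair.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "isoG": "isoC", "isoC": "isoG"}


def extend(template_in_read_order, available):
    """Extend a primer until a required nucleotide is unavailable.

    template_in_read_order: template bases listed in the order the
    polymerase encounters them.
    available: set of nucleotides supplied in the reaction mixture.
    Returns the list of incorporated nucleotides.
    """
    product = []
    for base in template_in_read_order:
        partner = PAIR[base]
        if partner not in available:
            break  # extension stalls at this template position
        product.append(partner)
    return product


template = ["A", "C", "G", "isoG", "T"]
masking_strand = extend(template, {"A", "C", "G", "T"})      # isoC withheld
full_copy = extend(template, {"A", "C", "G", "T", "isoC"})   # isoC supplied
```

Withholding isoC truncates the product at the isoG position, while supplying it allows read-through; this mirrors the use of isoG during rolling circle amplification followed by omission of isoC during masking primer extension.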

As yet another example, a nick can be introduced into the first adapter region before the masking primer is extended, so that extension stops at the nick. Nicking can be performed after or, more typically, before the masking primer is hybridized to the concatemer. The nick can be introduced, for example, using an endonuclease having a specific recognition site. A variety of suitable site-specific endonucleases are known in the art and are commercially available, e.g., from New England Biolabs, Inc. As one example, a recognition site for a nicking endonuclease can be designed into the double-stranded stem portion of the first adapter region, or an oligonucleotide can be hybridized to the single-stranded loop portion of the first adapter region to create a recognition site for a nicking endonuclease. An altered restriction enzyme that hydrolyzes only one strand of the duplex, or a naturally occurring nicking endonuclease, can be used. As another example, an oligonucleotide can be hybridized to the single-stranded loop portion of the first adapter region to create a recognition site for an endonuclease that cleaves both strands (cleaving the oligonucleotide and leaving a nick in the adapter region). For example, a methylated oligonucleotide complementary to a single-stranded portion of the first adapter region can create a site for cleavage by FspEI (New England Biolabs). As yet another example, Tth Argonaute (New England Biolabs) can be used with a 5′-phosphorylated single-stranded DNA guide to introduce a nick in the portion of the adapter region having a strand complementary to the guide.

In some embodiments, for example where sequencing is performed using an SBB technique in which signal from a fluorescently labeled cognate nucleotide bound at the next available base in the template is detected to identify the next correct nucleotide in the sequence (particularly where multiple targets are sequenced simultaneously at different locations on the surface of a flow cell or other substrate), keeping the fragments produced by nicking a concatemer close to one another can increase the signal intensity observable from that cluster of fragments (i.e., for that particular target nucleic acid). The fragments produced by nicking a concatemer are optionally held together by hybridizing to them one or more staple oligonucleotides, similar to the staple oligonucleotides used in DNA origami. Such hybridization can bridge different fragments, keeping the fragments produced from a given concatemer tightly clustered.

Each staple oligonucleotide is capable of binding simultaneously (directly or indirectly) to two or more fragments to hold them close to one another. In various exemplary designs, a staple oligonucleotide can hybridize directly to the adapter regions of two or more fragments, can hybridize to one fragment and to another staple oligonucleotide, and/or can hybridize to one fragment and bind to an intermediary. For example, one portion of a staple oligonucleotide (e.g., one end of the staple oligonucleotide) can hybridize to a single-stranded portion of a first adapter region, while another portion of the staple oligonucleotide (e.g., its other end) can hybridize to a single-stranded portion of a second adapter region, e.g., on a different fragment. As another example, one portion of a staple oligonucleotide (e.g., one end of the staple oligonucleotide) can hybridize to a single-stranded portion of one instance of the first adapter region, while another portion of the staple oligonucleotide (e.g., its other end) can hybridize to a single-stranded portion of another instance of the first adapter region; the portions of the first adapter regions to which the staple oligonucleotide hybridizes optionally have the same sequence. Similarly, one staple oligonucleotide can hybridize to single-stranded portions of two different instances of the second adapter region. In some aspects, portions of the staple oligonucleotides can hybridize to each other or to an intermediary. For example, portions of two or more staple oligonucleotides can each hybridize to the same single intermediary oligonucleotide or to different intermediary oligonucleotides, e.g., intermediary oligonucleotides presented by a particle. Staple oligonucleotides can be functionalized (e.g., at their 3′ end and/or 5′ end and/or internally) for crosslinking to each other or through an intermediary. For example, biotinylated staple oligonucleotides can be crosslinked by streptavidin (e.g., by adding streptavidin after the staple oligonucleotides have been hybridized to the first adapter region and/or the second adapter region). In another example, staple oligonucleotides can be functionalized with a click chemistry group, such as a strain-promoted click chemistry group, and can be crosslinked by a multivalent intermediary presenting multiple instances of the complementary click chemistry partner. Exemplary strain-promoted click chemistry groups and partners include, but are not limited to, dibenzocyclooctyne (DBCO) and azide, trans-cyclooctene (TCO) and tetrazine, and derivatives thereof. For example, DBCO-functionalized staple oligonucleotides can be crosslinked by a dendrimer presenting multiple azides.

A single oligonucleotide can optionally serve as both a primer and a staple oligonucleotide. For example, the 5′ end of the first sequencing primer can hybridize to the second adapter region while the 3′ end of the first sequencing primer hybridizes to the first adapter region (such that the first sequencing primer also serves as a staple oligonucleotide), and/or the 5′ end of the masking primer can hybridize to the first adapter region while the 3′ end of the masking primer hybridizes to the second adapter region (such that the masking primer also serves as a staple oligonucleotide). Clearly, the portions of the adapter regions that are complementary to the sequencing, masking, and staple portions of such primers will generally not overlap one another, so that the primers do not compete for their binding sites.

In the example illustrated in FIG. 5, after first masking strand 576 has been synthesized, first sequencing primer 531 is hybridized in step IV to a primer binding site in first adapter region 523. This primer binding site on first adapter region 523 is complementary to a sequence that is included in first connection region 513 but is not present in second connection region 514. The sequencing process can be performed on one concatemer or on multiple concatemers (e.g., thousands to millions to billions) immobilized on a substrate. The sequencing process can be a stepwise sequencing process, such as SBS, SBB, or sequencing by ligation. In the exemplary embodiment illustrated in FIG. 5, first sequencing primer 531 is extended in step V to form nascent strand 532, with extension proceeding in the direction indicated by the arrow. The sequencing process proceeds from sequencing primer 531 into reverse strand 522 of the target sequence until the first sequencing process is stopped. This process produces the first read.
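How adapter-specific primers yield reads from opposite strands of the same target can be illustrated on a toy concatemer. The sketch below is illustrative only: the sequences are invented, stems and intervening sequences are omitted, and `revcomp` and `read_from_primer` are hypothetical helper names. The read is modeled as the reverse complement of the template bases lying immediately 5′ of the primer binding site, since the nascent strand grows 5′ to 3′ while the template is traversed 3′ to 5′.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def read_from_primer(template, primer, read_len):
    """Return the nascent-strand read produced by extending primer.

    Searches for the first primer binding site that has at least
    read_len template bases on its 5' side; returns None if no such
    site exists.
    """
    site = revcomp(primer)  # template sequence the primer anneals to
    start = 0
    while True:
        pos = template.find(site, start)
        if pos == -1:
            return None
        if pos >= read_len:
            return revcomp(template[pos - read_len:pos])
        start = pos + 1


# Repeat unit: adapter 1, forward strand, adapter 2, reverse strand.
unit = "GTTT" + "GATTACA" + "CAAA" + "TGTAATC"
concat = unit * 3
first_read = read_from_primer(concat, revcomp("GTTT"), 7)   # primer on adapter 1
second_read = read_from_primer(concat, revcomp("CAAA"), 7)  # primer on adapter 2
```

In this toy case the primer specific to the first adapter region copies the reverse strand, and a primer specific to the second adapter region copies the forward strand, so the two reads are reverse complements of each other and together form a paired-end view of the target.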

Where paired-end reads of the target nucleic acid sequence are desired, any of a variety of methods can be used to obtain the second read. Two exemplary methods are detailed below.

FIGS. 6A to 6B schematically illustrate an embodiment in which the second read is obtained from the other strand of the target within the concatemer. In this embodiment, masking primer 675 includes a 5′ phosphate group, and the first masking strand 676 therefore also includes a 5′ phosphate group. Otherwise, steps I to V are performed as detailed above for FIG. 5. After sequencing from the first sequencing primer 531 is complete, the nascent strand 532 produced by that sequencing is extended in step VI to produce the second masking strand 638. Extension of nascent strand 532 stops before it has passed through the entire second adapter region 524, so that the resulting second masking strand 638 is not complementary to the entire second adapter region 524. Extension can be stopped in any of several ways. The first sequencing primer 531 can be extended using a polymerase that lacks strand displacement activity, so that extension stops at the 5′ end of the first masking strand 676. Alternatively, masking primer 675, and therefore first masking strand 676, can include LNA or PNA residues (or other modifications that result in tight binding to the concatemer); an interstrand crosslink can be formed between masking primer 675 and second adapter region 524; or second adapter region 524 can include at least one non-natural nucleotide whose complement is omitted from the extension reaction, so that extension stops as detailed above.

After the second masking strand 638 is synthesized, the first masking strand 676 is removed in step VII, for example by digestion with lambda exonuclease or a similar enzyme that catalyzes 5′ to 3′ removal of nucleotides from double-stranded DNA bearing a 5′ phosphate.

In step VIII, the second sequencing primer 633 is hybridized to a primer binding site in the second adapter region 524. This primer binding site on the second adapter region 524 is complementary to a sequence that is present in the second linking region 514 but absent from the first linking region 513. In step IX, a second sequencing process is performed from the second sequencing primer 633. The second sequencing process is typically performed in the same manner as the first, although a different process can be used. In the exemplary embodiment illustrated in FIGS. 6A to 6B, the second sequencing primer 633 is extended to form nascent strand 634, with extension proceeding in the direction indicated by the arrow. The second sequencing process continues until it is stopped, producing the second read. The second read is derived from the strand complementary to the strand from which the first read was obtained, and it extends from the opposite end of the target nucleic acid of interest. Together, the first read and the second read provide paired-end sequence for the target sequence of interest.
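The relationship between the two reads can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: reads are modeled as error-free substrings, the target sequence and the helper names (`reverse_complement`, `paired_end_reads`) are hypothetical, and which strand is called "first" is arbitrary here.

```python
# Illustrative sketch only: reads are modeled as exact substrings,
# ignoring primers, adapters, chemistry, and sequencing errors.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(target: str, read_length: int) -> tuple[str, str]:
    """Return (first_read, second_read), each written 5'->3'.

    The first read starts at one end of the target on one strand.
    The second read starts at the opposite end and comes from the
    complementary strand.
    """
    n = min(read_length, len(target))
    first_read = target[:n]
    second_read = reverse_complement(target)[:n]
    return first_read, second_read


target = "ATGGCCTTAGCA"  # hypothetical 12-nt target
first_read, second_read = paired_end_reads(target, 4)
print(first_read, second_read)
```

Reverse-complementing the second read recovers the far end of the target as written on the first strand, which is why the two reads together bracket the target.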

FIGS. 7A to 7B schematically illustrate an embodiment in which the second read is obtained from a strand formed by extending the first sequencing primer rather than directly from the concatemer. In this embodiment, masking primer 775, and therefore also first masking strand 776, includes LNA residues, PNA residues, or other modifications that result in strong binding to the second adapter region 524. Otherwise, steps I to III are performed as detailed above for FIG. 5.

In the example illustrated in FIGS. 7A to 7B, after the first masking strand 776 is synthesized, the first sequencing primer 731 is hybridized in step IV to the first adapter region 523. The first sequencing primer 731 includes a region at or near its 5′ end and a region at or near its 3′ end, each of which hybridizes to the first adapter region 523. These 5′ and 3′ regions flank a central region of the primer that is not complementary to, and does not hybridize with, the first adapter region 523. The primer binding subsites on the first adapter region 523 are likewise separated by a region that is not complementary to the first sequencing primer 731, so a portion of the first adapter region 523 remains single-stranded. In the exemplary embodiment illustrated in FIGS. 7A to 7B, the first sequencing primer 731 is extended in step V to form nascent strand 732, with extension proceeding in the direction indicated by the arrow. The sequencing process can be a stepwise process such as SBS, SBB, or sequencing by ligation, and can be performed on one concatemer or on many concatemers immobilized on a substrate. Sequencing proceeds from the first sequencing primer 731 into the reverse strand 522 of the target sequence until the first sequencing process is stopped. This process produces the first read.

As illustrated in FIGS. 7A to 7B, in step VI the nascent strand 732 produced by sequencing from the first sequencing primer 731 is further extended to produce extended strand 782, which is complementary to the reverse strand 522. Extension stops when it reaches the tightly bound 5′ end of the first masking strand 776, so extended strand 782 is not complementary to the entire second adapter region 524 (although it is typically complementary to a portion of the second adapter region, which provides a convenient universal binding site for the second sequencing primer 733). In addition to, or instead of, including LNA in masking primer 775 and first masking strand 776, the other techniques for terminating extension detailed above can be used (e.g., nicking the second adapter region 524, or including a non-natural nucleotide in the second adapter region 524 and omitting its complementary nucleotide from the extension reaction).

In step VII, displacement primer 785 is hybridized to the single-stranded portion of the first adapter region 523 that lies between the regions bound by the first sequencing primer 731. In step VIII, displacement primer 785 is extended, typically with a strand-displacing polymerase, to produce the second masking strand 786; this extension displaces extended strand 782 from the reverse strand 522. The 5′ region of extended strand 782 remains hybridized to the first adapter region 523. In the embodiment illustrated in FIGS. 7A to 7B, extension of displacement primer 785 stops when it reaches the tightly bound 5′ end of the first masking strand 776. Here too, in addition to or instead of including LNA in masking primer 775 and first masking strand 776, the other techniques for terminating extension detailed above can be used (e.g., nicking the second adapter region 524, or including a non-natural nucleotide in the second adapter region 524 and omitting its complementary nucleotide from the extension reaction). In some embodiments, extension is allowed to proceed through the second adapter region 524; in such embodiments, extension can instead be terminated in the first adapter region 523 using the techniques detailed above (e.g., nicking the first adapter region 523, or including a non-natural nucleotide in the first adapter region 523 and omitting its complementary nucleotide from the extension reaction).

In step IX, the second sequencing primer 733 is hybridized to a primer binding site in the displaced extended strand 782. In step X, a second sequencing process is performed from the second sequencing primer 733. The second sequencing process is typically performed in the same manner as the first, although a different process can be used. In the exemplary embodiment illustrated in FIGS. 7A to 7B, the second sequencing primer 733 is extended to form nascent strand 734, with extension proceeding in the direction indicated by the arrow. The second sequencing process continues until it is stopped, producing the second read. The second read is derived from the strand complementary to the strand from which the first read was obtained, and it extends from the opposite end of the target nucleic acid of interest. Together, the first read and the second read provide paired-end sequence for the target sequence of interest.

Although the preceding examples are described in terms of paired-end sequencing, each of these methods can also be used to determine the nucleic acid sequence of a single region of a target: the steps leading to sequencing from the first sequencing primer are performed to obtain the first read, and the steps leading to sequencing from the second sequencing primer are omitted. Such methods are also a feature of the invention.

Generating rolling circle amplification templates

In general, nucleic acid rolling circle amplification templates of particular use in the context of the invention include a double-stranded region (e.g., a double-stranded nucleic acid insert containing a target nucleic acid sequence of interest) with a hairpin at each end. Such constructs can be generated by any suitable method. In some embodiments, hairpin ends are attached to a double-stranded nucleic acid fragment by ligating hairpin adapters to compatible ends of the fragment, as is known in the art, e.g., via blunt-end or sticky-end ligation. Ligation of a hairpin adapter produces a nucleic acid construct that includes a single-stranded region linking the two strands of the double-stranded insert. For example, through the stem of the hairpin adapter, the 3′ end of the first strand of the insert is joined to the 5′ end of the hybridized complementary strand of the insert: the 3′ end of the adapter is ligated to the 5′ end of one strand of the insert, and the 5′ end of the adapter is ligated to the 3′ end of the other strand, as illustrated in FIG. 4. The single-stranded region can be used for priming and/or capture, as desired. Other methods of generating hairpin ends on nucleic acids can also be used.
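The topology described above can be sketched in Python. This is a toy model under stated assumptions: each hairpin adapter is represented by a single placeholder string (its self-complementary stem is not modeled), the sequences are hypothetical, and `circular_template` and `rolling_circle_product` are illustrative names, not part of any library. It shows why rolling circle amplification of such a circle yields a concatemer in which forward and reverse copies of the target alternate, separated by adapter regions.

```python
# Toy model: a double-stranded insert capped by two hairpin adapters is,
# topologically, one single-stranded circle. Rolling circle amplification
# copies that circle repeatedly into one long complementary strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def circular_template(insert_top: str, hairpin_a: str, hairpin_b: str) -> str:
    """Linearized 5'->3' sequence of the circle, opened after hairpin_a.

    Order around the circle: top strand of the insert, hairpin_b,
    bottom strand of the insert, hairpin_a.
    """
    return insert_top + hairpin_b + reverse_complement(insert_top) + hairpin_a


def rolling_circle_product(circle: str, copies: int) -> str:
    """Concatemer made of `copies` tandem complements of the circle."""
    return reverse_complement(circle) * copies


insert = "ATGGCC"   # hypothetical target (top strand)
adapter_a = "AAAA"  # placeholder for the first hairpin adapter
adapter_b = "CCCC"  # placeholder for the second hairpin adapter

circle = circular_template(insert, adapter_a, adapter_b)
concatemer = rolling_circle_product(circle, copies=2)

# One repeat unit: adapter region, forward target, adapter region, reverse target.
unit = (reverse_complement(adapter_a) + insert
        + reverse_complement(adapter_b) + reverse_complement(insert))
print(concatemer == unit * 2)
```

Because the two placeholder adapters differ, each adapter region in the concatemer is distinguishable, which is what allows distinct primer binding sites on either side of the target.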

Circular nucleic acids useful in the present disclosure include templates that are nucleic acids having a central double-stranded region and a hairpin region at each end of the double-stranded region. The preparation and use of such circular templates are described, for example, in U.S. Patent No. 8,153,375, U.S. Patent No. 8,236,499, and Travers et al. (2010) Nucl. Acids Res. 38(15):e159, the entire disclosures of which are hereby incorporated by reference for all purposes. One advantage of this type of template is that it can be made from a library of double-stranded nucleic acid (e.g., DNA) fragments. For example, a genomic DNA sample can be fragmented into a library of DNA fragments by known methods, such as shearing or digestion with restriction endonucleases. The library of DNA fragments can then be ligated to hairpin or other stem-loop adapters at each end of the fragments to produce a library of circular templates. The hairpin adapters provide a single-stranded region within the hairpin, which is a useful location for the complements of the rolling circle amplification primer binding site and the sequencing primer binding sites. Because the same pair of hairpin adapters is used for all fragments, the adapters provide a location for universal priming of all target sequences.

As described above with respect to FIG. 3, ligating a mixture of two different hairpin adapters to double-stranded target fragments yields a mixture of symmetric circular nucleic acids (with the same adapter at both ends) and asymmetric circular nucleic acids (with different adapters at the two ends). Although this mixture can be used to sequence the target as detailed above, the asymmetric circular constructs can also be purified from the mixture, for example as detailed in U.S. Patent No. 10,920,268, which is hereby incorporated by reference in its entirety for all purposes. For example, symmetric circular nucleic acids that include two copies of the first hairpin adapter and asymmetric circular nucleic acids that include both the first and the second hairpin adapter can be captured on beads bearing a capture primer complementary to the first adapter. Symmetric circular nucleic acids with two copies of the second adapter are not captured and are therefore removed from the mixture. A primer complementary to the second adapter can then be extended with a strand-displacing polymerase, and this extension elutes the asymmetric nucleic acids from the beads. Other techniques for purifying asymmetric circular nucleic acids are also detailed in U.S. Patent No. 10,920,268.
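As a rough worked example of the mixture and the two-step purification, the sketch below computes the expected share of each construct type. It assumes, purely for illustration, that each fragment end is ligated independently and that the first adapter (labeled "A") is chosen with probability `p_first`; the source states neither assumption, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def construct_fractions(p_first: Fraction = Fraction(1, 2)) -> dict:
    """Expected fraction of AA, AB, and BB constructs.

    Assumes each fragment end picks adapter A with probability p_first,
    independently of the other end (an illustrative simplification).
    """
    prob = {"A": p_first, "B": 1 - p_first}
    fractions = {"AA": Fraction(0), "AB": Fraction(0), "BB": Fraction(0)}
    for left, right in product("AB", repeat=2):
        key = "".join(sorted(left + right))  # AB and BA are the same construct
        fractions[key] += prob[left] * prob[right]
    return fractions


def purify_asymmetric(fractions: dict) -> dict:
    """Model bead capture via adapter A, then elution via adapter B."""
    # Step 1: beads carry a capture primer complementary to adapter A.
    captured = {k: v for k, v in fractions.items() if "A" in k}
    # Step 2: only constructs that also carry adapter B can be eluted by
    # extending a primer complementary to adapter B.
    return {k: v for k, v in captured.items() if "B" in k}


mixture = construct_fractions()
eluted = purify_asymmetric(mixture)
print(mixture, eluted)
```

Under the equal-probability assumption, half of the ligated constructs are asymmetric, and only that half survives both steps.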

In addition to, or instead of, purifying asymmetric nucleic acids from a mixture of symmetric and asymmetric constructs, the desired concatemers can be purified from a mixture of concatemers. For example, a mixture of a first hairpin adapter that includes a rolling circle amplification primer binding site and a second hairpin adapter that lacks one can be ligated to double-stranded target fragments, yielding a mixture of symmetric circular nucleic acids (with the same adapter at both ends) and asymmetric circular nucleic acids (with different adapters at the two ends). When rolling circle amplification is performed, only the asymmetric constructs and the symmetric constructs with two first adapters are amplified; symmetric constructs with two second adapters cannot hybridize to the primer and are therefore not amplified. The resulting concatemers can then be exposed to a solid support (e.g., a bead, or the surface of a flow cell or other sequencing substrate) bearing immobilized oligonucleotides that bind to sites in the concatemer region complementary to the second adapter. As a result, only concatemers produced by amplification of asymmetric constructs are captured. These concatemers can be used as sequencing templates as detailed herein, or they can be subjected to multiple displacement amplification and the resulting concatemeric products used as sequencing templates as detailed herein.

Rather than purifying asymmetric nucleic acids from a mixture of symmetric and asymmetric constructs, techniques that directly construct a population of asymmetric nucleic acids can be used. Exemplary suitable methods for generating asymmetric circular nucleic acids are described, for example, in U.S. Patent No. 10,370,701, which is hereby incorporated by reference in its entirety for all purposes. For example, a symmetric circular nucleic acid is constructed that includes a nick between the hairpin adapter and the double-stranded insert at each end. Extension is performed from the nicks, and a different hairpin adapter is ligated to the free ends of the resulting product. Other methods for producing nucleic acid constructs suitable for use as asymmetric rolling circle amplification templates in the invention are provided, for example, in U.S. Patent Application Publication No. 2012/0196279, which is hereby incorporated by reference in its entirety for all purposes.

Some useful methods for generating rolling circle amplification templates for use in the methods of the invention start with double-stranded nucleic acid fragments having defined ends, which can be blunt ends or ends with known overhang sequences (5′ or 3′ overhangs). These nucleic acid fragments can be of any size or range of sizes and can include DNA, RNA, DNA-RNA hybrids (e.g., molecules having one mRNA strand and one complementary DNA strand, produced by first-strand synthesis during cDNA preparation), genomic DNA, cDNA, mRNA, tRNA, and the like. In some embodiments, the nucleotide sequence of the fragments is unknown.

In certain embodiments, the double-stranded nucleic acid fragments used in the methods and compositions of the present disclosure include nucleic acids obtained from a sample. The sample can include any number of materials, including but not limited to: bodily fluids (including but not limited to blood, urine, serum, lymph, saliva, anal and vaginal secretions, perspiration, and semen) and cells of virtually any organism (e.g., mammalian species, including humans); environmental samples (including but not limited to air, agricultural, water, and soil samples); biological warfare agent samples; research samples; products of amplification reactions (including both target and signal amplification, such as PCR amplification reactions); purified samples (e.g., purified genomic DNA); and raw samples (bacteria, viruses, genomic DNA, etc.). As those of skill in the art will appreciate, virtually any experimental manipulation can be performed on the sample.

When used in the disclosed methods, genomic DNA can be prepared from essentially any source in three steps: cell lysis, deproteinization, and DNA recovery. These steps are adapted to the requirements of the application, the required yield, purity, and molecular weight of the DNA, and the amount and history of the source. Further details on the isolation of genomic DNA can be found in: Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology, Vol. 152, Academic Press, Inc., San Diego, CA ("Berger"); Sambrook et al., Molecular Cloning: A Laboratory Manual (3rd ed.), Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 2008 ("Sambrook"); Current Protocols in Molecular Biology, F.M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates and John Wiley & Sons, Inc. (supplemented through 2021) ("Ausubel"); Kaufman et al. (2003) Handbook of Molecular and Cellular Methods in Biology and Medicine, Second Edition, Ceske (ed.), CRC Press ("Kaufman"); and The Nucleic Acid Protocols Handbook, Ralph Rapley (ed.) (2000), Cold Spring Harbor Laboratory, Humana Press Inc. ("Rapley").

In addition, many kits for purifying genomic DNA from cells are commercially available, including the Wizard™ Genomic DNA Purification Kit, available from Promega; the Aqua Pure™ Genomic DNA Isolation Kit, available from BioRad; the Easy-DNA™ Kit, available from Thermo Fisher Scientific; and the DnEasy™ Tissue Kit, available from Qiagen (Germany). Alternatively or additionally, target nucleic acid fragments can be obtained through a targeted capture protocol, for example one in which the target nucleic acids are initially obtained as single-stranded fragments on a microarray or other capture technology, and the captured material is then amplified to generate double-stranded sample material. A variety of such capture protocols have been described, for example, in Hodges E et al., Nat. Genet. 2007 Nov 4; Olson M., Nature Methods 2007 Nov;4(11):891-2; Albert TJ et al., Nature Methods 2007 Nov;4(11):903-5; and Okou DT et al., Nature Methods 2007 Nov;4(11):907-9.

Nucleic acids that can be used in the methods described herein can also be derived from cDNA, for example cDNA prepared from mRNA obtained from a eukaryotic subject or from a specific tissue of a eukaryotic subject. Data obtained by sequencing nucleic acid targets derived from a cDNA library on a high-throughput sequencing system can be used, for example, to identify new splice variants of a gene of interest, or to compare the differential expression of splice isoforms of a gene of interest between different tissue types, between different treatments of the same tissue type, or between different developmental stages of the same tissue type.

mRNA can typically be isolated from almost any source using the protocols and methods described, for example, in Sambrook and Ausubel. The yield and quality of the isolated mRNA can depend on, for example, how the tissue is stored before RNA extraction, the means by which the tissue is disrupted during extraction, and the type of tissue from which the RNA is extracted. RNA isolation protocols can be optimized accordingly. Many mRNA isolation kits are commercially available, for example from Thermo Fisher Scientific and BioChain. In addition, mRNA from a variety of sources (e.g., bovine, mouse, and human) and tissues (e.g., brain, blood, and heart) is commercially available from, for example, BioChain (Hayward, CA) and Takara Bio (San Jose, CA).

Once purified mRNA has been recovered, reverse transcriptase is used to generate cDNA from the mRNA template. Methods and protocols for producing cDNA from mRNA (e.g., mRNA harvested from prokaryotes as well as eukaryotes) are set out in cDNA Library Protocols, I.G. Cowell et al., eds., Humana Press, New Jersey, 1997, and in Sambrook and Ausubel. In addition, many kits for preparing cDNA are commercially available, including the Cells-to-cDNA™ II, RETROscript™, and CloneMiner™ cDNA Library Construction Kits (Thermo Fisher Scientific) and the Universal cDNA Synthesis System (Promega). Many companies, such as Creative Biogene and GenScript, offer cDNA synthesis services.

In some embodiments of the invention described herein, nucleic acid fragments are generated from genomic DNA or cDNA. Many methods exist for generating nucleic acid fragments from genomic DNA, cDNA, or DNA concatemers. These include, but are not limited to: mechanical methods, such as sonication, mechanical shearing, nebulization, and hydrodynamic shearing; chemical methods, such as treatment with hydroxyl radicals, Cu(II):thiol combinations, or diazonium salts; enzymatic methods, such as exonuclease digestion, restriction endonuclease digestion, and transposon-based cleavage and tagging; and electrochemical cleavage. These methods are explained further in, for example, Sambrook and Ausubel.
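Of the approaches listed, restriction endonuclease digestion is the easiest to illustrate in code, because the cut positions are determined by the sequence. The following minimal sketch performs an in-silico digest of the top strand only. The default site is the EcoRI recognition sequence GAATTC, which is cut after the G; the input sequence and the function name `digest` are hypothetical, and the sketch ignores the bottom strand and the resulting overhangs.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Split the top strand at every occurrence of a recognition site.

    cut_offset is the number of bases of the site that stay on the
    upstream fragment (1 for EcoRI, which cuts G^AATTC).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # remainder after the last cut
    return fragments


genome = "AAGAATTCTTGAATTCGG"  # hypothetical sequence with two EcoRI sites
fragments = digest(genome)
print(fragments)
```

Unlike shearing, this kind of digestion gives fragments with defined, known ends, which matters later when choosing adapters with compatible overhangs.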

In some embodiments, nucleic acid molecules are obtained from a sample and fragmented for use in the methods of the present disclosure. The fragments can be further modified by any method known in the art or described herein. Nucleic acid fragments can be generated by fragmenting a source nucleic acid, such as genomic DNA, using any method known in the art. In one embodiment, the shear forces that arise during lysis and extraction of genomic DNA generate fragments in the desired size range. The invention also encompasses fragmentation methods that use restriction endonucleases.

The double-stranded nucleic acid fragments can be of any length required for subsequent use, e.g., cloning, transformation, enrichment, or sequencing. In certain embodiments, the fragments can be about 10 to about 50,000 base pairs (bp) in length, or any range in between, e.g., about 100 to about 40,000 bp, about 300 to 30,000 bp, about 500 to 20,000 bp, about 800 to 10,000 bp, or about 1,000 to 8,000 bp. In certain embodiments, the average size of the double-stranded nucleic acid fragments is at least about 100 bp, at least about 200 bp, at least about 300 bp, at least about 500 bp, at least about 1,000 bp, at least about 1,500 bp, at least about 2,000 bp, at least about 5,000 bp, at least about 10,000 bp, or at least about 20,000 bp. In some embodiments, the average length of the target nucleic acid sequence inserted into the rolling circle amplification template is between 50 and 20,000 bp, e.g., between 50 and 10,000 bp, between 50 and 5,000 bp, between 50 and 2,000 bp, between 50 and 1,000 bp, between 50 and 500 bp, between 50 and 300 bp, between 50 and 200 bp, between 200 and 800 bp, or between 200 and 500 bp. In such embodiments, the forward and reverse target nucleic acid sequences in the resulting concatemer therefore each have an average length between 50 and 20,000 nucleotides, e.g., between 50 and 10,000, between 50 and 5,000, between 50 and 2,000, between 50 and 1,000, between 50 and 500, between 50 and 300, between 50 and 200, between 200 and 800, or between 200 and 500 nucleotides.

In certain embodiments, the fragments are treated to produce blunt ends suitable for ligation to adapters with compatible blunt ends. Any suitable method of producing blunt ends can be used, including treatment with one or more enzymes having 5′ and/or 3′ single-strand exonuclease activity (e.g., E. coli exonuclease III) and/or a fill-in reaction that extends recessed 3′ ends (e.g., using T4 DNA polymerase). No limitation is intended in this regard. In other embodiments, the fragments have sticky ends, such as ends left by treatment with a restriction endonuclease or other nuclease, or a 3′ A added by Taq polymerase, and the adapters have overhangs complementary to those ends. The two ends of a fragment can have different sticky ends, or one end can be blunt and the other sticky, in which case one adapter is compatible with one end and the other adapter is compatible with the other end.

Rolling circle amplification

Rolling circle amplification methods that can be used to generate, from asymmetric circular nucleic acids, the concatemers used as sequencing templates in the present invention are well known in the art. See, for example, U.S. Patent No. 5,648,245 to Fire et al., "Concatemer library by RCA"; U.S. Patent Application Publication No. 20050069939 to Wang et al., "Amplification of polynucleotides by rolling circle amplification"; and U.S. Patent No. 9,290,800 to Turner et al., "Targeted rolling circle amplification," which are incorporated herein by reference in their entirety for all purposes. Rolling circle amplification from oligonucleotides attached to a substrate is described in, for example, U.S. Patent No. 6,274,320 to Rothberg and Bader and U.S. Patent Application Publication No. 20060024711 to Lapidus, which are incorporated herein by reference in their entirety for all purposes.

Nucleic acid sequencing process

Suitable sequencing processes for use in the provided methods include, but are not limited to, sequencing by binding, sequencing by synthesis (sequencing by incorporation), pH-based sequencing, sequencing by polymerase monitoring, sequencing by hybridization, and other methods of massively parallel sequencing or next generation sequencing. Suitable surfaces on which to perform sequencing include, but are not limited to, flat substrates, hydrogels, nanopore arrays, microparticles, nanoparticles, or surfaces within a flow cell. Exemplary sequencing platforms, including methods, reagents, and solid phase surfaces, are described below and in the cited references.

In one aspect, sequencing is performed using sequencing by binding (SBB) technology. Exemplary, particularly useful sequencing by binding reactions are described in U.S. Patent Nos. 10,077,470, 10,443,098, 10,400,272, and 10,975,427 and U.S. Patent Application Publication Nos. 20190119742, 20180187245, and 20200032322, each of which is incorporated herein by reference in its entirety. In general, methods for determining the sequence of a template nucleic acid molecule by sequencing by binding can be based on formation of a ternary complex (between a polymerase, a primed nucleic acid, and a cognate nucleotide) under specified conditions. The methods can include an examination phase followed by a nucleotide incorporation phase.

The examination phase of a sequencing by binding procedure can be carried out in a flow cell having at least one template nucleic acid molecule (e.g., a concatemeric RCA product) primed with a primer, by: contacting the primed template nucleic acid molecule with a first reaction mixture comprising a polymerase and at least one nucleotide type; observing the interaction of the polymerase and nucleotide with the primed template nucleic acid molecule under conditions in which the nucleotide is not covalently added to the primer; and using the observed interaction of the polymerase and nucleotide with the primed template nucleic acid molecule to identify the next base in each template nucleic acid. The interaction between the primed template, the polymerase, and the nucleotide can be detected in a variety of schemes. For example, the nucleotides can contain a detectable label. Each nucleotide can have a label that is distinguishable from those of the other nucleotides. Alternatively, some or all of the different nucleotide types can have the same label, and the nucleotide types can be distinguished, for example, based on separate delivery of different nucleotide types, or combinations thereof, to the flow cell. In some embodiments, the polymerase can be labeled. Polymerases associated with different nucleotide types can have unique labels that distinguish the nucleotide types with which they are associated. Alternatively, the polymerases can have similar labels, and the different nucleotide types can be distinguished based on separate delivery of the different nucleotide types to the flow cell (e.g., by delivering the labeled polymerase together with one or more unlabeled nucleotides in each delivery).

During the examination phase, discrimination between correct and incorrect nucleotides can be facilitated by stabilization of the ternary complex. A variety of conditions and reagents can be used to stabilize the ternary complex, for example, by preventing nucleotide incorporation and/or preventing dissociation of the ternary complex. For example, the primer can contain a reversible blocking moiety that prevents covalent attachment of the nucleotide; cofactors required for extension (such as divalent metal ions) can be absent; inhibitory divalent cations that inhibit polymerase-based primer extension can be present; the polymerase present in the examination phase can have a chemical modification and/or mutation that inhibits primer extension; and/or the nucleotides can have chemical modifications that inhibit incorporation, such as 5′ modifications that remove or alter the native triphosphate moiety. Conditions such as salt concentration, pH, and temperature also contribute to the stability of the ternary complex.

The extension phase can then be carried out by creating conditions in the flow cell under which a nucleotide can be added to the primer on each template nucleic acid molecule. In some embodiments, this involves removing the reagents used in the examination phase and replacing them with reagents that facilitate extension. For example, the examination reagents can be replaced with a polymerase and nucleotides that are capable of extension. Alternatively, one or more reagents can be added to the examination phase reaction to create extension conditions. For example, catalytic divalent cations can be added to an examination mixture that lacks such cations, and/or polymerase inhibitors can be removed or disabled, and/or extension-competent nucleotides can be added, and/or a deblocking reagent can be added to render the primer extension competent, and/or an extension-competent polymerase can be added. Optionally, the nucleotide that is enzymatically incorporated into the primer strand of the primed template nucleic acid molecule is different from the nucleotide used in the examination step to identify the next correct nucleotide.

Optionally, the polymerase used in the incorporation step is different from the polymerase used in the examination step. Optionally, the incorporated nucleotide is a reversible terminator nucleotide, such that primer extension is limited to incorporation of a single nucleotide until the reversible terminator moiety is removed. Thus, for embodiments that use reversible terminator nucleotides, a deblocking reagent can be delivered to the flow cell (before or after detection occurs). Washes can be carried out between the various delivery steps.

The examination and extension phases described above can be carried out cyclically, such that in each cycle a single next correct nucleotide is examined (i.e., the next correct nucleotide being the nucleotide that correctly pairs with the nucleotide in the template nucleic acid located immediately 5′ of the template base that is hybridized to the 3′ end of the hybridized primer) and, subsequently, a single next correct nucleotide is added to the primer. Any number of cycles can be carried out, including, for example, at least 1, 2, 5, 10, 20, 25, 30, 40, 50, 75, 100, 150 or more cycles. Alternatively or additionally, the number of cycles can be limited to no more than 150, 100, 75, 50, 40, 30, 25, 20, 10, 5, 2 or 1 cycles. This cyclic sequencing process produces a read of all or part of the template nucleic acid sequence.
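By way of illustration only, the cyclic process described above can be sketched as a toy model in Python. This sketch is not part of the disclosed method and simulates no chemistry: the function name `sequence_by_cycles`, the string representation of the template (written 3′ to 5′), and the assumption of error-free base identification are all hypothetical simplifications introduced for this example.

```python
# Toy model only: one examination step and one single-base extension per cycle.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_cycles(template_3to5, primer_len, max_cycles):
    """Return the read produced by repeated examination/extension cycles.

    template_3to5 is the template written 3'->5', so index i of the template
    pairs with index i of the growing primer strand (written 5'->3').
    primer_len is the number of template bases already covered by the primer.
    """
    read = []
    position = primer_len  # template base immediately 5' of the primed region
    for _ in range(max_cycles):
        if position >= len(template_3to5):
            break  # end of template reached before the cycle limit
        # Examination: identify the next correct nucleotide without adding it.
        next_correct = COMPLEMENT[template_3to5[position]]
        # Extension: add exactly one nucleotide to the primer.
        read.append(next_correct)
        position += 1
    return "".join(read)


if __name__ == "__main__":
    print(sequence_by_cycles("TTGACCGTA", primer_len=2, max_cycles=4))
```

Limiting `max_cycles` mirrors the optional cap on the number of cycles, so the read covers either part of the template or all of it.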

Sequencing by synthesis (SBS) technology can also be used. This technique generally involves enzymatic extension of a primer through iterative addition of nucleotides against a template strand to which the primer is hybridized. Briefly, SBS can be initiated by contacting target nucleic acids, attached to features of a flow cell, with one or more labeled nucleotides, a DNA polymerase, etc. Those features at which a primer is extended using the target nucleic acid as a template will incorporate a labeled nucleotide that can be detected. Optionally, the labeled nucleotides can further include a reversible termination property that terminates further primer extension once a nucleotide has been added to the primer. For example, a nucleotide analog having a reversible terminator moiety can be added to the primer such that subsequent extension cannot occur until a deblocking agent is delivered to remove the moiety. Thus, for embodiments that use reversible termination, a deblocking reagent can be delivered to the flow cell (before or after detection occurs). Washes can be carried out between the various delivery steps. The cycle can then be repeated. Exemplary SBS procedures, reagents, and detection instruments that can be readily adapted for use with concatemer arrays in the methods of the present disclosure are described, for example, in Bentley et al., Nature 456:53-59 (2008), WO 04/018497, WO 91/06678, WO 07/123744, U.S. Patent Nos. 7,057,026, 7,329,492, 7,211,414, 7,315,019, and 7,405,281, and U.S. Patent Application Publication No. 2008/0108082, each of which is incorporated herein by reference. Also useful are SBS methods that are commercially available from Illumina, Inc. (San Diego, California).

Some SBS embodiments include detection of a proton released upon incorporation of a nucleotide into an extension product. For example, sequencing based on detection of released protons can use reagents and an electrical detector that are commercially available from Thermo Fisher (Waltham, MA) or described in U.S. Patent Application Publication Nos. 2009/0026082, 2009/0127589, 2010/0137143, and 2010/0282617, each of which is incorporated herein by reference.

Other sequencing procedures can be used, such as pyrosequencing. Pyrosequencing detects the release of inorganic pyrophosphate (PPi) as particular nucleotides are incorporated into a nascent primer hybridized to a template nucleic acid strand. See, for example, Ronaghi et al., Analytical Biochemistry 242(1), 84-9 (1996); Ronaghi, Genome Res. 11(1), 3-11 (2001); Ronaghi et al., Science 281(5375), 363 (1998); and U.S. Patent Nos. 6,210,891, 6,258,568, and 6,274,320, each of which is incorporated herein by reference. In pyrosequencing, released PPi can be detected by conversion to adenosine triphosphate (ATP) by ATP sulfurylase, and the resulting ATP can be detected via photons produced by luciferase. Thus, the sequencing reaction can be monitored via a luminescence detection system.

Sequencing by ligation reactions are also useful, including, for example, those described in Shendure et al., Science 309:1728-1732 (2005) and U.S. Patent Nos. 5,599,675 and 5,750,341, each of which is incorporated by reference. Some embodiments can include sequencing by hybridization procedures, as described, for example, in Bains et al., Journal of Theoretical Biology 135(3), 303-7 (1988); Drmanac et al., Nature Biotechnology 16, 54-58 (1998); Fodor et al., Science 251(4995), 767-773 (1995); and WO 1989/10977, each of which is incorporated by reference. In both sequencing by ligation and sequencing by hybridization procedures, a primer hybridized to a nucleic acid template is subjected to repeated cycles of extension by oligonucleotide ligation. Typically, the oligonucleotides are fluorescently labeled and can be detected to determine the sequence of the template.

Some embodiments can utilize methods involving real-time monitoring of DNA polymerase activity. For example, nucleotide incorporation can be detected through fluorescence resonance energy transfer (FRET) interactions between a fluorophore-bearing polymerase and γ-phosphate-labeled nucleotides, or with zero-mode waveguides (ZMWs). Techniques and reagents for sequencing via FRET and/or ZMW detection are described, for example, in Levene et al., Science 299, 682-686 (2003); Lundquist et al., Opt. Lett. 33, 1026-1028 (2008); and Korlach et al., Proc. Natl. Acad. Sci. USA 105, 1176-1181 (2008), the disclosures of which are incorporated herein by reference.

Paired-end sequencing

As is known in the art, paired-end sequencing has many uses. For example, in certain circumstances, the amount of sequence data that can be reliably obtained from a target nucleic acid using sequencing by binding or sequencing by synthesis techniques, particularly when blocked labeled nucleotides are used, may be limited to, e.g., tens or hundreds of cycles of incorporation (and thus tens or hundreds of bases in a sequencing read). Although such short reads can be very useful, particularly in applications such as SNP analysis and genotyping, in many circumstances it is advantageous to be able to reliably obtain further sequence data for the same target molecule. The technique of "paired-end" or "pairwise" sequencing allows two reads of sequence to be determined from two places on a single polynucleotide duplex, for example, from the two opposite ends of the duplex. For example, in applications where the length of the target duplex is more than twice the average sequencing read length, the "paired-end" sequences are known to occur on a single duplex and are therefore linked, or paired, in the genome, which greatly aids assembly of whole genome sequences into a consensus sequence. In applications where the length of the target duplex is less than twice the average sequencing read length, reads from the opposite ends overlap in the middle of the target and can therefore be compared to increase the accuracy of base calls in that region; this is useful because accuracy decreases as a sequencing technology approaches the limit of its read length. Additional uses for paired-end sequences are well known in the art.
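The relationship between target length and read length described above can be illustrated with a short Python sketch. It is offered only as an example under simplifying assumptions (fixed read length, error-free reads, no adapters); the function names `reverse_complement`, `paired_end_overlap`, and `paired_reads` are hypothetical and do not come from this disclosure.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def paired_end_overlap(target_len, read_len):
    """Number of target positions covered by both reads (0 if they do not meet)."""
    effective = min(read_len, target_len)  # a read cannot exceed the target
    return max(0, 2 * effective - target_len)


def paired_reads(target, read_len):
    """Read 1 starts at one end of the duplex, read 2 at the opposite end.

    Read 2 is reported as sequenced, i.e. from the complementary strand.
    """
    read1 = target[:read_len]
    read2 = reverse_complement(target)[:read_len]
    return read1, read2


if __name__ == "__main__":
    print(paired_end_overlap(300, 100))  # target longer than twice the read length
    print(paired_end_overlap(150, 100))  # target shorter than twice the read length
    print(paired_reads("AACCGGTTAC", 4))
```

When the overlap is zero, the two reads are useful mainly as a linked pair; when it is positive, the shared positions can be compared to check base calls in the middle of the target.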

Provided herein is an improved method for paired-end sequencing. The following references provide alternative methods of, and uses for, paired-end sequencing. See, for example, U.S. Patent No. 8,192,930 to Vermass, "Method for sequencing a polynucleotide template"; U.S. Patent No. 8,105,784 to Rigatti, "Paired end reads on libraries made by bridge amplification"; International Patent Application Publication WO2004070005 to Chen, "Double ended sequencing by blocking and unblocking"; and U.S. Patent Application Publication No. 20210189483, "Controlled strand displacement for paired-end sequencing," which are incorporated herein by reference in their entirety for all purposes.

Compositions, systems and kits

Compositions, systems, and kits related to, produced by, or of use in the methods are also a feature of the invention. For example, one general class of embodiments provides a composition comprising an array of nucleic acid concatemers. The concatemers are bound to a surface, e.g., with different concatemers at different sites in an ordered arrangement, or with different concatemers at different sites in an unordered array. The position of a given concatemer in the array can be predetermined or random. Each concatemer comprises multiple copies of: a first adapter region comprising a first sequencing primer binding site, a forward strand of a target nucleic acid sequence, a second adapter region different from the first adapter region, and a reverse strand of the target nucleic acid sequence that is complementary to the forward strand. The second adapter region can comprise a second sequencing primer binding site whose sequence differs from that of the first sequencing primer binding site.
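As a purely illustrative aid, the repeating unit of such a concatemer can be modeled as a string in Python. The adapter and target sequences below are made-up placeholders, and `build_concatemer` is a hypothetical helper written for this example; it models only the order of the four elements in each copy, not how the concatemer is produced.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def build_concatemer(adapter1, target_forward, adapter2, copies):
    """Concatenate `copies` repeats of the unit:
    first adapter, forward strand, second adapter, reverse strand.
    """
    # The reverse strand is complementary to the forward strand, so on a
    # single 5'->3' strand it appears as the reverse complement.
    unit = adapter1 + target_forward + adapter2 + reverse_complement(target_forward)
    return unit * copies


if __name__ == "__main__":
    # Placeholder sequences chosen only to make the structure easy to see.
    concatemer = build_concatemer("AAAA", "ACGTT", "CCCC", copies=2)
    print(concatemer)
```

Because the two adapter regions differ, a primer complementary to one of them binds once per repeat unit and reads into one strand of the target, while a primer for the other adapter reads into the complementary strand.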

The concatemers can be covalently or non-covalently bound to the surface. For example, each concatemer can include a first member of a binding pair (e.g., biotin) that binds to a second member of the binding pair (e.g., avidin or streptavidin), which is in turn bound to the surface. The array can include a large number of concatemers, from thousands to millions to billions (e.g., at least one million, at least ten million, at least one hundred million, at least one billion, at least two billion, or at least three billion concatemers). The concatemers in the array can be immobilized, for example, on a flat surface of a substrate, on a non-flat surface of a substrate, or in three dimensions within a substrate, such as within a gel matrix. Suitable substrates are well known in the art and include, but are not limited to, slides, chips, surfaces of flow cells, and the like.

In some embodiments, a first sequencing primer is hybridized to the first sequencing primer binding site. In some embodiments, a second sequencing primer is hybridized to the second sequencing primer binding site. The composition can include nascent strands produced by extension of the first sequencing primer and/or the second sequencing primer. The nascent strands produced by extension of the first sequencing primer are optionally blocked. The composition optionally also includes: a polymerase (e.g., a strand-displacing polymerase or a polymerase lacking strand displacement activity), one or more nucleotides (e.g., naturally occurring nucleotides, non-natural nucleotides, labeled nucleotides, reversible terminator nucleotides, and/or chain-terminating nucleotides), a masking primer, a blocking oligonucleotide, a displacing primer, a masking strand, a displacing strand, one or more staple oligonucleotides, and/or other reagents used in the sequencing process. The composition is optionally present in a nucleic acid sequencing system.

Essentially all of the features noted above apply to these embodiments as well, as relevant, e.g., with respect to the copy number of the repeat unit in the concatemers, inclusion in the composition of a nuclease for removing nascent strands or masking strands or for nicking an adapter region, inclusion of a single-stranded binding protein in the composition, suitable array substrates, and the like.

Another general class of embodiments provides a kit that includes any combination of one or more of the following: a solid support configured to bind a plurality of nucleic acid concatemers, a first stem-loop adapter, a second stem-loop adapter different from the first stem-loop adapter, reagents for performing rolling circle amplification (e.g., a rolling circle amplification primer, a strand-displacing polymerase, and one or more nucleotides), a first sequencing primer, optionally a second sequencing primer, and reagents for performing nucleic acid sequencing (e.g., a polymerase and one or more nucleotides, optionally including one, two, three, or four labeled nucleotides). The polymerases used for rolling circle amplification and for sequencing are typically different polymerases, but can be the same in some embodiments. The kit can also include additional reagents for producing circular nucleic acid molecules, performing rolling circle amplification to produce concatemers, and performing nucleic acid sequencing, including but not limited to buffered reaction solutions, a masking primer, a blocking oligonucleotide, a displacing primer, one or more staple oligonucleotides, a site-specific endonuclease, and/or an exonuclease. The kit typically also includes instructions for using the components to carry out the desired processes, as also described or referenced herein, e.g., for producing circular nucleic acid molecules, performing rolling circle amplification to produce concatemers, and performing nucleic acid sequencing. The components of the kit are packaged in one or more containers.

Essentially all of the features noted above apply to these embodiments as well, as relevant, e.g., with respect to suitable array substrates, inclusion of a single-stranded binding protein, and the like.

A variety of nucleic acid sequencing systems suitable for performing sequencing processes useful in the methods of the invention are known in the art (see, e.g., the references above detailing sequencing techniques) and/or are commercially available. Such sequencing systems can include a fluid handling system (e.g., for delivering reagents to a substrate comprising a concatemer array during the sequencing process) and a detection system.

For example, sequencing processes (e.g., those using the substrates described above and the compositions and methods of the invention) can be exploited in the context of a fluorescence optical system that is capable of illuminating the various locations on a substrate and of obtaining, detecting, and separately recording fluorescent signals from those locations (e.g., individually resolvable locations, each occupied by a different concatemeric template). Such systems typically employ one or more illumination sources that provide excitation light of appropriate wavelength(s) for the labels being used. An optical train directs the excitation light to the reaction region(s), collects the emitted fluorescent signals, and directs them to an appropriate detector or detectors. Additional components of the optical train can provide for separation of spectrally different signals (e.g., from different fluorescent labels) and direction of these separated signals to different portions of a single detector or to different detectors. Other components can provide for spatial filtering of optical signals, and for focusing and direction of the excitation and/or emission light to and from the substrate.

The methods described herein can further include computer-implemented processes and/or software, incorporated onto a computer-readable medium, that directs such processes. Thus, signal data generated by the reactions and optical systems described above is input or otherwise received into a computer or other data processor and subjected to one or more of the various processing steps or components. Once these processes are carried out, the resulting output of the computer-implemented processes can be produced in a tangible or observable format, e.g., printed in a user-readable report or displayed on a computer display. The resulting output can be stored in one or more databases for later evaluation, processing, reporting, or the like, or it can be retained by the computer or transmitted to a different computer for use in configuring subsequent reactions or data processing.

It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and the scope of the appended claims.

While the foregoing invention has been described in some detail for purposes of clarity and understanding, it will be clear to one skilled in the art, from a reading of this disclosure, that various changes in form and detail can be made without departing from the true scope of the invention. For example, all of the techniques and apparatus described above can be used in various combinations. All publications, patents, patent applications, and/or other documents cited in this application are incorporated by reference in their entirety for all purposes, to the same extent as if each individual publication, patent, patent application, and/or other document were individually indicated to be incorporated by reference for all purposes, including for the purpose of describing and disclosing the devices, compositions, formulations, and methods that are described in the publications and that might be used in connection with the presently described invention.

Claims (43)

1. A method for paired-end sequencing, comprising:
a) providing a nucleic acid concatemer comprising multiple sequential copies of: a first adapter region, a forward strand of a target nucleic acid sequence, a second adapter region different from the first adapter region, and a reverse strand of the target nucleic acid sequence complementary to the forward strand; and
b) performing a sequencing process to generate a paired-end read of the target nucleic acid sequence by:
i) hybridizing a first sequencing primer to the first adapter region and obtaining a first read of a first portion of the target nucleic acid sequence by sequencing from the first sequencing primer; and
ii) hybridizing a second sequencing primer to the second adapter region and obtaining a second read of a second portion of the target nucleic acid sequence by sequencing from the second sequencing primer;
wherein the first read and the second read constitute a paired-end read of the target nucleic acid sequence.
2. The method of claim 1, wherein the nucleic acid concatemer is generated by:
providing a circular nucleic acid molecule comprising a central region comprising the forward strand and the complementary reverse strand, the central region having two ends, wherein the forward strand is connected to the reverse strand at one end by a first connection region and at the other end by a second connection region; and
performing rolling circle amplification using the circular nucleic acid molecule as a template to produce the nucleic acid concatemer.
3. The method of claim 2, wherein the rolling circle amplification step is performed in solution.
4. The method of claim 2, wherein the rolling circle amplification step is performed on the surface of a solid support.
5. The method of claim 1, wherein step ii) is performed after step i).
6. The method of claim 5, wherein, after the first read is obtained, the nascent strand formed by sequencing from the first sequencing primer is removed prior to hybridizing the second sequencing primer to the second adapter region.
7. The method of claim 6, wherein the nascent strand formed by sequencing from the first primer is removed by cleavage and washing, exonuclease digestion, or denaturation.
8. The method of claim 5, wherein the nascent strand formed by sequencing from the first primer is not removed.
9. The method of claim 8, wherein the 3' end of the nascent strand formed by sequencing from the first primer is blocked prior to hybridizing the second sequencing primer to the second adapter region.
10. The method of claim 1, wherein step i) and step ii) are performed simultaneously.
11. The method of claim 10, wherein sequencing from the first sequencing primer and the second sequencing primer to obtain the first read and the second read comprises detecting a signal, wherein a signal contributing to the first read is distinguishable from a signal contributing to the second read based on its intensity.
12. The method of claim 11, wherein the first sequencing primer and the second sequencing primer are provided at different concentrations.
13. The method of claim 11, wherein the second sequencing primer is provided in the form of a mixture of an extendable oligonucleotide and a non-extendable oligonucleotide.
14. The method of claim 11, wherein the first sequencing primer and the second sequencing primer anneal to their respective adapter regions with significantly different efficiencies.
15. The method of claim 10, wherein sequencing from the first sequencing primer and the second sequencing primer to obtain the first read and the second read comprises mapping to a known reference sequence.
16. The method of claim 1, wherein sequencing from the first sequencing primer and the second sequencing primer comprises extending the first sequencing primer and the second sequencing primer with a strand displacement polymerase.
17. The method of claim 1, wherein sequencing from the first sequencing primer and the second sequencing primer is performed in the presence of a single strand binding protein.
18. The method of claim 1, wherein performing a sequencing process to generate a double-ended read of the target nucleic acid sequence comprises: synthesizing a first masking strand complementary to the forward strand prior to hybridizing the first sequencing primer to the first adapter region, and synthesizing a second masking strand complementary to the reverse strand prior to hybridizing the second sequencing primer to the second adapter region.
19. The method of claim 1, wherein sequencing from the first sequencing primer and the second sequencing primer comprises a sequencing-by-binding technique.
20. The method of claim 1, wherein the nucleic acid concatemer comprises at least 100 sequential copies of the first adapter region, the forward strand, the second adapter region, and the reverse strand.
21. A method for nucleic acid sequencing, comprising:
providing a nucleic acid concatemer comprising multiple sequential copies of: a first adapter region, a forward strand of a target nucleic acid sequence, a second adapter region different from the first adapter region, and a reverse strand of the target nucleic acid sequence complementary to the forward strand;
hybridizing a masking primer to the second adapter region and extending the masking primer to produce a first masking strand complementary to the forward strand, wherein the first masking strand is not also complementary to the entire first adapter region; and
hybridizing a first sequencing primer to the first adapter region and sequencing from the first sequencing primer to obtain a first read of a first portion of the target nucleic acid sequence.
22. The method of claim 21, wherein extending the masking primer comprises extending the masking primer with a strand displacement polymerase.
23. The method of claim 22, comprising hybridizing an oligonucleotide that blocks strand displacement to the first adapter region prior to extending the masking primer.
24. The method of claim 21, wherein the first adapter region comprises at least one non-natural nucleotide, wherein the masking primer extends under conditions that exclude the complement of the at least one non-natural nucleotide.
25. The method of claim 21, comprising introducing a cut into the first adapter region prior to extending the masking primer.
26. The method of claim 25, comprising hybridizing one end of a staple-type oligonucleotide to the first adapter region and hybridizing the other end of the staple-type oligonucleotide to the second adapter region.
27. The method of claim 25, wherein the 5' end of the first sequencing primer hybridizes to the second adapter region.
28. The method of claim 25, wherein the 5' end of the masking primer hybridizes to the first adapter region.
29. The method of claim 21, further comprising:
after sequencing from the first sequencing primer, further extending the nascent strand produced by sequencing from the first sequencing primer to produce a second masking strand complementary to the reverse strand, wherein the second masking strand is not also complementary to the entire second adapter region;
removing the first masking strand; and
hybridizing a second sequencing primer to the second adapter region and obtaining a second read of a second portion of the target nucleic acid sequence by sequencing from the second sequencing primer.
30. The method of claim 29, wherein the masking primer comprises a 5' phosphate group, and wherein removing the first masking strand comprises digesting the first masking strand with lambda exonuclease.
31. The method of claim 21, wherein the first sequencing primer comprises a 5' region and a 3' region that each hybridize to the first adapter region and that flank a central region that does not hybridize to the first adapter region, wherein a portion of the first adapter region remains single stranded when the first sequencing primer hybridizes to the first adapter region; the method further comprising:
further extending the nascent strand generated by sequencing from the first sequencing primer to generate a first extended strand complementary to the reverse strand, wherein the first extended strand is not also complementary to the entire second adapter region;
hybridizing a displacement primer to the single stranded portion of the first adapter region;
extending the displacement primer with a strand displacement polymerase to displace the first extended strand from the reverse strand, wherein the 5' region of the first extended strand remains hybridized to the first adapter region; and
hybridizing a second sequencing primer to the first extended strand and obtaining a second read of a second portion of the target nucleic acid sequence by sequencing from the second sequencing primer.
32. The method of claim 21, wherein the nucleic acid concatemer is generated by:
providing a circular nucleic acid molecule comprising a central region comprising the forward strand and the complementary reverse strand, the central region having two ends, wherein the forward strand is connected to the reverse strand at one end by a first connection region and at the other end by a second connection region; and
performing rolling circle amplification using the circular nucleic acid molecule as a template to produce the nucleic acid concatemer.
33. The method of claim 21, wherein the rolling circle amplification step is performed in solution.
34. The method of claim 21, wherein the rolling circle amplification step is performed on the surface of a solid support.
35. The method of claim 21, wherein sequencing from the first sequencing primer comprises a sequencing-by-binding technique.
36. A composition, comprising:
an array of nucleic acid concatemers, each nucleic acid concatemer bound to a surface and comprising multiple copies of: a first adapter region comprising a first sequencing primer binding site, a forward strand of a target nucleic acid sequence, a second adapter region different from the first adapter region, and a reverse strand of the target nucleic acid sequence complementary to the forward strand.
37. The composition of claim 36, wherein the second adapter region comprises a second sequencing primer binding site.
38. The composition of claim 36, comprising a first sequencing primer that hybridizes to the first sequencing primer binding site.
39. The composition of claim 36, comprising a second sequencing primer that hybridizes to the second sequencing primer binding site.
40. The composition of claim 36, comprising a polymerase.
41. The composition of claim 36, comprising one or more of a masking primer, a blocking oligonucleotide, a displacement primer, a masking strand, and a displacement strand.
42. The composition of claim 36, wherein the composition is present in a DNA sequencing system.
43. A kit, comprising:
a solid support configured to bind a plurality of nucleic acid concatemers;
a first stem-loop adapter;
a second stem-loop adapter different from the first stem-loop adapter;
reagents for performing rolling circle amplification;
a first sequencing primer; and
reagents for performing nucleic acid sequencing.
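
As an illustration of the template recited in claims 1 and 2, the following is a minimal sketch in Python, not the applicant's implementation. It builds a toy concatemer from repeated units of first adapter, forward strand, second adapter, and reverse strand, then models each sequencing read as the reverse complement of the template bases immediately 5' of the adapter to which the primer is annealed. The sequences, the function names, and the simplification that a primer covers its whole adapter are all assumptions made for the example.

```python
# Toy model of double-ended reads from a rolling-circle concatemer.
# All sequences and helper names below are hypothetical illustrations.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def build_repeat_unit(adapter1: str, forward: str, adapter2: str) -> str:
    """One concatemer unit: adapter 1, forward strand, adapter 2, reverse strand."""
    return adapter1 + forward + adapter2 + reverse_complement(forward)


def build_concatemer(unit: str, copies: int) -> str:
    """Tandem copies of the unit, as rolling circle amplification would produce."""
    return unit * copies


def read_from_primer(template: str, adapter: str, read_len: int) -> str:
    """Model a read primed on `adapter`.

    The primer anneals to the adapter and is extended toward the template's
    5' end, so the nascent strand is the reverse complement of the bases
    immediately upstream (5') of the adapter site.
    """
    site = template.rindex(adapter)  # last copy, so upstream bases exist
    if site < read_len:
        raise ValueError("not enough template upstream of the adapter")
    return reverse_complement(template[site - read_len:site])


def paired_end_reads(template: str, adapter1: str, adapter2: str, read_len: int):
    """Return (first read, second read) primed on adapter 1 and adapter 2."""
    return (
        read_from_primer(template, adapter1, read_len),
        read_from_primer(template, adapter2, read_len),
    )


if __name__ == "__main__":
    a1, a2 = "CCCCCCCC", "GGAAGGAA"   # two different toy adapters
    target = "ACGATCAGTCAT"           # toy forward strand
    concatemer = build_concatemer(build_repeat_unit(a1, target, a2), copies=3)
    read1, read2 = paired_end_reads(concatemer, a1, a2, read_len=5)
    print(read1, read2)
```

In this toy model the first read reproduces the 5' end of the forward strand and the second read reproduces the 5' end of the reverse strand, so the two reads cover opposite ends of the target on opposite strands, which is what makes them a double-ended pair.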
CN202280059251.XA 2021-07-08 2022-07-07 Double-end sequencing method and composition Pending CN117940582A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/219,738 2021-07-08
US202163246188P 2021-09-20 2021-09-20
US63/246,188 2021-09-20
PCT/US2022/036374 WO2023283347A1 (en) 2021-07-08 2022-07-07 Paired-end sequencing methods and compositions

Publications (1)

Publication Number Publication Date
CN117940582A true CN117940582A (en) 2024-04-26

Family

ID=90768663

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280059251.XA Pending CN117940582A (en) 2021-07-08 2022-07-07 Double-end sequencing method and composition

Country Status (1)

Country Link
CN (1) CN117940582A (en)

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20240426