CN108728515A - Method for library construction and sequencing data analysis for detecting low-frequency ctDNA mutations using the duplex method - Google Patents
- Publication number
- CN108728515A (application number CN201810585283.9A)
- Authority
- CN
- China
- Prior art keywords
- sequence
- single strand
- connector
- strand dna
- ctdna
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6813—Hybridisation assays
- C12Q1/6827—Hybridisation assays for detection of mutation or polymorphism
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B50/00—Methods of creating libraries, e.g. combinatorial synthesis
- C40B50/06—Biochemical methods, e.g. using enzymes or whole viable microorganisms
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Organic Chemistry (AREA)
- Health & Medical Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Engineering & Computer Science (AREA)
- Biochemistry (AREA)
- Analytical Chemistry (AREA)
- Molecular Biology (AREA)
- Microbiology (AREA)
- Immunology (AREA)
- Biophysics (AREA)
- Biotechnology (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Chemical Kinetics & Catalysis (AREA)
- General Chemical & Material Sciences (AREA)
- Medicinal Chemistry (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention discloses a method for library construction and sequencing data analysis for detecting low-frequency ctDNA mutations using the duplex method. The invention provides a method for constructing a library for detecting low-frequency ctDNA mutations, comprising the following steps in order: (1) subjecting a ctDNA sample to end repair and then 3' A-tailing; (2) ligating the ctDNA treated in step (1) to an adapter mixture and performing PCR amplification to obtain the library. The invention has the following advantages: 1. Barcode tags are used and are reduced to 4 bp sequence combinations, which increases utilization and improves detection sensitivity; the tags are simple to synthesize, low in cost and easy to use, and they greatly improve the adapter ligation efficiency and the proportion of duplex-labeled molecules in the sequencing data. 2. The bioinformatics method used to identify duplexes quickly and effectively removes errors introduced during sequencing, capture and PCR, reducing false positives.
Description
Technical field
The present invention relates to a method for library construction and sequencing data analysis for detecting low-frequency ctDNA mutations using the duplex method.
Background art
ctDNA (circulating tumor DNA) refers to DNA fragments released by tumor cells into body fluids such as blood and cerebrospinal fluid, and is a characteristic tumor biomarker. Detection of ctDNA variants can be used for early tumor diagnosis, dynamic monitoring of tumor progression and treatment response, drug-resistance detection, recurrence risk assessment and so on. The ctDNA content of body fluids such as plasma is very low, usually less than 1% of total cfDNA, so variant detection is very difficult.
Technologies currently used for ctDNA mutation detection mainly include ARMS methods (chiefly Super-ARMS), next-generation sequencing (NGS) and digital PCR (dPCR, including BEAMing). Super-ARMS and digital PCR are simple and fast, with high specificity and sensitivity; their drawback is that they can detect only a limited number of known mutations, so throughput is low. NGS has high throughput, places no limit on the number of genes tested, and can detect both known and unknown mutations. However, NGS is technically complex, time-consuming and hard to standardize, and both library preparation and sequencing may introduce false positives that affect interpretation of the results.
False positives in NGS results come from three main sources: (1) PCR errors accumulated during library preparation; (2) sequencing errors generated during sequencing; (3) cross-sample contamination, chiefly the assignment of indexes to the wrong sample caused by an inherent technical defect of sequencing platforms such as HiSeq X and NovaSeq, also known as "index hopping".
In duplex technology, special adapter sequences introduce random tag sequences on both sides of the target fragment during library construction, marking the forward and reverse strands that derive from the same molecule. Double correction using the forward and reverse strands significantly reduces false positives caused by PCR and sequencing errors. The technology has two main problems when applied to ctDNA variant detection: (1) the purity of the adapter sequences is low, so initial molecules are lost during ligation; (2) the proportion of duplexes detected in the sequencing results is very low (about 10%-15%), so most molecules cannot undergo forward/reverse-strand correction.
Summary of the invention
The object of the invention is to provide a method for library construction and sequencing data analysis for detecting low-frequency ctDNA mutations using the duplex method.
The invention provides a method for constructing a library for detecting low-frequency ctDNA mutations, comprising the following steps in order:
(1) subjecting the ctDNA sample to end repair and then 3' A-tailing;
(2) ligating the ctDNA treated in step (1) to an adapter mixture and performing PCR amplification to obtain the library;
The adapter mixture consists of n adapters;
Each adapter is obtained by forming a partially double-stranded structure from one upstream primer A and one downstream primer A;
Upstream primer A contains barcode tag A; downstream primer A contains barcode tag B;
Barcode tag A and barcode tag B are reverse complements of each other;
Barcode tag A consists of A, T, C and G arranged in any order;
The n adapters use n different barcode tags A;
n is any natural number ≥ 8.
The first nucleotide at the 3' end of upstream primer A is a modified T; the purpose of the modification is to prevent exonuclease degradation. The 5' end of downstream primer A is phosphorylated.
Upstream primer A also contains adapter sequence 1, and downstream primer A also contains adapter sequence 2. Adapter sequence 1 and adapter sequence 2 are chosen according to the sequencing platform and are reverse complementary over part of their length. The n adapters use the same adapter sequence 1 and adapter sequence 2.
The partially double-stranded structure is formed by reverse-complementary pairing of barcode tag A and part of adapter sequence 1 with barcode tag B and part of adapter sequence 2.
From the 5' end, upstream primer A consists of adapter sequence 1, barcode tag A and the base T, in that order.
From the 5' end, downstream primer A consists of barcode tag B and adapter sequence 2, in that order.
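The oligo layout described above can be sketched in Python. This is only an illustration under stated assumptions, not part of the described method: the names `reverse_complement` and `build_adapter_oligos` are hypothetical, and the two constant segments are the 21 nt that precede or follow the 4 bp barcode in Sequences 1 and 2 of the sequence listing.

```python
# Constant adapter segments copied from Sequences 1 and 2 of the sequence listing.
ADAPTER_1 = "tacacgacgctcttccgatct"  # 5' part of every upstream oligo
ADAPTER_2 = "agatcggaagagcacacgtct"  # 3' part of every downstream oligo

_COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (lower case)."""
    return seq.lower().translate(_COMPLEMENT)[::-1]


def build_adapter_oligos(barcode_a: str) -> tuple[str, str]:
    """Build the two strands of one adapter from a 4 bp barcode.

    Upstream strand:   adapter 1 + barcode + T (for TA ligation)
    Downstream strand: reverse-complemented barcode + adapter 2
    """
    barcode_a = barcode_a.lower()
    if len(barcode_a) != 4 or set(barcode_a) - set("acgt"):
        raise ValueError("barcode must be 4 bases drawn from A, C, G, T")
    upstream = ADAPTER_1 + barcode_a + "t"
    downstream = reverse_complement(barcode_a) + ADAPTER_2
    return upstream, downstream


# The barcode "atgc" reproduces Sequences 3 and 4 of the sequence listing.
print(build_adapter_oligos("atgc"))
```

Chemical modifications (the protected 3' T and the 5' phosphate) are not represented in the strings; they would have to be specified separately when ordering the oligos.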
The adapter can be obtained by annealing upstream primer A and downstream primer A.
In the adapter mixture, the adapters are mixed in equimolar amounts.
When the sequencing platform is an Illumina platform, adapter sequence 1 may specifically be as shown in positions 1 to 21 from the 5' end of Sequence 1 of the sequence listing, and adapter sequence 2 may specifically be as shown in positions 5 to 26 from the 5' end of Sequence 2 of the sequence listing.
Specifically, n may be 12.
When n is 12, the barcode tags A of the 12 adapters are as shown in positions 22 to 25 from the 5' end of Sequences 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 and 23 of the sequence listing, respectively.
When n is 12, the 12 adapters are as follows:
Each adapter is obtained by forming a partially double-stranded structure from the two single-stranded DNA molecules shown in the following sequences of the sequence listing: adapter 1, Sequences 1 and 2; adapter 2, Sequences 3 and 4; adapter 3, Sequences 5 and 6; adapter 4, Sequences 7 and 8; adapter 5, Sequences 9 and 10; adapter 6, Sequences 11 and 12; adapter 7, Sequences 13 and 14; adapter 8, Sequences 15 and 16; adapter 9, Sequences 17 and 18; adapter 10, Sequences 19 and 20; adapter 11, Sequences 21 and 22; adapter 12, Sequences 23 and 24.
In the method, the primer pair used for PCR amplification consists of upstream primer B and downstream primer B; downstream primer B contains an index tag sequence.
Upstream primer B may specifically be as shown in Sequence 25 of the sequence listing.
From the 5' end, downstream primer B consists of segment A, the index tag sequence and segment B, in that order. Segment A is as shown in Sequence 26 of the sequence listing, and segment B is as shown in Sequence 27 of the sequence listing.
The method further comprises a step of performing target-region capture on the library after PCR amplification.
Target-region capture may use an existing commercial library capture kit, in which the panel may be replaced by any panel containing the target mutations.
The invention also protects a library prepared by any of the methods described above.
The invention also protects the use of any of the methods or libraries described above in detecting target mutations and their mutation frequencies in ctDNA samples.
The invention also protects a kit for constructing a library for detecting low-frequency ctDNA mutations, comprising any of the adapter mixtures described above.
The kit may further comprise materials for library construction, such as reagents for ctDNA extraction, reagents for DNA library construction, reagents for library purification and reagents for library capture.
The invention also protects a method for detecting target mutations and their mutation frequencies in a ctDNA sample, comprising the following steps:
(1) preparing a library according to any of the methods described above;
(2) sequencing the library to obtain sequencing results, and analyzing the target mutations and their mutation frequencies in the ctDNA sample from the sequencing results.
The sequencing results are analyzed as follows:
(a) the sequencing results are aligned to the reference genome hg19;
(b) a cluster of reads that has the same start and end positions in the genome and the same barcode tags comes from the same strand of the same molecule. The reads in a cluster are compared: at a given position, a base whose consistency across the cluster is higher than 80% is valid, and the most frequent base is taken as the correct base and retained; if the consistency is lower than 80%, the base is attributed to a sequencing, PCR or capture error, marked as N, and excluded from subsequent variant detection;
(c) two clusters of reads that have the same start and end positions in the genome and whose read1 and read2 barcode tags are swapped come from the forward and reverse strands of the same molecule, and are called a duplex. At a given genomic position, bases that agree between the duplex reads are considered correct, whereas bases that disagree are attributed to sequencing, PCR or capture errors, marked as N, and excluded from subsequent variant detection.
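The grouping logic in (b) and (c) can be sketched in Python as follows. This is a simplified illustration, not the actual pipeline: the record layout (dictionaries with `start`, `end`, `bc1`, `bc2`, `seq`) and the function names are hypothetical, and real data would be read from the aligned BAM file. A fragment whose two barcodes are identical cannot be split into strands by barcode order alone, so the sketch simply skips such families.

```python
from collections import defaultdict


def group_read_families(fragments):
    """Group fragments that share start, end and both barcodes (step b)."""
    families = defaultdict(list)
    for frag in fragments:
        key = (frag["start"], frag["end"], frag["bc1"], frag["bc2"])
        families[key].append(frag["seq"])
    return dict(families)


def pair_duplex_families(families):
    """Pair families whose read1/read2 barcodes are swapped (step c)."""
    pairs = []
    used = set()
    for key in families:
        start, end, bc1, bc2 = key
        partner = (start, end, bc2, bc1)
        # Skip families already paired, families with identical barcodes,
        # and families whose opposite strand was never sequenced.
        if key in used or partner == key or partner not in families:
            continue
        used.update((key, partner))
        pairs.append((key, partner))
    return pairs


fragments = [
    {"start": 100, "end": 265, "bc1": "agct", "bc2": "atgc", "seq": "ACGT"},
    {"start": 100, "end": 265, "bc1": "agct", "bc2": "atgc", "seq": "ACGT"},
    {"start": 100, "end": 265, "bc1": "atgc", "bc2": "agct", "seq": "ACGT"},
    {"start": 300, "end": 470, "bc1": "actg", "bc2": "tgac", "seq": "TTGA"},
]
families = group_read_families(fragments)
duplex_pairs = pair_duplex_families(families)
print(len(families), duplex_pairs)
```

In this toy input the first three fragments form one duplex (two reads on one strand, one on the other), while the fourth has no partner and would be handled as a single-strand family.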
The mutation frequency is calculated as follows: among the sequencing data covering the site, mutation frequency = data supporting the mutation / (data supporting the mutation + data supporting the wild type).
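As a small worked illustration of this formula, the following Python sketch computes the frequency from two counts. The function name and the example counts are hypothetical and are not taken from the described data.

```python
def mutation_frequency(mutant_reads: int, wildtype_reads: int) -> float:
    """Fraction of reads covering a site that support the mutation."""
    total = mutant_reads + wildtype_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return mutant_reads / total


# Hypothetical site: 20 corrected reads support the mutation, 19,980 the wild type.
print(f"{mutation_frequency(20, 19_980):.2%}")
```

With these illustrative counts the site sits at the 0.1% level that the method targets as its lower detection limit.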
In (a), the alignment may specifically be performed with the bwa software.
To calculate the mutation frequency, the samtools software may first be used to convert the aligned BAM file into an mpileup-format file, from which the mutation frequency is then calculated.
The ctDNA sample in any of the above may specifically be ctDNA from a human blood sample.
The mutation frequency of the low-frequency mutations in any of the above may be as low as 0.1%.
By adopting the above technical solution, the invention has the following advantages:
1. Barcode tags are used, which can effectively label and distinguish all ctDNA molecules in the original sample. In general, the amount of ctDNA extracted is only a few tens of nanograms and the insert size distribution is relatively concentrated (Fig. 1); barcode tags can effectively distinguish different ctDNA molecules, increasing utilization and improving detection sensitivity.
2. The barcode tags are reduced from the original 8 bp random tags to 12 fixed 4 bp sequence combinations, which are simple to synthesize, low in cost and easy to use. In addition, duplex library construction with this method greatly improves the adapter ligation efficiency (Fig. 2) and the proportion of duplex-labeled molecules in the sequencing data.
3. The bioinformatics method used by the invention to identify duplexes quickly and effectively removes errors introduced during sequencing, capture and PCR, reducing false positives.
Brief description of the drawings
Fig. 1 shows the fragment size distribution of ctDNA.
Fig. 2 shows the ligation efficiency of ctDNA molecules.
Fig. 3 shows the duplex rate.
Fig. 4 shows the detection accuracy for SNVs and indels.
Detailed description
The following examples facilitate a better understanding of the invention but do not limit it. Unless otherwise specified, the experimental methods in the examples are conventional methods, and the test materials used were purchased from conventional biochemical reagent suppliers. Quantitative tests in the examples were each performed in triplicate, and the results were averaged.
Example 1. Library construction
I. ctDNA sample quality control
The ctDNA samples were quantified with a Qubit nucleic acid quantifier, and the fragment distribution of the ctDNA was checked with an Agilent 2100 Bioanalyzer (Fig. 1) to ensure the absence of genomic DNA contamination.
II. Adapter preparation
1. Synthesize the 24 single-stranded DNAs shown in Table 1, where F denotes an upstream primer and R a downstream primer. The nucleotide at the 3' end of each upstream primer carries a thio modification (any other modification that prevents exonuclease degradation may be used instead). The nucleotide at the 5' end of each downstream primer is phosphorylated.
Table 1. Adapter sequence information
In Table 1, the upstream and downstream sequences in each group can be annealed to form a double-stranded adapter.
Underlined bases are the 4 bp barcode tags. Within each group, the barcode tags of the upstream and downstream sequences are reverse complements of each other; a barcode tag may be replaced by any 4 bp sequence in which the four bases A, T, C and G are randomly combined and present in balanced proportions;
The bold T is complementary to the "A" added to the ends of the initial molecules, enabling TA ligation;
Sequence that is neither underlined nor bold is the adapter sequence for the Illumina sequencing platform; if another sequencing platform is used, it may be replaced by the corresponding adapter sequence for that platform;
Table 1 contains 12 groups of adapters (i.e. 12 groups of barcode tags), whose purpose is: (1) to distinguish different ctDNA molecules; (2) to identify the forward and reverse strands of a single ctDNA molecule. The 12 groups of barcode tags can form 12 × 12 = 144 combinations which, together with the sequence information of the molecules themselves, suffice to distinguish all molecules in the original sample. In practice the number of groups may be increased (raising synthesis cost) or reduced (slightly weakening discrimination) as appropriate.
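The 12 × 12 = 144 figure can be checked with a short Python sketch. The barcode list is copied from positions 22 to 25 of the odd-numbered Sequences 1 to 23 in the sequence listing; the variable names are illustrative only.

```python
from itertools import product

# Barcode tags A of adapters 1-12 (Sequences 1, 3, ..., 23, positions 22-25).
BARCODES = [
    "agct", "atgc", "actg", "tgac", "tcga", "tacg",
    "gatc", "gcat", "gtca", "cagt", "ctag", "cgta",
]

# Each fragment receives one adapter at each end, so the label is an ordered pair.
combinations = list(product(BARCODES, repeat=2))

# Each barcode uses every base exactly once, i.e. the base content is balanced.
balanced = all(sorted(bc) == ["a", "c", "g", "t"] for bc in BARCODES)

print(len(combinations), balanced)
```

Because every barcode is a permutation of the four bases, up to 24 such tags exist, which leaves room to enlarge the set beyond 12 if finer discrimination is needed.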
2. Dissolve and dilute each single-stranded DNA in Table 1 with TE to a final concentration of 100 μM. Mix the two single-stranded DNAs of each group in equal volumes (total volume not exceeding 100 μl) and anneal them (annealing program: 95 °C, 30 min; 25 °C, 2 h) to obtain 12 DNA solutions. Mix the 12 DNA solutions in equimolar amounts to obtain the adapter mixture (stored at -20 °C).
III. Library construction and amplification
1. Take 45 ng of ctDNA and construct the library following the standard protocol of the KAPA Hyper DNA library preparation kit (KAPA, catalog no. KK8505), including end repair and A-tailing of the ctDNA and ligation of the ctDNA to the adapter mixture prepared in step II. Purify and recover the product with AMPure XP magnetic beads and elute in 20 μl of nuclease-free water; the eluate serves as the PCR template.
2. Using the PCR template obtained in step 1, prepare the reaction system according to Table 2 and perform PCR amplification according to Table 3 to obtain the PCR product. Purify and recover the product with AMPure XP magnetic beads and elute in 20 μl of nuclease-free water to obtain the library.
Table 2. Reaction system
F1 (5'-3'): AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT (Sequence 25);
R1 (5'-3'): CAAGCAGAAGACGGCATACGAGAT (Sequence 26)-XXXXXXXX-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (Sequence 27).
In primers F1 and R1, the underlined portions correspond to the adapters in Table 1 and are used for library amplification.
In primer R1, XXXXXXXX denotes the index tag sequence, which labels all molecules of the same sample; when multiple samples are sequenced together, they can be distinguished by their index tag sequences. The index tag sequence is 6-8 bp long.
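The structure of primer R1 can be illustrated with a short Python sketch that inserts a sample index between the two fixed segments. The function name `build_index_primer` and the example index are hypothetical; the fixed segments are Sequences 26 and 27 of the sequence listing.

```python
SEGMENT_A = "CAAGCAGAAGACGGCATACGAGAT"            # Sequence 26
SEGMENT_B = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"  # Sequence 27


def build_index_primer(index: str) -> str:
    """Return segment A + index + segment B, written 5' to 3'."""
    index = index.upper()
    if not 6 <= len(index) <= 8 or set(index) - set("ACGT"):
        raise ValueError("index must be 6-8 bases drawn from A, C, G, T")
    return SEGMENT_A + index + SEGMENT_B


# "ACGTACGT" is a made-up index used only for illustration.
primer = build_index_primer("ACGTACGT")
print(primer, len(primer))
```

Giving each sample its own index in this position is what allows pooled libraries to be separated again after sequencing.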
Table 3. Reaction program
IV. Target-region capture
Target-region capture was performed with the commercial ctDNA 63-gene panel enrichment kit from pangen gene Co., Ltd (Agilent SureSelect XT probes). For probe hybridization, the P5 and P7 blocking oligos from IDT were used in place of the blocking reagent in the original kit. The captured product was PCR-amplified with the KAPA HiFi PCR kit for 16 cycles.
Example 2. Sequencing and bioinformatics analysis
1. The library prepared in Example 1 was sequenced with paired-end 151 bp reads on an Illumina HiSeq X sequencer. The sequencing depth was about 20,000×.
2. Alignment: the sequencing reads were aligned to the hg19 genome with the bwa software.
3. Deduplication and error correction: a cluster of reads that has the same start and end positions in the genome and the same barcode sequences comes from the same strand of the same molecule. The reads in a cluster are compared: at a given position, a base whose consistency across the cluster is higher than 80% is valid, and the most frequent base is taken as the correct base and retained; if the consistency is lower than 80%, the base is attributed to a sequencing, PCR or capture error, marked as N, and excluded from subsequent variant detection. After error correction and deduplication, each cluster of reads thus yields a single corrected read.
4. Duplex identification and error correction: two clusters of reads that have the same start and end positions in the genome and whose read1 and read2 barcode sequences are swapped come from the forward and reverse strands of the same molecule. For example, a cluster whose read1 and read2 barcode sequences are A and B, and another cluster whose read1 and read2 barcode sequences are B and A, come from the forward and reverse strands of the same molecule. Such reads, derived from the two strands of one molecule, are called a duplex. Each of the two duplex clusters is first deduplicated and error-corrected to produce a corrected read, and the two corrected reads are then error-corrected against each other: at a given genomic position, bases that agree between the duplex reads are considered correct, whereas bases that disagree are attributed to sequencing, PCR or capture errors, marked as N, and excluded from subsequent variant detection.
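The second correction round in step 4 can be sketched in Python as follows. The sketch assumes that both corrected reads are already expressed in reference orientation (as aligned reads are) and cover the same positions; the function name is hypothetical.

```python
def duplex_consensus(strand_a: str, strand_b: str) -> str:
    """Combine the corrected reads of the two strands of one molecule.

    A base is kept only where both strands agree; disagreements and
    positions already masked in either strand become N.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("corrected reads must cover the same positions")
    return "".join(
        a if a == b and a != "N" else "N"
        for a, b in zip(strand_a, strand_b)
    )


# Position 3 disagrees between the strands and position 5 was already masked.
print(duplex_consensus("ACGTN", "ACTTA"))
```

A true mutation is present on both strands of the original molecule, whereas a PCR or sequencing error almost always affects only one of them, which is why requiring agreement removes most artefacts.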
5. Ligation efficiency and duplex rate statistics: count the proportion of molecules in the library input template that appear in the sequencing data (the ligation efficiency) and the proportion of molecules that are duplex-labeled (the duplex rate).
Twelve ctDNA samples (from healthy human blood samples) were tested with this method. The ligation efficiency was about 40% (Fig. 2) and the duplex rate about 60% (Fig. 3), clearly higher than the duplex rate of about 10% reported in the literature (CAPP).
Example 3. Method validation
SNV and indel detection was validated with the HD734 reference standard from Horizon (Tru-Q 7 (1.3% Tier) Reference Standard, catalog no. HD734). The standard contains 34 mutations at a frequency of about 1% (see Table 4; all mutation sites fall within the panel used in step IV of Example 1).
70 ng of the standard was dissolved in 100 μL of TE buffer (10 mM Tris-HCl, pH 8.0), and the DNA was sheared into 200 bp fragments with a Covaris M220. The sheared DNA fragments were recovered and set aside. The standard DNA was diluted with leukocyte DNA from healthy individuals to mutation frequencies of about 0.5% and 0.1%. The undiluted standard DNA and the two dilutions were used for the subsequent accuracy validation.
Table 4. Comparison of detection results for the standard (mutation frequency 1%)
Table 5. Comparison of detection results for the standard (mutation frequency 0.5%)
Table 6. Comparison of detection results for the standard (mutation frequency 0.1%)
In Tables 5 and 6, ND means not detected.
Detection was performed according to the methods of Examples 1 and 2, with 3 replicates.
The results show that, for SNVs and indels in the standard at mutation frequencies of 0.1%, 0.5% and 1%, the detection sensitivity was 95.10%, 97.06% and 100%, respectively. In addition, the detected mutation frequencies agreed closely with the expected mutation frequencies (Fig. 4, Tables 4-6).
Detection sensitivity: the standard contains 34 mutations and 3 replicates were performed, giving 102 mutations in total; sensitivity = number of mutations detected / 102 × 100%.
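The sensitivity formula can be written as a short Python function. The function name is hypothetical, and the detected counts used in the example (97 and 99 of 102) are inferred by working backwards from the reported percentages; they are not stated in the text.

```python
def detection_sensitivity(detected: int, mutations: int = 34, replicates: int = 3) -> float:
    """Sensitivity in percent: detected calls over all expected calls."""
    expected = mutations * replicates  # 34 x 3 = 102 in this validation
    if not 0 <= detected <= expected:
        raise ValueError("detected count out of range")
    return detected / expected * 100


# Counts inferred from the reported 95.10%, 97.06% and 100% sensitivities.
for detected in (97, 99, 102):
    print(detected, round(detection_sensitivity(detected), 2))
```

Under this reading, five of the 102 expected calls were missed at the 0.1% level and three at the 0.5% level.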
Expected mutation frequency: the mutation frequency in the standard, i.e. number of mutant molecules / (number of mutant molecules + number of wild-type molecules).
Detected mutation frequency: among the sequencing data covering the site, data supporting the mutation / (data supporting the mutation + data supporting the wild type).
Sequence listing
<110>Beijing pangen Gene Tech. Company Limited
<120>Method for library construction and sequencing data analysis for detecting low-frequency ctDNA mutations using the duplex method
<160> 27
<170> SIPOSequenceListing 1.0
<210> 1
<211> 26
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 1
tacacgacgc tcttccgatc tagctt 26
<210> 2
<211> 25
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 2
agctagatcg gaagagcaca cgtct 25
<210> 3
<211> 26
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 3
tacacgacgc tcttccgatc tatgct 26
<210> 4
<211> 25
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 4
gcatagatcg gaagagcaca cgtct 25
<210> 5
<211> 26
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 5
tacacgacgc tcttccgatc tactgt 26
<210> 6
<211> 25
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 6
cagtagatcg gaagagcaca cgtct 25
<210> 7
<211> 26
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 7
tacacgacgc tcttccgatc ttgact 26
<210> 8
<211> 25
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 8
gtcaagatcg gaagagcaca cgtct 25
<210> 9
<211> 26
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 9
tacacgacgc tcttccgatc ttcgat 26
<210> 10
<211> 25
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 10
tcgaagatcg gaagagcaca cgtct 25
<210> 11
<211> 26
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 11
tacacgacgc tcttccgatc ttacgt 26
<210> 12
<211> 25
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 12
cgtaagatcg gaagagcaca cgtct 25
<210> 13
<211> 26
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 13
tacacgacgc tcttccgatc tgatct 26
<210> 14
<211> 25
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 14
gatcagatcg gaagagcaca cgtct 25
<210> 15
<211> 26
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 15
tacacgacgc tcttccgatc tgcatt 26
<210> 16
<211> 25
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 16
atgcagatcg gaagagcaca cgtct 25
<210> 17
<211> 26
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 17
tacacgacgc tcttccgatc tgtcat 26
<210> 18
<211> 25
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 18
tgacagatcg gaagagcaca cgtct 25
<210> 19
<211> 26
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 19
tacacgacgc tcttccgatc tcagtt 26
<210> 20
<211> 25
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 20
actgagatcg gaagagcaca cgtct 25
<210> 21
<211> 26
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 21
tacacgacgc tcttccgatc tctagt 26
<210> 22
<211> 25
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 22
ctagagatcg gaagagcaca cgtct 25
<210> 23
<211> 26
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 23
tacacgacgc tcttccgatc tcgtat 26
<210> 24
<211> 25
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 24
tacgagatcg gaagagcaca cgtct 25
<210> 25
<211> 58
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 25
aatgatacgg cgaccaccga gatctacact ctttccctac acgacgctct tccgatct 58
<210> 26
<211> 24
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 26
caagcagaag acggcatacg agat 24
<210> 27
<211> 34
<212> DNA
<213>Artificial sequence (Artificial Sequence)
<400> 27
gtgactggag ttcagacgtg tgctcttccg atct 34
Claims (10)
1. A method for constructing a library for detecting low-frequency mutations in ctDNA, comprising the following steps in order:
(1) subjecting a ctDNA sample to end repair followed by A-tailing of the 3' ends;
(2) ligating the ctDNA treated in step (1) to an adapter mixture, and obtaining the library after PCR amplification;
The adapter mixture consists of n adapters;
Each adapter is obtained by forming a partially double-stranded structure from one upstream primer A and one downstream primer A;
Upstream primer A contains barcode tag A; downstream primer A contains barcode tag B;
Barcode tag A and barcode tag B are reverse complements of each other;
Barcode tag A is composed of A, T, C and G, arranged in any order;
The n adapters use n different barcode tags A;
n is any natural number ≥ 8.
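The barcode constraints of claim 1 can be checked mechanically. The following is a minimal sketch, not the applicant's tooling: it assumes that "A, T, C and G in any order" means each tag is a four-base ordering of the four nucleotides (an interpretation consistent with the tags visible in sequences 15 to 23), and the names `candidate_tags` and `check_mixture` are hypothetical.

```python
from itertools import permutations

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (result is upper case)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def candidate_tags() -> list[str]:
    """All 24 orderings of A, T, C, G (assumed reading of 'any order')."""
    return ["".join(p) for p in permutations("ATCG")]


def check_mixture(tag_pairs: list[tuple[str, str]], min_n: int = 8) -> bool:
    """Check (tag A, tag B) pairs against the constraints stated in claim 1."""
    tags_a = [a.upper() for a, _ in tag_pairs]
    enough = len(tag_pairs) >= min_n                      # n >= 8
    distinct = len(set(tags_a)) == len(tags_a)            # n different tags A
    paired = all(reverse_complement(a) == b.upper()       # tag B = revcomp(tag A)
                 for a, b in tag_pairs)
    return enough and distinct and paired


# Hypothetical 12-adapter design: the first 12 orderings, each with its partner.
mixture = [(tag, reverse_complement(tag)) for tag in candidate_tags()[:12]]
print(check_mixture(mixture))      # True
print(check_mixture(mixture[:7]))  # False, fewer than 8 adapters
```

The specific 12 tags chosen here are illustrative only; the tags actually claimed are those of the sequence listing.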
2. The method according to claim 1, characterized in that: the first nucleotide at the 3' end of upstream primer A is a modified T; the purpose of the modification is to prevent degradation by exonuclease; the 5' end of downstream primer A is phosphorylated.
3. The method according to claim 1 or 2, characterized in that: n = 12.
4. The method according to claim 3, characterized in that:
Barcode tags A of the 12 adapters are, respectively, positions 22 to 25 from the 5' end of sequence 1, sequence 3, sequence 5, sequence 7, sequence 9, sequence 11, sequence 13, sequence 15, sequence 17, sequence 19, sequence 21 and sequence 23 of the sequence listing.
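As an illustration of the position convention in claim 4, the sketch below slices positions 22 to 25 (1-based, counted from the 5' end) out of the upstream strands that appear in this part of the sequence listing (sequences 15, 17, 19, 21 and 23). The helper name `subsequence` is hypothetical; the sequences are copied from the listing with the spacing removed.

```python
# Upstream strands copied from the sequence listing (odd-numbered, 15 to 23).
UPSTREAM = {
    15: "tacacgacgctcttccgatctgcatt",
    17: "tacacgacgctcttccgatctgtcat",
    19: "tacacgacgctcttccgatctcagtt",
    21: "tacacgacgctcttccgatctctagt",
    23: "tacacgacgctcttccgatctcgtat",
}


def subsequence(seq: str, first: int, last: int) -> str:
    """Return positions first..last (1-based, inclusive) from the 5' end."""
    return seq[first - 1:last]


tags = {number: subsequence(seq, 22, 25) for number, seq in UPSTREAM.items()}
for number, tag in tags.items():
    print(f"sequence {number}: barcode {tag}")
```

Each of these five tags uses every one of the four bases exactly once, and position 26 is the 3'-terminal T referred to in claim 2.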
5. The method according to claim 3 or 4, characterized in that:
The 12 adapters are as follows:
Adapter 1 is obtained by forming a partially double-stranded structure from the single-stranded DNA molecule shown in sequence 1 and the single-stranded DNA molecule shown in sequence 2 of the sequence listing;
Adapter 2 is obtained by forming a partially double-stranded structure from the single-stranded DNA molecule shown in sequence 3 and the single-stranded DNA molecule shown in sequence 4 of the sequence listing;
Adapter 3 is obtained by forming a partially double-stranded structure from the single-stranded DNA molecule shown in sequence 5 and the single-stranded DNA molecule shown in sequence 6 of the sequence listing;
Adapter 4 is obtained by forming a partially double-stranded structure from the single-stranded DNA molecule shown in sequence 7 and the single-stranded DNA molecule shown in sequence 8 of the sequence listing;
Adapter 5 is obtained by forming a partially double-stranded structure from the single-stranded DNA molecule shown in sequence 9 and the single-stranded DNA molecule shown in sequence 10 of the sequence listing;
Adapter 6 is obtained by forming a partially double-stranded structure from the single-stranded DNA molecule shown in sequence 11 and the single-stranded DNA molecule shown in sequence 12 of the sequence listing;
Adapter 7 is obtained by forming a partially double-stranded structure from the single-stranded DNA molecule shown in sequence 13 and the single-stranded DNA molecule shown in sequence 14 of the sequence listing;
Adapter 8 is obtained by forming a partially double-stranded structure from the single-stranded DNA molecule shown in sequence 15 and the single-stranded DNA molecule shown in sequence 16 of the sequence listing;
Adapter 9 is obtained by forming a partially double-stranded structure from the single-stranded DNA molecule shown in sequence 17 and the single-stranded DNA molecule shown in sequence 18 of the sequence listing;
Adapter 10 is obtained by forming a partially double-stranded structure from the single-stranded DNA molecule shown in sequence 19 and the single-stranded DNA molecule shown in sequence 20 of the sequence listing;
Adapter 11 is obtained by forming a partially double-stranded structure from the single-stranded DNA molecule shown in sequence 21 and the single-stranded DNA molecule shown in sequence 22 of the sequence listing;
Adapter 12 is obtained by forming a partially double-stranded structure from the single-stranded DNA molecule shown in sequence 23 and the single-stranded DNA molecule shown in sequence 24 of the sequence listing.
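To make the "partially double-stranded structure" concrete, the sketch below estimates how many bases of each strand pair can anneal, using the five strand pairs visible in this part of the sequence listing (sequences 15/16 to 23/24). It rests on two assumptions that the claims do not state explicitly: the 3'-terminal T of the upstream strand is an unpaired overhang, and the paired region runs from the 5' end of the downstream strand. The function name `stem_length` is hypothetical.

```python
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lower-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.lower()))


# (upstream strand, downstream strand), copied from the sequence listing.
ADAPTERS = {
    8: ("tacacgacgctcttccgatctgcatt", "atgcagatcggaagagcacacgtct"),
    9: ("tacacgacgctcttccgatctgtcat", "tgacagatcggaagagcacacgtct"),
    10: ("tacacgacgctcttccgatctcagtt", "actgagatcggaagagcacacgtct"),
    11: ("tacacgacgctcttccgatctctagt", "ctagagatcggaagagcacacgtct"),
    12: ("tacacgacgctcttccgatctcgtat", "tacgagatcggaagagcacacgtct"),
}


def stem_length(upstream: str, downstream: str, overhang: int = 1) -> int:
    """Count contiguous base pairs formed at the ligation end of the adapter.

    The last `overhang` bases of the upstream strand are assumed unpaired.
    """
    paired_part = upstream[:len(upstream) - overhang]
    partner = reverse_complement(downstream)
    length = 0
    # Walk inward from the ligation end until the strands stop matching.
    for up_base, partner_base in zip(reversed(paired_part), reversed(partner)):
        if up_base != partner_base:
            break
        length += 1
    return length


for number, (up, down) in ADAPTERS.items():
    print(f"adapter {number}: {stem_length(up, down)} bp stem")
```

Under these assumptions the paired stem includes the four-base barcode, while the remaining bases of each strand are non-complementary and stay single-stranded, which is the usual Y-shaped adapter layout.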
6. A library prepared by the method of any one of claims 1-5.
7. Use of the method of any one of claims 1-5, or of the library of claim 6, in detecting target mutations and their mutation frequencies in a ctDNA sample.
8. A kit for constructing a library for detecting low-frequency mutations in ctDNA, comprising the adapter mixture described in any one of claims 1-5.
9. A method for detecting target mutations and their mutation frequencies in a ctDNA sample, comprising the following steps:
(1) preparing a library according to the method of any one of claims 1-5;
(2) sequencing the library to obtain sequencing results, and analyzing the target mutations and their mutation frequencies in the ctDNA sample from the sequencing results.
10. The method according to claim 9, characterized in that the sequencing results are analyzed as follows:
(a) the sequencing results are aligned to the reference genome hg19;
(b) reads that have the same start and end positions in the genome and carry the same barcode tags form a cluster and derive from the same strand of the same molecule; the reads of a cluster are compared position by position: where the consistency among the reads of the cluster at a position is higher than 80%, the position is valid and the most frequent base is retained as the correct base; where the consistency is lower than 80%, the base is attributed to sequencing, PCR or capture error, is marked N, and is excluded from subsequent variant detection;
(c) two clusters of reads that have the same start and end positions in the genome and whose read1 and read2 barcode tags are swapped derive from the plus and minus strands of the same molecule and are called a duplex; at each genomic position, a base on which the duplex reads agree is considered correct, and a base on which they disagree is attributed to sequencing, PCR or capture error, is marked N, and is excluded from subsequent variant detection;
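Steps (b) and (c) can be sketched as two consensus passes. This is an illustrative outline under stated assumptions, not the applicant's pipeline: reads are assumed to be already aligned and trimmed to equal length, both strands are assumed to be reported in reference orientation, and a position with exactly 80% agreement (a case the claim leaves open) is treated as N. The record layout and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_reads(records):
    """Group reads by (start, end, read1 tag, read2 tag), as in step (b)."""
    clusters = defaultdict(list)
    for start, end, tag1, tag2, seq in records:
        clusters[(start, end, tag1, tag2)].append(seq)
    return dict(clusters)


def single_strand_consensus(reads, min_agreement=0.8):
    """Keep the majority base where agreement exceeds the threshold, else N."""
    consensus = []
    for column in zip(*reads):  # one column per aligned position
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) > min_agreement else "N")
    return "".join(consensus)


def duplex_consensus(plus, minus):
    """Keep a base only where both strand consensuses agree, as in step (c)."""
    return "".join(a if a == b and a != "N" else "N" for a, b in zip(plus, minus))


def duplex_calls(records):
    """Pair each cluster with the cluster whose read1/read2 tags are swapped."""
    clusters = cluster_reads(records)
    calls = {}
    for (start, end, tag1, tag2), reads in clusters.items():
        partner = clusters.get((start, end, tag2, tag1))
        if partner is None or (start, end, tag2, tag1) in calls:
            continue  # no opposite strand observed, or pair already handled
        calls[(start, end, tag1, tag2)] = duplex_consensus(
            single_strand_consensus(reads), single_strand_consensus(partner)
        )
    return calls


# Toy data: one molecule, three plus-strand reads and two minus-strand reads.
records = [(100, 104, "gcat", "gtca", "ACGT")] * 3 + [
    (100, 104, "gtca", "gcat", "ACGT"),
    (100, 104, "gtca", "gcat", "ACTT"),
]
print(duplex_calls(records))  # {(100, 104, 'gcat', 'gtca'): 'ACNT'}
```

In the toy data the minus-strand reads disagree at the third position (one G, one T), so that position falls below the threshold and is masked in both the single-strand and the duplex consensus.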
The mutation frequency is calculated as follows: among the sequencing data covering the site, mutation frequency = data supporting the mutation / (data supporting the mutation + data supporting the wild type).
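The frequency formula translates directly into code. The sketch below assumes that "data" means counts of consensus molecules (or reads) covering the site, which the claim does not specify; the function name and the example counts are hypothetical.

```python
def mutation_frequency(mutant_support: int, wild_type_support: int) -> float:
    """mutant / (mutant + wild type) for the data covering one site."""
    total = mutant_support + wild_type_support
    if total == 0:
        raise ValueError("no data covers this site")
    return mutant_support / total


# Hypothetical site: 3 supporting the mutation, 997 supporting the wild type.
print(f"{mutation_frequency(3, 997):.4f}")  # 0.0030
```

Positions marked N in the consensus steps contribute to neither count, so they do not dilute the denominator.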
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810585283.9A CN108728515A (en) | 2018-06-08 | 2018-06-08 | A kind of analysis method of library construction and sequencing data using the detection ctDNA low frequencies mutation of duplex methods |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN108728515A (en) | 2018-11-02 |
Family
ID=63932572
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201810585283.9A Pending CN108728515A (en) | 2018-06-08 | 2018-06-08 | A kind of analysis method of library construction and sequencing data using the detection ctDNA low frequencies mutation of duplex methods |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN108728515A (en) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109439729A (en) * | 2018-12-27 | 2019-03-08 | 上海鲸舟基因科技有限公司 | Detect connector, connector mixture and the correlation method of low frequency variation |
| CN111073961A (en) * | 2019-12-20 | 2020-04-28 | 苏州赛美科基因科技有限公司 | High-throughput detection method for gene rare mutation |
| CN113718034A (en) * | 2021-09-27 | 2021-11-30 | 中国医学科学院肿瘤医院 | Marker, detection kit and detection method for guiding medication and curative effect evaluation of ovarian cancer platinum drug-resistant patient |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105861710A (en) * | 2016-05-20 | 2016-08-17 | 北京科迅生物技术有限公司 | Sequencing joint and preparation method and application thereof in ultra-low frequency mutation detection |
| CN106599616A (en) * | 2017-01-03 | 2017-04-26 | 上海派森诺医学检验所有限公司 | duplex-seq-based ultralow-frequency mutation site detection analysis method |
| CN106834275A (en) * | 2017-02-22 | 2017-06-13 | 天津诺禾医学检验所有限公司 | The analysis method of the construction method, kit and library detection data in ctDNA ultralow frequency abrupt climatic changes library |
Non-Patent Citations (1)
| Title |
|---|
| NEWMAN, A. M. et al.: "Integrated digital error suppression for improved detection of circulating tumor DNA", Nature Biotechnology * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US10619214B2 (en) | Detecting genetic aberrations associated with cancer using genomic sequencing | |
| CN107475375B (en) | A kind of DNA probe library, detection method and kit hybridized for microsatellite locus related to microsatellite instability | |
| KR102339760B1 (en) | Diagnosing fetal chromosomal aneuploidy using massively parallel genomic sequencing | |
| KR102028375B1 (en) | Systems and methods to detect rare mutations and copy number variation | |
| CN115029451B (en) | A sheep liquid phase chip and its application | |
| CN115198023B (en) | Hainan cattle liquid-phase breeding chip and application thereof | |
| CN106834507B (en) | DMD gene trap probe and its application in DMD detection in Gene Mutation | |
| US12416047B2 (en) | Noninvasive prenatal diagnostic methods | |
| CN109461473B (en) | Method and device for acquiring concentration of free DNA of fetus | |
| EP3564391A1 (en) | Method, device and kit for detecting fetal genetic mutation | |
| CN111073961A (en) | High-throughput detection method for gene rare mutation | |
| CN113564266B (en) | SNP typing genetic marker combination, detection kit and application | |
| CN111321209A (en) | Method for double-end correction of circulating tumor DNA sequencing data | |
| CN108728515A (en) | A kind of analysis method of library construction and sequencing data using the detection ctDNA low frequencies mutation of duplex methods | |
| CN115083521A (en) | Method and system for identifying tumor cell group in single cell transcriptome sequencing data | |
| US20230265496A1 (en) | Method for low frequency somatic cell mutation identification and quantification | |
| CN109920480B (en) | Method and device for correcting high-throughput sequencing data | |
| Edwards | Whole-genome sequencing for marker discovery | |
| Genner et al. | Haplotype-Resolved DNA Methylation at the APOE Locus identifies Allele-Specific Epigenetic Signatures Relevant to Alzheimer’s Disease Risk | |
| KR100450816B1 (en) | Selection method of probe set for genotyping | |
| CN109280697A (en) | The method for carrying out fetus genotype identification using pregnant woman blood plasma dissociative DNA | |
| HK40004815A (en) | Method and device for acquiring fetal free dna concentration | |
| HK40004815B (en) | Method and device for acquiring fetal free dna concentration | |
| CN113897427A (en) | Reagent and kit for detecting fetal FGA gene mutation | |
| CN118147344A (en) | Primer group and kit for identifying sunflower varieties and application of primer group and kit |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20181102 |