CN109439729A - Connector, connector mixture and corresponding method for detecting low-frequency variation - Google Patents
- Publication number: CN109439729A
- Application number: CN201811608440.XA
- Authority: CN (China)
- Prior art keywords: sequence, connector, base, chain, label
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6813—Hybridisation assays
- C12Q1/6827—Hybridisation assays for detection of mutation or polymorphism
Landscapes
- Chemical & Material Sciences (AREA)
- Organic Chemistry (AREA)
- Life Sciences & Earth Sciences (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Microbiology (AREA)
- Immunology (AREA)
- Physics & Mathematics (AREA)
- Molecular Biology (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- Analytical Chemistry (AREA)
- Biochemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The present invention relates to a connector (sequencing adapter) for detecting low-frequency variation. The connector comprises two complementary single-stranded DNAs. One strand, the P5 chain, comprises in order from the 5' end to the 3' end: a sequence that partially overlaps the upstream amplification primer; a sequence that binds the upstream sequencing primer; a molecular label taken from a combination of specific nucleotide sequences; and one protruding base T. The other strand, the P7 chain, comprises three parts in order from the 5' end to the 3' end: a molecular label reverse-complementary to the molecular label in the P5 chain; a sequence that binds the downstream sequencing primer; and a sequence that binds the downstream amplification primer. The invention further relates to a connector mixture and a corresponding method for detecting low-frequency variation. Library preparation and high-throughput sequencing of samples containing low-frequency variation and single-strand damage with the connector mixture of the invention, combined with the bioinformatics analysis pipeline and algorithm disclosed herein, effectively improve the accuracy of variant detection.
Description
Technical field
The present invention relates to the field of bioinformatics, more particularly to genetic testing, and in particular to a connector suitable for detecting low-frequency somatic variation and single-strand damage, and to a method of using it.
Background art
Compared with first-generation sequencing, second-generation sequencing can sequence millions or even more than ten billion sequences in parallel. This has greatly reduced sequencing cost and rapidly driven its adoption in scientific research, forensics and clinical practice. For example, cell-free DNA in maternal plasma contains fetal genetic information; low-depth whole-genome sequencing of a pregnant woman's plasma cell-free DNA (cfDNA) can detect fetal chromosomal abnormalities, and non-invasive prenatal screening has rapidly driven the development of the genetic testing industry. With the announcement of the U.S. "Precision Medicine Initiative", the domestic tumor gene testing service industry is also developing rapidly. Studies in recent years have shown that apoptotic or necrotic tumor cells release small intracellular DNA fragments into the blood circulation; this DNA is circulating tumor DNA (ctDNA). Compared with the traditional way of obtaining tumor specimens by surgery or tissue biopsy, ctDNA detection can overcome the spatial and temporal heterogeneity of tumors and is easy to repeat, making it the mainstream direction of liquid biopsy. However, whereas fetal DNA accounts for 4% or more of maternal plasma cfDNA at 12 weeks of gestation, ctDNA accounts for a very low proportion of total cfDNA in the plasma of tumor patients: depending on cancer type and disease course, the proportion is in most cases only 0.1%~1%. ctDNA detection therefore requires higher sensitivity and specificity. In current second-generation sequencing workflows, pre-library preparation, hybrid capture and sequencing inevitably introduce amplification and sequencing errors, as well as damage caused by long hybridization times, so that low-frequency mutations cannot be distinguished from background noise, leading to false positives or false negatives.
To improve sequencing error correction, researchers have proposed several new methods, which currently fall into two main classes: the single-strand circularization method and the molecular label method.
1. Single-strand circularization method
This method was proposed by Accuragen (安可济生物) and named Firefly technology. Its main principle is to first denature double-stranded cfDNA of about 170 bp into single-stranded DNA, circularize the single strands by ligation, and perform unidirectional rolling circle amplification (RCA) with a one-sided target-gene-specific primer, so that each DNA fragment is copied several times in tandem. P5/P7 connectors are then introduced for paired-end (PE) 150 bp sequencing, so that each insert is sequenced at least twice, and the repeated sequencing confirms whether a detected variant is real. The advantages of this method are that rolling circle amplification always copies the original molecule, so errors do not accumulate; that target regions are enriched by multiplex PCR, so no capture probes need to be synthesized, which simplifies the procedure; and that sequencing cost is lower than with the molecular label method. The disadvantages are that the result depends on the efficiency of single-strand circularization and of multiplex PCR, the number of primer pairs is limited, primer size is hard to control, and single-strand damage in the double-stranded template cannot be identified.
2. Molecular label method
The molecular label (Unique Molecular Identifier, UMI) is now a widely used method. Its principle is to add a distinctive label sequence to each original DNA template; after PCR amplification, the library is sequenced. During data analysis, the multiple fragments amplified from the same DNA template can be identified by their label sequence. From how a detected variant is distributed across these fragments, one can tell which variants are false positives caused by random errors in PCR amplification, hybrid capture and sequencing, and which are truly carried by the patient, thereby improving detection sensitivity and specificity.
According to where the molecular label is placed, molecular labels can be divided into single-strand molecular labels and duplex molecular labels.
A single-strand molecular label can label only single-stranded DNA molecules, or label the two strands of a double-stranded DNA separately; it cannot label both strands jointly. It is generally used in single-stranded DNA library construction, or when the molecular label sits on one of the protruding arms of a Y-type connector. Its advantage is that it greatly reduces false positives with a relatively small amount of sequencing. Its disadvantage is that the complementary strand of the original double-stranded template cannot be used for further error correction: if a PCR error occurs in an early cycle of exponential amplification, or if paraffin-embedded sample DNA (FFPE DNA) contains single-strand damage, the error cannot be detected with single-strand molecular labels alone and requires duplex molecular label technology.
Duplex molecular label technology was proposed in a paper published by Michael W. et al. in 2012. In that design, the end of the double-stranded Y-type connector carries 12 random nucleotides N as the molecular label, followed by 4 nucleotides of known sequence as an identification tag for the molecular label, followed by one protruding base A. The connector is joined by TA ligation to double-stranded DNA molecules that have had a T base added to their ends, so that each double-stranded DNA molecule receives a unique molecular label at each end. Different original templates can thus be distinguished, and the base-pairing of the sense and antisense strands can be used for further error correction. In 2014 Michael W. et al. improved the method by changing the protruding base of the connector to T, making it compatible with current mainstream library construction methods. However, the method involves multiple enzymatic reactions and several purification steps, connector preparation is cumbersome, the final connector concentration is hard to quantify accurately, the quality control steps are demanding of experimental conditions, and the success rate of connector preparation is low, which has hindered the application and adoption of duplex molecular label technology.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings of the above prior art by providing a Y-type connector with a novel duplex molecular label. The duplex molecular label sequences are a combination of specific nucleotide sequences, and the connector is very simple to prepare: multiple pairs of connectors containing specific-sequence molecular labels are annealed separately and then mixed in equal proportion. Library preparation and high-throughput sequencing of samples containing low-frequency variation and single-strand damage with this connector mixture, combined with the bioinformatics analysis pipeline and algorithm disclosed herein, can effectively improve the accuracy of variant detection.
To achieve the above purpose, one aspect of the present invention provides a connector for detecting low-frequency variation, constituted as follows:
The connector comprises two complementary single-stranded DNAs. One strand, the P5 chain, comprises in order from the 5' end to the 3' end: a sequence that partially overlaps the upstream amplification primer; a sequence that binds the upstream sequencing primer; a molecular label taken from a combination of specific nucleotide sequences; and one protruding base T. The other strand, the P7 chain, comprises three parts in order from the 5' end to the 3' end: a molecular label reverse-complementary to the molecular label in the P5 chain; a sequence that binds the downstream sequencing primer; and a sequence that binds the downstream amplification primer.
Preferably, in the P5 chain, the sequence that partially overlaps the upstream amplification primer and the sequence that binds the upstream sequencing primer may partially overlap, and the 3' end of the chain is thio-modified; in the P7 chain, the sequence that binds the downstream sequencing primer and the sequence that binds the downstream amplification primer may partially or even completely overlap, and the 5' end of the chain is phosphorylated.
Preferably, the upstream amplification primer and the downstream amplification primer include sample labels.
Preferably, the connector is a truncated Y-type connector; the length of the molecular label is 3~12 bp.
Preferably, the P5 chain has the nucleotide sequence shown in SEQ ID NO:3, the P7 chain has the nucleotide sequence shown in SEQ ID NO:4, the upstream amplification primer has the nucleotide sequence shown in SEQ ID NO:1, and the downstream amplification primer has the nucleotide sequence shown in SEQ ID NO:2.
The present invention provides a connector mixture comprising at least 8 of the above connectors mixed in proportion.
Preferably, at every position (column-wise) of the duplex molecular label combination in the connector mixture, all four bases are present; preferably, column-wise, the ratio of the four bases A:T:G:C in the molecular label combination is close to 1:1:1:1; row-wise, runs of 4 or more consecutive identical bases are avoided within each molecular label; preferably, runs of 2 or more consecutive G bases are avoided at the start of a molecular label.
Preferably, the duplex molecular labels of the connectors in the connector mixture differ from one another at 3 or more nucleotide positions; preferably, the duplex molecular labels of the connectors in the connector mixture must not all have the same length; preferably, the connectors in the connector mixture are mixed in equal proportion, or the mixing ratio of each connector is readjusted according to the ratios actually measured in sequencing data.
The present invention provides a method for detecting low-frequency somatic variation, comprising the following steps:
(1) synthesize the P5 chain and P7 chain of each connector pair in the connector mixture separately, anneal them to form Y-type connectors, and mix these in proportion to form the connector mixture;
(2) ligate the connector mixture carrying duplex molecular labels to the cell-free DNA fragment sample to obtain a ligation product, and PCR-amplify the ligation product with upstream and downstream amplification primers carrying sample labels to obtain an amplification product;
(3) perform hybrid capture on the target fragments in the amplification product to obtain a targeted capture library, perform high-depth paired-end sequencing on the targeted capture library, and split the data by sample according to the sample labels at both ends;
(4) perform quality control on the sequencing data, removing low-quality bases, low-quality reads and contaminating connector sequences, and at the same time correct the overlapping part of each read pair according to base quality, to obtain clean data;
(5) align the reads to the reference genome; at each alignment position, group the read pairs that have the same molecular label sequence, the same CIGAR string and the same alignment direction into one read pair family;
(6) for each read pair family, compute the single-strand consensus sequence (SSCS) using Bayes' theorem and recalculate the base quality values, reducing sequencing errors;
(7) for each SSCS generated, find the SSCS whose molecular label sequence is complementary and generate a duplex consensus sequence (DSCS), retaining the SSCS that cannot form a DSCS; move the alignment position to the next base and repeat steps (5)~(7);
(8) align the final consensus sequences to the reference genome and perform variant detection to obtain an initial variant set; annotate this variant set and filter on alignment errors, population databases and coding regions to obtain the final, reliable low-frequency somatic mutations.
Preferably, in step (1), the annealing conditions are: 95°C for 5 min, then slow cooling to 25°C at 0.02°C/sec; or 95°C for 5 min, after which the PCR instrument is switched off and the reaction is left to stand until it cools to room temperature;
Preferably, in step (2), the total input of the cell-free DNA fragment sample is 20~33 ng;
Preferably, in step (3), the sequencing depth is 10,000-30,000x; the sample labels at both ends are unique dual indexes (UDI), i.e. the sample label sequences at both ends all differ between samples;
Preferably, in step (5), the read pair family size threshold (read cluster size) is lowered to 2, so that more read pair families are generated; in addition, a read pair family that contains only one read pair but can form a DSCS with another SSCS is also used to generate the DSCS. The final output fastq file retains both the SSCS and DSCS sequences and their corresponding base quality values;
Preferably, in step (6), according to Bayes' theorem, the prior probability is determined as follows: if the observed base is consistent with the candidate true base, the prior probability is $1-10^{-q/10}$; otherwise it is $10^{-q/10}/3$, where q is the base quality value. This distribution is denoted $p(b, b_i, q_i)$. For each of the 4 possible bases, the posterior probability is calculated according to formula one. For each base position of the SSCS, the base/quality pairs $(b_i, q_i)$ of the read pair family are used to calculate the probability that the consensus base I is b (b ∈ {A, C, G, T}); the base with the highest probability is taken as the true base, which determines the true base at each position. The base quality is then recalculated from the posterior probability of the true base according to formula two, giving the error-corrected consensus reads.
$$q_c = -10\log_{10}\left(1 - P\left[I = b_c \mid \{(b_i, q_i)\}\right]\right) \qquad \text{(formula two)}$$
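Formula one itself does not appear in this text. The sketch below therefore assumes the usual form implied by the description: the posterior for a candidate base is the product of the per-read terms, normalised over the four bases (that is, a uniform prior over A, C, G and T). The function names, the cap `max_q` and the clamp on the error probability are illustrative assumptions, not part of the patent.

```python
import math

BASES = "ACGT"

def read_term(candidate, observed, q):
    """Per-read term from the text: 1 - 10^(-q/10) on a match, 10^(-q/10)/3 otherwise."""
    # Assumption: clamp the error probability at 0.75 so that a quality-0
    # base is simply uninformative instead of producing log(0).
    err = min(10 ** (-q / 10), 0.75)
    return 1 - err if observed == candidate else err / 3

def consensus_base(observations, max_q=60.0):
    """observations: list of (base, phred_quality) seen at one SSCS position.

    Returns (consensus_base, recalculated_quality), the latter per formula two.
    Assumes a uniform prior over the four bases; max_q is a hypothetical cap.
    """
    # Sum logs so that large families do not underflow.
    log_like = {
        b: sum(math.log(read_term(b, obs, q)) for obs, q in observations)
        for b in BASES
    }
    peak = max(log_like.values())
    weights = {b: math.exp(v - peak) for b, v in log_like.items()}
    total = sum(weights.values())
    best = max(BASES, key=lambda b: weights[b])
    # 1 - posterior(best), summed from the other bases to avoid cancellation.
    p_error = sum(w for b, w in weights.items() if b != best) / total
    if p_error <= 0:
        return best, max_q
    return best, min(max_q, -10 * math.log10(p_error))
```

For a family observing A (Q30), A (Q30) and C (Q20), this sketch returns A with a recalculated quality of about 45, higher than any single read's quality because two independent reads agree.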
Preferably, in step (7), when two SSCS generate a DSCS, the base is retained if the bases at the corresponding position are identical; otherwise the base at that position is changed to N.
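The DSCS rule in step (7) can be sketched in a few lines. The function name is hypothetical, and the sketch assumes both SSCS have already been brought into the same orientation and have the same length.

```python
def merge_sscs_to_dscs(sscs_a, sscs_b):
    """Combine two SSCS from complementary strands into one DSCS.

    Assumes both sequences are in the same (reference) orientation and are
    equally long. Positions where the strands disagree become N.
    """
    if len(sscs_a) != len(sscs_b):
        raise ValueError("SSCS lengths differ")
    return "".join(a if a == b else "N" for a, b in zip(sscs_a, sscs_b))
```

For example, `merge_sscs_to_dscs("ACGTA", "ACCTA")` gives `"ACNTA"`: a change present on only one strand, such as single-strand damage, is masked rather than called.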
Description of the drawings
Fig. 1 shows a schematic of the structure of the connector sequences and amplification primers of the present invention.
Fig. 2 shows the Agilent 2100 Bioanalyzer quality-control trace of the Duplex Adapter #10 annealed product.
Fig. 3 shows the Agilent 2100 Bioanalyzer quality-control trace of the connector mixture.
Fig. 4 shows the flow chart of the consensus sequence generation process.
Fig. 5 shows the detection sensitivity at different sequencing depths and lower limits of detection.
Fig. 6 shows the detection specificity at different sequencing depths and lower limits of detection.
Specific embodiments
To describe the technical content of the invention more clearly, it is further described below with reference to specific embodiments.
The present invention provides a Y-type connector with a duplex molecular label. The duplex molecular label sequences are a combination of specific nucleotide sequences, and the connector is very simple to prepare: multiple pairs of connectors containing specific-sequence molecular labels are annealed separately and then mixed in equal proportion. Library preparation and high-throughput sequencing of samples containing low-frequency variation and single-strand damage with this connector mixture, combined with the bioinformatics analysis pipeline and algorithm disclosed herein, can effectively improve the accuracy of variant detection.
The Y-type connector mixture with duplex molecular labels provided by the invention is an equal-proportion mixture of a group of connectors. Each connector in the mixture is formed by annealing two single-stranded DNAs, one named the P5 chain and the other the P7 chain.
As shown in Fig. 1, the P5 chain comprises four functional parts in order from the 5' end to the 3' end: a sequence S2 that partially overlaps the upstream amplification primer S1, a sequence S3 that binds the upstream sequencing primer, a molecular label S4 taken from the combination of specific nucleotide sequences, and one protruding base T. The 3' end of the chain is thio-modified. Sequences S2 and S3 may partially or even completely overlap.
As shown in Fig. 1, the P7 chain comprises three functional parts in order from the 5' end to the 3' end: a sequence S5 reverse-complementary to the molecular label S4 (with a phosphorylated 5' end), a sequence S6 that binds the downstream sequencing primer, and a sequence S8 that partially binds the downstream amplification primer S7. Sequences S6 and S8 may partially or even completely overlap.
The S3+S4 sequence of the P5 chain and the S5+S6 sequence of the P7 chain are partially reverse-complementary, so a Y-type connector forms after annealing. The P5 chain S9 and the P7 chain S10 are synthesized separately and annealed, and the annealed connectors are then mixed in equal proportion to form the connector mixture. Within the mixture, the connectors are identical except for the duplex molecular labels, i.e. S4 and S5. The duplex molecular label sequences are specific nucleotide sequences rather than random nucleotide sequences; therefore, unlike the prior art, no identification sequence for the molecular label needs to be added next to the molecular label sequence.
It should be noted that, for cost reasons, the connector of the invention does not include the sample label used to distinguish different samples; it is a truncated connector. The sample labels are introduced during PCR amplification by the upstream amplification primer S1 and the downstream amplification primer S7. The upstream and downstream amplification primers also contain sequences complementary to the sequences on the sequencing flow cell, which are used for the cluster generation reaction. The truncated connector of the invention is therefore used together with the upstream and downstream amplification primers.
The sequence of the upstream amplification primer S1 is shown in SEQ ID NO:1:
5'-AATGATACGGCGACCACCGAGATCTACACNNNNNNNNACACTCTTTCCCTACACGAC-3'
The sequence of the downstream amplification primer S7 is shown in SEQ ID NO:2:
5'-CAAGCAGAAGACGGCATACGAGATNNNNNNNNGTGACTGGAGTTCAGACGTGT-3'
Here, the "NNNNNNNN" sequences are the sample labels at the P5 end and the P7 end of the library, respectively; each is a combination of 8 nucleotides. The sample label sequences at the P5 end and the P7 end differ from each other, and the sample labels also differ between samples. Sample labels distinguish the different samples sequenced in the same run; because some sequencer models are prone to cross-talk between samples, it is necessary to add dual sample labels to each sample.
The full-length sequence S9 of the P5 chain is shown in SEQ ID NO:3:
5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNT-3'
The full-length sequence S10 of the P7 chain is shown in SEQ ID NO:4:
5'-NNNNNNGATCGGAAGAGCACACGTCTGAACTCCAGTCAC-3'
Here, "NNNNNN" is the molecular label sequence, 3~12 nucleotides long; for each connector pair it is a specific sequence rather than a random sequence. Within one connector pair in the mixture, the molecular label sequences of the P5 chain and the P7 chain are reverse-complementary and pair completely when annealed into the duplex molecular label. The 3' end of the P5 chain is thio-modified and the 5' end of the P7 chain is phosphorylated.
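As a minimal sketch of how one connector pair follows from a chosen molecular label, the code below substitutes the label into SEQ ID NO:3 and its reverse complement into SEQ ID NO:4. The constant and function names are hypothetical, and the chemical modifications (3' thio-modification, 5' phosphorylation) are not represented.

```python
# Fixed parts taken from SEQ ID NO:3 and SEQ ID NO:4 (label positions removed).
P5_PREFIX = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
P7_SUFFIX = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def build_connector(label):
    """Return (P5 chain, P7 chain), both written 5'->3', for one molecular label."""
    if not 3 <= len(label) <= 12 or set(label) - set("ACGT"):
        raise ValueError("label must be 3-12 nt of A/C/G/T")
    p5 = P5_PREFIX + label + "T"                   # label, then the protruding T
    p7 = reverse_complement(label) + P7_SUFFIX     # reverse-complementary label first
    return p5, p7
```

With the label `ACGTCA`, the P5 chain ends in `...GATCTACGTCAT` and the P7 chain starts with `TGACGTGATCGG...`.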
The duplex molecular label is 3~12 bp long, so in theory $4^3+4^4+4^5+\dots+4^{11}+4^{12}$ different duplex molecular labels are possible. Preferably, the duplex molecular label combination is color-balanced, i.e. every channel detects a signal in every sequencing cycle; in other words, within the connector mixture, all four bases are present at each position of the duplex molecular label combination. Most high-throughput sequencers use two-color lasers. Some are four-channel and detect the four nucleotides with four different optical channels; others are two-channel, where A and C each carry one fluorophore, T carries two and G carries none.
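The theoretical number of labels of length 3 to 12 is a geometric series and can be evaluated in closed form:

```latex
\sum_{k=3}^{12} 4^{k} \;=\; \frac{4^{13}-4^{3}}{4-1} \;=\; \frac{67\,108\,864-64}{3} \;=\; 22\,369\,600
```

Only a small, carefully chosen subset of these roughly 22 million sequences is used in a connector mixture.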
Preferably, base complexity is balanced across the duplex molecular label combination. Column-wise, the ratio of the four bases A:T:G:C in each group of molecular labels is close to 1:1:1:1. Row-wise, runs of 4 or more consecutive identical bases are avoided within each molecular label. For two-channel sequencers in particular, runs of 2 or more consecutive G bases are avoided at the start of a molecular label; otherwise, when sequencing reaches these bases, the unbalanced base composition hampers the software's processing of the sequencing signal and the bases cannot be identified accurately.
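A minimal sketch of these design checks follows. The function names are hypothetical, and the column-wise check only tests that all four bases occur at positions shared by every label; it does not score how close the ratio is to 1:1:1:1.

```python
from collections import Counter

def longest_run(seq):
    """Length of the longest run of identical consecutive bases."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

def check_label_set(labels):
    """Return a list of problems found; an empty list means all checks pass."""
    problems = []
    for label in labels:
        if longest_run(label) >= 4:
            problems.append(f"{label}: run of 4 or more identical bases")
        if label.startswith("GG"):
            problems.append(f"{label}: starts with 2 or more G")
    # Column-wise check, limited to positions covered by every label.
    shortest = min(len(label) for label in labels)
    for pos in range(shortest):
        seen = Counter(label[pos] for label in labels)
        missing = set("ACGT") - set(seen)
        if missing:
            problems.append(f"position {pos + 1}: missing {''.join(sorted(missing))}")
    return problems
```

For example, the set `ACGT, CGTA, GTAC, TACG` passes all checks, while a set containing `GGAT` or `AAAA` is flagged.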
Preferably, the edit distance between the duplex molecular labels in the connector mixture is not less than 3, i.e. any two duplex molecular labels differ at 3 or more nucleotide positions, so that at least 3 sequencing errors are needed to cause cross-talk between molecular labels.
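The edit-distance requirement can be checked pairwise. The sketch below uses the Levenshtein distance (substitutions, insertions and deletions), which is one common reading of "edit distance" and also handles labels of different lengths; the function names are hypothetical.

```python
from itertools import combinations

def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]

def too_close_pairs(labels, min_distance=3):
    """Return (label_a, label_b, distance) for every pair closer than min_distance."""
    close = []
    for a, b in combinations(labels, 2):
        d = edit_distance(a, b)
        if d < min_distance:
            close.append((a, b, d))
    return close
```

An empty result from `too_close_pairs` means the label set meets the stated minimum distance.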
Preferably, the connector mixture contains no fewer than 8 duplex molecular labels. With paired-end sequencing this gives at least 8x8=64 combinations. Because the probability that two genomic DNA fragments break at the same start and end positions on the reference genome is very low, a small number of combinations is already enough to tell whether reads come from the same original template molecule.
Preferably, although the connectors in the connector mixture are mixed in equal proportion, the ligation reaction has sequence preference, so the actually measured proportions of the molecular labels are unequal; the mixing ratio of the connectors then needs to be readjusted according to the proportions actually measured in the sequencing data.
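The text does not say how the ratio is readjusted. One simple possibility, shown here purely as an assumption, is to treat each label's observed share as proportional to its mixing share times a fixed ligation efficiency, and then to mix in inverse proportion to that efficiency.

```python
def rebalanced_fractions(mixed, observed):
    """Suggest new mixing fractions from the fractions mixed and observed.

    mixed, observed: dicts mapping label -> fraction.
    Assumption (not from the patent): observed share is proportional to
    mixing share times a constant per-label ligation efficiency.
    """
    weights = {}
    for label, fraction in mixed.items():
        if observed[label] <= 0:
            raise ValueError(f"label {label} was not observed")
        efficiency = observed[label] / fraction
        weights[label] = 1.0 / efficiency
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}
```

For two labels mixed 50:50 but observed 60:40, this suggests remixing at 40:60.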
Preferably, the duplex molecular labels in the connector mixture must not all have the same length. Every molecular label is followed by one protruding base T; if all molecular labels had the same length, the base read in the cycle after the label would be T in every read, a severe base imbalance that lowers sequencing data quality.
The present invention provides a method for detecting low-frequency somatic variation in tumor blood samples using duplex molecular label encoding, comprising the following steps:
(1) according to the features of the connector described above, synthesize the P5 chain and P7 chain of each connector pair in the connector mixture separately and dilute them to a given concentration with annealing buffer; mix the P5 chain and P7 chain at a 1:1 molar ratio and anneal them to form double-stranded Y-type connectors;
(2) mix the annealed connector pairs in equimolar amounts to form the connector mixture, and dilute it to the working concentration;
(3) take a given amount of cfDNA extracted from a tumor blood sample and ligate it to the connectors carrying duplex molecular labels at a given ratio, to obtain a ligation product;
(4) PCR-amplify the ligation product with upstream and downstream amplification primers carrying sample labels, to obtain an amplification product;
(5) perform hybrid capture on the target fragments in the amplification product with probes, to obtain a targeted capture library;
(6) perform high-depth paired-end sequencing on the targeted capture library, and split the data by sample according to the sample labels at both ends;
(7) first perform quality control on each sample's sequencing data after splitting, removing low-quality bases, low-quality reads and contaminating connector sequences, and at the same time correct the overlapping part of each read pair according to base quality, to obtain clean data;
(8) align the above reads to the reference genome; at each alignment position, group the read pairs that have the same molecular label sequence, the same CIGAR string and the same alignment direction into one read pair family;
(9) for each read pair family, determine the base sequence using Bayes' theorem and generate one SSCS. The prior probability is determined as follows: if the observed base is consistent with the candidate true base, the prior probability is $1-10^{-q/10}$; otherwise it is $10^{-q/10}/3$, where q is the base quality value. This distribution is denoted $p(b, b_i, q_i)$. For each of the 4 possible bases, the posterior probability is calculated according to formula one. For each base position of the SSCS, the base/quality pairs $(b_i, q_i)$ of the read pair family are used to calculate the probability that the consensus base I is b (b ∈ {A, C, G, T}); the base with the highest probability is taken as the true base, which determines the true base at each position. The base quality is then recalculated from the posterior probability of the true base according to formula two, giving the error-corrected consensus read pair.
$$q_c = -10\log_{10}\left(1 - P\left[I = b_c \mid \{(b_i, q_i)\}\right]\right) \qquad \text{(formula two)}$$
(10) for each SSCS, find the SSCS whose molecular label sequence is complementary and generate a DSCS, retaining the SSCS that cannot form a DSCS. Move the alignment position to the next base and repeat steps (8)~(10);
(11) align the final consensus sequences to the reference genome and perform variant detection, to obtain an initial variant set;
(12) annotate the above variant set and filter on alignment errors, population databases and coding regions, to obtain the final, reliable low-frequency somatic mutations.
In step (1), the annealing buffer contains Tris, EDTA, NaCl, etc. The annealing conditions are 95°C for 5 min, then slow cooling to 25°C at 0.02°C/sec; alternatively, 95°C for 5 min, after which the PCR instrument is switched off and the reaction is left to stand until it cools to room temperature.
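As a quick check of what the controlled ramp implies for run time, the cooling phase from 95°C to 25°C at 0.02°C/sec takes:

```latex
t_{\text{ramp}} \;=\; \frac{95\,^{\circ}\mathrm{C} - 25\,^{\circ}\mathrm{C}}{0.02\,^{\circ}\mathrm{C/s}} \;=\; 3500\ \mathrm{s} \;\approx\; 58\ \mathrm{min}
```

Together with the 5 min hold at 95°C, the controlled program therefore runs for a little over one hour.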
In step (3), the cfDNA extraction kit is the QIAamp Circulating Nucleic Acid Kit (Qiagen); the total amount of cfDNA is 20 ng~33 ng, i.e. 6,000~10,000 haploid genome copies; the ratio of connector to cfDNA fragments is 100:1~200:1; the ligation reaction is followed by a purification step using Agencourt AMPure XP magnetic beads (Beckman Coulter).
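The input amounts above can be converted into copy numbers and connector quantities with some standard approximations. The constants below (about 3.3 pg per haploid human genome, about 650 g/mol per base pair, 170 bp fragments) are assumptions introduced for this sketch and are not given in the text, and the ratio is read as a molar ratio.

```python
HAPLOID_GENOME_PG = 3.3   # assumed mass of one haploid human genome, in pg
BP_MOLAR_MASS = 650.0     # assumed average g/mol per base pair of dsDNA
FRAGMENT_BP = 170         # cfDNA fragment size quoted in the background section

def haploid_copies(input_ng):
    """Approximate number of haploid genome copies in input_ng of DNA."""
    return input_ng * 1000 / HAPLOID_GENOME_PG

def fragment_pmol(input_ng, fragment_bp=FRAGMENT_BP):
    """Approximate picomoles of double-stranded fragments in input_ng of DNA."""
    grams = input_ng * 1e-9
    return grams / (fragment_bp * BP_MOLAR_MASS) * 1e12

def connector_pmol(input_ng, ratio):
    """Picomoles of connector needed for a connector:fragment molar ratio."""
    return fragment_pmol(input_ng) * ratio
```

Under these assumptions, 20 ng corresponds to about 6,000 haploid copies and about 0.18 pmol of fragments, so a 100:1 ratio needs about 18 pmol of connector; 33 ng corresponds to about 10,000 copies, matching the range stated above.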
In step (4), the nucleotide sequence of the upstream amplification primer is shown in SEQ ID NO:1 and that of the downstream amplification primer is shown in SEQ ID NO:2; the PCR uses 5~10 cycles, keeping the cycle number as low as possible while still yielding enough amplification product.
In step (5), the probes are biotin-labelled; they may be DNA probes or RNA probes; the probe length is 50 to
120 nt; the total amount of amplification product used for hybrid capture is 500 to 750 ng.
In step (6), the paired-end read length is 2 x 75 bp or 2 x 150 bp; the sequencing depth is 10,000x to
30,000x; the paired-end sample indexes are unique dual indexes (UDI), i.e. neither of the two sample index
sequences is shared between different samples.
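The UDI requirement can be checked on a sample sheet before sequencing. The sketch below is illustrative only: the function name, the sample names and the index sequences are hypothetical, and the two indexes of each sample are simply called the first and second index.

```python
def is_unique_dual_index(sample_indexes: dict) -> bool:
    """True if neither the first nor the second index is shared by two samples."""
    first = [pair[0] for pair in sample_indexes.values()]
    second = [pair[1] for pair in sample_indexes.values()]
    return len(set(first)) == len(first) and len(set(second)) == len(second)

# Hypothetical index sequences, for illustration only.
udi = {"S1": ("ACGTACGT", "TTGCAAGC"), "S2": ("GTCAGTCA", "CCATGGTA")}
combinatorial = {"S1": ("ACGTACGT", "TTGCAAGC"), "S2": ("ACGTACGT", "CCATGGTA")}

print(is_unique_dual_index(udi))            # True
print(is_unique_dual_index(combinatorial))  # False: the first index is reused
```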
In step (8), the conventional method requires each read pair family to contain at least 3 read pairs to count
as a valid family that is used to generate an SSCS, and only the DSCS formed by two SSCS with complementary
molecular tag sequences is retained for variant detection, so data utilization is low. In the present
invention, a read pair family containing 2 read pairs is a valid family; in addition, a family containing only
1 read pair is also retained as valid data if it can pair with another SSCS to form a DSCS, which greatly
improves data utilization.
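The retention rule for read pair families can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the `Family` record, its `key` and `mate_key` identifiers, and the choice that a singleton is rescued only when its opposite-strand partner is itself a valid family are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Family:
    key: str        # hypothetical identifier (position + tag + direction)
    mate_key: str   # identifier the opposite-strand family would carry
    size: int       # number of read pairs in the family

def keep_families(families, min_size=2):
    """Keep families with size >= min_size, plus singletons that have a duplex partner."""
    by_key = {f.key: f for f in families}
    kept = []
    for f in families:
        if f.size >= min_size:
            kept.append(f)
        elif f.size == 1:
            mate = by_key.get(f.mate_key)
            # assumption: the partner must itself be able to yield an SSCS
            if mate is not None and mate.size >= min_size:
                kept.append(f)
    return kept

fams = [
    Family("chr1:100|AB|+", "chr1:100|BA|-", 3),
    Family("chr1:100|BA|-", "chr1:100|AB|+", 1),  # singleton with a partner
    Family("chr1:200|CD|+", "chr1:200|DC|-", 1),  # singleton without a partner
]
print([f.key for f in keep_families(fams)])
```

With the conventional threshold of 3 and no singleton rescue, only the first family in this toy input would survive, and no DSCS could be formed from it.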
In step (9), when an SSCS is generated for a read pair family, the base at each position must be determined.
The conventional method uses the majority rule: the proportion of each base (A, T, G, C) at the position is
calculated, and if one base exceeds 70% it is taken as the true base, with the higher base quality value among
the supporting reads used as the final quality value. This method is relatively simple. The present invention
instead uses Bayes' theorem to calculate the probability that each base is the true base; the base with the
highest probability is taken as the true base and the base quality value is calculated from that probability,
making the consensus bases more accurate and reliable.
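The Bayesian base call at one position can be sketched as below. This is a minimal illustration, not the actual program: it assumes that observations are independent, that the four bases are equally likely a priori, and that an observation with quality q matches the true base with probability 1 - 10^(-q/10) and is each of the other three bases with probability 10^(-q/10)/3. The cap of 93 on the recalculated quality and the clamp on the error rate are choices made here for numerical safety.

```python
import math

BASES = "ACGT"

def consensus_base(observations):
    """observations: list of (base, phred_quality) at one position of a family.
    Returns (consensus base, recalculated Phred quality)."""
    log_like = {}
    for b in BASES:
        total = 0.0
        for obs, q in observations:
            err = min(10 ** (-q / 10), 0.75)  # clamp so that q = 0 stays finite
            total += math.log(1 - err) if obs == b else math.log(err / 3)
        log_like[b] = total
    # normalise in a numerically stable way (uniform prior over the bases)
    m = max(log_like.values())
    weights = {b: math.exp(v - m) for b, v in log_like.items()}
    z = sum(weights.values())
    best = max(BASES, key=lambda b: weights[b])
    p_err = sum(w for b, w in weights.items() if b != best) / z
    qual = 93.0 if p_err <= 0 else min(93.0, -10 * math.log10(p_err))
    return best, qual

base, qual = consensus_base([("A", 30), ("A", 30), ("C", 20)])
print(base, round(qual, 1))
```

In this toy input, two Q30 observations of A outweigh one Q20 observation of C, and the recalculated quality exceeds that of any single read. A fixed 70% majority threshold, by contrast, would leave this position uncalled.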
The main advantages of the present invention include:
The adapter of the invention carries a double-stranded molecular tag. When the technique is used to detect
low-frequency mutations, both the sense and antisense strands of the original template can therefore be used,
unlike circularization tandem-repeat consensus methods and single-stranded molecular tag methods, to further
correct early amplification errors and single-strand damage errors introduced during hybrid capture;
The double-stranded molecular tags in the adapters are a set of specific nucleotide sequences rather than
random nucleotide sequences, so the adapters are easy and economical to prepare: only simple annealing and
mixing are needed, not the multi-step enzymatic and purification reactions generally required in the prior art;
Because the double-stranded molecular tags are a set of specific nucleotide sequences, as few as 8 molecular
tags in the adapter mixture give 8 x 8 = 64 paired-end combinations, which is sufficient to determine whether
sequencing reads with the same start and end positions on the reference genome come from the same original
template molecule; the $4^{12} \times 4^{12} \approx 2.8 \times 10^{14}$ combinations of the prior art are not
needed;
The adapter of the invention does not need a recognition sequence for the molecular tag, and the molecular tag
is shorter than the 12-nucleotide sequence of the prior art, so the effective read length is increased and the
sequencing cost is reduced;
The invention reasonably lowers the read pair family threshold to 2 and also makes use of read pair families
containing only one read pair that can form a DSCS with another SSCS, which greatly improves utilization of the
raw sequencing data;
The invention uses Bayes' theorem to calculate the probability of each base at each position, chooses the base
with the highest probability as the consensus base and recalculates the base quality value from that
probability, which effectively reduces random sequencing errors from the sequencer and errors introduced during
PCR amplification.
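The two combination counts quoted in the list above can be verified directly. The variable names are illustrative.

```python
fixed_tags = 8
fixed_combinations = fixed_tags * fixed_tags        # one tag at each end

random_tag_len = 12
random_combinations = (4 ** random_tag_len) ** 2    # 4^12 x 4^12 = 4^24

print(fixed_combinations)              # 64
print(f"{random_combinations:.1e}")    # 2.8e+14
```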
It should be understood that, within the scope of the present invention, the technical features described above
and those specifically described below (e.g. in the embodiments) may be combined with one another to form new
or preferred technical solutions. For reasons of space, they are not listed one by one here.
The present invention replaces the existing random nucleotide sequences with specific nucleotide sequences and
optimizes those sequences. Using this technique together with an independently developed bioinformatic analysis
algorithm, low-frequency somatic variants in tumor blood samples can be detected more effectively. Specific
embodiments of the invention are described in further detail below with reference to the drawings and examples.
It should be understood that these examples serve only to illustrate the invention and do not limit its scope.
Embodiment 1: Preparation of adapters with 7 bp + 8 bp double-stranded molecular tags
Design the sequences of the P5 and P7 strands of the adapters carrying double-stranded molecular tags: the
adapter mixture is designed to contain 16 molecular tags, shown in Table 1 below. Of the 16 tags, 7 pairs are 7
nucleotides long and the remaining 9 pairs are 8 nucleotides long, so the two lengths are offset by 1 nt.
Across the first 7 nucleotides of the tag set, the base ratio at each position (read down the column) is
A:G:T:C = 1:1:1:1, and any two molecular tags differ at no fewer than 3 nucleotides.
Table 1
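The design rules for the tag set (balanced base composition at each of the first 7 positions, at least 3 differences between any two tags, no run of 4 identical bases, no GG at the start) can be checked mechanically. The sketch below is illustrative: the demo tags are hypothetical and are not the Table 1 sequences, and comparing tags of different length over their shared prefix is an assumption made here.

```python
from collections import Counter
from itertools import combinations

def hamming_prefix(a: str, b: str) -> int:
    """Mismatches over the shared prefix (tags may be 7 or 8 nt long)."""
    n = min(len(a), len(b))
    return sum(x != y for x, y in zip(a[:n], b[:n]))

def check_tag_set(tags, prefix_len=7, min_diff=3):
    """Return a list of rule violations; an empty list means the set passes."""
    problems = []
    for i in range(prefix_len):
        counts = Counter(t[i] for t in tags)
        if len({counts[b] for b in "ACGT"}) != 1:
            problems.append(f"column {i + 1} is unbalanced: {dict(counts)}")
    for a, b in combinations(tags, 2):
        if hamming_prefix(a, b) < min_diff:
            problems.append(f"{a} and {b} differ at fewer than {min_diff} positions")
    for t in tags:
        if any(base * 4 in t for base in "ACGT"):
            problems.append(f"{t} contains a run of 4 identical bases")
        if t.startswith("GG"):
            problems.append(f"{t} starts with GG")
    return problems

# Hypothetical tags for illustration; these are NOT the Table 1 sequences.
demo = ["ACGTACG", "CGTACGTA", "GTACGTA", "TACGTACG"]
print(check_tag_set(demo))  # []
```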
Centrifuge the tubes of synthesized oligonucleotides at 12,000 g for 1 minute to bring the dry powder to the
bottom, open the caps carefully, dissolve the powder to 250 μM in Low TE Buffer (10 mM Tris-HCl (pH 8.0),
0.1 mM EDTA), vortex to mix and leave in a 4 °C refrigerator overnight. Prepare 5x Annealing Buffer according to
Table 2.
Table 2
Each adapter pair is mixed according to the system in Table 3 below; the final adapter concentration is 100 μM.
Table 3
Place the PCR tubes on a GeneAmp 9700 PCR instrument (Applied Biosystems), incubate at 95 °C for 5 minutes, then
switch the PCR instrument off directly and remove the tubes after standing for 1 hour.
Take 1 μL of annealed product from each tube, dilute it and check quality on an Agilent 2100 Bioanalyzer.
Taking adapter #10 as an example, the peak profile of the annealed product is shown in Fig. 2. Because the
adapter is Y-shaped, the peak appears slightly larger than the expected fragment size.
Mix equal volumes of the 16 tubes of annealed product, take 1 μL of the mixture, dilute it and check quality on
an Agilent 2100 Bioanalyzer; the peak profile of the annealed product is shown in Fig. 3.
Dilute the prepared double-stranded molecular tag adapters to working concentration, aliquot in small volumes
and store frozen at -20 °C until use, avoiding repeated freeze-thaw cycles.
Embodiment 2: Detection of low-frequency somatic variants in reference standards and tumor blood samples
Prepare reference standard DNA: the Horizon Discovery reference standards HD701 and HD753 are diluted to
different ratios with DNA from the normal cell line NA18536; the mixtures at each dilution and their expected
variant frequencies are shown in Table 4 and Table 5. The DNA mixtures are sheared by Covaris S220 sonication to
a main peak of about 170 bp, similar to the main peak size of cfDNA.
Table 4
Table 5
Prepare cfDNA: cfDNA is extracted with the QIAamp Circulating Nucleic Acid Kit (Qiagen) from plasma separated
from patient whole blood.
Pre-library preparation: taking the KAPA Hyper Prep Kit (Roche) as an example, 33 ng each of plasma cfDNA and
sheared reference standard DNA is used for pre-library preparation with the kit. After end repair and
A-tailing, the adapters prepared in Embodiment 1 are ligated; after magnetic bead purification, PCR
amplification is performed, with the paired-end sample indexes introduced by the upstream and downstream
primers, whose sequences are shown in SEQ ID NO:1 and SEQ ID NO:2.
Targeted capture library preparation: taking a large panel of DNA probes synthesized by IDT (Integrated DNA
Technologies) as an example, the panel covers all variant sites in Table 5 and Table 6. The procedure is as
follows: Human Cot-1 DNA and Adapter Blocker are added to 500 ng of pre-library, which is hybridized with the
DNA probes at 65 °C for 4-16 hours and then added to M270 magnetic beads and incubated at 65 °C for 45 minutes,
so that the streptavidin-coated M270 beads bind the biotin-labelled probes fully. The beads are then washed
several times with wash buffers of different ionic strength and temperature to remove non-target fragments not
bound to probes. The target fragments captured by the DNA probes are PCR-amplified and purified with magnetic
beads to give the targeted capture library.
After passing quality inspection, the targeted capture library is sequenced at 2 x 75 bp or 2 x 150 bp on an
Illumina sequencer; the sequencing depth of the raw data is 10,000x to 30,000x, and the data are demultiplexed
using the paired-end sample indexes.
For the sequencing data, fastp is first used to remove low-quality bases, contaminating adapter sequences and
low-quality reads. If R1 and R2 overlap, the overlapping part can be error-corrected according to base quality.
A C++ program is used to compute quality control metrics such as total data volume, mapping rate, on-target
rate and coverage depth.
An initial alignment is performed with BWA, and read pair families are determined from alignment position,
molecular tag sequence, CIGAR string and alignment direction. The probability of each base at each position of
the consensus sequence is calculated according to Bayes' theorem to determine the true sequence, and a DSCS is
then generated from each pair of SSCS with complementary molecular tag sequences.
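The grouping of read pairs into families by alignment position, molecular tag, CIGAR string and direction can be sketched with a dictionary keyed on those fields. This is an illustration only: the `ReadPair` record, the way the two tags are joined into one string, and the coordinates are hypothetical, and a real pipeline would read these fields from the BWA alignment records.

```python
from collections import defaultdict
from typing import NamedTuple

class ReadPair(NamedTuple):
    name: str
    chrom: str
    pos: int      # alignment start position
    tag: str      # molecular tag sequences read from both ends
    cigar: str
    strand: str   # alignment direction, "+" or "-"

def group_families(read_pairs):
    """Group read pairs that share position, tag, CIGAR and direction."""
    families = defaultdict(list)
    for rp in read_pairs:
        families[(rp.chrom, rp.pos, rp.tag, rp.cigar, rp.strand)].append(rp.name)
    return dict(families)

# Hypothetical records for illustration.
pairs = [
    ReadPair("r1", "chr7", 1000, "ACGTACG-TACGTACG", "75M", "+"),
    ReadPair("r2", "chr7", 1000, "ACGTACG-TACGTACG", "75M", "+"),
    ReadPair("r3", "chr7", 1000, "TACGTACG-ACGTACG", "75M", "-"),
]
families = group_families(pairs)
print({key: len(names) for key, names in families.items()})
```

Here r1 and r2 form one family of size 2, while r3, which carries the tags in the opposite order and maps in the opposite direction, forms a separate family that could serve as the duplex partner.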
The error-corrected sequences are re-aligned, variants are detected and the results annotated, and the final
variant set is obtained after filtering.
Detection sensitivity (PPA) and specificity (PPV) were calculated with HD701 Mix1 and Mix2 at LOD (limit of
detection) values of 0.001, 0.002 and 0.005; the results are shown in Table 6. M1 denotes Mix1 and M2 denotes
Mix2. The number after the sample name is the detection limit; for example, HD701M1_0.001 means LOD = 0.001.
TP stands for true positive; ignore is the number of sites whose variant frequency is below the detection
limit; FP stands for false positive and FN for false negative. The method shows good sensitivity and
specificity.
Table 6
sample | totalSNP | TP | ignore | FP | FN | PPA | PPV |
HD701M1_0.001 | 251 | 247 | 0 | 14 | 4 | 98.41% | 94.64% |
HD701M2_0.001 | 251 | 247 | 0 | 9 | 4 | 98.41% | 96.48% |
HD701M1_0.002 | 251 | 247 | 0 | 9 | 4 | 98.41% | 96.48% |
HD701M2_0.002 | 251 | 246 | 1 | 5 | 4 | 98.40% | 98.01% |
HD701M1_0.005 | 251 | 244 | 4 | 6 | 3 | 98.79% | 97.60% |
HD701M2_0.005 | 251 | 239 | 9 | 5 | 3 | 98.76% | 97.95% |
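The PPA and PPV columns of Table 6 appear to follow from the counts as TP/(TP+FN) and TP/(TP+FP). These formulas are inferred from the table values rather than stated in the text; the short check below reproduces the first row.

```python
def ppa(tp: int, fn: int) -> float:
    """Positive percent agreement (sensitivity)."""
    return tp / (tp + fn)

def ppv(tp: int, fp: int) -> float:
    """Positive predictive value."""
    return tp / (tp + fp)

# First row of Table 6: HD701M1_0.001 (TP=247, FP=14, FN=4)
print(f"PPA={ppa(247, 4):.2%}  PPV={ppv(247, 14):.2%}")  # PPA=98.41%  PPV=94.64%
```

In every row TP + ignore + FN adds up to totalSNP, so the ignored sites are excluded from the PPA denominator.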
By lowering the read pair family size threshold and retaining read pair families with a single read pair that
can form a DSCS with another SSCS, the method greatly improves data utilization. As shown in Table 7, compared
with the original duplex method (which needs at least three read pairs to form a read pair family and retains
only DSCS sequences), this method increases the amount of valid data available for variant detection (6.526 G
versus 1.104 G) and improves coverage depth (1954.61 versus 383.88) and sensitivity (98.41% versus 95.62%).
Compared with using a single-ended UMI alone (70.06%), the specificity of this method is much higher (94.64%),
while good detection sensitivity (98.41%) is still achieved, demonstrating the detection advantage of the
method.
Table 7
method | data_size(G) | reads | ontarget | mean_cov | PPA | PPV |
raw data | 52.579 | 350,531,218 | - | - | - | - |
Original duplex | 1.104 | 8,095,816 | 94.3239 | 383.88 | 95.62% | 98.36% |
sinotools | 6.526 | 48,896,444 | 84.5106 | 1954.61 | 98.41% | 94.64% |
single | 7.974 | 59,591,152 | 85.8695 | 2448.31 | 98.80% | 70.06% |
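The gain in data utilization can be quantified from the data_size column of Table 7. The sketch below assumes that data_size(G) is the data volume remaining after each strategy and that the raw data row is the common starting point.

```python
raw_gb = 52.579
retained_gb = {"Original duplex": 1.104, "sinotools": 6.526, "single": 7.974}

for name, gb in retained_gb.items():
    print(f"{name}: {gb / raw_gb:.1%} of raw data retained")

fold = retained_gb["sinotools"] / retained_gb["Original duplex"]
print(f"sinotools vs original duplex: {fold:.1f}x")
```

On these figures the present method keeps about 12% of the raw data against about 2% for the original duplex approach, roughly a sixfold difference.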
Six low-frequency sites were selected from the detected mutations for ddPCR verification. As shown in Table 8,
5 of them were positive and the frequencies agreed well. For the remaining site the variant frequency is at the
ddPCR detection limit: ddPCR detected a weak positive signal that could not be confirmed, so a negative result
is reported. This verification shows that the method is highly consistent with ddPCR and can accurately detect
low-frequency variants.
Table 8
Gene | amino acid | This method | ddPCR |
EGFR | p.T790M | 0.3% | 0.33% |
EGFR | p.T790M | 2.3% | 1.82% |
EGFR | p.T790M | 0.1% | - |
EGFR | p.L858R | 0.3% | 0.89% |
EGFR | p.L858R | 2.3% | 4.60% |
KRAS | p.G12D | 2% | 1.80% |
A downsampling experiment was carried out to simulate how detection sensitivity and specificity change with
sequencing depth. At a detection limit of LOD = 0.005, detection sensitivity reaches its best level once the
sequencing depth reaches 1300X. At LOD = 0.001 or LOD = 0.002, detection sensitivity reaches its best level
once the sequencing depth reaches 1800X.
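The text does not say how the downsampling was performed. One common approach, shown here purely as an assumption, is to keep each read at random with probability equal to the ratio of target depth to current depth. The depths and read names below are illustrative.

```python
import random

def downsample(read_ids, current_depth, target_depth, seed=0):
    """Keep each read with probability target/current (capped at 1)."""
    fraction = min(1.0, target_depth / current_depth)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return [r for r in read_ids if rng.random() < fraction]

reads = [f"read{i}" for i in range(10000)]
subset = downsample(reads, current_depth=2000, target_depth=1300)
print(len(subset) / len(reads))
```

Repeating the variant calling on subsets at several target depths would then show where sensitivity stops improving.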
In this specification, the invention has been described with reference to specific embodiments. It is evident,
however, that various modifications and changes can still be made without departing from the spirit and scope
of the invention. The description and drawings are therefore to be regarded as illustrative rather than
restrictive.
Sequence listing
<110> Shanghai Jing Zhou Gene Tech. Company Limited
<120> Adapter, adapter mixture and corresponding method for detecting low-frequency variation
<141> 2018-12-27
<160> 4
<170> SIPOSequenceListing 1.0
<210> 1
<211> 57
<212> DNA
<213> artificial sequence
<400> 1
aatgatacgg cgaccaccga gatctacacn nnnnnnnaca ctctttccct acacgac 57
<210> 2
<211> 53
<212> DNA
<213> artificial sequence
<400> 2
caagcagaag acggcatacg agatnnnnnn nngtgactgg agttcagacg tgt 53
<210> 3
<211> 40
<212> DNA
<213> artificial sequence
<400> 3
acactctttc cctacacgac gctcttccga tctnnnnnnt 40
<210> 4
<211> 39
<212> DNA
<213> artificial sequence
<400> 4
nnnnnngatc ggaagagcac acgtctgaac tccagtcac 39
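The Y shape of the adapter can be seen from the two adapter strands in the listing: only a short stretch next to the tag is complementary, while the remaining arms are not. The check below takes the 40 nt and 39 nt sequences to be the P5 and P7 strands, which is an assumption based on their lengths and layout. It confirms the declared lengths and locates the complementary stem.

```python
COMPLEMENT = str.maketrans("acgtn", "tgcan")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

# Sequences copied from the listing, with the grouping spaces removed.
p5 = "acactctttccctacacgacgctcttccgatctnnnnnnt"
p7 = "nnnnnngatcggaagagcacacgtctgaactccagtcac"

stem_p7 = p7[6:18]                       # 12 nt directly after the tag in P7
stem_p5 = reverse_complement(stem_p7)    # what P5 must contain to pair with it
print(len(p5), len(p7))                  # 40 39
print(stem_p5, p5.find(stem_p5))         # gctcttccgatc 20
```

The first 20 nt of the 40 nt strand and the last 21 nt of the 39 nt strand lie outside this stem and form the two unpaired arms.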
Claims (10)
1. An adapter for detecting low-frequency variation, characterized in that the adapter comprises two
complementary single DNA strands, wherein one strand, the P5 strand, comprises in order from the 5' end to the
3' end: a sequence that partially overlaps the upstream amplification primer and a sequence that binds the
upstream sequencing primer, a molecular tag from a set of specific nucleotide sequences, and 1 overhanging T
base; the other strand, the P7 strand, comprises three parts in order from the 5' end to the 3' end: a
molecular tag that is the reverse complement of the molecular tag in the P5 strand, a sequence that binds the
downstream sequencing primer, and a sequence that binds the downstream amplification primer.
2. The adapter for detecting low-frequency variation according to claim 1, characterized in that, in the P5
strand, the sequence that partially overlaps the upstream amplification primer and the sequence that binds the
upstream sequencing primer may partially overlap each other, and the 3' end carries a phosphorothioate
modification; in the P7 strand, the sequence that binds the downstream sequencing primer and the sequence that
binds the downstream amplification primer may partially or even completely overlap, and the 5' end is
phosphorylated.
3. The adapter for detecting low-frequency variation according to claim 1, characterized in that the upstream
amplification primer and the downstream amplification primer contain sample indexes.
4. The adapter for detecting low-frequency variation according to claim 1, characterized in that the adapter is
a truncated Y-shaped adapter; the length of the molecular tag is 3 to 12 bp.
5. The adapter for detecting low-frequency variation according to claim 1, characterized in that the P5 strand
has the nucleotide sequence shown in SEQ ID NO:3, the P7 strand has the nucleotide sequence shown in SEQ ID
NO:4, the upstream amplification primer has the nucleotide sequence shown in SEQ ID NO:1, and the downstream
amplification primer has the nucleotide sequence shown in SEQ ID NO:2.
6. An adapter mixture, characterized in that the adapter mixture comprises at least 8 of the adapters according
to claim 1, mixed in proportion.
7. The adapter mixture according to claim 6, characterized in that all four bases are present at each position
(read down the column) of the set of double-stranded molecular tags in the adapter mixture; preferably,
column-wise, the ratio of the four bases A:T:G:C in the tag set is close to 1:1:1:1; row-wise, runs of 4 or
more identical consecutive bases are avoided within each molecular tag; preferably, 2 or more consecutive G
bases are avoided at the start of a molecular tag.
8. The adapter mixture according to claim 6, characterized in that the double-stranded molecular tags of the
adapters in the mixture differ from one another at 3 or more nucleotides; preferably, the double-stranded
molecular tags of the adapters in the mixture are not all of the same length; preferably, the adapters in the
mixture are mixed in equal proportions, or the mixing proportion of each adapter is readjusted according to the
proportions actually measured in sequencing data.
9. A method for detecting low-frequency somatic variation, characterized by comprising the following steps:
(1) synthesizing separately the P5 strand and P7 strand of each adapter pair of the adapter mixture according to
any one of claims 6 to 8, annealing them to form Y-shaped adapters, and mixing these in proportion to form the
adapter mixture;
(2) ligating the adapter mixture carrying double-stranded molecular tags to a cell-free DNA fragment sample to
obtain ligation products, and PCR-amplifying the ligation products with upstream and downstream amplification
primers carrying sample indexes to obtain amplification products;
(3) performing hybrid capture of the target fragments in the amplification products to obtain a targeted
capture library, performing high-depth paired-end sequencing of the library, and demultiplexing the data of
different samples according to the paired-end sample indexes;
(4) performing quality control on the sequencing data, removing low-quality bases, low-quality reads and
contaminating adapter sequences, and error-correcting the overlapping part of each read pair according to base
quality, to obtain clean data;
(5) aligning the reads to the reference genome and, at each alignment position, grouping read pairs with the
same molecular tag sequence, the same CIGAR string and the same alignment direction into one read pair family;
(6) for each read pair family, calculating and determining the single-strand consensus sequence (SSCS)
according to Bayes' theorem and recalculating the base quality values, to reduce sequencing errors;
(7) for each SSCS generated, finding the SSCS whose molecular tag sequence is complementary to it and generating
a double-strand consensus sequence (DSCS), while retaining the SSCS that cannot form a DSCS; moving the
alignment position to the next base and repeating steps (5) to (7);
(8) aligning the final consensus sequences to the reference genome and performing variant detection to obtain
an initial variant set, annotating this variant set, and filtering it by comparison against errors, population
databases and coding regions to obtain the final, reliable low-frequency somatic mutations.
10. The method for detecting low-frequency somatic variation according to claim 9, characterized in that, in
step (1), the annealing reaction condition is: 95 °C for 5 min followed by slow cooling to 25 °C at 0.02 °C/s;
or 95 °C for 5 min, after which the PCR instrument is switched off and the tube is left to stand until the
temperature falls to room temperature;
Preferably, in step (2), the total input of the cell-free DNA fragment sample is 20 to 33 ng;
Preferably, in step (3), the sequencing depth is 10,000x to 30,000x; the paired-end sample indexes are UDIs,
i.e. the paired-end sample index sequences all differ between samples;
Preferably, in step (5), the threshold for read pair family size is lowered to 2 so that more read pair
families are generated; in addition, read pair families that contain only one read pair but can form a DSCS
with another SSCS are used to generate DSCS; the final output fastq file retains both SSCS and DSCS sequences
together with their base quality values;
Preferably, in step (6), for the read pair families at the same alignment position, only those containing 2 or
more read pairs are used further to generate SSCS; according to Bayes' theorem, the prior probability is
determined as follows: if the observed base agrees with the candidate true base, the prior probability is
$1 - 10^{-q/10}$, otherwise it is $10^{-q/10}/3$, where q is the base quality value; this distribution is
written $p(b, b_i, q_i)$; for the 4 possible bases, the posterior probability is calculated according to
Formula One: for each base position of the SSCS, the base/quality value pairs $(b_i, q_i)$ of the read pair
family are used to calculate the probability that the consensus base I is b ($b \in \{A, C, G, T\}$), and the
base type with the highest probability is the true base, which determines the true base at each position; at
the same time, according to Formula Two below, the base quality is recalculated from the posterior probability
of the true base, giving the error-corrected consensus reads;
$q_c = -10\log_{10}\left(1 - P\left[I = b_c \mid \{(b_i, q_i)\}\right]\right)$    (Formula Two);
Preferably, in step (7), when two SSCS generate a DSCS, the base is retained if the bases at the corresponding
position are identical; otherwise the base at that position is changed to N.
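The DSCS merging rule stated at the end of claim 10 can be sketched as follows. The sketch assumes the two SSCS have already been brought into the same orientation and have equal length; the function name and the example sequences are illustrative.

```python
def merge_duplex(sscs_a: str, sscs_b: str) -> str:
    """Combine two strand consensus sequences position by position."""
    if len(sscs_a) != len(sscs_b):
        raise ValueError("SSCS sequences must have the same length")
    # keep a base only where both strands agree, otherwise mask it with N
    return "".join(a if a == b else "N" for a, b in zip(sscs_a, sscs_b))

print(merge_duplex("ACGTTGCA", "ACGTAGCA"))  # ACGTNGCA
```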
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811608440.XA CN109439729A (en) | 2018-12-27 | 2018-12-27 | Detect connector, connector mixture and the correlation method of low frequency variation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811608440.XA CN109439729A (en) | 2018-12-27 | 2018-12-27 | Detect connector, connector mixture and the correlation method of low frequency variation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109439729A true CN109439729A (en) | 2019-03-08 |
Family
ID=65537665
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811608440.XA Pending CN109439729A (en) | 2018-12-27 | 2018-12-27 | Detect connector, connector mixture and the correlation method of low frequency variation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109439729A (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106599616A (en) * | 2017-01-03 | 2017-04-26 | 上海派森诺医学检验所有限公司 | duplex-seq-based ultralow-frequency mutation site detection analysis method |
CN108728515A (en) * | 2018-06-08 | 2018-11-02 | 北京泛生子基因科技有限公司 | A kind of analysis method of library construction and sequencing data using the detection ctDNA low frequencies mutation of duplex methods |
Non-Patent Citations (1)
Title |
---|
MACCONAILL, L. E等: "Unique, dual-indexed sequencing adapters with UMIs effectively eliminate index cross-talk and significantly improve sensitivity of massively parallel sequencing", 《BMC GENOMICS》 * |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109949862A (en) * | 2019-03-13 | 2019-06-28 | 拓普基因科技(广州)有限责任公司 | A kind of microsatellite instability detection method of blood ctDNA |
CN110129415A (en) * | 2019-05-17 | 2019-08-16 | 凯杰(苏州)转化医学研究有限公司 | A kind of NGS builds library molecular adaptor and its preparation method and application |
CN110129415B (en) * | 2019-05-17 | 2023-08-18 | 迈杰转化医学研究(苏州)有限公司 | NGS library-building molecular joint and preparation method and application thereof |
CN110257480A (en) * | 2019-07-04 | 2019-09-20 | 北京京诺玛特科技有限公司 | Nucleic acid sequence sequence measuring joints and its method for constructing sequencing library |
CN110265086A (en) * | 2019-07-04 | 2019-09-20 | 北京肿瘤医院(北京大学肿瘤医院) | Gene detection method and device |
WO2021013244A1 (en) * | 2019-07-25 | 2021-01-28 | 北京贝瑞和康生物技术有限公司 | Method for constructing capture library and kit |
CN110846382A (en) * | 2019-11-29 | 2020-02-28 | 北京科迅生物技术有限公司 | Method for enriching free DNA of fetus |
CN110846382B (en) * | 2019-11-29 | 2023-02-28 | 北京科迅生物技术有限公司 | Method for enriching free DNA of fetus |
CN111073961A (en) * | 2019-12-20 | 2020-04-28 | 苏州赛美科基因科技有限公司 | High-throughput detection method for gene rare mutation |
CN112626189A (en) * | 2020-04-24 | 2021-04-09 | 北京吉因加医学检验实验室有限公司 | Short joint, double-index joint primer and double-index library construction system of gene sequencer |
CN114317528A (en) * | 2020-09-30 | 2022-04-12 | 北京吉因加科技有限公司 | Specific molecular label UMI group, mixed specific molecular label joint and application |
CN112631562A (en) * | 2020-12-01 | 2021-04-09 | 上海欧易生物医学科技有限公司 | Second-generation sequencing sample mixing method based on python, application, equipment and computer-readable storage medium |
CN113005188A (en) * | 2020-12-29 | 2021-06-22 | 阅尔基因技术(苏州)有限公司 | Method for evaluating base damage, mismatching and variation in sample DNA by one-generation sequencing |
CN113628683A (en) * | 2021-08-24 | 2021-11-09 | 慧算医疗科技(上海)有限公司 | High-throughput sequencing mutation detection method, equipment, device and readable storage medium |
CN113628683B (en) * | 2021-08-24 | 2024-04-09 | 慧算医疗科技(上海)有限公司 | High-throughput sequencing mutation detection method, device and apparatus and readable storage medium |
CN114107290A (en) * | 2021-11-19 | 2022-03-01 | 杭州杰毅生物技术有限公司 | Sequencing joint and sequencing analysis system thereof |
WO2023087527A1 (en) * | 2021-11-19 | 2023-05-25 | 杭州杰毅生物技术有限公司 | Sequencing adapter and sequencing analysis system thereof |
CN114032288A (en) * | 2021-12-10 | 2022-02-11 | 北京吉因加医学检验实验室有限公司 | Kit and method for preparing target nucleotide for sequencing by using same |
CN114250284A (en) * | 2021-12-31 | 2022-03-29 | 深圳市核子基因科技有限公司 | Paternity test method based on fetal free DNA in peripheral blood of pregnant woman |
CN114530199A (en) * | 2022-01-19 | 2022-05-24 | 重庆邮电大学 | Method and device for detecting low-frequency mutation based on double sequencing data and storage medium |
CN116110496A (en) * | 2023-01-05 | 2023-05-12 | 深圳市海普洛斯医疗系统科技有限公司 | Method, device, equipment and storage medium for rapidly detecting joint sequence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20200421; Address after: 215123 Unit 201, B7 Building, 218 Xinghu Street, Suzhou Industrial Park, Jiangsu Province; Applicant after: XUKANG MEDICAL SCIENCE & TECHNOLOGY (SUZHOU) Co.,Ltd.; Address before: 201318 4-5 Floors, Area A, Building 19, 3399 Lane, Kangxin Road, Pudong New District, Shanghai; Applicant before: SHANGHAI JINGZHOU GENE TECHNOLOGY Co.,Ltd. |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190308 |