CN114150047B - Method for evaluating base damage, mismatching and variation in sample DNA by one-generation sequencing - Google Patents

Publication number
CN114150047B
Authority
CN
China
Prior art keywords
base
sequence
value
information
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111620536.XA
Other languages
Chinese (zh)
Other versions
CN114150047A (en)
Inventor
罗俊峰
王一帆
徐雪
陈曦
宋萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Carrier Gene Technology Suzhou Co ltd
Original Assignee
Carrier Gene Technology Suzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Carrier Gene Technology Suzhou Co ltd
Publication of CN114150047A
Application granted
Publication of CN114150047B

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/20 Polymerase chain reaction [PCR]; Primer or probe design; Probe optimisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Organic Chemistry (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Analytical Chemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • Immunology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to a method for evaluating base damage, mismatch and variation in sample DNA by first-generation sequencing. During PCR amplification, the method on the one hand uses molecular tags to label the original DNA molecules that carry damage or mismatches, and on the other hand performs enrichment amplification of the sampling region, raising damage or mismatch information from about 0.1% to 10-99%. The ratio values of base damage, mismatch and variation in the sample DNA are then evaluated separately by an evaluation method based on the enrichment amplification effect and by an evaluation method based on the number of molecular tag species, and the final ratio values are determined from the credible results of the two methods. Using economical and rapid Sanger sequencing, the method can accurately confirm that the damage or mismatch really exists, which helps to optimize sample DNA extraction and storage methods and to evaluate the quality of the sample DNA.

Description

Method for evaluating base damage, mismatching and variation in sample DNA by one-generation sequencing
Technical Field
The invention belongs to the technical field of gene detection, and particularly relates to a method for evaluating base damage, mismatching and variation in sample DNA by using first-generation sequencing.
Background
With the development of technology, the field of DNA detection, and cancer detection in particular, pays increasing attention to low-proportion mutation information. For example, detecting somatic mutations at 0.1% is currently one of the important benchmarks in liquid biopsy. In time the 0.1% level will no longer be sufficient, and once the required level reaches 0.01%, the problem of how to distinguish mutation from mismatch and base damage will have to be faced.
First, the two concepts of mutation and mismatch are clarified. At the level of single-copy cells, such as haploid single sperm and ova, the concept of mutation is difficult to apply; mutation is more conventionally a population-level concept. For example, suppose that in the human genome hg19 the base at Chr1:2,000 is C. If 1 sperm cell out of 1000 carries a C>T change while the other cells retain the wild-type C, we say that a 0.1% C>T mutation has occurred at this position. Within the T-carrying sperm cell, however, the Chr1:2,000 position is a normally paired T:A, and at the level of that single cell no mutation is considered to have occurred. The mismatch described in this patent refers to the case in which Chr1:2,000 is not a normal C:G pair but a T:G pair, which violates the base-pairing principle. This is a pairing error within the double strand, called a base mismatch. If the mismatch is not repaired by the repair system, then after one round of DNA replication one daughter DNA molecule, and its progeny, will carry a mutation.
Base damage and base mismatch can be innate or acquired. An innate base mismatch arises during cell division and proliferation in the organism: because of errors of the in vivo DNA replication system, G is wrongly paired with a base other than C, and the error is retained because the in vivo repair system fails to repair it. Acquired base damage refers to damage caused during DNA extraction by inappropriate or limited techniques, methods and conditions. For example, under oxidizing conditions cytosine (C) suffers oxidative damage and is deaminated; the deaminated cytosine is then read as uracil during replication and pairs with A. Likewise, G readily forms 8-oxoG under oxidizing conditions, and 8-oxoG readily pairs with A during replication. In general, when such damaged bases and mismatches are stably inherited in an organism, mutations form; mutations at key positions of key genes, once accumulated to a certain extent, may cause serious diseases such as cancer and may cause drug resistance. Acquired base damage can therefore easily interfere with measurements at the one-in-ten-thousand or one-in-a-thousand level, so evaluating the damage and mismatch of deoxyribonucleotides in a sample is very important, especially at key mutation hot spots, where C>T and G>A changes caused by base damage can produce false positives.
Because damage and mismatch occur at a very low probability and proportion, currently estimated at about one in ten thousand, they lie below the sensitivity of conventional technology platforms. For example, the error rate of a second-generation sequencing platform is about one in a thousand, so its detection sensitivity is about 1%; some techniques on the qPCR platform reach a best detection sensitivity of 0.2%. Technically, therefore, detecting low-proportion variant information cannot do without molecular tag technology. Molecular tag technology, however, depends heavily on high-depth sequencing and has a long turnaround time, which hinders the wider adoption of such tests.
Disclosure of Invention
In order to solve the above technical problems, the invention discloses a method for evaluating low-proportion base damage or base mismatch in a DNA sample. The method can also evaluate the proportion of low-proportion base mismatches and base changes that exist naturally in an organism, which helps to optimize sample DNA extraction and storage methods and to evaluate the quality of the sample DNA.
The first objective of the invention is to provide a method for evaluating base damage, mismatch and variation in sample DNA by first-generation sequencing, which comprises the following steps:
S1, adding to a DNA sample a nucleic acid composition capable of inhibiting the non-target region (the non-target region is a region without base damage, mismatch or variation; correspondingly, the target region is a region with base damage, mismatch or variation) and amplification primers carrying an error-correctable molecular tag library, amplifying the DNA sample by PCR, and sequencing the PCR amplification product by a first-generation sequencing technology;
wherein the nucleic acid composition that inhibits the non-target region in the DNA sample is designed based on the sampling region in the DNA sample;
S2, obtaining the sequencing data of the PCR amplification product of step S1, and analyzing the sequencing data by an evaluation method based on enrichment amplification and by an evaluation method based on the number of molecular tag species to obtain the ratio values of base damage, mismatch and variation in the sample DNA;
S3, when the evaluation method based on the enrichment amplification effect and the evaluation method based on the number of molecular tag species both give credible results, adopting the result of the evaluation method based on the number of molecular tag species as the ratio value of base damage, mismatch and variation in the sample DNA.
The design method of the nucleic acid composition capable of inhibiting the non-target region in the DNA sample is disclosed in the Chinese patent with application number 2020115796048.
The design method of the amplification primers carrying the error-correctable molecular tag library is disclosed in the Chinese patent with application number 2020115404605.
Further, the evaluation method based on enrichment amplification comprises the following steps:
S01, expressing the enrichment amplification effect of each sampling region by an Efold value, calculated as:
Efold=(VRF/VAF)×[(1-VAF)/(1-VRF)],
wherein VAF is the initial proportion of variation information in the sample, and VRF is the proportion of variation information of the sample in the detection result;
S02, obtaining the Efold value of each sampling region by testing a standard, calculating the VRF value from the peak ratio of the different bases in the sequencing data of the PCR amplification product, and, when the VRF satisfies 5% ≤ VRF ≤ 95%, calculating the VAF value by the following formula:
VAF=VRF/(Efold-Efold×VRF+VRF),
when the VRF does not satisfy 5% ≤ VRF ≤ 95%, the result of the evaluation method based on enrichment amplification is not credible.
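The formula in S02 is the formula in S01 solved for VAF. The short derivation below uses only the symbols defined above; reading Efold as the ratio of the variant-to-wild-type odds after and before amplification is an interpretation offered here, not a statement taken from the text.

```latex
\begin{aligned}
E_{\mathrm{fold}} &= \frac{VRF}{VAF}\cdot\frac{1-VAF}{1-VRF}
  = \frac{VRF/(1-VRF)}{VAF/(1-VAF)} \\[4pt]
\frac{VAF}{1-VAF} &= \frac{VRF}{E_{\mathrm{fold}}\,(1-VRF)} \\[4pt]
VAF &= \frac{VRF}{E_{\mathrm{fold}}\,(1-VRF) + VRF}
  = \frac{VRF}{E_{\mathrm{fold}} - E_{\mathrm{fold}}\times VRF + VRF}
\end{aligned}
```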
For example, if the proportion of variation information in a standard sample is known to be 0.1%, then VAF = 0.1%. If, after enrichment amplification, sequencing of the PCR product shows 50% variation information, then VRF = 50%, and:
Efold=(50%/0.1%)×[(1-0.1%)/(1-50%)]=999
If a PCR reaction produces no enrichment amplification, i.e., VAF = 0.1% and VRF is also 0.1%, then:
Efold=(0.1%/0.1%)×[(1-0.1%)/(1-0.1%)]=1
It can be seen that when Efold = 1, the reaction system has no enrichment amplification effect on the variation information. The table below lists the Efold calculated from different VAF and VRF values; for a particular reaction system, Efold reflects an inherent characteristic of that system.
VAF      VRF      Efold
0.1%     0.1%     1
0.1%     50%      999
0.1%     90%      8991
1%       50%      99
1%       90%      891
1%       99%      9801
5%       99%      1881
As can be seen from the above table, when VRF approaches 100%, the VAF value and Efold are no longer proportional. Compare the cases VAF = 1%, VRF = 99% and VAF = 5%, VRF = 99%: the enrichment amplification of the reaction is already saturated at VAF = 1%, so an Efold value derived at VAF = 5% is inaccurate. We therefore stipulate that the Efold value of a particular reaction must be obtained at 5% ≤ VRF ≤ 95%.
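The table values and the 5% to 95% rule can be reproduced with a few lines of Python. This is a minimal sketch of the S01 formula only; the function names `efold` and `efold_is_calibratable` are illustrative and do not come from the patent.

```python
def efold(vaf: float, vrf: float) -> float:
    """Efold = (VRF / VAF) * ((1 - VAF) / (1 - VRF)), with VAF and VRF as fractions."""
    if not (0.0 < vaf < 1.0 and 0.0 < vrf < 1.0):
        raise ValueError("VAF and VRF must lie strictly between 0 and 1")
    return (vrf / vaf) * ((1.0 - vaf) / (1.0 - vrf))


def efold_is_calibratable(vrf: float, low: float = 0.05, high: float = 0.95) -> bool:
    """True when VRF lies inside the window in which Efold may be derived."""
    return low <= vrf <= high


if __name__ == "__main__":
    # A few rows of the table above, recomputed.
    for vaf, vrf in [(0.001, 0.5), (0.01, 0.9), (0.01, 0.99), (0.05, 0.99)]:
        print(f"VAF={vaf:.1%}  VRF={vrf:.0%}  Efold={efold(vaf, vrf):.0f}  "
              f"usable={efold_is_calibratable(vrf)}")
```

The two rows with VRF = 99% are flagged as unusable, which matches the saturation argument in the text.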
Step S02 is illustrated by the following table: when Efold is known, the VAF of the target sample to be measured can be calculated from different VRF values.
[Table image not reproduced: VAF values calculated from VRF for a known Efold]
It should be noted that if the variation information appears as a homozygous peak in the Sanger signal, the signal may be saturated, i.e., VRF is close to 100%, and VRF and VAF are then very likely no longer directly related. For example, if the Sanger sequencing result gives VRF = 99% when VAF = 5% and also VRF = 99% when VAF = 10%, a VAF of 5% cannot be distinguished from a VAF of 10%. The formula VAF = VRF/(Efold-Efold×VRF+VRF) therefore holds only in the linear range 5% ≤ VRF ≤ 95%; when VRF > 95% or VRF < 5%, the base damage and/or base mismatch ratio of the target sample lies outside the detection range of the method disclosed in this patent.
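The back-calculation of VAF, including the rejection of saturated or too-weak signals, might be sketched as follows. The function name `vaf_from_vrf` and the choice to return `None` for an unreliable result are assumptions made for illustration.

```python
from typing import Optional


def vaf_from_vrf(vrf: float, efold: float,
                 low: float = 0.05, high: float = 0.95) -> Optional[float]:
    """VAF = VRF / (Efold - Efold*VRF + VRF), with VRF as a fraction.

    Returns None when VRF lies outside [low, high], i.e. when the text
    treats the enrichment-based estimate as not credible.
    """
    if efold <= 0:
        raise ValueError("Efold must be positive")
    if not (low <= vrf <= high):
        return None
    return vrf / (efold - efold * vrf + vrf)


if __name__ == "__main__":
    print(vaf_from_vrf(0.50, 999))    # about 0.001, i.e. 0.1%
    print(vaf_from_vrf(0.99, 1881))   # None: signal is saturated
```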
Further, the evaluation method based on the number of molecular tag species comprises the following steps:
S001, outputting the number of molecular tag sequence species, UMInum, by a DNA sequence identification method based on the sequencing data of the PCR amplification product and the known molecular tag sequences;
S002, when UMInum ≤ 10, calculating the ratio Pdm% of base damage, mismatch and variation as:
Pdm%=UMInum/(Ng×1000×2/6.67)×100%,
wherein Ng is the mass of DNA (in ng) put into the reaction;
when UMInum > 10, the result of the evaluation method based on the number of molecular tag species is not credible.
For example, when Ng = 10 ng and UMInum = 5,
the ratio of base damage and mismatch is Pdm% = 5/2998.5 × 100% = 0.17%.
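The worked example can be checked with a short sketch. The constants 1000, 2 and 6.67 are taken verbatim from the formula; reading them as pg per ng, two haploid copies, and roughly 6.67 pg of DNA per diploid human genome is an interpretation, not something the text states. The function names are hypothetical.

```python
from typing import Optional


def template_copies(ng: float) -> float:
    """Denominator of the Pdm% formula: Ng * 1000 * 2 / 6.67."""
    return ng * 1000 * 2 / 6.67


def pdm_percent(umi_num: int, ng: float, max_reliable_umis: int = 10) -> Optional[float]:
    """Pdm% = UMInum / (Ng*1000*2/6.67) * 100; None when UMInum exceeds the limit."""
    if umi_num < 0 or ng <= 0:
        raise ValueError("UMInum must be non-negative and Ng positive")
    if umi_num > max_reliable_umis:
        return None  # the text treats UMInum > 10 as not credible
    return umi_num / template_copies(ng) * 100


if __name__ == "__main__":
    print(round(template_copies(10), 1))   # 2998.5
    print(round(pdm_percent(5, 10), 2))    # 0.17
```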
Further, before calculating the VRF value or before outputting the parameter UMInum, the method comprises a step of identifying variation information:
S0001, obtaining the baseline value Noise_c of the Sanger sequencing signal, as follows:
a) Reading the Sanger AB1 file to obtain the signal value Fluor_cs of each signal sample s of each fluorescence channel c in the file, and the signal sample number S_k at which each base k is located.
Fluor_ck is the maximum of fluorescence channel c within ±i samples of the signal sample number S_k of base k, calculated as:
Fluor_ck = max{Fluor_cs : s = S_k-i, ..., S_k+i}
wherein i is an integer from 0 to 5;
b) For each fluorescence channel c, the maxima at all N base positions form the set {Fluor_ck : k = 1, ..., N}.
The maxima of the M bases identified in the first-generation sequencing result (as given in the Sanger AB1 file) as corresponding to fluorescence channel c are removed from this set, giving a new set of N-M maxima;
c) Calculating the median and the average absolute deviation of the new set of maxima, removing the values whose difference from the median exceeds n times the average absolute deviation, wherein n may be 2 to 5, and taking the average of the remaining maxima, Noise_c, as the background noise baseline of fluorescence channel c;
d) Subtracting the background noise value of the corresponding fluorescence channel from the signal values of all signal samples of each fluorescence channel to obtain FluorNN_cs (No Noise):
FluorNN_cs = Fluor_cs - Noise_c
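Steps a) to d) can be sketched with the Python standard library. This is an illustration under stated assumptions, not the patented implementation: the trace of one channel and the per-base sample positions are assumed to have been read from the AB1 file already, "average absolute deviation" is computed around the median, and all function and parameter names are hypothetical.

```python
from statistics import mean, median
from typing import List, Sequence


def noise_baseline(trace: Sequence[float], peak_locations: Sequence[int],
                   called_bases: str, channel_base: str,
                   i: int = 2, n: float = 3.0) -> float:
    """Background baseline Noise_c of one fluorescence channel."""
    # Step a: maximum of the channel within +/- i samples of each base position S_k.
    window_max = [max(trace[max(0, s - i): s + i + 1]) for s in peak_locations]
    # Step b: drop the positions whose called base belongs to this channel.
    background = [m for m, b in zip(window_max, called_bases) if b != channel_base]
    if not background:
        return 0.0
    # Step c: remove values further than n average absolute deviations from the median.
    med = median(background)
    aad = mean([abs(v - med) for v in background])
    kept = [v for v in background if abs(v - med) <= n * aad]
    return mean(kept)


def subtract_noise(trace: Sequence[float], noise: float) -> List[float]:
    """Step d: FluorNN_cs = Fluor_cs - Noise_c for every signal sample."""
    return [v - noise for v in trace]


if __name__ == "__main__":
    # Synthetic A-channel trace: two called A peaks, low background, one stray peak.
    trace = [5.0] * 80
    locations = [5, 15, 25, 35, 45, 55, 65, 75]
    for s, v in zip(locations, [1000, 12, 11, 13, 10, 12, 300, 1000]):
        trace[s] = float(v)
    print(noise_baseline(trace, locations, "ACGTCGTA", "A"))
```

In the synthetic example the stray value of 300 at a non-A position is discarded in step c, so the baseline is the mean of the remaining small background maxima.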
S0002, searching for regional signal peaks according to the signal change of each fluorescence channel:
traversing the peaks of the fluorescence channels: when only one channel has a peak within the region of one base width W_k, the region contains one base, whose type is the base type corresponding to the channel with the peak; when several channels have peaks within the region of one base width W_k, the region may contain several bases. The base type corresponding to the channel with the highest peak is the main base of the region. For each other channel, the ratio of its peak to the peak of the main base channel is used as the criterion: when the ratio is higher than a threshold, the base type corresponding to that channel is an alternative base type of the region; otherwise it is not. A candidate base sequence A consisting of main bases and alternative bases is thus obtained, with the alternative base types labeled at the positions where alternative bases exist;
wherein the region of one base width W_k is defined as follows: if the Sanger AB1 file contains N bases, the signal sample number of base k is S_k, that of the previous base is S_{k-1} and that of the next base is S_{k+1}, then the start position WS_k of the base width region of base k is given by:
[Formula image not reproduced: WS_k]
and the end position WE_k of the base width region of base k is given by:
[Formula image not reproduced: WE_k]
wherein the peak within the region of one base width W_k is defined as follows: for fluorescence channel c, the find_peaks algorithm of Scipy is applied to the background-subtracted signal values FluorNN_cs in the region s ∈ (WS_k, WE_k) to find the peaks of the region; if no peak exists, fluorescence channel c has no peak in the region of base width W_k; if one or more peaks exist, the peak with the largest signal value is taken as the peak of fluorescence channel c in the region of base width W_k;
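The peak search and the main/alternative base decision of S0002 might look like the sketch below, which uses `scipy.signal.find_peaks` as named in the text. Two points are assumptions made only for illustration: the window boundaries are taken as the midpoints between neighbouring base positions (the original formulas for WS_k and WE_k are not reproduced here), and the alternative-base threshold of 0.1 is a placeholder value. The function names are hypothetical.

```python
import numpy as np
from scipy.signal import find_peaks


def base_window(peak_locations, k):
    """Assumed base-width window of base k: midpoints to the neighbouring bases."""
    s = peak_locations
    half_prev = (s[k] - s[k - 1]) // 2 if k > 0 else (s[k + 1] - s[k]) // 2
    half_next = (s[k + 1] - s[k]) // 2 if k < len(s) - 1 else (s[k] - s[k - 1]) // 2
    return max(0, s[k] - half_prev), s[k] + half_next


def channel_peak(signal, start, end):
    """Largest find_peaks peak of one noise-subtracted channel in the window, or None."""
    segment = np.asarray(signal[start:end + 1], dtype=float)
    idx, _ = find_peaks(segment)
    if idx.size == 0:
        return None
    return float(segment[idx].max())


def call_bases(channels, start, end, alt_threshold=0.1):
    """channels maps a base letter to its noise-subtracted trace.

    Returns (main_base, alternative_bases); alternative bases are those whose
    peak exceeds alt_threshold times the main peak (threshold is an assumption).
    """
    peaks = {}
    for base, signal in channels.items():
        p = channel_peak(signal, start, end)
        if p is not None and p > 0:
            peaks[base] = p
    if not peaks:
        return None, []
    main = max(peaks, key=peaks.get)
    alts = sorted(b for b, p in peaks.items()
                  if b != main and p / peaks[main] > alt_threshold)
    return main, alts
```

Note that `find_peaks` does not report a maximum that sits on the edge of the segment, so a peak exactly at a window boundary is attributed to neither neighbouring base in this sketch.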
S0003, obtaining an IUPAC-coded candidate base sequence B from the first-generation sequencing result:
the candidate base sequence B represents the full-length sequence of the PCR product and comprises a candidate base sequence B1, a candidate base sequence B2 and a candidate base sequence B3, wherein the candidate base sequence B1 is the sequence at the molecular tag library position, the candidate base sequence B2 is the sequence of the sample DNA sampling region, and the candidate base sequence B3 is the remaining sequence outside the molecular tag library position and the sample DNA sampling region; the main base and the alternative bases at each position of the candidate base sequence A are combined according to the base coding rule recommended by IUPAC (International Union of Pure and Applied Chemistry) to obtain the IUPAC-coded candidate base sequence B; for example:
[Figure not reproduced: example of combining main and alternative bases into IUPAC codes]
IUPAC coding table:
Code   Bases
A      A
C      C
G      G
T      T
R      A/G
Y      C/T
S      G/C
W      A/T
K      G/T
M      A/C
B      C/G/T
D      A/G/T
H      A/C/T
V      A/C/G
N      A/C/G/T
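Combining a main base with its alternative bases into one IUPAC code is a set lookup. The sketch below assumes the per-position calls are available as `(main_base, [alternative_bases])` pairs; that data layout and the function names are illustrative.

```python
# Standard IUPAC nucleotide codes, keyed by the set of bases they stand for.
IUPAC_CODES = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GC"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def iupac_code(main_base, alternative_bases=()):
    """IUPAC code covering the main base and all alternative bases of one position."""
    return IUPAC_CODES[frozenset((main_base, *alternative_bases))]


def encode_sequence(calls):
    """calls: iterable of (main_base, [alternative_bases]) per position."""
    return "".join(iupac_code(main, alts) for main, alts in calls)


if __name__ == "__main__":
    # Position 1: A only; position 2: A with alternative T; position 3: C with A and T.
    print(encode_sequence([("A", []), ("A", ["T"]), ("C", ["A", "T"])]))  # AWH
```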
S0004, identifying variation information in the first-generation sequencing result:
1) Identifying the differences between the candidate base sequence B and the known reference sequence R (i.e., the sequence of a reference genome, for example hg19) by an alignment information calculation method;
the alignment information calculation method compares the IUPAC-coded candidate base sequence B with the known reference sequence R using the sequence alignment algorithm Gotoh's algorithm and the NUC.4.4 IUPAC-code alignment scoring table; the result with the highest alignment score is selected as the alignment of the candidate base sequence B with the known reference sequence R, giving the alignment information of the two sequences;
2) Using the alignment information calculation method to obtain the alignment information of the candidate base sequences B2 and B3 with the known reference sequence R, and aligning the sequences; scanning the aligned candidate base sequences B2 and B3 and the known reference sequence R, the base information in the IUPAC sequence that differs from the known reference sequence R is the variation information;
wherein Base_k denotes a given base position, the reference base Base_kr is the base information in the reference sequence, and a base different from the known reference sequence R is Base_km; the Base_k at a specific position of the candidate base sequences B2 and B3 consists of the reference base Base_kr and the Base_km representing damage, mismatch or variation information.
For example, if Base_k at a certain base position is "M" (corresponding to "A" or "C") in the IUPAC sequence and "A" in the reference sequence, the position is considered to carry variation information of base type "C". By the definitions above, the reference base Base_kr is the base information in the reference sequence, "A" in this example, and the base different from the reference sequence R is Base_km, "C" in this example. Base_km carries the information of base damage, mismatch, change or variation. Because Base_km refers to one particular base type, the same position may have several Base_km.
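Once the candidate sequence and the reference are aligned, the scan for Base_km reduces to a set difference per position. The sketch below assumes equal-length, already aligned strings with "-" marking gaps and reports 0-based positions; the alignment step itself (Gotoh's algorithm with the NUC.4.4 table) is not shown, and the function names are hypothetical.

```python
# Bases represented by each standard IUPAC nucleotide code.
IUPAC_TO_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def variant_bases(candidate_code, reference_base):
    """Base_km: bases implied by the IUPAC code that differ from Base_kr."""
    return sorted(set(IUPAC_TO_BASES[candidate_code]) - {reference_base})


def scan_variants(aligned_candidate, aligned_reference):
    """Return (position, Base_kr, [Base_km]) for every position carrying variation."""
    if len(aligned_candidate) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    found = []
    for pos, (code, ref) in enumerate(zip(aligned_candidate, aligned_reference)):
        if code == "-" or ref == "-":
            continue  # gaps are ignored in this sketch
        km = variant_bases(code, ref)
        if km:
            found.append((pos, ref, km))
    return found


if __name__ == "__main__":
    # "M" (A or C) against reference "A" yields variation information "C".
    print(scan_variants("ACMGT", "ACAGT"))  # [(2, 'A', ['C'])]
```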
Further, the VRF value is calculated by the following formula:
VRF = Peak(Base_km) / Σ Peak(Base_k)
wherein Peak(Base_km) is the peak fluorescence signal of Base_km, and Σ Peak(Base_k) is the sum of the peak fluorescence signals of the bases at Base_k (including the main base and the alternative bases).
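As a small illustration of the ratio just defined, the sketch below assumes the peak signals of the main and alternative bases at one position are held in a dictionary; the function name and data layout are hypothetical.

```python
def vrf(peaks, variant_base):
    """VRF = Peak(Base_km) / sum of the peaks of all bases at the position.

    peaks maps each base present at the position (main and alternative bases)
    to its peak fluorescence signal after background subtraction.
    """
    total = sum(peaks.values())
    if total <= 0:
        raise ValueError("sum of peak signals must be positive")
    return peaks.get(variant_base, 0.0) / total


if __name__ == "__main__":
    # Reference base A with peak 700, variant base C with peak 300.
    print(vrf({"A": 700.0, "C": 300.0}, "C"))  # 0.3
```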
Further, the number of molecular tag sequence species, UMInum, is obtained as follows:
taking the amplification primer adjacent to B1 as the known reference sequence, applying the alignment information calculation method to the candidate base sequence B to obtain the alignment information of the candidate base sequence B with the amplification primer, and aligning the two sequences; obtaining the candidate base sequence B1 from the aligned sequence according to the known length of the B1 sequence;
extracting the N^- information at each position of the candidate base sequence B1 as a characteristic value, the N^- information being the base types not covered by Base_k; for example, if position 1 of the candidate base sequence B1 is W (A/T), the N^- information of position 1 is S (G/C), and if position 2 of the candidate base sequence B1 is H (A/T/C), the N^- information of position 2 is G. The set of N^- information of the candidate base sequence B1 is defined as Index_B, and each known sequence in the tag sequence library is defined as an Index_l. Every Index_l that carries, at any position, a base listed in Index_B for that position is excluded; the number of molecular tags remaining in the tag sequence library is UMInum.
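The exclusion by N^- information can be sketched as a filter over the tag library. The sketch assumes the candidate sequence B1 is an IUPAC string of the same length as each library tag and that tags are written in upper-case A/C/G/T; the function names and the toy library are illustrative only.

```python
# Bases represented by each standard IUPAC nucleotide code.
IUPAC_TO_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def n_minus(code):
    """N^- information of one position: the base types the IUPAC code does not cover."""
    return set("ACGT") - set(IUPAC_TO_BASES[code])


def surviving_tags(candidate_b1, tag_library):
    """Tags that carry no excluded base at any position of candidate sequence B1."""
    index_b = [n_minus(code) for code in candidate_b1]
    kept = []
    for tag in tag_library:
        if len(tag) != len(candidate_b1):
            continue  # tags of a different length cannot match B1
        if all(base not in excluded for base, excluded in zip(tag, index_b)):
            kept.append(tag)
    return kept


def umi_num(candidate_b1, tag_library):
    """UMInum: number of library tags remaining after the N^- exclusion."""
    return len(surviving_tags(candidate_b1, tag_library))


if __name__ == "__main__":
    # B1 = "WH": position 1 excludes G/C, position 2 excludes G.
    print(surviving_tags("WH", ["AA", "TC", "GA", "AG", "TT"]))  # ['AA', 'TC', 'TT']
```

A surviving tag is only compatible with the mixed signal; the filter by itself does not prove that the tag is present in the PCR product.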
A second objective of the invention is to provide an analysis device for evaluating base damage, mismatch and variation in sample DNA by first-generation sequencing, the analysis device comprising:
a data extraction module for acquiring the base sequence information and the fluorescence signal data in a first-generation sequencing AB1 file;
a preprocessing module for removing the background noise of the fluorescence signal and generating the candidate base sequence;
an analysis module for analyzing and acquiring the variation information in the first-generation sequencing result;
and a tag processing module for analyzing and calculating the number of molecular tag species, UMInum, in the PCR product.
A third objective of the invention is to provide a server comprising one or more processors and a memory,
the memory being used for storing a computer program;
the processor being configured to execute the computer program to implement the method for evaluating base damage, mismatch and variation in sample DNA by first-generation sequencing.
A fourth objective of the invention is to provide a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the method for evaluating base damage, mismatch and variation in sample DNA by first-generation sequencing.
By means of the scheme, the invention at least has the following advantages:
During PCR amplification, the method on the one hand uses molecular tags to label the original DNA molecules that carry damage or mismatches, and on the other hand performs enrichment amplification of the sampling region, raising damage or mismatch information of less than 0.1% to 10-99%. The ratio values of base damage, mismatch and variation in the sample DNA are then evaluated separately by the evaluation method based on the enrichment amplification effect and by the evaluation method based on the number of molecular tag species, and the final ratio values are determined from the credible results of the two methods. Using economical and rapid Sanger sequencing, the method can accurately confirm that the damage or mismatch really exists, which helps to optimize sample DNA extraction and storage methods and to evaluate the quality of the sample DNA.
The foregoing is a summary of the present invention, and the following is a detailed description of the preferred embodiments of the present invention, so that the technical solutions of the present invention can be more clearly understood.
Drawings
FIG. 1 shows the Sanger evaluation results of base damage in a peripheral blood DNA sample;
FIG. 2 is a bit sequence chart of 100 molecular tag sequences;
FIG. 3 is a schematic diagram of obtaining N-information;
FIG. 4 is a schematic diagram of molecular tags excluded from Excel using base order and N-information in example 2;
FIG. 5 is a schematic diagram of the use of N-information for exclusion and verification of the presence of molecular tags in example 2;
FIGS. 6a and 6b are bit-order tables of 100 molecular tag sequences in example 3;
FIG. 7 shows the results of Sanger sequencing in example 3;
FIG. 8 is a schematic diagram of molecular tags excluded from Excel using base order and N-information in example 3;
FIG. 9 is a schematic diagram of the use of N-information for exclusion and verification of the presence of molecular tags in example 3.
Detailed Description
Example 1: First-generation sequencing to evaluate the extent of base damage in sample DNA
1. Sampling regions were set at 4 positions in the human genome, and the primer combinations for PCR were designed as follows:
Name        Seq (5'-3') (SEQ ID NO. 1~13)        deltaG (50 mM, 25 ℃)
DmDe1-FP    CCCTGACAACATAGTTGGAATCA              -27.4
DmDe1-RP    ACTCCAGGATAATACACATCACAGT            -29.2
DmDe1-BL    TGGAATCACTCATGATATCTCGAGCCAT         -34.0
DmDe2-FP    AGCAGTCTCTGCCTCGC                    -24.5
DmDe2-RP    AGAAGATTCGGCAGAACTAAGCA              -28.5
DmDe2-BL    CCTCGCCAAGCGGCTCATGTTAATATT          -35.0
DmDe4-FP    AGAAGATGTGGAAAAGTCCCAATG             -28.4
DmDe4-RP    GTGCCCAGGTCAGTGGAT                   -24.7
DmDe4-BL    TCCCAATGGAACTATCCGGAACATCCA          -34.1
DmDe6-FP    TCCTTTAACCACATAATTAGAATCATTCTTGA     -33.9
DmDe6-RP    AGTTAGTTTTCACTCTTTACAAGTTAAAATGA     -33.5
DmDe6-BL    ATCATTCTTGATGTCTCTGGCTAGACCAAA       -35.6
UNITag      tgtaaaacgacggccagtaca
Note: the RP sequences in the table are only the target-specific portions; during preparation, the UNITag sequence is added to construct the structure 5'-tgtaaaacgacggcgccagtaca (N28)-RP, wherein N28 is the 100 UMI sequences of example 2.
2. Positive mutant plasmid templates were custom-synthesized according to the hg19 reference sequence. The sequences near the sampling regions in the positive mutant templates are as follows:
Name         Seq (5'-3') (SEQ ID NO. 14~17)
Plasmid01    TGGAATCACTCATGATA--TCGAGCCA
Plasmid02    CCTCGCCAAGC--CTCATGTTA
Plasmid04    TCCCAATGGAACTAT--GGAACATCC
Plasmid06    ATCATTCTTGATGTCTCTG--TAGACCAAA
wherein "--" denotes a deletion of 2 bases.
3. Preparing the 0.1% variation standard. The plasmid template was quantified with Qubit, the theoretical number of molecules was calculated from the molecular mass of the plasmid template, and a 0.1% variation standard was prepared by stepwise dilution; the standard was corrected and adjusted by ddPCR so that its relative error from 0.1% was small, and was subsequently further corrected using the second-generation sequencing results.
4. Efold values for each sampling region were obtained by NGS sequencing
a) Preparation of the 5× Oligo Mix with BL
Component    Primer concentration (μM)    Volume (μL)
FP           100                          20
RP           100                          20
BL           100                          100
0.1× TE                                   Make up to 1000
Total                                     1000
b) Preparation of the 5× Oligo Mix w/o BL (used as a control when evaluating samples; used in the PCR system in the same amount as in the with-BL group)
Component    Primer concentration (μM)    Volume (μL)
FP           100                          20
RP           100                          20
0.1× TE                                   Make up to 1000
Total                                     1000
c) Preparation of the PCR system
Reagent                          Amount
5× Oligo Mix with BL             6 μL
2× DNA polymerase Master Mix     15 μL
0.1% standard                    300 ng
Nuclease-free water              Make up to 30 μL
d) UMI-PCR amplification procedure
[Table image not reproduced: UMI-PCR cycling program]
After the PCR was completed, 1 unit of exonuclease I was added to each reaction, and the reaction was incubated at 37 ℃ for 30 minutes and inactivated at 80 ℃ for 30 minutes. A further 2. Mu.L of 10. Mu.M FP and 2. Mu.L of 10. Mu.M UNITag were added for the subsequent PCR amplification procedure.
e) Subsequent PCR Process
Figure BDA0003437372130000102
5. A library was constructed from the PCR product using a commercial second-generation sequencing library construction kit and sequenced on an Illumina platform. The number of molecular tag types among reads carrying the 2 bp deletion and the number of molecular tag types among reads carrying wild-type information were counted; the ratio of the two is the corrected VAF. Likewise, the number of reads carrying the variant information and the number of reads carrying wild-type information were counted; the ratio of the two is the VRF. The Efold value of each sampling region was then calculated.
VAF before NGS correction VAF after NGS correction VRF Efold
DmDe1 0.1% 0.25% 57.2% 533.2
DmDe2 0.1% 0.31% 83.5% 1627.4
DmDe4 0.1% 0.15% 48.4% 624.4
DmDe6 0.1% 0.23% 61.0% 678.5
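The Efold column can be reproduced from the corrected VAF and VRF columns using the Efold expression stated in claim 1, Efold = (VRF/VAF) × [(1-VAF)/(1-VRF)]. The following is a minimal Python sketch; the function name `efold` and the dictionary layout are illustrative and are not part of the patent.

```python
def efold(vrf: float, vaf: float) -> float:
    """Enrichment fold: variant odds after amplification divided by variant odds before.

    Both arguments are fractions (0.25% is passed as 0.0025).
    """
    if not (0.0 < vaf < 1.0 and 0.0 < vrf < 1.0):
        raise ValueError("VAF and VRF must be fractions strictly between 0 and 1")
    return (vrf / vaf) * ((1.0 - vaf) / (1.0 - vrf))


# (VAF after NGS correction, VRF) per sampling region, copied from the table above
regions = {
    "DmDe1": (0.0025, 0.572),
    "DmDe2": (0.0031, 0.835),
    "DmDe4": (0.0015, 0.484),
    "DmDe6": (0.0023, 0.610),
}

for name, (vaf, vrf) in regions.items():
    print(f"{name}: Efold = {efold(vrf, vaf):.1f}")
```

The tabulated values are matched only when the VAF after NGS correction is used as the input, not the nominal 0.1%.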
6. Peripheral blood DNA samples to be evaluated were selected, with a DNA input of 30 ng, and each was run in both the with-BL and w/o-BL groups, taking care to avoid contamination. Comparing the two groups shows the enrichment and amplification effect. Some of the results are shown in FIG. 1: the w/o-BL group displays only wild-type information, meaning that no enrichment amplification occurred.
7. From the Efold obtained from the NGS results and the VRF obtained from the Sanger analysis, the VAF in the original sample was calculated according to the formula VAF = VRF/(Efold - Efold × VRF + VRF):
Sampling region Base position Efold VRF VAF
DmDe1 9G>A 533.2 73% 0.50%
DmDe2 11C>T 1627.4 9% 0.01%
12C>T 1627.4 11% 0.01%
13G>A 1627.4 47% 0.05%
14C>T 1627.4 29% 0.03%
DmDe4 6T>C 624.4 8% 0.01%
10G>A 624.4 31% 0.07%
DmDe6 10G>A 678.5 50% 0.15%
12G>A 678.5 35% 0.08%
Because damage or mismatch may occur at several base positions within a sampling region, the final degree of damage or mismatch is reported as a range. For DmDe2, for example, the degree of damage or mismatch is taken to be 0.01% to 0.05%; since a 30 ng input contains about 9000 copies, roughly 1 to 5 original molecules carried the detected damage or mismatch. C>T and G>A were also the changes found most often across a large number of tests, and the literature likewise reports that cytosine deamination and guanine oxidation readily give rise to mismatches.
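The back-calculation in step 7 and the molecule estimate above can be sketched in Python as follows. The copy-number expression Ng × 1000 × 2/6.67 is taken from claim 1; reading 6.67 as picograms of DNA per diploid human genome is an assumption of this sketch, and the function names are illustrative.

```python
def vaf_from_vrf(vrf: float, efold: float) -> float:
    """Original variant fraction from the Sanger VRF and the region's Efold."""
    return vrf / (efold - efold * vrf + vrf)


def genome_copies(ng: float) -> float:
    """Approximate number of haploid genome copies in `ng` nanograms of human DNA.

    Assumption: 6.67 is pg per diploid genome, as implied by the claimed formula.
    """
    return ng * 1000 * 2 / 6.67


efold_dmde2 = 1627.4
vrfs_dmde2 = [0.09, 0.11, 0.47, 0.29]  # VRF values for DmDe2 from the table above

copies = genome_copies(30)  # 30 ng input
for vrf in vrfs_dmde2:
    vaf = vaf_from_vrf(vrf, efold_dmde2)
    print(f"VRF {vrf:.0%}: VAF = {vaf:.4%}, about {vaf * copies:.1f} original molecules")
```

The unrounded VAF values are slightly lower than the two-decimal percentages in the table, so the per-position molecule estimates are approximate.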
Example 2: Demonstration of the logic for deriving the number of molecular tag types (UMInum) from Sanger results
1. 100 molecular tags of known sequence were prepared, each 28 nt long, with every base occupying its own position, as shown in FIG. 2.
2. Assume that the PCR product contains the 5 molecular tag sequences shown in FIG. 3. After first-generation sequencing, the N⁻ information at each position can be read from the Sanger result, as shown in FIG. 3.
3. The known molecular tag sequences are filtered according to the N⁻ information. For example, at base 16, tags whose base at that position is neither g nor t are excluded. After exclusion using the N⁻ information at positions 1 to 16, only 15 molecular tags remain, as shown in FIG. 4.
4. Exclusion continues according to the N⁻ information; on reaching base 28, 5 molecular tags remain, exactly the 5 assumed to be present, as shown in FIG. 5.
5. This example illustrates the logic of using molecular tags of known sequence, together with the N⁻ information obtained after Sanger sequencing, to deduce the number of molecular tag types in the PCR product. In practice the analysis is carried out by purpose-written software.
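The exclusion logic of this example can be illustrated with a small Python sketch. It is not the software referred to above: the 6-nt tag library and the observed bases are hypothetical toy data (the patent uses 100 tags of 28 nt), and the function names are illustrative.

```python
BASES = set("acgt")


def n_minus_profile(observed_bases):
    """For each position, the set of base types NOT seen in the Sanger trace."""
    return [BASES - set(seen) for seen in observed_bases]


def filter_tags(tag_library, n_minus):
    """Keep only tags that never carry an excluded base at any position."""
    kept = []
    for tag in tag_library:
        if len(tag) != len(n_minus):
            continue  # a tag of the wrong length cannot match
        if all(base not in absent for base, absent in zip(tag, n_minus)):
            kept.append(tag)
    return kept


# Hypothetical library of known tags
library = ["acgtac", "acgtca", "tgcatg", "acgtaa", "ttgtac"]

# Hypothetical bases visible at each position of a mixed trace
# (what a mixture of "acgtac" and "tgcatg" would show)
observed = ["at", "cg", "gc", "ta", "at", "cg"]

remaining = filter_tags(library, n_minus_profile(observed))
print(remaining, "UMInum =", len(remaining))
```

One property worth noting: any library tag assembled entirely from bases seen in the trace survives the filter even if it was not physically present, so the remaining count is best read as an upper bound unless the library is designed to rule out such combinations.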
Example 3
To show the broader applicability of the present invention, this example targets a different location in the human genome from that of Example 1. The primer design principle is the same as in Example 1; see the company's earlier patents CN110923325A (primer set, kit and method for detecting EGFR gene mutation) and CN110982884A (primer set, kit and method for detecting AML-related gene mutation).
SSL3-FP:CCAGAAAACAGGCAGGTCTCTC
SSL3-BL:CAGGTCTCTCTGCTCTTGACCGAGC
SSL3-RP:ACAGCAGGCAGTTGGGA
The UNITaq sequence is the same as in Example 1. The SSL3-RP sequence given here is only the target-specific part; during preparation, the UNITaq sequence is added to build a 5'-tgtaaaacgacggctagtaca(N28)-RP structure, where N28 denotes the 100 UMI sequences shown in FIGS. 6a and 6b. The UMI design draws on CN110060734B (barcode generation and reading method for high-robustness DNA sequencing). The difference is that the barcodes designed in CN110060734B are used to distinguish samples and are read in a more complex way, whereas the scheme of the present invention distinguishes different original molecules within a sample and uses a simpler reading and identification method.
The experimental method follows Example 1. A positive mutant plasmid template was custom-synthesized according to the hg19 reference sequence information, the mutation being the C>G near position 80 in FIG. 7, and was prepared as a 0.1% variant standard. The PCR product was subjected directly to Sanger sequencing, in 3 replicates. The results are shown in FIG. 7, where the horizontal box marks the region containing the UMI and the vertical box marks the C>G position.
The three replicate Sanger results clearly show that the first 4 bases of the UMI are clean single peaks reading CTCA. Applying the same N⁻ information exclusion approach as in Example 2 narrows the candidates down to 6 UMI molecular tags, as shown in FIG. 8.
The N⁻ information at position 5 is c and a, which does not help to narrow the UMIs further. The N⁻ information at position 6 is t, g and a, which is informative and narrows the candidates down to 2 UMI molecular tags, as shown in FIG. 9.
The N⁻ information at the subsequent positions further confirms that the Sanger result is composed of these two UMI sequences, whose proportion in the PCR product is close to 1:1, each accounting for about 50%. This indicates that at least two original DNA molecules carried the base mutation.
The above is only a preferred embodiment of the present invention and is not intended to limit it. Those skilled in the art can make many modifications and variations without departing from the technical principle of the present invention, and such modifications and variations shall also fall within the protection scope of the present invention.
Sequence listing
<110> Zell Gene technology (Suzhou) Ltd
<120> method for evaluating base damage, mismatch and variation in sample DNA using one-generation sequencing
<160> 17
<170> PatentIn version 3.3
<210> 1
<211> 23
<212> DNA
<213> (Artificial sequence)
<400> 1
ccctgacaac atagttggaa tca 23
<210> 2
<211> 25
<212> DNA
<213> (Artificial sequence)
<400> 2
actccaggat aatacacatc acagt 25
<210> 3
<211> 28
<212> DNA
<213> (Artificial sequence)
<400> 3
tggaatcact catgatatct cgagccat 28
<210> 4
<211> 17
<212> DNA
<213> (Artificial sequence)
<400> 4
agcagtctct gcctcgc 17
<210> 5
<211> 23
<212> DNA
<213> (Artificial sequence)
<400> 5
agaagattcg gcagaactaa gca 23
<210> 6
<211> 27
<212> DNA
<213> (Artificial sequence)
<400> 6
cctcgccaag cggctcatgt taatatt 27
<210> 7
<211> 24
<212> DNA
<213> (Artificial sequence)
<400> 7
agaagatgtg gaaaagtccc aatg 24
<210> 8
<211> 18
<212> DNA
<213> (Artificial sequence)
<400> 8
gtgcccaggt cagtggat 18
<210> 9
<211> 27
<212> DNA
<213> (Artificial sequence)
<400> 9
tcccaatgga actatccgga acatcca 27
<210> 10
<211> 32
<212> DNA
<213> (Artificial sequence)
<400> 10
tcctttaacc acataattag aatcattctt ga 32
<210> 11
<211> 32
<212> DNA
<213> (Artificial sequence)
<400> 11
agttagtttt cactctttac aagttaaaat ga 32
<210> 12
<211> 30
<212> DNA
<213> (Artificial sequence)
<400> 12
atcattcttg atgtctctgg ctagaccaaa 30
<210> 13
<211> 21
<212> DNA
<213> (Artificial sequence)
<400> 13
tgtaaaacga cggccagtac a 21
<210> 14
<211> 25
<212> DNA
<213> (Artificial sequence)
<400> 14
tggaatcact catgatatcg agcca 25
<210> 15
<211> 20
<212> DNA
<213> (Artificial sequence)
<400> 15
cctcgccaag cctcatgtta 20
<210> 16
<211> 24
<212> DNA
<213> (Artificial sequence)
<400> 16
tcccaatgga actatggaac atcc 24
<210> 17
<211> 28
<212> DNA
<213> (Artificial sequence)
<400> 17
atcattcttg atgtctctgt agaccaaa 28

Claims (2)

1. A method for evaluating base damage, mismatches and variations in sample DNA using one-generation sequencing, comprising the steps of:
s1, adding a nucleic acid composition capable of inhibiting a non-target region in a DNA sample and an amplification primer with an error-correctable molecular tag library, amplifying the DNA sample, and sequencing a product obtained after PCR amplification by adopting a first-generation sequencing technology;
wherein the nucleic acid composition for inhibiting the non-target region in the DNA sample is designed according to the sampling region in the DNA sample, and comprises a forward primer, a reverse primer and a Blocker primer for amplifying the DNA sample; the Blocker primer inhibits the amplification of a region without base damage, mismatching and mutation; the reverse primer is connected with a UNITaq sequence and a UMI sequence;
s2, obtaining sequencing data of the PCR amplification product of step S1, and analyzing the sequencing data using an evaluation method based on enrichment amplification and an evaluation method based on the number of molecular tag types, to obtain the proportion of base damage, mismatch and variation in the DNA of the evaluated sample;
s3, when both the evaluation method based on enrichment amplification and the evaluation method based on the number of molecular tag types give credible results, taking the result of the evaluation method based on the number of molecular tag types as the proportion of base damage, mismatch and variation in the DNA of the evaluated sample;
the evaluation method based on enrichment amplification is analyzed through the following steps:
s01, representing the enrichment amplification effect of each sampling region by using an Efold value, wherein the calculation formula is as follows:
Efold=(VRF/VAF) × [(1-VAF)/(1-VRF)],
wherein VAF is the initial proportion of variant information in the sample; VRF is the variation information proportion of the sample in the detection result;
s02, testing the standard to obtain the Efold value of each sampling region, calculating the VRF value from the peak ratios of the different bases in the sequencing data of the PCR amplification product, and, when the VRF satisfies 5% ≤ VRF ≤ 95%, calculating the VAF value according to the following formula:
VAF = VRF/(Efold - Efold × VRF + VRF),
when the VRF does not satisfy 5% ≤ VRF ≤ 95%, the result of the evaluation method based on enrichment amplification is not credible;
the evaluation method based on the number of molecular tag types comprises the following steps:
s001, outputting the number of types of molecular tag sequences, UMInum, by a DNA sequence identification method based on the sequencing data of the PCR amplification product and the known molecular tag sequences;
s002, when UMInum ≤ 10, calculating the proportion Pdm% of base damage, mismatch and variation by the following formula:
Pdm% = UMInum/(Ng × 1000 × 2/6.67) × 100%,
wherein Ng is the mass, in ng, of the DNA added to the reaction;
when UMInum > 10, the result of the evaluation method based on the number of molecular tag types is not credible;
before calculating the VRF value or before outputting the parameter UMInum, the method comprises the steps of identifying variation information:
s0001, obtaining the baseline value [Figure DEST_PATH_IMAGE002] of the Sanger sequencing signal, comprising the following steps:
a) reading the Sanger AB1 file to obtain the signal value [Figure DEST_PATH_IMAGE004] of each signal sample of each fluorescence channel and the signal sample number [Figure DEST_PATH_IMAGE006] of each base; [Figure DEST_PATH_IMAGE008] is the maximum value of fluorescence channel [Figure DEST_PATH_IMAGE010] within the range [Figure DEST_PATH_IMAGE014] of signal sample numbers at the position of base [Figure DEST_PATH_IMAGE012], and is calculated as follows:
[Figure DEST_PATH_IMAGE016]
wherein i can be a positive integer within 0 to 5;
b) for each fluorescence channel, the maximum values at all [Figure DEST_PATH_IMAGE018] base positions are [Figure DEST_PATH_IMAGE020]; the maximum values of the [Figure DEST_PATH_IMAGE024] bases that first-generation sequencing base calling identified as corresponding to fluorescence channel [Figure DEST_PATH_IMAGE022] are removed, giving a new set of maximum values:
[Figure DEST_PATH_IMAGE026]
c) calculating [Figure DEST_PATH_IMAGE028], removing the values whose difference from the median exceeds [Figure DEST_PATH_IMAGE030] times the mean absolute deviation, where [Figure DEST_PATH_IMAGE030] can take a value of 2 to 5, and calculating the mean [Figure DEST_PATH_IMAGE032] of the remaining maximum values as the background noise baseline of fluorescence channel [Figure DEST_PATH_IMAGE010];
d) subtracting the background noise value of the corresponding fluorescence channel from the signal values of all fluorescence channel signal samples to obtain [Figure DEST_PATH_IMAGE034]:
[Figure DEST_PATH_IMAGE036]
S0002, searching for regional signal peaks according to the signal change of each fluorescence channel:
traversing the peaks of the fluorescence channels: when only one channel has a peak within a region one base wide, the region contains one base, whose type is the base type corresponding to the channel with the peak; when several channels have peaks within the region [Figure DEST_PATH_IMAGE038] one base wide, the region may contain several bases, and the base type corresponding to the channel with the highest peak is the main base of the region; for each other channel, the ratio of its peak value to the peak value of the main base channel is calculated, and when the ratio is higher than a threshold, the base type corresponding to that channel is an alternative base type of the region, otherwise no alternative base type exists; a candidate base sequence A consisting of main bases and alternative bases is thereby obtained, with the alternative base types labeled at the positions where alternative bases exist;
wherein the region [Figure DEST_PATH_IMAGE038] one base wide is defined as follows: if the Sanger AB1 file contains [Figure DEST_PATH_IMAGE040] bases, the signal sample number at base [Figure DEST_PATH_IMAGE012] is [Figure DEST_PATH_IMAGE006], the signal sample number at the preceding base is [Figure DEST_PATH_IMAGE042], and the signal sample number at the following base is [Figure DEST_PATH_IMAGE044], then the start position [Figure DEST_PATH_IMAGE046] of the one-base-width region of base [Figure DEST_PATH_IMAGE012] is given by the following formula:
[Figure DEST_PATH_IMAGE048]
and the end position [Figure DEST_PATH_IMAGE050] of the one-base-width region of base [Figure DEST_PATH_IMAGE012] is given by the following formula:
[Figure DEST_PATH_IMAGE052]
wherein the presence of a peak within the region one base wide is defined as follows: the find_peaks algorithm of SciPy is applied to the background-noise-removed signal values [Figure DEST_PATH_IMAGE056] of fluorescence channel [Figure DEST_PATH_IMAGE010] in the region [Figure DEST_PATH_IMAGE054] to calculate the peaks of the region;
s0003, obtaining a candidate base sequence B coded by IUPAC according to a first generation sequencing result:
the candidate base sequence B represents the full-length sequence of the PCR product and comprises a candidate base sequence B1, a candidate base sequence B2 and a candidate base sequence B3, wherein the candidate base sequence B1 is the sequence of a molecular tag library position, the candidate base sequence B2 is the sequence of a sample DNA sampling region, and the candidate base sequence B3 is other sequences except the sequence of the molecular tag library position and the sequence of the sample DNA sampling region; combining the main base and the alternative base in the candidate base sequence A by using IUPAC base coding rules to obtain a candidate base sequence B coded by IUPAC;
s0004, identifying variation information in the first-generation sequencing result:
identifying the information that the candidate base sequence B is different from the known reference sequence R by using a para-position information calculation method;
the calculation method of the para-position information is to compare the candidate base sequence B coded by IUPAC with the known reference sequence R by using a sequence comparison algorithm and an IUPAC code comparison fraction table; selecting the result with the highest comparison score as the alignment result of the candidate base sequence B and the known sequence R to obtain the alignment information of the candidate base sequence B and the known reference sequence R;
using a para-position information calculation method to obtain para-position information of candidate base sequences B2 and B3 and a known reference sequence R, and aligning the two sequences; scanning the aligned candidate base sequences B2 and B3 and the known reference sequence R to obtain base information which is different from the known reference sequence R in the IUPAC sequence and is variation information;
wherein [Figure DEST_PATH_IMAGE058] is defined as a base position, the reference base [Figure DEST_PATH_IMAGE060] is the base information in the known reference sequence R, and the base that differs from the known reference sequence R is [Figure DEST_PATH_IMAGE062]; the bases at a specific position [Figure DEST_PATH_IMAGE058] in candidate base sequences B2 and B3 consist of the reference base [Figure DEST_PATH_IMAGE060] and the base [Figure DEST_PATH_IMAGE062] representing damage, mismatch or variation information;
the VRF value is calculated by the following formula:
[Figure DEST_PATH_IMAGE064]
wherein [Figure DEST_PATH_IMAGE066] is the peak fluorescence signal of base [Figure DEST_PATH_IMAGE062], and [Figure DEST_PATH_IMAGE068] is the sum of the peak fluorescence signals of the bases at [Figure DEST_PATH_IMAGE058];
the type number UMInum of the molecular tag sequence is obtained by the following method:
taking the adjacent amplification primers of B1 as known reference sequences, and using a para-position information calculation method for the candidate base sequence B to obtain the para-position information of the candidate base sequence B and the amplification primers, and aligning the two sequences; obtaining a candidate base sequence B1 from the aligned sequence according to the known length information of the B1 sequence;
extracting the N⁻ information at each position of the candidate base sequence B1 as a characteristic value, the N⁻ information being the base types not contained at position [Figure DEST_PATH_IMAGE058]; the set of N⁻ information of the candidate base sequence B1 is defined as [Figure DEST_PATH_IMAGE070], and each known sequence in the tag sequence library is defined as [Figure DEST_PATH_IMAGE072]; each position of [Figure DEST_PATH_IMAGE072] is subjected to exclusion using the [Figure DEST_PATH_IMAGE070] information, and the number of molecular tags [Figure DEST_PATH_IMAGE072] remaining in the tag sequence library is UMInum.
2. The method of claim 1, wherein the threshold is 33%.
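Two of the claimed calculations lend themselves to a short illustration: calling main and alternative bases with the 33% threshold of claim 2 and merging them into an IUPAC code, and converting UMInum into Pdm%. The Python sketch below is a hedged illustration only. The peak heights are hypothetical, the function names are not from the patent, and peak detection itself (SciPy find_peaks in the claims) is assumed to have been done already.

```python
from typing import Dict, List, Optional, Tuple

# Standard IUPAC nucleotide codes, keyed by the sorted set of bases they represent
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def call_position(peaks: Dict[str, float],
                  threshold: float = 0.33) -> Tuple[str, List[str], str]:
    """Return (main base, alternative bases, IUPAC code) for one base-wide region.

    `peaks` maps "A", "C", "G", "T" to baseline-corrected peak heights.
    """
    main = max(peaks, key=peaks.get)
    if peaks[main] <= 0:
        raise ValueError("no positive peak in this region")
    # A channel is an alternative base when its peak exceeds threshold x main peak
    alts = sorted(b for b, h in peaks.items()
                  if b != main and h / peaks[main] > threshold)
    code = IUPAC["".join(sorted([main] + alts))]
    return main, alts, code


def pdm_percent(umi_num: int, ng: float) -> Optional[float]:
    """Pdm% from claim 1; None when UMInum > 10 (result not credible)."""
    if umi_num > 10:
        return None
    return umi_num / (ng * 1000 * 2 / 6.67) * 100


# Hypothetical peak heights: T is 50% of the main peak, A only 30%
print(call_position({"A": 30.0, "C": 0.0, "G": 100.0, "T": 50.0}))
print(pdm_percent(2, 30))
```

With these inputs the T channel is kept as an alternative base and the A channel is discarded, so the position is encoded as K (G or T).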
CN202111620536.XA 2020-12-29 2021-12-27 Method for evaluating base damage, mismatching and variation in sample DNA by one-generation sequencing Active CN114150047B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011604056X 2020-12-29
CN202011604056.XA CN113005188A (en) 2020-12-29 2020-12-29 Method for evaluating base damage, mismatching and variation in sample DNA by one-generation sequencing

Publications (2)

Publication Number Publication Date
CN114150047A CN114150047A (en) 2022-03-08
CN114150047B true CN114150047B (en) 2022-11-08

Family

ID=76383784

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202011604056.XA Pending CN113005188A (en) 2020-12-29 2020-12-29 Method for evaluating base damage, mismatching and variation in sample DNA by one-generation sequencing
CN202111620536.XA Active CN114150047B (en) 2020-12-29 2021-12-27 Method for evaluating base damage, mismatching and variation in sample DNA by one-generation sequencing

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202011604056.XA Pending CN113005188A (en) 2020-12-29 2020-12-29 Method for evaluating base damage, mismatching and variation in sample DNA by one-generation sequencing

Country Status (1)

Country Link
CN (2) CN113005188A (en)

Also Published As

Publication number Publication date
CN113005188A (en) 2021-06-22
CN114150047A (en) 2022-03-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant