CN111667882B - Sequencing fuzzy sequence information comparison method - Google Patents
- Publication number
- CN111667882B (application number CN202010525168.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/30—Detection of binding sites or motifs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
Abstract
The invention provides a method for comparing fuzzy sequence information obtained by sequencing, comprising the following steps: immobilizing the nucleotide fragment to be detected and obtaining fuzzy sequence information through a sequencing reaction; and comparing the fuzzy sequence information with a reference genome, during which mutations can also be identified. The method does not require the complete nucleic acid base sequence: alignment and variant discovery are performed using only the fuzzy information obtained by sequencing with multi-base reaction solutions, which reduces sequencing cost and speeds up the comparison.
Description
Technical Field
The invention relates to a method and a system for comparing sequencing fuzzy sequence information, belonging to the field of gene sequencing.
Background
High-throughput sequencing, also known as next-generation sequencing (NGS), is a class of sequencing technology developed in recent years. It is a revolutionary change from traditional sequencing, allowing tens of thousands to millions of nucleic acid molecules to be sequenced simultaneously. High-throughput sequencing produces large amounts of data, and processing and using these data is an important part of the technology.
High-throughput sequencing can detect genetic variation and thereby provide a basis for clinical diagnosis, screening and the like. Genetic variations include single nucleotide variations (SNV), copy number variations (CNV), chromosome ploidy variations, DNA modification variations (e.g., DNA methylation), and the like. Clinical diagnosis requires genetic variation to be detected rapidly, accurately and at low cost. However, existing detection methods based on high-throughput sequencing all need to obtain the complete DNA sequence first and only then search for variation, which increases both time and monetary cost. The invention provides a fuzzy comparison method that uses fuzzy nucleic acid sequences to perform alignment and search for variation rapidly.
Disclosure of Invention
The present invention provides a method for obtaining partial information of a DNA sequence, aligning the partial information to a reference genome, and using the partial information to find/identify genetic variations.
The invention provides a method for comparing fuzzy sequence information obtained by sequencing, characterized in that:
the nucleotide fragment to be detected is immobilized, and fuzzy sequence information is obtained through a sequencing reaction;
the fuzzy sequence information is compared with a reference nucleic acid sequence;
wherein the reaction solution of the sequencing reaction contains nucleotide substrate molecules with two different bases;
the sequencing is performed using nucleotide substrate molecules whose 5'-terminal polyphosphate is modified with a fluorophore having fluorescence switching properties;
the fluorescence switching property means that the fluorescence signal changes significantly after the sequencing reaction compared with before it;
the sequencing reaction is a sequencing method in which the 3' end is unblocked;
comparing the fuzzy sequence information with the reference nucleic acid sequence means encoding the fuzzy sequence information and the reference nucleic acid sequence in the same manner and then aligning them.
The invention further provides a method for comparing fuzzy sequence information obtained by sequencing, characterized in that:
the nucleotide fragment to be detected is immobilized, and fuzzy sequence information is obtained through a sequencing reaction;
the fuzzy sequence information is compared with a reference nucleic acid sequence;
wherein the reaction solution of the sequencing reaction contains nucleotide substrate molecules with three different bases;
the sequencing is performed using nucleotide substrate molecules whose 5'-terminal polyphosphate is modified with a fluorophore having fluorescence switching properties;
the fluorescence switching property means that the fluorescence signal changes significantly after the sequencing reaction compared with before it;
the sequencing reaction is a sequencing method in which the 3' end is unblocked;
comparing the fuzzy sequence information with the reference nucleic acid sequence means encoding the fuzzy sequence information and the reference nucleic acid sequence in the same manner and then aligning them.
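The "encode in the same manner, then align" step can be illustrated with a minimal Python sketch. It assumes a reaction-solution set that groups A with T (IUPAC letter W) and G with C (IUPAC letter S), and it uses exact substring search as a stand-in for a real aligner. The names `WS_CODE`, `encode_ws` and `find_exact_hits` are hypothetical and do not come from the patent.

```python
# Assumption: the fuzzy read is already expressed per base as W (A or T) or S (G or C).
WS_CODE = {"A": "W", "T": "W", "G": "S", "C": "S"}


def encode_ws(seq):
    """Encode a fully specified base sequence into the two-letter W/S alphabet."""
    return "".join(WS_CODE[base] for base in seq.upper())


def find_exact_hits(fuzzy_read, reference):
    """Return every 0-based reference position where the encoded read matches exactly."""
    encoded_ref = encode_ws(reference)  # the reference gets the same encoding as the read
    hits = []
    start = encoded_ref.find(fuzzy_read)
    while start != -1:
        hits.append(start)
        start = encoded_ref.find(fuzzy_read, start + 1)
    return hits


reference = "GATTACAGGCTA"
fuzzy_read = "WWSWS"  # the W/S pattern of, for example, the fragment "TACAG"
print(find_exact_hits(fuzzy_read, reference))  # [3]
```

Because both sides use one alphabet, any string-matching or alignment routine can be applied unchanged; exact search is used here only to keep the sketch short.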
According to a preferred embodiment, one set of reaction solutions is used for each sequencing, each set comprising two or more reaction solutions, each reaction solution comprising nucleotide substrate molecules of at least two different bases.
According to a preferred embodiment, the ambiguous sequence information is a combination of degenerate sequence information and non-degenerate sequence information.
According to a preferred embodiment, the ambiguous sequence information obtained by sequencing is encoded into one of its possible base sequence information.
According to a preferred embodiment, all of the ambiguous sequence information obtained by sequencing is encoded as numbers.
According to a preferred embodiment, the ambiguous sequence information is encoded simultaneously or sequentially with the reference nucleic acid sequence.
According to a preferred embodiment, the 5'-terminal polyphosphoric acid-modified fluorophore having fluorescence switching properties is a 5'-terminal polyphosphoric acid-modified fluorophore.
According to a preferred embodiment, the sequencing is performed using nucleotide substrate molecules modified with a fluorophore having fluorescence switching properties on the terminal phosphate or an intermediate phosphate of the 5' polyphosphate;
the fluorescence switching property means that the fluorescence signal intensity increases significantly after the sequencing reaction compared with before it;
one set of reaction solutions is used for each round of sequencing, each set comprises two reaction solutions, and each reaction solution contains nucleotide substrate molecules with two different bases;
the nucleotide substrate molecules in one reaction solution are complementary to two of the bases on the nucleotide sequence to be detected, and the nucleotide substrate molecules in the other reaction solution are complementary to the other two bases;
first, the nucleotide sequence fragment to be detected is immobilized in a reaction chamber, and one reaction solution of the set is introduced;
an enzyme releases the fluorophore from the nucleotide substrate carrying the fluorophore with fluorescence switching properties, thereby causing fluorescence switching;
the second reaction solution of the same set is then introduced;
an enzyme again releases the fluorophore from the nucleotide substrate carrying the fluorophore with fluorescence switching properties, thereby causing fluorescence switching;
the two reaction solutions are added cyclically, and the fuzzy coding information of the nucleotide sequence to be detected is obtained from the fluorescence information.
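The cyclic addition of two reaction solutions can be simulated with a short idealized Python sketch. It assumes that, because the 3' end is unblocked, each reaction extends through every consecutive template base the current solution can pair with, and that the fluorescence change is proportional to the number of nucleotides incorporated. Noise, strand direction and base complementarity are deliberately left out: each solution is described simply by the template bases it can pair with. The function name `simulate_two_solution_cycles` and its parameters are hypothetical.

```python
def simulate_two_solution_cycles(template, first_pairs_with, second_pairs_with, max_reactions=200):
    """Return the ideal number of bases extended in each successive reaction.

    first_pairs_with / second_pairs_with: the template bases that the substrates
    in each reaction solution can pair with (an illustrative simplification).
    """
    solutions = [set(first_pairs_with), set(second_pairs_with)]
    counts = []
    pos = 0
    reaction = 0
    while pos < len(template) and reaction < max_reactions:
        active = solutions[reaction % 2]  # alternate between the two solutions
        run = 0
        # Extension continues until a base outside the active set is reached.
        while pos < len(template) and template[pos] in active:
            run += 1
            pos += 1
        counts.append(run)  # idealized signal of this reaction
        reaction += 1
    return counts


# Solution 1 pairs with template A/T, solution 2 with template G/C.
print(simulate_two_solution_cycles("AATGCCTA", "AT", "GC"))  # [3, 3, 2]
```

The list of counts is the fuzzy information: it records how many bases of each class follow one another, but not which base of the class occupies each position.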
The invention provides a system for comparing fuzzy sequence information obtained by sequencing, comprising a computing system, characterized in that the system uses any one of the methods described above to compare the fuzzy sequence information obtained by sequencing with a reference nucleic acid sequence.
The invention provides a method for comparing and identifying mutation by fuzzy sequence information obtained by sequencing, which comprises the following steps: fixing the nucleotide fragment to be detected, and obtaining fuzzy sequence information through a sequencing reaction; comparing the fuzzy sequence information with a reference genome; wherein the reaction liquid of the sequencing reaction contains nucleotide substrate molecules with two or more different bases.
The reaction solution of the sequencing reaction comprises nucleotide substrate molecules with two or more different bases. Each sequencing reaction therefore yields sequence information corresponding to the nucleotide substrate molecules in that reaction solution. This information may reflect the number of bases incorporated across two or more base types; it is not a specific sequence but fuzzy sequence information.
According to a preferred embodiment of the present invention, the sequencing is performed using nucleotide substrate molecules whose 5'-terminal polyphosphate is modified with a fluorophore having fluorescence switching properties; the fluorescence switching property means that the fluorescence signal changes significantly after the sequencing reaction compared with before it.
According to a preferred embodiment of the invention, the sequencing is a sequencing-by-synthesis method.
According to a preferred embodiment of the invention, it further comprises encoding the ambiguous sequence information and the reference genome in the same way and then aligning.
According to a preferred embodiment of the invention, it further comprises encoding the ambiguous sequence information or the reference genome and then aligning. In the coding process, the change of the base arrangement order may be involved, and other letters or symbols can be used instead, so that the same form is adopted and the alignment is facilitated.
According to a preferred embodiment of the invention, it further comprises encoding the reference genome, altering its sequence information and then aligning with the ambiguous sequence information.
According to a preferred embodiment of the invention, the reference genome is encoded, its sequence information is modified, and then aligned with the encoding of the ambiguous sequence information.
According to a preferred embodiment of the present invention, the ambiguous sequence information refers to sequence information from which the complete base sequence of a nucleotide sequence cannot be derived.
According to a preferred embodiment of the present invention, complete base sequence information refers to nucleic acid sequence information that can be expressed using A, G, T and C, or using A, G, U and C, where a base may be a methylated base.
According to a preferred embodiment of the invention, the ambiguous sequence information may be a degenerate sequence represented using M, K, R, Y, W, S, B, D, H, V letters.
According to a preferred embodiment of the present invention, the ambiguous sequence information may be a combination of degenerate sequence information and non-degenerate sequence information.
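The degenerate letters listed above follow the standard IUPAC nucleotide codes. A minimal Python sketch of how a sequence mixing degenerate and non-degenerate symbols can be checked against a fully specified sequence is given below; the names `IUPAC` and `is_compatible` are hypothetical and are not taken from the patent.

```python
# Standard IUPAC nucleotide codes restricted to the letters named in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",                          # non-degenerate
    "M": "AC", "K": "GT", "R": "AG", "Y": "CT", "W": "AT", "S": "CG",  # two bases
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",                  # three bases
}


def is_compatible(degenerate, concrete):
    """True if every base of `concrete` is allowed by the symbol at the same position."""
    if len(degenerate) != len(concrete):
        return False
    return all(base in IUPAC[symbol] for symbol, base in zip(degenerate, concrete))


print(is_compatible("WSAY", "TGAC"))  # True
print(is_compatible("WSAY", "TGAG"))  # False, because Y allows only C or T
```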
According to a preferred embodiment of the present invention, the method further comprises encoding a reference genome and then comparing the encoding of the ambiguous sequence information with the encoding of the reference genome.
According to a preferred embodiment of the present invention, the encoding of the ambiguous sequence information and the encoding of the reference genome result in the same representation.
According to a preferred embodiment of the invention, the sequencing is a sequencing method in which the 3' end is unblocked.
According to a preferred embodiment of the present invention, the reaction solution used for sequencing comprises nucleotide substrate molecules of two or more different bases.
According to a preferred embodiment of the present invention, nucleotide substrate molecules of two or more different bases in a reaction solution used for sequencing are labeled with the same or different fluorescent molecules.
According to a preferred embodiment of the present invention, the reaction solution used for sequencing is a set of reaction solutions, each set of reaction solutions containing two or more reaction solutions.
According to a preferred embodiment of the present invention, the sequencing reaction fluid is a set of reaction fluid sets, each set of reaction fluid sets comprising two reaction fluids, each reaction fluid comprising nucleotides of two different bases; the nucleotide in one reaction liquid can be complementary with two bases on the nucleotide sequence to be detected, and the nucleotide in the other reaction liquid can be complementary with the other two bases on the nucleotide sequence to be detected.
According to a preferred embodiment of the invention, the encoded ambiguous sequence information is aligned to the encoded reference genome using the Smith-Waterman algorithm, Bowtie, BWA or SOAP.
According to a preferred embodiment of the invention, mutated genes are found from the alignment result using a common method for finding genetic mutations, preferably one or more of MuTect, Strelka, Control-FREEC and cns-seq.
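As an illustration of the first option, the following is a minimal Python sketch of Smith-Waterman local alignment scoring applied to two sequences that have already been encoded into the same alphabet (W/S letters in the example). The scoring parameters are arbitrary assumptions, the function name `smith_waterman_score` is hypothetical, and no traceback is performed: only the best local score and the cell where it ends are returned.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Return (best_score, (query_end, target_end)) using linear gap penalties.

    End coordinates are 1-based positions of the last aligned symbols.
    """
    rows, cols = len(query) + 1, len(target) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if query[i - 1] == target[j - 1] else mismatch
            score[i][j] = max(
                0,                        # start a new local alignment
                score[i - 1][j - 1] + step,  # match or mismatch
                score[i - 1][j] + gap,       # gap in the target
                score[i][j - 1] + gap,       # gap in the query
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    return best, best_pos


encoded_read = "WWSWS"
encoded_reference = "SWWWWSWSSSWW"  # W/S encoding of "GATTACAGGCTA"
print(smith_waterman_score(encoded_read, encoded_reference))  # (10, (5, 8))
```

The algorithm is indifferent to the alphabet, which is why encoding the read and the reference identically is sufficient for alignment.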
According to a preferred embodiment of the present invention, the ambiguous sequence information obtained by sequencing is encoded into one of its possible base sequence information.
According to a preferred embodiment of the invention, all ambiguous sequence information in the ambiguous sequence information obtained by sequencing is encoded into numbers.
According to a preferred embodiment of the invention, the coding of the ambiguous sequence information and the coding order of the reference genome are exchangeable.
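The two encodings mentioned above (one possible base sequence, or numbers) can be sketched in Python as follows. The sketch assumes W/S fuzzy information; the choice of A as the representative of W, C as the representative of S, and the digits 0 and 1 are arbitrary assumptions, and all names are hypothetical.

```python
BASE_TO_CLASS = {"A": "W", "T": "W", "G": "S", "C": "S"}
CLASS_TO_BASE = {"W": "A", "S": "C"}   # assumption: one representative base per class
CLASS_TO_DIGIT = {"W": "0", "S": "1"}  # assumption: one digit per class


def encode_fuzzy(fuzzy, class_table):
    """Encode a fuzzy W/S read with the given class table."""
    return "".join(class_table[symbol] for symbol in fuzzy)


def encode_reference(reference, class_table):
    """Collapse each reference base to its class, then apply the same class table."""
    return "".join(class_table[BASE_TO_CLASS[base]] for base in reference.upper())


reference = "GATTACA"
fuzzy_read = "WWSW"

# Encoding as one of the possible base sequences keeps the ordinary A/C/G/T alphabet.
print(encode_reference(reference, CLASS_TO_BASE), encode_fuzzy(fuzzy_read, CLASS_TO_BASE))
# CAAAACA AACA

# Encoding as numbers.
print(encode_reference(reference, CLASS_TO_DIGIT), encode_fuzzy(fuzzy_read, CLASS_TO_DIGIT))
# 1000010 0010
```

The read and the reference can be encoded in either order, since each encoding depends only on its own input.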
According to a preferred embodiment of the present invention, the sequencing is performed using nucleotide substrate molecules whose 5'-terminal polyphosphate is modified with a fluorophore having fluorescence switching properties; the fluorescence switching property means that the fluorescence signal changes significantly after the sequencing reaction compared with before it.
According to a preferred embodiment of the invention, the fluorescence switching property means that after each sequencing reaction, the fluorescence signal is significantly increased or significantly decreased or the emitted light frequency range is significantly changed compared to before the sequencing reaction.
According to a preferred embodiment of the present invention, the 5'-terminal polyphosphoric acid-modified fluorescent group-modified nucleotide substrate molecule refers to a 5'-terminal polyphosphoric acid-modified fluorescent group-modified nucleotide substrate molecule.
According to a preferred embodiment of the present invention, the sequencing is performed using nucleotide substrate molecules modified with a fluorophore having fluorescence switching properties on the terminal phosphate or an intermediate phosphate of the 5' polyphosphate; the fluorescence switching property means that the fluorescence signal intensity increases significantly after the sequencing reaction compared with before it; one set of reaction solutions is used for each round of sequencing, each set comprises two reaction solutions, and each reaction solution contains nucleotide substrate molecules with two different bases; the nucleotide substrate molecules in one reaction solution are complementary to two of the bases on the nucleotide sequence to be detected, and those in the other reaction solution are complementary to the other two bases; first, the nucleotide sequence fragment to be detected is immobilized in a reaction chamber and one reaction solution of the set is introduced; an enzyme releases the fluorophore from the nucleotide substrate carrying it, thereby causing fluorescence switching; the second reaction solution of the same set is then introduced; an enzyme again releases the fluorophore from the nucleotide substrate carrying it, thereby causing fluorescence switching; the two reaction solutions are added cyclically, and the fuzzy coding information of the nucleotide sequence to be detected is obtained from the fluorescence information.
The invention provides a sequencing reagent, which is characterized in that a nucleotide fragment to be detected is fixed, and fuzzy sequence information is obtained through the reaction of the sequencing reagent and the fixed nucleotide fragment; wherein the reaction liquid of the sequencing reaction contains nucleotide substrate molecules with two or more different bases.
According to a preferred embodiment of the present invention, the sequencing reagent comprises nucleotide substrate molecules whose 5'-terminal polyphosphate is modified with a fluorophore having fluorescence switching properties; the fluorescence switching property means that the fluorescence signal changes significantly after the sequencing reaction compared with before it.
According to a preferred embodiment of the present invention, the nucleotide substrate molecules of two or more different bases in the reaction reagent are labeled with the same or different fluorescent molecules.
According to a preferred embodiment of the present invention, the reaction reagent is a set of reaction solutions, each set of reaction solutions containing two or more reaction solutions.
According to a preferred embodiment of the present invention, the sequencing reagent is a set of reaction solutions, each set of reaction solutions comprising two reaction solutions, each reaction solution comprising nucleotides of two different bases; the nucleotide in one reaction liquid can be complementary with two bases on the nucleotide sequence to be detected, and the nucleotide in the other reaction liquid can be complementary with the other two bases on the nucleotide sequence to be detected.
According to a preferred embodiment of the present invention, the sequencing is performed using nucleotide substrate molecules modified with a fluorophore having fluorescence switching properties on the terminal phosphate or an intermediate phosphate of the 5' polyphosphate; the fluorescence switching property means that the fluorescence signal intensity increases significantly after the sequencing reaction compared with before it; one set of reaction solutions is used for each round of sequencing, each set comprises two reaction solutions, and each reaction solution contains nucleotide substrate molecules with two different bases; the nucleotide substrate molecules in one reaction solution are complementary to two of the bases on the nucleotide sequence to be detected, and those in the other reaction solution are complementary to the other two bases; first, the nucleotide sequence fragment to be detected is immobilized and one reaction solution of the set is introduced; an enzyme releases the fluorophore from the nucleotide substrate carrying it, thereby causing fluorescence switching; the second reaction solution of the same set is then introduced; an enzyme again releases the fluorophore from the nucleotide substrate carrying it, thereby causing fluorescence switching; the two reaction solutions are added cyclically, and the fuzzy coding information of the nucleotide sequence to be detected is obtained from the fluorescence information.
The invention provides a nucleic acid sequencing method for obtaining fuzzy nucleic acid coding information, which is characterized in that a nucleotide fragment to be detected is fixed, and a sequencing reagent reacts with the fixed nucleotide fragment to obtain fuzzy sequence information; wherein the reaction liquid of the sequencing reaction contains nucleotide substrate molecules with two or more different bases.
According to a preferred embodiment of the present invention, the sequencing reagent comprises nucleotide substrate molecules whose 5'-terminal polyphosphate is modified with a fluorophore having fluorescence switching properties;
the fluorescence switching property means that the fluorescence signal changes significantly after the sequencing reaction compared with before it.
According to a preferred embodiment of the present invention, the nucleotide substrate molecules of two or more different bases in the reaction reagent are labeled with the same or different fluorescent molecules.
According to a preferred embodiment of the present invention, the reaction reagent is a set of reaction solutions, each set of reaction solutions containing two or more reaction solutions.
According to a preferred embodiment of the present invention, the sequencing reagent is a set of reaction solutions, each set of reaction solutions comprising two reaction solutions, each reaction solution comprising nucleotides of two different bases; the nucleotide in one reaction liquid can be complementary with two bases on the nucleotide sequence to be detected, and the nucleotide in the other reaction liquid can be complementary with the other two bases on the nucleotide sequence to be detected.
According to a preferred embodiment of the present invention, the sequencing is performed using nucleotide substrate molecules modified with a fluorophore having fluorescence switching properties on the terminal phosphate or an intermediate phosphate of the 5' polyphosphate; the fluorescence switching property means that the fluorescence signal intensity increases significantly after the sequencing reaction compared with before it; one set of reaction solutions is used for each round of sequencing, each set comprises two reaction solutions, and each reaction solution contains nucleotide substrate molecules with two different bases; the nucleotide substrate molecules in one reaction solution are complementary to two of the bases on the nucleotide sequence to be detected, and those in the other reaction solution are complementary to the other two bases; first, the nucleotide sequence fragment to be detected is immobilized and one reaction solution of the set is introduced; an enzyme releases the fluorophore from the nucleotide substrate carrying it, thereby causing fluorescence switching; the second reaction solution of the same set is then introduced; an enzyme again releases the fluorophore from the nucleotide substrate carrying it, thereby causing fluorescence switching; the two reaction solutions are added cyclically, and the fuzzy coding information of the nucleotide sequence to be detected is obtained from the fluorescence information.
The invention provides a system for comparing and identifying mutation of fuzzy sequence information obtained by sequencing, which comprises a computing system and is used for comparing and/or identifying mutation by utilizing the fuzzy sequence information obtained by sequencing.
Ambiguous (fuzzy) sequence information refers to sequence information from which the exact base sequence of the nucleotide sequence cannot be determined. Ambiguous base sequences are a common concept in the scientific field; for example, the letter W is used for the bases A and/or T. Relevant definitions can also be found on Wikipedia (https://en.wikipedia.org/wiki/Nucleotide).
Fuzzy coding means that different DNA sequences may have identical coding results. Conversely, the same encoding result may have multiple different sources.
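The many-to-one nature of fuzzy coding can be made concrete with a small Python sketch that enumerates every base sequence consistent with a given W/S code. The names `CLASS_MEMBERS` and `possible_sources` are hypothetical; the W and S groupings follow the IUPAC convention.

```python
from itertools import product

CLASS_MEMBERS = {"W": "AT", "S": "GC"}  # IUPAC: W = A or T, S = G or C


def possible_sources(code):
    """List every base sequence that encodes to the given W/S string."""
    options = [CLASS_MEMBERS[symbol] for symbol in code]
    return ["".join(bases) for bases in product(*options)]


print(possible_sources("WS"))        # ['AG', 'AC', 'TG', 'TC']
print(len(possible_sources("WSW")))  # 8
```

With two bases per symbol, a code of length n has 2^n possible sources, which is the sense in which one encoding result may come from many different sequences.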
Ambiguous information encoding refers to an operation on DNA sequences for which different sequences may give identical results. Encoding a reference genome refers to applying this operation to the reference genome sequence; reference genomes that differ locally may give identical results. For example, ambiguous information encoding may be a simple rearrangement of the sequence according to its bases that ignores the actual base order within each local region. A local region of a sequence refers to the region corresponding to one sequencing reaction (one sequencing run consists of a plurality of sequencing reactions).
The 2+2 sequencing method of the invention means that each round of sequencing uses one set of reaction solutions; each set comprises two reaction solutions, and each reaction solution contains nucleotide substrate molecules with two different bases. The nucleotide substrate molecules in one reaction solution are complementary to two of the bases on the nucleotide sequence to be detected, and those in the other reaction solution are complementary to the other two bases. For example, one set contains two reaction solutions, the first containing A and T substrate molecules and the second containing G and C substrate molecules. A 2+2 sequencing method can be named after the nucleotide composition of its two reaction solutions. For example, if the first reaction solution contains A and T substrate molecules (collectively, W) and the second contains G and C substrate molecules (collectively, S), 2+2 sequencing with this set is called WS sequencing. In total, 2+2 sequencing has three combinations, MK, RY and WS, each of which can be further divided into single-color and two-color sequencing.
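The naming convention can be illustrated with a minimal Python sketch. It assumes the standard IUPAC two-base codes (M = A/C, K = G/T, R = A/G, Y = C/T, W = A/T, S = C/G); the function name `degenerate` is hypothetical and is used only for illustration.

```python
# IUPAC two-base codes behind the three 2+2 combinations (MK, RY, WS).
IUPAC_PAIRS = {"M": "AC", "K": "GT", "R": "AG", "Y": "CT", "W": "AT", "S": "CG"}

def degenerate(seq: str, scheme: str) -> str:
    """Rewrite every base as the degenerate letter of `scheme` that covers it."""
    lookup = {base: code for code in scheme for base in IUPAC_PAIRS[code]}
    return "".join(lookup[base] for base in seq.upper())

print(degenerate("AAGTGGCACT", "MK"))  # MMKKKKMMMK
print(degenerate("AAGTGGCACT", "RY"))  # RRRYRRYRYY
print(degenerate("AAGTGGCACT", "WS"))  # WWSWSSSWSW
```

Each of the three outputs is one ambiguous view of the same sequence; none of them alone determines the original bases.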
The 1+3 sequencing of the invention means that each round of sequencing uses one set of reaction solutions; each set comprises two reaction solutions, wherein the nucleotide substrate molecule in one reaction solution is complementary to one base on the nucleotide sequence to be detected, and the nucleotide substrate molecules in the other reaction solution are complementary to the other three bases. For example, one set contains two reaction solutions, the first containing the A substrate molecule and the second containing G, C and T substrate molecules.
The method provided by the invention has the following advantages: the 2+2 or 1+3 sequencing is performed only once, and repeated 2+2 or 1+3 sequencing for the same DNA sequence is not required. The nucleotide substrates in the reaction liquid used for each round of sequencing can be marked with the same fluorescent groups, or can be respectively marked with different fluorescent groups. The invention can encode the sequencing result and the reference genome at the same time. The characteristic of the coding is that if the theoretical sequencing signals of the two DNA sequences are identical, the coding results are identical. The invention uses general sequence comparison and identification method to compare the coded sequencing result to the coded reference genome and identify the genetic variation. The method provided by the invention requires discarding the first and last substrings of each sequence in the encoding of bicolor 2+2 sequencing information. The invention provides application of 2+2 or 1+3 fuzzy sequencing information for the first time.
All terms used in the present invention are intended to be given their ordinary meanings in the field of gene sequencing, unless otherwise indicated.
Detailed Description
The compounds, sequencing steps, alignment methods, etc. described in the disclosure are merely further illustrative of the invention, and the terms used are merely used to describe specific forms and are not limiting factors of the invention.
The basic steps of the invention are as follows:
1. The DNA sample is subjected to one round of 2+2 or 1+3 sequencing.
2. The sequencing result and the reference genome are encoded in the same way. The feature of the coding is that if the theoretical sequencing signals of the two DNA sequences are identical, the coding results are identical (even if the two sequences themselves are different). The result of the encoding is one or more strings (or sequences).
3. The encoded sequencing results are aligned to the encoded reference genome using commonly used sequence alignment methods (e.g., the Smith-Waterman algorithm, Bowtie, BWA, SOAP, etc.).
4. Genetic variations are found from the alignment results of step 3 using commonly used variant discovery methods (e.g., MuTect, Strelka, Control-FREEC, cns-seq, GATK, etc.).
5. The genetic variations found in step 4 are interpreted according to the coding method of step 2.
Theoretical sequencing signals refer to signals that should be theoretically sequenced in ideal situations without taking into account anomalies such as sequencing errors, signal attenuation, and dyssynchrony of DNA molecules. Theoretical sequencing signals directly reflect the base composition of the DNA sequence.
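As a hedged illustration, the theoretical signal of single-color 2+2 sequencing can be modeled as one integer per reagent flow, equal to the number of bases extended in that flow. The sketch assumes that `seq` is the strand being synthesized, that the two reaction solutions alternate starting with the first, and that signal intensity is proportional to the number of incorporated bases; `theoretical_signal` is a hypothetical name.

```python
from itertools import groupby

def theoretical_signal(seq: str, first: str = "AC") -> list:
    """Ideal single-color 2+2 signal: bases extended in each alternating flow."""
    signal = []
    expect_first = True  # the first flow uses the solution containing `first`
    for in_first, run in groupby(seq.upper(), key=lambda base: base in first):
        if in_first != expect_first:
            signal.append(0)  # this flow finds nothing to extend
            expect_first = not expect_first
        signal.append(len(list(run)))
        expect_first = not expect_first
    return signal

print(theoretical_signal("AAGTGGCACT"))  # [2, 4, 3, 1]
print(theoretical_signal("CCTTTGACAG"))  # [2, 4, 3, 1], a different sequence with the same signal
print(theoretical_signal("GA"))          # [0, 1, 1]
```

The first two calls show why the information is ambiguous: two different sequences produce identical theoretical signals, so a valid coding must map them to the same string.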
The above coding method may or may not satisfy the following "coding and reverse complement exchangeable" property: the same result is obtained whether the DNA sequence is first encoded and then reverse-complemented, or first reverse-complemented and then encoded. For example, for single-color MK sequencing of a DNA sequence, suppose the coding scheme is defined as: every measured M is rewritten as A and every measured K is rewritten as T. Then:
it can be seen that this coding is consistent with the "code and reverse complement exchangeable" nature. However, if the coding scheme is defined as: all measured M was rewritten as A and all measured K was rewritten as C.
Then:
that is not in accordance with the "code and reverse complement exchangeable" nature.
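The two cases can be checked with a short Python sketch. The helper names (`reverse_complement`, `make_mk_encoder`, `commutes`) and the test sequence are illustrative assumptions, not part of the original disclosure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def make_mk_encoder(m_letter: str, k_letter: str):
    """Build a single-color MK coding: A/C -> m_letter, G/T -> k_letter."""
    table = str.maketrans("ACGT", m_letter * 2 + k_letter * 2)
    return lambda seq: seq.translate(table)

def commutes(encoder, seq: str) -> bool:
    """True if encoding and reverse complementation can be exchanged for seq."""
    return encoder(reverse_complement(seq)) == reverse_complement(encoder(seq))

seq = "AAGTGGCACT"
print(commutes(make_mk_encoder("A", "T"), seq))  # True
print(commutes(make_mk_encoder("A", "C"), seq))  # False
```

The coding M to A, K to T works because A and T are complementary, just as every M base is complementary to a K base; A and C are not complementary, so the second coding fails.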
If the selected coding scheme does not satisfy the "coding and reverse complement exchangeable" property, both the reference genome and its reverse complement must be encoded in step 2; in step 3 the encoded sequencing result of each DNA molecule is aligned against both encoded sequences, and the better alignment is selected. If the selected coding scheme satisfies the property, only the reference genome needs to be encoded in step 2, and its reverse complement need not be encoded.
Examples of coding schemes consistent with the "code and reverse complement exchangeable" property in monochrome 2+2 sequencing:
MK sequencing: 1) M is rewritten as A, K is rewritten as T; or 2) M is rewritten as C, K is rewritten as G;
RY sequencing: 1) R is rewritten as A, Y is rewritten as T; or 2) R is rewritten as C, Y is rewritten as G;
WS sequencing: a coding method for single-color WS sequencing that satisfies the "coding and reverse complement exchangeable" property: the character W is encoded as the string AT, and the character S is encoded as the string CG; similarly, WW is encoded as ATAT, SS as CGCG, WWW as ATATAT, SSS as CGCGCG, etc.
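The WS case differs from MK and RY because W and S are each their own complement, so a one-letter substitution cannot work; encoding each character as a two-letter palindromic unit (AT or CG) restores the property. A minimal sketch, with hypothetical helper names:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def encode_ws(seq: str) -> str:
    """Single-color WS coding: A or T (W) -> 'AT', C or G (S) -> 'CG'."""
    return "".join("AT" if base in "AT" else "CG" for base in seq.upper())

seq = "AAGTGGCACT"
print(encode_ws("AAT"))  # ATATAT
print(encode_ws(reverse_complement(seq)) == reverse_complement(encode_ws(seq)))  # True
```

Since AT and CG are each identical to their own reverse complement, reverse-complementing the encoded string only reverses the order of the units, which is exactly what reverse-complementing the original sequence does to its W/S pattern.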
Examples of coding schemes consistent with the "code and reverse complement exchangeable" property in two-color 2+2 sequencing:
1. The sequence is sequentially partitioned into a number of substrings, each substring containing only bases corresponding to the 2+2 sequencing combination. For example, in two-color MK sequencing, each substring consists of A and/or C only, or G and/or T only. For example, sequence AAGTGGCACT is partitioned into (AA, GTGG, CAC, T).
2. Each sub-string is rearranged from small to large in alphabetical order, respectively. For example, (AA, GTGG, CAC, T) is rearranged to (AA, GGGT, ACC, T).
3. And sequentially connecting the rearranged sub-strings to form a new string, and taking the new string as a coding result. For example, (AA, GGGT, ACC, T) are concatenated into a string AAGGGTACCT.
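The three steps above can be sketched in Python as follows. The function names and the `GROUPS` table are illustrative assumptions; the example sequence is the one used in the text.

```python
from itertools import groupby

# Bases covered by the first reaction solution of each 2+2 combination.
GROUPS = {"MK": "AC", "RY": "AG", "WS": "AT"}

def split_substrings(seq: str, scheme: str = "MK") -> list:
    """Step 1: cut the sequence into maximal runs belonging to one reaction solution."""
    first = GROUPS[scheme]
    return ["".join(run) for _, run in groupby(seq.upper(), key=lambda b: b in first)]

def encode_two_color(seq: str, scheme: str = "MK") -> str:
    """Steps 2 and 3: sort each substring alphabetically, then concatenate."""
    return "".join("".join(sorted(sub)) for sub in split_substrings(seq, scheme))

print(split_substrings("AAGTGGCACT"))   # ['AA', 'GTGG', 'CAC', 'T']
print(encode_two_color("AAGTGGCACT"))   # AAGGGTACCT
print(encode_two_color("AAGGTGACCT"))   # AAGGGTACCT
```

The last call shows a different sequence with the same base counts per substring, which therefore receives the same code.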
The two-color coding mode accords with the property of 'coding and reverse complementary exchangeable':
To improve alignment accuracy in step 3, the first and last substrings of each sequence may need to be discarded in two-color 2+2 encoding, because these two parts are prone to alignment errors. In the above example, sequence AAGTGGCACT is then encoded as GGGTACC.
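A minimal sketch of the trimmed two-color MK coding, together with a check of the "coding and reverse complement exchangeable" property, is given below. The function names are hypothetical, and the result follows from applying the partition, sort and discard rule literally.

```python
from itertools import groupby

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def encode_two_color_mk(seq: str, trim: bool = False) -> str:
    """Two-color MK coding; with trim=True the first and last substrings are dropped."""
    subs = ["".join(run) for _, run in groupby(seq, key=lambda b: b in "AC")]
    if trim:
        subs = subs[1:-1]
    return "".join("".join(sorted(sub)) for sub in subs)

seq = "AAGTGGCACT"
print(encode_two_color_mk(seq, trim=True))                      # GGGTACC
print(encode_two_color_mk(reverse_complement(seq), trim=True))  # GGTACCC
for trim in (False, True):
    lhs = encode_two_color_mk(reverse_complement(seq), trim)
    rhs = reverse_complement(encode_two_color_mk(seq, trim))
    print(lhs == rhs)  # True in both cases
```

The property holds because reverse complementation turns an alphabetically sorted A/C substring into an alphabetically sorted G/T substring and vice versa.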
Unless otherwise specified, single-color and two-color 2+2 data in the following examples are encoded as described above. dMK, dRY and dWS denote two-color MK, two-color RY and two-color WS; sMK and sRY denote single-color MK and single-color RY, respectively. The following specific embodiments are presented to further illustrate the invention. The specific parameters, steps, etc. involved are conventional in the art. The detailed description and examples do not limit the scope of the invention. Unless specifically indicated, all terms used in this application have their ordinary meaning in the art. Unless otherwise specified, all gene sequences referred to in the present invention are commercially available synthetic sequences. Many companies synthesize such sequences, for example Invitrogen.
Example 1
According to the description of the invention, human genomic DNA samples (reagent Human CEPH Genomic DNA in the Ion PI™ Controls Kit from Thermo, Cat. No. 4488985) were sequenced with two-color MK, two-color RY, two-color WS, single-color MK and single-color RY, one million DNA sequences each. The results were aligned to the correspondingly encoded genome using Bowtie2, and the proportion of DNA sequences that aligned to a unique position on the encoded genome (unique alignment) was counted. The results were compared with the sequencing results of an Illumina sequencer (HiSeq 2000), which provides complete DNA sequence information. The unique alignment ratios are as follows:
In the table, dMK represents the two-color MK sequencing method. The lower-case letters d and s represent two-color and single-color sequencing, respectively.
Example 2
According to the description of the invention, E. coli genomic DNA samples (Thermo E. coli DNA Control, Cat. No. 4458450) were sequenced with two-color MK, two-color RY, two-color WS, single-color MK and single-color RY, one million DNA sequences each. The results were aligned to the correspondingly encoded genome using Bowtie2, and the proportion of DNA sequences that aligned to a unique position on the encoded genome (unique alignment) was counted. The results were compared with the sequencing results of an Illumina sequencer, which provides complete DNA sequence information. The unique alignment ratios are shown in the following table:
Example 3
Since the present invention infers genetic variations from only partial information about the DNA sequence, some genetic variations cannot, even in theory, be found by the present invention. For example, in single-color MK sequencing the point mutation A→C cannot be found (although it can in theory be found in single-color RY); in two-color MK sequencing, a mutation in which two adjacent bases AC swap positions to become CA likewise cannot be found in theory. We counted the proportion of all human SNVs known to date (downloaded from the dbSNP database: https://www.ncbi.nlm.nih.gov/pnp, filename: All_2015105.vcf.gz) that cannot in theory be detected by the present invention, as shown in the following table:
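The two undetectable cases named above can be reproduced with a small sketch. It assumes error-free theoretical signals and the two-color coding described earlier (partition into substrings, sort each substring); all function names are hypothetical.

```python
from itertools import groupby

# Bases covered by the first reaction solution of each 2+2 combination.
GROUPS = {"MK": "AC", "RY": "AG", "WS": "AT"}

def single_color_detects(ref: str, alt: str, scheme: str) -> bool:
    """A substitution is visible in single-color 2+2 only if it changes the base group."""
    first = GROUPS[scheme]
    return (ref in first) != (alt in first)

def two_color_code(seq: str, scheme: str) -> str:
    """Two-color coding: sort the bases inside each maximal same-group run."""
    first = GROUPS[scheme]
    runs = groupby(seq.upper(), key=lambda b: b in first)
    return "".join("".join(sorted(run)) for _, run in runs)

print(single_color_detects("A", "C", "MK"))  # False
print(single_color_detects("A", "C", "RY"))  # True
print(two_color_code("GACT", "MK") == two_color_code("GCAT", "MK"))  # True, AC -> CA is hidden
print(two_color_code("GACT", "RY") == two_color_code("GCAT", "RY"))  # False
```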
Example 4
Three rounds of single-color 2+2 sequencing: three sets of reaction solutions are prepared. Each set comprises two bottles, and each bottle contains two kinds of fluorophore-labeled bases; the fluorophores are all commonly used nucleic acid labels. The two bottles of one set together contain exactly the four bases, and no two of the six bottles have the same composition.
The complete sequencing process comprises three rounds carried out one after another, each round using one of the three sets of reagents. Apart from the reagents, the rounds are identical (the same sequencing primer and the same reaction conditions are used).
Each round of sequencing comprises:
1. Hybridize the sequencing primer to the prepared DNA array.
2. Start the sequencing process. Steps 2.1-2.4 are repeated a limited number of times.
2.1 Introduce the first bottle of reagent; allow the reaction to proceed and collect the fluorescent signal.
2.2 Wash all residual reaction solution and generated fluorescent molecules out of the flowcell.
2.3 Introduce the second bottle of reagent; allow the reaction to proceed and collect the fluorescent signal.
2.4 Wash all residual reaction solution and generated fluorescent molecules out of the flowcell.
3. Melt off the extended sequencing primer.
The next round of sequencing can then be performed.
Preparing the reaction solutions:
Prepare a sequencing wash solution (wash solution for short), which contains:
20 mM Tris-HCl pH 8.8
10 mM (NH4)2SO4
50 mM KCl
2 mM MgSO4
0.1% Tween 20
Prepare a sequencing reaction mother liquor (mother liquor for short), which contains:
20 mM Tris-HCl pH 8.8
10 mM (NH4)2SO4
50 mM KCl
2 mM MgSO4
0.1% Tween 20
8000 unit/mL Bst polymerase
100 unit/mL CIP
Three sets of sequencing reaction solutions, six bottles in total, are prepared as follows:
1A: mother liquor + 20 µM dA4P-TG + 20 µM dC4P-TG
1B: mother liquor + 20 µM dG4P-TG + 20 µM dT4P-TG
2A: mother liquor + 20 µM dA4P-TG + 20 µM dG4P-TG
2B: mother liquor + 20 µM dC4P-TG + 20 µM dT4P-TG
3A: mother liquor + 20 µM dA4P-TG + 20 µM dT4P-TG
3B: mother liquor + 20 µM dC4P-TG + 20 µM dG4P-TG
The prepared reaction solutions and mother liquor are kept in a 4°C refrigerator or on ice until use.
Hybridization sequencing primer:
The sequencing chip was filled with sequencing primer solution (10 µM, dissolved in 1× SSC buffer), heated to 90°C, and cooled to 40°C at a rate of 5°C per minute. The sequencing primer solution was then rinsed off with the wash solution.
The first round of sequencing was performed:
The sequencing chip was placed on the sequencer.
Sequencing was performed using the first set of reaction solutions, according to the following procedure.
1. Introduce 10 mL of wash solution to wash the chip.
2. Cool the chip to 4°C.
3. Introduce 100 µL of reaction solution 1A.
4. Heat the chip to 65°C.
5. Wait 1 min.
6. Excite with a 473 nm laser and capture a fluorescence image.
7. Introduce 10 mL of wash solution to wash the chip.
8. Cool the chip to 4°C.
9. Introduce 100 µL of reaction solution 1B.
10. Heat the chip to 65°C.
11. Wait 1 min.
12. Excite with a 473 nm laser and capture a fluorescence image.
Steps 1-12 were repeated 50 times to obtain 100 fluorescence signals.
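The relation between cycles and signals can be written down as a simple schedule: each cycle contains one imaging step per reaction solution, so 50 cycles with two solutions give 100 signals. The sketch below is illustrative only; `flow_schedule` is a hypothetical name.

```python
def flow_schedule(cycles: int = 50, solutions: tuple = ("1A", "1B")) -> list:
    """List the (cycle, reaction solution) pair of every imaging step in one round."""
    return [(cycle, solution)
            for cycle in range(1, cycles + 1)
            for solution in solutions]

schedule = flow_schedule()
print(len(schedule))  # 100
print(schedule[:3])   # [(1, '1A'), (1, '1B'), (2, '1A')]
```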
Example 5
Three rounds of two-color 2+2 sequencing: three sets of reaction solutions are prepared; each set comprises two bottles, and each bottle contains two bases. The two bases are labeled with different fluorophores having different emission wavelengths so that they can be distinguished.
In this example, two fluorophores, X and Y, are used for all bases. The two bottles of one set together contain exactly the four bases, and no two of the six bottles have the same composition.
Set | First bottle | Second bottle |
---|---|---|
First set | AX+CY | GX+TY |
Second set | AX+GY | CX+TY |
Third set | AX+TY | CX+GY |
(X and Y denote commonly used fluorophores for nucleic acid labeling.)
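For the first set (AX+CY, GX+TY), the theoretical two-color signal can be modeled as one pair of counts per reagent flow: the number of X-labeled and of Y-labeled bases extended. The sketch below assumes ideal, error-free signals, that `seq` is the strand being synthesized, and that the first bottle is introduced first; `two_color_signal` is a hypothetical name.

```python
from itertools import groupby

# First set: (X-labeled base, Y-labeled base) in the first and second bottle.
BOTTLES = (("A", "C"), ("G", "T"))

def two_color_signal(seq: str) -> list:
    """Ideal two-color signal: (X count, Y count) for each alternating flow."""
    signal, flow = [], 0
    for in_first, run in groupby(seq.upper(), key=lambda b: b in BOTTLES[0]):
        bottle = 0 if in_first else 1
        if bottle != flow % 2:
            signal.append((0, 0))  # this flow finds nothing to extend
            flow += 1
        bases = "".join(run)
        x_base, y_base = BOTTLES[bottle]
        signal.append((bases.count(x_base), bases.count(y_base)))
        flow += 1
    return signal

print(two_color_signal("AAGTGGCACT"))  # [(2, 0), (3, 1), (1, 2), (0, 1)]
print(two_color_signal("GA"))          # [(0, 0), (1, 0), (1, 0)]
```

Each pair records how many bases of each kind were incorporated in a flow but not their order, which is the ambiguity that the two-color coding captures.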
The complete sequencing process comprises three rounds carried out one after another, each round using one of the three sets of reagents. Apart from the reagents, the rounds are identical.
Each round of sequencing comprises:
1. Hybridize the sequencing primer to the prepared DNA array.
2. Start the sequencing process. Steps 2.1-2.4 are repeated a limited number of times.
2.1 Introduce the first bottle of reagent; allow the reaction to proceed and collect the fluorescent signals at the two wavelengths.
2.2 Wash all residual reaction solution and generated fluorescent molecules out of the flowcell.
2.3 Introduce the second bottle of reagent; allow the reaction to proceed and collect the fluorescent signals at the two wavelengths.
2.4 Wash all residual reaction solution and generated fluorescent molecules out of the flowcell.
3. Melt off the extended sequencing primer.
The next round of sequencing can then be performed.
Example 6
Examples 4 and 5 are complete sequencing schemes. It is generally accepted that complete, unambiguous sequence information can be obtained with the sequencing workflow of Example 4 or Example 5, or with at least two rounds of sequencing. When a reference genome is available, however, a single round of sequencing yielding ambiguous sequence information is sufficient for alignment to the reference genome or for finding variations.
On the basis of Example 4, only one of the three sets of reaction solutions needs to be prepared, and one round of sequencing is performed using its two bottles of reaction solution. The specific sequencing steps may be the same as in Example 4.
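Why two rounds suffice without a reference can be shown by intersecting the degenerate letters from two different rounds at each position. The sketch assumes ideal, error-free reads already expressed as IUPAC letters; `resolve` is a hypothetical name.

```python
IUPAC = {"M": {"A", "C"}, "K": {"G", "T"}, "R": {"A", "G"},
         "Y": {"C", "T"}, "W": {"A", "T"}, "S": {"C", "G"}}

def resolve(*degenerate_reads: str) -> str:
    """Intersect the per-position letters of several rounds; 'N' if still ambiguous."""
    resolved = []
    for letters in zip(*degenerate_reads):
        candidates = set("ACGT")
        for letter in letters:
            candidates &= IUPAC[letter]
        resolved.append(candidates.pop() if len(candidates) == 1 else "N")
    return "".join(resolved)

print(resolve("MMKKKKMMMK", "RRRYRRYRYY"))  # AAGTGGCACT
print(resolve("MMKKKKMMMK"))                # NNNNNNNNNN
```

With a single round every position remains ambiguous, which is why the invention relies on an identically encoded reference genome instead of a second round.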
Example 7
On the basis of Example 5, only one of the three sets of reaction solutions needs to be prepared, and one round of sequencing is performed using its two bottles of reaction solution. The specific sequencing steps may be the same as in Example 5.
For further details of the sequencing method of the present invention, reference may be made to the applicant's previously filed patent applications CN201510822361.9 or CN 2015110815685. X, which are not described in detail here. It is expressly stated that the specific sequencing steps do not limit the scope of the present invention.
Claims (10)
1. A method for comparing sequencing fuzzy sequence information is characterized in that,
Fixing the nucleotide fragment to be detected, and obtaining fuzzy sequence information through a sequencing reaction;
comparing the fuzzy sequence information with a reference nucleic acid sequence;
Wherein, the reaction liquid of the sequencing reaction contains nucleotide substrate molecules with two different bases;
sequencing refers to sequencing by utilizing a nucleotide substrate molecule of which the 5' -end is modified with a fluorophore with fluorescence switching property on polyphosphoric acid;
the fluorescence switching property means that the fluorescence signal is obviously changed after sequencing compared with that before sequencing reaction;
The sequencing reaction is a sequencing method with an unclosed 3 end;
comparing the fuzzy sequence information with the reference nucleic acid sequence means that the fuzzy sequence information and the reference nucleic acid sequence are encoded in the same mode and then are compared;
wherein, the comparing of the fuzzy sequence information and the reference nucleic acid sequence comprises the following steps:
(1) Encoding the sequencing result and the reference nucleic acid sequence by the same method;
(2) Comparing the encoded sequencing result to the encoded reference nucleic acid sequence;
(3) Gene variation was found in the comparison results.
2. A method for comparing fuzzy sequence information by sequencing is characterized in that,
Fixing the nucleotide fragment to be detected, and obtaining fuzzy sequence information through a sequencing reaction;
comparing the fuzzy sequence information with a reference nucleic acid sequence;
Wherein, the reaction liquid of the sequencing reaction contains nucleotide substrate molecules with three different bases;
sequencing refers to sequencing by utilizing a nucleotide substrate molecule of which the 5' -end is modified with a fluorophore with fluorescence switching property on polyphosphoric acid;
the fluorescence switching property means that the fluorescence signal is obviously changed after sequencing compared with that before sequencing reaction;
The sequencing reaction is a sequencing method with an unclosed 3 end;
comparing the fuzzy sequence information with the reference nucleic acid sequence means that the fuzzy sequence information and the reference nucleic acid sequence are encoded in the same mode and then are compared;
wherein, the comparing of the fuzzy sequence information and the reference nucleic acid sequence comprises the following steps:
(1) Encoding the sequencing result and the reference nucleic acid sequence by the same method;
(2) Comparing the encoded sequencing result to the encoded reference nucleic acid sequence;
(3) Gene variation was found in the comparison results.
3. A method according to claim 1 or 2, characterized in that,
Each sequencing uses one set of reaction liquid group, wherein each set of reaction liquid group comprises two or more reaction liquids, and each reaction liquid comprises nucleotide substrate molecules with at least two different bases.
4. The method according to claim 1, characterized in that,
The ambiguous sequence information is a combination of degenerate sequence information and non-degenerate sequence information.
5. A method according to claim 1 or 2, characterized in that,
The ambiguous sequence information obtained by sequencing is encoded into one of its possible base sequence information.
6. A method according to claim 1 or 2, characterized in that,
All of the fuzzy sequence information obtained by sequencing is encoded as numbers.
7. A method according to claim 1 or 2, characterized in that,
The ambiguous sequence information and the reference nucleic acid sequence are encoded simultaneously or sequentially.
8. The method according to claim 1, characterized in that,
The nucleotide substrate molecule of the fluorophore with the fluorescent switching property modified by the 5 '-terminal polyphosphoric acid refers to the nucleotide substrate molecule of the fluorophore with the fluorescent switching property modified by the 5' -terminal polyphosphoric acid.
9. The method according to claim 1, characterized in that,
Sequencing using a nucleotide substrate molecule modified with a fluorophore having a fluorescence switching property at the 5' polyphosphate end or the middle phosphate;
The fluorescence switching property means that the fluorescence signal intensity is obviously increased after sequencing compared with that before sequencing reaction;
Each set of reaction liquid group is used for sequencing, each set of reaction liquid group comprises two reaction liquids, and each reaction liquid contains nucleotide substrate molecules with two different bases;
The nucleotide substrate molecules in one reaction liquid can be complementary with two bases on the nucleotide sequence to be detected, and the nucleotide substrate molecules in the other reaction liquid can be complementary with the other two bases on the nucleotide sequence to be detected;
Firstly, fixing a nucleotide sequence fragment to be detected in a reaction chamber, and then introducing one reaction solution in a set of reaction solution groups;
Releasing the fluorophore on the nucleotide substrate having the fluorophore with fluorescence switching properties using an enzyme, thereby resulting in fluorescence switching;
then introducing a second reaction liquid in the same set of reaction liquid groups;
Releasing the fluorophore on the nucleotide substrate of the fluorophore having fluorescence switching properties using an enzyme, thereby causing fluorescence switching;
The two reaction solutions are added cyclically, and the fuzzy coding information of the nucleotide sequence to be detected is obtained from the fluorescence information.
10. A system for comparing ambiguous sequence information obtained from sequencing comprises a computing system, wherein,
Use of the method of any of the preceding claims; comparing the fuzzy sequence information obtained by sequencing with a reference nucleic acid sequence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010525168.XA CN111667882B (en) | 2016-12-01 | 2016-12-01 | Sequencing fuzzy sequence information comparison method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010525168.XA CN111667882B (en) | 2016-12-01 | 2016-12-01 | Sequencing fuzzy sequence information comparison method |
CN201611088606.0A CN108165616B (en) | 2016-12-01 | 2016-12-01 | Method and system for comparing and identifying variation by using fuzzy nucleic acid sequencing information |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611088606.0A Division CN108165616B (en) | 2016-12-01 | 2016-12-01 | Method and system for comparing and identifying variation by using fuzzy nucleic acid sequencing information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111667882A CN111667882A (en) | 2020-09-15 |
CN111667882B true CN111667882B (en) | 2024-05-14 |
Family
ID=62525863
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010525787.9A Active CN111575355B (en) | 2016-12-01 | 2016-12-01 | Sequencing fuzzy sequence analysis method |
CN202010525168.XA Active CN111667882B (en) | 2016-12-01 | 2016-12-01 | Sequencing fuzzy sequence information comparison method |
CN201611088606.0A Active CN108165616B (en) | 2016-12-01 | 2016-12-01 | Method and system for comparing and identifying variation by using fuzzy nucleic acid sequencing information |
Country Status (1)
Country | Link |
---|---|
CN (3) | CN111575355B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112102883B (en) * | 2020-08-20 | 2023-12-08 | 深圳华大生命科学研究院 | Base sequence coding method and system in FASTQ file compression |
CN114561453A (en) * | 2022-01-28 | 2022-05-31 | 赛纳生物科技(北京)有限公司 | Method for qualitatively or quantitatively analyzing target sample through degenerate sequencing |
CN114540471B (en) * | 2022-01-28 | 2024-05-14 | 赛纳生物科技(北京)有限公司 | Method and system for performing comparison by using missing nucleic acid sequencing information |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102329884A (en) * | 2011-10-20 | 2012-01-25 | 东南大学 | Synchronous synthesis and DNA sequencing method for two nucleotides and application thereof |
CN103951724A (en) * | 2014-04-30 | 2014-07-30 | 南京普东兴生物科技有限公司 | Specially modified nucleotide as well as application thereof in high-throughput sequencing |
CN104662165A (en) * | 2012-03-30 | 2015-05-27 | 加利福尼亚太平洋生物科学股份有限公司 | Methods and composition for sequencing modified nucleic acids |
CN104910229A (en) * | 2015-04-30 | 2015-09-16 | 北京大学 | Poly phosphoric acid end fluorescent labeled nucleotide and application thereof |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100130368A1 (en) * | 1998-07-30 | 2010-05-27 | Shankar Balasubramanian | Method and system for sequencing polynucleotides |
US20100035249A1 (en) * | 2008-08-05 | 2010-02-11 | Kabushiki Kaisha Dnaform | Rna sequencing and analysis using solid support |
CN102634586B (en) * | 2012-04-27 | 2013-10-30 | 东南大学 | Decoding and sequencing method by real-time synthesis of two nucleotides into deoxyribonucleic acid (DNA) |
CN106755292B (en) * | 2015-11-19 | 2019-06-18 | 赛纳生物科技(北京)有限公司 | A kind of nucleic acid molecule sequencing approach of phosphoric acid modification fluorogen |
Non-Patent Citations (1)
Title |
---|
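A review of sequence alignment methods for second-generation sequencing; Yang Ye; Liu Juan; Journal of Wuhan University (Natural Science Edition) (No. 05); full text * |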
Also Published As
Publication number | Publication date |
---|---|
CN111667882A (en) | 2020-09-15 |
CN111575355A (en) | 2020-08-25 |
CN108165616A (en) | 2018-06-15 |
CN108165616B (en) | 2020-09-29 |
CN111575355B (en) | 2023-03-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||