WO2023159817A1 - Genetic diagnosis probes and use thereof - Google Patents

Genetic diagnosis probes and use thereof

Info

Publication number
WO2023159817A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
strand
cancer
complementary
probe
Prior art date
Application number
PCT/CN2022/100272
Other languages
French (fr)
Chinese (zh)
Inventor
李冰思
宿静
邱福俊
王晨阳
李晓玲
张之宏
汉雨生
Original Assignee
广州燃石医学检验所有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州燃石医学检验所有限公司
Publication of WO2023159817A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/154: Methylation markers

Definitions

  • This application relates to the field of biomedicine, in particular to genetic diagnosis probes and their use.
  • DNA methylation is an epigenetic modification catalyzed by DNA methyltransferase (DNMT), which uses S-adenosylmethionine (SAM) as the methyl donor.
  • the cytosine of CG dinucleotides in DNA is selectively methylated, mainly forming 5-methylcytosine (5-mC), which is commonly found in the 5'-CG-3' sequences of genes.
  • other methylated bases include N6-mA and 7-methylguanine (7-mG). Structural genes contain many CpG structures, and the carbon-5 atoms of the two cytosines in 2CpG and 2GpC are usually methylated; the two methyl groups adopt a specific three-dimensional structure in the major groove of the DNA double strand.
  • DNA methylation plays an important role in the regulation of gene expression. Aberrant DNA methylation marks have been reported in the development of various diseases, including cancer. As a high-resolution, high-throughput technology, DNA methylation sequencing is increasingly recognized for its role in early cancer screening, diagnosis, and monitoring.
  • WGBS Whole Genome Bisulfite Sequencing
  • the CG dinucleotide is the most important methylation site. It is unevenly distributed in the genome, which contains hypermethylated, hypomethylated and unmethylated regions; mC accounts for about 2-7% of total C in mammals.
  • CpG islands are abundant in the genome, and their detection and analysis can be greatly assisted by massively parallel nucleic acid sequencing (also known as "high-throughput sequencing" or "next-generation sequencing" (NGS)), making it possible to predict whether and where cancer occurs.
  • Example 3 of Chinese patent publication CN112646888B describes using a Single Cell Kit (Qiagen, Cat#150343) and Mung Bean Nuclease (NEB, Cat#M0250L) to process NA12878 DNA in order to prepare a 0% methylation standard.
  • human tumor gene detection preparations can be prepared to achieve early detection or early screening of cancers including but not limited to: brain cancer, lung cancer, skin cancer, nasopharyngeal cancer, throat cancer, liver cancer, bone cancer, lymphoma, pancreatic cancer, skin cancer, bowel cancer, rectal cancer, thyroid cancer, bladder cancer, kidney cancer, oral cancer, stomach cancer, solid tumors, ovarian cancer, esophageal cancer, gallbladder cancer, biliary tract cancer, breast cancer, cervical cancer, uterine cancer, prostate cancer, head and neck cancer, sarcoma, thoracic malignancy (except lung), melanoma, and testicular cancer.
  • the present application provides a combination of nucleic acid molecules comprising at least one nucleic acid probe set covering a target region of a nucleic acid to be detected, characterized in that the nucleic acid probe set at least comprises: (1) a first probe complementary to the first strand, the first strand being the sequence of the target region after base substitution; (2) a second probe complementary to the second strand, the second strand being the sequence of the complementary region of the target region after base substitution; and either or both of the following two probes: (3) a third probe complementary to the third strand, the third strand being complementary to the first strand; (4) a fourth probe complementary to the fourth strand, the fourth strand being the complementary sequence of the second strand.
  • the present application provides use of the nucleic acid molecule combination described in the application in the preparation of human tumor gene detection preparations.
  • the present application provides a standard nucleic acid molecule used for assessing the accuracy of the degree of base modification detected in the use described in the present application. The standard nucleic acid molecule comprises a candidate region where the degree of base modification is about 0%, and the total length of the candidate region is about 1 bp to about 10000 bp.
  • Figure 1 shows the methylation measurement results of the "20% standard" and "50% standard" of the application, as well as those of the "zero methylation standard" and "full methylation standard" of the application.
  • Figures 2A-2C show the uniformity measurement results of the probes designed in this application.
  • Figure 3 shows the repeatability measurement results of the probes designed in this application.
  • Figures 4A-4C show the bias measurement results of the probes designed in this application.
  • Figure 5 shows an exemplary reference schematic diagram of the capture probe design of the present application.
  • Figure 6 shows an exemplary reference schematic diagram for calculating the methylation level in the present application.
  • the terms "next-generation gene sequencing (NGS)", "high-throughput sequencing" or "next-generation sequencing" generally refer to second-generation high-throughput sequencing technology and the higher-throughput sequencing methods developed thereafter.
  • Next-generation sequencing platforms include but are not limited to existing sequencing platforms such as Illumina. With the continuous development of sequencing technology, those skilled in the art will understand that other sequencing methods and devices can also be used for this method. For example, second-generation gene sequencing can have the advantages of high sensitivity, high throughput, high sequencing depth, or low cost.
  • Massively Parallel Signature Sequencing (MPSS)
  • Polony sequencing
  • 454 pyrosequencing
  • Illumina (Solexa) sequencing
  • Ion semiconductor sequencing
  • DNA nanoball sequencing
  • Complete Genomics' DNA nanoarray and combinatorial probe-anchor ligation sequencing, etc.
  • second-generation gene sequencing makes it possible to analyze the transcriptome and genome of a species in detail, so it is also called deep sequencing.
  • the method of the present application can also be applied to first-generation gene sequencing, second-generation gene sequencing, third-generation gene sequencing or single molecule sequencing (SMS).
  • the term "sample to be tested" generally refers to a sample that needs to be tested. For example, it is possible to detect whether one or more gene regions in the sample to be tested are modified.
  • the term "complementary region" generally refers to a region that is complementary to a reference nucleotide sequence.
  • a complementary nucleic acid can be a nucleic acid molecule that optionally has an opposite orientation.
  • "complementary" may refer to the following pairings: guanine and cytosine; adenine and thymine; adenine and uracil.
  • the term "hybridization" generally refers to a reaction in which one or more polynucleotides react to form a complex stabilized by hydrogen bonds between the bases of the nucleotide residues. Hydrogen bonding can occur through Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner based on base complementarity.
  • the complex may comprise two strands forming a double helix, three or more strands forming a multi-strand complex, self-hybridizing single strands, or any combination of these.
  • the hybridization reaction may constitute a step in a wider method, such as the initiation of PCR or the enzymatic cleavage of polynucleotides by endonucleases.
  • a second sequence that is completely complementary to a first sequence or that is polymerized by a polymerase using the first sequence as a template is said to be "complementary" to said first sequence.
  • the term "hybridizable" refers to the ability of a polynucleotide to form, in a hybridization reaction, complexes that are stabilized by hydrogen bonds between the bases of the nucleotide residues.
  • a hybridizable nucleotide sequence is at least about 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, or 100% complementary to the sequence to which it hybridizes.
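The percentage complementarity mentioned above can be illustrated with a minimal sketch. It assumes two equal-length strands, both written 5' to 3', aligned antiparallel without gaps, and it counts only the pairings listed in the definition of "complementary" (G-C, A-T, A-U). The function name `percent_complementarity` is hypothetical and is not taken from the application.

```python
# Illustrative sketch only: ungapped, equal-length, antiparallel comparison.
# The pairing table follows the definition of "complementary" given above.
PAIRS = {("G", "C"), ("C", "G"), ("A", "T"), ("T", "A"), ("A", "U"), ("U", "A")}


def percent_complementarity(strand_a: str, strand_b: str) -> float:
    """Return the percentage of positions at which strand_a pairs with strand_b.

    Both strands are given 5'->3'; strand_b is read in reverse so that the
    two strands are compared in antiparallel orientation.
    """
    a, b = strand_a.upper(), strand_b.upper()
    if not a or len(a) != len(b):
        raise ValueError("strands must be non-empty and of equal length")
    matches = sum((x, y) in PAIRS for x, y in zip(a, reversed(b)))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    print(percent_complementarity("ACGT", "ACGT"))
    print(percent_complementarity("AAAA", "TTTA"))
```

Real hybridization also depends on length, temperature and mismatch position, none of which this count captures.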
  • the term "polynucleotide" denotes a polymeric form of nucleotides (deoxyribonucleotides or ribonucleotides) of any length, or analogs thereof.
  • a polynucleotide can have any three-dimensional structure and can perform any function, whether known or unknown.
  • examples of polynucleotides include: coding or non-coding regions of genes or gene segments, loci defined by linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), short interfering RNA (siRNA), short-hairpin RNA (shRNA), microRNA (miRNA), ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, primers and linkers.
  • a polynucleotide may include one or more modified nucleotides, such as methylated nucleotides and nucleotide analogs. Modifications to the nucleotide structure, if present, can be imparted either before or after assembly of the polymer. Nucleotide sequences can be interrupted by non-nucleotide components. Polynucleotides may be further modified after polymerization, such as by conjugation with labeling components.
  • the term "modification state” generally refers to the modification state of the gene fragment, nucleotide or its base in the present application.
  • the modification state in the present application may refer to the modification state of cytosine.
  • a gene segment of the present application having a modified state may have altered gene expression activity.
  • the modification status of the present application may refer to the methylation modification of a base.
  • the modification state in this application may refer to the covalent bonding of a methyl group at the carbon-5 position of cytosine in the CpG region of genomic DNA, which for example converts it into 5-methylcytosine (5mC).
  • a modification state can refer to the presence or absence of 5-methylcytosine ("5-mCyt") within the DNA sequence.
  • methylation generally refers to the methylation state of a gene fragment, nucleotide or its base in this application.
  • the DNA fragment where the gene in this application is located may have methylation on one strand or multiple strands.
  • the DNA fragment where the gene in this application is located may have methylation at one site or multiple sites.
  • the term "conversion" generally refers to the conversion of one or more structures into another structure.
  • the conversion of the present application can be specific.
  • cytosine without methylation modification can be converted into another structure (such as uracil), while cytosine with methylation modification can remain substantially unchanged after conversion.
  • cytosine without methylation modification can be cleaved after conversion, while cytosine with methylation modification can remain substantially unchanged after conversion.
  • the term "bisulfite" generally refers to a reagent that can distinguish DNA regions that have a modification state from those that do not.
  • the bisulfite reagent may include bisulfite, an analog thereof, or a combination thereof.
  • bisulfite can deaminate the amino group of unmodified cytosine to distinguish it from modified cytosine.
  • the term “analogue” generally refers to a substance having a similar structure and/or function.
  • analogs of bisulfite may have a similar structure to bisulfite.
  • an analog of bisulfite may refer to a reagent that can also distinguish between DNA regions that have a modified state and those that do not.
  • the term "about" generally refers to a range of 0.5%-10% above or below the specified value, such as 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 5.5%, 6%, 6.5%, 7%, 7.5%, 8%, 8.5%, 9%, 9.5%, or 10% above or below the specified value.
  • the present application provides a combination of nucleic acid molecules in which the binding free energy of a nucleic acid molecule for a nucleic acid sequence derived from a target region differs from its binding free energy for a nucleic acid sequence derived from a non-target region by more than a specific threshold.
  • the specific threshold is from about 12 to about 50 kcal/mol.
  • the specific threshold is about 20 to 30 kcal/mol.
  • the specific threshold is about 20 kcal/mol.
  • the combination of nucleic acid molecules of the present application is based on the screening of candidate target regions to determine suitable nucleic acid molecules.
  • the nucleic acid molecule combination designed for the candidate target region in the present application has a higher binding free energy for the nucleic acid sequence derived from the target region.
  • the nucleic acid molecules in the combination of nucleic acid molecules of the present application have a binding free energy for a nucleic acid sequence derived from a target region that differs from the binding free energy for a nucleic acid sequence derived from a non-target region by about 12 kcal/mol or more.
  • the nucleic acid molecules in the nucleic acid molecule combination of the present application have a binding free energy for the nucleic acid sequence derived from the target region that is about 12 kcal/mol, or about 13 kcal/mol, higher than the binding free energy for the nucleic acid sequence derived from the non-target region.
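The free-energy criterion above can be sketched as a simple filter. This is only an illustration under stated assumptions: the binding free energies are taken as already computed by some thermodynamic tool and are expressed as magnitudes in kcal/mol (a larger value meaning stronger binding, following the "higher" wording above). The class `ProbeEnergy`, the functions `passes_threshold` and `select_probes`, and all numeric inputs are hypothetical.

```python
# Illustrative sketch only: energies are hypothetical, precomputed magnitudes
# in kcal/mol; the application does not specify how they are calculated.
from dataclasses import dataclass
from typing import List


@dataclass
class ProbeEnergy:
    name: str
    dg_target: float       # binding free energy for the target-derived sequence
    dg_off_target: float   # binding free energy for the non-target sequence


def passes_threshold(probe: ProbeEnergy, threshold: float = 20.0) -> bool:
    """Keep a probe if target binding exceeds off-target binding by the threshold."""
    return (probe.dg_target - probe.dg_off_target) >= threshold


def select_probes(probes: List[ProbeEnergy], threshold: float = 20.0) -> List[str]:
    """Return the names of the probes that satisfy the free-energy criterion."""
    return [p.name for p in probes if passes_threshold(p, threshold)]


if __name__ == "__main__":
    candidates = [
        ProbeEnergy("probe_a", dg_target=50.0, dg_off_target=25.0),
        ProbeEnergy("probe_b", dg_target=40.0, dg_off_target=30.0),
    ]
    print(select_probes(candidates, threshold=20.0))
```

The default of 20 kcal/mol is one of the values named above; any value in the stated range of about 12 to about 50 kcal/mol could be passed instead.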
  • the present application provides a combination of nucleic acid molecules comprising at least one nucleic acid probe set covering the target region of the nucleic acid to be tested, the nucleic acid probe set at least comprising: (1) a first probe complementary to the first strand, the first strand being the sequence of the target region after base substitution; (2) a second probe complementary to the second strand, the second strand being the sequence of the complementary region of the target region after base substitution; and either or both of the following two probes: (3) a third probe complementary to the third strand, the third strand being complementary to the first strand; (4) a fourth probe complementary to the fourth strand, the fourth strand being the complementary sequence of the second strand.
  • for a target region assumed to have zero methylation in the nucleic acid to be tested, the nucleic acid molecule combination of the present application is designed against the first strand of the region after base substitution (original top strand, OT strand) and the second strand of the complementary region after base substitution (original bottom strand, OB strand); a third probe complementary to the third strand is designed from the complementary strand of the first strand (CTOT strand), and a fourth probe complementary to the fourth strand is likewise designed from the complementary strand of the second strand (CTOB strand).
  • the nucleic acid molecule combination of the present application serves as capture probes for methylation detection.
  • the base-substituted site includes a site where cytosine is present.
  • the base substitution comprises obtaining a nucleic acid sequence in which cytosine is replaced by thymine or uracil through chemical and/or biological processes.
  • the base substitution includes obtaining a nucleic acid sequence in which all cytosines are replaced with thymine or uracil.
  • the base substitution may include bisulfite conversion treatment, in which the unmethylated C in the original top strand and the original bottom strand is converted into uracil.
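The four strands described above for a target region assumed to be unmethylated can be derived in silico. The following is a minimal sketch, not the applicant's design pipeline: it models complete conversion by replacing every C with T (as the converted strand reads after amplification), and the helper names `reverse_complement`, `convert_all_c` and `design_probe_set` are hypothetical.

```python
# Illustrative sketch only: in-silico conversion of a fully unmethylated target
# and derivation of probes complementary to the OT, OB, CTOT and CTOB strands.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def convert_all_c(seq: str) -> str:
    """Model complete conversion of unmethylated cytosine (C read as T)."""
    return seq.replace("C", "T")


def design_probe_set(target: str) -> dict:
    """Return one probe sequence per strand, keyed by strand name."""
    top = target.upper()
    ot = convert_all_c(top)                        # first strand
    ob = convert_all_c(reverse_complement(top))    # second strand
    ctot = reverse_complement(ot)                  # third strand
    ctob = reverse_complement(ob)                  # fourth strand
    strands = {"OT": ot, "OB": ob, "CTOT": ctot, "CTOB": ctob}
    # Each probe is the reverse complement of the strand it captures.
    return {name: reverse_complement(s) for name, s in strands.items()}


if __name__ == "__main__":
    for strand, probe in design_probe_set("AACG").items():
        print(strand, probe)
```

For the toy target AACG, the OT strand reads AATG and the OB strand reads TGTT, so the two converted strands are no longer complementary to each other, which is why separate probes are needed for each.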
  • the nucleic acid probe set further comprises: (1) a fifth probe complementary to the fifth strand, the fifth strand being the sequence of the target region without base substitution; (2) a sixth probe complementary to the sixth strand, the sixth strand being the sequence of the complementary region of the target region without base substitution; (3) a seventh probe complementary to the seventh strand, the seventh strand being complementary to the fifth strand; (4) an eighth probe complementary to the eighth strand, the eighth strand being the complementary sequence of the sixth strand.
  • four other probes are designed for the target region assumed to be fully methylated in the nucleic acid to be tested.
  • the combination of nucleic acid molecules comprises a set of nucleic acid probes covering 10,000 or more different target regions of the nucleic acid to be tested.
  • the nucleic acid molecule combination of the present application is designed for 10,000 or more, 15,000 or more, 20,000 or more, 25,000 or more, 30,000 or more, 40,000 or more, or 50,000 or more different target regions.
  • the present application provides a combination of nucleic acid molecules for which the detection results for standards of specific methylation levels, for example methylation standards of 20% and/or 50% methylation level, meet criteria selected from the following group: the fluctuation of the methylation level detection result is 25% or lower, and the repeatability is 9E-05 or lower.
  • the fluctuation is the difference between the maximum value and the minimum value of the detection result, and the repeatability is the median value of the mean square error of the methylation level among multiple wells.
  • fluctuations in methylation levels are used to assess the accuracy of nucleic acid molecule combinations.
  • the detection result fluctuation of the nucleic acid molecule combination of the present application is 22% or lower, 23% or lower, 24% or lower, 25% or lower, 26% or lower, or 27% or lower.
  • for two or more repeated measurements, the mean square error of the methylation levels detected by the candidate capture probe combinations of the nucleic acid molecule combinations of the present application is between about 1.3E-05 and about 2.7E-04, preferably 9E-05 or lower, more preferably about 8E-05 or lower, further preferably about 7E-05 or lower.
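The two screening metrics defined above can be computed as follows. This sketch rests on assumptions not spelled out in the text: methylation levels are expressed as fractions (so 25% is 0.25), the mean square error is taken per region as the mean squared deviation of replicate wells from their own mean, and the median is then taken across regions. The function names are hypothetical.

```python
# Illustrative sketch only: one possible reading of "fluctuation" and
# "repeatability" as defined above; the grouping by region is an assumption.
from statistics import mean, median
from typing import List, Sequence


def fluctuation(levels: Sequence[float]) -> float:
    """Difference between the maximum and minimum detected methylation level."""
    return max(levels) - min(levels)


def mean_square_error(replicates: Sequence[float]) -> float:
    """Mean squared deviation of replicate wells from their mean."""
    centre = mean(replicates)
    return mean([(x - centre) ** 2 for x in replicates])


def repeatability(per_region_replicates: List[Sequence[float]]) -> float:
    """Median, across regions, of the per-region mean square error."""
    return median([mean_square_error(r) for r in per_region_replicates])


def meets_criteria(levels, per_region_replicates,
                   max_fluctuation: float = 0.25,
                   max_repeatability: float = 9e-5) -> bool:
    """Apply the two thresholds named above (25% and 9E-05)."""
    return (fluctuation(levels) <= max_fluctuation
            and repeatability(per_region_replicates) <= max_repeatability)


if __name__ == "__main__":
    wells = [[0.20, 0.20], [0.20, 0.21]]
    print(fluctuation([0.18, 0.22, 0.20]), repeatability(wells))
```

The thresholds are exposed as parameters so that the alternative cut-offs listed above (for example 22% or 7E-05) can be tried without changing the functions.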
  • the nucleic acid molecules in the combination of nucleic acid molecules are about 80 to about 120 bases in length.
  • the nucleic acid molecules in the combination of nucleic acid molecules are about 80, about 90, about 100, about 110, or about 120 bases in length.
  • the region where any two nucleic acid molecules in the combination of nucleic acid molecules overlap comprises about 10 to about 110 bases.
  • the region where any two nucleic acid molecules in the combination of nucleic acid molecules overlap comprises about 10, about 20, about 50, about 70, about 80, about 90, about 100, or about 110 bases.
  • the regions to which the nucleic acid molecules in the combination of nucleic acid molecules are complementary do not overlap a repeat region by 10 or more consecutive bases.
  • information on repeat regions is available from sources known in the art, such as the repeats described at repeatmasker.org.
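The length, overlap and repeat constraints above can be combined in a small tiling sketch. It assumes half-open genomic coordinates, a fixed probe length of 120 bases with a 60-base overlap between neighbours (both within the stated ranges), and a list of repeat intervals supplied by the user, for example exported from a repeat annotation. The function names `tile_probes`, `overlap_len` and `drop_repeat_tiles` are hypothetical.

```python
# Illustrative sketch only: fixed-step tiling plus a repeat-overlap filter.
# Intervals are half-open (start, end) pairs in the same coordinate system.
from typing import List, Tuple

Interval = Tuple[int, int]


def tile_probes(region_start: int, region_end: int,
                probe_len: int = 120, overlap: int = 60) -> List[Interval]:
    """Tile a region with probes of probe_len bases sharing `overlap` bases."""
    if not 0 <= overlap < probe_len:
        raise ValueError("overlap must be non-negative and shorter than the probe")
    step = probe_len - overlap
    tiles = []
    start = region_start
    while start + probe_len <= region_end:
        tiles.append((start, start + probe_len))
        start += step
    return tiles


def overlap_len(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def drop_repeat_tiles(tiles: List[Interval], repeats: List[Interval],
                      max_overlap: int = 9) -> List[Interval]:
    """Discard tiles that share 10 or more consecutive bases with a repeat."""
    return [t for t in tiles
            if all(overlap_len(t, r) <= max_overlap for r in repeats)]


if __name__ == "__main__":
    tiles = tile_probes(0, 300)
    print(drop_repeat_tiles(tiles, repeats=[(115, 125)]))
```

A fixed step is the simplest choice; a real design would also weigh the free-energy and performance criteria described elsewhere in the text.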
  • the present application provides a method for designing a combination of nucleic acid molecules: based on the first strand derived from the target region and subjected to base substitution and its complementary strand, and on the second strand derived from the target region and subjected to base substitution and its complementary strand, a combination of nucleic acid molecules complementary to three or more of the above-mentioned strands is designed.
  • the present application provides a method for designing a combination of nucleic acid molecules, comprising: (1) screening target regions, such that the nucleic acid molecule combination designed for a candidate target region has a higher binding free energy for nucleic acid sequences derived from the target region; (2) designing four probes for the candidate target region: for a target region assumed to be fully methylated in the nucleic acid to be tested, the four probes are designed for the first strand of the region after base substitution and its complementary strand, and for the second strand of the complementary region after base substitution and its complementary strand; and (3) screening nucleic acid molecule combinations: using standards of specific methylation levels, selecting the combinations that meet criteria selected from the following group: the fluctuation of the methylation level detection result is 25% or lower, and the repeatability is 9E-05 or lower.
  • the present application provides a method for designing a combination of nucleic acid molecules, comprising: (1) for the nucleic acid molecule combination designed for a candidate target region, the binding free energy for the nucleic acid sequence derived from the target region is higher than the binding free energy for the nucleic acid sequence derived from the non-target region by about 12 kcal/mol or more; (2) for a target region assumed to be fully methylated in the nucleic acid to be tested, the combination is designed against the first strand (Top strand) of the region after base substitution and the second strand (Bottom strand) of the complementary region after base substitution; a third probe complementary to the third strand is designed from the complementary strand of the first strand (CTOT strand), and a fourth probe complementary to the fourth strand is designed; and (3) screening nucleic acid molecule combinations: using standards of specific methylation levels, selecting the combinations that meet the selection criteria.
  • the fluctuation is the difference between the maximum value and the minimum value of the detection result.
  • the repeatability is the median value of the mean square error of the methylation level among multiple wells.
  • the nucleic acid molecule combination serves as capture probes for methylation detection.
  • the standard of a specific methylation level used in the method of the present application is prepared by the method of the present application.
  • the present application provides the nucleic acid molecule combination obtained by the design method of the nucleic acid molecule combination of the present application.
  • the nucleic acid molecule combination serves as capture probes for methylation detection.
  • the present application provides a kit comprising the nucleic acid molecule combination of the present application.
  • the present application provides the application of the nucleic acid molecule combination of the present application and/or the kit of the present application in the preparation of human tumor gene detection preparations.
  • the detection preparation is used to detect the base modification level of the target region.
  • the base modification includes methylation modification.
  • the human tumors are from homogenous tumors, heterogeneous tumors, hematological cancers and/or solid tumors.
  • the human tumor is from one or more of the following group of cancers: brain cancer, lung cancer, skin cancer, nasopharyngeal cancer, throat cancer, liver cancer, bone cancer, lymphoma, pancreatic cancer, skin cancer, intestinal cancer, rectal cancer, thyroid cancer, bladder cancer, kidney cancer, oral cancer, stomach cancer, solid tumor, ovarian cancer, esophageal cancer, gallbladder cancer, biliary tract cancer, breast cancer, cervical cancer, uterine cancer, prostate cancer, head and neck cancer, sarcoma, thoracic malignancy (except lung), melanoma, testicular cancer.
  • the present application provides a method for detecting the level of base modification, comprising providing the nucleic acid molecule combination of the present application and/or the kit of the present application.
  • the base modification includes methylation modification.
  • the present application provides a storage medium, which records a program capable of running the method of the present application.
  • the non-transitory computer readable storage medium may include a floppy disk, a flexible disk, a hard disk, solid state storage (SSS) (such as a solid state drive (SSD), a solid state card (SSC), or a solid state module (SSM)), an enterprise-grade flash drive, magnetic tape, or any other non-transitory magnetic media, etc.
  • Non-transitory computer readable storage media may also include punched cards, paper tape, optical mark sheets (or any other physical media having a pattern of holes or other optically identifiable markings), compact disc read only memory (CD-ROM), compact disc rewritable (CD-RW), Digital Versatile Disc (DVD), Blu-ray Disc (BD) and/or any other non-transitory optical media.
  • the present application provides a device, and the device includes the storage medium of the present application.
  • the device further includes a processor coupled to the storage medium, and the processor is configured to execute based on a program stored in the storage medium to implement the method of the present application.
  • the present application provides a nucleic acid molecule used as a standard for detecting the degree of base modification, the nucleic acid molecule comprising a candidate region with a degree of base modification of about 0%.
  • the total length of the candidate region of the present application is about 1 bp to about 10000 bp.
  • the total length of the candidate region of the present application is about 1 bp, about 10 bp, about 100 bp, about 1000 bp, about 10000 bp, about 50000 bp, or about 100000 bp.
  • the nucleic acid molecule can be derived from one or more of the following cell lines: GM24385, GM12878, GM12877, GM24631.
  • the present application provides a method for preparing a base modification degree detection standard, the method comprising determining a candidate region in a nucleic acid molecule where the base modification degree is about 0%.
  • the present application provides the use of a nucleic acid molecule in the preparation of a base modification degree detection standard, the nucleic acid molecule comprising a candidate region with a base modification degree of about 0%.
  • the degree of base modification includes the degree of methylation of cytosine in the candidate region.
  • the present application provides a set of candidate regions before base modification treatment as described in the nucleic acid molecule of the present application, which is used as a standard without base modification treatment.
  • the nucleic acid molecule can serve as a zero methylation standard.
  • the present application provides a method for preparing a base modification degree detection standard, the method comprising determining a candidate region in a nucleic acid molecule where the base modification degree is about 0%.
  • the present application provides the use of a nucleic acid molecule in the preparation of a base modification degree detection standard, the nucleic acid molecule comprising a candidate region with a base modification degree of about 0%.
  • the present application provides a collection of all base-modified candidate regions as described in the nucleic acid molecule of the present application, which is used as a base-modified standard.
  • the nucleic acid molecule can serve as a permethylation standard.
  • the present application provides a method for preparing a base modification degree detection standard, the method comprising determining a candidate region in a nucleic acid molecule where the base modification degree is about 0% before base modification treatment, and subjecting the nucleic acid molecule to base modification treatment.
  • the present application provides the use of a nucleic acid molecule in the preparation of a base modification degree detection standard, wherein the nucleic acid molecule comprises a candidate region with a base modification degree of about 0% before base modification treatment, and the nucleic acid molecule is subjected to base modification treatment.
  • a methylation standard with a predetermined degree of base modification is obtained.
  • the base modification treatment comprises contacting the nucleic acid molecule with a methyltransferase. For example, if m% of the above-mentioned full methylation standard is mixed with (100 − m)% of the above-mentioned zero methylation standard, an m% methylation standard is obtained, and the degree of methylation of this m% methylation standard in the candidate region is m%.
  • the present application provides a kit comprising the nucleic acid molecule of the present application.
  • the kit can be used as a standard for capture probes for methylation detection.
  • bisulfite treatment before next-generation sequencing reduces the complexity of the library and poses a great challenge to the specificity of target capture.
  • the hybridization capture method is generally suited to long probes, which provide better specificity and tolerance to single-nucleotide polymorphisms (SNPs).
  • Tm: melting temperature
  • the kinetic calculation method achieves highly uniform capture and good repeatability for genomic target regions.
  • the hybridization process achieves specific binding between the target DNA (T, target) and the complementary RNA (P, probe) sequence.
  • the equilibrium constant R_eq of this dynamic reaction can be calculated from the standard free energy ΔG°, which in turn can be calculated by assuming either complete conversion or complete non-conversion by bisulfite treatment.
  • the hybridization yield can be calculated from the amounts of target present in DNA:RNA duplex form and in single-stranded form. Because P is in large excess in the system, the calculation can be simplified, and
  • the ΔG° of a probe in this hybridization capture system can be defined as
  • $\Delta G^\circ = \left(\Delta G^\circ(T_f P) - \Delta G^\circ(T_f) - \Delta G^\circ(P)\right) - \left(\Delta G^\circ(T_n P) - \Delta G^\circ(T_n) - \Delta G^\circ(P)\right)$
  • T_n refers to the hybridization product against the target sequence
  • T_f refers to the non-specific hybridization product
  • a suitable target region is one for which the difference (ΔG°) between the binding free energy of the candidate probe for a nucleic acid sequence derived from the target region and that for a nucleic acid sequence derived from a non-target region is greater than a specified threshold of about 12 to 50 kcal/mol.
  • the difference (ΔG°) between the binding free energy of the candidate probe for the target region and that for a nucleic acid sequence derived from a non-target region is about 20 kcal/mol or higher, or about 50 kcal/mol or higher.
  • a 120 nt probe needs only a 60 nt stretch of sequence similarity to a sequence in order to capture it.
  • a sliding window is applied to obtain the set of target subsequences for each probe, each subsequence being 60 nt long, and the distance between each subsequence and the probe sequence is calculated as ΔG°.
  • the ΔG° between two sequences can be calculated by methods known in the art, such as Zhang, D. et al., Nature Chemistry 4, 208-214 (2012).
  • the range of ΔG° that can be used to select suitable target regions is about 12 kcal/mol or higher.
  • the ΔG° range for selecting a suitable target region is about 20 kcal/mol or higher, or about 50 kcal/mol or higher.
  • ΔG° can be calculated as in the following examples:
  • Example 1 (unsuitable probe region: the probe has a similar sequence in the genome, the ΔG° between the similar sequence and the probe is below the threshold, and the probe is filtered out):
  • the ΔG° value (fully methylated state, only non-CpG C converted to T) calculated from the similar sequence and the probe sequence was 11.55. Target regions with a ΔG° of less than about 12 kcal/mol were discarded.
  • Example 2 (unsuitable probe region: the probe has similar sequences in the genome, the ΔG° between the similar sequences and the probe is below the threshold, and the probe is filtered out):
  • the ΔG° value (fully unmethylated state, all C converted to T) calculated from similar sequence 1 and the probe sequence was 2.28;
  • the ΔG° value (fully methylated state, only non-CpG C converted to T) calculated from similar sequence 1 and the probe sequence was 2.28. Target regions with a ΔG° of less than about 12 kcal/mol were discarded.
  • the ΔG° value (fully methylated state, only non-CpG C converted to T) calculated from similar sequence 2 and the probe sequence was 4.67. Target regions with a ΔG° of less than about 12 kcal/mol were discarded.
  • Example 3 (suitable probe region: the probe has no longer similar sequence in the genome, and the probe is retained):
  • the ΔG° value calculated from the similar sequence and the probe sequence was 50.77;
  • the ΔG° value (fully methylated state, only non-CpG C converted to T) calculated from the similar sequence and the probe sequence was 50.85. Because the ΔG° is greater than about 12 kcal/mol, the probe region is retained.
  • Figure 5 provides a reference example for illustration only. The double-stranded DNA fragment whose methylation is to be detected is shown at the top, ordered in the direction of the arrow, and comprises the original upper strand (CCGGCATGTTTAAACGCT) and the original lower strand (AGCGTTTAAACATGCCGG). In this example the cytosine (C) in every CpG is assumed to be methylated and is marked -mC.
  • after bisulfite treatment, C in the original upper strand and the original lower strand that is not methylated (not marked -mC) is converted into uracil (U), while methylated C remains C.
  • the base paired with adenine (A) introduced in the PCR amplification of DNA is thymine (T).
  • first, the complementary strand of the target upper strand (CTOT), which is complementary to the bisulfite-treated original upper strand containing uracil (U), and the complementary strand of the target lower strand (CTOB), which is complementary to the bisulfite-treated original lower strand containing uracil (U), are formed.
  • then the target upper strand (OT), converted from the original upper strand and complementary to CTOT, and the target lower strand (OB), converted from the original lower strand and complementary to CTOB, are formed.
  • the comparison shows that unmethylated C in the original upper strand and the original lower strand is replaced by T in the target upper strand and the target lower strand, while methylated C (underlined) remains unchanged.
  • the number and position of methylated C can be identified by determining the C that remains after bisulfite conversion, thereby achieving DNA methylation detection.
  • the above-mentioned process is expressed in the description herein as C being converted to T, C being replaced by T, and the like.
  • the capture probes of the present application can be designed based on target regions that are assumed to be unmethylated.
  • all C in the original upper strand are converted to T to give the first strand, which in Figure 5 corresponds to the sequence TTGGTATGTTTAAATGTT, and a first probe complementary to the first strand is designed; all C in the original lower strand are converted to T to give the second strand, which in Figure 5 corresponds to the sequence AGTGTTTAAATATGTTGG, and a second probe complementary to the second strand is designed; at the same time, the complementary strand of the first strand is taken as the third strand, which in Figure 5 corresponds to the sequence AACATTTAAACATACCAA, and a third probe complementary to the third strand is designed; and the complementary strand of the second strand is taken as the fourth strand, which in Figure 5 corresponds to the sequence CCAACATATTTAAACACT, and a fourth probe complementary to the fourth strand is designed.
  • the probes of the present application are also designed for the complementary strands of the two target strands, achieving good coverage. This has been shown to improve capture performance, such as probe accuracy and reproducibility. It should be noted that Figure 5 is only an example for convenience of explanation; the number of target strands to be selected is in practice very large and is not limited to the sequence in Figure 5.
  • the capture probe of the present application can also be further designed according to the target region assumed to be fully methylated.
  • for a target region assumed to be fully methylated, where CpG islands are the main subject of methylation determination, only the base order is considered (for example, read in the direction of the arrow in Figure 5): the base C in "CG" is taken to be methylated, and methylation is considered not to occur in any other context.
  • the number of target regions of the present application is preferably about 10,000 or more.
  • the performance of the capture probe combination is detected through specific methylation standards, and the final probes used in the probe set are determined.
  • standard test samples with methylation levels of 20% and/or 50% are used.
  • the detection deviation is calculated as the difference between the methylation level detected by the candidate capture probe combination and the actual (or theoretical) methylation level, divided by the actual (or theoretical) methylation level.
  • the collection of all probe combinations should cover more than 90% of the target region.
  • the collection of all probe combinations should cover more than 95% of the target region.
  • the collection of all probe combinations should cover more than 99% of the target region. Even more preferably, the collection of all probe combinations should cover 100% of the target region.
  • the repeatability RMSE is calculated as the mean squared error of the methylation levels detected by the candidate capture probe combination for two or more replicate measurements for a standard test sample with a specific methylation level of 20% and/or 50%.
  • the reproducibility, ie, the median squared error of the methylation level between duplicate wells, for a suitable capture probe combination is about 9E-05 or less.
  • the uniformity and bias of the combination of capture probes is tested.
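  • the uniformity CV (coefficient of variation) is calculated as the standard deviation of the per-probe sequencing depths divided by their mean, $CV = \sigma_d / \bar{d}$,
  • where $d_i$ represents the sequencing depth of the i-th probe, $\bar{d}$ indicates the mean sequencing depth of all probes, and $\sigma_d$ is the standard deviation of the $d_i$.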
  • the coverage uniformity CV of a suitable capture probe combination should be less than 1; preferably, the CV should be less than 0.5; more preferably, the CV should be less than 0.3; further preferably, the CV should be less than 0.2.
  • $x_i$ represents the sequencing depth of the i-th probe for the target strand (OT+OB)
  • $y_i$ represents the sequencing depth of the i-th probe for the complementary strand (CTOT+CTOB)
  • $\bar{y}$ represents the average sequencing depth of all probes for the complementary strand.
  • OT represents the target upper strand of the target region
  • CTOT represents the complementary strand of the target upper strand
  • OB represents the target lower strand of the target region
  • CTOB represents the complementary strand of the target lower strand
  • the capture probes in the capture probe set are about 80 to about 120 bases in length.
  • the region of overlap between any two capture probes in the capture probe set comprises about 10 to about 110 bases.
  • the region to which a capture probe in the capture probe set is complementary does not contain 10 or more contiguous bases that overlap a repeat region. Repeat regions are as known in the art, for example the repeats described at repeatmasker.org.
  • current methylation standards come from samples obtained by whole-genome amplification. However, in a "zero methylation standard" obtained by amplification, all cytosines in the standard may actually be unmethylated, so that after bisulfite conversion the sample contains no cytosine and is prone to large capture bias; it is therefore not suitable as a standard for evaluating the performance of capture methods.
  • the present application provides a method for the construction of a methylation standard for a capture probe pair.
  • a methyltransferase such as M.SssI
  • PC: fully methylated standard
  • NC: zero methylation standard
  • methylation sequencing was carried out on the "zero methylation standard" and the "full methylation standard"; a specific region with a methylation level of 100% in the "full methylation standard" was used as the standard region.
  • the methylation level in the standard region is the actual methylation level of the methylation standard of this application. For example, after mixing 20% full methylation standard with 80% zero methylation standard, the actual methylation level (also called the theoretical methylation level) in the selected specific region can be regarded as 20%.
  • the reaction conditions for the methyltransferase (e.g. M.SssI) are: reaction at 37 °C for 15 minutes, and then reaction at 65 °C for 20 minutes.
  • the left and right sides of Figure 1 and Figure 3 show the methylation measurement results of the "zero methylation standard" and "full methylation standard" of the present application.
  • the methylation level of NC is 0-0.002
  • the methylation level of PC is 0.97-1.00
  • the methylation standard of this application is suitable for the evaluation of capture probes.
  • the complementary strand group means that capture probes are designed only for the complementary strand (CTOT+CTOB)
  • the target strand group means that capture probes are designed for the target strand (OT+OB)
  • the double-strand group means that capture probes are designed for both the target strand and the complementary strand.
  • OT means the target upper strand
  • CTOT means the complementary strand of the target upper strand
  • OB means the target lower strand
  • CTOB means the complementary strand of the target lower strand.
  • the uniformity of the capture probe combination evaluates how evenly the probes cover different target regions, expressed as the coefficient of variation (CV).
  • the horizontal axis of the figure below represents different methylation levels, and the vertical axis represents the sequencing depth.
  • Figures 2A-2C show the uniformity measurement results of the three probes designed by the present application for the double strand, the target strand, and the complementary strand. The results showed that the uniformity of the double-strand probe design was better than that of the complementary-strand probe design alone, and was close to that of the traditional target-strand probe design.
  • the horizontal axis represents different methylation levels
  • the vertical axis represents the deviation between repeated samples
  • Figure 3 shows the repeatability measurement results of the three probes designed by the present application for the double strand, the target strand, and the complementary strand. The results showed that the reproducibility of the double-strand probe design was better than that of the complementary-strand probe design alone, and was close to that of the target-strand probe design.
  • for the target strand, the median repeatability was 1.22E-04 for the 20% methylation standard and 1.23E-04 for the 50% methylation standard; for the complementary strand, the median repeatability was 1.12E-04 for the 20% methylation standard and 9.16E-05 for the 50% methylation standard; for the double strand preferred in this application, the median repeatability was 8.05E-05 for the 20% methylation standard and 7.03E-05 for the 50% methylation standard.
  • the capture strand preference evaluates the depth of capture of the target strand (OT+OB) and complementary strand (CTOT+CTOB) by different probes.
  • the horizontal axis of the figure indicates the coverage depth of the target strand, and the vertical axis indicates the sequencing depth of the complementary strand.
  • the results showed a lower strand preference (R²) for capture using double-strand probes.
  • Figures 4A-4C show the bias measurement results of the three probes designed by the present application for the double strand, the target strand, and the complementary strand. The results showed that the design bias of the double-strand probe was better than that of the complementary-strand probe design alone, and was close to that of the traditional target-strand probe design.
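The evaluation quantities described in the list above (theoretical level of a mixed standard, detection deviation, repeatability, and coverage-uniformity CV) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the applicant's implementation: the function names and the sample numbers are made up, repeatability is taken here as the median squared difference between two replicates, and the CV uses the population standard deviation, neither of which is specified in the text.

```python
import statistics


def expected_level(percent_full):
    """Theoretical methylation level (as a fraction) of a mixed standard.

    percent_full % fully methylated standard is mixed with
    (100 - percent_full) % zero methylation standard.
    """
    if not 0 <= percent_full <= 100:
        raise ValueError("percent_full must be between 0 and 100")
    return percent_full / 100.0


def detection_deviation(measured, expected):
    """(measured - expected) / expected, the relative detection deviation."""
    return (measured - expected) / expected


def repeatability(replicate_a, replicate_b):
    """Median squared difference between two replicates, region by region.

    Assumption: the text mentions both a mean and a median squared error;
    the median is used here.
    """
    if len(replicate_a) != len(replicate_b):
        raise ValueError("replicates must cover the same regions")
    squared = [(a - b) ** 2 for a, b in zip(replicate_a, replicate_b)]
    return statistics.median(squared)


def coverage_cv(depths):
    """Coefficient of variation of per-probe sequencing depth.

    Assumption: population standard deviation divided by the mean.
    """
    return statistics.pstdev(depths) / statistics.mean(depths)


# Made-up numbers, for illustration only.
expected = expected_level(20)
deviation = detection_deviation(0.21, expected)
rep = repeatability([0.20, 0.21, 0.19], [0.21, 0.21, 0.20])
cv = coverage_cv([90, 100, 110, 100])

print(f"expected={expected:.2f} deviation={deviation:.3f}")
print(f"repeatability={rep:.2e} cv={cv:.3f}")
```

In such a sketch, a probe combination would be compared against the limits stated in the list, for example a CV below 1 (preferably below 0.5, 0.3 or 0.2) and a repeatability of about 9E-05 or less.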

Abstract

Provided are genetic diagnosis probes and the use thereof. A probe is a nucleic acid molecule combination, the nucleic acid molecule combination comprising at least one nucleic acid probe group that covers a target area of a nucleic acid to be detected, and the nucleic acid probe group at least comprising nucleic acid probes that cover sense and antisense strands of the target area as well as respective complementary strands thereof.

Description

A genetic diagnosis probe and use thereof
Technical Field
This application relates to the field of biomedicine, and in particular to a genetic diagnosis probe and use thereof.
Background
DNA methylation is an epigenetic modification in which DNA methyltransferase (DNMT), using S-adenosylmethionine (SAM) as the methyl donor, selectively adds a methyl group to the cytosine of CG dinucleotides in DNA, mainly forming 5-methylcytosine (5-mC) (commonly found in the 5'-CG-3' sequences of genes) along with small amounts of N6-methylpurine (N6-mA) and 7-methylguanine (7-mG). Structural genes contain many CpG structures; the carbon-5 atoms of the two cytosines in 2CpG and 2GpC are usually methylated, and the two methyl groups adopt a specific three-dimensional structure in the major groove of the DNA double helix.
DNA methylation plays an important role in the regulation of gene expression. Aberrant DNA methylation marks have been reported in the onset and progression of many diseases, including cancer. As a high-resolution, high-throughput technology, DNA methylation sequencing is increasingly recognized for its role in early cancer screening, diagnosis, and monitoring.
Whole-genome bisulfite sequencing (WGBS) is the gold standard for methylation sequencing, but the severe DNA damage caused during processing and the high sequencing cost make clinical application difficult. More importantly, most regions of the human genome are not active during the onset and progression of cancer, and cancer-related alterations tend to be concentrated in certain specific regions, such as CpG islands. The CG dinucleotide is the principal methylation site; it is unevenly distributed in the genome, which contains hypermethylated, hypomethylated and unmethylated regions, and in mammals mC accounts for about 2-7% of total C.
CpG islands are abundant in the genome. Massively parallel nucleic acid sequencing (also called "high-throughput sequencing" or "next-generation sequencing" (NGS)) can greatly assist such detection and analysis, making it possible to predict the occurrence of cancer and its site of origin from methylation signals.
In addition, unmethylated cytosine (C) in DNA fragments is converted to thymine (T) after bisulfite treatment. The reduced C content leaves fewer of the more strongly binding cytosine (C)-guanine (G) pairing sites, and it also lowers the base complexity of the DNA; both effects make hybridization capture more difficult.
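The loss of cytosine described above can be illustrated in silico. The following is a minimal Python sketch, not part of the application: the function names are illustrative assumptions, methylation is simplified to "every CpG cytosine is methylated" or "no cytosine is methylated", and the 18-nt fragment is the original upper strand given for Figure 5 of this application.

```python
def bisulfite_convert(seq, methylated_cpg=True):
    """Return the sequence as read after bisulfite conversion and PCR.

    Unmethylated C is written as T. If methylated_cpg is True, a C directly
    followed by G (a CpG site) is assumed to be methylated and is kept as C.
    """
    seq = seq.upper()
    converted = []
    for i, base in enumerate(seq):
        is_cpg = base == "C" and i + 1 < len(seq) and seq[i + 1] == "G"
        if base == "C" and not (methylated_cpg and is_cpg):
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def c_fraction(seq):
    """Fraction of bases in seq that are cytosine."""
    return seq.upper().count("C") / len(seq)


original = "CCGGCATGTTTAAACGCT"  # original upper strand from Figure 5
fully_unmethylated = bisulfite_convert(original, methylated_cpg=False)
cpg_methylated = bisulfite_convert(original, methylated_cpg=True)

print(fully_unmethylated)  # TTGGTATGTTTAAATGTT
print(cpg_methylated)      # TCGGTATGTTTAAACGTT
print(c_fraction(original), c_fraction(fully_unmethylated))
```

In this toy case the fully unmethylated fragment loses all five of its cytosines, which is the reduction in C content and base complexity that the paragraph refers to.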
At the same time, the field also lacks a standard that directly reflects the DNA methylation level and can be used to evaluate the capture performance of methylation capture probes. Example 3 of Chinese patent publication CN112646888B describes a method of preparing a 0% methylation standard by treating NA12878 DNA with the [Figure PCTCN2022100272-appb-000001] Single Cell Kit (Qiagen, Cat#150343) and Mung Bean Nuclease (NEB, Cat#M0250L). However, during subsequent bisulfite conversion of a 0% methylation standard prepared in this way, almost all cytosine (C) is converted to thymine (T), which greatly reduces the base complexity of the DNA and makes capture very difficult, so it is not suitable as a standard for measuring the performance of methylation capture probes with high accuracy.
Therefore, the field lacks capture probes suitable for targeted sequencing of methylated DNA that meet capture performance expectations, as well as a standard for accurately measuring probe capture accuracy.
Summary of the Invention
The present application provides a high-accuracy gene hybridization capture probe that can hybridize to and capture methylation variation regions associated with many different cancers, especially regions with specific methylation signatures. With high-accuracy methylation detection probes, human tumor gene detection preparations can be prepared, enabling early detection or early screening of cancers including, but not limited to: brain cancer, lung cancer, skin cancer, nasopharyngeal cancer, throat cancer, liver cancer, bone cancer, lymphoma, pancreatic cancer, intestinal cancer, rectal cancer, thyroid cancer, bladder cancer, kidney cancer, oral cancer, gastric cancer, solid tumors, ovarian cancer, esophageal cancer, gallbladder cancer, biliary tract cancer, breast cancer, cervical cancer, uterine cancer, prostate cancer, head and neck cancer, sarcoma, thoracic malignancies (other than lung), melanoma, and testicular cancer.
The present application provides a nucleic acid molecule combination comprising at least one nucleic acid probe set covering a target region of a nucleic acid to be tested, characterized in that the nucleic acid probe set comprises at least: (1) a first probe complementary to a first strand, the first strand being the sequence of the target region after base substitution; (2) a second probe complementary to a second strand, the second strand being the sequence of the region complementary to the target region after base substitution; and either or both of the following two probes: (3) a third probe complementary to a third strand, the third strand being complementary to the first strand; (4) a fourth probe complementary to a fourth strand, the fourth strand being the complementary sequence of the second strand.
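The four strands named in this paragraph can be derived mechanically from one target sequence. The sketch below is a hedged illustration rather than the applicant's design pipeline: the function names are hypothetical, base substitution is taken to mean "all C converted to T" (the fully unmethylated assumption), probes are written with DNA letters, and probe length, tiling and free-energy filtering are ignored. The example input is the original upper strand shown for Figure 5 of this application.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def convert_all_c(seq):
    """Base substitution under the fully unmethylated assumption: C -> T."""
    return seq.upper().replace("C", "T")


def design_strands_and_probes(original_upper):
    """Return (strands, probes) dicts for the four-strand design."""
    original_lower = reverse_complement(original_upper)
    strands = {
        "first": convert_all_c(original_upper),   # converted target region
        "second": convert_all_c(original_lower),  # converted complementary region
    }
    strands["third"] = reverse_complement(strands["first"])
    strands["fourth"] = reverse_complement(strands["second"])
    # Each probe is complementary to the strand it is named after.
    probes = {name: reverse_complement(s) for name, s in strands.items()}
    return strands, probes


strands, probes = design_strands_and_probes("CCGGCATGTTTAAACGCT")
for name in ("first", "second", "third", "fourth"):
    print(name, strands[name], probes[name])
# first  TTGGTATGTTTAAATGTT
# second AGTGTTTAAATATGTTGG
# third  AACATTTAAACATACCAA
# fourth CCAACATATTTAAACACT
```

The four strand sequences produced for this input agree with those listed for Figure 5. Note that in this simplified model the probe for the third strand has the same sequence as the first strand, and the probe for the fourth strand the same sequence as the second strand.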
The present application provides use of the nucleic acid molecule combination described herein in the preparation of a human tumor gene detection preparation.
The present application provides a standard nucleic acid molecule for assessing the accuracy of the degree of base modification detected in the use of the present application, the standard nucleic acid molecule comprising a candidate region with a degree of base modification of about 0%, the total length of the candidate region being about 1 bp to about 10000 bp.
Those skilled in the art can readily appreciate other aspects and advantages of the present application from the detailed description below. Only exemplary embodiments of the present application are shown and described in the detailed description below. As those skilled in the art will recognize, the content of the present application enables them to modify the disclosed specific embodiments without departing from the spirit and scope of the invention to which this application relates. Accordingly, the drawings and the description in the specification of the present application are merely exemplary and not restrictive.
Brief Description of the Drawings
The specific features of the invention to which this application relates are set forth in the appended claims. The features and advantages of the invention can be better understood by reference to the exemplary embodiments described in detail below and to the accompanying drawings. The drawings are briefly described as follows:
Figure 1 shows the methylation measurement results for the "20% standard" and "50% standard" of this application, as well as for the "zero methylation standard" and "full methylation standard" of this application.
Figures 2A-2C show the uniformity measurement results for the probes designed in this application.
Figure 3 shows the repeatability measurement results for the probes designed in this application.
Figures 4A-4C show the strand-preference (bias) measurement results for the probes designed in this application.
Figure 5 shows an exemplary reference schematic of the capture probe design of this application.
Figure 6 shows an exemplary reference schematic of the methylation level calculation of this application.
Detailed Description
Embodiments of the invention of this application are described below by way of specific examples; those familiar with the art can readily understand other advantages and effects of the invention from the content disclosed in this specification.
Definition of Terms
In this application, the terms "next-generation gene sequencing (NGS)", "high-throughput sequencing" and "next-generation sequencing" generally refer to second-generation high-throughput sequencing technology and the higher-throughput sequencing methods developed thereafter. Next-generation sequencing platforms include, but are not limited to, existing platforms such as Illumina. As sequencing technology continues to develop, those skilled in the art will understand that other sequencing methods and devices can also be used for the present method. For example, next-generation sequencing can offer high sensitivity, high throughput, high sequencing depth, or low cost. Classified by development history, influence, sequencing principle and technology, the main approaches are: Massively Parallel Signature Sequencing (MPSS), polony sequencing, 454 pyrosequencing, Illumina (Solexa) sequencing, ion semiconductor sequencing, DNA nanoball sequencing, and the DNA nanoarray and combinatorial probe-anchor ligation sequencing of Complete Genomics. Next-generation sequencing makes it possible to analyze the transcriptome and genome of a species comprehensively and in detail, and is therefore also called deep sequencing. For example, the method of the present application can equally be applied to first-generation, second-generation or third-generation gene sequencing, or to single-molecule sequencing (SMS).
In this application, the term "sample to be tested" generally refers to a sample that needs to be tested. For example, one or more gene regions in the sample to be tested can be examined for the presence of a modification state.
In this application, the term "complementary region" generally refers to a region that is complementary to a reference nucleotide sequence. For example, a complementary nucleic acid can be a nucleic acid molecule that optionally has the opposite orientation. For example, complementarity can refer to the following pairings: guanine and cytosine; adenine and thymine; adenine and uracil.
In this application, the term "hybridization" generally refers to a reaction in which one or more polynucleotides react to form a complex stabilized by hydrogen bonds between the bases of the nucleotide residues. The hydrogen bonding can occur through Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner based on base complementarity. The complex may comprise two strands forming a double helix, three or more strands forming a multi-strand complex, a self-hybridizing single strand, or any combination of these. The hybridization reaction may constitute a step in a broader method, such as the initiation of PCR or the enzymatic cleavage of a polynucleotide by an endonuclease. A second sequence that is fully complementary to a first sequence, or that is polymerized by a polymerase using the first sequence as a template, is said to be "complementary" to the first sequence. The term "hybridizable", as applied to a polynucleotide, refers to the ability of the polynucleotide to form, in a hybridization reaction, a complex stabilized by hydrogen bonds between the bases of the nucleotide residues. In some embodiments, a hybridizable nucleotide sequence is at least about 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, or 100% complementary to the sequence to which it hybridizes.
In this application, the terms "polynucleotide", "nucleotide", "nucleic acid" and "oligonucleotide" are used interchangeably. They denote polymeric forms of nucleotides (deoxyribonucleotides or ribonucleotides) of any length, or analogs thereof. A polynucleotide can have any three-dimensional structure and can perform any function, whether known or unknown. The following are non-limiting examples of polynucleotides: coding or non-coding regions of genes or gene fragments, loci defined by linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), short interfering RNA (siRNA), short hairpin RNA (shRNA), microRNA (miRNA), ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, primers and adapters. A polynucleotide may include one or more modified nucleotides, such as methylated nucleotides and nucleotide analogs. Modifications to the nucleotide structure, if present, can be introduced before or after assembly of the polymer. The nucleotide sequence can be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, for example by conjugation with a labeling component.
In this application, the term "modification state" generally refers to the modification state of a gene fragment, a nucleotide or its base in this application. For example, the modification state may refer to the modification state of cytosine. For example, a gene fragment having a modification state may have altered gene expression activity. For example, the modification state may refer to methylation of a base. For example, the modification state may refer to the covalent attachment of a methyl group at the carbon-5 position of cytosine in a CpG region of genomic DNA, which yields, for example, 5-methylcytosine (5mC). For example, the modification state may refer to the presence or absence of 5-methylcytosine ("5-mCyt") within a DNA sequence.
In this application, the term "methylation" generally refers to the methylation state of a gene fragment, a nucleotide or its base in this application. For example, the DNA fragment in which a gene of this application is located may be methylated on one strand or on multiple strands. For example, the DNA fragment in which a gene of this application is located may be methylated at one site or at multiple sites.
In this application, the term "conversion" generally refers to changing one or more structures into another structure. For example, the conversion of the present application can be specific. For example, cytosine without methylation can be changed by conversion into another structure (such as uracil), while cytosine with methylation remains substantially unchanged by conversion. For example, cytosine without methylation can be cleaved by conversion, while cytosine with methylation remains substantially unchanged by conversion.
In this application, the term "bisulfite" (also called "hydrogen sulfite") generally refers to a reagent that can distinguish DNA regions that have a modification state from those that do not. For example, the bisulfite may include bisulfite, an analog thereof, or a combination of these. For example, bisulfite can deaminate the amino group of unmodified cytosine so that it can be distinguished from modified cytosine. In this application, the term "analog" generally refers to a substance having a similar structure and/or function. For example, an analog of bisulfite may have a structure similar to that of bisulfite. For example, an analog of bisulfite may refer to a reagent that can likewise distinguish DNA regions that have a modification state from those that do not.
In this application, the term "comprising" generally means including the explicitly specified features without excluding other elements.
In this application, the term "about" generally refers to variation within a range of 0.5% to 10% above or below the specified value, for example within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 5.5%, 6%, 6.5%, 7%, 7.5%, 8%, 8.5%, 9%, 9.5%, or 10% above or below the specified value.
Detailed description of the invention
In one aspect, the present application provides a nucleic acid molecule combination in which the binding free energy of the nucleic acid molecules for nucleic acid sequences derived from a target region differs from their binding free energy for nucleic acid sequences derived from non-target regions by more than a specific threshold. For example, the specific threshold is about 12 to about 50 kcal/mol. For example, the specific threshold is about 20 to 30 kcal/mol. For example, the specific threshold is about 20 kcal/mol.
For example, suitable nucleic acid molecules for the nucleic acid molecule combination of the present application are determined by screening candidate target regions. For example, a nucleic acid molecule combination designed for a candidate target region in the present application has a higher binding free energy for nucleic acid sequences derived from the target region. For example, a nucleic acid molecule combination designed for a candidate target region in the present application has a higher binding free energy for nucleic acid sequences derived from the target region than for nucleic acid sequences derived from non-target regions. For example, the binding free energy of the nucleic acid molecules in the combination for nucleic acid sequences derived from the target region differs from their binding free energy for nucleic acid sequences derived from non-target regions by about 12 kcal/mol or more. For example, the binding free energy of the nucleic acid molecules in the combination for nucleic acid sequences derived from the target region is higher than that for nucleic acid sequences derived from non-target regions by about 12 kcal/mol, about 13 kcal/mol, about 14 kcal/mol, about 15 kcal/mol, about 20 kcal/mol, about 25 kcal/mol, about 30 kcal/mol, about 40 kcal/mol, or about 50 kcal/mol.
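The threshold criterion above can be sketched as a simple filter. This is a minimal illustration only: the class name, field names and energy values are hypothetical, and the sketch assumes that on-target and off-target binding free energies (in kcal/mol) have already been computed by some thermodynamic model, which this passage does not specify. The difference is taken as a magnitude so that the sketch does not depend on a sign convention.

```python
from dataclasses import dataclass


@dataclass
class ProbeEnergies:
    """Hypothetical container for precomputed binding free energies (kcal/mol)."""
    name: str
    dg_on_target: float   # binding to sequence derived from the target region
    dg_off_target: float  # binding to sequence derived from a non-target region


def passes_threshold(probe: ProbeEnergies, threshold: float = 20.0) -> bool:
    """Keep a probe only if the two free energies differ by more than the threshold."""
    return abs(probe.dg_on_target - probe.dg_off_target) > threshold


# Illustrative values only, not measured data.
candidates = [
    ProbeEnergies("probe_a", dg_on_target=-95.0, dg_off_target=-60.0),
    ProbeEnergies("probe_b", dg_on_target=-90.0, dg_off_target=-82.0),
]
kept = [p.name for p in candidates if passes_threshold(p)]
```

With the default threshold of 20 kcal/mol, only the first hypothetical probe is retained; the threshold can be set anywhere in the stated range of about 12 to about 50 kcal/mol.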
In one aspect, the present application provides a nucleic acid molecule combination comprising at least one nucleic acid probe set covering a target region of the nucleic acid to be tested, the nucleic acid probe set comprising at least: (1) a first probe complementary to a first strand, the first strand being the sequence of the target region after base substitution; (2) a second probe complementary to a second strand, the second strand being the sequence of the complementary region of the target region after base substitution; and optionally either one or both of the following two probes: (3) a third probe complementary to a third strand, the third strand being complementary to the first strand; (4) a fourth probe complementary to a fourth strand, the fourth strand being the complementary sequence of the second strand. For example, for a target region in which the nucleic acid to be tested is assumed to have zero methylation, the nucleic acid molecule combination of the present application is designed from the first strand of the region after base substitution (target top strand, OT strand), the second strand of the complementary region of the region after base substitution (target bottom strand, OB strand), and the complementary strand of the first strand (complementary strand of the target top strand, CTOT strand), for which a third probe complementary to the third strand is designed; likewise, a fourth probe complementary to the fourth strand is designed from the complementary strand of the second strand (complementary strand of the target bottom strand, CTOB strand). For example, the nucleic acid molecule combination of the present application is used as capture probes for methylation detection, for example capture probes for methylation detection by second-generation sequencing.
For example, the base-substituted sites include sites where cytosine is located. For example, the base substitution comprises obtaining, through chemical and/or biological processes, a nucleic acid sequence in which cytosine is replaced by thymine or uracil. For example, the base substitution comprises obtaining a nucleic acid sequence in which all cytosines are replaced by thymine or uracil. The base substitution may comprise bisulfite conversion, in which unmethylated C in the original top strand and the original bottom strand is converted to uracil (U). Because uracil (U) pairs with adenine (A), and the base introduced opposite adenine (A) during PCR amplification of DNA is thymine (T), the base substitution may comprise replacement of unmethylated C in the original top strand and original bottom strand by T during subsequent PCR amplification.
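The four strands described above, and the probes complementary to them, can be derived in silico as sketched below. This is an illustration under stated assumptions rather than the applicant's implementation: the function names are hypothetical, the target is given as the 5'-to-3' sequence of its top strand, every C is replaced by T (the zero-methylation assumption), and the toy 8-base sequence stands in for a real target region.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def convert_c_to_t(seq: str) -> str:
    """Replace every C with T, mimicking conversion plus PCR with zero methylation."""
    return seq.replace("C", "T")


def four_strands(target_top: str) -> dict:
    """Build the OT, OB, CTOT and CTOB strands for a target region."""
    ot = convert_c_to_t(target_top)                      # first strand
    ob = convert_c_to_t(reverse_complement(target_top))  # second strand
    return {
        "OT": ot,
        "OB": ob,
        "CTOT": reverse_complement(ot),  # third strand, complementary to OT
        "CTOB": reverse_complement(ob),  # fourth strand, complementary to OB
    }


def probes_for(target_top: str) -> dict:
    """A probe complementary to each strand is that strand's reverse complement."""
    return {name: reverse_complement(seq)
            for name, seq in four_strands(target_top).items()}


strands = four_strands("ACGTCCGA")
probes = probes_for("ACGTCCGA")
```

In this sketch the probe for the OT strand has the same sequence as the CTOT strand and vice versa, which is why a probe set built this way covers both members of each complementary pair.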
For example, in the nucleic acid molecule combination of the present application, the nucleic acid probe set further comprises: (1) a fifth probe complementary to a fifth strand, the fifth strand being the sequence of the target region without base substitution; (2) a sixth probe complementary to a sixth strand, the sixth strand being the sequence of the complementary region of the target region without base substitution; (3) a seventh probe complementary to a seventh strand, the seventh strand being complementary to the fifth strand; (4) an eighth probe complementary to an eighth strand, the eighth strand being the complementary sequence of the sixth strand. For example, the nucleic acid molecule combination of the present application includes these four additional probes, designed for a target region in which the nucleic acid to be tested is assumed to be fully methylated.
For example, the nucleic acid molecule combination comprises nucleic acid probe sets covering 10,000 or more different target regions of the nucleic acid to be tested. For example, the nucleic acid molecule combination of the present application is designed for 10,000 or more, 15,000 or more, 20,000 or more, 25,000 or more, 30,000 or more, 40,000 or more, or 50,000 or more different target regions of the nucleic acid to be tested.
In one aspect, the present application provides a nucleic acid molecule combination for which the detection results on standards of a specific methylation level, for example methylation standards with a methylation level of 20% and/or 50%, meet criteria selected from the following group: a fluctuation of the methylation level detection results of 25% or lower, and a repeatability of 9E-05 or lower. Preferably, the fluctuation is the difference between the maximum and minimum detection results, and the repeatability is the median of the mean square error of the methylation level among replicate wells. For example, the fluctuation of the methylation level detection results is used to assess the accuracy of the nucleic acid molecule combination. For example, for methylation standards with a methylation level of 20% and/or 50%, the fluctuation of the detection results of the nucleic acid molecule combination of the present application is 22% or lower, 23% or lower, 24% or lower, 25% or lower, 26% or lower, or 27% or lower. For example, for methylation standards with a methylation level of 20% and/or 50%, the mean square error of the methylation levels detected by a candidate capture probe combination over two or more replicate measurements is between about 1.3E-05 and about 2.7E-04, preferably 9E-05 or lower, more preferably about 8E-05 or lower, and further preferably about 7E-05 or lower.
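The two acceptance metrics defined above can be computed as in the following sketch. Several points are assumptions made for illustration: methylation levels are expressed as fractions between 0 and 1, the "25%" fluctuation limit is read as an absolute difference of 0.25 on that scale, the mean square error for a region is taken as the mean squared deviation of its replicate wells from their mean, and all function names and numbers are hypothetical.

```python
from statistics import mean, median


def fluctuation(levels):
    """Difference between the maximum and minimum detected methylation level."""
    return max(levels) - min(levels)


def replicate_mse(replicates):
    """Mean squared deviation of replicate wells from their own mean."""
    centre = mean(replicates)
    return mean((x - centre) ** 2 for x in replicates)


def repeatability(per_region_replicates):
    """Median over regions of the replicate mean square error."""
    return median(replicate_mse(r) for r in per_region_replicates)


def meets_criteria(levels, per_region_replicates,
                   max_fluctuation=0.25, max_repeatability=9e-5):
    """Check both criteria; the default limits follow the values stated above."""
    return (fluctuation(levels) <= max_fluctuation
            and repeatability(per_region_replicates) <= max_repeatability)


# Illustrative numbers only, for a nominal 20% methylation standard.
detected_levels = [0.18, 0.22, 0.25]
region_replicates = [[0.20, 0.21], [0.19, 0.20], [0.22, 0.20]]
accepted = meets_criteria(detected_levels, region_replicates)
```

If the fluctuation limit is instead meant relative to the nominal level of the standard, only the `max_fluctuation` comparison needs to change.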
For example, the nucleic acid molecules in the nucleic acid molecule combination are about 80 to about 120 bases in length. For example, the nucleic acid molecules in the nucleic acid molecule combination are about 80, about 90, about 100, about 110, or about 120 bases in length.
For example, the region in which any two nucleic acid molecules in the nucleic acid molecule combination overlap comprises about 10 to about 110 bases. For example, the region in which any two nucleic acid molecules in the nucleic acid molecule combination overlap comprises about 10, about 20, about 50, about 70, about 80, about 90, about 100, or about 110 bases.
For example, the region to which a nucleic acid molecule in the nucleic acid molecule combination is complementary does not overlap a repeat region by 10 or more consecutive bases. For example, information on repeat regions may be taken from sources known in the art, such as the repeats recorded at repeatmasker.org.
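The length, overlap and repeat constraints above can be illustrated with a simple tiling routine. This is a sketch under assumptions: coordinates are half-open integer intervals on one reference sequence, the defaults (120-base probes overlapping by 60 bases) are just one choice within the stated ranges, repeat intervals are supplied by the caller (for example parsed from a RepeatMasker annotation), and the function names are hypothetical.

```python
def tile_probes(start, end, probe_len=120, overlap=60):
    """Tile [start, end) with fixed-length probes that overlap by `overlap` bases."""
    if not 0 <= overlap < probe_len:
        raise ValueError("overlap must be non-negative and shorter than the probe")
    step = probe_len - overlap
    tiles = []
    pos = start
    while True:
        tiles.append((pos, pos + probe_len))
        if pos + probe_len >= end:  # region is covered, stop tiling
            break
        pos += step
    return tiles


def overlap_len(a, b):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def avoids_repeats(probe, repeats, max_overlap=9):
    """True if the probe shares fewer than 10 consecutive bases with every repeat."""
    return all(overlap_len(probe, r) <= max_overlap for r in repeats)


tiles = tile_probes(0, 300)
repeats = [(230, 260)]  # hypothetical repeat interval
kept_tiles = [t for t in tiles if avoids_repeats(t, repeats)]
```

Because each repeat is a single interval, its shared bases with a probe are always contiguous, so comparing the overlap length with 10 is enough to apply the rule.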
In one aspect, the present application provides a method for designing a nucleic acid molecule combination, in which, based on a first strand derived from a target region after base substitution and its complementary strand, and a second strand derived from the target region after base substitution and its complementary strand, the nucleic acid molecule combination is designed to be complementary to three or more of these strands.
In one aspect, the present application provides a method for designing a nucleic acid molecule combination, comprising: (1) screening target regions, such that the nucleic acid molecule combination designed for a candidate target region has a higher binding free energy for nucleic acid sequences derived from the target region; (2) designing four probes for the candidate target region: for a target region in which the nucleic acid to be tested is assumed to be fully methylated, the four probes are designed for the first strand of the region after base substitution and its complementary strand, and for the second strand of the complementary region of the region after base substitution and its complementary strand; (3) screening nucleic acid molecule combinations: using standards of a specific methylation level, selecting nucleic acid molecule combinations that meet criteria selected from the following group: a fluctuation of the methylation level detection results of 25% or lower, and a repeatability of 9E-05 or lower.
In one aspect, the present application provides a method for designing a nucleic acid molecule combination, comprising: (1) for the nucleic acid molecule combination of the present application designed for a candidate target region, the binding free energy for nucleic acid sequences derived from the target region is higher than the binding free energy for nucleic acid sequences derived from non-target regions by about 12 kcal/mol or more; (2) for a target region in which the nucleic acid to be tested is assumed to be fully methylated, the nucleic acid molecule combination of the present application is designed from the first strand of the region after base substitution (top strand), the second strand of the complementary region of the region after base substitution (bottom strand), and the complementary strand of the first strand (CTOT strand), for which a third probe complementary to the third strand is designed; likewise, a fourth probe complementary to the fourth strand is designed from the complementary strand of the second strand (CTOB strand); (3) screening nucleic acid molecule combinations: using standards of a specific methylation level, selecting nucleic acid molecule combinations that meet criteria selected from the following group: a fluctuation of the methylation level detection results of 25% or lower, and a repeatability of 9E-05 or lower. Preferably, the fluctuation is the difference between the maximum and minimum detection results, and the repeatability is the median of the mean square error of the methylation level among replicate wells.
For example, in the design method of the present application, the nucleic acid molecule combination is used as capture probes for methylation detection. For example, the standard of a specific methylation level used in the method of the present application is prepared by the method of the present application.
In one aspect, the present application provides a nucleic acid molecule combination obtained by the design method of the present application. For example, the nucleic acid molecule combination is used as capture probes for methylation detection.
In one aspect, the present application provides a kit comprising the nucleic acid molecule combination of the present application.
In one aspect, the present application provides use of the nucleic acid molecule combination of the present application and/or the kit of the present application in the preparation of a human tumor gene detection preparation. For example, the detection preparation is used to detect the base modification level of a target region. For example, the base modification includes methylation. For example, the human tumor is a homogeneous tumor, a heterogeneous tumor, a hematological cancer and/or a solid tumor. For example, the human tumor is one or more cancers from the following group: brain cancer, lung cancer, skin cancer, nasopharyngeal cancer, throat cancer, liver cancer, bone cancer, lymphoma, pancreatic cancer, skin cancer, intestinal cancer, rectal cancer, thyroid cancer, bladder cancer, kidney cancer, oral cancer, stomach cancer, solid tumor, ovarian cancer, esophageal cancer, gallbladder cancer, biliary tract cancer, breast cancer, cervical cancer, uterine cancer, prostate cancer, head and neck cancer, sarcoma, thoracic malignancy (other than lung), melanoma, and testicular cancer.
In one aspect, the present application provides a method for detecting the level of base modification, comprising providing the nucleic acid molecule combination of the present application and/or the kit of the present application. For example, the base modification includes methylation.
In one aspect, the present application provides a storage medium recording a program capable of running the method of the present application. For example, the storage medium may be a non-volatile computer-readable storage medium, which may include a floppy disk, a flexible disk, a hard disk, solid-state storage (SSS) (for example a solid-state drive (SSD), a solid-state card (SSC), or a solid-state module (SSM)), an enterprise flash drive, magnetic tape, or any other non-transitory magnetic medium. A non-volatile computer-readable storage medium may also include punched cards, paper tape, optical mark sheets (or any other physical medium having a pattern of holes or other optically recognizable marks), compact disc read-only memory (CD-ROM), rewritable compact disc (CD-RW), digital versatile disc (DVD), Blu-ray disc (BD) and/or any other non-transitory optical medium.
In one aspect, the present application provides a device comprising the storage medium of the present application. For example, the device further comprises a processor coupled to the storage medium, the processor being configured to execute the program stored in the storage medium so as to implement the method of the present application.
In one aspect, the present application provides a nucleic acid molecule for use as a standard for detecting the degree of base modification, the nucleic acid molecule comprising a candidate region whose degree of base modification is about 0%. For example, the total length of the candidate region of the present application is about 1 bp to about 10000 bp. For example, the total length of the candidate region of the present application is about 1 bp, about 10 bp, about 100 bp, about 1000 bp, about 10000 bp, about 50000 bp, or about 100000 bp. For example, the nucleic acid molecule may be derived from one or more of the following cell lines: GM24385, GM12878, GM12877, GM24631.
In one aspect, the present application provides a method for preparing a standard for detecting the degree of base modification, the method comprising identifying, in a nucleic acid molecule, a candidate region whose degree of base modification is about 0%.
In one aspect, the present application provides use of a nucleic acid molecule in the preparation of a standard for detecting the degree of base modification, the nucleic acid molecule comprising a candidate region whose degree of base modification is about 0%.
For example, the degree of base modification includes the degree of methylation of cytosine in the candidate region.
In one aspect, the present application provides a collection of the candidate regions of the nucleic acid molecule of the present application before base modification treatment, for use as a standard that has not undergone base modification treatment. For example, the nucleic acid molecule can serve as a zero-methylation standard.
In one aspect, the present application provides a method for preparing a standard for detecting the degree of base modification, the method comprising identifying, in a nucleic acid molecule, a candidate region whose degree of base modification is about 0%.
In one aspect, the present application provides use of a nucleic acid molecule in the preparation of a standard for detecting the degree of base modification, the nucleic acid molecule comprising a candidate region whose degree of base modification is about 0%.
In one aspect, the present application provides a collection of the candidate regions of the nucleic acid molecule of the present application after all of them have undergone base modification treatment, for use as a standard that has undergone base modification treatment. For example, the nucleic acid molecule can serve as a fully methylated standard.
In one aspect, the present application provides a method for preparing a standard for detecting the degree of base modification, the method comprising identifying, in a nucleic acid molecule, a candidate region whose degree of base modification before base modification treatment is about 0%, and subjecting the nucleic acid molecule to base modification treatment.
In one aspect, the present application provides use of a nucleic acid molecule in the preparation of a standard for detecting the degree of base modification, the nucleic acid molecule comprising a candidate region whose degree of base modification before base modification treatment is about 0%, and the nucleic acid molecule being subjected to base modification treatment.
For example, a methylation standard with a predetermined degree of base modification is obtained by mixing, in a predetermined ratio, the nucleic acid molecule before the base modification treatment and the nucleic acid molecule after the base modification treatment. For example, the base modification treatment comprises contacting the nucleic acid molecule with a methyltransferase. For example, mixing m% of the fully methylated standard described above with (100 − m)% of the zero-methylation standard described above yields an m% methylation standard, whose degree of methylation in the candidate region is m%.
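The mixing rule above amounts to a weighted average, as the following sketch shows. It assumes that the two standards are mixed by DNA mass from the same genomic background, so that mass fraction equals molecule fraction in the candidate region, and that the input standards are exactly 100% and 0% methylated unless stated otherwise; the function names and quantities are hypothetical.

```python
def mixed_methylation(fraction_full, level_full=1.0, level_zero=0.0):
    """Expected methylation level of a mixture, given the fully methylated fraction."""
    if not 0.0 <= fraction_full <= 1.0:
        raise ValueError("fraction_full must be between 0 and 1")
    return fraction_full * level_full + (1.0 - fraction_full) * level_zero


def mix_amounts(target_level, total_ng):
    """Mass (ng) of fully methylated and zero-methylation standard for a target level."""
    full_ng = target_level * total_ng
    return full_ng, total_ng - full_ng


# Preparing 100 ng of a nominal 20% methylation standard.
full_ng, zero_ng = mix_amounts(0.20, 100.0)
expected_level = mixed_methylation(0.20)
```

Passing measured values for `level_full` and `level_zero` shows how far a real mixture would deviate from the nominal m% when the input standards are not perfectly methylated or unmethylated.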
The present application provides a kit comprising the nucleic acid molecule of the present application. For example, the kit can be used as a standard for capture probes for methylation detection.
Without wishing to be bound by any theory, the following examples are intended only to illustrate the methods and uses of the present application, and are not intended to limit the scope of the invention of the present application.
实施例Example
实施例1Example 1
探针筛选Probe Screening
In next-generation sequencing after bisulfite treatment, library complexity is reduced, which poses a major challenge to the specificity of targeted capture. Compared with methylation amplicon methods, hybridization capture typically uses long probes, which provide better specificity and greater tolerance of single-nucleotide polymorphisms (SNPs). However, as probe length increases, the melting temperature (Tm) rises further, so that some probes readily form local secondary structures and their capture ability is limited. The present application therefore provides a thermodynamic calculation method for designing long probes that achieves highly uniform capture of genomic target regions and good reproducibility.
Hybridization achieves specific binding between complementary sequences of the target DNA (T, target) and the RNA probe (P, probe). The equilibrium constant $R_{eq}$ of this dynamic reaction can be calculated from the standard free energy ΔG°, which in turn can be calculated under the assumption that, after bisulfite treatment, the sequence is either fully converted or fully unconverted.
$$R_{eq} = \frac{[TP]}{[T][P]}$$
The hybridization yield (Ψ) can be calculated from the amounts of target present in the DNA:RNA duplex and in single-stranded form. Because P is in large excess in the system, the calculation can be simplified as follows:
$$\Psi = \frac{[TP]}{[TP] + [T]}$$
$$R_{eq}' \equiv [c]^{-\Delta n} \cdot R_{eq}$$
Here $[c]$ denotes the initial concentration of the hybridization probe and $\Delta n$ denotes the change in the number of T and P species during the reaction. $R_{eq}'$ is used to assess the thermodynamic equilibrium: if $R_{eq}' \gg 1$, Ψ approaches 1; likewise, if $R_{eq}' \ll 1$, Ψ approaches 0. The standard free energy after introducing the concentration parameter is
$$\Delta G^{\circ\prime} \equiv -RT\log(R_{eq}') = \Delta G^{\circ} + (\Delta n)RT\log([c])$$
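The relations above can be turned into a small numerical sketch. It assumes a bimolecular reaction T + P → TP (so Δn = -1), a probe concentration that stays at its initial value because the probe is in large excess, free energies in kcal/mol, natural logarithms, and an arbitrary default temperature of 298.15 K; none of these settings is specified in the application. Under these assumptions [TP]/[T] equals $R_{eq}'$, so Ψ = $R_{eq}'$ / (1 + $R_{eq}'$).

```python
import math

# Gas constant in kcal/(mol*K); the unit choice is an assumption of this sketch.
R_KCAL_PER_MOL_K = 1.987204e-3


def adjusted_equilibrium_constant(delta_g0, conc, delta_n=-1, temp_k=298.15):
    """R_eq' = [c]^(-delta_n) * R_eq, with R_eq = exp(-delta_g0 / (R*T))."""
    r_eq = math.exp(-delta_g0 / (R_KCAL_PER_MOL_K * temp_k))
    return conc ** (-delta_n) * r_eq


def hybridization_yield(delta_g0, conc, temp_k=298.15):
    """Yield for T + P -> TP with the probe in large excess (delta_n = -1)."""
    r_eq_prime = adjusted_equilibrium_constant(delta_g0, conc, -1, temp_k)
    return r_eq_prime / (1.0 + r_eq_prime)


# A strongly favourable duplex gives a yield near 1, an unfavourable one near 0.
print(hybridization_yield(-20.0, 1e-6))
print(hybridization_yield(0.0, 1e-6))
```

Very negative free energies can overflow `math.exp`; a production implementation would work with the exponent directly.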
The specificity of a probe in this hybridization capture system can be defined as
$$\Delta\Delta G^{\circ} = \bigl(\Delta G^{\circ}(T_fP) - \Delta G^{\circ}(T_f) - \Delta G^{\circ}(P)\bigr) - \bigl(\Delta G^{\circ}(T_nP) - \Delta G^{\circ}(T_n) - \Delta G^{\circ}(P)\bigr)$$
Here $T_n$ refers to the hybridization product with the target sequence, and $T_f$ refers to the non-specific hybridization product.
To obtain a highly accurate probe combination, suitable target regions must be selected for designing candidate probe sequences. A suitable target region is one for which the difference (ΔΔG°) between the binding free energy of the candidate probe to nucleic acid derived from the target region and its binding free energy to nucleic acid derived from non-target regions exceeds a specific threshold of about 12 to 50 kcal/mol. Preferably, this difference (ΔΔG°) is about 20 kcal/mol or higher, or about 50 kcal/mol or higher.
According to previous experimental results, a 120-nt probe needs only 60 nt of sequence similar to a given sequence in order to capture it. Each pre-designed probe sequence is therefore scanned with a sliding window of 60 nt and a step of 1 nt to obtain the set of target subsequences for that probe, each 60 nt long, and the ΔΔG° between each subsequence and the probe sequence is calculated. The ΔΔG° between two sequences can be calculated by methods known in the art, for example Zhang, D. et al., Nature Chemistry 4, 208–214 (2012). The ΔΔG° results for all probe sequences and their subsequences are pooled, and the minimum can be used as the ΔΔG° threshold. In the present application, the ΔΔG° range that can be used to select suitable target regions is about 12 kcal/mol or higher. Preferably, it is about 20 kcal/mol or higher, or about 50 kcal/mol or higher.
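The sliding-window step described above can be sketched as follows. The ΔΔG° model itself is not reproduced here: `ddg_fn` is a hypothetical, caller-supplied function standing in for a thermodynamic calculation such as the one cited, and the function names are illustrative.

```python
def sliding_windows(seq, window=60, step=1):
    """Return every window-length subsequence of seq, moving by step bases."""
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, step)]


def minimum_ddg(probes, ddg_fn, window=60, step=1):
    """Smallest ddg_fn(probe, subsequence) over all probes and windows.

    ddg_fn is a placeholder for a real thermodynamic model; it is not
    defined by this sketch.
    """
    values = [ddg_fn(probe, sub)
              for probe in probes
              for sub in sliding_windows(probe, window, step)]
    if not values:
        raise ValueError("no probe is at least one window long")
    return min(values)


probe = "ACGT" * 30  # toy 120-nt sequence, not a real probe
print(len(sliding_windows(probe)))  # number of 60-nt windows in a 120-nt probe
```

A 120-nt probe scanned with a 60-nt window and a 1-nt step yields 120 - 60 + 1 = 61 subsequences.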
ΔΔG° can be calculated as shown in the following examples:
Example 1 (unsuitable probe region: the probe has a similar sequence in the genome, the ΔΔG° between the similar sequence and the probe is below the threshold, and the probe is filtered out):
Probe region: chr9:132331252-132331371:
Figure PCTCN2022100272-appb-000002
Similar sequence region: chr22:23908697-23908816:
Figure PCTCN2022100272-appb-000003
Comparison of the two sequences:
Figure PCTCN2022100272-appb-000004
Figure PCTCN2022100272-appb-000005
The ΔΔG° value calculated between the similar sequence and the probe sequence in the fully unmethylated state (all C converted to T) is 11.27;
the ΔΔG° value calculated between the similar sequence and the probe sequence in the fully methylated state (only non-CpG C converted to T) is 11.55. Because the ΔΔG° for the target region is less than about 12, this probe region is discarded.
Example 2 (unsuitable probe region: the probe has similar sequences in the genome, the ΔΔG° between the similar sequences and the probe is below the threshold, and the probe is filtered out):
Probe region: chr22:50176001-50176120:
Figure PCTCN2022100272-appb-000006
Similar sequence region 1: chr22:50176480-50176599:
Figure PCTCN2022100272-appb-000007
Comparison of similar sequence 1 with the probe sequence:
Figure PCTCN2022100272-appb-000008
The ΔΔG° value calculated between similar sequence 1 and the probe sequence in the fully unmethylated state (all C converted to T) is 2.28;
the ΔΔG° value calculated between similar sequence 1 and the probe sequence in the fully methylated state (only non-CpG C converted to T) is 2.28. Because the ΔΔG° for the target region is less than about 12, this probe region is discarded.
Similar sequence region 2: chr22:50176259-50176403:
Figure PCTCN2022100272-appb-000009
Comparison of similar sequence 2 with the probe sequence:
Figure PCTCN2022100272-appb-000010
The ΔΔG° value calculated between similar sequence 2 and the probe sequence in the fully unmethylated state (all C converted to T) is 4.67;
the ΔΔG° value calculated between similar sequence 2 and the probe sequence in the fully methylated state (only non-CpG C converted to T) is 4.67. Because the ΔΔG° for the target region is less than about 12, this probe region is discarded.
Example 3 (suitable probe region: the probe has no long similar sequence in the genome, and the probe is retained):
Probe region: chr1:849521-849640:
Figure PCTCN2022100272-appb-000011
Similar sequence region: chr10:133011469-133011488:
Figure PCTCN2022100272-appb-000012
Comparison of the similar sequence (extended on both sides to match the length of the probe sequence) with the probe sequence:
Figure PCTCN2022100272-appb-000013
Figure PCTCN2022100272-appb-000014
The ΔΔG° value calculated between the similar sequence and the probe sequence in the fully unmethylated state (all C converted to T) is 50.77;
the ΔΔG° value calculated between the similar sequence and the probe sequence in the fully methylated state (only non-CpG C converted to T) is 50.85. Because the ΔΔG° for the target region is greater than about 12, this probe region is retained.
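The filtering decision in the three examples above can be written as a threshold check. This sketch only applies the rule to ΔΔG° values that have already been computed; the `SimilarHit` type and `keep_probe` function are illustrative names, and requiring both conversion states to meet the threshold is an assumption consistent with, but not spelled out in, the examples.

```python
from dataclasses import dataclass


@dataclass
class SimilarHit:
    """A similar genomic sequence and its precomputed ddG values for one probe."""
    region: str
    ddg_unmethylated: float  # all C converted to T
    ddg_methylated: float    # only non-CpG C converted to T


def keep_probe(hits, threshold=12.0):
    """Keep a probe only if every similar hit meets the threshold in both states."""
    return all(min(h.ddg_unmethylated, h.ddg_methylated) >= threshold for h in hits)


example_1 = [SimilarHit("chr22:23908697-23908816", 11.27, 11.55)]
example_2 = [SimilarHit("chr22:50176480-50176599", 2.28, 2.28),
             SimilarHit("chr22:50176259-50176403", 4.67, 4.67)]
example_3 = [SimilarHit("chr10:133011469-133011488", 50.77, 50.85)]

for name, hits in [("example 1", example_1), ("example 2", example_2), ("example 3", example_3)]:
    print(name, "retained" if keep_probe(hits) else "discarded")
```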
Example 2
Capture probe design
Figure 5 provides a reference example for illustration only. Shown at the top is a double-stranded DNA fragment to be tested for methylation, read in the direction of the arrows, comprising the original upper strand (CCGGCATGTTTAAACGCT) and the original lower strand (AGCGTTTAAACATGCCGG); the cytosine (C) in every CpG is assumed to be methylated and is marked -mC. After the double-stranded fragment is denatured into single strands and subjected to bisulfite conversion, C residues in the original upper and lower strands that are not methylated are converted to uracil (U), while methylated C residues remain C. In the subsequent PCR amplification, uracil (U) pairs with adenine (A), and the base that PCR incorporates opposite adenine (A) is thymine (T). PCR first produces the complementary strand of the target upper strand (CTOT), which is complementary to the bisulfite-treated, uracil-containing original upper strand, and the complementary strand of the target lower strand (CTOB), which is complementary to the bisulfite-treated, uracil-containing original lower strand. Further amplification then produces the target upper strand (OT), derived from the original upper strand and complementary to CTOT, and the target lower strand (OB), derived from the original lower strand and complementary to CTOB. Comparison shows that unmethylated C residues in the original upper and lower strands are replaced by T in the target upper and lower strands, whereas methylated C residues (underlined) remain unchanged. Accordingly, the number and positions of methylated C residues can be identified by determining which C residues remain after bisulfite conversion, thereby achieving DNA methylation detection. For brevity, this description refers to the above process as C being converted to T, C being replaced by T, and the like.
Figure 5 shows an idealized case in which the methylation status of the original strands is known. In practice, however, whether a given C in the target strands (OT, OB) has been replaced by T is unknown and must be determined by the assay, yet probes for hybridization capture of the OT and OB strands have to be designed without that knowledge. The present application therefore makes the following two assumptions:
1) None of the C residues in the target strands (OT, OB) is methylated, so after bisulfite conversion and PCR all C residues are replaced by T, and the corresponding complementary strands (CTOT, CTOB) are designed accordingly.
2) All of the C residues in the target strands (OT, OB) are methylated, so after bisulfite conversion and PCR all C residues remain unchanged, and the corresponding complementary strands (CTOT, CTOB) are designed accordingly.
First, the capture probes of the present application can be designed on the assumption that the target region is unmethylated. Taking Figure 5 as an example, all C in the original upper strand are converted to T to give the first strand, which for the sequence in Figure 5 is TTGGTATGTTTAAATGTT, and a first probe complementary to the first strand is designed. All C in the original lower strand are converted to T to give the second strand, which for the sequence in Figure 5 is AGTGTTTAAATATGTTGG, and a second probe complementary to the second strand is designed. The complementary strand of the first strand is taken as the third strand, which for the sequence in Figure 5 is AACATTTAAACATACCAA, and a third probe complementary to the third strand is designed. The complementary strand of the second strand is taken as the fourth strand, which for the sequence in Figure 5 is CCAACATATTTAAACACT, and a fourth probe complementary to the fourth strand is designed. In addition to the two target strands derived from the original strands, the present application thus also designs probes for the complementary strands of the two target strands, achieving good coverage. This has been verified to improve capture performance, for example probe accuracy and repeatability. Note that Figure 5 is only an example for ease of explanation; the number of target strands actually selected is very large and is not limited to the sequences in Figure 5.
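The four strands under the unmethylated assumption can be derived in silico from the original upper strand. The sketch below is illustrative (the function names are not from the application) and handles only the uppercase bases A, C, G and T; for the Figure 5 sequence it reproduces the four strands quoted above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def convert_all_c(seq):
    """Bisulfite conversion plus PCR with no methylation: every C reads as T."""
    return seq.replace("C", "T")


def unmethylated_design_strands(original_upper):
    """Return the first to fourth strands for one original upper strand."""
    original_lower = reverse_complement(original_upper)
    first = convert_all_c(original_upper)
    second = convert_all_c(original_lower)
    third = reverse_complement(first)
    fourth = reverse_complement(second)
    return first, second, third, fourth


for strand in unmethylated_design_strands("CCGGCATGTTTAAACGCT"):
    print(strand)
```

Each probe is then the reverse complement of the strand it is meant to capture.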
Preferably, the capture probes of the present application can further be designed on the assumption that the target region is fully methylated. Where CpG islands are the main subject of the methylation assay, only the C in a "CG" dinucleotide, read in base order (for example, in the direction of the arrows in Figure 5), is considered to be methylated; in all other contexts C is considered unmethylated. Again taking Figure 5 as an example, only the non-CpG C in the original upper strand are converted to T to give the fifth strand, which for the sequence in Figure 5 is TCGGTATGTTTAAACGTT, and a fifth probe complementary to the fifth strand is designed. Only the non-CpG C in the target lower strand (OB) are converted to T to give the sixth strand, which for the sequence in Figure 5 is AGCGTTTAAATATGTCGG, and a sixth probe complementary to the sixth strand is designed. The complementary strand of the fifth strand is taken as the seventh strand, which for the sequence in Figure 5 is AACGTTTAAACATACCGA, and a seventh probe complementary to the seventh strand is designed. The complementary strand of the sixth strand is taken as the eighth strand, which for the sequence in Figure 5 is CCGACATATTTAAACGCT, and an eighth probe complementary to the eighth strand is designed.
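The fully methylated case differs only in that a C followed by G is left unchanged. A minimal sketch, with illustrative function names and uppercase A/C/G/T input assumed; a C at the very end of a sequence has no visible 3' neighbour and is treated here as non-CpG, which is a simplification of this sketch. For the Figure 5 sequence it reproduces the fifth to eighth strands quoted above.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def convert_non_cpg_c(seq):
    """Convert every C that is not immediately followed by G into T."""
    return re.sub(r"C(?!G)", "T", seq)


def methylated_design_strands(original_upper):
    """Return the fifth to eighth strands for one original upper strand."""
    original_lower = reverse_complement(original_upper)
    fifth = convert_non_cpg_c(original_upper)
    sixth = convert_non_cpg_c(original_lower)
    seventh = reverse_complement(fifth)
    eighth = reverse_complement(sixth)
    return fifth, sixth, seventh, eighth


for strand in methylated_design_strands("CCGGCATGTTTAAACGCT"):
    print(strand)
```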
In designing capture probes, the number of target regions of the present application is preferably about 10,000 or more.
Example 3
Performance testing of capture probes
The present application uses standards with a defined methylation level to test the performance of capture probe combinations and to determine the probes finally used in the probe set.
The accuracy and repeatability of the capture probes of the present application are tested on 20% and/or 50% methylation standards (standard test samples known to have a methylation level of 20% and/or 50% in specific regions).
Accuracy
Detection deviation is calculated as the difference between the methylation level detected with the candidate capture probe combination and the actual (or theoretical) methylation level, divided by the actual (or theoretical) methylation level. For a suitable capture probe combination, the detection fluctuation, i.e. the difference between the maximum and minimum values (difference = maximum - minimum), is about 25% or lower.
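Both accuracy metrics are one-line calculations. In this sketch the fluctuation is taken over measured methylation levels expressed as fractions, which is an assumption, and the input values are hypothetical numbers chosen for illustration rather than data from the application.

```python
def relative_deviation(measured, theoretical):
    """(measured - theoretical) / theoretical."""
    return (measured - theoretical) / theoretical


def fluctuation(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


def fluctuation_acceptable(values, limit=0.25):
    """True if the spread of the measurements is at or below the limit."""
    return fluctuation(values) <= limit


# Hypothetical measurements for a 20% standard.
measured = [0.15, 0.25, 0.37]
print(relative_deviation(0.256, 0.2))
print(fluctuation(measured), fluctuation_acceptable(measured))
```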
Figure PCTCN2022100272-appb-000015
When the above formula is used to evaluate the methylation level of a test sample, the set of all probe combinations should cover more than 90% of the target regions. Preferably, the set of all probe combinations should cover more than 95% of the target regions. More preferably, it should cover more than 99% of the target regions. Still more preferably, it should cover 100% of the target regions.
As illustrated by way of example in Figure 6, as sequencing depth increases a single CpG site is covered by multiple reads, and different reads may give different methylation results for the same CpG site. For example, at site CpG-2, reads 1-4 are methylation-positive (marked with black dots ●), whereas reads 5-6 are methylation-negative (marked with white dots ○). Calculating the methylation status of all sites in all reads, rather than forcing a qualitative positive-or-negative call at each individual site, avoids errors caused by manual intervention in the methylation signal of the sequencing results.
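The read-level counting described above can be sketched as follows. The exact formula is only available as an image, so this assumes the simplest reading: methylated calls divided by all calls, pooled over every read and every site. The data layout (one dict of site name to boolean call per read) is hypothetical.

```python
def methylation_level(reads):
    """Fraction of methylated calls over all sites in all reads.

    reads is a list of dicts mapping a CpG site name to True (methylated)
    or False (unmethylated) for the sites that the read covers.
    """
    methylated = 0
    total = 0
    for calls in reads:
        for is_methylated in calls.values():
            total += 1
            methylated += 1 if is_methylated else 0
    if total == 0:
        raise ValueError("no CpG calls available")
    return methylated / total


# Site CpG-2 as described for Figure 6: reads 1-4 positive, reads 5-6 negative.
reads = [{"CpG-2": True}] * 4 + [{"CpG-2": False}] * 2
print(methylation_level(reads))
```

No individual read is overruled; the 4-of-6 signal is kept as a quantitative value instead of being rounded to "methylated".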
Repeatability
The repeatability RMSE is calculated, for standard test samples with a defined methylation level of 20% and/or 50%, as the mean squared error of the methylation levels detected with the candidate capture probe combination across two or more replicate measurements. For a suitable capture probe combination, the repeatability, i.e. the median of the mean squared error of the methylation level between replicate wells, is about 9E-05 or lower.
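The text does not spell out how the mean squared error is formed, so the following sketch rests on an assumption: for each region, the squared deviations of the replicate measurements from their own mean are averaged, and the median over regions is compared with the 9E-05 limit. Function names and input values are illustrative.

```python
from statistics import mean, median


def region_mse(replicates):
    """Mean squared deviation of replicate methylation levels from their mean."""
    centre = mean(replicates)
    return mean([(value - centre) ** 2 for value in replicates])


def median_repeatability(regions):
    """Median of the per-region mean squared error."""
    return median([region_mse(replicates) for replicates in regions])


# Hypothetical duplicate measurements for three regions of a 20% standard.
regions = [[0.20, 0.22], [0.21, 0.21], [0.19, 0.20]]
value = median_repeatability(regions)
print(value, value <= 9e-05)
```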
Optionally, the uniformity and preference of the capture probe combination are tested.
Uniformity
The uniformity CV is calculated as
Figure PCTCN2022100272-appb-000016
where, for the k capture probes in the capture probe combination, $d_i$ denotes the sequencing depth of the i-th probe and Figure PCTCN2022100272-appb-000017 denotes the mean sequencing depth of all probes. The coverage uniformity CV of a suitable capture probe combination should be less than 1; preferably, the CV should be less than 0.5; more preferably, the CV should be less than 0.3; still more preferably, the CV should be less than 0.2.
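The CV formula is only available as an image in this text. The sketch below assumes the conventional coefficient of variation, the standard deviation of the per-probe depths divided by their mean, and uses the population standard deviation; whether the application divides by k or k - 1 cannot be confirmed from the text. The depth values are hypothetical.

```python
from statistics import mean, pstdev


def coverage_cv(depths):
    """Coefficient of variation of per-probe sequencing depth."""
    return pstdev(depths) / mean(depths)


# Hypothetical per-probe depths.
depths = [480, 510, 530, 495, 505]
print(coverage_cv(depths), coverage_cv(depths) < 0.2)
```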
Preference
The preference R is calculated as
Figure PCTCN2022100272-appb-000018
where, for the m capture probes in the capture probe combination, $x_i$ denotes the sequencing depth of the i-th probe for the target strands (OT+OB), Figure PCTCN2022100272-appb-000019 denotes the mean sequencing depth of all probes for the target strands, $y_i$ denotes the sequencing depth of the i-th probe for the complementary strands (CTOT+CTOB), and $\bar{y}$ denotes the mean sequencing depth of all probes for the complementary strands.
Here, OT denotes the target upper strand of the target region and CTOT denotes the complementary strand of the target upper strand; OB denotes the target lower strand of the target region and CTOB denotes the complementary strand of the target lower strand.
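The formula for R is only available as an image. Given that it is defined in terms of per-probe depths and their means for the two strand groups, the sketch below assumes R is the Pearson correlation coefficient between target-strand depth and complementary-strand depth; this is an interpretation, not something the text states. The depth values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of depths."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # Raises ZeroDivisionError if either list is constant.
    return covariance / (spread_x * spread_y)


# Hypothetical per-probe depths: target strands (OT+OB) and complementary strands (CTOT+CTOB).
target_depth = [100, 200, 300, 400]
complementary_depth = [110, 190, 310, 405]
r = pearson_r(target_depth, complementary_depth)
print(r, r ** 2)
```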
Optionally, the capture probes in the capture probe combination are about 80 to about 120 bases long. Optionally, the overlapping region of any two capture probes in the capture probe combination comprises about 10 to about 110 bases. Optionally, the region to which a capture probe in the combination is complementary does not overlap a repeat region by 10 or more consecutive bases. Repeat regions are documented in sources known in the art, for example the repeats recorded at repeatmasker.org.
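These optional constraints can be checked on genomic intervals. The sketch below uses half-open (start, end) coordinates on a single chromosome and applies the overlap rule to neighbouring probes of a tiling; both choices, and all names, are assumptions made for illustration.

```python
def overlap(a, b):
    """Number of bases shared by two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def probe_is_acceptable(probe, repeats, min_len=80, max_len=120, max_repeat_overlap=9):
    """Check probe length and that no repeat overlaps it by 10 or more bases."""
    length = probe[1] - probe[0]
    if not min_len <= length <= max_len:
        return False
    return all(overlap(probe, repeat) <= max_repeat_overlap for repeat in repeats)


def neighbour_overlaps_ok(probes, min_overlap=10, max_overlap=110):
    """Check that consecutive probes of a tiling overlap by 10 to 110 bases."""
    ordered = sorted(probes)
    return all(min_overlap <= overlap(a, b) <= max_overlap
               for a, b in zip(ordered, ordered[1:]))


probes = [(0, 120), (60, 180), (120, 240)]
repeats = [(235, 400)]  # hypothetical repeat interval
print([probe_is_acceptable(p, repeats) for p in probes], neighbour_overlaps_ok(probes))
```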
Example 4
Construction of methylation standards
Current methylation standards are derived from samples obtained by whole-genome amplification. In a "zero-methylation standard" obtained by amplification, however, essentially all cytosines may be unmethylated, so that after bisulfite conversion the sample contains no cytosine. This readily causes large capture bias, making such a standard unsuitable for evaluating the performance of capture methods.
The present application provides a method for constructing methylation standards for capture probe pairs. A nucleic acid sample derived from a human cell line is treated with a methyltransferase (for example M.SssI) to obtain the "fully methylated standard (PC)"; the corresponding nucleic acid sample not treated with methyltransferase serves as the "zero-methylation standard (NC)". Both the zero-methylation standard and the fully methylated standard are subjected to methylation sequencing, and specific regions whose methylation level is zero in the zero-methylation standard and 100% in the fully methylated standard are taken as standard regions. When the zero-methylation standard and the fully methylated standard are mixed in any ratio, the methylation level of the standard regions is the actual methylation level of the methylation standard of the present application. For example, after mixing 20% fully methylated standard with 80% zero-methylation standard, the actual methylation level (also called the theoretical methylation level) in the selected regions can be taken to be 20%.
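Selecting the standard regions amounts to intersecting two per-region measurements. A minimal sketch, assuming methylation levels are available as dicts keyed by a region identifier; the region names, the function name and the tolerance parameters are hypothetical.

```python
def select_standard_regions(nc_levels, pc_levels, nc_max=0.0, pc_min=1.0):
    """Regions that are unmethylated in NC and fully methylated in PC.

    nc_levels and pc_levels map a region identifier to its measured
    methylation level (fraction) in the zero-methylation standard and
    the fully methylated standard. nc_max and pc_min allow a tolerance.
    """
    shared = nc_levels.keys() & pc_levels.keys()
    return sorted(region for region in shared
                  if nc_levels[region] <= nc_max and pc_levels[region] >= pc_min)


nc = {"region_a": 0.0, "region_b": 0.01, "region_c": 0.0}
pc = {"region_a": 1.0, "region_b": 1.0, "region_c": 0.90}
print(select_standard_regions(nc, pc))
```

Only the retained regions are then used when comparing measured and theoretical methylation levels of a mixed standard.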
The reaction conditions for the methyltransferase (for example M.SssI) are 37 °C for 15 min and 65 °C for 20 min. The left and right sides of Figures 1 and 3 show the methylation measurements for the "zero-methylation standard" and the "fully methylated standard" of the present application. Over multiple measurements, the NC methylation level was 0-0.002 and the PC methylation level was 0.97-1.00; the methylation standards of the present application are therefore suitable for evaluating capture probes.
Here, the complementary-strand group denotes capture probes designed only for the complementary strands (CTOT+CTOB), the target-strand group denotes capture probes designed for the target strands (OT+OB), and the double-strand group denotes capture probes designed for both the target strands and the complementary strands. OT denotes the target upper strand and CTOT the complementary strand of the target upper strand; OB denotes the target lower strand and CTOB the complementary strand of the target lower strand.
Example 5
Performance results of the capture probe combination of the present application
Take as an example the 20% and 50% standards prepared at mixing ratios of 20% and 50%; these standards can be used to evaluate the accuracy of different probe batches. Figure 1 shows the methylation measurements for the "20% standard" and the "50% standard" of the present application. The results are as follows:
Table 1: Accuracy evaluation results for probes at 20% and 50%
Figure PCTCN2022100272-appb-000020
(1) Evaluation of the deviation between the theoretical and the measured methylation level. In Figure 3, the horizontal axis shows the theoretical methylation level and the vertical axis shows the measured methylation level. The results are as follows: with the double-strand probe design, the mean measured methylation signal is fairly close to the theoretical methylation level. At the 20% methylation level, the deviations of the mean measured methylation from the theoretical value (deviation = (measured value - theoretical value) / theoretical value) for the double-strand, target-strand and complementary-strand designs are 0.28, 0.32 and 0.28, respectively; at the 50% methylation level, the corresponding deviations are 0.14, 0.15 and 0.13.
(2) Evaluation of the fluctuation between the theoretical and the measured methylation level. The double-strand probe design shows the smallest fluctuation in measured methylation signal. The differences between the maximum and minimum values for the three probe designs (double-strand, target-strand and complementary-strand) are 0.22, 0.24 and 0.25 at the 20% methylation level, and 0.22, 0.25 and 0.27 at the 50% methylation level.
The uniformity of a capture probe combination is used to evaluate how evenly the probes cover different target regions, expressed as the range of the coefficient of variation (CV). In the figures, the horizontal axis shows the different methylation levels and the vertical axis shows sequencing depth. Figures 2A-2C show the uniformity measurements for the three probe designs of the present application (double-strand, target-strand and complementary-strand). The results show that the uniformity of the double-strand probe design is better than that of the complementary-strand-only probe design and close to that of the conventional target-strand probe design.
Evaluation of repeatability: the mean squared error of detection at different methylation levels is used to evaluate the repeatability of the different probe designs. The horizontal axis shows the different methylation levels and the vertical axis shows the deviation between replicate samples; the smaller the value, the more stable the detection method.
Table 2: Repeatability evaluation results for probes at 20% and 50%
| Boundary | Theoretical methylation level | Double-strand repeatability | Target-strand repeatability | Complementary-strand repeatability |
| --- | --- | --- | --- | --- |
| Minimum | 0.5 | 1.34E-05 | 2.49E-05 | 1.67E-05 |
| Median | 0.5 | 7.03E-05 | 1.23E-04 | 9.16E-05 |
| Maximum | 0.5 | 2.19E-04 | 3.54E-04 | 3.04E-04 |
| Minimum | 0.2 | 1.55E-05 | 2.60E-05 | 2.51E-05 |
| Median | 0.2 | 8.05E-05 | 1.22E-04 | 1.12E-04 |
| Maximum | 0.2 | 2.67E-04 | 3.58E-04 | 3.40E-04 |
图3显示的是,本申请针对双链、目标链、互补链设计的三种探针的重复性测量结果。结果显示双链探针设计的重复性好于单独的互补链探针设计,和目标链探针设计接近。Figure 3 shows the repeatability measurement results of the three probes designed by the present application for the double strand, the target strand, and the complementary strand. The results showed that the reproducibility of the double-strand probe design was better than that of the complementary-strand probe design alone, and was close to that of the target-strand probe design.
对于目标链,20%甲基化标准品评估的重复性中值为1.22E-04,50%甲基化标准品评估的重复性中值为1.23E-04;对于互补链,20%甲基化标准品评估的重复性中值为1.12E-04,50%甲基化标准品评估的重复性中值为9.16E-05;对于本申请优选的双链,20%甲基化标准品评估的重复性中值为8.05E-05,50%甲基化标准品评估的重复性中值为7.03E-05。For the target strand, the median estimated repeatability was 1.22E-04 for the 20% methylated standard and 1.23E-04 for the 50% methylated standard; for the complementary strand, 20% methylated Median repeatability of assessments for methylated standards was 1.12E-04, median repeatability for assessments of 50% methylated standards was 9.16E-05; for double strands preferred for this application, 20% methylated standards assessed The median repeatability value of 8.05E-05 was 8.05E-05, and the repeatability median value of 50% methylation standard evaluation was 7.03E-05.
Capture strand bias was evaluated by comparing the depth at which the different probes captured the target strand (OT+OB) and the complementary strand (CTOT+CTOB). In each plot, the horizontal axis represents the coverage depth of the target strand and the vertical axis represents the sequencing depth of the complementary strand. The results show that capture with the double-strand probes has a lower strand bias (R^2).
Figures 4A-4C show the strand bias measurements for the three probe designs of the present application, directed to the double strand, the target strand, and the complementary strand. The results show that the strand bias of the double-strand probe design is better than that of the complementary-strand probe design alone and close to that of the conventional target-strand probe design.
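As an illustration of how an R^2 value relating target-strand depth to complementary-strand depth might be computed, the following sketch takes the squared Pearson correlation of paired per-region depths. This is an assumption about the statistic; the application does not state the exact calculation, and the function name and input values are hypothetical.

```python
def r_squared(target_depth, complementary_depth):
    """Squared Pearson correlation between two equal-length depth lists.

    Assumption: R^2 is taken as the coefficient of determination of a simple
    linear fit, which equals the squared Pearson correlation.
    """
    n = len(target_depth)
    if n != len(complementary_depth) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(target_depth) / n
    mean_y = sum(complementary_depth) / n
    sxx = sum((x - mean_x) ** 2 for x in target_depth)
    syy = sum((y - mean_y) ** 2 for y in complementary_depth)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(target_depth, complementary_depth)
    )
    if sxx == 0 or syy == 0:
        raise ValueError("depths must vary for R^2 to be defined")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-region depths: target strand (OT+OB) versus
# complementary strand (CTOT+CTOB).
target = [1, 2, 3, 4]
complementary = [1, 3, 2, 4]
value = r_squared(target, complementary)
```

The inputs correspond to the two axes described above, one point per captured region.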
The foregoing detailed description is provided by way of explanation and example and is not intended to limit the scope of the appended claims. Many variations of the embodiments presently set forth in this application will be apparent to those of ordinary skill in the art and remain within the scope of the appended claims and their equivalents.

Claims (15)

  1. A nucleic acid molecule combination comprising at least one nucleic acid probe set covering a target region of a nucleic acid to be tested, characterized in that the nucleic acid probe set comprises at least: (1) a first probe complementary to a first strand, the first strand being the sequence of the target region after base substitution; and (2) a second probe complementary to a second strand, the second strand being the sequence of the complementary region of the target region after base substitution; and further comprises either or both of the following two probes: (3) a third probe complementary to a third strand, the third strand being complementary to the first strand; and (4) a fourth probe complementary to a fourth strand, the fourth strand being the complementary sequence of the second strand.
  2. The nucleic acid molecule combination according to claim 1, characterized in that the binding free energy of the nucleic acid molecules in the combination for a nucleic acid sequence derived from the target region differs from their binding free energy for a nucleic acid sequence derived from a non-target region by more than a specific threshold, the specific threshold being about 12 to 50 kcal/mol; preferably, the specific threshold is about 20 to 30 kcal/mol.
  3. The nucleic acid molecule combination according to any one of claims 1-2, characterized in that the base substitution comprises obtaining, by a chemical and/or biological process, a nucleic acid sequence in which cytosine is replaced with thymine or uracil.
  4. The nucleic acid molecule combination according to any one of claims 1-3, characterized in that the nucleic acid probe set further comprises: (1) a fifth probe complementary to a fifth strand, the fifth strand being the sequence of the target region without base substitution; (2) a sixth probe complementary to a sixth strand, the sixth strand being the sequence of the complementary region of the target region without base substitution; (3) a seventh probe complementary to a seventh strand, the seventh strand being complementary to the fifth strand; and (4) an eighth probe complementary to an eighth strand, the eighth strand being the complementary sequence of the sixth strand.
  5. The nucleic acid molecule combination according to any one of claims 1-4, characterized in that the detection result of the nucleic acid molecule combination for a standard with a 20% methylation level meets the following criteria: a fluctuation of 25% or lower, and/or a repeatability of 9E-05 or lower; preferably, the fluctuation is the difference between the maximum and minimum detection results, and the repeatability is the median of the mean squared error of the methylation level between replicate wells.
  6. The nucleic acid molecule combination according to any one of claims 1-5, wherein the nucleic acid molecules in the combination are about 80 to about 120 bases in length, the overlapping region of any two nucleic acid molecules in the combination comprises about 10 to about 110 bases, and/or the region to which a nucleic acid molecule in the combination is complementary does not comprise 10 or more consecutive bases overlapping a repeat region.
  7. Use of the nucleic acid molecule combination according to any one of claims 1-6 in the preparation of a preparation for human tumor gene detection.
  8. The use according to claim 7, wherein the detection preparation is used to detect the level of base modification in the target region; preferably, the base modification comprises methylation.
  9. The use according to any one of claims 7-8, characterized in that the human tumor is derived from homogeneous tumors (homogenous tumors), heterogeneous tumors, blood cancers, and/or solid tumors; preferably, the human tumor is derived from one or more cancers of the following group: brain cancer, lung cancer, skin cancer, nasopharyngeal cancer, throat cancer, liver cancer, bone cancer, lymphoma, pancreatic cancer, skin cancer, bowel cancer, rectal cancer, thyroid cancer, bladder cancer, kidney cancer, oral cancer, stomach cancer, solid tumors, ovarian cancer, esophageal cancer, gallbladder cancer, biliary tract cancer, breast cancer, cervical cancer, uterine cancer, prostate cancer, head and neck cancer, sarcoma, thoracic malignancies (other than lung), melanoma, and testicular cancer.
  10. A standard nucleic acid molecule for use in evaluating the accuracy of the degree of base modification detected in the use of any one of claims 7-9, characterized in that the standard nucleic acid molecule comprises a candidate region having a degree of base modification of about 0%, the total length of the candidate region being about 1 bp to about 10000 bp.
  11. The standard nucleic acid molecule according to claim 10, characterized in that the standard nucleic acid molecule is selected from one or more of the following cell lines: GM24385, GM12878, GM12877, and GM24631.
  12. The standard nucleic acid molecule according to any one of claims 10-11, characterized in that the degree of base modification comprises the degree of methylation of cytosine in the candidate region.
  13. The standard nucleic acid molecule according to any one of claims 10-12, characterized in that the standard not subjected to base modification treatment is mixed in a predetermined ratio with a nucleic acid molecule or standard fully subjected to base modification treatment, to obtain a methylation standard with a predetermined degree of base modification.
  14. The nucleic acid molecule according to claim 13, characterized in that the proportion of the nucleic acid molecule or standard fully subjected to base modification treatment is 20% or 50%.
  15. The nucleic acid molecule according to any one of claims 10-14, wherein the base modification treatment comprises contacting the nucleic acid molecule with a methyltransferase.
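
The strand relationships recited in claim 1 can be illustrated with a minimal sketch. It assumes that "complementary" means reverse complement, that the base substitution of claim 3 converts every cytosine to thymine (that is, a fully unmethylated template; methylated cytosines would remain unconverted in practice), and that sequences contain only A, C, G, and T. The function names and the example sequence are hypothetical and are not part of the claims.

```python
# Translation table for the four unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def convert_c_to_t(seq):
    """Replace every C with T (assumes no cytosine is protected by methylation)."""
    return seq.replace("C", "T")


def four_probe_set(target):
    """Derive the four probes of claim 1 from a target-region sequence."""
    target = target.upper()
    first_strand = convert_c_to_t(target)
    second_strand = convert_c_to_t(reverse_complement(target))
    third_strand = reverse_complement(first_strand)
    fourth_strand = reverse_complement(second_strand)
    return {
        "probe1": reverse_complement(first_strand),
        "probe2": reverse_complement(second_strand),
        # Probes 3 and 4 are complementary to the complements of strands 1
        # and 2, so their sequences equal strands 1 and 2 respectively.
        "probe3": reverse_complement(third_strand),
        "probe4": reverse_complement(fourth_strand),
    }


# Hypothetical four-base target region, far shorter than a real probe.
probes = four_probe_set("AACG")
```

In this sketch the third and fourth probes capture the strands synthesized opposite the converted strands, which is why the set covers both strands of the converted library.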
PCT/CN2022/100272 2022-02-28 2022-06-22 Genetic diagnosis probes and use thereof WO2023159817A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210185115.7 2022-02-28
CN202210185115.7A CN114438080A (en) 2022-02-28 2022-02-28 Gene diagnosis probe and application thereof

Publications (1)

Publication Number Publication Date
WO2023159817A1 (en) 2023-08-31

Family

ID=81373469

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/100272 WO2023159817A1 (en) 2022-02-28 2022-06-22 Genetic diagnosis probes and use thereof

Country Status (2)

Country Link
CN (1) CN114438080A (en)
WO (1) WO2023159817A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114438080A (en) * 2022-02-28 2022-05-06 广州燃石医学检验所有限公司 Gene diagnosis probe and application thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120149593A1 (en) * 2009-01-23 2012-06-14 Hicks James B Methods and arrays for profiling dna methylation
CN107447004A (en) * 2017-08-11 2017-12-08 北京呈诺医学科技有限公司 The method for detecting specificity of DNA methylation PCR detection primers or probe
CN108018336A (en) * 2018-01-05 2018-05-11 山东师范大学 A kind of DNA methylation detection kit and its application method
CN112522407A (en) * 2020-12-14 2021-03-19 北京起源聚禾生物科技有限公司 Ultra-sensitive detection method for methylation detection of plasma free DNA (deoxyribonucleic acid) genes
CN114438080A (en) * 2022-02-28 2022-05-06 广州燃石医学检验所有限公司 Gene diagnosis probe and application thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9624530B2 (en) * 2009-08-03 2017-04-18 Epigenomics Ag Methods for preservation of genomic DNA sequence complexity

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUANHUA WEI, PAN SHIYANG, WANG XIAN, CHEN DAN, ZHANG LIXIA, HUANG PEIJUN, TONG MINGQIN: "Quality control of the procedure of methylation-specific PCR ", CHINESE JOURNAL OF CLINICAL LABORATORY SCIENCE, vol. 24, no. 1, 15 January 2006 (2006-01-15), pages 44 - 48, XP093087182 *

Also Published As

Publication number Publication date
CN114438080A (en) 2022-05-06

Similar Documents

Publication Publication Date Title
Khodakov et al. Diagnostics based on nucleic acid sequence variant profiling: PCR, hybridization, and NGS approaches
KR102028375B1 (en) Systems and methods to detect rare mutations and copy number variation
TWI832482B (en) Determination of base modifications of nucleic acids
CN110785490A (en) Compositions and methods for detecting genomic variations and DNA methylation status
CN110964814B (en) Primers, compositions and methods for nucleic acid sequence variation detection
CN114574581A (en) System and method for detecting rare mutations and copy number variations
CN110982907B (en) Thyroid nodule-related rDNA methylation marker and application thereof
JP2019536474A (en) Multiplexed detection method for methylated DNA
US20190106735A1 (en) Method for analyzing cancer gene using multiple amplification nested signal amplification and kit
EP2699695A2 (en) Prostate cancer markers
JP2021531016A (en) Cell-free DNA damage analysis and its clinical application
WO2023159817A1 (en) Genetic diagnosis probes and use thereof
CN101575639A (en) DNA sequencing method capable of verifying base information for second time
CN112210601A (en) Colorectal cancer screening kit based on fecal sample
CN115103909A (en) Systems and methods for targeted nucleic acid capture
CN114667355A (en) Method for detecting colorectal cancer
WO2022262831A1 (en) Substance and method for tumor assessment
US8377657B1 (en) Primers for analyzing methylated sequences and methods of use thereof
US20130309667A1 (en) Primers for analyzing methylated sequences and methods of use thereof
CN113637754B (en) Application of biomarker in diagnosis of esophageal cancer
CN112210602B (en) Colorectal cancer screening method based on fecal sample
CN115491411A (en) Methylation marker for identifying pancreatitis and pancreatic cancer and application thereof
WO2019161253A1 (en) Methods for sequencing with single frequency detection
KR20190116773A (en) Molecularly Indexed Bisulfite Sequencing
KR102559496B1 (en) Composition for determining whether a nucleic acid is methylated and Method for determining whether a nucleic acid is methylated

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22928088

Country of ref document: EP

Kind code of ref document: A1