WO2023104136A1 - Methylation marker in diagnosis of benign and malignant nodules of thyroid cancer and applications thereof - Google Patents

Publication number
WO2023104136A1
Authority
WO
WIPO (PCT)
Prior art keywords
gene
sequence
genome
benign
malignant
Prior art date
Application number
PCT/CN2022/137459
Other languages
French (fr)
Chinese (zh)
Inventor
苏志熙
刘轶颖
徐敏杰
马成城
刘蕊
Original Assignee
江苏鹍远生物科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 江苏鹍远生物科技股份有限公司
Publication of WO2023104136A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/11: DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6827: Hybridisation assays for detection of mutation or polymorphism
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6834: Enzymatic or biochemical coupling of nucleic acids to a solid phase
    • C12Q1/6837: Enzymatic or biochemical coupling of nucleic acids to a solid phase using probe arrays or probe chips
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10: Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/112: Disease subtyping, staging or classification
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/154: Methylation markers
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/156: Polymorphic or mutational markers
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/166: Oligonucleotides used as internal standards, controls or normalisation probes

Definitions

  • The invention relates to a methylation marker for diagnosing benign and malignant thyroid nodules, and to its applications.
  • Thyroid cancer is a malignant tumor originating from the thyroid follicular epithelium. Women are more likely to be affected, with a male-to-female incidence ratio of 1:(2-4). The age of onset is generally 21 to 40 years. Papillary thyroid cancer (PTC) is the most common thyroid cancer, accounting for about 80% of all thyroid cancers. In recent years, the incidence of thyroid cancer in China has been rising. When thyroid cancer is detected early and treated promptly, the prognosis is good, and the 10-year survival rate can exceed 90%. However, if early diagnosis is missed and the disease progresses to a locally advanced stage, the opportunity for surgery is lost and the 5-year survival rate drops significantly.
  • The routine clinical diagnostic method is imaging. When ultrasound examination raises a strong suspicion of a malignant thyroid nodule, fine needle aspiration (FNA) cytology is needed to confirm the diagnosis. Because malignant and benign nodules have similar cytological features, some PTCs are difficult to diagnose, and up to 40% of thyroid nodules cannot be accurately diagnosed from cytological features. Current molecular diagnostic methods have improved differential accuracy, but their sensitivity still needs to be improved. The Gene Expression Classifier is widely used, but its positive predictive value (PPV) is only 47%, and it can only be run on fresh aspirated tissue, which limits its use for some samples.
  • ThyroSeq v2 detects H/K/NRAS gene mutations and RET/PTC gene rearrangements, which are also frequently carried by benign nodules, and its PPV is only 42-77%.
  • Although the accuracy of the Diagnostic DNA Methylation Signature (DDMS) approach is very high, some samples cannot be tested with it for technical reasons [John H Yim, Audrey H Choi, Arthur X Li, et al., Identification of Tissue-Specific DNA Methylation Signatures for Thyroid Nodule Diagnostics, Clin Cancer Res, 2019 Jan 15;25(2):544-551].
  • The first aspect of the present invention provides the use of a reagent for detecting the methylation status or level of at least one CpG dinucleotide of one or more target markers in the preparation of a detection reagent or a diagnostic kit for diagnosing benign and malignant thyroid nodules in individuals, and the use of a device for determining the methylation status or level of at least one CpG dinucleotide of the following one or more target markers in the preparation of a diagnostic kit for diagnosing benign and malignant thyroid nodules in individuals, wherein the one or more target markers are selected from: PRDM16 gene or genomic PRDM16 sequence, CAMK2N1 gene or genomic CAMK2N1 sequence, TACSTD2 gene or genomic TACSTD2 sequence, CRABP2 gene or genomic CRABP2 sequence, IER5 gene or genomic IER5 sequence, ITPKB gene or genomic ITPKB sequence, ITGB1BP1 gene or genomic ITGB1BP1 sequence, MTHFD2 gene or genomic MTHFD2 sequence, BIN1
  • The one or more target markers are selected from: PRDM16 gene or genomic PRDM16 sequence, BIN1 gene or genomic BIN1 sequence, LIMK1 gene or genomic LIMK1 sequence, EGR3 gene or genomic EGR3 sequence, PPIF gene or genomic PPIF sequence, ZNF219 gene or genomic ZNF219 sequence, UACA gene or genomic UACA sequence, TNK1 gene or genomic TNK1 sequence, CEP295NL gene or genomic CEP295NL sequence, SBNO2 gene or genomic SBNO2 sequence, C19orf77 gene or genomic C19orf77 sequence, ICAM5 gene or genomic ICAM5 sequence, CRTC1 gene or genomic CRTC1 sequence, RTN4R gene or genomic RTN4R sequence, CAMK2N1 gene or genomic CAMK2N1 sequence, DNASE1L3 gene or genomic DNASE1L3 sequence, DUSP26 gene or genomic DUSP26 sequence, ICAM2 gene or genomic ICAM2 sequence, BAIAP2 gene or genome of BAIAP
  • The one or more target markers include at least one or more of the following target markers: EGR3 gene or genomic EGR3 sequence, TNK1 gene or genomic TNK1 sequence, DNASE1L3 gene or genomic DNASE1L3 sequence, DUSP26 gene or genomic DUSP26 sequence, BAIAP2 gene or genomic BAIAP2 sequence, MED16 gene or genomic MED16 sequence, C19orf77 gene or genomic C19orf77 sequence, NOL4L-DT gene or genomic NOL4L-DT sequence, TACSTD2 gene or genomic TACSTD2 sequence, CRABP2 gene or genomic CRABP2 sequence, BCR gene or genomic BCR sequence.
  • the one or more target markers include: PRDM16 gene or genomic PRDM16 sequence, BIN1 gene or genomic BIN1 sequence, LIMK1 gene or genomic LIMK1 sequence, EGR3 gene or genomic EGR3 sequence, PPIF gene or genome PPIF sequence, ZNF219 gene or genome ZNF219 sequence, UACA gene or genome UACA sequence, TNK1 gene or genome TNK1 sequence, CEP295NL gene or genome CEP295NL sequence, SBNO2 gene or genome SBNO2 sequence, C19orf77 gene or genome C19orf77 sequence, ICAM5 gene or genome ICAM5 sequence, CRTC1 gene or genome CRTC1 sequence, and RTN4R gene or genome RTN4R sequence.
  • The CAMK2N1 gene or genomic CAMK2N1 sequence, DNASE1L3 gene or genomic DNASE1L3 sequence, EGR3 gene or genomic EGR3 sequence, DUSP26 gene or genomic DUSP26 sequence, ICAM2 gene or genomic ICAM2 sequence, BAIAP2 gene or genomic BAIAP2 sequence, MED16 gene or genomic MED16 sequence, C19orf77 gene or genomic C19orf77 sequence, and NOL4L-DT gene or genomic NOL4L-DT sequence.
  • The TACSTD2 gene or genomic TACSTD2 sequence, DNASE1L3 gene or genomic DNASE1L3 sequence, EGR3 gene or genomic EGR3 sequence, DUSP26 gene or genomic DUSP26 sequence, TNK1 gene or genomic TNK1 sequence, BAIAP2 gene or genomic BAIAP2 sequence, MED16 gene or genomic MED16 sequence, NOL4L-DT gene or genomic NOL4L-DT sequence, and BCR gene or genomic BCR sequence.
  • The Hg19 coordinates of the one or more target markers are as follows: PRDM16 gene: chr1:3155061:3155760; CAMK2N1 gene: chr1:20813203:20813902; TACSTD2 gene: chr1:59041615:59042314; CRABP2 gene: chr1:156676274:156676973; IER5 gene: chr1:181074539:181075238; ITPKB gene: chr1:226924700:226925399; ITGB1BP1 gene: chr2:9526804:9527503; MTHFD2 gene: chr2:74453839:74454538; BIN1 gene: chr2:127822196:127822895; DNASE1L3 gene: chr3:58153211:58153910; LSG1 gene: chr3:194408527:194409226; SH3BP2 gene:
  • The Hg19 coordinates of the one or more target markers are as follows: PRDM16 gene: chr1:3155311:3155510; CAMK2N1 gene: chr1:20813453:20813652; TACSTD2 gene: chr1:59041865:59042064; CRABP2 gene: chr1:156676524:156676723; IER5 gene: chr1:181074789:181074988; ITPKB gene: chr1:226924950:226925149; ITGB1BP1 gene: chr2:9527054:9527253; MTHFD2 gene: chr2:74454089:74454288; BIN1 gene: chr2:127822446:127822645; DNASE1L3 gene: chr3:58153461:58153660; LSG1 gene: chr3:194408777:194408976; SH3BP2 gene:
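The two coordinate lists above use a `chr:start:end` notation. As a minimal sketch, and assuming both endpoints are inclusive (the text does not state the convention), the following Python parses such strings and checks that the narrower PRDM16 region lies inside the wider one. The names `Region`, `parse_region`, and `contains` are illustrative and do not come from the application.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Assumption: start and end are both inclusive positions.
        return self.end - self.start + 1


def parse_region(text: str) -> Region:
    """Parse a 'chr1:3155061:3155760' style string, tolerating stray spaces."""
    chrom, start, end = (part.strip() for part in text.split(":"))
    return Region(chrom.lower(), int(start), int(end))


def contains(outer: Region, inner: Region) -> bool:
    """True when `inner` lies entirely within `outer` on the same chromosome."""
    return (outer.chrom == inner.chrom
            and outer.start <= inner.start
            and inner.end <= outer.end)


# PRDM16 entries copied from the two lists above.
wide = parse_region("chr1:3155061:3155760")
narrow = parse_region("chr1:3155311:3155510")
print(wide.length, narrow.length, contains(wide, narrow))
```

Under the inclusive-endpoint assumption, the wider PRDM16 region spans 700 positions and the narrower one spans 200, offset 250 positions in from each end of the wider region.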
  • The reagents include primer and/or probe molecules; preferably, the primer molecules are identical to, complementary to, or hybridize under stringent conditions to the one or more target markers and contain at least 9 consecutive nucleotides, and the probe molecules hybridize to the amplification product of the one or more target markers under stringent conditions.
  • The reagents described above are those required for performing reduced representation methylation sequencing.
  • The reagents required for reduced representation methylation sequencing include reagents required for enzyme digestion, reagents required for library construction (such as end repair, A-tailing, and adapter ligation), reagents required for cytosine conversion, reagents required for PCR amplification, etc.
  • One or more of the above-mentioned reagents may be included in the detection reagent or diagnostic kit of the present invention.
  • The second aspect of the present invention provides a diagnostic reagent or diagnostic kit for detecting the methylation status or methylation level of at least one CpG dinucleotide of one or more target markers described in any embodiment herein in order to diagnose benign and malignant thyroid nodules, the diagnostic reagent or diagnostic kit comprising reagents for detecting the methylation status or level of at least one CpG dinucleotide of the one or more target markers.
  • The diagnostic reagent or diagnostic kit includes primer and/or probe molecules, wherein the primer molecules are identical to, complementary to, or hybridize under stringent conditions to the one or more target markers and comprise at least 9 consecutive nucleotides; the probe molecules hybridize with the amplification product of the one or more target markers under stringent conditions; optionally, the diagnostic reagent or diagnostic kit also includes primer molecules and/or probe molecules for detecting the internal reference gene ACTB.
  • The diagnostic reagent or diagnostic kit also includes one or more substances selected from the following: PCR buffer, polymerase, dNTP, restriction endonuclease, enzyme cleavage buffer, fluorescent dyes, fluorescent quenchers, fluorescent reporters, exonuclease, alkaline phosphatase, internal standards, controls, KCl, MgCl₂ and (NH₄)₂SO₄.
  • The reagents for detecting methylation also include reagents used in one or more of the following methods: PCR based on bisulfite conversion, DNA sequencing, methylation-sensitive restriction endonuclease assays, fluorometric assays, methylation-sensitive high-resolution melting curves, chip-based methylation profiling, and mass spectrometry.
  • the reagent is selected from one or more of the following: bisulfite and derivatives thereof, fluorescent dyes, fluorescent quenchers, fluorescent reporters, internal standards, and controls.
  • A third aspect of the invention provides the use of at least one reagent or set of reagents, which distinguishes between methylated and unmethylated CpG dinucleotides in at least one target region of genomic DNA, in the preparation of a kit for a method of detecting and/or classifying thyroid nodules in an individual as benign or malignant, wherein the method comprises contacting genomic DNA isolated from a biological sample of the individual with the at least one reagent or set of reagents, wherein the target region is identical or complementary to a sequence of at least 16 contiguous nucleotides of one or more target markers according to any of the embodiments herein, and wherein the contiguous nucleotides comprise at least one CpG dinucleotide sequence, thereby at least partially providing for the detection and/or classification of thyroid nodules as benign or malignant.
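The requirement above, a stretch of at least 16 contiguous nucleotides that contains at least one CpG dinucleotide, can be illustrated with a short sketch. The helper names `cpg_positions` and `windows_with_cpg` and the toy sequence are hypothetical; the code only scans one strand of a plain DNA string and is not a primer or probe design tool.

```python
from typing import List, Tuple


def cpg_positions(seq: str) -> List[int]:
    """Return 0-based indices of the C in every CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def windows_with_cpg(seq: str, size: int = 16) -> List[Tuple[int, str]]:
    """Return (start, window) for every window of `size` bases holding a CpG."""
    s = seq.upper()
    return [(start, s[start:start + size])
            for start in range(len(s) - size + 1)
            if "CG" in s[start:start + size]]


# Toy sequence with a single CpG at index 20 (illustrative only).
toy = "A" * 20 + "CG" + "T" * 20
print(cpg_positions(toy))
print(len(windows_with_cpg(toy)))
```

A sequence shorter than the window size simply yields no windows.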
  • A fourth aspect of the present invention provides one or more reagents that convert cytosine bases unmethylated at position 5 to uracil or to another base that is detectably different from cytosine in hybridization properties, together with amplification enzymes.
  • In step b), the genomic DNA or fragments thereof are treated with a reagent selected from the group consisting of bisulfite, acid sulfite, pyrosulfite and combinations thereof.
  • The contacting or amplifying comprises using a thermostable DNA polymerase as the amplification enzyme, using a polymerase lacking 5'-3' exonuclease activity, using the polymerase chain reaction, and/or generating an amplification product carrying a detectable label.
  • The contacting or amplifying in c) comprises the use of methylation-specific primers.
  • A fifth aspect of the present invention provides the use of one or more methylation-sensitive restriction enzymes, an amplification enzyme, and at least one primer comprising at least 9 contiguous nucleotides in the preparation of a kit for a method of detecting and/or classifying thyroid nodules in an individual as benign or malignant, wherein the primer is identical to, complementary to, or hybridizes under stringent conditions to one or more target markers described in any embodiment herein; the method comprises: a) isolating genomic DNA from a biological sample of the individual; b) digesting the genomic DNA or fragments thereof from a) with the one or more methylation-sensitive restriction enzymes, and contacting the resulting digestion product with the amplification enzyme and the at least one primer; and c) determining the methylation status or level of at least one CpG dinucleotide of the one or more target markers based on the presence, absence, or nature of the amplification product, thereby at least partially detecting and/or classifying thyroid nodules in the individual as benign or malignant.
  • The presence or absence of the amplification product is determined by hybridization with at least one nucleic acid or peptide nucleic acid that is identical or complementary to a fragment, at least 16 bases long, of a sequence selected from the one or more target markers.
  • The sixth aspect of the present invention provides the use of a processed nucleic acid derived from one or more target markers described in any embodiment herein in the preparation of a kit for diagnosing benign and malignant thyroid nodules, wherein the processing is suitable for converting at least one unmethylated cytosine base of the one or more target markers to uracil or to another base that is detectably different from cytosine in hybridization properties.
  • The seventh aspect of the present invention provides a device for detecting and diagnosing benign and malignant thyroid nodules in individuals. The device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the following steps when executing the program: (1) obtaining the methylation level or methylation status of at least one CpG dinucleotide of one or more target markers described in any embodiment herein in the sample, and (2) judging whether the thyroid nodule is benign or malignant according to the methylation level or methylation status obtained in (1).
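The two device steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text here does not specify the model, so a logistic score is assumed; the marker names `marker_A` and `marker_B`, the weights, the intercept, and the 0.5 threshold are placeholders and not values from the application.

```python
import math
from typing import Mapping


def methylation_level(methylated_reads: int, total_reads: int) -> float:
    """Step (1): fraction of reads supporting methylation at one CpG."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return methylated_reads / total_reads


def malignancy_score(levels: Mapping[str, float],
                     weights: Mapping[str, float],
                     intercept: float) -> float:
    """Step (2), assumed form: logistic function of a weighted sum."""
    z = intercept + sum(w * levels[name] for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


def classify(score: float, threshold: float = 0.5) -> str:
    """Map a score to a label; the threshold is a placeholder."""
    return "malignant" if score >= threshold else "benign"


# Placeholder weights and read counts, for illustration only.
example_weights = {"marker_A": 2.0, "marker_B": -1.0}
example_levels = {
    "marker_A": methylation_level(5, 10),
    "marker_B": methylation_level(10, 10),
}
score = malignancy_score(example_levels, example_weights, intercept=0.0)
label = classify(score)
print(score, label)
```

In practice the weights and threshold would have to be fitted on a training set and checked on independent validation sets, as the ROC figures listed below suggest was done for the models in the Examples.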
  • Figure 1 ROC curves for the diagnosis of malignant nodules in the training set and two sets of validation set samples of the model constructed by the combination of markers in Example 1.
  • Figure 2 ROC curves for the diagnosis of malignant nodules in the training set and two sets of validation set samples of the model constructed by the combination of markers in Example 2.
  • Figure 3 ROC curves for the diagnosis of malignant nodules in the training set and two sets of validation set samples of the model constructed by the combination of markers in Example 3.
  • Figure 4 ROC curves for the diagnosis of malignant nodules in the training set and two sets of validation set samples of the model constructed by the combination of markers in Example 4.
  • Figure 5 ROC curves for the diagnosis of malignant nodules in the training set and two sets of validation set samples of the model constructed by the combination of markers in Example 5.
  • target markers related to malignant thyroid nodules include: PRDM16 gene or genomic PRDM16 sequence, CAMK2N1 gene or genomic CAMK2N1 sequence, TACSTD2 gene or genomic TACSTD2 sequence, CRABP2 gene or genome CRABP2 sequence, IER5 gene or genome IER5 sequence, ITPKB gene or genome ITPKB sequence, ITGB1BP1 gene or genome ITGB1BP1 sequence, MTHFD2 gene or genome MTHFD2 sequence, BIN1 gene or genome BIN1 sequence, DNASE1L3 gene or genome DNASE1L3 sequence, LSG1 gene or genome LSG1 sequence, SH3BP2 gene or genome SH3BP2 sequence, SLC12A7 gene or genome SLC12A7 sequence, NR2F1 gene or genome NR2F1 sequence, EGR1 gene or genome EGR1 sequence, LARP1 gene or genomic LARP1 sequence, RARS gene or genomic RARS sequence, TTBK1 gene or genomic TTBK1 sequence
  • The term "target marker" refers to a target nucleic acid or gene region whose methylation level indicates whether a thyroid nodule is benign or malignant.
  • The term "target marker" shall be considered to include all transcript variants of the genes described herein and all promoters and regulatory elements thereof.
  • Certain genes are known to exhibit allelic variation or single nucleotide polymorphisms ("SNPs") between individuals. SNPs include insertions and deletions of simple repeats (e.g., dinucleotide and trinucleotide repeats) of varying lengths. Accordingly, this application should be understood to extend to all forms of the marker/gene resulting from any other mutation, polymorphism or allelic variation.
  • The term "target marker" shall include both the sense strand sequence of the marker or gene and the antisense strand sequence of the marker or gene.
  • The term "target marker" as used herein is broadly interpreted to include both 1) the original marker (in a specific methylation state) found in a biological sample or genomic DNA, and 2) its processed sequence (for example, the corresponding region after bisulfite conversion or the corresponding region after methylation-sensitive restriction enzyme (MSRE) treatment).
  • The bisulfite-converted corresponding region differs from the target marker in the genomic sequence in that one or more unmethylated cytosine residues are converted to a uracil base, a thymine base, or another base that behaves differently from cytosine in hybridization.
  • The MSRE-treated corresponding region differs from the target marker in the genomic sequence by being cleaved at one or more MSRE cleavage sites.
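The effect of bisulfite conversion described above can be mimicked in silico. The sketch below is a simplification: it treats a single strand, assumes complete conversion, and writes converted cytosines as T (the base read after amplification of uracil). The function name `bisulfite_convert` and the toy sequence are illustrative.

```python
from typing import Iterable


def bisulfite_convert(seq: str, methylated: Iterable[int] = ()) -> str:
    """Convert every unmethylated C to T; methylated C positions are kept.

    `methylated` holds 0-based indices of cytosines protected by methylation.
    """
    protected = set(methylated)
    converted = []
    for index, base in enumerate(seq.upper()):
        if base == "C" and index not in protected:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


# Toy example: the C at index 1 is methylated, the others are not.
print(bisulfite_convert("ACGTCCGA", methylated={1}))
print(bisulfite_convert("ACGTCCGA"))
```

Comparing the converted sequence with the original shows which cytosines were protected, which is the basis for methylation-specific primers and probes.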
  • the molecular diagnosis in the present invention includes not only the early diagnosis of thyroid malignancy, but also the late diagnosis of thyroid malignancy, and also includes thyroid malignancy screening, risk assessment, prognosis, and disease identification.
  • Early diagnosis refers to the possibility of finding cancer before metastasis, preferably before morphological changes in tissue or cells are observable.
  • the chromosomal coordinates are consistent with the Hg19 version of the Human Genome Database released in February 2009 (referred to herein as "Hg19 coordinates"). It should be understood that the sequence of a gene and its genome described herein also includes fragments of each gene containing at least one CpG dinucleotide sequence. In some embodiments, the fragment is the target region of each gene described herein.
  • The Hg19 coordinates of each gene mentioned herein are as follows: PRDM16 gene: chr1:3155061:3155760; CAMK2N1 gene: chr1:20813203:20813902; TACSTD2 gene: chr1:59041615:59042314; CRABP2 gene: chr1:156676274:156676973; IER5 gene: chr1:181074539:181075238; ITPKB gene: chr1:226924700:226925399; ITGB1BP1 gene: chr2:9526804:9527503; MTHFD2 gene: chr2:74453839:74454538; BIN1 gene: chr2:127822196:127822895; DNASE1L3 gene: chr3:58153211:58153910; LSG1 gene: chr3:194408527:194409226; SH3
  • The EGR3 gene, NAV2 gene, TMC6 gene, C19orf77 gene and RTN4R gene may include the following two Hg19 coordinate regions:
  • EGR3 gene: chr8:22547976:22548675; chr8:22548391:22549090;
  • NAV2 gene: chr11:19734801:19735500; chr11:19735660:19736359;
  • TMC6 gene: chr17:76113226:76113925; chr17:76123392:76124091;
  • RTN4R gene: chr22:20226373:20227072; chr22:20226575:20227274.
  • The Hg19 coordinate regions of one or more target markers described herein are: PRDM16 gene: chr1:3155311:3155510; CAMK2N1 gene: chr1:20813453:20813652; TACSTD2 gene: chr1:59041865:59042064; CRABP2 gene: chr1:156676524:156676723; IER5 gene: chr1:181074789:181074988; ITPKB gene: chr1:226924950:226925149; ITGB1BP1 gene: chr2:9527054:9527253; MTHFD2 gene: chr2:74454089:74454288; BIN1 gene: chr2:127822446:127822645; DNASE1L3 gene: chr3:58153461:58153660; LSG1 gene: chr3:194408777:194408976; SH
  • the target marker of the present invention also includes 5 kb upstream of each start site and 5 kb downstream of each end site of each of the above regions.
  • Specific nucleotide sequences for the above Hg19 coordinates, as well as 5 kb upstream of each start site and 5 kb downstream of each end site for each region, are available in public databases such as the UCSC Genome Browser, Ensembl, and the NCBI website.
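Extending a region by 5 kb on each side, as described above, is simple interval arithmetic. The sketch below uses hypothetical helper names (`extend_region`, `to_position_string`) and clamps only the lower bound at position 1; clamping at the chromosome end would need chromosome lengths, which are not given here.

```python
from typing import Tuple


def extend_region(chrom: str, start: int, end: int,
                  flank: int = 5000) -> Tuple[str, int, int]:
    """Add `flank` bases upstream of start and downstream of end."""
    if start > end:
        raise ValueError("start must not exceed end")
    return chrom, max(1, start - flank), end + flank


def to_position_string(chrom: str, start: int, end: int) -> str:
    """Format as 'chr:start-end', a common genome-browser position style."""
    return f"{chrom}:{start}-{end}"


# PRDM16 region from the Hg19 list above, extended by 5 kb on each side.
extended = extend_region("chr1", 3155061, 3155760)
print(to_position_string(*extended))
```

The resulting string can be pasted into a genome browser set to the Hg19 assembly to retrieve the corresponding sequence.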
  • The target marker of the present invention (such as a certain gene and its genomic sequence, or a fragment of each gene containing at least one CpG dinucleotide sequence, or a sequence comprising an intergenic region) also includes the corresponding regions after non-enzymatic conversion (such as bisulfite conversion) and the corresponding regions obtained after enzymatic treatment (such as MSRE treatment).
  • the target markers of the present invention also include various variants of the above-mentioned genes.
  • Variants include nucleic acid sequences from the same region that have at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to a gene or region described herein (i.e., that have one or more deletions, insertions, substitutions, inversions, etc.). Accordingly, the disclosure of this application should be understood to extend to such variants which achieve the same result, notwithstanding the fact that there are minor genetic variations in the actual nucleic acid sequence between individuals.
  • The term "percentage (%) of sequence identity" refers to the percentage of amino acid (or nucleic acid) residues of the candidate sequence that are identical to the amino acid (or nucleic acid) residues of the reference sequence after sequence alignment, with gaps introduced, if necessary, to maximize the number of identical residues.
  • The sequence identity percentage (%) of an amino acid sequence (or nucleic acid sequence) can be calculated by dividing the number of amino acid residues (or bases) identical to the reference sequence by the number of amino acid residues (or bases) in the candidate sequence or reference sequence (whichever is shorter). Conservative substitutions of amino acid residues may or may not be considered to be the same residue.
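The calculation described above (identical residues divided by the length of the shorter sequence) can be written down directly for two sequences that have already been aligned. This sketch assumes the alignment was produced elsewhere and that gaps are written as `-`; the function name `percent_identity` and the toy alignment are illustrative.

```python
def percent_identity(aligned_candidate: str, aligned_reference: str) -> float:
    """Percent identity of two pre-aligned sequences (gaps written as '-').

    Identical positions are divided by the ungapped length of the shorter
    sequence, following the definition given in the text.
    """
    a = aligned_candidate.upper()
    b = aligned_reference.upper()
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    shorter = min(len(a.replace("-", "")), len(b.replace("-", "")))
    if shorter == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * matches / shorter


# Toy alignment: 8 identical positions, shorter sequence has 9 bases.
print(percent_identity("ACGT-ACGTA", "ACGTTACCTA"))
```

Conservative amino acid substitutions are counted as mismatches here; treating them as matches would require a substitution table.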
  • The percent amino acid (or nucleic acid) sequence identity can be determined, for example, using published tools such as BLASTN and BLASTp (available on the website of the National Center for Biotechnology Information (NCBI); see also Altschul S.F. et al., J. Mol. Biol., 215:403–410 (1990); Stephen F. et al., Nucleic Acids Res., 25:3389–3402 (1997)) and ClustalW2 (available on the website of the European Bioinformatics Institute; see also Higgins D.G. et al., Methods in Enzymology, 266:383-402 (1996); Larkin M.A.
  • The target markers of the present invention also include the corresponding regions of the 5 kb upstream of the start site and 5 kb downstream of the end site of the above-mentioned genes after non-enzymatic conversion (such as bisulfite conversion) or enzymatic treatment (such as methylation-sensitive restriction enzyme treatment).
  • the target marker can be from any biological sample of an individual of interest.
  • the term "subject” includes humans and non-human animals. Non-human animals include all vertebrates, such as mammals and non-mammals. “Subject” may also be a livestock such as cattle, pigs, sheep, poultry, and horses; or a rodent such as a rat, mouse; or a non-human primate such as an ape, monkey, rhesus monkey; or a domesticated Animals, such as dogs or cats.
  • the individual is a human or non-human primate.
  • the individual is a human. In this application, "individual”, “subject” and “subject” are used interchangeably.
  • sequences given in Section I above are human sequences.
  • the corresponding positions and corresponding sequences of the above-mentioned genes in the non-human animal genome can be easily determined by using existing technologies.
  • biological sample refers to a biological composition obtained or derived from an individual, comprising cells and/or other molecular entities (such as DNA) to be characterized or identified based on physical, biochemical, chemical and/or physiological characteristics.
  • a biological sample includes, but is not limited to, cells, tissues, organs and/or biological fluids of an individual obtained by any method known to those skilled in the art.
  • the biological sample is selected from the group consisting of histological sections, tissue biopsies, paraffin-embedded tissues, body fluids, surgical resection samples, isolated blood cells, cells isolated from blood, and any combination thereof.
  • the body fluid is selected from the group consisting of whole blood, serum, plasma, and any combination thereof.
  • the biological sample is whole blood of an individual.
  • the biological sample is plasma from an individual.
  • Various methods of preparing plasma from whole blood are known to those skilled in the art.
  • plasma is obtained by centrifuging whole blood from an individual one, two, three, four, five or more times.
  • the biological sample is a biopsy of a thyroid nodule, preferably a fine needle aspiration biopsy.
  • the DNA to be detected can be isolated from said biological sample.
  • the DNA to be detected can be isolated and purified from a biological sample by using various methods known in the art. Isolation and purification can be performed using commercially available kits. For example, DNA is isolated from cells and tissues by lysing the starting material under highly denaturing and reducing conditions, in part with the use of protein-degrading enzymes, purifying the resulting nucleic acid fraction by phenol/chloroform extraction, and recovering the nucleic acids from the aqueous phase by dialysis or ethanol precipitation (see, for example, Sambrook, J., Fritsch, E.F. and Maniatis, T., Molecular Cloning, CSH, 1989).
  • reagent systems that are particularly suitable for the purification of DNA fragments from agarose gels, the isolation of plasmid DNA from bacterial lysates, and the isolation of longer chains of nucleic acids (genomic DNA, total cellular RNA).
  • Many of these commercially available purification systems are based on the fairly well-known principle of binding nucleic acids to mineral supports in the presence of solutions of various chaotropic salts. In these systems, suspensions of finely ground glass powder, diatomaceous earth or silica gel are used as support materials.
  • Some other methods of isolating and purifying DNA from biological samples are described in, e.g., US7888006B2 and EP1626085A1. The choice between methods will be influenced by several factors, including time, expense, and the amount of DNA required.
  • the DNA contained in the biological sample comprises genomic DNA.
  • genomic DNA refers to DNA comprising the complete genome of a cell or organism as well as fragments or parts thereof.
  • Genomic DNA is a large stretch of DNA (e.g., longer than about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, or 300 kb) derived from an individual and may have natural modifications, such as DNA methylation .
  • the DNA contained in the biological sample includes cellular DNA.
  • cellular DNA refers to DNA present in a cell, or DNA obtained from a cell in vivo and isolated in vitro, or otherwise manipulated in vitro, so long as the DNA is not removed from the cell in vivo.
  • the DNA contained in the biological sample includes cell-free extracellular DNA.
  • the term “cell-free extracellular DNA” as used herein refers to DNA fragments present outside cells in vivo. The term may also be used to refer to DNA fragments obtained from an extracellular source in vivo and isolated or otherwise manipulated in vitro. DNA fragments in cell-free extracellular DNA usually have a length of about 100 to 200 bp, presumably related to the length of the DNA wrapped around a nucleosome.
  • Cell-free extracellular DNA includes, for example, cell-free extracellular fetal DNA and circulating tumor DNA.
  • cell-free extracellular fetal DNA circulates in the body (eg, blood) of pregnant women and represents the fetal genome, whereas circulating tumor DNA circulates in the body (eg, blood) of cancer patients.
  • cell-free extracellular DNA can be substantially free of the individual's cellular DNA.
  • the cell-free extracellular DNA can comprise less than about 1,000 ng/mL, less than about 100 ng/mL, less than about 10 ng/mL, less than about 1 ng/mL of cellular DNA.
  • Cell-free extracellular DNA can be prepared by using conventional techniques known in the art. For example, blood samples can be centrifuged at about 200-20,000 g, about 200-10,000 g, about 200-5,000 g, about 300-4,000 g, etc., for about 3-30 minutes, about 3-15 minutes, about 3-10 minutes, or about 3-5 minutes to obtain the cell-free extracellular DNA of the blood sample.
  • cell-free extracellular DNA from a blood sample can be obtained by centrifuging the individual's plasma or serum one, two, three, four, five or more times.
  • the biological sample may be obtained by microfiltration in order to separate cells and fragments thereof from cell-free fractions comprising soluble DNA.
  • microfiltration can be performed by using a filter, for example, a 0.1-0.45 micron membrane filter, such as a 0.22 micron membrane filter.
  • cell-free extracellular DNA is extracted from whole blood, serum, or plasma for analysis using commercially available DNA extraction products.
  • This extraction method is reported to provide high recovery (>50%) of circulating DNA, and some products (such as the QIAamp Circulating Nucleic Acid Kit from Qiagen) are reported to extract DNA fragments of small size.
  • Typical sample volumes used are 1-5 mL of serum or plasma.
  • cell-free extracellular DNA includes circulating tumor DNA.
  • Circulating tumor DNA (“ctDNA”) is fragmented DNA of tumor origin that is present in body fluids (e.g., blood, urine, saliva, sputum, feces, pleural fluid, cerebrospinal fluid, etc.) and is not associated with cells.
  • ctDNA is highly fragmented, with an average length of approximately 150 base pairs.
  • ctDNA typically comprises a very small fraction of cell-free extracellular DNA in bodily fluids such as plasma, eg ctDNA may constitute less than about 10% of plasma DNA. Typically, this percentage is less than about 1%, such as less than about 0.5% or less than about 0.01%.
  • the total amount of plasma DNA is usually very low, eg, about 10 ng/mL plasma.
  • the amount of ctDNA varies from person to person and depends on the type of tumor, its location and, in the case of cancerous tumors, the stage of the cancer.
  • ctDNA is usually very rare in body fluids and can only be detected by extremely sensitive and specific techniques. Detection of ctDNA may be useful in detecting and diagnosing tumors, guiding tumor-specific therapy, monitoring therapy, and monitoring cancer remission.
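  • To illustrate why highly sensitive techniques are needed, the figures above can be turned into a rough copy-number estimate. The sketch below assumes a mass of about 3.3 pg per haploid human genome (an approximation that is not stated in this application) and uses hypothetical function names; it is an order-of-magnitude illustration only.

```python
# Assumed approximate mass of one haploid human genome, in picograms.
HAPLOID_GENOME_PG = 3.3


def genome_copies_per_ml(dna_ng_per_ml: float) -> float:
    """Haploid genome equivalents per mL of plasma for a given DNA concentration."""
    return dna_ng_per_ml * 1000.0 / HAPLOID_GENOME_PG  # 1 ng = 1000 pg


def tumor_copies_per_ml(dna_ng_per_ml: float, ctdna_fraction: float) -> float:
    """Genome equivalents per mL that are expected to be of tumor origin."""
    return genome_copies_per_ml(dna_ng_per_ml) * ctdna_fraction


# About 10 ng/mL plasma DNA, with ctDNA fractions of 10%, 1%, 0.5% and 0.01%.
for fraction in (0.10, 0.01, 0.005, 0.0001):
    copies = tumor_copies_per_ml(10.0, fraction)
    print(f"ctDNA fraction {fraction:.2%}: about {copies:.1f} tumor genome copies per mL")
```

  • Under these assumptions, a 1% ctDNA fraction corresponds to only about 30 tumor-derived genome copies per mL of plasma.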
  • DNA methylation is the biological process of adding a methyl group to a DNA molecule (eg, to one or more cytosine bases of the DNA molecule) (eg, by the action of a DNA methyltransferase).
  • DNA methylation occurs at the 5 position of the cytosine in cytosine-phosphate-guanine (CpG) dinucleotides (i.e., “CpG sites”); when it occurs in 5'-CpG-3' dinucleotides in the promoter or first exon, it can lead to epigenetic inactivation of the gene. It has been well documented that DNA methylation plays an important role in regulating gene expression, tumorigenesis, and other genetic and epigenetic diseases.
  • methylated cytosine residue refers to a derivative of a cytosine residue in which a methyl group is attached to a carbon atom of the cytosine ring (eg, C5).
  • unmethylated cytosine residue refers to an underivatized cytosine residue to which, in contrast to a “methylated cytosine residue”, no methyl group is attached.
  • a CpG site in which cytosine residues are methylated is a methylated CpG site, and a CpG site in which cytosine residues are not methylated is an unmethylated CpG site .
  • conversions can occur between bases of DNA or RNA.
  • Conversion refers to the process of treating DNA by non-enzymatic or enzymatic methods so as to convert unmodified cytosine bases (cytosine, C) into bases that bind guanine (G) differently from cytosine, such as uracil bases (uracil, U).
  • Some reagents are able to distinguish between unmethylated and methylated CpG sites in DNA, resulting in processed DNA. This reagent acts selectively on unmethylated cytosine residues but not significantly on methylated cytosine residues.
  • the reagent may act selectively on methylated cytosine residues but not significantly on unmethylated cytosine residues.
  • some reagents can selectively convert unmethylated cytosine residues to uracil, thymine, or another base that hybridizes differently from cytosine, while methylated cytosine residues remain unconverted; as another example, some reagents can selectively cleave methylated residues, or selectively cleave unmethylated residues.
  • the original DNA is converted into processed DNA in a manner dependent on whether it is methylated or not, so that the processed DNA can be distinguished from the original DNA by its hybridization behavior.
  • processed DNA refers to DNA, nucleic acid sequences, and gene fragments that have been treated with a reagent capable of distinguishing between unmethylated and methylated CpG sites.
  • cytosine conversion can be performed using non-enzymatic or enzymatic methods.
  • non-enzymatic methods include bisulfite (hydrogen sulfite) treatment.
  • reagents used in non-enzymatic methods include bisulfite reagents.
  • bisulfite reagent refers to a reagent, such as those disclosed herein, that can be used to distinguish between methylated and unmethylated CpG dinucleotide sequences, including bisulfite, bisulfite ions, or any combination thereof.
  • the treatment of DNA with a bisulfite reagent is also described as a “bisulfite reaction” or “bisulfite treatment” and refers to a reaction in which, in the presence of bisulfite ions, unmethylated cytosine residues in nucleic acids are converted to uracil bases, thymine bases, or other bases that differ in hybridization behavior from cytosine, while methylated cytosine residues are not significantly converted.
  • bisulfite treatment can be used to distinguish methylated CpG dinucleotides from unmethylated CpG dinucleotides.
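  • The effect of bisulfite treatment on a sequence can be sketched in silico. The example below is a simplified model that assumes complete conversion of unmethylated cytosines and no conversion of methylated cytosines, and that writes the converted base as T (uracil is read as thymine after amplification); the function name and the input sequence are hypothetical.

```python
def bisulfite_convert(seq: str, methylated_positions: set) -> str:
    """Return the sequence as read after idealized bisulfite conversion and PCR.

    methylated_positions holds the 0-based indices of methylated cytosines.
    """
    converted = []
    for index, base in enumerate(seq.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")  # unmethylated C -> U, read as T after PCR
        else:
            converted.append(base)  # methylated C and all other bases are unchanged
    return "".join(converted)


sequence = "ACGTCCGA"  # hypothetical sequence with CpG cytosines at indices 1 and 5
print(bisulfite_convert(sequence, {1}))    # ACGTTTGA (only the methylated C survives)
print(bisulfite_convert(sequence, set()))  # ATGTTTGA (every C is converted)
```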
  • the statement that methylated cytosine residues are not significantly converted does not exclude that a very small percentage (e.g., less than 0.1%, less than 0.2%, less than 0.3%, less than 0.4%, less than 0.5%, less than 0.6%, less than 0.7%, less than 0.8%, less than 0.9%, less than 1%, less than 2%, less than 3%, less than 4%, less than 5%, less than 6%, less than 7%, less than 8%, less than 9%, less than 10%, less than 11%, less than 12%, less than 13%, less than 14%, less than 15%, less than 16%, less than 17%, less than 18%, less than 19%, less than 20%) of methylated cytosine residues is converted to uracil, thymine, or other bases that hybridize differently from cytosine, although it is intended that only unmethylated cytosine residues be converted.
  • the bisulfite reagent is selected from the group consisting of ammonium bisulfite, sodium bisulfite, potassium bisulfite, calcium bisulfite, magnesium bisulfite, aluminum bisulfite, bisulfite ions, and any combination thereof.
  • the bisulfite reagent is sodium bisulfite.
  • bisulfite reagents are commercially available, e.g., MethylCode™ Bisulfite Conversion Kit, EpiMark™ Bisulfite Conversion Kit, EpiJET™ Bisulfite Conversion Kit, EZ DNA Methylation-Gold™ Kit, and the like.
  • the bisulfite reaction is performed according to the kit's instructions.
  • Exemplary enzymatic methods include deaminase treatment, and the use of reagents that selectively cleave unmethylated residues but not methylated residues, or that selectively cleave methylated residues but not unmethylated residues.
  • the reagent is a methylation sensitive restriction enzyme (MSRE).
  • methylation-sensitive restriction enzyme refers to an enzyme that selectively digests a nucleic acid based on the methylation status of its recognition site. For restriction enzymes that specifically cleave when the recognition site is unmethylated or hemimethylated, cleavage does not occur, or occurs at a significantly reduced efficiency, when the recognition site is methylated. For restriction enzymes that specifically cleave when the recognition site is methylated, cleavage does not occur, or occurs at a significantly reduced efficiency, when the recognition site is not methylated.
  • the recognition sequence for a methylation-sensitive restriction enzyme contains a CG dinucleotide (e.g., cgcg or cccggg). In some embodiments, when the cytosine in the CG dinucleotide is methylated at the C5 carbon atom, the methylation-sensitive restriction enzyme does not cleave.
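  • The behavior of an enzyme that is blocked by CpG methylation can be sketched as follows. This is a simplified single-strand model that assumes a site is protected whenever any CG cytosine inside the recognition sequence is methylated; the function name and the test sequence are hypothetical, and CCGG (the HpaII recognition sequence) is used only as an example.

```python
def cleavable_sites(seq: str, site: str, methylated_positions: set) -> list:
    """Start indices of recognition sites that would still be cleaved.

    A site is treated as protected if any cytosine of a CG dinucleotide
    inside it is methylated (0-based indices in methylated_positions).
    """
    seq, site = seq.upper(), site.upper()
    cleaved = []
    start = seq.find(site)
    while start != -1:
        # Indices of cytosines that sit in a CG context within this site.
        cpg_cytosines = [
            start + offset
            for offset in range(len(site) - 1)
            if site[offset:offset + 2] == "CG"
        ]
        if not any(pos in methylated_positions for pos in cpg_cytosines):
            cleaved.append(start)
        start = seq.find(site, start + 1)
    return cleaved


dna = "AACCGGTTCCGGAA"  # hypothetical fragment with CCGG sites at indices 2 and 8
print(cleavable_sites(dna, "CCGG", set()))  # [2, 8]: both sites are cut
print(cleavable_sites(dna, "CCGG", {3}))    # [8]: the methylated site is protected
```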
  • Exemplary MSREs are selected from the group consisting of HpaII enzymes, SalI enzymes, Enzymes, ScrFI enzymes, BbeI enzymes, NotI enzymes, SmaI enzymes, XmaI enzymes, MboI enzymes, BstBI enzymes, ClaI enzymes, MluI enzymes, NaeI enzymes, NarI enzymes, PvuI enzymes, SacII enzymes, HhaI enzymes, and any combination thereof.
  • a methylation-sensitive restriction enzyme that distinguishes between methylated and unmethylated CpG dinucleotides in the region of interest, or a series of restriction enzyme reagents that includes a methylation-sensitive restriction enzyme, can be used to determine methylation, for example, but not limited to, by differential methylation hybridization (“DMH”).
  • DNA in a biological sample can be cleaved prior to treatment with a methylation-sensitive restriction enzyme.
  • Such methods are known in the art and may include both physical and enzymatic means. It is particularly preferred to use one or more restriction enzymes which are insensitive to methylation and whose recognition sites are AT-rich and do not contain CG dinucleotides. The use of such enzymes results in the preservation of CpG sites and CpG-rich regions in DNA fragments.
  • such restriction enzymes are selected from the group consisting of MseI enzyme, BfaI enzyme, Csp6I enzyme, Tru1I enzyme, Tru9I enzyme, MaeI enzyme, XspI enzyme, and any combination thereof.
  • the converted DNA is optionally purified.
  • DNA purification methods suitable for use herein are well known in the art.
  • the methylation status or methylation level is used to distinguish benign from malignant thyroid nodules.
  • the detection reagent and diagnostic kit of the present invention can be used for the detection of the methylation state or methylation level.
  • the "benign” and “malignant” denote properties of thyroid nodules.
  • benign nodules grow slowly, have uniform texture, good mobility, smooth surface, cystic changes, no lymphadenopathy, and no calcification. Malignancy manifests as uncontrolled growth, spread, and tissue infiltration of malignant cells.
  • Ultrasound signs that suggest that a thyroid nodule is malignant include: nodules that are taller than wide, lack of halos, microcalcifications, irregular borders, hypoechoic, solid nodules, and rich blood flow within the nodules.
  • the malignant thyroid nodule comprises thyroid cancer.
  • methylation status refers to the presence or absence of one or more methylated nucleotide bases in a nucleic acid molecule.
  • a nucleic acid molecule that contains methylated cytosines is considered methylated (eg, the methylation status of the nucleic acid molecule is methylated).
  • a nucleic acid molecule that does not contain any methylated nucleotides is considered unmethylated.
  • a nucleic acid may be characterized as “unmethylated” if it is not methylated at a particular locus (e.g., a locus of a particular single CpG dinucleotide) or a particular combination of loci, even if it is methylated at other loci of the same gene or molecule.
  • the methylation state describes the state of methylation of a nucleic acid (eg, a genomic sequence or a marker of interest as described herein).
  • methylation status refers to a characteristic of a nucleic acid segment at a particular genomic locus that is associated with methylation. Such characteristics include, but are not limited to, whether any cytosine (C) residues within the DNA sequence are methylated, the location of one or more methylated C residues, the frequency or percentage of methylated C throughout any particular region of the nucleic acid, and allelic differences in methylation due to, for example, differences in allelic origin.
  • Methylation status also refers to the relative concentration, absolute concentration or pattern of methylated C or unmethylated C throughout any particular region of nucleic acid in a biological sample. For example, if one or more cytosine (C) residues within a nucleic acid sequence are methylated, the sequence may be said to be “hypermethylated” or to have “increased methylation”, whereas if one or more cytosine (C) residues within the DNA sequence are unmethylated, it can be said to be “demethylated” or to have “reduced methylation”.
  • if one or more cytosine (C) residues within a nucleic acid sequence are methylated compared to another nucleic acid sequence (e.g., from a different region or from a different individual, etc.), the sequence is considered to be hypermethylated or to have increased methylation compared to the other nucleic acid sequence.
  • if one or more cytosine (C) residues within a DNA sequence are unmethylated compared to another nucleic acid sequence (e.g., from a different region or from a different individual, etc.), the sequence is considered to be demethylated or to have reduced methylation compared to the other nucleic acid sequence.
  • the methylation level represents the proportion (or percentage, fraction, ratio, degree) of one or more sites in the methylation state.
  • the methylation level of a region (or group of sites) is the average of the methylation levels of all sites in the region (or all sites in the group). Therefore, an increase or decrease in the methylation level of a region does not mean that the methylation level of all methylated sites in the region is increased or decreased.
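  • The definitions above can be illustrated with a short calculation. The sketch assumes that, for each CpG site, the numbers of methylated and total sequencing reads are available; the read counts and function names are hypothetical and are not taken from this application.

```python
def site_level(methylated_reads: int, total_reads: int) -> float:
    """Methylation level of one site: fraction of reads that are methylated."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return methylated_reads / total_reads


def region_level(sites: list) -> float:
    """Methylation level of a region: mean of the levels of all its sites."""
    levels = [site_level(m, t) for m, t in sites]
    return sum(levels) / len(levels)


# (methylated reads, total reads) for three CpG sites in two hypothetical samples.
sample_a = [(8, 10), (2, 10), (5, 10)]
sample_b = [(9, 10), (1, 10), (8, 10)]

print(region_level(sample_a))  # regional level of sample A (0.5)
print(region_level(sample_b))  # regional level of sample B (about 0.6)
```

  • In this example the regional level is higher in sample B even though the second site is less methylated in B than in A, which illustrates that a change in the level of a region need not apply to every site in it.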
  • the process of converting the results obtained by methods for detecting DNA methylation (such as reduced representation bisulfite sequencing (RRBS)) into methylation levels is known in the art. Methylation levels can be determined, for example, by quantitative analysis of the amount of intact DNA present after restriction digestion with a methylation-sensitive restriction enzyme.
  • the methylation level as in the above example can be used as a quantitative indicator of methylation status. This is especially useful when the methylation levels of sequences in a sample need to be compared to a threshold level.
  • the methylation level (eg, Ct value) of a marker of interest is increased or decreased when compared to a reference level.
  • the methylation marker level (eg, Ct value) meets a certain threshold, the thyroid nodule is identified as malignant.
  • a mathematical analysis of the methylation levels of the target markers can be performed to obtain a score. For the detected samples, when the score is greater than or less than the threshold, the result is determined to be positive, that is, the thyroid nodule is malignant.
  • a support vector machine (SVM) model is constructed on the training set samples, and the model is used to compute prediction scores for the test set samples and to calculate the accuracy, sensitivity and specificity of the test results, as well as the area under the receiver operating characteristic (ROC) curve (AUC) of the predicted values.
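  • The evaluation metrics named above can be computed from prediction scores as sketched below. The example does not reproduce the model of this application: the scores are hypothetical values that could come from any classifier, a score above the threshold is assumed to mean "malignant", and the function names are illustrative.

```python
def confusion_metrics(labels: list, scores: list, threshold: float) -> dict:
    """Sensitivity, specificity and accuracy; label 1 = malignant, 0 = benign."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score > threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
    }


def roc_auc(labels: list, scores: list) -> float:
    """AUC as the probability that a malignant sample outscores a benign one."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 1, 0, 0, 0]              # hypothetical test-set truth
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]  # hypothetical prediction scores
print(confusion_metrics(labels, scores, threshold=0.45))
print(roc_auc(labels, scores))
```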
  • the methylation level/state of one or more CpG dinucleotide sequences within a DNA sequence can be determined by various analytical methods known in the art, preferably quantitative analytical methods.
  • Exemplary assays include: polymerase chain reaction, including real-time polymerase chain reaction, digital polymerase chain reaction, and bisulfite conversion-based PCR (e.g., methylation-specific PCR, MSP); nucleic acid sequencing; whole-genome methylation sequencing; reduced representation bisulfite sequencing (RRBS); mass-based separation (e.g., electrophoresis, mass spectrometry); target capture (e.g., hybridization, microarray); methylation-sensitive restriction enzyme assays; methylation-sensitive high-resolution melting curves; chip-based methylation profiling; mass spectrometry.
  • detection includes detection of either strand at a gene or locus.
  • quantitative analysis is performed by real-time PCR.
  • Examples of real-time PCR include HeavyMethyl™ PCR described by Cottrell et al., Nucl. Acids Res. 32:e10, 2003; MethyLight™ PCR described by Eads et al., Cancer Res. 59:2302-2306, 1999; and Headloop PCR as described by Rand et al., Nucl. Acids Res. 33:e127, 2005.
  • HeavyMethyl™ PCR refers to an art-recognized real-time PCR technique in which one or more non-extendable nucleic acid (e.g., oligonucleotide) blockers bind to bisulfite-treated nucleic acids in a methylation-specific manner (i.e., the blocker binds specifically to unmutated DNA under conditions of moderate to high stringency).
  • the amplification reaction is carried out using one or more primers which may optionally be methylation-specific but which flank the one or more blockers.
  • the blocker binds and no PCR product is produced.
  • the level of methylation of nucleic acids in a sample is determined using a TaqMan TM assay essentially as described, eg, by Holland et al., Proc. Natl. Acad. Sci. USA, 88:7276-7280, 1991.
  • This assay refers to an art-recognized fluorescence-based real-time PCR technique in which dual-labeled fluorescent oligonucleotide probes, called TaqMan™ probes, are employed and designed to hybridize to CpG-rich sequences located between the forward and reverse amplification primers.
  • the TaqMan™ probes comprise a fluorescent “reporter moiety” and a “quencher moiety” covalently bound to a linker moiety (e.g., phosphoramidite) attached to the nucleotides of the TaqMan™ oligonucleotide.
  • TaqMan TM probes that hybridize to CpG-rich sequences are cleaved by the 5' nuclease activity of Taq polymerase, resulting in a signal that is detected in real-time during the PCR reaction.
  • molecular beacons can be used as detectable probes, and the system is independent of the 5'-3' exonuclease activity of the DNA polymerase used (see Mhlanga and Malmberg, Methods 25: 463-471, 2001).
  • Headloop PCR refers to an art-recognized type of real-time PCR that selectively amplifies a target nucleic acid but suppresses the amplification of non-target variants by extending a 3' stem-loop to form a hairpin that does not provide further template for amplification.
  • the real-time PCR is multiplex real-time PCR.
  • the term “multiplex” may refer to assays or other analytical methods that can simultaneously determine the presence and/or amount of multiple markers (e.g., multiple nucleic acid sequences) by using more than one marker, each having at least one distinct detection characteristic, such as a fluorescence characteristic (e.g., excitation wavelength, emission wavelength, emission intensity, FWHM (full width at half maximum) or fluorescence lifetime) or a unique nucleic acid or protein sequence characteristic.
  • quantitative analysis is performed by nucleic acid sequencing.
  • Exemplary methods of nucleic acid sequencing are known in the art, see, e.g., Frommer et al., Proc. Natl. Acad. Sci. USA 89:1827-1831, 1992; Clark et al., Nucl. Acids Res. 22: 2990-2997,1994.
  • comparison of the sequence obtained from a sample that was not treated with bisulfite, or of the known nucleotide sequence of the target region, with the sequence obtained from a sample that was treated with bisulfite helps to identify methylated cytosines in the DNA sequence.
  • Thymine residues detected at a cytosine site in bisulfite-treated samples, as compared to untreated samples, can be considered mutations caused by bisulfite treatment, i.e., they indicate an unmethylated cytosine at that site, whereas a retained cytosine indicates a methylated cytosine.
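  • The comparison between an untreated (or reference) sequence and a bisulfite-treated read can be sketched as follows. The example assumes that the two sequences are already aligned base for base on the same strand and that conversion was complete; the function name and sequences are hypothetical.

```python
def call_cpg_methylation(reference: str, treated_read: str) -> dict:
    """Call each CpG cytosine of the reference from a bisulfite-treated read.

    Returns a dict mapping the 0-based index of the CpG cytosine to a call.
    """
    if len(reference) != len(treated_read):
        raise ValueError("sequences must be aligned to the same length")
    reference, treated_read = reference.upper(), treated_read.upper()

    calls = {}
    for index in range(len(reference) - 1):
        if reference[index:index + 2] != "CG":
            continue  # only cytosines in a CpG context are evaluated
        observed = treated_read[index]
        if observed == "C":
            calls[index] = "methylated"    # cytosine was protected from conversion
        elif observed == "T":
            calls[index] = "unmethylated"  # cytosine was converted and read as T
        else:
            calls[index] = "ambiguous"     # sequencing error or true variant
    return calls


reference = "ACGTCCGA"  # untreated sequence, CpG cytosines at indices 1 and 5
treated = "ACGTTTGA"    # hypothetical read after bisulfite treatment
print(call_cpg_methylation(reference, treated))
# {1: 'methylated', 5: 'unmethylated'}
```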
  • Methods for sequencing DNA include, for example, the dideoxy chain termination method or the Maxam-Gilbert method (see Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd Ed., CSHP, New York 1989)), pyrosequencing (see Uhlmann et al., Electrophoresis, 23:4072-4079, 2002), solid-phase pyrosequencing (see Landegren et al., Genome Res., 8(8):769-776, 1998), solid-phase microsequencing (see, e.g., Southern et al., Genomics, 13:1008-1017, 1992), microsequencing using FRET (see, e.g., Chen and Kwok, Nucleic Acids Res. 25:347-353, 1997), sequencing by ligation, or ultra-deep sequencing (see Margulies et al., Nature 437(7057):376-80 (2005)).
  • quantitative analysis is performed by mass-based separation (eg, electrophoresis, mass spectrometry).
  • combined bisulfite restriction analysis (COBRA)
  • This method utilizes the differences in restriction enzyme recognition sites between methylated and unmethylated nucleic acids following treatment with compounds that selectively mutate unmethylated cytosine residues (e.g., bisulfite).
  • the restriction endonuclease TaqI cuts the sequence TCGA, which after bisulfite treatment of unmethylated nucleic acids will be TTGA and thus will not be cut. Digested and/or undigested nucleic acids are then detected using detection means known in the art, such as electrophoresis and/or mass spectrometry.
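  • The TaqI example above can be sketched as an in silico digest. The sketch assumes that the amplicons shown are already bisulfite-converted and amplified, that TaqI cuts between the T and the C of TCGA, and that only one strand is modelled; the amplicon sequences and the function name are hypothetical.

```python
def digest(seq: str, site: str = "TCGA", cut_offset: int = 1) -> list:
    """Split seq at every occurrence of site, cutting cut_offset bases into the site."""
    seq = seq.upper()
    fragments = []
    fragment_start = 0
    position = seq.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(seq[fragment_start:cut])
        fragment_start = cut
        position = seq.find(site, position + 1)
    fragments.append(seq[fragment_start:])  # remainder after the last cut
    return fragments


# Hypothetical amplicons after bisulfite conversion and PCR.
from_methylated = "AATTGTCGAGGTTA"    # CpG was methylated, so TCGA is retained
from_unmethylated = "AATTGTTGAGGTTA"  # CpG was unmethylated, so TCGA became TTGA

print([len(f) for f in digest(from_methylated)])    # [6, 8]: amplicon is cut
print([len(f) for f in digest(from_unmethylated)])  # [14]: amplicon stays intact
```

  • The two fragment patterns are what would then be distinguished by electrophoresis or mass spectrometry.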
  • methylation-specific single-strand conformation analysis (MS-SSCA)
  • methylation-specific denaturing gradient gel electrophoresis (MS-DGGE)
  • methylation-specific denaturing high-performance liquid chromatography (MS-DHPLC)
  • target capture (e.g., hybridization, microarray)
  • Suitable detection methods by hybridization are known in the art, such as Southern blot, dot blot, slot blot or other means of nucleic acid hybridization (Kawai et al., Mol. Cell. Biol. 14:7421-7427, 1994; Gonzalgo et al., Cancer Res. 57:594-599, 1997).
  • probes for hybridization analysis are detectably labeled.
  • nucleic acid-based probes used in hybridization assays are unlabeled.
  • Such unlabeled probes can be immobilized on a solid support, such as a microarray, and can hybridize to detectably labeled target nucleic acid molecules.
  • a microarray is a methylation-specific microarray, which can be used to distinguish sequences with converted cytosine residues from sequences with non-converted cytosine residues (see Adorjan et al., Nucl. Acids Res. , 30:e21, 2002).
  • Hybridization-based analysis can also be used on nucleic acids after treatment with methylation-sensitive restriction enzymes.
  • the methylation status of CpG dinucleotide sequences within a DNA sequence can be determined by oligonucleotide probes that hybridize to bisulfite-treated DNA simultaneously with PCR amplification primers (wherein the primers may be methylation-specific primers or standard primers).
  • detection reagent is a reagent used to detect the presence, absence or amount of nucleic acid in a quantitative analysis step.
  • detection reagent is selected from the group consisting of fluorescent probes, intercalating dyes, chromophore-labeled probes, radioisotope-labeled probes, and biotin-labeled probes.
  • the quantitative analysis comprises amplifying the treated DNA using a quantitative primer pair and a DNA polymerase.
  • the term "quantitative primer pair" refers to one or more primer pairs used in a quantitative analysis step.
  • the quantitative primer pair is capable of hybridizing to at least 9 consecutive nucleotides of the processed DNA under stringent conditions, moderately stringent conditions or highly stringent conditions.
  • the quantitative analysis comprises determining the methylation levels of one or more markers of interest based on the presence or level of a plurality of CpG dinucleotides, TpG dinucleotides, or CpA dinucleotides in the processed DNA. In some embodiments, the quantitative analysis comprises determining the level of methylation of cytosine residues based on the presence or level of one or more CpG dinucleotides in the processed DNA. In some embodiments, said quantitative analysis comprises determining the level of methylation of cytosine residues based on the presence or level of one or more TpG dinucleotides in said processed DNA. In some embodiments, said quantitative analysis comprises determining the level of methylation of cytosine residues based on the presence of CpA dinucleotides in said processed DNA.
  • the quantifying step is performed by separating the processed DNA product into fractions.
  • a plurality of different quantitative assays are performed on a plurality of fractions, wherein said processed DNA product (if present in said fraction) is quantified in each of the plurality of fractions in different combinations.
  • the control markers in each fraction are quantified.
  • the methylation level of each marker of interest is quantified separately based on the pre-amplified DNA by using MSP (see Herman, supra). For example, by using one or more primers that specifically hybridize to non-converted sequences under conditions of moderate and/or high stringency, amplification products are generated only when the template contains methylated cytosines at CpG sites.
  • the quantitative primer pair is designed to amplify at least a portion of the processed DNA product, ie the quantitative analysis is designed as a nested PCR.
  • Nested PCR is a modification of PCR designed to increase sensitivity and specificity. Nested PCR involves the use of two primer sets and two consecutive PCR reactions. A first round of amplification is performed to generate the first amplicon, and a second round of amplification is performed using a primer pair in which one or both primers anneal to a site within the region bounded by the initial primer pair, i.e., the second primer pair is said to be “nested” within the first primer pair. In this way, background amplification products from the first PCR reaction that do not contain the correct internal sequence are not further amplified in the second PCR reaction.
  • the PCR reaction solution includes Taq DNA polymerase, PCR buffer, primers, probes, dNTPs, and Mg²⁺.
  • the Taq DNA polymerase is a hot-start Taq DNA polymerase.
  • the final concentration of Mg²⁺ is 1.0-20.0 mM; the concentration of each primer is 100-500 nM; the concentration of each probe is 100-500 nM.
  • Exemplary PCR reaction conditions are: pre-denaturation at 95°C for 5 minutes; denaturation at 95°C for 15s, annealing and extension at 60°C for 60s, 50 cycles.
  • the methods of the invention include a pre-amplification step.
  • One of the purposes of preamplifying a marker of interest is to increase the amount of the marker of interest in the processed DNA.
  • the term “amplification” refers generally to any process capable of resulting in an increase in the copy number of a molecule or group of related molecules.
  • “Amplification” when applied to a polynucleotide molecule refers to the production of multiple copies of a polynucleotide molecule, or multiple copies of a portion of a polynucleotide molecule, usually starting from a small number of polynucleotides, wherein the amplified substance (amplicon, PCR amplicon) is usually detectable.
  • Amplification of polynucleotides encompasses multiple chemical and enzymatic processes. Formats of amplification include the polymerase chain reaction (PCR, including reverse transcription PCR), strand displacement amplification (SDA), transcription-mediated amplification (TMA), nucleic acid sequence-based amplification (NASBA), and the ligase chain reaction (LCR), each of which generates multiple copies of DNA from one or a few copies of a template RNA or DNA molecule.
  • the target markers in the processed DNA can be preamplified with preamplification primers.
  • the term "primer" refers to a single-stranded oligonucleotide capable of acting, under suitable conditions (for example, in the presence of four different nucleoside triphosphates and a polymerization agent such as DNA polymerase), as the starting point for template-directed DNA synthesis.
  • the length of the primer depends, for example, on the intended use of the primer, and typically ranges from 15 to 30 nucleotides. Short primer molecules generally require cooler temperatures to form sufficiently stable hybrid complexes with the template. Primers do not have to reflect the exact sequence of the template, but must be sufficiently complementary to hybridize to the template.
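The link between primer length and annealing temperature can be illustrated with the Wallace rule, a rough textbook approximation (Tm ≈ 2 °C per A/T plus 4 °C per G/C). It is a swapped-in rule of thumb, not a method described in this application, it is only a coarse guide toward the short end of the 15 to 30 nucleotide range, and the primer sequences below are hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Estimate melting temperature (deg C) with the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C). A rough approximation for short
    oligonucleotides; it ignores salt and primer concentration.
    """
    seq = primer.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count


# Hypothetical primers: a 15-mer and a 20-mer extending it.
short_primer = "GATTACAGATTACAG"
long_primer = "GATTACAGATTACAGGATTA"

print(len(short_primer), wallace_tm(short_primer))
print(len(long_primer), wallace_tm(long_primer))
```

Under this rule the shorter primer has the lower estimated Tm, which is why it needs a cooler annealing temperature to form a stable hybrid with the template.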
  • the primer site is the region on the template to which the primer hybridizes.
  • a primer pair is a set of primers that includes a 5' forward primer that hybridizes to the 5' end of the sequence to be amplified and a 3' reverse primer that hybridizes to the complementary strand at the 3' end of the sequence to be amplified.
  • Those skilled in the art can design primers according to the markers to be amplified based on common knowledge in the art (see, for example, PCR Primer: A Laboratory Manual, Cold Spring Harbor Laboratories, NY, 1995).
  • several software packages for designing optimal probes and/or primers for use in a wide variety of assays are publicly available, for example, from the Center for Genome Research, Cambridge, MA, USA.
  • a primer designed for the purposes of the present invention may include at least one CpG site, or an amplification product obtained from the primer may include at least one CpG site.
  • Tools for designing primers for detecting DNA methylation status are also known in the art, such as MethPrimer (Li LC and Dahiya R. MethPrimer: designing primers for methylation PCRs. Bioinformatics. 2002 Nov; 18(11): 1427-31) .
  • any target marker (or at least a portion of the target marker or a subregion of the target marker) in the processed DNA can be pre-amplified by using the pre-amplification primers as a primer pool.
  • complementary refers to hybridization or base pairing between nucleotides or nucleic acids, for example, between the two strands of a double-stranded DNA molecule, or between an oligonucleotide primer and the primer binding site on a single-stranded nucleic acid to be sequenced or amplified.
  • Complementary nucleotides are usually A and T (or A and U), or C and G.
  • two single-stranded RNA or DNA molecules are said to be complementary when the nucleotides of one strand, optimally aligned and compared with appropriate nucleotide insertions or deletions, pair with at least about 80% (usually at least about 90% to 95%, more preferably about 98% to 100%) of the nucleotides of the other strand.
  • complementarity exists when a strand of RNA or DNA hybridizes to its complement under selective hybridization conditions.
  • selective hybridization will occur when there is at least about 65% (preferably at least about 75%, more preferably at least about 90%) complementarity over a stretch of at least 14 to 25 nucleotides. See M. Kanehisa, Nucleic Acids Res. 12:203 (1984), incorporated herein by reference.
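The percentage thresholds above can be made concrete with a small calculation. This is a simplified sketch under stated assumptions: the two strands are the same length and are compared antiparallel without gaps (the insertions and deletions allowed in the text are not modelled), and the function name and example sequences are hypothetical.

```python
# Watson-Crick pairs, including A-U for RNA strands.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def percent_complementarity(strand_a: str, strand_b: str) -> float:
    """Percent of positions that base-pair when strand_b is read antiparallel.

    Both strands are written 5'->3'. Assumes equal length and no gaps.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have equal length in this sketch")
    if not strand_a:
        raise ValueError("strands must not be empty")
    antiparallel_b = strand_b.upper()[::-1]
    paired = sum((a, b) in PAIRS for a, b in zip(strand_a.upper(), antiparallel_b))
    return 100.0 * paired / len(strand_a)


print(percent_complementarity("ATGCATGCAT", "ATGCATGCAT"))  # fully complementary
print(percent_complementarity("ATGCATGCAT", "ATGCATGCAA"))  # one mismatch
```

A result can then be compared with the cut-offs quoted in the text, for example at least about 80% for complementarity or at least about 65% over 14 to 25 nucleotides for selective hybridization.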
  • the pool of preamplification primers comprises at least one methylation-specific primer pair. In some embodiments, the pool of preamplification primers comprises a plurality of methylation-specific primer pairs. In some embodiments, the preamplification step is performed by methylation-specific PCR ("MSP"), which is PCR using methylation-specific primers.
  • methylation-specific primer pair refers to a primer pair specifically designed to recognize CpG sites to exploit differences in methylation to amplify a specific marker of interest in processed DNA. Primers work only on molecules with or without a specific methylation state.
  • a primer can be an oligonucleotide that, under stringent conditions, moderately stringent conditions, or highly stringent conditions, can specifically hybridize in a methylation-specific manner to a specific methylated CpG site, but does not hybridize to that specific CpG site when it is unmethylated.
  • the primers will specifically amplify target markers that have methylation at specific CpG sites.
  • the primer can be an oligonucleotide that can specifically hybridize to a specific unmethylated CpG site in a methylation-specific manner under stringent conditions, moderately stringent conditions or highly stringent conditions, but does not hybridize to the methylated specific CpG site.
  • the primers will specifically amplify target markers that are not methylated at specific CpG sites.
  • methylated and unmethylated CpG sites can be distinguished using methylation-specific primers in the preamplification of at least one target marker within the processed DNA.
  • the methylation-specific primer pairs of the present application comprise at least one primer that hybridizes to a bisulfite-treated CpG dinucleotide.
  • the sequence of the primer specific for methylated DNA comprises at least one CpG dinucleotide, and the sequence of the primer specific for unmethylated DNA comprises a "T" at the C position of the CpG and/or an "A" at the G position of the CpG.
  • a pair of methylation-specific primers typically comprises a forward primer and a reverse primer, each comprising an oligonucleotide sequence that hybridizes to one of the target markers (or a subregion of the target marker), wherein the nucleic acid comprises at least one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) CpG site.
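Why a methylated-specific primer keeps "CG" while an unmethylated-specific primer carries "TG" can be seen by simulating bisulfite conversion of one strand. This is an illustrative sketch only: the sequence is hypothetical, the function names are not from this application, and conversion is assumed to be complete (every unmethylated cytosine reads as T after amplification, every methylated cytosine stays C).

```python
def cpg_positions(seq: str) -> list:
    """Return 0-based indices of the C in every CpG dinucleotide."""
    upper = seq.upper()
    return [i for i in range(len(upper) - 1) if upper[i:i + 2] == "CG"]


def bisulfite_convert(seq: str, methylated=frozenset()) -> str:
    """Simulate complete bisulfite conversion of one strand.

    `methylated` holds 0-based indices of methylated cytosines, which are
    protected; every other C is read as T after conversion and PCR.
    """
    converted = []
    for index, base in enumerate(seq.upper()):
        if base == "C" and index not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical target with two CpG sites and one non-CpG cytosine.
target = "ATCGGACGTTCA"
sites = cpg_positions(target)

methylated_template = bisulfite_convert(target, methylated=frozenset(sites))
unmethylated_template = bisulfite_convert(target)

print(sites)
print(methylated_template)    # CpG sites keep "CG"
print(unmethylated_template)  # CpG sites become "TG"
```

A primer written against the first output matches only templates that were methylated at those CpG sites, and a primer written against the second matches only unmethylated templates; the non-CpG cytosine converts in both cases.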
  • hybridization may refer to a process in which two single-stranded polynucleotides associate non-covalently to form a stable double-stranded polynucleotide.
  • the resulting double-stranded polynucleotide can be a "hybrid" or "duplex."
  • Salt concentrations in “hybridization conditions” are generally less than about 1 M, often less than about 500 mM and can be less than about 200 mM.
  • “Hybridization buffer” includes buffered saline solutions, such as 5x SSPE, or other such buffers known in the art.
  • Hybridization temperatures can be as low as 5°C, but are typically above 22°C, and more typically above about 30°C, and often above 37°C.
  • Hybridization is typically performed under stringent conditions, i.e., conditions under which a sequence will hybridize to its target sequence but to no other noncomplementary sequences. Stringent conditions are sequence dependent and will be different in different circumstances. For example, longer fragments may require higher hybridization temperatures than short fragments for specific hybridization. Since other factors may affect the stringency of hybridization, including base composition and length of complementary strands, the presence of organic solvents, and the degree of base mismatching, the combination of parameters is more important than the absolute measurement of any one parameter alone.
  • the Tm (melting point) can be the temperature at which half of a population of double-stranded nucleic acid molecules are separated into single strands.
  • hybrid stability is a function of ion concentration and temperature.
  • hybridization reactions are performed under less stringent conditions, followed by washes of varying but higher stringency.
  • Exemplary stringent conditions include a pH of about 7.0 to about 8.3, a temperature of at least 25°C, and a sodium ion (or other salt) concentration of at least 0.01M to no more than 1M.
  • for example, conditions of 5x SSPE (750 mM NaCl, 50 mM sodium phosphate, 5 mM EDTA, pH 7.4) and a temperature of about 30°C are suitable for allele-specific hybridization, although suitable temperatures depend on the length and/or GC content of the hybridization region.
  • the "stringency of hybridization" for determining the percentage of mismatches can be as follows: 1) high stringency: 0.1x SSPE, 0.1% SDS, 65°C; 2) medium stringency (also known as moderate stringency): 0.2x SSPE, 0.1% SDS, 50°C; 3) low stringency: 1.0x SSPE, 0.1% SDS, 50°C. It is understood that the same stringency can be achieved using alternative buffers, salts and temperatures.
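The salt concentrations behind these stringency levels follow from simple dilution of the 5x SSPE composition quoted in this description (750 mM NaCl, 50 mM sodium phosphate, 5 mM EDTA). The sketch below assumes linear scaling with the "x" factor; the function name is hypothetical.

```python
# Composition of 5x SSPE as stated in the text, in mM.
SSPE_5X_MM = {"NaCl": 750.0, "sodium phosphate": 50.0, "EDTA": 5.0}


def sspe_composition(fold: float) -> dict:
    """Return component concentrations (mM) for an SSPE buffer at `fold` x."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return {name: conc * fold / 5.0 for name, conc in SSPE_5X_MM.items()}


# The three wash stringencies listed above.
for label, fold in [("high", 0.1), ("medium", 0.2), ("low", 1.0)]:
    nacl = sspe_composition(fold)["NaCl"]
    print(f"{label} stringency: {fold}x SSPE, about {nacl:.0f} mM NaCl")
```

Lower salt destabilizes mismatched duplexes, so the 0.1x wash (about 15 mM NaCl) is the most stringent of the three even before the higher temperature is taken into account.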
  • moderately stringent hybridization can refer to conditions that allow a nucleic acid molecule (eg, a probe) to bind a complementary nucleic acid molecule.
  • Hybridizing nucleic acid molecules typically have at least 60% identity, including, for example, at least 70%, 75%, 80%, 85%, 90%, or 95% identity.
  • Moderately stringent conditions can be conditions equivalent to the following: hybridization at 42°C in 50% formamide, 5x Denhardt's solution, 5x SSPE, 0.2% SDS, followed by washing at 42°C in 0.2x SSPE, 0.2% SDS.
  • Highly stringent conditions can be provided by, for example, 42°C, 50% formamide, 5x Denhardt's solution, 5x SSPE, 0.2% SDS for hybridization, followed by 65°C, 0.1x SSPE and 0.1% SDS for washing.
  • Low stringency hybridization may be equivalent to the following conditions: 22°C, 10% formamide, 5x Denhardt's solution, 6x SSPE, 0.2% SDS, followed by washing in 1x SSPE, 0.2% SDS at 37°C.
  • Denhardt's solution contains 1% polysucrose, 1% polyvinylpyrrolidone and 1% bovine serum albumin (BSA).
  • 20x SSPE Sodium Chloride, Sodium Phosphate, EDTA
  • the pool of preamplification primers also includes a control primer pair for amplifying a control marker.
  • a control marker is a nucleic acid of known characteristics (eg, known sequence, known copy number per cell) for comparison to an experimental target (eg, nucleic acid of unknown concentration).
  • a control can be an endogenous, preferably invariant gene, against which the test or target nucleic acid under analysis can be normalized. Such controls normalize for inter-sample variability that may arise, for example, in sample handling and assay efficiency, and allow accurate inter-sample data comparison and quantitative analysis of amplification efficiency and bias.
  • the present invention uses RRBS technology to detect the methylation level of the CpG sites of the target marker of interest, and then calculates the average methylation fraction (AMF) of the marker, which is used as the DNA methylation level of the marker.
  • the present invention has discovered that the methylation level of one or more target markers described herein can be used to determine whether an individual's thyroid nodule is benign or malignant.
  • the methylation level of the CpG sites in the target marker described herein can be detected, and then the average methylation fraction (AMF) of the target marker can be calculated and used as the methylation level of the target marker.
  • AMF can be calculated by the following formula:
  • where M is the total number of CpG sites in the marker, i is one of the CpG sites, $N_{C,i}$ is the number of sequencing reads that are methylated at CpG site i, and $N_{T,i}$ is the number of sequencing reads that are unmethylated at CpG site i.
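The formula itself did not survive in this text, so the following sketch rests on an assumption: AMF is taken as the pooled fraction of methylated reads over all M CpG sites, i.e. the sum of $N_{C,i}$ divided by the sum of $(N_{C,i} + N_{T,i})$. A per-site average of the individual fractions would be another reading consistent with the variable definitions. The function name and read counts are hypothetical.

```python
def average_methylation_fraction(methylated_reads, unmethylated_reads) -> float:
    """Pooled AMF over the CpG sites of one marker (assumed formulation).

    methylated_reads[i]   -> N_C,i, reads methylated at CpG site i
    unmethylated_reads[i] -> N_T,i, reads unmethylated at CpG site i
    """
    if len(methylated_reads) != len(unmethylated_reads):
        raise ValueError("both lists must cover the same CpG sites")
    total_methylated = sum(methylated_reads)
    total_reads = total_methylated + sum(unmethylated_reads)
    if total_reads == 0:
        raise ValueError("no reads cover this marker")
    return total_methylated / total_reads


# Hypothetical read counts for a marker with three CpG sites.
n_c = [8, 5, 2]
n_t = [2, 5, 8]
print(average_methylation_fraction(n_c, n_t))
```

The pooled form weights each CpG site by its read depth, so sparsely covered sites have less influence on the marker-level value than they would in a per-site average.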
  • the predicted probability of malignancy of the sample is calculated through the constructed mathematical model.
  • the predicted probability of malignancy was calculated using a Logistic Regression model.
  • calculate the input z of the Sigmoid function, which is obtained by the following formula: $z = w_0 + \sum_{i} w_i x_i$
  • where $w_i$ is the regression model coefficient for each marker, $w_0$ is the intercept, and $x_i$ is the calculated DNA methylation level of the marker (i.e., AMF).
  • the value of the Sigmoid function, $\sigma(z) = 1 / (1 + e^{-z})$, is the predicted probability of malignancy.
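The prediction step can be sketched as a standard logistic regression evaluation: a weighted sum of marker AMF values plus an intercept, passed through the Sigmoid function. The coefficients, intercept and AMF values in the usage lines are hypothetical placeholders, not values from this application; only the two example probabilities and the 0.52 threshold in the last lines are taken from the tables and thresholds reported in this document.

```python
import math


def malignancy_probability(amf_values, coefficients, intercept: float) -> float:
    """Logistic regression: sigmoid(w0 + sum(w_i * x_i))."""
    if len(amf_values) != len(coefficients):
        raise ValueError("one coefficient is required per marker")
    z = intercept + sum(w * x for w, x in zip(coefficients, amf_values))
    return 1.0 / (1.0 + math.exp(-z))


def classify(probability: float, threshold: float) -> str:
    """Malignant if the predicted probability is greater than the threshold."""
    return "malignant" if probability > threshold else "benign"


# Hypothetical model with three markers (placeholder numbers).
weights = [0.4, -0.3, 0.2]
w0 = 0.1
sample_amf = [0.8, 0.1, 0.5]
p = malignancy_probability(sample_amf, weights, w0)
print(round(p, 3), classify(p, threshold=0.52))

# Probabilities reported in the result tables, judged against 0.52.
print(classify(0.513, 0.52), classify(0.533, 0.52))
```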
  • the DNA methylation level of each marker in the training set samples was used to construct the model, the threshold defined by the Youden index on the training set was used as the malignant prediction threshold, and the malignant prediction threshold of each marker described herein was obtained in this way.
  • the malignant prediction threshold of each marker is given in Table 6 herein.
  • the malignant prediction probability of each sample is calculated according to the above formula; if the value is higher than the threshold of the target marker shown in Table 6, the sample is judged as malignant, otherwise it is judged as benign.
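Selecting a threshold by the Youden index means choosing the cut-off that maximizes J = sensitivity + specificity - 1 on the training set. The sketch below assumes candidate thresholds are the observed probabilities and that a sample is called malignant when its probability is strictly greater than the threshold, as in the text; the function name and the training data are hypothetical.

```python
def youden_threshold(probabilities, labels):
    """Return (threshold, J) maximizing sensitivity + specificity - 1.

    labels: 1 for malignant, 0 for benign.
    A sample is predicted malignant when probability > threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0 or negatives == 0:
        raise ValueError("both classes are required")
    best_threshold, best_j = None, -1.0
    for candidate in sorted(set(probabilities)):
        true_pos = sum(1 for p, y in zip(probabilities, labels)
                       if y == 1 and p > candidate)
        true_neg = sum(1 for p, y in zip(probabilities, labels)
                       if y == 0 and p <= candidate)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:
            best_threshold, best_j = candidate, j
    return best_threshold, best_j


# Hypothetical training-set predictions.
train_probs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
train_labels = [0, 0, 0, 1, 1, 1]
print(youden_threshold(train_probs, train_labels))
```

Ties are resolved in favour of the lowest candidate, which is a design choice of this sketch rather than something the text specifies.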
  • the target marker is the PRDM16 gene or genomic PRDM16 sequence, the BIN1 gene or genomic BIN1 sequence, the LIMK1 gene or genomic LIMK1 sequence, the EGR3 gene or genomic EGR3 sequence, the PPIF gene or genomic PPIF sequence, the ZNF219 gene or genomic ZNF219 sequence, the UACA gene or genomic UACA sequence, the TNK1 gene or genomic TNK1 sequence, the CEP295NL gene or genomic CEP295NL sequence, the SBNO2 gene or genomic SBNO2 sequence, the C19orf77 gene or genomic C19orf77 sequence, the ICAM5 gene or genomic ICAM5 sequence, the CRTC1 gene or genomic CRTC1 sequence, the RTN4R gene or genomic RTN4R sequence, the CAMK2N1 gene or genomic CAMK2N1 sequence, the DNASE1L3 gene or genomic DNASE1L3 sequence, the DUSP26 gene or genomic DUSP26 sequence, the ICAM2 gene or genomic ICAM2 sequence
  • the methods described herein can be used to determine the malignant prediction threshold of a combination of any 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 or more of the target markers described herein as the basis for evaluation, and the combination of target markers is used as a marker for diagnosing benign and malignant thyroid nodules. The malignant prediction probability of the target marker combination is measured in an individual's sample (preferably thyroid nodule tissue, such as an aspirate) and compared with the threshold: a probability higher than the threshold indicates a malignant nodule; otherwise, it indicates a benign nodule.
  • the one or more target markers include one or more of the following target markers: EGR3 gene or genomic EGR3 sequence, TNK1 gene or genomic TNK1 sequence, DNASE1L3 gene or genomic DNASE1L3 sequence, DUSP26 gene or genomic DUSP26 sequence, BAIAP2 gene or genomic BAIAP2 sequence, MED16 gene or genomic MED16 sequence, C19orf77 gene or genomic C19orf77 sequence, NOL4L-DT gene or genomic NOL4L-DT sequence, TACSTD2 gene or genomic TACSTD2 sequence, CRABP2 gene or genomic CRABP2 sequence, BCR gene or genomic BCR sequence.
  • the one or more target markers include: PRDM16 gene or genomic PRDM16 sequence, BIN1 gene or genomic BIN1 sequence, LIMK1 gene or genomic LIMK1 sequence, EGR3 gene or genomic EGR3 sequence, PPIF gene or genomic PPIF sequence, ZNF219 gene or genomic ZNF219 sequence, UACA gene or genomic UACA sequence, TNK1 gene or genomic TNK1 sequence, CEP295NL gene or genomic CEP295NL sequence, SBNO2 gene or genomic SBNO2 sequence, C19orf77 gene or genomic C19orf77 sequence, ICAM5 gene or genomic ICAM5 sequence, CRTC1 gene or genomic CRTC1 sequence, and RTN4R gene or genomic RTN4R sequence; and the threshold value is 0.49.
  • the one or more target markers include: CAMK2N1 gene or genomic CAMK2N1 sequence, DNASE1L3 gene or genomic DNASE1L3 sequence, EGR3 gene or genomic EGR3 sequence, DUSP26 gene or genomic DUSP26 sequence, ICAM2 gene or genomic ICAM2 sequence, BAIAP2 gene or genomic BAIAP2 sequence, MED16 gene or genomic MED16 sequence, C19orf77 gene or genomic C19orf77 sequence, and NOL4L-DT gene or genomic NOL4L-DT sequence; and the threshold value is 0.58.
  • the one or more target markers include: TACSTD2 gene or genomic TACSTD2 sequence, CRABP2 gene or genomic CRABP2 sequence, DNASE1L3 gene or genomic DNASE1L3 sequence, LSG1 gene or genomic LSG1 sequence, EGR3 gene or genomic EGR3 sequence, TNK1 gene or genomic TNK1 sequence, BAIAP2 gene or genomic BAIAP2 sequence, NOL4L-DT gene or genomic NOL4L-DT sequence, and BCR gene or genomic BCR sequence; and the threshold value is 0.52.
  • the one or more target markers include: TACSTD2 gene or genomic TACSTD2 sequence, CRABP2 gene or genomic CRABP2 sequence, DNASE1L3 gene or genomic DNASE1L3 sequence, EGR3 gene or genomic EGR3 sequence, DUSP26 gene or genomic DUSP26 sequence, TNK1 gene or genomic TNK1 sequence, BAIAP2 gene or genomic BAIAP2 sequence, NOL4L-DT gene or genomic NOL4L-DT sequence, and BCR gene or genomic BCR sequence; and the threshold value is 0.52.
  • the one or more target markers include: TACSTD2 gene or genomic TACSTD2 sequence, DNASE1L3 gene or genomic DNASE1L3 sequence, EGR3 gene or genomic EGR3 sequence, DUSP26 gene or genomic DUSP26 sequence, TNK1 gene or genomic TNK1 sequence, BAIAP2 gene or genomic BAIAP2 sequence, MED16 gene or genomic MED16 sequence, NOL4L-DT gene or genomic NOL4L-DT sequence, and BCR gene or genomic BCR sequence; and the threshold value is 0.52.
  • the respective Hg coordinates of the target markers are as described herein, especially as shown in Table 6.
  • those skilled in the art can also determine whether an individual's thyroid nodule is malignant or the risk of being malignant based on various factors, such as age, gender, medical history, family history, symptoms, etc.
  • the present invention provides a methylation detection or diagnostic kit and diagnostic reagent or diagnostic composition for differentiating between benign and malignant thyroid nodules.
  • the kits and compositions may contain primer and/or probe molecules.
  • the primers include primer pairs capable of hybridizing to the target marker to be detected or its target region under stringent conditions, moderately stringent conditions or highly stringent conditions. Primers may also include primers to detect internal controls such as ACTB.
  • the primers are packaged in a single container or packaged in separate containers.
  • the kit further comprises one or more blocking oligonucleotides.
  • kits and compositions further comprise detection reagents.
  • the detection reagent is selected from the group consisting of fluorescent probes, intercalating dyes, chromophore-labeled probes, radioisotope-labeled probes, and biotin-labeled probes.
  • the kit may further comprise a DNA polymerase and/or a container suitable for storing a biological sample obtained from an individual.
  • the kit further includes instructions for use and/or an explanation of the test results of the kit.
  • kits and compositions may also include reagents for enzymatic or non-enzymatic transformations.
  • the kits also include a bisulfite reagent or a methylation sensitive restriction enzyme (MSRE).
  • the bisulfite reagent is selected from the group consisting of ammonium bisulfite, sodium bisulfite, potassium bisulfite, calcium bisulfite, magnesium bisulfite, aluminum bisulfite, bisulfite ions, and any combination thereof.
  • the bisulfite reagent is sodium bisulfite.
  • the MSRE is selected from the group consisting of HpaII enzyme, SalI enzyme, Enzymes, ScrFI enzyme, BbeI enzyme, NotI enzyme, SmaI enzyme, XmaI enzyme, MboI enzyme, BstBI enzyme, ClaI enzyme, MluI enzyme, NaeI enzyme, NarI enzyme, PvuI enzyme, SacII enzyme, HhaI enzyme, and any combination thereof.
  • kits and compositions may also include converted positive standards in which unmethylated cytosines are converted to bases that do not bind guanine.
  • the positive standard can be fully methylated.
  • kits and compositions may also include PCR reaction reagents.
  • the PCR reaction reagents include Taq DNA polymerase, PCR buffer, dNTPs, and Mg²⁺.
  • kits and compositions further comprise standard reagents useful for CpG position-specific methylation analysis, wherein the analysis includes one or more of the following techniques: MS-SNuPE, MSP, MethyLight™, HeavyMethyl™, COBRA and nucleic acid sequencing.
  • kits and compositions may comprise additional reagents selected from the group consisting of buffers (e.g., restriction enzyme, PCR, storage or wash buffers), DNA recovery reagents or kits (e.g., precipitation, ultrafiltration, affinity column) and DNA recovery components, etc.
  • the kit of the present application may further comprise one or more of the following components known in the field of DNA enrichment: a protein component that selectively binds to methylated DNA; a triple-strand-forming nucleic acid component; one or more linkers, optionally in a suitable solution; substances or solutions for performing ligation, such as ligases and buffers; substances or solutions for performing column chromatography; a substance or solution for performing immunologically based enrichment (e.g., immunoprecipitation); a substance or solution for nucleic acid amplification, such as PCR; one or more dyes, where appropriate with a coupling agent and where appropriate in solution; a substance or solution for carrying out hybridization; and/or a substance or solution for carrying out a washing step.
  • the composition of the present invention contains an isolated nucleic acid molecule selected from one or more of the following: PRDM16 gene: chr1:3155061:3155760; CAMK2N1 gene: chr1:20813203:20813902; TACSTD2 gene: chr1:59041615:59042314; CRABP2 gene: chr1:156676274:156676973; IER5 gene: chr1:181074539:181075238; ITPKB gene: chr1:226924700:226925399; ITGB1BP1 gene: chr2:9526804:9527503; MTHFD2 gene: chr2:74453839:74454538; BIN1 gene: chr2:127822196:127822895; DNASE1L3 gene: chr3:58153211:58153910; LSG1 gene: chr3:194408527:1944092
  • the composition of the present invention contains an isolated nucleic acid molecule selected from one or more of the following: PRDM16 gene: chr1:3155311:3155510; CAMK2N1 gene: chr1:20813453:20813652; TACSTD2 gene: chr1:59041865:59042064; CRABP2 gene: chr1:156676524:156676723; IER5 gene: chr1:181074789:181074988; ITPKB gene: chr1:226924950:226925149; ITGB1BP1 gene: chr2:9527054:9527253; MTHFD2 gene: chr2:74454089:74454288; BIN1 gene: chr2:127822446:127822645; DNASE1L3 gene: chr3:58153461:58153660; LSG1 gene: chr3:194408777:1944089
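The `chrom:start:end` strings in the two lists above can be parsed and compared programmatically. The sketch below assumes both coordinates are inclusive, which is an assumption about the notation rather than something the text states; the `Region` type and helper names are hypothetical. With that assumption, the regions in the first list are 700 bp long and those in the second list are the central 200 bp of the same regions.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Assumes start and end are both inclusive.
        return self.end - self.start + 1


def parse_region(text: str) -> Region:
    """Parse 'chr1:3155061:3155760' (stray spaces are tolerated)."""
    chrom, start, end = text.replace(" ", "").split(":")
    return Region(chrom, int(start), int(end))


def contains(outer: Region, inner: Region) -> bool:
    """True if `inner` lies entirely within `outer` on the same chromosome."""
    return (outer.chrom == inner.chrom
            and outer.start <= inner.start
            and inner.end <= outer.end)


# PRDM16 regions as listed in the two paragraphs above.
wide = parse_region("chr1:3155061:3155760")
core = parse_region("chr1:3155311:3155510")
print(wide.length, core.length, contains(wide, core))
print(core.start - wide.start, wide.end - core.end)  # flank on each side
```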
  • the present application also includes a medium recording the sequence of the isolated nucleic acid molecule described herein and optionally its methylation information, said medium being used for comparison with gene methylation sequencing data to determine the presence, content and/or methylation level of said nucleic acid molecule.
  • said medium is a card printed with said sequence and optionally its methylation information, eg paper, plastic, metal, glass card.
  • the medium is a computer-readable medium storing the sequence and optionally its methylation information and a computer program.
  • the methylation sequencing data of the sample are compared with the sequence, so as to obtain the presence, content and/or methylation level of nucleic acid molecules containing the sequence in the sample.
  • the present application also includes a device for distinguishing benign from malignant thyroid nodules. The device includes a memory, a processor, and a computer program stored in the memory and operable on the processor; when the processor executes the program, the following steps are carried out: (1) obtaining the methylation level, in the sample, of one or more of the target markers described herein or of their target regions; (2) interpreting whether the thyroid nodule is benign or malignant according to the methylation level obtained in (1).
  • the obtaining step is performed by any one of the methods described in Section IV of the present application; preferably, the interpretation is performed by any one of the methods described in Section V of the present application.
  • the present application also provides the application of the isolated nucleic acid molecule described in the present application as a detection target in the diagnosis of benign and malignant thyroid nodules.
  • the sensitivity of the methylation marker of the present invention to identify thyroid cancer reaches 100%; more importantly, the sensitivity of the present invention to identify thyroid nodules with unclear cytological classification reaches 100%.
  • the methylation markers and technical solutions provided by the present invention effectively solve the problem of low sensitivity of current diagnostic techniques, and contribute to early diagnosis and early treatment of thyroid cancer, thereby increasing the cure rate.
  • M is the total number of CpG sites in the marker, i is one of the CpG sites, $N_{C,i}$ is the number of sequencing reads that are methylated at CpG site i, and $N_{T,i}$ is the number of sequencing reads that are unmethylated at CpG site i.
  • w is the regression model coefficient for each marker, w0 is the intercept, and x is the DNA methylation level of the marker in the sample.
  • the threshold is the sample malignancy prediction probability threshold calculated from the mathematical model constructed with the combination of methylation markers; that is, a sample whose malignancy prediction probability is greater than the threshold is judged as malignant, otherwise it is judged as benign.
  • consistent with the sample grouping of that article, the present invention uses its Developing cohort as the training set (28 cases of benign nodules, 39 cases of malignant nodules) and its Testing cohort as validation set 1 (37 cases of benign nodules, 41 cases of malignant nodules).
  • the present invention collected 74 Chinese thyroid surgery samples as validation set 2 (37 cases of benign nodules, 37 cases of malignant nodules). For each sample, the CpG sites detected in each methylation marker were obtained by RRBS technology and the above analysis process, and the AMF was calculated and used as the DNA methylation level of the marker.
  • the mathematical model constructed from the training set samples is used to predict the two sets of validation samples, and performance is evaluated by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve.
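The AUC used for evaluation can be computed directly from predicted probabilities and known labels. The sketch below uses the rank-based (Mann-Whitney) formulation, which equals the area under the ROC curve: the fraction of malignant/benign pairs in which the malignant sample receives the higher probability, with ties counted as one half. The function name and the data are hypothetical.

```python
def roc_auc(probabilities, labels) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for malignant, 0 for benign.
    """
    malignant = [p for p, y in zip(probabilities, labels) if y == 1]
    benign = [p for p, y in zip(probabilities, labels) if y == 0]
    if not malignant or not benign:
        raise ValueError("both classes are required")
    score = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                score += 1.0   # correctly ordered pair
            elif m == b:
                score += 0.5   # tie counts as half
    return score / (len(malignant) * len(benign))


# Hypothetical validation-set predictions.
probs = [0.4, 0.6, 0.5, 0.5]
truth = [0, 1, 0, 1]
print(roc_auc(probs, truth))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for cohorts of the size described here (tens of samples per set).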
  • the model constructed with the combination of methylation markers chr1:3155061:3155760, chr2:127822196:127822895, chr7:73508743:73509442, chr8:22547976:22548675, chr8:22548391:22549090, chr10:81001706:81002405, chr14:21559748:21560447, chr15:70766881:70767580, chr17:7286958:7287657, chr17:76879761:76880460, chr19:1177275:1177974, chr19:3434666:3435365, chr19:10404832:10405531, chr19:18770961:18771660, chr22:20226373:20227072 (methylation marker combination 1) was tested for AUC in the two sets of validation set samples. The coefficient w of the logistic regression model for
| Methylation marker | Gene name | Logistic regression model coefficient |
| --- | --- | --- |
| chr1:3155061:3155760 | PRDM16 | 0.273 |
| chr2:127822196:127822895 | BIN1 | -0.347 |
| chr7:73508743:73509442 | LIMK1 | -0.258 |
| chr8:22547976:22548675 | EGR3 | 0.373 |
| chr8:22548391:22549090 | EGR3 | 0.239 |
| chr10:81001706:81002405 | PPIF | -0.228 |
| chr14:21559748:21560447 | ZNF219 | 0.413 |
| chr15:70766881:70767580 | UACA | -0.172 |
| chr17:7286958:7287657 | TNK1 | -0.143 |
| chr17:76879761:76880460 | CEP295NL | -0.170 |
| chr19:1177275:1177974 | SBNO2 | -0.230 |
| chr19:3434666:3435365 | C19 | |
  • the malignant prediction threshold is 0.49; that is, a sample with a malignant prediction probability greater than 0.49 is judged as malignant, otherwise it is judged as benign;
  • the sensitivity reached 100%, the specificity reached 76%, the PPV (positive predictive value) reached 82%, and the NPV (negative predictive value) reached 100%;
  • the sensitivity for the diagnosis of malignant thyroid nodules in validation set 2 reached 87%, the specificity reached 84%, the PPV reached 84%, and the NPV reached 86%.
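The four reported measures all derive from the counts of a 2x2 confusion matrix. In the sketch below the formulas are standard, but the counts are not given in this text: they are back-calculated assumptions chosen to be consistent with a cohort of 41 malignant and 37 benign nodules and with the reported 100% / 76% / 82% / 100% figures. The function name is hypothetical.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # malignant nodules called malignant
        "specificity": tn / (tn + fp),  # benign nodules called benign
        "ppv": tp / (tp + fp),          # malignant calls that are correct
        "npv": tn / (tn + fn),          # benign calls that are correct
    }


# ASSUMED counts (not from the source): 41 malignant, 37 benign samples.
assumed = diagnostic_metrics(tp=41, fp=9, tn=28, fn=0)
for name, value in assumed.items():
    print(f"{name}: {value:.0%}")
```

Note that PPV and NPV depend on the benign-to-malignant ratio of the cohort, so they are not directly comparable between sample sets with different compositions.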
  • the results predicted by using the methylation marker combination 1 for the two sets of validation samples are shown in Table 1-2 and Table 1-3, respectively.
  • Table 1-2 The results of the prediction of the validation set 1 samples using the methylation marker combination 1
  • Table 1-3 Prediction results of validation set 2 samples using methylation marker combination 1
  • the malignant prediction threshold is 0.58; that is, a sample with a malignant prediction probability greater than 0.58 is judged as malignant, otherwise it is judged as benign;
  • chr20:31162101:31162800, chr22:23624092:23624791 (methylation marker combination 3): the model constructed with this combination was tested for AUC in the two sets of validation set samples.
  • the logistic regression model coefficients of each marker are shown in Table 3-1.
  • the logistic regression model intercept is 1.681.
  • the malignant prediction threshold is 0.52; that is, a sample with a malignant prediction probability greater than 0.52 is judged as malignant, otherwise it is judged as benign;
  • the sensitivity reached 98%, the specificity reached 100%, the PPV reached 100%, and the NPV reached 97%; the sensitivity for the diagnosis of malignant thyroid nodules in validation set 2 reached 92%, the specificity reached 87%, the PPV reached 87%, and the NPV reached 91%. See Table 3-2 and Table 3-3, respectively, for the prediction results of the two sets of validation set samples using methylation marker combination 3.
  • Table 3-2 Prediction results of samples in validation set 1 using methylation marker combination 3
| Sample | Pathology | Malignant prediction probability | Prediction |
| --- | --- | --- | --- |
| GFP-FR180615008 | malignant | 0.464 | benign |
| GFP-FR180615010 | malignant | 0.663 | malignant |
| GFP-FR180615012 | malignant | 0.357 | benign |
| GFP-FR180615014 | malignant | 0.856 | malignant |
| GFP-FR180615016 | malignant | 0.513 | benign |
| GFP-FR180615018 | malignant | 0.823 | malignant |
| GFP-FR180615020 | malignant | 0.619 | malignant |
| GFP-FR180615022 | malignant | 0.852 | malignant |
| GFP-FR180615024 | malignant | 0.690 | malignant |
| GFP-FR180615026 | malignant | 0.749 | malignant |
| GFP-FR180615028 | malignant | 0.762 | malignant |
| GFP-FR180615030 | malignant | 0.707 | malignant |
| GFP-FR180615032 | malignant | 0.533 | malignant |
| GFP-FR180615034 | malignant | 0.823 | malignant |
| GFP-FR180615036 | malignant | 0.752 | malignant |
| GFP-FR180615038 | malignant | 0.549 | malignant |
| GFP-FR180713031 | malignant | 0.947 | malignant |
| GFP-FR180713033 | malignant | 0.695 | malignant |
| GFP-FR171230001 | benign | 0.388 | benign |
| GFP-FR17123000 | | | |

  • chr20:31162101:31162800, chr22:23624092:23624791 (methylation marker combination 4): the model constructed with this combination was tested for AUC in the two sets of validation set samples.
  • the logistic regression model coefficients of each marker are shown in Table 4-1.
  • the logistic regression model intercept is 1.358.
  • The malignant prediction threshold is 0.52: a sample whose predicted probability of malignancy is greater than 0.52 is judged malignant; otherwise it is judged benign.
  • For the diagnosis of malignant thyroid nodules in validation set 2, the sensitivity reached 92%, the specificity reached 87%, the PPV reached 87%, and the NPV reached 91%.
  • the prediction results of two groups of validation set samples using methylation marker combination 4 are shown in Table 4-2 and Table 4-3 respectively.
  • Table 4-2 Prediction results of samples in validation set 1 using methylation marker combination 4
  • GFP-FR180615016 malignant 0.504 benign; GFP-FR180615018 malignant 0.817 malignant; GFP-FR180615020 malignant 0.628 malignant; GFP-FR180615022 malignant 0.852 malignant; GFP-FR180615024 malignant 0.688 malignant; GFP-FR180615026 malignant 0.768 malignant; GFP-FR180615028 malignant 0.774 malignant; GFP-FR180615030 malignant 0.666 malignant; GFP-FR180615032 malignant 0.551 malignant; GFP-FR180615034 malignant 0.846 malignant; GFP-FR180615036 malignant 0.747 malignant; GFP-FR180615038 malignant 0.564 malignant; GFP-FR180713031 malignant 0.954 malignant; GFP-FR180713033 malignant 0.708 malignant; GFP-FR171230001 benign 0.376 benign; GFP-FR171230003 benign 0.511 benign; GFP-FR171230005 benign 0.442 benign; GFP-FR171230007 benign 0.291 benign; GFP-FR171230009 benign 0.157 benign; GFP-FR171230013
  • GFP-FR180525020 benign 0.166 benign; GFP-FR180525022 benign 0.162 benign; GFP-FR180525024 benign 0.243 benign; GFP-FR180525026 benign 0.154 benign; GFP-FR180525028 benign 0.248 benign; GFP-FR180525030 benign 0.539 malignant; GFP-FR180713002 benign 0.749 malignant; GFP-FR180713004 benign 0.335 benign; GFP-FR180713006 benign 0.136 benign; GFP-FR180713008 benign 0.286 benign; GFP-FR180713010 benign 0.461 benign; GFP-FR180713012 benign 0.594 malignant; GFP-FR180713014 benign 0.099 benign; GFP-FR180713016 benign 0.451 benign; GFP-FR180713025 benign 0.143 benign; GFP-FR180713035 benign 0.329 benign; GFP-FR180713037 benign 0.266 benign; GFP-FR180713041 benign 0.336 benign; GFP-FR180713043 benign 0.315 benign; GFP-FR180713045 benign 0.158 benign
  • The model constructed from the combination that includes chr20:31162101:31162800 and chr22:23624092:23624791 (methylation marker combination 5) was tested for AUC in each of the two validation sets.
  • the logistic regression model coefficients of each marker are shown in Table 5-1.
  • the logistic regression model intercept is 1.447.
  • The malignant prediction threshold is 0.52: a sample whose predicted probability of malignancy is greater than 0.52 is judged malignant; otherwise it is judged benign.
  • For the diagnosis of malignant thyroid nodules in validation set 2, the sensitivity reached 92%, the specificity reached 87%, the PPV reached 87%, and the NPV reached 91%. See Table 5-2 and Table 5-3, respectively, for the prediction results of the two validation sets using methylation marker combination 5.
  • GFP-FR180615024 malignant 0.673 malignant; GFP-FR180615026 malignant 0.762 malignant; GFP-FR180615028 malignant 0.763 malignant; GFP-FR180615030 malignant 0.671 malignant; GFP-FR180615032 malignant 0.533 malignant; GFP-FR180615034 malignant 0.862 malignant; GFP-FR180615036 malignant 0.767 malignant; GFP-FR180615038 malignant 0.565 malignant; GFP-FR180713031 malignant 0.955 malignant; GFP-FR180713033 malignant 0.740 malignant; GFP-FR171230001 benign 0.361 benign; GFP-FR171230003 benign 0.499 benign; GFP-FR171230005 benign 0.427 benign; GFP-FR171230007 benign 0.307 benign; GFP-FR171230009 benign 0.164 benign; GFP-FR171230013 benign 0.408 benign; GFP-FR171230015 benign 0.360 benign; GFP-FR171230017 benign 0.172 benign; GFP-FR171230019 benign 0.277 benign; GFP-FR180525002 benign
  • the models built with each methylation marker were tested for AUC in two sets of validation set samples separately.
  • the threshold defined by the Youden index of the training set was used as the malignant prediction threshold, above which it was judged as malignant, otherwise it was judged as benign.
  • the predictive performance of each methylation marker on the two sets of validation samples is shown in Table 6.
  • Table 7-1 The prediction results of the validation set 1 samples with ambiguous cytological classification using the methylation marker combination 1
  • Table 7-2 The prediction results of the validation set 1 samples with ambiguous cytological classification using the methylation marker combination 2
  • Table 7-3 The prediction results of the validation set 1 samples with ambiguous cytological classification using the methylation marker combination 3
  • Table 7-4 The prediction results of the validation set 1 samples with ambiguous cytological classification using the methylation marker combination 4
  • The prediction accuracy using methylation marker combination 5 of Example 5 was 96%, the sensitivity was 90%, and the specificity was 100%.
  • The prediction results for the validation set 1 samples with ambiguous cytological classification using methylation marker combination 5 of Example 5 are shown in Table 7-5.
  • Table 7-5 The prediction results of the validation set 1 samples with ambiguous cytological classification using the methylation marker combination 5
  • AUS Atypia of Undetermined Significance
  • FN follicular neoplasms
  • SFN Suspicious for follicular neoplasm
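The excerpts above describe a logistic regression model with an intercept, a malignant prediction threshold, and four performance figures (sensitivity, specificity, PPV, NPV). The following is a minimal Python sketch of that decision rule, not the applicant's implementation. The function names are hypothetical, the per-marker coefficients (Tables 3-1 to 5-1) are not reproduced here and must be supplied by the caller, and the four example rows are copied from the combination 3 excerpt only to illustrate the arithmetic, so the resulting metrics are not the reported validation results.

```python
import math


def malignancy_probability(intercept, coefficients, methylation_levels):
    """Logistic regression: p = 1 / (1 + exp(-(intercept + sum(b_i * x_i))))."""
    if len(coefficients) != len(methylation_levels):
        raise ValueError("one coefficient is needed per marker")
    z = intercept + sum(b * x for b, x in zip(coefficients, methylation_levels))
    return 1.0 / (1.0 + math.exp(-z))


def classify(probability, threshold):
    """Judge malignant only when the probability is strictly above the threshold."""
    return "malignant" if probability > threshold else "benign"


def performance(actual, predicted):
    """Sensitivity, specificity, PPV and NPV, with 'malignant' as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == "malignant" and p == "malignant" for a, p in pairs)
    fn = sum(a == "malignant" and p == "benign" for a, p in pairs)
    tn = sum(a == "benign" and p == "benign" for a, p in pairs)
    fp = sum(a == "benign" and p == "malignant" for a, p in pairs)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Four rows copied from the combination 3 excerpt: (actual class, probability).
rows = [("malignant", 0.464), ("malignant", 0.663),
        ("malignant", 0.513), ("benign", 0.388)]
actual = [label for label, _ in rows]
predicted = [classify(p, threshold=0.52) for _, p in rows]
metrics = performance(actual, predicted)
```

With the 0.52 threshold, the probability 0.513 is judged benign and 0.533 would be judged malignant, which matches the predicted classes listed in the excerpted rows.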

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Genetics & Genomics (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Analytical Chemistry (AREA)
  • Molecular Biology (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Hospice & Palliative Care (AREA)
  • Oncology (AREA)
  • Plant Pathology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a methylation marker in diagnosis of benign and malignant nodules of thyroid cancer and applications thereof, in particular, to an application of a reagent for detecting the methylation state or level of at least one CpG dinucleotide of one or more target markers in preparation of a detection reagent or diagnostic kit for diagnosing benign and malignant thyroid nodules in individuals, and an application of a device for detecting the methylation state or level of at least one CpG dinucleotide of one or more target markers in preparation of a diagnostic kit for diagnosing benign and malignant thyroid nodules in individuals, the target markers comprising a PRDM16 sequence of a PRDM16 gene or genome, a CAMK2N1 sequence of a CAMK2N1 gene or genome, a TACSTD2 sequence of a TACSTD2 gene or genome, and the like. The present invention further relates to a diagnostic reagent or diagnostic kit for detecting the methylation state or methylation level of at least one CpG dinucleotide in a target marker to diagnose benign and malignant thyroid nodules.

Description

Methylation marker in diagnosis of benign and malignant nodules of thyroid cancer and applications thereof
Technical Field
The present invention relates to a methylation marker for the diagnosis of benign and malignant nodules of thyroid cancer and applications thereof.
Background Art
Thyroid cancer is a malignant tumor originating from the thyroid follicular epithelium. It is more common in women, with a male-to-female incidence ratio of 1:(2 to 4), and the age of onset is generally 21 to 40 years. Papillary thyroid cancer (PTC) is the most common thyroid cancer, accounting for about 80% of all thyroid cancers. In recent years, the incidence of thyroid cancer in China has been rising. If thyroid cancer is detected early and treated promptly, the prognosis is good and the 10-year survival rate can exceed 90%; if the early diagnosis is missed, however, the disease progresses to a locally advanced stage, the opportunity for surgery is lost, the disease can no longer be cured, and the 5-year survival rate falls markedly.
The routine clinical diagnostic method is imaging. Thyroid nodules that ultrasound indicates are highly suspicious for malignancy require further fine needle aspiration (FNA) cytology to confirm the diagnosis. Because malignant and benign nodules share similar cytological features, some PTCs are difficult to diagnose, and up to 40% of thyroid nodules cannot be diagnosed accurately from cytological features alone. Current molecular diagnostic methods have improved the accuracy of differentiation, but their sensitivity still needs to be improved.
Figure PCTCN2022137459-appb-000001
Gene Expression Classifier is widely used, but its positive predictive value (PPV) is only 47%, and it can only be applied to fresh aspirated tissue, which limits its wider use for some samples. ThyroSeq v2 detects the H/K/NRAS gene mutations and RET/PTC gene rearrangements that are often carried by benign nodules, and its PPV is only 42-77%. In addition, the Diagnostic DNA Methylation Signature (DDMS) approach is a diagnostic method based on DNA methylation features that is used to distinguish benign from malignant thyroid tissue. Although this method is highly accurate, some samples cannot be tested with it for technical reasons [John H Yim, Audrey H Choi, Arthur X Li, et al., Identification of Tissue-Specific DNA Methylation Signatures for Thyroid Nodule Diagnostics, Clin Cancer Res, 2019 Jan 15;25(2):544-551].
Summary of the Invention
A first aspect of the present invention provides the use of a reagent for detecting the methylation state or level of at least one CpG dinucleotide of one or more target markers in the preparation of a detection reagent or diagnostic kit for diagnosing benign and malignant thyroid nodules in individuals, and the use of a device for determining the methylation state or level of at least one CpG dinucleotide of one or more of the following target markers in the preparation of a diagnostic kit for diagnosing benign and malignant thyroid nodules in individuals, wherein the one or more target markers are selected from: the PRDM16 sequence of the PRDM16 gene or genome, the CAMK2N1 sequence of the CAMK2N1 gene or genome, the TACSTD2 sequence of the TACSTD2 gene or genome, the CRABP2 sequence of the CRABP2 gene or genome, the IER5 sequence of the IER5 gene or genome, the ITPKB sequence of the ITPKB gene or genome, the ITGB1BP1 sequence of the ITGB1BP1 gene or genome, the MTHFD2 sequence of the MTHFD2 gene or genome, the BIN1 sequence of the BIN1 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the LSG1 sequence of the LSG1 gene or genome, the SH3BP2 sequence of the SH3BP2 gene or genome, the SLC12A7 sequence of the SLC12A7 gene or genome, the NR2F1 sequence of the NR2F1 gene or genome, the EGR1 sequence of the EGR1 gene or genome, the LARP1 sequence of the LARP1 gene or genome, the RARS sequence of the RARS gene or genome, the TTBK1 sequence of the TTBK1 gene or genome, the FAM20C sequence of the FAM20C gene or genome, the CREB5 sequence of the CREB5 gene or genome, the LIMK1 sequence of the LIMK1 gene or genome, the PRKAG2 sequence of the PRKAG2 gene or genome, the SLC39A14 sequence of the SLC39A14 gene or genome, the EGR3 sequence of the EGR3 gene or genome, the DUSP26 sequence of the DUSP26 gene or genome, the AGPAT2 sequence of the AGPAT2 gene or genome, the NRARP sequence of the NRARP gene or genome, the EGR2 sequence of the EGR2 gene or genome, the PPIF sequence of the PPIF gene or genome, the CHID1 sequence of the CHID1 gene or genome, the ADM sequence of the ADM gene or genome, the NAV2 sequence of the NAV2 gene or genome, the EHBP1L1 sequence of the EHBP1L1 gene or genome, the PHLDB1 sequence of the PHLDB1 gene or genome, the PARP11 sequence of the PARP11 gene or genome, the ANO6 sequence of the ANO6 gene or genome, the PLXNC1 sequence of the PLXNC1 gene or genome, the ZNF219 sequence of the ZNF219 gene or genome, the FOXA1 sequence of the FOXA1 gene or genome, the PAPLN sequence of the PAPLN gene or genome, the UACA sequence of the UACA gene or genome, the PGPEP1L sequence of the PGPEP1L gene or genome, the ITPRIPL2 sequence of the ITPRIPL2 gene or genome, the TNK1 sequence of the TNK1 gene or genome, the RPL19 sequence of the RPL19 gene or genome, the ICAM2 sequence of the ICAM2 gene or genome, the TMC6 sequence of the TMC6 gene or genome, the CEP295NL sequence of the CEP295NL gene or genome, the BAIAP2 sequence of the BAIAP2 gene or genome, the TBCD sequence of the TBCD gene or genome, the METRNL sequence of the METRNL gene or genome, the MED16 sequence of the MED16 gene or genome, the SBNO2 sequence of the SBNO2 gene or genome, the CIRBP sequence of the CIRBP gene or genome, the KLF16 sequence of the KLF16 gene or genome, the C19orf77 sequence of the C19orf77 gene or genome, the SNAPC2 sequence of the SNAPC2 gene or genome, the ICAM1 sequence of the ICAM1 gene or genome, the ICAM5 sequence of the ICAM5 gene or genome, the IER2 sequence of the IER2 gene or genome, the ASF1B sequence of the ASF1B gene or genome, the CRTC1 sequence of the CRTC1 gene or genome, the ZNF536 sequence of the ZNF536 gene or genome, the LTBP4 sequence of the LTBP4 gene or genome, the NOL4L-DT sequence of the NOL4L-DT gene or genome, the KCNK15 sequence of the KCNK15 gene or genome, the UCKL1 sequence of the UCKL1 gene or genome, the RTN4R sequence of the RTN4R gene or genome, the BCR sequence of the BCR gene or genome, and the TEF sequence of the TEF gene or genome.
In one or more embodiments, the one or more target markers are selected from: the PRDM16 sequence of the PRDM16 gene or genome, the BIN1 sequence of the BIN1 gene or genome, the LIMK1 sequence of the LIMK1 gene or genome, the EGR3 sequence of the EGR3 gene or genome, the PPIF sequence of the PPIF gene or genome, the ZNF219 sequence of the ZNF219 gene or genome, the UACA sequence of the UACA gene or genome, the TNK1 sequence of the TNK1 gene or genome, the CEP295NL sequence of the CEP295NL gene or genome, the SBNO2 sequence of the SBNO2 gene or genome, the C19orf77 sequence of the C19orf77 gene or genome, the ICAM5 sequence of the ICAM5 gene or genome, the CRTC1 sequence of the CRTC1 gene or genome, the RTN4R sequence of the RTN4R gene or genome, the CAMK2N1 sequence of the CAMK2N1 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the DUSP26 sequence of the DUSP26 gene or genome, the ICAM2 sequence of the ICAM2 gene or genome, the BAIAP2 sequence of the BAIAP2 gene or genome, the MED16 sequence of the MED16 gene or genome, the NOL4L-DT sequence of the NOL4L-DT gene or genome, the TACSTD2 sequence of the TACSTD2 gene or genome, the CRABP2 sequence of the CRABP2 gene or genome, the LSG1 sequence of the LSG1 gene or genome, and the BCR sequence of the BCR gene or genome.
In one or more embodiments, the one or more target markers include at least one or more of the following target markers: the EGR3 sequence of the EGR3 gene or genome, the TNK1 sequence of the TNK1 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the DUSP26 sequence of the DUSP26 gene or genome, the BAIAP2 sequence of the BAIAP2 gene or genome, the MED16 sequence of the MED16 gene or genome, the C19orf77 sequence of the C19orf77 gene or genome, the NOL4L-DT sequence of the NOL4L-DT gene or genome, the TACSTD2 sequence of the TACSTD2 gene or genome, the CRABP2 sequence of the CRABP2 gene or genome, and the BCR sequence of the BCR gene or genome.
In one or more embodiments, the one or more target markers include: the PRDM16 sequence of the PRDM16 gene or genome, the BIN1 sequence of the BIN1 gene or genome, the LIMK1 sequence of the LIMK1 gene or genome, the EGR3 sequence of the EGR3 gene or genome, the PPIF sequence of the PPIF gene or genome, the ZNF219 sequence of the ZNF219 gene or genome, the UACA sequence of the UACA gene or genome, the TNK1 sequence of the TNK1 gene or genome, the CEP295NL sequence of the CEP295NL gene or genome, the SBNO2 sequence of the SBNO2 gene or genome, the C19orf77 sequence of the C19orf77 gene or genome, the ICAM5 sequence of the ICAM5 gene or genome, the CRTC1 sequence of the CRTC1 gene or genome, and the RTN4R sequence of the RTN4R gene or genome.
In one or more embodiments, the one or more target markers include: the CAMK2N1 sequence of the CAMK2N1 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the EGR3 sequence of the EGR3 gene or genome, the DUSP26 sequence of the DUSP26 gene or genome, the ICAM2 sequence of the ICAM2 gene or genome, the BAIAP2 sequence of the BAIAP2 gene or genome, the MED16 sequence of the MED16 gene or genome, the C19orf77 sequence of the C19orf77 gene or genome, and the NOL4L-DT sequence of the NOL4L-DT gene or genome.
In one or more embodiments, the one or more target markers include: the TACSTD2 sequence of the TACSTD2 gene or genome, the CRABP2 sequence of the CRABP2 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the LSG1 sequence of the LSG1 gene or genome, the EGR3 sequence of the EGR3 gene or genome, the TNK1 sequence of the TNK1 gene or genome, the BAIAP2 sequence of the BAIAP2 gene or genome, the NOL4L-DT sequence of the NOL4L-DT gene or genome, and the BCR sequence of the BCR gene or genome.
In one or more embodiments, the one or more target markers include: the TACSTD2 sequence of the TACSTD2 gene or genome, the CRABP2 sequence of the CRABP2 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the EGR3 sequence of the EGR3 gene or genome, the DUSP26 sequence of the DUSP26 gene or genome, the TNK1 sequence of the TNK1 gene or genome, the BAIAP2 sequence of the BAIAP2 gene or genome, the NOL4L-DT sequence of the NOL4L-DT gene or genome, and the BCR sequence of the BCR gene or genome.
In one or more embodiments, the one or more target markers include: the TACSTD2 sequence of the TACSTD2 gene or genome, the DNASE1L3 sequence of the DNASE1L3 gene or genome, the EGR3 sequence of the EGR3 gene or genome, the DUSP26 sequence of the DUSP26 gene or genome, the TNK1 sequence of the TNK1 gene or genome, the BAIAP2 sequence of the BAIAP2 gene or genome, the MED16 sequence of the MED16 gene or genome, the NOL4L-DT sequence of the NOL4L-DT gene or genome, and the BCR sequence of the BCR gene or genome.
In one or more embodiments, the hg19 coordinates of the one or more target markers are as follows: PRDM16 gene: chr1:3155061:3155760; CAMK2N1 gene: chr1:20813203:20813902; TACSTD2 gene: chr1:59041615:59042314; CRABP2 gene: chr1:156676274:156676973; IER5 gene: chr1:181074539:181075238; ITPKB gene: chr1:226924700:226925399; ITGB1BP1 gene: chr2:9526804:9527503; MTHFD2 gene: chr2:74453839:74454538; BIN1 gene: chr2:127822196:127822895; DNASE1L3 gene: chr3:58153211:58153910; LSG1 gene: chr3:194408527:194409226; SH3BP2 gene: chr4:2795032:2795731; SLC12A7 gene: chr5:1117661:1118360; NR2F1 gene: chr5:92914797:92915496; EGR1 gene: chr5:137802399:137803098; LARP1 gene: chr5:154133955:154134654; RARS gene: chr5:167837780:167838479; TTBK1 gene: chr6:43215063:43215762; FAM20C gene: chr7:193512:194211; CREB5 gene: chr7:28449041:28449740; LIMK1 gene: chr7:73508743:73509442; PRKAG2 gene: chr7:151424814:151425513; SLC39A14 gene: chr8:22236914:22237613; EGR3 gene: chr8:22547976:22549090; DUSP26 gene: chr8:34104888:34105587; AGPAT2 gene: chr9:139581855:139582554; NRARP gene: chr9:140205734:140206433; EGR2 gene: chr10:64578269:64578968; PPIF gene: chr10:81001706:81002405; CHID1 gene: chr11:911289:911988; ADM gene: chr11:10328946:10329645; NAV2 gene: chr11:19734801:19736359; EHBP1L1 gene: chr11:65343387:65344086; PHLDB1 gene: chr11:118479144:118479843; PARP11 gene: chr12:4139935:4140634; ANO6 gene: chr12:45610331:45611030; PLXNC1 gene: chr12:94544076:94544775; ZNF219 gene: chr14:21559748:21560447; FOXA1 gene: chr14:38064876:38065575; PAPLN gene: chr14:73704629:73705328; UACA gene: chr15:70766881:70767580; PGPEP1L gene: chr15:99466242:99466941; ITPRIPL2 gene: chr16:19125694:19126393; TNK1 gene: chr17:7286958:7287657; RPL19 gene: chr17:37366033:37366732; ICAM2 gene: chr17:62076008:62076707; TMC6 gene: chr17:76113226:76124091; CEP295NL gene: chr17:76879761:76880460; BAIAP2 gene: chr17:79060865:79061564; TBCD gene: chr17:80744791:80745490; METRNL gene: chr17:81083812:81084511; MED16 gene: chr19:883793:884492; SBNO2 gene: chr19:1177275:1177974; CIRBP gene: chr19:1265690:1266389; KLF16 gene: chr19:1860343:1861042; C19orf77 gene: chr19:3434666:3435687; SNAPC2 gene: chr19:7985709:7986408; ICAM1 gene: chr19:10381317:10382016; ICAM5 gene: chr19:10404832:10405531; IER2 gene: chr19:13266647:13267346; ASF1B gene: chr19:14248133:14248832; CRTC1 gene: chr19:18770961:18771660; ZNF536 gene: chr19:31039247:31039946; LTBP4 gene: chr19:41105706:41106405; NOL4L-DT gene: chr20:31162101:31162800; KCNK15 gene: chr20:43374048:43374747; UCKL1 gene: chr20:62588113:62588812; RTN4R gene: chr22:20226373:20227274; BCR gene: chr22:23624092:23624791; TEF gene: chr22:41771229:41771928.
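Coordinates in this embodiment are written as `chromosome:start:end`. The sketch below shows one way such strings could be parsed and their lengths computed. The helper names are hypothetical, and the length formula assumes 1-based coordinates with both ends included, which the text does not state. Under that assumption most listed intervals are 700 bp long, while a few (for example EGR3 and TMC6) are longer.

```python
def parse_region(text):
    """Parse 'chrom:start:end' (as written in the coordinate list) into a tuple."""
    chrom, start, end = text.strip().split(":")
    start, end = int(start), int(end)
    if start > end:
        raise ValueError(f"start is after end in {text!r}")
    return chrom, start, end


def region_length(region):
    """Length in base pairs, ASSUMING 1-based coordinates with both ends included."""
    _, start, end = region
    return end - start + 1


# Three intervals copied from the coordinate list above.
regions = {
    "PRDM16": "chr1:3155061:3155760",
    "EGR3": "chr8:22547976:22549090",
    "TMC6": "chr17:76113226:76124091",
}
lengths = {gene: region_length(parse_region(text)) for gene, text in regions.items()}
```

If the coordinates were instead 0-based and half-open, each length would be one base shorter, so the convention should be confirmed before these intervals are used to extract sequence.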
In one or more embodiments, the hg19 coordinates of the one or more target markers are as follows: PRDM16 gene: chr1:3155311:3155510; CAMK2N1 gene: chr1:20813453:20813652; TACSTD2 gene: chr1:59041865:59042064; CRABP2 gene: chr1:156676524:156676723; IER5 gene: chr1:181074789:181074988; ITPKB gene: chr1:226924950:226925149; ITGB1BP1 gene: chr2:9527054:9527253; MTHFD2 gene: chr2:74454089:74454288; BIN1 gene: chr2:127822446:127822645; DNASE1L3 gene: chr3:58153461:58153660; LSG1 gene: chr3:194408777:194408976; SH3BP2 gene: chr4:2795282:2795481; SLC12A7 gene: chr5:1117911:1118110; NR2F1 gene: chr5:92915047:92915246; EGR1 gene: chr5:137802649:137802848; LARP1 gene: chr5:154134205:154134404; RARS gene: chr5:167838030:167838229; TTBK1 gene: chr6:43215313:43215512; FAM20C gene: chr7:193762:193961; CREB5 gene: chr7:28449291:28449490; LIMK1 gene: chr7:73508993:73509192; PRKAG2 gene: chr7:151425064:151425263; SLC39A14 gene: chr8:22237164:22237363; EGR3 gene: chr8:22548226:22548425; EGR3 gene: chr8:22548641:22548840; DUSP26 gene: chr8:34105138:34105337; AGPAT2 gene: chr9:139582105:139582304; NRARP gene: chr9:140205984:140206183; EGR2 gene: chr10:64578519:64578718; PPIF gene: chr10:81001956:81002155; CHID1 gene: chr11:911539:911738; ADM gene: chr11:10329196:10329395; NAV2 gene: chr11:19735051:19735250; NAV2 gene: chr11:19735910:19736109; EHBP1L1 gene: chr11:65343637:65343836; PHLDB1 gene: chr11:118479394:118479593; PARP11 gene: chr12:4140185:4140384; ANO6 gene: chr12:45610581:45610780; PLXNC1 gene: chr12:94544326:94544525; ZNF219 gene: chr14:21559998:21560197; FOXA1 gene: chr14:38065126:38065325; PAPLN gene: chr14:73704879:73705078; UACA gene: chr15:70767131:70767330; PGPEP1L gene: chr15:99466492:99466691; ITPRIPL2 gene: chr16:19125944:19126143; TNK1 gene: chr17:7287208:7287407; RPL19 gene: chr17:37366283:37366482; ICAM2 gene: chr17:62076258:62076457; TMC6 gene: chr17:76113476:76113675; TMC6 gene: chr17:76123642:76123841; CEP295NL gene: chr17:76880011:76880210; BAIAP2 gene: chr17:79061115:79061314; TBCD gene: chr17:80745041:80745240; METRNL gene: chr17:81084062:81084261; MED16 gene: chr19:884043:884242; SBNO2 gene: chr19:1177525:1177724; CIRBP gene: chr19:1265940:1266139; KLF16 gene: chr19:1860593:1860792; C19orf77 gene: chr19:3434916:3435115; C19orf77 gene: chr19:3435238:3435437; SNAPC2 gene: chr19:7985959:7986158; ICAM1 gene: chr19:10381567:10381766; ICAM5 gene: chr19:10405082:10405281; IER2 gene: chr19:13266897:13267096; ASF1B gene: chr19:14248383:14248582; CRTC1 gene: chr19:18771211:18771410; ZNF536 gene: chr19:31039497:31039696; LTBP4 gene: chr19:41105956:41106155; NOL4L-DT gene: chr20:31162351:31162550; KCNK15 gene: chr20:43374298:43374497; UCKL1 gene: chr20:62588363:62588562; RTN4R gene: chr22:20226623:20226822; RTN4R gene: chr22:20226825:20227024; BCR gene: chr22:23624342:23624541; TEF gene: chr22:41771479:41771678.
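The narrower intervals in this embodiment appear to fall inside the broader intervals listed for the same genes in the preceding embodiment, and some genes (EGR3, NAV2, TMC6, C19orf77, RTN4R) carry two narrow intervals. The text does not state this relationship explicitly, so the sketch below is only an illustrative consistency check on a few copied entries. The helper names and the `windows`/`cores` labels are hypothetical.

```python
def parse_region(text):
    """Parse 'chrom:start:end' into (chrom, start, end)."""
    chrom, start, end = text.strip().split(":")
    return chrom, int(start), int(end)


def is_within(inner, outer):
    """True when inner lies on the same chromosome and entirely inside outer."""
    return (inner[0] == outer[0]
            and outer[1] <= inner[1]
            and inner[2] <= outer[2])


# Broader intervals (preceding embodiment) and narrower intervals (this one).
windows = {
    "PRDM16": "chr1:3155061:3155760",
    "EGR3": "chr8:22547976:22549090",
    "TMC6": "chr17:76113226:76124091",
}
cores = {
    "PRDM16": ["chr1:3155311:3155510"],
    "EGR3": ["chr8:22548226:22548425", "chr8:22548641:22548840"],
    "TMC6": ["chr17:76113476:76113675", "chr17:76123642:76123841"],
}

contained = {
    gene: all(is_within(parse_region(core), parse_region(windows[gene]))
              for core in cores[gene])
    for gene in windows
}
```

A check of this kind is a quick way to catch transcription errors in coordinate lists; it says nothing about which CpG sites inside an interval are actually measured.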
In one or more embodiments, the reagents include primer and/or probe molecules; preferably, the primer molecules are identical to, complementary to, or hybridize under stringent conditions to the one or more target markers and comprise at least 9 contiguous nucleotides, and the probe molecules hybridize under stringent conditions to the amplification products of the one or more target markers.
In one or more embodiments, the reagents are those required for performing reduced-representation genome methylation sequencing. In one or more embodiments, the reagents required for performing reduced-representation genome methylation sequencing include reagents for restriction enzyme digestion, reagents for library construction (such as end repair, A-tailing, and adapter ligation), reagents for cytosine conversion, and reagents for PCR amplification. The detection reagent or diagnostic kit of the present invention may include one or more of these reagents.
A second aspect of the present invention provides a diagnostic reagent or diagnostic kit for detecting the methylation status or methylation level of at least one CpG dinucleotide of the one or more target markers described in any embodiment herein, in order to diagnose whether a thyroid nodule is benign or malignant, the diagnostic reagent or diagnostic kit comprising reagents for detecting the methylation status or level of at least one CpG dinucleotide of the one or more target markers.
In one or more embodiments, the diagnostic reagent or diagnostic kit includes primer and/or probe molecules, wherein the primer molecules are identical to, complementary to, or hybridize under stringent conditions to the one or more target markers and comprise at least 9 contiguous nucleotides; the probe molecules hybridize under stringent conditions to the amplification products of the one or more target markers; optionally, the diagnostic reagent or diagnostic kit further includes primer molecules and/or probe molecules for detecting the internal reference gene ACTB.
In one or more embodiments, the diagnostic reagent or diagnostic kit further includes one or more substances selected from the following: PCR buffer, polymerase, dNTPs, restriction endonuclease, digestion buffer, fluorescent dye, fluorescence quencher, fluorescent reporter, exonuclease, alkaline phosphatase, internal standard, control, KCl, MgCl₂, and (NH₄)₂SO₄.
In one or more embodiments, the reagents for detecting methylation further include reagents used in one or more of the following methods: bisulfite conversion-based PCR, DNA sequencing, methylation-sensitive restriction endonuclease analysis, fluorescence quantification, methylation-sensitive high-resolution melting curve analysis, chip-based methylation profiling, and mass spectrometry.
In one or more embodiments, the reagents are selected from one or more of the following: bisulfite and its derivatives, fluorescent dyes, fluorescence quenchers, fluorescent reporters, internal standards, and controls.
A third aspect of the present invention provides use of at least one reagent or set of reagents that distinguishes between methylated and unmethylated CpG dinucleotides within at least one target region of genomic DNA, in the preparation of a kit for a method of detecting and/or classifying benign and malignant thyroid nodules in an individual, wherein the method comprises contacting genomic DNA isolated from a biological sample of the individual with the at least one reagent or set of reagents, wherein the target region is identical or complementary to a sequence of at least 16 contiguous nucleotides of the one or more target markers described in any embodiment herein, and wherein the contiguous nucleotides comprise at least one CpG dinucleotide sequence, thereby at least partially providing for the detection and/or classification of benign and malignant thyroid nodules.
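The sequence criteria in the third aspect (at least 16 contiguous nucleotides, identical or complementary to a target marker, containing at least one CpG) can be checked mechanically. The following is only a minimal sketch: the function names and the toy marker sequence are invented for illustration and do not come from the application, and real probe design would also consider hybridization stringency, which this check ignores.

```python
# Toy check of the "at least 16 contiguous nucleotides containing a CpG" criterion.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_candidate_target_region(region: str, marker: str, min_len: int = 16) -> bool:
    """True if `region` is long enough, contains a CpG, and matches the marker
    either directly (identical) or via its reverse complement (complementary)."""
    region, marker = region.upper(), marker.upper()
    if len(region) < min_len:
        return False
    if "CG" not in region:  # a CpG reads as "CG" on either strand
        return False
    return region in marker or reverse_complement(region) in marker


# Hypothetical marker sequence, invented purely for illustration.
toy_marker = "TTGACGGATCCGTTAGCATGCAAGTCCGATA"
```

Because the reverse complement of "CG" is again "CG", the CpG test gives the same answer for a region and for its complementary strand.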
A fourth aspect of the present invention provides use of one or more reagents that convert cytosine bases unmethylated at the 5-position to uracil or to another base that is detectably different from cytosine in hybridization properties, an amplification enzyme, and at least one primer comprising at least 9 contiguous nucleotides, in the preparation of a kit for a method of detecting and/or classifying benign and malignant thyroid nodules in an individual, wherein the method comprises: a) isolating genomic DNA from a biological sample of the individual; b) treating the genomic DNA of a), or a fragment thereof, with the one or more reagents; c) contacting the treated genomic DNA or treated fragment thereof with the amplification enzyme and the at least one primer, the primer being identical to, complementary to, or hybridizing under stringent conditions to the one or more target markers described in any embodiment herein, wherein the treated genomic DNA or fragment thereof is either amplified to produce at least one amplification product or is not amplified; and d) determining, based on the presence, absence, or properties of the amplification product, the methylation status or level of at least one CpG dinucleotide of the one or more target markers, or a mean or other value reflecting the average methylation status or level of a plurality of CpG dinucleotides of the one or more target markers, thereby at least partially detecting and/or classifying benign and malignant thyroid nodules in the individual.
In one or more embodiments, in step b), the genomic DNA or fragment thereof is treated with a reagent selected from bisulfite, hydrogen sulfite, disulfite (pyrosulfite), and combinations thereof.
In one or more embodiments, the contacting or amplifying in c) comprises one or more of: using a thermostable DNA polymerase as the amplification enzyme; using a polymerase lacking 5'-3' exonuclease activity; using the polymerase chain reaction; and generating an amplification product carrying a detectable label.
In one or more embodiments, the contacting or amplifying in c) comprises the use of methylation-specific primers.
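Step d) of the fourth aspect refers to a value reflecting the average methylation level across several CpG dinucleotides. One common way to express this, shown here only as a sketch under stated assumptions, is the per-CpG fraction of methylated reads averaged over the CpGs of a marker. The read counts below are invented, and the application does not prescribe this particular formula.

```python
from typing import Sequence, Tuple


def cpg_methylation_level(methylated: int, unmethylated: int) -> float:
    """Fraction of reads supporting methylation at one CpG.

    After bisulfite conversion, reads showing C at the CpG are counted as
    methylated and reads showing T as unmethylated (assumed read-out).
    """
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("no reads cover this CpG")
    return methylated / total


def mean_methylation(counts: Sequence[Tuple[int, int]]) -> float:
    """Unweighted mean of per-CpG methylation levels for one marker."""
    if not counts:
        raise ValueError("marker has no covered CpG")
    levels = [cpg_methylation_level(m, u) for m, u in counts]
    return sum(levels) / len(levels)


# Hypothetical (methylated, unmethylated) read counts for three CpGs.
example_counts = [(8, 2), (5, 5), (0, 10)]
example_mean = mean_methylation(example_counts)  # (0.8 + 0.5 + 0.0) / 3
```

An unweighted mean treats every CpG equally; weighting by read depth would be an equally defensible choice.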
A fifth aspect of the present invention provides use of one or more methylation-sensitive restriction enzymes, an amplification enzyme, and at least one primer comprising at least 9 contiguous nucleotides, in the preparation of a kit for a method of detecting and/or classifying benign and malignant thyroid nodules in an individual, wherein the primer is identical to, complementary to, or hybridizes under stringent conditions to the one or more target markers described in any embodiment herein; the method comprises: a) isolating genomic DNA from a biological sample of the individual; b) digesting the genomic DNA of a), or a fragment thereof, with the one or more methylation-sensitive restriction enzymes, and contacting the resulting digestion product with the amplification enzyme and the at least one primer; and c) determining, based on the presence, absence, or properties of the amplification product, the methylation status or level of at least one CpG dinucleotide of the one or more target markers, thereby at least partially detecting and/or classifying benign and malignant thyroid nodules in the individual.
In one or more embodiments, the presence or absence of the amplification product is determined by hybridization with at least one nucleic acid or peptide nucleic acid that is identical or complementary to a fragment, at least 16 bases long, of a sequence selected from the one or more target markers.
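The logic of the restriction-enzyme route can be illustrated with a small simulation. The sketch below assumes HpaII as the methylation-sensitive enzyme (recognition sequence CCGG, cleavage blocked when the internal cytosine is methylated); the application does not name a specific enzyme, and the sequence and methylation positions are invented.

```python
from typing import List, Set

HPAII_SITE = "CCGG"  # HpaII site; methylation of the internal C blocks cleavage


def hpaii_sites(seq: str) -> List[int]:
    """0-based start positions of every CCGG site in the sequence."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == HPAII_SITE]


def amplicon_survives(seq: str, methylated_c: Set[int]) -> bool:
    """True if no HpaII site in the amplicon would be cut.

    `methylated_c` holds 0-based positions of methylated cytosines. A site is
    protected only if its internal C (offset +1) is methylated. An amplicon
    with no site always "survives" and is therefore uninformative.
    """
    return all((site + 1) in methylated_c for site in hpaii_sites(seq))


# Hypothetical amplicon with two HpaII sites (starting at positions 2 and 10).
toy_amplicon = "AACCGGTTTTCCGGAA"
```

With this toy amplicon, an amplification product is expected only when both internal cytosines (positions 3 and 11) are methylated, which is why the presence or absence of the product reports on methylation status.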
A sixth aspect of the present invention provides use of a treated nucleic acid derived from the one or more target markers described in any embodiment herein in the preparation of a kit for diagnosing whether a thyroid nodule is benign or malignant, wherein the treatment is suitable for converting at least one unmethylated cytosine base of the one or more target markers to uracil or to another base that is detectably different from cytosine in hybridization.
A seventh aspect of the present invention provides a device for detecting and diagnosing whether a thyroid nodule in an individual is benign or malignant, the device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, performs the following steps: (1) obtaining the methylation level or methylation status of at least one CpG dinucleotide of the one or more target markers described in any embodiment herein in a sample, and (2) determining whether the thyroid nodule is benign or malignant according to the methylation level or methylation status obtained in (1).
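The two steps executed by the device (obtain methylation levels, then interpret them) could be organized as below. This is a sketch only: the logistic form, the marker names, the weights, the intercept, and the 0.5 threshold are all hypothetical placeholders and are not taken from the application, which does not specify the model in this passage.

```python
import math
from typing import Dict


def malignancy_score(levels: Dict[str, float],
                     weights: Dict[str, float],
                     intercept: float) -> float:
    """Step (2), part 1: combine per-marker methylation levels into a 0..1 score
    with a logistic function (an assumed model form)."""
    z = intercept + sum(weights[name] * levels[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))


def classify(score: float, threshold: float = 0.5) -> str:
    """Step (2), part 2: call the nodule from the score (threshold is hypothetical)."""
    return "malignant" if score >= threshold else "benign"


# Step (1) would supply measured levels; these values are invented.
toy_weights = {"marker_1": 2.0, "marker_2": -1.0}
toy_levels = {"marker_1": 0.0, "marker_2": 1.0}
toy_call = classify(malignancy_score(toy_levels, toy_weights, intercept=0.0))
```

In practice the weights and threshold would be fitted on a training set and checked on independent validation sets.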
Description of the Drawings
Figure 1: ROC curves for malignant nodules, in the training set and the two validation sets, of the model constructed from the marker combination of Example 1.
Figure 2: ROC curves for diagnosing malignant nodules, in the training set and the two validation sets, of the model constructed from the marker combination of Example 2.
Figure 3: ROC curves for diagnosing malignant nodules, in the training set and the two validation sets, of the model constructed from the marker combination of Example 3.
Figure 4: ROC curves for diagnosing malignant nodules, in the training set and the two validation sets, of the model constructed from the marker combination of Example 4.
Figure 5: ROC curves for diagnosing malignant nodules, in the training set and the two validation sets, of the model constructed from the marker combination of Example 5.
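For readers unfamiliar with how an ROC curve is summarized, the area under the curve (AUC) equals the probability that a randomly chosen malignant sample receives a higher score than a randomly chosen benign one. The sketch below computes it by direct pairwise comparison; the labels and scores are invented and are not data from the Examples.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison (ties count as half a win).

    labels: 1 for malignant, 0 for benign. scores: model outputs, higher
    meaning more likely malignant.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three of the four malignant/benign pairs are ranked correctly.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise form is quadratic in the number of samples; it is chosen here for transparency, not speed.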
Detailed Description
Although this application discloses various aspects and embodiments, those skilled in the art may make equivalent changes or modifications without departing from the spirit and scope of this application. The aspects and embodiments disclosed in this application are exemplary and are not intended to limit its scope; the actual scope of protection is determined by the claims. Unless otherwise stated, all technical and scientific terms used in this application have the meanings commonly understood by those skilled in the art. All references, patents, and patent applications cited in this application are incorporated herein by reference.
It should be noted that, in the specification and claims of this application, the singular forms "a", "an", and "the" include the plural unless the context indicates otherwise. Thus, for example, "a reagent" includes a plurality of reagents.
In the specification and claims of this application, unless otherwise stated, the terms "comprising", "including", and "containing" mean that the listed values, steps, or components are included, without excluding other values, steps, or components.
After in-depth research, the inventors identified a set of target markers associated with malignant thyroid nodules. These target markers include, for each of the following genes, the gene itself or the corresponding genomic sequence: PRDM16, CAMK2N1, TACSTD2, CRABP2, IER5, ITPKB, ITGB1BP1, MTHFD2, BIN1, DNASE1L3, LSG1, SH3BP2, SLC12A7, NR2F1, EGR1, LARP1, RARS, TTBK1, FAM20C, CREB5, LIMK1, PRKAG2, SLC39A14, EGR3, DUSP26, AGPAT2, NRARP, EGR2, PPIF, CHID1, ADM, NAV2, EHBP1L1, PHLDB1, PARP11, ANO6, PLXNC1, ZNF219, FOXA1, PAPLN, UACA, PGPEP1L, ITPRIPL2, TNK1, RPL19, ICAM2, TMC6, CEP295NL, BAIAP2, TBCD, METRNL, MED16, SBNO2, CIRBP, KLF16, C19orf77, SNAPC2, ICAM1, ICAM5, IER2, ASF1B, CRTC1, ZNF536, LTBP4, NOL4L-DT, KCNK15, UCKL1, RTN4R, BCR, and TEF. Benign and malignant thyroid nodules can be distinguished by detecting the methylation level of one or more of these target markers in a DNA-containing biological sample from an individual.
I. Target markers and their target regions
As used herein, the term "target marker" refers to a nucleic acid or gene region of interest whose methylation level is indicative of whether a thyroid nodule is benign or malignant. The term "target marker" should be understood to include all transcript variants of the genes described herein and all of their promoters and regulatory elements. As those skilled in the art will appreciate, certain genes are known to exhibit allelic variation or single nucleotide polymorphisms ("SNPs") between individuals. SNPs include insertions and deletions of simple repeat sequences of varying length (e.g., dinucleotide and trinucleotide repeats). Accordingly, this application should be understood to extend to all forms of the markers/genes arising from any other mutation, polymorphism, or allelic variation. In addition, the term "target marker" should be understood to include both the sense strand sequence and the antisense strand sequence of the marker or gene.
The term "target marker" as used herein is to be interpreted broadly as including both 1) the original marker (in a particular methylation state) as found in a biological sample or genomic DNA, and 2) its treated sequence (for example, the corresponding region after bisulfite conversion or after treatment with a methylation-sensitive restriction enzyme (MSRE)). The corresponding region after bisulfite conversion differs from the target marker in the genomic sequence in that one or more unmethylated cytosine residues have been converted to uracil, thymine, or another base that differs from cytosine in hybridization behavior. The corresponding region after MSRE treatment differs from the target marker in the genomic sequence in that the sequence has been cleaved at one or more MSRE cleavage sites.
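The difference between a genomic sequence and its bisulfite-converted counterpart can be made concrete with an in-silico conversion. The following sketch is illustrative only: the sequence and the methylated positions are invented, and the conversion is idealized as complete (every unmethylated C converted, every methylated C protected).

```python
from typing import Set


def bisulfite_convert(seq: str, methylated_positions: Set[int]) -> str:
    """Idealized bisulfite conversion of one strand.

    Unmethylated C is deaminated to U, which is read as T after amplification,
    so it is written as T here. Methylated C (0-based positions given in
    `methylated_positions`) is left unchanged.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical 8-base fragment; the C at position 1 is taken to be methylated.
toy_genomic = "ACGTCCGA"
toy_converted = bisulfite_convert(toy_genomic, {1})
```

Comparing the converted sequence with the genomic one shows which cytosines were protected, which is the basis for methylation-specific primers and probes.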
Molecular diagnosis in the present invention covers not only the early diagnosis of thyroid malignancy but also its late-stage diagnosis, as well as screening, risk assessment, prognosis, and disease identification for thyroid malignancy. Early diagnosis refers to the possibility of detecting cancer before metastasis, preferably before morphological changes in tissues or cells can be observed.
It should be understood herein that the target markers involved in the products, uses, and methods described herein, namely the PRDM16, CAMK2N1, TACSTD2, CRABP2, IER5, ITPKB, ITGB1BP1, MTHFD2, BIN1, DNASE1L3, LSG1, SH3BP2, SLC12A7, NR2F1, EGR1, LARP1, RARS, TTBK1, FAM20C, CREB5, LIMK1, PRKAG2, SLC39A14, EGR3, DUSP26, AGPAT2, NRARP, EGR2, PPIF, CHID1, ADM, NAV2, EHBP1L1, PHLDB1, PARP11, ANO6, PLXNC1, ZNF219, FOXA1, PAPLN, UACA, PGPEP1L, ITPRIPL2, TNK1, RPL19, ICAM2, TMC6, CEP295NL, BAIAP2, TBCD, METRNL, MED16, SBNO2, CIRBP, KLF16, C19orf77, SNAPC2, ICAM1, ICAM5, IER2, ASF1B, CRTC1, ZNF536, LTBP4, NOL4L-DT, KCNK15, UCKL1, RTN4R, BCR, and TEF genes, can be described either by name or by chromosomal coordinates. The chromosomal coordinates are consistent with the Hg19 version of the human genome database released in February 2009 (referred to herein as "Hg19 coordinates"). It should also be understood that the sequence of a given gene or of its genomic region as described herein also includes fragments of that gene containing at least one CpG dinucleotide sequence. In some embodiments, the fragment is the target region of the respective gene described herein.
In some embodiments, the Hg19 coordinates of the genes mentioned herein are as follows: PRDM16 gene: chr1:3155061:3155760; CAMK2N1 gene: chr1:20813203:20813902; TACSTD2 gene: chr1:59041615:59042314; CRABP2 gene: chr1:156676274:156676973; IER5 gene: chr1:181074539:181075238; ITPKB gene: chr1:226924700:226925399; ITGB1BP1 gene: chr2:9526804:9527503; MTHFD2 gene: chr2:74453839:74454538; BIN1 gene: chr2:127822196:127822895; DNASE1L3 gene: chr3:58153211:58153910; LSG1 gene: chr3:194408527:194409226; SH3BP2 gene: chr4:2795032:2795731; SLC12A7 gene: chr5:1117661:1118360; NR2F1 gene: chr5:92914797:92915496; EGR1 gene: chr5:137802399:137803098; LARP1 gene: chr5:154133955:154134654; RARS gene: chr5:167837780:167838479; TTBK1 gene: chr6:43215063:43215762; FAM20C gene: chr7:193512:194211; CREB5 gene: chr7:28449041:28449740; LIMK1 gene: chr7:73508743:73509442; PRKAG2 gene: chr7:151424814:151425513; SLC39A14 gene: chr8:22236914:22237613; EGR3 gene: chr8:22547976:22549090; DUSP26 gene: chr8:34104888:34105587; AGPAT2 gene: chr9:139581855:139582554; NRARP gene: chr9:140205734:140206433; EGR2 gene: chr10:64578269:64578968; PPIF gene: chr10:81001706:81002405; CHID1 gene: chr11:911289:911988; ADM gene: chr11:10328946:10329645; NAV2 gene: chr11:19734801:19736359; EHBP1L1 gene: chr11:65343387:65344086; PHLDB1 gene: chr11:118479144:118479843; PARP11 gene: chr12:4139935:4140634; ANO6 gene: chr12:45610331:45611030; PLXNC1 gene: chr12:94544076:94544775; ZNF219 gene: chr14:21559748:21560447; FOXA1 gene: chr14:38064876:38065575; PAPLN gene: chr14:73704629:73705328; UACA gene: chr15:70766881:70767580; PGPEP1L gene: chr15:99466242:99466941; ITPRIPL2 gene: chr16:19125694:19126393; TNK1 gene: chr17:7286958:7287657; RPL19 gene: chr17:37366033:37366732; ICAM2 gene: chr17:62076008:62076707; TMC6 gene: chr17:76113226:76124091; CEP295NL gene: chr17:76879761:76880460; BAIAP2 gene: chr17:79060865:79061564; TBCD gene: chr17:80744791:80745490; METRNL gene: chr17:81083812:81084511; MED16 gene: chr19:883793:884492; SBNO2 gene: chr19:1177275:1177974; CIRBP gene: chr19:1265690:1266389; KLF16 gene: chr19:1860343:1861042; C19orf77 gene: chr19:3434666:3435687; SNAPC2 gene: chr19:7985709:7986408; ICAM1 gene: chr19:10381317:10382016; ICAM5 gene: chr19:10404832:10405531; IER2 gene: chr19:13266647:13267346; ASF1B gene: chr19:14248133:14248832; CRTC1 gene: chr19:18770961:18771660; ZNF536 gene: chr19:31039247:31039946; LTBP4 gene: chr19:41105706:41106405; NOL4L-DT gene: chr20:31162101:31162800; KCNK15 gene: chr20:43374048:43374747; UCKL1 gene: chr20:62588113:62588812; RTN4R gene: chr22:20226373:20227274; BCR gene: chr22:23624092:23624791; TEF gene: chr22:41771229:41771928.
In some embodiments, the EGR3, NAV2, TMC6, C19orf77, and RTN4R genes may each include the following two Hg19 coordinate regions:
EGR3 gene: chr8:22547976:22548675; chr8:22548391:22549090;
NAV2 gene: chr11:19734801:19735500; chr11:19735660:19736359;
TMC6 gene: chr17:76113226:76113925; chr17:76123392:76124091;
C19orf77 gene: chr19:3434666:3435365; chr19:3434988:3435687;
RTN4R gene: chr22:20226373:20227072; chr22:20226575:20227274.
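For these five genes, the single coordinate range given for each gene in the general list appears to be the span that encloses its two sub-regions. A minimal sketch of that relationship follows; the helper names are invented, and the reading of the numbers is an observation about the listed values rather than something the application states.

```python
from typing import Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end)


def bounding_span(a: Region, b: Region) -> Region:
    """Smallest single region on one chromosome that contains both inputs."""
    if a[0] != b[0]:
        raise ValueError("regions lie on different chromosomes")
    return (a[0], min(a[1], b[1]), max(a[2], b[2]))


def overlaps(a: Region, b: Region) -> bool:
    """True if the two regions share at least one position (inclusive ends assumed)."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


# The two EGR3 and the two TMC6 sub-regions as listed above.
egr3_a, egr3_b = ("chr8", 22547976, 22548675), ("chr8", 22548391, 22549090)
tmc6_a, tmc6_b = ("chr17", 76113226, 76113925), ("chr17", 76123392, 76124091)

egr3_span = bounding_span(egr3_a, egr3_b)  # matches the single EGR3 range
tmc6_span = bounding_span(tmc6_a, tmc6_b)  # matches the single TMC6 range
```

Note that the two EGR3 windows overlap, whereas the two TMC6 windows are roughly 10 kb apart, so the enclosing TMC6 span also covers sequence between the two listed windows.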
In a further preferred embodiment, the Hg19 coordinate regions of the one or more target markers described herein are as follows: PRDM16 gene: chr1:3155311:3155510; CAMK2N1 gene: chr1:20813453:20813652; TACSTD2 gene: chr1:59041865:59042064; CRABP2 gene: chr1:156676524:156676723; IER5 gene: chr1:181074789:181074988; ITPKB gene: chr1:226924950:226925149; ITGB1BP1 gene: chr2:9527054:9527253; MTHFD2 gene: chr2:74454089:74454288; BIN1 gene: chr2:127822446:127822645; DNASE1L3 gene: chr3:58153461:58153660; LSG1 gene: chr3:194408777:194408976; SH3BP2 gene: chr4:2795282:2795481; SLC12A7 gene: chr5:1117911:1118110; NR2F1 gene: chr5:92915047:92915246; EGR1 gene: chr5:137802649:137802848; LARP1 gene: chr5:154134205:154134404; RARS gene: chr5:167838030:167838229; TTBK1 gene: chr6:43215313:43215512; FAM20C gene: chr7:193762:193961; CREB5 gene: chr7:28449291:28449490; LIMK1 gene: chr7:73508993:73509192; PRKAG2 gene: chr7:151425064:151425263; SLC39A14 gene: chr8:22237164:22237363; EGR3 gene: chr8:22548226:22548425; EGR3 gene: chr8:22548641:22548840; DUSP26 gene: chr8:34105138:34105337; AGPAT2 gene: chr9:139582105:139582304; NRARP gene: chr9:140205984:140206183; EGR2 gene: chr10:64578519:64578718; PPIF gene: chr10:81001956:81002155; CHID1 gene: chr11:911539:911738; ADM gene: chr11:10329196:10329395; NAV2 gene: chr11:19735051:19735250; NAV2 gene: chr11:19735910:19736109; EHBP1L1 gene: chr11:65343637:65343836; PHLDB1 gene: chr11:118479394:118479593; PARP11 gene: chr12:4140185:4140384; ANO6 gene: chr12:45610581:45610780; PLXNC1 gene: chr12:94544326:94544525; ZNF219 gene: chr14:21559998:21560197; FOXA1 gene: chr14:38065126:38065325; PAPLN gene: chr14:73704879:73705078; UACA gene: chr15:70767131:70767330; PGPEP1L gene: chr15:99466492:99466691; ITPRIPL2 gene: chr16:19125944:19126143; TNK1 gene: chr17:7287208:7287407; RPL19 gene: chr17:37366283:37366482; ICAM2 gene: chr17:62076258:62076457; TMC6 gene: chr17:76113476:76113675; TMC6 gene: chr17:76123642:76123841; CEP295NL gene: chr17:76880011:76880210; BAIAP2 gene: chr17:79061115:79061314; TBCD gene: chr17:80745041:80745240; METRNL gene: chr17:81084062:81084261; MED16 gene: chr19:884043:884242; SBNO2 gene: chr19:1177525:1177724; CIRBP gene: chr19:1265940:1266139; KLF16 gene: chr19:1860593:1860792; C19orf77 gene: chr19:3434916:3435115; C19orf77 gene: chr19:3435238:3435437; SNAPC2 gene: chr19:7985959:7986158; ICAM1 gene: chr19:10381567:10381766; ICAM5 gene: chr19:10405082:10405281; IER2 gene: chr19:13266897:13267096; ASF1B gene: chr19:14248383:14248582; CRTC1 gene: chr19:18771211:18771410; ZNF536 gene: chr19:31039497:31039696; LTBP4 gene: chr19:41105956:41106155; NOL4L-DT gene: chr20:31162351:31162550; KCNK15 gene: chr20:43374298:43374497; UCKL1 gene: chr20:62588363:62588562; RTN4R gene: chr22:20226623:20226822; RTN4R gene: chr22:20226825:20227024; BCR gene: chr22:23624342:23624541; TEF gene: chr22:41771479:41771678.
The target markers of the present invention also include the 5 kb upstream of each start site and the 5 kb downstream of each end site of each of the above regions. The specific nucleotide sequences for the above Hg19 coordinates, together with the 5 kb upstream of each start site and the 5 kb downstream of each end site of each region, can be obtained from public databases (for example, the UCSC Genome Browser, Ensembl, and the NCBI website).
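The flanking rule above can be illustrated with a minimal Python sketch. It assumes that the `chr:start:end` strings are 1-based, inclusive coordinates, which the text does not state; the names `Region`, `parse_region`, and `extend` are hypothetical helpers introduced only for illustration.

```python
from typing import NamedTuple


class Region(NamedTuple):
    gene: str
    chrom: str
    start: int
    end: int


def parse_region(gene: str, coord: str) -> Region:
    """Parse a 'chrN:start:end' string as written in the marker list."""
    chrom, start, end = coord.split(":")
    return Region(gene, chrom, int(start), int(end))


def extend(region: Region, flank: int = 5000) -> Region:
    """Add `flank` bp upstream of the start and downstream of the end.

    The start is clamped at 1 so that the window never leaves the chromosome.
    """
    return region._replace(
        start=max(1, region.start - flank),
        end=region.end + flank,
    )


prdm16 = parse_region("PRDM16", "chr1:3155311:3155510")
# Length under the assumed 1-based inclusive convention.
print(prdm16.end - prdm16.start + 1)
print(extend(prdm16))
```

Under the inclusive assumption each listed region is 200 bp long, and the extended PRDM16 window runs from chr1:3150311 to chr1:3160510.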
The target markers of the present invention (such as a given gene and its genomic sequence, a fragment of each gene containing at least one CpG dinucleotide, or a sequence comprising an intergenic region) also include the corresponding regions obtained after non-enzymatic conversion (such as bisulfite conversion) and the corresponding regions obtained after enzymatic conversion (such as MSRE conversion).
In some embodiments, the target markers of the present invention also include variants of each of the above genes. Variants include nucleic acid sequences from the same region that have at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to a gene or region described herein (that is, sequences having one or more deletions, insertions, substitutions, inverted sequences, and the like). Accordingly, this application should be understood to extend to such variants that achieve the same result, notwithstanding minor genetic variation in the actual nucleic acid sequence between individuals.
As used herein, the term "percent (%) sequence identity" refers to the percentage of amino acid (or nucleic acid) residues in a candidate sequence that are identical to the amino acid (or nucleic acid) residues in a reference sequence after the two sequences are aligned, with gaps introduced where necessary to maximize the number of identical amino acids (or nucleotides). In other words, the percent (%) sequence identity of an amino acid sequence (or nucleic acid sequence) can be calculated by dividing the number of amino acid residues (or bases) identical to the reference sequence by the total number of amino acid residues (or bases) in the candidate sequence or the reference sequence, whichever is shorter. Conservative substitutions of amino acid residues may or may not be regarded as identical residues. Percent amino acid (or nucleic acid) sequence identity can be determined, for example, using publicly available tools such as BLASTN and BLASTp (available on the website of the U.S. National Center for Biotechnology Information (NCBI); see also Altschul S.F. et al., J. Mol. Biol., 215:403–410 (1990); Stephen F. et al., Nucleic Acids Res., 25:3389–3402 (1997)), ClustalW2 (available on the website of the European Bioinformatics Institute; see also Higgins D.G. et al., Methods in Enzymology, 266:383-402 (1996); Larkin M.A. et al., Bioinformatics (Oxford, England), 23(21):2947-8 (2007)), and ALIGN or Megalign (DNASTAR) software. Those skilled in the art can use the default parameters provided by the tool, or can customize parameters suited to the alignment (for example, by selecting an appropriate algorithm).
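The calculation described above (identical positions divided by the length of the shorter sequence) can be sketched as follows. This is only an illustration that assumes the two sequences have already been aligned by an external tool and use `-` for gaps; `percent_identity` is a hypothetical helper, not a function of BLAST or ClustalW2.

```python
def percent_identity(aligned_candidate: str, aligned_reference: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Identical, non-gap columns are counted and divided by the ungapped
    length of the shorter of the two sequences.
    """
    if len(aligned_candidate) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        1
        for a, b in zip(aligned_candidate.upper(), aligned_reference.upper())
        if a == b and a != "-"
    )
    shorter = min(
        len(aligned_candidate.replace("-", "")),
        len(aligned_reference.replace("-", "")),
    )
    if shorter == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * matches / shorter


# 8 identical columns; the candidate (9 bases) is the shorter sequence.
identity = percent_identity("ACGT-ACGTA", "ACGTTACGAA")
print(round(identity, 2))
```

In this toy alignment the candidate would fall just short of a 90% identity cutoff, because one gap and one mismatch are counted against a length of nine bases.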
The target markers of the present invention also include the corresponding regions of the 5 kb upstream of the start site and the 5 kb downstream of the end site of the above genes after non-enzymatic conversion (such as bisulfite conversion) or after enzymatic treatment (such as treatment with a methylation-sensitive restriction enzyme).
II. Source and preparation of target markers
Herein, the target markers may come from a biological sample of any individual of interest. As used herein, the term "individual" includes humans and non-human animals. Non-human animals include all vertebrates, for example mammals and non-mammals. An "individual" may also be a livestock animal, such as cattle, pigs, sheep, poultry, and horses; a rodent, such as a rat or mouse; a non-human primate, such as an ape, monkey, or rhesus monkey; or a domesticated animal, such as a dog or cat. In some embodiments, the individual is a human or a non-human primate. In some embodiments, the individual is a human. In this application, "individual", "subject", and "test subject" are used interchangeably.
It should be understood that the sequences given in Section I above are human sequences. For sequences of non-human animals, the corresponding positions and sequences of the above genes in the non-human animal genome can readily be determined using existing techniques.
As used herein, the term "biological sample" refers to a biological composition obtained or derived from an individual that contains cells and/or other molecular entities (for example, DNA) to be characterized or identified on the basis of physical, biochemical, chemical, and/or physiological characteristics. Biological samples include, but are not limited to, cells, tissues, organs, and/or biological fluids of an individual obtained by any method known to those skilled in the art. In some embodiments, the biological sample is selected from the group consisting of histological sections, tissue biopsies, paraffin-embedded tissue, body fluids, surgical resection specimens, isolated blood cells, cells isolated from blood, and any combination thereof. In some embodiments, the body fluid is selected from the group consisting of whole blood, serum, plasma, and any combination thereof. The choice of the most suitable sample depends on the nature of the situation. In some embodiments, the biological sample is whole blood of the individual. In some embodiments, the biological sample is plasma of the individual. Those skilled in the art know various methods of preparing plasma from whole blood. For example, in some embodiments, plasma is obtained by centrifuging whole blood from the individual one, two, three, four, five, or more times. In some embodiments, the biological sample is a thyroid nodule biopsy, preferably a fine needle aspiration biopsy.
The DNA to be detected can be isolated from the biological sample. It can be isolated and purified from the biological sample by various methods known in the art, and commercially available kits can be used for isolation and purification. For example, DNA is isolated from cells and tissues by lysing the starting material under highly denaturing and reducing conditions, in part with protein-degrading enzymes, purifying the nucleic acid fraction obtained by phenol/chloroform extraction, and recovering the nucleic acid from the aqueous phase by dialysis or ethanol precipitation (see, for example, Sambrook, J., Fritsch, E.F. in T. Maniatis, CSH, Molecular Cloning, 1989). As another example, many reagent systems are now available that are particularly suitable for purifying DNA fragments from agarose gels, isolating plasmid DNA from bacterial lysates, and isolating longer-chain nucleic acids (genomic DNA, total cellular RNA) from blood, tissue, or cell cultures. Many of these commercially available purification systems are based on the well-known principle of binding nucleic acids to a mineral support in the presence of solutions of different chaotropic salts. In these systems, suspensions of finely ground glass powder, diatomaceous earth, or silica gel are used as the support material. Some other methods of isolating and purifying DNA from biological samples are described in, for example, US7888006B2 and EP1626085A1. The choice between methods is influenced by several factors, including time, cost, and the amount of DNA required.
In some embodiments, the DNA contained in the biological sample includes genomic DNA. As used herein, the term "genomic DNA" refers to DNA comprising the complete genome of a cell or organism, and fragments or portions thereof. Genomic DNA is a large piece of DNA derived from an individual (for example, longer than about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, or 300 kb) and may carry natural modifications, such as DNA methylation.
In some embodiments, the DNA contained in the biological sample includes cellular DNA. As used herein, the term "cellular DNA" refers to DNA present within a cell, or DNA obtained from a cell in vivo and isolated in vitro or otherwise manipulated in vitro, so long as the DNA was not removed from the cell in vivo.
In some embodiments, the DNA contained in the biological sample includes cell-free DNA. As used herein, the term "cell-free DNA" refers to DNA fragments present outside cells in vivo. The term may also be used to refer to DNA fragments obtained from an extracellular source in vivo and isolated or manipulated in vitro. DNA fragments in cell-free DNA are typically about 100 to 200 bp long, presumably reflecting the length of the DNA wrapped around a nucleosome. Cell-free DNA (cfDNA) includes, for example, cell-free fetal DNA and circulating tumor DNA. Cell-free fetal DNA circulates in the body (for example, the blood) of a pregnant woman and represents the fetal genome, whereas circulating tumor DNA circulates in the body (for example, the blood) of a cancer patient. In some embodiments, the cell-free DNA may be substantially free of the individual's cellular DNA. For example, the cell-free DNA may contain less than about 1,000 ng/mL, less than about 100 ng/mL, less than about 10 ng/mL, or less than about 1 ng/mL of cellular DNA.
Cell-free DNA can be prepared using conventional techniques known in the art. For example, cell-free DNA can be obtained from a blood sample by centrifuging the sample at about 200-20,000 g, about 200-10,000 g, about 200-5,000 g, about 300-4,000 g, or the like, for about 3-30 minutes, about 3-15 minutes, about 3-10 minutes, or about 3-5 minutes. For example, in some embodiments, cell-free DNA can be obtained from a blood sample by centrifuging the individual's plasma or serum one, two, three, four, five, or more times. In some embodiments, the biological sample may be obtained by microfiltration in order to separate cells and their fragments from the cell-free fraction containing soluble DNA. Typically, microfiltration is performed with a filter, for example a 0.1 to 0.45 micron membrane filter, such as a 0.22 micron membrane filter.
In some embodiments, cell-free DNA is extracted from whole blood, serum, or plasma for analysis using a commercially available DNA extraction product. Such extraction methods are reported to give high recovery of circulating DNA (>50%), and some products (for example, the QIAamp Circulating Nucleic Acid Kit from Qiagen) are reported to extract small DNA fragments. The typical sample volume used is 1-5 mL of serum or plasma.
In some embodiments, the cell-free DNA includes circulating tumor DNA. Circulating tumor DNA ("ctDNA") is tumor-derived, fragmented DNA that is not associated with cells and is found in body fluids (for example, blood, urine, saliva, sputum, feces, pleural fluid, cerebrospinal fluid, and the like). ctDNA is typically highly fragmented, with an average length of about 150 base pairs. ctDNA usually makes up a very small fraction of the cell-free DNA in a body fluid such as plasma; for example, ctDNA may constitute less than about 10% of plasma DNA. Typically, this percentage is less than about 1%, for example less than about 0.5% or less than about 0.01%. In addition, the total amount of plasma DNA is usually very low, for example about 10 ng/mL of plasma. The amount of ctDNA varies from person to person and depends on the type and location of the tumor and, for cancerous tumors, on the stage of the cancer. ctDNA is nevertheless usually very rare in body fluids and can only be detected with extremely sensitive and specific techniques. Detecting ctDNA may help to detect and diagnose tumors, guide tumor-specific treatment, monitor treatment, and monitor remission of cancer.
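To give a feel for the quantities mentioned above, the following rough Python sketch estimates how much ctDNA a plasma draw might contain. The plasma volume and tumor fraction are illustrative inputs, and the value of about 3.3 pg per haploid human genome is an assumption that does not come from this document; `ctdna_estimate` is a hypothetical helper.

```python
# Assumed approximate mass of one haploid human genome, in picograms.
HAPLOID_GENOME_PG = 3.3


def ctdna_estimate(plasma_ml: float, cfdna_ng_per_ml: float, tumor_fraction: float):
    """Return (ctDNA mass in ng, approximate haploid genome equivalents)."""
    total_ng = plasma_ml * cfdna_ng_per_ml
    ctdna_ng = total_ng * tumor_fraction
    genome_equivalents = ctdna_ng * 1000.0 / HAPLOID_GENOME_PG
    return ctdna_ng, genome_equivalents


# 4 mL of plasma at about 10 ng/mL cfDNA, with 1% of it tumor-derived.
ctdna_ng, genomes = ctdna_estimate(4.0, 10.0, 0.01)
print(f"{ctdna_ng:.2f} ng ctDNA, roughly {genomes:.0f} genome equivalents")
```

Under these assumptions only on the order of a hundred tumor-derived copies of any given locus are present in the tube, which is consistent with the statement that highly sensitive and specific techniques are needed.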
III. Base conversion
Herein, DNA methylation is the biological process by which a methyl group is added to a DNA molecule (for example, to one or more cytosine bases of the DNA molecule), for example through the action of a DNA methyltransferase. In mammals, DNA methylation occurs at the 5' cytosine of cytosine-phosphate-guanine (CpG) dinucleotides ("CpG sites"); when it occurs at 5'-CpG-3' dinucleotides in the promoter or first exon of a gene, it leads to epigenetic inactivation of the gene. It is well established that DNA methylation plays an important role in regulating gene expression, in tumorigenesis, and in other genetic and epigenetic diseases.
As used herein, the term "methylated cytosine residue" refers to a derivative of a cytosine residue in which a methyl group is attached to a carbon atom of the cytosine ring (for example, C5). The term "unmethylated cytosine residue" refers to an underivatized cytosine residue in which, in contrast to a "methylated cytosine residue", no methyl group is attached to a carbon atom of the cytosine ring (for example, C5). A CpG site whose cytosine residue is methylated is a methylated CpG site, and a CpG site whose cytosine residue is not methylated is an unmethylated CpG site.
As described herein, conversion can occur between bases of DNA or RNA. As used herein, "conversion", "cytosine conversion", or "CT conversion" is the process of treating DNA by a non-enzymatic or enzymatic method to convert unmodified cytosine bases (C) into bases that do not pair with guanine (G), for example uracil bases (U). Some reagents can distinguish unmethylated from methylated CpG sites in DNA, thereby yielding treated DNA. Such a reagent may act selectively on unmethylated cytosine residues without acting significantly on methylated cytosine residues, or may act selectively on methylated cytosine residues without acting significantly on unmethylated cytosine residues. For example, some reagents selectively convert unmethylated cytosine residues to uracil, thymine, or another base that differs from cytosine in hybridization behavior, while methylated cytosine residues remain unconverted; as another example, some reagents selectively cleave methylated residues, or selectively cleave unmethylated residues. The original DNA is thus converted into treated DNA in a manner that depends on whether it is methylated, so that the treated DNA can be distinguished from the original DNA by its hybridization behavior.
As used herein, "treated DNA", "treated sequence", and "treated fragment" refer to DNA, a nucleic acid sequence, or a gene fragment that has been treated with a reagent capable of distinguishing unmethylated from methylated CpG sites in the DNA, nucleic acid sequence, or gene fragment.
More specifically, cytosine conversion can be carried out by non-enzymatic or enzymatic methods. Exemplary non-enzymatic methods include bisulfite or bisulfate treatment. In some embodiments, the reagents used in the non-enzymatic method include a bisulfite reagent. As used herein, the term "bisulfite reagent" refers to a reagent that comprises bisulfite, bisulfite ions, or any combination thereof and that can be used, for example as disclosed in this application, to distinguish methylated from unmethylated CpG dinucleotide sequences. In this application, treating DNA with a bisulfite reagent is also described as a "bisulfite reaction" or "bisulfite treatment" and refers to a reaction that converts unmethylated cytosine residues; in particular, in the presence of bisulfite ions, unmethylated cytosine residues in a nucleic acid are converted to uracil bases, thymine bases, or other bases that differ from cytosine in hybridization behavior, whereas methylated cytosine residues are not significantly converted. In other words, bisulfite treatment can be used to distinguish methylated CpG dinucleotides from unmethylated CpG dinucleotides. The bisulfite reaction for detecting methylated cytosine residues is described in detail in Frommer, M., et al., Proc Natl Acad Sci USA 89 (1992) 1827-31 and Grigg, G., Clark, S., Bioessays 16 (1994) 431-6. The bisulfite reaction comprises a deamination step and a desulfonation step (see Grigg and Clark, supra). The statement that "methylated cytosine residues are not significantly converted" does not exclude the conversion of a very small percentage (for example, less than 0.1%, less than 0.2%, less than 0.3%, less than 0.4%, less than 0.5%, less than 0.6%, less than 0.7%, less than 0.8%, less than 0.9%, less than 1%, less than 2%, less than 3%, less than 4%, less than 5%, less than 6%, less than 7%, less than 8%, less than 9%, less than 10%, less than 11%, less than 12%, less than 13%, less than 14%, less than 15%, less than 16%, less than 17%, less than 18%, less than 19%, or less than 20%) of methylated cytosine residues to uracil, thymine, or other bases that differ from cytosine in hybridization behavior, even though the intent is to convert only unmethylated cytosine residues.
With reference to, for example, Frommer M., et al. (supra) or Grigg and Clark (supra), which disclose the basic parameters of bisulfite treatment, those skilled in the art know how to carry out bisulfite treatment, in particular the deamination step and the desulfonation step. The influence of incubation time and temperature on deamination efficiency, and the parameters that affect DNA degradation, have been published.
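The outcome of bisulfite treatment can be mimicked in silico. The sketch below is an idealized model under stated assumptions: it considers a single strand, assumes complete conversion of unmethylated cytosines and no conversion of methylated ones, and reports the sequence as it would be read after amplification, where uracil appears as T. The function name `bisulfite_convert` and the 0-based position convention are hypothetical.

```python
from typing import Iterable


def bisulfite_convert(seq: str, methylated_positions: Iterable[int]) -> str:
    """Idealized bisulfite conversion of one strand.

    Unmethylated C becomes T (uracil read as T after amplification);
    C at a 0-based position listed in `methylated_positions` stays C.
    """
    methylated = set(methylated_positions)
    converted = []
    for index, base in enumerate(seq.upper()):
        if base == "C" and index not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


sequence = "ACGTCCGA"
# Only the cytosine at index 1 (part of a CpG) is methylated.
partly_methylated = bisulfite_convert(sequence, {1})
unmethylated = bisulfite_convert(sequence, set())
print(partly_methylated)
print(unmethylated)
```

The two outputs differ only at the protected position, which is the sequence difference that downstream hybridization, PCR, or sequencing assays exploit.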
In some embodiments, the bisulfite reagent is selected from the group consisting of ammonium bisulfite, sodium bisulfite, potassium bisulfite, calcium bisulfite, magnesium bisulfite, aluminum bisulfite, bisulfite ions, and any combination thereof. In some embodiments, the bisulfite reagent is sodium bisulfite. In some embodiments, the bisulfite reagent is commercially available, for example, the MethylCode™ Bisulfite Conversion Kit, EpiMark™ Bisulfite Conversion Kit, EpiJET™ Bisulfite Conversion Kit, EZ DNA Methylation-Gold™ Kit, and the like. In some embodiments, the bisulfite reaction is carried out according to the kit's instructions for use.
Exemplary enzymatic methods include deaminase treatment, and the use of a reagent that selectively cleaves unmethylated residues but not methylated residues, or that selectively cleaves methylated residues but not unmethylated residues. Preferably, the reagent is a methylation-sensitive restriction enzyme (MSRE).
The term "methylation-sensitive restriction enzyme" refers to an enzyme that selectively digests nucleic acid depending on the methylation status of its recognition site. For a restriction enzyme that cleaves specifically only when its recognition site is unmethylated or hemimethylated, cleavage does not occur, or occurs with markedly reduced efficiency, when the recognition site is methylated. For a restriction enzyme that cleaves specifically only when its recognition site is methylated, cleavage does not occur, or occurs with markedly reduced efficiency, when the recognition site is unmethylated. In some embodiments, the recognition sequence of the methylation-sensitive restriction enzyme contains a CG dinucleotide (for example, cgcg or cccggg). In some embodiments, the methylation-sensitive restriction enzyme does not cleave when the cytosine in that CG dinucleotide is methylated at the C5 carbon atom.
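The behavior of an enzyme that cuts only unmethylated sites can be sketched with HpaII, which recognizes CCGG and is blocked when the internal cytosine (the C of the CG dinucleotide) is methylated. The model below is a simplification under stated assumptions: one strand only, complete digestion, and a cut between the two cytosines; `hpaii_digest` is a hypothetical helper, not part of any library.

```python
from typing import Iterable, List


def hpaii_digest(seq: str, methylated_positions: Iterable[int]) -> List[str]:
    """Simulate HpaII digestion (C^CGG) of one strand.

    A site is cut only if its internal cytosine (0-based index of the
    second C in CCGG) is not listed in `methylated_positions`.
    """
    seq = seq.upper()
    methylated = set(methylated_positions)
    fragments = []
    last_cut = 0
    pos = seq.find("CCGG")
    while pos != -1:
        internal_c = pos + 1
        if internal_c not in methylated:
            fragments.append(seq[last_cut:internal_c])
            last_cut = internal_c
        pos = seq.find("CCGG", pos + 1)
    fragments.append(seq[last_cut:])
    return fragments


template = "AACCGGTTCCGGAA"  # CCGG sites start at indices 2 and 8
print(hpaii_digest(template, set()))   # both sites cut
print(hpaii_digest(template, {3}))     # first site protected
print(hpaii_digest(template, {3, 9}))  # both sites protected, template intact
```

Only the fully methylated template survives digestion intact, so an amplicon spanning both sites would amplify only from methylated molecules in this model.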
Exemplary MSREs are selected from the group consisting of HpaII, SalI, the enzyme shown in Figure PCTCN2022137459-appb-000002, ScrFI, BbeI, NotI, SmaI, XmaI, MboI, BstBI, ClaI, MluI, NaeI, NarI, PvuI, SacII, HhaI, and any combination thereof.
Methylation is determined by methods known in the art, such as, but not limited to, differential methylation hybridization ("DMH"), using a methylation-sensitive restriction enzyme, or a series of restriction enzyme reagents comprising a methylation-sensitive restriction enzyme, that distinguishes methylated from unmethylated CpG dinucleotides within the target region.
In some embodiments, the DNA in the biological sample may be cleaved before treatment with the methylation-sensitive restriction enzyme. Such methods are known in the art and may include both physical and enzymatic means. It is particularly preferred to use one or more restriction enzymes that are insensitive to methylation and whose recognition sites are AT-rich and contain no CG dinucleotide. Using such enzymes preserves the CpG sites and CpG-rich regions in the DNA fragments. In some embodiments, such restriction enzymes are selected from MseI, BfaI, Csp6I15, Tru1I, Tru9I, MaeI, XspI, and any combination thereof.
The converted DNA is optionally purified. DNA purification methods suitable for use herein are well known in the art.
IV. Quantitative analysis
The methylation status or methylation level of at least one CpG dinucleotide in any 1, any 2, any 3, any 4, any 5, any 6, any 7, any 8, any 9, any 10, any 11, any 12, any 13, any 14, any 15, any 16, any 17, any 18, any 19, or any 20 or more of the target markers described herein can be detected in order to distinguish benign from malignant thyroid nodules. The detection reagents and diagnostic kits of the present invention can be used to detect this methylation status or methylation level.
Herein, "benign" and "malignant" describe the nature of a thyroid nodule. Benign nodules typically grow slowly, have a uniform texture, good mobility, and a smooth surface, show cystic change, and lack lymph node enlargement and calcification. Malignancy manifests as uncontrolled growth, spread, and tissue infiltration of malignant cells. Ultrasound signs suggesting that a thyroid nodule is malignant include a nodule that is taller than it is wide, absence of a halo, microcalcification, irregular margins, hypoechogenicity, a solid nodule, and abundant blood flow within the nodule. In some embodiments, malignant thyroid nodules include thyroid cancer.
Herein, "methylation status" refers to the presence or absence of one or more methylated nucleotide bases in a nucleic acid molecule. For example, a nucleic acid molecule containing a methylated cytosine is considered methylated (that is, the methylation status of the nucleic acid molecule is methylated). A nucleic acid molecule that does not contain any methylated nucleotides is considered unmethylated. In some embodiments, a nucleic acid may be characterized as "unmethylated" if it is not methylated at a particular locus (for example, the locus of a particular single CpG dinucleotide) or at a particular combination of loci, even if it is methylated at other loci of the same gene or molecule.
Methylation status therefore describes the state of methylation of a nucleic acid (for example, a genomic sequence or a target marker described herein). In addition, methylation status refers to the methylation-related characteristics of a nucleic acid segment at a particular genomic locus. Such characteristics include, but are not limited to, whether any cytosine (C) residue within the DNA sequence is methylated, the position of one or more methylated C residues, the frequency or percentage of methylated C throughout any particular region of the nucleic acid, and allelic differences in methylation arising from, for example, differences in the origin of the alleles. "Methylation status" also refers to the relative concentration, absolute concentration, or pattern of methylated C or unmethylated C throughout any particular region of a nucleic acid in a biological sample. For example, if one or more cytosine (C) residues within a nucleic acid sequence are methylated, the sequence may be called "hypermethylated" or described as having "increased methylation", whereas if one or more cytosine (C) residues within a DNA sequence are unmethylated, the sequence may be called "demethylated" or described as having "reduced methylation". Likewise, if one or more cytosine (C) residues within a nucleic acid sequence are methylated in comparison with another nucleic acid sequence (for example, from a different region or from a different individual), that sequence is considered hypermethylated, or as having increased methylation, relative to the other nucleic acid sequence. Alternatively, if one or more cytosine (C) residues within a DNA sequence are unmethylated in comparison with another nucleic acid sequence (for example, from a different region or from a different individual), that sequence is considered demethylated, or as having reduced methylation, relative to the other nucleic acid sequence.
Herein, the methylation level represents the proportion (or percentage, fraction, ratio, or degree) of one or more sites that are in the methylated state. The methylation level of a region (or a group of sites) is the mean of the methylation levels of all sites in the region (or all sites in the group). Therefore, an increase or decrease in the methylation level of a region does not mean that the methylation level of every methylation site in the region increases or decreases. The process of converting the results of a DNA methylation detection method (e.g., reduced-representation methylation sequencing) into methylation levels is known in the art. The methylation level can be determined, for example, by quantifying the amount of intact DNA remaining after restriction digestion with a methylation-sensitive restriction enzyme. In this example, if quantitative PCR is used to quantify a specific sequence in the DNA, an amount of template DNA approximately equal to that of the mock-treated control indicates that the sequence is highly methylated, whereas an amount of template significantly lower than that in the mock-treated sample indicates the presence of unmethylated DNA at the sequence, because a methylation-sensitive enzyme cuts only unmethylated recognition sites. Accordingly, a methylation level as in the above example can serve as a quantitative indicator of methylation status. This is especially useful when the methylation level of a sequence in a sample needs to be compared with a threshold level.
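The region-level definition above can be illustrated with a minimal Python sketch. It assumes that each site's methylation level is estimated as the fraction of methylated reads at that site; the read counts and the function names `site_level` and `region_level` are hypothetical and are not taken from the application.

```python
from statistics import mean


def site_level(methylated_reads: int, total_reads: int) -> float:
    """Methylation level of one site: fraction of reads that are methylated."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return methylated_reads / total_reads


def region_level(site_counts) -> float:
    """Methylation level of a region: mean of the per-site levels."""
    return mean(site_level(m, t) for m, t in site_counts)


# Hypothetical (methylated reads, total reads) for three CpG sites in one region.
counts = [(9, 10), (2, 10), (4, 10)]
print(round(region_level(counts), 3))  # 0.5
```

Because the region value is a mean, it can rise even when the level of an individual site within the region falls.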
In one or more embodiments, the methylation level of a target marker (e.g., a Ct value) is increased or decreased when compared with a reference level. When the methylation marker level (e.g., the Ct value) meets a certain threshold, the thyroid nodule is identified as malignant. Alternatively, the methylation levels of the target markers may be analyzed mathematically to obtain a score. For a tested sample, when the score is greater than or less than the threshold, the result is judged positive, i.e., the thyroid nodule is malignant. Conventional methods of mathematical analysis and procedures for determining thresholds are known in the art; an exemplary method is a support vector machine (SVM) model. For example, for differentially methylated markers, an SVM is built on the training-set samples, the model is used to compute the accuracy, sensitivity, and specificity of the detection results and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve of the predicted values, and prediction scores are computed for the test-set samples.
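As a rough illustration of how a score threshold and the reported performance measures relate, the following Python sketch computes sensitivity, specificity, and AUC from scores that are assumed to have been produced already by some model (for example an SVM). The scores, labels, threshold of 0.5, and the convention that a higher score means malignant are all hypothetical assumptions, not values from the application.

```python
def classify(score: float, threshold: float) -> str:
    """Call a nodule malignant when its score exceeds the threshold (assumed direction)."""
    return "malignant" if score > threshold else "benign"


def sensitivity_specificity(scores, labels, threshold):
    """labels: 1 = malignant, 0 = benign. Returns (sensitivity, specificity)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """AUC as the fraction of (malignant, benign) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test-set scores and true labels.
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(sensitivity_specificity(scores, labels, 0.5))
print(auc(scores, labels))
```

The pairwise AUC shown here is equivalent to the area under the ROC curve and avoids any dependency beyond the standard library.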
The methylation level or status of one or more CpG dinucleotide sequences within a DNA sequence (e.g., a target marker) can be determined by various analytical methods known in the art, preferably quantitative analytical methods. Exemplary analytical methods include: polymerase chain reaction, including real-time polymerase chain reaction, digital polymerase chain reaction, and bisulfite conversion-based PCR (e.g., methylation-specific PCR (MSP)); nucleic acid sequencing; whole-genome methylation sequencing; reduced-representation methylation sequencing (RRBS); mass-based separation (e.g., electrophoresis, mass spectrometry); target capture (e.g., hybridization, microarray); methylation-sensitive restriction endonuclease analysis; methylation-sensitive high-resolution melting; chip-based methylation profiling; mass spectrometry; and fluorescence quantification. Herein, detection includes detection of either strand at a gene or site.
In some embodiments, quantitative analysis is performed by real-time PCR. Non-limiting examples of real-time PCR include HeavyMethyl™ PCR as described by Cottrell et al., Nucl. Acids Res. 32:e10, 2003; MethyLight™ PCR as described by Eads et al., Cancer Res. 59:2302-2306, 1999; and Headloop PCR as described by Rand et al., Nucl. Acids Res. 33:e127, 2005.
As used herein, the term "HeavyMethyl™ PCR" refers to an art-recognized real-time PCR technique in which one or more non-extendable nucleic acid (e.g., oligonucleotide) blockers bind to bisulfite-treated nucleic acid in a methylation-specific manner (i.e., the blockers bind specifically to unmutated DNA under conditions of moderate to high stringency). The amplification reaction is carried out using one or more primers that may optionally be methylation-specific and that flank the one or more blockers. In the presence of unmethylated nucleic acid (i.e., mutated DNA), the blockers bind and no PCR product is produced. The methylation level of the nucleic acid in the sample is determined using a TaqMan™ assay essentially as described, for example, by Holland et al., Proc. Natl. Acad. Sci. USA, 88:7276-7280, 1991.
As used herein, the term "MethyLight™ PCR" refers to an art-recognized fluorescence-based real-time PCR technique that uses a dual-labeled fluorescent oligonucleotide probe, called a TaqMan™ probe, designed to hybridize to a CpG-rich sequence located between the forward and reverse amplification primers. The TaqMan™ probe comprises a fluorescent "reporter moiety" and a "quencher moiety" covalently bound to linker moieties (e.g., phosphoramidites) attached to nucleotides of the TaqMan™ oligonucleotide. During PCR amplification, the TaqMan™ probe hybridized to the CpG-rich sequence is cleaved by the 5' nuclease activity of Taq polymerase, producing a signal that is detected in real time during the PCR reaction. In this method, a molecular beacon may also be used as the detectable probe, and such a system does not depend on the 5'-3' exonuclease activity of the DNA polymerase used (see Mhlanga and Malmberg, Methods 25:463-471, 2001).
As used herein, the term "Headloop PCR" refers to an art-recognized form of real-time PCR that selectively amplifies a target nucleic acid while suppressing amplification of non-target variants by extending a 3' stem-loop to form a hairpin structure that can no longer serve as a template for amplification.
In some embodiments, the real-time PCR is multiplex real-time PCR. As used herein, the term "multiplex" may refer to an assay or other analytical method in which the presence and/or amount of multiple markers (e.g., multiple nucleic acid sequences) can be determined simultaneously by using more than one marker, each having at least one distinct detection characteristic, such as a fluorescence characteristic (e.g., excitation wavelength, emission wavelength, emission intensity, FWHM (full width at half maximum), or fluorescence lifetime) or a unique nucleic acid or protein sequence characteristic.
In some embodiments, quantitative analysis is performed by nucleic acid sequencing. Exemplary methods of nucleic acid sequencing are known in the art; see, e.g., Frommer et al., Proc. Natl. Acad. Sci. USA 89:1827-1831, 1992; Clark et al., Nucl. Acids Res. 22:2990-2997, 1994. For example, comparing the sequence obtained from a sample not treated with bisulfite, or the known nucleotide sequence of the target region, with the sequence obtained from a bisulfite-treated sample helps to identify methylated cytosines in the DNA sequence. A thymine residue detected at a cytosine position in the bisulfite-treated sample, as compared with the untreated sample, can be considered a mutation caused by the bisulfite treatment, i.e., the cytosine at that position was unmethylated, whereas a cytosine that is retained at a position after bisulfite treatment indicates a methylated cytosine at that position.
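The comparison described above can be sketched in Python. The sketch assumes standard bisulfite chemistry, in which an unmethylated cytosine is read as thymine after conversion and amplification while a methylated cytosine is still read as cytosine, and it assumes the read is already aligned to the reference on the same strand with no gaps. The sequences and the function name `call_cytosines` are hypothetical.

```python
def call_cytosines(reference: str, bisulfite_read: str) -> dict:
    """Classify each reference cytosine from an aligned bisulfite-treated read.

    Returns {position: call}; positions are 0-based indices into the reference.
    """
    if len(reference) != len(bisulfite_read):
        raise ValueError("reference and read must be aligned to the same length")
    calls = {}
    for i, (ref, obs) in enumerate(zip(reference.upper(), bisulfite_read.upper())):
        if ref != "C":
            continue  # only cytosine positions are informative
        if obs == "C":
            calls[i] = "methylated"      # protected from conversion
        elif obs == "T":
            calls[i] = "unmethylated"    # converted C -> U, read as T
        else:
            calls[i] = "ambiguous"       # sequencing error or true variant
    return calls


# Hypothetical untreated reference and bisulfite-treated read.
print(call_cytosines("ACGTCGAC", "ACGTTGAT"))
```

Averaging such calls over many reads at one position gives the per-site methylation level.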
Methods for sequencing DNA are known in the art and include, for example, the dideoxy chain termination method or the Maxam-Gilbert method (see Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd Ed., CSHP, New York 1989)), pyrosequencing (see Uhlmann et al., Electrophoresis, 23:4072-4079, 2002), solid-phase pyrosequencing (see Landegren et al., Genome Res., 8(8):769-776, 1998), solid-phase minisequencing (see, e.g., Southern et al., Genomics, 13:1008-1017, 1992), minisequencing with FRET (see, e.g., Chen and Kwok, Nucleic Acids Res. 25:347-353, 1997), sequencing by ligation, or ultra-deep sequencing (see Margulies et al., Nature 437(7057):376-80 (2005)).
In some embodiments, quantitative analysis is performed by mass-based separation (e.g., electrophoresis, mass spectrometry). For example, the presence of methylated cytosine residues can be detected by combined bisulfite restriction analysis (COBRA), essentially as described by Xiong and Laird, Nucl. Acids Res., 25:2532-2534, 2001. This method exploits the difference in restriction enzyme recognition sites between methylated and unmethylated nucleic acids after treatment with a compound that selectively mutates unmethylated cytosine residues (e.g., bisulfite). For example, the restriction endonuclease TaqI cleaves the sequence TCGA; after bisulfite treatment of an unmethylated nucleic acid this sequence becomes TTGA and is therefore not cleaved. The digested and/or undigested nucleic acids are then detected using detection means known in the art, such as electrophoresis and/or mass spectrometry. As another example, after treatment with a compound that selectively mutates unmethylated cytosine residues, different techniques can be used to detect nucleic acid differences in the amplification products based on differences in nucleotide sequence and/or secondary structure, such as methylation-specific single-strand conformation analysis (MS-SSCA) (Bianco et al., Hum. Mutat., 14:289-293, 1999), methylation-specific denaturing gradient gel electrophoresis (MS-DGGE) (Abrams and Stanton, Methods Enzymol., 212:71-74, 1992), and methylation-specific denaturing high-performance liquid chromatography (MS-DHPLC) (Deng et al., Chin. J. Cancer Res., 12:171-191, 2000).
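The TaqI example in the COBRA description can be checked with a small in-silico sketch in Python. It models only the top strand after bisulfite conversion and amplification (every unmethylated C becomes T), which is a simplifying assumption; the input sequence and the function names `bisulfite_convert` and `taqi_sites` are hypothetical.

```python
def bisulfite_convert(seq: str, methylated_positions=frozenset()) -> str:
    """Convert every C to T except cytosines at the given 0-based methylated positions."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq.upper())
    )


def taqi_sites(seq: str) -> list:
    """Return the 0-based start index of every TaqI recognition site (TCGA)."""
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "TCGA"]


template = "AATCGATT"  # hypothetical fragment; the CpG cytosine is at index 3

methylated = bisulfite_convert(template, {3})
unmethylated = bisulfite_convert(template)

print(methylated, taqi_sites(methylated))      # site retained, fragment is cut
print(unmethylated, taqi_sites(unmethylated))  # TCGA became TTGA, no cut
```

The ratio of cut to uncut product after digestion therefore reflects the proportion of template molecules that were methylated at the recognition site.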
In some embodiments, quantitative analysis is performed by target capture (e.g., hybridization, microarray). Suitable hybridization-based detection methods are known in the art, such as Southern blotting, dot blotting, slot blotting, or other nucleic acid hybridization formats (Kawai et al., Mol. Cell. Biol. 14:7421-7427, 1994; Gonzalgo et al., Cancer Res. 57:594-599, 1997). In some embodiments, the probe used for hybridization analysis is detectably labeled. In some embodiments, the nucleic acid-based probe used for hybridization analysis is unlabeled. Such an unlabeled probe can be immobilized on a solid support, such as a microarray, and can hybridize to a detectably labeled target nucleic acid molecule. One example of a microarray is a methylation-specific microarray, which can be used to distinguish sequences with converted cytosine residues from sequences with unconverted cytosine residues (see Adorjan et al., Nucl. Acids Res., 30:e21, 2002). Hybridization-based analysis can also be applied to nucleic acids that have been treated with a methylation-sensitive restriction enzyme. As another example, the methylation status of a CpG dinucleotide sequence within a DNA sequence can be determined with an oligonucleotide probe that hybridizes to the bisulfite-treated DNA at the same time as the PCR amplification primers (where the primers may be methylation-specific primers or standard primers).
In some embodiments, quantitative analysis is performed in the presence of a detection reagent. As used herein, the term "detection reagent" refers to a reagent used in the quantitative analysis step to detect the presence, absence, or amount of a nucleic acid. Any of the various detection reagents known in the art may be used in the present application. In some embodiments, the detection reagent is selected from the group consisting of fluorescent probes, intercalating dyes, chromophore-labeled probes, radioisotope-labeled probes, and biotin-labeled probes.
In some embodiments, the quantitative analysis comprises amplifying the treated DNA using a quantitative primer pair and a DNA polymerase. As used herein, the term "quantitative primer pair" refers to one or more primer pairs used in the quantitative analysis step. Preferably, the quantitative primer pair is capable of hybridizing to at least 9 consecutive nucleotides of the treated DNA under stringent, moderately stringent, or highly stringent conditions.
In some embodiments, the quantitative analysis comprises determining the methylation level of one or more target markers based on the presence or level of multiple CpG dinucleotides, TpG dinucleotides, or CpA dinucleotides in the treated DNA. In some embodiments, the quantitative analysis comprises determining the methylation level of cytosine residues based on the presence or level of one or more CpG dinucleotides in the treated DNA. In some embodiments, the quantitative analysis comprises determining the methylation level of cytosine residues based on the presence or level of one or more TpG dinucleotides in the treated DNA. In some embodiments, the quantitative analysis comprises determining the methylation level of cytosine residues based on the presence of CpA dinucleotides in the treated DNA.
In some embodiments, the quantitative analysis step is performed by dividing the treated DNA product into multiple fractions. In some embodiments, a plurality of different quantitative assays are performed on the multiple fractions, wherein a different combination of the treated DNA products (if present in the fraction) is quantified in one of the multiple fractions. In some embodiments, a control marker is quantified in each fraction.
In some embodiments, the methylation level of each target marker is quantified separately on the pre-amplified DNA by MSP (see Herman, supra). For example, by using one or more primers that hybridize specifically to the unconverted sequence under moderately and/or highly stringent conditions, an amplification product is generated only when the template contains methylated cytosine at the CpG site.
In some embodiments, the quantitative primer pair is designed to amplify at least a portion of the treated DNA product, i.e., the quantitative analysis is designed as a nested PCR. Nested PCR is a modification of PCR intended to improve sensitivity and specificity. It uses two primer sets and two consecutive PCR reactions. A first round of amplification is performed to generate a first amplicon, and a second round of amplification is performed using a primer pair in which one or both primers anneal to sites within the region delimited by the initial primer pair, i.e., the second primer pair is considered "nested" within the first. In this way, background amplification products from the first PCR reaction that do not contain the correct internal sequence are not amplified further in the second PCR reaction.
Typically, the PCR reaction mixture contains Taq DNA polymerase, PCR buffer, primers, probes, dNTPs, and Mg²⁺. Preferably, the Taq DNA polymerase is a hot-start Taq DNA polymerase. By way of example, the final Mg²⁺ concentration is 1.0-20.0 mM, the concentration of each primer is 100-500 nM, and the concentration of each probe is 100-500 nM. Exemplary PCR reaction conditions are: pre-denaturation at 95°C for 5 min, followed by 50 cycles of denaturation at 95°C for 15 s and annealing/extension at 60°C for 60 s.
In some embodiments, the method of the invention includes a pre-amplification step. One purpose of pre-amplifying the target markers is to increase the amount of the target markers in the treated DNA. As used herein, the term "amplification" generally refers to any process that results in an increase in the copy number of a molecule or a group of related molecules. When applied to a polynucleotide molecule, "amplification" means producing multiple copies of the polynucleotide molecule, or of a portion of it, usually starting from a small amount of polynucleotide, where the amplified material (amplicon, PCR amplicon) is usually detectable. Amplification of polynucleotides encompasses a variety of chemical and enzymatic processes. Forms of amplification include generating multiple DNA copies from one or a few copies of a template RNA or DNA molecule by polymerase chain reaction (reverse-transcription PCR, PCR), strand displacement amplification (SDA), transcription-mediated amplification (TMA), nucleic acid sequence-based amplification (NASBA), or ligase chain reaction (LCR).
The target markers in the treated DNA can be pre-amplified with pre-amplification primers. As used herein, the term "primer" refers to a single-stranded oligonucleotide that can act as the starting point for template-directed DNA synthesis under suitable conditions (e.g., buffer and temperature) in the presence of four different nucleoside triphosphates and an agent for polymerization (e.g., a DNA polymerase). In any given case, the length of the primer depends on, for example, its intended use, and is typically in the range of 15 to 30 nucleotides. Short primer molecules generally require lower temperatures to form sufficiently stable hybrid complexes with the template. A primer need not reflect the exact sequence of the template but must be sufficiently complementary to hybridize to it. The primer site is the region of the template to which the primer hybridizes. A primer pair is a set of primers comprising a 5' forward primer that hybridizes to the 5' end of the sequence to be amplified and a 3' reverse primer that hybridizes to the complementary strand at the 3' end of the sequence to be amplified. A person skilled in the art can design primers for the markers to be amplified based on common general knowledge in the art (see, e.g., PCR Primer: A Laboratory Manual, Cold Spring Harbor Laboratories, NY, 1995). In addition, several software packages for designing optimal probes and/or primers for a wide variety of assays are publicly available, for example Primer 3, available from the Center for Genome Research, Cambridge, Mass., USA. The intended use should of course also be taken into account when designing probes or primers. For example, a primer designed for the purposes of the invention may include at least one CpG site, or the amplification product obtained with the primer may include at least one CpG site. Tools for designing primers that detect DNA methylation status are also known in the art, such as MethPrimer (Li LC and Dahiya R. MethPrimer: designing primers for methylation PCRs. Bioinformatics. 2002 Nov; 18(11):1427-31). In the present application, by using the pre-amplification primers as a primer pool, any target marker in the treated DNA (at least a portion of each target marker, or a subregion of a target marker) can be pre-amplified.
As used herein, the term "complementary" refers to hybridization or base pairing between nucleotides or nucleic acids, for example between the two strands of a double-stranded DNA molecule, or between an oligonucleotide primer and the primer binding site on a single-stranded nucleic acid to be sequenced or amplified. Complementary nucleotides are generally A and T (or A and U), or C and G. Two single-stranded RNA or DNA molecules are said to be complementary when the nucleotides of one strand, optimally aligned and compared and with appropriate nucleotide insertions or deletions, pair with at least about 80% of the nucleotides of the other strand (usually at least about 90% to 95%, more preferably about 98% to 100%). Alternatively, complementarity exists when an RNA or DNA strand hybridizes to its complement under selective hybridization conditions. Typically, selective hybridization occurs when there is at least about 65% (preferably at least about 75%, more preferably at least about 90%) complementarity over a stretch of at least 14 to 25 nucleotides. See M. Kanehisa, Nucleic Acids Res. 12:203 (1984), incorporated herein by reference.
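The percentage figures in the definition above can be made concrete with a short Python sketch. It is a simplification that assumes two strands of equal length paired antiparallel with no insertions or deletions, so it does not perform the optimal alignment the definition allows for; the sequences and the function name `percent_complementarity` are hypothetical.

```python
# Watson-Crick pairs, including A-U for RNA.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def percent_complementarity(strand_a: str, strand_b: str) -> float:
    """Percent of positions that base-pair when two 5'->3' strands are aligned antiparallel.

    Assumes equal lengths and no gaps (no insertions or deletions are modeled).
    """
    a = strand_a.upper()
    b = strand_b.upper()[::-1]  # reverse strand_b so position i pairs with position i
    if len(a) != len(b):
        raise ValueError("this ungapped sketch requires strands of equal length")
    paired = sum((x, y) in PAIRS for x, y in zip(a, b))
    return 100.0 * paired / len(a)


template = "AAGGCTTCGA"          # hypothetical 10-nt target, written 5'->3'
perfect = "TCGAAGCCTT"           # its exact reverse complement
one_mismatch = "TCGAAGCCTA"      # last base altered

print(percent_complementarity(template, perfect))       # 100.0
print(percent_complementarity(template, one_mismatch))  # 90.0
```

Under the definition above, both oligonucleotides in this example would count as complementary to the template, since both pair with at least about 80% of its nucleotides.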
In some embodiments, the pre-amplification primer pool comprises at least one methylation-specific primer pair. In some embodiments, the pre-amplification primer pool comprises a plurality of methylation-specific primer pairs. In some embodiments, the pre-amplification step is performed by methylation-specific PCR ("MSP"), which is PCR using methylation-specific primers. This technique (i.e., MSP) is described in Herman et al., Methylation-specific PCR: a novel PCR assay for methylation status of CpG islands. Proc Natl Acad Sci USA. 1996 September 3; 93(18):9821-6, and in United States Patent No. 6,265,171.
As used herein, the term "methylation-specific primer pair" refers to a primer pair specifically designed to recognize CpG sites so as to exploit differences in methylation to amplify a particular target marker in the treated DNA. The primers act only on molecules that have, or that lack, a particular methylation state. For example, a primer may be an oligonucleotide that, under stringent, moderately stringent, or highly stringent conditions, hybridizes in a methylation-specific manner to a particular CpG site that is methylated but not to that CpG site when it is unmethylated. Such a primer will therefore specifically amplify target markers that are methylated at the particular CpG site. As another example, a primer may be an oligonucleotide that, under stringent, moderately stringent, or highly stringent conditions, hybridizes in a methylation-specific manner to a particular CpG site that is unmethylated but not to that CpG site when it is methylated. Such a primer will therefore specifically amplify target markers that are not methylated at the particular CpG site. Thus, in the present application, the use of methylation-specific primers in the pre-amplification of at least one target marker within the treated DNA makes it possible to distinguish methylated from unmethylated CpG sites. The methylation-specific primer pair of the present application comprises at least one primer that hybridizes to a bisulfite-treated CpG dinucleotide. Accordingly, the sequence of a primer specific for methylated DNA comprises at least one CpG dinucleotide, and the sequence of a primer specific for unmethylated DNA comprises a "T" at the C position of the CpG and/or an "A" at the G position of the CpG.
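The sequence difference that methylation-specific primers exploit can be sketched in Python. The sketch assumes a fully methylated template keeps the cytosine of every CpG while all other cytosines convert to T, and that a fully unmethylated template converts every cytosine; it models the top strand only, where the unmethylated form shows "T" at the C position of each CpG. The input sequence and the function name `converted_templates` are hypothetical.

```python
def converted_templates(seq: str):
    """Return (methylated_form, unmethylated_form) of a top strand after bisulfite conversion.

    methylated_form: CpG cytosines retained, non-CpG cytosines converted to T.
    unmethylated_form: every cytosine converted to T.
    """
    seq = seq.upper()
    methylated, unmethylated = [], []
    for i, base in enumerate(seq):
        if base != "C":
            methylated.append(base)
            unmethylated.append(base)
            continue
        is_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
        methylated.append("C" if is_cpg else "T")
        unmethylated.append("T")
    return "".join(methylated), "".join(unmethylated)


# Hypothetical target region containing two CpG sites and one non-CpG cytosine.
m_form, u_form = converted_templates("GACGTTCGCA")
print(m_form)  # GACGTTCGTA  (CG retained at both CpG sites)
print(u_form)  # GATGTTTGTA  (TG at both former CpG sites)
```

A primer matching the first output would pair only with the methylated template, and a primer matching the second only with the unmethylated one; the more CpG sites a primer spans, the larger the sequence difference between the two forms.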
A methylation-specific primer pair typically comprises a forward primer and a reverse primer, each comprising an oligonucleotide sequence that hybridizes under stringent, moderately stringent, or highly stringent conditions to at least 9 consecutive nucleotides of one of the target markers (or of a subregion of a target marker), wherein the at least 9 consecutive nucleotides of that target marker (or subregion) contain at least one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) CpG site.
As used herein, the term "hybridization" may refer to the process in which two single-stranded polynucleotides bind non-covalently to form a stable double-stranded polynucleotide. In one aspect, the resulting double-stranded polynucleotide may be a "hybrid" or "duplex". Salt concentrations in "hybridization conditions" are generally less than about 1 M, often less than about 500 mM, and may be less than about 200 mM. A "hybridization buffer" includes a buffered salt solution such as 5% SSPE, or other such buffers known in the art. Hybridization temperatures can be as low as 5°C, but are typically above 22°C, more typically above about 30°C, and often above 37°C. Hybridization is usually performed under stringent conditions, i.e., conditions under which a sequence will hybridize to its target sequence but not to other, non-complementary sequences. Stringent conditions are sequence-dependent and differ in different circumstances. For example, longer fragments may require higher hybridization temperatures than shorter fragments for specific hybridization. Because other factors may affect the stringency of hybridization, including base composition and the length of the complementary strands, the presence of organic solvents, and the extent of base mismatching, the combination of parameters is more important than the absolute measure of any one parameter alone. Generally, stringent conditions are selected to be about 5°C lower than the melting temperature (Tm) of the specific sequence at a defined ionic strength and pH. The Tm can be the temperature at which half of a population of double-stranded nucleic acid molecules is dissociated into single strands. Several equations for calculating the Tm of nucleic acids are well known in the art. As indicated by standard references, a simple estimate of the Tm value can be calculated by the equation Tm = 81.5 + 0.41(%G+C) when the nucleic acid is in aqueous solution at 1 M NaCl (see, e.g., Anderson and Young, Quantitative Filter Hybridization, in Nucleic Acid Hybridization (1985)). Other references (e.g., Allawi and SantaLucia, Jr., Biochemistry, 36:10581-94 (1997)) include alternative methods of computation that take structural and environmental as well as sequence characteristics into account when calculating Tm.
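The simple Tm estimate quoted above, Tm = 81.5 + 0.41(%G+C), can be illustrated with a short sketch. The function names `gc_percent` and `tm_simple` are hypothetical names introduced here; the sketch only evaluates the quoted equation for 1 M NaCl and ignores duplex length, mismatches and solvents, so it should be read as a rough estimate rather than a primer-design tool.

```python
def gc_percent(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def tm_simple(seq: str) -> float:
    """Simple Tm estimate in deg C for 1 M NaCl: 81.5 + 0.41 * (%G+C)."""
    return 81.5 + 0.41 * gc_percent(seq)
```

Under this equation, stringent conditions would then be chosen about 5°C below the value returned by `tm_simple`.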
In general, the stability of a hybrid is a function of ion concentration and temperature. Typically, a hybridization reaction is performed under conditions of lower stringency, followed by washes of differing but higher stringency. Exemplary stringent conditions include a pH of about 7.0 to about 8.3, a temperature of at least 25°C, and a sodium ion (or other salt) concentration of at least 0.01 M to no more than 1 M. For example, 5x SSPE (750 mM NaCl, 50 mM sodium phosphate, 5 mM EDTA, pH 7.4) and a temperature of about 30°C are suitable for allele-specific hybridization, although a suitable temperature depends on the length and/or GC content of the hybridized region. In one aspect, the "hybridization stringency" that determines the percentage of mismatch may be as follows: 1) high stringency: 0.1x SSPE, 0.1% SDS, 65°C; 2) medium stringency (also called moderate stringency): 0.2x SSPE, 0.1% SDS, 50°C; 3) low stringency: 1.0x SSPE, 0.1% SDS, 50°C. It is understood that equivalent stringency may be achieved using alternative buffers, salts and temperatures. For example, moderately stringent hybridization can refer to conditions that permit a nucleic acid molecule (e.g., a probe) to bind a complementary nucleic acid molecule. The hybridized nucleic acid molecules generally have at least 60% identity, including, for example, at least 70%, 75%, 80%, 85%, 90%, or 95% identity. Moderately stringent conditions can be conditions equivalent to hybridization in 50% formamide, 5x Denhardt's solution, 5x SSPE, 0.2% SDS at 42°C, followed by washing in 0.2x SSPE, 0.2% SDS at 42°C. High stringency conditions can be provided, for example, by hybridization in 50% formamide, 5x Denhardt's solution, 5x SSPE, 0.2% SDS at 42°C, followed by washing in 0.1x SSPE and 0.1% SDS at 65°C. Low stringency hybridization can be conditions equivalent to hybridization in 10% formamide, 5x Denhardt's solution, 6x SSPE, 0.2% SDS at 22°C, followed by washing in 1x SSPE, 0.2% SDS at 37°C. Denhardt's solution contains 1% Ficoll (polysucrose), 1% polyvinylpyrrolidone, and 1% bovine serum albumin (BSA). 20x SSPE (sodium chloride, sodium phosphate, EDTA) contains 3 M sodium chloride, 0.2 M sodium phosphate, and 0.025 M EDTA. Other suitable moderate stringency and high stringency hybridization buffers and conditions are well known to those skilled in the art and are described, for example, in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Press, Plainview, N.Y. (1989) and Ausubel et al., Short Protocols in Molecular Biology, 4th ed., John Wiley & Sons (1999).
In some embodiments, the pre-amplification primer pool further comprises a control primer pair for amplifying a control marker. Typically, a control marker is a nucleic acid with known characteristics (e.g., known sequence, known copy number per cell) that is used for comparison with an experimental target (e.g., a nucleic acid of unknown concentration). The control can be an endogenous, preferably invariant, gene against which the experimental or target nucleic acid under analysis can be normalized. Such controls normalize for sample-to-sample variation that may arise in, for example, sample handling and assay efficiency, and allow accurate comparison of data between samples and quantitative analysis of amplification efficiency and bias.
In some embodiments, the present invention uses RRBS to detect the methylation level of the CpG sites of a target marker of interest, and then calculates the average methylation fraction (AMF) of that marker, which is taken as the DNA methylation level of the marker. The AMF can be calculated as described in the examples of this application.
V. Identification of benign and malignant thyroid nodules
The present invention has found that the methylation level of one or more target markers described herein can be used to determine whether an individual's thyroid nodule is benign or malignant. In one or more embodiments, the methylation level of the CpG sites in a target marker described herein can be detected, and the average methylation fraction (AMF) of that target marker is then calculated and taken as the DNA methylation level of the marker. Herein, the AMF can be calculated by the following formula:
Figure PCTCN2022137459-appb-000003
where $M$ is the total number of CpG sites in the marker, $i$ is one of those CpG sites, $N_{C,i}$ is the number of sequencing reads in which that CpG site is methylated, and $N_{T,i}$ is the number of sequencing reads in which that CpG site is unmethylated.
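The AMF formula itself appears only as an image in this record, so its exact form is not reproduced here. The following is a minimal sketch under stated assumptions: it takes the per-site counts defined above (methylated reads N_C,i and unmethylated reads N_T,i) and shows two plausible ways of averaging them. Both function names are hypothetical, and neither variant is confirmed by the text as the patent's formula.

```python
from typing import Sequence, Tuple

# Each entry is (N_C, N_T) for one CpG site: methylated and unmethylated
# read counts. Which averaging the source formula uses is an ASSUMPTION.


def amf_pooled(counts: Sequence[Tuple[int, int]]) -> float:
    """Assumed variant 1: pool reads over all sites, sum(N_C) / sum(N_C + N_T)."""
    total_methylated = sum(c for c, _ in counts)
    total_reads = sum(c + t for c, t in counts)
    if total_reads == 0:
        raise ValueError("no reads cover any CpG site")
    return total_methylated / total_reads


def amf_per_site_mean(counts: Sequence[Tuple[int, int]]) -> float:
    """Assumed variant 2: mean of N_C / (N_C + N_T) over sites with coverage."""
    fractions = [c / (c + t) for c, t in counts if c + t > 0]
    if not fractions:
        raise ValueError("no reads cover any CpG site")
    return sum(fractions) / len(fractions)
```

The two variants agree when every site has the same read depth and differ otherwise, which is why the choice should be checked against the original formula before any real use.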
The malignancy prediction probability of the sample is then calculated with the constructed mathematical model, a logistic regression model. First, the input $z$ of the sigmoid function is calculated by the following formula:
$$z = \sum w \cdot x + w_0$$
Then the sigmoid function is calculated as follows:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
Here $w$ is the regression model coefficient of each marker, $w_0$ is the intercept, and $x$ is the calculated DNA methylation level (i.e., the AMF) of that marker. The resulting $\sigma$ value is the malignancy prediction probability.
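The two steps above (a weighted sum of marker AMF values plus an intercept, followed by the sigmoid) can be sketched as follows. This is a minimal illustration, not the fitted model: the function name `malignancy_probability` is hypothetical, and any coefficients passed to it are placeholders, since the section does not list the fitted values of w and w0.

```python
import math
from typing import Mapping


def malignancy_probability(
    amf: Mapping[str, float],
    weights: Mapping[str, float],
    intercept: float,
) -> float:
    """Return sigma(z) with z = sum(w * x) + w0 over the markers in `weights`."""
    z = intercept + sum(w * amf[marker] for marker, w in weights.items())
    # Numerically stable sigmoid: avoid overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

A marker missing from `amf` raises `KeyError` rather than being silently treated as zero, which seems the safer default for a diagnostic calculation.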
Herein, a training set was constructed from the DNA methylation level of each marker in the training samples, and the cutoff defined by the Youden index on the training set was taken as the malignancy prediction threshold. A malignancy prediction threshold was obtained in this way for each marker described herein; the threshold for each marker is given in Table 6.
In some embodiments, the methylation level of a single target marker described herein is used as the basis for judgment: the malignancy prediction probability of each sample is calculated according to the above formulas, and if that value is higher than the threshold for the target marker shown in Table 6, the nodule is judged malignant; otherwise it is judged benign. In preferred embodiments, the target marker is selected from the following genes or their corresponding genomic sequences: PRDM16, BIN1, LIMK1, EGR3, PPIF, ZNF219, UACA, TNK1, CEP295NL, SBNO2, C19orf77, ICAM5, CRTC1, RTN4R, CAMK2N1, DNASE1L3, DUSP26, ICAM2, BAIAP2, MED16, NOL4L-DT, TACSTD2, CRABP2, LSG1 and BCR.
In other embodiments, the methods described herein can be used to determine the malignancy prediction threshold when a combination of any 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 or more of the target markers described herein is used as the basis for evaluation. That combination of target markers is then used as the marker for diagnosing whether a thyroid nodule is benign or malignant: the malignancy prediction probability of the combination is determined in a sample from the individual (preferably thyroid nodule tissue, such as an aspirate) and compared with the threshold. A value above the threshold indicates malignancy; otherwise the nodule is indicated as benign.
In some embodiments, the one or more target markers comprise at least one or more of the following genes or their corresponding genomic sequences: EGR3, TNK1, DNASE1L3, DUSP26, BAIAP2, MED16, C19orf77, NOL4L-DT, TACSTD2, CRABP2 and BCR.
In some embodiments, the one or more target markers comprise the following genes or their corresponding genomic sequences: PRDM16, BIN1, LIMK1, EGR3, PPIF, ZNF219, UACA, TNK1, CEP295NL, SBNO2, C19orf77, ICAM5, CRTC1 and RTN4R; and the threshold is 0.49.
In some embodiments, the one or more target markers comprise the following genes or their corresponding genomic sequences: CAMK2N1, DNASE1L3, EGR3, DUSP26, ICAM2, BAIAP2, MED16, C19orf77 and NOL4L-DT; and the threshold is 0.58.
In some embodiments, the one or more target markers comprise the following genes or their corresponding genomic sequences: TACSTD2, CRABP2, DNASE1L3, LSG1, EGR3, TNK1, BAIAP2, NOL4L-DT and BCR; and the threshold is 0.52.
In some embodiments, the one or more target markers comprise the following genes or their corresponding genomic sequences: TACSTD2, CRABP2, DNASE1L3, EGR3, DUSP26, TNK1, BAIAP2, NOL4L-DT and BCR; and the threshold is 0.52.
In some embodiments, the one or more target markers comprise the following genes or their corresponding genomic sequences: TACSTD2, DNASE1L3, EGR3, DUSP26, TNK1, BAIAP2, MED16, NOL4L-DT and BCR; and the threshold is 0.52.
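The decision rule used with these marker panels (a predicted probability above the panel threshold indicates malignancy, otherwise benign) can be sketched as follows. The panel labels are hypothetical names introduced only for this illustration; the threshold values 0.49, 0.58 and 0.52 are the ones stated in the text, and the probability is assumed to have been computed beforehand by the logistic regression model.

```python
# Thresholds stated in the text for marker panels. The labels are
# hypothetical; "panel_9_b" stands for any of the three panels with 0.52.
PANEL_THRESHOLDS = {
    "panel_14": 0.49,
    "panel_9_a": 0.58,
    "panel_9_b": 0.52,
}


def classify_nodule(probability: float, threshold: float) -> str:
    """Return 'malignant' if probability is strictly above threshold, else 'benign'."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return "malignant" if probability > threshold else "benign"
```

The comparison is strict because the text says a value higher than the threshold indicates malignancy; a value exactly equal to the threshold is therefore treated as benign in this sketch.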
Particularly preferably, the respective Hg coordinates of the target markers are as described herein, especially as shown in Table 6.
In addition to the above comparison, those skilled in the art can also determine whether an individual's thyroid nodule is malignant, or its risk of being malignant, based on various factors such as age, sex, medical history, family history and symptoms.
VI. Compositions and kits
The present invention provides a methylation detection or diagnostic kit, and a diagnostic reagent or diagnostic composition, for differentiating benign from malignant thyroid nodules. The kit and composition comprise reagents for detecting the methylation status or level of at least one CpG dinucleotide of one or more target markers described herein. Depending on the target marker to be detected, the kit and composition may contain primer and/or probe molecules. Preferably, the primers include primer pairs capable of hybridizing to the target marker to be detected, or to its target region, under stringent, moderately stringent, or highly stringent conditions. The primers may also include primers for detecting an internal reference such as ACTB.
In some embodiments, the primers are packaged in a single container or in separate containers. In some embodiments, the kit further comprises one or more blocking oligonucleotides.
In some embodiments, the kits and compositions further comprise a detection reagent. In some embodiments, the detection reagent is selected from the group consisting of fluorescent probes, intercalating dyes, chromophore-labeled probes, radioisotope-labeled probes, and biotin-labeled probes.
In some embodiments, the kit may further comprise a DNA polymerase and/or a container suitable for storing a biological sample obtained from an individual. In some embodiments, the kit further contains instructions for use and/or an explanation of the kit's test results.
In some embodiments, the kits and compositions may further include reagents for enzymatic or non-enzymatic conversion. In preferred embodiments, the kit further includes a bisulfite reagent or a methylation-sensitive restriction enzyme (MSRE). In some embodiments, the bisulfite reagent is selected from the group consisting of ammonium bisulfite, sodium bisulfite, potassium bisulfite, calcium bisulfite, magnesium bisulfite, aluminum bisulfite, bisulfite ion, and any combination thereof. In some embodiments, the bisulfite reagent is sodium bisulfite. In some embodiments, the MSRE is selected from the group consisting of HpaII, SalI, [enzyme name shown as Figure PCTCN2022137459-appb-000004], ScrFI, BbeI, NotI, SmaI, XmaI, MboI, BstBI, ClaI, MluI, NaeI, NarI, PvuI, SacII, HhaI, and any combination thereof.
The kits and compositions may further include a converted positive standard in which unmethylated cytosines have been converted to a base that does not bind guanine. The positive standard may be fully methylated.
The kits and compositions may further include PCR reaction reagents. Preferably, the PCR reaction reagents include Taq DNA polymerase, PCR buffer, dNTPs, and Mg²⁺.
In some embodiments, the kits and compositions further comprise standard reagents useful for CpG position-specific methylation analysis, wherein the analysis includes one or more of the following techniques: MS-SNuPE, MSP, MethyLight™, HeavyMethyl™, COBRA, and nucleic acid sequencing.
In some embodiments, the kits and compositions may comprise additional reagents selected from the group consisting of buffers (e.g., restriction enzyme, PCR, storage or wash buffers), DNA recovery reagents or kits (e.g., precipitation, ultrafiltration, affinity column), DNA recovery components, and the like.
The kit of the present application may further comprise one or more of the following components known in the field of DNA enrichment: a protein component that selectively binds methylated DNA; a triplex-forming nucleic acid component; one or more linkers, optionally in a suitable solution; substances or solutions for performing ligation, such as ligases and buffers; substances or solutions for performing column chromatography; substances or solutions for performing immunology-based enrichment (e.g., immunoprecipitation); substances or solutions for performing nucleic acid amplification, such as PCR; one or more dyes, where applicable with a coupling reagent and where applicable in solution; substances or solutions for performing hybridization; and/or substances or solutions for performing a washing step.
In other embodiments, the composition of the present invention contains an isolated nucleic acid molecule selected from one or more of the following: PRDM16 gene: chr1:3155061:3155760; CAMK2N1 gene: chr1:20813203:20813902; TACSTD2 gene: chr1:59041615:59042314; CRABP2 gene: chr1:156676274:156676973; IER5 gene: chr1:181074539:181075238; ITPKB gene: chr1:226924700:226925399; ITGB1BP1 gene: chr2:9526804:9527503; MTHFD2 gene: chr2:74453839:74454538; BIN1 gene: chr2:127822196:127822895; DNASE1L3 gene: chr3:58153211:58153910; LSG1 gene: chr3:194408527:194409226; SH3BP2 gene: chr4:2795032:2795731; SLC12A7 gene: chr5:1117661:1118360; NR2F1 gene: chr5:92914797:92915496; EGR1 gene: chr5:137802399:137803098; LARP1 gene: chr5:154133955:154134654; RARS gene: chr5:167837780:167838479; TTBK1 gene: chr6:43215063:43215762; FAM20C gene: chr7:193512:194211; CREB5 gene: chr7:28449041:28449740; LIMK1 gene: chr7:73508743:73509442; PRKAG2 gene: chr7:151424814:151425513; SLC39A14 gene: chr8:22236914:22237613; EGR3 gene: chr8:22547976:22549090; DUSP26 gene: chr8:34104888:34105587; AGPAT2 gene: chr9:139581855:139582554; NRARP gene: chr9:140205734:140206433; EGR2 gene: chr10:64578269:64578968; PPIF gene: chr10:81001706:81002405; CHID1 gene: chr11:911289:911988; ADM gene: chr11:10328946:10329645; NAV2 gene: chr11:19734801:19736359; EHBP1L1 gene: chr11:65343387:65344086; PHLDB1 gene: chr11:118479144:118479843; PARP11 gene: chr12:4139935:4140634; ANO6 gene: chr12:45610331:45611030; PLXNC1 gene: chr12:94544076:94544775; ZNF219 gene: chr14:21559748:21560447; FOXA1 gene: chr14:38064876:38065575; PAPLN gene: chr14:73704629:73705328; UACA gene: chr15:70766881:70767580; PGPEP1L gene: chr15:99466242:99466941; ITPRIPL2 gene: chr16:19125694:19126393; TNK1 gene: chr17:7286958:7287657; RPL19 gene: chr17:37366033:37366732; ICAM2 gene: chr17:62076008:62076707; TMC6 gene: chr17:76113226:76124091; CEP295NL gene: chr17:76879761:76880460; BAIAP2 gene: chr17:79060865:79061564; TBCD gene: chr17:80744791:80745490; METRNL gene: chr17:81083812:81084511; MED16 gene: chr19:883793:884492; SBNO2 gene: chr19:1177275:1177974; CIRBP gene: chr19:1265690:1266389; KLF16 gene: chr19:1860343:1861042; C19orf77 gene: chr19:3434666:3435687; SNAPC2 gene: chr19:7985709:7986408; ICAM1 gene: chr19:10381317:10382016; ICAM5 gene: chr19:10404832:10405531; IER2 gene: chr19:13266647:13267346; ASF1B gene: chr19:14248133:14248832; CRTC1 gene: chr19:18770961:18771660; ZNF536 gene: chr19:31039247:31039946; LTBP4 gene: chr19:41105706:41106405; NOL4L-DT gene: chr20:31162101:31162800; KCNK15 gene: chr20:43374048:43374747; UCKL1 gene: chr20:62588113:62588812; RTN4R gene: chr22:20226373:20227274; BCR gene: chr22:23624092:23624791; TEF gene: chr22:41771229:41771928.
In other embodiments, the composition of the present invention contains an isolated nucleic acid molecule selected from one or more of the following: PRDM16 gene: chr1:3155311:3155510; CAMK2N1 gene: chr1:20813453:20813652; TACSTD2 gene: chr1:59041865:59042064; CRABP2 gene: chr1:156676524:156676723; IER5 gene: chr1:181074789:181074988; ITPKB gene: chr1:226924950:226925149; ITGB1BP1 gene: chr2:9527054:9527253; MTHFD2 gene: chr2:74454089:74454288; BIN1 gene: chr2:127822446:127822645; DNASE1L3 gene: chr3:58153461:58153660; LSG1 gene: chr3:194408777:194408976; SH3BP2 gene: chr4:2795282:2795481; SLC12A7 gene: chr5:1117911:1118110; NR2F1 gene: chr5:92915047:92915246; EGR1 gene: chr5:137802649:137802848; LARP1 gene: chr5:154134205:154134404; RARS gene: chr5:167838030:167838229; TTBK1 gene: chr6:43215313:43215512; FAM20C gene: chr7:193762:193961; CREB5 gene: chr7:28449291:28449490; LIMK1 gene: chr7:73508993:73509192; PRKAG2 gene: chr7:151425064:151425263; SLC39A14 gene: chr8:22237164:22237363; EGR3 gene: chr8:22548226:22548425; EGR3 gene: chr8:22548641:22548840; DUSP26 gene: chr8:34105138:34105337; AGPAT2 gene: chr9:139582105:139582304; NRARP gene: chr9:140205984:140206183; EGR2 gene: chr10:64578519:64578718; PPIF gene: chr10:81001956:81002155; CHID1 gene: chr11:911539:911738; ADM gene: chr11:10329196:10329395; NAV2 gene: chr11:19735051:19735250; NAV2 gene: chr11:19735910:19736109; EHBP1L1 gene: chr11:65343637:65343836; PHLDB1 gene: chr11:118479394:118479593; PARP11 gene: chr12:4140185:4140384; ANO6 gene: chr12:45610581:45610780; PLXNC1 gene: chr12:94544326:94544525; ZNF219 gene: chr14:21559998:21560197; FOXA1 gene: chr14:38065126:38065325; PAPLN gene: chr14:73704879:73705078; UACA gene: chr15:70767131:70767330; PGPEP1L gene: chr15:99466492:99466691; ITPRIPL2 gene: chr16:19125944:19126143; TNK1 gene: chr17:7287208:7287407; RPL19 gene: chr17:37366283:37366482; ICAM2 gene: chr17:62076258:62076457; TMC6 gene: chr17:76113476:76113675; TMC6 gene: chr17:76123642:76123841; CEP295NL gene: chr17:76880011:76880210; BAIAP2 gene: chr17:79061115:79061314; TBCD gene: chr17:80745041:80745240; METRNL gene: chr17:81084062:81084261; MED16 gene: chr19:884043:884242; SBNO2 gene: chr19:1177525:1177724; CIRBP gene: chr19:1265940:1266139; KLF16 gene: chr19:1860593:1860792; C19orf77 gene: chr19:3434916:3435115; C19orf77 gene: chr19:3435238:3435437; SNAPC2 gene: chr19:7985959:7986158; ICAM1 gene: chr19:10381567:10381766; ICAM5 gene: chr19:10405082:10405281; IER2 gene: chr19:13266897:13267096; ASF1B gene: chr19:14248383:14248582; CRTC1 gene: chr19:18771211:18771410; ZNF536 gene: chr19:31039497:31039696; LTBP4 gene: chr19:41105956:41106155; NOL4L-DT gene: chr20:31162351:31162550; KCNK15 gene: chr20:43374298:43374497; UCKL1 gene: chr20:62588363:62588562; RTN4R gene: chr22:20226623:20226822; RTN4R gene: chr22:20226825:20227024; BCR gene: chr22:23624342:23624541; TEF gene: chr22:41771479:41771678.
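The region strings in the lists above follow a `chromosome:start:end` pattern. A minimal sketch of parsing such a string is given below; the names `Region` and `parse_region` are hypothetical, and treating both endpoints as inclusive is an assumption (under it, chr1:3155311:3155510 spans 200 positions, but the text does not state the coordinate convention).

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A genomic interval; both endpoints are ASSUMED to be inclusive."""

    chrom: str
    start: int
    end: int

    def length(self) -> int:
        return self.end - self.start + 1


def parse_region(text: str) -> Region:
    """Parse a 'chrN:start:end' string as written in the marker lists."""
    chrom, start, end = text.strip().split(":")
    region = Region(chrom, int(start), int(end))
    if region.start > region.end:
        raise ValueError(f"start exceeds end in {text!r}")
    return region
```

A string with the wrong number of fields raises `ValueError` from the tuple unpacking, so malformed entries are rejected rather than silently misread.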
The present application also includes a medium recording the sequences of the isolated nucleic acid molecules described herein and, optionally, their methylation information. The medium is used for comparison with gene methylation sequencing data to determine the presence, content and/or methylation level of the nucleic acid molecules. Preferably, the medium is a card printed with the sequences and, optionally, their methylation information, for example a paper, plastic, metal or glass card. Preferably, the medium is a computer-readable medium storing the sequences, optionally their methylation information, and a computer program; when the computer program is executed by a processor, it implements the following step: comparing the methylation sequencing data of a sample with the sequences, thereby obtaining the presence, content and/or methylation level of nucleic acid molecules containing the sequences in the sample.
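One way such a program might relate sequencing data to the recorded regions is sketched below. This is an illustration only: it substitutes a coordinate lookup for the sequence comparison described above, the input layout (chromosome, position, methylated reads, unmethylated reads) is an assumption, and only two regions from the list are included.

```python
from collections import defaultdict

# Two regions copied from the list above; a real table would hold all of them.
MARKERS = {
    "PRDM16": ("chr1", 3155311, 3155510),
    "BIN1": ("chr2", 127822446, 127822645),
}


def assign_sites(cpg_calls, markers=MARKERS):
    """Group per-CpG read counts by the marker region that contains them.

    cpg_calls: iterable of (chrom, pos, methylated_reads, unmethylated_reads).
    Calls that fall outside every marker region are dropped.
    """
    per_marker = defaultdict(list)
    for chrom, pos, methylated, unmethylated in cpg_calls:
        for name, (m_chrom, start, end) in markers.items():
            if chrom == m_chrom and start <= pos <= end:
                per_marker[name].append((pos, methylated, unmethylated))
    return dict(per_marker)


# Hypothetical calls, for illustration only.
calls = [("chr1", 3155400, 8, 2), ("chr1", 100, 5, 5), ("chr2", 127822446, 1, 9)]
print(assign_sites(calls))
```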
The present application also includes a device for distinguishing benign from malignant thyroid nodules. The device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the following steps are carried out: (1) obtaining the methylation level, in a sample, of one or more of the target markers described herein or of their target regions; (2) determining whether the thyroid nodule is benign or malignant from the methylation level obtained in (1). Preferably, the obtaining step uses any of the methods described in Section IV of the present application; preferably, the determination uses any of the methods described in Section V of the present application.
VII. Uses
The present application also provides the use of the isolated nucleic acid molecules described herein as detection targets in the diagnosis of benign and malignant thyroid nodules.
The methylation markers of the present invention identify thyroid cancer with a sensitivity that reaches 100%; more importantly, they identify thyroid nodules of indeterminate cytological classification with a sensitivity that reaches 100%. Compared with existing molecular techniques for diagnosing benign and malignant thyroid nodules, the methylation markers and technical solutions provided by the present invention effectively address the low sensitivity of current diagnostic techniques and support early diagnosis and early treatment of thyroid cancer, thereby improving the cure rate.
The present invention is further described below with reference to specific examples. It should be understood that these examples only illustrate the invention and do not limit its scope. Experimental methods for which no specific conditions are given in the following examples generally follow conventional conditions or the conditions recommended by the manufacturer. Unless otherwise stated, percentages and parts are by weight.
Examples
The present invention is illustrated below by way of specific examples. It should be understood that these examples are illustrative only and are not intended to limit the scope of the invention. The method of the invention comprises the following steps:
1. Reduced representation bisulfite sequencing (RRBS) is used to measure the methylation level of the CpG sites within each of the above markers in a sample; the average methylation fraction (AMF) of the marker is then calculated and taken as the DNA methylation level of that marker. AMF is given by the following formula:
Figure PCTCN2022137459-appb-000005
M is the total number of CpG sites in the marker, i is one of those CpG sites, N_C,i is the number of methylated sequencing reads at that CpG site, and N_T,i is the number of unmethylated sequencing reads at that CpG site.
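The formula itself is present only as a figure and is not reproduced in the text. The sketch below therefore assumes one common reading of the definitions above, in which methylated reads are pooled over all M CpG sites and divided by the pooled total of methylated and unmethylated reads. A per-site average of fractions would be another possible reading; the function name and input layout are illustrative.

```python
def average_methylation_fraction(sites):
    """Pooled-read AMF for one marker (assumed form, see the note above).

    sites: list of (N_C, N_T) pairs, one per CpG site, where N_C is the
    number of methylated reads and N_T the number of unmethylated reads.
    """
    methylated = sum(n_c for n_c, _ in sites)
    total = sum(n_c + n_t for n_c, n_t in sites)
    if total == 0:
        raise ValueError("no reads cover any CpG site of this marker")
    return methylated / total


# Hypothetical read counts for a marker with two CpG sites.
example_sites = [(9, 1), (5, 15)]
print(average_methylation_fraction(example_sites))
```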
2. The predicted probability of malignancy of the sample is calculated with the previously built mathematical model, a logistic regression model. First, the input z of the sigmoid function is calculated with the following formula:
$$z = \sum w \cdot x + w_0$$
Then the sigmoid function is evaluated as follows:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
Here w is the regression model coefficient of each marker, w_0 is the intercept, and x is the DNA methylation level of that marker in the sample; the sum runs over the markers in the combination.
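The two formulas can be sketched in a few lines. The function name and argument order are illustrative and are not taken from the application.

```python
import math


def malignancy_probability(levels, coefficients, intercept):
    """Return sigma(z) with z = sum(w * x) + w_0.

    levels: methylation level x of each marker, in the same order as
    coefficients (the regression coefficients w); intercept is w_0.
    """
    if len(levels) != len(coefficients):
        raise ValueError("one methylation level is needed per coefficient")
    z = intercept + sum(w * x for w, x in zip(coefficients, levels))
    return 1.0 / (1.0 + math.exp(-z))


# With z = 0 the sigmoid returns exactly 0.5.
print(malignancy_probability([0.0, 0.0], [1.0, 2.0], 0.0))
```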
3. The thyroid nodule is classified as benign or malignant from the sample's predicted probability of malignancy. A threshold on this probability is derived from the model built on the methylation marker combination: a sample whose predicted probability of malignancy is greater than the threshold is judged malignant; otherwise it is judged benign.
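The decision rule is a strict comparison, so a probability exactly equal to the threshold is judged benign. A minimal sketch follows; the function name is illustrative, and the two calls use probabilities and thresholds that appear in Tables 1-2 and 2-2 of this application.

```python
def classify(probability, threshold):
    """Judge a nodule malignant only if its probability exceeds the threshold."""
    return "malignant" if probability > threshold else "benign"


print(classify(0.491, 0.49))  # sample 516B in Table 1-2, threshold 0.49
print(classify(0.569, 0.58))  # sample 150T in Table 2-2, threshold 0.58
```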
Publicly available data from published research were used, namely raw reduced representation bisulfite sequencing (RRBS) data [Guerra A, Carrano M, Angrisani E, et al., Detection of RAS mutation by pyrosequencing in thyroid cytology samples, Int J Surg, 12 Suppl 1:S91-4, 2014]. All benign and malignant thyroid nodule samples in that dataset, 145 surgical samples in total (65 benign nodules, 80 malignant nodules), were analyzed to obtain the methylation level of CpG sites with a sequencing depth of at least 10x. AMF was then calculated from the CpG sites detected within each methylation marker and taken as the DNA methylation level of that marker. To allow comparison with the method for distinguishing benign from malignant thyroid nodules in a published article [Valderrabano P, Khazai L, Leon ME, et al., Evaluation of ThyroSeq v2 performance in thyroid nodules with indeterminate cytology, Endocr Relat Cancer, 24:127-136, 2017], the samples were grouped as in that article: its developing cohort served as the training set (28 benign nodules, 39 malignant nodules) and its testing cohort as validation set 1 (37 benign nodules, 41 malignant nodules). In addition, 74 Chinese thyroid surgical samples were collected as validation set 2 (37 benign nodules, 37 malignant nodules); each sample was processed with RRBS and the analysis workflow described above to obtain the CpG sites detected within each methylation marker, and AMF was calculated and taken as the DNA methylation level of that marker. In the following examples, the mathematical model built on the training set samples is applied to the two validation sets, and the area under the receiver operating characteristic (ROC) curve (AUC) is reported.
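The application does not state how the AUC or its confidence interval was computed. As a sketch of one standard approach, the AUC equals the fraction of (malignant, benign) sample pairs in which the malignant sample receives the higher predicted probability, with ties counted as one half. The function name and the toy inputs below are illustrative.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: share of positive/negative pairs ordered correctly.

    labels: 1 for malignant, 0 for benign; scores: predicted probabilities.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are needed to compute an AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: one malignant sample ranks below one benign sample.
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```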
Example 1
A model was built with the following methylation markers as a combination (methylation marker combination 1), and its AUC was tested on the two validation sets: chr1:3155061:3155760, chr2:127822196:127822895, chr7:73508743:73509442, chr8:22547976:22548675, chr8:22548391:22549090, chr10:81001706:81002405, chr14:21559748:21560447, chr15:70766881:70767580, chr17:7286958:7287657, chr17:76879761:76880460, chr19:1177275:1177974, chr19:3434666:3435365, chr19:10404832:10405531, chr19:18770961:18771660, chr22:20226373:20227072. The logistic regression model coefficient w of each marker is given in Table 1-1; the intercept w_0 of the logistic regression model is 0.305.
Table 1-1: Logistic regression model coefficients for each marker
Methylation marker | Gene name | Logistic regression model coefficient
chr1:3155061:3155760 | PRDM16 | 0.273
chr2:127822196:127822895 | BIN1 | -0.347
chr7:73508743:73509442 | LIMK1 | -0.258
chr8:22547976:22548675 | EGR3 | 0.373
chr8:22548391:22549090 | EGR3 | 0.239
chr10:81001706:81002405 | PPIF | -0.228
chr14:21559748:21560447 | ZNF219 | 0.413
chr15:70766881:70767580 | UACA | -0.172
chr17:7286958:7287657 | TNK1 | -0.143
chr17:76879761:76880460 | CEP295NL | -0.170
chr19:1177275:1177974 | SBNO2 | -0.230
chr19:3434666:3435365 | C19orf77 | -0.184
chr19:10404832:10405531 | ICAM5 | 0.423
chr19:18770961:18771660 | CRTC1 | -0.251
chr22:20226373:20227072 | RTN4R | -0.180
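The coefficients in Table 1-1 and the intercept of 0.305 can be put into a small lookup to see how each marker moves the prediction. This is a sketch under stated assumptions: the text does not say whether the methylation levels are used as raw AMF values or rescaled before they enter the model, and the function names are illustrative. Markers are keyed by coordinates because EGR3 appears twice. Markers with positive coefficients (PRDM16, both EGR3 regions, ZNF219, ICAM5) raise the predicted probability of malignancy as their methylation level rises; the others lower it.

```python
import math

# Coefficients copied from Table 1-1; the intercept is the value given in Example 1.
COEFFICIENTS = {
    "chr1:3155061:3155760": 0.273,     # PRDM16
    "chr2:127822196:127822895": -0.347,  # BIN1
    "chr7:73508743:73509442": -0.258,  # LIMK1
    "chr8:22547976:22548675": 0.373,   # EGR3
    "chr8:22548391:22549090": 0.239,   # EGR3
    "chr10:81001706:81002405": -0.228,  # PPIF
    "chr14:21559748:21560447": 0.413,  # ZNF219
    "chr15:70766881:70767580": -0.172,  # UACA
    "chr17:7286958:7287657": -0.143,   # TNK1
    "chr17:76879761:76880460": -0.170,  # CEP295NL
    "chr19:1177275:1177974": -0.230,   # SBNO2
    "chr19:3434666:3435365": -0.184,   # C19orf77
    "chr19:10404832:10405531": 0.423,  # ICAM5
    "chr19:18770961:18771660": -0.251,  # CRTC1
    "chr22:20226373:20227072": -0.180,  # RTN4R
}
INTERCEPT = 0.305


def contributions(levels):
    """Per-marker term w * x; levels maps marker coordinates to x."""
    return {marker: w * levels[marker] for marker, w in COEFFICIENTS.items()}


def probability(levels):
    """Predicted probability of malignancy for one sample."""
    z = INTERCEPT + sum(contributions(levels).values())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical input: every marker at level 0, so only the intercept acts.
zero_levels = {marker: 0.0 for marker in COEFFICIENTS}
print(round(probability(zero_levels), 3))
```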
The results are shown in Figure 1. The area under the ROC curve was 0.98 (95% CI 0.97 to 0.99) for validation set 1 and 0.95 (95% CI 0.93 to 0.97) for validation set 2. At the training-set operating point with a specificity of 86% and a sensitivity of 92%, the malignancy prediction threshold is 0.49: a predicted probability of malignancy greater than 0.49 is judged malignant, otherwise benign. With this threshold, the diagnosis of malignant thyroid nodules in validation set 1 reached a sensitivity of 100%, a specificity of 76%, a positive predictive value (PPV) of 82% and a negative predictive value (NPV) of 100%; in validation set 2 it reached a sensitivity of 87%, a specificity of 84%, a PPV of 84% and an NPV of 86%. The predictions obtained for the two validation sets with methylation marker combination 1 are listed in Table 1-2 and Table 1-3, respectively.
Table 1-2: Prediction results for validation set 1 samples using methylation marker combination 1
Sample ID | Sample type | Predicted probability of malignancy | Prediction
137T | malignant | 0.518 | malignant
138T | malignant | 0.620 | malignant
141T | malignant | 0.767 | malignant
146T | malignant | 0.666 | malignant
148T | malignant | 0.738 | malignant
150T | malignant | 0.558 | malignant
152T | malignant | 0.661 | malignant
153T | malignant | 0.748 | malignant
154T | malignant | 0.608 | malignant
155T | malignant | 0.709 | malignant
158T | malignant | 0.514 | malignant
161T | malignant | 0.733 | malignant
162T | malignant | 0.584 | malignant
163T | malignant | 0.775 | malignant
164T | malignant | 0.709 | malignant
167T | malignant | 0.633 | malignant
168T | malignant | 0.661 | malignant
169T | malignant | 0.645 | malignant
170T | malignant | 0.671 | malignant
172T | malignant | 0.658 | malignant
175T | malignant | 0.752 | malignant
178T | malignant | 0.723 | malignant
179T | malignant | 0.661 | malignant
181T | malignant | 0.658 | malignant
182T | malignant | 0.626 | malignant
601T | malignant | 0.705 | malignant
602T | malignant | 0.500 | malignant
603T | malignant | 0.683 | malignant
605T | malignant | 0.723 | malignant
606T | malignant | 0.714 | malignant
607T | malignant | 0.693 | malignant
608T | malignant | 0.647 | malignant
609T | malignant | 0.666 | malignant
610T | malignant | 0.662 | malignant
611T | malignant | 0.797 | malignant
612T | malignant | 0.595 | malignant
613T | malignant | 0.688 | malignant
615T | malignant | 0.699 | malignant
616T | malignant | 0.549 | malignant
617T | malignant | 0.688 | malignant
619T | malignant | 0.705 | malignant
514B | benign | 0.423 | benign
516B | benign | 0.491 | malignant
519B | benign | 0.425 | benign
522B | benign | 0.492 | malignant
525B | benign | 0.299 | benign
531B | benign | 0.541 | malignant
534B | benign | 0.538 | malignant
542B | benign | 0.428 | benign
545B | benign | 0.440 | benign
546B | benign | 0.375 | benign
547B | benign | 0.473 | benign
548B | benign | 0.473 | benign
554B | benign | 0.560 | malignant
555B | benign | 0.475 | benign
556B | benign | 0.450 | benign
557B | benign | 0.432 | benign
558B | benign | 0.482 | benign
559B | benign | 0.252 | benign
563B | benign | 0.450 | benign
564B | benign | 0.568 | malignant
565B | benign | 0.580 | malignant
567B | benign | 0.427 | benign
568B | benign | 0.439 | benign
570B | benign | 0.423 | benign
571B | benign | 0.418 | benign
572B | benign | 0.492 | malignant
574B | benign | 0.481 | benign
575B | benign | 0.421 | benign
576B | benign | 0.427 | benign
578B | benign | 0.421 | benign
579B | benign | 0.481 | benign
580B | benign | 0.390 | benign
581B | benign | 0.346 | benign
582B | benign | 0.474 | benign
583B | benign | 0.337 | benign
614B | benign | 0.565 | malignant
620B | benign | 0.246 | benign
Table 1-3: Prediction results for validation set 2 samples using methylation marker combination 1
Sample ID | Sample type | Predicted probability of malignancy | Prediction
GFP-FR171219001 | malignant | 0.598 | malignant
GFP-FR171219003 | malignant | 0.598 | malignant
GFP-FR171219005 | malignant | 0.638 | malignant
GFP-FR171219019 | malignant | 0.587 | malignant
GFP-FR171219021 | malignant | 0.719 | malignant
GFP-FR171219023 | malignant | 0.742 | malignant
GFP-FR171219027 | malignant | 0.736 | malignant
GFP-FR171219031 | malignant | 0.674 | malignant
GFP-FR171219033 | malignant | 0.594 | malignant
GFP-FR171219035 | malignant | 0.652 | malignant
GFP-FR171219039 | malignant | 0.686 | malignant
GFP-FR171219041 | malignant | 0.656 | malignant
GFP-FR171219043 | malignant | 0.716 | malignant
GFP-FR171219045 | malignant | 0.723 | malignant
GFP-FR171219047 | malignant | 0.741 | malignant
GFP-FR171219049 | malignant | 0.748 | malignant
GFP-FR180615002 | malignant | 0.609 | malignant
GFP-FR180615004 | malignant | 0.684 | malignant
GFP-FR180615006 | malignant | 0.442 | benign
GFP-FR180615008 | malignant | 0.424 | benign
GFP-FR180615010 | malignant | 0.653 | malignant
GFP-FR180615012 | malignant | 0.356 | benign
GFP-FR180615014 | malignant | 0.675 | malignant
GFP-FR180615016 | malignant | 0.475 | benign
GFP-FR180615018 | malignant | 0.723 | malignant
GFP-FR180615020 | malignant | 0.623 | malignant
GFP-FR180615022 | malignant | 0.674 | malignant
GFP-FR180615024 | malignant | 0.632 | malignant
GFP-FR180615026 | malignant | 0.599 | malignant
GFP-FR180615028 | malignant | 0.563 | malignant
GFP-FR180615030 | malignant | 0.632 | malignant
GFP-FR180615032 | malignant | 0.451 | benign
GFP-FR180615034 | malignant | 0.657 | malignant
GFP-FR180615036 | malignant | 0.698 | malignant
GFP-FR180615038 | malignant | 0.513 | malignant
GFP-FR180713031 | malignant | 0.746 | malignant
GFP-FR180713033 | malignant | 0.571 | malignant
GFP-FR171230001 | benign | 0.476 | benign
GFP-FR171230003 | benign | 0.566 | malignant
GFP-FR171230005 | benign | 0.424 | benign
GFP-FR171230007 | benign | 0.350 | benign
GFP-FR171230009 | benign | 0.324 | benign
GFP-FR171230013 | benign | 0.264 | benign
GFP-FR171230015 | benign | 0.418 | benign
GFP-FR171230017 | benign | 0.381 | benign
GFP-FR171230019 | benign | 0.468 | benign
GFP-FR180525002 | benign | 0.382 | benign
GFP-FR180525004 | benign | 0.341 | benign
GFP-FR180525006 | benign | 0.378 | benign
GFP-FR180525008 | benign | 0.468 | benign
GFP-FR180525010 | benign | 0.334 | benign
GFP-FR180525012 | benign | 0.412 | benign
GFP-FR180525014 | benign | 0.495 | malignant
GFP-FR180525016 | benign | 0.320 | benign
GFP-FR180525020 | benign | 0.229 | benign
GFP-FR180525022 | benign | 0.340 | benign
GFP-FR180525024 | benign | 0.399 | benign
GFP-FR180525026 | benign | 0.325 | benign
GFP-FR180525028 | benign | 0.370 | benign
GFP-FR180525030 | benign | 0.505 | malignant
GFP-FR180713002 | benign | 0.632 | malignant
GFP-FR180713004 | benign | 0.422 | benign
GFP-FR180713006 | benign | 0.330 | benign
GFP-FR180713008 | benign | 0.394 | benign
GFP-FR180713010 | benign | 0.360 | benign
GFP-FR180713012 | benign | 0.594 | malignant
GFP-FR180713014 | benign | 0.368 | benign
GFP-FR180713016 | benign | 0.405 | benign
GFP-FR180713025 | benign | 0.320 | benign
GFP-FR180713035 | benign | 0.419 | benign
GFP-FR180713037 | benign | 0.365 | benign
GFP-FR180713041 | benign | 0.328 | benign
GFP-FR180713043 | benign | 0.399 | benign
GFP-FR180713045 | benign | 0.507 | malignant
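The performance figures quoted for Example 1 can be rechecked from the two tables above. Counting rows gives 41 true positives, 0 false negatives, 9 false positives and 28 true negatives for validation set 1, and 32, 5, 6 and 31 respectively for validation set 2. The sketch below, with an illustrative function name, turns such counts into the four metrics. For validation set 2 the sensitivity works out to 32/37, about 86.5%, which the text reports as 87%.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    def ratio(numerator, denominator):
        if denominator == 0:
            raise ValueError("metric undefined: empty denominator")
        return numerator / denominator

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Counts tallied from Table 1-2 and Table 1-3.
validation_1 = diagnostic_metrics(tp=41, fn=0, fp=9, tn=28)
validation_2 = diagnostic_metrics(tp=32, fn=5, fp=6, tn=31)
for name, metrics in (("validation set 1", validation_1), ("validation set 2", validation_2)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```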
Example 2
A model was built with the combination of methylation markers chr1:20813203:20813902, chr3:58153211:58153910, chr8:22547976:22548675, chr8:34104888:34105587, chr17:62076008:62076707, chr17:79060865:79061564, chr19:883793:884492, chr19:3434988:3435687 and chr20:31162101:31162800 (methylation marker combination 2), and its AUC was tested on the two validation sets. The logistic regression model coefficient of each marker is given in Table 2-1; the intercept of the logistic regression model is 1.212.
Table 2-1: Logistic regression model coefficients for each marker
Marker | Gene name | Logistic regression model coefficient
chr1:20813203:20813902 | CAMK2N1 | -0.765
chr3:58153211:58153910 | DNASE1L3 | -0.912
chr8:22547976:22548675 | EGR3 | 2.209
chr8:34104888:34105587 | DUSP26 | 0.633
chr17:62076008:62076707 | ICAM2 | -0.536
chr17:79060865:79061564 | BAIAP2 | -0.987
chr19:883793:884492 | MED16 | -0.588
chr19:3434988:3435687 | C19orf77 | -0.807
chr20:31162101:31162800 | NOL4L-DT | -1.414
The results are shown in Figure 2. The area under the ROC curve was 1.00 (95% CI 1.00 to 1.00) for validation set 1 and 0.96 (95% CI 0.95 to 0.98) for validation set 2. At the training-set operating point with a specificity of 96% and a sensitivity of 85%, the malignancy prediction threshold is 0.58: a predicted probability of malignancy greater than 0.58 is judged malignant, otherwise benign. With this threshold, the diagnosis of malignant thyroid nodules in validation set 1 reached a sensitivity of 88%, a specificity of 100%, a PPV of 100% and an NPV of 88%; in validation set 2 it reached a sensitivity of 76%, a specificity of 95%, a PPV of 93% and an NPV of 80%. The predictions obtained for the two validation sets with methylation marker combination 2 are listed in Table 2-2 and Table 2-3, respectively.
Table 2-2: Prediction results for validation set 1 samples using methylation marker combination 2
Sample ID | Sample type | Predicted probability of malignancy | Prediction
137T | malignant | 0.612 | malignant
138T | malignant | 0.693 | malignant
141T | malignant | 0.875 | malignant
146T | malignant | 0.894 | malignant
148T | malignant | 0.861 | malignant
150T | malignant | 0.569 | benign
152T | malignant | 0.833 | malignant
153T | malignant | 0.908 | malignant
154T | malignant | 0.743 | malignant
155T | malignant | 0.830 | malignant
158T | malignant | 0.546 | benign
161T | malignant | 0.845 | malignant
162T | malignant | 0.569 | benign
163T | malignant | 0.920 | malignant
164T | malignant | 0.775 | malignant
167T | malignant | 0.817 | malignant
168T | malignant | 0.710 | malignant
169T | malignant | 0.664 | malignant
170T | malignant | 0.742 | malignant
172T | malignant | 0.702 | malignant
175T | malignant | 0.927 | malignant
178T | malignant | 0.886 | malignant
179T | malignant | 0.630 | malignant
181T | malignant | 0.593 | malignant
182T | malignant | 0.707 | malignant
601T | malignant | 0.893 | malignant
602T | malignant | 0.556 | benign
603T | malignant | 0.807 | malignant
605T | malignant | 0.825 | malignant
606T | malignant | 0.847 | malignant
607T | malignant | 0.723 | malignant
608T | malignant | 0.771 | malignant
609T | malignant | 0.827 | malignant
610T | malignant | 0.647 | malignant
611T | malignant | 0.911 | malignant
612T | malignant | 0.772 | malignant
613T | malignant | 0.784 | malignant
615T | malignant | 0.859 | malignant
616T | malignant | 0.563 | benign
617T | malignant | 0.838 | malignant
619T | malignant | 0.801 | malignant
514B | benign | 0.294 | benign
516B | benign | 0.390 | benign
519B | benign | 0.342 | benign
522B | benign | 0.445 | benign
525B | benign | 0.277 | benign
531B | benign | 0.500 | benign
534B | benign | 0.469 | benign
542B | benign | 0.290 | benign
545B | benign | 0.380 | benign
546B | benign | 0.258 | benign
547B | benign | 0.406 | benign
548B | benign | 0.370 | benign
554B | benign | 0.410 | benign
555B | benign | 0.397 | benign
556B | benign | 0.484 | benign
557B | benign | 0.458 | benign
558B | benign | 0.413 | benign
559B | benign | 0.130 | benign
563B | benign | 0.434 | benign
564B | benign | 0.521 | benign
565B | benign | 0.529 | benign
567B | benign | 0.473 | benign
568B | benign | 0.325 | benign
570B | benign | 0.496 | benign
571B | benign | 0.286 | benign
572B | benign | 0.420 | benign
574B | benign | 0.305 | benign
575B | benign | 0.363 | benign
576B | benign | 0.352 | benign
578B | benign | 0.364 | benign
579B | benign | 0.484 | benign
580B | benign | 0.421 | benign
581B | benign | 0.292 | benign
582B | benign | 0.410 | benign
583B | benign | 0.313 | benign
614B | benign | 0.508 | benign
620B | benign | 0.291 | benign
Table 2-3: Prediction results for validation set 2 samples using methylation marker combination 2
Sample ID | Sample type | Predicted probability of malignancy | Prediction
GFP-FR171219001 | malignant | 0.625 | malignant
GFP-FR171219003 | malignant | 0.696 | malignant
GFP-FR171219005 | malignant | 0.835 | malignant
GFP-FR171219019 | malignant | 0.638 | malignant
GFP-FR171219021 | malignant | 0.797 | malignant
GFP-FR171219023 | malignant | 0.899 | malignant
GFP-FR171219027 | malignant | 0.870 | malignant
GFP-FR171219031 | malignant | 0.750 | malignant
GFP-FR171219033 | malignant | 0.668 | malignant
GFP-FR171219035 | malignant | 0.905 | malignant
GFP-FR171219039 | malignant | 0.815 | malignant
GFP-FR171219041 | malignant | 0.645 | malignant
GFP-FR171219043 | malignant | 0.936 | malignant
GFP-FR171219045 | malignant | 0.821 | malignant
GFP-FR171219047 | malignant | 0.896 | malignant
GFP-FR171219049 | malignant | 0.795 | malignant
GFP-FR180615002 | malignant | 0.536 | benign
GFP-FR180615004 | malignant | 0.751 | malignant
GFP-FR180615006 | malignant | 0.536 | benign
GFP-FR180615008 | malignant | 0.410 | benign
GFP-FR180615010 | malignant | 0.814 | malignant
GFP-FR180615012 | malignant | 0.359 | benign
GFP-FR180615014 | malignant | 0.840 | malignant
GFP-FR180615016 | malignant | 0.498 | benign
GFP-FR180615018 | malignant | 0.777 | malignant
GFP-FR180615020 | malignant | 0.497 | benign
GFP-FR180615022 | malignant | 0.835 | malignant
GFP-FR180615024 | malignant | 0.552 | benign
GFP-FR180615026 | malignant | 0.681 | malignant
GFP-FR180615028 | malignant | 0.667 | malignant
GFP-FR180615030 | malignant | 0.703 | malignant
GFP-FR180615032 | malignant | 0.508 | benign
GFP-FR180615034 | malignant | 0.881 | malignant
GFP-FR180615036 | malignant | 0.767 | malignant
GFP-FR180615038 | malignant | 0.535 | benign
GFP-FR180713031 | malignant | 0.937 | malignant
GFP-FR180713033 | malignant | 0.831 | malignant
GFP-FR171230001 | benign | 0.262 | benign
GFP-FR171230003 | benign | 0.495 | benign
GFP-FR171230005 | benign | 0.398 | benign
GFP-FR171230007 | benign | 0.288 | benign
GFP-FR171230009 | benign | 0.198 | benign
GFP-FR171230013 | benign | 0.468 | benign
GFP-FR171230015 | benign | 0.391 | benign
GFP-FR171230017 | benign | 0.276 | benign
GFP-FR171230019 | benign | 0.213 | benign
GFP-FR180525002 | benign | 0.425 | benign
GFP-FR180525004 | benign | 0.392 | benign
GFP-FR180525006 | benign | 0.329 | benign
GFP-FR180525008 | benign | 0.619 | malignant
GFP-FR180525010 | benign | 0.271 | benign
GFP-FR180525012 | benign | 0.487 | benign
GFP-FR180525014 | benign | 0.514 | benign
GFP-FR180525016 | benign | 0.165 | benign
GFP-FR180525020 | benign | 0.276 | benign
GFP-FR180525022 | benign | 0.238 | benign
GFP-FR180525024 | benign | 0.419 | benign
GFP-FR180525026 | benign | 0.238 | benign
GFP-FR180525028 | benign | 0.314 | benign
GFP-FR180525030 | benign | 0.396 | benign
GFP-FR180713002 | benign | 0.677 | malignant
GFP-FR180713004 | benign | 0.355 | benign
GFP-FR180713006 | benign | 0.174 | benign
GFP-FR180713008 | benign | 0.277 | benign
GFP-FR180713010 | benign | 0.445 | benign
GFP-FR180713012 | benign | 0.448 | benign
GFP-FR180713014 | benign | 0.183 | benign
GFP-FR180713016 | benign | 0.331 | benign
GFP-FR180713025 | benign | 0.197 | benign
GFP-FR180713035 | benign | 0.297 | benign
GFP-FR180713037 | benign | 0.318 | benign
GFP-FR180713041 | benign | 0.308 | benign
GFP-FR180713043 | benign | 0.428 | benign
GFP-FR180713045 | benign | 0.260 | benign
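Examples 1 and 2 each fix the threshold at a training-set operating point (0.49 and 0.58, respectively), but the application does not describe how that point was chosen. As a sketch of one plausible procedure, each observed training score can be tried as a threshold and the resulting sensitivity and specificity tabulated, using the same strict "greater than" rule as above. The function name and the toy data are hypothetical.

```python
def operating_points(labels, scores):
    """List (threshold, sensitivity, specificity) for each observed score.

    labels: 1 for malignant, 0 for benign. A sample is called malignant
    when its score is strictly greater than the threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are needed")
    points = []
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s > threshold)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s <= threshold)
        points.append((threshold, tp / n_pos, tn / n_neg))
    return points


# Hypothetical training scores, for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
for point in operating_points(toy_labels, toy_scores):
    print(point)
```

Raising the threshold trades sensitivity for specificity, which is consistent with the higher specificity and lower sensitivity reported for the 0.58 threshold of Example 2 compared with the 0.49 threshold of Example 1, although the two examples also use different marker combinations.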
Example 3
A model was built with the combination of methylation markers chr1:59041615:59042314, chr1:156676274:156676973, chr3:58153211:58153910, chr3:194408527:194409226, chr8:22547976:22548675, chr17:7286958:7287657, chr17:79060865:79061564, chr20:31162101:31162800 and chr22:23624092:23624791 (methylation marker combination 3), and its AUC was tested on the two validation sets. The logistic regression model coefficient of each marker is given in Table 3-1; the intercept of the logistic regression model is 1.681.
Table 3-1: Logistic regression model coefficients for each marker
Marker Gene name Logistic regression coefficient
chr1:59041615:59042314 TACSTD2 -1.047
chr1:156676274:156676973 CRABP2 -0.551
chr3:58153211:58153910 DNASE1L3 -0.622
chr3:194408527:194409226 LSG1 -0.518
chr8:22547976:22548675 EGR3 2.245
chr17:7286958:7287657 TNK1 -1.076
chr17:79060865:79061564 BAIAP2 -0.619
chr20:31162101:31162800 NOL4L-DT -1.242
chr22:23624092:23624791 BCR -0.673
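As an illustration only, the coefficients and intercept of methylation marker combination 3 can be turned into a malignancy probability as sketched below. The sketch assumes the standard logistic link, and it assumes that each marker contributes a single numeric feature prepared in the same way as for model training; this section does not state how the methylation level of a region is scaled, so the input values, the function names and the dictionary layout are hypothetical. The 0.52 cut-off is the malignancy prediction threshold stated in Example 3.

```python
import math

# Intercept and coefficients copied from Example 3 (methylation marker combination 3).
INTERCEPT = 1.681
COEFFICIENTS = {
    "chr1:59041615:59042314": -1.047,    # TACSTD2
    "chr1:156676274:156676973": -0.551,  # CRABP2
    "chr3:58153211:58153910": -0.622,    # DNASE1L3
    "chr3:194408527:194409226": -0.518,  # LSG1
    "chr8:22547976:22548675": 2.245,     # EGR3
    "chr17:7286958:7287657": -1.076,     # TNK1
    "chr17:79060865:79061564": -0.619,   # BAIAP2
    "chr20:31162101:31162800": -1.242,   # NOL4L-DT
    "chr22:23624092:23624791": -0.673,   # BCR
}
THRESHOLD = 0.52  # malignancy prediction threshold stated in Example 3


def malignancy_probability(features):
    """Return the logistic-model probability for one sample.

    `features` maps each marker region to its feature value. How that value
    is derived from sequencing data is an assumption of this sketch.
    """
    z = INTERCEPT + sum(coef * features[marker] for marker, coef in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def classify(features):
    """Apply the stated rule: probability greater than 0.52 means malignant."""
    return "malignant" if malignancy_probability(features) > THRESHOLD else "benign"


# Hypothetical input: every marker feature set to 1.0.
example = {marker: 1.0 for marker in COEFFICIENTS}
print(classify(example))
```

Eight of the nine coefficients are negative and only the EGR3 region carries a positive weight, so in this model a higher feature value for most markers lowers the predicted malignancy probability.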
The results are shown in Figure 3. The area under the ROC curve (AUC) was 1.00 (95% CI 0.99-1.00) for validation set 1 and 0.97 (95% CI 0.95-0.98) for validation set 2. At a training-set specificity of 93% and sensitivity of 95%, the malignancy prediction threshold was 0.52: a sample with a predicted malignancy probability greater than 0.52 is classified as malignant, and otherwise as benign. With this threshold, malignant thyroid nodules in validation set 1 were diagnosed with a sensitivity of 98%, a specificity of 100%, a PPV of 100% and an NPV of 97%; in validation set 2 the sensitivity was 92%, the specificity 87%, the PPV 87% and the NPV 91%. The predictions for the two validation sets using methylation marker combination 3 are listed in Table 3-2 and Table 3-3, respectively.
Table 3-2: Prediction results for validation set 1 samples using methylation marker combination 3
Sample ID Sample type Malignancy prediction probability Prediction result
137T malignant 0.624 malignant
138T malignant 0.669 malignant
141T malignant 0.890 malignant
146T malignant 0.833 malignant
148T malignant 0.899 malignant
150T malignant 0.612 malignant
152T malignant 0.828 malignant
153T malignant 0.904 malignant
154T malignant 0.697 malignant
155T malignant 0.802 malignant
158T malignant 0.589 malignant
161T malignant 0.878 malignant
162T malignant 0.614 malignant
163T malignant 0.935 malignant
164T malignant 0.834 malignant
167T malignant 0.831 malignant
168T malignant 0.710 malignant
169T malignant 0.734 malignant
170T malignant 0.754 malignant
172T malignant 0.773 malignant
175T malignant 0.939 malignant
178T malignant 0.927 malignant
179T malignant 0.628 malignant
181T malignant 0.655 malignant
182T malignant 0.582 malignant
601T malignant 0.890 malignant
602T malignant 0.591 malignant
603T malignant 0.717 malignant
605T malignant 0.852 malignant
606T malignant 0.870 malignant
607T malignant 0.791 malignant
608T malignant 0.773 malignant
609T malignant 0.819 malignant
610T malignant 0.660 malignant
611T malignant 0.912 malignant
612T malignant 0.743 malignant
613T malignant 0.755 malignant
615T malignant 0.781 malignant
616T malignant 0.478 benign
617T malignant 0.893 malignant
619T malignant 0.757 malignant
514B benign 0.315 benign
516B benign 0.350 benign
519B benign 0.330 benign
522B benign 0.483 benign
525B benign 0.223 benign
531B benign 0.501 benign
534B benign 0.480 benign
542B benign 0.320 benign
545B benign 0.331 benign
546B benign 0.159 benign
547B benign 0.345 benign
548B benign 0.367 benign
554B benign 0.358 benign
555B benign 0.321 benign
556B benign 0.426 benign
557B benign 0.279 benign
558B benign 0.301 benign
559B benign 0.079 benign
563B benign 0.374 benign
564B benign 0.510 benign
565B benign 0.508 benign
567B benign 0.376 benign
568B benign 0.308 benign
570B benign 0.407 benign
571B benign 0.180 benign
572B benign 0.357 benign
574B benign 0.202 benign
575B benign 0.349 benign
576B benign 0.268 benign
578B benign 0.288 benign
579B benign 0.476 benign
580B benign 0.332 benign
581B benign 0.230 benign
582B benign 0.413 benign
583B benign 0.152 benign
614B benign 0.444 benign
620B benign 0.139 benign
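The validation set 1 figures reported in Example 3 can be cross-checked against the table above. The table lists 41 malignant samples, of which one (616T) is predicted benign, and 37 benign samples, all predicted benign. The sketch below recomputes the four metrics from those counts; the function name and the dictionary keys are illustrative choices, not part of the described method.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, PPV and NPV from confusion counts.

    tp/fn count malignant samples predicted malignant/benign;
    tn/fp count benign samples predicted benign/malignant.
    Assumes no denominator is zero.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Counts read off the validation set 1 table for methylation marker combination 3.
metrics = diagnostic_metrics(tp=40, fn=1, tn=37, fp=0)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Rounded to whole percentages, these counts give 98%, 100%, 100% and 97%, in line with the values stated for validation set 1.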
Table 3-3: Prediction results for validation set 2 samples using methylation marker combination 3
Sample ID Sample type Malignancy prediction probability Prediction result
GFP-FR171219001 malignant 0.613 malignant
GFP-FR171219003 malignant 0.702 malignant
GFP-FR171219005 malignant 0.865 malignant
GFP-FR171219019 malignant 0.634 malignant
GFP-FR171219021 malignant 0.885 malignant
GFP-FR171219023 malignant 0.882 malignant
GFP-FR171219027 malignant 0.824 malignant
GFP-FR171219031 malignant 0.797 malignant
GFP-FR171219033 malignant 0.696 malignant
GFP-FR171219035 malignant 0.824 malignant
GFP-FR171219039 malignant 0.791 malignant
GFP-FR171219041 malignant 0.762 malignant
GFP-FR171219043 malignant 0.938 malignant
GFP-FR171219045 malignant 0.854 malignant
GFP-FR171219047 malignant 0.877 malignant
GFP-FR171219049 malignant 0.839 malignant
GFP-FR180615002 malignant 0.614 malignant
GFP-FR180615004 malignant 0.792 malignant
GFP-FR180615006 malignant 0.534 malignant
GFP-FR180615008 malignant 0.464 benign
GFP-FR180615010 malignant 0.663 malignant
GFP-FR180615012 malignant 0.357 benign
GFP-FR180615014 malignant 0.856 malignant
GFP-FR180615016 malignant 0.513 benign
GFP-FR180615018 malignant 0.823 malignant
GFP-FR180615020 malignant 0.619 malignant
GFP-FR180615022 malignant 0.852 malignant
GFP-FR180615024 malignant 0.690 malignant
GFP-FR180615026 malignant 0.749 malignant
GFP-FR180615028 malignant 0.762 malignant
GFP-FR180615030 malignant 0.707 malignant
GFP-FR180615032 malignant 0.533 malignant
GFP-FR180615034 malignant 0.823 malignant
GFP-FR180615036 malignant 0.752 malignant
GFP-FR180615038 malignant 0.549 malignant
GFP-FR180713031 malignant 0.947 malignant
GFP-FR180713033 malignant 0.695 malignant
GFP-FR171230001 benign 0.388 benign
GFP-FR171230003 benign 0.507 benign
GFP-FR171230005 benign 0.430 benign
GFP-FR171230007 benign 0.256 benign
GFP-FR171230009 benign 0.157 benign
GFP-FR171230013 benign 0.377 benign
GFP-FR171230015 benign 0.320 benign
GFP-FR171230017 benign 0.185 benign
GFP-FR171230019 benign 0.254 benign
GFP-FR180525002 benign 0.397 benign
GFP-FR180525004 benign 0.343 benign
GFP-FR180525006 benign 0.352 benign
GFP-FR180525008 benign 0.560 malignant
GFP-FR180525010 benign 0.293 benign
GFP-FR180525012 benign 0.444 benign
GFP-FR180525014 benign 0.574 malignant
GFP-FR180525016 benign 0.124 benign
GFP-FR180525020 benign 0.165 benign
GFP-FR180525022 benign 0.144 benign
GFP-FR180525024 benign 0.257 benign
GFP-FR180525026 benign 0.168 benign
GFP-FR180525028 benign 0.245 benign
GFP-FR180525030 benign 0.533 malignant
GFP-FR180713002 benign 0.734 malignant
GFP-FR180713004 benign 0.329 benign
GFP-FR180713006 benign 0.152 benign
GFP-FR180713008 benign 0.288 benign
GFP-FR180713010 benign 0.456 benign
GFP-FR180713012 benign 0.587 malignant
GFP-FR180713014 benign 0.099 benign
GFP-FR180713016 benign 0.430 benign
GFP-FR180713025 benign 0.139 benign
GFP-FR180713035 benign 0.347 benign
GFP-FR180713037 benign 0.296 benign
GFP-FR180713041 benign 0.350 benign
GFP-FR180713043 benign 0.307 benign
GFP-FR180713045 benign 0.154 benign
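The AUC values quoted in these examples summarize how well the malignancy prediction probabilities separate the two sample types, independent of any single threshold. One way to read the number, sketched below, is as the fraction of (malignant, benign) sample pairs in which the malignant sample receives the higher probability, with ties counted as one half. The function name and the toy scores are hypothetical, and the sketch does not reproduce the 95% confidence intervals, whose calculation method is not given in this section.

```python
def roc_auc(malignant_scores, benign_scores):
    """AUC as the pairwise win rate of malignant over benign scores."""
    wins = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5  # ties contribute half a win
    return wins / (len(malignant_scores) * len(benign_scores))


# Toy scores, made up for illustration (not taken from the tables).
toy_malignant = [0.9, 0.8, 0.4]
toy_benign = [0.3, 0.5, 0.1]
toy_auc = roc_auc(toy_malignant, toy_benign)
print(round(toy_auc, 3))
```

Passing the probability columns of a prediction table, split by sample type, to the same function would give the AUC for that table under this pairwise definition.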
Example 4
A model built from the combination of methylation markers chr1:59041615:59042314, chr1:156676274:156676973, chr3:58153211:58153910, chr8:22547976:22548675, chr8:34104888:34105587, chr17:7286958:7287657, chr17:79060865:79061564, chr20:31162101:31162800 and chr22:23624092:23624791 (methylation marker combination 4) was tested for AUC on the two validation sets. The logistic regression coefficient of each marker is listed in Table 4-1. The intercept of the logistic regression model is 1.358.
Table 4-1: Logistic regression model coefficients for each marker
Marker Gene name Logistic regression coefficient
chr1:59041615:59042314 TACSTD2 -1.126
chr1:156676274:156676973 CRABP2 -0.568
chr3:58153211:58153910 DNASE1L3 -0.785
chr8:22547976:22548675 EGR3 2.107
chr8:34104888:34105587 DUSP26 0.806
chr17:7286958:7287657 TNK1 -1.198
chr17:79060865:79061564 BAIAP2 -0.779
chr20:31162101:31162800 NOL4L-DT -1.278
chr22:23624092:23624791 BCR -0.724
The results are shown in Figure 4. The area under the ROC curve (AUC) was 1.00 (95% CI 0.99-1.00) for validation set 1 and 0.97 (95% CI 0.95-0.98) for validation set 2. At a training-set specificity of 93% and sensitivity of 95%, the malignancy prediction threshold was 0.52: a sample with a predicted malignancy probability greater than 0.52 is classified as malignant, and otherwise as benign. With this threshold, malignant thyroid nodules in validation set 1 were diagnosed with a sensitivity of 95%, a specificity of 97%, a PPV of 98% and an NPV of 95%; in validation set 2 the sensitivity was 92%, the specificity 87%, the PPV 87% and the NPV 91%. The predictions for the two validation sets using methylation marker combination 4 are listed in Table 4-2 and Table 4-3, respectively.
Table 4-2: Prediction results for validation set 1 samples using methylation marker combination 4
Sample ID Sample type Malignancy prediction probability Prediction result
137T malignant 0.649 malignant
138T malignant 0.677 malignant
141T malignant 0.850 malignant
146T malignant 0.848 malignant
148T malignant 0.901 malignant
150T malignant 0.600 malignant
152T malignant 0.793 malignant
153T malignant 0.910 malignant
154T malignant 0.718 malignant
155T malignant 0.809 malignant
158T malignant 0.598 malignant
161T malignant 0.882 malignant
162T malignant 0.616 malignant
163T malignant 0.940 malignant
164T malignant 0.825 malignant
167T malignant 0.844 malignant
168T malignant 0.734 malignant
169T malignant 0.750 malignant
170T malignant 0.760 malignant
172T malignant 0.788 malignant
175T malignant 0.940 malignant
178T malignant 0.936 malignant
179T malignant 0.622 malignant
181T malignant 0.665 malignant
182T malignant 0.509 benign
601T malignant 0.914 malignant
602T malignant 0.613 malignant
603T malignant 0.729 malignant
605T malignant 0.867 malignant
606T malignant 0.880 malignant
607T malignant 0.814 malignant
608T malignant 0.765 malignant
609T malignant 0.832 malignant
610T malignant 0.677 malignant
611T malignant 0.911 malignant
612T malignant 0.755 malignant
613T malignant 0.769 malignant
615T malignant 0.798 malignant
616T malignant 0.500 benign
617T malignant 0.911 malignant
619T malignant 0.753 malignant
514B benign 0.336 benign
516B benign 0.356 benign
519B benign 0.340 benign
522B benign 0.537 malignant
525B benign 0.226 benign
531B benign 0.509 benign
534B benign 0.479 benign
542B benign 0.319 benign
545B benign 0.336 benign
546B benign 0.167 benign
547B benign 0.359 benign
548B benign 0.388 benign
554B benign 0.362 benign
555B benign 0.341 benign
556B benign 0.445 benign
557B benign 0.291 benign
558B benign 0.335 benign
559B benign 0.078 benign
563B benign 0.357 benign
564B benign 0.511 benign
565B benign 0.483 benign
567B benign 0.358 benign
568B benign 0.303 benign
570B benign 0.448 benign
571B benign 0.190 benign
572B benign 0.363 benign
574B benign 0.212 benign
575B benign 0.365 benign
576B benign 0.275 benign
578B benign 0.280 benign
579B benign 0.473 benign
580B benign 0.323 benign
581B benign 0.224 benign
582B benign 0.424 benign
583B benign 0.151 benign
614B benign 0.421 benign
620B benign 0.148 benign
Table 4-3: Prediction results for validation set 2 samples using methylation marker combination 4
Sample ID Sample type Malignancy prediction probability Prediction result
GFP-FR171219001 malignant 0.615 malignant
GFP-FR171219003 malignant 0.707 malignant
GFP-FR171219005 malignant 0.886 malignant
GFP-FR171219019 malignant 0.625 malignant
GFP-FR171219021 malignant 0.874 malignant
GFP-FR171219023 malignant 0.880 malignant
GFP-FR171219027 malignant 0.831 malignant
GFP-FR171219031 malignant 0.797 malignant
GFP-FR171219033 malignant 0.695 malignant
GFP-FR171219035 malignant 0.836 malignant
GFP-FR171219039 malignant 0.809 malignant
GFP-FR171219041 malignant 0.758 malignant
GFP-FR171219043 malignant 0.949 malignant
GFP-FR171219045 malignant 0.872 malignant
GFP-FR171219047 malignant 0.883 malignant
GFP-FR171219049 malignant 0.840 malignant
GFP-FR180615002 malignant 0.624 malignant
GFP-FR180615004 malignant 0.793 malignant
GFP-FR180615006 malignant 0.556 malignant
GFP-FR180615008 malignant 0.473 benign
GFP-FR180615010 malignant 0.727 malignant
GFP-FR180615012 malignant 0.371 benign
GFP-FR180615014 malignant 0.874 malignant
GFP-FR180615016 malignant 0.504 benign
GFP-FR180615018 malignant 0.817 malignant
GFP-FR180615020 malignant 0.628 malignant
GFP-FR180615022 malignant 0.852 malignant
GFP-FR180615024 malignant 0.688 malignant
GFP-FR180615026 malignant 0.768 malignant
GFP-FR180615028 malignant 0.774 malignant
GFP-FR180615030 malignant 0.666 malignant
GFP-FR180615032 malignant 0.551 malignant
GFP-FR180615034 malignant 0.846 malignant
GFP-FR180615036 malignant 0.747 malignant
GFP-FR180615038 malignant 0.564 malignant
GFP-FR180713031 malignant 0.954 malignant
GFP-FR180713033 malignant 0.708 malignant
GFP-FR171230001 benign 0.376 benign
GFP-FR171230003 benign 0.511 benign
GFP-FR171230005 benign 0.442 benign
GFP-FR171230007 benign 0.291 benign
GFP-FR171230009 benign 0.157 benign
GFP-FR171230013 benign 0.404 benign
GFP-FR171230015 benign 0.329 benign
GFP-FR171230017 benign 0.186 benign
GFP-FR171230019 benign 0.266 benign
GFP-FR180525002 benign 0.421 benign
GFP-FR180525004 benign 0.361 benign
GFP-FR180525006 benign 0.356 benign
GFP-FR180525008 benign 0.562 malignant
GFP-FR180525010 benign 0.296 benign
GFP-FR180525012 benign 0.455 benign
GFP-FR180525014 benign 0.582 benign
GFP-FR180525016 benign 0.111 benign
GFP-FR180525020 benign 0.166 benign
GFP-FR180525022 benign 0.162 benign
GFP-FR180525024 benign 0.243 benign
GFP-FR180525026 benign 0.154 benign
GFP-FR180525028 benign 0.248 benign
GFP-FR180525030 benign 0.539 malignant
GFP-FR180713002 benign 0.749 malignant
GFP-FR180713004 benign 0.335 benign
GFP-FR180713006 benign 0.136 benign
GFP-FR180713008 benign 0.286 benign
GFP-FR180713010 benign 0.461 benign
GFP-FR180713012 benign 0.594 malignant
GFP-FR180713014 benign 0.099 benign
GFP-FR180713016 benign 0.451 benign
GFP-FR180713025 benign 0.143 benign
GFP-FR180713035 benign 0.329 benign
GFP-FR180713037 benign 0.266 benign
GFP-FR180713041 benign 0.336 benign
GFP-FR180713043 benign 0.315 benign
GFP-FR180713045 benign 0.158 benign
Example 5
A model built from the combination of methylation markers chr1:59041615:59042314, chr3:58153211:58153910, chr8:22547976:22548675, chr8:34104888:34105587, chr17:7286958:7287657, chr17:79060865:79061564, chr19:883793:884492, chr20:31162101:31162800 and chr22:23624092:23624791 (methylation marker combination 5) was tested for AUC on the two validation sets. The logistic regression coefficient of each marker is listed in Table 5-1. The intercept of the logistic regression model is 1.447.
Table 5-1: Logistic regression model coefficients for each marker
Marker Gene name Logistic regression coefficient
chr1:59041615:59042314 TACSTD2 -1.122
chr3:58153211:58153910 DNASE1L3 -0.724
chr8:22547976:22548675 EGR3 2.143
chr8:34104888:34105587 DUSP26 0.893
chr17:7286958:7287657 TNK1 -1.212
chr17:79060865:79061564 BAIAP2 -0.717
chr19:883793:884492 MED16 -0.411
chr20:31162101:31162800 NOL4L-DT -1.258
chr22:23624092:23624791 BCR -0.682
The results are shown in Figure 5. The area under the ROC curve (AUC) was 1.00 (95% CI 0.99-1.00) for validation set 1 and 0.97 (95% CI 0.96-0.99) for validation set 2. At a training-set specificity of 93% and sensitivity of 95%, the malignancy prediction threshold was 0.52: a sample with a predicted malignancy probability greater than 0.52 is classified as malignant, and otherwise as benign. With this threshold, malignant thyroid nodules in validation set 1 were diagnosed with a sensitivity of 95%, a specificity of 97%, a PPV of 98% and an NPV of 95%; in validation set 2 the sensitivity was 92%, the specificity 87%, the PPV 87% and the NPV 91%. The predictions for the two validation sets using methylation marker combination 5 are listed in Table 5-2 and Table 5-3, respectively.
Table 5-2: Prediction results for validation set 1 samples using methylation marker combination 5
Sample ID Sample type Malignancy prediction probability Prediction result
137T malignant 0.642 malignant
138T malignant 0.661 malignant
141T malignant 0.848 malignant
146T malignant 0.848 malignant
148T malignant 0.900 malignant
150T malignant 0.591 malignant
152T malignant 0.786 malignant
153T malignant 0.913 malignant
154T malignant 0.716 malignant
155T malignant 0.800 malignant
158T malignant 0.593 malignant
161T malignant 0.880 malignant
162T malignant 0.603 malignant
163T malignant 0.939 malignant
164T malignant 0.820 malignant
167T malignant 0.840 malignant
168T malignant 0.725 malignant
169T malignant 0.736 malignant
170T malignant 0.746 malignant
172T malignant 0.781 malignant
175T malignant 0.939 malignant
178T malignant 0.929 malignant
179T malignant 0.624 malignant
181T malignant 0.653 malignant
182T malignant 0.495 benign
601T malignant 0.909 malignant
602T malignant 0.634 malignant
603T malignant 0.738 malignant
605T malignant 0.869 malignant
606T malignant 0.867 malignant
607T malignant 0.794 malignant
608T malignant 0.758 malignant
609T malignant 0.822 malignant
610T malignant 0.668 malignant
611T malignant 0.900 malignant
612T malignant 0.751 malignant
613T malignant 0.769 malignant
615T malignant 0.804 malignant
616T malignant 0.477 benign
617T malignant 0.903 malignant
619T malignant 0.748 malignant
514B benign 0.328 benign
516B benign 0.352 benign
519B benign 0.331 benign
522B benign 0.528 malignant
525B benign 0.226 benign
531B benign 0.508 benign
534B benign 0.475 benign
542B benign 0.334 benign
545B benign 0.346 benign
546B benign 0.178 benign
547B benign 0.360 benign
548B benign 0.377 benign
554B benign 0.359 benign
555B benign 0.327 benign
556B benign 0.441 benign
557B benign 0.286 benign
558B benign 0.345 benign
559B benign 0.078 benign
563B benign 0.349 benign
564B benign 0.501 benign
565B benign 0.471 benign
567B benign 0.346 benign
568B benign 0.312 benign
570B benign 0.491 benign
571B benign 0.209 benign
572B benign 0.363 benign
574B benign 0.220 benign
575B benign 0.358 benign
576B benign 0.269 benign
578B benign 0.271 benign
579B benign 0.465 benign
580B benign 0.332 benign
581B benign 0.218 benign
582B benign 0.422 benign
583B benign 0.153 benign
614B benign 0.422 benign
620B benign 0.202 benign
Table 5-3: Prediction results for validation set 2 samples using methylation marker combination 5
Sample ID Sample type Malignancy prediction probability Prediction result
GFP-FR171219001 malignant 0.611 malignant
GFP-FR171219003 malignant 0.713 malignant
GFP-FR171219005 malignant 0.892 malignant
GFP-FR171219019 malignant 0.613 malignant
GFP-FR171219021 malignant 0.869 malignant
GFP-FR171219023 malignant 0.884 malignant
GFP-FR171219027 malignant 0.829 malignant
GFP-FR171219031 malignant 0.802 malignant
GFP-FR171219033 malignant 0.699 malignant
GFP-FR171219035 malignant 0.854 malignant
GFP-FR171219039 malignant 0.811 malignant
GFP-FR171219041 malignant 0.746 malignant
GFP-FR171219043 malignant 0.953 malignant
GFP-FR171219045 malignant 0.870 malignant
GFP-FR171219047 malignant 0.882 malignant
GFP-FR171219049 malignant 0.839 malignant
GFP-FR180615002 malignant 0.613 malignant
GFP-FR180615004 malignant 0.798 malignant
GFP-FR180615006 malignant 0.576 malignant
GFP-FR180615008 malignant 0.460 benign
GFP-FR180615010 malignant 0.750 malignant
GFP-FR180615012 malignant 0.365 benign
GFP-FR180615014 malignant 0.881 malignant
GFP-FR180615016 malignant 0.493 benign
GFP-FR180615018 malignant 0.823 malignant
GFP-FR180615020 malignant 0.609 malignant
GFP-FR180615022 malignant 0.860 malignant
GFP-FR180615024 malignant 0.673 malignant
GFP-FR180615026 malignant 0.762 malignant
GFP-FR180615028 malignant 0.763 malignant
GFP-FR180615030 malignant 0.671 malignant
GFP-FR180615032 malignant 0.533 malignant
GFP-FR180615034 malignant 0.862 malignant
GFP-FR180615036 malignant 0.767 malignant
GFP-FR180615038 malignant 0.565 malignant
GFP-FR180713031 malignant 0.955 malignant
GFP-FR180713033 malignant 0.740 malignant
GFP-FR171230001 benign 0.361 benign
GFP-FR171230003 benign 0.499 benign
GFP-FR171230005 benign 0.427 benign
GFP-FR171230007 benign 0.307 benign
GFP-FR171230009 benign 0.164 benign
GFP-FR171230013 benign 0.408 benign
GFP-FR171230015 benign 0.360 benign
GFP-FR171230017 benign 0.172 benign
GFP-FR171230019 benign 0.277 benign
GFP-FR180525002 benign 0.444 benign
GFP-FR180525004 benign 0.375 benign
GFP-FR180525006 benign 0.342 benign
GFP-FR180525008 benign 0.543 malignant
GFP-FR180525010 benign 0.283 benign
GFP-FR180525012 benign 0.437 benign
GFP-FR180525014 benign 0.573 malignant
GFP-FR180525016 benign 0.108 benign
GFP-FR180525020 benign 0.151 benign
GFP-FR180525022 benign 0.157 benign
GFP-FR180525024 benign 0.228 benign
GFP-FR180525026 benign 0.156 benign
GFP-FR180525028 benign 0.243 benign
GFP-FR180525030 benign 0.526 malignant
GFP-FR180713002 benign 0.752 malignant
GFP-FR180713004 benign 0.324 benign
GFP-FR180713006 benign 0.152 benign
GFP-FR180713008 benign 0.282 benign
GFP-FR180713010 benign 0.455 benign
GFP-FR180713012 benign 0.585 malignant
GFP-FR180713014 benign 0.104 benign
GFP-FR180713016 benign 0.436 benign
GFP-FR180713025 benign 0.143 benign
GFP-FR180713035 benign 0.328 benign
GFP-FR180713037 benign 0.261 benign
GFP-FR180713041 benign 0.339 benign
GFP-FR180713043 benign 0.299 benign
GFP-FR180713045 benign 0.155 benign
Example 6
A model was built from each methylation marker individually and tested for AUC on the two validation sets. The threshold defined by the Youden index on the training set was used as the malignancy prediction threshold: a sample scoring above the threshold is classified as malignant, and otherwise as benign. The predictive performance of each methylation marker on the two validation sets is shown in Table 6.
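The Youden index is J = sensitivity + specificity - 1, and the threshold it defines is the cut-off that maximizes J on the training data. The sketch below shows one simple way to find such a cut-off by trying every observed score as a candidate. The function name, the choice of candidate thresholds and the toy data are assumptions made for illustration; this section does not describe how the search was actually carried out.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    labels: 1 for malignant, 0 for benign. A sample is called malignant
    when its score is strictly greater than the threshold. Both classes
    must be present in `labels`.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_threshold, best_j = None, -1.0
    for candidate in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > candidate)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= candidate)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # keep the lowest threshold among exact ties
            best_threshold, best_j = candidate, j
    return best_threshold, best_j


# Toy training scores, made up for illustration.
toy_scores = [0.1, 0.3, 0.35, 0.5, 0.6, 0.8, 0.9]
toy_labels = [0, 0, 1, 0, 1, 1, 1]
threshold, j_value = youden_threshold(toy_scores, toy_labels)
print(threshold, j_value)
```

The threshold is fixed on the training set and then applied unchanged to the validation sets, so the validation samples play no part in choosing the cut-off.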
表6:每个甲基化标志物对测试集样本预测性能Table 6: Predictive performance of each methylation marker on test set samples
Figure PCTCN2022137459-appb-000006
Figure PCTCN2022137459-appb-000007
Figure PCTCN2022137459-appb-000008
Figure PCTCN2022137459-appb-000009
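The thresholding rule of Example 6 (pick the cut-off that maximizes the Youden index on the training set, then call a sample malignant when its score is above that cut-off) can be sketched as follows. This is only an illustrative sketch: the text does not disclose an implementation, the function names `youden_threshold` and `classify` are hypothetical, and the training scores are invented toy values, not data from the application.

```python
from typing import Sequence, Tuple


def youden_threshold(probs: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.

    labels: 1 = malignant, 0 = benign. A sample is called malignant when its
    predicted probability is strictly above the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one malignant and one benign sample")

    best_t, best_j = 0.0, float("-inf")
    for t in sorted(set(probs)):  # every observed score is a candidate cut-off
        tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p > t)
        tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p <= t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # keep the first cut-off that reaches the maximum
            best_t, best_j = t, j
    return best_t, best_j


def classify(prob: float, cutoff: float) -> str:
    """Apply the rule from Example 6: above the cut-off is malignant."""
    return "malignant" if prob > cutoff else "benign"


# Toy training scores (invented for illustration only).
train_probs = [0.9, 0.8, 0.7, 0.6, 0.45, 0.5, 0.3, 0.2, 0.1]
train_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0]

threshold, j_stat = youden_threshold(train_probs, train_labels)
print(threshold, classify(0.62, threshold), classify(0.50, threshold))
```

The cut-off is fixed on the training data and then reused unchanged on the validation samples, which is why each marker or marker combination has its own threshold.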
Example 7
Up to 30% of thyroid fine-needle aspiration samples are difficult to diagnose accurately from cytological features. According to the Bethesda thyroid cytopathology classification criteria, 25 samples in validation set 1 were cytologically indeterminate. For these samples, methylation marker combination 1 of Example 1 achieved a prediction accuracy of 84%, a sensitivity of 100%, and a specificity of 73%. The prediction results for the cytologically indeterminate samples of validation set 1 using methylation marker combination 1 of Example 1 are shown in Table 7-1.
Table 7-1: Prediction results for the cytologically indeterminate samples of validation set 1 using methylation marker combination 1
Sample ID  Bethesda classification  Sample type  Predicted malignancy probability  Prediction result
138T  SM  malignant  0.620  malignant
141T  SM  malignant  0.767  malignant
148T  AUS  malignant  0.738  malignant
179T  SM  malignant  0.661  malignant
181T  SM  malignant  0.658  malignant
608T  AUS  malignant  0.647  malignant
610T  SM  malignant  0.662  malignant
612T  SM  malignant  0.595  malignant
613T  SM  malignant  0.688  malignant
616T  SFN  malignant  0.549  malignant
516B  SFN  benign  0.491  malignant
519B  FLUS/AUS  benign  0.425  benign
525B  SM  benign  0.299  benign
531B  FN  benign  0.541  malignant
545B  SFN  benign  0.440  benign
559B  SFN  benign  0.252  benign
564B  SFN  benign  0.568  malignant
565B  SFN  benign  0.580  malignant
567B  SFN  benign  0.427  benign
570B  FLUS  benign  0.423  benign
574B  SFN  benign  0.481  benign
578B  SFN  benign  0.421  benign
579B  SFN  benign  0.481  benign
581B  AUS/FLUS  benign  0.346  benign
620B  SFN  benign  0.246  benign
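The accuracy, sensitivity, and specificity reported for marker combination 1 can be recomputed from the rows of Table 7-1. The sketch below is a plain confusion-matrix calculation, with malignant treated as the positive class; the names `TABLE_7_1` and `summarize` are hypothetical, and the rows are transcribed from the table (true sample type and predicted class only).

```python
from typing import Dict, List, Tuple

M, B = "malignant", "benign"

# (sample ID, true sample type, predicted class), transcribed from Table 7-1.
TABLE_7_1: List[Tuple[str, str, str]] = [
    ("138T", M, M), ("141T", M, M), ("148T", M, M), ("179T", M, M),
    ("181T", M, M), ("608T", M, M), ("610T", M, M), ("612T", M, M),
    ("613T", M, M), ("616T", M, M),
    ("516B", B, M), ("519B", B, B), ("525B", B, B), ("531B", B, M),
    ("545B", B, B), ("559B", B, B), ("564B", B, M), ("565B", B, M),
    ("567B", B, B), ("570B", B, B), ("574B", B, B), ("578B", B, B),
    ("579B", B, B), ("581B", B, B), ("620B", B, B),
]


def summarize(rows: List[Tuple[str, str, str]]) -> Dict[str, float]:
    """Compute accuracy, sensitivity and specificity (malignant = positive)."""
    tp = sum(1 for _, truth, pred in rows if truth == M and pred == M)
    fn = sum(1 for _, truth, pred in rows if truth == M and pred == B)
    tn = sum(1 for _, truth, pred in rows if truth == B and pred == B)
    fp = sum(1 for _, truth, pred in rows if truth == B and pred == M)
    return {
        "accuracy": (tp + tn) / len(rows),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


metrics = summarize(TABLE_7_1)
print({name: f"{value:.0%}" for name, value in metrics.items()})
# -> {'accuracy': '84%', 'sensitivity': '100%', 'specificity': '73%'}
```

All 10 malignant samples are called malignant and 11 of the 15 benign samples are called benign, giving 21 of 25 correct overall, which matches the figures stated in the text.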
For the 25 cytologically indeterminate samples in validation set 1, methylation marker combination 2 of Example 2 achieved a prediction accuracy of 96%, a sensitivity of 90%, and a specificity of 100%. The prediction results for the cytologically indeterminate samples of validation set 1 using methylation marker combination 2 of Example 2 are shown in Table 7-2.
Table 7-2: Prediction results for the cytologically indeterminate samples of validation set 1 using methylation marker combination 2
Sample ID  Bethesda classification  Sample type  Predicted malignancy probability  Prediction result
138T  SM  malignant  0.693  malignant
141T  SM  malignant  0.875  malignant
148T  AUS  malignant  0.861  malignant
179T  SM  malignant  0.630  malignant
181T  SM  malignant  0.593  malignant
608T  AUS  malignant  0.771  malignant
610T  SM  malignant  0.647  malignant
612T  SM  malignant  0.772  malignant
613T  SM  malignant  0.784  malignant
616T  SFN  malignant  0.563  benign
516B  SFN  benign  0.390  benign
519B  FLUS/AUS  benign  0.342  benign
525B  SM  benign  0.277  benign
531B  FN  benign  0.500  benign
545B  SFN  benign  0.380  benign
559B  SFN  benign  0.130  benign
564B  SFN  benign  0.521  benign
565B  SFN  benign  0.529  benign
567B  SFN  benign  0.473  benign
570B  FLUS  benign  0.496  benign
574B  SFN  benign  0.305  benign
578B  SFN  benign  0.364  benign
579B  SFN  benign  0.484  benign
581B  AUS/FLUS  benign  0.292  benign
620B  SFN  benign  0.291  benign
For the 25 cytologically indeterminate samples in validation set 1, methylation marker combination 3 of Example 3 achieved a prediction accuracy of 96%, a sensitivity of 90%, and a specificity of 100%. The prediction results for the cytologically indeterminate samples of validation set 1 using methylation marker combination 3 of Example 3 are shown in Table 7-3.
Table 7-3: Prediction results for the cytologically indeterminate samples of validation set 1 using methylation marker combination 3
Sample ID  Bethesda classification  Sample type  Predicted malignancy probability  Prediction result
138T  SM  malignant  0.669  malignant
141T  SM  malignant  0.890  malignant
148T  AUS  malignant  0.899  malignant
179T  SM  malignant  0.628  malignant
181T  SM  malignant  0.655  malignant
608T  AUS  malignant  0.773  malignant
610T  SM  malignant  0.660  malignant
612T  SM  malignant  0.743  malignant
613T  SM  malignant  0.755  malignant
616T  SFN  malignant  0.478  benign
516B  SFN  benign  0.350  benign
519B  FLUS/AUS  benign  0.330  benign
525B  SM  benign  0.223  benign
531B  FN  benign  0.501  benign
545B  SFN  benign  0.331  benign
559B  SFN  benign  0.079  benign
564B  SFN  benign  0.510  benign
565B  SFN  benign  0.508  benign
567B  SFN  benign  0.376  benign
570B  FLUS  benign  0.407  benign
574B  SFN  benign  0.202  benign
578B  SFN  benign  0.288  benign
579B  SFN  benign  0.476  benign
581B  AUS/FLUS  benign  0.230  benign
620B  SFN  benign  0.139  benign
For the 25 cytologically indeterminate samples in validation set 1, methylation marker combination 4 of Example 4 achieved a prediction accuracy of 96%, a sensitivity of 90%, and a specificity of 100%. The prediction results for the cytologically indeterminate samples of validation set 1 using methylation marker combination 4 of Example 4 are shown in Table 7-4.
Table 7-4: Prediction results for the cytologically indeterminate samples of validation set 1 using methylation marker combination 4
Sample ID  Bethesda classification  Sample type  Predicted malignancy probability  Prediction result
138T  SM  malignant  0.677  malignant
141T  SM  malignant  0.850  malignant
148T  AUS  malignant  0.901  malignant
179T  SM  malignant  0.622  malignant
181T  SM  malignant  0.665  malignant
608T  AUS  malignant  0.765  malignant
610T  SM  malignant  0.677  malignant
612T  SM  malignant  0.755  malignant
613T  SM  malignant  0.769  malignant
616T  SFN  malignant  0.500  benign
516B  SFN  benign  0.356  benign
519B  FLUS/AUS  benign  0.340  benign
525B  SM  benign  0.226  benign
531B  FN  benign  0.509  benign
545B  SFN  benign  0.336  benign
559B  SFN  benign  0.078  benign
564B  SFN  benign  0.511  benign
565B  SFN  benign  0.483  benign
567B  SFN  benign  0.358  benign
570B  FLUS  benign  0.448  benign
574B  SFN  benign  0.212  benign
578B  SFN  benign  0.280  benign
579B  SFN  benign  0.473  benign
581B  AUS/FLUS  benign  0.224  benign
620B  SFN  benign  0.148  benign
For the 25 cytologically indeterminate samples in validation set 1, methylation marker combination 5 of Example 5 achieved a prediction accuracy of 96%, a sensitivity of 90%, and a specificity of 100%. The prediction results for the cytologically indeterminate samples of validation set 1 using methylation marker combination 5 of Example 5 are shown in Table 7-5.
Table 7-5: Prediction results for the cytologically indeterminate samples of validation set 1 using methylation marker combination 5
Sample ID  Bethesda classification  Sample type  Predicted malignancy probability  Prediction result
138T  SM  malignant  0.661  malignant
141T  SM  malignant  0.848  malignant
148T  AUS  malignant  0.900  malignant
179T  SM  malignant  0.624  malignant
181T  SM  malignant  0.653  malignant
608T  AUS  malignant  0.758  malignant
610T  SM  malignant  0.668  malignant
612T  SM  malignant  0.751  malignant
613T  SM  malignant  0.769  malignant
616T  SFN  malignant  0.477  benign
516B  SFN  benign  0.352  benign
519B  FLUS/AUS  benign  0.331  benign
525B  SM  benign  0.226  benign
531B  FN  benign  0.508  benign
545B  SFN  benign  0.346  benign
559B  SFN  benign  0.078  benign
564B  SFN  benign  0.501  benign
565B  SFN  benign  0.471  benign
567B  SFN  benign  0.346  benign
570B  FLUS  benign  0.491  benign
574B  SFN  benign  0.220  benign
578B  SFN  benign  0.271  benign
579B  SFN  benign  0.465  benign
581B  AUS/FLUS  benign  0.218  benign
620B  SFN  benign  0.202  benign
Notes on the Bethesda classification:
AUS: atypia of undetermined significance;
FLUS: follicular lesion of undetermined significance;
FN: follicular neoplasm;
SFN: suspicious for follicular neoplasm;
SM: suspicious for malignancy.

Claims (22)

  1. Use of a reagent for detecting the methylation status or level of at least one CpG dinucleotide of one or more target markers in the preparation of a detection reagent or diagnostic kit for diagnosing whether a thyroid nodule in an individual is benign or malignant, and use of a device for determining the methylation status or level of at least one CpG dinucleotide of one or more target markers in the preparation of a diagnostic kit for diagnosing whether a thyroid nodule in an individual is benign or malignant, wherein the one or more target markers are selected from: the PRDM16 gene or genomic PRDM16 sequence, the CAMK2N1 gene or genomic CAMK2N1 sequence, the TACSTD2 gene or genomic TACSTD2 sequence, the CRABP2 gene or genomic CRABP2 sequence, the IER5 gene or genomic IER5 sequence, the ITPKB gene or genomic ITPKB sequence, the ITGB1BP1 gene or genomic ITGB1BP1 sequence, the MTHFD2 gene or genomic MTHFD2 sequence, the BIN1 gene or genomic BIN1 sequence, the DNASE1L3 gene or genomic DNASE1L3 sequence, the LSG1 gene or genomic LSG1 sequence, the SH3BP2 gene or genomic SH3BP2 sequence, the SLC12A7 gene or genomic SLC12A7 sequence, the NR2F1 gene or genomic NR2F1 sequence, the EGR1 gene or genomic EGR1 sequence, the LARP1 gene or genomic LARP1 sequence, the RARS gene or genomic RARS sequence, the TTBK1 gene or genomic TTBK1 sequence, the FAM20C gene or genomic FAM20C sequence, the CREB5 gene or genomic CREB5 sequence, the LIMK1 gene or genomic LIMK1 sequence, the PRKAG2 gene or genomic PRKAG2 sequence, the SLC39A14 gene or genomic SLC39A14 sequence, the EGR3 gene or genomic EGR3 sequence, the DUSP26 gene or genomic DUSP26 sequence, the AGPAT2 gene or genomic AGPAT2 sequence, the NRARP gene or genomic NRARP sequence, the EGR2 gene or genomic EGR2 sequence, the PPIF gene or genomic PPIF sequence, the CHID1 gene or genomic CHID1 sequence, the ADM gene or genomic ADM sequence, the NAV2 gene or genomic NAV2 sequence, the EHBP1L1 gene or genomic EHBP1L1 sequence, the PHLDB1 gene or genomic PHLDB1 sequence, the PARP11 gene or genomic PARP11 sequence, the ANO6 gene or genomic ANO6 sequence, the PLXNC1 gene or genomic PLXNC1 sequence, the ZNF219 gene or genomic ZNF219 sequence, the FOXA1 gene or genomic FOXA1 sequence, the PAPLN gene or genomic PAPLN sequence, the UACA gene or genomic UACA sequence, the PGPEP1L gene or genomic PGPEP1L sequence, the ITPRIPL2 gene or genomic ITPRIPL2 sequence, the TNK1 gene or genomic TNK1 sequence, the RPL19 gene or genomic RPL19 sequence, the ICAM2 gene or genomic ICAM2 sequence, the TMC6 gene or genomic TMC6 sequence, the CEP295NL gene or genomic CEP295NL sequence, the BAIAP2 gene or genomic BAIAP2 sequence, the TBCD gene or genomic TBCD sequence, the METRNL gene or genomic METRNL sequence, the MED16 gene or genomic MED16 sequence, the SBNO2 gene or genomic SBNO2 sequence, the CIRBP gene or genomic CIRBP sequence, the KLF16 gene or genomic KLF16 sequence, the C19orf77 gene or genomic C19orf77 sequence, the SNAPC2 gene or genomic SNAPC2 sequence, the ICAM1 gene or genomic ICAM1 sequence, the ICAM5 gene or genomic ICAM5 sequence, the IER2 gene or genomic IER2 sequence, the ASF1B gene or genomic ASF1B sequence, the CRTC1 gene or genomic CRTC1 sequence, the ZNF536 gene or genomic ZNF536 sequence, the LTBP4 gene or genomic LTBP4 sequence, the NOL4L-DT gene or genomic NOL4L-DT sequence, the KCNK15 gene or genomic KCNK15 sequence, the UCKL1 gene or genomic UCKL1 sequence, the RTN4R gene or genomic RTN4R sequence, the BCR gene or genomic BCR sequence, and the TEF gene or genomic TEF sequence.
  2. The use according to claim 1, wherein the one or more target markers are selected from: the PRDM16 gene or genomic PRDM16 sequence, the BIN1 gene or genomic BIN1 sequence, the LIMK1 gene or genomic LIMK1 sequence, the EGR3 gene or genomic EGR3 sequence, the PPIF gene or genomic PPIF sequence, the ZNF219 gene or genomic ZNF219 sequence, the UACA gene or genomic UACA sequence, the TNK1 gene or genomic TNK1 sequence, the CEP295NL gene or genomic CEP295NL sequence, the SBNO2 gene or genomic SBNO2 sequence, the C19orf77 gene or genomic C19orf77 sequence, the ICAM5 gene or genomic ICAM5 sequence, the CRTC1 gene or genomic CRTC1 sequence, the RTN4R gene or genomic RTN4R sequence, the CAMK2N1 gene or genomic CAMK2N1 sequence, the DNASE1L3 gene or genomic DNASE1L3 sequence, the DUSP26 gene or genomic DUSP26 sequence, the ICAM2 gene or genomic ICAM2 sequence, the BAIAP2 gene or genomic BAIAP2 sequence, the MED16 gene or genomic MED16 sequence, the NOL4L-DT gene or genomic NOL4L-DT sequence, the TACSTD2 gene or genomic TACSTD2 sequence, the CRABP2 gene or genomic CRABP2 sequence, the LSG1 gene or genomic LSG1 sequence, and the BCR gene or genomic BCR sequence.
  3. The use according to claim 1, wherein the one or more target markers comprise at least one or more of the following target markers: the EGR3 gene or genomic EGR3 sequence, the TNK1 gene or genomic TNK1 sequence, the DNASE1L3 gene or genomic DNASE1L3 sequence, the DUSP26 gene or genomic DUSP26 sequence, the BAIAP2 gene or genomic BAIAP2 sequence, the MED16 gene or genomic MED16 sequence, the C19orf77 gene or genomic C19orf77 sequence, the NOL4L-DT gene or genomic NOL4L-DT sequence, the TACSTD2 gene or genomic TACSTD2 sequence, the CRABP2 gene or genomic CRABP2 sequence, and the BCR gene or genomic BCR sequence.
  4. The use according to claim 1, wherein the one or more target markers comprise:
    the PRDM16 gene or genomic PRDM16 sequence, the BIN1 gene or genomic BIN1 sequence, the LIMK1 gene or genomic LIMK1 sequence, the EGR3 gene or genomic EGR3 sequence, the PPIF gene or genomic PPIF sequence, the ZNF219 gene or genomic ZNF219 sequence, the UACA gene or genomic UACA sequence, the TNK1 gene or genomic TNK1 sequence, the CEP295NL gene or genomic CEP295NL sequence, the SBNO2 gene or genomic SBNO2 sequence, the C19orf77 gene or genomic C19orf77 sequence, the ICAM5 gene or genomic ICAM5 sequence, the CRTC1 gene or genomic CRTC1 sequence, and the RTN4R gene or genomic RTN4R sequence; or
    the CAMK2N1 gene or genomic CAMK2N1 sequence, the DNASE1L3 gene or genomic DNASE1L3 sequence, the EGR3 gene or genomic EGR3 sequence, the DUSP26 gene or genomic DUSP26 sequence, the ICAM2 gene or genomic ICAM2 sequence, the BAIAP2 gene or genomic BAIAP2 sequence, the MED16 gene or genomic MED16 sequence, the C19orf77 gene or genomic C19orf77 sequence, and the NOL4L-DT gene or genomic NOL4L-DT sequence; or
    the TACSTD2 gene or genomic TACSTD2 sequence, the CRABP2 gene or genomic CRABP2 sequence, the DNASE1L3 gene or genomic DNASE1L3 sequence, the LSG1 gene or genomic LSG1 sequence, the EGR3 gene or genomic EGR3 sequence, the TNK1 gene or genomic TNK1 sequence, the BAIAP2 gene or genomic BAIAP2 sequence, the NOL4L-DT gene or genomic NOL4L-DT sequence, and the BCR gene or genomic BCR sequence; or
    the TACSTD2 gene or genomic TACSTD2 sequence, the CRABP2 gene or genomic CRABP2 sequence, the DNASE1L3 gene or genomic DNASE1L3 sequence, the EGR3 gene or genomic EGR3 sequence, the DUSP26 gene or genomic DUSP26 sequence, the TNK1 gene or genomic TNK1 sequence, the BAIAP2 gene or genomic BAIAP2 sequence, the NOL4L-DT gene or genomic NOL4L-DT sequence, and the BCR gene or genomic BCR sequence; or
    the TACSTD2 gene or genomic TACSTD2 sequence, the DNASE1L3 gene or genomic DNASE1L3 sequence, the EGR3 gene or genomic EGR3 sequence, the DUSP26 gene or genomic DUSP26 sequence, the TNK1 gene or genomic TNK1 sequence, the BAIAP2 gene or genomic BAIAP2 sequence, the MED16 gene or genomic MED16 sequence, the NOL4L-DT gene or genomic NOL4L-DT sequence, and the BCR gene or genomic BCR sequence.
  5. The use according to any one of claims 1-4, wherein the Hg19 coordinates of the one or more target markers are as follows:
    PRDM16 gene: chr1:3155061:3155760; CAMK2N1 gene: chr1:20813203:20813902; TACSTD2 gene: chr1:59041615:59042314; CRABP2 gene: chr1:156676274:156676973; IER5 gene: chr1:181074539:181075238; ITPKB gene: chr1:226924700:226925399; ITGB1BP1 gene: chr2:9526804:9527503; MTHFD2 gene: chr2:74453839:74454538; BIN1 gene: chr2:127822196:127822895; DNASE1L3 gene: chr3:58153211:58153910; LSG1 gene: chr3:194408527:194409226; SH3BP2 gene: chr4:2795032:2795731; SLC12A7 gene: chr5:1117661:1118360; NR2F1 gene: chr5:92914797:92915496; EGR1 gene: chr5:137802399:137803098; LARP1 gene: chr5:154133955:154134654; RARS gene: chr5:167837780:167838479; TTBK1 gene: chr6:43215063:43215762; FAM20C gene: chr7:193512:194211; CREB5 gene: chr7:28449041:28449740; LIMK1 gene: chr7:73508743:73509442; PRKAG2 gene: chr7:151424814:151425513; SLC39A14 gene: chr8:22236914:22237613; EGR3 gene: chr8:22547976:22549090; DUSP26 gene: chr8:34104888:34105587; AGPAT2 gene: chr9:139581855:139582554; NRARP gene: chr9:140205734:140206433; EGR2 gene: chr10:64578269:64578968; PPIF gene: chr10:81001706:81002405; CHID1 gene: chr11:911289:911988; ADM gene: chr11:10328946:10329645; NAV2 gene: chr11:19734801:19736359; EHBP1L1 gene: chr11:65343387:65344086; PHLDB1 gene: chr11:118479144:118479843; PARP11 gene: chr12:4139935:4140634; ANO6 gene: chr12:45610331:45611030; PLXNC1 gene: chr12:94544076:94544775; ZNF219 gene: chr14:21559748:21560447; FOXA1 gene: chr14:38064876:38065575; PAPLN gene: chr14:73704629:73705328; UACA gene: chr15:70766881:70767580; PGPEP1L gene: chr15:99466242:99466941; ITPRIPL2 gene: chr16:19125694:19126393; TNK1 gene: chr17:7286958:7287657; RPL19 gene: chr17:37366033:37366732; ICAM2 gene: chr17:62076008:62076707; TMC6 gene: chr17:76113226:76124091; CEP295NL gene: chr17:76879761:76880460; BAIAP2 gene: chr17:79060865:79061564; TBCD gene: chr17:80744791:80745490; METRNL gene: chr17:81083812:81084511; MED16 gene: chr19:883793:884492; SBNO2 gene: chr19:1177275:1177974; CIRBP gene: chr19:1265690:1266389; KLF16 gene: chr19:1860343:1861042; C19orf77 gene: chr19:3434666:3435687; SNAPC2 gene: chr19:7985709:7986408; ICAM1 gene: chr19:10381317:10382016; ICAM5 gene: chr19:10404832:10405531; IER2 gene: chr19:13266647:13267346; ASF1B gene: chr19:14248133:14248832; CRTC1 gene: chr19:18770961:18771660; ZNF536 gene: chr19:31039247:31039946; LTBP4 gene: chr19:41105706:41106405; NOL4L-DT gene: chr20:31162101:31162800; KCNK15 gene: chr20:43374048:43374747; UCKL1 gene: chr20:62588113:62588812; RTN4R gene: chr22:20226373:20227274; BCR gene: chr22:23624092:23624791; TEF gene: chr22:41771229:41771928.
  6. The use according to any one of claims 1-5, wherein the Hg19 coordinates of the one or more target markers are as follows:
    PRDM16 gene: chr1:3155311:3155510; CAMK2N1 gene: chr1:20813453:20813652; TACSTD2 gene: chr1:59041865:59042064; CRABP2 gene: chr1:156676524:156676723; IER5 gene: chr1:181074789:181074988; ITPKB gene: chr1:226924950:226925149; ITGB1BP1 gene: chr2:9527054:9527253; MTHFD2 gene: chr2:74454089:74454288; BIN1 gene: chr2:127822446:127822645; DNASE1L3 gene: chr3:58153461:58153660; LSG1 gene: chr3:194408777:194408976; SH3BP2 gene: chr4:2795282:2795481; SLC12A7 gene: chr5:1117911:1118110; NR2F1 gene: chr5:92915047:92915246; EGR1 gene: chr5:137802649:137802848; LARP1 gene: chr5:154134205:154134404; RARS gene: chr5:167838030:167838229; TTBK1 gene: chr6:43215313:43215512; FAM20C gene: chr7:193762:193961; CREB5 gene: chr7:28449291:28449490; LIMK1 gene: chr7:73508993:73509192; PRKAG2 gene: chr7:151425064:151425263; SLC39A14 gene: chr8:22237164:22237363; EGR3 gene: chr8:22548226:22548425; EGR3 gene: chr8:22548641:22548840; DUSP26 gene: chr8:34105138:34105337; AGPAT2 gene: chr9:139582105:139582304; NRARP gene: chr9:140205984:140206183; EGR2 gene: chr10:64578519:64578718; PPIF gene: chr10:81001956:81002155; CHID1 gene: chr11:911539:911738; ADM gene: chr11:10329196:10329395; NAV2 gene: chr11:19735051:19735250; NAV2 gene: chr11:19735910:19736109; EHBP1L1 gene: chr11:65343637:65343836; PHLDB1 gene: chr11:118479394:118479593; PARP11 gene: chr12:4140185:4140384; ANO6 gene: chr12:45610581:45610780; PLXNC1 gene: chr12:94544326:94544525; ZNF219 gene: chr14:21559998:21560197; FOXA1 gene: chr14:38065126:38065325; PAPLN gene: chr14:73704879:73705078; UACA gene: chr15:70767131:70767330; PGPEP1L gene: chr15:99466492:99466691; ITPRIPL2 gene: chr16:19125944:19126143; TNK1 gene: chr17:7287208:7287407; RPL19 gene: chr17:37366283:37366482; ICAM2 gene: chr17:62076258:62076457; TMC6 gene: chr17:76113476:76113675; TMC6 gene: chr17:76123642:76123841; CEP295NL gene: chr17:76880011:76880210; BAIAP2 gene: chr17:79061115:79061314; TBCD gene: chr17:80745041:80745240; METRNL gene: chr17:81084062:81084261; MED16 gene: chr19:884043:884242; SBNO2 gene: chr19:1177525:1177724; CIRBP gene: chr19:1265940:1266139; KLF16 gene: chr19:1860593:1860792; C19orf77 gene: chr19:3434916:3435115; C19orf77 gene: chr19:3435238:3435437; SNAPC2 gene: chr19:7985959:7986158; ICAM1 gene: chr19:10381567:10381766; ICAM5 gene: chr19:10405082:10405281; IER2 gene: chr19:13266897:13267096; ASF1B gene: chr19:14248383:14248582; CRTC1 gene: chr19:18771211:18771410; ZNF536 gene: chr19:31039497:31039696; LTBP4 gene: chr19:41105956:41106155; NOL4L-DT gene: chr20:31162351:31162550; KCNK15 gene: chr20:43374298:43374497; UCKL1 gene: chr20:62588363:62588562; RTN4R gene: chr22:20226623:20226822; RTN4R gene: chr22:20226825:20227024; BCR gene: chr22:23624342:23624541; TEF gene: chr22:41771479:41771678.
  7. The use according to any one of claims 1-6, characterized in that the reagent comprises primer and/or probe molecules;
    preferably, the primer molecule is identical to, complementary to, or hybridizes under stringent conditions to the one or more target markers and comprises at least 9 consecutive nucleotides, and the probe molecule hybridizes under stringent conditions to an amplification product of the one or more target markers.
  8. The use according to any one of claims 1-6, characterized in that the reagent is a reagent required for performing reduced-representation genome methylation sequencing.
  9. A diagnostic reagent or diagnostic kit for detecting the methylation status or methylation level of at least one CpG dinucleotide of one or more target markers in order to diagnose whether a thyroid nodule is benign or malignant, comprising a reagent for detecting the methylation status or level of at least one CpG dinucleotide of the one or more target markers; wherein the one or more target markers are as defined in any one of claims 1-6.
  10. The diagnostic reagent or diagnostic kit according to claim 9, characterized in that the diagnostic reagent or diagnostic kit comprises primer and/or probe molecules; preferably, the primer molecule is identical to, complementary to, or hybridizes under stringent conditions to the one or more target markers and comprises at least 9 consecutive nucleotides, and the probe molecule hybridizes under stringent conditions to an amplification product of the one or more target markers;
    optionally, the diagnostic reagent or diagnostic kit further comprises primer molecules and/or probe molecules for detecting the internal reference gene ACTB.
  11. The diagnostic reagent or diagnostic kit according to claim 9, characterized in that the diagnostic reagent or diagnostic kit further comprises one or more substances selected from: PCR buffer, polymerase, dNTPs, restriction endonuclease, digestion buffer, fluorescent dye, fluorescence quencher, fluorescent reporter, exonuclease, alkaline phosphatase, internal standard, control, KCl, MgCl₂ and (NH₄)₂SO₄.
  12. The diagnostic reagent or diagnostic kit according to claim 9, characterized in that the reagent further comprises reagents used in one or more of the following methods: bisulfite-conversion-based PCR, DNA sequencing, methylation-sensitive restriction endonuclease analysis, fluorescence quantification, methylation-sensitive high-resolution melting curve analysis, chip-based methylation profiling, and mass spectrometry.
  13. The diagnostic reagent or diagnostic kit according to claim 12, characterized in that the reagent is selected from one or more of: bisulfite and derivatives thereof, fluorescent dyes, fluorescence quenchers, fluorescent reporters, internal standards, and controls.
  14. Use of at least one reagent or set of reagents that distinguishes between methylated and unmethylated CpG dinucleotides within at least one target region of genomic DNA in the preparation of a kit for a method of detecting and/or classifying whether a thyroid nodule in an individual is benign or malignant, wherein the method comprises contacting genomic DNA isolated from a biological sample of the individual with the at least one reagent or set of reagents, wherein the target region is identical or complementary to a sequence of at least 16 contiguous nucleotides of one or more target markers, wherein the contiguous nucleotides comprise at least one CpG dinucleotide sequence, thereby at least in part providing detection and/or classification of whether the thyroid nodule is benign or malignant, wherein the one or more target markers are as defined in any one of claims 1-6.
  15. Use of one or more reagents that convert cytosine bases unmethylated at the 5-position to uracil or to another base that is detectably different from cytosine in terms of hybridization properties, an amplification enzyme, and at least one primer comprising at least 9 consecutive nucleotides, in the preparation of a kit for a method of detecting and/or classifying whether a thyroid nodule in an individual is benign or malignant, wherein the method comprises:
    a) isolating genomic DNA from a biological sample of the individual;
    b) treating the genomic DNA of a), or a fragment thereof, with the one or more reagents;
    c) contacting the treated genomic DNA or the treated fragment thereof with the amplification enzyme and the at least one primer, the primer being identical to, complementary to, or hybridizing under stringent conditions to one or more target markers, wherein the treated genomic DNA or fragment thereof is either amplified to produce at least one amplification product or is not amplified; and
    d) based on the presence, absence, or properties of the amplification product, determining the methylation status or level of at least one CpG dinucleotide of the one or more target markers, or a mean or other value reflecting the average methylation status or level of a plurality of CpG dinucleotides of the one or more target markers, thereby at least in part detecting and/or classifying whether the thyroid nodule in the individual is benign or malignant;
    wherein the one or more target markers are as defined in any one of claims 1-6.
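The conversion-and-readout logic of steps b) to d) can be sketched in silico. The Python example below is a simplified model, not the claimed method: it assumes complete conversion of unmethylated cytosine, models only the top strand, and represents uracil as T (as it is read after amplification). The function names and the toy sequence are hypothetical.

```python
from typing import Dict, Set


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Model the top strand as read after conversion and amplification.

    Unmethylated C becomes T (via uracil); C at a position listed in
    `methylated` (0-based indices) is left unchanged.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def cpg_calls(reference: str, converted: str) -> Dict[int, bool]:
    """For every CpG in the reference, report True if the C was retained."""
    ref = reference.upper()
    calls = {}
    for i in range(len(ref) - 1):
        if ref[i:i + 2] == "CG":
            calls[i] = converted[i] == "C"
    return calls


def mean_methylation(calls: Dict[int, bool]) -> float:
    """Fraction of CpG sites called methylated."""
    if not calls:
        raise ValueError("no CpG sites to average")
    return sum(calls.values()) / len(calls)


# Toy sequence with CpG sites at indices 1, 5 and 9; sites 1 and 9 methylated.
reference = "ACGTCCGATCGA"
converted = bisulfite_convert(reference, methylated={1, 9})
calls = cpg_calls(reference, converted)
level = mean_methylation(calls)
print(converted, calls, round(level, 3))
```

Averaging the per-site calls corresponds to the "mean ... reflecting the average methylation status or level of a plurality of CpG dinucleotides" mentioned in step d); real assays would estimate this from many reads rather than a single molecule.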
  16. The use according to claim 15, wherein in step b) the genomic DNA or fragment thereof is treated with a reagent selected from bisulfite, hydrogen sulfite, disulfite (pyrosulfite), and combinations thereof.
  17. The use according to claim 16, wherein in c) the contacting or amplifying is carried out by using a thermostable DNA polymerase as the amplification enzyme, using a polymerase lacking 5'-3' exonuclease activity, using a polymerase chain reaction, and/or producing an amplification product carrying a detectable label.
  18. The use according to claim 15, wherein the contacting or amplifying in c) comprises using methylation-specific primers.
  19. Use of one or more methylation-sensitive restriction enzymes, an amplification enzyme, and at least one primer comprising at least 9 consecutive nucleotides in the preparation of a kit for a method of detecting and/or classifying whether a thyroid nodule in an individual is benign or malignant, wherein the primer is identical to, complementary to, or hybridizes under stringent conditions to one or more target markers; the method comprises:
    a) isolating genomic DNA from a biological sample of the individual;
    b) digesting the genomic DNA of a), or a fragment thereof, with the one or more methylation-sensitive restriction enzymes, and contacting the resulting digestion product with the amplification enzyme and the at least one primer; and
    c) based on the presence, absence, or properties of the amplification product, determining the methylation status or level of at least one CpG dinucleotide of the one or more target markers, thereby at least in part detecting and/or classifying whether the thyroid nodule in the individual is benign or malignant;
    wherein the one or more target markers are as defined in any one of claims 1-6.
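The principle behind steps b) and c) can be illustrated with a small Python sketch. It uses HpaII as an example of a methylation-sensitive restriction enzyme (HpaII recognizes CCGG and does not cut when the internal cytosine is methylated); the claim does not name a specific enzyme, so this choice, the function names, and the toy sequence are assumptions made for illustration only.

```python
from typing import List, Set

HPAII_SITE = "CCGG"  # example recognition sequence; enzyme choice is an assumption


def restriction_sites(seq: str, site: str = HPAII_SITE) -> List[int]:
    """Return the 0-based start index of every occurrence of `site`."""
    positions = []
    start = seq.upper().find(site)
    while start != -1:
        positions.append(start)
        start = seq.upper().find(site, start + 1)
    return positions


def survives_digestion(seq: str, methylated: Set[int]) -> bool:
    """True if no site is cut, so a product spanning the region can amplify.

    A site is protected when its internal cytosine (site start + 1) is in
    `methylated`. A sequence without any site is never cut and is therefore
    uninformative about methylation.
    """
    return all(pos + 1 in methylated for pos in restriction_sites(seq))


# Toy target with CCGG sites starting at indices 2 and 9.
target = "ATCCGGTTACCGGA"
fully_methylated = survives_digestion(target, methylated={3, 10})
partly_methylated = survives_digestion(target, methylated={3})
print(restriction_sites(target), fully_methylated, partly_methylated)
```

In this model an amplification product is expected only when every site in the target is methylated, which is how the presence or absence of the product reports on methylation status.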
  20. The use according to claim 19, characterized in that the presence or absence of an amplification product is determined by hybridization to at least one nucleic acid or peptide nucleic acid that is identical or complementary to a fragment, at least 16 bases long, of a sequence selected from the one or more target markers.
  21. Use of a treated nucleic acid derived from one or more target markers in the preparation of a kit for diagnosing whether a thyroid nodule is benign or malignant, wherein the treatment is suitable for converting at least one unmethylated cytosine base of the one or more target markers to uracil or to another base that is detectably different from cytosine in hybridization, the one or more target markers being as defined in any one of claims 1-6.
  22. A device for detecting and diagnosing whether a thyroid nodule in an individual is benign or malignant, the device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the following steps: (1) obtaining the methylation level or methylation status of at least one CpG dinucleotide of one or more target markers in a sample, and (2) determining whether the thyroid nodule is benign or malignant according to the methylation level or methylation status of (1);
    wherein the one or more target markers are as defined in any one of claims 1-6.
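The two program steps of the device claim can be sketched as follows. This is a hypothetical illustration only: the claim does not specify how the methylation values are combined or what cutoff is applied, so the mean-based score, the cutoff of 0.5, the marker names, and the neutral class labels are all placeholders, and the direction of the association is deliberately left to the caller.

```python
from statistics import mean
from typing import Callable, Mapping


def mean_score(levels: Mapping[str, float]) -> float:
    """Placeholder score: unweighted mean of the marker methylation levels."""
    return mean(levels.values())


def interpret_nodule(
    levels: Mapping[str, float],
    cutoff: float,
    label_at_or_above: str,
    label_below: str,
    score_fn: Callable[[Mapping[str, float]], float] = mean_score,
) -> str:
    """Step (2): map the methylation levels obtained in step (1) to a label.

    `levels` maps each marker name to a methylation level between 0 and 1.
    The score function, cutoff and labels are supplied by the caller because
    the claim text does not fix them.
    """
    if not levels:
        raise ValueError("no marker methylation levels supplied")
    for marker, level in levels.items():
        if not 0.0 <= level <= 1.0:
            raise ValueError(f"level for {marker} is outside [0, 1]: {level}")
    score = score_fn(levels)
    return label_at_or_above if score >= cutoff else label_below


# Step (1), stubbed: hypothetical measured levels for three unnamed markers.
measured = {"marker_1": 0.82, "marker_2": 0.64, "marker_3": 0.71}
result = interpret_nodule(
    measured, cutoff=0.5, label_at_or_above="class A", label_below="class B"
)
print(result)
```

Passing the scoring rule in as a function keeps the sketch independent of any particular model; a trained classifier could be substituted for `mean_score` without changing the surrounding steps.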
PCT/CN2022/137459 2021-12-09 2022-12-08 Methylation marker in diagnosis of benign and malignant nodules of thyroid cancer and applications thereof WO2023104136A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111496935.X 2021-12-09
CN202111496935.XA CN116287222A (en) 2021-12-09 2021-12-09 Methylation marker for diagnosis of benign and malignant thyroid cancer nodules and application thereof

Publications (1)

Publication Number Publication Date
WO2023104136A1 (en) 2023-06-15

Family

ID=86729670

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/137459 WO2023104136A1 (en) 2021-12-09 2022-12-08 Methylation marker in diagnosis of benign and malignant nodules of thyroid cancer and applications thereof

Country Status (2)

Country Link
CN (1) CN116287222A (en)
WO (1) WO2023104136A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090075265A1 (en) * 2007-02-02 2009-03-19 Orion Genomics Llc Gene methylation in thyroid cancer diagnosis
US20180051343A1 (en) * 2014-08-08 2018-02-22 Ait Austrian Institute Of Technology Gmbh Thyroid cancer diagnosis by dna methylation analysis
CN111197087A (en) * 2020-01-14 2020-05-26 中山大学附属第一医院 Thyroid cancer differential marker
WO2021143709A1 (en) * 2020-01-14 2021-07-22 上海鹍远生物技术有限公司 Reagent for detecting dna methylation and use thereof
CN113186278A (en) * 2021-07-01 2021-07-30 上海鹍远生物技术有限公司 Thyroid nodule benign and malignant related marker and application thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BARROS-FILHO MATEUS CAMARGO, DOS REIS MARIANA BISARRO, BELTRAMI CAROLINE MORAES, DE MELLO JULIA BETTE HOMEM, MARCHI FÁBIO ALBUQUER: "DNA Methylation-Based Method to Differentiate Malignant from Benign Thyroid Lesions", THYROID., MARY ANN LIEBERT, NEW YORK, NY., US, vol. 29, no. 9, 1 September 2019 (2019-09-01), US , pages 1244 - 1254, XP093070690, ISSN: 1050-7256, DOI: 10.1089/thy.2018.0458 *
STEPHEN J. K., CHEN K. M., MERRITT J., CHITALE D., DIVINE G., WORSHAM M. J.: "Methylation markers differentiate thyroid cancer from benign nodules", JOURNAL OF ENDOCRINOLOGICAL INVESTIGATION, vol. 41, no. 2, 1 February 2018 (2018-02-01), pages 163 - 170, XP093070693, DOI: 10.1007/s40618-017-0702-2 *

Also Published As

Publication number Publication date
CN116287222A (en) 2023-06-23

Similar Documents

Publication Publication Date Title
DK2479289T3 (en) Method for methylation analysis
EP2971158B1 (en) Detection of bisulfite converted nucleotide sequences
JP2009514555A (en) Materials and Methods for Gene-Related CpG Island Methylation Assays for Cancer Evaluation
CN113308544B (en) Reagent for DNA methylation detection and esophageal cancer detection kit
WO2018069450A1 (en) Methylation biomarkers for lung cancer
CN113186278B (en) Thyroid nodule benign and malignant related marker and application thereof
EP2304057A2 (en) Method for the detection of ovarian cancer
US20230193395A1 (en) Methods and kits for screening colorectal neoplasm
CN111197087B (en) Thyroid cancer differential marker
CN111100866B (en) Gene segment for identifying benign and malignant thyroid nodules and application thereof
CN113493835A (en) Method and kit for screening large intestine tumor by detecting methylation state of BCAN gene region
CN116219020B (en) Methylation reference gene and application thereof
WO2022170984A1 (en) Screening, risk assessment, and prognosis method and kit for advanced colorectal adenomas
WO2023104136A1 (en) Methylation marker in diagnosis of benign and malignant nodules of thyroid cancer and applications thereof
EP1918711A2 (en) Prostate cancer field effect analysis methods and kits
WO2021143709A1 (en) Reagent for detecting dna methylation and use thereof
WO2023274350A1 (en) Benign and malignant thyroid nodule related marker and use thereof
WO2024056008A1 (en) Methylation marker for identifying cancer and use thereof
CN117778568A (en) Marker for identifying gastric cancer and application thereof
TW202417642A (en) Methylation markers for identifying cancer and the applications
CN113493834A (en) Method and kit for screening large intestine tumor by detecting methylation state of PKNOX2 gene region
CN114277115A (en) Thyroid nodule benign and malignant related marker and application thereof
CN117721203A (en) Composition for detecting thyroid cancer and application thereof
KR20230105973A (en) COMPOSITION FOR DIAGNOSING PROSTATE ADENOCARCINOMA USING CpG METHYLATION STATUS OF SPECIFIC GENE AND USES THEREOF
CN116064820A (en) Biomarker for detecting early liver cancer, kit and use method thereof

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22903555

Country of ref document: EP

Kind code of ref document: A1