CN117587121A - Use of markers for diagnosing breast cancer or predicting breast cancer risk - Google Patents



Publication number
CN117587121A
Authority
CN
China
Prior art keywords
marker
methylation
breast cancer
target
target region
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210931191.8A
Other languages
Chinese (zh)
Inventor
孙津
刘轶颖
马建华
马成城
何其晔
苏志熙
刘蕊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Huayuan Biotechnology Co ltd
Original Assignee
Jiangsu Huayuan Biotechnology Co ltd
Application filed by Jiangsu Huayuan Biotechnology Co ltd
Priority to CN202210931191.8A
Priority to PCT/CN2023/111009 (WO2024027796A1)
Publication of CN117587121A
Current legal status: Pending


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/154 Methylation markers

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Genetics & Genomics (AREA)
  • Wood Science & Technology (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Hospice & Palliative Care (AREA)
  • Biophysics (AREA)
  • Oncology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to the use of markers for diagnosing breast cancer or predicting breast cancer risk. The present invention discloses the use of reagents in the preparation of kits or microarrays for diagnosing breast cancer or predicting breast cancer risk in an individual. Also disclosed is a kit or microarray for diagnosing breast cancer or predicting breast cancer risk in an individual.

Description

Use of markers for diagnosing breast cancer or predicting breast cancer risk
Technical Field
The present application relates to the field of molecular biomedical technology. In particular, the present application relates to the use of markers in diagnosing breast cancer or predicting breast cancer risk.
Background
Breast cancer is the second most common cancer type and the cancer with the highest morbidity and mortality among women. In recent years its incidence has risen year by year owing to factors such as lifestyle and dietary structure, seriously threatening women's health (Chen W, Zheng R, Baade PD, et al. Cancer statistics in China, 2015. CA Cancer J Clin. 2016; 66:115-32.). Most breast cancers are induced by environmental factors, and cancers caused by genetic mutations account for only 5%-10% of patients. This suggests that gene mutation screening or investigation of family genetic history is of very limited use for early breast cancer screening, and that more general and more broadly applicable screening methods are required (Feng Y, Spezia M, Huang S, et al. Breast cancer development and progression: risk factors, cancer stem cells, signaling pathways, genomics, and molecular pathogenesis. Genes Dis. 2018; 5:77-106.). Early screening has a significant effect on patient survival: the 5-year survival rate is 98% for patients with early-stage breast cancer but drops to 23% for late-stage patients. Unfortunately, many young women (30-45 years old) are already at a mid-to-late stage when breast cancer is diagnosed, and their prognosis is poor (DeSantis CE, Ma J, Gaudet MM, et al. Breast cancer statistics, 2019. CA Cancer J Clin. 2019; 69:438-51.). The widespread use of early cancer screening techniques has significantly reduced cancer mortality; a 2016 study in the United States reported that routine annual breast cancer screening of women at higher than average risk reduced breast cancer mortality by 39% over 25 years (Byers T, Wender RC, Jemal A, et al. The American Cancer Society challenge goal to reduce US cancer mortality by 50% between 1990 and 2015: results and reflections. CA Cancer J Clin. 2016; 66:359-69.).
The breast cancer screening strategies currently recommended in clinical practice are breast ultrasound and mammography (breast X-ray). Breast ultrasound has poor sensitivity and is prone to missed diagnoses. Mammography can detect breast tumors and reveal signs of malignancy such as breast microcalcifications, and is commonly used for early screening of high-risk women (Oeffinger KC, Fontham ET, Etzioni R, et al. Breast Cancer Screening for Women at Average Risk: 2015 Guideline Update From the American Cancer Society. JAMA. 2015; 314:1599-614.). However, the accuracy of mammography depends closely on breast structure, and its sensitivity in dense breasts is only 47.8%-64.4% (Kolb TM, Lichy J, Newhouse JH. Comparison of the performance of screening mammography, physical examination, and breast US and evaluation of factors that influence them: an analysis of 27,825 patient evaluations. Radiology. 2002; 225:165-75.). Asian women generally have dense breast tissue, which markedly increases the difficulty of mammographic diagnosis and lowers its accuracy. Mammography also carries a radiation hazard, making it unsuitable for periodic testing of young women and pregnant women. Other safe, convenient and highly accurate means of detecting early breast cancer are therefore urgently needed in the clinic.
Molecular diagnostic techniques based on liquid biopsy are increasingly applied in cancer detection. Liquid biopsy is a non-invasive detection technique based on blood or other body fluids, with advantages such as safe and non-invasive sampling, high sampling repeatability, and convenient, rapid sample collection. Body fluids have been found to contain large amounts of cell-free DNA (cfDNA) released by cells, and cfDNA features such as DNA methylation, gene mutations and fragmentation patterns can serve as indicators for early cancer diagnosis (Lo YMD, Han DSC, Jiang P, et al. Epigenetics, fragmentomics, and topology of cell-free DNA in liquid biopsies. Science. 2021; 372.). Among these, DNA methylation has clear advantages and has been applied successfully to the diagnosis of various cancers. In colorectal cancer, a machine learning model built on the plasma methylation levels of 6 methylation markers achieved 92% specificity and 86% sensitivity (Cai G, Cai M, Feng Z, et al. A Multilocus Blood-Based Assay Targeting Circulating Tumor DNA Methylation Enables Early Detection and Early Relapse Prediction of Colorectal Cancer. Gastroenterology. 2021; 161:2053-56.e2.). In liver cancer, a machine learning model using the plasma methylation levels of 10 methylation markers reached 90.5% specificity and 83.3% sensitivity (Xu RH, Wei W, Krawczyk M, et al. Circulating tumour DNA methylation markers for diagnosis and prognosis of hepatocellular carcinoma. Nat Mater. 2017; 16:1155-61.), far exceeding other screening methods. Given these unique advantages of liquid biopsy, research on liquid-biopsy methylation markers for breast cancer is of great clinical significance for achieving non-invasive early diagnosis of breast cancer.
However, no particularly efficient detection marker is yet available for liquid-biopsy-based early breast cancer screening. There is therefore a need for a method and/or kit that can efficiently read breast-cancer-related epigenetic information from the very limited amount of cell-free DNA in a biological sample, and that can be easily set up and reliably applied in hospital clinical laboratories.
Disclosure of Invention
To overcome these shortcomings of the prior art, the inventors screened a large number of candidate markers and discovered markers that can diagnose breast cancer or predict breast cancer risk with high sensitivity, high specificity and low cost. With the markers disclosed herein, breast cancer patients and healthy individuals can be effectively distinguished.
More specifically, the invention provides 22 cfDNA methylation markers and establishes a diagnostic model relating the methylation levels of these markers to breast cancer. The model offers non-invasive, safe and convenient detection, high throughput and high detection accuracy.
In one aspect, the invention relates to the use of a reagent in the preparation of a kit or microarray for diagnosing breast cancer or predicting breast cancer risk in an individual, characterized in that the reagent is used to detect the methylation level of at least one target region of at least one marker selected from the group consisting of: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2, TSHZ3, WISP2, and any combination thereof, wherein a methylation level of at least one target region of one or more markers that is equal to or higher than its corresponding threshold value indicates that the individual has or is at risk of breast cancer, and wherein the target region comprises at least one CpG dinucleotide sequence. In some embodiments, the methylation is CpG methylation. In some embodiments, the reagent is selected from the group consisting of:
i) A substance, such as an oligonucleotide primer or probe, that hybridizes to or amplifies at least one target region of the marker; and
ii) a bisulfite reagent or a methylation-sensitive restriction enzyme (MSRE) reagent that distinguishes between methylated and unmethylated dinucleotides, such as methylated and unmethylated CpG dinucleotides, within at least one target region of the marker.
In some embodiments, the oligonucleotide primer or probe is complementary to or identical to a fragment, at least 9 bases long, of at least one target region of the marker.
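To make the "at least 9 bases" criterion concrete, the sketch below checks whether a candidate primer or probe is identical to a fragment of a target-region sequence, or complementary to one (i.e., its reverse complement, read 5' to 3', occurs in the region). The function names and example sequence are hypothetical. In practice the region would typically be the bisulfite-converted sequence, and real primer design also weighs melting temperature, specificity and CpG content, which this sketch ignores.

```python
# Minimal sketch (hypothetical helper names): does a primer/probe of at least
# min_len bases match a target-region sequence on either strand?
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def primer_matches_region(primer: str, region: str, min_len: int = 9) -> bool:
    """True if `primer` has >= min_len bases and is identical to a fragment of
    `region` or complementary to one (its reverse complement occurs in `region`)."""
    primer, region = primer.upper(), region.upper()
    if len(primer) < min_len:
        return False
    return primer in region or reverse_complement(primer) in region


if __name__ == "__main__":
    target = "ACGTTGCAACGGATCC"  # made-up sequence, not taken from the patent
    print(primer_matches_region("TTGCAACGG", target))  # True: identical fragment
    print(primer_matches_region("CCGTTGCAA", target))  # True: complementary fragment
    print(primer_matches_region("ACGT", target))       # False: shorter than 9 bases
```

Matching on both strands reflects the "complementary to or identical to" wording; a probe designed against the opposite strand is still accepted.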
In some embodiments, the marker is SKI. In some embodiments, the marker is PRDM16. In some embodiments, the marker is PIAS3. In some embodiments, the marker is SLC10A4. In some embodiments, the marker is CXXC5. In some embodiments, the marker is NR2E1. In some embodiments, the marker is MPC1. In some embodiments, the marker is HOXA13. In some embodiments, the marker is LZTS1. In some embodiments, the marker is CHD7. In some embodiments, the marker is ANKRD20A1. In some embodiments, the marker is CACNA1B. In some embodiments, the marker is ACVRL1. In some embodiments, the marker is CCNA1. In some embodiments, the marker is RNASEH2B. In some embodiments, the marker is SNX20. In some embodiments, the marker is TBCD. In some embodiments, the marker is PIP5K1C. In some embodiments, the marker is ZBTB7A. In some embodiments, the marker is DNASE2. In some embodiments, the marker is TSHZ3. In some embodiments, the marker is WISP2.
In some embodiments, the marker is a marker combination selected from the group consisting of: i) SKI, PRDM16, LZTS1, CCNA1, PIP5K1C, and WISP2; or ii) PIAS3, CHD7, CACNA1B, ACVRL1, SNX20, TBCD and ZBTB7A.
In some embodiments, the sample is selected from the group consisting of a cell line, a histological section, a tissue biopsy, paraffin-embedded tissue, body fluid, and combinations thereof; preferably, the sample is selected from the group consisting of plasma, serum, whole blood, isolated blood cells, and combinations thereof; more preferably, the sample is plasma cfDNA or ctDNA.
In some embodiments, the target region is selected from the group consisting of: the regions chr1:2166118-2166318, chr1:2978722-2978922, chr1:145562922-145563122, chr4:48485417-48485821, chr5:139076623-139076941, chr6:108488634-108488917, chr6:166970625-166970825, chr7:27260117-27260462, chr8:20375580-20375780, chr8:61788861-61789200, chr9:68413067-68413267, chr9:140683687-140683969, chr12:52311647-52311991, chr13:37005935-37006328, chr13:51417486-51417774, chr16:50715367-50715567, chr17:80745056-80745446, chr19:3688030-3688230, chr19:4059528-4059746, chr19:12978686-12978886, chr19:31842771-31842971 and chr20:43331809-43332099; the complementary sequences of these regions; the processed sequences of these regions or of their complementary sequences (e.g., the corresponding sequence after bisulfite conversion or the corresponding sequence after MSRE treatment); and any combination of the foregoing sequences and/or regions.
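The target regions above are written as chr:start-end strings. A small parser such as the one below can turn them into structured intervals, for example to intersect sequencing reads with the panel. This section does not state the reference genome build or the coordinate convention, so the sketch assumes 1-based, fully closed intervals; `Region`, `parse_region` and `PANEL` are hypothetical names.

```python
import re
from typing import List, NamedTuple

_REGION_RE = re.compile(r"^(chr[0-9XYM]+):(\d+)-(\d+)$")


class Region(NamedTuple):
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive

    @property
    def length(self) -> int:
        """Number of bases, counting both ends (closed-interval assumption)."""
        return self.end - self.start + 1

    def contains(self, chrom: str, pos: int) -> bool:
        """True if position `pos` on `chrom` lies inside this region."""
        return chrom == self.chrom and self.start <= pos <= self.end


def parse_region(text: str) -> Region:
    """Parse a 'chrN:start-end' string into a Region."""
    match = _REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a chr:start-end region: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start > end in {text!r}")
    return Region(chrom, start, end)


# A few of the target regions listed in the text.
PANEL: List[Region] = [
    parse_region(r)
    for r in ("chr1:2166118-2166318", "chr1:2978722-2978922", "chr4:48485417-48485821")
]

if __name__ == "__main__":
    for region in PANEL:
        print(region.chrom, region.start, region.end, region.length)
```

If the coordinates turn out to be 0-based half-open (BED style), only `length` and `contains` need to change.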
In another aspect, the invention relates to a kit or microarray for diagnosing breast cancer or predicting breast cancer risk in an individual, characterized in that the kit or microarray comprises reagents for detecting the methylation level of at least one target region of at least one marker selected from the group consisting of: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2, TSHZ3, WISP2, and any combination thereof, wherein a methylation level of a target region of one or more markers that is equal to or higher than its corresponding threshold value indicates that the individual has or is at risk of breast cancer, and wherein the target region comprises at least one CpG dinucleotide sequence.
In some embodiments, the methylation is CpG methylation.
In some embodiments, the sample is selected from the group consisting of a cell line, a histological section, a tissue biopsy, paraffin-embedded tissue, body fluid, and combinations thereof; preferably, the sample is selected from the group consisting of plasma, serum, whole blood, isolated blood cells, and combinations thereof; more preferably, the sample is plasma cfDNA or ctDNA.
In some embodiments, the reagent is selected from the group consisting of:
i) A substance, such as an oligonucleotide primer or probe, that hybridizes to or amplifies at least one target region of the marker, the substance preferably being complementary to or identical to a fragment, at least 9 bases long, of at least one target region of the marker; and
ii) a bisulfite reagent or a methylation-sensitive restriction enzyme (MSRE) reagent that distinguishes between methylated and unmethylated dinucleotides, such as methylated and unmethylated CpG dinucleotides, within at least one target region of the marker.
In some embodiments, the marker is SKI. In some embodiments, the marker is PRDM16. In some embodiments, the marker is PIAS3. In some embodiments, the marker is SLC10A4. In some embodiments, the marker is CXXC5. In some embodiments, the marker is NR2E1. In some embodiments, the marker is MPC1. In some embodiments, the marker is HOXA13. In some embodiments, the marker is LZTS1. In some embodiments, the marker is CHD7. In some embodiments, the marker is ANKRD20A1. In some embodiments, the marker is CACNA1B. In some embodiments, the marker is ACVRL1. In some embodiments, the marker is CCNA1. In some embodiments, the marker is RNASEH2B. In some embodiments, the marker is SNX20. In some embodiments, the marker is TBCD. In some embodiments, the marker is PIP5K1C. In some embodiments, the marker is ZBTB7A. In some embodiments, the marker is DNASE2. In some embodiments, the marker is TSHZ3. In some embodiments, the marker is WISP2.
In some embodiments, the marker is a marker combination selected from the group consisting of: i) SKI, PRDM16, LZTS1, CCNA1, PIP5K1C, and WISP2; or ii) PIAS3, CHD7, CACNA1B, ACVRL1, SNX20, TBCD and ZBTB7A.
In some embodiments, the target region is selected from the group consisting of: the regions chr1:2166118-2166318, chr1:2978722-2978922, chr1:145562922-145563122, chr4:48485417-48485821, chr5:139076623-139076941, chr6:108488634-108488917, chr6:166970625-166970825, chr7:27260117-27260462, chr8:20375580-20375780, chr8:61788861-61789200, chr9:68413067-68413267, chr9:140683687-140683969, chr12:52311647-52311991, chr13:37005935-37006328, chr13:51417486-51417774, chr16:50715367-50715567, chr17:80745056-80745446, chr19:3688030-3688230, chr19:4059528-4059746, chr19:12978686-12978886, chr19:31842771-31842971 and chr20:43331809-43332099; the complementary sequences of these regions; the processed sequences of these regions or of their complementary sequences (e.g., the corresponding sequence after bisulfite conversion or the corresponding sequence after MSRE treatment); and any combination of the foregoing sequences and/or regions.
In yet another aspect, the invention relates to a method for diagnosing breast cancer or predicting breast cancer risk in an individual, the method comprising the steps of:
(a) obtaining a biological sample containing DNA from the individual;
(b) treating the DNA in the biological sample obtained in step (a) with a reagent capable of distinguishing between methylated and unmethylated sites, such as CpG sites, in the DNA, thereby obtaining treated DNA;
(c) optionally, pre-amplifying at least one target region of at least one target marker in the treated DNA obtained from step (b) with a pre-amplification primer pool, wherein at least one target region of each target marker is pre-amplified to obtain at least one pre-amplified product, the at least one target marker comprises one or more markers selected from the group consisting of: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2, TSHZ3, WISP2, and any combination thereof, and the target region comprises at least one CpG dinucleotide sequence;
(d) detecting methylated templates of at least one target region of at least one target marker in the treated DNA of step (b) or in the pre-amplified product of step (c), the at least one target marker comprising one or more markers selected from the group consisting of: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2, TSHZ3, WISP2, and any combination thereof; and
(e) comparing the methylation level of at least one target region of each target marker in step (d) with its corresponding threshold value, wherein a methylation level of at least one target region of one or more target markers that is equal to or higher than its corresponding threshold value indicates that the individual has breast cancer or is at risk of breast cancer.
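The comparison in step (e) reduces to an any-of rule: the result is positive if at least one measured target region reaches its own threshold. The sketch below expresses that rule. The region keys and threshold values are hypothetical placeholders, since the actual thresholds are not disclosed in this section and would have to be established, for example, from a training cohort.

```python
from typing import Dict, List, Mapping


def positive_regions(levels: Mapping[str, float], thresholds: Mapping[str, float]) -> List[str]:
    """Return the target regions whose methylation level is >= their own threshold."""
    missing = set(levels) - set(thresholds)
    if missing:
        raise KeyError(f"no threshold defined for: {sorted(missing)}")
    return [region for region, level in levels.items() if level >= thresholds[region]]


def indicates_breast_cancer(levels: Mapping[str, float], thresholds: Mapping[str, float]) -> bool:
    """Step (e) as an any-of rule: positive if one or more regions reach their threshold."""
    return len(positive_regions(levels, thresholds)) > 0


if __name__ == "__main__":
    # Hypothetical thresholds and measurements, for illustration only.
    thresholds: Dict[str, float] = {"SKI_region": 0.10, "WISP2_region": 0.25}
    sample = {"SKI_region": 0.04, "WISP2_region": 0.31}
    print(positive_regions(sample, thresholds))         # ['WISP2_region']
    print(indicates_breast_cancer(sample, thresholds))  # True
```

Returning the list of positive regions, not just a boolean, keeps the per-marker evidence available for reporting.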
In some embodiments, the reagent is a bisulfite reagent or a methylation-sensitive restriction enzyme (MSRE) reagent that distinguishes between methylated and unmethylated dinucleotides, such as methylated and unmethylated CpG dinucleotides, within at least one target region of the marker.
In some embodiments, in step (c), amplification is performed using a substance that amplifies at least one target region of the marker, such as an oligonucleotide primer. In some embodiments, the oligonucleotide primer is complementary to or identical to a fragment, at least 9 bases long, of at least one target region of the marker. In some embodiments, the oligonucleotide primers are selected from SEQ ID NOs: 23-66. In some embodiments, in step (d), detection is performed using a substance, such as a probe, that hybridizes to at least one target region of the marker. In some embodiments, the probe is complementary to or identical to a fragment, at least 9 bases long, of at least one target region of the marker.
In some embodiments, the marker is SKI. In some embodiments, the marker is PRDM16. In some embodiments, the marker is PIAS3. In some embodiments, the marker is SLC10A4. In some embodiments, the marker is CXXC5. In some embodiments, the marker is NR2E1. In some embodiments, the marker is MPC1. In some embodiments, the marker is HOXA13. In some embodiments, the marker is LZTS1. In some embodiments, the marker is CHD7. In some embodiments, the marker is ANKRD20A1. In some embodiments, the marker is CACNA1B. In some embodiments, the marker is ACVRL1. In some embodiments, the marker is CCNA1. In some embodiments, the marker is RNASEH2B. In some embodiments, the marker is SNX20. In some embodiments, the marker is TBCD. In some embodiments, the marker is PIP5K1C. In some embodiments, the marker is ZBTB7A. In some embodiments, the marker is DNASE2. In some embodiments, the marker is TSHZ3. In some embodiments, the marker is WISP2.
In some embodiments, the marker is a marker combination selected from the group consisting of: i) SKI, PRDM16, LZTS1, CCNA1, PIP5K1C, and WISP2; or ii) PIAS3, CHD7, CACNA1B, ACVRL1, SNX20, TBCD and ZBTB7A.
In some embodiments, the sample is selected from the group consisting of a cell line, a histological section, a tissue biopsy, paraffin-embedded tissue, body fluid, and combinations thereof; preferably, the sample is selected from the group consisting of plasma, serum, whole blood, isolated blood cells, and combinations thereof; more preferably, the sample is plasma cfDNA or ctDNA.
In some embodiments, detection is performed by gene sequencing, PCR (e.g., fluorescent PCR), FISH, immunohistochemistry, ELISA, Western blot, or flow cytometry.
In some embodiments, the target region is selected from the group consisting of: the regions chr1:2166118-2166318, chr1:2978722-2978922, chr1:145562922-145563122, chr4:48485417-48485821, chr5:139076623-139076941, chr6:108488634-108488917, chr6:166970625-166970825, chr7:27260117-27260462, chr8:20375580-20375780, chr8:61788861-61789200, chr9:68413067-68413267, chr9:140683687-140683969, chr12:52311647-52311991, chr13:37005935-37006328, chr13:51417486-51417774, chr16:50715367-50715567, chr17:80745056-80745446, chr19:3688030-3688230, chr19:4059528-4059746, chr19:12978686-12978886, chr19:31842771-31842971 and chr20:43331809-43332099; the complementary sequences of these regions; the processed sequences of these regions or of their complementary sequences (e.g., the corresponding sequence after bisulfite conversion or the corresponding sequence after MSRE treatment); and any combination of the foregoing sequences and/or regions.
Drawings
Fig. 1 shows the breast cancer marker screening procedure.
Fig. 2 shows the methylation levels of the 22 selected markers in the training set and the test set.
Fig. 3 shows the methylation levels of SEQ ID NO: 14 in the training set and the test set.
Fig. 4 shows the distribution of predictive scores of the AllModel model.
Fig. 5 shows ROC curves of the AllModel model in the training set and the test set.
Fig. 6 shows the distribution of predictive scores of the Sub1 model.
Fig. 7 shows ROC curves of the Sub1 model in the training set and the test set.
Fig. 8 shows the distribution of predictive scores of the Sub2 model.
Fig. 9 shows ROC curves of the Sub2 model in the training set and the test set.
Detailed Description
Several aspects of the invention are described below with reference to example applications for illustration. It should be understood that numerous specific details, relationships, and methods are set forth to provide a full understanding of the invention. One of ordinary skill in the relevant art, however, will readily recognize that the invention can be practiced without one or more of the specific details or with other methods.
The present invention relates to the relationship between the methylation levels of newly discovered markers and breast cancer. The markers described herein provide methods for diagnosing breast cancer or assessing breast cancer risk in an individual. Thus, one embodiment of the invention represents an improvement in markers suitable for diagnosing breast cancer or assessing breast cancer risk. In yet another embodiment, the newly discovered markers of the invention may be used in combination with one or more other breast cancer markers known in the art (e.g., CEA, CA 15-3, CA 125, Ki-67, HER-2, ER, PR) and/or conventional examination means, such as clinical breast examination (palpation) by a physician, breast ultrasound, mammography and fine needle biopsy, e.g., for diagnosing breast cancer or assessing breast cancer risk in an individual, or for preparing kits and/or microarrays for such purposes.
The term "sample" means a material that is known or suspected to express or contain a marker as described herein. The sample may be derived from biological sources ("biological samples"), such as tissues (e.g., biopsy samples), extracts or cell cultures including cells (e.g., tumor cells), cell lysates, and biological or physiological fluids such as whole blood, plasma, serum, saliva, cerebrospinal fluid, sweat, urine, milk, peritoneal fluid and the like. Samples may be used directly as obtained from the source, or after pretreatment to modify their characteristics (e.g., preparing plasma from blood). In certain aspects of the invention, the sample is a human physiological fluid, such as human plasma. In certain aspects of the invention, the sample is a biopsy sample, such as tumor tissue or cells obtained by biopsy.
Samples that can be analyzed according to the present invention include polynucleotides of clinical origin. As will be appreciated by those skilled in the art, the target polynucleotide may comprise DNA or RNA, in particular DNA, and more particularly cell-free DNA (cfDNA). In certain specific aspects of the invention, the sample is plasma cfDNA or ctDNA.
The target polynucleotide, or substances that hybridize to or amplify the target polynucleotide (e.g., oligonucleotide primers or probes), may be detectably labeled on one or more nucleotides using methods known in the art. The detectable label can be, but is not limited to, a luminescent, fluorescent, bioluminescent, chemiluminescent, radioactive or colorimetric label.
As used herein, the term "marker" refers to a nucleic acid, gene region or methylation site of interest whose methylation level, or a score from a computational model based on methylation level (e.g., as assessed by the AUC of the ROC curve when a machine learning model such as a logistic regression model is used), is indicative of a breast cancer diagnosis or a high risk of breast cancer. A gene should be considered to include all its transcriptional variants and all its promoter and regulatory elements. As will be appreciated by those skilled in the art, certain genes are known to exhibit allelic variation or single nucleotide polymorphisms ("SNPs") between individuals. SNPs include insertions and deletions of simple repeat sequences of different lengths (e.g., dinucleotide and trinucleotide repeats). Thus, the present application should be understood to extend to all forms of the markers/genes arising from any other mutation, polymorphism or allelic variation. In addition, the term "marker" shall be understood to include both the sense strand sequence and the antisense strand sequence of a marker or gene.
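The definition uses the AUC of an ROC curve to judge how well a methylation-based score separates cases from controls. The sketch below computes it through its rank interpretation: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half, which equals the area under the empirical ROC curve. The scores and labels are invented; libraries such as scikit-learn provide an equivalent `sklearn.metrics.roc_auc_score`.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via pairwise comparison: fraction of (case, control) pairs in which the
    case scores higher; tied pairs count 0.5. Labels: 1 = case, 0 = control."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Invented model scores for two cases followed by two controls.
    print(roc_auc([0.9, 0.3, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop is quadratic in sample size, which is fine for illustration; rank-based implementations scale better for large cohorts.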
The term "marker" as used herein is to be construed broadly to include both 1) the original marker found in a biological sample or genomic DNA (in a specific methylation state) and 2) its treated sequence (e.g., the corresponding region after bisulfite conversion or the corresponding region after MSRE treatment). The corresponding region after bisulfite conversion differs from the target marker in the genomic sequence in that one or more unmethylated cytosine residues are converted to uracil bases, thymine bases, or other bases that differ from cytosine in hybridization behavior. The MSRE treated corresponding region differs from the target marker in the genomic sequence in that the sequence is cleaved at one or more MSRE cleavage sites.
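The bisulfite-converted region described above can be derived in silico from a reference sequence once the methylation status of each cytosine is fixed. The sketch assumes that only the cytosines at the given positions are methylated (typically the C of CpG dinucleotides), models only the given strand, and shows the sequence as read after PCR, where uracil appears as thymine. The function names are hypothetical.

```python
from typing import AbstractSet, List


def cpg_cytosines(seq: str) -> List[int]:
    """0-based positions of the C in every CpG dinucleotide of `seq`."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]


def bisulfite_convert(seq: str, methylated: AbstractSet[int]) -> str:
    """Convert every unmethylated C to T (uracil is read as T after PCR);
    cytosines whose 0-based positions are in `methylated` remain C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


if __name__ == "__main__":
    ref = "ACGTCCGA"                               # made-up sequence
    cpgs = set(cpg_cytosines(ref))                 # {1, 5}
    print(bisulfite_convert(ref, cpgs))            # ACGTTCGA (fully CpG-methylated)
    print(bisulfite_convert(ref, set()))           # ATGTTTGA (unmethylated)
```

The non-CpG cytosine at position 4 is converted in both cases, which is what lets bisulfite-specific primers distinguish converted from unconverted DNA.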
As used herein, "methylation state" refers to the presence, absence and/or amount of one or more methylated nucleotide bases in a nucleic acid molecule. For example, a nucleic acid molecule containing a methylated cytosine is considered methylated (i.e., its methylation state is methylated), whereas a nucleic acid molecule that does not contain any methylated cytosine is considered unmethylated (i.e., its methylation state is unmethylated). In some embodiments, a nucleic acid may be characterized as "unmethylated" if it is not methylated at a particular locus (e.g., a particular single CpG dinucleotide) or a particular combination of loci, even if it is methylated at other loci of the same gene or molecule.
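The molecule-level and locus-level definitions above can be sketched as follows, assuming per-CpG methylation calls for a single molecule (e.g., from one bisulfite sequencing read) are already available as a position-to-boolean mapping. The input format and the function name are hypothetical.

```python
from typing import Iterable, Mapping, Optional


def methylation_state(cpg_calls: Mapping[int, bool], loci: Optional[Iterable[int]] = None) -> str:
    """Classify one molecule as 'methylated' or 'unmethylated'.

    cpg_calls maps a CpG position to True (methylated) or False (unmethylated).
    If `loci` is given, only those positions are considered, so a molecule can be
    'unmethylated' at the loci of interest while being methylated elsewhere.
    """
    positions = list(cpg_calls) if loci is None else list(loci)
    # A locus absent from cpg_calls raises KeyError instead of being silently ignored.
    return "methylated" if any(cpg_calls[p] for p in positions) else "unmethylated"


if __name__ == "__main__":
    read = {100: False, 108: True, 115: False}  # invented calls for one molecule
    print(methylation_state(read))                    # methylated
    print(methylation_state(read, loci=[100, 115]))   # unmethylated
```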
Thus, methylation state describes the state of methylation of a nucleic acid (e.g., a genomic sequence). In addition, methylation state refers to methylation-related characteristics of a nucleic acid segment at a particular genomic locus. Such characteristics include, but are not limited to, whether any cytosine (C) residues within the DNA sequence are methylated, the positions of one or more methylated C residues, the frequency or percentage of methylated C throughout any particular region of the nucleic acid, and allelic differences in methylation arising, for example, from differences in the origin of the alleles. "Methylation state" also refers to the relative concentration, absolute concentration or pattern of methylated or unmethylated C throughout any particular region of a nucleic acid in a biological sample. For example, if one or more C residues within a nucleic acid sequence are methylated, the sequence may be referred to as "hypermethylated" or as having "increased methylation"; if one or more C residues within a DNA sequence are unmethylated, the sequence may be referred to as "hypomethylated" or as having "decreased methylation". Likewise, if one or more C residues within a nucleic acid sequence are methylated compared with another nucleic acid sequence (e.g., from a different region or a different individual), the sequence is considered hypermethylated, or to have increased methylation, relative to the other sequence. Conversely, if one or more C residues within a DNA sequence are unmethylated compared with another nucleic acid sequence (e.g., from a different region or a different individual), the sequence is considered hypomethylated, or to have decreased methylation, relative to the other sequence.
In the present invention, methylation level represents the proportion in which one or more sites are found in the methylated state. The methylation level of a region (or group of sites) is the average of the methylation levels of all sites in the region (or group). Consequently, an increase or decrease in the methylation level of a region does not imply that the methylation level of every methylation site in the region has increased or decreased. Methods for converting the results of DNA methylation assays (e.g., reduced-representation methylation sequencing or fluorescent quantitative PCR) into methylation levels are known in the art.
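The per-site and per-region averaging described above can be sketched as follows. This is a minimal illustration, assuming sequencing-style input in which each CpG site has a count of methylated reads and a count of total covering reads; the function names and the input format are hypothetical, and a PCR-based readout would need its own conversion.

```python
from statistics import mean


def site_methylation_level(methylated_reads: int, total_reads: int) -> float:
    """Fraction of reads covering one CpG site that are methylated at that site."""
    if total_reads <= 0:
        raise ValueError("a site must be covered by at least one read")
    return methylated_reads / total_reads


def region_methylation_level(site_counts: list[tuple[int, int]]) -> float:
    """Unweighted mean of the per-site levels of all sites in a region."""
    return mean(site_methylation_level(m, t) for m, t in site_counts)


# Three CpG sites in a hypothetical region: (methylated reads, total reads).
counts = [(9, 10), (1, 10), (5, 10)]
level = region_methylation_level(counts)
print(f"{level:.2f}")  # 0.50
```

The example also shows why a change in the region level says nothing definite about individual sites: one site sits at 0.9 and another at 0.1, yet the region averages 0.5.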
As used herein, "methylation level" encompasses any relationship among the methylation states of any number of CpGs, at any positions, in the sequence of interest. The relationship may be the addition or subtraction of methylation state parameters (e.g., 0 or 1) or the result of a mathematical calculation (e.g., a mean, percentage, copy number, proportion, or degree, or a calculation using a mathematical model), including but not limited to methylation level values, methylation haplotype ratios, methylation haplotype loads, or, where a machine learning model such as a logistic regression model is used, the AUC of the ROC curve.
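Of the quantities listed above, the methylation haplotype load is the least self-explanatory. The sketch below follows one commonly used definition, which is an assumption here rather than something the text specifies: for each haplotype length i, take the fraction P(i) of all length-i runs of consecutive CpG calls (across reads) that are fully methylated, and report the average of P(i) weighted by i. Reads are hypothetical strings of per-CpG calls, '1' for methylated and '0' for unmethylated.

```python
def methylation_haplotype_load(reads: list[str]) -> float:
    """Methylation haplotype load under an assumed definition.

    MHL = sum_i(i * P_i) / sum_i(i), where P_i is the fraction of length-i
    substrings (consecutive CpG calls within a read) that are all '1'.
    """
    max_len = max((len(r) for r in reads), default=0)
    num = den = 0.0
    for i in range(1, max_len + 1):
        # Every window of i consecutive CpG calls, collected across all reads.
        subs = [r[j:j + i] for r in reads for j in range(len(r) - i + 1)]
        if not subs:
            continue
        p_i = sum(s == "1" * i for s in subs) / len(subs)
        num += i * p_i
        den += i
    if den == 0:
        raise ValueError("no CpG calls supplied")
    return num / den


print(f"{methylation_haplotype_load(['1110', '0111']):.3f}")  # 0.358
```

Weighting by i lets long, fully methylated runs count more than scattered single methylated CpGs, which is what distinguishes this measure from a plain per-site average.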
Genes used as markers in the present invention are intended to include naturally occurring variants of the gene, their complementary sequences, all promoter and regulatory elements thereof (e.g., nucleic acid sequences within 5 kb (e.g., 4 kb, 3 kb, 2 kb or 1 kb) upstream of the gene annotation start site and within 5 kb (e.g., 4 kb, 3 kb, 2 kb or 1 kb) downstream of the gene annotation end site), and fragments of the gene or the variants, in particular molecular-biologically detectable fragments. In the present invention, the terms "molecular-biologically detectable fragment", "target region" and "target gene region" are used interchangeably. The molecular-biologically detectable fragment preferably comprises at least 16, 17, 18, 19, 20, 22, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300 or more consecutive nucleotides of the marker. In some embodiments, the consecutive nucleotides comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15 or more CpG dinucleotides. In some embodiments, the target gene region is preferably rich in CpG dinucleotides.
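A simple screen for whether a candidate fragment meets the minimum length and CpG content described above might look like this. The thresholds (16 nt and one CpG) are the lowest values listed in the text; the function names and the example sequence are hypothetical.

```python
def count_cpg(seq: str) -> int:
    """Number of CpG dinucleotides; 'CG' cannot overlap itself, so str.count is exact."""
    return seq.upper().count("CG")


def is_candidate_fragment(seq: str, min_len: int = 16, min_cpg: int = 1) -> bool:
    """True if the fragment has at least `min_len` nt and at least `min_cpg` CpGs."""
    return len(seq) >= min_len and count_cpg(seq) >= min_cpg


fragment = "ACGTTCGGACGCATTAGC"  # hypothetical 18-nt sequence
print(count_cpg(fragment), is_candidate_fragment(fragment))  # 3 True
```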
In the present invention, the term "target region" or "target gene region" refers to any molecular-biologically detectable fragment within the nucleic acid region consisting of the marker gene itself, the 5 kb (e.g., 4 kb, 3 kb, 2 kb or 1 kb) upstream of its gene annotation start site and the 5 kb (e.g., 4 kb, 3 kb, 2 kb or 1 kb) downstream of its gene annotation end site, or within its complementary sequence, or within a processed form of either (e.g., a bisulfite-converted sequence or a sequence treated with a methylation-sensitive restriction enzyme (MSRE)). For example, the target gene region of a target marker in Table 1 below includes any molecular-biologically detectable fragment within its hg19 coordinates and the 5 kb (e.g., 4 kb, 3 kb, 2 kb or 1 kb) upstream and downstream of those coordinates, the complement of such a fragment, and the processed forms (e.g., the corresponding bisulfite-converted or MSRE-treated sequences) of the fragment and of its complement. More preferably, the target gene region of a target marker in Table 1 below includes any molecular-biologically detectable fragment within its hg19 coordinates and the 5 kb (e.g., 4 kb, 3 kb, 2 kb or 1 kb) upstream of those coordinates, the complement of such a fragment, and the processed forms (e.g., the corresponding bisulfite-converted or MSRE-treated sequences) of the fragment and of its complement.
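The coordinate arithmetic behind the target gene region can be sketched as follows, assuming 1-based inclusive hg19 intervals and a 5 kb flank. The Interval type and the example coordinates are hypothetical (they are not the Table 1 coordinates), and the sketch does not clamp at the chromosome end, which would require a chromosome-size table. Which side counts as "upstream" depends on the strand of the annotated gene, so the upstream-only variant takes the strand as an argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def target_window(core: Interval, flank: int = 5000) -> Interval:
    """Core coordinates extended by `flank` bp on both sides (clamped at position 1)."""
    return Interval(core.chrom, max(1, core.start - flank), core.end + flank)


def upstream_window(core: Interval, strand: str, flank: int = 5000) -> Interval:
    """Core coordinates extended on the upstream side only; direction depends on strand."""
    if strand == "+":
        return Interval(core.chrom, max(1, core.start - flank), core.end)
    if strand == "-":
        return Interval(core.chrom, core.start, core.end + flank)
    raise ValueError("strand must be '+' or '-'")


core = Interval("chr1", 1_000_000, 1_000_200)  # hypothetical hg19 interval
print(target_window(core))
print(upstream_window(core, "-"))
```

Passing a smaller `flank` (4000, 3000, 2000 or 1000) gives the narrower windows mentioned as alternatives.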
In some embodiments, a target marker selected from Table 1 below and its target gene region, or any combination thereof, is preferably used and detected:
TABLE 1 target markers and target gene regions
SKI described herein is a protein-coding gene, also known as SKI Proto-Oncogene; the encoded protein functions as an inhibitor of the TGF-beta signaling pathway.
PRDM16 described herein is a protein-coding gene, also known as PR/SET Domain 16; the encoded protein regulates gene transcription.
PIAS3 described herein is a protein-coding gene, also known as Protein Inhibitor Of Activated STAT3; the encoded protein inhibits activation of the STAT3 signaling pathway.
SLC10A4 described herein is a protein-coding gene, also known as Solute Carrier Family 10 Member 4; the encoded protein is a transporter involved in the transport of bile acids and other substances.
CXXC5 described herein is a protein-coding gene, also known as CXXC Finger Protein 5; the encoded protein binds a specific DNA motif and participates in signal transduction through various pathways, such as NF-kappa-B, MAPK and WNT.
NR2E1 described herein is a protein-coding gene, also known as Nuclear Receptor Subfamily 2 Group E Member 1; the encoded protein is a member of the nuclear receptor family.
MPC1 described herein is a protein-coding gene, also known as Mitochondrial Pyruvate Carrier 1; the encoded protein is involved in mitochondrial pyruvate transport.
HOXA13 described herein is a protein-coding gene, also known as Homeobox A13; the encoded protein is a transcription factor involved in transcriptional regulation.
LZTS1 described herein is a protein-coding gene, also known as Leucine Zipper Tumor Suppressor 1; the encoded protein is involved in regulating the cell cycle.
CHD7 described herein is a protein-coding gene, also known as Chromodomain Helicase DNA Binding Protein 7; its Gene Ontology annotations include chromatin binding and helicase activity.
ANKRD20A1 described herein is a protein-coding gene, also known as Ankyrin Repeat Domain 20 Family Member A1.
CACNA1B described herein is a protein-coding gene, also known as Calcium Voltage-Gated Channel Subunit Alpha1 B; the encoded protein forms part of a calcium ion channel.
ACVRL1 described herein is a protein-coding gene, also known as Activin A Receptor Like Type 1; the encoded protein is a receptor for TGF-beta family ligands and is involved in regulating vascular development.
CCNA1 described herein is a protein-coding gene, also known as Cyclin A1; the encoded protein is involved in regulating the cell cycle.
RNASEH2B described herein is a protein-coding gene, also known as Ribonuclease H2 Subunit B; the encoded protein is a subunit of an endonuclease.
SNX20 described herein is a protein-coding gene, also known as Sorting Nexin 20; the encoded protein is involved in intracellular vesicle transport.
TBCD described herein is a protein-coding gene, also known as Tubulin Folding Cofactor D; the encoded protein is involved in tubulin folding.
PIP5K1C described herein is a protein-coding gene, also known as Phosphatidylinositol-4-Phosphate 5-Kinase Type 1 Gamma.
ZBTB7A described herein is a protein-coding gene, also known as Zinc Finger And BTB Domain Containing 7A; the encoded protein is a transcription factor that represses transcription.
DNASE2 described herein is a protein-coding gene, also known as Deoxyribonuclease 2, Lysosomal; the encoded protein belongs to the DNA endonuclease family.
TSHZ3 described herein is a protein-coding gene, also known as Teashirt Zinc Finger Homeobox 3; the encoded protein is involved in transcriptional regulation.
WISP2 described herein is a protein-coding gene, also known as CCN5 or Cellular Communication Network Factor 5; the encoded protein is a member of the WISP protein family.
In some embodiments, it is preferred to use and detect a combination of two or more (e.g., 3, 4, 5, or 6) of the target markers and their target gene regions in table 1. In some embodiments, the following combinations of target markers and their target gene regions in table 1 are preferably used and detected: i) SKI, PRDM16, LZTS1, CCNA1, PIP5K1C, and WISP2; or ii) PIAS3, CHD7, CACNA1B, ACVRL1, SNX20, TBCD and ZBTB7A.
In some embodiments, it is preferred to use and detect the target marker SKI and its target gene region, optionally together with one or more of the following target markers and their target gene regions: PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2, TSHZ3 and WISP2.
In some embodiments, it is preferred to use and detect the target marker PRDM16 and its target gene region, optionally together with one or more of the following target markers and their target gene regions: SKI, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2, TSHZ3 and WISP2.
In some embodiments, it is preferred to use and detect the target marker PIAS3 and its target gene region, optionally together with one or more of the following target markers and their target gene regions: SKI, PRDM16, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2, TSHZ3 and WISP2.
In some embodiments, it is preferred to use and detect the target marker SLC10A4 and its target gene region, optionally together with one or more of the following target markers and their target gene regions: SKI, PRDM16, PIAS3, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2, TSHZ3 and WISP2.
In some embodiments, it is preferred to use and detect the target marker CXXC5 and its target gene region, optionally together with one or more of the following target markers and their target gene regions: SKI, PRDM16, PIAS3, SLC10A4, NR2E1, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2, TSHZ3 and WISP2.
In some embodiments, it is preferred to use and detect the target marker NR2E1 and its target gene region, optionally together with one or more of the following target markers and their target gene regions: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2, TSHZ3 and WISP2.
In some embodiments, it is preferred to use and detect the target marker MPC1 and its target gene region, optionally together with one or more of the following target markers and their target gene regions: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2, TSHZ3 and WISP2.
In some embodiments, it is preferred to use and detect the target marker HOXA13 and its target gene region, optionally together with one or more of the following target markers and their target gene regions: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2, TSHZ3 and WISP2.
In some embodiments, it is preferred to use and detect the target marker LZTS1 and its target gene region, optionally together with one or more of the following target markers and their target gene regions: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2, TSHZ3 and WISP2.
In some embodiments, it is preferred to use and detect the target marker CHD7 and its target gene region, optionally together with one or more of the following target markers and their target gene regions: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2, TSHZ3 and WISP2.
In some embodiments, it is preferred to use and detect the target marker ANKRD20A1 and its target gene region, optionally together with one or more of the following target markers and their target gene regions: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, CHD7, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2, TSHZ3 and WISP2.
In some embodiments, it is preferred to use and detect the target marker CACNA1B and its target gene region, optionally together with one or more of the following target markers and their target gene regions: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2, TSHZ3 and WISP2.
In some embodiments, it is preferred to use and detect the target marker ACVRL1 and its target gene region, optionally together with one or more of the following target markers and their target gene regions: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2, TSHZ3 and WISP2.
In some embodiments, it is preferred to use and detect the target marker CCNA1 and its target gene region, optionally together with one or more of the following target markers and their target gene regions: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2, TSHZ3 and WISP2.
In some embodiments, it is preferred to use and detect the target marker RNASEH2B and its target gene region, optionally together with one or more of the following target markers and their target gene regions: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2, TSHZ3 and WISP2.
In some embodiments, it is preferred to use and detect the target marker SNX20 and its target gene region, optionally together with one or more of the following target markers and their target gene regions: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, TBCD, PIP5K1C, ZBTB7A, DNASE2, TSHZ3 and WISP2.
In some embodiments, it is preferred to use and detect the target marker TBCD and its target gene region, optionally together with one or more of the following target markers and their target gene regions: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, PIP5K1C, ZBTB7A, DNASE2, TSHZ3 and WISP2.
In some embodiments, it is preferred to use and detect the target marker PIP5K1C and its target gene region, optionally together with one or more of the following target markers and their target gene regions: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, ZBTB7A, DNASE2, TSHZ3 and WISP2.
In some embodiments, it is preferred to use and detect the target marker ZBTB7A and its target gene region, optionally together with one or more of the following target markers and their target gene regions: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, DNASE2, TSHZ3 and WISP2.
In some embodiments, it is preferred to use and detect the target marker DNASE2 and its target gene region, optionally together with one or more of the following target markers and their target gene regions: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, TSHZ3 and WISP2.
In some embodiments, it is preferred to use and detect the target marker TSHZ3 and its target gene region, optionally together with one or more of the following target markers and their target gene regions: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2 and WISP2.
In some embodiments, it is preferred to use and detect the target marker WISP2 and its target gene region, optionally together with one or more of the following target markers and their target gene regions: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2 and TSHZ3.
In some embodiments, use of the target markers and their target gene regions, alone or in combination, achieves a sensitivity of at least 25% (e.g., at least 30%, 40%, 50%, 60%, 70%, 80%, 81%, 82% or 83%) at a specificity greater than 80% (e.g., greater than 85% or greater than 90%).
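Sensitivity and specificity as used above are the usual confusion-matrix ratios. A minimal sketch, with 1 meaning breast cancer (or a positive call) and 0 meaning no breast cancer (or a negative call); the function name and the labels in the example are made up for illustration.

```python
def sensitivity_specificity(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels and binary calls."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cancers called positive
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cancers missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls called negative
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls called positive
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical: 6 cancer samples (5 called positive), 10 controls (1 called positive).
truth = [1] * 6 + [0] * 10
calls = [1, 1, 1, 1, 1, 0] + [0] * 9 + [1]
sens, spec = sensitivity_specificity(truth, calls)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")  # sensitivity=0.833 specificity=0.900
```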
The terms "subject," "patient," and "individual" are used interchangeably herein to refer to a warm-blooded animal, such as a mammal. The term includes, but is not limited to, domestic animals, rodents (e.g., rats and mice), primates, and humans. Preferably the term refers to a human.
The term "methylation assay" refers to any assay that determines the methylation state of one or more dinucleotide (e.g., CpG) sequences within a DNA sequence.
In this context, the term "threshold" should be understood as generally understood by a person skilled in the art and denotes any useful reference reflecting the level of DNA methylation. In some embodiments, the threshold is expressed as a positive reference interval: a methylation level falling within the positive reference interval indicates that the individual has, or is at risk of having, breast cancer. For example, a methylation level of one or more markers that falls within the corresponding positive reference interval indicates that the individual has, or is at risk of having, breast cancer. The threshold or positive reference interval may be obtained from a known database or from one's own study. In the present invention, the threshold or positive reference interval refers to a level derived from a positive control (i.e., an individual with breast cancer). It may be obtained from the patient's own blood reference sample; from the expression of the marker gene in an individual with breast cancer; or from breast cancer cells of a predetermined individual with breast cancer.
In some embodiments where breast cancer is determined using a machine learning model (including but not limited to a logistic regression model), the AUC values of the ROC curve are used to set the positive reference interval, for example such that each marker achieves a specificity greater than 90% (i.e., fewer than 10% of samples without breast cancer are called positive). In some embodiments where all 22 markers are used to construct the logistic regression model, the AUC threshold is set, for example, to 0.440 or greater. In some embodiments where the SKI, PRDM16, LZTS1, CCNA1, PIP5K1C and WISP2 markers are used to construct the logistic regression model, the AUC threshold is set, for example, to 0.541 or greater. In some embodiments where the PIAS3, CHD7, CACNA1B, ACVRL1, SNX20, TBCD and ZBTB7A markers are used to construct the logistic regression model, the AUC threshold is set, for example, to 0.513 or greater.
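The text calls these cut-offs "AUC thresholds", but they are used to decide positivity at a target specificity. The sketch below assumes, as an interpretation rather than a statement of the authors' procedure, that the cut-off is applied to the logistic regression output score (a sample is positive if its score is at or above the cut-off) and is chosen as the smallest value that keeps at least 90% of non-cancer training samples negative. The function names and scores are hypothetical.

```python
import math


def cutoff_for_specificity(control_scores: list[float], target_spec: float = 0.9) -> float:
    """Smallest cut-off c such that at least `target_spec` of controls score below c.

    A sample is called positive when its score is >= c.
    """
    if not control_scores:
        raise ValueError("need at least one control score")
    ordered = sorted(control_scores)
    # Number of controls that must stay negative; round() guards against float noise.
    k = math.ceil(round(target_spec * len(ordered), 9))
    if k == 0:
        return -math.inf  # a 0% specificity target is met by any cut-off
    # The k-th lowest control score must fall strictly below the cut-off.
    return math.nextafter(ordered[k - 1], math.inf)


def specificity(control_scores: list[float], cutoff: float) -> float:
    """Fraction of controls called negative (score below the cut-off)."""
    return sum(s < cutoff for s in control_scores) / len(control_scores)


controls = [i / 10 for i in range(1, 11)]  # hypothetical control scores 0.1 ... 1.0
c = cutoff_for_specificity(controls)
print(specificity(controls, c))  # 0.9
```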
In some embodiments where the logistic regression model is constructed using the SKI marker alone, the AUC threshold is set, for example, to 0.534 or greater.
In some embodiments where the logistic regression model is constructed using the PRDM16 marker alone, the AUC threshold is set, for example, to 0.532 or greater.
In some embodiments where the logistic regression model is constructed using the PIAS3 marker alone, the AUC threshold is set, for example, to 0.533 or greater.
In some embodiments where the logistic regression model is constructed using the SLC10A4 marker alone, the AUC threshold is set, for example, to 0.532 or greater.
In some embodiments where the logistic regression model is constructed using the CXXC5 marker alone, the AUC threshold is set, for example, to 0.534 or greater.
In some embodiments where the logistic regression model is constructed using the NR2E1 marker alone, the AUC threshold is set, for example, to 0.532 or greater.
In some embodiments where the logistic regression model is constructed using the MPC1 marker alone, the AUC threshold is set, for example, to 0.540 or greater.
In some embodiments where the logistic regression model is constructed using the HOXA13 marker alone, the AUC threshold is set, for example, to 0.534 or greater.
In some embodiments where the logistic regression model is constructed using the LZTS1 marker alone, the AUC threshold is set, for example, to 0.534 or greater.
In some embodiments where the logistic regression model is constructed using the CHD7 marker alone, the AUC threshold is set, for example, to 0.533 or greater.
In some embodiments where the logistic regression model is constructed using the ANKRD20A1 marker alone, the AUC threshold is set, for example, to 0.530 or greater.
In some embodiments where the logistic regression model is constructed using the CACNA1B marker alone, the AUC threshold is set, for example, to 0.532 or greater.
In some embodiments where the logistic regression model is constructed using the ACVRL1 marker alone, the AUC threshold is set, for example, to 0.530 or greater.
In some embodiments where the logistic regression model is constructed using the CCNA1 marker alone, the AUC threshold is set, for example, to 0.533 or greater.
In some embodiments where the logistic regression model is constructed using the RNASEH2B marker alone, the AUC threshold is set, for example, to 0.532 or greater.
In some embodiments where the logistic regression model is constructed using the SNX20 marker alone, the AUC threshold is set, for example, to 0.548 or greater.
In some embodiments where the logistic regression model is constructed using the TBCD marker alone, the AUC threshold is set, for example, to 0.532 or greater.
In some embodiments where the logistic regression model is constructed using the PIP5K1C marker alone, the AUC threshold is set, for example, to 0.532 or greater.
In some embodiments where the logistic regression model is constructed using the ZBTB7A marker alone, the AUC threshold is set, for example, to 0.529 or greater.
In some embodiments where the logistic regression model is constructed using the DNASE2 marker alone, the AUC threshold is set, for example, to 0.533 or greater.
In some embodiments where the logistic regression model is constructed using the TSHZ3 marker alone, the AUC threshold is set, for example, to 0.532 or greater.
In some embodiments where the logistic regression model is constructed using the WISP2 marker alone, the AUC threshold is set, for example, to 0.532 or greater.
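Applying one of the single-marker cut-offs above can be sketched as follows, under the assumption that the cut-off is compared with the logistic regression output score. The intercept and coefficient in the example are made-up placeholders, not fitted values from this document; in practice they would come from fitting the model on training samples. Only a few of the listed thresholds are copied into the dictionary, and the function names are hypothetical.

```python
import math

# A subset of the single-marker cut-offs listed above.
THRESHOLDS = {"SKI": 0.534, "MPC1": 0.540, "SNX20": 0.548, "ZBTB7A": 0.529}


def logistic_score(x: float, intercept: float, coef: float) -> float:
    """Output of a one-feature logistic regression model."""
    return 1.0 / (1.0 + math.exp(-(intercept + coef * x)))


def call_positive(marker: str, methylation_level: float, intercept: float, coef: float) -> bool:
    """Positive if the model score reaches the marker's cut-off."""
    return logistic_score(methylation_level, intercept, coef) >= THRESHOLDS[marker]


# Hypothetical coefficients for SKI: score = sigmoid(-2.0 + 5.0 * level).
print(call_positive("SKI", 0.5, -2.0, 5.0))  # True  (score about 0.622)
print(call_positive("SKI", 0.3, -2.0, 5.0))  # False (score about 0.378)
```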
The term "oligonucleotide" refers to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. The term includes double- and single-stranded DNA and RNA, as well as modified and unmodified forms, such as methylated or capped polynucleotides. The terms "polynucleotide" and "oligonucleotide" are used interchangeably herein. The oligonucleotide may, but need not, include other coding or non-coding sequences, and it may, but need not, be linked to other molecules and/or carriers or support materials. Oligonucleotides used in the methods or kits of the invention can be of any length suitable for the particular method. In certain applications, the term refers to an antisense nucleic acid molecule (e.g., an mRNA or DNA strand oriented opposite to the sense polynucleotide encoding a marker of the invention).
Oligonucleotides for use in the present invention include complementary nucleic acid sequences, nucleic acids substantially identical to those sequences, and sequences that differ from those nucleic acid sequences owing to the degeneracy of the genetic code. Oligonucleotides useful in the present invention also include nucleic acids that hybridize under stringent conditions, preferably high-stringency conditions, to the cancer marker nucleic acid sequences.
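Complementary sequences such as those mentioned above are usually handled as reverse complements, read 5' to 3'. A minimal helper for unmodified A/C/G/T DNA; the function name is hypothetical, and degenerate or modified bases are rejected rather than guessed.

```python
_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    try:
        return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))
    except KeyError as err:
        raise ValueError(f"unsupported base: {err.args[0]}") from None


print(reverse_complement("ATGCCG"))  # CGGCAT
```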
Nucleic acid hybridization assays are well known in the art. Hybridization assay procedures and conditions vary with the application and are selected according to known general binding methods; see, e.g., J. Sambrook et al., Molecular Cloning: A Laboratory Manual (3rd edition, Science Press, 2002); and Young and Davis, P.N.A.S., 80:1194 (1983). Methods and apparatus for performing repeated and controlled hybridization reactions are described in U.S. Pat. Nos. 5,871,928, 5,874,219, 6,045,996, 6,386,749 and 6,391,623, each of which is incorporated herein by reference.
As used herein, "primer" generally refers to a linear oligonucleotide that is complementary to, and anneals to, a target sequence. The lower limit of primer length is determined by hybridization ability, since very short primers (e.g., fewer than 5 nucleotides) do not form thermodynamically stable duplexes under most hybridization conditions. Primer length typically ranges from 8 to 50 nucleotides; in certain embodiments, the primer is about 15 to 25 nucleotides long. Naturally occurring nucleotides (in particular guanine, adenine, cytosine and thymine, hereinafter "G", "A", "C" and "T") and nucleotide analogs can be used in the primers of the invention.
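For a quick sanity check of a primer against the length range above, a rough melting temperature is often estimated from base composition alone. The sketch below uses two common textbook approximations (the Wallace rule for primers shorter than 14 nt and the basic GC formula otherwise); these ignore salt, primer concentration and nearest-neighbour effects, so they are a first-pass estimate, not the method used in this document. The function names and the example primer are hypothetical.

```python
def gc_count(primer: str) -> int:
    """Number of G and C bases."""
    return sum(base in "GC" for base in primer.upper())


def basic_tm(primer: str) -> float:
    """Rough Tm in degrees C from base composition (A/C/G/T only)."""
    primer = primer.upper()
    if set(primer) - set("ACGT"):
        raise ValueError("only A, C, G and T are supported")
    n, gc = len(primer), gc_count(primer)
    if n < 14:
        return 2.0 * (n - gc) + 4.0 * gc  # Wallace rule: 2(A+T) + 4(G+C)
    return 64.9 + 41.0 * (gc - 16.4) / n  # basic formula for longer primers


primer = "AGCGTACGTTAGCCGATCGA"  # hypothetical 20-mer
print(len(primer), gc_count(primer), round(basic_tm(primer), 2))  # 20 11 53.83
```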
As used herein, "amplification product" refers to an amplified nucleic acid produced by nucleic acid amplification from a nucleic acid template.
The term "nucleotide analog" as used herein refers to a compound that is structurally similar to a naturally occurring nucleotide. Nucleotide analogs can have altered phosphate backbones, sugar moieties, nucleobases, or combinations thereof. In general, nucleotide analogs with altered nucleobases confer, among other things, different base-pairing and base-stacking properties, whereas nucleotide analogs with altered phosphate-sugar backbones (e.g., peptide nucleic acids (PNAs) and locked nucleic acids (LNAs)) generally alter, among other things, strand properties such as secondary structure formation.
Examples of primers and probes used in the present invention are shown in Table 2, and target gene regions to which they are directed are shown in Table 1.
TABLE 2 primer sequences
Note: R represents a random base (A, T, C or G).
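Primers containing the random base R can be expanded into the set of concrete sequences they stand for. This sketch follows the note's definition, R = A, T, C or G; in standard IUPAC notation R means A or G and N means any base, so the mapping is kept in one editable dictionary. The function name and example are hypothetical.

```python
from itertools import product

# Degenerate-base mapping per the note to Table 2: R = A, T, C or G.
DEGENERATE = {"R": "ATCG"}


def expand_primer(primer: str) -> list[str]:
    """All concrete sequences encoded by a primer containing degenerate bases."""
    options = [DEGENERATE.get(base, base) for base in primer.upper()]
    return ["".join(combo) for combo in product(*options)]


variants = expand_primer("ACRT")
print(len(variants), variants)  # 4 ['ACAT', 'ACTT', 'ACCT', 'ACGT']
```

Each R multiplies the number of variants by four, so a primer with several R positions represents a sizeable pool of sequences.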
The nucleotide sequences of the primers and probes of the invention also include modified forms thereof, provided that the amplification or detection performance of the primers is not significantly affected. The modification may be, for example, the addition of one or more nucleotide residues within the nucleotide sequence or at either or both ends, the deletion of one or more nucleotide residues within the sequence, or the replacement of one or more nucleotide residues with other nucleotide residues (e.g., replacing A with T or C with G). It will be clear to a person skilled in the art that such modified primers also fall within the scope of the invention, in particular the claims. In one embodiment, the modified form of the primer nucleotide sequence is a chemically amplified primer as disclosed in CN103270174A.
The primers of the invention can be chemically synthesized nucleotide by nucleotide using, for example, a general-purpose DNA synthesizer (e.g., Model 394, manufactured by Applied Biosystems). Any other method well known in the art may also be used to synthesize the oligonucleotides.
The target marker is amplified using DNA extracted from the sample as the template, together with PCR primers, to obtain an amplification product. Amplification reactions include, but are not limited to, the polymerase chain reaction (PCR), ligase chain reaction (LCR), self-sustained sequence replication (3SR), nucleic acid sequence-based amplification (NASBA), strand displacement amplification (SDA), multiple displacement amplification (MDA) and rolling circle amplification (RCA), which are disclosed in the following references (incorporated herein by reference): Mullis et al., U.S. Pat. Nos. 4,683,195, 4,965,188, 4,683,202 and 4,800,159 (PCR); Gelfand et al., U.S. Pat. No. 5,210,015 (real-time PCR with "TaqMan" or "Taq" [registered trademark] probes); Wittwer et al., U.S. Pat. No. 6,174,670; Kacian et al., U.S. Pat. No. 5,399,491 ("NASBA"); Lizardi, U.S. Pat. No. 5,854,033; Aono et al., Japanese Patent Publication No. JP 4-262799 (rolling circle amplification); and the like.
The target markers are preferably amplified using PCR methods. PCR methods per se are well known in the art. The term "PCR" includes derivative forms of the reaction including, but not limited to, reverse transcription PCR, real-time PCR, nested PCR, multiplex PCR, fluorescent quantitative PCR, and the like. The target nucleotide is preferably quantitatively amplified using a fluorescent quantitative PCR method.
PCR is performed by repeating a cycle of denaturation, annealing and extension about 30 to 60 times (e.g., 50 times) in the presence of a primer that hybridizes to the sense strand (reverse primer), a primer that hybridizes to the antisense strand (forward primer), the template DNA and a thermostable DNA polymerase. In one embodiment, the PCR is fluorescent quantitative PCR. In one embodiment, the PCR uses the primers shown in Table 2. Those skilled in the art will appreciate that other PCR methods and primers may be used, provided that the target fragment can be amplified.
In the PCR of the present invention, amplification may be performed using various conventional thermostable DNA polymerases, including, but not limited to, FastStart Taq DNA Polymerase (Roche), Ex Taq (registered trademark, Takara), Z-Taq, AccuPrime Taq DNA Polymerase and HotStarTaq Plus DNA Polymerase.
Methods for selecting appropriate PCR reaction conditions based on the Tm values of the primers are well known in the art, and a person of ordinary skill can select the optimal conditions depending on primer length, GC content, target specificity and sensitivity, the properties of the polymerase used, and so on. For example, fluorescent quantitative PCR can be performed under the following conditions: 95 °C for 5 minutes, followed by 50 cycles of 95 °C for 15 seconds and 56 °C for 40 seconds, in a 25 μL reaction volume.
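The exemplary cycling program above can be captured as data, which makes it easy to check, for instance, the total hold time. The Step type and the function are hypothetical; ramp times between temperatures depend on the instrument and are not included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float  # block temperature in degrees C
    seconds: int   # hold time at that temperature


# Exemplary fluorescent quantitative PCR program from the text (25 uL reaction).
INITIAL = [Step(95, 300)]             # 95 C for 5 min
CYCLE = [Step(95, 15), Step(56, 40)]  # 95 C for 15 s, then 56 C for 40 s
N_CYCLES = 50


def total_hold_seconds(initial: list[Step], cycle: list[Step], n_cycles: int) -> int:
    """Sum of all hold times, excluding temperature ramps."""
    return sum(s.seconds for s in initial) + n_cycles * sum(s.seconds for s in cycle)


total = total_hold_seconds(INITIAL, CYCLE, N_CYCLES)
print(total, round(total / 60, 1))  # 3050 50.8
```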
Reagents useful for detecting the methylation level of the target markers of the invention are well known in the art. Such reagents suitable for use in the present invention, for example bisulfite reagents or methylation-sensitive restriction enzymes, are commercially available or can be routinely prepared by methods well known to those skilled in the art.
The term "bisulfite reagent" refers to a reagent comprising bisulfite (a bisulfite salt) that is used to distinguish between methylated and unmethylated CpG dinucleotide sequences.
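The way a bisulfite reagent turns a methylation difference into a sequence difference can be simulated in a few lines. This is an idealized sketch assuming complete conversion of a single strand: unmethylated cytosines read as T after conversion and amplification, while cytosines listed as methylated (5-methylcytosine) remain C. The function name and the example are hypothetical.

```python
def bisulfite_convert(seq: str, methylated_positions: set[int]) -> str:
    """Idealized bisulfite conversion of one strand (0-based positions).

    Unmethylated C is read as T after conversion and PCR; a C whose position
    is in `methylated_positions` (5-methylcytosine) is protected and stays C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


seq = "ACGTCCGA"
print(bisulfite_convert(seq, {1}))    # ACGTTTGA  (CpG at position 1 methylated)
print(bisulfite_convert(seq, set()))  # ATGTTTGA  (fully unmethylated)
```

The two outputs differ only at the methylated CpG, which is what methylation-specific primers or sequencing then read out.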
The term "methylation-sensitive restriction enzyme" refers to an enzyme that selectively digests nucleic acid depending on the methylation state of its recognition site. For a restriction enzyme that cleaves only when the recognition site is unmethylated or hemimethylated, cleavage does not occur, or occurs with significantly reduced efficiency, when the recognition site is methylated. For a restriction enzyme that cleaves only when the recognition site is methylated, cleavage does not occur, or occurs with significantly reduced efficiency, when the recognition site is unmethylated. Preferably, the recognition sequence of the methylation-sensitive restriction enzyme contains a CG dinucleotide (e.g., CGCG or CCCGGG). In some embodiments, restriction enzymes that do not cleave when the cytosine of this dinucleotide is methylated at the C5 carbon atom are further preferred.
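The digestion logic for an enzyme that is blocked by CpG methylation can be sketched as below, using the CCCGGG recognition sequence mentioned above. It is a simplification assuming a single strand, no hemimethylation and complete digestion; the function name, the example sequence and the chosen methylated positions are hypothetical.

```python
def msre_cut_sites(seq: str, site: str, methylated_c: set[int]) -> list[int]:
    """Start positions (0-based) of recognition sites that would be cut.

    A site is cut only if none of the CpG cytosines inside it is methylated.
    """
    seq, site = seq.upper(), site.upper()
    # Offsets of CpG cytosines within the recognition sequence.
    cpg_offsets = [k for k in range(len(site) - 1) if site[k:k + 2] == "CG"]
    cuts = []
    start = seq.find(site)
    while start != -1:
        if not any(start + k in methylated_c for k in cpg_offsets):
            cuts.append(start)
        start = seq.find(site, start + 1)  # step by one so overlapping sites are found
    return cuts


dna = "AACCCGGGTTCCCGGGAA"
print(msre_cut_sites(dna, "CCCGGG", set()))  # [2, 10]  both sites unmethylated
print(msre_cut_sites(dna, "CCCGGG", {12}))   # [2]      second site protected
```

In an MSRE-based assay, the protected (methylated) template survives digestion and can still be amplified, whereas the digested (unmethylated) template cannot.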
Kits of the invention may be prepared by methods conventional in the art. The kit may comprise materials or reagents used in practicing the methods of the invention (including reagents for detecting each target marker). The kit may include reaction reagents (e.g., primers, dNTPs and enzymes in suitable containers) and/or support materials (e.g., buffers and instructions for performing the assay). For example, the kit may comprise one or more containers (e.g., boxes) containing the respective reagents and/or support materials. Such contents may be delivered to the intended recipient together or separately. As an example, the kit may contain reagents for detecting each target marker, buffers and instructions for use. The kit may further contain a polymerase, dNTPs, etc. The kit may also contain internal standards for quality control, positive and negative controls, and the like. The kit may also comprise reagents for preparing nucleic acids, such as DNA, from the sample. The above examples are not to be construed as limiting the kits suitable for use in the present invention or their contents.
Microarrays refer to solid supports having a planar surface bearing an array of nucleic acids. Each member of the array comprises identical copies of an oligonucleotide or polynucleotide immobilized on a spatially defined region or site that does not overlap with the regions or sites of other members of the array; that is, the regions or sites are spatially discrete. Furthermore, a spatially defined hybridization site may be "addressable" in that its position and the identity of its immobilized oligonucleotide are known or predetermined (e.g., known or predetermined prior to its use). The oligonucleotide or polynucleotide is typically single-stranded and is typically covalently attached to the solid support at either the 5'- or 3'-end. The density of non-overlapping nucleic acid regions in a microarray is typically greater than 100/cm², more preferably greater than 1000/cm². Microarray technology is disclosed, for example, in the following references: Schena (ed.), Microarrays: A Practical Approach (IRL Press, Oxford, 2000); Southern, Curr. Opin. Chem. Biol., 2:404-410, 1998, the entire contents of which are incorporated herein by reference.
The invention discloses the use of markers for diagnosing breast cancer and predicting breast cancer risk. Those skilled in the art can adjust the process parameters appropriately with reference to this disclosure. It is expressly noted that all such similar substitutions and modifications will be apparent to those skilled in the art and are deemed to be included in the present invention. While the use of the present invention has been described in terms of preferred embodiments, it will be apparent to those skilled in the relevant art that the invention can be practiced by modifying, altering, or combining the uses described herein without departing from the spirit and scope of the invention.
Examples
For a clearer understanding of the present invention, the invention is described in detail below with reference to the accompanying drawings and examples.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and these are intended to be comprehended within the scope of the present invention.
Example 1: Screening of breast cancer methylation markers in plasma by targeted methylation sequencing
The inventors collected samples from 132 women: 70 female breast cancer patients and 62 healthy women. All enrolled participants signed informed consent. The samples were divided into a training set and a test set at a fixed ratio. The training set was used to build the machine learning model, and the test set was used to evaluate the model's performance. Sample information is shown in Table 3 below.
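One common way to divide such a cohort is a stratified random split. This keeps the case/control ratio similar in both sets. The sketch below uses scikit-learn's `train_test_split`. The 30% test fraction and the random seed are assumptions, because the text does not state the ratio that was used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Labels for the cohort described above: 70 breast cancer (1), 62 healthy (0)
labels = np.array([1] * 70 + [0] * 62)
sample_ids = np.arange(labels.size)

# test_size=0.3 and random_state=0 are assumptions; stratify keeps the
# case/control ratio similar in the training and test sets
train_ids, test_ids, y_train, y_test = train_test_split(
    sample_ids, labels, test_size=0.3, stratify=labels, random_state=0)
print(len(train_ids), len(test_ids))  # 92 40
```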
TABLE 3 Number of plasma samples enrolled in each group
Methylation sequencing data of plasma cfDNA were obtained using the Methyl-Titan method (patent No. CN 201910515830), and methylation markers in the cfDNA were screened.
The specific procedure was as follows:
1. Extraction of plasma cfDNA
Whole blood (2 ml) was collected from each volunteer into a Streck blood collection tube; information on the enrolled samples is shown in Table 3. Plasma was separated by centrifugation promptly (within 3 days). After the plasma was transported to the laboratory, cfDNA was extracted with the QIAGEN QIAamp Circulating Nucleic Acid Kit according to the manufacturer's instructions.
2. Sequencing and data preprocessing
a) The library was sequenced with 150 bp paired-end reads on an Illumina NextSeq 500 sequencer, with a sequencing output of not less than 5M.
b) PEAR (v0.6.0) was used to merge the two 150 bp paired-end reads of each fragment output by the sequencer into a single sequence. The minimum overlap was 20 bp and the minimum merged length was 30 bp.
c) The merged reads were adapter-trimmed with Trim Galore v0.6.0 and Cutadapt v1.8.1 using the adapter sequence "AGATCGGAAGAGCAC", and bases with a sequencing quality value below 20 were removed from both ends.
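The simplified Python sketch below illustrates the merging and trimming steps. It is not how PEAR or Cutadapt work internally: PEAR scores overlaps statistically, and both tools tolerate mismatches. This sketch uses exact matching only and assumes the insert is at least as long as each read. Qualities are encoded as a list of Phred integers. The 20 bp minimum overlap, 30 bp minimum merged length, adapter `AGATCGGAAGAGCAC` and quality cutoff of 20 come from the text. The function names and the 3-base minimum for a partial adapter match are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
ADAPTER = "AGATCGGAAGAGCAC"


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=20, min_length=30):
    """Merge a read pair using the longest exact overlap between the 3' end of
    read1 and the start of reverse-complemented read2 (simplified; assumes the
    fragment is at least as long as each read). Returns None on failure."""
    r2 = revcomp(read2)
    for olap in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        if read1[-olap:] == r2[:olap]:
            merged = read1 + r2[olap:]
            return merged if len(merged) >= min_length else None
    return None  # no overlap of at least min_overlap bases


def adapter_start(seq, adapter=ADAPTER, min_match=3):
    """Index where the adapter begins: a full match anywhere, else the longest
    adapter prefix (>= min_match bases, an assumed cutoff) at the 3' end,
    else len(seq)."""
    idx = seq.find(adapter)
    if idx != -1:
        return idx
    for k in range(min(len(adapter) - 1, len(seq)), min_match - 1, -1):
        if seq.endswith(adapter[:k]):
            return len(seq) - k
    return len(seq)


def trim_read(seq, quals, adapter=ADAPTER, min_q=20):
    """Remove adapter sequence, then strip bases with Phred < min_q from both
    ends. quals is a list of Phred scores, one per base."""
    cut = adapter_start(seq, adapter)
    seq, quals = seq[:cut], quals[:cut]
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


fragment = "ACGTTGCAAGGCTTACCGATGCATGCCATTGACGGATCCA"  # 40 bp toy insert
print(merge_pair(fragment[:30], revcomp(fragment[10:])) == fragment)  # True
```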
3. Sequencing data alignment
The reference genome used herein is from the UCSC database (UCSC: hg19, http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz).
a) The hg19 genome sequence was converted in silico from cytosine to thymine (CT) and from guanine to adenine (GA) using Bismark, and each converted genome was indexed with Bowtie2.
b) The preprocessed reads were likewise subjected to CT and GA conversion.
c) The converted reads were aligned to each converted hg19 reference genome with Bowtie2, using a seed length of 20 and allowing no mismatches in the seed.
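The sketch below illustrates the in-silico conversion in steps a) and b) and the methylation call that follows alignment. After bisulfite treatment, unmethylated cytosines read as T while methylated CpG cytosines remain C. Fully C-to-T-converted reads and references can therefore be aligned to each other. This is a simplified illustration, not Bismark's implementation. It covers only a forward-strand read aligned at offset 0 without indels. `convert` and `call_cpgs` are hypothetical helpers.

```python
CT = str.maketrans("C", "T")
GA = str.maketrans("G", "A")


def convert(seq):
    """In-silico bisulfite conversions used for alignment: C->T and G->A."""
    seq = seq.upper()
    return {"CT": seq.translate(CT), "GA": seq.translate(GA)}


def call_cpgs(ref, read):
    """Methylation call at each reference CpG for a forward-strand read aligned
    at offset 0 without indels: C = methylated, T = unmethylated."""
    ref, read = ref.upper(), read.upper()
    state = {"C": "methylated", "T": "unmethylated"}
    return {i: state.get(read[i], "undetermined")
            for i in range(len(ref) - 1)
            if ref[i:i + 2] == "CG" and i < len(read)}


# Reference ACGTCA; in the read, the CpG C stayed C (methylated) and the
# non-CpG C became T (unmethylated). Both CT-convert to ATGTTA and align.
print(convert("ACGTCA")["CT"] == convert("ACGTTA")["CT"])  # True
print(call_cpgs("ACGTCA", "ACGTTA"))  # {1: 'methylated'}
```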
4. Calculation of AMF and MHF for each sample
The methylation state of each CpG site in each target methylation interval was determined from the alignment results.
a) The average methylation fraction (AMF) of each target methylation interval was calculated as follows:
$$\mathrm{AMF} = \frac{\sum_{i=1}^{M} N_{C,i}}{\sum_{i=1}^{M} \left(N_{C,i} + N_{T,i}\right)}$$
where M is the total number of CpG sites in the target methylation interval, i indexes the CpG sites in the interval, N_{C,i} is the number of reads sequenced as C at CpG site i (i.e., methylated reads), and N_{T,i} is the number of reads sequenced as T at CpG site i (i.e., unmethylated reads).
b) The methylation haplotype fraction (MHF) of each target methylation interval was calculated. A target methylation interval may contain multiple methylation haplotypes, and the value is calculated for each methylation haplotype in the target region as follows:
$$\mathrm{MHF}_{l,h} = \frac{N_{l,h}}{N_{l}}$$
where l denotes the target methylation interval, h denotes the target methylation haplotype, N_l is the number of reads located in the target methylation interval, and N_{l,h} is the number of reads containing the target methylation haplotype.
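Both statistics reduce to simple count ratios. The Python sketch below rests on two assumptions. For AMF, it pools the counts across sites: the sum of N_C,i divided by the sum of N_C,i + N_T,i. For MHF, it summarizes each read as a string of methylation calls covering every CpG in the interval, so a haplotype is the full call pattern. Reads covering only part of an interval would need additional handling. The function names are hypothetical.

```python
from collections import Counter


def amf(n_c, n_t):
    """Average methylation fraction of one interval.

    n_c[i] / n_t[i]: reads showing C (methylated) / T (unmethylated) at CpG i.
    Assumes counts are pooled over all M sites before dividing.
    """
    if len(n_c) != len(n_t):
        raise ValueError("n_c and n_t must cover the same CpG sites")
    total = sum(n_c) + sum(n_t)
    return sum(n_c) / total if total else float("nan")


def mhf(read_patterns):
    """Methylation haplotype fraction N_{l,h} / N_l of every haplotype seen.

    read_patterns: one call string per read over the interval's CpGs,
    e.g. "1101" (1 = methylated, 0 = unmethylated); assumes each read covers
    all CpGs, so a haplotype is a full pattern.
    """
    n_l = len(read_patterns)
    if n_l == 0:
        return {}
    return {h: n / n_l for h, n in Counter(read_patterns).items()}


print(amf([8, 2, 5], [2, 8, 5]))          # 0.5
print(mhf(["111", "111", "000", "101"]))  # {'111': 0.5, '000': 0.25, '101': 0.25}
```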
5. Feature matrix construction
a) The AMF and MHF values of every target methylation interval in each sample were combined to form the feature matrices of the training set and the test set, respectively. A target methylation interval covered by fewer than 100 reads was recorded as a missing value.
b) Target methylation intervals with more than 10% missing values were removed.
c) A KNN imputer was fitted on the training-set matrix and then used to impute missing data in both the training-set and test-set feature matrices.
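Steps a) to c) map onto NumPy and scikit-learn's `KNNImputer`, as sketched below. The sketch computes the missing-value fraction on the training set only and uses five neighbours (the scikit-learn default); both choices are assumptions. `build_feature_matrices` is a hypothetical helper. It takes per-sample feature values and the read depth behind each value.

```python
import numpy as np
from sklearn.impute import KNNImputer


def build_feature_matrices(train_vals, train_depth, test_vals, test_depth,
                           min_reads=100, max_missing=0.10, n_neighbors=5):
    """Rows are samples, columns are AMF/MHF features; *_depth holds the
    number of reads behind each value."""
    # a) values supported by fewer than min_reads reads become missing
    train = np.where(train_depth < min_reads, np.nan, train_vals).astype(float)
    test = np.where(test_depth < min_reads, np.nan, test_vals).astype(float)
    # b) drop features missing in more than max_missing of training samples
    keep = np.isnan(train).mean(axis=0) <= max_missing
    train, test = train[:, keep], test[:, keep]
    # c) fit the KNN imputer on the training set only, then apply to both sets
    imputer = KNNImputer(n_neighbors=n_neighbors).fit(train)
    return imputer.transform(train), imputer.transform(test), keep


# Synthetic example: 10 training and 4 test samples, 3 features
rng = np.random.default_rng(0)
train_vals, test_vals = rng.random((10, 3)), rng.random((4, 3))
train_depth, test_depth = np.full((10, 3), 500), np.full((4, 3), 500)
train_depth[0, 0] = 50   # 10% missing in feature 0: kept and imputed
train_depth[:5, 2] = 10  # 50% missing in feature 2: dropped
test_depth[1, 0] = 20    # missing test value: imputed
X_train, X_test, kept = build_feature_matrices(
    train_vals, train_depth, test_vals, test_depth)
```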
6. Screening for breast cancer methylation markers in the training set samples (see FIG. 1)
a) In the training set, a logistic regression model distinguishing breast cancer from healthy individuals was built for each feature. The mean AUC over 3-fold cross-validation was calculated for each model, and the features were ranked from highest to lowest AUC.
b) Starting from the top-ranked feature, the remaining features were added to the feature set one at a time, and the logistic regression model was rebuilt after each addition.
c) If the mean AUC of the logistic regression model over 5-fold cross-validation increased, the feature was retained; otherwise it was removed.
d) After all features had been traversed, the optimal marker combination was obtained. A model was built with this combination, and its performance was finally verified on the test-set samples.
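The following is a compact Python sketch of this greedy forward selection with scikit-learn. The text specifies 3-fold cross-validation for the single-feature ranking in step a) and 5-fold for the acceptance test in step c). The sketch keeps both as separate parameters. Three choices are assumptions of the sketch rather than details from the text: starting from the top-ranked feature, `max_iter=1000`, and requiring a strict AUC increase.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def cv_auc(X, y, cols, folds):
    """Mean cross-validated AUC of a logistic regression on the given columns."""
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, X[:, cols], y, cv=folds,
                           scoring="roc_auc").mean()


def forward_select(X, y, rank_folds=3, select_folds=5):
    """Greedy forward selection; returns (selected column indices, CV AUC)."""
    # a) rank every feature by its single-feature CV AUC, high to low
    single = [cv_auc(X, y, [j], rank_folds) for j in range(X.shape[1])]
    order = [int(j) for j in np.argsort(single)[::-1]]
    # b)-d) start from the top feature; keep each addition only if CV AUC rises
    selected = [order[0]]
    best = cv_auc(X, y, selected, select_folds)
    for j in order[1:]:
        score = cv_auc(X, y, selected + [j], select_folds)
        if score > best:
            selected, best = selected + [j], score
    return selected, best
```

On a feature matrix produced by the preceding imputation step, `forward_select(X_train, y_train)` returns the retained column indices and their cross-validated AUC.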
7. Through the above process, a total of 22 breast cancer methylation markers were screened out.
Any one of these methylation markers, or a combination of several, can be used as a methylation marker for identifying breast cancer. The associated gene of a methylation marker is the gene whose transcription start site (TSS) is nearest to the marker within 100 kb. The associated genes and methylation levels are shown in Table 4.
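The sketch below assigns each marker to the gene with the nearest TSS within 100 kb. The TSS table in the usage example is entirely hypothetical, with invented gene names and coordinates. In practice the table would come from a gene annotation for the same assembly (hg19). The text does not define how distance is measured. The sketch assumes distance runs from the TSS to the nearest edge of the marker interval.

```python
def nearest_gene(chrom, start, end, tss_table, max_dist=100_000):
    """Gene whose TSS is closest to the marker interval [start, end] on the
    same chromosome, or None if no TSS lies within max_dist bp.
    tss_table: iterable of (gene, chrom, tss_position)."""
    best_gene, best_dist = None, None
    for gene, tss_chrom, tss in tss_table:
        if tss_chrom != chrom:
            continue
        dist = max(start - tss, 0, tss - end)  # 0 if the TSS is inside the interval
        if dist <= max_dist and (best_dist is None or dist < best_dist):
            best_gene, best_dist = gene, dist
    return best_gene


# Hypothetical annotation: invented gene names and TSS positions
TSS = [("GENE_A", "chr1", 2160000), ("GENE_B", "chr1", 2300000),
       ("GENE_C", "chr2", 2166200)]
print(nearest_gene("chr1", 2166118, 2166318, TSS))  # GENE_A
```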
The methylation levels of these 22 markers in breast cancer and healthy samples of the training and test sets are shown in FIG. 2 and Table 4. The genomic position of each methylation marker refers to its hg19 coordinates in the UCSC Genome Browser (https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19). For the healthy and breast cancer samples in the training and test sets, the methylation level of each marker was calculated for each sample, and the median within each group was taken as that group's methylation level. The statistical significance of the methylation difference between healthy individuals and breast cancer patients was assessed in each set with the Wilcoxon test ("wilcox"). A methylation marker with a Wilcoxon P value < 0.05 was considered to show a significant methylation difference between healthy and breast cancer samples. Of the 22 methylation markers, 19 showed significant methylation differences in the training set and 14 in the test set. These results show that the 22 screened methylation markers also distinguish healthy individuals from breast cancer patients well at the level of sample methylation.
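In Python, the two-sample Wilcoxon rank-sum test is available as `scipy.stats.mannwhitneyu`. This test is equivalent to the Mann-Whitney U test and is what R's `wilcox.test` performs for two independent samples. The sketch below computes per-group medians and the two-sided P value for one marker. The input values are made up. Exact and normal-approximation P values may differ slightly between implementations.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def compare_groups(healthy, cancer, alpha=0.05):
    """Group medians and two-sided Wilcoxon rank-sum (Mann-Whitney U) P value
    for one marker's per-sample methylation levels."""
    healthy = np.asarray(healthy, dtype=float)
    cancer = np.asarray(cancer, dtype=float)
    p = mannwhitneyu(cancer, healthy, alternative="two-sided").pvalue
    return {"median_healthy": float(np.median(healthy)),
            "median_cancer": float(np.median(cancer)),
            "p_value": float(p),
            "significant": bool(p < alpha)}


# Made-up methylation levels for one marker
result = compare_groups([0.10, 0.12, 0.08, 0.11, 0.09, 0.13, 0.07, 0.14],
                        [0.30, 0.35, 0.28, 0.40, 0.33, 0.31, 0.29, 0.36])
print(result["significant"])  # True
```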
Taking the methylation marker of SEQ ID NO: 14 as an example, FIG. 3 shows its methylation levels in breast cancer and healthy samples of the training and test sets. This marker shows highly significant methylation differences between breast cancer and healthy samples in both sets, with a Wilcoxon P value of 1.2E-11 in the training set and 1.0E-08 in the test set.
TABLE 4 Methylation levels of the 22 selected methylation markers
Example 2: Machine learning diagnostic model AllModel using all methylation markers
In this example, a logistic regression machine learning model was built from all 22 methylation markers to distinguish plasma samples of healthy individuals from those of breast cancer patients. The model was trained on the methylation levels of the 22 markers in the training-set samples of Example 1, and its performance was tested on the test-set samples, as follows:
1. The logistic regression model from the sklearn (v1.0.1) package in Python (v3.9.7) was used: AllModel = LogisticRegression()
2. The model was trained on the training-set samples with AllModel.fit(TrainData, TrainPheno). Here TrainData is the training-set data and TrainPheno is the class label of each training sample (1 for breast cancer, 0 for healthy). The model's threshold was determined from the training-set samples.
3. The model was tested on the test-set samples with TestPred = AllModel.predict_proba(TestData)[:, 1]. Here TestData is the test-set data and TestPred is the model prediction score. Each score is compared with the threshold above to determine whether the sample is breast cancer.
The distribution of model prediction scores in the training and test sets is shown in FIG. 4; the model scores of breast cancer and healthy samples differ significantly. The ROC curves are shown in FIG. 5. The AUC for distinguishing breast cancer from healthy individuals was 0.992 in the training set and 0.935 in the test set. The threshold was set to 0.362 based on the training-set data. A sample scoring above the threshold is classified as breast cancer; otherwise it is classified as healthy. At this threshold, test-set accuracy was 0.825, specificity 0.737 and sensitivity 0.905; detailed data are shown in Table 5.
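The text does not say how the 0.362 threshold was derived from the training set. One common choice is the ROC point that maximizes Youden's J (sensitivity + specificity - 1); it is assumed here purely for illustration. The sketch below fits the model, picks such a threshold on training scores, and reports test-set AUC, accuracy, sensitivity and specificity. It follows scikit-learn's `roc_curve` convention and calls scores at or above the threshold positive. This matches the text's "above the threshold" rule except for scores exactly at the threshold.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve


def fit_with_threshold(train_X, train_y):
    """Fit the model and choose a threshold on training scores by Youden's J
    (an assumed rule; the text does not specify one)."""
    model = LogisticRegression(max_iter=1000).fit(train_X, train_y)
    scores = model.predict_proba(train_X)[:, 1]
    fpr, tpr, thresholds = roc_curve(train_y, scores)
    return model, float(thresholds[np.argmax(tpr - fpr)])


def evaluate(model, threshold, X, y):
    """AUC plus accuracy, sensitivity and specificity at the threshold
    (label 1 = breast cancer, 0 = healthy)."""
    y = np.asarray(y)
    scores = model.predict_proba(X)[:, 1]
    pred = (scores >= threshold).astype(int)
    tp = int(np.sum((pred == 1) & (y == 1)))
    tn = int(np.sum((pred == 0) & (y == 0)))
    fp = int(np.sum((pred == 1) & (y == 0)))
    fn = int(np.sum((pred == 0) & (y == 1)))
    return {"auc": float(roc_auc_score(y, scores)),
            "accuracy": (tp + tn) / len(y),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp)}
```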
The model distinguishes breast cancer plasma samples from healthy plasma samples well and can be used for early breast cancer screening.
Example 3: Machine learning diagnostic model Sub1 using random methylation marker combination 1
To verify the performance of random methylation marker combinations, in this example 6 of the 22 methylation markers (SEQ ID NO: 1, SEQ ID NO: 4, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 15 and SEQ ID NO: 17) were selected to construct a new machine learning model, Sub1.
The machine learning model was built in the same way as in Example 2, but using only the 6 methylation markers of random combination 1. The model scores in the training and test sets are shown in FIG. 6, and the ROC curves in FIG. 7. Breast cancer sample scores differ clearly from healthy sample scores in both the training and test sets. The model's AUC was 0.944 in the training set and 0.912 in the test set. With the threshold set to 0.609, test-set accuracy was 0.750, specificity 0.895 and sensitivity 0.619 (detailed data in Table 5), indicating good performance of this combination model.
Example 4: Machine learning diagnostic model Sub2 using random methylation marker combination 2
This example uses another random combination of 7 methylation markers (SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 6, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 16 and SEQ ID NO: 20) to construct the machine learning model Sub2.
The model was built in the same way as in Example 2. The model scores in the training and test sets are shown in FIG. 8, and the ROC curves in FIG. 9. Breast cancer sample scores are clearly higher than healthy sample scores in both the training and test sets. The model's AUC was 0.935 in the training set and 0.852 in the test set. With the threshold set to 0.604, test-set accuracy was 0.700, specificity 0.789 and sensitivity 0.619 (detailed data in Table 5), so breast cancer patients and healthy individuals are well distinguished.
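Accuracy, sensitivity and specificity are all derived from the confusion counts, with accuracy = (TP + TN) / (TP + FN + TN + FP). The test-set composition is not reproduced in this text, so the counts below are an assumption. A test set of 21 breast cancer and 19 healthy samples is one split that reproduces the values reported for AllModel, Sub1 and Sub2. The sketch is meant to illustrate the arithmetic only.

```python
def metrics_from_counts(tp, fn, tn, fp):
    """Accuracy, specificity and sensitivity, rounded to 3 decimals."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return round(accuracy, 3), round(specificity, 3), round(sensitivity, 3)


# Assumed counts (tp, fn, tn, fp) for 21 breast cancer and 19 healthy samples
for name, counts in {"AllModel": (19, 2, 14, 5),
                     "Sub1": (13, 8, 17, 2),
                     "Sub2": (13, 8, 15, 4)}.items():
    print(name, metrics_from_counts(*counts))
# AllModel (0.825, 0.737, 0.905)
# Sub1 (0.75, 0.895, 0.619)
# Sub2 (0.7, 0.789, 0.619)
```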
TABLE 5 Performance of the machine learning diagnostic models
Example 5: Single methylation marker diagnostic models
The methylation level of a single marker among the 22 methylation markers was also found to classify samples well. The performance of each individual marker in distinguishing breast cancer from healthy individuals is shown in Table 6. Taking SEQ ID NO: 14 as an example, a machine learning model built from this marker alone had an AUC of 0.880 in the training set and 0.962 in the test set. With the threshold set to 0.532, test-set accuracy was 0.875, specificity 0.789 and sensitivity 0.952, showing a clear classification effect.
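Per-marker models of the kind summarized in Table 6 can be built by fitting one single-feature logistic regression per marker column. A one-feature logistic regression is a monotonic function of that feature. Its AUC therefore equals the AUC of the raw methylation level when the fitted coefficient is positive, and one minus that AUC when the coefficient is negative. The data in the usage example are synthetic, and `single_marker_aucs` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score


def single_marker_aucs(train_X, train_y, test_X, test_y):
    """Fit one logistic regression per marker column and return a list of
    (training AUC, test AUC) pairs, one per marker."""
    results = []
    for j in range(train_X.shape[1]):
        model = LogisticRegression(max_iter=1000).fit(train_X[:, [j]], train_y)
        tr = roc_auc_score(train_y, model.predict_proba(train_X[:, [j]])[:, 1])
        te = roc_auc_score(test_y, model.predict_proba(test_X[:, [j]])[:, 1])
        results.append((float(tr), float(te)))
    return results


# Synthetic data: marker 0 is informative, marker 1 is noise
rng = np.random.default_rng(2)
y_train, y_test = np.array([0, 1] * 30), np.array([0, 1] * 15)
X_train, X_test = rng.normal(size=(60, 2)), rng.normal(size=(30, 2))
X_train[:, 0] += 2 * y_train
X_test[:, 0] += 2 * y_test
aucs = single_marker_aucs(X_train, y_train, X_test, y_test)
```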
TABLE 6 Summary of the performance of single methylation marker diagnostic models
In summary, 22 breast cancer methylation markers were screened out. Machine learning diagnostic models built from their methylation levels distinguish breast cancer from healthy individuals well, which is of considerable significance for early breast cancer screening.

Claims (10)

1. Use of a reagent for the preparation of a kit or microarray for diagnosing breast cancer or predicting breast cancer risk in an individual, characterized in that the reagent is used for detecting the methylation level of at least one target region of at least one marker selected from the group consisting of: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2, TSHZ3, WISP2, and any combination thereof, wherein a methylation level of the at least one target region of one or more of the markers that is equal to or above the corresponding threshold indicates that the individual has breast cancer or is at risk of breast cancer, and wherein the target region comprises at least one CpG dinucleotide sequence.
2. Use according to claim 1, characterized in that the methylation is CpG methylation.
3. Use according to claim 1, characterized in that the agent is an agent selected from the group consisting of:
i) a substance, such as an oligonucleotide primer or probe, that hybridizes to or amplifies at least one target region of the marker; and
ii) a bisulphite reagent or a methylation sensitive restriction enzyme reagent that distinguishes between methylated and unmethylated dinucleotides, such as methylated and unmethylated CpG dinucleotides, within at least one region of interest of the marker.
4. Use according to claim 3, characterized in that the oligonucleotide primer or probe is complementary or identical to a fragment of at least 9 bases in length of at least one target region of the marker.
5. Use according to any one of claims 1-4, characterized in that the marker is LZTS1; or a marker combination selected from the group consisting of: i) SKI, PRDM16, LZTS1, CCNA1, PIP5K1C, and WISP2; or ii) PIAS3, CHD7, CACNA1B, ACVRL1, SNX20, TBCD and ZBTB7A.
6. The use according to any one of claims 1-4, characterized in that the sample is selected from the group consisting of cell lines, histological sections, tissue biopsies, paraffin-embedded tissues, body fluids and combinations thereof; preferably, the sample is selected from the group consisting of plasma, serum, whole blood, isolated blood cells, and combinations thereof; more preferably, the sample is plasma cfDNA or ctDNA; and/or
The target region is selected from: regions chr1:2166118-2166318, chr1:2978722-2978922, chr1:145562922-145563122, chr4:48485417-48485821, chr5:139076623-139076941, chr6:108488634-108488917, chr6:166970625-166970825, chr7:27260117-27260462, chr8:20375580-20375780, chr8:61788861-61789200, chr9:68413067-68413267, chr9:140683687-140683969, chr12:52311647-52311991, chr13:37005935-37006328, chr13:51417486-51417774, chr16:50715367-50715567, chr17:80745056-80745446, chr19:3688030-3688230, chr19:4059528-4059746, chr19:12978686-12978886, chr19:31842771-31842971, chr20:43331809-43332099, or their complements or processed sequences; or a processed sequence of said complementary sequence; or any combination of the foregoing sequences and/or regions.
7. A kit or microarray for diagnosing breast cancer or predicting breast cancer risk in an individual, characterized in that the kit or microarray comprises reagents for detecting the methylation level of at least one target region of at least one marker selected from the group consisting of: SKI, PRDM16, PIAS3, SLC10A4, CXXC5, NR2E1, MPC1, HOXA13, LZTS1, CHD7, ANKRD20A1, CACNA1B, ACVRL1, CCNA1, RNASEH2B, SNX20, TBCD, PIP5K1C, ZBTB7A, DNASE2, TSHZ3, WISP2, and any combination thereof, wherein a methylation level of the at least one target region of one or more of the markers that is equal to or above the corresponding threshold indicates that the individual has breast cancer or is at risk of breast cancer, and wherein the target region comprises at least one CpG dinucleotide sequence.
8. The kit or microarray according to claim 7, characterized in that the sample is selected from the group consisting of cell lines, histological sections, tissue biopsies, paraffin-embedded tissues, body fluids and combinations thereof; preferably, the sample is selected from the group consisting of plasma, serum, whole blood, isolated blood cells, and combinations thereof; more preferably, the sample is plasma cfDNA or ctDNA; and/or
The target region is selected from: regions chr1:2166118-2166318, chr1:2978722-2978922, chr1:145562922-145563122, chr4:48485417-48485821, chr5:139076623-139076941, chr6:108488634-108488917, chr6:166970625-166970825, chr7:27260117-27260462, chr8:20375580-20375780, chr8:61788861-61789200, chr9:68413067-68413267, chr9:140683687-140683969, chr12:52311647-52311991, chr13:37005935-37006328, chr13:51417486-51417774, chr16:50715367-50715567, chr17:80745056-80745446, chr19:3688030-3688230, chr19:4059528-4059746, chr19:12978686-12978886, chr19:31842771-31842971, chr20:43331809-43332099, or their complements or processed sequences; or a processed sequence of said complementary sequence; or any combination of the foregoing sequences and/or regions.
9. Kit or microarray according to claim 7 or 8, characterized in that the reagents are selected from the following:
i) a substance, such as an oligonucleotide primer or probe, that hybridizes to or amplifies at least one target region of the marker, and that is preferably complementary or identical to a fragment at least 9 bases long of at least one target region of the marker; and
ii) a bisulphite reagent or a methylation sensitive restriction enzyme reagent that distinguishes between methylated and unmethylated dinucleotides, such as methylated and unmethylated CpG dinucleotides, within at least one region of interest of the marker.
10. Kit or microarray according to claim 7 or 8, characterized in that the marker is LZTS1; or a marker combination selected from the group consisting of: i) SKI, PRDM16, LZTS1, CCNA1, PIP5K1C, and WISP2; or ii) PIAS3, CHD7, CACNA1B, ACVRL1, SNX20, TBCD and ZBTB7A.
CN202210931191.8A 2022-08-04 2022-08-04 Use of markers for diagnosing breast cancer or predicting breast cancer risk Pending CN117587121A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210931191.8A CN117587121A (en) 2022-08-04 2022-08-04 Use of markers for diagnosing breast cancer or predicting breast cancer risk
PCT/CN2023/111009 WO2024027796A1 (en) 2022-08-04 2023-08-03 Use of marker in diagnosing breast cancer or predicting breast cancer risks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210931191.8A CN117587121A (en) 2022-08-04 2022-08-04 Use of markers for diagnosing breast cancer or predicting breast cancer risk

Publications (1)

Publication Number Publication Date
CN117587121A true CN117587121A (en) 2024-02-23

Family

ID=89908648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210931191.8A Pending CN117587121A (en) 2022-08-04 2022-08-04 Use of markers for diagnosing breast cancer or predicting breast cancer risk

Country Status (1)

Country Link
CN (1) CN117587121A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination