CN110195107A - rDNA methylation markers for cancer detection in peripheral blood and their application - Google Patents


Info

Publication number
CN110195107A
Authority
CN
China
Prior art keywords
cancer
site cpg
marker
cpg
sequencing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910445136.6A
Other languages
Chinese (zh)
Other versions
CN110195107B (en)
Inventor
Wang Xiaowo (汪小我)
Zhang Xianglin (张祥林)
Fang Huan (方欢)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201910445136.6A
Publication of CN110195107A
Application granted
Publication of CN110195107B
Legal status: Active
Anticipated expiration


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/154 Methylation markers

Abstract

The present invention provides ribosomal DNA (rDNA) methylation markers for cancer detection in peripheral blood, and their applications. The markers include at least one CpG site, or modified CpG site, selected from positions 38974, 37148, 37013, 37028, 37076, 32936, 21740, 23407, 34657 and 28277, among others, of the human ribosomal DNA repeat unit reference sequence U13369.1. Systems, kits and other products for cancer diagnosis based on any combination of these markers are also provided. The methylation state of these markers differs significantly between tumor and non-tumor tissue, with hypomethylation in tumor tissue. In a test set, combinations of these markers distinguished patients with liver cancer, lung cancer and colorectal cancer with ROC AUCs of 96%, 94% and 92%, respectively.

Description

rDNA methylation markers for cancer detection in peripheral blood and their application
Technical field
The invention belongs to the field of biological detection and relates to markers for cancer detection or postoperative evaluation and their applications, in particular to DNA methylation markers and their use in diagnosing cancer from peripheral blood.
Background art
Detecting disease from peripheral blood is a minimally invasive or even non-invasive approach. Peripheral blood contains cell-free DNA (cfDNA), which is released into the blood by apoptotic cells, so analysing cfDNA can reveal problems arising elsewhere in the body.
DNA methylation is a central part of epigenetics and plays a vital role in gene regulation. Existing research shows that cancer development is closely associated with genomic DNA methylation levels, which makes it feasible to detect cancer by identifying changes in DNA methylation. DNA methylation is the process by which, catalysed by DNA methyltransferases, a methyl group is transferred from the donor S-adenosylmethionine to a specific base. In mammals, DNA methylation occurs mainly on the C of CpG dinucleotides, producing 5-methylcytosine.
More than 98% of the CpG sites in the genome lie in repetitive sequences with transposition potential. In normal cells these CpGs are highly methylated and transcriptionally silenced, whereas in tumor cells they undergo widespread demethylation, leading to transcription of repetitive sequences, activation of transposons and increased genomic instability. The remaining CpGs, about 2% of the total, are densely clustered in CpG islands in gene promoter regions. Screening for aberrant methylation sites specific to cancer tissue therefore aids cancer detection.
Cancer seriously threatens human life and health. Because existing markers have poor specificity, many cancer patients are already at a middle or advanced stage when diagnosed and have lost the chance of radical resection. Finding highly sensitive peripheral-blood methylation markers for cancer is therefore of great significance for early detection and early treatment.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art. To this end, one object of the invention is to provide a marker that can be used for early diagnosis of cancer and for postoperative evaluation.
In the course of their research the inventors found that rDNA plays an extremely important role in life processes. Transcripts of rDNA account for about 80% of cellular RNA output and are essential for protein translation. Studies have shown that rDNA transcriptional regulation becomes abnormal in cancer, so studying aberrant rDNA methylation in cancer can help identify markers for cancer detection. However, rDNA methylation is not covered by conventional analyses. In addition, an autosomal locus has only 2 copies in the human genome, whereas rDNA has about 400 copies, so even at shallow sequencing depth there are enough reads to analyse the methylation state of individual rDNA CpG sites. Searching rDNA for cancer-detection markers therefore has both a practical and a theoretical basis.
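The copy-number argument can be made concrete with a back-of-the-envelope calculation. The sketch below assumes that reads sample every genomic copy uniformly, so the expected depth at a single rDNA CpG scales with copy number; it ignores mapping ambiguity, bisulfite-induced DNA loss and copy-number variation between individuals, and the function name is illustrative only.

```python
# Back-of-the-envelope depth estimate (illustrative; assumes uniform sampling
# of all genomic copies and perfect mapping to the rDNA reference).
AUTOSOMAL_COPIES = 2   # copies of a typical autosomal locus, per the text
RDNA_COPIES = 400      # approximate rDNA repeat copies, per the text


def expected_rdna_depth(autosomal_depth: float) -> float:
    """Expected read depth at one rDNA CpG given the mean autosomal depth."""
    return autosomal_depth * RDNA_COPIES / AUTOSOMAL_COPIES


# A shallow 1x whole-genome run would still give roughly 200 reads per rDNA CpG.
print(expected_rdna_depth(1.0))  # 200.0
```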
Specifically, the present invention provides the following technical solutions.
According to a first aspect, the present invention provides a marker for cancer, comprising at least one CpG site, or modified CpG site, selected from the following positions of the human ribosomal DNA repeat unit reference sequence U13369.1: 38974, 37148, 37013, 37028, 37076, 32936, 21740, 23407, 34657, 28277, 38982, 21695, 19811, 32927, 32906, 32920, 36988, 32964, 30940, 19819, 39004, 31075, 38940, 19843, 18745, 33830, 31663, 21709, 23623, 30639, 32931, 18727, 37206, 38980, 21309, 30630, 18737, 38596, 30647, 37072, 37162, 30936, 33838, 36193, 36218, 34719, 40812, 37088, 15596, 22970, 18200, 19952, 14972, 21194, 23926, 24167, 36996, 37361, 37178, 39913, 30789, 37020, 36832, 34428 and 34805. The methylation state of these markers differs significantly between tumor and non-tumor tissue, with hypomethylation in tumor tissue, so they can be used for early diagnosis of cancer and for predicting postoperative recurrence. Taking liver, lung and colorectal cancer as examples, combinations of these markers distinguished patients with liver cancer, lung cancer and colorectal cancer in the test set with ROC AUCs of 96%, 94% and 92%, respectively, indicating high accuracy.
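When implementing an assay, the positions above can be held as plain constants so the panel stays unambiguous. The grouping below into the 10 higher-accuracy sites, the 45 additional sites and the 10 sites from application 201910157582.7 follows the detailed description; the constant and function names are hypothetical, and the check only confirms that the list is internally consistent (no duplicates, all within the 42,999-base reference). It does not verify that each position is a CpG.

```python
# Marker CpG positions on U13369.1 as listed in the first aspect.
# Group names are hypothetical labels for the tiers described in the text.
TOP_10 = (38974, 37148, 37013, 37028, 37076, 32936, 21740, 23407, 34657, 28277)
ADDITIONAL_45 = (
    38982, 21695, 19811, 32927, 32906, 32920, 36988, 32964, 30940, 19819,
    39004, 31075, 38940, 19843, 18745, 33830, 31663, 21709, 23623, 30639,
    32931, 18727, 37206, 38980, 21309, 30630, 18737, 38596, 30647, 37072,
    37162, 30936, 33838, 36193, 36218, 34719, 40812, 37088, 15596, 22970,
    18200, 19952, 14972, 21194, 23926,
)
PRIOR_10 = (24167, 36996, 37361, 37178, 39913, 30789, 37020, 36832, 34428, 34805)
ALL_MARKERS = TOP_10 + ADDITIONAL_45 + PRIOR_10
U13369_1_LENGTH = 42999  # bases in the reference, per Example 1


def panel_problems(positions, ref_length=U13369_1_LENGTH):
    """Return human-readable problems found in a list of marker positions."""
    problems = []
    if len(set(positions)) != len(positions):
        problems.append("duplicate positions")
    out_of_range = [p for p in positions if not 1 <= p <= ref_length]
    if out_of_range:
        problems.append(f"out of range: {out_of_range}")
    return problems


print(len(ALL_MARKERS), panel_problems(ALL_MARKERS))  # 65 []
```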
According to embodiments of the invention, the marker for cancer described above may further include the following technical features.
In some embodiments, with positions referenced to U13369.1, the marker comprises at least one CpG site or modified CpG site at positions 38974, 37148, 37013, 37028, 37076, 32936, 21740, 23407, 34657 and 28277, and at least one CpG site or modified CpG site at positions 38982, 21695, 19811, 32927, 32906, 32920, 36988, 32964, 30940, 19819, 39004, 31075, 38940, 19843, 18745, 33830, 31663, 21709, 23623, 30639, 32931, 18727, 37206, 38980, 21309, 30630, 18737, 38596, 30647, 37072, 37162, 30936, 33838, 36193, 36218, 34719, 40812, 37088, 15596, 22970, 18200, 19952, 14972, 21194 and 23926.
In some embodiments, with positions referenced to U13369.1, the marker comprises at least one CpG site or modified CpG site at positions 38974, 37148, 37013, 37028, 37076 and 32936, and at least one CpG site or modified CpG site at positions 21740, 23407, 34657 and 28277.
In some embodiments, with positions referenced to U13369.1, the marker comprises at least two CpG sites or modified CpG sites at positions 38974, 37148, 37013, 37028, 37076 and 32936.
In some embodiments, with positions referenced to U13369.1, the marker comprises at least one CpG site or modified CpG site at positions 38974, 37148, 37013, 37028, 37076 and 32936, and at least two CpG sites or modified CpG sites at positions 21740, 23407, 34657 and 28277.
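The combination rules in the embodiments above reduce to counting how many sites of a candidate panel fall into each of two groups: the six sites 38974 to 32936 (called group A here) and the four sites 21740 to 28277 (group B). The sketch below, with hypothetical names, only checks those counting rules; it says nothing about diagnostic performance.

```python
# Groups from the embodiments (positions on U13369.1); labels A/B are hypothetical.
GROUP_A = frozenset({38974, 37148, 37013, 37028, 37076, 32936})
GROUP_B = frozenset({21740, 23407, 34657, 28277})


def embodiment_checks(panel):
    """Report which of the counting rules above a candidate panel satisfies."""
    n_a = len(GROUP_A.intersection(panel))
    n_b = len(GROUP_B.intersection(panel))
    return {
        "A>=1 and B>=1": n_a >= 1 and n_b >= 1,
        "A>=2": n_a >= 2,
        "A>=1 and B>=2": n_a >= 1 and n_b >= 2,
    }


print(embodiment_checks([38974, 21740]))
# {'A>=1 and B>=1': True, 'A>=2': False, 'A>=1 and B>=2': False}
```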
In some embodiments, the modified CpG site carries a 5-methyl or 5-hydroxymethyl modification of the cytosine.
According to a second aspect, the present invention provides a primer sequence that takes the nucleotide sequence containing the marker of the first aspect as its target sequence and specifically amplifies that target sequence.
According to a third aspect, the present invention provides a probe, free in solution or immobilized on a chip, that can specifically capture the nucleotide sequence containing the marker of the first aspect.
According to a fourth aspect, the present invention provides a kit for diagnosing cancer, containing reagents for detecting the marker of the first aspect.
In some embodiments, the kit further comprises the primer sequence of any embodiment of the second aspect or the probe of the third aspect.
According to a fifth aspect, the present invention provides the use of a marker, primer sequence or probe in preparing a cancer diagnostic kit, where the marker is that of the first aspect, the primer sequence is that of the second aspect and the probe is that of the third aspect.
In some embodiments, the cancer includes liver cancer, lung cancer, colorectal cancer, breast cancer, nasopharyngeal carcinoma and/or head and neck cancer.
According to a sixth aspect, the present invention provides a method for determining the methylation of target sites in a sample to be tested, the target sites being the CpG sites in the marker of any embodiment of the first aspect. The method comprises: (1) subjecting the cell-free DNA in the peripheral blood of the sample to conversion treatment so that unmethylated cytosines are converted to thymine, giving a converted sample; (2) constructing a sequencing library from the converted sample and sequencing it to obtain sequencing data; and (3) aligning the sequencing data to a reference sequence and determining the methylation results of the target sites from the alignment.
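Step (1) relies on the chemistry of bisulfite conversion: unmethylated cytosine is deaminated to uracil and read as thymine after PCR, while 5-methylcytosine (and 5-hydroxymethylcytosine, which standard bisulfite treatment cannot distinguish from it) is read as cytosine. The in-silico sketch below of the top strand is for illustration only; it assumes complete conversion and no sequencing errors, and the function name is hypothetical.

```python
from typing import Iterable


def bisulfite_convert(seq: str, methylated: Iterable[int]) -> str:
    """Simulate the top strand after bisulfite treatment and PCR.

    seq: reference sequence (A/C/G/T).
    methylated: 0-based indices of cytosines carrying 5mC or 5hmC.
    Assumes 100% conversion of unmethylated C and no other changes.
    """
    protected = set(methylated)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq.upper())
    )


# The CpG at index 1 is methylated and stays C; the C at index 4 and the
# unmethylated CpG cytosine at index 5 both read as T.
print(bisulfite_convert("ACGTCCGA", {1}))  # ACGTTTGA
```

Comparing such a converted read with the reference at each CpG is what step (3) turns into a methylated or unmethylated call.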
According to embodiments of the invention, the above method may further include the following technical features.
In some embodiments, the reference sequence is the human rDNA repeat unit reference sequence U13369.1.
In some embodiments, sequencing is performed by a second-generation or third-generation sequencing method; existing second- or third-generation methods can be used to measure the methylation of CpG sites in the sample.
In some embodiments, sequencing is performed on at least one of the HiSeq 2000, SOLiD, 454 and single-molecule sequencing platforms.
According to a seventh aspect, the present invention provides a system for diagnosing cancer or predicting the risk of cancer recurrence, comprising: a conversion treatment device, which subjects cell-free DNA in the peripheral blood of the sample to be tested to conversion treatment so that unmethylated cytosines are converted to thymine, giving a converted sample; a sequencing device, connected to the conversion treatment device, which constructs a sequencing library from the converted sample and sequences it to obtain sequencing data; an alignment device, connected to the sequencing device, which aligns the sequencing data to a reference sequence and determines, from the alignment, the methylation results of the CpG sites in the marker; and a result determination device, connected to the alignment device, which analyses the methylation results of the marker CpG sites with a statistical model to determine whether the sample has cancer, whether it is susceptible to cancer, or whether the cancer has recurred after surgery. The marker is that of any embodiment of the first aspect.
According to embodiments of the invention, the diagnostic system described above may further include the following technical features.
In some embodiments, the reference sequence is the human rDNA repeat unit reference sequence U13369.1.
In some embodiments, the statistical model is a multivariate statistical model. A multivariate model can relate the methylation states of multiple CpG sites to cancer, so that cancer status can be determined from CpG methylation results, enabling rapid and early diagnosis.
In some embodiments, the statistical model is built from the methylation results at the CpG sites of multiple cancer patients and multiple non-cancer subjects, the CpG sites being those of the marker of any embodiment of the first aspect.
In some embodiments, the multivariate statistical model is at least one of a logistic regression model, a random forest model and an SVM model, preferably a logistic regression model. A regression model is a mathematical model that quantitatively describes statistical relationships, specifically how one variable depends on others. Regression analysis can relate the methylation results of individual or multiple CpG sites to cancer, so that the disease state of the sample can be determined from its CpG methylation results. Logistic regression, a generalized linear model, can accurately model the relationship between disease and the explanatory variables.
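For a fitted logistic regression model, the cancer probability is the logistic (sigmoid) function of a weighted sum of the CpG methylation levels, P = 1 / (1 + exp(-(b0 + sum_i b_i * x_i))). The coefficients in the example are invented purely to show the mechanics (negative weights, because the markers are hypomethylated in tumor); they are not fitted values from this patent.

```python
import math
from typing import Sequence


def logistic_risk(methylation: Sequence[float], coefficients: Sequence[float],
                  intercept: float) -> float:
    """P(cancer) = 1 / (1 + exp(-(intercept + sum_i coef_i * x_i)))."""
    if len(methylation) != len(coefficients):
        raise ValueError("one coefficient per CpG site is required")
    z = intercept + sum(b * x for b, x in zip(coefficients, methylation))
    # Numerically stable sigmoid for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical two-site model: lower methylation -> higher predicted risk.
coef, b0 = [-2.0, -2.0], 2.0
print(round(logistic_risk([0.2, 0.2], coef, b0), 3))  # 0.769
print(round(logistic_risk([0.8, 0.8], coef, b0), 3))  # 0.231
```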
In some embodiments, the alignment is performed with the software bs-seeker2 in local alignment mode. bs-seeker2 was chosen because it supports local alignment, which raises the proportion of reads that map back to the reference sequence and makes the analysis results more robust.
The beneficial effects of the invention are as follows. Using the CpG sites provided by the invention, or combinations of them, as markers, cancer can be diagnosed from a peripheral blood sample by detecting the methylation state of part of the ribosomal DNA sequence in the patient's peripheral blood, allowing timely diagnosis in a non-invasive or minimally invasive way. The markers detect cancer with high specificity and sensitivity, and because they are present in many copies in the genome, high-precision detection can be achieved with only a few markers.
Description of the drawings
Fig. 1 shows ROC curves for detecting primary hepatocellular carcinoma from peripheral blood using the rDNA sites provided according to embodiments of the present invention.
Fig. 2 shows ROC curves for detecting lung cancer, breast cancer and nasopharyngeal carcinoma from peripheral blood using the rDNA sites provided according to embodiments of the present invention.
Fig. 3 shows ROC curves for lung cancer detection from peripheral blood RRBS data using the rDNA sites provided according to embodiments of the present invention.
Fig. 4 shows ROC curves for lung cancer detection from peripheral blood RRBS data using the rDNA sites provided according to embodiments of the present invention.
Fig. 5 shows the use of the rDNA sites provided according to embodiments of the present invention for postoperative assessment of liver cancer from peripheral blood.
Fig. 6 is a schematic structural diagram of the system for diagnosing cancer or predicting cancer risk provided according to embodiments of the present invention.
Detailed description of embodiments
Embodiments of the present invention are described in detail below, with examples shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements with the same or similar functions, throughout. The embodiments described with reference to the drawings are exemplary, are intended to explain the invention, and should not be construed as limiting it.
To aid understanding by those skilled in the art, certain terms used herein are explained below. These explanations are intended only to help those skilled in the art understand the invention and should not be regarded as limiting its scope.
Herein, a CpG site denotes a dinucleotide in which guanine (G) immediately follows cytosine (C); CpG is shorthand for cytosine (C)-phosphate (p)-guanine (G).
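Locating CpG sites in a reference such as U13369.1 amounts to scanning for a C immediately followed by a G. The sketch below reports 1-based positions of the C on the given strand; whether the patent's coordinates follow exactly this convention (1-based, position of the C) is an assumption, and the function name is hypothetical.

```python
from typing import List


def cpg_positions(seq: str) -> List[int]:
    """1-based positions of the C of every CpG dinucleotide on this strand."""
    s = seq.upper()
    return [i + 1 for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]


print(cpg_positions("ACGCGTTCG"))  # [2, 4, 8]
```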
Herein, " marker ", which refers to, can be used to indicate that the case where subject is with cancer.These markers can be Nucleic acid sequence, macromolecular, small molecule etc., such as can be the nucleic acid sequence of certain length, it is also possible to a specific site Nucleotide or two specific sites nucleotide, as long as the case where can be used to indicate that subject with liver cancer.According to this The embodiment of invention, marker provided by the invention refer to can be used in detecting or diagnose subject whether suffer from cancer or Person whether the site CpG whether susceptible cancer or post-surgical cancer recur.
The present invention provides utilize marker and the application that can be used to detect cancer.These markers are from human rebosomal It is screened in DNA reference sequences.Present invention discloses the sequence areas of human rebosomal DNA methylation exception, filter out It can be using the relatively high site CpG of the accuracy of 10 of liver cancer predictions of peripheral blood DNA detection or diagnosis cancer and pre- Survey or diagnose preparatory 45 slightly lower sites CpG of cancer.The methylation state in these regions is in tumor tissues and non-swollen There are notable differences in tumor tissue, can be used to disclose the disease condition of tested sample.
According to one aspect, the present invention provides a marker for cancer comprising at least one CpG site, or modified CpG site, selected from the following positions of the human rDNA repeat unit reference sequence U13369.1: 38974, 37148, 37013, 37028, 37076, 32936, 21740, 23407, 34657, 28277, 38982, 21695, 19811, 32927, 32906, 32920, 36988, 32964, 30940, 19819, 39004, 31075, 38940, 19843, 18745, 33830, 31663, 21709, 23623, 30639, 32931, 18727, 37206, 38980, 21309, 30630, 18737, 38596, 30647, 37072, 37162, 30936, 33838, 36193, 36218, 34719, 40812, 37088, 15596, 22970, 18200, 19952, 14972, 21194 and 23926. The CpG sites used as markers may be any one of these sites, or any two, three, four, five, six, seven, eight or nine of them, or even all ten. The more CpG sites are used as markers, the more reliable the resulting cancer diagnosis. The methylation state of these markers can indicate several cancers, including but not limited to liver cancer, lung cancer, colorectal cancer, breast cancer and nasopharyngeal carcinoma.
In addition to the CpG sites above, the markers for cancer may also include at least one CpG site or modified CpG site at positions 24167, 36996, 37361, 37178, 39913, 30789, 37020, 36832, 34428 or 34805 of U13369.1. Earlier research found that these sites can be used for early diagnosis and screening of liver cancer, as recorded in Chinese patent application No. 201910157582.7. The present application further confirms that these sites can also be used for early diagnosis and prediction of postoperative recurrence of other cancers, not only liver cancer. These previously identified liver cancer markers can be used alone or in combination for cancers other than liver cancer, or combined with the CpG sites above for cancer diagnosis, detection and postoperative evaluation.
In at least some embodiments, the CpG sites at positions 38974, 37148, 37013, 37028, 37076 and 32936 give more reliable predictions and can be used alone or in combinations of two or three. In at least some embodiments, the CpG sites at positions 21740, 23407, 34657 and 28277 and the other 45 CpG sites have lower cancer diagnostic rates than the other sites, but combining them yields more reliable diagnostic results.
Herein, "with reference to the human ribosomal DNA repeat unit reference sequence U13369.1" means that these CpG sites are identified by their positions in U13369.1. The CpG sites in U13369.1 as deposited in GenBank can serve as markers for liver cancer; analysing them makes it possible to predict whether a sample is susceptible to cancer or to diagnose whether it has cancer. Site positions may change as the database is updated or under different database coordinate conventions, but such changes do not affect the function of these sites in diagnosing liver cancer, and such variations also fall within the scope of protection of the invention.
In at least some embodiments, the CpG modification includes 5-methylation and 5-hydroxymethylation. Based on these markers, human peripheral blood DNA can be processed for cancer diagnosis, and detection reagents or kits for cancer can be prepared.
According to another aspect, the present invention provides a method for diagnosing cancer, comprising: (1) subjecting cell-free DNA in the peripheral blood of a sample to be tested to conversion treatment so that unmethylated cytosines are converted to thymine, giving a converted sample; (2) constructing a sequencing library from the converted sample and sequencing it to obtain sequencing data; (3) aligning the sequencing data to the human ribosomal DNA reference sequence and determining, from the alignment, the methylation results of the CpG sites in the markers; and (4) analysing these methylation results with a statistical model to determine whether the sample has cancer. The method can not only determine whether the sample has cancer but also predict its future risk of cancer, enabling earlier treatment or prevention.
Herein, the conversion treatment of cell-free DNA in peripheral blood can be carried out with bisulfite, using either a commercial bisulfite kit or reagents prepared in-house. As is well known to those skilled in the art, bisulfite or related sulfite-based reagents can all achieve this conversion, that is, converting unmethylated cytosines in cell-free DNA to thymine to give a converted sample. Conversion with any of these reagents, or by any other means that achieves this conversion, falls within the scope of protection of the invention.
Library construction and sequencing of cell-free DNA from peripheral blood to obtain the methylation result of each CpG site can use techniques customary in the art. In at least some embodiments, whole-genome bisulfite sequencing is used to obtain the methylation result of each CpG site; in at least some embodiments, RRBS is used. For example, patient blood samples were centrifuged at 1600 × g for 10 minutes and then at 16000 × g for 10 minutes to obtain plasma; DNA was extracted from 4 mL of plasma per patient with the DSP Blood Mini Kit (Qiagen); methylated adapters were ligated using Illumina's Paired-End Sequencing Sample Preparation Kit; the sequencing library was purified with AMPure XP magnetic beads (Beckman Coulter) and then subjected to two rounds of bisulfite conversion with the EpiTect Plus DNA Bisulfite Kit (Qiagen); the product was amplified with 10 PCR cycles and finally sequenced single-end on a HiSeq 2000 (Illumina).
The present invention also provides a system for diagnosing cancer or predicting cancer risk which, as shown in Fig. 6, comprises a conversion treatment device, a sequencing device, an alignment device and a result determination device. The conversion treatment device subjects cell-free DNA in the peripheral blood of the subject to conversion treatment so that unmethylated cytosines are converted to thymine, giving a converted sample. The sequencing device, connected to the conversion treatment device, constructs a sequencing library from the converted sample and obtains sequencing data on a sequencing platform. The alignment device, connected to the sequencing device, aligns the sequencing data to a reference sequence and determines, from the alignment, the methylation results of the marker CpG sites. The result determination device, connected to the alignment device, analyses these methylation results with a statistical model to determine whether the sample has cancer or to predict whether it is susceptible to cancer. In some embodiments the statistical model is a multivariate statistical model, including but not limited to a logistic regression model, a random forest model or an SVM model.
The solutions of the invention are explained below with reference to examples. Those skilled in the art will understand that the following examples only illustrate the invention and should not be regarded as limiting its scope. Where specific techniques or conditions are not stated in the examples, those described in the literature of the field or in the product instructions are followed. Reagents or instruments whose manufacturer is not stated are conventional, commercially available products.
Example 1: Screening differentially methylated CpG sites on rDNA from whole-genome methylation sequencing data
We used the peripheral blood bisulfite sequencing data published in the 2013 PNAS article "Noninvasive detection of cancer-associated genome-wide hypomethylation and copy number aberrations by plasma DNA bisulfite sequencing"; the data are deposited in the European Genome-Phenome Archive under accession number EGAS00001000566. Here we used peripheral blood DNA methylation data from healthy individuals (32), HBV-infected non-cancer patients (8) and early-stage liver cancer patients (stages I and II, 26), as well as DNA methylation data from 15 paired liver cancer tissue and buffy coat samples.
The reference sequence U13369.1 of the human rDNA repeating unit was downloaded from GenBank; it is 42,999 bases long and contains 3,288 CpG sites in total. Sequencing reads were aligned to the reference sequence U13369.1 with the bs-seeker2 software. Duplicate reads were not removed, because sequencing coverage of rDNA is relatively high. For each CpG site, the number of methylated C calls and the number of unmethylated C calls were counted.
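The site enumeration and per-site counts can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: positions are reported 1-based (the text does not state its convention), the methylation level of a site is taken as methylated / (methylated + unmethylated), and a toy sequence stands in for U13369.1, which in practice would be read from the GenBank record. The function names are hypothetical.

```python
def cpg_positions(sequence):
    """1-based positions of the C in every CpG dinucleotide of the given strand."""
    seq = sequence.upper()
    return [i + 1 for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


def methylation_level(methylated_c, unmethylated_c):
    """Methylated fraction at one CpG site; None when no read covers the site."""
    total = methylated_c + unmethylated_c
    return methylated_c / total if total else None


toy_reference = "ACGTTCGACCGA"  # stand-in; the real input would be the U13369.1 sequence
print(cpg_positions(toy_reference))  # [2, 6, 10]
print(methylation_level(45, 15))     # 0.75
```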
Next, CpG sites covered by only a few aligned reads were filtered out, leaving 2,871 valid CpG sites.
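The text does not give the coverage cut-off, so the following Python sketch uses a hypothetical `min_depth` parameter. It keeps a site only when its total number of methylated plus unmethylated calls reaches that depth. Whether the cut-off is applied per sample or to counts pooled across samples is also not stated; the sketch works on a single table of counts.

```python
def filter_low_coverage(site_counts, min_depth=10):
    """Keep CpG sites whose total (methylated + unmethylated) read count reaches min_depth.

    site_counts maps position -> (methylated_c, unmethylated_c).
    min_depth is a hypothetical threshold, not a value from the text.
    """
    return {
        pos: counts
        for pos, counts in site_counts.items()
        if sum(counts) >= min_depth
    }


counts = {2: (30, 10), 6: (1, 2), 10: (8, 4)}
kept = filter_low_coverage(counts, min_depth=10)
print(sorted(kept))  # [2, 10]
```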
The samples were then randomly split into a training set and a test set: 2/3 of the healthy individuals, 2/3 of the non-cancer HBV-infected individuals and 2/3 of the liver cancer patients formed the training set, and the remaining individuals formed the test set. Markers were selected on the training set and evaluated on the test set. The random split was repeated 100 times, and the subsequent analysis steps were carried out on each split.
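The split described above is stratified: each group contributes about 2/3 of its members to training. A minimal Python sketch follows. Rounding 2/3 of each group to the nearest integer is an assumption, and the sample IDs are hypothetical placeholders sized to the cohort of Embodiment 1 (32 healthy, 8 HBV, 26 liver cancer).

```python
import random


def stratified_split(groups, train_fraction=2 / 3, seed=None):
    """Shuffle each group separately and send ~train_fraction of it to training."""
    rng = random.Random(seed)
    train, test = [], []
    for members in groups.values():
        shuffled = list(members)
        rng.shuffle(shuffled)
        n_train = round(len(shuffled) * train_fraction)  # rounding rule is an assumption
        train.extend(shuffled[:n_train])
        test.extend(shuffled[n_train:])
    return train, test


groups = {  # hypothetical sample IDs
    "healthy": [f"H{i}" for i in range(32)],
    "hbv": [f"B{i}" for i in range(8)],
    "liver_cancer": [f"C{i}" for i in range(26)],
}
splits = [stratified_split(groups, seed=k) for k in range(100)]  # 100 repeats
train, test = splits[0]
print(len(train), len(test))  # 43 23
```

Seeding each repeat makes the 100 splits reproducible, which helps when site-selection counts are later compared across runs.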
Using the training set data, CpG sites that effectively distinguish non-cancer individuals from cancer patients were screened from the 2,871 valid CpG sites. Briefly, the methylation level of each CpG site was used on its own to separate non-cancer individuals from cancer patients, the ROC (receiver operating characteristic) curve of each site was plotted and its AUC (area under the curve) was calculated. The sites were sorted by AUC in descending order and the top 200 CpG sites were retained; the AUC of these top 200 sites is generally greater than 0.8.
Starting from the top 200 CpG sites, feature selection was performed on the training set data to retain the better-performing CpG sites, and a regularized logistic regression model was then trained; the CpG sites retained by feature selection are the target markers. The model obtained from this screening was then applied to the test set to assess the performance of the selected sites and of the model.
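The text does not name the regularizer. One common way to combine feature selection with a regularized logistic regression is an L1 penalty, which drives the weights of uninformative sites exactly to zero. The pure-Python sketch below fits such a model by proximal gradient descent (ISTA); the choice of L1, the solver and the hyperparameters `lam`, `lr` and `n_iter` are all assumptions, and the toy data are illustrative.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam=0.01, lr=0.5, n_iter=2000):
    """L1-penalised logistic regression by proximal gradient descent (ISTA).

    X: list of samples, each a list of site methylation levels; y: 0/1 labels.
    The intercept is not penalised. Weights shrunk exactly to zero drop the
    corresponding CpG site, which acts as the feature selection step.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        for j in range(d):
            z = w[j] - lr * grad_w[j]
            # Soft-thresholding: the proximal step for the L1 penalty.
            w[j] = math.copysign(max(abs(z) - lr * lam, 0.0), z)
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of the positive (cancer) class."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


X = [[0.20, 0.5], [0.30, 0.4], [0.25, 0.6], [0.80, 0.5], [0.70, 0.6], [0.90, 0.4]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_l1_logistic(X, y)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print(selected, round(predict_proba(w, b, [0.22, 0.5]), 3))
```

In a full pipeline, `X` would hold only the pre-screened top sites from the training split, and the fitted model would then be scored on the held-out test split.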
The 100 random splits into training and test sets yield 100 regression models and 100 corresponding combinations of CpG sites; repeating the split avoids depending on the randomness of any single split. The number of times each CpG site was selected over the 100 experiments was then counted. Fig. 1 shows the distribution of ROC curves over the 100 test set evaluations. In Fig. 1, the abscissa represents 1 - specificity, i.e. the false positive rate, and the ordinate represents sensitivity, i.e. the true positive rate. The area under the ROC curve is the AUC; the larger the AUC, the higher the accuracy. The mean and variance of the AUC over the 100 tests are marked in Fig. 1.
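The aggregation across repeats amounts to a frequency count over the selected site sets plus the mean and variance of the test-set AUCs. A minimal Python sketch follows. The three runs and their AUC values are illustrative, not results from this document, and the choice of population variance (rather than sample variance) is an assumption because the text does not say which was used.

```python
from collections import Counter
from statistics import mean, pvariance


def summarise_runs(selected_sites_per_run, test_aucs):
    """Selection frequency of each CpG site, plus mean and population variance of test AUCs."""
    frequency = Counter(site for sites in selected_sites_per_run for site in sites)
    return frequency.most_common(), mean(test_aucs), pvariance(test_aucs)


runs = [[38974, 37148], [38974, 21740], [38974, 37148, 23407]]  # illustrative runs
aucs = [0.90, 0.85, 0.95]                                       # illustrative AUCs
freq, auc_mean, auc_var = summarise_runs(runs, aucs)
print(freq)  # [(38974, 3), (37148, 2), (21740, 1), (23407, 1)]
print(round(auc_mean, 3), round(auc_var, 5))
```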
From the above count of how many times each CpG site was selected over the 100 experiments, the following sites were obtained: the CpG sites at positions 38974, 37148, 37013, 37028, 37076, 32936 and 24167, 36996, 37361, 37178, 39913, 30789, 37020, 36832, 34428, 34805 were selected most often, followed by the sites at positions 21740, 23407, 34657, 28277, 38982, 21695, 19811, 32927, 32906, 32920, 36988, 32964, 30940, 19819, 39004, 31075, 38940, 19843, 18745, 33830, 31663, 21709, 23623, 30639, 32931, 18727, 37206, 38980, 21309, 30630, 18737, 38596, 30647, 37072, 37162, 30936, 33838, 36193, 36218, 34719, 40812, 37088, 15596, 22970, 18200, 19952, 14972, 21194, 23926.
Embodiment 2: Evaluating lung cancer, breast cancer and nasopharyngeal carcinoma with the model and sites obtained in Embodiment 1
The data come from the reference dataset described in Embodiment 1 and comprise 5 breast cancer, 4 lung cancer and 9 nasopharyngeal carcinoma patients. Because the number of patients is limited, these cancer types were pooled rather than analyzed separately. The model and sites obtained by training on liver cancer in Embodiment 1 were used to distinguish patients with these three cancers from the held-out healthy individuals. The purpose of this embodiment is to verify that the sites and model screened in Embodiment 1 can also be used to detect other cancer types. Fig. 2 shows the ROC curves of the predictions. As Fig. 2 shows, the average AUC reaches 0.82, a good predictive performance. This indicates that the sites screened in Embodiment 1 can be used to predict cancers such as breast cancer, lung cancer and nasopharyngeal carcinoma.
Embodiment 3: Predicting lung cancer and colorectal cancer using peripheral blood RRBS data
We used the peripheral blood reduced representation bisulfite sequencing (RRBS) data reported in the 2017 PNAS article entitled "Identification of methylation haplotype blocks aids in deconvolution of heterogeneous tissue samples and tumor tissue-of-origin mapping from plasma DNA". The data are deposited in Gene Expression Omnibus (GEO) under accession number GSE79279. Here we used the peripheral blood DNA methylation data of healthy individuals (75), lung cancer patients (29) and colorectal cancer patients (30).
The reference sequences U13369.1 of manned rDNA repeated fragment unit under in GENBANK, overall length are total 42999 bases, 3288 sites CpG.Sequencing data is matched back on reference sequences using bs-seeker2 software U13369.1, no longer removal sequencing repeat, and reason is that the sequencing coverage on rDNA is relatively high.It calculates CpG each C number of methylation of point and the C number that do not methylate.Used herein is RRBS data, and the number of sites that RRBS technology can detect is small In WGBS, therefore the technology cannot cover all sites CpG.
Next, CpG sites covered by only a few aligned reads were filtered out; the subsequent training procedure was the same as in Embodiment 1. Fig. 3 and Fig. 4 show the ROC curves on the corresponding test sets, indicating good predictive performance.
The above screening gave the following results: the CpG sites at positions 21740, 23407, 34657, 28277 and 37020 showed the best predictive performance, and the sites at positions 36193, 36218, 34719, 40812, 37088, 15596, 22970, 18200, 19952, 14972, 21194, 23926 also performed well. The CpG sites at positions 38974, 37148, 37013, 37028, 37076, 32936, 38982, 21695, 19811, 32927, 32906, 32920, 36988, 32964, 30940, 19819, 39004, 31075, 38940, 19843, 18745, 33830, 31663, 21709, 23623, 30639, 32931, 18727, 37206, 38980, 21309, 30630, 18737, 38596, 30647, 37072, 37162, 30936, 33838 and 24167, 36996, 37361, 37178, 39913, 30789, 36832, 34428, 34805 showed reduced performance or no signal, owing to factors such as the technical limitations of RRBS.
Embodiment 4: Postoperative assessment of liver cancer using peripheral blood data
Here, the whole-genome peripheral blood methylation data of two patients from Embodiment 1, collected before and after surgery, were used. The logistic regression model and the screened sites from Embodiment 1 were used to predict the postoperative cancer probability of the two patients, as shown in Fig. 5. Patient TBR34 still had a very high cancer probability after surgery and in fact died 8 months after surgery, whereas the cancer probability of patient TBR36 dropped to a very low level after surgery, and this patient was still alive 20 months after surgery. These outcomes are consistent with the marker predictions.
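For a logistic regression model, the cancer probability reported for each blood draw is the logistic transform of a weighted sum of the methylation levels of the selected sites. Written out, with $x_j$ the methylation level of selected CpG site $j$, $S$ the set of selected sites and $\beta$ the fitted coefficients (notation introduced here for illustration, not taken from the original):

$$
P(\text{cancer} \mid x) = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{j \in S} \beta_j x_j\right)\right)}
$$

The postoperative assessment then compares this probability for the pre-operative and post-operative samples of the same patient.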
Herein, term " first ", " second " are used for description purposes only, and are not understood to indicate or imply relatively important Property or implicitly indicate the quantity of indicated technical characteristic.Define " first " as a result, the feature of " second " can be expressed or Person implicitly includes at least one this feature.In the description of the present invention, the meaning of " plurality " is at least two, such as two, Three etc., unless otherwise specifically defined.
In the present invention, unless otherwise expressly specified and limited, the terms "connected" and "connection" shall be understood broadly. For example, a connection may be fixed, detachable or integral; it may be mechanical or electrical, or the elements may communicate with each other; it may be direct or indirect through an intermediary, and it may be an internal communication between two elements or an interaction between two elements, unless otherwise expressly limited. Those of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, where they do not conflict, those skilled in the art may combine different embodiments or examples described in this specification, as well as the features of those different embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is to be understood that these embodiments are exemplary and shall not be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the present invention.

Claims (10)

1. A marker for cancer, characterized by comprising at least one of the following CpG sites:
with reference to the human ribosomal DNA repeating unit reference sequence U13369.1, at least one of the CpG sites, or modified CpG sites, at positions 38974, 37148, 37013, 37028, 37076, 32936, 21740, 23407, 34657, 28277, 38982, 21695, 19811, 32927, 32906, 32920, 36988, 32964, 30940, 19819, 39004, 31075, 38940, 19843, 18745, 33830, 31663, 21709, 23623, 30639, 32931, 18727, 37206, 38980, 21309, 30630, 18737, 38596, 30647, 37072, 37162, 30936, 33838, 36193, 36218, 34719, 40812, 37088, 15596, 22970, 18200, 19952, 14972, 21194, 23926 and 24167, 36996, 37361, 37178, 39913, 30789, 37020, 36832, 34428 or 34805.
2. The marker according to claim 1, characterized in that, with reference to the human ribosomal DNA repeating unit reference sequence U13369.1, the marker comprises at least one of the CpG sites or modified CpG sites at positions 38974, 37148, 37013, 37028, 37076, 32936, 21740, 23407, 34657, 28277; and at least one of the CpG sites or modified CpG sites at positions 38982, 21695, 19811, 32927, 32906, 32920, 36988, 32964, 30940, 19819, 39004, 31075, 38940, 19843, 18745, 33830, 31663, 21709, 23623, 30639, 32931, 18727, 37206, 38980, 21309, 30630, 18737, 38596, 30647, 37072, 37162, 30936, 33838, 36193, 36218, 34719, 40812, 37088, 15596, 22970, 18200, 19952, 14972, 21194, 23926;
Optionally, with reference to the human ribosomal DNA repeating unit reference sequence U13369.1, the marker comprises at least one of the CpG sites or modified CpG sites at positions 38974, 37148, 37013, 37028, 37076, 32936, 21740; and at least one of the CpG sites or modified CpG sites at positions 21740, 23407, 34657, 28277.
3. The marker according to claim 1 or 2, characterized in that, with reference to the human ribosomal DNA repeating unit reference sequence U13369.1, the marker comprises at least two of the CpG sites or modified CpG sites at positions 38974, 37148, 37013, 37028, 37076, 32936;
Optionally, with reference to the human ribosomal DNA repeating unit reference sequence U13369.1, the marker comprises at least one of the CpG sites or modified CpG sites at positions 38974, 37148, 37013, 37028, 37076, 32936; and at least two of the CpG sites or modified CpG sites at positions 21740, 23407, 34657, 28277;
Optionally, the modified CpG site carries a 5-methylation (5-methylcytosine) or 5-hydroxymethylation (5-hydroxymethylcytosine) modification.
4. A primer sequence, characterized in that the primer sequence takes the nucleotide sequence containing the marker according to any one of claims 1 to 3 as its target sequence and is used for specific amplification of the target sequence.
5. A probe, the probe being free in solution or immobilized on a chip, characterized in that the probe can specifically capture the nucleotide sequence containing the marker according to any one of claims 1 to 3.
6. A kit, characterized in that the kit is used for diagnosing cancer and contains reagents for detecting the marker according to any one of claims 1 to 3;
Optionally, the kit further comprises the primer sequence according to claim 4 or the probe according to claim 5.
7. Use of the marker according to any one of claims 1 to 3, the primer sequence according to claim 4 or the probe according to claim 5 in the preparation of a cancer diagnostic kit;
Optionally, the cancer comprises liver cancer, lung cancer, colorectal cancer, breast cancer, nasopharyngeal carcinoma and/or head and neck cancer.
8. A method for determining the methylation of a target site in a sample to be tested, the target site being a CpG site in the marker according to any one of claims 1 to 3, the method comprising:
(1) performing conversion treatment on the cell-free DNA in the peripheral blood of the subject to be tested, so that unmethylated cytosines are converted to thymine, to obtain a converted sample;
(2) constructing a sequencing library from the converted sample and sequencing it to obtain sequencing data;
(3) aligning the sequencing data to a reference sequence and determining the methylation result of the target site in the sequencing data based on the alignment result;
Optionally, the reference sequence is the human rDNA repeating unit reference sequence U13369.1;
Optionally, the sequencing is performed by a second-generation or third-generation sequencing method;
Optionally, the sequencing is performed with at least one device selected from HiSeq 2000, SOLiD, 454 and single-molecule sequencing devices.
9. A system for diagnosing cancer or predicting the risk of cancer recurrence, characterized by comprising:
a conversion treatment device, configured to perform conversion treatment on the cell-free DNA in the peripheral blood of a subject to be tested, so that unmethylated cytosines are converted to thymine, to obtain a converted sample;
a sequencing device, connected to the conversion treatment device and configured to construct a sequencing library from the converted sample and sequence it to obtain sequencing data;
an alignment device, connected to the sequencing device and configured to align the sequencing data to a reference sequence and to determine, based on the alignment result, the methylation result of the CpG sites of the marker in the sequencing data;
a result judgment device, connected to the alignment device and configured to determine, by statistical model analysis of the methylation result of the CpG sites of the marker in the sequencing data, whether the subject has cancer, or to predict whether the subject is susceptible to cancer or whether cancer will recur after surgery;
wherein the marker is the marker according to any one of claims 1 to 3.
10. The system according to claim 9, characterized in that the reference sequence is the human rDNA repeating unit reference sequence U13369.1;
Optionally, the statistical model is a multivariate statistical model;
Optionally, the statistical model is established based on a plurality of cancer patients and the methylation results of the CpG sites in the plurality of cancer patients, the CpG sites being the CpG sites in the marker according to any one of claims 1 to 3;
Optionally, the multivariate statistical model is at least one of a logistic regression model, a random forest model and an SVM model, preferably a logistic regression model;
Optionally, the alignment is performed with the software bs-seeker2, using local alignment as the alignment mode.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910445136.6A CN110195107B (en) 2019-05-27 2019-05-27 Ribosomal DNA methylation marker for detecting cancer in peripheral blood and application thereof

Publications (2)

Publication Number Publication Date
CN110195107A true CN110195107A (en) 2019-09-03
CN110195107B CN110195107B (en) 2023-04-14

Family

ID=67753136

Country Status (1)

Country Link
CN (1) CN110195107B (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104611410A (en) * 2013-11-04 2015-05-13 北京贝瑞和康生物技术有限公司 Noninvasive cancer detection method and its kit

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
K. C. Allen Chan et al., "Noninvasive detection of cancer-associated genomewide hypomethylation and copy number aberrations by plasma DNA bisulfite sequencing", PNAS *
Xianglin Zhang et al., "Ribosomal DNA methylation as stable biomarkers for detection of cancer in plasma", bioRxiv *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110982907A (en) * 2020-02-27 2020-04-10 上海鹍远生物技术有限公司 Thyroid nodule-related rDNA methylation marker and application thereof
CN110982907B (en) * 2020-02-27 2020-07-03 上海鹍远生物技术有限公司 Thyroid nodule-related rDNA methylation marker and application thereof
CN113308540A (en) * 2020-02-27 2021-08-27 上海鹍远生物技术有限公司 Thyroid nodule-related rDNA methylation marker and application thereof
CN112375822A (en) * 2020-06-01 2021-02-19 广州市基准医疗有限责任公司 Methylation biomarker for detecting breast cancer and application thereof
WO2021244423A1 (en) * 2020-06-01 2021-12-09 广州市基准医疗有限责任公司 Methylated biomarker for detecting breast cancer, and use thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant