CN112037854A - Method and system for acquiring tumor methylation marker based on methylation chip data - Google Patents
- Publication number
- CN112037854A (CN202011100217.1A)
- Authority
- CN
- China
- Prior art keywords
- value
- beta
- site
- sample data
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Abstract
The invention provides a method and a system for acquiring tumor methylation markers based on methylation chip data. The method comprises the following steps: acquiring sample data in a methylation chip; preprocessing the sample data; carrying out a T test on the preprocessing result; and screening tumor methylation markers from the preprocessing result according to the T test result. The system comprises modules corresponding to the method steps. By screening tumor methylation markers from DNA methylation chip data, the method and the system improve the reliability and effectiveness of tumor methylation marker screening; the screening method is simple and feasible and can be widely applied to the field of medical computer application.
Description
Technical Field
The invention relates to the technical field of methylation, in particular to a method and a system for acquiring a tumor methylation marker based on methylation chip data.
Background
DNA methylation modifies DNA bases and thereby affects the binding between DNA and proteins, so it plays an important role in normal development and in the development of disease. Research shows that abnormal DNA methylation levels are closely related to the occurrence of tumors, so searching the DNA methylation maps of tumors for markers related to early tumor diagnosis has become a research hot spot in recent years. The commonly used high-throughput methylation quantification platforms are the methylation chip platform and the methylation high-throughput sequencing platform. Compared with a sequencing platform, the methylation chip platform has low cost and high sensitivity, so it can be applied to larger-scale clinical samples to mine more representative methylation markers. Generally, a simple statistical hypothesis test is used to identify sites whose methylation signal differs significantly between normal and tumor tissue at the population level; however, because this approach does not take the specific distribution of the methylation signal into account, it may produce false positives. It is therefore of great significance to improve the existing method so as to screen out methylation markers with higher accuracy for the early diagnosis of tumors.
Disclosure of Invention
One of the objectives of the present invention is to provide a method and a system for obtaining tumor methylation markers based on methylation chip data, wherein tumor methylation markers are screened based on DNA methylation chip data, so that the reliability and effectiveness of tumor methylation marker screening are improved, and the screening method is simple, convenient and feasible, and can be widely applied to the field of medical computer application.
The embodiment of the invention provides a method for acquiring a tumor methylation marker based on methylation chip data, which comprises the following steps:
obtaining sample data in the methylation chip;
preprocessing the sample data;
carrying out a T test on the preprocessing result;
and screening out tumor methylation markers from the preprocessing result according to the T test result.
Preferably, preprocessing the sample data includes:
extracting tumor tissue sample data and normal tissue sample data in the sample data;
matching the first site of the tumor tissue sample data with the second site of the normal tissue sample data according to a preset site matching rule to obtain a plurality of identical sites;
acquiring a first Beta numerical matrix corresponding to the same site in the tumor tissue sample data;
acquiring a second Beta numerical matrix corresponding to the same site in the normal tissue sample data;
acquiring a preset intermediate threshold and a matrix coordinate;
acquiring a first Beta value corresponding to the matrix coordinate in the first Beta numerical matrix;
acquiring a second Beta value corresponding to the matrix coordinate in the second Beta numerical matrix;
if the first Beta value of the matrix coordinate is larger than or equal to the intermediate threshold value and the second Beta value of the matrix coordinate is smaller than or equal to the intermediate threshold value, taking the matrix coordinate as a first coordinate to be processed;
if the first Beta value of the matrix coordinate is smaller than the intermediate threshold value or the second Beta value of the matrix coordinate is larger than the intermediate threshold value, taking the matrix coordinate as a second coordinate to be processed;
acquiring a first number of the first coordinates to be processed and a second number of the second coordinates to be processed;
based on the first number and the second number, calculating the sensitivity of the same site:
wherein S is sensitivity, A is a first number, and B is a second number;
if the sensitivity is greater than a preset sensitivity threshold, the same site is a primary screening site;
and taking the primary screening site as a preprocessing result.
Preferably, the T-test on the preprocessing result includes:
acquiring a third Beta numerical matrix corresponding to the primary screening site in the tumor tissue sample data;
acquiring a fourth Beta numerical matrix corresponding to the primary screening site in the normal tissue sample data;
carrying out a paired T test on the third Beta numerical matrix and the fourth Beta numerical matrix of the primary screening site to obtain a P value of the primary screening site;
and taking the P value as a T test result.
Preferably, screening out tumor methylation markers from the preprocessing result according to the T test result comprises:
acquiring a preset P-value threshold;
if the P value of the primary screening site is less than the P-value threshold, the primary screening site is a tumor methylation marker.
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, which comprises:
the acquisition module is used for acquiring sample data in the methylation chip;
the preprocessing module is used for preprocessing the sample data;
the T test module is used for carrying out a T test on the preprocessing result;
and the screening module is used for screening out tumor methylation markers from the preprocessing result according to the T test result.
Preferably, the preprocessing module performs operations including:
extracting tumor tissue sample data and normal tissue sample data in the sample data;
matching the first site of the tumor tissue sample data with the second site of the normal tissue sample data according to a preset site matching rule to obtain a plurality of identical sites;
acquiring a first Beta numerical matrix corresponding to the same site in the tumor tissue sample data;
acquiring a second Beta numerical matrix corresponding to the same site in the normal tissue sample data;
acquiring a preset intermediate threshold and a matrix coordinate;
acquiring a first Beta value corresponding to the matrix coordinate in the first Beta numerical matrix;
acquiring a second Beta value corresponding to the matrix coordinate in the second Beta numerical matrix;
if the first Beta value of the matrix coordinate is larger than or equal to the intermediate threshold value and the second Beta value of the matrix coordinate is smaller than or equal to the intermediate threshold value, taking the matrix coordinate as a first coordinate to be processed;
if the first Beta value of the matrix coordinate is smaller than the intermediate threshold value or the second Beta value of the matrix coordinate is larger than the intermediate threshold value, taking the matrix coordinate as a second coordinate to be processed;
acquiring a first number of the first coordinates to be processed and a second number of the second coordinates to be processed;
based on the first number and the second number, calculating the sensitivity of the same site:
wherein S is sensitivity, A is a first number, and B is a second number;
if the sensitivity is greater than a preset sensitivity threshold, the same site is a primary screening site;
and taking the primary screening site as a preprocessing result.
Preferably, the T-test module performs operations comprising:
acquiring a third Beta numerical matrix corresponding to the primary screening site in the tumor tissue sample data;
acquiring a fourth Beta numerical matrix corresponding to the primary screening site in the normal tissue sample data;
carrying out a paired T test on the third Beta numerical matrix and the fourth Beta numerical matrix of the primary screening site to obtain a P value of the primary screening site;
and taking the P value as a T test result.
Preferably, the screening module performs operations including:
acquiring a preset P-value threshold;
if the P value of the primary screening site is less than the P-value threshold, the primary screening site is a tumor methylation marker.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart of a method for obtaining tumor methylation markers based on methylation chip data according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
The embodiment of the invention provides a method for acquiring a tumor methylation marker based on methylation chip data, which comprises the following steps of:
Step S101, acquiring sample data in the methylation chip;
Step S102, preprocessing the sample data;
Step S103, carrying out a T test on the preprocessing result;
Step S104, screening out tumor methylation markers from the preprocessing result according to the T test result.
The working principle of the technical scheme is as follows:
The methylation chip detects the hybridization signal of bisulfite-treated DNA sequences. First, sample data is acquired from the methylation chip; the sample data includes tumor tissue sample data and normal tissue sample data. Second, the sample data acquired from the methylation chip is preprocessed; the purpose of preprocessing is to screen out the valuable sites in the sample data. Then, a T test is carried out on the preprocessing result; its purpose is to obtain the P value of the preprocessing result data. Finally, tumor methylation markers are screened from the preprocessing result according to the T test result, i.e. the P value of the preprocessing result data.
The beneficial effects of the above technical scheme are: according to the embodiment of the invention, the sample data obtained from the methylation chip is preprocessed, the preprocessing result is subjected to T test, and finally the tumor methylation marker is screened from the preprocessing result according to the T test result, so that the screening of the tumor methylation marker based on the DNA methylation chip data is completed, the screening reliability and effectiveness of the tumor methylation marker are improved, the screening method is simple, convenient and feasible, and the method can be widely applied to the field of medical computer application.
The embodiment of the invention provides a method for acquiring a tumor methylation marker based on methylation chip data, wherein preprocessing the sample data comprises the following steps:
extracting tumor tissue sample data and normal tissue sample data in the sample data;
matching the first site of the tumor tissue sample data with the second site of the normal tissue sample data according to a preset site matching rule to obtain a plurality of identical sites;
acquiring a first Beta numerical matrix corresponding to the same site in the tumor tissue sample data;
acquiring a second Beta numerical matrix corresponding to the same site in the normal tissue sample data;
acquiring a preset intermediate threshold and a matrix coordinate;
acquiring a first Beta value corresponding to the matrix coordinate in the first Beta numerical matrix;
acquiring a second Beta value corresponding to the matrix coordinate in the second Beta numerical matrix;
if the first Beta value of the matrix coordinate is larger than or equal to the intermediate threshold value and the second Beta value of the matrix coordinate is smaller than or equal to the intermediate threshold value, taking the matrix coordinate as a first coordinate to be processed;
if the first Beta value of the matrix coordinate is smaller than the intermediate threshold value or the second Beta value of the matrix coordinate is larger than the intermediate threshold value, taking the matrix coordinate as a second coordinate to be processed;
acquiring a first number of the first coordinates to be processed and a second number of the second coordinates to be processed;
based on the first number and the second number, calculating the sensitivity of the same site:
wherein S is sensitivity, A is a first number, and B is a second number;
if the sensitivity is greater than a preset sensitivity threshold, the same site is a primary screening site;
and taking the primary screening site as a preprocessing result.
The working principle of the technical scheme is as follows:
the sample data in the methylation chip comprises tumor tissue sample data and normal tissue sample data; the preset site matching rule specifically comprises the following steps:
firstly, calculating the Beta values of a site in the tumor tissue sample data and of a site in the normal tissue sample data;
Beta values range from 0 to 1;
the significance of the Beta value is:
1. any Beta value greater than or equal to 0.6 represents complete methylation;
2. any Beta value less than or equal to 0.2 represents complete non-methylation;
3. Beta values between 0.2 and 0.6 represent partial methylation;
matching all Beta values of a given site in the tumor tissue sample data with all Beta values of that site in the normal tissue sample data according to the above Beta value meanings, and taking the site as an identical site when the difference between the number of matched Beta values and the total number is within 25%;
then, acquiring a first Beta numerical matrix corresponding to the same site in the tumor tissue sample data, and acquiring a second Beta numerical matrix corresponding to the same site in the normal tissue sample data;
the preset intermediate threshold is specifically: according to the significance of the Beta value, the range of 0.2 to 0.6 is taken as the threshold range for distinguishing whether a Beta value indicates methylation, and a value from this range is selected as the intermediate threshold; for example: starting from 0.2, selection points are generated in increments of 0.05 until 0.6 is reached, and one of these selection points is chosen as the intermediate threshold;
the preset matrix coordinate is specifically: the first Beta numerical matrix and the second Beta numerical matrix have the same numbers of rows and columns, and any coordinate within the range of the rows and columns is randomly selected;
then, if the first Beta value corresponding to the matrix coordinate in the first Beta numerical matrix is greater than or equal to the intermediate threshold, and the second Beta value corresponding to the matrix coordinate in the second Beta numerical matrix is less than or equal to the intermediate threshold, taking the matrix coordinate as a first coordinate to be processed; if the first Beta value of the matrix coordinate is smaller than the intermediate threshold or the second Beta value of the matrix coordinate is larger than the intermediate threshold, taking the matrix coordinate as a second coordinate to be processed;
and calculating the sensitivity of the same site according to the numbers of the first coordinates to be processed and the second coordinates to be processed, and selecting the same site whose sensitivity is greater than a preset sensitivity threshold as a primary screening site.
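The Beta value interpretation and the example threshold grid described above can be sketched in Python as follows. This is a minimal illustration only: the function names `classify_beta` and `candidate_thresholds` and the category labels are assumptions introduced here and do not appear in the patent text.

```python
def classify_beta(beta):
    """Map a Beta value in [0, 1] to the methylation state described in the text."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError(f"Beta value must lie in [0, 1], got {beta}")
    if beta >= 0.6:
        return "methylated"      # complete methylation
    if beta <= 0.2:
        return "unmethylated"    # complete non-methylation
    return "partial"             # partial methylation (0.2 < beta < 0.6)


def candidate_thresholds(start=0.2, stop=0.6, step=0.05):
    """Return the selection points start, start + step, ..., stop (inclusive)."""
    count = round((stop - start) / step)  # number of increments between the end points
    # Index-based generation avoids accumulating floating-point error.
    return [round(start + i * step, 10) for i in range(count + 1)]


# Hypothetical Beta values, for illustration only.
print([classify_beta(b) for b in (0.05, 0.2, 0.45, 0.6)])
# ['unmethylated', 'unmethylated', 'partial', 'methylated']

points = candidate_thresholds()
print(points)  # nine selection points from 0.2 to 0.6
```

Any one of the returned selection points may then be chosen as the intermediate threshold; the text does not say how that choice is made.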
The beneficial effects of the above technical scheme are: according to the embodiment of the invention, the same sites are obtained by matching the sites of the tumor tissue sample data and the normal tissue sample data in the sample data, and the most valuable primary screening sites are screened out from the same sites according to the sensitivity and used as the preprocessing result, so that the accuracy of obtaining the tumor methylation marker is improved.
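The sensitivity-based primary screening can be sketched as below. The sensitivity formula itself is not reproduced in this text, so the sketch assumes S = A / (A + B), which is only one plausible reading of "S is sensitivity, A is a first number, and B is a second number". The function name `site_sensitivity`, the sample values, and the sensitivity threshold are hypothetical, and the sketch counts every paired coordinate of a site rather than a random selection.

```python
def site_sensitivity(tumor_betas, normal_betas, threshold):
    """Return the ASSUMED sensitivity S = A / (A + B) for one site.

    tumor_betas and normal_betas hold the Beta values of the site,
    one entry per matrix coordinate, in the same order.
    """
    if len(tumor_betas) != len(normal_betas) or not tumor_betas:
        raise ValueError("need two non-empty sequences of equal length")
    first = 0   # A: tumor Beta >= threshold and normal Beta <= threshold
    second = 0  # B: tumor Beta < threshold or normal Beta > threshold
    for tumor, normal in zip(tumor_betas, normal_betas):
        if tumor >= threshold and normal <= threshold:
            first += 1
        else:
            second += 1
    return first / (first + second)


# Hypothetical Beta values for one site across four coordinates.
tumor = [0.82, 0.75, 0.31, 0.66]
normal = [0.10, 0.15, 0.12, 0.70]
s = site_sensitivity(tumor, normal, threshold=0.4)
print(s)  # 0.5

SENSITIVITY_THRESHOLD = 0.45  # hypothetical value; the text gives none
print(s > SENSITIVITY_THRESHOLD)  # True: the site would be kept as a primary screening site
```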
The embodiment of the invention provides a method for acquiring a tumor methylation marker based on methylation chip data, wherein carrying out the T test on the preprocessing result comprises the following steps:
acquiring a third Beta numerical matrix corresponding to the primary screening site in the tumor tissue sample data;
acquiring a fourth Beta numerical matrix corresponding to the primary screening site in the normal tissue sample data;
carrying out a paired T test on the third Beta numerical matrix and the fourth Beta numerical matrix of the primary screening site to obtain a P value of the primary screening site;
and taking the P value as a T test result.
The working principle of the technical scheme is as follows:
acquiring a third Beta numerical matrix corresponding to the primary screening site in the tumor tissue sample data; acquiring a fourth Beta numerical matrix corresponding to the primary screening site in the normal tissue sample data; performing a paired T test on the third Beta numerical matrix and the fourth Beta numerical matrix, under the assumption that tumor is greater than normal. The T test is specifically Student's T test, a method mainly used to test the degree of difference between two mean values of small samples; it uses T distribution theory to infer the probability that the difference occurs, so as to judge whether the difference between the two mean values is significant. The P value is specifically the probability, given that the null hypothesis is true, of obtaining the observed sample result or a more extreme one. A traditional test compares the statistic with a critical value, but the critical value changes with the distribution and the degrees of freedom; the P value approach does not require looking up a critical value, is simple, and conveys more information than the critical value.
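A paired T test of this kind can be sketched with the Python standard library alone. The sketch reads "tumor is greater than normal" as a one-sided alternative hypothesis, which is an assumption; the function names `t_upper_tail` and `paired_t_test_greater` and the sample values are hypothetical. The tail probability is obtained by numerically integrating the T density with Simpson's rule, so it is an approximation; a statistics library would normally be used instead.

```python
import math
from statistics import mean, stdev


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for Student's T distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3               # integral of the density over [0, |t|]
    p = 0.5 - area if t >= 0 else 0.5 + area
    return min(1.0, max(0.0, p))       # clamp small numerical overshoot


def paired_t_test_greater(tumor, normal):
    """One-sided paired T test; returns (t statistic, P value) for tumor > normal."""
    if len(tumor) != len(normal) or len(tumor) < 2:
        raise ValueError("need two sequences of equal length with at least 2 pairs")
    diffs = [x - y for x, y in zip(tumor, normal)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("paired differences have zero variance")
    t_stat = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t_stat, t_upper_tail(t_stat, len(diffs) - 1)


# Hypothetical paired Beta values of one primary screening site.
tumor = [0.82, 0.75, 0.61, 0.66]
normal = [0.10, 0.15, 0.12, 0.20]
t_stat, p_value = paired_t_test_greater(tumor, normal)
print(t_stat, p_value)
```

The returned P value is what the later screening step compares against the preset P-value threshold.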
The beneficial effects of the above technical scheme are: according to the embodiment of the invention, the P value of the primary screening site is obtained by performing T test on the preprocessing result, namely the primary screening site, and the P value of the primary screening site is used as the T test result, so that the T test can help to analyze whether the data difference is obvious or not, more information is provided, and the steps are relatively simple and convenient.
The embodiment of the invention provides a method for acquiring a tumor methylation marker based on methylation chip data, wherein screening out the tumor methylation marker from the preprocessing result according to the T test result comprises the following steps:
acquiring a preset P-value threshold;
if the P value of the primary screening site is less than the P-value threshold, the primary screening site is a tumor methylation marker.
The working principle of the technical scheme is as follows:
filtering the primary screening site according to the size relation between a preset P-value threshold and the P value of the primary screening site; when the P value of the primary screening site is smaller than a preset P-value threshold value, selecting the primary screening site as a tumor methylation marker;
for example: setting the P-value threshold to be 0.001, and selecting a site with the P value less than 0.001 from the primary screening sites as a final tumor methylation marker.
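The filtering step can be sketched as a one-line selection over the P values of the primary screening sites. The function name `screen_markers`, the site identifiers, and the P values below are hypothetical; only the threshold of 0.001 comes from the example in the text.

```python
P_VALUE_THRESHOLD = 0.001  # example threshold given in the text


def screen_markers(p_values, threshold=P_VALUE_THRESHOLD):
    """Keep the primary screening sites whose P value is strictly below the threshold."""
    return [site for site, p in p_values.items() if p < threshold]


# Hypothetical site identifiers and P values, for illustration only.
p_values = {"site_a": 0.0004, "site_b": 0.02, "site_c": 0.001}
print(screen_markers(p_values))  # ['site_a']
```

A site whose P value equals the threshold is not selected, because the text requires the P value to be less than the threshold.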
The beneficial effects of the above technical scheme are: according to the embodiment of the invention, the P value of the primary screening site is compared with the preset P-value threshold, and if the P value of the primary screening site is smaller than the preset P-value threshold, the primary screening site is selected as the tumor methylation marker, so that the primary screening site is further inspected and screened, and the accuracy of obtaining the tumor methylation marker is improved.
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, which comprises:
the acquisition module is used for acquiring sample data in the methylation chip;
the preprocessing module is used for preprocessing the sample data;
the T test module is used for carrying out a T test on the preprocessing result;
and the screening module is used for screening out tumor methylation markers from the preprocessing result according to the T test result.
The working principle of the technical scheme is as follows:
the system of the embodiment of the invention consists of an acquisition module, a preprocessing module, a T test module and a screening module. The methylation chip detects the hybridization signal of bisulfite-treated DNA sequences. First, the acquisition module acquires sample data from the methylation chip; the sample data includes tumor tissue sample data and normal tissue sample data. Second, the preprocessing module preprocesses the sample data acquired from the methylation chip; the purpose of preprocessing is to screen out the valuable sites in the sample data. Then, the T test module carries out a T test on the preprocessing result; its purpose is to obtain the P value of the preprocessing result data. Finally, the screening module screens tumor methylation markers from the preprocessing result according to the T test result, i.e. the P value of the preprocessing result data.
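One way to picture how the four modules cooperate is a small container that wires them together as interchangeable callables. This is a structural sketch under stated assumptions, not the system described in the patent: the class name `MarkerSystem`, the `SiteData` layout, and the stand-in lambdas in the usage example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed layout: site identifier -> (tumor Beta values, normal Beta values)
SiteData = Dict[str, Tuple[Sequence[float], Sequence[float]]]


@dataclass
class MarkerSystem:
    acquire: Callable[[], SiteData]                              # acquisition module
    preprocess: Callable[[SiteData], SiteData]                   # preprocessing module
    t_test: Callable[[Sequence[float], Sequence[float]], float]  # T test module, returns a P value
    p_threshold: float = 0.001                                   # used by the screening step

    def run(self) -> List[str]:
        data = self.acquire()
        primary = self.preprocess(data)  # primary screening sites
        p_values = {site: self.t_test(tumor, normal)
                    for site, (tumor, normal) in primary.items()}
        # Screening module: keep sites whose P value is below the threshold.
        return [site for site, p in p_values.items() if p < self.p_threshold]


# Stand-in modules with made-up data, only to show the data flow.
system = MarkerSystem(
    acquire=lambda: {"site_a": ([0.8, 0.7], [0.1, 0.2]),
                     "site_b": ([0.3, 0.2], [0.3, 0.2])},
    preprocess=lambda data: {k: v for k, v in data.items() if k != "site_b"},
    t_test=lambda tumor, normal: 0.0005,
)
print(system.run())  # ['site_a']
```

Passing the modules in as callables keeps each step replaceable, for example swapping the stand-in `t_test` for a real paired T test.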
The beneficial effects of the above technical scheme are: according to the embodiment of the invention, the sample data obtained from the methylation chip is preprocessed, the preprocessing result is subjected to T test, and finally the tumor methylation marker is screened from the preprocessing result according to the T test result, so that the screening of the tumor methylation marker based on the DNA methylation chip data is completed, the screening reliability and effectiveness of the tumor methylation marker are improved, the screening method is simple, convenient and feasible, and the method can be widely applied to the field of medical computer application.
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, wherein the preprocessing module executes the following operations:
extracting tumor tissue sample data and normal tissue sample data in the sample data;
matching the first site of the tumor tissue sample data with the second site of the normal tissue sample data according to a preset site matching rule to obtain a plurality of identical sites;
acquiring a first Beta numerical matrix corresponding to the same site in the tumor tissue sample data;
acquiring a second Beta numerical matrix corresponding to the same site in the normal tissue sample data;
acquiring a preset intermediate threshold and a matrix coordinate;
acquiring a first Beta value corresponding to the matrix coordinate in the first Beta numerical matrix;
acquiring a second Beta value corresponding to the matrix coordinate in the second Beta numerical matrix;
if the first Beta value of the matrix coordinate is larger than or equal to the intermediate threshold value and the second Beta value of the matrix coordinate is smaller than or equal to the intermediate threshold value, taking the matrix coordinate as a first coordinate to be processed;
if the first Beta value of the matrix coordinate is smaller than the intermediate threshold value or the second Beta value of the matrix coordinate is larger than the intermediate threshold value, taking the matrix coordinate as a second coordinate to be processed;
acquiring a first number of the first coordinates to be processed and a second number of the second coordinates to be processed;
based on the first number and the second number, calculating the sensitivity of the same site:
wherein S is sensitivity, A is a first number, and B is a second number;
if the sensitivity is greater than a preset sensitivity threshold, the same site is a primary screening site;
and taking the primary screening site as a preprocessing result.
The working principle of the technical scheme is as follows:
the sample data in the methylation chip comprises tumor tissue sample data and normal tissue sample data; the preset site matching rule specifically comprises the following steps:
firstly, calculating the Beta values of a site in the tumor tissue sample data and of a site in the normal tissue sample data;
Beta values range from 0 to 1;
the significance of the Beta value is:
1. any Beta value greater than or equal to 0.6 represents complete methylation;
2. any Beta value less than or equal to 0.2 represents complete non-methylation;
3. Beta values between 0.2 and 0.6 represent partial methylation;
matching all Beta values of a given site in the tumor tissue sample data with all Beta values of that site in the normal tissue sample data according to the above Beta value meanings, and taking the site as an identical site when the difference between the number of matched Beta values and the total number is within 25%;
then, acquiring a first Beta numerical matrix corresponding to the same site in the tumor tissue sample data, and acquiring a second Beta numerical matrix corresponding to the same site in the normal tissue sample data;
the preset intermediate threshold is specifically: according to the interpretation of the Beta value, the range of 0.2 to 0.6 is used to distinguish whether a Beta value indicates methylation, and one value within this range is selected as the intermediate threshold; for example: starting from 0.2, candidate points are generated in increments of 0.05 up to 0.6, and one of these candidate points is selected as the intermediate threshold;
the preset matrix coordinates are specifically: the first Beta numerical matrix and the second Beta numerical matrix have the same numbers of rows and columns, and any coordinate within that range of rows and columns is selected at random;
then, if the first Beta value at the matrix coordinate in the first Beta numerical matrix is greater than or equal to the intermediate threshold, and the second Beta value at the matrix coordinate in the second Beta numerical matrix is less than or equal to the intermediate threshold, taking the matrix coordinate as a first coordinate to be processed; if the first Beta value at the matrix coordinate is smaller than the intermediate threshold or the second Beta value at the matrix coordinate is larger than the intermediate threshold, taking the matrix coordinate as a second coordinate to be processed;
and calculating the sensitivity of the same site according to the numbers of first coordinates to be processed and second coordinates to be processed, and selecting the same sites whose sensitivity is greater than a preset sensitivity threshold as primary screening sites.
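The counting rule above can be illustrated with a short sketch. It assumes that the Beta values of one site are held as two equal-length lists with one entry per paired sample, and that the sensitivity is the share of first coordinates to be processed among all coordinates, which agrees with the ratio described for the method in this document. The function name `site_sensitivity`, the sample values, and the 0.80 sensitivity threshold used in the example are illustrative assumptions, not part of the original disclosure.

```python
def site_sensitivity(tumor_betas, normal_betas, threshold):
    """Share of paired samples with tumor Beta >= threshold >= normal Beta."""
    if len(tumor_betas) != len(normal_betas) or not tumor_betas:
        raise ValueError("need two non-empty lists of equal length")
    first = 0   # coordinates meeting the condition (first coordinates to be processed)
    second = 0  # all remaining coordinates (second coordinates to be processed)
    for tumor_beta, normal_beta in zip(tumor_betas, normal_betas):
        if tumor_beta >= threshold and normal_beta <= threshold:
            first += 1
        else:
            second += 1
    return first / (first + second)


# Hypothetical Beta values of one site across five paired samples.
tumor = [0.82, 0.75, 0.66, 0.31, 0.90]
normal = [0.10, 0.15, 0.22, 0.12, 0.45]

sensitivity = site_sensitivity(tumor, normal, threshold=0.40)
# Assumed sensitivity threshold of 0.80 for the primary screening decision.
is_primary_screening_site = sensitivity > 0.80
```

Because every coordinate falls into exactly one of the two groups, the denominator equals the number of paired samples.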
The beneficial effects of the above technical scheme are: the preprocessing module of the embodiment of the invention obtains the same sites by matching the sites of the tumor tissue sample data and the normal tissue sample data in the sample data, and then selects the most valuable primary screening sites from the same sites as the preprocessing result according to the sensitivity, thereby improving the accuracy of obtaining the tumor methylation markers.
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, wherein the T-test module executes the following operations:
acquiring a third Beta numerical matrix corresponding to the primary screening site in the tumor tissue sample data;
acquiring a fourth Beta numerical matrix corresponding to the primary screening site in the normal tissue sample data;
performing a paired T test on the third Beta numerical matrix and the fourth Beta numerical matrix of the primary screening site to obtain a P value of the primary screening site;
and taking the P value as a T test result.
The working principle of the technical scheme is as follows:
acquiring a third Beta numerical matrix corresponding to the primary screening site in the tumor tissue sample data; acquiring a fourth Beta numerical matrix corresponding to the primary screening site in the normal tissue sample data; performing a paired T test on the third Beta numerical matrix and the fourth Beta numerical matrix, with the hypothesis that tumor is greater than normal; the T test is specifically: Student's T test, a method mainly used to test the degree of difference between two mean values in small samples, which uses the theory of the T distribution to infer the probability that the difference occurs and thereby judges whether the difference between the two mean values is significant; the P value is specifically: the probability, given that the null hypothesis is true, of obtaining the observed sample result or a more extreme result; a traditional test compares the statistic with a critical value, but the critical value depends on the distribution and the degrees of freedom, whereas the P-value approach does not require looking up a critical value, is simple, and carries more information than the critical value.
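As an illustration of the paired T test described above, the following is a minimal sketch for a single primary screening site, using only the Python standard library. The one-sided P value is obtained by numerically integrating the Student's t density with Simpson's rule, which is an implementation choice made here and not something the original text specifies; the function names and sample values are hypothetical. Where SciPy is available, `scipy.stats.ttest_rel(tumor, normal, alternative="greater")` should give the same test.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def paired_t_greater(tumor, normal, steps=2000):
    """Paired T test; returns (t, one-sided P value) for mean(tumor - normal) > 0."""
    if len(tumor) != len(normal) or len(tumor) < 2:
        raise ValueError("need at least two paired samples")
    diffs = [a - b for a, b in zip(tumor, normal)]
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("all paired differences are identical")
    t = mean(diffs) / (spread / math.sqrt(len(diffs)))
    df = len(diffs) - 1
    # Simpson's rule (steps must be even) for the density integral from 0 to |t|.
    h = abs(t) / steps
    area = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # Upper-tail probability P(T >= t) under the null hypothesis.
    p_value = 0.5 - area if t > 0 else 0.5 + area
    return t, p_value


# Hypothetical Beta values of one primary screening site in five paired samples.
tumor = [0.82, 0.75, 0.66, 0.71, 0.90]
normal = [0.10, 0.15, 0.22, 0.12, 0.45]
t_stat, p_value = paired_t_greater(tumor, normal)
```

In a full run the same calculation would be repeated for each primary screening site, taking that site's values from the third and fourth Beta numerical matrices.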
The beneficial effects of the above technical scheme are: the T test module of the embodiment of the invention performs a T test on the preprocessing result, namely the primary screening site, obtains the P value of the primary screening site, and takes that P value as the T test result; the T test helps to determine whether the difference in the data is significant, and the P value carries more information and requires simpler steps than a comparison against a critical value.
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, wherein the screening module executes the following operations:
acquiring a preset P-value threshold;
if the P value of the primary screening site is less than the P-value threshold, the primary screening site is a tumor methylation marker.
The working principle of the technical scheme is as follows:
the screening module filters the primary screening sites by comparing the P value of each primary screening site with a preset P-value threshold; when the P value of a primary screening site is smaller than the preset P-value threshold, the primary screening site is selected as a tumor methylation marker;
for example: setting the P-value threshold to be 0.001, and selecting a site with the P value less than 0.001 from the primary screening sites as a final tumor methylation marker.
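The filtering step can be sketched as follows, assuming the P values of the primary screening sites are stored in a dictionary keyed by site identifier. The site names and P values are made up for illustration; only the 0.001 threshold comes from the example in the text.

```python
P_VALUE_THRESHOLD = 0.001  # example threshold given in the text


def select_markers(p_values, threshold=P_VALUE_THRESHOLD):
    """Return the sites whose P value is strictly below the threshold."""
    return sorted(site for site, p in p_values.items() if p < threshold)


# Hypothetical T test results for four primary screening sites.
p_values = {
    "site_a": 0.00002,
    "site_b": 0.0400,
    "site_c": 0.00099,
    "site_d": 0.001,
}
markers = select_markers(p_values)
```

The comparison is strict, so a site whose P value equals the threshold (here `site_d`) is not selected, in line with the "less than" wording of the screening rule.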
The beneficial effects of the above technical scheme are: the screening module of the embodiment of the invention compares the P value of the primary screening site with the preset P-value threshold, and selects the primary screening site as the tumor methylation marker if the P value of the primary screening site is smaller than the preset P-value threshold, so that the primary screening site is further inspected and screened, and the accuracy of obtaining the tumor methylation marker is improved.
The embodiment of the invention provides a method for acquiring a tumor methylation marker based on methylation chip data, which comprises the following steps:
step S1: obtaining a methylation chip Beta numerical matrix of the sample tumor tissue and the sample normal tissue corresponding to a common locus of the sample tumor tissue and the sample normal tissue;
step S2: based on the selection of different thresholds, acquiring the sensitivity of the common sites in the tumor tissues and the normal tissues of the sample, and based on the selected sensitivity threshold, selecting the sites for preliminary screening;
step S3: based on the primary screened sites, a T-test is performed and a tumor methylation marker is screened by filtering with a preset P-value threshold.
The working principle of the technical scheme is as follows:
obtaining methylation site sensitivities under different thresholds based on a methylation chip beta numerical value matrix of the sample tumor tissue and the sample normal tissue corresponding to the common site of the sample tumor tissue and the sample normal tissue, and selecting a primary screening site based on the selected sensitivity threshold; and then carrying out T test based on the primary screened sites, and filtering by using a preset P-value threshold value to obtain filtered methylation sites, thereby realizing the screening of the tumor methylation markers.
The beneficial effects of the above technical scheme are:
the invention screens reliable and effective tumor methylation markers based on DNA methylation chip data, obtains the sensitivity of methylation sites by using different thresholds, and performs T test on the basis, thereby improving the reliability and effectiveness of the screening of the tumor methylation markers.
The embodiment of the invention provides a method for acquiring a tumor methylation marker based on methylation chip data, wherein step S1, obtaining a methylation chip Beta numerical matrix of the sample tumor tissue and the sample normal tissue corresponding to a common locus of the sample tumor tissue and the sample normal tissue, comprises:
step S11: obtaining a common site of a sample tumor tissue and a sample normal tissue;
first, in the DNA methylation chip data of paired samples, the methylation chip Beta value is calculated as Beta = M / (M + U + 100), where M is the intensity value from the methylated bead type and U is the intensity value from the unmethylated bead type; the Beta value ranges from 0 (completely unmethylated) to 1 (completely methylated), and the specific meaning of the methylation chip Beta value is:
1. any Beta value equal to or greater than 0.6 is considered fully methylated;
2. any Beta value equal to or less than 0.2 is considered completely unmethylated;
3. Beta values between 0.2 and 0.6 are considered partially methylated;
specifically, based on the sites of the paired tumor samples and normal samples, the sites for which the number of paired samples present in both the tumor sample and the normal sample is within 25% of the total number of paired samples are obtained.
Step S12: based on the above sites, obtaining the corresponding methylation chip Beta numerical matrices of the sites in the sample tumor tissue and the sample normal tissue.
Specifically, a methylation chip Beta value matrix of the common sites in the tumor samples and a methylation chip Beta value matrix of the common sites in the normal samples are obtained; each row corresponds to a sample id, and each column holds the methylation chip Beta values of one site in the samples.
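The Beta value formula, the three interpretation bands, and the matrix layout of step S1 can be illustrated together. This is a sketch under stated assumptions: the intensity pairs, sample ids, and site names are invented, and a nested dictionary stands in for the Beta numerical matrix (rows are sample ids, columns are sites).

```python
def beta_value(methylated, unmethylated, offset=100):
    """Beta = M / (M + U + 100), as given for the methylation chip."""
    return methylated / (methylated + unmethylated + offset)


def methylation_state(beta):
    """Map a Beta value onto the three interpretation bands from the text."""
    if beta >= 0.6:
        return "fully methylated"
    if beta <= 0.2:
        return "completely unmethylated"
    return "partially methylated"


# Hypothetical (methylated, unmethylated) bead intensities per sample and site.
intensities = {
    "sample_1": {"site_a": (9000, 900), "site_b": (400, 7600)},
    "sample_2": {"site_a": (4000, 5900), "site_b": (150, 9850)},
}

# Beta matrix: one row per sample id, one column per site.
beta_matrix = {
    sample_id: {site: beta_value(m, u) for site, (m, u) in row.items()}
    for sample_id, row in intensities.items()
}
states = {
    sample_id: {site: methylation_state(b) for site, b in row.items()}
    for sample_id, row in beta_matrix.items()
}
```

The constant 100 in the denominator keeps the ratio stable when both intensities are small.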
The embodiment of the invention provides a method for acquiring a tumor methylation marker based on methylation chip data, wherein step S2, acquiring the sensitivity of the common sites in the tumor tissues and the normal tissues of the sample based on the selection of different thresholds, and selecting the sites for preliminary screening based on the selected sensitivity threshold, comprises:
step S21: based on the methylation chip Beta numerical matrices of the sample tumor tissue and the sample normal tissue, selecting different thresholds within a certain numerical range, such that the threshold is greater than the methylation chip Beta value in the sample normal tissue and less than the methylation chip Beta value in the sample tumor tissue;
firstly, according to the meaning of the methylation chip Beta value, the threshold range is selected as 0.2 to 0.6, which is used to distinguish whether a site is methylated;
next, a value is selected from the threshold range such that it is greater than the methylation chip Beta value in the sample normal tissue and less than the methylation chip Beta value in the sample tumor tissue.
Step S22: based on the above condition, obtaining the sites for which the ratio (sensitivity) of the number of samples satisfying the condition to the total number of paired samples is greater than a selected sensitivity threshold.
Specifically, for the selected threshold, the primary screening sites are those for which the ratio of the number of samples in which the threshold is greater than the methylation chip Beta value in the sample normal tissue and less than the methylation chip Beta value in the sample tumor tissue to the total number of paired samples is greater than 80%.
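Steps S21 and S22 can be sketched as a sweep over candidate thresholds. Two points here are assumptions rather than statements from the text: the candidates are spaced 0.05 apart between 0.2 and 0.6 (borrowed from the example given for the system embodiment), and a site is kept if its best candidate threshold yields a sensitivity above 80%. The strict inequalities follow the "greater than" and "less than" wording of step S21. Function names, site names, and data are hypothetical.

```python
def threshold_grid(low=0.20, high=0.60, step=0.05):
    """Candidate thresholds from low to high inclusive, rounded to 2 decimals."""
    count = round((high - low) / step)
    return [round(low + i * step, 2) for i in range(count + 1)]


def sensitivity_at(tumor, normal, threshold):
    """Share of paired samples with normal Beta < threshold < tumor Beta."""
    hits = sum(1 for t, n in zip(tumor, normal) if n < threshold < t)
    return hits / len(tumor)


def primary_screen(sites, min_sensitivity=0.80):
    """Keep sites whose best candidate threshold exceeds min_sensitivity."""
    selected = {}
    for site, (tumor, normal) in sites.items():
        best = max(threshold_grid(), key=lambda c: sensitivity_at(tumor, normal, c))
        score = sensitivity_at(tumor, normal, best)
        if score > min_sensitivity:
            selected[site] = (best, score)
    return selected


# Hypothetical Beta values (tumor list, normal list) for five paired samples.
sites = {
    "site_a": ([0.82, 0.75, 0.66, 0.71, 0.90], [0.10, 0.15, 0.22, 0.12, 0.45]),
    "site_b": ([0.30, 0.55, 0.25, 0.62, 0.41], [0.28, 0.35, 0.30, 0.20, 0.44]),
}
screened = primary_screen(sites)
```

When several candidates tie for the best sensitivity, `max` returns the lowest of them; a different tie-breaking rule would be equally compatible with the text.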
The embodiment of the invention provides a method for acquiring a tumor methylation marker based on methylation chip data, wherein step S3, performing a T-test based on the primary screened sites and filtering with a preset P-value threshold to screen for tumor methylation markers, comprises:
step S31: obtaining the P values of the T test for the primary screened sites;
specifically, based on the primary screened sites, the P values of the primary screened sites are obtained from the methylation chip Beta value matrix of the sites in the tumor samples and the methylation chip Beta value matrix of the sites in the paired normal samples by using a paired T test with the hypothesis that tumor is greater than normal;
step S32: filtering by using a preset P-value threshold to obtain the tumor methylation markers.
Specifically, the P-value threshold is set to 0.001, and the sites whose P value is less than 0.001 are obtained as the final tumor methylation markers.
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, which comprises:
a Beta numerical matrix obtaining unit for obtaining a Beta numerical matrix of the common sites in the matched sample tumor tissue and the sample normal tissue;
the sensitivity site primary screening acquisition unit selects different thresholds of a certain numerical range based on corresponding Beta numerical matrixes in the matched sample tumor tissue and sample normal tissue, and acquires a primary screening methylation site with the sensitivity greater than the selected sensitivity threshold;
and the tumor methylation marker determining unit calculates the T-test P values of the primary screened methylation sites, and filters them by using a preset P-value threshold to obtain the tumor methylation markers.
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, wherein a Beta numerical matrix acquisition unit comprises:
a calculating subunit, configured to calculate a common site in the sample tumor tissue and the sample normal tissue;
and an obtaining subunit, configured to obtain the corresponding Beta value matrices of the common sites in the sample tumor tissue and the sample normal tissue.
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, wherein a sensitivity site primary screening acquisition unit comprises:
a calculating subunit for calculating the sensitivity of the site in the paired sample tissues;
and an obtaining subunit, configured to obtain the primary screened methylation sites based on the sensitivity and a selected sensitivity threshold.
The embodiment of the invention provides a system for acquiring a tumor methylation marker based on methylation chip data, wherein a tumor methylation marker determining unit comprises:
a P value calculating subunit, configured to calculate the P values of the T test for the primary screening sites;
and a tumor methylation marker acquisition subunit, which is used for filtering by using a preset P-value threshold value based on the P value to acquire the tumor methylation marker.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (8)
1. A method for obtaining tumor methylation markers based on methylation chip data, comprising:
obtaining sample data in the methylation chip;
preprocessing the sample data;
performing a T test on the preprocessing result;
and screening out tumor methylation markers from the preprocessing result according to the T test result.
2. The method of claim 1, wherein the pre-processing the sample data comprises:
extracting tumor tissue sample data and normal tissue sample data in the sample data;
matching the first site of the tumor tissue sample data with the second site of the normal tissue sample data according to a preset site matching rule to obtain a plurality of identical sites;
acquiring a first Beta numerical matrix corresponding to the same site in the tumor tissue sample data;
acquiring a second Beta numerical matrix corresponding to the same site in the normal tissue sample data;
acquiring a preset intermediate threshold and a matrix coordinate;
acquiring a first Beta value corresponding to the matrix coordinate in the first Beta value matrix;
acquiring a second Beta value corresponding to the matrix coordinate in the second Beta value matrix;
if the first Beta value of the matrix coordinate is larger than or equal to the intermediate threshold value and the second Beta value of the matrix coordinate is smaller than or equal to the intermediate threshold value, taking the matrix coordinate as a first coordinate to be processed;
if the first Beta value of the matrix coordinate is smaller than the intermediate threshold value or the second Beta value of the matrix coordinate is larger than the intermediate threshold value, taking the matrix coordinate as a second coordinate to be processed;
acquiring a first number of the first coordinates to be processed and a second number of the second coordinates to be processed;
based on the first number and the second number, calculating the sensitivity of the same site:
$$S = \frac{A}{A + B}$$
wherein S is the sensitivity, A is the first number, and B is the second number;
if the sensitivity is greater than a preset sensitivity threshold, the same site is a primary screening site;
and taking the primary screening site as the preprocessing result.
3. The method of claim 2, wherein the T-test of the pre-processing result comprises:
acquiring a third Beta numerical matrix corresponding to the primary screening site in the tumor tissue sample data;
acquiring a fourth Beta numerical matrix corresponding to the primary screening site in the normal tissue sample data;
performing a paired T test on the third Beta numerical matrix and the fourth Beta numerical matrix of the primary screening site to obtain a P value of the primary screening site;
and taking the P value as a T test result.
4. The method of claim 3, wherein the screening of the tumor methylation markers from the preprocessing result according to the T test result comprises:
acquiring a preset P-value threshold;
if the P value of the primary screening site is less than the P-value threshold, the primary screening site is a tumor methylation marker.
5. A system for obtaining tumor methylation markers based on methylation chip data, comprising:
the acquisition module is used for acquiring sample data in the methylation chip;
the preprocessing module is used for preprocessing the sample data;
the T test module is used for performing a T test on the preprocessing result;
and the screening module is used for screening the tumor methylation markers from the preprocessing result according to the T test result.
6. The system of claim 5, wherein the preprocessing module performs operations comprising:
extracting tumor tissue sample data and normal tissue sample data in the sample data;
matching the first site of the tumor tissue sample data with the second site of the normal tissue sample data according to a preset site matching rule to obtain a plurality of identical sites;
acquiring a first Beta numerical matrix corresponding to the same site in the tumor tissue sample data;
acquiring a second Beta numerical matrix corresponding to the same site in the normal tissue sample data;
acquiring a preset intermediate threshold and a matrix coordinate;
acquiring a first Beta value corresponding to the matrix coordinate in the first Beta value matrix;
acquiring a second Beta value corresponding to the matrix coordinate in the second Beta value matrix;
if the first Beta value of the matrix coordinate is larger than or equal to the intermediate threshold value and the second Beta value of the matrix coordinate is smaller than or equal to the intermediate threshold value, taking the matrix coordinate as a first coordinate to be processed;
if the first Beta value of the matrix coordinate is smaller than the intermediate threshold value or the second Beta value of the matrix coordinate is larger than the intermediate threshold value, taking the matrix coordinate as a second coordinate to be processed;
acquiring a first number of the first coordinates to be processed and a second number of the second coordinates to be processed;
based on the first number and the second number, calculating the sensitivity of the same site:
$$S = \frac{A}{A + B}$$
wherein S is the sensitivity, A is the first number, and B is the second number;
if the sensitivity is greater than a preset sensitivity threshold, the same site is a primary screening site;
and taking the primary screening site as the preprocessing result.
7. The system of claim 5, wherein the T-test module performs operations comprising:
acquiring a third Beta numerical matrix corresponding to the primary screening site in the tumor tissue sample data;
acquiring a fourth Beta numerical matrix corresponding to the primary screening site in the normal tissue sample data;
performing a paired T test on the third Beta numerical matrix and the fourth Beta numerical matrix of the primary screening site to obtain a P value of the primary screening site;
and taking the P value as a T test result.
8. The system of claim 5, wherein the screening module performs the following operations:
acquiring a preset P-value threshold;
if the P value of the primary screening site is less than the P-value threshold, the primary screening site is a tumor methylation marker.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011100217.1A CN112037854B (en) | 2020-10-15 | 2020-10-15 | Method and system for obtaining tumor methylation marker based on methylation chip data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112037854A (en) | 2020-12-04 |
CN112037854B (en) | 2024-04-09 |
Family
ID=73573657
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011100217.1A Active CN112037854B (en) | 2020-10-15 | 2020-10-15 | Method and system for obtaining tumor methylation marker based on methylation chip data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112037854B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103409514A (en) * | 2013-07-23 | 2013-11-27 | 徐州医学院 | Chip-based 5-hydroxymethylated cytosine detection method with high flux and high sensitivity |
CN107119144A (en) * | 2017-07-05 | 2017-09-01 | 昆明医科大学第附属医院 | Multi-functional transcription regulatory factor CTCF DNA binding sites CTCF_55 application |
CN107677831A (en) * | 2017-06-28 | 2018-02-09 | 深圳市龙岗中心医院 | The method for determining the diagnosis marker for assessing schizophrenia patients |
CN109616198A (en) * | 2018-12-28 | 2019-04-12 | 陈洪亮 | It is only used for the choosing method of the special DNA methylation assay Sites Combination of the single cancer kind screening of liver cancer |
CN109825584A (en) * | 2019-03-01 | 2019-05-31 | 清华大学 | DNA methylation marker object and its application using peripheral blood diagnosis early liver cancer |
US20190256921A1 (en) * | 2016-05-04 | 2019-08-22 | Queen's University At Kingston | Cell-free detection of methylated tumour dna |
US20200131582A1 (en) * | 2016-06-07 | 2020-04-30 | The Regents Of The University Of California | Cell-free dna methylation patterns for disease and condition analysis |
CN111440869A (en) * | 2020-03-16 | 2020-07-24 | 武汉百药联科科技有限公司 | DNA methylation marker for predicting primary breast cancer occurrence risk and screening method and application thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |