CN112037854B - Method and system for obtaining tumor methylation marker based on methylation chip data - Google Patents


Info

Publication number
CN112037854B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011100217.1A
Other languages
Chinese (zh)
Other versions
CN112037854A (en)
Inventor
李霞
万季
陈文霞
吴大英
祁淑英
杨楚钦
胡桂林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
LONGGANG DISTRICT CENTRAL HOSPITAL OF SHENZHEN
Original Assignee
LONGGANG DISTRICT CENTRAL HOSPITAL OF SHENZHEN
Application filed by LONGGANG DISTRICT CENTRAL HOSPITAL OF SHENZHEN
Priority to CN202011100217.1A
Publication of CN112037854A
Application granted
Publication of CN112037854B

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Abstract

The invention provides a method and a system for acquiring tumor methylation markers based on methylation chip data. The method comprises the following steps: acquiring sample data from a methylation chip; preprocessing the sample data; performing a T test on the preprocessing result; and screening tumor methylation markers from the preprocessing result according to the T test result. The system comprises modules corresponding to the method steps. Because the tumor methylation markers are screened from DNA methylation chip data, the reliability and effectiveness of the screening are improved; the screening method is simple and feasible and can be widely applied in the field of medical computer applications.

Description

Method and system for obtaining tumor methylation marker based on methylation chip data
Technical Field
The invention relates to the technical field of methylation, in particular to a method and a system for acquiring tumor methylation markers based on methylation chip data.
Background
DNA methylation modifies DNA bases and thereby affects the binding between DNA and proteins, so it plays an important role in normal development and in the development of disease. Research shows that abnormal DNA methylation levels are closely related to the occurrence of tumors, so the search for markers for early tumor diagnosis in tumor DNA methylation patterns has been a research hotspot in recent years. High-throughput methylation quantification platforms in common use today include methylation chip platforms and high-throughput methylation sequencing platforms. Compared with a sequencing platform, a methylation chip platform has low cost and high sensitivity and can be applied to larger sets of clinical samples to mine more representative methylation markers. Typically, a simple statistical hypothesis test is used to determine the sites whose methylation signals differ significantly between normal and tumor tissue populations; however, because this approach does not take specific methylation signal profiles into account, it may produce false positives. Improving the existing approach so as to screen out methylation markers for early tumor diagnosis with higher precision is therefore of great significance.
Disclosure of Invention
The invention aims at providing a method and a system for acquiring tumor methylation markers based on methylation chip data, and screening tumor methylation markers based on DNA methylation chip data, so that the reliability and the effectiveness of screening tumor methylation markers are improved, and the screening method is simple, convenient and feasible and can be widely applied to the field of medical computer application.
The method for acquiring tumor methylation markers based on methylation chip data provided by the embodiment of the invention comprises the following steps:
acquiring sample data from the methylation chip;
preprocessing the sample data;
performing a T test on the preprocessing result;
and screening tumor methylation markers from the preprocessing result according to the T test result.
Preferably, preprocessing the sample data includes:
extracting tumor tissue sample data and normal tissue sample data from the sample data;
matching first sites of the tumor tissue sample data with second sites of the normal tissue sample data according to a preset site matching rule to obtain a plurality of identical sites;
acquiring a first Beta value matrix corresponding to the same sites in the tumor tissue sample data;
acquiring a second Beta value matrix corresponding to the same sites in the normal tissue sample data;
acquiring a preset intermediate threshold and a matrix coordinate;
acquiring a first Beta value corresponding to the matrix coordinate in the first Beta value matrix;
acquiring a second Beta value corresponding to the matrix coordinate in the second Beta value matrix;
if the first Beta value at the matrix coordinate is greater than or equal to the intermediate threshold and the second Beta value is less than or equal to the intermediate threshold, taking the matrix coordinate as a first coordinate to be processed;
if the first Beta value at the matrix coordinate is less than the intermediate threshold or the second Beta value is greater than the intermediate threshold, taking the matrix coordinate as a second coordinate to be processed;
acquiring a first number of the first coordinates to be processed and a second number of the second coordinates to be processed;
calculating the sensitivity of the same site based on the first number and the second number:
wherein S is the sensitivity, A is the first number, and B is the second number;
if the sensitivity is greater than a preset sensitivity threshold, taking the same site as a primary screening site;
taking the primary screening sites as the preprocessing result.
Preferably, performing the T test on the preprocessing result includes:
obtaining a third Beta value matrix corresponding to the primary screening site in the tumor tissue sample data;
acquiring a fourth Beta value matrix corresponding to the primary screening site in the normal tissue sample data;
performing a paired T test on the third Beta value matrix and the fourth Beta value matrix of the primary screening site to obtain a P value of the primary screening site;
taking the P value as the T test result.
Preferably, said screening tumor methylation markers from said pretreatment results based on T-test results comprises:
acquiring a preset P-value threshold;
if the P value of the primary screening site is smaller than the P-value threshold, the primary screening site is a tumor methylation marker.
The system for acquiring tumor methylation markers based on methylation chip data provided by the embodiment of the invention comprises:
the acquisition module is used for acquiring sample data in the methylation chip;
the preprocessing module is used for preprocessing the sample data;
the T test module is used for T testing the pretreatment result;
and the screening module is used for screening tumor methylation markers from the pretreatment result according to the T test result.
Preferably, the preprocessing module performs operations including:
extracting tumor tissue sample data and normal tissue sample data from the sample data;
matching first sites of the tumor tissue sample data with second sites of the normal tissue sample data according to a preset site matching rule to obtain a plurality of identical sites;
acquiring a first Beta value matrix corresponding to the same sites in the tumor tissue sample data;
acquiring a second Beta value matrix corresponding to the same sites in the normal tissue sample data;
acquiring a preset intermediate threshold and a matrix coordinate;
acquiring a first Beta value corresponding to the matrix coordinate in the first Beta value matrix;
acquiring a second Beta value corresponding to the matrix coordinate in the second Beta value matrix;
if the first Beta value at the matrix coordinate is greater than or equal to the intermediate threshold and the second Beta value is less than or equal to the intermediate threshold, taking the matrix coordinate as a first coordinate to be processed;
if the first Beta value at the matrix coordinate is less than the intermediate threshold or the second Beta value is greater than the intermediate threshold, taking the matrix coordinate as a second coordinate to be processed;
acquiring a first number of the first coordinates to be processed and a second number of the second coordinates to be processed;
calculating the sensitivity of the same site based on the first number and the second number:
wherein S is the sensitivity, A is the first number, and B is the second number;
if the sensitivity is greater than a preset sensitivity threshold, taking the same site as a primary screening site;
taking the primary screening sites as the preprocessing result.
Preferably, the T test module performs operations including:
obtaining a third Beta value matrix corresponding to the primary screening site in the tumor tissue sample data;
acquiring a fourth Beta value matrix corresponding to the primary screening site in the normal tissue sample data;
performing a paired T test on the third Beta value matrix and the fourth Beta value matrix of the primary screening site to obtain a P value of the primary screening site;
taking the P value as the T test result.
Preferably, the screening module performs operations including:
acquiring a preset P-value threshold;
if the P value of the primary screening site is smaller than the P-value threshold, the primary screening site is a tumor methylation marker.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a flow chart of a method for obtaining tumor methylation markers based on methylation chip data in an embodiment of the invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
The embodiment of the invention provides a method for acquiring tumor methylation markers based on methylation chip data which, as shown in FIG. 1, comprises the following steps:
step S101, acquiring sample data in the methylation chip;
step S102, preprocessing the sample data;
step S103, T-test is carried out on the pretreatment result;
step S104, screening tumor methylation markers from the pretreatment results according to T test results.
The working principle of the technical scheme is as follows:
a methylation chip is used to detect hybridization signals of DNA sequences after bisulfite treatment; first, sample data is acquired from the methylation chip; the sample data includes tumor tissue sample data and normal tissue sample data; then, the sample data obtained from the methylation chip is preprocessed; the aim of the preprocessing is to screen out valuable sites in the sample data; then, a T test is performed on the preprocessing result; the purpose of the T test is to obtain the P value of the preprocessing result data; and finally, tumor methylation markers are screened from the preprocessing result according to the T test result, namely the P value of the preprocessing result data.
The beneficial effects of the technical scheme are as follows: according to the embodiment of the invention, the sample data obtained from the methylation chip is preprocessed, the preprocessing result is subjected to T test, and finally the tumor methylation marker is screened from the preprocessing result according to the T test result, so that the screening of the tumor methylation marker based on the DNA methylation chip data is completed, the reliability and the effectiveness of the screening of the tumor methylation marker are improved, and the screening method is convenient and feasible and can be widely applied to the field of medical computer application.
The embodiment of the invention provides a method for acquiring tumor methylation markers based on methylation chip data, wherein preprocessing the sample data comprises the following steps:
extracting tumor tissue sample data and normal tissue sample data from the sample data;
matching first sites of the tumor tissue sample data with second sites of the normal tissue sample data according to a preset site matching rule to obtain a plurality of identical sites;
acquiring a first Beta value matrix corresponding to the same sites in the tumor tissue sample data;
acquiring a second Beta value matrix corresponding to the same sites in the normal tissue sample data;
acquiring a preset intermediate threshold and a matrix coordinate;
acquiring a first Beta value corresponding to the matrix coordinate in the first Beta value matrix;
acquiring a second Beta value corresponding to the matrix coordinate in the second Beta value matrix;
if the first Beta value at the matrix coordinate is greater than or equal to the intermediate threshold and the second Beta value is less than or equal to the intermediate threshold, taking the matrix coordinate as a first coordinate to be processed;
if the first Beta value at the matrix coordinate is less than the intermediate threshold or the second Beta value is greater than the intermediate threshold, taking the matrix coordinate as a second coordinate to be processed;
acquiring a first number of the first coordinates to be processed and a second number of the second coordinates to be processed;
calculating the sensitivity of the same site based on the first number and the second number:
wherein S is the sensitivity, A is the first number, and B is the second number;
if the sensitivity is greater than a preset sensitivity threshold, taking the same site as a primary screening site;
taking the primary screening sites as the preprocessing result.
The working principle of the technical scheme is as follows:
the sample data in the methylation chip contains tumor tissue sample data and normal tissue sample data; the preset site matching rule is specifically as follows:
first, Beta values are calculated for the sites in the tumor tissue sample data and the sites in the normal tissue sample data;
Beta values range from 0 to 1;
the meaning of the Beta values is:
1. any Beta value greater than or equal to 0.6 represents complete methylation;
2. any Beta value less than or equal to 0.2 represents complete unmethylation;
3. Beta values between 0.2 and 0.6 represent partial methylation;
all Beta values of a given site in the tumor tissue sample data are paired with all Beta values of that site in the normal tissue sample data according to the Beta value meaning, and the site is taken as a same site when the number of paired Beta values is within 25% of the total number;
then, a first Beta value matrix corresponding to the same sites in the tumor tissue sample data is obtained, and a second Beta value matrix corresponding to the same sites in the normal tissue sample data is obtained;
the preset intermediate threshold is specifically as follows: according to the Beta value meaning, the range of 0.2 to 0.6 is selected as the threshold range for distinguishing whether a site is methylated, and a value from this range is selected as the intermediate threshold; for example, starting from 0.2, candidate points are generated in increments of 0.05 until 0.6 is reached, and one of these candidate points is selected as the intermediate threshold;
the preset matrix coordinate is specifically as follows: the first Beta value matrix and the second Beta value matrix have the same numbers of rows and columns, and a coordinate is randomly selected within that range of rows and columns;
then, if the first Beta value corresponding to the matrix coordinate in the first Beta value matrix is greater than or equal to the intermediate threshold, and the second Beta value corresponding to the matrix coordinate in the second Beta value matrix is less than or equal to the intermediate threshold, the matrix coordinate is taken as a first coordinate to be processed; if the first Beta value at the matrix coordinate is less than the intermediate threshold or the second Beta value is greater than the intermediate threshold, the matrix coordinate is taken as a second coordinate to be processed;
finally, the sensitivity of each same site is calculated from the numbers of first coordinates to be processed and second coordinates to be processed, and the same sites whose sensitivity is greater than the preset sensitivity threshold are selected as primary screening sites.
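The Beta value meanings and the 0.05-step selection of candidate intermediate thresholds described above can be sketched as follows. This is a minimal Python illustration only; the function names `classify_beta` and `candidate_thresholds` and the state labels are hypothetical and do not appear in the original text.

```python
def classify_beta(beta, low=0.2, high=0.6):
    """Map a Beta value in [0, 1] to the methylation state given in the text."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("Beta value must lie in [0, 1]")
    if beta >= high:
        return "methylated"      # Beta >= 0.6: complete methylation
    if beta <= low:
        return "unmethylated"    # Beta <= 0.2: complete unmethylation
    return "partial"             # strictly between 0.2 and 0.6


def candidate_thresholds(low=0.2, high=0.6, step=0.05):
    """List candidate intermediate thresholds from low to high in fixed steps."""
    count = round((high - low) / step)
    # Rounding to two decimals avoids floating-point drift such as 0.35000000000000003.
    return [round(low + i * step, 2) for i in range(count + 1)]
```

With the default arguments, `candidate_thresholds()` yields nine candidate points from 0.2 to 0.6; which of them is used as the intermediate threshold is left to the practitioner, as in the text.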
The beneficial effects of the technical scheme are as follows: according to the embodiment of the invention, the same sites are obtained by matching the sites of the tumor tissue sample data and the normal tissue sample data in the sample data, and the most valuable primary screening sites are screened out of the same sites according to the sensitivity to serve as a pretreatment result, so that the accuracy of obtaining tumor methylation markers is improved.
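As an illustration of the sensitivity-based primary screening, the following Python sketch counts first and second coordinates for one site and derives a sensitivity from them. The sensitivity formula is not reproduced in the text, so the sketch assumes S = A / (A + B); that formula, the function names, the dictionary layout, and the choice to scan every paired position instead of a randomly selected coordinate are all assumptions made for illustration.

```python
def site_sensitivity(tumor_betas, normal_betas, threshold):
    """Return the assumed sensitivity S = A / (A + B) for one site.

    A counts positions where the tumor Beta value is >= threshold and the
    normal Beta value is <= threshold; B counts all other positions.
    """
    if len(tumor_betas) != len(normal_betas) or not tumor_betas:
        raise ValueError("need two non-empty Beta value lists of equal length")
    first = sum(
        1 for t, n in zip(tumor_betas, normal_betas)
        if t >= threshold and n <= threshold
    )
    second = len(tumor_betas) - first
    return first / (first + second)


def primary_screen(sites, threshold, min_sensitivity):
    """Keep the site identifiers whose sensitivity exceeds min_sensitivity.

    `sites` maps a site identifier to a (tumor_betas, normal_betas) pair.
    """
    return [
        site_id
        for site_id, (tumor, normal) in sites.items()
        if site_sensitivity(tumor, normal, threshold) > min_sensitivity
    ]
```

For example, with an intermediate threshold of 0.4, tumor values `[0.8, 0.7, 0.3, 0.9]` and normal values `[0.1, 0.2, 0.1, 0.7]` give A = 2 and B = 2, so the assumed sensitivity is 0.5.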
The embodiment of the invention provides a method for acquiring tumor methylation markers based on methylation chip data, wherein performing the T test on the preprocessing result comprises the following steps:
obtaining a third Beta value matrix corresponding to the primary screening site in the tumor tissue sample data;
acquiring a fourth Beta value matrix corresponding to the primary screening site in the normal tissue sample data;
performing a paired T test on the third Beta value matrix and the fourth Beta value matrix of the primary screening site to obtain a P value of the primary screening site;
taking the P value as the T test result.
The working principle of the technical scheme is as follows:
a third Beta value matrix corresponding to the primary screening site in the tumor tissue sample data is obtained; a fourth Beta value matrix corresponding to the primary screening site in the normal tissue sample data is obtained; a paired T test is performed on the third Beta value matrix and the fourth Beta value matrix, with the hypothesis that tumor > normal; the T test is specifically Student's T test, a method mainly used to test the difference between two means of small samples, which uses the theory of the T distribution to infer the probability that the difference arises, so as to judge whether the difference between the two means is significant; the P value is specifically the probability of observing the sample result, or a result more extreme than it, assuming the null hypothesis is true; whereas a conventional test requires comparing the statistic with a critical value that changes with the distribution and the degrees of freedom, the method using the P value does not require looking up a critical value, and is simpler and more informative.
The beneficial effects of the technical scheme are as follows: according to the embodiment of the invention, the pre-treatment result, namely the primary screening site is subjected to T test, so that the P value of the primary screening site is obtained, the P value of the primary screening site is used as the T test result, and the T test can help to analyze whether the data difference is obvious or not, has more information and is relatively simple in steps.
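The paired T test with the hypothesis tumor > normal can be sketched in Python as follows. The sketch uses only the standard library and obtains the one-sided P value by numerically integrating the T density with Simpson's rule, which is adequate for illustration; in practice a library routine such as SciPy's `scipy.stats.ttest_rel` with `alternative="greater"` would normally be preferred. The function names and the per-site list layout are assumptions, not part of the original text.

```python
import math
from statistics import mean, stdev


def _t_pdf(x, df):
    """Density of Student's T distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def _upper_tail(t, df):
    """Approximate P(T >= t) by Simpson's rule on the interval [0, |t|]."""
    a = abs(t)
    steps = 2 * max(1000, int(a * 100))   # even number of sub-intervals
    h = a / steps
    total = _t_pdf(0.0, df) + _t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * _t_pdf(i * h, df)
    area = total * h / 3
    p = 0.5 - area if t >= 0 else 0.5 + area
    return min(1.0, max(0.0, p))


def paired_t_test_greater(tumor, normal):
    """Return (t statistic, one-sided P value) for the hypothesis tumor > normal."""
    if len(tumor) != len(normal) or len(tumor) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [t - n for t, n in zip(tumor, normal)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("paired differences have zero variance")
    t_stat = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t_stat, _upper_tail(t_stat, len(diffs) - 1)
```

The function is applied once per primary screening site, with the tumor and normal Beta values of that site paired sample by sample.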
The embodiment of the invention provides a method for acquiring tumor methylation markers based on methylation chip data, wherein screening tumor methylation markers from the preprocessing result according to the T test result comprises the following steps:
acquiring a preset P-value threshold;
if the P value of the primary screening site is smaller than the P-value threshold, the primary screening site is a tumor methylation marker.
The working principle of the technical scheme is as follows:
filtering the primary screening sites according to the magnitude relation between the preset P-value threshold and the P value of the primary screening sites; when the P value of the primary screening site is smaller than a preset P-value threshold, selecting the primary screening site as a tumor methylation marker;
for example: the P-value threshold is set to 0.001, and the sites with a P value less than 0.001 are selected from the primary screening sites as the final tumor methylation markers.
The beneficial effects of the technical scheme are as follows: according to the embodiment of the invention, the P value of the primary screening site is compared with the preset P-value threshold, and if the P value of the primary screening site is smaller than the preset P-value threshold, the primary screening site is selected as the tumor methylation marker, so that the primary screening site is further inspected and screened, and the accuracy of acquiring the tumor methylation marker is improved.
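The final screening step reduces to a strict comparison of each P value against the preset threshold. A minimal Python sketch follows; the function name `select_markers` and the dictionary of P values keyed by site identifier are hypothetical, and the default threshold of 0.001 is taken from the example in the text.

```python
def select_markers(p_values, p_threshold=0.001):
    """Return the site identifiers whose P value is strictly below the threshold.

    `p_values` maps a primary screening site identifier to its P value.
    """
    return sorted(site for site, p in p_values.items() if p < p_threshold)
```

Because the comparison is strict, a site whose P value equals the threshold exactly is not selected, which matches the "smaller than" wording of the text.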
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, which comprises:
the acquisition module is used for acquiring sample data in the methylation chip;
the preprocessing module is used for preprocessing the sample data;
the T test module is used for T testing the pretreatment result;
and the screening module is used for screening tumor methylation markers from the pretreatment result according to the T test result.
The working principle of the technical scheme is as follows:
the system of the embodiment of the invention consists of an acquisition module, a preprocessing module, a T test module and a screening module; a methylation chip is used to detect hybridization signals of DNA sequences after bisulfite treatment; first, the acquisition module acquires sample data from the methylation chip; the sample data includes tumor tissue sample data and normal tissue sample data; then, the preprocessing module preprocesses the sample data obtained from the methylation chip; the aim of the preprocessing is to screen out valuable sites in the sample data; then, the T test module performs a T test on the preprocessing result; the purpose of the T test is to obtain the P value of the preprocessing result data; and finally, the screening module screens tumor methylation markers from the preprocessing result according to the T test result, namely the P value of the preprocessing result data.
The beneficial effects of the technical scheme are as follows: according to the embodiment of the invention, the sample data obtained from the methylation chip is preprocessed, the preprocessing result is subjected to T test, and finally the tumor methylation marker is screened from the preprocessing result according to the T test result, so that the screening of the tumor methylation marker based on the DNA methylation chip data is completed, the reliability and the effectiveness of the screening of the tumor methylation marker are improved, and the screening method is convenient and feasible and can be widely applied to the field of medical computer application.
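One way to picture the four-module system is as a small container that chains the modules in order. The Python sketch below is only an illustration under assumed interfaces: the class name `MarkerSystem`, the `SiteData` layout (site identifier mapped to tumor and normal Beta value lists), and the callable signatures are hypothetical and are not specified in the original text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical layout: site identifier -> (tumor Beta values, normal Beta values)
SiteData = Dict[str, Tuple[List[float], List[float]]]


@dataclass
class MarkerSystem:
    """Chains the acquisition, preprocessing, T test and screening modules."""
    acquire: Callable[[], SiteData]
    preprocess: Callable[[SiteData], SiteData]
    t_test: Callable[[SiteData], Dict[str, float]]
    screen: Callable[[Dict[str, float]], List[str]]

    def run(self) -> List[str]:
        data = self.acquire()                # acquisition module
        screened = self.preprocess(data)     # preprocessing module
        p_values = self.t_test(screened)     # T test module
        return self.screen(p_values)         # screening module
```

Keeping each module as an independent callable mirrors the module-per-step structure described above and lets each step be replaced or tested separately.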
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, and the preprocessing module performs the following operations:
extracting tumor tissue sample data and normal tissue sample data from the sample data;
matching first sites of the tumor tissue sample data with second sites of the normal tissue sample data according to a preset site matching rule to obtain a plurality of identical sites;
acquiring a first Beta value matrix corresponding to the same sites in the tumor tissue sample data;
acquiring a second Beta value matrix corresponding to the same sites in the normal tissue sample data;
acquiring a preset intermediate threshold and a matrix coordinate;
acquiring a first Beta value corresponding to the matrix coordinate in the first Beta value matrix;
acquiring a second Beta value corresponding to the matrix coordinate in the second Beta value matrix;
if the first Beta value at the matrix coordinate is greater than or equal to the intermediate threshold and the second Beta value is less than or equal to the intermediate threshold, taking the matrix coordinate as a first coordinate to be processed;
if the first Beta value at the matrix coordinate is less than the intermediate threshold or the second Beta value is greater than the intermediate threshold, taking the matrix coordinate as a second coordinate to be processed;
acquiring a first number of the first coordinates to be processed and a second number of the second coordinates to be processed;
calculating the sensitivity of the same site based on the first number and the second number:
wherein S is the sensitivity, A is the first number, and B is the second number;
if the sensitivity is greater than a preset sensitivity threshold, taking the same site as a primary screening site;
taking the primary screening sites as the preprocessing result.
The working principle of the technical scheme is as follows:
the sample data in the methylation chip contains tumor tissue sample data and normal tissue sample data; the preset site matching rule is specifically as follows:
first, Beta values are calculated for the sites in the tumor tissue sample data and the sites in the normal tissue sample data;
Beta values range from 0 to 1;
the meaning of the Beta values is:
1. any Beta value greater than or equal to 0.6 represents complete methylation;
2. any Beta value less than or equal to 0.2 represents complete unmethylation;
3. Beta values between 0.2 and 0.6 represent partial methylation;
all Beta values of a given site in the tumor tissue sample data are paired with all Beta values of that site in the normal tissue sample data according to the Beta value meaning, and the site is taken as a same site when the number of paired Beta values is within 25% of the total number;
then, a first Beta value matrix corresponding to the same sites in the tumor tissue sample data is obtained, and a second Beta value matrix corresponding to the same sites in the normal tissue sample data is obtained;
the preset intermediate threshold is specifically as follows: according to the Beta value meaning, the range of 0.2 to 0.6 is selected as the threshold range for distinguishing whether a site is methylated, and a value from this range is selected as the intermediate threshold; for example, starting from 0.2, candidate points are generated in increments of 0.05 until 0.6 is reached, and one of these candidate points is selected as the intermediate threshold;
the preset matrix coordinate is specifically as follows: the first Beta value matrix and the second Beta value matrix have the same numbers of rows and columns, and a coordinate is randomly selected within that range of rows and columns;
then, if the first Beta value corresponding to the matrix coordinate in the first Beta value matrix is greater than or equal to the intermediate threshold, and the second Beta value corresponding to the matrix coordinate in the second Beta value matrix is less than or equal to the intermediate threshold, the matrix coordinate is taken as a first coordinate to be processed; if the first Beta value at the matrix coordinate is less than the intermediate threshold or the second Beta value is greater than the intermediate threshold, the matrix coordinate is taken as a second coordinate to be processed;
finally, the sensitivity of each same site is calculated from the numbers of first coordinates to be processed and second coordinates to be processed, and the same sites whose sensitivity is greater than the preset sensitivity threshold are selected as primary screening sites.
The beneficial effects of the technical scheme are as follows: the preprocessing module provided by the embodiment of the invention obtains the same sites by matching the sites of the tumor tissue sample data and the normal tissue sample data in the sample data, and then screens out the most valuable preliminary screening sites from the same sites according to the sensitivity as a preprocessing result, thereby improving the accuracy of obtaining tumor methylation markers.
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, and the T test module performs the following operations:
obtaining a third Beta value matrix corresponding to the primary screening site in the tumor tissue sample data;
acquiring a fourth Beta value matrix corresponding to the primary screening site in the normal tissue sample data;
performing a paired T test on the third Beta value matrix and the fourth Beta value matrix of the primary screening site to obtain a P value of the primary screening site;
taking the P value as the T test result.
The working principle of the technical scheme is as follows:
a third Beta value matrix corresponding to the primary screening site in the tumor tissue sample data is obtained; a fourth Beta value matrix corresponding to the primary screening site in the normal tissue sample data is obtained; a paired T test is performed on the third Beta value matrix and the fourth Beta value matrix, with the hypothesis that tumor > normal; the T test is specifically Student's T test, a method mainly used to test the difference between two means of small samples, which uses the theory of the T distribution to infer the probability that the difference arises, so as to judge whether the difference between the two means is significant; the P value is specifically the probability of observing the sample result, or a result more extreme than it, assuming the null hypothesis is true; whereas a conventional test requires comparing the statistic with a critical value that changes with the distribution and the degrees of freedom, the method using the P value does not require looking up a critical value, and is simpler and more informative.
The beneficial effects of the technical scheme are as follows: the T test module of the embodiment of the invention obtains the P value of the primary screening site by carrying out T test on the pretreatment result, namely the primary screening site, takes the P value of the primary screening site as the T test result, and can help to analyze whether the data difference is obvious or not, has more information and has relatively simple steps.
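The paired, one-sided test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `t_upper_tail` and `paired_t_p_value` are hypothetical, the input is one site's Beta values across matched tumor and normal samples, and the tail probability is obtained by numerically integrating the Student t density with Simpson's rule so that only the Python standard library is needed.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """Return P(T >= t) for a Student t distribution with df degrees of freedom."""
    # Log of the normalising constant of the t density (lgamma avoids overflow).
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    # Simpson's rule on [0, |t|]; steps must be even.
    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    # The density is symmetric, so the area on [0, |t|] gives either tail.
    return 0.5 - area if t >= 0 else 0.5 + area


def paired_t_p_value(tumor: Sequence[float], normal: Sequence[float]) -> float:
    """One-sided paired t test P value for the alternative tumor > normal."""
    if len(tumor) != len(normal) or len(tumor) < 2:
        raise ValueError("need at least two matched tumor/normal pairs")
    diffs = [t - n for t, n in zip(tumor, normal)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t_stat = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t_upper_tail(t_stat, len(diffs) - 1)
```

In practice a statistics library would normally be used instead of hand-written integration, for example SciPy's `scipy.stats.ttest_rel` with `alternative="greater"`.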
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, and the screening module performs the following operations:
acquiring a preset P-value threshold;
if the P value of the primary screening site is smaller than the P-value threshold, the primary screening site is a tumor methylation marker.
The working principle of the technical scheme is as follows:
the screening module filters the primary screening sites according to the magnitude relation between the preset P-value threshold and the P value of the primary screening sites; when the P value of the primary screening site is smaller than a preset P-value threshold, selecting the primary screening site as a tumor methylation marker;
for example: the P-value threshold was set at 0.001 and a site with a P value less than 0.001 was selected from the primary screening sites as the final tumor methylation marker.
The beneficial effects of the technical scheme are as follows: the screening module of the embodiment of the invention compares the P value of the primary screening site with the preset P-value threshold, and selects the primary screening site as the tumor methylation marker if the P value of the primary screening site is smaller than the preset P-value threshold, thereby realizing further detection and screening of the primary screening site and improving the accuracy of acquiring the tumor methylation marker.
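The filtering step of the screening module amounts to a strict comparison against the preset threshold. The sketch below assumes the P values are held in a mapping from site identifier to P value; the function name `select_markers` and the site identifiers are hypothetical and serve only as illustration.

```python
def select_markers(p_values, p_threshold=0.001):
    """Return the primary screening sites whose P value is below the threshold."""
    # Strict "<" mirrors the text: a site is kept only if P < threshold.
    return [site for site, p in p_values.items() if p < p_threshold]


# Hypothetical site IDs and P values, for illustration only.
example = {"site_A": 0.0002, "site_B": 0.001, "site_C": 0.04}
markers = select_markers(example)
```

With the example threshold of 0.001, a site whose P value is exactly 0.001 is not selected, because the condition in the text is "smaller than".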
The embodiment of the invention provides a method for acquiring tumor methylation markers based on methylation chip data, which comprises the following steps:
step S1: obtaining a methylation chip Beta numerical matrix of the sample tumor tissue and the sample normal tissue corresponding to the common site of the sample tumor tissue and the sample normal tissue;
step S2: based on the selection of different thresholds, the sensitivity of the common sites in the tumor tissue of the sample and the normal tissue of the sample is obtained, and based on the selected sensitivity thresholds, the sites of the primary screening are selected;
step S3: t test is performed based on the primary screened sites, and the tumor methylation markers are screened out by filtering with a preset P-value threshold.
The working principle of the technical scheme is as follows:
based on a methylation chip beta numerical matrix of the sample tumor tissue and the sample normal tissue corresponding to the common locus of the sample tumor tissue and the sample normal tissue, methylation locus sensitivity under different thresholds is obtained, and based on the selected sensitivity threshold, a primary screening locus is selected; and then, based on the primary screening sites, T-test is carried out, and the pre-set P-value threshold is utilized for filtering to obtain the filtered methylation sites, so that the screening of tumor methylation markers is realized.
The beneficial effects of the technical scheme are as follows:
the invention screens reliable and effective tumor methylation markers based on DNA methylation chip data, acquires the sensitivity of methylation sites by using different thresholds, and performs T test on the basis, thereby improving the reliability and effectiveness of screening tumor methylation markers.
The embodiment of the invention provides a method for acquiring tumor methylation markers based on methylation chip data, wherein step S1, obtaining a methylation chip Beta numerical matrix of the sample tumor tissue and the sample normal tissue corresponding to the common sites of the sample tumor tissue and the sample normal tissue, comprises:
step S11: obtaining a common site of a sample tumor tissue and a sample normal tissue;
first, in the DNA methylation chip data of the paired samples, the methylation chip Beta value = intensity value from the methylated bead type / (intensity value from the methylated bead type + intensity value from the unmethylated bead type + 100); the Beta value ranges from 0 (completely unmethylated) to 1 (completely methylated), and methylation chip Beta values are interpreted as follows:
1. any Beta value equal to or greater than 0.6 is considered fully methylated;
2. any Beta value equal to or less than 0.2 is considered to be completely unmethylated;
3. Beta values between 0.2 and 0.6 are considered partially methylated;
specifically, based on the sites of the paired tumor samples and normal samples, the sites are obtained for which the number of paired samples present in both the tumor samples and the normal samples is within 25% of the total number of paired samples.
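The Beta value formula and the three interpretation bands given above can be written out directly. The sketch below is illustrative only: the function names `beta_value` and `methylation_state` are hypothetical, and the constant 100 is taken from the formula quoted in the text.

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Beta = M / (M + U + offset), where M and U are bead-type intensities."""
    return methylated / (methylated + unmethylated + offset)


def methylation_state(beta: float) -> str:
    """Classify a Beta value using the 0.2 and 0.6 cut-offs from the text."""
    if beta >= 0.6:
        return "fully methylated"
    if beta <= 0.2:
        return "completely unmethylated"
    return "partially methylated"
```

The offset keeps the ratio stable when both intensities are low, at the cost that Beta can approach but never reach 1.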
Step S12: based on the sites, a methylation chip Beta numerical matrix of the sites corresponding to the sample tumor tissue and the sample normal tissue is obtained.
Specifically, a methylation chip Beta value matrix of the common sites in the tumor samples and a Beta value matrix of the common sites in the normal samples are obtained; each row corresponds to a sample ID, and each column holds the methylation chip Beta value of one site in that sample.
The embodiment of the invention provides a method for acquiring tumor methylation markers based on methylation chip data, wherein step S2, obtaining the sensitivity of the common sites in the sample tumor tissue and the sample normal tissue based on the selection of different thresholds, and selecting the primary screening sites based on the selected sensitivity threshold, comprises:
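One way to build the two matrices described in step S12 is sketched below. Several points are assumptions rather than details from the text: the input layout (`{sample_id: {site_id: beta}}`), the function name `common_site_matrices`, matching sites by identical site ID, and the simplification that a site counts as common only when every paired sample reports it in both tissues. The 25% criterion mentioned in step S11 is not modelled here.

```python
def common_site_matrices(tumor, normal):
    """tumor / normal: {sample_id: {site_id: beta}} for paired samples.

    Returns (sample_ids, site_ids, tumor_matrix, normal_matrix); each matrix
    has one row per sample ID and one column per common site.
    """
    sample_ids = sorted(set(tumor) & set(normal))  # paired samples only
    if not sample_ids:
        return [], [], [], []
    # Assumption: a site is "common" if every paired sample has it in both tissues.
    sites = set.intersection(
        *(set(tumor[s]) & set(normal[s]) for s in sample_ids)
    )
    site_ids = sorted(sites)
    tumor_matrix = [[tumor[s][c] for c in site_ids] for s in sample_ids]
    normal_matrix = [[normal[s][c] for c in site_ids] for s in sample_ids]
    return sample_ids, site_ids, tumor_matrix, normal_matrix
```

Sorting the sample and site IDs makes the row and column order reproducible, so the same index refers to the same pair and site in both matrices.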
step S21: selecting different thresholds within a given value range, based on the methylation chip Beta value matrices of the sample tumor tissue and the sample normal tissue, such that the threshold is greater than the methylation chip Beta value in the sample normal tissue and less than the methylation chip Beta value in the sample tumor tissue;
first, according to the interpretation of methylation chip Beta values, the threshold range is chosen as 0.2 to 0.6, which distinguishes methylated from unmethylated sites;
next, based on this threshold range, a value is selected that is greater than the methylation chip Beta value in the normal tissue of the sample and less than the methylation chip Beta value in the tumor tissue of the sample.
Step S22: based on the above condition, a site is obtained in which the ratio (sensitivity) of the number of samples satisfying the condition to the total number of paired samples is greater than the selected sensitivity threshold.
Specifically, according to the selected threshold range, the primary screening sites are those for which the paired samples whose threshold value is greater than the methylation chip Beta value in the normal tissue and less than the methylation chip Beta value in the tumor tissue make up more than 80% of the total paired samples.
The embodiment of the invention provides a method for acquiring tumor methylation markers based on methylation chip data, wherein step S3, performing a T test based on the primary screening sites and filtering with a preset P-value threshold to screen out the tumor methylation markers, comprises:
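Steps S21 and S22 can be sketched as a per-site sensitivity calculation followed by a cut-off. The sketch makes assumptions that the text does not spell out: the candidate thresholds form a hypothetical grid of 0.2, 0.3, 0.4, 0.5 and 0.6; a site is kept if its sensitivity exceeds 80% at any candidate threshold; and the comparisons are strict, as in this description (the claims use "greater than or equal to" and "less than or equal to"). The names `sensitivity` and `primary_screen` are illustrative.

```python
def sensitivity(tumor_betas, normal_betas, threshold):
    """Fraction of pairs with normal Beta < threshold < tumor Beta."""
    pairs = list(zip(tumor_betas, normal_betas))
    if not pairs:
        raise ValueError("no paired samples")
    hits = sum(1 for t, n in pairs if n < threshold < t)
    return hits / len(pairs)


def primary_screen(tumor_matrix, normal_matrix, site_ids,
                   thresholds=(0.2, 0.3, 0.4, 0.5, 0.6), min_sensitivity=0.8):
    """Return the site IDs whose best sensitivity exceeds min_sensitivity."""
    selected = []
    for col, site in enumerate(site_ids):
        # Column `col` holds one site's Beta values across all paired samples.
        t_col = [row[col] for row in tumor_matrix]
        n_col = [row[col] for row in normal_matrix]
        best = max(sensitivity(t_col, n_col, thr) for thr in thresholds)
        if best > min_sensitivity:
            selected.append(site)
    return selected
```

Rows of both matrices are assumed to be in the same paired-sample order, so that position i in the tumor column and position i in the normal column belong to the same pair.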
step S31: obtaining the P value of the T test for each primary screening site;
specifically, for the primary screening sites, a paired T test is applied to the methylation chip Beta value matrix of the sites in the tumor samples and the methylation chip Beta value matrix of the paired normal samples, and the P value of each primary screening site is obtained under the alternative hypothesis that tumor > normal;
step S32: filtering with a preset P-value threshold to obtain the tumor methylation markers.
Specifically, the P-value threshold is set to 0.001, and the sites with a P value less than 0.001 are taken as the final tumor methylation markers.
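For reference, the paired one-sided test used in step S31 can be written out in its standard textbook form. The symbols below are introduced here for illustration and do not appear in the original text: $x_i^{T}$ and $x_i^{N}$ are the Beta values of one site in the tumor and normal tissue of the $i$-th of $n$ paired samples.

```latex
\begin{aligned}
d_i &= x_i^{T} - x_i^{N}, \qquad i = 1, \dots, n \\
\bar{d} &= \frac{1}{n} \sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left(d_i - \bar{d}\right)^2} \\
t &= \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad
P = \Pr\left(T_{n-1} \ge t\right)
\end{aligned}
```

Here $T_{n-1}$ follows a Student t distribution with $n-1$ degrees of freedom, and under step S32 a site is retained when $P < 0.001$.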
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, comprising:
the Beta value matrix acquisition unit is used for acquiring a Beta value matrix of the common sites in the paired sample tumor tissue and the paired sample normal tissue;
the sensitivity locus preliminary screening acquisition unit is used for selecting different thresholds of a certain numerical range based on the Beta numerical matrix corresponding to the paired sample tumor tissue and the sample normal tissue to obtain preliminary screening methylation loci with sensitivity larger than the selected sensitivity thresholds;
and a tumor methylation marker determination unit for calculating the T test P value of each primary screening methylation site and filtering with a preset P-value threshold to obtain the tumor methylation markers.
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, wherein the Beta numerical matrix acquisition unit comprises:
a calculating subunit for calculating a site common to the sample tumor tissue and the sample normal tissue;
and the obtaining subunit is used for obtaining the Beta numerical matrix of the common locus corresponding to the tumor tissue and the normal tissue of the sample.
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, wherein the sensitivity site preliminary screening acquisition unit comprises:
a calculation subunit for calculating sensitivity of the sites in the paired sample tissue;
and an obtaining subunit for obtaining the primary screening methylation sites based on the sensitivity and the selected sensitivity threshold.
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, wherein the tumor methylation marker determination unit comprises:
a P value calculating subunit for calculating the T test P value of each primary screening site;
and a tumor methylation marker acquisition subunit, which is used for filtering by using a preset P-value threshold value based on the P value to acquire the tumor methylation marker.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (6)

1. A method for obtaining a tumor methylation marker based on methylation chip data, comprising:
acquiring sample data in the methylation chip;
preprocessing the sample data;
performing a T test on the preprocessing result;
screening tumor methylation markers from the preprocessing result according to the T test result;
the preprocessing of the sample data comprises:
extracting tumor tissue sample data and normal tissue sample data in the sample data;
matching the first site of the tumor tissue sample data with the second site of the normal tissue sample data according to a preset site matching rule to obtain a plurality of identical sites;
acquiring a first Beta numerical matrix corresponding to the same sites in the tumor tissue sample data;
acquiring a second Beta numerical matrix corresponding to the same sites in the normal tissue sample data;
acquiring a preset intermediate threshold and a matrix coordinate;
acquiring a first Beta value corresponding to the matrix coordinate in the first Beta value matrix;
acquiring a second Beta value corresponding to the matrix coordinate in the second Beta value matrix;
if the first Beta value at the matrix coordinate is greater than or equal to the intermediate threshold and the second Beta value is less than or equal to the intermediate threshold, taking the matrix coordinate as a first coordinate to be processed;
if the first Beta value at the matrix coordinate is smaller than the intermediate threshold or the second Beta value is larger than the intermediate threshold, taking the matrix coordinate as a second coordinate to be processed;
acquiring a first number of the first coordinates to be processed and a second number of the second coordinates to be processed;
calculating the sensitivity of the same sites based on the first number and the second number:
$$S = \frac{A}{A + B}$$
wherein S is sensitivity, A is a first number, and B is a second number;
if the sensitivity is greater than a preset sensitivity threshold, the same site is a primary screening site;
taking the primary screening sites as the preprocessing result.
2. The method of claim 1, wherein performing the T test on the preprocessing result comprises:
obtaining a third Beta numerical matrix corresponding to the primary screening site in the tumor tissue sample data;
acquiring a fourth Beta numerical matrix corresponding to the primary screening site in the normal tissue sample data;
performing a paired T test on the third Beta value matrix and the fourth Beta value matrix of the primary screening site to obtain a P value of the primary screening site;
taking the P value as the T test result.
3. The method of claim 2, wherein screening tumor methylation markers from the preprocessing result according to the T test result comprises:
acquiring a preset P-value threshold;
if the P value of the primary screening site is smaller than the P-value threshold, the primary screening site is a tumor methylation marker.
4. A system for obtaining tumor methylation markers based on methylation chip data, comprising:
the acquisition module is used for acquiring sample data in the methylation chip;
the preprocessing module is used for preprocessing the sample data;
the T test module is used for performing a T test on the preprocessing result;
the screening module is used for screening tumor methylation markers from the preprocessing result according to the T test result;
the preprocessing module performs operations including:
extracting tumor tissue sample data and normal tissue sample data in the sample data;
matching the first site of the tumor tissue sample data with the second site of the normal tissue sample data according to a preset site matching rule to obtain a plurality of identical sites;
acquiring a first Beta numerical matrix corresponding to the same sites in the tumor tissue sample data;
acquiring a second Beta numerical matrix corresponding to the same sites in the normal tissue sample data;
acquiring a preset intermediate threshold and a matrix coordinate;
acquiring a first Beta value corresponding to the matrix coordinate in the first Beta value matrix;
acquiring a second Beta value corresponding to the matrix coordinate in the second Beta value matrix;
if the first Beta value at the matrix coordinate is greater than or equal to the intermediate threshold and the second Beta value is less than or equal to the intermediate threshold, taking the matrix coordinate as a first coordinate to be processed;
if the first Beta value at the matrix coordinate is smaller than the intermediate threshold or the second Beta value is larger than the intermediate threshold, taking the matrix coordinate as a second coordinate to be processed;
acquiring a first number of the first coordinates to be processed and a second number of the second coordinates to be processed;
calculating the sensitivity of the same sites based on the first number and the second number:
$$S = \frac{A}{A + B}$$
wherein S is sensitivity, A is a first number, and B is a second number;
if the sensitivity is greater than a preset sensitivity threshold, the same site is a primary screening site;
taking the primary screening sites as the preprocessing result.
5. The system for obtaining tumor methylation markers based on methylation chip data of claim 4, the T-test module performing operations comprising:
obtaining a third Beta numerical matrix corresponding to the primary screening site in the tumor tissue sample data;
acquiring a fourth Beta numerical matrix corresponding to the primary screening site in the normal tissue sample data;
performing a paired T test on the third Beta value matrix and the fourth Beta value matrix of the primary screening site to obtain a P value of the primary screening site;
taking the P value as the T test result.
6. The system for obtaining tumor methylation markers based on methylation chip data of claim 4, the screening module performing operations comprising:
acquiring a preset P-value threshold;
if the P value of the primary screening site is smaller than the P-value threshold, the primary screening site is a tumor methylation marker.
CN202011100217.1A 2020-10-15 2020-10-15 Method and system for obtaining tumor methylation marker based on methylation chip data Active CN112037854B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011100217.1A CN112037854B (en) 2020-10-15 2020-10-15 Method and system for obtaining tumor methylation marker based on methylation chip data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011100217.1A CN112037854B (en) 2020-10-15 2020-10-15 Method and system for obtaining tumor methylation marker based on methylation chip data

Publications (2)

Publication Number Publication Date
CN112037854A CN112037854A (en) 2020-12-04
CN112037854B true CN112037854B (en) 2024-04-09

Family

ID=73573657

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011100217.1A Active CN112037854B (en) 2020-10-15 2020-10-15 Method and system for obtaining tumor methylation marker based on methylation chip data

Country Status (1)

Country Link
CN (1) CN112037854B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103409514A (en) * 2013-07-23 2013-11-27 徐州医学院 Chip-based 5-hydroxymethylated cytosine detection method with high flux and high sensitivity
CN107119144A (en) * 2017-07-05 2017-09-01 昆明医科大学第附属医院 Multi-functional transcription regulatory factor CTCF DNA binding sites CTCF_55 application
CN107677831A (en) * 2017-06-28 2018-02-09 深圳市龙岗中心医院 The method for determining the diagnosis marker for assessing schizophrenia patients
CN109616198A (en) * 2018-12-28 2019-04-12 陈洪亮 It is only used for the choosing method of the special DNA methylation assay Sites Combination of the single cancer kind screening of liver cancer
CN109825584A (en) * 2019-03-01 2019-05-31 清华大学 DNA methylation marker object and its application using peripheral blood diagnosis early liver cancer
CN111440869A (en) * 2020-03-16 2020-07-24 武汉百药联科科技有限公司 DNA methylation marker for predicting primary breast cancer occurrence risk and screening method and application thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017201606A1 (en) * 2016-05-04 2017-11-30 Queen's University At Kingston Cell-free detection of methylated tumour dna
US11499196B2 (en) * 2016-06-07 2022-11-15 The Regents Of The University Of California Cell-free DNA methylation patterns for disease and condition analysis

Also Published As

Publication number Publication date
CN112037854A (en) 2020-12-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant