CN112037854A - Method and system for acquiring tumor methylation marker based on methylation chip data - Google Patents


Publication number
CN112037854A
Authority
CN
China
Prior art keywords
value
beta
site
sample data
matrix
Prior art date
Legal status
Granted
Application number
CN202011100217.1A
Other languages
Chinese (zh)
Other versions
CN112037854B (en)
Inventor
李霞
万季
陈文霞
吴大英
祁淑英
杨楚钦
胡桂林
Current Assignee
LONGGANG DISTRICT CENTRAL HOSPITAL OF SHENZHEN
Original Assignee
LONGGANG DISTRICT CENTRAL HOSPITAL OF SHENZHEN
Priority date
Filing date
Publication date
Application filed by LONGGANG DISTRICT CENTRAL HOSPITAL OF SHENZHEN
Priority to CN202011100217.1A
Publication of CN112037854A
Application granted
Publication of CN112037854B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Abstract

The invention provides a method and a system for acquiring tumor methylation markers based on methylation chip data. The method comprises: acquiring sample data from a methylation chip; preprocessing the sample data; performing a t-test on the preprocessing result; and screening tumor methylation markers from the preprocessing result according to the t-test result. The system comprises modules corresponding to the method steps. By screening tumor methylation markers from DNA methylation chip data, the method and system improve the reliability and effectiveness of marker screening; the screening procedure is simple and practical and can be widely applied in the field of medical computer applications.

Description

Method and system for acquiring tumor methylation marker based on methylation chip data
Technical Field
The invention relates to the technical field of methylation, in particular to a method and a system for acquiring a tumor methylation marker based on methylation chip data.
Background
DNA methylation modifies DNA bases and thereby affects the binding between DNA and proteins, so it plays an important role in both normal development and the development of disease. Research shows that abnormal DNA methylation levels are closely related to the occurrence of tumors, so searching tumor DNA methylation profiles for markers relevant to early tumor diagnosis has become a research hotspot in recent years. The high-throughput methylation quantification platforms in common use are methylation chip platforms and high-throughput methylation sequencing platforms. Compared with sequencing platforms, methylation chip platforms offer low cost and high sensitivity, so they can be applied to larger sets of clinical samples to mine more representative methylation markers. Typically, a simple statistical hypothesis test is used to identify sites whose methylation signal differs significantly between normal and tumor tissue at the population level; however, because this approach does not take the specific distribution of the methylation signal into account, it can produce false positives. It is therefore of great significance to improve the existing approach so as to screen out more accurate methylation markers for early tumor diagnosis.
Disclosure of Invention
One objective of the present invention is to provide a method and a system for acquiring tumor methylation markers based on methylation chip data. Tumor methylation markers are screened from DNA methylation chip data, which improves the reliability and effectiveness of marker screening; the screening procedure is simple and practical and can be widely applied in the field of medical computer applications.
The embodiment of the invention provides a method for acquiring a tumor methylation marker based on methylation chip data, which comprises the following steps:
acquiring sample data from the methylation chip;
preprocessing the sample data;
performing a t-test on the preprocessing result;
and screening tumor methylation markers from the preprocessing result according to the t-test result.
Preferably, preprocessing the sample data comprises:
extracting tumor tissue sample data and normal tissue sample data from the sample data;
matching first sites of the tumor tissue sample data with second sites of the normal tissue sample data according to a preset site matching rule to obtain a plurality of identical sites;
acquiring a first Beta value matrix corresponding to each identical site in the tumor tissue sample data;
acquiring a second Beta value matrix corresponding to the identical site in the normal tissue sample data;
acquiring a preset intermediate threshold and a matrix coordinate;
acquiring a first Beta value corresponding to the matrix coordinate in the first Beta value matrix;
acquiring a second Beta value corresponding to the matrix coordinate in the second Beta value matrix;
if the first Beta value at the matrix coordinate is greater than or equal to the intermediate threshold and the second Beta value at the matrix coordinate is less than or equal to the intermediate threshold, taking the matrix coordinate as a first coordinate to be processed;
if the first Beta value at the matrix coordinate is less than the intermediate threshold or the second Beta value at the matrix coordinate is greater than the intermediate threshold, taking the matrix coordinate as a second coordinate to be processed;
acquiring a first number of the first coordinates to be processed and a second number of the second coordinates to be processed;
calculating the sensitivity of the identical site based on the first number and the second number:
[Formula image not reproduced: sensitivity S as a function of A and B]
wherein S is the sensitivity, A is the first number, and B is the second number;
if the sensitivity is greater than a preset sensitivity threshold, taking the identical site as a primary screening site;
and taking the primary screening site as the preprocessing result.
Preferably, performing the t-test on the preprocessing result comprises:
acquiring a third Beta value matrix corresponding to the primary screening site in the tumor tissue sample data;
acquiring a fourth Beta value matrix corresponding to the primary screening site in the normal tissue sample data;
performing a paired t-test on the third Beta value matrix and the fourth Beta value matrix of the primary screening site to obtain a P value of the primary screening site;
and taking the P value as the t-test result.
Preferably, screening the tumor methylation markers from the preprocessing result according to the t-test result comprises:
acquiring a preset P-value threshold;
if the P value of the primary screening site is less than the P-value threshold, taking the primary screening site as a tumor methylation marker.
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, which comprises:
an acquisition module, used for acquiring sample data from the methylation chip;
a preprocessing module, used for preprocessing the sample data;
a t-test module, used for performing a t-test on the preprocessing result;
and a screening module, used for screening tumor methylation markers from the preprocessing result according to the t-test result.
Preferably, the preprocessing module performs operations comprising:
extracting tumor tissue sample data and normal tissue sample data from the sample data;
matching first sites of the tumor tissue sample data with second sites of the normal tissue sample data according to a preset site matching rule to obtain a plurality of identical sites;
acquiring a first Beta value matrix corresponding to each identical site in the tumor tissue sample data;
acquiring a second Beta value matrix corresponding to the identical site in the normal tissue sample data;
acquiring a preset intermediate threshold and a matrix coordinate;
acquiring a first Beta value corresponding to the matrix coordinate in the first Beta value matrix;
acquiring a second Beta value corresponding to the matrix coordinate in the second Beta value matrix;
if the first Beta value at the matrix coordinate is greater than or equal to the intermediate threshold and the second Beta value at the matrix coordinate is less than or equal to the intermediate threshold, taking the matrix coordinate as a first coordinate to be processed;
if the first Beta value at the matrix coordinate is less than the intermediate threshold or the second Beta value at the matrix coordinate is greater than the intermediate threshold, taking the matrix coordinate as a second coordinate to be processed;
acquiring a first number of the first coordinates to be processed and a second number of the second coordinates to be processed;
calculating the sensitivity of the identical site based on the first number and the second number:
[Formula image not reproduced: sensitivity S as a function of A and B]
wherein S is the sensitivity, A is the first number, and B is the second number;
if the sensitivity is greater than a preset sensitivity threshold, taking the identical site as a primary screening site;
and taking the primary screening site as the preprocessing result.
Preferably, the t-test module performs operations comprising:
acquiring a third Beta value matrix corresponding to the primary screening site in the tumor tissue sample data;
acquiring a fourth Beta value matrix corresponding to the primary screening site in the normal tissue sample data;
performing a paired t-test on the third Beta value matrix and the fourth Beta value matrix of the primary screening site to obtain a P value of the primary screening site;
and taking the P value as the t-test result.
Preferably, the screening module performs operations comprising:
acquiring a preset P-value threshold;
if the P value of the primary screening site is less than the P-value threshold, taking the primary screening site as a tumor methylation marker.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart of a method for obtaining tumor methylation markers based on methylation chip data according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
The embodiment of the invention provides a method for acquiring a tumor methylation marker based on methylation chip data, which comprises the following steps of:
step S101, acquiring sample data from the methylation chip;
step S102, preprocessing the sample data;
step S103, performing a t-test on the preprocessing result;
and step S104, screening tumor methylation markers from the preprocessing result according to the t-test result.
The working principle of the technical scheme is as follows:
A methylation chip detects signals from the hybridization of bisulfite-treated DNA sequences. First, sample data is acquired from the methylation chip; the sample data includes tumor tissue sample data and normal tissue sample data. Second, the sample data acquired from the methylation chip is preprocessed; the purpose of preprocessing is to screen out the valuable sites in the sample data. Then, a t-test is performed on the preprocessing result in order to obtain the P value of the preprocessing result data. Finally, tumor methylation markers are screened from the preprocessing result according to the t-test result, i.e. the P value of the preprocessing result data.
The beneficial effects of the above technical scheme are: the embodiment of the invention preprocesses the sample data acquired from the methylation chip, performs a t-test on the preprocessing result, and finally screens tumor methylation markers from the preprocessing result according to the t-test result. Tumor methylation markers are thus screened from DNA methylation chip data, which improves the reliability and effectiveness of marker screening; the screening procedure is simple and practical and can be widely applied in the field of medical computer applications.
The embodiment of the invention provides a method for acquiring a tumor methylation marker based on methylation chip data, wherein preprocessing the sample data comprises:
extracting tumor tissue sample data and normal tissue sample data from the sample data;
matching first sites of the tumor tissue sample data with second sites of the normal tissue sample data according to a preset site matching rule to obtain a plurality of identical sites;
acquiring a first Beta value matrix corresponding to each identical site in the tumor tissue sample data;
acquiring a second Beta value matrix corresponding to the identical site in the normal tissue sample data;
acquiring a preset intermediate threshold and a matrix coordinate;
acquiring a first Beta value corresponding to the matrix coordinate in the first Beta value matrix;
acquiring a second Beta value corresponding to the matrix coordinate in the second Beta value matrix;
if the first Beta value at the matrix coordinate is greater than or equal to the intermediate threshold and the second Beta value at the matrix coordinate is less than or equal to the intermediate threshold, taking the matrix coordinate as a first coordinate to be processed;
if the first Beta value at the matrix coordinate is less than the intermediate threshold or the second Beta value at the matrix coordinate is greater than the intermediate threshold, taking the matrix coordinate as a second coordinate to be processed;
acquiring a first number of the first coordinates to be processed and a second number of the second coordinates to be processed;
calculating the sensitivity of the identical site based on the first number and the second number:
[Formula image not reproduced: sensitivity S as a function of A and B]
wherein S is the sensitivity, A is the first number, and B is the second number;
if the sensitivity is greater than a preset sensitivity threshold, taking the identical site as a primary screening site;
and taking the primary screening site as the preprocessing result.
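The counting step above can be illustrated with a minimal Python sketch. The sensitivity formula itself is an image that is not reproduced in this text, so the form S = A / (A + B) used here is an assumption, as are the function name `site_sensitivity`, the toy Beta values, and the sensitivity threshold of 0.7. The sketch also assumes that every matrix coordinate of the site is visited, with the tumor and normal values at the same coordinate compared pairwise.

```python
from typing import Sequence


def site_sensitivity(tumor: Sequence[float],
                     normal: Sequence[float],
                     threshold: float) -> float:
    """Assumed sensitivity S = A / (A + B) for one identical site.

    tumor[i] and normal[i] are the Beta values found at the same matrix
    coordinate of the first and second Beta value matrices.
    """
    if len(tumor) != len(normal) or not tumor:
        raise ValueError("tumor and normal must be non-empty and equally long")
    # A: coordinates where tumor >= threshold and normal <= threshold.
    a = sum(1 for t, n in zip(tumor, normal) if t >= threshold and n <= threshold)
    # B: all remaining coordinates (tumor < threshold or normal > threshold).
    b = len(tumor) - a
    return a / (a + b)


# Toy values for illustration only, not data from the patent.
tumor_betas = [0.72, 0.65, 0.30, 0.81]
normal_betas = [0.10, 0.45, 0.12, 0.15]
s = site_sensitivity(tumor_betas, normal_betas, threshold=0.4)
is_primary_site = s > 0.7  # 0.7 is a hypothetical sensitivity threshold
```

Because the two conditions in the text are exact complements of each other, B can be obtained as the total number of coordinates minus A.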
The working principle of the technical scheme is as follows:
The sample data in the methylation chip comprises tumor tissue sample data and normal tissue sample data. The preset site matching rule is as follows:
first, the Beta values of a site in the tumor tissue sample data and of a site in the normal tissue sample data are calculated:
[Formula image not reproduced: definition of the Beta value]
Beta values range from 0 to 1;
the meaning of the Beta value is:
1. any Beta value greater than or equal to 0.6 represents complete methylation;
2. any Beta value less than or equal to 0.2 represents a completely unmethylated state;
3. Beta values between 0.2 and 0.6 represent partial methylation;
all Beta values of a given site in the tumor tissue sample data are matched against all Beta values of that site in the normal tissue sample data according to the meanings above, and the site is taken as an identical site when the difference between the number of matching Beta values and the total number is within 25%;
then, the first Beta value matrix corresponding to the identical site is acquired from the tumor tissue sample data, and the second Beta value matrix corresponding to the identical site is acquired from the normal tissue sample data;
the preset intermediate threshold is chosen as follows: based on the meaning of the Beta value, the range 0.2 to 0.6 is used to distinguish methylated from unmethylated values, and one value from this range is selected as the intermediate threshold; for example, starting from 0.2 and increasing in steps of 0.05 up to 0.6 yields a set of candidate points, one of which is selected as the intermediate threshold;
the preset matrix coordinate is chosen as follows: the first Beta value matrix and the second Beta value matrix have the same numbers of rows and columns, and any coordinate within that range is selected at random;
then, if the first Beta value at the matrix coordinate in the first Beta value matrix is greater than or equal to the intermediate threshold and the second Beta value at the matrix coordinate in the second Beta value matrix is less than or equal to the intermediate threshold, the matrix coordinate is taken as a first coordinate to be processed; if the first Beta value at the matrix coordinate is less than the intermediate threshold or the second Beta value at the matrix coordinate is greater than the intermediate threshold, the matrix coordinate is taken as a second coordinate to be processed;
finally, the sensitivity of the identical site is calculated from the numbers of first and second coordinates to be processed, and each identical site whose sensitivity is greater than the preset sensitivity threshold is selected as a primary screening site.
The beneficial effects of the above technical scheme are: the embodiment of the invention obtains the identical sites by matching the sites of the tumor tissue sample data against those of the normal tissue sample data, and then uses the sensitivity to screen the most valuable primary screening sites from the identical sites as the preprocessing result, which improves the accuracy of obtaining tumor methylation markers.
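The Beta value categories and the candidate intermediate thresholds described above can be sketched in Python as follows. The Beta value formula is an image that is not reproduced in this text; `beta_value` below uses the definition commonly applied to methylation array intensities, M / (M + U + offset) with an offset of 100, and this is an assumption rather than the patent's confirmed formula. All function names are hypothetical, and the 25% site matching rule is not implemented because the text does not specify it precisely enough.

```python
from typing import List


def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Commonly used array definition (assumed here): M / (M + U + offset)."""
    m = max(methylated, 0.0)
    u = max(unmethylated, 0.0)
    return m / (m + u + offset)


def methylation_state(beta: float) -> str:
    """Map a Beta value to the three categories given in the text."""
    if beta >= 0.6:
        return "methylated"
    if beta <= 0.2:
        return "unmethylated"
    return "partial"


def candidate_thresholds(start: float = 0.2, stop: float = 0.6,
                         step: float = 0.05) -> List[float]:
    """Candidate intermediate thresholds: start, start + step, ..., stop."""
    count = round((stop - start) / step)
    # Round to two decimals to avoid floating-point drift in the grid.
    return [round(start + i * step, 2) for i in range(count + 1)]


thresholds = candidate_thresholds()
states = [methylation_state(b) for b in (0.6, 0.4, 0.2)]
```

One value from `thresholds` would then be chosen as the intermediate threshold; the text does not say how that choice is made.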
The embodiment of the invention provides a method for acquiring a tumor methylation marker based on methylation chip data, wherein performing the t-test on the preprocessing result comprises:
acquiring a third Beta value matrix corresponding to the primary screening site in the tumor tissue sample data;
acquiring a fourth Beta value matrix corresponding to the primary screening site in the normal tissue sample data;
performing a paired t-test on the third Beta value matrix and the fourth Beta value matrix of the primary screening site to obtain a P value of the primary screening site;
and taking the P value as the t-test result.
The working principle of the technical scheme is as follows:
A third Beta value matrix corresponding to the primary screening site is acquired from the tumor tissue sample data, and a fourth Beta value matrix corresponding to the primary screening site is acquired from the normal tissue sample data; a paired t-test is performed on the third and fourth Beta value matrices under the hypothesis that tumor is greater than normal. The t-test here is Student's t-test, which is mainly used to test the difference between two means in small samples: it uses the theory of the t distribution to infer the probability that the difference arises, and thereby judges whether the difference between the two means is significant. The P value is the total probability of obtaining the observed sample result or a more extreme one, given that the null hypothesis is true. A traditional test compares the statistic with a critical value, but the critical value changes with the distribution and the degrees of freedom; using the P value avoids looking up the critical value, is simpler, and conveys more information than the critical value.
The beneficial effects of the above technical scheme are: the embodiment of the invention performs a t-test on the preprocessing result, i.e. the primary screening sites, to obtain the P value of each primary screening site and takes that P value as the t-test result. The t-test helps to analyze whether the difference in the data is significant, conveys more information, and involves relatively simple steps.
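A paired t-test for one primary screening site can be sketched with the Python standard library alone. The sketch reads "tumor is greater than normal" as a one-sided alternative, which is an interpretation of the text; the upper-tail probability is obtained by numerically integrating the t density with Simpson's rule, a choice made here for self-containment and not a method named by the patent. Function names and toy values are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T >= t) for Student's t with df degrees of freedom (Simpson's rule)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    if t == 0:
        return 0.5
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # integral of the density over [0, |t|]
    return 0.5 - area if t > 0 else 0.5 + area


def paired_t_test_greater(tumor: Sequence[float],
                          normal: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, one-sided P value) for H1: mean(tumor - normal) > 0."""
    if len(tumor) != len(normal) or len(tumor) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [t - n for t, n in zip(tumor, normal)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("paired differences have zero variance")
    n = len(diffs)
    t_stat = mean(diffs) / (sd / math.sqrt(n))
    return t_stat, t_upper_tail(t_stat, n - 1)


# Toy paired Beta values for illustration only.
tumor_betas = [0.72, 0.65, 0.58, 0.81, 0.69]
normal_betas = [0.10, 0.22, 0.15, 0.18, 0.12]
t_stat, p_value = paired_t_test_greater(tumor_betas, normal_betas)
```

Where a statistics library is available, its paired t-test routine would normally replace the hand-written tail integration.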
The embodiment of the invention provides a method for acquiring a tumor methylation marker based on methylation chip data, wherein screening the tumor methylation markers from the preprocessing result according to the t-test result comprises:
acquiring a preset P-value threshold;
if the P value of the primary screening site is less than the P-value threshold, taking the primary screening site as a tumor methylation marker.
The working principle of the technical scheme is as follows:
The primary screening sites are filtered by comparing the P value of each primary screening site with the preset P-value threshold; when the P value of a primary screening site is less than the preset P-value threshold, that site is selected as a tumor methylation marker.
For example, with the P-value threshold set to 0.001, the sites whose P value is less than 0.001 are selected from the primary screening sites as the final tumor methylation markers.
The beneficial effects of the above technical scheme are: the embodiment of the invention compares the P value of each primary screening site with the preset P-value threshold and selects the site as a tumor methylation marker if its P value is less than the threshold. The primary screening sites are thereby tested and screened a second time, which improves the accuracy of obtaining tumor methylation markers.
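The final filtering step is a strict comparison against the P-value threshold, which a short Python sketch makes explicit. The site identifiers and P values below are invented for illustration, and `select_markers` is a hypothetical helper name.

```python
from typing import Dict, List


def select_markers(p_values: Dict[str, float], p_threshold: float = 0.001) -> List[str]:
    """Keep the primary screening sites whose P value is strictly below the threshold."""
    return sorted(site for site, p in p_values.items() if p < p_threshold)


# Hypothetical site identifiers and P values, for illustration only.
p_values = {"site_1": 0.0002, "site_2": 0.03, "site_3": 0.001}
markers = select_markers(p_values)
```

A site whose P value equals the threshold exactly (`site_3` here) is not selected, because the text requires the P value to be less than the threshold.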
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, which comprises:
an acquisition module, used for acquiring sample data from the methylation chip;
a preprocessing module, used for preprocessing the sample data;
a t-test module, used for performing a t-test on the preprocessing result;
and a screening module, used for screening tumor methylation markers from the preprocessing result according to the t-test result.
The working principle of the technical scheme is as follows:
The system of the embodiment of the invention consists of an acquisition module, a preprocessing module, a t-test module and a screening module. A methylation chip detects signals from the hybridization of bisulfite-treated DNA sequences. First, the acquisition module acquires sample data from the methylation chip; the sample data includes tumor tissue sample data and normal tissue sample data. Second, the preprocessing module preprocesses the sample data acquired from the methylation chip; the purpose of preprocessing is to screen out the valuable sites in the sample data. Then, the t-test module performs a t-test on the preprocessing result in order to obtain the P value of the preprocessing result data. Finally, the screening module screens tumor methylation markers from the preprocessing result according to the t-test result, i.e. the P value of the preprocessing result data.
The beneficial effects of the above technical scheme are: the embodiment of the invention preprocesses the sample data acquired from the methylation chip, performs a t-test on the preprocessing result, and finally screens tumor methylation markers from the preprocessing result according to the t-test result. Tumor methylation markers are thus screened from DNA methylation chip data, which improves the reliability and effectiveness of marker screening; the screening procedure is simple and practical and can be widely applied in the field of medical computer applications.
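The four-module structure can be pictured as a chain of callables, as in the Python sketch below. The class name `MarkerSystem`, the `SiteData` layout and the stand-in functions are all hypothetical; in particular, the stand-in t-test returns a fixed placeholder P value and performs no real test. The sketch only shows how the output of each module feeds the next.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# site identifier -> (tumor Beta values, normal Beta values); assumed layout
SiteData = Dict[str, Tuple[List[float], List[float]]]


@dataclass
class MarkerSystem:
    acquire: Callable[[], SiteData]                   # acquisition module
    preprocess: Callable[[SiteData], SiteData]        # preprocessing module
    t_test: Callable[[SiteData], Dict[str, float]]    # t-test module
    screen: Callable[[Dict[str, float]], List[str]]   # screening module

    def run(self) -> List[str]:
        samples = self.acquire()
        primary_sites = self.preprocess(samples)
        p_values = self.t_test(primary_sites)
        return self.screen(p_values)


# Stand-in modules with toy data; none of this reproduces the patent's logic.
system = MarkerSystem(
    acquire=lambda: {"site_a": ([0.8, 0.7], [0.1, 0.2]),
                     "site_b": ([0.3, 0.2], [0.3, 0.1])},
    preprocess=lambda data: {k: v for k, v in data.items() if min(v[0]) >= 0.4},
    t_test=lambda data: {k: 0.0005 for k in data},  # placeholder P values
    screen=lambda p: [k for k, v in p.items() if v < 0.001],
)
result = system.run()
```

Keeping each module behind a plain callable makes it possible to swap in a real preprocessing or t-test implementation without touching the rest of the chain.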
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, wherein the preprocessing module performs the following operations:
extracting tumor tissue sample data and normal tissue sample data from the sample data;
matching first sites of the tumor tissue sample data with second sites of the normal tissue sample data according to a preset site matching rule to obtain a plurality of identical sites;
acquiring a first Beta value matrix corresponding to each identical site in the tumor tissue sample data;
acquiring a second Beta value matrix corresponding to the identical site in the normal tissue sample data;
acquiring a preset intermediate threshold and a matrix coordinate;
acquiring a first Beta value corresponding to the matrix coordinate in the first Beta value matrix;
acquiring a second Beta value corresponding to the matrix coordinate in the second Beta value matrix;
if the first Beta value at the matrix coordinate is greater than or equal to the intermediate threshold and the second Beta value at the matrix coordinate is less than or equal to the intermediate threshold, taking the matrix coordinate as a first coordinate to be processed;
if the first Beta value at the matrix coordinate is less than the intermediate threshold or the second Beta value at the matrix coordinate is greater than the intermediate threshold, taking the matrix coordinate as a second coordinate to be processed;
acquiring a first number of the first coordinates to be processed and a second number of the second coordinates to be processed;
calculating the sensitivity of the identical site based on the first number and the second number:
[Formula image not reproduced: sensitivity S as a function of A and B]
wherein S is the sensitivity, A is the first number, and B is the second number;
if the sensitivity is greater than a preset sensitivity threshold, taking the identical site as a primary screening site;
and taking the primary screening site as the preprocessing result.
The working principle of the technical scheme is as follows:
The sample data in the methylation chip comprises tumor tissue sample data and normal tissue sample data. The preset site matching rule is as follows:
first, the Beta values of a site in the tumor tissue sample data and of a site in the normal tissue sample data are calculated:
[Formula image not reproduced: definition of the Beta value]
Beta values range from 0 to 1;
the meaning of the Beta value is:
1. any Beta value greater than or equal to 0.6 represents complete methylation;
2. any Beta value less than or equal to 0.2 represents a completely unmethylated state;
3. Beta values between 0.2 and 0.6 represent partial methylation;
all Beta values of a given site in the tumor tissue sample data are matched against all Beta values of that site in the normal tissue sample data according to the meanings above, and the site is taken as an identical site when the difference between the number of matching Beta values and the total number is within 25%;
then, the first Beta value matrix corresponding to the identical site is acquired from the tumor tissue sample data, and the second Beta value matrix corresponding to the identical site is acquired from the normal tissue sample data;
the preset intermediate threshold is chosen as follows: based on the meaning of the Beta value, the range 0.2 to 0.6 is used to distinguish methylated from unmethylated values, and one value from this range is selected as the intermediate threshold; for example, starting from 0.2 and increasing in steps of 0.05 up to 0.6 yields a set of candidate points, one of which is selected as the intermediate threshold;
the preset matrix coordinate is chosen as follows: the first Beta value matrix and the second Beta value matrix have the same numbers of rows and columns, and any coordinate within that range is selected at random;
then, if the first Beta value at the matrix coordinate in the first Beta value matrix is greater than or equal to the intermediate threshold and the second Beta value at the matrix coordinate in the second Beta value matrix is less than or equal to the intermediate threshold, the matrix coordinate is taken as a first coordinate to be processed; if the first Beta value at the matrix coordinate is less than the intermediate threshold or the second Beta value at the matrix coordinate is greater than the intermediate threshold, the matrix coordinate is taken as a second coordinate to be processed;
finally, the sensitivity of the identical site is calculated from the numbers of first and second coordinates to be processed, and each identical site whose sensitivity is greater than the preset sensitivity threshold is selected as a primary screening site.
The beneficial effects of the above technical scheme are: the preprocessing module of the embodiment of the invention obtains the identical sites by matching the sites of the tumor tissue sample data against those of the normal tissue sample data, and then uses the sensitivity to select the most valuable primary screening sites from the identical sites as the preprocessing result, which improves the accuracy of obtaining tumor methylation markers.
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, wherein the T-test module executes the following operations:
acquiring a third Beta numerical matrix corresponding to the primary screening site in the tumor tissue sample data;
acquiring a fourth Beta numerical matrix corresponding to the primary screening site in the normal tissue sample data;
carrying out pairing T test on the third Beta numerical matrix and the fourth Beta numerical matrix of the primary screening site to obtain a P value of the primary screening site;
and taking the P value as a T test result.
The working principle of the technical scheme is as follows:
acquiring a third Beta numerical matrix corresponding to the primary screening site in the tumor tissue sample data; acquiring a fourth Beta numerical matrix corresponding to the primary screening site in the normal tissue sample data; performing a paired T test on the third Beta numerical matrix and the fourth Beta numerical matrix, with the one-sided hypothesis that tumor is greater than normal; the T test is specifically: Student's T test, a method mainly used to test the difference between two means in small samples, which uses T-distribution theory to infer the probability that the difference arises and thereby judges whether the difference between the two means is significant; the P value is specifically: the probability, provided that the null hypothesis is true, of obtaining the observed sample result or a more extreme one; a traditional test needs to compare the statistic with a critical value, but the critical value changes with the distribution and the degrees of freedom, whereas the P-value approach does not require looking up a critical value, is simple, and carries more information than the critical value.
The beneficial effects of the above technical scheme are: the T-test module of the embodiment of the invention performs a T test on the preprocessing result, namely the primary screening sites, obtains the P value of each primary screening site, and takes that P value as the T test result; the T test helps to determine whether the difference in the data is significant, and the P value carries more information and involves simpler steps than a comparison against a critical value.
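The paired, one-sided T test described above can be sketched for a single site with the Python standard library only. This is a minimal sketch rather than the patented implementation: the function names are hypothetical, and the upper-tail probability is obtained by numerically integrating the Student's t density with Simpson's rule, standing in for a statistics library routine.

```python
import math
import statistics


def t_upper_tail(t, df, intervals=2000):
    """P(T > t) for Student's t with df degrees of freedom (Simpson's rule)."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / intervals
    total = pdf(0.0) + pdf(upper)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # integral of the density from 0 to |t|
    return 0.5 - area if t >= 0 else 0.5 + area


def paired_t_test_greater(tumor, normal):
    """One-sided paired t test for tumor > normal; returns (t statistic, P value)."""
    if len(tumor) != len(normal) or len(tumor) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [t - n for t, n in zip(tumor, normal)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("paired differences have zero variance")
    t_stat = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t_stat, t_upper_tail(t_stat, len(diffs) - 1)


# Illustrative Beta values for one site in four paired samples.
t_stat, p_value = paired_t_test_greater([0.8, 0.7, 0.9, 0.75],
                                        [0.2, 0.3, 0.1, 0.25])
```

The numerical integration is adequate for P values around the 0.001 threshold used in the text; for extremely small P values the subtraction from 0.5 loses precision, so a dedicated t-distribution routine would be preferable.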
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, wherein the screening module executes the following operations:
acquiring a preset P-value threshold;
if the P value of the primary screening site is less than the P-value threshold, the primary screening site is a tumor methylation marker.
The working principle of the technical scheme is as follows:
the screening module filters the primary screening site according to the size relation between a preset P-value threshold and the P value of the primary screening site; when the P value of the primary screening site is smaller than a preset P-value threshold value, selecting the primary screening site as a tumor methylation marker;
for example: setting the P-value threshold to be 0.001, and selecting a site with the P value less than 0.001 from the primary screening sites as a final tumor methylation marker.
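The filtering step amounts to a strict comparison against the threshold. A minimal Python sketch follows, in which the function name, the site identifiers and the P values are all hypothetical and serve only as an illustration:

```python
def select_markers(p_values, p_threshold=0.001):
    """Keep the primary screening sites whose P value is strictly below the threshold."""
    return [site for site, p in p_values.items() if p < p_threshold]


# Hypothetical site IDs and P values, for illustration only.
example_p_values = {"site_a": 0.0002, "site_b": 0.03, "site_c": 0.001}
markers = select_markers(example_p_values)
```

A site whose P value equals the threshold exactly is not selected, since the text requires the P value to be less than the threshold.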
The beneficial effects of the above technical scheme are: the screening module of the embodiment of the invention compares the P value of the primary screening site with the preset P-value threshold, and selects the primary screening site as the tumor methylation marker if the P value of the primary screening site is smaller than the preset P-value threshold, so that the primary screening site is further inspected and screened, and the accuracy of obtaining the tumor methylation marker is improved.
The embodiment of the invention provides a method for acquiring a tumor methylation marker based on methylation chip data, which comprises the following steps:
step S1: obtaining a methylation chip Beta numerical matrix of the sample tumor tissue and the sample normal tissue corresponding to a common locus of the sample tumor tissue and the sample normal tissue;
step S2: based on the selection of different thresholds, acquiring the sensitivity of the common sites in the tumor tissues and the normal tissues of the sample, and based on the selected sensitivity threshold, selecting the sites for preliminary screening;
step S3: based on the primary screened sites, a T-test is performed and a tumor methylation marker is screened by filtering with a preset P-value threshold.
The working principle of the technical scheme is as follows:
obtaining methylation site sensitivities under different thresholds based on a methylation chip beta numerical value matrix of the sample tumor tissue and the sample normal tissue corresponding to the common site of the sample tumor tissue and the sample normal tissue, and selecting a primary screening site based on the selected sensitivity threshold; and then carrying out T test based on the primary screened sites, and filtering by using a preset P-value threshold value to obtain filtered methylation sites, thereby realizing the screening of the tumor methylation markers.
The beneficial effects of the above technical scheme are:
the invention screens reliable and effective tumor methylation markers based on DNA methylation chip data, obtains the sensitivity of methylation sites by using different thresholds, and performs T test on the basis, thereby improving the reliability and effectiveness of the screening of the tumor methylation markers.
The embodiment of the invention provides a method for acquiring a tumor methylation marker based on methylation chip data, which comprises the following steps of S1: obtaining a methylation chip Beta numerical matrix of the sample tumor tissue and the sample normal tissue corresponding to a common locus of the sample tumor tissue and the sample normal tissue, wherein the methylation chip Beta numerical matrix comprises:
step S11: obtaining a common site of a sample tumor tissue and a sample normal tissue;
first, in the DNA methylation chip data of the paired samples, the methylation chip Beta value = intensity value from the methylated bead type / (intensity value from the methylated bead type + intensity value from the unmethylated bead type + 100); Beta values range from 0 (completely unmethylated) to 1 (completely methylated), and the specific meaning of the methylation chip Beta value is:
1. any Beta value equal to or greater than 0.6 is considered fully methylated;
2. any Beta value equal to or less than 0.2 is considered completely unmethylated;
3. Beta values between 0.2 and 0.6 are considered partially methylated;
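The Beta value definition and the three categories above can be sketched in Python as follows. The function names and the returned labels are hypothetical; the offset of 100 and the cut-offs 0.2 and 0.6 are taken from the text.

```python
OFFSET = 100  # constant added to the denominator, as stated in the text


def beta_value(methylated_intensity, unmethylated_intensity):
    """Beta = M / (M + U + 100) for the two bead-type intensities."""
    if methylated_intensity < 0 or unmethylated_intensity < 0:
        raise ValueError("intensities must be non-negative")
    return methylated_intensity / (
        methylated_intensity + unmethylated_intensity + OFFSET
    )


def methylation_state(beta):
    """Classify a Beta value using the 0.2 and 0.6 cut-offs."""
    if beta >= 0.6:
        return "fully methylated"
    if beta <= 0.2:
        return "completely unmethylated"
    return "partially methylated"


# Illustrative intensities only.
example_state = methylation_state(beta_value(400, 500))
```

The offset keeps the denominator positive even when both intensities are zero, in which case the Beta value is 0.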
specifically, according to the positions of the paired tumor sample and normal sample, the positions, in which the number of the paired samples existing in the tumor sample and the normal sample is within 25% of the total number of the paired samples, are obtained.
Step S12: based on the above sites, obtaining the methylation chip Beta numerical matrices corresponding to the sites in the sample tumor tissue and the sample normal tissue.
Specifically, a methylation chip Beta value matrix of the common sites in the tumor samples and a methylation chip Beta value matrix of the common sites in the normal samples are obtained, in which each row corresponds to the ID of one sample and each column holds the methylation chip Beta values of one site across the samples.
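Steps S11 and S12 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the input layout (a dictionary from sample ID to a dictionary from site ID to Beta value) and the function name are hypothetical, paired tumor and normal samples are assumed to share the same sample ID, and a site is treated as common only if it is present in every paired sample, which is stricter than the 25% rule mentioned in step S11.

```python
def build_beta_matrices(tumor, normal):
    """Return (sample_ids, site_ids, tumor_matrix, normal_matrix).

    tumor and normal map sample ID -> {site ID: Beta value}.
    Rows of the matrices are samples; columns are the common sites.
    """
    sample_ids = sorted(set(tumor) & set(normal))  # paired samples only
    if not sample_ids:
        raise ValueError("no paired samples found")
    common = None
    for sample in sample_ids:
        present = set(tumor[sample]) & set(normal[sample])
        common = present if common is None else common & present
    site_ids = sorted(common)
    tumor_matrix = [[tumor[s][site] for site in site_ids] for s in sample_ids]
    normal_matrix = [[normal[s][site] for site in site_ids] for s in sample_ids]
    return sample_ids, site_ids, tumor_matrix, normal_matrix


# Hypothetical sample and site IDs, for illustration only.
tumor_data = {"p1": {"cg1": 0.8, "cg2": 0.5, "cg3": 0.7},
              "p2": {"cg1": 0.9, "cg2": 0.4}}
normal_data = {"p1": {"cg1": 0.1, "cg2": 0.3},
               "p2": {"cg1": 0.2, "cg2": 0.2, "cg3": 0.1}}
samples, sites, tumor_matrix, normal_matrix = build_beta_matrices(tumor_data,
                                                                  normal_data)
```

Both matrices share the same row and column order, so the value at a given coordinate in one matrix is paired with the value at the same coordinate in the other.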
The embodiment of the invention provides a method for acquiring a tumor methylation marker based on methylation chip data, which comprises the following steps of S2: based on the selection of different thresholds, the sensitivity of the common sites in the tumor tissues and the normal tissues of the samples is obtained, and based on the selected sensitivity threshold, the sites of the primary screening are selected, which comprises the following steps:
step S21: based on the methylation chip Beta numerical matrices of the sample tumor tissue and the sample normal tissue, selecting different threshold values within a certain numerical range, such that the selected value is greater than the methylation chip Beta value in the sample normal tissue and less than the methylation chip Beta value in the sample tumor tissue;
firstly, according to the meaning of the methylation chip Beta value, the selected threshold range is 0.2 to 0.6, which is used for distinguishing methylated from unmethylated;
next, a value is selected from the threshold range such that it is greater than the methylation chip Beta value in the sample normal tissue and less than the methylation chip Beta value in the sample tumor tissue.
Step S22: based on the above condition, a site is obtained where the ratio (sensitivity) of the number of samples satisfying the condition to the total number of paired samples is greater than a selected sensitivity threshold.
Specifically, according to the selected threshold range, the primary screening sites are those sites for which the ratio of the number of samples in which the selected value is greater than the methylation chip Beta value in the sample normal tissue and less than the methylation chip Beta value in the sample tumor tissue, to the total number of paired samples, is greater than 80%.
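Step S22 can be sketched in Python as a loop over the columns of the two Beta matrices. The function name and the example data are hypothetical. The comparison uses the inclusive form (greater than or equal to, less than or equal to) given in the claims; the strict form used in this step's wording would differ only for Beta values exactly equal to the threshold.

```python
def primary_screen(site_ids, tumor_matrix, normal_matrix,
                   threshold, min_sensitivity=0.8):
    """Return the sites whose sensitivity exceeds min_sensitivity.

    Rows of both matrices are paired samples; columns follow site_ids.
    """
    n_samples = len(tumor_matrix)
    if n_samples == 0 or n_samples != len(normal_matrix):
        raise ValueError("matrices must be non-empty and paired")
    selected = []
    for j, site in enumerate(site_ids):
        hits = sum(
            1 for i in range(n_samples)
            if tumor_matrix[i][j] >= threshold and normal_matrix[i][j] <= threshold
        )
        if hits / n_samples > min_sensitivity:
            selected.append(site)
    return selected


# Illustrative data: five paired samples (rows), two hypothetical sites (columns).
example_sites = ["cg01", "cg02"]
example_tumor = [[0.8, 0.5], [0.7, 0.3], [0.9, 0.6], [0.75, 0.2], [0.65, 0.7]]
example_normal = [[0.1, 0.4], [0.2, 0.5], [0.15, 0.3], [0.3, 0.1], [0.25, 0.6]]
screened = primary_screen(example_sites, example_tumor, example_normal, 0.4)
```

In this example the first site satisfies the condition in all five pairs and is kept, while the second satisfies it in only two of five pairs and is dropped.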
The embodiment of the invention provides a method for acquiring a tumor methylation marker based on methylation chip data, which comprises the following steps of S3: performing a T-test based on the primary screened sites, and filtering with a preset P-value threshold to screen for tumor methylation markers, comprising:
step S31: acquiring the P value of the T test for each primary screened site;
specifically, based on the primary screened sites, the P value of each primary screened site is obtained from the methylation chip Beta value matrix of the sites in the tumor samples and the methylation chip Beta value matrix in the paired normal samples, using a paired T test with the one-sided hypothesis that tumor is greater than normal;
step S32: filtering with a preset P-value threshold to obtain the tumor methylation markers.
Specifically, the P-value threshold is set to be 0.001, and the site satisfying that the P value is less than 0.001 is obtained, namely the final tumor methylation marker.
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, which comprises:
a Beta numerical matrix obtaining unit for obtaining a Beta numerical matrix of the common sites in the matched sample tumor tissue and the sample normal tissue;
the sensitivity site primary screening acquisition unit selects different thresholds of a certain numerical range based on corresponding Beta numerical matrixes in the matched sample tumor tissue and sample normal tissue, and acquires a primary screening methylation site with the sensitivity greater than the selected sensitivity threshold;
and the tumor methylation marker determining unit calculates the P value condition of the T test of the sites based on the initially screened methylation sites, and filters the P value condition by using a preset P-value threshold value to obtain the tumor methylation marker.
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, wherein a Beta numerical matrix acquisition unit comprises:
a calculating subunit, configured to calculate a common site in the sample tumor tissue and the sample normal tissue;
and an obtaining subunit, configured to obtain the corresponding Beta value matrices of the common sites in the sample tumor tissue and the sample normal tissue.
The embodiment of the invention provides a system for acquiring tumor methylation markers based on methylation chip data, wherein a sensitivity site primary screening acquisition unit comprises:
a calculating subunit for calculating the sensitivity of the site in the paired sample tissues;
and an obtaining subunit, configured to obtain the primary screened methylation sites based on the sensitivity and the selected sensitivity threshold.
The embodiment of the invention provides a system for acquiring a tumor methylation marker based on methylation chip data, wherein a tumor methylation marker determining unit comprises:
a P value calculation subunit, configured to calculate the P value of the T test for each primary screening site;
and a tumor methylation marker acquisition subunit, which is used for filtering by using a preset P-value threshold value based on the P value to acquire the tumor methylation marker.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. A method for obtaining tumor methylation markers based on methylation chip data, comprising:
obtaining sample data in the methylation chip;
preprocessing the sample data;
carrying out a T test on the preprocessing result;
and screening out tumor methylation markers from the preprocessing result according to the T test result.
2. The method of claim 1, wherein the pre-processing the sample data comprises:
extracting tumor tissue sample data and normal tissue sample data in the sample data;
matching the first site of the tumor tissue sample data with the second site of the normal tissue sample data according to a preset site matching rule to obtain a plurality of identical sites;
acquiring a first Beta numerical matrix corresponding to the same site in the tumor tissue sample data;
acquiring a second Beta numerical matrix corresponding to the same site in the normal tissue sample data;
acquiring a preset intermediate threshold and a matrix coordinate;
acquiring a first Beta value corresponding to the matrix coordinate in the first Beta value matrix;
acquiring a second Beta value corresponding to the matrix coordinate in the second Beta value matrix;
if the first Beta value of the matrix coordinate is larger than or equal to the intermediate threshold value and the second Beta value of the matrix coordinate is smaller than or equal to the intermediate threshold value, taking the matrix coordinate as a first coordinate to be processed;
if the first Beta value of the matrix coordinate is smaller than the intermediate threshold value or the second Beta value of the matrix coordinate is larger than the intermediate threshold value, taking the matrix coordinate as a second coordinate to be processed;
acquiring a first number of the first coordinates to be processed and a second number of the second coordinates to be processed;
based on the first number and the second number, calculating the sensitivity of the same site:
[formula image FDA0002725061330000011, not reproduced in the text]
wherein S is sensitivity, A is a first number, and B is a second number;
if the sensitivity is greater than a preset sensitivity threshold, the same site is a primary screening site;
and taking the primary screening site as the preprocessing result.
3. The method of claim 2, wherein the T-test of the pre-processing result comprises:
acquiring a third Beta numerical matrix corresponding to the primary screening site in the tumor tissue sample data;
acquiring a fourth Beta numerical matrix corresponding to the primary screening site in the normal tissue sample data;
carrying out pairing T test on the third Beta numerical matrix and the fourth Beta numerical matrix of the primary screening site to obtain a P value of the primary screening site;
and taking the P value as a T test result.
4. The method of claim 3, wherein the screening of the tumor methylation markers from the pre-processed results according to the T test result comprises:
acquiring a preset P-value threshold;
if the P value of the primary screening site is less than the P-value threshold, the primary screening site is a tumor methylation marker.
5. A system for obtaining tumor methylation markers based on methylation chip data, comprising:
the acquisition module is used for acquiring sample data in the methylation chip;
the preprocessing module is used for preprocessing the sample data;
the T-test module is used for carrying out a T test on the preprocessing result;
and the screening module is used for screening the tumor methylation markers from the preprocessing result according to the T test result.
6. The system of claim 5, wherein the preprocessing module performs operations comprising:
extracting tumor tissue sample data and normal tissue sample data in the sample data;
matching the first site of the tumor tissue sample data with the second site of the normal tissue sample data according to a preset site matching rule to obtain a plurality of identical sites;
acquiring a first Beta numerical matrix corresponding to the same site in the tumor tissue sample data;
acquiring a second Beta numerical matrix corresponding to the same site in the normal tissue sample data;
acquiring a preset intermediate threshold and a matrix coordinate;
acquiring a first Beta value corresponding to the matrix coordinate in the first Beta value matrix;
acquiring a second Beta value corresponding to the matrix coordinate in the second Beta value matrix;
if the first Beta value of the matrix coordinate is larger than or equal to the intermediate threshold value and the second Beta value of the matrix coordinate is smaller than or equal to the intermediate threshold value, taking the matrix coordinate as a first coordinate to be processed;
if the first Beta value of the matrix coordinate is smaller than the intermediate threshold value or the second Beta value of the matrix coordinate is larger than the intermediate threshold value, taking the matrix coordinate as a second coordinate to be processed;
acquiring a first number of the first coordinates to be processed and a second number of the second coordinates to be processed;
based on the first number and the second number, calculating the sensitivity of the same site:
[formula image FDA0002725061330000031, not reproduced in the text]
wherein S is sensitivity, A is a first number, and B is a second number;
if the sensitivity is greater than a preset sensitivity threshold, the same site is a primary screening site;
and taking the primary screening site as the preprocessing result.
7. The system of claim 5, wherein the T-test module performs operations comprising:
acquiring a third Beta numerical matrix corresponding to the primary screening site in the tumor tissue sample data;
acquiring a fourth Beta numerical matrix corresponding to the primary screening site in the normal tissue sample data;
carrying out pairing T test on the third Beta numerical matrix and the fourth Beta numerical matrix of the primary screening site to obtain a P value of the primary screening site;
and taking the P value as a T test result.
8. The system of claim 5, wherein the screening module performs the following operations:
acquiring a preset P-value threshold;
if the P value of the primary screening site is less than the P-value threshold, the primary screening site is a tumor methylation marker.
CN202011100217.1A 2020-10-15 2020-10-15 Method and system for obtaining tumor methylation marker based on methylation chip data Active CN112037854B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011100217.1A CN112037854B (en) 2020-10-15 2020-10-15 Method and system for obtaining tumor methylation marker based on methylation chip data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011100217.1A CN112037854B (en) 2020-10-15 2020-10-15 Method and system for obtaining tumor methylation marker based on methylation chip data

Publications (2)

Publication Number Publication Date
CN112037854A true CN112037854A (en) 2020-12-04
CN112037854B CN112037854B (en) 2024-04-09

Family

ID=73573657

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011100217.1A Active CN112037854B (en) 2020-10-15 2020-10-15 Method and system for obtaining tumor methylation marker based on methylation chip data

Country Status (1)

Country Link
CN (1) CN112037854B (en)

Also Published As

Publication number Publication date
CN112037854B (en) 2024-04-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant