CN116758993A - DNA methylation prediction method integrating multi-omics features
- Publication number: CN116758993A
- Application number: CN202310718721.5A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/30—Detection of binding sites or motifs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
Abstract
The invention discloses a DNA methylation prediction method integrating multi-omics features. The method comprises: determining the CpG sites to be predicted; obtaining multi-omics feature sets (miRNA, mRNA and methylation) of the CpG sites in paracancerous tissue and the methylation feature set of the CpG sites in cancer tissue; calculating, based on the Pearson correlation coefficient, the correlation coefficients between the methylation feature of each cancer-tissue CpG site and the miRNA, mRNA and methylation features of the paracancerous tissue; selecting K miRNA features, Q mRNA features and L methylation features according to the values of the correlation coefficients to construct a multi-omics correlation feature set; constructing a DNA methylation prediction model based on a deep neural network and training it on the multi-omics correlation feature set; calculating an evaluation index of the trained model and, when the evaluation index meets a threshold, taking the model as the evaluated DNA methylation prediction model; and predicting the methylation of cancer tissue with the evaluated model. The method improves the prediction accuracy of DNA methylation.
Description
Technical Field
The invention relates to the field of DNA methylation prediction, and in particular to a DNA methylation prediction method integrating multi-omics features.
Background
DNA methylation in cancer tissue and paracancerous tissue is closely related to the occurrence and development of cancer, and analysis of DNA methylation variation helps reveal the molecular biological mechanisms of cancer. Predicting DNA methylation from single-omics data alone has proved difficult. With the rapid development of sequencing technology, researchers have obtained large amounts of multi-omics data, and DNA methylation in cancerous and paracancerous tissue correlates significantly with these data, so predicting DNA methylation from integrated multi-omics features is highly desirable. In summary, DNA methylation data are important for cancer research, but existing research predicts DNA methylation using only the biological information provided by single-omics features, and the resulting predictions are poor.
Disclosure of Invention
The invention provides a DNA methylation prediction method integrating multi-omics features to overcome the above technical problem.
A DNA methylation prediction method integrating multi-omics features comprises:
step one, determining the CpG sites to be predicted; obtaining multi-omics feature sets of the CpG sites in the paracancerous tissue, the multi-omics feature sets comprising a miRNA feature set, an mRNA feature set and a methylation feature set; and obtaining the methylation feature set of the CpG sites in the cancer tissue;
step two, calculating, based on the Pearson correlation coefficient, the correlation coefficients between the methylation feature of each CpG site of the cancer tissue and, respectively, the miRNA features, the mRNA features and the methylation features of the paracancerous tissue; and selecting K miRNA features from the miRNA feature set, Q mRNA features from the mRNA feature set and L methylation features from the methylation feature set, in each case according to the value of the correlation coefficient between the methylation feature of the cancer-tissue CpG site and the corresponding paracancerous-tissue feature;
step three, constructing a multi-omics correlation feature set from the K miRNA features, the Q mRNA features and the L methylation features; constructing a DNA methylation prediction model based on a deep neural network; training the model on the multi-omics correlation feature set; calculating an evaluation index of the trained model; when the evaluation index meets a threshold, taking the model as the evaluated DNA methylation prediction model; and predicting the methylation of cancer tissue with the evaluated model.
Preferably, calculating, based on the Pearson correlation coefficient, the correlation coefficients between the methylation feature of the CpG sites of the cancer tissue and the miRNA, mRNA and methylation features of the paracancerous tissue comprises calculating the correlation coefficient according to formula (1):

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}} \tag{1}$$

where $x_i$ is the CpG-site methylation feature value, mRNA feature value or miRNA feature value of the paracancerous tissue of the i-th sample in the multi-omics feature sets, and $\bar{x}$ is the mean of that feature over all samples; $y_i$ is the methylation feature value of the corresponding CpG site in the cancer tissue of the i-th sample, $\bar{y}$ is the mean methylation feature value of that CpG site over all samples, and n is the number of samples.
Preferably, the DNA methylation prediction model constructed based on the deep neural network comprises v input neurons, k hidden-layer neurons and h output-layer neurons, and the input received by the q-th neuron of the hidden layer is:

$$\alpha_q = \sum_{p=1}^{v} w_{pq}\,x_p \tag{2}$$
the output of the hidden layer qth neuron is:
where $w_{pq}$ is the weight between the p-th neuron of the input layer and the q-th neuron of the hidden layer; $x_i$ is the input vector; the input received by the r-th neuron of the output layer is the feature $b_j$ of the multi-omics correlation feature set; N is the number of samples; and $e_{hr}$ is the weight between the h-th neuron of the hidden layer and the r-th neuron of the output layer.
Preferably, calculating the evaluation index of the trained DNA methylation prediction model comprises calculating the absolute value of the Pearson correlation coefficient, R, according to formula (4), and calculating the mean absolute error, MAE, according to formula (5):

$$R = \left|\frac{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i-\mu_{\hat{y}})(y_i-\mu_{y})}{\sigma_{\hat{y}}\,\sigma_{y}}\right| \tag{4}$$

$$\mathrm{MAE} = \frac{1}{n\,m}\sum_{i=1}^{n}\sum_{j=1}^{m}\left|\hat{y}_{ij}-y_{ij}\right| \tag{5}$$

where $\hat{y}_i$ and $y_i$ are the predicted DNA methylation value and the actual DNA methylation feature value of the i-th sample, $\mu_{\hat{y}}$ and $\mu_{y}$ are the predicted mean and the actual mean, and $\sigma_{\hat{y}}$ and $\sigma_{y}$ are the corresponding standard deviations; $\hat{y}_{ij}$ and $y_{ij}$ are the predicted and actual DNA methylation values of the j-th feature of the i-th sample, n is the number of samples and m is the number of predicted features.
Preferably, the parameters of the DNA methylation prediction model can also be optimized by ten-fold cross-validation.
The invention provides a DNA methylation prediction method integrating multi-omics features. Multi-omics features related to the target CpG sites are extracted with a feature selection method, and a model that integrates these features to predict the DNA methylation level of cancer tissue is then established. By comparing performance indexes such as the mean absolute error, the influence of key parameters such as the neural network structure and the number of features on the performance of the DNA methylation prediction model is analyzed, the model parameters are optimized, and the accuracy of DNA methylation prediction is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it will be obvious that the drawings in the following description are some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is an implementation flow chart of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
FIG. 1 is a flowchart of the method of the present invention. As shown in FIG. 1, the method of this embodiment may include:
step one, determining the CpG sites to be predicted; obtaining multi-omics feature sets of the CpG sites in the paracancerous tissue, the multi-omics feature sets comprising a miRNA feature set, an mRNA feature set and a methylation feature set; and obtaining the methylation feature set of the CpG sites in the cancer tissue;
step two, calculating, based on the Pearson correlation coefficient, the correlation coefficients between the methylation feature of each CpG site of the cancer tissue and, respectively, the miRNA features, the mRNA features and the methylation features of the paracancerous tissue; and selecting K miRNA features from the miRNA feature set, Q mRNA features from the mRNA feature set and L methylation features from the methylation feature set, in each case according to the value of the correlation coefficient between the methylation feature of the cancer-tissue CpG site and the corresponding paracancerous-tissue feature;
step three, constructing a multi-omics correlation feature set from the K miRNA features, the Q mRNA features and the L methylation features; constructing a DNA methylation prediction model based on a deep neural network; training the model on the multi-omics correlation feature set; calculating an evaluation index of the trained model; when the evaluation index meets a threshold, taking the model as the evaluated DNA methylation prediction model; and predicting the methylation of cancer tissue with the evaluated model.
Based on the above scheme, multi-omics features related to the target CpG sites are extracted with a feature selection method, and a model that integrates these features to predict the DNA methylation level of cancer tissue is established. By comparing performance indexes such as the mean absolute error, the influence of key parameters such as the neural network structure and the number of features on the performance of the DNA methylation prediction model is analyzed, the model parameters are optimized, and the accuracy of DNA methylation prediction is improved.
Step one: determine the CpG sites to be predicted; obtain multi-omics feature sets of the CpG sites in the paracancerous tissue, the multi-omics feature sets comprising a miRNA feature set, an mRNA feature set and a methylation feature set; and obtain the methylation feature set of the CpG sites in the cancer tissue.
Specifically, a feature selection method is first used to extract the multi-omics features:
The miRNA data matrix is defined as $I_i = (\mathrm{miRNA}_{i1}, \mathrm{miRNA}_{i2}, \ldots, \mathrm{miRNA}_{ia})$,
the mRNA data matrix as $N_i = (\mathrm{mRNA}_{i1}, \mathrm{mRNA}_{i2}, \ldots, \mathrm{mRNA}_{ib})$,
and the methylation data matrix as $M_i = (\mathrm{CpG}_{i1}, \mathrm{CpG}_{i2}, \ldots, \mathrm{CpG}_{ic})$,
where i indexes samples (n samples in total) and j indexes features (a miRNA features, b mRNA features and c methylation features in total).
Step two: calculate, based on the Pearson correlation coefficient, the correlation coefficients between the methylation feature of the CpG sites of the cancer tissue and, respectively, the miRNA features, the mRNA features and the methylation features of the paracancerous tissue. The correlation coefficient is calculated according to formula (1):

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}} \tag{1}$$

where $x_i$ is the CpG-site methylation feature value, mRNA feature value or miRNA feature value of the paracancerous tissue of the i-th sample in the multi-omics feature sets, and $\bar{x}$ is the mean of that feature over all samples; $y_i$ is the methylation feature value of the corresponding CpG site in the cancer tissue of the i-th sample, $\bar{y}$ is the mean methylation feature value of that CpG site over all samples, and n is the number of samples.
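As an illustration of formula (1), the following is a minimal NumPy sketch that computes the Pearson correlation between every paracancerous-tissue feature and the methylation values of one cancer-tissue CpG site. The function name `pearson_columns`, the toy data and the choice to report 0 for constant features are assumptions made for this example and are not taken from the patent.

```python
import numpy as np

def pearson_columns(X, y):
    """Pearson correlation between each column of X and the vector y.

    X: (n, d) paracancerous-tissue features (one column per feature).
    y: (n,) methylation values of one target CpG site in cancer tissue.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)            # x_i - x_bar, per feature
    yc = y - y.mean()                  # y_i - y_bar
    num = Xc.T @ yc                    # sum_i (x_i - x_bar)(y_i - y_bar)
    den = np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    # Assumption: a constant feature (zero variance) is reported as r = 0.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

# Toy data (hypothetical): 5 samples, 3 features.
X = np.array([[1.0, 5.0, 2.0],
              [2.0, 4.0, 2.0],
              [3.0, 3.0, 2.0],
              [4.0, 2.0, 2.0],
              [5.0, 1.0, 2.0]])
y = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
r = pearson_columns(X, y)
print(r)
```

In this toy case the first feature rises with the target, the second falls with it and the third is constant, so the three coefficients are close to 1, -1 and 0 respectively.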
K miRNA features are selected from the miRNA feature set, Q mRNA features from the mRNA feature set and L methylation features from the methylation feature set, in each case according to the value of the correlation coefficient between the methylation feature of the cancer-tissue CpG site and the corresponding paracancerous-tissue feature. Specifically, after the correlation coefficients between each CpG site of the cancer tissue (the target site) and the three omics feature sets have been calculated, for each target site the top K miRNA features with the highest correlation coefficient values are selected according to formula (2), the top Q mRNA features according to formula (3), and the top L methylation features according to formula (4):

$$\mathrm{top}_{\mathrm{miRNA}}\left(\mathrm{CpG}^{j}_{\mathrm{target}}\right) = (\mathrm{miRNA}_1, \ldots, \mathrm{miRNA}_K), \quad j = 1, \ldots, m \tag{2}$$

$$\mathrm{top}_{\mathrm{mRNA}}\left(\mathrm{CpG}^{j}_{\mathrm{target}}\right) = (\mathrm{mRNA}_1, \ldots, \mathrm{mRNA}_Q), \quad j = 1, \ldots, m \tag{3}$$

$$\mathrm{top}_{\mathrm{methyl}}\left(\mathrm{CpG}^{j}_{\mathrm{target}}\right) = (\mathrm{CpG}_1, \ldots, \mathrm{CpG}_L), \quad j = 1, \ldots, m \tag{4}$$

where $\mathrm{CpG}^{j}_{\mathrm{target}}$ is the j-th target site to be predicted and m is the number of target sites; $\mathrm{miRNA}_1, \ldots, \mathrm{miRNA}_K$ are the K miRNA features most correlated with $\mathrm{CpG}^{j}_{\mathrm{target}}$, $\mathrm{mRNA}_1, \ldots, \mathrm{mRNA}_Q$ are the Q most correlated mRNA features, and $\mathrm{CpG}_1, \ldots, \mathrm{CpG}_L$ are the L most correlated methylation features. The total number of features required for each target site is v = K + Q + L.
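The top-K, top-Q and top-L selection and the concatenation into a v-dimensional input can be sketched as follows. This is an illustrative sketch only: the names `top_indices` and `build_input` are hypothetical, the correlation vectors are made-up numbers, and ranking by the absolute value of the coefficient is an assumption, since the text only says that features with high correlation coefficient values are selected.

```python
import numpy as np

def top_indices(r, count):
    """Indices of the `count` features with the largest |r|, strongest first.

    Assumption: features are ranked by absolute correlation.
    """
    order = np.argsort(-np.abs(np.asarray(r, dtype=float)), kind="stable")
    return order[:count]

def build_input(mirna, mrna, cpg, r_mirna, r_mrna, r_cpg, K, Q, L):
    """Concatenate the selected columns into an (n, K + Q + L) input matrix."""
    return np.hstack([
        mirna[:, top_indices(r_mirna, K)],
        mrna[:, top_indices(r_mrna, Q)],
        cpg[:, top_indices(r_cpg, L)],
    ])

# Toy paracancerous-tissue matrices (hypothetical): n = 6 samples.
rng = np.random.default_rng(0)
n = 6
mirna = rng.normal(size=(n, 4))
mrna = rng.normal(size=(n, 5))
cpg = rng.normal(size=(n, 3))

# Made-up correlation coefficients with one target CpG site.
r_mirna = np.array([0.1, -0.9, 0.5, 0.0])
r_mrna = np.array([0.2, 0.3, -0.8, 0.7, 0.1])
r_cpg = np.array([0.6, -0.4, 0.95])

X_in = build_input(mirna, mrna, cpg, r_mirna, r_mrna, r_cpg, K=2, Q=2, L=1)
print(X_in.shape)  # (6, 5): v = K + Q + L = 5
```

Because the selection is done per target site, a separate input matrix of this kind would be built for each of the m target sites.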
Step three: construct a multi-omics correlation feature set from the K miRNA features, the Q mRNA features and the L methylation features, construct a DNA methylation prediction model based on a deep neural network, and train the model on the multi-omics correlation feature set. The DNA methylation prediction model takes the multi-omics correlation features $(\mathrm{miRNA}_1, \ldots, \mathrm{miRNA}_K, \mathrm{mRNA}_1, \ldots, \mathrm{mRNA}_Q, \mathrm{CpG}_1, \ldots, \mathrm{CpG}_L)$ as input data; the dimension of the input vector is v and the dimension of the output vector is h.
The DNA methylation prediction model is constructed based on a deep neural network and comprises v input neurons, k hidden-layer neurons and h output-layer neurons, wherein the input received by the q-th neuron of the hidden layer is:

$$\alpha_q = \sum_{p=1}^{v} w_{pq}\,x_p \tag{5}$$
the output of the hidden layer qth neuron is:
where $w_{pq}$ is the weight between the p-th neuron of the input layer and the q-th neuron of the hidden layer; $x_i$ is the input vector; the input received by the r-th neuron of the output layer is the feature $b_j$ of the multi-omics correlation feature set; N is the number of samples; and $e_{hr}$ is the weight between the h-th neuron of the hidden layer and the r-th neuron of the output layer.
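To make the layer structure concrete, here is a minimal forward-pass sketch for a network with v inputs, k hidden neurons and h outputs, using a weight matrix `W` for the input-to-hidden weights and `E` for the hidden-to-output weights. The sigmoid activation, the omission of bias terms, the single hidden layer and the random toy weights are all assumptions of this sketch; the text does not specify them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W, E):
    """Forward pass of a one-hidden-layer network (illustrative only).

    X: (n, v) input features, one row per sample.
    W: (v, k) input-to-hidden weights (entry [p, q] plays the role of w_pq).
    E: (k, h) hidden-to-output weights (entry [hidden, r] plays the role of e_hr).
    """
    hidden_in = X @ W                  # weighted sum received by each hidden neuron
    hidden_out = sigmoid(hidden_in)    # assumption: sigmoid activation, no bias
    output_in = hidden_out @ E         # weighted sum received by each output neuron
    return sigmoid(output_in)          # assumption: outputs scaled to (0, 1)

# Toy dimensions (hypothetical): 4 samples, v = 6, k = 3, h = 2.
rng = np.random.default_rng(0)
n, v, k, h = 4, 6, 3, 2
X = rng.normal(size=(n, v))
W = rng.normal(scale=0.1, size=(v, k))
E = rng.normal(scale=0.1, size=(k, h))
Y_hat = forward(X, W, E)
print(Y_hat.shape)  # (4, 2)
```

A sigmoid output is a convenient choice here because methylation levels are commonly expressed as fractions between 0 and 1, but any other output scaling could be substituted.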
The evaluation index of the trained DNA methylation prediction model is then calculated. It comprises the absolute value of the Pearson correlation coefficient, R, calculated according to formula (7), and the mean absolute error, MAE, calculated according to formula (8):

$$R = \left|\frac{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i-\mu_{\hat{y}})(y_i-\mu_{y})}{\sigma_{\hat{y}}\,\sigma_{y}}\right| \tag{7}$$

$$\mathrm{MAE} = \frac{1}{n\,m}\sum_{i=1}^{n}\sum_{j=1}^{m}\left|\hat{y}_{ij}-y_{ij}\right| \tag{8}$$

where $\hat{y}_i$ and $y_i$ are the predicted DNA methylation value and the actual DNA methylation feature value of the i-th sample, $\mu_{\hat{y}}$ and $\mu_{y}$ are the predicted mean and the actual mean, and $\sigma_{\hat{y}}$ and $\sigma_{y}$ are the corresponding standard deviations; $\hat{y}_{ij}$ and $y_{ij}$ are the predicted and actual DNA methylation values of the j-th feature of the i-th sample, n is the number of samples and m is the number of predicted features.
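The two evaluation indexes can be computed in a few lines of NumPy. The sketch below is illustrative: the function names and toy values are hypothetical, and the population standard deviation (division by n) is an assumption about the normalization.

```python
import numpy as np

def abs_pearson(y_pred, y_true):
    """Absolute value of the Pearson correlation between predictions and truth."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    cov = np.mean((y_pred - y_pred.mean()) * (y_true - y_true.mean()))
    # Assumption: population standard deviations (NumPy default, ddof=0).
    return abs(cov / (y_pred.std() * y_true.std()))

def mean_absolute_error(Y_pred, Y_true):
    """Mean of |prediction - truth| over all samples and predicted features."""
    diff = np.asarray(Y_pred, dtype=float) - np.asarray(Y_true, dtype=float)
    return float(np.mean(np.abs(diff)))

# Toy values (hypothetical): predictions perfectly anti-correlated with truth.
y_true = np.array([0.2, 0.4, 0.6, 0.8])
y_pred = np.array([0.9, 0.7, 0.5, 0.3])
R = abs_pearson(y_pred, y_true)
mae = mean_absolute_error(y_pred, y_true)
print(R, mae)
```

The toy case shows why both indexes are reported together: R is close to 1 because it only measures linear association and discards the sign, while the MAE of about 0.4 exposes that the predictions are far from the actual values.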
When the evaluation index meets the threshold, the evaluated DNA methylation prediction model is obtained and used to predict the methylation of cancer tissue. The parameters of the DNA methylation prediction model can be optimized by ten-fold cross-validation. Specifically, the data set is divided into 10 mutually exclusive subsets of equal size, with the consistency of the data maintained during division; each time, the union of 9 subsets is used as the training set and the remaining subset as the test set; training is carried out 10 times, and the average of the 10 results is taken.
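The ten-fold procedure described above can be sketched as follows. This is a hedged illustration: `ten_fold_splits`, `cross_validate` and the `mean_baseline` stand-in model are hypothetical names, the score is taken to be the mean absolute error, and the split is a plain random partition, so it does not implement any particular notion of keeping the data consistent across subsets.

```python
import numpy as np

def ten_fold_splits(n_samples, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs over mutually exclusive folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx

def cross_validate(X, y, fit_predict, n_folds=10):
    """Average the test-fold MAE of `fit_predict` over all folds."""
    scores = []
    for train_idx, test_idx in ten_fold_splits(len(y), n_folds):
        y_hat = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(np.mean(np.abs(y_hat - y[test_idx])))
    return float(np.mean(scores))

def mean_baseline(X_train, y_train, X_test):
    """Stand-in model (assumption): always predict the training mean."""
    return np.full(len(X_test), y_train.mean())

# Toy data (hypothetical): 20 samples with a constant target.
X = np.arange(40, dtype=float).reshape(20, 2)
y = np.full(20, 0.5)
score = cross_validate(X, y, mean_baseline)
print(score)
```

The folds are exactly equal in size only when the number of samples is divisible by 10; otherwise `np.array_split` makes them differ by at most one sample. To compare network structures or feature counts, `cross_validate` would be called once per candidate configuration and the configuration with the best averaged score kept.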
In the mathematical model studied in this embodiment, predicting DNA methylation data in the target tissue from multi-omics data in the surrogate tissue can be decomposed into the following technical steps; the implementation flow chart is shown in FIG. 2:
matching and preprocessing the miRNA, mRNA and DNA methylation omics data of the cancer tissue and the paracancerous tissue;
performing feature extraction and fusion on the multi-omics data using a correlation-based feature selection strategy;
establishing a mathematical model of DNA methylation between the paracancerous tissue and the cancer tissue;
predicting DNA methylation data of the target tissue using the integrated multi-omics data of the surrogate tissue;
analyzing the influence of key parameters of the deep learning model, such as the number of layers, the number of neurons and the feature dimension, on model performance, and optimizing the model parameters by ten-fold cross-validation: the data set is divided into 10 mutually exclusive subsets of equal size, with the consistency of the data maintained during division; each time, the union of 9 subsets is used as the training set and the remaining subset as the test set; training is carried out 10 times, and the average of the 10 results is taken.
The absolute value of the Pearson correlation coefficient (R) and the mean absolute error (MAE) are used to evaluate the predictive performance of the model.
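The first step in the list above, matching samples across the miRNA, mRNA and methylation data, can be sketched as below. The text does not say how matching is performed, so this sketch assumes that each data matrix comes with a list of sample identifiers and that only samples present in every layer are kept; the function names and the identifiers `P1` to `P5` are hypothetical.

```python
import numpy as np

def common_samples(*id_lists):
    """Sample IDs present in every omics layer, in sorted order."""
    shared = set(id_lists[0]).intersection(*id_lists[1:])
    return sorted(shared)

def align_rows(matrix, ids, keep):
    """Reorder the rows of `matrix` (one row per ID in `ids`) to follow `keep`."""
    row_of = {sample_id: row for row, sample_id in enumerate(ids)}
    return np.asarray(matrix)[[row_of[s] for s in keep]]

# Toy identifiers (hypothetical): each layer covers a different set of samples.
mirna_ids = ["P1", "P2", "P3"]
mrna_ids = ["P3", "P1", "P4"]
cpg_ids = ["P1", "P3", "P5"]
keep = common_samples(mirna_ids, mrna_ids, cpg_ids)

mrna = np.array([[30.0], [10.0], [40.0]])   # rows follow mrna_ids
mrna_aligned = align_rows(mrna, mrna_ids, keep)
print(keep, mrna_aligned.ravel())
```

After alignment, row i of every matrix refers to the same sample, which is what the per-sample sums in the correlation and evaluation formulas require. The same alignment would be applied between the cancer-tissue and paracancerous-tissue data.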
Overall beneficial effects:
The invention provides a DNA methylation prediction method integrating multi-omics features. Multi-omics features related to the target CpG sites are extracted with a feature selection method, and a model that integrates these features to predict the DNA methylation level of cancer tissue is then established. By comparing performance indexes such as the mean absolute error, the influence of key parameters such as the neural network structure and the number of features on the performance of the DNA methylation prediction model is analyzed, the model parameters are optimized, and the accuracy of DNA methylation prediction is improved.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.
Claims (5)
1. A DNA methylation prediction method integrating multi-omics features, characterized by comprising the following steps:
step one, determining the CpG sites to be predicted; obtaining multi-omics feature sets of the CpG sites in the paracancerous tissue, the multi-omics feature sets comprising a miRNA feature set, an mRNA feature set and a methylation feature set; and obtaining the methylation feature set of the CpG sites in the cancer tissue;
step two, calculating, based on the Pearson correlation coefficient, the correlation coefficients between the methylation feature of each CpG site of the cancer tissue and, respectively, the miRNA features, the mRNA features and the methylation features of the paracancerous tissue; and selecting K miRNA features from the miRNA feature set, Q mRNA features from the mRNA feature set and L methylation features from the methylation feature set, in each case according to the value of the correlation coefficient between the methylation feature of the cancer-tissue CpG site and the corresponding paracancerous-tissue feature;
step three, constructing a multi-omics correlation feature set from the K miRNA features, the Q mRNA features and the L methylation features; constructing a DNA methylation prediction model based on a deep neural network; training the model on the multi-omics correlation feature set; calculating an evaluation index of the trained model; when the evaluation index meets a threshold, taking the model as the evaluated DNA methylation prediction model; and predicting the methylation of cancer tissue with the evaluated model.
2. The DNA methylation prediction method integrating multi-omics features according to claim 1, wherein calculating, based on the Pearson correlation coefficient, the correlation coefficients between the methylation feature of the CpG sites of the cancer tissue and, respectively, the miRNA features, the mRNA features and the methylation features of the paracancerous tissue comprises calculating the correlation coefficient according to formula (1):

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}} \tag{1}$$

where $x_i$ is the CpG-site methylation feature value, mRNA feature value or miRNA feature value of the paracancerous tissue of the i-th sample in the multi-omics feature sets, and $\bar{x}$ is the mean of that feature over all samples; $y_i$ is the methylation feature value of the corresponding CpG site in the cancer tissue of the i-th sample, $\bar{y}$ is the mean methylation feature value of that CpG site over all samples, and n is the number of samples.
3. The DNA methylation prediction method integrating multiple sets of chemical characteristics according to claim 1, wherein the DNA methylation prediction model constructed based on a deep neural network comprises v input-layer neurons, k hidden-layer neurons and h output-layer neurons, and the input received by the q-th neuron of the hidden layer is:
the output of the q-th neuron of the hidden layer is:
wherein $w_{pq}$ is the weight between the p-th neuron of the input layer and the q-th neuron of the hidden layer; $x_i$ is the input vector; the input received by the r-th neuron of the output layer is the feature $b_j$ of the multiple sets of correlated feature sets; N is the number of samples; and $e_{hr}$ is the weight between the h-th neuron of the hidden layer and the r-th neuron of the output layer.
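The layer structure described in claim 3 can be illustrated with a small forward pass. This is a sketch under stated assumptions, not the patented network. The sigmoid activation, the absence of bias or threshold terms, and the names `sigmoid` and `forward` are assumptions introduced here, since the formulas for the neuron inputs and outputs are not reproduced in the text. Only the weight naming (`w` for input to hidden, `e` for hidden to output) follows the claim.

```python
import math


def sigmoid(z):
    """Logistic activation (an assumed choice; the claim does not name one)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w, e):
    """Forward pass through one hidden layer.

    x: input vector of length v (selected multi-omics features).
    w: v-by-k matrix, w[p][q] = weight from input neuron p to hidden neuron q.
    e: k-by-h matrix, e[q][r] = weight from hidden neuron q to output neuron r.
    Returns hidden inputs, hidden outputs, output-layer inputs, and outputs.
    """
    v, k, h = len(x), len(w[0]), len(e[0])
    # Input received by each hidden neuron: weighted sum of the input vector.
    hidden_in = [sum(w[p][q] * x[p] for p in range(v)) for q in range(k)]
    hidden_out = [sigmoid(a) for a in hidden_in]
    # Input received by each output neuron: weighted sum of hidden outputs.
    output_in = [sum(e[q][r] * hidden_out[q] for q in range(k)) for r in range(h)]
    output = [sigmoid(t) for t in output_in]
    return hidden_in, hidden_out, output_in, output


# Toy network: v = 2 inputs, k = 2 hidden neurons, h = 1 output.
x = [1.0, 2.0]
w = [[0.5, -0.5],
     [0.25, 0.25]]
e = [[1.0],
     [2.0]]
hidden_in, hidden_out, output_in, output = forward(x, w, e)
print(hidden_in, output)
```

A sigmoid on the output layer keeps predictions in the interval (0, 1), which matches the usual range of methylation beta values. A linear output layer would be an equally reasonable assumption.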
4. The method for DNA methylation prediction integrated with multiple sets of chemical features according to claim 1, wherein calculating the evaluation index of the trained DNA methylation prediction model comprises calculating the absolute value of the Pearson correlation coefficient according to formula (4), and calculating the mean absolute error between the predicted and actual DNA methylation values according to formula (5),
wherein $\hat{y}_i$ and $y_i$ represent the predicted DNA methylation value and the actual DNA methylation feature value of the i-th sample, respectively; $\bar{\hat{y}}$ and $\bar{y}$ represent the predicted mean and the actual mean, respectively; $\sigma_{\hat{y}}$ and $\sigma_{y}$ represent the predicted standard deviation and the actual standard deviation, respectively; and $\hat{y}_{ij}$ and $y_{ij}$ represent the predicted DNA methylation value and the actual DNA methylation value of the j-th feature of the i-th sample, respectively.
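The two evaluation indices named in claim 4 can be sketched as below. Formulas (4) and (5) are not reproduced in the text, so this example assumes the standard definitions: the absolute Pearson coefficient between predicted and actual values across samples, and the mean absolute error averaged over all samples and features. The function names and the toy values are hypothetical.

```python
import math


def abs_pearson(pred, actual):
    """Absolute Pearson correlation between predicted and actual values."""
    n = len(pred)
    mean_p, mean_a = sum(pred) / n, sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(pred, actual))
    dev_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    dev_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual))
    if dev_p == 0 or dev_a == 0:
        return 0.0  # assumption: undefined correlation is reported as 0
    return abs(cov / (dev_p * dev_a))


def mean_absolute_error(pred, actual):
    """MAE over a samples-by-features table of methylation values."""
    total = sum(
        abs(p - a)
        for pred_row, actual_row in zip(pred, actual)
        for p, a in zip(pred_row, actual_row)
    )
    count = sum(len(row) for row in actual)
    return total / count


# One CpG site across three samples (toy values).
pred_site = [0.2, 0.5, 0.9]
actual_site = [0.1, 0.5, 0.8]
pcc = abs_pearson(pred_site, actual_site)

# Two samples, two CpG sites each (toy values).
pred = [[0.2, 0.8], [0.5, 0.4]]
actual = [[0.1, 0.9], [0.5, 0.6]]
mae = mean_absolute_error(pred, actual)
print(pcc, mae)
```

The sum-of-deviations form used in `abs_pearson` gives the same value as the form written with means and standard deviations, because the sample-count factors cancel. A higher correlation and a lower error both indicate a better fit, so a threshold check would compare them in opposite directions.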
5. The method for predicting DNA methylation integrating multiple sets of chemical features according to claim 1, wherein the parameters of the DNA methylation prediction model are optimized by ten-fold cross-validation.
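Ten-fold cross-validation as mentioned in claim 5 can be sketched as follows. This is an illustrative outline only. The claim does not say which parameters are tuned or how folds are drawn, so the shuffled split, the candidate hidden-layer sizes, and the scoring callback `score_fn` are all assumptions, and the function names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Split sample indices into k (train, test) pairs after shuffling."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


def select_by_cv(n_samples, candidates, score_fn, k=10):
    """Pick the candidate with the best mean validation score over k folds.

    score_fn(candidate, train_idx, test_idx) is a user-supplied callback
    that trains on train_idx and returns a score on test_idx
    (higher is better).
    """
    splits = k_fold_indices(n_samples, k)
    best, best_score = None, float("-inf")
    for candidate in candidates:
        mean_score = sum(score_fn(candidate, tr, te) for tr, te in splits) / k
        if mean_score > best_score:
            best, best_score = candidate, mean_score
    return best


def dummy_score(hidden_size, train_idx, test_idx):
    """Stand-in for real training and evaluation: peaks at 16 hidden neurons."""
    return -abs(hidden_size - 16)


splits = k_fold_indices(25)
best = select_by_cv(25, candidates=[8, 16, 32], score_fn=dummy_score)
print(len(splits), best)
```

In practice `score_fn` would train the network on the training indices and return, for example, the absolute Pearson correlation on the held-out fold. Every sample appears in exactly one test fold, so each candidate is scored on all of the data without ever being tested on samples it was trained on.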
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310718721.5A CN116758993A (en) | 2023-06-16 | 2023-06-16 | DNA methylation prediction method integrating multiple groups of chemical characteristics |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116758993A | 2023-09-15 |
Family
ID=87954777
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310718721.5A Pending CN116758993A (en) | 2023-06-16 | 2023-06-16 | DNA methylation prediction method integrating multiple groups of chemical characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116758993A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
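CN117894452A (en) * | 2024-01-16 | 2024-04-16 | Sun Yat-sen University Cancer Center (Affiliated Cancer Hospital of Sun Yat-sen University, Cancer Institute of Sun Yat-sen University) | Unknown primary tumor primary range prediction method and system based on DenseFile model |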
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||