CN116758993A - DNA methylation prediction method integrating multiple groups of chemical characteristics - Google Patents

Info

Publication number
CN116758993A
Authority
CN
China
Prior art keywords
methylation
characteristic
dna methylation
prediction model
chemical
Legal status
Pending
Application number
CN202310718721.5A
Other languages
Chinese (zh)
Inventor
马宝山
申忆文
刘玉
Current Assignee
Dalian Maritime University
Original Assignee
Dalian Maritime University
Priority date
2023-06-16
Filing date
2023-06-16
Publication date
2023-09-15
Application filed by Dalian Maritime University
Priority to CN202310718721.5A
Publication of CN116758993A
Legal status: Pending

Classifications

    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/04 Neural networks: architecture, e.g. interconnection topology
    • G06N3/08 Neural networks: learning methods
    • G16B20/30 Detection of binding sites or motifs
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Biotechnology (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Physiology (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Public Health (AREA)
  • Genetics & Genomics (AREA)
  • Analytical Chemistry (AREA)
  • Bioethics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a DNA methylation prediction method integrating multi-omics features. The method comprises: determining the CpG sites to be predicted; obtaining the multi-omics feature sets (miRNA, mRNA and DNA methylation) of the CpG sites in paracancerous tissue and the methylation feature set of the CpG sites in cancer tissue; calculating, based on the Pearson correlation coefficient, the correlation coefficients between the methylation feature of each CpG site in the cancer tissue and the miRNA, mRNA and methylation features of the paracancerous tissue; selecting K miRNA, Q mRNA and L methylation features according to the values of these coefficients to construct a multi-omics correlated feature set; constructing a DNA methylation prediction model based on a deep neural network and training it on that feature set; calculating an evaluation index of the trained model and accepting the model when the evaluation index meets a threshold value; and predicting the methylation of the cancer tissue with the accepted model. The method improves the accuracy of DNA methylation prediction.

Description

DNA methylation prediction method integrating multiple groups of chemical characteristics
Technical Field
The invention relates to the field of DNA methylation prediction, and in particular to a DNA methylation prediction method integrating multi-omics features.
Background
DNA methylation in cancer tissue and paracancerous tissue is closely related to the occurrence and development of cancer, and analysing DNA methylation variation helps reveal the molecular-biological mechanisms of cancer. Predicting DNA methylation from a single omics data type is difficult. With the rapid development of sequencing technology, researchers have obtained large amounts of multi-omics data, and DNA methylation in cancer and paracancerous tissue is significantly correlated with several of these data types, so it is necessary to predict DNA methylation by integrating multi-omics features. In summary, DNA methylation data are important for cancer research, but existing studies predict DNA methylation using only the biological information provided by a single omics data type, and their prediction performance is poor.
Disclosure of Invention
The invention provides a DNA methylation prediction method integrating multi-omics features to overcome the above technical problems.
A DNA methylation prediction method integrating multi-omics features comprises the following steps:
Step 1: determine the CpG sites to be predicted; obtain the multi-omics feature sets of the CpG sites in paracancerous tissue, comprising a miRNA feature set, an mRNA feature set and a methylation feature set; and obtain the methylation feature set of the CpG sites in cancer tissue.
Step 2: based on the Pearson correlation coefficient, calculate the correlation coefficients between the methylation feature of each CpG site in the cancer tissue and the miRNA, mRNA and methylation features of the paracancerous tissue; according to the values of these coefficients, select K miRNA features from the miRNA feature set, Q mRNA features from the mRNA feature set and L methylation features from the methylation feature set.
Step 3: construct a multi-omics correlated feature set from the K miRNA features, Q mRNA features and L methylation features; construct a DNA methylation prediction model based on a deep neural network and train it on this feature set; calculate an evaluation index of the trained model; accept the model when the evaluation index meets a threshold value; and predict the methylation of cancer tissue with the accepted model.
Preferably, calculating the correlation coefficients between the methylation feature of the CpG sites of the cancer tissue and the miRNA, mRNA and methylation features of the paracancerous tissue based on the Pearson correlation coefficient comprises calculating the correlation coefficient according to formula (1):
$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \tag{1}$$
where $x_i$ is the CpG-site methylation value, mRNA value or miRNA value of the paracancerous tissue of the i-th sample in the multi-omics feature sets and $\bar{x}$ is the mean of that feature over all samples; $y_i$ is the methylation value of the corresponding CpG site in the cancer tissue of the i-th sample and $\bar{y}$ is the mean methylation value of that CpG site over all samples; and n is the number of samples.
Preferably, the DNA methylation prediction model constructed based on the deep neural network comprises v input neurons, k hidden-layer neurons and h output-layer neurons. The input received by the q-th hidden-layer neuron is:
$$\alpha_q = \sum_{p=1}^{v} w_{pq}\, x_p$$
and the output of the q-th hidden-layer neuron is:
$$b_q = f(\alpha_q)$$
where $w_{pq}$ is the weight between the p-th input-layer neuron and the q-th hidden-layer neuron, $\mathbf{x} = (x_1, \dots, x_v)$ is the input vector formed by the multi-omics correlated features of a sample (n samples in total), and f is the hidden-layer activation function. The input received by the r-th output-layer neuron is
$$\beta_r = \sum_{q=1}^{k} e_{qr}\, b_q$$
where $e_{qr}$ is the weight between the q-th hidden-layer neuron and the r-th output-layer neuron.
Preferably, calculating the evaluation indexes of the trained DNA methylation prediction model comprises calculating the absolute value of the Pearson correlation coefficient according to formula (4) and the mean absolute error according to formula (5):
$$R = \left|\frac{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i-\bar{\hat{y}})(y_i-\bar{y})}{\sigma_{\hat{y}}\,\sigma_{y}}\right| \tag{4}$$
$$\mathrm{MAE} = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\left|\hat{y}_{ij}-y_{ij}\right| \tag{5}$$
where $\hat{y}_i$ and $y_i$ are the predicted and actual DNA methylation values of the i-th sample, $\bar{\hat{y}}$ and $\bar{y}$ are the predicted and actual means, $\sigma_{\hat{y}}$ and $\sigma_{y}$ are the predicted and actual standard deviations, $\hat{y}_{ij}$ and $y_{ij}$ are the predicted and actual DNA methylation values of the j-th feature of the i-th sample, n is the number of samples and m is the number of predicted features (target CpG sites).
Preferably, the parameters of the DNA methylation prediction model are optimized by ten-fold cross-validation.
The invention provides a DNA methylation prediction method integrating multi-omics features. Multi-omics features related to the target CpG sites are extracted with a feature selection method, and a model integrating these features is then established to predict the DNA methylation level of cancer tissue. By comparing performance indexes such as the mean absolute error, the influence of key parameters, such as the neural network structure and the number of features, on the performance of the prediction model is analysed and the model parameters are optimized, which improves the accuracy of DNA methylation prediction.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it will be obvious that the drawings in the following description are some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 is a flowchart of the method of the present invention;
FIG. 2 is an implementation flowchart of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
FIG. 1 is a flowchart of the method of the present invention. As shown in FIG. 1, the method of this embodiment may include the following steps:
Step 1: determine the CpG sites to be predicted; obtain the multi-omics feature sets of the CpG sites in paracancerous tissue, comprising a miRNA feature set, an mRNA feature set and a methylation feature set; and obtain the methylation feature set of the CpG sites in cancer tissue.
Step 2: based on the Pearson correlation coefficient, calculate the correlation coefficients between the methylation feature of each CpG site in the cancer tissue and the miRNA, mRNA and methylation features of the paracancerous tissue; according to the values of these coefficients, select K miRNA features from the miRNA feature set, Q mRNA features from the mRNA feature set and L methylation features from the methylation feature set.
Step 3: construct a multi-omics correlated feature set from the K miRNA features, Q mRNA features and L methylation features; construct a DNA methylation prediction model based on a deep neural network and train it on this feature set; calculate an evaluation index of the trained model; accept the model when the evaluation index meets a threshold value; and predict the methylation of cancer tissue with the accepted model.
Based on this scheme, the multi-omics features related to the target CpG sites are extracted by a feature selection method, and a model that integrates these features to predict the DNA methylation level of cancer tissue is established. The influence of key parameters, such as the neural network structure and the number of features, on model performance is analysed by comparing performance indexes such as the mean absolute error; the model parameters are optimized accordingly, which improves the accuracy of DNA methylation prediction.
Step 1: determine the CpG sites to be predicted, obtain the multi-omics feature sets (miRNA, mRNA and methylation feature sets) of the CpG sites in paracancerous tissue, and obtain the methylation feature set of the CpG sites in cancer tissue.
Specifically, the multi-omics features are first extracted with a feature selection method. The data matrices are defined as follows:
miRNA data matrix, row i: $I_i = (\mathrm{miRNA}_{i1}, \mathrm{miRNA}_{i2}, \dots, \mathrm{miRNA}_{ia})$
mRNA data matrix, row i: $N_i = (\mathrm{mRNA}_{i1}, \mathrm{mRNA}_{i2}, \dots, \mathrm{mRNA}_{ib})$
methylation data matrix, row i: $M_i = (\mathrm{CpG}_{i1}, \mathrm{CpG}_{i2}, \dots, \mathrm{CpG}_{ic})$
where i indexes samples (n samples in total) and the second subscript j indexes features (a miRNA features, b mRNA features and c methylation features in total).
Step 2: the correlation coefficients between the methylation feature of each CpG site in the cancer tissue and the miRNA, mRNA and methylation features of the paracancerous tissue are calculated with the Pearson correlation coefficient according to formula (1):
$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \tag{1}$$
where $x_i$ is the CpG-site methylation value, mRNA value or miRNA value of the paracancerous tissue of the i-th sample in the multi-omics feature sets and $\bar{x}$ is the mean of that feature over all samples; $y_i$ is the methylation value of the corresponding CpG site in the cancer tissue of the i-th sample and $\bar{y}$ is the mean methylation value of that CpG site over all samples; and n is the number of samples.
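Formula (1) is applied once per candidate feature, so in practice it is convenient to compute it for a whole omics matrix at once. The sketch below is an illustration under stated assumptions, not the patent's implementation: the function name `pearson_with_target`, the mapping of constant features to r = 0, and the synthetic data are all hypothetical. Columns of `X` are paracancerous features of one omics type; `y` is the methylation of one target CpG site in the matched cancer samples.

```python
import numpy as np


def pearson_with_target(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Formula (1) for every column of X at once (hypothetical helper).

    X: (n, p) paracancerous features (n samples, p features of one omics type).
    y: (n,) methylation of one target CpG site in the matched cancer tissue.
    Returns an array of p Pearson coefficients.
    """
    Xc = X - X.mean(axis=0)               # x_i - x_bar, per feature column
    yc = y - y.mean()                     # y_i - y_bar
    num = Xc.T @ yc                       # sum_i (x_i - x_bar)(y_i - y_bar)
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    with np.errstate(invalid="ignore", divide="ignore"):
        r = num / den
    return np.nan_to_num(r)               # constant feature gives 0/0; mapped to 0 (assumption)


# Synthetic example (not data from the patent): miRNA column 3 drives the target site.
rng = np.random.default_rng(0)
n = 50
X_mirna = rng.normal(size=(n, 20))
y_target = 2.0 * X_mirna[:, 3] + rng.normal(scale=0.1, size=n)
r = pearson_with_target(X_mirna, y_target)
print(int(np.argmax(np.abs(r))))          # prints 3
```

Centering once and using a matrix product avoids a Python loop over the a, b or c features, which matters when the mRNA or methylation sets contain tens of thousands of columns.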
According to the values of these correlation coefficients, K miRNA features are selected from the miRNA feature set, Q mRNA features from the mRNA feature set and L methylation features from the methylation feature set. Specifically, after the correlation coefficients between each CpG site of the cancer tissue (the target site) and the features of the three omics have been calculated, the K miRNAs (top miRNA), Q mRNAs (top mRNA) and L methylation features (top methyl) with the highest correlation coefficient values are selected for each target site according to formulas (2), (3) and (4):
$$\text{top miRNA}(\mathrm{CpG}^{\mathrm{target}}_j) = \{\mathrm{miRNA}_1, \dots, \mathrm{miRNA}_K\}, \quad j = 1, \dots, m \tag{2}$$
$$\text{top mRNA}(\mathrm{CpG}^{\mathrm{target}}_j) = \{\mathrm{mRNA}_1, \dots, \mathrm{mRNA}_Q\}, \quad j = 1, \dots, m \tag{3}$$
$$\text{top methyl}(\mathrm{CpG}^{\mathrm{target}}_j) = \{\mathrm{CpG}_1, \dots, \mathrm{CpG}_L\}, \quad j = 1, \dots, m \tag{4}$$
where $\mathrm{CpG}^{\mathrm{target}}_j$ is the j-th target site to be predicted and m is the number of target sites; $\mathrm{miRNA}_1, \dots, \mathrm{miRNA}_K$ are the K miRNA features most correlated with $\mathrm{CpG}^{\mathrm{target}}_j$, $\mathrm{mRNA}_1, \dots, \mathrm{mRNA}_Q$ the Q most correlated mRNA features, and $\mathrm{CpG}_1, \dots, \mathrm{CpG}_L$ the L most correlated methylation features. The total number of features required for each target site is therefore v = K + Q + L.
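The top-K/Q/L selection and the assembly of the v-dimensional input can be sketched as follows. This is a hedged illustration: the helper names are hypothetical, and ranking by the absolute value of r (so that strongly negative correlations also count) is an assumption, since the text only says features with "the highest correlation coefficient values" are kept; pass `use_abs=False` to rank by the signed value instead.

```python
import numpy as np


def pearson_with_target(X, y):
    """Pearson r between each column of X (n, p) and y (n,), as in formula (1)."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.nan_to_num((Xc.T @ yc) / den)


def top_indices(r, count, use_abs=True):
    """Indices of the `count` most correlated features (ranking by |r| is an assumption)."""
    score = np.abs(r) if use_abs else r
    return np.argsort(-score, kind="stable")[:count]


def select_features(mirna, mrna, meth, y_target, K, Q, L):
    """Hypothetical helper: build the (n, v) input matrix for one target site, v = K + Q + L."""
    idx_mirna = top_indices(pearson_with_target(mirna, y_target), K)
    idx_mrna = top_indices(pearson_with_target(mrna, y_target), Q)
    idx_meth = top_indices(pearson_with_target(meth, y_target), L)
    X_in = np.hstack([mirna[:, idx_mirna], mrna[:, idx_mrna], meth[:, idx_meth]])
    return X_in, (idx_mirna, idx_mrna, idx_meth)


# Synthetic data with hypothetical sizes: n=200 samples, a=30 miRNAs, b=40 mRNAs, c=50 CpG sites.
rng = np.random.default_rng(1)
n = 200
mirna, mrna, meth = (rng.normal(size=(n, d)) for d in (30, 40, 50))
y_target = mirna[:, 7] - mrna[:, 2] + meth[:, 11] + rng.normal(scale=0.1, size=n)
X_in, (i_mi, i_m, i_c) = select_features(mirna, mrna, meth, y_target, K=5, Q=5, L=5)
print(X_in.shape, i_mi[0], i_m[0], i_c[0])
```

Each target site gets its own feature subset, so with m target sites the selection is simply repeated m times with different `y_target` vectors.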
Step 3: construct the multi-omics correlated feature set from the K miRNA features, Q mRNA features and L methylation features, construct a DNA methylation prediction model based on a deep neural network, and train the model on this feature set. The model takes the multi-omics correlated features ($\mathrm{miRNA}_1, \dots, \mathrm{miRNA}_K$, $\mathrm{mRNA}_1, \dots, \mathrm{mRNA}_Q$, $\mathrm{CpG}_1, \dots, \mathrm{CpG}_L$) as input data; the input vector has dimension v and the output vector has dimension h.
The DNA methylation prediction model is constructed based on a deep neural network and comprises v input neurons, k hidden-layer neurons and h output-layer neurons. The input received by the q-th hidden-layer neuron is:
$$\alpha_q = \sum_{p=1}^{v} w_{pq}\, x_p$$
and the output of the q-th hidden-layer neuron is:
$$b_q = f(\alpha_q)$$
where $w_{pq}$ is the weight between the p-th input-layer neuron and the q-th hidden-layer neuron, $\mathbf{x} = (x_1, \dots, x_v)$ is the input vector formed by the multi-omics correlated features of a sample (n samples in total), and f is the hidden-layer activation function. The input received by the r-th output-layer neuron is
$$\beta_r = \sum_{q=1}^{k} e_{qr}\, b_q$$
where $e_{qr}$ is the weight between the q-th hidden-layer neuron and the r-th output-layer neuron.
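A v-k-h network of the kind described above can be sketched in NumPy with one hidden layer trained by full-batch gradient descent. Everything beyond the layer sizes and the weights w (input to hidden) and e (hidden to output) is an assumption of this sketch, not something the text specifies: the sigmoid hidden activation, the linear outputs, the bias vectors, the mean-squared-error loss, the learning rate and the class name `OneHiddenLayerNet`. The names `alpha`, `b` and `beta` mirror the hidden input, hidden output and output-layer input.

```python
import numpy as np


class OneHiddenLayerNet:
    """Hypothetical v -> k -> h network trained by full-batch gradient descent.

    Sigmoid hidden activation, linear outputs, biases and MSE loss are assumptions.
    """

    def __init__(self, v, k, h, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=1 / np.sqrt(v), size=(v, k))  # w_pq: input p -> hidden q
        self.c = np.zeros(k)                                   # hidden biases (assumed)
        self.E = rng.normal(scale=1 / np.sqrt(k), size=(k, h))  # e: hidden -> output weights
        self.d = np.zeros(h)                                   # output biases (assumed)

    def forward(self, X):
        alpha = X @ self.W + self.c           # alpha_q = sum_p w_pq x_p (+ bias)
        b = 1.0 / (1.0 + np.exp(-alpha))      # b_q = f(alpha_q), f = sigmoid in this sketch
        beta = b @ self.E + self.d            # beta_r = sum_q e_qr b_q (+ bias)
        return b, beta

    def predict(self, X):
        return self.forward(X)[1]

    def fit(self, X, Y, lr=0.1, epochs=3000):
        """Minimise the mean squared error over all n x h outputs; returns the loss curve."""
        losses = []
        for _ in range(epochs):
            b, Y_hat = self.forward(X)
            err = Y_hat - Y
            losses.append(float(np.mean(err ** 2)))
            g_beta = 2.0 * err / err.size                # dLoss/dbeta
            g_alpha = (g_beta @ self.E.T) * b * (1 - b)  # back-propagate through the sigmoid
            self.E -= lr * (b.T @ g_beta)
            self.d -= lr * g_beta.sum(axis=0)
            self.W -= lr * (X.T @ g_alpha)
            self.c -= lr * g_alpha.sum(axis=0)
        return losses


# Synthetic regression task: n=200 samples, v=15 features, h=1 target site.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 15))
Y = np.tanh(X[:, :3].sum(axis=1, keepdims=True))   # shape (200, 1)
net = OneHiddenLayerNet(v=15, k=8, h=1)
losses = net.fit(X, Y)
print(f"MSE: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The text calls the model a "deep" neural network, and L186 mentions tuning the number of layers; this sketch keeps a single hidden layer so that the backpropagation stays short and readable. A real implementation would more likely use a deep-learning framework with several hidden layers.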
The evaluation indexes of the trained DNA methylation prediction model are then calculated: the absolute value of the Pearson correlation coefficient according to formula (7) and the mean absolute error according to formula (8):
$$R = \left|\frac{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i-\bar{\hat{y}})(y_i-\bar{y})}{\sigma_{\hat{y}}\,\sigma_{y}}\right| \tag{7}$$
$$\mathrm{MAE} = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\left|\hat{y}_{ij}-y_{ij}\right| \tag{8}$$
where $\hat{y}_i$ and $y_i$ are the predicted and actual DNA methylation values of the i-th sample, $\bar{\hat{y}}$ and $\bar{y}$ are the predicted and actual means, $\sigma_{\hat{y}}$ and $\sigma_{y}$ are the predicted and actual standard deviations, $\hat{y}_{ij}$ and $y_{ij}$ are the predicted and actual DNA methylation values of the j-th feature of the i-th sample, n is the number of samples and m is the number of target sites.
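The two evaluation indexes can be computed as below. This is a minimal sketch with hypothetical function names; it assumes population standard deviations (NumPy's default `ddof=0`), which makes R identical to the ordinary Pearson coefficient in absolute value, and it averages the MAE over every sample and every target site.

```python
import numpy as np


def abs_pearson(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    """|cov(y_pred, y_true)| / (sigma_pred * sigma_true), population form (assumption)."""
    cov = np.mean((y_pred - y_pred.mean()) * (y_true - y_true.mean()))
    return float(abs(cov) / (y_pred.std() * y_true.std()))


def mean_absolute_error(Y_pred: np.ndarray, Y_true: np.ndarray) -> float:
    """Average |y_hat_ij - y_ij| over all samples i and target sites j."""
    return float(np.mean(np.abs(Y_pred - Y_true)))


# Toy beta values (hypothetical): 3 samples x 2 target CpG sites.
Y_true = np.array([[0.10, 0.80], [0.40, 0.60], [0.70, 0.20]])
Y_pred = np.array([[0.15, 0.75], [0.35, 0.65], [0.70, 0.30]])
print(abs_pearson(Y_pred.ravel(), Y_true.ravel()), mean_absolute_error(Y_pred, Y_true))
```

R rewards getting the ranking and trend of methylation levels right, while MAE measures the absolute size of the errors in beta-value units, which is why the two are reported together.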
When the evaluation indexes meet the threshold values, the evaluated DNA methylation prediction model is obtained and used to predict methylation in cancer tissue. The parameters of the model can be optimized by ten-fold cross-validation: the data set is divided into 10 mutually exclusive subsets of equal size, keeping the data distribution as consistent as possible across subsets; in each round the union of 9 subsets is used as the training set and the remaining subset as the test set; training is repeated 10 times and the average of the 10 results is taken.
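The ten-fold procedure can be sketched as follows. This is an illustration under stated assumptions: the folds are formed by a seeded random shuffle only, so the "consistent data distribution" requirement (for example stratification) is not implemented; the helper names are hypothetical; and `linear_fit_predict` is a least-squares stand-in used only to make the example runnable, not the patent's neural network. Any `fit_predict(X_train, Y_train, X_test)` callable can be plugged in instead.

```python
import numpy as np


def make_folds(n, n_folds=10, seed=0):
    """Shuffle sample indices and split them into n_folds mutually exclusive, near-equal folds."""
    order = np.random.default_rng(seed).permutation(n)
    return np.array_split(order, n_folds)


def cross_validate(X, Y, fit_predict, n_folds=10, seed=0):
    """Train on the union of 9 folds, test on the remaining one, repeat 10 times, average the MAE."""
    folds = make_folds(len(X), n_folds, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        Y_hat = fit_predict(X[train_idx], Y[train_idx], X[test_idx])
        scores.append(float(np.mean(np.abs(Y_hat - Y[test_idx]))))
    return float(np.mean(scores)), scores


def linear_fit_predict(X_train, Y_train, X_test):
    """Stand-in model (least squares with intercept), NOT the patent's neural network."""
    A = np.hstack([X_train, np.ones((len(X_train), 1))])
    coef, *_ = np.linalg.lstsq(A, Y_train, rcond=None)
    return np.hstack([X_test, np.ones((len(X_test), 1))]) @ coef


rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))
Y = X @ np.array([[0.3], [-0.2], [0.1], [0.0]]) + 0.5 + rng.normal(scale=0.01, size=(100, 1))
mean_mae, fold_maes = cross_validate(X, Y, linear_fit_predict)
print(round(mean_mae, 4), len(fold_maes))
```

Arguably, the Pearson-based feature selection should also be re-run inside each training fold; selecting features on all samples before splitting lets the test folds influence which features are chosen and makes the cross-validated error optimistic.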
In the mathematical model studied in this embodiment, predicting DNA methylation data of the target tissue from multi-omics data of the surrogate (paracancerous) tissue can be decomposed into the following technical steps; the implementation flowchart is shown in FIG. 2:
matching and preprocessing the three omics data types (miRNA, mRNA and DNA methylation) of the cancer tissue and the paracancerous tissue;
extracting and fusing features from the multi-omics data with a correlation-based feature selection strategy;
establishing a mathematical model of DNA methylation between the paracancerous tissue and the cancer tissue;
predicting DNA methylation data of the target tissue using the integrated multi-omics data of the surrogate tissue;
analysing the influence of key parameters of the deep learning model, such as the number of layers, the number of neurons and the feature dimension, on model performance, and optimizing the model parameters by ten-fold cross-validation as described above.
The absolute value of the Pearson correlation coefficient (R) and the mean absolute error (MAE) are used to evaluate the predictive performance of the model.
Overall beneficial effects:
The invention provides a DNA methylation prediction method integrating multi-omics features. Multi-omics features related to the target CpG sites are extracted with a feature selection method, and a model integrating these features is then established to predict the DNA methylation level of cancer tissue. By comparing performance indexes such as the mean absolute error, the influence of key parameters, such as the neural network structure and the number of features, on the performance of the prediction model is analysed and the model parameters are optimized, which improves the accuracy of DNA methylation prediction.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (5)

1. A DNA methylation prediction method integrating multi-omics features, characterized by comprising the following steps:
Step 1: determining the CpG sites to be predicted; obtaining the multi-omics feature sets of the CpG sites in paracancerous tissue, the multi-omics feature sets comprising a miRNA feature set, an mRNA feature set and a methylation feature set; and obtaining the methylation feature set of the CpG sites in cancer tissue;
Step 2: calculating, based on the Pearson correlation coefficient, the correlation coefficients between the methylation feature of each CpG site of the cancer tissue and the miRNA, mRNA and methylation features of the paracancerous tissue; selecting K miRNA features from the miRNA feature set, Q mRNA features from the mRNA feature set and L methylation features from the methylation feature set according to the values of the respective correlation coefficients;
Step 3: constructing a multi-omics correlated feature set from the K miRNA features, the Q mRNA features and the L methylation features; constructing a DNA methylation prediction model based on a deep neural network and training it on the multi-omics correlated feature set; calculating an evaluation index of the trained DNA methylation prediction model; obtaining the evaluated DNA methylation prediction model when the evaluation index meets a threshold value; and predicting methylation of cancer tissue with the evaluated DNA methylation prediction model.
2. The DNA methylation prediction method integrating multi-omics features according to claim 1, wherein calculating the correlation coefficients between the methylation feature of the CpG sites of the cancer tissue and the miRNA, mRNA and methylation features of the paracancerous tissue based on the Pearson correlation coefficient comprises calculating the correlation coefficient according to formula (1):
$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \tag{1}$$
where $x_i$ is the CpG-site methylation value, mRNA value or miRNA value of the paracancerous tissue of the i-th sample in the multi-omics feature sets and $\bar{x}$ is the mean of that feature over all samples; $y_i$ is the methylation value of the corresponding CpG site in the cancer tissue of the i-th sample and $\bar{y}$ is the mean methylation value of that CpG site over all samples; and n is the number of samples.
3. The DNA methylation prediction method integrating multi-omics features according to claim 1, wherein the DNA methylation prediction model constructed based on the deep neural network comprises v input neurons, k hidden-layer neurons and h output-layer neurons, and the input received by the q-th hidden-layer neuron is:
$$\alpha_q = \sum_{p=1}^{v} w_{pq}\, x_p$$
the output of the q-th hidden-layer neuron is:
$$b_q = f(\alpha_q)$$
where $w_{pq}$ is the weight between the p-th input-layer neuron and the q-th hidden-layer neuron, $\mathbf{x} = (x_1, \dots, x_v)$ is the input vector formed by the multi-omics correlated features of a sample (n samples in total), and f is the hidden-layer activation function; the input received by the r-th output-layer neuron is
$$\beta_r = \sum_{q=1}^{k} e_{qr}\, b_q$$
where $e_{qr}$ is the weight between the q-th hidden-layer neuron and the r-th output-layer neuron.
4. The DNA methylation prediction method integrating multi-omics features according to claim 1, wherein calculating the evaluation index of the trained DNA methylation prediction model comprises calculating the absolute value of the Pearson correlation coefficient according to formula (4) and the mean absolute error according to formula (5):
$$R = \left|\frac{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i-\bar{\hat{y}})(y_i-\bar{y})}{\sigma_{\hat{y}}\,\sigma_{y}}\right| \tag{4}$$
$$\mathrm{MAE} = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\left|\hat{y}_{ij}-y_{ij}\right| \tag{5}$$
where $\hat{y}_i$ and $y_i$ are the predicted and actual DNA methylation values of the i-th sample, $\bar{\hat{y}}$ and $\bar{y}$ are the predicted and actual means, $\sigma_{\hat{y}}$ and $\sigma_{y}$ are the predicted and actual standard deviations, $\hat{y}_{ij}$ and $y_{ij}$ are the predicted and actual DNA methylation values of the j-th feature of the i-th sample, n is the number of samples and m is the number of predicted features (target CpG sites).
5. The DNA methylation prediction method integrating multi-omics features according to claim 1, wherein the parameters of the DNA methylation prediction model are optimized by ten-fold cross-validation.
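Claim 5 only states that ten-fold cross-validation is used to tune the model's parameters. A minimal Python sketch of that procedure, assuming a list of candidate settings and a hypothetical `train_and_score` callback that trains on the training indices and returns an error (for example the MAE of formula (5)) on the held-out fold:

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

P = TypeVar("P")


def k_fold_splits(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle sample indices and cut them into k near-equal validation folds."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % k folds receive one extra sample
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    splits, start = [], 0
    for size in sizes:
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        splits.append((train, val))
        start += size
    return splits


def select_by_cv(
    candidates: Sequence[P],
    n: int,
    train_and_score: Callable[[P, List[int], List[int]], float],
    k: int = 10,
    seed: int = 0,
) -> Tuple[Optional[P], float]:
    """Return the candidate with the lowest mean validation error over k folds."""
    splits = k_fold_splits(n, k, seed)
    best, best_score = None, float("inf")
    for cand in candidates:
        score = sum(train_and_score(cand, tr, va) for tr, va in splits) / k
        if score < best_score:
            best, best_score = cand, score
    return best, best_score
```

For example, `candidates` could be hidden-layer sizes $k$; every sample is used for validation exactly once, and the fixed `seed` keeps the folds reproducible between runs.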

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310718721.5A CN116758993A (en) 2023-06-16 2023-06-16 DNA methylation prediction method integrating multiple groups of chemical characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310718721.5A CN116758993A (en) 2023-06-16 2023-06-16 DNA methylation prediction method integrating multiple groups of chemical characteristics

Publications (1)

Publication Number Publication Date
CN116758993A (en) 2023-09-15

Family

ID=87954777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310718721.5A Pending CN116758993A (en) 2023-06-16 2023-06-16 DNA methylation prediction method integrating multiple groups of chemical characteristics

Country Status (1)

Country Link
CN (1) CN116758993A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117894452A (en) * 2024-01-16 2024-04-16 Sun Yat-sen University Cancer Center (Sun Yat-sen University Cancer Hospital, Sun Yat-sen University Cancer Institute) Unknown primary tumor primary range prediction method and system based on DenseFile model

Similar Documents

Publication Publication Date Title
CN114927162B (en) Multi-mathematic association phenotype prediction method based on hypergraph characterization and dirichlet allocation
CN114255886B (en) Multi-group similarity guide-based drug sensitivity prediction method and device
CN111370073B (en) Medicine interaction rule prediction method based on deep learning
CN112183837A (en) miRNA and disease association relation prediction method based on self-coding model
US20070294067A1 (en) Prediction of estrogen receptor status of breast tumors using binary prediction tree modeling
CN109559781A (en) A kind of two-way LSTM and CNN model that prediction DNA- protein combines
CN106055922A (en) Hybrid network gene screening method based on gene expression data
CN116758993A (en) DNA methylation prediction method integrating multiple groups of chemical characteristics
CN115881232A (en) ScRNA-seq cell type annotation method based on graph neural network and feature fusion
CN115982141A (en) Characteristic optimization method for time series data prediction
CN116204831A (en) Road-to-ground analysis method based on neural network
CN117912570A (en) Classification feature determining method and system based on gene co-expression network
CN116959585B (en) Deep learning-based whole genome prediction method
CN114121158A (en) Deep network self-adaption based scRNA-seq cell type identification method
CN113362900A (en) Mixed model for predicting N4-acetylcytidine
CN106650304A (en) Extension method of DNA methylation chip data
CN116842996A (en) Space transcriptome method and device based on depth compressed sensing
CN116978464A (en) Data processing method, device, equipment and medium
CN113223622B (en) miRNA-disease association prediction method based on meta-path
CN115083511A (en) Peripheral gene regulation and control feature extraction method based on graph representation learning and attention
CN112651168B (en) Construction land area prediction method based on improved neural network algorithm
Yaman et al. MachineTFBS: Motif-based method to predict transcription factor binding sites with first-best models from machine learning library
CN111755074B (en) Method for predicting DNA replication origin in saccharomyces cerevisiae
CN114678083A (en) Training method and prediction method of chemical genetic toxicity prediction model
CN114626594A (en) Medium-and-long-term electric quantity prediction method based on cluster analysis and deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination