CN114093422A - MiRNA (micro ribonucleic acid) and gene interaction prediction method and system based on multi-relation graph convolution network - Google Patents

Publication number
CN114093422A
Authority
CN
China
Prior art keywords
mirna
network
gene
information
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111393459.9A
Other languages
Chinese (zh)
Inventor
骆嘉伟
欧阳文珏
申聪
蔡洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN202111393459.9A priority Critical patent/CN114093422A/en
Publication of CN114093422A publication Critical patent/CN114093422A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and a system for predicting miRNA-gene interactions based on a multi-relation graph convolution network. The method constructs a heterogeneous information network of miRNAs and genes and uses a multi-relation graph convolution network on this heterogeneous network to learn the network topology features of the nodes; in parallel, a recurrent neural network captures informative features of the gene sequences. Finally, the network topology features and the sequence features are combined, and the resulting miRNA and gene embeddings are used to calculate the association prediction score of each miRNA-gene pair. The method requires no manual feature engineering: by relying on representation learning, it exploits the strengths of the multi-relation graph convolution network, mines informative gene sequence features, and better captures the feature representations of miRNA and gene nodes. Experimental results show that the method, MRMTI, outperforms the compared methods in miRNA-gene association prediction and achieves good prediction performance.

Description

MiRNA (micro ribonucleic acid) and gene interaction prediction method and system based on multi-relation graph convolution network
Technical Field
The invention relates to application of deep learning in the field of bioinformatics, in particular to prediction of genes interacting with miRNA, and provides a prediction method and a prediction system of miRNA and gene interaction based on a multi-relation graph convolution network.
Background
MicroRNA (miRNA) is a small non-coding RNA molecule about 22 nt in length that plays an important role in biological processes such as cell growth and differentiation. miRNAs regulate post-transcriptional gene expression by binding to the 3' UTRs of mRNAs, and aberrant miRNA expression can cause dysfunction of target genes and thereby a variety of complex diseases. Identifying the interactions between miRNAs and genes is therefore of great significance for revealing the regulatory mechanisms of miRNAs and their role in the occurrence and development of complex diseases.
Compared with traditional biological experiments, which are time-consuming and expensive, computational methods offer a new aid for verifying miRNA-target gene interactions. Early computational methods were based primarily on manually extracted biological features. For example, miRanda screens target genes based on sequence complementarity, free energy calculations, and evolutionary conservation. With the accumulation of biological data in the big data era, the construction of related databases has provided reliable data sources for machine learning, and a large number of machine learning methods have been proposed; however, these methods still generally rely on manually extracted features.
Deep learning is widely used in bioinformatics tasks because of its strength in representation learning. For example, the NIMCGCN method first learns latent feature representations through a graph convolution network and then feeds them into a matrix completion model to obtain association scores between miRNAs and diseases. The IDDkin model integrates a graph convolution network, a graph attention network and an adaptive weighting method to learn latent representations on the graph effectively, thereby improving the prediction of kinase inhibitors. In the field of miRNA-gene relation prediction, IMTRBM constructs a weighted miRNA-target interaction network and then uses a restricted Boltzmann machine to extract features automatically and make predictions; SG-LSTM generates sequence embeddings and geometric embeddings of miRNAs and genes, which are then used by an LSTM model to predict candidate targets. These methods achieve excellent performance and show that network representation learning can represent biological features well and has great potential in relation prediction, which encourages more researchers to apply representation learning to the miRNA-gene interaction prediction task. It has also been shown that modeling structural and relational data jointly in heterogeneous networks is beneficial.
Disclosure of Invention
The invention aims to provide a method and a system for predicting miRNA-gene interactions based on a multi-relation graph convolution network, addressing the technical problem of predicting genes that interact with miRNAs more accurately. The method integrates a miRNA similarity network, a gene similarity network and a miRNA-gene association network to construct a heterogeneous information network of miRNAs and genes, and then learns the network topology features of the nodes by multi-relation graph convolution on this heterogeneous network; it then combines the network topology features with gene feature information and uses the resulting miRNA and gene embeddings to calculate the association prediction score of each miRNA-gene pair, thereby providing a new technical means for accurately predicting genes that interact with miRNAs.
In one aspect, the invention provides a method for predicting interaction between miRNA and genes based on a multi-relation graph convolution network, which comprises the following steps:
step S1: constructing a miRNA-gene heterogeneous information network, wherein the miRNA-gene heterogeneous information network comprises relation data between miRNA-miRNA, association data of known miRNA-gene pairs and gene-gene relation data;
step S2: constructing a prediction model of miRNA-gene interaction, wherein the network topology feature representation of the heterogeneous information network nodes is extracted based on a multi-relation graph convolution network, and, for a miRNA-gene pair, the network topology feature representations of the miRNA and the gene are fused with the gene feature information as the input value of the prediction function of the prediction model, whose output value serves as the association score of the miRNA-gene pair;
wherein the heterogeneous information network nodes extracted by the multi-relation graph convolutional network correspond to miRNA or genes;
step S3: training the prediction model constructed in the step S2 by using the miRNA-gene heterogeneous information network data in the step S1;
step S4: and calculating the association score of the miRNA-gene pair of unknown association data by using the trained prediction model.
Further optionally, the process of extracting the network topology feature representation of the heterogeneous information network nodes based on the multi-relation graph convolution network is as follows:
determining initial features of the network nodes in the miRNA-gene heterogeneous information network;
performing the following operations for each layer of the multi-relation graph convolution network to update the feature representation of the network nodes, wherein the feature representation obtained at the last layer is the network topology feature representation of the network nodes;
for the l-th layer of the multi-relation graph convolution network, the updated value of the feature representation of a network node is obtained as follows:
calculating neighborhood information from adjacent nodes based on the l-th layer feature representations of the relevant nodes in the multi-relation graph convolution network;
calculating self-loop information of the network node based on the l-th layer feature representation of the relevant node in the multi-relation graph convolution network;
superposing the neighborhood information and the self-loop information of the network node to obtain the transfer information of the network node under a given relation type r;
and integrating the transfer information under all relation types as the updated value of the feature representation of the network node at the l-th layer, which serves as the feature representation of the network node at the (l+1)-th layer.
Further optionally, the neighborhood information from neighboring nodes and/or the self-loop information of a network node is as follows:
$$n_{i,r}^{(l)} = \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)}$$
where $n_{i,r}^{(l)}$ represents the neighborhood information passed to network node i in the l-th layer of the graph convolution, $\mathcal{N}_i^r$ represents the set of neighbor nodes of network node i under relation type r, $h_j^{(l)}$ represents the feature representation of network node j at the l-th layer of the multi-relation graph convolution network, $c_{i,r}$ is a normalization constant, and $W_r^{(l)}$ represents the weight matrix for the given relation type r;
[Equation image not recovered: self-loop information $s_{i,r}^{(l)}$ of network node i]
where $s_{i,r}^{(l)}$ indicates the self-loop information of network node i in the l-th layer of the graph convolution, $h_i^{(l)}$ represents the feature representation of network node i at the l-th layer of the multi-relation graph convolution network, and the normalization constant is $c_{i,r} = |\mathcal{N}_i^r|$, the number of neighbor nodes in the set $\mathcal{N}_i^r$.
The relation type r contained in the heterogeneous network is generally a similarity relation between nodes of the same type or an interaction relation between nodes of different types, namely the miRNA-miRNA similarity relation, the gene-gene similarity relation, and the miRNA-gene interaction relation;
further optionally, the formula for integrating the transfer information under all relation types is as follows:
$$h_i^{(l+1)} = \sigma\Big(\sum_{r \in \mathcal{R}} m_{i,r}^{(l)}\Big)$$
where $h_i^{(l+1)}$ is the result of integrating the transfer information under all relation types for network node i, that is, the feature representation of network node i at layer l+1; $\sigma(\cdot)$ is the ReLU activation function, $\mathcal{R}$ is the set of relation types in the heterogeneous network, and $m_{i,r}^{(l)}$ represents the transfer information of network node i under the given relation type r.
Further optionally, in step S2, the process of fusing the network topology feature representations of the miRNA and the gene with the gene feature information as the input value of the prediction function of the prediction model, and obtaining the output value of the prediction function as the miRNA-gene association score, comprises:
concatenating the network topology feature representation of the gene with the gene feature information, and compressing the concatenated features with a learnable transformation matrix so that they are embedded into a feature space of the same dimension as the miRNA embedding;
and calculating the inner product of the compressed gene features and the network topology feature representation of the miRNA as the input value of the prediction function, whose output is the miRNA-gene association score.
Further optionally, the prediction function is represented as:
$$\hat{y}_{ij} = \sigma\big(h_{m_i}^{\top} \hat{h}_{g_j}\big)$$
where $\sigma$ is the sigmoid function, $h_{m_i}$ is the network topology feature representation of miRNA $m_i$, and $\hat{h}_{g_j}$ is the embedded representation of gene $g_j$ obtained after feature concatenation and compression, which satisfies:
$$\hat{h}_{g_i} = W_p \cdot \mathrm{concat}\big(h_{g_i}, x_{g_i}\big)$$
where $\hat{h}_{g_i}$ represents the embedded representation of gene $g_i$ with the same embedding dimension as the miRNA, $W_p$ is the transformation matrix, $\mathrm{concat}(\cdot)$ represents the concatenation operation, and $h_{g_i}$ and $x_{g_i}$ respectively represent the network topology feature representation and the gene feature information of gene $g_i$.
Further optionally, the gene characteristic information in step S2 is a gene sequence characteristic.
In a second aspect, the present invention provides a system based on the above-mentioned prediction method of miRNA and gene interaction, which comprises:
the miRNA-gene heterogeneous information network construction module is used for constructing an miRNA-gene heterogeneous information network, and the miRNA-gene heterogeneous information network comprises miRNA-miRNA relation data, known miRNA-gene pair association data and gene-gene relation data;
the prediction model construction module is used for constructing a prediction model of miRNA-gene interaction, wherein the network topology feature representation of the heterogeneous information network nodes is extracted based on a multi-relation graph convolution network, and, for a miRNA-gene pair, the network topology feature representations of the miRNA and the gene are fused with the gene feature information as the input value of the prediction function of the prediction model, whose output value serves as the association score of the miRNA-gene pair;
a training module, configured to train the prediction model constructed in step S2 by using the miRNA-gene heterogeneous information network data in step S1;
and the prediction module is used for calculating the association score of the miRNA-gene pair of the unknown association data by using the trained prediction model.
In a third aspect, the present invention provides an electronic terminal, comprising:
one or more processors;
a memory storing one or more computer programs;
the processor invokes the computer program to implement:
a method for predicting the interaction between miRNA and genes based on a multi-relation graph convolution network.
In a fourth aspect, the present invention provides a readable storage medium storing a computer program for invocation by a processor to implement:
a method for predicting the interaction between miRNA and genes based on a multi-relation graph convolution network.
Advantageous effects
The invention provides a method for predicting miRNA-gene interactions based on a multi-relation graph convolution network. The method constructs a heterogeneous information network of miRNAs and genes, learns the network topology features of the nodes by multi-relation graph convolution on this heterogeneous network, then combines the network topology features with gene feature information and uses the resulting miRNA and gene embeddings to calculate the association prediction score of each miRNA-gene pair. This provides a new technical means for accurately predicting genes that interact with miRNAs and enriches the available means of predicting miRNA-gene interactions. Because the network topology features of the nodes are learned by multi-relation graph convolution, the structural and relational features of the nodes in the heterogeneous graph are captured more fully, which further improves prediction accuracy.
Drawings
Fig. 1 is a schematic diagram of an MRMTI model framework according to an embodiment of the present invention.
Fig. 2 is a diagram showing the ROC curves of the method according to an embodiment of the present invention and of other methods.
Fig. 3 is a schematic flow chart of a method for predicting miRNA-gene interaction based on a multi-relation graph convolution network according to an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the following figures and examples.
Example 1:
As shown in Fig. 1, the method for predicting miRNA-gene interaction based on a multi-relation graph convolution network provided in this embodiment includes the following steps:
Step S1: construct a miRNA-gene heterogeneous information network. The miRNA similarity network, the gene similarity network and the known miRNA-gene bipartite network are integrated to obtain the miRNA-gene heterogeneous information network.
In this embodiment, miRNA sequences are obtained from the miRBase database, and sequence similarity scores between all miRNA-miRNA pairs are calculated with the Needleman-Wunsch global sequence alignment algorithm. For each miRNA, the 10 most related neighbor nodes are retained, that is, the 10 miRNA-miRNA relations with the highest similarity scores, and the miRNA similarity network is constructed from them. It should be understood that the invention is not limited to retaining 10 neighbor nodes; in other feasible embodiments, the number may be adapted to the model performance and the required prediction accuracy.
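The similarity step can be illustrated with a minimal sketch. The scoring parameters (match 1, mismatch -1, gap -1) and the function names `nw_score` and `top_k_neighbors` are assumptions made for illustration; the text does not specify the alignment parameters that were used.

```python
# Global alignment score (Needleman-Wunsch) and top-k neighbor selection.
# Scoring parameters are illustrative assumptions, not taken from the text.

def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def top_k_neighbors(seqs, k=10):
    """For each sequence ID, keep the k other IDs with the highest score."""
    neighbors = {}
    for name, seq in seqs.items():
        scored = [(nw_score(seq, other), other_name)
                  for other_name, other in seqs.items() if other_name != name]
        # Highest score first; ties broken by name so the result is deterministic.
        scored.sort(key=lambda t: (-t[0], t[1]))
        neighbors[name] = [other_name for _, other_name in scored[:k]]
    return neighbors
```

With `k=10` this mirrors the retention rule described above; the all-pairs loop is quadratic in the number of miRNAs, so a real pipeline would likely use an optimized alignment library.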
In this embodiment, gene functional similarity data are downloaded from the HumanNet database. The gene similarity network is constructed by removing all functional similarity relations whose score is below the average value and all nodes in the gene-gene network whose degree (number of edges) is below 10, and then retaining, for each gene, the 10 relations with the highest similarity scores. It should be understood that the invention is not limited to retaining 10 relations; in other feasible embodiments, the number may be adapted to the model performance and the required prediction accuracy.
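The three filtering steps can be sketched as follows. The function name `build_gene_similarity_network` and the input layout are hypothetical, and applying the degree filter in a single pass (rather than repeating it until no node changes) is an assumption; the text does not say which variant is used.

```python
from collections import defaultdict


def build_gene_similarity_network(edges, min_degree=10, top_k=10):
    """edges: list of (gene_a, gene_b, score) undirected similarity records.

    Returns a set of (gene_a, gene_b) pairs with gene_a < gene_b.
    """
    if not edges:
        return set()
    mean_score = sum(s for _, _, s in edges) / len(edges)
    # 1) Drop relations whose similarity is below the average score.
    kept = [(a, b, s) for a, b, s in edges if s >= mean_score]
    # 2) Drop nodes whose degree is below min_degree (single pass, assumed).
    degree = defaultdict(int)
    for a, b, _ in kept:
        degree[a] += 1
        degree[b] += 1
    kept = [(a, b, s) for a, b, s in kept
            if degree[a] >= min_degree and degree[b] >= min_degree]
    # 3) For each gene, keep its top_k highest-scoring relations.
    by_gene = defaultdict(list)
    for a, b, s in kept:
        by_gene[a].append((s, b))
        by_gene[b].append((s, a))
    network = set()
    for gene, nbrs in by_gene.items():
        nbrs.sort(key=lambda t: (-t[0], t[1]))
        for _, other in nbrs[:top_k]:
            network.add((min(gene, other), max(gene, other)))
    return network
```

An edge is kept if it is among the top relations of either endpoint, so a node can end up with more than `top_k` edges; whether the original method symmetrizes in this way is not stated.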
In this embodiment, experimentally verified human miRNA-target gene association data are downloaded from the miRTarBase database, and the known miRNA-gene bipartite network is constructed.
The miRNA similarity network, the gene similarity network and the known miRNA-gene bipartite network are integrated to obtain the miRNA-gene heterogeneous information network shown in part A of Fig. 1.
In this embodiment, after data preprocessing, the finally constructed heterogeneous information network contains 18033 miRNA-miRNA association records, 127772 gene-gene association records, and 211111 miRNA-gene interaction records between 2546 miRNAs and 7880 genes.
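One simple way to hold the integrated network in memory is a per-relation adjacency map, sketched below. The relation names and the function `build_hetero_network` are hypothetical, and treating every relation as undirected is an assumption.

```python
from collections import defaultdict


def build_hetero_network(mirna_edges, gene_edges, mirna_gene_edges):
    """Return {relation type: {node: sorted neighbor list}} for three edge lists."""
    relations = {
        "mirna-mirna": mirna_edges,
        "gene-gene": gene_edges,
        "mirna-gene": mirna_gene_edges,
    }
    network = {}
    for rel, edges in relations.items():
        adj = defaultdict(set)
        for a, b in edges:
            adj[a].add(b)
            adj[b].add(a)  # every relation is treated as undirected here
        network[rel] = {node: sorted(nbrs) for node, nbrs in adj.items()}
    return network
```

Keeping one adjacency map per relation type makes it straightforward to compute relation-specific neighbor sets during message passing.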
Step S2: construct a prediction model of miRNA-gene interaction. The network topology feature representation of the heterogeneous information network nodes is extracted based on the multi-relation graph convolution network; for a miRNA-gene pair, the network topology feature representations of the miRNA and the gene are fused with the gene feature information as the input value of the prediction function of the prediction model, and the output value of the prediction function is taken as the association score of the miRNA-gene pair.
In this embodiment, gene sequence features are preferably used as the gene feature information; in other feasible embodiments, other gene features may be selected provided that the prediction and accuracy requirements are met, and the invention places no specific limitation on this.
Regarding the extraction of the network topology feature representation of the heterogeneous information network nodes, the invention uses a multi-relation graph convolution encoding module to perform message passing on the miRNA-gene heterogeneous information network and obtain the network topology feature representations of the miRNAs and genes, as shown in module B of Fig. 1. The specific process is as follows:
A: determine the initial features of the network nodes in the miRNA-gene heterogeneous information network. In this embodiment, one-hot encodings are used as the initial features of the network nodes.
B: perform the following operations for each layer of the multi-relation graph convolution network to update the feature representation of the network nodes; the feature representation obtained at the last layer is the network topology feature representation of the network nodes.
For the l-th layer of the multi-relation graph convolution network, the updated value of the feature representation of a network node is obtained as follows:
B-1: based on the l-th layer feature representations of the relevant nodes in the multi-relation graph convolution network, calculate the neighborhood information from adjacent nodes. The information propagation rule is defined as follows:
$$n_{i,r}^{(l)} = \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)}$$
where $n_{i,r}^{(l)}$ represents the neighborhood information passed to network node i in the l-th layer of the graph convolution, $\mathcal{N}_i^r$ represents the set of neighbor nodes of network node i under relation type r, $h_j^{(l)}$ represents the feature representation of network node j at the l-th layer of the multi-relation graph convolution network, $c_{i,r}$ is a normalization constant, and $W_r^{(l)}$ represents the weight matrix for the given relation type r.
B-2: based on the l-th layer feature representation of the relevant node in the multi-relation graph convolution network, calculate the self-loop information of the network node, defined as follows:
[Equation image not recovered: self-loop information $s_{i,r}^{(l)}$ of network node i]
where $s_{i,r}^{(l)}$ indicates the self-loop information of network node i in the l-th layer of the graph convolution, $h_i^{(l)}$ represents the feature representation of network node i at the l-th layer of the multi-relation graph convolution network, and $c_{i,r}$ is the normalization constant.
B-3: superpose the neighborhood information and the self-loop information of the network node to obtain the transfer information of the network node under a given relation type r:
$$m_{i,r}^{(l)} = n_{i,r}^{(l)} + s_{i,r}^{(l)}$$
where $m_{i,r}^{(l)}$ represents the transfer information of network node i under the given relation type r, $n_{i,r}^{(l)}$ is its neighborhood information and $s_{i,r}^{(l)}$ is its self-loop information.
B-4: integrate the transfer information under all relation types as the updated value of the feature representation of the network node at the l-th layer, which serves as the feature representation of the network node at the (l+1)-th layer:
$$h_i^{(l+1)} = \sigma\Big(\sum_{r \in \mathcal{R}} m_{i,r}^{(l)}\Big)$$
where $h_i^{(l+1)}$ is the result of integrating the transfer information under all relation types for network node i, that is, the feature representation of network node i at layer l+1; $\sigma(\cdot)$ is the ReLU activation function, $m_{i,r}^{(l)}$ is the transfer information of network node i under relation type r, and $\mathcal{R}$ is the set of relation types in the heterogeneous network.
It should be understood that the above-described network node calculation process is applicable to both miRNA nodes and gene nodes.
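Steps A and B-1 to B-4 can be sketched as a single layer in plain Python. This is an illustration under stated assumptions, not the patented implementation: the self-loop term is assumed to reuse the relation's weight matrix and the normalization constant (its exact formula is not recoverable from the text), the per-relation results are assumed to be summed, and `mrgcn_layer`, `matvec` and `relu` are hypothetical names.

```python
# One multi-relation graph convolution layer (illustrative sketch only).

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, v) for v in x]


def mrgcn_layer(h, neighbors, weights):
    """h: {node: feature list}; neighbors: {relation: {node: [neighbor nodes]}};
    weights: {relation: matrix as list of rows}. Returns layer l+1 features."""
    out_dim = len(next(iter(weights.values())))
    h_next = {}
    for i in h:
        total = [0.0] * out_dim
        for r, W in weights.items():
            nbrs = neighbors[r].get(i, [])
            if not nbrs:
                continue  # node i takes no part in relation r
            c = len(nbrs)  # normalization constant: number of neighbors
            # B-1: neighborhood information, sum over j of (1 / c) * W_r h_j
            msg = [0.0] * out_dim
            for j in nbrs:
                msg = [m + v / c for m, v in zip(msg, matvec(W, h[j]))]
            # B-2: self-loop information, ASSUMED to be (1 / c) * W_r h_i
            self_loop = [v / c for v in matvec(W, h[i])]
            # B-3: transfer information under relation r, accumulated for B-4
            total = [t + m + s for t, m, s in zip(total, msg, self_loop)]
        # B-4: integrate over all relation types, then apply ReLU
        h_next[i] = relu(total)
    return h_next
```

With one-hot initial features, the input dimension equals the number of nodes, so the first-layer weight matrices act as per-relation embedding tables.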
Regarding the extraction of gene sequence features, a word2vec model is used to convert a gene sequence (a base sequence over A, T, G and C) into real-valued embeddings, and a bidirectional long short-term memory (BiLSTM) network is then used to mine informative gene sequence information and obtain the sequence feature representation of the gene, as shown in module C of Fig. 1. To convert a gene sequence into real-valued embeddings, the sequence is cut into k-mer fragments, the fragments are treated as words, and each word is mapped to a real-valued embedding by a pre-trained word2vec model. The word2vec model and the bidirectional long short-term memory network are prior art and are therefore not described in detail.
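The k-mer tokenization that precedes word2vec can be sketched as follows. The values `k=3` and `stride=1` and the helper names are assumptions; the text does not state the k-mer length or whether the fragments overlap.

```python
def kmer_tokens(sequence, k=3, stride=1):
    """Split a base sequence into k-mer 'words' (overlapping when stride < k)."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(token_lists):
    """Map each distinct k-mer to an integer index, in first-seen order."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab
```

The resulting token lists are what a word2vec model would consume as sentences; the embedding lookup and the BiLSTM encoder are omitted here.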
The process of fusing the network topology feature representations of the miRNA and the gene with the gene feature information as the input value of the prediction function of the prediction model, and obtaining the output value of the prediction function as the miRNA-gene association score, is implemented as follows:
a: concatenate the network topology feature representation of the gene with the gene feature information.
b: compress the concatenated features with a learnable transformation matrix so that they are embedded into a feature space of the same dimension as the miRNA embedding.
The feature concatenation and compression are given by:
$$\hat{h}_{g_i} = W_p \cdot \mathrm{concat}\big(h_{g_i}, x_{g_i}\big)$$
where $\hat{h}_{g_i}$ represents the embedded representation of gene $g_i$ with the same embedding dimension as the miRNA, $W_p$ is the transformation matrix, $\mathrm{concat}(\cdot)$ represents the concatenation operation, and $h_{g_i}$ and $x_{g_i}$ respectively represent the network topology feature representation and the gene feature information of gene $g_i$.
c: calculate the inner product of the compressed gene features and the network topology feature representation of the miRNA as the input value of the prediction function, whose output is the miRNA-gene association score.
In this embodiment, a sigmoid function is selected as the prediction function, specifically:
$$\hat{y}_{ij} = \sigma\big(h_{m_i}^{\top} \hat{h}_{g_j}\big)$$
where $\sigma$ is the sigmoid function, $h_{m_i}$ is the network topology feature representation of miRNA $m_i$, and $\hat{h}_{g_j}$ is the compressed embedded representation of gene $g_j$.
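Steps a to c and the sigmoid prediction function can be sketched together. The function names are hypothetical, and the projection is assumed to have no bias term, since none is mentioned in the text.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fuse_gene_features(topo_gene, seq_gene, W_p):
    """Steps a and b: concatenate the two gene feature vectors, then project."""
    return matvec(W_p, topo_gene + seq_gene)


def association_score(h_mirna, topo_gene, seq_gene, W_p):
    """Step c: sigmoid of the inner product of miRNA and fused gene embeddings."""
    g = fuse_gene_features(topo_gene, seq_gene, W_p)
    return sigmoid(sum(a * b for a, b in zip(h_mirna, g)))
```

The number of rows of `W_p` must equal the miRNA embedding dimension so that the inner product is defined.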
From the above, the predicted association score of a miRNA-gene pair is obtained through feature fusion and the prediction function. The model can then be trained and adjusted based on the association data of the known miRNA-gene pairs from step S1. The known miRNA-gene association data and the predicted association score are of the same data type, or can be converted into each other. In this embodiment, the known association data indicate whether the two are associated: 1 if an association exists and 0 if not; the association score is a value between 0 and 1, and a pair is considered associated when its score exceeds a defined threshold.
Step S3: train the prediction model constructed in step S2 using the miRNA-gene heterogeneous information network data of step S1.
In this embodiment, a hinge loss function is used as the loss function of the prediction model. With minimization of the loss as the objective, the Adam optimizer is combined with the back-propagation (BP) algorithm to update the model parameters, so that the model output approaches the ground-truth labels as training proceeds. The loss function is expressed as follows:
$$L = \sum_{r \in \mathcal{R}} \sum_{S^{+},\, S^{-}} \max\big(0,\ \mathrm{margin} - \hat{y}^{+} + \hat{y}^{-}\big)$$
where $\mathcal{R}$ represents the set of relation types in the heterogeneous network, $S^{+}$ represents the set of positive samples, $S^{-}$ represents the set of randomly sampled negative samples, whose number equals that of the positive samples, $\hat{y}^{+}$ and $\hat{y}^{-}$ represent the predicted values of the positive and negative samples, respectively, and margin is a chosen hyper-parameter, set empirically to 0.3.
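A hinge loss with margin 0.3 and equal-sized negative sampling can be sketched as below. Pairing each positive with one negative and averaging the terms are assumptions, as are the function names; the optimizer and back-propagation are left out.

```python
import random


def hinge_loss(pos_scores, neg_scores, margin=0.3):
    """Mean of max(0, margin - y_pos + y_neg) over paired samples (assumed pairing)."""
    if len(pos_scores) != len(neg_scores):
        raise ValueError("positive and negative samples must be equal in number")
    terms = [max(0.0, margin - p + n) for p, n in zip(pos_scores, neg_scores)]
    return sum(terms) / len(terms) if terms else 0.0


def sample_negatives(positives, mirnas, genes, seed=0):
    """Randomly draw as many unobserved (miRNA, gene) pairs as there are positives."""
    known = set(positives)
    candidates = [(m, g) for m in mirnas for g in genes if (m, g) not in known]
    if len(candidates) < len(positives):
        raise ValueError("not enough unobserved pairs to sample from")
    return random.Random(seed).sample(candidates, len(positives))
```

The loss is zero once every positive pair scores at least `margin` above its paired negative. Enumerating all candidate pairs is only practical for small examples; large networks would need rejection sampling instead.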
It should be understood that in other possible embodiments, other loss functions or other technical means are possible to implement the model adjustment, and the present invention is not particularly limited and restricted to this.
It should be understood that the model is trained on the miRNA-gene heterogeneous information network data of step S1 and the corresponding gene feature data (gene sequence data in this embodiment). For example, the actual value of the association score is determined from the association data of a known miRNA-gene pair, the predicted value of the association score of that pair under the prediction model is determined by the feature fusion method and the prediction function, and the model is then adjusted according to the actual and predicted values.
Step S4: calculate the association scores of miRNA-gene pairs with unknown association data using the trained prediction model.
It will be appreciated that, for miRNA-gene pairs with unknown association data, the association scores can be calculated according to the a-c process described previously. Preferably, the invention calculates the association scores of all unknown miRNA-gene pairs and sorts them from high to low to obtain a list of potential miRNA-gene associations.
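The ranking described above might be sketched as follows, purely by way of example. The function name rank_candidates and the pair names and scores are hypothetical.

```python
def rank_candidates(scores):
    """Return (miRNA, gene, score) tuples sorted from highest to lowest score."""
    return sorted(
        ((mirna, gene, s) for (mirna, gene), s in scores.items()),
        key=lambda item: item[2],
        reverse=True,
    )


# Hypothetical association scores for pairs with unknown association data.
unknown_scores = {
    ("miR-a", "gene-x"): 0.42,
    ("miR-b", "gene-y"): 0.87,
    ("miR-a", "gene-z"): 0.65,
}
ranked = rank_candidates(unknown_scores)
```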
In summary, the invention provides a method for predicting miRNA and gene interaction based on multi-relation graph embedding fused with gene sequence information. To avoid the limitations of traditional feature extraction methods, the method automatically extracts high-quality network topology features from a multi-source heterogeneous information network by means of multi-relation graph convolution, and at the same time mines deeper sequence features of genes using a bidirectional long short-term memory network. The model is trained end to end, which helps improve the accuracy of miRNA-gene association prediction and provides a valuable reference for further understanding the regulatory function of miRNA. In addition, because the network topology feature representation of the heterogeneous information network nodes is extracted with a multi-relation graph convolutional network, the method accounts for the differences between relation types and finally integrates the transfer information under all relation types.
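By way of illustration only, one layer of a multi-relation graph convolution of the kind summarized above might be sketched as follows. The function names (matvec, relu, rgcn_layer), the mean normalization over neighbors, the reuse of each relation's weight matrix for the self-loop term, and the toy graph are all assumptions made for this sketch and are not taken from the formulas of the invention.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, v) for v in x]


def rgcn_layer(h, neighbors, weights):
    """Update every node's features with one multi-relation convolution layer.

    h:         node -> feature vector at the current layer
    neighbors: relation -> node -> list of neighbor nodes under that relation
    weights:   relation -> weight matrix for that relation
    For each relation, the transfer information is the self-loop term plus
    the mean of the transformed neighbor features (assumed normalization);
    the transfer information of all relations is summed and passed to ReLU.
    """
    if not weights:
        raise ValueError("at least one relation type is required")
    updated = {}
    for i, h_i in h.items():
        total = None
        for r, W in weights.items():
            nbrs = neighbors.get(r, {}).get(i, [])
            msg = matvec(W, h_i)  # self-loop information (assumed to reuse W)
            for j in nbrs:
                contrib = matvec(W, h[j])
                msg = [m + c / len(nbrs) for m, c in zip(msg, contrib)]
            total = msg if total is None else [t + m for t, m in zip(total, msg)]
        updated[i] = relu(total)
    return updated


# Toy heterogeneous network: one miRNA (m1) and two genes (g1, g2).
h = {"m1": [1.0, 0.0], "g1": [0.0, 1.0], "g2": [1.0, 1.0]}
neighbors = {
    "miRNA-gene": {"m1": ["g1", "g2"], "g1": ["m1"], "g2": ["m1"]},
    "gene-gene": {"g1": ["g2"], "g2": ["g1"]},
}
weights = {
    "miRNA-gene": [[1.0, 0.0], [0.0, 1.0]],
    "gene-gene": [[0.5, 0.0], [0.0, 0.5]],
}
updated = rgcn_layer(h, neighbors, weights)
```

Keeping a separate weight matrix per relation type is what lets the layer treat miRNA-gene and gene-gene edges differently before the per-relation results are combined.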
Example 2: This embodiment provides a system based on the above method for predicting miRNA and gene interaction, including:
a miRNA-gene heterogeneous information network construction module, used for constructing a miRNA-gene heterogeneous information network, the miRNA-gene heterogeneous information network comprising miRNA-miRNA relation data, association data of known miRNA-gene pairs, and gene-gene relation data;
a prediction model construction module, used for constructing a prediction model of miRNA and gene interaction, wherein a network topology feature representation of the heterogeneous information network nodes is extracted based on a multi-relation graph convolutional network, and, for each miRNA-gene pair, the network topology feature representations corresponding to the miRNA and the gene are fused with the gene feature information and used as the input value of the prediction function of the prediction model, the output value of the prediction function serving as the association score of the miRNA-gene pair;
a training module, used for training the prediction model constructed in step S2 using the miRNA-gene heterogeneous information network data of step S1;
and a prediction module, used for calculating the association scores of miRNA-gene pairs with unknown association data using the trained prediction model.
In some possible implementations, the prediction model includes a multi-relation graph convolution encoding module, a gene sequence feature extraction module, and an information fusion and association score prediction module. The multi-relation graph convolution encoding module is used for extracting the network topology feature representation of the heterogeneous information network nodes based on a multi-relation graph convolutional network. The gene sequence feature extraction module is used for extracting gene sequence features. The information fusion and association score prediction module is used for fusing the network topology feature representations corresponding to the miRNA and the gene with the gene feature information, and for inputting the fused features into the prediction function, the output value of which is taken as the association score of the miRNA-gene pair.
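As a non-limiting sketch, the information fusion and association score prediction described above (splicing the gene's topology and sequence features, compressing them with a learnable transformation matrix, and applying a sigmoid to the inner product with the miRNA representation) might look as follows. The function names, the feature dimensions, and the numeric values of the transformation matrix W_p are hypothetical; in practice W_p would be learned during training.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse_gene(topology_feat, sequence_feat, W_p):
    """Splice the gene's two feature vectors and compress them with W_p.

    W_p has one row per miRNA embedding dimension, so the result lies in
    the same feature space as the miRNA representation.
    """
    spliced = list(topology_feat) + list(sequence_feat)
    return [sum(w * v for w, v in zip(row, spliced)) for row in W_p]


def association_score(mirna_feat, gene_topology, gene_sequence, W_p):
    """Sigmoid of the inner product between miRNA and fused gene embeddings."""
    gene_embed = fuse_gene(gene_topology, gene_sequence, W_p)
    inner = sum(a * b for a, b in zip(mirna_feat, gene_embed))
    return sigmoid(inner)


# Hypothetical 2-dimensional miRNA embedding and 2 + 1 dimensional gene features.
mirna_feat = [1.0, -1.0]
gene_topology = [0.5, 0.5]
gene_sequence = [1.0]
W_p = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # stand-in for a learned matrix
score = association_score(mirna_feat, gene_topology, gene_sequence, W_p)
```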
It should also be understood that, for the specific implementation of the above unit modules, reference is made to the description of the method, which is not repeated here. The division into the above functional module units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Meanwhile, the integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
Example 3:
This embodiment provides an electronic terminal, which includes:
one or more processors;
a memory storing one or more computer programs;
the processor invokes the computer program to implement:
a method for predicting miRNA and gene interaction based on a multi-relation graph convolutional network. The method specifically comprises the following steps:
Step S1: construct a miRNA-gene heterogeneous information network.
Step S2: construct a prediction model of miRNA and gene interaction.
Step S3: train the prediction model constructed in step S2 using the miRNA-gene heterogeneous information network data of step S1.
Step S4: calculate the association scores of miRNA-gene pairs with unknown association data using the trained prediction model.
For the specific implementation of each step, refer to the foregoing description of the method.
It should be understood that, in the embodiments of the present invention, the processor may be a Central Processing Unit (CPU), or another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. A general purpose processor may be a microprocessor or any conventional processor. The memory may include both read-only memory and random access memory, and provides instructions and data to the processor. A portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
Example 4:
This embodiment provides a readable storage medium storing a computer program, the computer program being invoked by a processor to implement:
a method for predicting miRNA and gene interaction based on a multi-relation graph convolutional network. The method specifically comprises the following steps:
Step S1: construct a miRNA-gene heterogeneous information network.
Step S2: construct a prediction model of miRNA and gene interaction.
Step S3: train the prediction model constructed in step S2 using the miRNA-gene heterogeneous information network data of step S1.
Step S4: calculate the association scores of miRNA-gene pairs with unknown association data using the trained prediction model.
For the specific implementation of each step, refer to the foregoing description of the method.
The readable storage medium is a computer readable storage medium, which may be an internal storage unit of the controller according to any of the foregoing embodiments, for example, a hard disk or a memory of the controller. The readable storage medium may also be an external storage device of the controller, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the controller. Further, the readable storage medium may also include both an internal storage unit and an external storage device of the controller. The readable storage medium is used for storing the computer program and other programs and data required by the controller. The readable storage medium may also be used to temporarily store data that has been output or is to be output.
Based on such understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned readable storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
Evaluation of results:
The results predicted by the method are compared with the true labels, the true positive rate (TPR) and false positive rate (FPR) are calculated, the ROC curve is drawn, and the area under the ROC curve is calculated to obtain the AUC value; the larger the AUC value, the better the prediction performance of the model. The MRMTI model described herein is compared with the KATZ, SG-LSTM, and LINE methods. As shown in Figure 2, the AUC value of the MRMTI model is 0.9183, higher than those of the KATZ (0.8886), SG-LSTM (0.8581), and LINE (0.8290) methods. This indicates that the prediction performance of the MRMTI model is superior to that of the compared methods, that the model can effectively improve the accuracy of miRNA-gene association prediction and provide a list of potential miRNA-gene associations, and that the method is of practical use.
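The evaluation procedure described above might be sketched as follows, by way of example only. The function name roc_auc, the trapezoidal integration, and the toy labels and scores are assumptions made for this sketch; they are not the evaluation code or data behind the reported AUC values.

```python
def roc_auc(labels, scores):
    """Return the ROC points (FPR, TPR) and the area under the ROC curve.

    The decision threshold is swept from the highest score to the lowest;
    tied scores are grouped into a single ROC point, and the area is
    obtained with the trapezoidal rule.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both positive and negative labels are required")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for idx, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        if idx + 1 == len(ranked) or ranked[idx + 1][0] != score:
            points.append((fp / neg, tp / pos))
    auc = sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
    return points, auc


# Toy example: true labels and predicted association scores.
true_labels = [1, 0, 1, 0]
predicted = [0.9, 0.8, 0.7, 0.1]
roc_points, auc_value = roc_auc(true_labels, predicted)
```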
It should be emphasized that the examples described herein are illustrative and not restrictive. The invention is therefore not limited to the examples described herein, and also covers other embodiments that those skilled in the art may devise based on the teachings herein; various modifications, alterations, and substitutions are possible without departing from the spirit and scope of the present invention.

Claims (10)

1. A method for predicting miRNA and gene interaction based on a multi-relation graph convolutional network, characterized by comprising the following steps:
step S1: constructing a miRNA-gene heterogeneous information network, wherein the miRNA-gene heterogeneous information network comprises miRNA-miRNA relation data, association data of known miRNA-gene pairs, and gene-gene relation data;
step S2: constructing a prediction model of miRNA and gene interaction, wherein a network topology feature representation of the heterogeneous information network nodes is extracted based on a multi-relation graph convolutional network, and, for each miRNA-gene pair, the network topology feature representations corresponding to the miRNA and the gene are fused with the gene feature information and used as the input value of the prediction function of the prediction model, the output value of the prediction function serving as the association score of the miRNA-gene pair;
wherein the heterogeneous information network nodes processed by the multi-relation graph convolutional network correspond to miRNAs or genes;
step S3: training the prediction model constructed in step S2 using the miRNA-gene heterogeneous information network data of step S1;
step S4: calculating the association scores of miRNA-gene pairs with unknown association data using the trained prediction model.
2. The method of claim 1, wherein the process of extracting the network topology feature representation of the heterogeneous information network nodes based on the multi-relation graph convolutional network is as follows:
determining initial features of the network nodes in the miRNA-gene heterogeneous information network;
performing the following operations for each layer of the multi-relation graph convolutional network to update the feature representation of the network nodes, wherein the feature representation obtained at the last layer is the network topology feature representation of the network nodes;
for the l-th layer of the multi-relation graph convolutional network, the updated value of the feature representation corresponding to a network node is obtained as follows:
calculating neighborhood information from adjacent nodes based on the feature representations of the relevant nodes at the l-th layer of the multi-relation graph convolutional network;
calculating self-loop information of the network node based on the feature representation of the relevant node at the l-th layer of the multi-relation graph convolutional network;
superposing the neighborhood information and the self-loop information corresponding to the network node to obtain the transfer information corresponding to the network node under a given relation type r;
and integrating the transfer information under all relation types as the updated value of the feature representation corresponding to the network node at the l-th layer, the updated value being used as the feature representation corresponding to the network node at the (l+1)-th layer.
3. The method of claim 2, wherein the neighborhood information from adjacent nodes and/or the self-loop information of the network node is as follows:
Figure FDA0003369141780000011
wherein
Figure FDA0003369141780000012
represents the neighborhood information passed to network node i in the l-th layer of the graph convolution,
Figure FDA0003369141780000013
represents the set of neighbor nodes of network node i under relation type r,
Figure FDA0003369141780000021
represents the feature representation of network node j at the l-th layer of the multi-relation graph convolutional network,
Figure FDA0003369141780000022
is a normalization constant, and
Figure FDA0003369141780000023
represents the weight matrix for the given relation type r;
Figure FDA0003369141780000024
wherein
Figure FDA0003369141780000025
represents the self-loop information of network node i in the l-th layer of the graph convolution,
Figure FDA0003369141780000026
represents the feature representation of network node i at the l-th layer of the multi-relation graph convolutional network, and the normalization constant is
Figure FDA0003369141780000027
4. The method of claim 2, wherein the formula for integrating the transfer information under all relation types is as follows:
Figure FDA0003369141780000028
wherein
Figure FDA0003369141780000029
represents the integrated transfer information under all relation types for network node i, which serves as the feature representation corresponding to network node i at the (l+1)-th layer; σ(·) is the ReLU activation function,
Figure FDA00033691417800000210
is the set of relation types in the heterogeneous network, and
Figure FDA00033691417800000211
represents the transfer information corresponding to network node i under the given relation type r.
5. The method of claim 1, wherein the process in step S2 of fusing the network topology feature representations corresponding to the miRNA and the gene with the gene feature information as the input value of the prediction function of the prediction model, and obtaining the output value of the prediction function as the association score of the miRNA-gene pair, is specifically:
splicing the network topology feature representation of the gene with the gene feature information, and compressing the spliced features using a learnable transformation matrix so that the spliced features and the miRNA are embedded into a feature space of the same dimension;
and calculating the inner product of the compressed features corresponding to the gene and the network topology feature representation of the miRNA, and using the inner product as the input value of the prediction function to obtain the association score of the miRNA-gene pair.
6. The method of claim 5, wherein the prediction function is expressed as:
Figure FDA00033691417800000212
wherein σ is the sigmoid function,
Figure FDA00033691417800000213
is the network topology feature representation of miRNA m_i,
Figure FDA00033691417800000214
is the embedded representation of gene g_j obtained after feature splicing and compression, which satisfies:
Figure FDA00033691417800000215
wherein
Figure FDA00033691417800000216
represents the embedded representation of gene g_i with the same embedding dimension as the miRNA, W_p is the transformation matrix, concat(·) represents the splicing operation, and
Figure FDA00033691417800000217
and
Figure FDA00033691417800000218
respectively represent the network topology feature representation and the gene feature information of gene g_i.
7. The method of claim 1, wherein: the gene characteristic information in step S2 is a gene sequence characteristic.
8. A system based on the method of any one of claims 1-7, characterized by comprising:
a miRNA-gene heterogeneous information network construction module, used for constructing a miRNA-gene heterogeneous information network, the miRNA-gene heterogeneous information network comprising miRNA-miRNA relation data, association data of known miRNA-gene pairs, and gene-gene relation data;
a prediction model construction module, used for constructing a prediction model of miRNA and gene interaction, wherein a network topology feature representation of the heterogeneous information network nodes is extracted based on a multi-relation graph convolutional network, and, for each miRNA-gene pair, the network topology feature representations corresponding to the miRNA and the gene are fused with the gene feature information and used as the input value of the prediction function of the prediction model, the output value of the prediction function serving as the association score of the miRNA-gene pair;
a training module, used for training the prediction model constructed in step S2 using the miRNA-gene heterogeneous information network data of step S1;
and a prediction module, used for calculating the association scores of miRNA-gene pairs with unknown association data using the trained prediction model.
9. An electronic terminal, characterized by comprising:
one or more processors;
a memory storing one or more computer programs;
wherein the processor invokes the computer program to implement:
the steps of the method of any one of claims 1 to 7.
10. A readable storage medium, characterized in that a computer program is stored thereon, the computer program being invoked by a processor to implement:
the steps of the method of any one of claims 1 to 7.
CN202111393459.9A 2021-11-23 2021-11-23 MiRNA (micro ribonucleic acid) and gene interaction prediction method and system based on multi-relation graph convolution network Pending CN114093422A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111393459.9A CN114093422A (en) 2021-11-23 2021-11-23 MiRNA (micro ribonucleic acid) and gene interaction prediction method and system based on multi-relation graph convolution network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111393459.9A CN114093422A (en) 2021-11-23 2021-11-23 MiRNA (micro ribonucleic acid) and gene interaction prediction method and system based on multi-relation graph convolution network

Publications (1)

Publication Number Publication Date
CN114093422A true CN114093422A (en) 2022-02-25

Family

ID=80303163

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111393459.9A Pending CN114093422A (en) 2021-11-23 2021-11-23 MiRNA (micro ribonucleic acid) and gene interaction prediction method and system based on multi-relation graph convolution network

Country Status (1)

Country Link
CN (1) CN114093422A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115223657A (en) * 2022-09-20 2022-10-21 吉林农业大学 Medicinal plant transcription regulation and control map prediction method
CN116959561A (en) * 2023-09-21 2023-10-27 北京科技大学 Gene interaction prediction method and device based on neural network model
CN116959561B (en) * 2023-09-21 2023-12-19 北京科技大学 Gene interaction prediction method and device based on neural network model

Similar Documents

Publication Publication Date Title
CN109243538B (en) Method and system for predicting association relation between disease and LncRNA
CN111312329B (en) Transcription factor binding site prediction method based on deep convolution automatic encoder
Park et al. Deep recurrent neural network-based identification of precursor micrornas
CN114093422A (en) MiRNA (micro ribonucleic acid) and gene interaction prediction method and system based on multi-relation graph convolution network
CN112270958B (en) Prediction method based on layered deep learning miRNA-lncRNA interaction relationship
CN112489723B (en) DNA binding protein prediction method based on local evolution information
CN107679367A (en) A kind of common regulated and control network functional module recognition methods and system based on the network node degree of association
CN112131399A (en) Old medicine new use analysis method and system based on knowledge graph
Chakraborty et al. Predicting MicroRNA sequence using CNN and LSTM stacked in Seq2Seq architecture
CN115995293A (en) Circular RNA and disease association prediction method
CN110808095B (en) Diagnostic result recognition method, model training method, computer equipment and storage medium
CN109147868A (en) Protein function prediction technique, device, equipment and storage medium
CN113539372A (en) Efficient prediction method for LncRNA and disease association relation
CN109147936B (en) Prediction method for association between non-coding RNA and diseases based on deep learning
CN113241123B (en) Method and system for fusing multiple characteristic recognition enhancers and intensity thereof
CN116978464A (en) Data processing method, device, equipment and medium
CN115938490A (en) Metabolite identification method, system and equipment based on graph representation learning algorithm
CN115810398A (en) TF-DNA binding identification method based on multi-feature fusion
CN113539479B (en) Similarity constraint-based miRNA-disease association prediction method and system
CN115691817A (en) LncRNA-disease association prediction method based on fusion neural network
CN116246698A (en) Information extraction method, device, equipment and storage medium based on neural network
CN115295156A (en) Method for predicting miRNA-disease based on relation graph convolution network fusion multi-source information
Yan et al. DNA-binding protein prediction based on deep transfer learning
CN112885405A (en) Prediction method and system of disease-associated miRNA
JP6993250B2 (en) Content feature extractor, method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination