CN113838527B

CN113838527B - Method and device for generating target gene prediction model and storage medium

Info

Publication number: CN113838527B
Application number: CN202111129697.9A
Authority: CN
Inventors: 刘小双
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-09-26
Filing date: 2021-09-26
Publication date: 2023-09-01
Anticipated expiration: 2041-09-26
Also published as: CN113838527A

Abstract

The application relates to the field of digital medical treatment, and provides a method for generating a target gene prediction model, which comprises the following steps: acquiring a gene regulation network, an miRNA regulation network and regulation relation data between genes and miRNAs in a preset database, and connecting the genes in the gene regulation network and the miRNAs in the miRNA regulation network according to the regulation relation data to obtain a target knowledge graph; learning the target knowledge graph through a preset learning model to obtain gene node characteristics and miRNA node characteristics corresponding to the target knowledge graph; training the preset learning model according to the gene node characteristics, the miRNA node characteristics and the regulation and control relation data to obtain a trained preset learning model, and connecting the trained preset learning model with a preset activation function to obtain a target gene prediction model. Compared with the traditional prediction method, the method is more effective and has higher accuracy.

Description

Method and device for generating target gene prediction model and storage medium

Technical Field

The application relates to the technical field of digital medical treatment, in particular to a method and a device for generating a target gene prediction model, a storage medium and computer equipment.

Background

Many studies have shown that miRNAs are involved in the development of various complex diseases. miRNAs influence disease progression mainly by regulating gene expression. Research on how miRNAs regulate genes is therefore of great importance for a deeper understanding of how diseases arise and develop.

In the prior art, whether a regulatory relationship exists between an miRNA and a gene is usually determined by comparing sequence characteristics of the miRNA with sequence characteristics of the gene. This approach has certain limitations and relatively low accuracy. How to improve the accuracy of predicting regulatory relationships between miRNAs and genes, and thereby better determine the target genes of miRNAs, has therefore become a technical problem to be solved urgently in the field.

Disclosure of Invention

In view of the above, the application provides a method and a device for generating a target gene prediction model, a storage medium and computer equipment. A target knowledge graph is constructed and the node information in the target knowledge graph is learned by using a preset learning model, so that the similarity between miRNA and gene and the regulation and control relationship between genes can be fully utilized when the regulation and control relationship between miRNA and genes is predicted. Compared with the traditional prediction method, the method is more effective and has higher accuracy.

According to one aspect of the present application, there is provided a method for generating a target gene prediction model, comprising:

acquiring a gene regulation network, an miRNA regulation network and regulation relation data between genes and miRNAs in a preset database, and connecting the genes in the gene regulation network and the miRNAs in the miRNA regulation network according to the regulation relation data to obtain a target knowledge graph;

learning the target knowledge graph through a preset learning model to obtain gene node characteristics and miRNA node characteristics corresponding to the target knowledge graph;

training the preset learning model according to the gene node characteristics, the miRNA node characteristics and the regulation and control relation data to obtain a trained preset learning model, and connecting the trained preset learning model with a preset activation function to obtain a target gene prediction model.

Optionally, the regulation and control relationship data comprises training sample regulation and control relationship data and test sample regulation and control relationship data; training the preset learning model according to the gene node characteristics, the miRNA node characteristics and the regulation and control relation data to obtain a trained preset learning model, and connecting the trained preset learning model with a preset activation function to obtain a target gene prediction model, wherein the method specifically comprises the following steps of:

Determining the gene node characteristics and the miRNA node characteristics corresponding to the training sample regulation and control relation data according to the genes and miRNAs corresponding to any training sample regulation and control relation data, calculating the inner product of the gene node characteristics and the miRNA node characteristics, and inputting the inner product into a preset activation function to obtain first relation prediction data corresponding to each training sample regulation and control relation data;

calculating a model loss value through a preset model loss calculation function based on the first relation prediction data and the corresponding training sample regulation and control relation data;

adjusting model parameters of the preset learning model according to the model loss value, obtaining second relation prediction data corresponding to the regulation and control relation data of each training sample through the adjusted preset learning model and the preset activation function, and calculating the model loss value again;

and when the model loss value is smaller than a preset loss threshold value, connecting the trained preset learning model with the preset activation function to obtain a target gene prediction model.

Optionally, the connecting the gene in the gene regulation network and the miRNA in the miRNA regulation network according to the regulation relationship data to obtain a target knowledge graph specifically includes:

According to regulation and control relation data between any gene and miRNA, determining a target gene corresponding to the regulation and control relation data from the gene regulation and control network, determining a target miRNA corresponding to the regulation and control relation data from the miRNA regulation and control network, and constructing a first characteristic edge between the target gene and the target miRNA so as to construct a target knowledge graph.

Optionally, before the acquiring of the gene regulation network, the miRNA regulation network and the regulation relation data between genes and miRNAs in the preset database, the method further includes:

acquiring association relation data among proteins corresponding to genes in a preset gene database, constructing second characteristic edges among the genes when the association relation data is larger than a preset association threshold, and generating the gene regulation network by taking the association relation data as weights corresponding to the second characteristic edges;

and acquiring miRNA sequences corresponding to all miRNAs in a preset miRNA database, calculating a matching score between any two miRNA sequences according to a preset matching condition, constructing a third characteristic edge between the miRNAs when the matching score is larger than a preset matching threshold, and generating the miRNA regulation network by taking the matching score as the weight corresponding to the third characteristic edge.

Optionally, before determining the gene node feature and the miRNA node feature corresponding to the training sample regulation relationship data according to the gene and the miRNA corresponding to any training sample regulation relationship data, the method further includes:

dividing the regulation and control relation data into the training sample regulation and control relation data and the test sample regulation and control relation data according to a preset distribution proportion.

Optionally, after the target gene prediction model is obtained, the method further comprises:

obtaining third relation prediction data through the target gene prediction model based on the genes and miRNAs corresponding to any one of the test sample regulation relation data;

calculating a predicted deviation value of the target gene prediction model through a preset model loss calculation function based on the third relation prediction data and the test sample regulation relation data;

and if the predicted deviation value meets the preset deviation value condition, determining that the target gene prediction model test passes.

Optionally, when the target knowledge-graph does not include the gene to be predicted and/or the miRNA to be predicted, after the determining that the target gene prediction model test passes, the method further includes:

calculating association relation data between the gene to be predicted and any gene in the target knowledge graph and/or calculating a matching score between the miRNA to be predicted and any miRNA in the target knowledge graph; when the association relation data is greater than a preset association threshold, constructing a second characteristic edge between the gene to be predicted and the gene whose association relation data is greater than the preset association threshold, and/or, when the matching score is greater than a preset matching threshold, constructing a third characteristic edge between the miRNA to be predicted and the miRNA whose matching score is greater than the preset matching threshold, so as to update the target knowledge graph;

learning the updated target knowledge graph through the preset learning model to obtain gene node characteristics and miRNA node characteristics corresponding to the updated target knowledge graph;

training the preset learning model according to the gene node characteristics, the miRNA node characteristics and the regulation and control relation data to obtain a trained preset learning model, and connecting the trained preset learning model with a preset activation function to obtain an updated target gene prediction model.

According to another aspect of the present application, there is provided an apparatus for generating a target gene prediction model, comprising:

the network generation module is used for acquiring a gene regulation and control network, an miRNA regulation and control network and regulation and control relation data between genes and miRNAs in a preset database, and connecting the genes in the gene regulation and control network and the miRNAs in the miRNA regulation and control network according to the regulation and control relation data to obtain a target knowledge graph;

the feature acquisition module is used for learning the target knowledge graph through a preset learning model to obtain gene node features and miRNA node features corresponding to the target knowledge graph;

the model determining module is used for training the preset learning model according to the gene node characteristics, the miRNA node characteristics and the regulation and control relation data to obtain a trained preset learning model, and connecting the trained preset learning model with a preset activation function to obtain a target gene prediction model.

Optionally, the regulation and control relationship data comprises training sample regulation and control relationship data and test sample regulation and control relationship data; the model determining module specifically comprises:

the prediction data calculation unit is used for determining the gene node characteristics and the miRNA node characteristics corresponding to the training sample regulation and control relation data according to the genes and miRNAs corresponding to any training sample regulation and control relation data, calculating the inner products of the gene node characteristics and the miRNA node characteristics, and inputting the inner products into a preset activation function to obtain first relation prediction data corresponding to each training sample regulation and control relation data;

The loss value calculation unit is used for calculating a model loss value through a preset model loss calculation function based on the first relation prediction data and the corresponding training sample regulation relation data;

the parameter adjusting unit is used for adjusting model parameters of the preset learning model according to the model loss value, obtaining second relation prediction data corresponding to the regulation and control relation data of each training sample through the adjusted preset learning model and the preset activation function, and calculating the model loss value again;

and the model determining unit is used for connecting the trained preset learning model with the preset activation function when the model loss value is smaller than a preset loss threshold value to obtain a target gene prediction model.

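Optionally, the network generation module is specifically configured to: determine, according to regulation and control relation data between any gene and miRNA, a target gene corresponding to the regulation and control relation data from the gene regulation and control network, determine a target miRNA corresponding to the regulation and control relation data from the miRNA regulation and control network, and construct a first characteristic edge between the target gene and the target miRNA so as to construct a target knowledge graph.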

Optionally, the network generation module is further configured to, before the acquiring of the gene regulation and control network, the miRNA regulation and control network and the regulation and control relation data between genes and miRNAs in the preset database, acquire association relation data between proteins corresponding to each gene in a preset gene database, construct a second characteristic edge between the genes when the association relation data is greater than a preset association threshold, and generate the gene regulation and control network by taking the association relation data as the weight corresponding to the second characteristic edge; and acquire miRNA sequences corresponding to all miRNAs in a preset miRNA database, calculate a matching score between any two miRNA sequences according to a preset matching condition, construct a third characteristic edge between the miRNAs when the matching score is greater than a preset matching threshold, and generate the miRNA regulation and control network by taking the matching score as the weight corresponding to the third characteristic edge.

Optionally, the apparatus further comprises:

the data distribution module is used for dividing the regulation and control relation data into the training sample regulation and control relation data and the test sample regulation and control relation data according to a preset distribution proportion before determining the gene node characteristics and the miRNA node characteristics corresponding to the training sample regulation and control relation data according to the genes and miRNAs corresponding to any training sample regulation and control relation data.

Optionally, the apparatus further comprises:

the model test module is used for obtaining third relation prediction data through the target gene prediction model based on the gene and miRNA corresponding to any one of the test sample regulation relation data after the target gene prediction model is obtained; calculating a predicted deviation value of the target gene prediction model through a preset model loss calculation function based on the third relation prediction data and the test sample regulation relation data; and if the predicted deviation value meets the preset deviation value condition, determining that the target gene prediction model test passes.

Optionally, the apparatus further comprises:

the graph updating module is used for, when the target knowledge graph does not include the gene to be predicted and/or the miRNA to be predicted and after it is determined that the target gene prediction model test passes, calculating association relation data between the gene to be predicted and any gene in the target knowledge graph and/or calculating a matching score between the miRNA to be predicted and any miRNA in the target knowledge graph; when the association relation data is greater than a preset association threshold, constructing a second characteristic edge between the gene to be predicted and the gene whose association relation data is greater than the preset association threshold, and/or, when the matching score is greater than a preset matching threshold, constructing a third characteristic edge between the miRNA to be predicted and the miRNA whose matching score is greater than the preset matching threshold, so as to update the target knowledge graph;

The feature acquisition module is further used for learning the updated target knowledge graph through the preset learning model to obtain gene node features and miRNA node features corresponding to the updated target knowledge graph;

the model determining module is further configured to train the preset learning model according to the gene node characteristics, the miRNA node characteristics, and the regulation and control relationship data, obtain a trained preset learning model, and connect the trained preset learning model to a preset activation function to obtain an updated target gene prediction model.

According to still another aspect of the present application, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described target gene prediction model generation method.

According to still another aspect of the present application, there is provided a computer device comprising a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, the processor implementing the method of generating a target gene prediction model as described above when executing the program.

By means of the technical scheme, the target gene prediction model generation method, the target gene prediction model generation device, the storage medium and the computer equipment acquire the pre-constructed gene regulation network, the pre-constructed miRNA regulation network and the regulation relation data between genes and miRNAs stored in the preset database, and then connect the gene regulation network and the miRNA regulation network by using the regulation relation data, so that one large knowledge graph, namely the target knowledge graph, is formed. After the target knowledge graph is generated, it is input into a preset learning model to obtain the gene node characteristics and miRNA node characteristics corresponding to the target knowledge graph. A model loss value corresponding to the preset learning model is then calculated by using the gene node characteristics, the miRNA node characteristics and the regulation and control relation data, and the parameters of the preset learning model are adjusted according to the model loss value, so that the preset learning model is trained. When the calculated model loss value meets a preset requirement, training of the preset learning model is complete, and the trained preset learning model is then connected with a preset activation function to obtain the target gene prediction model. Because the target knowledge graph is constructed and its node information is learned by the preset learning model, the similarity between miRNA and gene and the regulation and control relationship between genes can be fully utilized when the regulation and control relationship between miRNA and gene is predicted; compared with the traditional prediction method, the method is more effective and has higher accuracy.

The foregoing description is only an overview of the technical scheme of the present application. So that the technical means of the application can be understood more clearly and implemented in accordance with the content of the specification, and so that the above and other objects, features and advantages of the application become more readily apparent, specific embodiments of the application are set out below.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:

FIG. 1 shows a schematic flow chart of a method for generating a target gene prediction model according to an embodiment of the present application;

FIG. 2 shows a schematic structural diagram of a target gene prediction model generating device according to an embodiment of the present application.

Detailed Description

The application will be described in detail hereinafter with reference to the drawings in conjunction with embodiments. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.

In this embodiment, a method for generating a target gene prediction model is provided, as shown in fig. 1, and the method includes:

Step 101, acquiring a gene regulation network, an miRNA regulation network and regulation relation data between genes and miRNAs in a preset database, and connecting the genes in the gene regulation network and the miRNAs in the miRNA regulation network according to the regulation relation data to obtain a target knowledge graph;

The method for generating the target gene prediction model is mainly suitable for scenarios in which the target genes of miRNAs (microRNA, micro ribonucleic acid) are predicted, and can in particular be applied on the server side. The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and big data and artificial intelligence platforms. In the embodiment of the application, a pre-constructed gene regulation network, a pre-constructed miRNA regulation network and regulation relation data between genes and miRNAs stored in a preset database are obtained. Each item of regulation relation data stored in the preset database indicates that a regulation relationship exists between the corresponding miRNA and gene. The genes in the gene regulation network and the miRNAs in the miRNA regulation network are then connected by using the regulation relation data, so as to form one large knowledge graph, namely the target knowledge graph. Specifically, the regulation relation data can be used to determine which miRNAs have a regulation relationship with which genes; those genes are found in the gene regulation network, those miRNAs are found in the miRNA regulation network, and each gene and miRNA having a regulation relationship are then connected in pairs, so that the target knowledge graph is obtained.

Step 102, learning the target knowledge graph through a preset learning model to obtain gene node characteristics and miRNA node characteristics corresponding to the target knowledge graph;

In this embodiment, after the target knowledge graph is generated, it is input into a preset learning model. The preset learning model may be a graph attention neural network model, and the initial model parameters of the preset learning model may be set randomly. Because the target knowledge graph comprises gene nodes and miRNA nodes, the node information in the target knowledge graph can be learned through the preset learning model to obtain the gene node characteristics and miRNA node characteristics corresponding to the target knowledge graph. While learning the node information in the target knowledge graph, the preset learning model can also learn information about the neighbor nodes of any node, so the obtained gene node characteristics and miRNA node characteristics may also contain information about neighbor nodes.
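As a non-limiting illustration of how a graph attention layer can fold neighbor information into each node characteristic, the following minimal sketch implements one single-head attention layer in plain Python. The embodiment does not specify the initial node features, the layer sizes or whether edge weights enter the attention computation; here the initial features and parameters are random, the output dimension is 2, edge weights are ignored, and all function and variable names are hypothetical.

```python
import math
import random


def matvec(weight, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weight]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def graph_attention_layer(features, neighbors, weight, attention):
    """One single-head graph attention layer without an output nonlinearity.

    features:  {node: input feature vector}
    neighbors: {node: list of adjacent nodes}
    weight:    shared projection matrix (out_dim rows, in_dim columns)
    attention: attention vector of length 2 * out_dim
    """
    projected = {node: matvec(weight, h) for node, h in features.items()}
    updated = {}
    for node, z_i in projected.items():
        hood = [node] + list(neighbors.get(node, []))  # self-loop plus neighbors
        # Unnormalised attention score for each member of the neighborhood.
        scores = [
            leaky_relu(sum(a * z for a, z in zip(attention, z_i + projected[j])))
            for j in hood
        ]
        peak = max(scores)  # subtract the maximum for a numerically stable softmax
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        # Attention-weighted sum of the projected neighborhood features.
        updated[node] = [
            sum((e / total) * projected[j][k] for e, j in zip(exps, hood))
            for k in range(len(z_i))
        ]
    return updated


# Illustrative toy graph; node names and all numbers are made up.
rng = random.Random(0)
nodes = ("gene1", "gene2", "miRNA9", "miRNA3")
features = {n: [rng.uniform(-1, 1) for _ in range(4)] for n in nodes}
neighbors = {"gene1": ["gene2", "miRNA9"], "gene2": ["gene1"],
             "miRNA9": ["gene1"], "miRNA3": []}
weight = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
attention = [rng.uniform(-1, 1) for _ in range(4)]
node_features = graph_attention_layer(features, neighbors, weight, attention)
```

In this sketch a node without neighbors, such as `miRNA3`, simply keeps its own projected feature, whereas connected nodes receive a mixture of their own and their neighbors' projected features.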

And step 103, training the preset learning model according to the gene node characteristics, the miRNA node characteristics and the regulation and control relation data to obtain a trained preset learning model, and connecting the trained preset learning model with a preset activation function to obtain a target gene prediction model.

In this embodiment, after obtaining the gene node feature and the miRNA node feature corresponding to the target knowledge graph through the preset learning model, further, calculating a model loss value corresponding to the preset learning model by using the gene node feature, the miRNA node feature and the regulation and control relationship data, and adjusting initial model parameters of the preset learning model according to the model loss value, so as to realize training of the preset learning model. When the calculated model loss value reaches the preset requirement, the preset learning model is completely trained, and then the trained preset learning model is connected with a preset activation function, so that the target gene prediction model is obtained.
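As a non-limiting illustration of the loop "calculate the loss, adjust the parameters, recalculate the loss, stop once the loss is below a preset threshold", the following sketch trains on a toy example. Several points are assumptions not stated in the embodiment: the loss is binary cross-entropy, a pair labeled 0 (no known regulation relationship) is included as a negative sample, and, to keep the sketch short, the node feature vectors themselves are updated by gradient descent as a stand-in for the parameters of the preset learning model. All names and values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def inner(u, v):
    return sum(a * b for a, b in zip(u, v))


def bce_loss(samples, gene_vecs, mirna_vecs):
    """Mean binary cross-entropy over (miRNA, gene, label) samples."""
    total = 0.0
    for mirna, gene, label in samples:
        p = sigmoid(inner(gene_vecs[gene], mirna_vecs[mirna]))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(samples)


def train(samples, gene_vecs, mirna_vecs, loss_threshold=0.1, lr=0.5, max_steps=10000):
    """Repeat: compute loss, stop if below the threshold, otherwise adjust."""
    for step in range(max_steps):
        loss = bce_loss(samples, gene_vecs, mirna_vecs)
        if loss < loss_threshold:
            return loss, step
        for mirna, gene, label in samples:
            g, m = gene_vecs[gene], mirna_vecs[mirna]
            err = sigmoid(inner(g, m)) - label  # gradient of the loss w.r.t. the inner product
            gene_vecs[gene] = [a - lr * err * b for a, b in zip(g, m)]
            mirna_vecs[mirna] = [b - lr * err * a for a, b in zip(g, m)]
    return bce_loss(samples, gene_vecs, mirna_vecs), max_steps


# Illustrative values only.
gene_vecs = {"gene1": [0.1, 0.2], "gene2": [0.2, -0.1]}
mirna_vecs = {"miRNA9": [0.3, 0.1]}
samples = [("miRNA9", "gene1", 1), ("miRNA9", "gene2", 0)]
final_loss, steps_used = train(samples, gene_vecs, mirna_vecs)
```

In a fuller implementation the gradient would be propagated back into the parameters of the graph attention model rather than into the feature vectors directly.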

By applying the technical scheme of the embodiment, the pre-constructed gene regulation network, the pre-constructed miRNA regulation network and the regulation relation data between genes and miRNAs stored in the preset database are obtained, and the gene regulation network and the miRNA regulation network are then connected by using the regulation relation data, so that one large knowledge graph, namely the target knowledge graph, is formed. After the target knowledge graph is generated, it is input into a preset learning model to obtain the gene node characteristics and miRNA node characteristics corresponding to the target knowledge graph. A model loss value corresponding to the preset learning model is then calculated by using the gene node characteristics, the miRNA node characteristics and the regulation and control relation data, and the parameters of the preset learning model are adjusted according to the model loss value, so that the preset learning model is trained. When the calculated model loss value meets a preset requirement, training of the preset learning model is complete, and the trained preset learning model is then connected with a preset activation function to obtain the target gene prediction model. Because the target knowledge graph is constructed and its node information is learned by the preset learning model, the similarity between miRNA and gene and the regulation and control relationship between genes can be fully utilized when the regulation and control relationship between miRNA and gene is predicted; compared with the traditional prediction method, the method is more effective and has higher accuracy.

Further, as a refinement and extension of the specific implementation of the foregoing embodiment, in order to fully describe the specific implementation process of the embodiment, another method for generating a target gene prediction model is provided, where the method includes:

step 201, obtaining association relation data between proteins corresponding to genes in a preset gene database, when the association relation data is larger than a preset association threshold, constructing a second characteristic edge between the genes, taking the association relation data as a weight corresponding to the second characteristic edge, and generating the gene regulation network;

In this embodiment, the gene regulation network may be established before the target knowledge graph is constructed. First, association relation data between proteins is obtained from a preset gene database, where the association relation data reflects the degree of similarity between different proteins. When the obtained association relation data is greater than a preset association threshold, the similarity between the two proteins is high; it is then determined that a regulation relationship exists between the genes corresponding to the two proteins, and an edge, which may be called a second characteristic edge, is constructed between those genes. For example, if the association relation data between protein A' corresponding to gene A and protein B' corresponding to gene B in the preset gene database is greater than the preset association threshold, a regulation relationship is by default considered to exist between gene A and gene B, and an edge in the gene regulation network may be formed between gene A and gene B. In addition, the corresponding association relation data may be marked in the gene regulation network as the weight of the second characteristic edge.
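As a non-limiting illustration, the thresholding described above can be sketched as follows. The data layout (a dictionary of protein pairs to association scores and a protein-to-gene mapping), the function name and the numeric values are hypothetical; the embodiment does not prescribe them.

```python
from collections import defaultdict


def build_gene_network(protein_scores, gene_of_protein, threshold):
    """Return an undirected weighted adjacency map {gene: {gene: weight}}.

    A second characteristic edge is added only when the association
    score of the two proteins exceeds the threshold; the score itself
    becomes the edge weight.
    """
    network = defaultdict(dict)
    for (protein_a, protein_b), score in protein_scores.items():
        if score <= threshold:
            continue  # not greater than the preset association threshold
        gene_a = gene_of_protein[protein_a]
        gene_b = gene_of_protein[protein_b]
        network[gene_a][gene_b] = score
        network[gene_b][gene_a] = score
    return dict(network)


# Illustrative values only.
protein_scores = {("A'", "B'"): 0.92, ("A'", "C'"): 0.40}
gene_of_protein = {"A'": "A", "B'": "B", "C'": "C"}
gene_network = build_gene_network(protein_scores, gene_of_protein, threshold=0.7)
```

With these toy values only the pair A and B is connected; gene C receives no edge because its score does not exceed the threshold.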

Step 202, obtaining miRNA sequences corresponding to all miRNAs in a preset miRNA database, calculating a matching score between any two miRNA sequences according to a preset matching condition, constructing a third characteristic edge between the miRNAs when the matching score is larger than a preset matching threshold, and generating the miRNA regulation network by taking the matching score as a weight corresponding to the third characteristic edge;

In this embodiment, the miRNA sequences corresponding to the miRNAs are stored in a preset miRNA database. The miRNA sequences are acquired from the preset miRNA database, and the matching score between any two miRNA sequences is calculated according to a preset matching condition. For example, the preset matching condition may set the seed region to 6, that is, at least 6 consecutive constituent units in the two miRNA sequences must be completely identical, and the corresponding matching score is calculated according to the sequence identity. When the calculated matching score is greater than a preset matching threshold, the similarity between the two miRNA sequences is high, so a third characteristic edge can be constructed between the two miRNAs. Third characteristic edges are constructed between all miRNAs in the preset miRNA database whose matching scores are greater than the preset matching threshold, thereby constructing the whole miRNA regulation network. In addition, the corresponding matching score may serve as the weight of the third characteristic edge.
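As a non-limiting illustration, one way to realize a seed-based matching condition is sketched below. The embodiment does not give the scoring formula; the formula here (length of the longest shared contiguous run divided by the length of the shorter sequence, and zero when that run is shorter than the seed length of 6) is an assumption, and the sequences and names are made up.

```python
def longest_common_run(seq_a, seq_b):
    """Length of the longest contiguous substring shared by both sequences."""
    best = 0
    prev = [0] * (len(seq_b) + 1)
    for ch_a in seq_a:
        curr = [0] * (len(seq_b) + 1)
        for j, ch_b in enumerate(seq_b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1  # extend the run ending at the previous pair
                best = max(best, curr[j])
        prev = curr
    return best


def matching_score(seq_a, seq_b, seed_length=6):
    """Hypothetical score: zero unless a shared run of seed_length exists."""
    run = longest_common_run(seq_a, seq_b)
    if run < seed_length:
        return 0.0
    return run / min(len(seq_a), len(seq_b))


def build_mirna_network(sequences, threshold, seed_length=6):
    """Connect miRNAs whose matching score exceeds the threshold."""
    names = sorted(sequences)
    network = {name: {} for name in names}
    for i, name_a in enumerate(names):
        for name_b in names[i + 1:]:
            score = matching_score(sequences[name_a], sequences[name_b], seed_length)
            if score > threshold:
                network[name_a][name_b] = score  # score is the edge weight
                network[name_b][name_a] = score
    return network


# Toy sequences for illustration, not taken from any database.
sequences = {
    "miR-x": "UGAGGUAGUAGG",
    "miR-y": "UGAGGUAGCCAA",
    "miR-z": "CCCAAAUUUGGG",
}
mirna_network = build_mirna_network(sequences, threshold=0.5)
```

Here `miR-x` and `miR-y` share a run of 8 identical units and are connected, while `miR-z` shares no run of at least 6 units with either and stays isolated.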

Step 203, acquiring the gene regulation and control network, the miRNA regulation and control network and regulation and control relationship data between genes and miRNAs in a preset database;

step 204, determining a target gene corresponding to regulation and control relation data from the gene regulation and control network according to the regulation and control relation data between any gene and miRNA, determining a target miRNA corresponding to the regulation and control relation data from the miRNA regulation and control network, and constructing a first characteristic edge between the target gene and the target miRNA so as to construct a target knowledge graph;

In this embodiment, the already constructed gene regulation network and miRNA regulation network, and the regulation relation data between genes and miRNAs stored in a preset database, are acquired. Then, for any item of regulation relation data, the gene and the miRNA corresponding to that item are determined, the gene is found in the gene regulation network as the target gene, the miRNA is found in the miRNA regulation network as the target miRNA, and a first characteristic edge is constructed between the target gene and the target miRNA. Construction of the target knowledge graph is complete once all first characteristic edges have been constructed. For example, if regulation relation data between gene 1 and miRNA9 is stored in the preset database, miRNA9 has a certain regulation effect on gene 1; the node of gene 1 is found in the gene regulation network, the node of miRNA9 is found in the miRNA regulation network, and an edge, namely a first characteristic edge, is then constructed between the two nodes.
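As a non-limiting illustration, joining the two networks through first characteristic edges can be sketched as follows. The adjacency-map layout, the node tagging with `"gene"` and `"miRNA"`, and the weight of 1.0 on first characteristic edges are assumptions; the embodiment does not specify a weight for these edges.

```python
def build_target_graph(gene_network, mirna_network, regulation_pairs):
    """Merge both networks and add one edge per known (miRNA, gene) pair."""
    graph = {}
    for gene, nbrs in gene_network.items():
        graph[("gene", gene)] = {("gene", n): w for n, w in nbrs.items()}
    for mirna, nbrs in mirna_network.items():
        graph[("miRNA", mirna)] = {("miRNA", n): w for n, w in nbrs.items()}
    for mirna, gene in regulation_pairs:
        gene_node, mirna_node = ("gene", gene), ("miRNA", mirna)
        if gene_node in graph and mirna_node in graph:
            # First characteristic edge; the weight 1.0 is an assumed placeholder.
            graph[gene_node][mirna_node] = 1.0
            graph[mirna_node][gene_node] = 1.0
    return graph


# Illustrative values mirroring the gene 1 / miRNA9 example.
gene_network = {"gene1": {"gene2": 0.9}, "gene2": {"gene1": 0.9}}
mirna_network = {"miRNA9": {}}
regulation_pairs = [("miRNA9", "gene1")]
target_graph = build_target_graph(gene_network, mirna_network, regulation_pairs)
```

Tagging each node with its type keeps gene and miRNA identifiers from colliding and lets later steps select gene node characteristics and miRNA node characteristics separately.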

Step 205, learning the target knowledge graph through a preset learning model to obtain gene node characteristics and miRNA node characteristics corresponding to the target knowledge graph;

In this embodiment, after the target knowledge graph is generated, it is input into a preset learning model, and the node information in the target knowledge graph is learned through the preset learning model, so that the gene node characteristics and the miRNA node characteristics corresponding to the target knowledge graph are obtained.

Step 206, dividing the regulation and control relation data into training sample regulation and control relation data and test sample regulation and control relation data according to a preset distribution proportion;

In this embodiment, the existing regulation relation data may be randomly divided into two parts according to a preset distribution ratio, where one part is the training sample regulation relation data and the other part is the test sample regulation relation data. Specifically, the training sample regulation relation data may account for 60% of the regulation relation data, and the test sample regulation relation data may account for the remaining 40%.
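A random 60/40 division of this kind might look like the sketch below. The function name, the fixed random seed (used only to make the example repeatable), and the toy gene and miRNA names are assumptions.

```python
import random


def split_regulation_data(pairs, train_ratio=0.6, seed=0):
    """Randomly divide regulation relation data into training and test parts."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # local generator, global state untouched
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


all_pairs = [(f"gene{i}", f"miRNA{i}") for i in range(10)]
train_pairs, test_pairs = split_regulation_data(all_pairs)
```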

Step 207, determining the gene node characteristics and the miRNA node characteristics corresponding to the training sample regulation and control relation data according to the genes and miRNAs corresponding to any training sample regulation and control relation data, calculating the inner product of the gene node characteristics and the miRNA node characteristics, and inputting the inner product into a preset activation function to obtain first relation prediction data corresponding to each training sample regulation and control relation data;

In this embodiment, the regulatory relationship data includes training sample regulatory relationship data and test sample regulatory relationship data. According to any training sample regulation and control relation data in the regulation and control relation data, determining a gene and miRNA corresponding to the training sample regulation and control relation data, further determining a gene node characteristic corresponding to the gene and a miRNA node characteristic corresponding to the miRNA from all obtained gene node characteristics and miRNA node characteristics, and then calculating the inner product of the gene node characteristic and the miRNA node characteristic. Here, both the gene node characteristics and the miRNA node characteristics may be represented by means of feature vectors. And then, inputting the calculated inner product into a preset activation function to obtain first relation prediction data corresponding to the regulation and control relation data of each training sample. Here, the preset activation function may be a sigmoid function, through which the probability that there is a regulatory relationship between the gene and miRNA may be obtained.
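The prediction step described above, an inner product of two feature vectors passed through a sigmoid function, can be sketched in a few lines. The function names and the example vectors are assumptions; the sigmoid is written in a form that avoids overflow for large negative inputs.

```python
import math


def sigmoid(x):
    """Numerically safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_regulation(gene_vec, mirna_vec):
    """Probability of a regulation relation: sigmoid(inner product of the vectors)."""
    inner = sum(g * m for g, m in zip(gene_vec, mirna_vec))
    return sigmoid(inner)


# Toy feature vectors, made up for illustration.
p_orthogonal = predict_regulation([1.0, 0.0], [0.0, 1.0])  # inner product 0
p_aligned = predict_regulation([1.0, 1.0], [1.0, 1.0])     # inner product 2
```

An inner product of 0 maps to a probability of 0.5, and larger positive inner products move the predicted probability toward 1.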

Step 208, calculating a model loss value through a preset model loss calculation function based on the first relation prediction data and the corresponding training sample regulation relation data;

In this embodiment, the model prediction deviation may be calculated by a preset model loss calculation function. Specifically, the obtained first relation prediction data and the corresponding training sample regulation relation data are substituted into the preset model loss calculation function, which yields the model loss value of the preset learning model. The model loss value reflects the deviation between the first relation prediction data and the corresponding training sample regulation relation data.
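The text does not name the preset model loss calculation function. As one plausible choice for probability outputs, the sketch below uses binary cross-entropy; this choice, the function name, and the 0/1 label encoding (1 for a known regulation relation, 0 otherwise) are assumptions rather than details taken from the application.

```python
import math


def binary_cross_entropy(predictions, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(predictions, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(predictions)


uninformed_loss = binary_cross_entropy([0.5, 0.5], [1, 0])
confident_loss = binary_cross_entropy([0.9, 0.1], [1, 0])
```

A model that always outputs 0.5 has a loss of ln 2, and the loss shrinks as the predicted probabilities approach the labels.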

Step 209, adjusting model parameters of the preset learning model according to the model loss values, obtaining second relation prediction data corresponding to each training sample regulation relation data through the adjusted preset learning model and the preset activation function, and calculating the model loss values again; when the model loss value is smaller than a preset loss threshold value, connecting the trained preset learning model with the preset activation function to obtain a target gene prediction model;

In this embodiment, the model loss value may be used to adjust the initial model parameters of the preset learning model to obtain an adjusted preset learning model, and the second relation prediction data corresponding to each item of training sample regulation relation data is calculated through the adjusted preset learning model and the same preset activation function. The model loss value is then calculated again based on the second relation prediction data and the training sample regulation relation data. The process of adjusting the model parameters of the preset learning model is repeated until the calculated model loss value is smaller than the preset loss threshold, which indicates that the model loss has reached an acceptable level. At that point, the model parameters of the preset learning model can be used as the final model parameters, and the training of the preset learning model is complete. The trained preset learning model is connected with the preset activation function to obtain the final target gene prediction model. In this embodiment of the application, the model loss value is calculated and used to continuously adjust the model parameters, so that the output of the final target gene prediction model is closer to reality and the accuracy of subsequent prediction is improved.
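The adjust-and-recompute loop can be sketched as below. Because the preset learning model is not specified, the sketch substitutes the simplest possible stand-in: one directly trainable feature vector per node, updated by gradient descent on a binary cross-entropy loss. The stand-in model, the loss, the labelled negative pairs, the hyperparameters, and all names are assumptions for illustration only.

```python
import math
import random


def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict(emb, gene, mirna):
    """Inner product of the two node vectors passed through the sigmoid."""
    return sigmoid(sum(g * m for g, m in zip(emb[gene], emb[mirna])))


def mean_loss(emb, samples):
    """Mean binary cross-entropy over (gene, mirna, label) samples."""
    total = 0.0
    for gene, mirna, label in samples:
        p = min(max(predict(emb, gene, mirna), 1e-12), 1.0 - 1e-12)
        total += -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
    return total / len(samples)


def train_until_threshold(samples, dim=4, lr=0.5, loss_threshold=0.1,
                          max_rounds=5000, seed=0):
    """Adjust parameters and recompute the loss until it drops below the threshold."""
    rng = random.Random(seed)
    nodes = sorted({n for gene, mirna, _ in samples for n in (gene, mirna)})
    emb = {n: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for n in nodes}
    for _ in range(max_rounds):
        if mean_loss(emb, samples) < loss_threshold:
            break  # loss has reached the acceptable level
        for gene, mirna, label in samples:
            g, m = emb[gene], emb[mirna]
            err = predict(emb, gene, mirna) - label  # d(loss)/d(inner product)
            emb[gene] = [gi - lr * err * mi for gi, mi in zip(g, m)]
            emb[mirna] = [mi - lr * err * gi for gi, mi in zip(g, m)]
    return emb, mean_loss(emb, samples)


# Toy training samples: label 1 = known regulation relation, 0 = assumed negative.
toy_samples = [
    ("gene1", "miRNA9", 1), ("gene2", "miRNA9", 0),
    ("gene1", "miRNA8", 0), ("gene2", "miRNA8", 1),
]
trained_emb, final_loss = train_until_threshold(toy_samples)
```

The `max_rounds` cap is a safeguard added for the sketch so that the loop terminates even if the threshold is never reached.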

Step 210, obtaining third relation prediction data through the target gene prediction model based on the gene and miRNA corresponding to any one of the test sample regulation relation data; calculating a predicted deviation value of the target gene prediction model through a preset model loss calculation function based on the third relation prediction data and the test sample regulation relation data; and if the predicted deviation value meets the preset deviation value condition, determining that the target gene prediction model test passes.

In this embodiment, the regulation relation data includes test sample regulation relation data in addition to the training sample regulation relation data. The test sample regulation relation data can be used to check the performance of the target gene prediction model. Specifically, for any one item of test sample regulation relation data, the corresponding gene and miRNA are determined, the third relation prediction data corresponding to the gene and the miRNA is obtained through the target gene prediction model, and the probability that a regulation relation exists between the miRNA and the gene is read from the third relation prediction data. Further, based on the obtained third relation prediction data and the test sample regulation relation data, a prediction deviation value of the target gene prediction model is calculated through the preset model loss calculation function, and it is judged whether the prediction deviation value meets a preset deviation value condition. If so, the target gene prediction model has a good prediction effect and passes the test, and it can then be used to predict whether a regulation relation exists between an miRNA and a gene, that is, whether the gene is a target gene of the miRNA.
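The test described above can be sketched as a small check on held-out data. The preset deviation value condition is not defined in the text, so the sketch assumes it means "deviation below a fixed bound"; the binary cross-entropy loss, the lookup-table predictor standing in for the target gene prediction model, and all names and numbers are likewise assumptions.

```python
import math


def binary_cross_entropy(predictions, labels, eps=1e-12):
    """Mean binary cross-entropy, used here as the assumed loss function."""
    total = 0.0
    for p, y in zip(predictions, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(predictions)


def evaluate_on_test_set(predict_fn, test_samples, max_deviation):
    """Return (prediction deviation value, whether the model passes the test)."""
    predictions = [predict_fn(gene, mirna) for gene, mirna, _ in test_samples]
    labels = [label for _, _, label in test_samples]
    deviation = binary_cross_entropy(predictions, labels)
    return deviation, deviation < max_deviation


# Hypothetical stand-in for the target gene prediction model: fixed probabilities.
toy_probabilities = {("gene3", "miRNA9"): 0.9, ("gene4", "miRNA8"): 0.2}
toy_test_samples = [("gene3", "miRNA9", 1), ("gene4", "miRNA8", 0)]

deviation, passed = evaluate_on_test_set(
    lambda gene, mirna: toy_probabilities[(gene, mirna)],
    toy_test_samples,
    max_deviation=0.3,
)
```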

In an embodiment of the present application, optionally, when the target knowledge graph does not include the gene to be predicted and/or the miRNA to be predicted, after the determining that the target gene prediction model test passes, the method further includes: calculating association relation data between the gene to be predicted and any one gene in the target knowledge graph, and/or calculating a matching score between the miRNA to be predicted and any one miRNA in the target knowledge graph; when the association relation data is greater than a preset association threshold, constructing a second characteristic edge between the gene to be predicted and the gene whose association relation data is greater than the preset association threshold, and/or, when the matching score is greater than a preset matching threshold, constructing a third characteristic edge between the miRNA to be predicted and the miRNA whose matching score is greater than the preset matching threshold, so as to update the target knowledge graph; learning the updated target knowledge graph through the preset learning model to obtain gene node characteristics and miRNA node characteristics corresponding to the updated target knowledge graph; training the preset learning model according to the gene node characteristics, the miRNA node characteristics and the regulation and control relation data to obtain a trained preset learning model, and connecting the trained preset learning model with a preset activation function to obtain an updated target gene prediction model.

In this embodiment, when both the miRNA and the gene to be predicted are already present in the target knowledge graph, the relation prediction data between the corresponding miRNA and gene, specifically the probability that a regulation relation exists between them, can be obtained through the target gene prediction model. However, the miRNA or gene to be predicted may not be present in the already constructed target knowledge graph, in which case the target knowledge graph may be updated. For an miRNA to be predicted that does not exist in the target knowledge graph, the matching scores between it and each existing miRNA in the target knowledge graph can be calculated, the miRNAs whose matching scores are larger than the preset matching threshold are found in the target knowledge graph, and a third characteristic edge is constructed between the miRNA to be predicted and each of these miRNAs. For a gene to be predicted that does not exist in the target knowledge graph, the association relation data between it and each existing gene in the target knowledge graph can be calculated, the genes whose association relation data is larger than the preset association threshold are found in the target knowledge graph, and a second characteristic edge is constructed between the gene to be predicted and each of these genes, so that the target knowledge graph is updated. The updated target knowledge graph is then learned through the preset learning model to obtain the gene node characteristics and miRNA node characteristics corresponding to the updated target knowledge graph, where the gene node characteristics may also include those of the gene to be predicted, and the miRNA node characteristics may also include those of the miRNA to be predicted.
Then, a model loss value corresponding to the preset learning model is calculated by using the gene node characteristics, the miRNA node characteristics and the regulation relation data, and the initial model parameters of the preset learning model are adjusted according to the model loss value so as to train the preset learning model. When the calculated model loss value meets the preset requirement, the training of the preset learning model is complete, and the trained preset learning model is connected with the preset activation function to obtain an updated target gene prediction model.
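The graph update for a previously unseen miRNA can be sketched as follows; a gene to be predicted would be handled the same way with association relation data in place of the matching score. The function name, the data layout, and the toy position-wise identity score passed in as `score_fn` are assumptions, not the scoring rule of the application.

```python
def add_mirna_to_graph(mirna_sequences, mirna_edges, new_name, new_seq,
                       score_fn, match_threshold):
    """Attach an unseen miRNA to existing miRNAs that score above the threshold."""
    for name, seq in mirna_sequences.items():
        score = score_fn(new_seq, seq)
        if score > match_threshold:
            mirna_edges[(new_name, name)] = score  # new third characteristic edge
    mirna_sequences[new_name] = new_seq  # registered only after the comparisons
    return mirna_edges


def positional_identity(seq_a, seq_b):
    """Toy score: fraction of positions holding the same unit."""
    same = sum(a == b for a, b in zip(seq_a, seq_b))
    return same / max(len(seq_a), len(seq_b))


# Toy data, made up for illustration.
known_sequences = {"miRNA8": "AAAAAAAA", "miRNA9": "UGAGGUAG"}
updated_edges = add_mirna_to_graph(
    known_sequences, {}, "miRNA10", "UGAGGUAC",
    score_fn=positional_identity, match_threshold=0.8,
)
```

After such an update, the node characteristics would be learned again on the enlarged graph and the model retrained, as described above.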

Further, as a specific implementation of the method of fig. 1, an embodiment of the present application provides a device for generating a target gene prediction model, as shown in fig. 2, where the device includes:

Optionally, the network generation module is specifically configured to:

Optionally, the apparatus further comprises:

It should be noted that, other corresponding descriptions of each functional unit related to the generating device of the target gene prediction model provided by the embodiment of the present application may refer to corresponding descriptions in the method of fig. 1, and are not described herein again.

Based on the method shown in fig. 1, correspondingly, the embodiment of the application also provides a storage medium, on which a computer program is stored, which when executed by a processor, implements the method for generating the target gene prediction model shown in fig. 1.

Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.), and includes several instructions for causing a computer device (may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective implementation scenario of the present application.

Based on the method shown in fig. 1 and the virtual device embodiment shown in fig. 2, in order to achieve the above objective, the embodiment of the present application further provides a computer device, which may specifically be a personal computer, a server, a network device, etc., where the computer device includes a storage medium and a processor; a storage medium storing a computer program; a processor for executing a computer program to implement the above-described method for generating a target gene prediction model as shown in fig. 1.

Optionally, the computer device may also include a user interface, a network interface, a camera, Radio Frequency (RF) circuitry, sensors, audio circuitry, a Wi-Fi module, and the like. The user interface may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface, a wireless interface (e.g., a Bluetooth interface or a Wi-Fi interface), and the like.

It will be appreciated by those skilled in the art that the structure of the computer device provided in the present embodiment does not constitute a limitation on the computer device, which may include more or fewer components, combine certain components, or arrange the components differently.

The storage medium may also include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the computer device and supports the execution of information handling programs and other software and/or programs. The network communication module is used for realizing communication among the components in the storage medium and communication with other hardware and software in the entity equipment.

From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by means of software plus a necessary general hardware platform, or may be implemented by hardware. A pre-constructed gene regulation network, an miRNA regulation network, and regulation relation data between genes and miRNAs stored in a preset database are acquired, and the gene regulation network and the miRNA regulation network are connected together by using the regulation relation data to form one large knowledge graph, namely the target knowledge graph. After the target knowledge graph is generated, it is input into a preset learning model to obtain the gene node characteristics and miRNA node characteristics corresponding to the target knowledge graph. A model loss value corresponding to the preset learning model is then calculated by using the gene node characteristics, the miRNA node characteristics and the regulation relation data, and the parameters of the preset learning model are adjusted according to the model loss value, so that the preset learning model is trained. When the calculated model loss value meets a preset requirement, the training of the preset learning model is complete, and the trained preset learning model is connected with a preset activation function to obtain the target gene prediction model. By constructing the target knowledge graph and learning the node information in it with the preset learning model, the method can make full use of the similarity between miRNA and gene and the regulation relation between genes when predicting the regulation relation between an miRNA and a gene; compared with the traditional prediction method, the method is more effective and has higher accuracy.

Those skilled in the art will appreciate that the drawing is merely a schematic illustration of a preferred implementation scenario and that the modules or flows in the drawing are not necessarily required to practice the application. Those skilled in the art will also appreciate that the modules of an apparatus in an implementation scenario may be distributed in the apparatus of that implementation scenario as described, or may be located, with corresponding changes, in one or more apparatuses different from that of the implementation scenario. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.

The above sequence numbers of the application are merely for description and do not represent the advantages or disadvantages of the implementation scenarios. The foregoing disclosure is merely illustrative of some embodiments of the application, and the application is not limited thereto, as modifications may be made by those skilled in the art without departing from the scope of the application.

Claims

1. A method for generating a target gene prediction model, comprising:

acquiring association relation data among proteins corresponding to genes in a preset gene database, constructing second characteristic edges among the genes when the association relation data is larger than a preset association threshold, and generating a gene regulation network by taking the association relation data as weights corresponding to the second characteristic edges;

acquiring miRNA sequences corresponding to all miRNAs in a preset miRNA database, calculating a matching score between any two miRNA sequences according to a preset matching condition, constructing a third characteristic edge between the miRNAs when the matching score is larger than a preset matching threshold, and generating a miRNA regulation network by taking the matching score as a weight corresponding to the third characteristic edge;

acquiring a gene regulation network, a miRNA regulation network and regulation relation data between genes and miRNAs in a preset database, wherein the regulation relation data comprises training sample regulation relation data and test sample regulation relation data;

determining a target gene corresponding to regulation and control relation data from the gene regulation and control network according to the regulation and control relation data between any gene and miRNA, determining a target miRNA corresponding to the regulation and control relation data from the miRNA regulation and control network, and constructing a first characteristic edge between the target gene and the target miRNA so as to construct a target knowledge graph;

2. The method of claim 1, wherein prior to determining the gene node characteristics and the miRNA node characteristics corresponding to the training sample regulation relation data according to the genes and miRNAs corresponding to any training sample regulation relation data, the method further comprises:

3. The method of claim 1, wherein after the obtaining the target gene prediction model, the method further comprises:

4. The method of claim 3, wherein when the target knowledge graph does not include genes to be predicted and/or miRNAs to be predicted, the method further comprises, after the determining that the target gene prediction model test passes:

5. A target gene prediction model generation device, comprising:

a network generation module for: acquiring association relation data among proteins corresponding to genes in a preset gene database, constructing second characteristic edges among the genes when the association relation data is larger than a preset association threshold, and generating a gene regulation network by taking the association relation data as weights corresponding to the second characteristic edges;

the model determining module is used for determining the gene node characteristics and the miRNA node characteristics corresponding to the training sample regulation and control relation data according to the genes and miRNAs corresponding to any training sample regulation and control relation data, calculating the inner product of the gene node characteristics and the miRNA node characteristics, and inputting the inner product into a preset activation function to obtain first relation prediction data corresponding to each training sample regulation and control relation data;

6. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any of claims 1 to 4.

7. A computer device comprising a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 4 when executing the computer program.