CN113299339A

CN113299339A - Method, device, equipment and storage medium for predicting curative effect of medicine based on deep learning

Info

Publication number: CN113299339A
Application number: CN202110592915.6A
Authority: CN
Inventors: 王俊
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-05-28
Filing date: 2021-05-28
Publication date: 2021-08-24
Anticipated expiration: 2041-05-28
Also published as: CN113299339B

Abstract

The invention provides a medicine curative effect prediction method, a device, equipment and a storage medium based on deep learning, wherein the method comprises the following steps: obtaining a first protein sequence corresponding to a drug; segmenting the first protein sequence to obtain a plurality of first subsequences; analyzing each first subsequence to obtain each first character expression; calculating the matching degree of each first character expression and each second character expression; determining an action target point of the drug matched with the target protein based on the matching degree; and predicting the drug efficacy of the drug on the targeted protein based on each of the action targets. The invention has the beneficial effects that: the automatic detection of the action target of the medicine is realized, the curative effect of the medicine can be predicted, and the experimental resources are saved.

Description

Method, device, equipment and storage medium for predicting curative effect of medicine based on deep learning

Technical Field

The invention relates to the field of digital medical treatment, in particular to a medicine curative effect prediction method, a medicine curative effect prediction device, medicine curative effect prediction equipment and a storage medium based on deep learning.

Background

Drug discovery is the process of identifying new candidate compounds with potential therapeutic effects, and predicting drug-target interactions (DTIs) between drug molecules and target proteins is an essential step in that process. The therapeutic effect of a drug molecule depends on its affinity for the target protein or receptor. A drug molecule that has no interaction with or affinity for the target protein will not provide a therapeutic response. At present, drug-target interactions are determined experimentally only by hand, which is time-consuming and resource-intensive.

Disclosure of Invention

The main purpose of the invention is to provide a deep learning-based drug efficacy prediction method, device, equipment and storage medium, in order to solve the problem that drug-target interactions (DTIs) are determined experimentally only by hand, which consumes time and resources.

The invention provides a medicine curative effect prediction method based on deep learning, which is applied to targeted protein and comprises the following steps:

obtaining a first protein sequence corresponding to a drug;

segmenting the first protein sequence to obtain a plurality of first subsequences corresponding to the first protein sequence, wherein the number of amino acid molecules of each first subsequence is the same;

analyzing each first subsequence to obtain a first character expression corresponding to each first subsequence;

inputting each first character expression and each second character expression corresponding to the target protein into a word2vec model trained in advance to obtain the matching degree of each first character expression and each second character expression; the second character expression is obtained by segmenting a second protein sequence of the target protein to obtain a plurality of second subsequences corresponding to the second protein sequence, and analyzing the second protein to obtain second character expression corresponding to each second subsequence; the number of amino acid molecules of each second subsequence is the same, and the second subsequence corresponding to each second character expression is a target point;

determining an action target point of the drug matched with the target protein based on the matching degree;

and predicting the drug efficacy of the drug on the targeted protein based on each of the action targets.

The invention also provides a medicine curative effect prediction device based on deep learning, which is applied to targeted protein and comprises the following components:

the acquisition module is used for acquiring a first protein sequence corresponding to the medicine;

the segmentation module is used for segmenting the first protein sequence to obtain a plurality of first subsequences corresponding to the first protein sequence, wherein the number of amino acid molecules of each first subsequence is the same;

the analysis module is used for analyzing each first subsequence to obtain a first character expression corresponding to each first subsequence;

the input module is used for inputting each first character expression and each second character expression corresponding to the target protein into a word2vec model trained in advance to obtain the matching degree of each first character expression and each second character expression; the second character expression is obtained by segmenting a second protein sequence of the target protein to obtain a plurality of second subsequences corresponding to the second protein sequence, and analyzing the second protein to obtain second character expression corresponding to each second subsequence; the number of amino acid molecules of each second subsequence is the same, and the second subsequence corresponding to each second character expression is a target point;

a determination module for determining an action target point of the drug matched with the target protein based on the matching degree;

a prediction module for predicting the drug efficacy of the drug on the targeted protein based on each of the target sites of action.

The invention also provides a computer device comprising a memory storing a computer program and a processor implementing the steps of any of the above methods when the processor executes the computer program.

The invention also provides a computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method of any of the above.

The invention has the beneficial effects that: the amino acid composition structure of the drug is obtained and split into a plurality of first subsequences, the first subsequences are converted into corresponding first character expressions for matching calculation, and the drug curative effect of the drug on the target protein is predicted according to the matching condition. Therefore, the automatic detection of the action target of the medicine is realized, the curative effect of the medicine can be predicted, and the experimental resources are saved.

Drawings

FIG. 1 is a flowchart illustrating a method for predicting drug efficacy based on deep learning according to an embodiment of the present invention;

FIG. 2 is a block diagram schematically illustrating a deep learning-based drug efficacy prediction apparatus according to an embodiment of the present invention;

FIG. 3 is a block diagram illustrating a structure of a computer device according to an embodiment of the present application.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that all directional indicators (such as up, down, left, right, front, back, etc.) in the embodiments of the present invention are only used to explain the relative position relationship between the components, the motion situation, etc. in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indicator is changed accordingly, and the connection may be a direct connection or an indirect connection.

The term "and/or" herein merely describes an association between associated objects and means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone.

In addition, the descriptions related to "first", "second", etc. in the present invention are only for descriptive purposes and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.

Referring to fig. 1, the present invention provides a method for predicting a therapeutic effect of a drug based on deep learning, which is applied to a target protein, and includes:

S1: obtaining a first protein sequence corresponding to a drug;

S2: segmenting the first protein sequence to obtain a plurality of first subsequences corresponding to the first protein sequence, wherein the number of amino acid molecules of each first subsequence is the same;

S3: analyzing each first subsequence to obtain a first character expression corresponding to each first subsequence;

S4: inputting each first character expression and each second character expression corresponding to the target protein into a word2vec model trained in advance to obtain the matching degree of each first character expression and each second character expression; the second character expression is obtained by segmenting a second protein sequence of the target protein to obtain a plurality of second subsequences corresponding to the second protein sequence, and analyzing the second protein to obtain second character expression corresponding to each second subsequence; the number of amino acid molecules of each second subsequence is the same, and the second subsequence corresponding to each second character expression is a target point;

S5: determining an action target point of the drug matched with the target protein based on the matching degree;

S6: predicting the drug efficacy of the drug on the targeted protein based on each of the action targets.

As described in step S1 above, the first protein sequence corresponding to the drug is obtained. If the drug is an existing drug, its sequence may be obtained from the Internet or from a drug database. The first protein sequence comprises at least the amino acid composition structure, that is, the order of the amino acids. The first protein sequence may also include the spatial structure of the molecule, that is, the spatial arrangement of the amino acids, which makes it easier to check later whether the spatial structure can bind to the target protein.

As described in step S2 above, the first protein sequence is segmented to obtain a plurality of first subsequences corresponding to the first protein sequence. The sequence may be segmented by amino acid count, generally in groups of 3 amino acids. If the last group contains fewer than 3 amino acids, 0 may be added as a placeholder amino acid so that every first subsequence contains 3 amino acids. Alternatively, a group of 3 amino acid molecules may be taken as a window whose starting position advances by 1 each time. For example, for 5 sequentially arranged amino acid molecules ABCDE, segmentation yields the three first subsequences ABC, BCD and CDE. With this approach, the problem of the last group having too few amino acids does not arise.
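The two segmentation schemes of step S2 can be sketched as follows. This is a minimal illustration, not the patented implementation; the function names `segment_fixed` and `segment_sliding` are hypothetical, and one-letter codes stand in for amino acid molecules.

```python
def segment_fixed(sequence: str, k: int = 3, pad: str = "0") -> list[str]:
    """Split into non-overlapping groups of k residues, padding the last group with `pad`."""
    groups = [sequence[i:i + k] for i in range(0, len(sequence), k)]
    if groups and len(groups[-1]) < k:
        groups[-1] = groups[-1].ljust(k, pad)
    return groups


def segment_sliding(sequence: str, k: int = 3) -> list[str]:
    """Slide a window of k residues along the sequence, advancing one position at a time."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


print(segment_fixed("ABCDE"))    # ['ABC', 'DE0']
print(segment_sliding("ABCDE"))  # ['ABC', 'BCD', 'CDE']
```

The sliding-window variant never needs padding, at the cost of producing overlapping subsequences.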

As described in step S3 above, each first subsequence is analyzed to obtain a first character expression corresponding to each first subsequence. The analysis may be carried out by a model trained in advance on first subsequences and their corresponding words, which yields the character expression of each first subsequence. One such model is the N-gram model, a language model (LM), that is, a probability-based discriminative model: after any first subsequence is input, a first character expression reflecting the learned probabilities is obtained. For example, suppose the middle position of A( )C can take three forms, ABC, AQC and AXC, and the N-gram model learns probabilities of 80%, 10% and 10% for them, respectively. The first character expression of ABC then includes the information ABC:AQC:AXC = 8:1:1, and the first character expression of AQC includes the information AQC:ABC:AXC = 1:8:1, so that each first subsequence has a unique character expression.
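The A( )C example can be reproduced with a small counting sketch. It assumes the probabilities are simple relative frequencies over observed 3-residue subsequences; the function name `middle_distribution` and the toy corpus are hypothetical and chosen only to match the 80% / 10% / 10% figures in the text.

```python
from collections import Counter, defaultdict


def middle_distribution(trigrams):
    """Estimate p(middle | first, last) from observed 3-residue subsequences."""
    counts = defaultdict(Counter)
    for first, middle, last in trigrams:
        counts[(first, last)][middle] += 1
    return {
        flank: {m: n / sum(c.values()) for m, n in c.items()}
        for flank, c in counts.items()
    }


# Hypothetical corpus: 8 occurrences of ABC, 1 of AQC, 1 of AXC.
corpus = ["ABC"] * 8 + ["AQC"] + ["AXC"]
dist = middle_distribution(corpus)
print(dist[("A", "C")])  # {'B': 0.8, 'Q': 0.1, 'X': 0.1}
```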

As described in steps S4-S5 above, each first character expression and each second character expression corresponding to the target protein are input into a word2vec model trained in advance to obtain the matching degree between each first character expression and each second character expression. The word2vec model is an unsupervised model with two training architectures, Skip-Gram and Continuous Bag-of-Words (CBOW). Skip-Gram predicts the context from a given word, while CBOW predicts a word from its context. Using Skip-Gram and CBOW, word2vec maps words to low-dimensional real-valued vectors. With this mechanism, the action targets at which the first character expressions match the second character expressions can be obtained. Here, matching between a first character expression and a second character expression means matching between words, that is, whether the first subsequence corresponding to the first character expression can bind to the second subsequence corresponding to the second character expression. This shows whether each first subsequence can bind to the target protein and at which action targets, and the drug efficacy can then be judged from those action targets. The second character expressions are obtained in the same way as the first character expressions, and the second protein sequence of the target protein is segmented in the same way as the first protein sequence, so the description is not repeated. UniProt is a database that collects protein resources and links to other resources; it currently holds the most extensive catalogue of protein sequences with the most comprehensive functional annotation. The data in UniProt can be used as training data for the word2vec model.
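The text does not specify how the matching degree is computed from the word2vec output. The sketch below assumes, purely for illustration, that it is the cosine similarity between embedding vectors and that a pair counts as an action target when the similarity reaches a threshold. The function names, the threshold of 0.9 and the two-dimensional embeddings are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_action_targets(drug_vectors, target_vectors, threshold=0.9):
    """Return (drug_subsequence, target_subsequence, score) for pairs scoring at least `threshold`."""
    matches = []
    for d_name, d_vec in drug_vectors.items():
        for t_name, t_vec in target_vectors.items():
            score = cosine_similarity(d_vec, t_vec)
            if score >= threshold:
                matches.append((d_name, t_name, score))
    return matches


# Hypothetical embeddings standing in for word2vec output.
drug_vectors = {"ABC": [1.0, 0.0], "BCD": [0.0, 1.0]}
target_vectors = {"XYZ": [2.0, 0.0], "YZW": [1.0, 1.0]}
matches = find_action_targets(drug_vectors, target_vectors)
print([(d, t) for d, t, _ in matches])  # [('ABC', 'XYZ')]
```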

As described in step S6 above, the drug efficacy of the drug on the target protein is predicted based on each of the action targets. The prediction may be based on the number of binding sites between the drug and the target protein, or on whether the drug binds the target (a second subsequence of the target protein) where a pathogenic factor of the target protein is located, in which case that target is given a higher weight when calculating the drug efficacy. The calculation is described in detail below.

In one embodiment, the step S6 of predicting the therapeutic effect of the drug on the target protein based on each of the target sites of action includes:

S601: calculating action scores of the action targets according to preset weights of the targets of the targeted protein, and summing the obtained action scores to obtain corresponding curative effect scores of the medicines;

S602: acquiring the medicament curative effect of the curative effect score according to the preset corresponding relation between the medicament curative effect and the medicament curative effect score.

As described in step S601 above, the action target matched by each first character expression is obtained by analyzing the matching results; these action targets are the binding sites of the drug and the target protein. If an action target is a main pathogenic site of the target protein, the drug can be considered to have a certain efficacy, and that target should be given a higher weight. The action scores of the action targets are calculated according to the preset weight of each target of the target protein, and the action scores are summed to obtain the efficacy score of the drug. Because the target protein and the drug may bind at several action targets, the weight of each second subsequence of the target protein needs to be recorded in advance when the target protein is analyzed, with higher weights given to pathogenic sites, so that the weight of each bound target can be looked up directly and the efficacy score calculated. The weights may be set by researchers after they have studied the pathogenic strength of each target on the target protein, with each weight reflecting the pathogenic strength of its target.

As described in step S602, the drug efficacy of the efficacy score is obtained according to the preset correspondence between the drug efficacy and the drug efficacy score. The method can obtain the therapeutic effect scores of the drugs corresponding to the target points according to the weighted values of the action target points, sum the therapeutic effect scores to obtain the therapeutic effect scores of the drugs, and can set the relationship between the therapeutic effect scores of the drugs and the therapeutic effect of the drugs in advance so as to obtain the therapeutic effect of the drugs.
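Steps S601 and S602 can be sketched as a weighted sum followed by a table lookup. This sketch assumes that the action score of a target equals its preset weight; the target names, weights and score bands are hypothetical values chosen for illustration.

```python
def efficacy_score(action_targets, target_weights):
    """S601: sum the preset weights of the targets that the drug binds."""
    return sum(target_weights[t] for t in action_targets)


def efficacy_label(score, bands):
    """S602: map a score to an efficacy label.

    `bands` is a list of (minimum_score, label) pairs sorted by descending minimum.
    """
    for minimum, label in bands:
        if score >= minimum:
            return label
    return bands[-1][1]


# Hypothetical weights: T1 is a main pathogenic site and carries the highest weight.
target_weights = {"T1": 5, "T2": 1, "T3": 2}
bands = [(6, "high"), (3, "moderate"), (0, "low")]

score = efficacy_score(["T1", "T3"], target_weights)
print(score, efficacy_label(score, bands))  # 7 high
```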

In one embodiment, the step S3 of analyzing each of the first subsequences to obtain a first word expression corresponding to each of the first subsequences comprises:

S301: inputting each first subsequence into a Skip-Gram model for processing to obtain a real-valued vector corresponding to each first subsequence, wherein the dimensions of the real-valued vectors are the same;

S302: for each real-valued vector, acquiring the real-valued vectors corresponding to a preset number of its context words as target vectors;

S303: updating each real-valued vector by a stochastic gradient ascent method to obtain the first character expression corresponding to each real-valued vector.

As described in step S301, each of the first subsequences is input to a Skip-Gram model and processed to obtain real value vectors corresponding to each of the first subsequences, where the real value vectors have the same dimension and are assumed to be V-dimensional vectors.

As described in step S302 above, for each real-valued vector, the real-valued vectors corresponding to a preset number of its context words are acquired as target vectors. The numbers extracted from the preceding context and the following context are preferably the same. If there are not enough real-valued vectors in the preceding or following context, the shortfall may be made up from the other side, as long as the same number of context vectors is extracted for every real-valued vector. This number is assumed to be 2c.
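One way to pick 2c context positions while borrowing from the other side near the ends of the sequence is sketched below. The function name `context_indices` is hypothetical, and the sketch assumes the sequence holds at least 2c + 1 items.

```python
def context_indices(position, length, c):
    """Return the indices of 2c context items around `position`.

    Near either end of the sequence, the shortfall is made up from the other side.
    """
    if length < 2 * c + 1:
        raise ValueError("sequence too short for the requested window")
    start = position - c
    end = position + c
    if start < 0:               # too few preceding items: take more following ones
        end -= start
        start = 0
    elif end > length - 1:      # too few following items: take more preceding ones
        start -= end - (length - 1)
        end = length - 1
    return [i for i in range(start, end + 1) if i != position]


print(context_indices(2, 5, 1))  # [1, 3]
print(context_indices(0, 5, 1))  # [1, 2]
print(context_indices(4, 5, 1))  # [2, 3]
```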

As described in step S303 above, each real-valued vector is updated by a stochastic gradient ascent method to obtain the first character expression corresponding to each real-valued vector. In this method, the 2c extracted vectors are each multiplied by a weight matrix W (a V × N matrix), and the results are summed and averaged to obtain a hidden layer vector (1 × N), where N is a preset dimension. The hidden layer vector is multiplied by an output weight matrix W' (an N × V matrix) to obtain a vector (1 × V), which is processed by an activation function (softmax) to obtain a V-dimensional probability distribution; the word indicated by the index with the highest probability is the predicted middle word w. The model is iterated continuously to maximize the log-likelihood function

$$\mathcal{L} = \sum_{w} \log p(w \mid \mathrm{Context}(w))$$

finally yielding the first character expression corresponding to each real-valued vector; the square root of each real-valued vector may also be taken to obtain a more accurate word vector. In the log-likelihood function, w denotes the selected actual real-valued vector, Context(w) denotes the C = 2c context words of the selected actual real-valued vector, and p(w | Context(w)) denotes the probability that the context words match the selected actual real-valued vector.
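The forward pass described in step S303 (average the context rows of W, multiply by W', apply softmax) can be sketched in plain Python. The matrices below are tiny hypothetical values with V = 4 and N = 2; the gradient update itself is omitted, and the sketch only shows how one term of the log-likelihood is evaluated.

```python
import math


def cbow_forward(context_ids, W, W_out):
    """Return the softmax distribution over V words for the given context word ids."""
    n = len(W[0])
    # Multiplying a one-hot vector by W selects a row of W; average the selected rows.
    hidden = [sum(W[i][j] for i in context_ids) / len(context_ids) for j in range(n)]
    v = len(W_out[0])
    scores = [sum(hidden[j] * W_out[j][k] for j in range(n)) for k in range(v)]
    peak = max(scores)                       # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights: W is V x N, W_out is N x V.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
W_out = [[0.0, 0.0, 2.0, 0.0], [0.0, 0.0, 2.0, 0.0]]

probs = cbow_forward([0, 1], W, W_out)
predicted = probs.index(max(probs))          # index of the predicted middle word
log_likelihood_term = math.log(probs[predicted])
print(predicted)  # 2
```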

In an embodiment, before the step S4 of inputting each first text expression and each second text expression corresponding to the target protein into a word2vec model trained in advance to obtain the matching degree between each first text expression and each second text expression, the method further includes:

S311: based on the target category to which the target protein belongs, acquiring initial parameters corresponding to the target category from a parameter database; and

acquiring training data of the corresponding category based on the target category;

S312: inputting the initial parameters into a word2vec initial model, and inputting the training data for training to obtain the pre-trained word2vec model.

The training of the word2vec initial model is realized, the corresponding category initial parameters are obtained firstly, then further training is carried out based on training data, the training time of the word2vec model can be reduced, and the training speed is accelerated.

As described in step S311 above, the correspondence between initial parameters and target categories may be stored in advance. Because the training data keeps growing and the latest training data should be used for training, the initial parameters only need to come from a preliminary training for the target category: a preset number of training data items in the target category may be selected arbitrarily in advance for training, and the resulting parameters are used as the initial parameters of that target category.

And acquiring training data of the corresponding category based on the target category, namely acquiring training data of the corresponding category from a corresponding database, wherein the data can be acquired from UniProt.

As described in step S312, the initial parameters are input into the word2vec initial model, and the training data is input for training, so as to obtain the pre-trained word2vec model. I.e. retraining the model and further optimizing the initial parameters in the model.

In an embodiment, the step S312 of inputting the initial parameters into the word2vec initial model and inputting the training data for training to obtain the pre-trained word2vec model includes:

S3121: splitting the training data into a plurality of training sets;

S3122: inputting each training set and the initial parameters into different word2vec initial models for training, and obtaining the respective training intermediate parameters of each word2vec initial model after training is completed;

S3123: calculating a loss value of each word2vec initial model by using a gradient descent method, and optimizing the corresponding word2vec initial model based on the loss value to obtain an optimization parameter corresponding to each word2vec initial model;

S3124: inputting each optimized parameter into a meta-optimization formula for calculation to obtain a target parameter;

S3125: inputting the target parameters into the word2vec initial model to obtain the pre-trained word2vec model.

The word2vec model is obtained according to the training data.

As described in step S3121 above, the training data is split into a plurality of training sets. The splitting mode may be uniform splitting or non-uniform splitting, and it should be noted that enough training data in the split training set is ensured to avoid large errors.

As described in step S3122 above, the training sets and the initial parameters are input into different word2vec initial models for training, and after training is completed, intermediate parameters for respective training of each word2vec initial model are obtained. And respectively inputting each training set into different word2vec initial models to obtain respectively trained intermediate parameters so as to facilitate further calculation.

As described in step S3123 above, a loss value of each word2vec initial model is calculated by using a gradient descent method, and each word2vec initial model is optimized based on its loss value to obtain the optimization parameter corresponding to that word2vec initial model. The training formula is

$$\theta'_i = \theta - \alpha \nabla_{\theta} L_{T_i}(f(\theta))$$

where $\theta'_i$ is the optimized parameter for task $T_i$, $\theta$ is the initial parameter, $\alpha$ is a hyperparameter, and $\nabla_{\theta} L_{T_i}(f(\theta))$ is the gradient of task $T_i$, which denotes the task of the i-th training set in the i-th word2vec initial model.

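As described in step S3124 above, the meta-optimization formula is

$$\theta \leftarrow \theta - \beta \sum_{i} \nabla_{\theta'_i} L_{T_i}(f(\theta'_i))$$

where $\theta$ is the initial parameter, $\beta$ is a hyperparameter, $\nabla_{\theta'_i} L_{T_i}(f(\theta'_i))$ is the gradient of each new task $T_i$ with respect to the parameter $\theta'_i$, and $f(\theta'_i)$ corresponds to the optimized parameter obtained by the i-th model. The updated $\theta$ is the target parameter.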

As described in step S3125 above, the obtained target parameters are input into the word2vec initial model, so as to obtain the pre-trained word2vec model.
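The per-task update of step S3123 and the meta-optimization of step S3124 can be illustrated on a toy problem. The sketch below replaces each word2vec model with a hypothetical quadratic loss, 0.5 * sum((theta - center)^2), one per training set, so that the gradients can be written by hand; the function names and all numeric values are assumptions for illustration only.

```python
def grad(theta, center):
    """Gradient of the toy task loss 0.5 * sum((theta - center)^2)."""
    return [t - c for t, c in zip(theta, center)]


def inner_update(theta, center, alpha):
    """S3123: theta'_i = theta - alpha * (gradient of task i at theta)."""
    return [t - alpha * g for t, g in zip(theta, grad(theta, center))]


def meta_update(theta, centers, alpha, beta):
    """S3124: theta <- theta - beta * sum_i (gradient of task i at theta'_i)."""
    total = [0.0] * len(theta)
    for center in centers:
        adapted = inner_update(theta, center, alpha)
        for j, g in enumerate(grad(adapted, center)):
            total[j] += g
    return [t - beta * s for t, s in zip(theta, total)]


# Two hypothetical tasks whose optima are 1.0 and 3.0.
centers = [[1.0], [3.0]]
theta = [0.0]
first_step = meta_update(theta, centers, alpha=0.5, beta=0.25)
print(first_step)  # [0.5]

for _ in range(200):
    theta = meta_update(theta, centers, alpha=0.5, beta=0.25)
# theta approaches 2.0, the point that suits both tasks equally well.
```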

In one embodiment, before step S2 of segmenting the first protein sequence to obtain a plurality of first subsequences, the method further comprises:

S201: obtaining a drug three-dimensional structure of the drug and a target three-dimensional structure of the target protein based on SWISS-MODEL;

S202: inputting the three-dimensional structure of the drug and the three-dimensional structure of the target into a preset protein structure matching model to obtain the matching degree of the three-dimensional structure of the drug and the three-dimensional structure of the target, wherein the protein structure matching model is a convolutional neural network model;

S203: judging whether the drug can act on the target protein according to the matching degree;

S204: if so, performing the step of segmenting the first protein sequence to obtain a plurality of first subsequences.

The three-dimensional shape detection of the medicine is realized.

As described in step S201 above, the three-dimensional structure of the drug and the three-dimensional structure of the target protein are obtained based on SWISS-MODEL. SWISS-MODEL is a tool currently used for predicting protein structure; it takes the amino acid sequence of a protein and yields the corresponding protein structure.

As described in step S202 above, the three-dimensional structure of the drug and the three-dimensional structure of the target are input into a preset protein structure matching model to obtain the matching degree between the two structures. The matching may be performed by a convolutional neural network, which is trained with the predicted binding of different protein segments as input and the actual binding of the corresponding protein segments as output, yielding the protein structure matching model.

As described in the above steps S203-S204, it is determined whether the drug can act on the target protein according to the matching degree, and if the corresponding drug cannot be bound to the spatial structure of the target protein, even if the site of the target protein can be bound by the drug, the drug cannot be considered to have a therapeutic effect on the target protein, so that the therapeutic effect of the drug can be continuously detected only when the spatial structure of the drug can be bound.

In one embodiment, after the step S6 of predicting the drug efficacy of the drug on the target protein based on each of the target action points, the method further comprises:

S701: acquiring the actual drug curative effect of the drug, and calculating the similarity between the actual drug curative effect and the drug curative effect based on a similarity calculation formula;

S702: judging whether the similarity is smaller than a preset similarity or not;

S703: if the similarity is smaller than the preset similarity, calculating a curative effect loss value of the curative effect of the medicine;

S704: inputting the curative effect loss value into the word2vec model for retraining.

Retraining the word2vec model is achieved, self-learning is achieved, and subsequent model identification is more accurate.

As described in the above step S701, the actual drug efficacy of the drug is obtained, and the similarity with the drug efficacy is calculated based on the similarity calculation formula. The similarity calculation formula is any one of calculation formulas in the prior art, and is not described herein again.

As described in steps S702 to S703 above, the preset similarity is a threshold set in advance. If the similarity is greater than the preset similarity, the prediction of the word2vec model is sufficiently accurate and no further training is needed; if the similarity is less than the preset similarity, the efficacy loss value of the drug efficacy is calculated for use in subsequent retraining.

As described in step S704 above, the efficacy loss value is input into the word2vec model for retraining. The efficacy loss value and the actual drug efficacy value are input into the word2vec model; the efficacy loss value serves as a reference for the magnitude of the parameter adjustments in the word2vec model, the actual drug efficacy value serves as the final output, and the word2vec model is retrained.
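The check in steps S701 to S703 can be sketched as follows. The text leaves the similarity formula and the loss open, so this sketch assumes a relative-difference similarity on non-negative efficacy scores and a squared-error loss; the function names and the preset similarity of 0.8 are hypothetical. Retraining itself (step S704) is not shown.

```python
def relative_similarity(predicted, actual):
    """Hypothetical similarity in [0, 1] for non-negative scores; 1.0 means identical."""
    largest = max(abs(predicted), abs(actual))
    if largest == 0:
        return 1.0
    return 1.0 - abs(predicted - actual) / largest


def needs_retraining(predicted, actual, preset_similarity=0.8):
    """Return (retrain, loss); the squared-error loss is computed only when retraining."""
    similarity = relative_similarity(predicted, actual)
    if similarity < preset_similarity:
        return True, (predicted - actual) ** 2
    return False, None


print(needs_retraining(7, 10))  # (True, 9)
print(needs_retraining(9, 10))  # (False, None)
```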

Referring to fig. 2, an embodiment of the present application further provides a device for predicting drug efficacy based on deep learning, applied to a target protein, including:

the acquisition module 10 is used for acquiring a first protein sequence corresponding to a drug;

a dividing module 20, configured to divide the first protein sequence to obtain a plurality of first subsequences corresponding to the first protein sequence, where the number of amino acid molecules in each of the first subsequences is the same;

an analysis module 30, configured to analyze each of the first subsequences to obtain a first text expression corresponding to each of the first subsequences;

an input module 40, configured to input each first text expression and each second text expression corresponding to the target protein into a pre-trained word2vec model, so as to obtain a matching degree between each first text expression and each second text expression; wherein the second text expressions are obtained by segmenting a second protein sequence of the target protein into a plurality of second subsequences corresponding to the second protein sequence, and analyzing each second subsequence to obtain the second text expression corresponding to that second subsequence; the number of amino acid molecules in each second subsequence is the same, and the second subsequence corresponding to each second text expression is a target point;

a determination module 50 for determining an action target point of the drug matched with the target protein based on the matching degree;

a prediction module 60 for predicting the drug efficacy of the drug on the targeted protein based on each of the target points of action.
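By way of illustration only, the dividing, input and determination modules can be sketched as follows. The sketch assumes non-overlapping windows of three amino acids with any trailing remainder dropped, assumes that the matching degree is the cosine similarity of the vectors behind the text expressions, and assumes a threshold of 0.8; none of these choices is fixed by the description, and all function names are hypothetical.

```python
import math


def split_sequence(sequence, k=3):
    """Split a protein sequence into subsequences of exactly k residues."""
    if k <= 0:
        raise ValueError("k must be positive")
    usable = len(sequence) - len(sequence) % k  # drop the incomplete tail
    return [sequence[i:i + k] for i in range(0, usable, k)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_targets(drug_vectors, target_vectors, threshold=0.8):
    """Return the target subsequences whose best match reaches the threshold."""
    hits = []
    for target, t_vec in target_vectors.items():
        best = max((cosine(d_vec, t_vec) for d_vec in drug_vectors.values()),
                   default=0.0)
        if best >= threshold:
            hits.append(target)
    return hits
```

Here `drug_vectors` and `target_vectors` are assumed to map each subsequence to the vector produced by the analysis step.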

In one embodiment, prediction module 60 includes:

the action score calculation submodule is used for calculating the action score of each action target according to the preset weight of each target of the targeted protein, and summing the obtained action scores to obtain the curative effect score of the drug;

and the drug efficacy acquisition submodule is used for obtaining the drug efficacy corresponding to the curative effect score according to a preset correspondence between drug efficacy and curative effect score.
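By way of illustration only, the two submodules of the prediction module can be sketched as follows. The sketch assumes that the action score of an action target is its preset weight multiplied by its matching degree, and that the preset correspondence is a list of score bands; the description fixes neither, and the names and band values are hypothetical.

```python
def efficacy_score(matches, weights):
    """Sum the action scores (assumed: weight * matching degree) of the action targets."""
    return sum(weights[target] * degree for target, degree in matches.items())


def efficacy_label(score, bands):
    """Map a score to a drug efficacy using (lower_bound, label) bands.

    `bands` must be sorted by lower_bound in descending order; the last
    band acts as the fallback.
    """
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return bands[-1][1]
```

For example, with weights `{"AYI": 2.0, "QRS": 1.0}` and matching degrees `{"AYI": 0.5, "QRS": 1.0}` the curative effect score is 2.0, which falls in the middle band of `[(3.0, "high"), (1.5, "moderate"), (0.0, "low")]`.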

In one embodiment, the analysis module 30 includes:

the subsequence input sub-module is used for inputting each first subsequence to a Skip-Gram model for processing to obtain real value vectors corresponding to each first subsequence; wherein the dimensions of each real-valued vector are the same;

the target vector obtaining submodule is used for obtaining, as target vectors, the real-valued vectors corresponding to a preset number of context words of each real-valued vector;

and the updating submodule is used for updating each real-valued vector by a stochastic gradient ascent method to obtain the first text expression corresponding to each real-valued vector.
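By way of illustration only, the three submodules of the analysis module can be sketched as a small Skip-Gram trainer in pure Python. The sketch assumes a logistic objective with one random negative sample per pair, a context window of one subsequence on each side, and a fixed learning rate; these are assumptions rather than details taken from the description, and all names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def context_pairs(words, window=1):
    """List (center, context) pairs within `window` positions of each word."""
    pairs = []
    for i, center in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                pairs.append((center, words[j]))
    return pairs


def ascent_step(center_vec, context_vec, label, lr=0.1):
    """One stochastic gradient ascent step on the pair log-likelihood.

    label is 1 for an observed pair and 0 for a negative sample; the
    gradient of the log-likelihood w.r.t. the score is (label - sigmoid).
    """
    score = sum(c * o for c, o in zip(center_vec, context_vec))
    g = lr * (label - sigmoid(score))
    new_center = [c + g * o for c, o in zip(center_vec, context_vec)]
    new_context = [o + g * c for c, o in zip(center_vec, context_vec)]
    return new_center, new_context


def train(words, dim=8, epochs=50, window=1, seed=0):
    """Return one real-valued vector of length `dim` per distinct subsequence."""
    rng = random.Random(seed)
    vocab = sorted(set(words))
    center = {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
    context = {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
    for _ in range(epochs):
        for c, o in context_pairs(words, window):
            center[c], context[o] = ascent_step(center[c], context[o], 1)
            negative = rng.choice(vocab)
            if negative != o:
                center[c], context[negative] = ascent_step(
                    center[c], context[negative], 0)
    return center
```

Every returned vector has the same dimension, which matches the requirement that the dimensions of the real-valued vectors are the same.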

In one embodiment, the device for predicting the therapeutic effect of a drug based on deep learning further comprises:

the data acquisition module is used for acquiring initial parameters corresponding to the target category from a parameter database based on the target category to which the target protein belongs;

the training data acquisition module is used for acquiring training data of corresponding categories based on the target categories;

and the model training module is used for inputting the initial parameters into a word2vec initial model, and then inputting the training data for training to obtain the pre-trained word2vec model.

In one embodiment, the input module 40 includes:

a splitting submodule for splitting the training data into a plurality of training sets;

the parameter input submodule is used for inputting each training set and the initial parameters into different word2vec initial models for training, and after training is finished, the intermediate parameters of each training of each word2vec initial model are obtained;

a loss value operator module, configured to calculate a loss value of each word2vec initial model by using a gradient descent method, and optimize the corresponding word2vec initial model based on the loss value to obtain an optimization parameter corresponding to each word2vec initial model;

the target parameter calculation submodule is used for inputting each optimized parameter into the meta-optimization formula for calculation to obtain a target parameter;

and the target parameter input submodule is used for inputting the target parameters into the word2vec initial model to obtain the pre-trained word2vec model.
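By way of illustration only, the flow of the above submodules can be sketched as follows. The meta-optimization formula is not reproduced in this passage, so the sketch substitutes a Reptile-style update (the initial parameter moved toward the mean of the optimized parameters) purely as an assumption. A one-parameter model y = theta * x stands in for the word2vec initial model, and all names are hypothetical.

```python
def split_training_data(data, n_sets):
    """Split the training data into n_sets interleaved training sets."""
    return [data[i::n_sets] for i in range(n_sets)]


def optimize(theta0, training_set, lr=0.02, steps=50):
    """Gradient descent on the mean squared error of the toy model y = theta * x."""
    theta = theta0
    for _ in range(steps):
        grad = sum(2 * (theta * x - y) * x
                   for x, y in training_set) / len(training_set)
        theta -= lr * grad
    return theta


def meta_update(theta0, optimized, step=1.0):
    """Assumed meta-optimization: move theta0 toward the mean optimized value."""
    mean_delta = sum(t - theta0 for t in optimized) / len(optimized)
    return theta0 + step * mean_delta


def pretrain(theta0, data, n_sets=2):
    """Train one copy per training set, then combine into the target parameter."""
    optimized = [optimize(theta0, s) for s in split_training_data(data, n_sets)]
    return meta_update(theta0, optimized)
```

Each training set starts from the same initial parameter, so the combined target parameter reflects all training sets rather than any single one.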

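In one embodiment, the device for predicting the therapeutic effect of a drug based on deep learning further comprises:

the drug three-dimensional structure acquisition module is used for acquiring a drug three-dimensional structure of the drug and a target three-dimensional structure of the target protein based on SWISS-MODEL;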

the structure input module is used for inputting the drug three-dimensional structure and the target three-dimensional structure into a preset protein structure matching model to obtain the matching degree between the drug three-dimensional structure and the target three-dimensional structure; wherein the protein structure matching model is a convolutional neural network model;

the drug judgment module is used for judging whether the drug can act on the targeted protein according to the matching degree;

and if so, executing the step of segmenting the first protein sequence to obtain a plurality of first subsequences corresponding to the first protein sequence.

In one embodiment, the device for predicting the therapeutic effect of a drug based on deep learning further comprises:

the actual drug curative effect acquisition module is used for acquiring the actual drug curative effect of the drug and calculating the similarity between the actual drug curative effect and the drug curative effect based on a similarity calculation formula;

the similarity judging module is used for judging whether the similarity is smaller than a preset similarity or not;

the curative effect loss value calculation module is used for calculating the curative effect loss value of the curative effect of the medicine if the similarity is smaller than the preset similarity;

and the retraining module is used for inputting the curative effect loss value into the word2vec model for retraining.

Referring to fig. 3, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store the second protein sequences of various target proteins, and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. When executed by the processor, the computer program implements the deep learning-based drug efficacy prediction method of any of the above embodiments.

Those skilled in the art will appreciate that the architecture shown in fig. 3 is only a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects may be applied.

The embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method for predicting the curative effect of a drug based on deep learning according to any of the above embodiments may be implemented.

It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.

Blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.

The blockchain underlying platform may comprise processing modules such as user management, basic services, smart contracts and operation monitoring. The user management module is responsible for managing the identity information of all blockchain participants, including generating and maintaining public and private keys (account management), key management, and maintaining the correspondence between a user's real identity and blockchain address (authority management); where authorized, it supervises and audits the transactions of certain real identities and provides rule configuration for risk control (risk control audit). The basic service module is deployed on all blockchain node devices and is used to verify the validity of service requests and to record valid requests to storage after consensus is reached; for a new service request, the basic service first performs interface adaptation, parsing and authentication (interface adaptation), then encrypts the service information through a consensus algorithm (consensus management), transmits the encrypted information completely and consistently to the shared ledger (network communication), and records and stores it. The smart contract module is responsible for contract registration and issuance, contract triggering and contract execution; developers can define contract logic in a programming language and publish it to the blockchain (contract registration), have it triggered by a key call or another event and executed according to the logic of the contract terms, and the module also provides functions for upgrading and cancelling contracts. The operation monitoring module is mainly responsible for deployment, configuration modification, contract settings and cloud adaptation during product release, and for the visual output of real-time status during product operation, such as alarms, monitoring of network conditions, and monitoring of node device health.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims

1. A method for predicting the curative effect of a medicine based on deep learning is applied to a target protein and is characterized by comprising the following steps:

obtaining a first protein sequence corresponding to a drug;

2. The method of claim 1, wherein the step of predicting the therapeutic effect of the drug on the target protein based on each target of action comprises:

calculating the action score of each action target according to the preset weight of each target of the targeted protein, and summing the obtained action scores to obtain the curative effect score of the drug;

and obtaining the drug efficacy corresponding to the curative effect score according to a preset correspondence between drug efficacy and curative effect score.

3. The method for predicting the curative effect of a drug based on deep learning of claim 1, wherein the step of analyzing each of the first subsequences to obtain the first text expression corresponding to each of the first subsequences comprises:

inputting each first subsequence into a Skip-Gram model for processing to obtain real-value vectors corresponding to each first subsequence; wherein the dimensions of each real-valued vector are the same;

obtaining, as target vectors, the real-valued vectors corresponding to a preset number of context words of each real-valued vector;

and updating each real-valued vector by a stochastic gradient ascent method to obtain the first text expression corresponding to each real-valued vector.

4. The method for predicting the curative effect of a deep learning-based drug according to claim 1, wherein before the step of inputting each first textual expression and each second textual expression corresponding to the target protein into a word2vec model trained in advance to obtain the matching degree between each first textual expression and each second textual expression, the method further comprises:

based on the target category to which the target protein belongs, acquiring initial parameters corresponding to the target category from a parameter database;

and inputting the initial parameters into a word2vec initial model, and then inputting the training data for training to obtain the pre-trained word2vec model.

5. The method for predicting the curative effect of a medicine based on deep learning as claimed in claim 4, wherein the step of inputting the initial parameters into a word2vec initial model and inputting the training data for training to obtain the pre-trained word2vec model comprises:

splitting the training data into a plurality of training sets;

inputting each training set and the initial parameters into different word2vec initial models for training, and obtaining the respective training intermediate parameters of each word2vec initial model after training is completed;

calculating a loss value of each word2vec initial model by using a gradient descent method, and optimizing the corresponding word2vec initial model based on the loss value to obtain an optimization parameter corresponding to each word2vec initial model;

inputting each optimized parameter into a meta-optimization formula for calculation to obtain a target parameter;

and inputting the target parameters into the word2vec initial model to obtain the pre-trained word2vec model.

6. The method for predicting the curative effect of a drug based on deep learning according to claim 1, wherein before the step of segmenting the first protein sequence to obtain a plurality of first subsequences corresponding to the first protein sequence, the method further comprises:

obtaining a drug three-dimensional structure of the drug and a target three-dimensional structure of the target protein based on SWISS-MODEL;

inputting the drug three-dimensional structure and the target three-dimensional structure into a preset protein structure matching model to obtain the matching degree between the drug three-dimensional structure and the target three-dimensional structure; wherein the protein structure matching model is a convolutional neural network model;

judging whether the drug can act on the target protein according to the matching degree;

7. The method for predicting the therapeutic effect of a drug based on deep learning of claim 1, wherein after the step of predicting the therapeutic effect of the drug on the target protein based on each target of action, the method further comprises:

acquiring the actual drug curative effect of the drug, and calculating the similarity between the actual drug curative effect and the drug curative effect based on a similarity calculation formula;

judging whether the similarity is smaller than a preset similarity or not;

if the similarity is smaller than the preset similarity, calculating a curative effect loss value of the curative effect of the medicine;

and inputting the curative effect loss value into the word2vec model for retraining.

8. A medicine curative effect prediction device based on deep learning is applied to a target protein and is characterized by comprising the following components:

9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.