CN116206688A - Multi-mode information fusion model and method for DTA prediction - Google Patents
- Publication number
- CN116206688A (application CN202310188140.5A)
- Authority
- CN
- China
- Prior art keywords
- target
- information
- drug
- modal
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/30—Drug targeting using structural data; Docking or binding prediction
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention provides a multi-modal information fusion model and method for DTA prediction. The model comprises a drug molecular structure information encoder, a target structure information encoder, a multi-modal balancing module and a drug-target fusion module. The drug molecular structure information encoder encodes drug string modality information with a Transformer model and extracts drug graph modality features with a GIN model. The target structure information encoder encodes target string modality information with a Transformer model and extracts target graph modality features with a GCN model. The multi-modal balancing module uses a contrastive learning method to balance and integrate the drug string and graph modality information, and likewise the target string and graph modality information. The drug-target fusion module concatenates the two modality features of the drug and the target obtained from the multi-modal balancing module for DTA prediction.
Description
Technical Field
The invention relates to the technical field of drug-target binding affinity (DTA) prediction, and in particular to a multi-modal information fusion model and method for DTA prediction.
Background
Drug discovery is the process of identifying potential novel drugs. It involves pharmacology, chemistry, biology and other fields, and generally consumes enormous economic and time costs. It has been estimated that developing a new drug costs about 2.6 billion dollars and that obtaining FDA approval can take 17 years. Over the years, with the development of computer technology, computer-aided drug discovery has become a trend, so there is an urgent need for computational models that advance the drug discovery process. Successful identification of drug-target interactions is a key step in drug discovery, and accurately quantifying the affinity of those interactions is even more important for drug development. DTA denotes the binding strength between a drug molecule and a target; in general, the more strongly a compound binds to a target, the more likely the compound is to affect the biological function of the target and to be a suitable drug candidate. Therefore, building a computational model that accurately predicts DTA can accelerate the screening of drug molecules, minimize unnecessary in vitro screening experiments, and is of great significance for drug research and development.
Many computational methods and models for DTA prediction have been proposed. Traditional molecular docking techniques predict the binding pose and binding affinity of drugs and targets by computer simulation based on the 3D structures of the target and compound molecules. Many mature molecular docking algorithms have been developed into software, such as Gold and Dock, but these techniques are very time consuming. With the development of computer technology, molecular dynamics simulation techniques appeared; for example, Elanie et al. combined a fast geometric docking algorithm with molecular-mechanics interaction energy assessment, computing the potential of each ligand atom for scoring. Such methods are more flexible and their predictions more accurate, but their computational and time costs remain high.
Most early machine learning methods were based on matrix calculations and predicted affinity from structural similarity, which greatly reduced costs. For example, He et al. proposed a method called SimBoost that predicts continuous binding affinity values for compounds and proteins. Li et al. proposed a random-forest-based molecular docking method that predicts by applying the Kronecker product of similarity matrices. However, these methods rely excessively on structural data features of the molecule, and acquiring such data is difficult and time consuming. With the rapid development of deep learning in the big data era, convolutional neural networks (CNNs), graph neural networks (GNNs) and their variants have been applied in drug discovery. Since the structural information of drugs and targets plays a critical role in DTA prediction, most existing DTA prediction methods are based on that structural information, and they can be classified into string-based and graph-based methods.
String-modality methods learn features from sequence data. For example, DeepDTA uses CNNs for feature extraction from one-dimensional representations of target sequences and drug SMILES. WideDTA builds on this by computing complementary protein domain, motif and maximum common substructure word information, and introduces a word-based sequence representation for DTA prediction. In contrast, AttentionDTA focuses on important key subsequences in drug and target sequences and introduces a bilateral multi-head attention mechanism to predict DTA. These methods attend only to the string modality of the drug SMILES and target information, which ignores spatial structure and hydrogen atom information. Furthermore, only a fixed string length is considered during embedding, which loses some useful information. To address these drawbacks, graph-modality methods have been developed. GraphDTA represents the drug molecular structure as a graph, extracting features from the drug molecular graph with GNNs and from the target sequence with a CNN. DGraphDTA performs DTA prediction using a drug molecular graph and a target structure graph, extracting features with a graph convolutional network (GCN) model. However, the drug molecular graph lacks the contextual semantic information of the string and the positional arrangement of atoms, and the target structure graph considers only the spatial structure of the target, ignoring the order of target residues and the positional information of peptide-chain residues. Therefore, the multi-modal information of drug and target structures needs to be considered systematically to obtain complete information that better predicts DTA.
Multi-modal techniques can systematically consider information from a variety of modalities. In the last decade, information fusion technology has successfully fused multi-modal information, and the use of multi-modal information has attracted researchers' attention. For example, Tuan et al. proposed a method to detect fake news by fusing multi-modal features from text and visual data, and Mou et al. proposed a deep learning model to fuse data of multiple modalities, including eye data, vehicle data and environmental data. Multi-modal information fusion has thus been widely used in various fields, and it can likewise be applied to drug discovery. For example, Deng et al. developed Graph2MDA, a variational-graph-autoencoder-based embedding method that incorporates various attributes and features of microorganisms and drugs to predict microorganism-drug associations. Lyu et al. considered the potential correlations between drugs and multi-modal data such as targets and enzymes, and designed the dual-channel MDNN framework to obtain multi-modal characterizations of drugs. These drug discovery methods use embedded representations of different drug properties expressed in different modalities, without simultaneously attending to multiple modalities of a single property. Moreover, existing DTA methods consider only the single structural property of the drug and the target, not the multiple attribute information of their different modalities.
Disclosure of Invention
The invention aims to provide a multi-modal information fusion model and method for DTA prediction. The model embeds the string and graph modality information of drugs and targets and balances the feature representations of the different modalities through a contrastive learning method, so as to output richer information for DTA prediction.
In order to solve the above technical problems, the invention adopts the following technical solution. A multi-modal information fusion model for DTA prediction comprises: a drug molecular structure information encoder, a target structure information encoder, a multi-modal balancing module and a drug-target fusion module;
the drug molecular structure information encoder encodes drug string modality information using a Transformer model, and extracts drug graph modality features using a GIN model;
the target structure information encoder encodes target string modality information using a Transformer model, and extracts target graph modality features using a GCN model;
the multi-modal balancing module uses a contrastive learning method to balance and integrate the drug string and graph modality information, and to balance and integrate the target string and graph modality information;
the drug-target fusion module concatenates the two modality features of the drug and the target obtained by the multi-modal balancing module for DTA prediction.
As another aspect of the present invention, a multi-modal information fusion method for DTA prediction includes:
step S1, embedding of the string modality;
the drug SMILES code is treated as a character string, the string is integer-encoded, and positional encodings are added to the integer codes to obtain a vector representation; feature extraction is then performed on the vector with a Transformer model to obtain the final vector representation of the SMILES string;
the target sequence is treated as a character string, the string is integer-encoded, and positional encodings are added to the integer codes to obtain a vector representation; feature extraction is then performed on the vector with a Transformer model to obtain the final vector representation of the target string;
step S2, embedding of the graph modality;
each atom is taken as a node of the drug molecular graph, the connections between atoms define its adjacency matrix, and atom attributes serve as the attribute features of the graph's nodes; the drug molecular graph and its node feature vectors are taken as input, and node embedding through a GIN model yields the representation vector of the drug molecular graph;
each residue is taken as a node of the target structure graph, the contact probability of each residue pair defines its adjacency matrix, and each residue position is scored from sequence alignment results to serve as the attribute features of the graph's nodes; the target structure graph and its node feature vectors are taken as input, and node embedding through a GCN model yields the representation vector of the target structure graph;
step S3, contrastive learning of multi-modal representations and representation fusion;
feature representations are learned by maximizing the consistency between the string modality and the graph modality; the final representations of the two modalities of the drug and the target are obtained respectively, and are then concatenated to obtain the drug and target modality information for DTA prediction.
Further, in step S1, after the drug and target strings are integer-encoded, the positional information of the string modality is captured using the arrangement order of the drug atoms and target residues; abstract features of different levels are learned from the input through a Transformer model, and a max pooling layer is then applied to obtain the final vector representations of the drug and target strings.
Further, in step S1, the positional information of the string modality is represented using the following formulas:

PE(pos,2i) = sin(pos / 10000^(2i/d_model)) (1)

PE(pos,2i+1) = cos(pos / 10000^(2i/d_model)) (2)

where pos is the position of a character in the string, i indexes the dimensions of the character encoding, and d_model is the dimension of the encoding.
Still further, the Transformer model includes an MSA layer and an MLP block. The MSA layer is expressed as:

z_m = MSA(LN(z_{l-1})) + z_{l-1}, l = 1…L (3)

where z_{l-1} is the input of the MSA layer, z_m is the output of the MSA layer, LN denotes the normalization layer, and L is the number of layers;

the MLP block contains two CNN layers and one normalization layer, and is expressed as:

z_l = MLP(LN(z_m)) + z_m, l = 1…L (4)

where z_l is the output of the MLP block.
Still further, the GIN model includes five GIN layers, each followed by a batch normalization layer, with a global max pooling layer after the last batch normalization layer. Each GIN layer uses a multi-layer perceptron model to update the representation of node x_i as follows:

x_i^(k) = MLP^(k)((1 + e) · x_i^(k-1) + Σ_{j∈N(i)} x_j^(k-1)) (5)

where k indexes the GIN layer, e is a learnable parameter or fixed scalar, and N(i) is the neighbor set of node i.
Still further, the GCN model includes three GCN layers, each activated by a ReLU function, with a global max pooling layer after the last GCN layer. Each GCN layer performs a convolution operation as follows:

H^(l+1) = σ(D̃^(-1/2) Ã D̃^(-1/2) H^(l) W^(l)) (6)

where Ã is the adjacency matrix of the target structure graph (with self-loops added), D̃ is the degree matrix of the target structure graph, σ is the activation function, and W^(l) is a learnable weight matrix.
Preferably, in step S3, when learning feature representations by maximizing the consistency of the string modality and the graph modality, any one drug or target is fixed as an anchor a, the set of that drug's or target's own modalities forms the positive sample set P, and all modalities of other drugs or targets are regarded as negative samples N. Positive pairs (a, p) and negative pairs (a, n) are generated, and contrastive learning forces all modality representations of the anchor a to be consistent while distinguishing them from the modality representations of the negative samples. The loss is computed as:

Loss = Σ_i ( −1/|P(i)| · Σ_{p∈P(i)} log( exp(sim(i,p)/T) / (exp(sim(i,p)/T) + Σ_{n∈N(i)} exp(sim(i,n)/T)) ) ) (7)

where, for each sample i, P(i) is its positive sample set, |P(i)| is the number of positive samples, p is one of the positive samples, N(i) is the negative sample set, n is one of the negative samples, sim(·,·) is a similarity function, and T is the temperature coefficient.
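As a rough illustration, the loss in Eq. (7) for a single anchor can be sketched in NumPy as below. Cosine similarity is assumed for sim(·,·), and the variable names are illustrative; the patent does not fix the similarity function:

```python
import numpy as np

def contrastive_loss(anchor, positives, negatives, T=0.1):
    """Contrastive loss for one anchor in the form of Eq. (7): pull the
    anchor's own modality representations together, push representations
    of other drugs/targets away. sim is cosine similarity (an assumption)."""
    def sim(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

    neg = sum(np.exp(sim(anchor, n) / T) for n in negatives)
    loss = 0.0
    for p in positives:
        pos = np.exp(sim(anchor, p) / T)
        loss += -np.log(pos / (pos + neg))
    return loss / len(positives)

a = np.array([1.0, 0.0])
loss_aligned = contrastive_loss(a, [np.array([1.0, 0.1])], [np.array([-1.0, 0.0])])
loss_opposed = contrastive_loss(a, [np.array([-1.0, 0.0])], [np.array([1.0, 0.1])])
```

A well-balanced embedding drives the loss toward zero when the anchor's modalities agree (loss_aligned), while disagreement keeps it large (loss_opposed).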
Compared with traditional models, the multi-modal information fusion model and method provided by the invention have superior feature-capturing capability and can provide richer, more complete modality information for DTA prediction. Specifically, the invention fully considers the string and graph modality information of the drug and the target: the string modality comprises the drug SMILES and the target sequence, and the graph modality comprises the drug molecular graph and the target structure graph. Different deep learning models are selected to extract the data features of the different modalities, the feature representations of the different modalities are balanced through a contrastive learning method, and the arrangement order of drug atoms and target residues is used to capture the positional information of the string modality, so that more useful feature information can be extracted from the SMILES and the target sequence to effectively improve DTA prediction.
Drawings
Fig. 1 is a flowchart of the application of the multi-modal information fusion model according to the present invention to DTA prediction;
FIG. 2 is a diagram of the contrastive learning framework in the multi-modal information fusion method according to the present invention;
FIG. 3 is a graph comparing MSE and CI indicators for various models on a Davis dataset in an evaluation experiment according to the present invention;
FIG. 4 is a graph comparing MSE and CI indicators for various models on a KIBA dataset in an evaluation experiment according to the present invention.
Detailed Description
The invention will be further described with reference to examples and drawings, to which reference is made, but which are not intended to limit the scope of the invention.
Traditional drug-target affinity prediction methods based on biological experiments cannot meet the requirements of drug R&D in the big data era. Although deep-learning-based drug-target binding affinity prediction models have been successful, these models consider only single-modality features of drug and target information, which are not complete or rich enough. In fact, information from the different modalities of the drug and the target is complementary, and fusing it yields more valuable information. On this basis, the invention designs a multi-modal information fusion model called FMDTA for DTA prediction, detailed as follows.
1. FMDTA
Because data of the string modality and the graph modality have respective advantages and disadvantages, the invention combines and fuses the information of the two modalities to obtain more complete information. FMDTA consists of four parts: a drug molecular structure information encoder, a target structure information encoder, a multi-modal balancing module and a drug-target fusion module. Wherein:
the drug molecular structure information encoder uses a transducer model to encode drug string modal information, and uses a GIN model to extract drug map modal information features.
The target structure information encoder encodes the target character string modal information by using a transducer model, and extracts drug pattern modal information features by using a GCN model.
The multi-modal balancing module uses a contrastive learning method to balance and integrate the drug string and graph modality information, and the target string and graph modality information.
The drug-target fusion module concatenates the two modality features of the drug and the target obtained by the multi-modal balancing module for DTA prediction.
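A minimal sketch of what the fusion module does, assuming 128-dimensional per-modality features as in the embedding settings described below; the random vectors and the linear head are illustrative stand-ins for the encoder outputs and the MLP predictor:

```python
import numpy as np

# Hypothetical 128-dim per-modality features as produced by the two encoders.
rng = np.random.default_rng(0)
drug_str, drug_graph = rng.normal(size=128), rng.normal(size=128)
target_str, target_graph = rng.normal(size=128), rng.normal(size=128)

# The fusion module concatenates the balanced modality features of the drug
# and the target; a regression head then outputs the predicted affinity.
fused = np.concatenate([drug_str, drug_graph, target_str, target_graph])
W, b = rng.normal(size=(fused.size,)) * 0.01, 0.0
affinity = float(fused @ W + b)  # linear head stands in for the MLP predictor
```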
2. The workflow of FMDTA is shown in FIG. 1. The multi-modal information fusion method provided by the invention mainly comprises three steps:
step S1, embedding a character string mode
S11, embedding of drug SMILES
Each drug SMILES code is treated as a character string, integer-encoded, and the integers are used as input. It should be noted that the model needs to be trained before practical application; to facilitate training of the FMDTA provided by the invention, each SMILES string is preferably cut or padded to a fixed length of 100 characters, and these integer sequences are used as inputs to an embedding layer that returns a 128-dimensional vector representation. Abstract features of different levels are then learned from the input using a Transformer model.
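The integer encoding with fixed-length padding can be sketched as follows; the character vocabulary here is hypothetical, since the patent does not enumerate its token set:

```python
# Hypothetical character vocabulary; the patent does not specify the token set.
SMILES_VOCAB = {ch: idx + 1
                for idx, ch in enumerate("()[]=#+-.0123456789BCFHINOPSclnors@/\\")}
MAX_LEN = 100  # fixed length stated in the text

def encode_smiles(smiles: str) -> list[int]:
    """Integer-encode a SMILES string, then cut or pad to MAX_LEN (0 = padding)."""
    ids = [SMILES_VOCAB.get(ch, 0) for ch in smiles][:MAX_LEN]
    return ids + [0] * (MAX_LEN - len(ids))

codes = encode_smiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
```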
Considering that the arrangement sequence of SMILES character strings is critical to the characteristic expression of drug molecules, the invention integrates the position information of SMILES, and the position coding is expressed by adopting the following formula:
PE(pos,2i) = sin(pos / 10000^(2i/d_model)) (1)

PE(pos,2i+1) = cos(pos / 10000^(2i/d_model)) (2)

where pos is the position of a character in the SMILES string, i indexes the dimensions of the character encoding, and d_model is the dimension of the encoding.
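Eqs. (1)-(2) can be computed directly in NumPy; the 100-token length and 128-dimensional embedding below follow the sizes stated in S11:

```python
import numpy as np

def positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding from Eqs. (1)-(2):
    PE(pos, 2i) = sin(pos / 10000^(2i/d_model)),
    PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))."""
    pe = np.zeros((max_len, d_model))
    pos = np.arange(max_len)[:, None]             # (max_len, 1)
    i = np.arange(0, d_model, 2)                  # even dimension indices
    angle = pos / np.power(10000.0, i / d_model)  # (max_len, d_model/2)
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

# Encodings for a SMILES string padded to 100 tokens with 128-dim embedding.
pe = positional_encoding(100, 128)
```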
After the positional encodings are obtained, they are added to the drug SMILES encoding information to yield the complete encoding, and abstract features of different levels are then learned from the input through a Transformer model.
For the Transformer model, the invention follows the original Transformer design, which consists of a multi-head self-attention (MSA) layer and an MLP block. The MSA layer is expressed as:
z_m = MSA(LN(z_{l-1})) + z_{l-1}, l = 1…L (3)

where z_{l-1} is the input of the MSA layer, z_m is the output of the MSA layer, LN denotes the normalization layer, and L is the number of layers.
After the MSA layer, the MLP block contains two CNN layers and one linear normalization layer, the function of which is expressed as:
z_l = MLP(LN(z_m)) + z_m, l = 1…L (4)

where z_l is the output of the MLP block.
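A single-head, pre-norm sketch of Eqs. (3)-(4) in NumPy; dense layers stand in for the two CNN layers of the MLP block, and the weights are random placeholders:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def encoder_block(z, Wq, Wk, Wv, W1, W2):
    """One pre-norm encoder block per Eqs. (3)-(4):
    z_m = MSA(LN(z)) + z ; z_out = MLP(LN(z_m)) + z_m.
    A single attention head is used for brevity (the model uses multi-head MSA)."""
    h = layer_norm(z)
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v      # self-attention
    z_m = attn + z                                          # Eq. (3)
    z_out = np.maximum(layer_norm(z_m) @ W1, 0) @ W2 + z_m  # Eq. (4), ReLU MLP
    return z_out

rng = np.random.default_rng(0)
d = 8
z = rng.normal(size=(5, d))                     # 5 tokens, d-dim embeddings
Ws = [rng.normal(size=(d, d)) * 0.1 for _ in range(5)]
out = encoder_block(z, *Ws)
```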
Finally, a max pooling layer is applied to obtain a final vector representation of the drug string.
S12, embedding of target sequence
Each target sequence is treated as a character string and input after integer encoding. Similarly, for ease of training, each target sequence is cut or padded to a fixed length of 1000 residues, and these integer sequences are used as inputs to an embedding layer that returns a 128-dimensional vector representation. Likewise, because the arrangement order of target residues is critical to expressing the structural features of the target, the positional information is encoded, abstract features of different levels are learned from the input with a Transformer model, and a max pooling layer is then used to obtain the final vector representation of the target string.
Step S2, embedding of graph modalities
S21, embedding of the drug molecular graph
The drug molecular graph data in the invention come from GraphDTA [Jiang, M.; Li, Z.; Zhang, S.; Wang, S.; Wang, X.; Yuan, Q.; Wei, Z. Drug-target affinity prediction using graph neural network and contact maps. RSC Advances 2020, 10, 20701-20712]. Each atom is used as a node of the drug molecular graph, and the connections between atoms define its adjacency matrix. The relevant attributes of the atoms serve as the attribute features of the graph's nodes. The drug molecular graph and its node feature vectors are taken as input, and node embedding is performed through a GIN (Graph Isomorphism Network) model.
Specifically, the GIN model includes five GIN layers, each of which uses a multi-layer perceptron (MLP) model to update the representation of node x_i as follows:

x_i^(k) = MLP^(k)((1 + e) · x_i^(k-1) + Σ_{j∈N(i)} x_j^(k-1)) (5)

where k indexes the GIN layer, e is a learnable parameter or fixed scalar, and N(i) is the neighbor set of node i.
In the GIN model, a batch normalization layer follows each GIN layer, and a global max pooling layer is added at the end to obtain the representation vector of the drug molecular graph.
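The GIN update above can be sketched numerically. This is a minimal NumPy illustration of the aggregation rule, with a single linear+ReLU stage standing in for the full MLP and a toy 3-atom molecule; the patent's actual layer widths and MLP depth are not reproduced.

```python
import numpy as np

def gin_layer(X: np.ndarray, A: np.ndarray, W: np.ndarray, eps: float = 0.0) -> np.ndarray:
    """One GIN update: x_i' = MLP((1+eps)*x_i + sum of neighbor features).
    X: node features (n, d); A: adjacency matrix (n, n); W: weights (d, d_out).
    A single linear+ReLU stage stands in for the MLP here."""
    aggregated = (1.0 + eps) * X + A @ X   # self term plus neighbor sum
    return np.maximum(aggregated @ W, 0.0)

# Toy molecule: atoms 0-1 and 1-2 bonded; one-hot atom features,
# identity weights so the aggregation structure stays visible.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
out = gin_layer(np.eye(3), A, np.eye(3))
print(out)
```

Each output row is the atom's own feature plus the sum of its neighbors' features, which is exactly the (1+ε)-weighted sum the formula prescribes at ε = 0.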
S22, embedding of the target structure graph
The target structure graph data in the invention are taken from DGraphDTA [Jiang, M.; Li, Z.; Zhang, S.; Wang, S.; Wang, X.; Yuan, Q.; Wei, Z. Drug–target affinity prediction using graph neural network and contact maps. RSC Advances 2020, 10, 20701–20712]. The invention takes each residue as a node of the target structure graph and uses the contact probability of each residue pair as the adjacency matrix of the target graph, so that the spatial information of the protein is well preserved. Each residue position is scored from the sequence alignment results and used as the feature vector of the residue node. The target structure graph and its node feature vectors are then taken as input, and node embedding is performed by a GCN model.
Specifically, the GCN model comprises three GCN layers, each of which performs one convolution operation:

H^(l+1) = σ( D̃^(-1/2) Ã D̃^(-1/2) H^(l) W^(l) )

where Ã is the adjacency matrix of the target structure graph (with self-loops), D̃ is the degree matrix of the target structure graph, σ is the activation function, and W is the learnable weight matrix.
Each GCN layer of the GCN model is activated by a ReLU function, and a global max pooling layer is added after the last GCN layer to obtain the representation vector of the target structure graph.
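The convolution above can be sketched in NumPy. This is a toy-sized illustration of the normalized propagation rule with ReLU as σ; the patent's actual layer widths are assumptions not reproduced here.

```python
import numpy as np

def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One GCN convolution: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W).
    A: adjacency (n, n); H: node features (n, d); W: weights (d, d_out)."""
    A_hat = A + np.eye(A.shape[0])                        # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W, 0.0)

# Two residues in contact with each other, one-hot features, identity weights.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
out = gcn_layer(A, np.eye(2), np.eye(2))
print(out)
```

With both residues connected and self-loops added, every normalized entry becomes 0.5, i.e. each residue averages itself with its neighbor.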
S3, contrastive learning and fusion of multi-modal representations
When extracting features from data of different modalities, on the one hand, since the internal structures of different drugs or targets may contain many similar functional groups or residues, the encoded features of different drugs/targets can become blurred and similar. On the other hand, because the embedding methods of the two modalities differ substantially, the feature representations of the two modalities of the same drug/target differ markedly, while the feature representations of different drugs/targets within the same modality differ little. A suitable fusion method therefore needs to be chosen to balance the information embedded in the different modalities, so that the multi-modal information can complement itself.
To balance the information embedded in the different modalities, the invention adopts contrastive learning to capture the interaction between modalities. As shown in FIG. 2, in FMDTA we learn the feature representations by maximizing the consistency of the string and graph modalities. Specifically, for drug i, fixing the drug itself as anchor a, a set P of positive samples consisting of the drug's own modalities is obtained, while the representations of all modalities of the other drugs are treated as negative samples N, generating positive pairs (a, p) and negative pairs (a, n). Contrastive learning then forces all modal representations of the anchor a to agree with one another and to be distinguished from all modal representations of the negative samples. During contrastive learning, the embedded representation of the drug (target) is iteratively updated by computing the loss, so that the representations of the two modalities of the same drug (target) move closer while those of different drugs (targets), within or across modalities, move further apart; this balances the embedded representations and yields the final embedding. The loss is computed as:

L = Σ_i ( −1/|P(i)| ) Σ_{p∈P(i)} log( exp(z_i·z_p/T) / Σ_{a∈P(i)∪N(i)} exp(z_i·z_a/T) )

where, for each sample i, z_i is its embedded representation, P(i) is its positive sample set, |P(i)| is the number of positive samples, p is one of the positives, N(i) is the negative sample set, n is a negative therein, and T is the temperature coefficient.
The embedded information of the different modalities of the target is learned by the same contrastive procedure.
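The contrastive objective can be sketched as follows. This is a plain NumPy version of the standard supervised contrastive form reconstructed from the symbol definitions above; rows sharing a label play the role of the two modalities of one drug/target, and all other rows are negatives. Variable names and the toy embeddings are illustrative.

```python
import numpy as np

def supcon_loss(Z: np.ndarray, labels, T: float = 0.5) -> float:
    """Supervised contrastive loss over L2-normalized embeddings Z (n, d).
    Rows with equal labels are each other's positives; the rest are negatives."""
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    sim = Z @ Z.T / T                       # temperature-scaled similarities
    n = len(labels)
    loss = 0.0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        others = [j for j in range(n) if j != i]
        denom = np.sum(np.exp(sim[i, others]))
        loss += -sum(np.log(np.exp(sim[i, p]) / denom) for p in pos) / len(pos)
    return loss / n

labels = [0, 0, 1, 1]
Z_aligned = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])  # modalities agree
Z_mixed = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])    # modalities disagree
print(supcon_loss(Z_aligned, labels), supcon_loss(Z_mixed, labels))
```

The aligned embeddings, where the two "modalities" of each item already agree, incur a lower loss than the mismatched ones, which is exactly the pressure the fusion module exploits.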
After the final representations of the two modalities of the drug and the target are obtained, the drug and target modal information used for DTA prediction is obtained by concatenating the two, after which the DTA can be predicted through two fully connected layers.
3. To verify the feasibility and superiority of the model and method of the invention, a validation experiment is carried out next.
3.1 Experimental data
This experimental evaluation uses the benchmark data from DeepDTA [Öztürk, H.; Özgür, A.; Ozkirimli, E. DeepDTA: deep drug–target binding affinity prediction. Bioinformatics 2018, 34, i821–i829], comprising the Davis [Davis, M.I.; Hunt, J.P.; Herrgard, S.; Ciceri, P.; Wodicka, L.M.; Pallares, G.; Hocker, M.; Treiber, D.K.; Zarrinkar, P.P. Comprehensive analysis of kinase inhibitor selectivity. Nature Biotechnology 2011, 29, 1046–1051] and KIBA [Tang, J.; Szwajda, A.; Shakyawar, S.; Xu, T.; Hintsanen, P.; Wennerberg, K.; Aittokallio, T. Making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis. Journal of Chemical Information and Modeling 2014, 54, 735–743] data sets.
The Davis dataset contains 72 compounds and 442 proteins together with their corresponding affinity values, measured as Kd values (kinase dissociation constants) and ranging from 5.0 to 10.8. The average SMILES string length of the compounds is 64, and the average target sequence length is 788.
The KIBA dataset combines kinase inhibitor bioactivities from different sources (e.g., Ki, Kd and IC50) and contains the binding affinities of 2,116 drugs and 229 targets, measured as KIBA scores ranging from 0.0 to 17.2. The average SMILES string length of the compounds is 58, and the average target sequence length is 728. The data are summarized in Table 1.
Table 1 Dataset statistics
3.2 evaluation index
This experiment uses Mean Squared Error (MSE), the Concordance Index (CI) and the r_m^2 index as the evaluation metrics of model performance.
MSE is an evaluation metric commonly used in regression tasks; it measures the difference between the predicted value p_i and the actual value y_i of each sample by averaging the squared differences. The smaller the MSE, the better the effect. The formula for MSE is:

MSE = (1/n) Σ_{i=1}^{n} (p_i − y_i)^2
CI is a metric that evaluates the consistency between the rank of the predicted values and the rank of the true values; the higher the value, the better the predictive effect of the model. CI is computed as:

CI = (1/Z) Σ_{d_x > d_y} h(b_x − b_y)

where b_x is the predicted value for the larger affinity d_x, b_y is the predicted value for the smaller affinity d_y, and Z is a normalization constant; h(x) is the step function:

h(x) = 1 if x > 0; 0.5 if x = 0; 0 if x < 0
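The CI computation can be sketched directly from the formula above; the function name is illustrative and the pairwise loop is the naive O(n²) form rather than an optimized one.

```python
def concordance_index(y_true, y_pred):
    """CI: fraction of affinity pairs whose predicted order matches the true order.
    Uses h(x) = 1 if x > 0, 0.5 if x == 0, 0 if x < 0; Z counts comparable pairs."""
    num, Z = 0.0, 0
    n = len(y_true)
    for i in range(n):
        for j in range(n):
            if y_true[i] > y_true[j]:          # only pairs with a strict true ordering
                Z += 1
                diff = y_pred[i] - y_pred[j]
                num += 1.0 if diff > 0 else (0.5 if diff == 0 else 0.0)
    return num / Z

print(concordance_index([1, 2, 3], [1, 2, 3]))  # perfectly ordered predictions give 1.0
```

A perfectly anti-ordered prediction yields 0.0, and constant predictions yield 0.5, matching the step-function definition.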
The r_m^2 index is used to evaluate the external predictive performance of a QSAR (quantitative structure–activity relationship) model. Its formula is:

r_m^2 = r^2 × (1 − √(r^2 − r_0^2))

where r^2 and r_0^2 are the squared correlation coefficients between predicted and observed values with and without intercept, respectively.
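The r_m^2 computation is a one-liner once r^2 and r_0^2 are known. The formula below is the standard r_m^2 form (the original equation image is not reproduced in this text), and the input values are hypothetical.

```python
import math

def rm2(r2: float, r02: float) -> float:
    """r_m^2 = r^2 * (1 - sqrt(|r^2 - r0^2|)): the standard external-validation
    index, where r^2 / r0^2 are the squared correlation coefficients between
    predicted and observed values with and without intercept."""
    return r2 * (1.0 - math.sqrt(abs(r2 - r02)))

print(rm2(0.81, 0.80))  # hypothetical r^2 and r0^2
```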
For training, the experimental evaluation used a server with two Intel(R) Xeon(R) Gold 5215 2.5 GHz CPUs, 256 GB RAM and three NVIDIA GV100GL GPUs. The hyper-parameters used in the experiments are shown in Table 2.
TABLE 2 Hyper-parameters
Hyper-parameters | Setting |
Learning rate | 0.0005 |
Batch size | 512 |
Optimizer | Adam |
GIN layers | 5 |
GCN layers | 3 |
Transformer_embedding_dim | 128 |
GNNS_embedding_dim | 128 |
3.3 evaluation strategy
The experiments use 5-fold cross validation to verify the performance of the model. In each fold, the dataset is randomly split into training, validation and test sets at a ratio of 3:1:1. Finally, the model selected on the validation set is evaluated on the test set.
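One plausible way to produce the 3:1:1 partition per fold is a seeded random shuffle; the patent does not specify its exact fold assignment, so the helper below is an assumption for illustration.

```python
import random

def split_311(indices, seed=0):
    """Randomly partition indices into train/validation/test at a 3:1:1 ratio.
    The seeded shuffle and split points are illustrative, not the patent's scheme."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    n = len(idx)
    a, b = 3 * n // 5, 4 * n // 5
    return idx[:a], idx[a:b], idx[b:]

train, val, test = split_311(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```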
3.4 model comparison
In this experiment, classical DTA prediction models are divided into two classes and compared against FMDTA. The methods that consider only the string modality are DeepDTA, WideDTA and AttentionDTA; these models all use CNNs to encode the strings.
DeepDTA: consists of two independent CNN modules that learn feature representations of the drug SMILES and the target sequence respectively, and feeds the representations into three fully connected layers for DTA prediction.
WideDTA: introduces protein domain and motif information and maximum common substructure words on top of DeepDTA to improve DTA prediction.
AttentionDTA: introduces a two-sided multi-head attention mechanism to focus on the key subsequences of drug and protein sequences that matter when predicting their affinity.
The methods that consider the graph modality are GraphDTA, DGraphDTA, DeepGLSTM and SAG-DTA, all of which use GNNs to encode the graphs.
GraphDTA: introduces the drug molecular graph, uses GNNs to learn the molecular-graph features of the drug and a CNN to learn the target sequence representation, then concatenates the two parts and predicts the DTA through fully connected layers.
DGraphDTA: introduces the target structure graph on top of GraphDTA, uses GNNs to learn the features of drugs and targets, and concatenates the two parts to predict the DTA through fully connected layers.
SAG-DTA: drug molecular graph characterization was improved by a self-attention mechanism on the basis of GraphDTA for DTA prediction.
DeepGLSTM: a method based on graph convolutional networks and LSTMs that encodes drugs and targets respectively to predict the DTA.
TABLE 3 results of DTA predictive model on DAVIS dataset
Table 3 gives the performance of FMDTA and the baseline methods on the Davis dataset. To keep the comparison fair, the same hyper-parameters and evaluation metrics were used for all models. FMDTA is clearly better than the methods that rely solely on the string or graph modality on all three metrics, with an MSE of 0.195, a CI of 0.909 and an r_m^2 of 0.748. Furthermore, the graph-modality methods are generally superior to the string-modality methods, because the fixed-length truncation of the string-modality data inevitably loses much information. Although the graph-modality data are relatively complete, features tend to be over-smoothed during graph embedding. It is therefore necessary to fuse the multimodal information of the drug and the target by a suitable method so that the modalities supplement each other.
To further test the generalization of the proposed method, this experiment evaluates the model on the KIBA dataset with the same hyper-parameters as on the Davis dataset. As shown in Table 4, FMDTA still performs excellently: its MSE is 0.133, its CI is 0.899 and its r_m^2 is 0.801. These results demonstrate the effectiveness of the model of the invention for DTA prediction and its good generalization ability.
TABLE 4 results of DTA predictive model on KIBA dataset
3.5 ablation study
3.5.1 validity analysis of multimodal information fusion
The model of the invention was compared with DeepDTA, GraphDTA and DGraphDTA on the Davis dataset. To ensure a fair comparison, identical modules in the model and the baselines use the same parameters. FMDTA (w/o PC) denotes the variant that simply concatenates and fuses the two modalities of the drug and the target, without string positional encoding or contrastive learning.
As can be seen from FIG. 3, fusing the information of both modalities is superior to using single-modality information alone. Combining the structural information of the drug and the target yields more complete structural and spatial information, overcoming the limitation of DeepDTA, which uses only the string modality. The string-modality information of the drug SMILES and the target sequence captures the ordering and contextual semantic information of the strings, overcoming the limitation of DGraphDTA, which uses only structural information. GraphDTA uses the drug molecular structure graph and the target sequence, but still only single-modality information for each side, so its information is incomplete. This shows that the information of the two modalities can supplement each other, and fusing them effectively improves the model's DTA prediction. Meanwhile, GraphDTA outperforms DeepDTA and DGraphDTA, indicating that drawing on different modalities for the drug and the target protein information also works better.
To further verify the effectiveness of multimodal information fusion on the DTA task, FMDTA (w/o PC) was compared with DeepDTA, GraphDTA and DGraphDTA on the KIBA dataset. FIG. 4 shows the performance of FMDTA (w/o PC) and the unimodal models on the KIBA dataset. The fusion of multimodal information is equally excellent on KIBA, indicating that the method is rational and effective.
3.5.2 validity analysis of string position information coding and multimodal contrast learning
This section aims to demonstrate the importance of the positional encoding of the string modality and of multi-modal contrastive learning. As described above, the order of the atoms in the SMILES and the order of the target sequence residues are critical to the molecular representation. It is equally important to learn the feature representation by maximizing the consistency of the string and graph modalities. A comprehensive ablation study is therefore deployed to explore the necessity of each individual module, comparing on the Davis and KIBA datasets.
The three variants of FMDTA are:
(1) FMDTA (w/o PC): FMDTA without the string-modality positional encoding component and the multi-modal contrastive learning component; it directly concatenates the two separate modalities of the drug and the target protein.
(2) FMDTA (w/o C): FMDTA without the multi-modal contrastive learning component; it extracts the positional information of the string modality but does not account for the interaction between modality features.
(3) FMDTA (w/o P): FMDTA without the string-modality positional encoding; it balances and fuses the string and graph modality features through the contrastive learning component.
Table 5 results of ablation study on positional information and contrast learning
As the results in Table 5 show, adding the string-modality positional encoding module improves the MSE by 0.5% and the CI by 0.7% on the Davis dataset. On the KIBA dataset the MSE improvement is not significant, while the CI improves by 0.3%. After adding the multi-modal contrastive representation learning module, the MSE and CI improve significantly by 2% and 1% respectively on the Davis dataset, and by 0.4% and 0.7% respectively on the KIBA dataset. Adding both the positional encoding module and the contrastive learning module brings the most notable improvement, with MSE gains of 2.8% and 1.5% and CI gains of 0.7% and 1.2% on the Davis and KIBA datasets, respectively. From this it can be concluded that:
(1) Adding the positional encoding of the strings captures more complete information and improves model performance.
(2) Contrastive learning lets information from different modalities interact during encoding, yielding a more balanced representation of the features of the different modalities and a more flexible model.
The foregoing embodiments are preferred embodiments of the present invention, and in addition, the present invention may be implemented in other ways, and any obvious substitution is within the scope of the present invention without departing from the concept of the present invention.
In order to facilitate understanding of the improvements of the present invention over the prior art, some of the figures and descriptions of the present invention have been simplified, and some other elements have been omitted from this document for clarity, as will be appreciated by those of ordinary skill in the art.
Claims (8)
1. The multi-modal information fusion model for DTA prediction is characterized by comprising a drug molecular structure information encoder, a target structure information encoder, a multi-modal balancing module and a drug target fusion module;
the drug molecular structure information encoder encodes the drug string modal information using a Transformer model and extracts the drug graph modal information features using a GIN model;
the target structure information encoder encodes the target string modal information using a Transformer model and extracts the target graph modal information features using a GCN model;
the multi-modal balancing module balances and integrates the drug string and graph modal information, and the target string and graph modal information, using a contrastive learning method;
the drug–target fusion module concatenates the two modal features of the drug and the target obtained by the multi-modal balancing module for DTA prediction.
2. A multi-modal information fusion method for DTA prediction, comprising:
step S1, embedding a character string mode;
taking the drug SMILES code as a character string, integer-encoding the string, merging the positional encodings to obtain a vector representation, and extracting features from the vector with a Transformer model to obtain the final vector representation of the SMILES string;
taking the target sequence as a character string, integer-encoding the string, merging the positional encodings to obtain a vector representation, and extracting features from the vector with a Transformer model to obtain the final vector representation of the target string;
step S2, embedding a graph mode;
taking each atom as a node of the drug molecular graph, the bonds between atoms as the adjacency matrix of the drug molecular graph, and the atom attributes as the attribute features of the drug molecular graph nodes; taking the drug molecular graph and its node feature vectors as input, and embedding the nodes through a GIN model to obtain the representation vector of the drug molecular graph;
taking each residue as a node of the target structure graph, the contact probability of each residue pair as the adjacency matrix of the target structure graph, and the score of each residue position from the sequence alignment results as the attribute feature of the target structure graph nodes; taking the target structure graph and its node feature vectors as input, and embedding the nodes through a GCN model to obtain the representation vector of the target structure graph;
step S3, comparing and learning multi-mode representation and fusing representation;
learning the feature representations by maximizing the consistency of the string modality and the graph modality, obtaining the final representations of the two modalities of the drug and the target respectively, and then concatenating them to obtain the drug and target modal information for DTA prediction.
3. The multi-modal information fusion method for DTA prediction as recited in claim 2, wherein: in step S1, after the drug and target strings are integer-encoded, the positional information of the string modality is captured using the arrangement of drug atoms and target residues; abstract features at different levels are learned from the input by a Transformer model, and a max pooling layer is applied to obtain the final vector representations of the drug and target strings.
4. The multi-modal information fusion method for DTA prediction as recited in claim 3, wherein: in step S1, the positional information of the string modality is represented by the following formulas:

PE_(pos,2i) = sin(pos / 10000^(2i/d_model)) (1)

PE_(pos,2i+1) = cos(pos / 10000^(2i/d_model)) (2)

where pos is the position of a character in the string, i is the dimension index of the character encoding, and d_model is the encoding dimension of the character.
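Equations (1)–(2) can be sketched in NumPy. The function name and the toy dimensions (the patent's Transformer_embedding_dim of 128 and the 1000-character length) are taken from the document; everything else is illustrative.

```python
import numpy as np

def positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding per equations (1)-(2):
    even dimensions get sin(pos/10000^(2i/d_model)), odd dimensions the cosine."""
    pos = np.arange(max_len)[:, None]        # character positions in the string
    i = np.arange(0, d_model, 2)[None, :]    # even embedding dimensions
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(1000, 128)
print(pe.shape, pe[0, 0], pe[0, 1])  # at position 0: sin(0) = 0, cos(0) = 1
```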
5. The multi-modal information fusion method for DTA prediction as recited in claim 4, wherein: the Transformer model comprises an MSA (multi-head self-attention) layer and an MLP block, the function of the MSA layer being expressed as:

z_m = MSA(LN(z_{l-1})) + z_{l-1}, l = 1...L (3)

where z_{l-1} is the input of the MSA layer, z_m is the output of the MSA layer, LN denotes the normalization layer, and L is the number of layers;

the MLP block contains two CNN layers and one normalization layer, its function being expressed as:

z_l = MLP(LN(z_m)) + z_m, l = 1...L (4)

where z_l represents the output of the MLP block.
6. The multi-modal information fusion method for DTA prediction as recited in claim 5, wherein: the GIN model comprises five GIN layers, each followed by a batch normalization layer, the last batch normalization layer being connected to a global max pooling layer; each GIN layer updates the node feature x_i through a multi-layer perceptron model as:

x_i^(k) = MLP^(k)( (1 + ε) · x_i^(k-1) + Σ_{j∈N(i)} x_j^(k-1) )

where k indexes the GIN layer, ε is a learnable parameter or fixed scalar, and N(i) is the neighbor set of node i.
7. The multi-modal information fusion method for DTA prediction as recited in claim 6, wherein: the GCN model comprises three GCN layers, each activated by a ReLU function, with a global max pooling layer connected after the last GCN layer; each GCN layer performs one convolution operation:

H^(l+1) = σ( D̃^(-1/2) Ã D̃^(-1/2) H^(l) W^(l) )

where Ã is the adjacency matrix of the target structure graph, D̃ is its degree matrix, σ is the activation function, and W is the learnable weight matrix.
8. The multi-modal information fusion method for DTA prediction as recited in claim 7, wherein: in step S3, when learning the feature representations by maximizing the consistency of the string and graph modalities, any drug or target is fixed as anchor a, a set of positive samples P consisting of the multiple modalities of that drug or target itself is obtained, the representations of all modalities of other drugs or targets are treated as negative samples N, and positive pairs (a, p) and negative pairs (a, n) are generated; contrastive learning forces all modal representations of the anchor a to agree and to be distinguished from all modal representations of the negative samples, the Loss being computed as:

L = Σ_i ( −1/|P(i)| ) Σ_{p∈P(i)} log( exp(z_i·z_p/T) / Σ_{a∈P(i)∪N(i)} exp(z_i·z_a/T) )
where, for each sample i, z_i is its embedded representation, P(i) is its positive sample set, |P(i)| is the number of positive samples, p is one of the positive samples, N(i) is the negative sample set, n is one of the negative samples, and T is the temperature coefficient.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310188140.5A CN116206688A (en) | 2023-03-02 | 2023-03-02 | Multi-mode information fusion model and method for DTA prediction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310188140.5A CN116206688A (en) | 2023-03-02 | 2023-03-02 | Multi-mode information fusion model and method for DTA prediction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116206688A true CN116206688A (en) | 2023-06-02 |
Family
ID=86510874
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310188140.5A Pending CN116206688A (en) | 2023-03-02 | 2023-03-02 | Multi-mode information fusion model and method for DTA prediction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116206688A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116825234A (en) * | 2023-08-30 | 2023-09-29 | 江西农业大学 | Multi-mode information fusion medicine molecule activity prediction method and electronic equipment |
CN116825234B (en) * | 2023-08-30 | 2023-11-07 | 江西农业大学 | Multi-mode information fusion medicine molecule activity prediction method and electronic equipment |
CN117132591A (en) * | 2023-10-24 | 2023-11-28 | 杭州宇谷科技股份有限公司 | Battery data processing method and system based on multi-mode information |
CN117132591B (en) * | 2023-10-24 | 2024-02-06 | 杭州宇谷科技股份有限公司 | Battery data processing method and system based on multi-mode information |
CN118097665A (en) * | 2024-04-25 | 2024-05-28 | 云南大学 | Chemical molecular structure identification method, device and medium based on multi-stage sequence |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhao et al. | HyperAttentionDTI: improving drug–protein interaction prediction by sequence-based deep learning with attention mechanism | |
Chen et al. | Alchemy: A quantum chemistry dataset for benchmarking ai models | |
Yuan et al. | FusionDTA: attention-based feature polymerizer and knowledge distillation for drug-target binding affinity prediction | |
CN116206688A (en) | Multi-mode information fusion model and method for DTA prediction | |
Li et al. | Protein contact map prediction based on ResNet and DenseNet | |
Cheng et al. | IIFDTI: predicting drug–target interactions through interactive and independent features based on attention mechanism | |
CN113421658B (en) | Drug-target interaction prediction method based on neighbor attention network | |
Kim et al. | Bayesian neural network with pretrained protein embedding enhances prediction accuracy of drug-protein interaction | |
Pan et al. | SubMDTA: drug target affinity prediction based on substructure extraction and multi-scale features | |
CN116013428A (en) | Drug target general prediction method, device and medium based on self-supervision learning | |
Shen et al. | Clustering-driven deep adversarial hashing for scalable unsupervised cross-modal retrieval | |
CN115472221A (en) | Protein fitness prediction method based on deep learning | |
CN118038995B (en) | Method and system for predicting small open reading window coding polypeptide capacity in non-coding RNA | |
Tian et al. | GTAMP-DTA: Graph transformer combined with attention mechanism for drug-target binding affinity prediction | |
Zhou et al. | Accurate and definite mutational effect prediction with lightweight equivariant graph neural networks | |
Hu et al. | Improving Protein-Protein Interaction Prediction Using Protein Language Model and Protein Network Features | |
Enireddy et al. | OneHotEncoding and LSTM-based deep learning models for protein secondary structure prediction | |
CN116646001A (en) | Method for predicting drug target binding based on combined cross-domain attention model | |
Wang et al. | Sparse imbalanced drug-target interaction prediction via heterogeneous data augmentation and node similarity | |
Jha et al. | Prediction of Protein-Protein Interactions Using Vision Transformer and Language Model | |
Zhijian et al. | GDGRU-DTA: predicting drug-target binding affinity based on GNN and double GRU | |
Zhang et al. | A Multi-perspective Model for Protein–Ligand-Binding Affinity Prediction | |
Zhang et al. | GANs for molecule generation in drug design and discovery | |
Halsana et al. | DensePPI: A Novel Image-Based Deep Learning Method for Prediction of Protein–Protein Interactions | |
Zhang et al. | An end-to-end method for predicting compound-protein interactions based on simplified homogeneous graph convolutional network and pre-trained language model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||