WO2021218791A1

WO2021218791A1 - Prediction method and device for ligand-protein interaction

Info

Publication number: WO2021218791A1
Application number: PCT/CN2021/089139
Authority: WO
Inventors: 蒋华良; 郑明月; 陈立凡
Original assignee: 中国科学院上海药物研究所
Priority date: 2020-04-29
Filing date: 2021-04-23
Publication date: 2021-11-04
Also published as: CN113571124A; CN113571124B

Abstract

A prediction method and device for a ligand-protein interaction. The method comprises: processing a primary sequence of a target protein to obtain a plurality of protein feature sequences consisting of feature vectors; obtaining a plurality of atom feature sequences of a target ligand on the basis of a molecular fingerprint spectrum of the target ligand; and performing prediction by using a preset prediction model on the basis of the plurality of protein feature sequences and the plurality of atom feature sequences, and obtaining the probability of the interaction between the target protein and the target ligand. When it is necessary to predict whether a certain protein can interact with a certain ligand, only the protein feature sequences of the protein and the atom feature sequences of the ligand need to be obtained; by using the prediction model, it can be predicted which amino acid segments of the protein interact with which atoms of the ligand, thus the probability of the interaction between the protein and the ligand can be calculated.

Description

Method and device for predicting ligand-protein interaction

Technical field

The present invention relates to the field of drug screening, in particular to a method and device for predicting ligand-protein interaction.

Background art

Virtual screening is an important task in early-stage drug research and development. It falls into three categories: structure-based virtual screening, ligand-based virtual screening, and chemogenomics-based virtual screening. Structure-based virtual screening requires the crystal structure of the protein, but the crystal structures of many potential target proteins have not been solved, so structure-based virtual screening cannot be applied to such targets. Ligand-based virtual screening requires a substantial amount of ligand information, and for many targets too few active small molecules have been reported to build accurate and reliable models. In addition, ligand-based virtual screening limits the discovery and design of active small molecules with novel structures. In view of the limitations of structure-based and ligand-based virtual screening, many chemogenomics-based machine learning methods have been proposed to predict ligand-protein interactions. The disadvantage of these methods is that descriptors of the proteins and small molecules must be defined manually.

Because these machine learning models require the descriptors of proteins and small molecules to be defined in advance, they cannot learn the features of proteins and small molecules from the data autonomously in an end-to-end manner; moreover, such machine learning methods are not well suited to learning from large sample sets.

In addition, existing deep learning models do not extract true interaction features, so the models are misled by statistical regularities unrelated to the task. As a result, they do not achieve good results in practical applications and cannot accurately predict the ligand-protein interaction relationship.

Summary of the invention

The purpose of the embodiments of the present invention is to provide a method and device for predicting the ligand-protein interaction, which is used to solve the problem that the ligand-protein interaction relationship cannot be accurately predicted in the prior art.

In order to solve the above technical problems, the embodiments of the present application adopt the following technical solutions: a method for predicting ligand-protein interaction, including the following steps:

Process the primary sequence of the target protein to obtain several protein feature sequences composed of feature vectors;

Obtain several atomic characteristic sequences of the target ligand based on the molecular fingerprint of the target ligand;

Prediction is performed based on the several protein feature sequences and the several atomic feature sequences using a preset prediction model to obtain the probability of the target protein interacting with the target ligand.

Optionally, the processing of the primary sequence of the target protein to obtain several protein feature sequences composed of feature vectors specifically includes:

Dividing the primary sequence of the target protein into a plurality of sequence fragments by taking consecutive predetermined numbers of amino acids as a group;

A predetermined algorithm is used to encode each of the sequence fragments, and a number of protein feature sequences composed of feature vectors corresponding to each sequence fragment are obtained.

Optionally, the acquiring several atomic characteristic sequences of the target ligand based on the molecular fingerprint of the target ligand specifically includes:

Use the chemical information package to process the SMILES molecular formula of the target ligand to obtain the molecular fingerprint of the target ligand;

A graph convolutional network is used to process the molecular fingerprint to obtain several atomic characteristic sequences of the target ligand.

Optionally, the prediction based on the several protein feature sequences and the several atomic feature sequences using a preset prediction model to obtain the probability of the target protein interacting with the target ligand specifically includes:

Using a self-attention mechanism to process the several protein feature sequences and the several atomic feature sequences to determine the target feature sequence that can interact;

A calculation is performed based on the target characteristic sequence to obtain the probability that the target protein binds to the target ligand.

Optionally, the method further includes: training to obtain the prediction model using a deep learning method, which specifically includes:

Obtain experimental data;

Determine the true value of the sample protein-sample ligand interaction based on the experimental data;

Obtain several protein characteristic sequences of the sample protein, and obtain several atomic characteristic sequences of the sample ligand;

Model training is performed based on several protein feature sequences of the sample protein, several atomic feature sequences of the sample ligand, and the true value to obtain the prediction model.

Optionally, performing model training based on several protein feature sequences of the sample protein, several atomic feature sequences of the sample ligand, and the true value to obtain the prediction model specifically includes:

Using a self-attention mechanism to process several protein feature sequences of the sample protein and several atomic feature sequences of the sample ligand to obtain several sample sequences containing interaction information;

Calculate the several sample sequences using a preset calculation formula to obtain interaction characteristics;

Use a fully connected neural network to process the interaction feature to obtain a predicted value of the sample protein-sample ligand interaction;

Calculating cross entropy based on the predicted value and the true value;

The cross entropy is used as the loss function of the prediction model, and the stochastic gradient descent method is used for training to obtain the prediction model.

In order to solve the above technical problems, the following technical solutions are adopted in the embodiments of the present application: a ligand-protein interaction prediction device, including:

The first acquisition module is used to process the primary sequence of the target protein to obtain several protein feature sequences composed of feature vectors;

The second acquisition module is used to acquire several atomic characteristic sequences of the target ligand based on the molecular fingerprint of the target ligand;

The prediction module is used to make predictions based on the several protein feature sequences and the several atomic feature sequences using a preset prediction model to obtain the probability of the target protein interacting with the target ligand.

Optionally, the first obtaining module is specifically configured to:

Optionally, the second acquisition module is specifically configured to: use a chemical information package to process the SMILES molecular formula of the target ligand to obtain the molecular fingerprint of the target ligand;

Optionally, the prediction module is specifically used for:

The beneficial effect of the embodiments of the present invention is that the prediction model is obtained through pre-training, so that when it is necessary to predict whether a certain protein and a certain ligand can interact, only the protein characteristic sequences of the protein and the atomic characteristic sequences of the ligand need to be obtained. By using the prediction model, it is possible to predict which protein characteristic sequences in the protein can interact with which atomic characteristic sequences in the ligand, so that the probability of interaction between the protein and the ligand can be calculated, making the prediction of the protein-ligand interaction more accurate.

Description of the drawings

FIG. 1 is a flowchart of a method for predicting ligand-protein interaction in an embodiment of the present invention;

FIG. 2 is a schematic diagram of the prediction of the ligand-protein interaction in an embodiment of the present invention;

FIG. 3 is a specific flowchart of obtaining an interaction feature sequence in an embodiment of the present invention;

FIG. 4 is a structural block diagram of a device for predicting ligand-protein interaction in an embodiment of the present invention.

Detailed description

Various solutions and features of the present application are described here with reference to the drawings.

It should be understood that various modifications can be made to the embodiments disclosed herein. Therefore, the above description should not be regarded as limiting, but merely as examples of embodiments. Those skilled in the art will envision other modifications within the scope and spirit of this application.

The drawings included in the specification and constituting a part of the specification illustrate the embodiments of the application and, together with the general description of the application given above and the detailed description of the embodiments given below, serve to explain the principles of the application.

These and other characteristics of the present application will become apparent from the following description of preferred forms of embodiments given as non-limiting examples with reference to the accompanying drawings.

It should also be understood that although the application has been described with reference to some specific examples, those skilled in the art can certainly implement many other equivalent forms of the application, which have the features described in the claims and therefore all fall within the scope of protection defined thereby.

When combined with the drawings, the above and other aspects, features and advantages of the present application will become more apparent in view of the following detailed description.

Hereinafter, specific embodiments of the present application will be described with reference to the accompanying drawings; however, it should be understood that the disclosed embodiments are merely examples of the present application, which can be implemented in various ways. Well-known and/or repeated functions and structures are not described in detail, to avoid obscuring the present application with unnecessary or redundant detail. Therefore, the specific structural and functional details disclosed herein are not intended to be limiting, but merely serve as a basis for the claims and as a representative basis for teaching those skilled in the art to employ the present application in a variety of ways with substantially any suitably detailed structure.

This specification may use the phrases "in one embodiment," "in another embodiment," "in yet another embodiment," or "in other embodiments," each of which may refer to one or more of the same or different embodiments in accordance with the present application.

The embodiment of the present invention provides a method for predicting ligand-protein interaction, as shown in FIG. 1, including the following steps:

Step S101: Process the primary sequence of the target protein to obtain several protein feature sequences composed of feature vectors.

In the specific implementation of this step, the word vector embedding method from natural language processing (word2vec) can be used to convert the amino acid sequence of the protein into a sequence of feature vectors, that is, to obtain several protein feature sequences p_1, p_2, ..., p_b.

Step S102: Acquire several atomic characteristic sequences of the target ligand based on the molecular fingerprint of the target ligand.

In the specific implementation of this step, the cheminformatics package RDKit can be used to encode the graph molecular fingerprint of the target ligand, and several atomic characteristic sequences c_1, c_2, ..., c_a of the target ligand can then be learned through a graph convolutional network.

Step S103: Predict using a preset prediction model based on the several protein feature sequences and the several atomic feature sequences to obtain the probability of the target protein interacting with the target ligand.

In the specific implementation of this step, after the protein characteristic sequences p_1, p_2, ..., p_b of the protein and the atomic characteristic sequences c_1, c_2, ..., c_a of the ligand are obtained, they can be encoded and decoded (within the prediction model) by the Transformer framework from natural language processing, which outputs the interacting target feature sequence x_1, x_2, ..., x_a; a calculation is then performed on the target feature sequence to obtain the probability that the target protein binds to the target ligand.

In the embodiment of the present invention, when it is necessary to predict whether a certain protein and a certain ligand can interact, only the protein characteristic sequences of the protein and the atomic characteristic sequences of the ligand need to be obtained. The prediction model can then be used to predict which protein characteristic sequences can interact with which atomic characteristic sequences, so that the probability of interaction between the protein and the ligand can be calculated.

Another embodiment of the present invention provides a method for predicting ligand-protein interaction, which includes the following steps:

Step S201: Divide the primary sequence of the target protein into a number of sequence fragments by taking a predetermined number of consecutive amino acids as a group; use a predetermined algorithm to encode each of the sequence fragments to obtain several protein feature sequences composed of the feature vectors corresponding to each sequence fragment.

In the specific implementation of this step, the amino acid sequence of the target protein can be divided into b overlapping fragments (b = amino acid length - 2) by taking every three consecutive amino acids as a group, and the word2vec algorithm can then be used to encode the b amino acid fragments into the characteristic sequences p_1, p_2, ..., p_b.
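The fragment count b = amino acid length - 2 follows from sliding a window of three residues along the sequence one position at a time. A minimal sketch of that splitting step is given below; the function name `split_into_kmers` and the example sequence are illustrative assumptions, and the subsequent word2vec encoding of each fragment is not shown.

```python
from typing import List


def split_into_kmers(sequence: str, k: int = 3) -> List[str]:
    """Return every window of k consecutive residues, sliding by one residue."""
    if len(sequence) < k:
        return []
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Hypothetical 8-residue sequence, used only for illustration.
fragments = split_into_kmers("MKTAYIAK")
print(fragments)       # ['MKT', 'KTA', 'TAY', 'AYI', 'YIA', 'IAK']
print(len(fragments))  # 6, i.e. 8 - 2
```

Each fragment would then be looked up in a trained embedding to give one feature vector per fragment.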

Step S202: Use a chemical information package to process the SMILES molecular formula of the target ligand to obtain a molecular fingerprint of the target ligand; use a graph convolutional network to process the molecular fingerprint of the target ligand to obtain several atomic characteristic sequences of the target ligand.

In the specific implementation of this step, the RDKit package can be used to process the SMILES string of the molecule, with each atom encoded as a 34-dimensional feature vector, to obtain the graph molecular fingerprint of the small molecule; the graph molecular fingerprint is then processed by a graph convolutional neural network to obtain the atomic characteristic sequences c_1, c_2, ..., c_a (a = the number of non-hydrogen atoms in the molecule).
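The source does not give the exact graph convolution formula, so the following is only a sketch of one common form: each atom's vector is averaged with those of its bonded neighbours, projected by a weight matrix, and passed through a ReLU. The hand-written atom list and bond list for ethanol stand in for what a cheminformatics package would extract from the SMILES string, and the 2-dimensional toy features replace the 34-dimensional atom encoding; all names here are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def graph_conv(features: Matrix, bonds: Sequence[Tuple[int, int]], weight: Matrix) -> Matrix:
    """One layer: average each atom with its bonded neighbours, project, apply ReLU."""
    n = len(features)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for i, j in bonds:
        adj[i][j] = adj[j][i] = 1.0
    norm_adj = [[v / sum(row) for v in row] for row in adj]  # row-normalise (mean)
    projected = matmul(matmul(norm_adj, features), weight)
    return [[max(0.0, v) for v in row] for row in projected]


# Ethanol, SMILES "CCO": heavy atoms C0-C1-O2 (hydrogens are not graph nodes).
atom_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy one-hot: [is_C, is_O]
bonds = [(0, 1), (1, 2)]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity stands in for a learned matrix

atom_sequence = graph_conv(atom_features, bonds, weight)
print(len(atom_sequence))  # 3, i.e. a = number of non-hydrogen atoms
```

The output keeps one row per non-hydrogen atom, which is why the atomic characteristic sequence has length a.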

Step S203: Use a self-attention mechanism to process the several protein feature sequences and the several atomic feature sequences to determine the target feature sequence that can interact; perform a calculation based on the target feature sequence to obtain the probability that the target protein binds to the target ligand.

In the specific implementation of this step, a preset calculation formula is applied to the target feature sequences to obtain the interaction feature, and a fully connected neural network then processes the interaction feature to obtain the predicted value (probability) of the protein-ligand interaction. More specifically, after the protein characteristic sequences p_1, p_2, ..., p_b and the atomic characteristic sequences c_1, c_2, ..., c_a of the ligand are obtained, the Transformer framework from natural language processing can be used for encoding and decoding, outputting the interacting target feature sequence x_1, x_2, ..., x_a; the preset calculation formula is then applied to the target feature sequence to obtain the interaction feature; finally, the fully connected neural network processes the interaction feature to obtain the probability of target protein-target ligand binding.

This embodiment provides a method for predicting ligand-protein interaction. Before predicting the interaction between the target protein and the target ligand, the method further includes using a deep learning method to train to obtain a prediction model. The implementation includes the following steps:

Step S301, obtaining experimental data;

Step S302: Determine the true value of the sample protein-sample ligand interaction based on the experimental data;

In the specific implementation of this step, the true value y of the interaction can be obtained from the actual experimental data and results. The true value y is either 1 or 0, where 1 indicates that the protein and the ligand interact and 0 indicates that they do not.

Step S303: Obtain several protein characteristic sequences of the sample protein, and obtain several atomic characteristic sequences of the sample ligand;

In the specific implementation of this step, the primary sequence of the sample protein can be processed to obtain several protein feature sequences composed of feature vectors. For example, taking every three consecutive amino acids as a group, the amino acid sequence of the sample protein is divided into b fragments (b = amino acid length - 2), and the word vector embedding method from natural language processing (word2vec) is then used to encode the b amino acid fragments into a set of sequences p_1, p_2, ..., p_b composed of feature vectors. This set contains the several protein feature sequences; for example, p_1 represents one protein feature sequence. Specifically, if a protein with an amino acid length of 200 is selected from the experimental data, a protein characteristic sequence with dimensions 198×100 is obtained.

When the atomic characteristic sequences of the sample ligand are obtained in this step, they can specifically be obtained based on the molecular fingerprint of the sample ligand. More specifically, the cheminformatics package RDKit can be used to process the SMILES string of the sample ligand, with each atom encoded as a 34-dimensional feature vector (as shown in Table 1), to obtain the graph molecular fingerprint of the ligand; a graph convolutional network then processes the molecular fingerprint to obtain the atomic characteristic sequences c_1, c_2, ..., c_a (a = the number of non-hydrogen atoms in the molecule) of the sample ligand. Specifically, if a sample ligand with 20 non-hydrogen atoms is selected from the experimental data, an atomic characteristic sequence with dimensions 20×64 is obtained.

Table 1

Step S304: Perform model training based on several protein feature sequences of the sample protein, several atomic feature sequences of the sample ligand, and the true value to obtain the prediction model.

In the process of specific embodiments, this step can be specifically divided into the following steps:

Step S3041, using a self-attention mechanism to process several protein feature sequences of the sample protein and several atomic feature sequences of the sample ligand, and predict and obtain several sample sequences that can interact.

More specifically, as shown in FIG. 2, the protein characteristic sequence of the sample protein, that is, p_1, p_2, ..., p_b with dimensions b×100, can be input into the encoder for encoding, which outputs the encoded sample protein characteristic sequence, that is, p_1, p_2, ..., p_b with dimensions b×64. The atomic characteristic sequence of the sample ligand, namely c_1, c_2, ..., c_a with dimensions a×64, and the encoded sample protein characteristic sequence p_1, p_2, ..., p_b with dimensions b×64 are then input to the decoder for learning. After learning by the Transformer decoder, the interaction feature sequence (that is, the several sample sequences) x_1, x_2, ..., x_a with dimensions a×64 is finally output.

Step S3042: Calculate the several sample sequences by using a preset calculation formula to obtain interaction characteristics;

In the specific implementation of this step, the following three calculation formulas are used to calculate the interaction characteristics:

Here, x'_i is the modulus of the vector x_i, α_i is the weight of the vector x_i, x_i represents the i-th interaction feature sequence, and y_interaction represents the interaction feature.
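One reading consistent with these definitions is sketched below: x'_i is taken as the Euclidean norm of x_i, the weights α_i are obtained by normalizing the norms, and y_interaction is the weighted sum of the x_i. The use of a softmax for the normalization is an assumption, since the text only states that α_i is a weight; the function name is likewise hypothetical.

```python
import math
from typing import List


def interaction_feature(xs: List[List[float]]) -> List[float]:
    """Pool a sequence of vectors into one vector, weighting each by its norm."""
    norms = [math.sqrt(sum(v * v for v in x)) for x in xs]  # x'_i = modulus of x_i
    peak = max(norms)
    exps = [math.exp(n - peak) for n in norms]  # shift by the max for stability
    total = sum(exps)
    alphas = [e / total for e in exps]          # alpha_i (assumed softmax weights)
    dim = len(xs[0])
    # y_interaction = sum_i alpha_i * x_i
    return [sum(a * x[d] for a, x in zip(alphas, xs)) for d in range(dim)]


# Two toy interaction vectors with equal norm receive equal weight.
y_interaction = interaction_feature([[3.0, 4.0], [4.0, 3.0]])
print(y_interaction)
```

Under this reading, vectors with larger norm contribute more to the pooled feature, and the result has the same dimension as each x_i regardless of the number of atoms a.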

Step S3043, using a fully connected neural network to process the interaction feature to obtain a predicted value of the sample protein-sample ligand interaction;

In this step, after the interaction feature y_interaction is obtained, y_interaction can be input to a fully connected neural network, which finally outputs the predicted value.

Step S3044: Calculate cross entropy based on the predicted value and the true value;

In this step, after the predicted value is obtained, the cross entropy between the predicted value and the true value y is calculated.
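For a binary true value y in {0, 1} and a predicted probability, the cross entropy takes the standard form -[y log(y_hat) + (1 - y) log(1 - y_hat)]. The sketch below computes it for a single sample; the symbol `y_pred` for the predicted value and the clipping constant `eps` are illustrative assumptions, not taken from the source.

```python
import math


def binary_cross_entropy(y_true: int, y_pred: float, eps: float = 1e-12) -> float:
    """Cross entropy between a 0/1 label and a predicted interaction probability."""
    y_pred = min(max(y_pred, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1.0 - y_pred))


# A confident correct prediction is penalised less than a confident wrong one.
print(binary_cross_entropy(1, 0.9))
print(binary_cross_entropy(1, 0.1))
```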

Step S3045: Use the cross entropy as the loss function of the prediction model, and train using the stochastic gradient descent method to obtain the prediction model.

Training a model with the stochastic gradient descent method is a common model training technique and is not described in detail here.
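As a reminder of the mechanics, the sketch below trains a single sigmoid output unit on toy interaction features by stochastic gradient descent with cross entropy as the loss. The single unit stands in for the fully connected network, and the toy data, learning rate, and epoch count are made-up values for illustration only.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sgd_train(samples: Sequence[Tuple[List[float], int]],
              lr: float = 0.5, epochs: int = 200, seed: int = 0):
    """Update the weights after every single sample (stochastic gradient descent)."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    b = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a fresh random order each epoch
        for idx in order:
            x, y = samples[idx]
            y_hat = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = y_hat - y  # derivative of the cross entropy w.r.t. the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


# Toy (interaction feature, true value) pairs; 1 = interacts, 0 = does not.
toy = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.0, 1.0], 0), ([0.1, 0.9], 0)]
w, b = sgd_train(toy)


def predict(x: List[float]) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


print(predict([1.0, 0.0]) > 0.5, predict([0.0, 1.0]) < 0.5)  # True True
```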

In this embodiment, when the sample protein characteristic sequence (that is, the protein characteristic sequence of the sample protein) p_1, p_2, ..., p_b with dimensions b×100 is input into the encoder for encoding and the encoded sample protein characteristic sequence is output, the encoder specifically uses the formula

$$h_l(X) = (X W_1 + s) \otimes \sigma(X W_2 + t)$$

for processing, where X ∈ R^{n×m_1} is the input of layer h_l; W_1 ∈ R^{k×m_1×m_2}, s ∈ R^{m_2}, W_2 ∈ R^{k×m_1×m_2}, and t ∈ R^{m_2} are learnable parameters; n is the length of the sequence; m_1 and m_2 are the dimensions of the input and hidden layer features; k is the size of the convolution kernel; σ is the sigmoid function; and ⊗ is the Hadamard product of matrices. Parameter settings: k = 7, m_1 = 100 (the dimension of the input layer feature), m_2 = 64 (the dimension of the hidden layer feature). That is, the input X = p_1, p_2, ..., p_b is passed through a one-dimensional convolution and gated linear unit to calculate h_l(X), the protein characteristic sequence p_1, p_2, ..., p_b is updated with the result, and the encoded protein characteristic sequence p_1, p_2, ..., p_b is finally output.
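The encoder layer described above combines a one-dimensional convolution with a gated linear unit: one convolution produces a linear signal, a second produces a gate through the sigmoid, and the two are multiplied element-wise. The sketch below assumes zero padding so that the sequence length is preserved (consistent with the b×100 input and b×64 output) and uses tiny toy dimensions in place of k = 7, m_1 = 100, m_2 = 64; function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def conv1d_same(x: Matrix, kernel: List[Matrix], bias: List[float]) -> Matrix:
    """x: n x m1; kernel: k x m1 x m2 (k odd); zero padding keeps length n."""
    n, k, m2 = len(x), len(kernel), len(bias)
    half = k // 2
    out = []
    for pos in range(n):
        row = list(bias)
        for offset in range(k):
            src = pos + offset - half
            if 0 <= src < n:  # positions outside the sequence count as zeros
                for i, value in enumerate(x[src]):
                    for j in range(m2):
                        row[j] += value * kernel[offset][i][j]
        out.append(row)
    return out


def gated_conv(x: Matrix, w1, s, w2, t) -> Matrix:
    """Gated linear unit: (conv(x, W1) + s) multiplied by sigmoid(conv(x, W2) + t)."""
    linear = conv1d_same(x, w1, s)
    gate = conv1d_same(x, w2, t)
    return [[a / (1.0 + math.exp(-g)) for a, g in zip(ra, rg)]
            for ra, rg in zip(linear, gate)]


# Toy sizes: n = 4 fragments, m1 = 2 input features, m2 = 1 hidden feature, k = 3.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
w1 = [[[1.0] for _ in range(2)] for _ in range(3)]  # all-ones kernel
w2 = [[[0.0] for _ in range(2)] for _ in range(3)]  # zero kernel, so the gate is 0.5
h = gated_conv(x, w1, [0.0], w2, [0.0])
print(h)  # [[1.0], [2.0], [1.5], [1.0]]
```

The gate lets the layer scale each output feature between 0 and 1 depending on the local sequence context.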

In this embodiment, the atomic characteristic sequence of the sample ligand (c_1, c_2, ..., c_a with dimensions a×64) and the encoded sample protein characteristic sequence (p_1, p_2, ..., p_b with dimensions b×64) are input to the decoder for learning, and the interaction feature sequence (that is, the several sample sequences) x_1, x_2, ..., x_a is output. This can be implemented as follows, using the calculation formula of the self-attention layer

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V$$

to calculate attention, where d_k represents a scaling factor, namely the dimension of the hidden layer feature, which is 64 in this embodiment, and T denotes matrix transposition. Specifically, as shown in FIG. 3, the atomic characteristic sequence of the sample ligand is first input to the self-attention layer (that is, the formula above), which calculates the attention values of the atomic feature sequence and performs weighted summation and normalization; at this point Q = K = V = c_1, c_2, ..., c_a. The result is then used as one input of the second layer (a self-attention layer), and the protein characteristic sequence is used as the other input of the second layer. The self-attention mechanism is used to calculate the attention values between the atomic characteristic sequence and the protein characteristic sequence, followed by weighted summation and normalization; at this point Q = c_1, c_2, ..., c_a and K = V = p_1, p_2, ..., p_b. Finally, the result is used as the input of the third layer (that is, it is input to the convolutional neural network) for the third weighted summation and normalization calculation, so that the interaction feature sequence (that is, the several sample sequences) x_1, x_2, ..., x_a is obtained.
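The first two decoder layers can be sketched with scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, applied first to the atoms alone and then with the atoms as queries over the protein fragments. The sketch below omits the learned projections, multiple heads, normalization layers, and the third layer, so it only illustrates how an a×d atom sequence and a b×d protein sequence yield an a×d output; the toy vectors are made up.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: one output row per query row."""
    d_k = len(k[0])
    out = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k)
                  for k_row in k]
        weights = softmax(scores)  # normalised attention over the keys
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v))
                    for j in range(len(v[0]))])
    return out


atoms = [[1.0, 0.0], [0.0, 1.0]]                # a = 2 atoms, d = 2
protein = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # b = 3 fragments, d = 2

self_attended = attention(atoms, atoms, atoms)            # Q = K = V = atoms
interaction = attention(self_attended, protein, protein)  # Q from atoms, K = V = protein
print(len(interaction), len(interaction[0]))  # 2 2
```

Because the queries come from the ligand, the output always has one row per atom, matching the a×64 interaction feature sequence in the text.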

In the embodiment of the present invention, the end-to-end deep learning model TransformerCPI achieves the best results to date on three public benchmark data sets. TransformerCPI also achieves the best results to date in label reversal experiments, with a very significant improvement over other models, which shows that the method can learn true interaction features. At the same time, because TransformerCPI has good interpretability, it can show not only which amino acid fragments in the protein have a high probability of binding to which atomic characteristic sequences in the ligand, but also which atoms (atomic characteristic sequences) in the ligand molecule contribute most to binding, thereby providing guidance and suggestions for further molecular structure modification.

Another embodiment of the present invention provides a ligand-protein interaction prediction device, as shown in Figure 4, including:

In this embodiment, the first acquisition module is specifically configured to: divide the primary sequence of the target protein into a number of sequence fragments by taking a predetermined number of consecutive amino acids as a group; and use a predetermined algorithm to encode each of the sequence fragments to obtain several protein feature sequences composed of the feature vectors corresponding to each sequence fragment.

In this embodiment, the second acquisition module is specifically configured to: use a chemical information package to process the SMILES molecular formula of the target ligand to obtain the molecular fingerprint of the target ligand; and use a graph convolutional network to process the molecular fingerprint to obtain several atomic characteristic sequences of the target ligand.

Specifically, the prediction module is configured to: use a self-attention mechanism to process the several protein feature sequences and the several atomic feature sequences to determine the target feature sequence that can interact; and perform a calculation based on the target feature sequence to obtain the probability that the target protein binds to the target ligand.

This embodiment also includes a training module for training to obtain the prediction model. The training module adopts a deep learning method to train the prediction model, and is used to:

Obtain experimental data;

In the specific implementation process, the training module is specifically used to:

Using a self-attention mechanism to process several protein feature sequences of the sample protein and several atomic feature sequences of the sample ligand, and predict to obtain several sample sequences that can interact;

Calculating cross entropy based on the predicted value and the true value;

In the embodiment of the present invention, not only can the probability of interaction between the protein and the ligand be accurately predicted, but it can also be determined which specific amino acid sequences in the protein bind to which atoms in the ligand, which provides guidance and suggestions for further molecular structure modification.

The above embodiments are only exemplary embodiments of the present invention, and are not used to limit the present invention, and the protection scope of the present invention is defined by the claims. Those skilled in the art can make various modifications or equivalent substitutions to the present invention within the essence and protection scope of the present invention, and such modifications or equivalent substitutions should also be regarded as falling within the protection scope of the present invention.

Claims

A method for predicting ligand-protein interaction, which is characterized in that it comprises the following steps:

Process the primary sequence of the target protein to obtain several protein feature sequences composed of feature vectors;

Obtain several atomic characteristic sequences of the target ligand based on the molecular fingerprint of the target ligand;

Prediction is performed based on the several protein feature sequences and the several atomic feature sequences using a preset prediction model to obtain the probability of the target protein interacting with the target ligand.
The method according to claim 1, wherein the processing the primary sequence of the target protein to obtain several protein feature sequences composed of feature vectors specifically includes:

Dividing the primary sequence of the target protein into a plurality of sequence fragments by taking consecutive predetermined numbers of amino acids as a group;

A predetermined algorithm is used to encode each of the sequence fragments, and a number of protein feature sequences composed of feature vectors corresponding to each sequence fragment are obtained.
The method according to claim 1, wherein the obtaining several atomic characteristic sequences of the target ligand based on the molecular fingerprint of the target ligand specifically comprises:

Use the chemical information package to process the SMILES molecular formula of the target ligand to obtain the molecular fingerprint of the target ligand;

A graph convolutional network is used to process the molecular fingerprint atlas to obtain several atomic characteristic sequences of the target ligand.
The method according to claim 1, wherein the performing prediction based on the several protein feature sequences and the several atomic feature sequences using a preset prediction model to obtain the probability of the target protein interacting with the target ligand specifically includes:

Using a self-attention mechanism to process the several protein feature sequences and the several atomic feature sequences to determine the target feature sequence that can interact;

A calculation is performed based on the target characteristic sequence to obtain the probability that the target protein binds to the target ligand.
The method according to claim 1, wherein the method further comprises: adopting a deep learning method to train to obtain the prediction model, which specifically comprises:

Obtain experimental data;

Determine the true value of the sample protein-sample ligand interaction based on the experimental data;

Obtain several protein characteristic sequences of the sample protein, and obtain several atomic characteristic sequences of the sample ligand;

Model training is performed based on several protein feature sequences of the sample protein, several atomic feature sequences of the sample ligand, and the true value to obtain the prediction model.
The method according to claim 5, wherein the performing model training based on several protein feature sequences of the sample protein, several atomic feature sequences of the sample ligand, and the true value to obtain the prediction model specifically includes:

Using a self-attention mechanism to process several protein feature sequences of the sample protein and several atomic feature sequences of the sample ligand to obtain several sample sequences containing interaction information;

Calculate the several sample sequences using a preset calculation formula to obtain interaction characteristics;

Use a fully connected neural network to process the interaction feature to obtain a predicted value of the sample protein-sample ligand interaction;

Calculating cross entropy based on the predicted value and the true value;

The cross entropy is used as the loss function of the prediction model, and the stochastic gradient descent method is used for training to obtain the prediction model.
A prediction device for ligand-protein interaction, which is characterized in that it comprises:

The first acquisition module is used to process the primary sequence of the target protein to obtain several protein feature sequences composed of feature vectors;

The second acquisition module is used to acquire several atomic characteristic sequences of the target ligand based on the molecular fingerprint of the target ligand;

The prediction module is used to make predictions based on the several protein feature sequences and the several atomic feature sequences using a preset prediction model to obtain the probability of the target protein interacting with the target ligand.
The device according to claim 7, wherein the first obtaining module is specifically configured to:

Dividing the primary sequence of the target protein into a plurality of sequence fragments by taking consecutive predetermined numbers of amino acids as a group;

A predetermined algorithm is used to encode each of the sequence fragments, and a number of protein feature sequences composed of feature vectors corresponding to each sequence fragment are obtained.
The device according to claim 7, wherein the second acquisition module is specifically configured to: use a chemical information package to process the SMILES molecular formula of the target ligand to obtain the molecular fingerprint of the target ligand;

A graph convolutional network is used to process the molecular fingerprint atlas to obtain several atomic characteristic sequences of the target ligand.
The device according to claim 7, wherein the prediction module is specifically configured to:

Using a self-attention mechanism to process the several protein feature sequences and the several atomic feature sequences to determine the target feature sequence that can interact;

A calculation is performed based on the target characteristic sequence to obtain the probability that the target protein binds to the target ligand.