CN117174162A - Method for predicting protein distance map, storage medium and electronic equipment - Google Patents


Publication number
CN117174162A
Authority
CN
China
Prior art keywords: attention, map, module, distance, result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311155995.4A
Other languages
Chinese (zh)
Inventor
黄嘉健
陈钦畅
陈广勇
陈觅
段力文
王威
唐进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202311155995.4A
Publication of CN117174162A
Pending legal-status Critical Current

Abstract

In the method, storage medium, and electronic device for predicting a protein distance map provided in the present specification, a first attention map is generated from a confirmed protein residue sequence. A second attention map is determined by weighting, through a triangle attention module, the residues that satisfy the triangle inequality. A residual mixing module then weights the local features of residues having specified structural relationships. Finally, a restoration module processes the output of the residual mixing module and superimposes it on the first attention map to highlight the contrast, and a regression prediction module outputs the distance map after size transformation. By determining the residues that satisfy the triangle constraint and highlighting the specified secondary and super-secondary structures in the distance map, the problem that predicting inter-residue distances in isolation yields invalid results is solved, thereby improving the efficiency of protein synthesis.

Description

Method for predicting protein distance map, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of bioinformatics, and more particularly, to a method, a storage medium, and an electronic device for predicting a protein distance map.
Background
In recent years, with the development of bioinformatics, technologies for artificially synthesizing proteins have also developed rapidly. Proteins are the major contributors to vital activities, performing many important biological functions such as catalytic regulation, immune response, and cell signaling. Since the biological function of a protein is determined by its specific three-dimensional structure, an accurate grasp of protein structure is important for understanding protein function; for example, it plays an important role in drug development, vaccine therapy, and the study of biological functions.
In the prior art, the three-dimensional structure of a protein is generally obtained by predicting the distances between residues of the protein sequence. However, a protein structure predicted in this way does not necessarily satisfy the inherent characteristics of real protein structures, so the predicted protein cannot be synthesized. In particular, if inter-residue distances are simply predicted one pair at a time, some groups of residues will violate the triangle inequality that the distances among any group of residues must satisfy.
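As a minimal illustration of this constraint (not part of the patented method; the function name and representation are for exposition only), a predicted inter-residue distance matrix can be screened for triangle-inequality violations:

```python
def triangle_violations(dist, tol=1e-6):
    """Return (i, j, k) triples whose pairwise distances violate the
    triangle inequality d(i,k) <= d(i,j) + d(j,k).

    `dist` is a symmetric n x n matrix (list of lists) of predicted
    inter-residue distances; a small tolerance absorbs float error.
    Illustrative sketch only, not the patent's implementation."""
    n = len(dist)
    bad = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if i != j and j != k and i != k:
                    if dist[i][k] > dist[i][j] + dist[j][k] + tol:
                        bad.append((i, j, k))
    return bad
```

A matrix with all pairwise distances equal passes; a matrix in which one pair is predicted far apart while both residues are predicted close to a common third residue is flagged.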
Therefore, existing methods for predicting protein structure ignore the structural characteristics of proteins and cannot predict protein structure accurately. A distance map (distance map) of protein residues, in addition to showing the residues that satisfy the triangle inequality, displays local secondary structures of the protein, from which a more accurate and intuitive three-dimensional protein structure can be obtained by observation. To this end, the present specification provides a method of predicting a protein distance map.
Disclosure of Invention
The present disclosure provides a method, a storage medium and an electronic device for predicting a protein distance map, so as to partially solve the above-mentioned problems in the prior art.
The technical scheme adopted in the specification is as follows:
a method of predicting a protein distance map, comprising:
determining the residue sequence of the target protein;
inputting the residue sequence into a trained distance map prediction model, and carrying out attention weighting on the residue sequence through an attention sub-network of the distance map prediction model to determine a first attention map corresponding to the residue sequence;
the first attention map is subjected to first preprocessing, a first preprocessing result is input into a triangular attention module of the distance map prediction model, the distance between residues is predicted, the residues with the distances meeting the triangular distance constraint relation among the residues are determined, attention weighting is carried out according to the determined residues meeting the triangular distance constraint relation, and a second attention map corresponding to the first attention map is obtained;
performing second preprocessing on the second attention map, inputting a second preprocessing result into a residual mixing module of the distance map prediction model, filtering by the residual mixing module, performing size adjustment on a filtering result according to the second attention map, and performing local feature weighting on residues with specified structural relations in the adjusted filtering result to obtain a third attention map corresponding to the second attention map;
inputting the first attention map and the third attention map into a residual convolution module of the distance map prediction model, performing transposed convolution on the third attention map through the residual convolution module, resizing the convolution result according to the first attention map, and superimposing the adjusted result on the first attention map to obtain a fourth attention map corresponding to the third attention map;
and inputting the fourth attention map into a regression prediction module of the distance map prediction model, and normalizing the fourth attention map through the regression prediction module to obtain a protein distance map corresponding to the fourth attention map.
Optionally, performing the first preprocessing on the first attention map specifically includes:
inputting the first attention map into a feed-forward convolution module of the distance map prediction model, and reducing the size of the first attention map through a pooling layer of the feed-forward convolution module;
and adjusting, through a convolution layer of the feed-forward convolution module, the dimension of the reduced result according to the dimension of the attention map input to the triangular attention module of the distance map prediction model, and determining the adjusted result as the first preprocessing result.
Optionally, performing the second preprocessing on the second attention map specifically includes:
inputting the second attention map into a feed-forward mixing module of the distance map prediction model, and compressing the second attention map through a lightweight layer of the feed-forward mixing module, to facilitate the local feature weighting performed by the residual mixing module of the distance map prediction model;
reducing the size of the compressed second attention map through a pooling layer of the feed-forward mixing module;
and adjusting, through a convolution layer of the feed-forward mixing module, the dimension of the reduced result according to the dimension of the attention map input to the residual mixing module of the distance map prediction model, and determining the adjusted result as the second preprocessing result.
Optionally, filtering by the residual mixing module, and resizing the filtering result according to the second attention map specifically includes:
reducing the dimension of the second preprocessing result and increasing the size of the second preprocessing result through the transposition layer of the residual mixing module;
adjusting, through a filling layer of the residual mixing module, the size of the transpose-layer output according to the difference between that output and the second attention map, until it matches the size of the second attention map;
and superimposing, through a superposition layer of the residual mixing module, the adjusted result on the second attention map, to serve as the adjusted filtering result.
Optionally, inputting the first attention map and the third attention map into the residual convolution module of the distance map prediction model, performing transposed convolution on the third attention map through the residual convolution module, resizing the result according to the first attention map, and superimposing the adjusted result on the first attention map to obtain the fourth attention map specifically includes:
inputting the third attention map into the residual convolution module of the distance map prediction model, and adjusting the dimension of the third attention map according to the dimension of the first attention map through a transposition layer of the residual convolution module;
adjusting the dimension-adjusted result through a filling layer of the residual convolution module, according to the dimension difference between that result and the first attention map and the dimension of the first attention map;
superimposing the resized result on the first attention map through a superposition layer of the residual convolution module;
and convolving the superposition result through a convolution layer of the residual convolution module, and determining the convolution result as the fourth attention map corresponding to the third attention map.
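The restoration path above upsamples the third attention map back toward the spatial size of the first attention map before the superposition. The standard size arithmetic can be sketched as follows; the kernel, stride, and padding values are assumptions chosen for illustration, since the claim does not state them:

```python
def transpose_conv_out(size, kernel=2, stride=2, padding=0):
    """Spatial size after a transposed convolution; with kernel=2 and
    stride=2 (assumed values) the spatial size doubles, e.g. 255 -> 510."""
    return (size - 1) * stride - 2 * padding + kernel

def pad_to(size, target):
    """Total padding the filling layer must add so the upsampled map
    matches the first attention map's size before superposition."""
    return max(target - size, 0)
```

For example, upsampling a 255x255 map with these assumed values yields 510x510, so no further padding is needed before adding it to a 510x510 first attention map.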
Optionally, before performing the first preprocessing on the first attention map, the method further includes:
inputting the first attention map into a group pooling module of the distance map prediction model, and dividing, through the group pooling module, the first attention map into a preset number of sub-attention maps;
extracting the maximum value in each sub-attention map, and determining a sub-feature map corresponding to each sub-attention map;
and splicing the sub-feature maps to obtain a first attention map whose channel dimension equals the preset number.
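The group pooling steps above can be sketched as follows; the function name and plain-list representation are illustrative only. If, for instance, a 1440-channel first attention map is divided into 36 groups, each sub-feature map keeps the element-wise maximum over 40 channels:

```python
def group_max_pool(maps, num_groups):
    """Split the channel dimension of an attention tensor into
    `num_groups` equal groups and keep the element-wise maximum of
    each group, splicing the results along the channel axis.

    `maps` is a list of C channel matrices (each an n x n list of
    lists); C must be divisible by num_groups. Sketch only, not the
    patent's implementation."""
    c = len(maps)
    assert c % num_groups == 0
    size = c // num_groups
    n = len(maps[0])
    pooled = []
    for g in range(num_groups):
        group = maps[g * size:(g + 1) * size]
        pooled.append([[max(ch[r][col] for ch in group)
                        for col in range(n)] for r in range(n)])
    return pooled
```

With four 2x2 channels and two groups, the result is two channels, each the element-wise maximum of its pair.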
Optionally, after obtaining the protein distance map corresponding to the fourth attention map, the method further includes:
inputting the output of the regression prediction module into a size adjustment module of the distance map prediction model, and scaling the attention-map values of that output as a whole through the size adjustment module, to obtain a protein residue distance map that is convenient to observe.
The specification provides a model training method for predicting a protein distance map, comprising the following steps:
determining the residue sequence of the target protein;
inputting the residue sequence into a trained distance map prediction model, and carrying out attention weighting on the residue sequence through an attention sub-network of the distance map prediction model to determine a first attention map corresponding to the residue sequence;
The first attention map is subjected to first preprocessing, a first preprocessing result is input into a triangular attention module of the distance map prediction model, the distance between residues is predicted, the residues with the distances meeting the triangular distance constraint relation among the residues are determined, attention weighting is carried out according to the determined residues meeting the triangular distance constraint relation, and a second attention map corresponding to the first attention map is obtained;
performing second preprocessing on the second attention map, inputting a second preprocessing result into a residual mixing module of the distance map prediction model, filtering by the residual mixing module, performing size adjustment on a filtering result according to the second attention map, and performing local feature weighting on residues with specified structural relations in the adjusted filtering result to obtain a third attention map corresponding to the second attention map;
inputting the first attention map and the third attention map into a residual convolution module of the distance map prediction model, performing transposed convolution on the third attention map through the residual convolution module, resizing the convolution result according to the first attention map, and superimposing the adjusted result on the first attention map to obtain a fourth attention map corresponding to the third attention map;
The fourth attention map is input into a regression prediction module of the distance map prediction model, and the protein distance map corresponding to the fourth attention map is obtained through normalization of the fourth attention map by the regression prediction module;
performing supervised learning on a regression prediction module of the distance map prediction model by using a label of a protein distance map training set, and updating model parameters by adopting a preset optimization algorithm according to a calculated loss value;
iterating the above training process and counting the current number of iterations; and stopping iteration when the current number of iterations reaches a preset number, to obtain the target model.
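The supervised regression step above requires a loss between the predicted and labelled distance maps. The claim does not name the loss function, so the mean-squared error below is an assumed, illustrative choice, together with the iteration-count stopping rule:

```python
def mse_loss(pred, label):
    """Mean-squared error between a predicted and a ground-truth
    distance map (both n x n lists of lists); one plausible regression
    loss, assumed here since the text does not specify one."""
    n = len(pred)
    total = sum((pred[i][j] - label[i][j]) ** 2
                for i in range(n) for j in range(n))
    return total / (n * n)

def train(step, preset_iterations):
    """Run one training step per iteration and stop once the preset
    iteration count is reached; `step` is any callable returning the
    loss for that iteration (a placeholder, not the patent's code)."""
    return [step(it) for it in range(preset_iterations)]
```

For example, a uniform error of 2 over every entry of a 2x2 map gives an MSE of 4.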
Optionally, before the first preprocessing is performed on the first attention map, the training method further includes:
inputting the first attention map into a group pooling module of the distance map prediction model, and dividing, through the group pooling module, the first attention map into a preset number of sub-attention maps;
extracting the maximum value in each sub-attention map, and determining a sub-feature map corresponding to each sub-attention map;
and splicing the sub-feature maps to obtain a first attention map whose channel dimension equals the preset number.
The present specification provides an apparatus for predicting a protein distance map, comprising:
A sequence acquisition module for determining the residue sequence of the target protein;
the sequence confirming module inputs the residue sequence into a trained distance map prediction model, and performs attention weighting on the residue sequence through an attention sub-network of the distance map prediction model to determine a first attention map corresponding to the residue sequence;
the triangle attention module is used for carrying out first preprocessing on the first attention map, inputting a first preprocessing result into the triangle attention module of the distance map prediction model, predicting the distance between residues, determining each residue of which the distance between residues meets a triangle distance constraint relation, carrying out attention weighting according to the determined residues meeting the triangle distance constraint relation, and obtaining a second attention map corresponding to the first attention map;
the local structure attention module is used for performing the second preprocessing on the second attention map, inputting the second preprocessing result into a residual mixing module of the distance map prediction model, filtering through the residual mixing module, resizing the filtering result according to the second attention map, and performing local feature weighting on the residues having specified structural relationships in the adjusted filtering result, to obtain a third attention map corresponding to the second attention map;
the attention map restoration module inputs the first attention map and the third attention map into a residual convolution module of the distance map prediction model, performs transposed convolution on the third attention map through the residual convolution module, resizes the convolution result according to the first attention map, and superimposes the adjusted result on the first attention map to obtain a fourth attention map corresponding to the third attention map;
and the distance map output module is used for inputting the fourth attention map into the regression prediction module of the distance map prediction model, and normalizing the fourth attention map through the regression prediction module to obtain a protein distance map corresponding to the fourth attention map.
The present specification provides an apparatus for training a model for predicting a protein distance map, comprising:
the sequence acquisition training module is used for determining the residue sequence of the target protein;
the sequence confirmation training module inputs the residue sequence into a trained distance map prediction model, and performs attention weighting on the residue sequence through an attention sub-network of the distance map prediction model to determine a first attention map corresponding to the residue sequence;
The triangular attention training module is used for carrying out first preprocessing on the first attention map, inputting a first preprocessing result into the triangular attention module of the distance map prediction model, predicting the distance between residues, determining the residues with the distance conforming to the triangular distance constraint relation, and carrying out attention weighting according to the determined residues meeting the triangular distance constraint relation to obtain a second attention map corresponding to the first attention map;
the local structure attention training module is used for performing the second preprocessing on the second attention map, inputting the second preprocessing result into a residual mixing module of the distance map prediction model, filtering through the residual mixing module, resizing the filtering result according to the second attention map, and performing local feature weighting on the residues having specified structural relationships in the adjusted filtering result, to obtain a third attention map corresponding to the second attention map;
the attention map restoration training module inputs the first attention map and the third attention map into a residual convolution module of the distance map prediction model, performs transposed convolution on the third attention map through the residual convolution module, resizes the convolution result according to the first attention map, and superimposes the adjusted result on the first attention map to obtain a fourth attention map corresponding to the third attention map;
The distance map output training module is used for inputting the fourth attention map into the regression prediction module of the distance map prediction model, and normalizing the fourth attention map through the regression prediction module to obtain a protein distance map corresponding to the fourth attention map;
the parameter updating training module is used for performing supervised learning on the regression prediction module of the distance map prediction model by using the label of the protein distance map training set, and updating model parameters by adopting a preset optimization algorithm according to the calculated loss value;
the iterative training module is used for iterating the above training process and counting the current number of iterations, and stopping iteration when the current number of iterations reaches a preset number, to obtain the target model.
The present specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method of predicting a protein distance map.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a method of predicting a protein distance map when executing the program.
At least one of the technical solutions adopted in the present specification can achieve the following beneficial effects:
In the method of predicting a protein distance map provided herein, a first attention map is generated from a confirmed protein residue sequence. A second attention map is determined by weighting, through a triangle attention module, the residues that satisfy the triangle inequality. A residual mixing module then weights the local features of residues having specified structural relationships. Finally, a restoration module processes the output of the residual mixing module and superimposes it on the first attention map to highlight the contrast, and a regression prediction module outputs the distance map after size transformation.
As can be seen from the above method, the residues in the protein distance map that satisfy the triangle inequality are emphasized, which helps the relevant professionals determine whether the predicted protein can be synthesized. The residues forming specified structures are also emphasized in the distance map: the secondary structure reveals, for example, the helical structure of the protein, and the super-secondary structure allows the specific function of the protein to be inferred. This solves the problem that predicting inter-residue distances in isolation yields invalid results, thereby improving the efficiency of protein synthesis.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification, illustrate the exemplary embodiments of the present specification and, together with their description, serve to explain the specification without unduly limiting it. In the drawings:
FIG. 1 is a schematic flow chart of a method for predicting a protein distance map provided in the present specification;
FIG. 2 is a schematic diagram of a feedforward convolution module of a predicted protein distance map according to the present disclosure;
FIG. 3 is a flow chart of an example of a method for predicting a protein distance map provided herein;
FIG. 4 is a schematic diagram of a feed-forward hybrid module of a predicted protein distance map provided in the present disclosure;
FIG. 5 is a schematic diagram of a residual mixing module of a predicted protein distance map provided in the present specification;
FIG. 6 is a schematic diagram of a residual convolution module of a predicted protein distance map provided in the present disclosure;
FIG. 7 is a schematic diagram of a model training process for predicting protein distance maps provided in the present disclosure;
FIG. 8 is a schematic flow chart of an apparatus for predicting protein distance map provided in the present specification;
FIG. 9 is a schematic flow chart of an apparatus for model training of a predicted protein distance map provided in the present specification;
fig. 10 is a schematic view of an electronic device corresponding to fig. 1 provided in the present specification.
Detailed Description
At present, when the three-dimensional structure of a protein is predicted, the prediction method does not respect the inherent characteristics of protein structure, so the predicted protein cannot be synthesized. If inter-residue distances are simply predicted one pair at a time, some groups of residues will violate the triangle inequality that true inter-residue distances must satisfy. Such inter-residue distances are predictable, but a predicted protein should at least be synthesizable and possess local structure. A synthesizable protein has a three-dimensional structure, so each group of its residues must satisfy the triangle inequality before the protein can be synthesized normally. On top of satisfying the triangle inequality, a synthesized protein has local structure, so that the functions of the residues of the predicted protein can be better reflected.
Therefore, the existing method for predicting the protein structure ignores the structural characteristics of the protein and cannot accurately predict the protein structure.
The protein distance map predicted by the present application can be used to analyze and solve the above problems. A protein distance map is a two-dimensional matrix in which each value represents a residue-to-residue distance; it contains rich structural information, including secondary structure, inter-residue distances, and the types of interaction associated with particular residue types. In protein design, the distance map helps assess the feasibility of generating a tertiary structure for a given protein sequence. A high-quality distance map allows a two-stage protein structure prediction algorithm to approach the accuracy of a single-stage algorithm, while requiring far fewer computational resources. In target protein mining, candidate active sites or candidate proteins are screened by identifying whether candidate active sites lie on specific secondary structures and whether the corresponding distance relationships are satisfied. Protein distance maps are also the data material used by protein domain segmentation algorithms, which follow the basic principle that contacts are dense within a domain and sparse between domains.
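As a minimal sketch (not part of the disclosure), a ground-truth distance map of the kind described above can be computed from three-dimensional residue coordinates, for example C-alpha positions:

```python
import math

def distance_map(coords):
    """Compute the n x n distance map from a list of (x, y, z) residue
    coordinates (e.g., C-alpha atoms); entry [i][j] is the Euclidean
    distance between residues i and j. Illustrative sketch only."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]
```

The resulting matrix is symmetric with a zero diagonal, which is what the triangle and local-structure checks in the model operate on.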
Thus, the present specification provides a method of predicting a protein distance map to solve the above-mentioned problems.
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present application based on the embodiments herein.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of the method for predicting a protein distance map provided in the present specification, which specifically includes the following steps:
s101: determining the residue sequence of the target protein.
In the embodiments herein, the protein residue sequence is usually selected as the basic data for predicting the protein structure, because the arrangement of the residue sequence has an important influence on protein function.
Specifically, the protein distance map in this specification may be predicted using a computer or a computing cluster, which is not limited herein. For convenience of description, the executing device is hereinafter referred to as a server.
In the present specification, the server determines a protein whose distance map needs to be predicted as the target protein, and acquires the residue sequence of the target protein. The server may determine the residue sequence of the target protein through manual compilation and input, or, for example, by obtaining it from an existing protein residue sequence library (such as UniRef50, UniRef90, or UniRef100); the specific acquisition method is not limited in this specification.
For example, take the dataset data_15051, which consists of 15051 protein sequences. In the present embodiment, the server makes its prediction with a protein sequence of length 510 from data_15051.
S103: inputting the residue sequence into a trained distance map prediction model, and carrying out attention weighting on the residue sequence through an attention sub-network of the distance map prediction model to determine a first attention map corresponding to the residue sequence.
In the embodiments herein, the selected protein residue sequence needs to be extracted and processed into an attention map in order to analyze the protein constituted by the selected sequence.
Specifically, there are many ways for the server to extract the selected protein sequence from the dataset and process it into an attention map, including the ESM (self-supervised protein language model) series; the present application is not limited in this regard.
For example, the server uses ESM-2 to extract an attention map for the protein sequence selected from data_15051. ESM-2 is composed of several encoder layers, a common Transformer model structure, and outputs a first attention map with dimensions {1440, 510, 510}.
S105: and carrying out first preprocessing on the first attention map, inputting a first preprocessing result into a triangular attention module of the distance map prediction model, predicting the distance between residues, determining residues with the distance meeting a triangular distance constraint relation, and carrying out attention weighting according to the determined residues meeting the triangular distance constraint relation to obtain a second attention map corresponding to the first attention map.
In the present embodiment, in order to make it easy to judge from the distance map whether a predicted protein can be synthesized, an existing triangular attention model for protein synthesis is used to enhance the residues satisfying the triangular distance constraint, and the result is represented in the second attention map. However, the triangular attention model imposes requirements on the dimensions of its input attention map, so the first attention map is preprocessed first to meet this requirement.
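The following sketch conveys only the idea of triangle-constraint-based weighting; the actual triangular attention module is a learned network, and the hard up-weighting rule and all names below are assumptions for illustration:

```python
def triangle_consistency_weights(dist, boost=2.0, tol=1e-6):
    """Up-weight entries (i, j) whose predicted distance is consistent
    with the triangle constraint through every intermediate residue k,
    i.e. d(i,j) <= d(i,k) + d(k,j); other entries keep weight 1.0.

    `dist` is a symmetric n x n matrix; the fixed `boost` factor is an
    illustrative stand-in for a learned attention weight."""
    n = len(dist)
    w = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            ok = all(dist[i][j] <= dist[i][k] + dist[k][j] + tol
                     for k in range(n) if k != i and k != j)
            if ok:
                w[i][j] = boost
    return w
```

Entries whose distances violate the constraint through some intermediate residue are left unboosted, so the consistent pairs stand out in the resulting map.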
Specifically, the server inputs the first attention map to a feedforward convolution module of the distance map prediction model, reduces the size of the first attention map through the feedforward convolution module, adjusts the dimension of a reduced result according to the dimension of the triangle attention map input attention map of the distance map prediction model, and determines the adjustment result as a first preprocessing result.
For example, the server runs the feed-forward convolution module shown in FIG. 2, which is formed by connecting a max pooling layer and a two-layer convolutional neural network in series. As shown in FIG. 3, the window size of the max pooling layer is 2×2; its input is the first attention map with dimensions {36, 510, 510}, and its output is an attention map with dimensions {36, 255, 255}. The two-layer convolutional neural network is formed by connecting two convolution layers in series, where each convolution layer consists of a filter with a 3×3 convolution kernel, an instance normalization layer, and a ReLU activation function connected in series. The output of the first convolution layer is an attention map with dimensions {36, 255, 255}, and the output of the second convolution layer is an attention map with dimensions {48, 255, 255}. The attention map output by the feed-forward convolution module is input to the triangular attention module, whose output is the second attention map with dimensions {48, 255, 255}.
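The dimensions quoted above can be checked with the standard output-size formulas for pooling and convolution; the stride and padding values below are assumptions chosen to be consistent with the stated shapes, since the text does not give them explicitly:

```python
def maxpool_out(size, window=2, stride=2):
    """Spatial size after max pooling (assuming stride equals the
    window size, so 510 -> 255)."""
    return (size - window) // stride + 1

def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution; padding=1 with a 3x3 kernel
    preserves the spatial size, matching the {.., 255, 255} shapes
    quoted above (stride and padding are assumed values)."""
    return (size - kernel + 2 * padding) // stride + 1
```

With these values, a 510x510 map pools to 255x255, and both convolution layers keep the 255x255 spatial size while only the channel count changes from 36 to 48.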
S107: and performing second preprocessing on the second attention map, inputting a second preprocessing result into a residual mixing module of the distance map prediction model, filtering by the residual mixing module, performing size adjustment on a filtering result according to the second attention map, and performing local feature weighting on residues with specified structural relations in the adjusted filtering result to obtain a third attention map corresponding to the second attention map.
In the embodiments herein, in order to make it easy to analyze from the distance map which functions the predicted protein can perform, the weight of protein residues satisfying a specified structure is enhanced using an existing local lightweight model and is represented in the third attention map. However, the existing local lightweight model imposes requirements on the dimensions of its input attention map, so the second attention map is preprocessed to meet those requirements.
Specifically, the server inputs the second attention map into the feedforward mixing module of the distance map prediction model, which compresses it so that the residual mixing module of the distance map prediction model can perform local feature weighting. The compressed second attention map is reduced in size. The dimensions of the reduced result are then adjusted according to the input dimensions required by the residual mixing module of the distance map prediction model, and the adjustment result is determined as the second preprocessing result.
For example, the server runs the feedforward mixing module, which, as shown in FIG. 4, consists of a local lightweight module, a max pooling layer, and a double-layer convolutional neural network connected in series. The output of the local lightweight module is an attention map with dimensions {48,255,255}. The window of the max pooling layer is 2×2; as shown in FIG. 3, its output is an attention map with dimensions {48,127,127}. The double-layer convolutional neural network is formed by connecting two convolution layers in series, where each convolution layer consists of a filter with a 3×3 convolution kernel, an instance normalization layer, and a ReLU activation function. The output of the first convolution layer is an attention map with dimensions {48,127,127}, and the output of the second convolution layer is an attention map with dimensions {128,127,127}.
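A comparable PyTorch sketch of the feedforward mixing module follows. The patent does not detail the internals of the local lightweight module, so a depthwise separable convolution is used here purely as a hedged stand-in:

```python
# Illustrative PyTorch sketch of the feedforward mixing module: a stand-in
# "local lightweight" block (depthwise separable conv -- an assumption, the
# patent does not specify its internals), a 2x2 max pooling layer, and two
# (3x3 conv -> InstanceNorm -> ReLU) layers lifting channels 48 -> 128.
import torch
import torch.nn as nn

class FeedForwardMix(nn.Module):
    def __init__(self, ch=48, out_ch=128):
        super().__init__()
        self.light = nn.Sequential(            # assumed lightweight block
            nn.Conv2d(ch, ch, 3, padding=1, groups=ch),  # depthwise
            nn.Conv2d(ch, ch, 1),                        # pointwise
        )
        self.pool = nn.MaxPool2d(2)            # odd sizes floor: 255 -> 127
        self.convs = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, out_ch, 3, padding=1), nn.InstanceNorm2d(out_ch), nn.ReLU(),
        )

    def forward(self, x):
        return self.convs(self.pool(self.light(x)))

# small stand-in for the {48,255,255} second attention map
mixed = FeedForwardMix()(torch.randn(1, 48, 63, 63))
```

Note how an odd input size is floored by the pooling layer (255 → 127 in the example, 63 → 31 here), which is exactly why the later transposed convolution needs the filling module.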
For example, the server runs the residual mixing module, which, as shown in FIG. 5, is formed by connecting a transposed convolution layer, a filling module, a superposition module, a local lightweight module, and a double-layer convolutional neural network in series. It performs the following procedure:
the server runs the transposed convolution layer, which consists of a 2×2 transposed filter; its input is the output of the feedforward mixing module, and, as shown in FIG. 3, the output of the transposed filter is an attention map with dimensions {48,254,254}. The filling module computes the size difference between the transposed filter output and the triangular attention module output, enlarges the second and third dimensions of the transposed filter output by 1 through zero padding, and finally outputs an attention map with dimensions {48,255,255}. The superposition module adds the filling module output to the triangular attention module output to obtain an attention map with dimensions {48,255,255}. The local lightweight module outputs an attention map with dimensions {48,255,255}, which serves as the input to the double-layer convolutional neural network. The double-layer convolutional neural network is formed by connecting two convolution layers in series, where each convolution layer is formed by connecting a filter with a 3×3 convolution kernel, an instance normalization layer, and a ReLU activation function in series. The output of the first convolution layer is an attention map with dimensions {48,255,255}, and the output of the second convolution layer is an attention map with dimensions {48,255,255}, which serves as the input to the residual convolution module.
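The upsample-fill-superpose sequence above can be sketched as follows. This is an illustrative reconstruction: the padding sides and the lightweight block are assumptions not stated in the text.

```python
# Illustrative PyTorch sketch of the residual mixing module: a 2x2 transposed
# convolution, zero padding up to the skip tensor's size, superposition with
# the triangular attention output, a stand-in lightweight block, and two
# (3x3 conv -> InstanceNorm -> ReLU) layers.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualMix(nn.Module):
    def __init__(self, in_ch=128, ch=48):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, ch, kernel_size=2, stride=2)
        self.light = nn.Sequential(            # assumed lightweight block
            nn.Conv2d(ch, ch, 3, padding=1, groups=ch), nn.Conv2d(ch, ch, 1))
        self.convs = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU())

    def forward(self, x, skip):
        x = self.up(x)                         # e.g. {128,127,127} -> {48,254,254}
        dh = skip.shape[-2] - x.shape[-2]      # filling module: measure the gap
        dw = skip.shape[-1] - x.shape[-1]
        x = F.pad(x, (0, dw, 0, dh))           # zero-fill to the skip's size
        return self.convs(self.light(x + skip))  # superpose, weight, convolve

# small stand-ins: mixing output {128,15,15} and triangular output {48,31,31}
out = ResidualMix()(torch.randn(1, 128, 15, 15), torch.randn(1, 48, 31, 31))
```

With the example's full sizes, the same flow turns a {128,127,127} input into a {48,255,255} third attention map.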
S109: inputting the first attention map and the third attention map into the residual convolution module of the distance map prediction model, performing a transposed convolution on the third attention map through the residual convolution module, resizing the result according to the first attention map, and superimposing the adjusted result on the first attention map to obtain a fourth attention map corresponding to the third attention map.
In the embodiment of the present specification, because the attention map was transformed to the dimensions required by the triangular attention module and the local lightweight module, the third attention map in its current form is easy to compute with but hard to analyze and compare directly. The third attention map is therefore restored, and it is aligned with and superimposed on the first attention map, so that the enhanced contrast of the fourth attention map facilitates observation and analysis.
Specifically, the server inputs the third attention map into the residual convolution module of the distance map prediction model, which adjusts the dimensions of the third attention map according to those of the first attention map, resizes that result according to its size difference from the first attention map, superimposes the resized result on the first attention map, convolves the superposition, and determines the convolution result as the fourth attention map corresponding to the third attention map.
For example, as shown in FIG. 6, the residual convolution module run by the server is formed by connecting a transposed convolution layer, a filling module, a superposition module, and a double-layer convolutional neural network in series, and performs the following procedure:
the transposed convolution layer run by the server consists of a transposed convolution with a 2×2 kernel; as shown in FIG. 3, its output is an attention map with dimensions {36,510,510}, which serves as the input to the filling module. The filling module computes the size difference between the transposed convolution output and the first attention map in the second and third dimensions; since the difference here is zero, it directly outputs the resized third attention map. The superposition module adds the filling module output to the first attention map to obtain an attention map with dimensions {36,510,510}, which serves as the input to the double-layer convolutional neural network. The double-layer convolutional neural network is formed by connecting two convolution layers in series, where each convolution layer consists of a filter with a 3×3 convolution kernel, an instance normalization layer, and a ReLU activation function. The output of the first convolution layer is an attention map with dimensions {36,510,510}, and the output of the second convolution layer is the fourth attention map with dimensions {36,510,510}.
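The restoration step can be sketched the same way. Again this is an illustrative reconstruction; any detail beyond the text (padding sides, "same" convolutions) is an assumption.

```python
# Illustrative PyTorch sketch of the residual convolution module: a 2x2
# transposed convolution restoring channels 48 -> 36 and doubling the spatial
# size, a filling step (a no-op when the sizes already match, as in the
# example), superposition with the first attention map, and two
# (3x3 conv -> InstanceNorm -> ReLU) layers.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualConv(nn.Module):
    def __init__(self, in_ch=48, ch=36):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, ch, kernel_size=2, stride=2)
        self.convs = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU())

    def forward(self, x, first_map):
        x = self.up(x)                         # e.g. {48,255,255} -> {36,510,510}
        dh = first_map.shape[-2] - x.shape[-2]
        dw = first_map.shape[-1] - x.shape[-1]
        x = F.pad(x, (0, dw, 0, dh))           # zero here when sizes match
        return self.convs(x + first_map)       # superpose, then convolve

# small stand-ins: third attention map {48,16,16}, first map {36,32,32}
fourth = ResidualConv()(torch.randn(1, 48, 16, 16), torch.randn(1, 36, 32, 32))
```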
S111: and inputting the fourth attention map into a regression prediction module of the distance map prediction model, and normalizing the fourth attention map through the regression prediction module to obtain a protein distance map corresponding to the fourth attention map.
In the present embodiment, a single-channel attention map is easier to observe and analyze, so the fourth attention map is normalized. Specifically, the server normalizes the fourth attention map.
For example, the regression prediction layer run by the server consists of a filter with a 1×1 convolution kernel; as shown in FIG. 3, its output is a protein residue distance map with dimensions {1,510,510}.
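A minimal sketch of such a regression prediction layer, with the channel count taken from the example dimensions:

```python
# Illustrative sketch of the regression prediction layer: one filter with a
# 1x1 convolution kernel that collapses the 36 channels of the fourth
# attention map into a single-channel protein residue distance map.
import torch
import torch.nn as nn

head = nn.Conv2d(36, 1, kernel_size=1)
fourth_map = torch.randn(1, 36, 32, 32)  # small stand-in for {36,510,510}
dist_map = head(fourth_map)              # -> single-channel distance map
```

A 1×1 convolution leaves the spatial size unchanged, so the {36,510,510} fourth attention map of the example maps directly to a {1,510,510} distance map.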
The protein distance map prediction scheme based on FIG. 1 achieves the following: the server represents in the protein distance map the residues that satisfy the triangle inequality, helping professionals judge whether the predicted protein can be synthesized. Residues satisfying a specified structure are emphasized in the protein distance map, and the protein's secondary structure helps professionals understand its helical structure and thus infer the specific functions the protein implements. This solves the problem of invalid prediction results caused by predicting protein residue distances in isolation, greatly improving the efficiency of protein synthesis.
Optionally, in step S105, the server performs the first preprocessing on the first attention map by inputting it into the feedforward convolution module of the distance map prediction model and reducing its size through the pooling layer of the feedforward convolution module. The convolution layer of the feedforward convolution module then adjusts the dimensions of the reduced result according to the input dimensions required by the triangular attention module of the distance map prediction model, and the adjustment result is determined as the first preprocessing result.
Optionally, in step S107, the server performs the second preprocessing on the second attention map by inputting it into the feedforward mixing module of the distance map prediction model and compressing it through the lightweight layer of the feedforward mixing module, so that the residual mixing module of the distance map prediction model can perform local feature weighting. The compressed second attention map is reduced in size by the pooling layer of the feedforward mixing module. The convolution layer of the feedforward mixing module then adjusts the dimensions of the reduced result according to the input dimensions required by the residual mixing module of the distance map prediction model, and the adjustment result is determined as the second preprocessing result.
Optionally, the server performs step S109 by inputting the third attention map into the residual convolution module of the distance map prediction model and adjusting its dimensions according to those of the first attention map through the transpose layer of the residual convolution module. The filling layer of the residual convolution module resizes the dimension adjustment result according to its size difference from the first attention map. The resized result is superimposed on the first attention map. The convolution layer of the residual convolution module convolves the superposition result, which is determined as the fourth attention map corresponding to the third attention map.
Optionally, before performing the first preprocessing on the first attention map in step S105, the server inputs the first attention map into the group pooling module of the distance map prediction model, which divides the first attention map into a preset number of sub-attention maps; the maximum value in each sub-attention map is extracted to determine the sub-feature map corresponding to each sub-attention map; and the sub-feature maps are spliced to obtain a first attention map with the preset number of dimensions.
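The split-max-splice step described above can be sketched directly; the channel count and group count below are just the example values:

```python
# Illustrative sketch of the group pooling step: split the channel dimension
# of the first attention map into a preset number of sub-attention maps, keep
# the channel-wise maximum of each as its sub-feature map, and splice the
# results back together.
import torch

def group_pool(att, num_groups):
    chunks = torch.chunk(att, num_groups, dim=0)        # att is {C,H,W}
    feats = [c.max(dim=0, keepdim=True).values for c in chunks]
    return torch.cat(feats, dim=0)                      # {num_groups,H,W}

att = torch.randn(36, 8, 8)       # e.g. a {36,H,W} first attention map
pooled = group_pool(att, 6)       # -> {6,H,W}: the preset number of dimensions
```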
Optionally, in step S111, the server inputs the result of the regression prediction module into the size adjustment module of the distance map prediction model, which scales the attention map values of the regression prediction result as a whole to obtain a protein residue distance map that is convenient to observe.
It should additionally be noted that a protein distance map is not the same as a protein contact map; the protein distance map inherently outperforms the protein contact map in application directions such as "estimating the feasibility of generating a tertiary structure for a given protein sequence in protein design" and "screening candidate active sites or candidate proteins". Second, existing contact map prediction algorithms do not simultaneously predict the protein secondary structure under the triangle-inequality distance constraint, which affects the accuracy of protein structures predicted from protein contact maps.
The present disclosure also provides a model training method corresponding to the protein distance map prediction flowchart of FIG. 1, as shown in FIG. 7:
S201: determining the residue sequence of the target protein;
S203: inputting the residue sequence into the distance map prediction model, and performing attention weighting on the residue sequence through the attention sub-network of the distance map prediction model to determine a first attention map corresponding to the residue sequence;
S205: performing a first preprocessing on the first attention map, inputting the first preprocessing result into the triangular attention module of the distance map prediction model, predicting the distances between residues, determining the residues whose distances satisfy the triangular distance constraint relationship, and performing attention weighting according to those residues to obtain a second attention map corresponding to the first attention map;
S207: performing a second preprocessing on the second attention map, inputting the second preprocessing result into the residual mixing module of the distance map prediction model, filtering through the residual mixing module, resizing the filtering result according to the second attention map, and performing local feature weighting on residues with specified structural relationships in the adjusted filtering result to obtain a third attention map corresponding to the second attention map;
S209: inputting the first attention map and the third attention map into the residual convolution module of the distance map prediction model, performing a transposed convolution on the third attention map through the residual convolution module, resizing the result according to the first attention map, and superimposing the adjusted result on the first attention map to obtain a fourth attention map corresponding to the third attention map;
S211: inputting the fourth attention map into the regression prediction module of the distance map prediction model, and normalizing the fourth attention map through the regression prediction module to obtain the protein distance map corresponding to the fourth attention map;
S213: performing supervised learning on the regression prediction module of the distance map prediction model using the labels of the protein distance map training set, and updating the model parameters with a preset optimization algorithm according to the calculated loss value;
S215: iterating the training method and tracking the current iteration count; stopping the iteration once the current iteration count reaches the preset number to obtain the target model.
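The supervised-learning and iteration steps can be sketched as follows. The loss function and optimizer are assumptions: the text only says "a calculated loss value" and "a preset optimization algorithm", so mean squared error and Adam are stand-ins, as is the tiny one-layer model.

```python
# Minimal sketch of the training steps under assumptions: MSE loss and the
# Adam optimizer stand in for the unspecified "calculated loss value" and
# "preset optimization algorithm", and a tiny 1x1 convolution stands in for
# the full distance map prediction model.
import torch
import torch.nn as nn

model = nn.Conv2d(36, 1, kernel_size=1)      # stand-in prediction model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
preset_iterations = 5

features = torch.randn(2, 36, 16, 16)        # stand-in fourth attention maps
labels = torch.rand(2, 1, 16, 16)            # stand-in distance map labels

for step in range(preset_iterations):        # stop at the preset count
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)  # supervised loss on the labels
    loss.backward()
    optimizer.step()                         # update the model parameters
```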
Optionally, before the first preprocessing of the first attention map in step S205, the first attention map is input into the group pooling module of the distance map prediction model, which divides it into a preset number of sub-attention maps; the maximum value in each sub-attention map is extracted to determine the sub-feature map corresponding to each sub-attention map; and the sub-feature maps are spliced to obtain a first attention map with the preset number of dimensions.
Optionally, before the first preprocessing of the first attention map in step S205, the first attention map is input into the pooling module of the distance map prediction model, and at any step before the supervised learning of the regression prediction module in step S213, the pooling module of the distance map prediction model is likewise trained with supervised learning using the labels of the protein distance map training set: the first attention map is input into the group pooling module of the distance map prediction model, which divides it into a preset number of sub-attention maps. The maximum value in each sub-attention map is extracted to determine the corresponding sub-feature map. The sub-feature maps are spliced to obtain a first attention map with the preset number of dimensions.
The present disclosure also provides an apparatus corresponding to the protein distance map prediction flowchart of FIG. 1, as shown in FIG. 8:
a sequence acquisition module 301 that determines a residue sequence of a target protein;
the sequence confirming module 303 inputs the residue sequence into a trained distance map prediction model, and performs attention weighting on the residue sequence through an attention sub-network of the distance map prediction model to determine a first attention map corresponding to the residue sequence;
The triangle attention module 305 performs a first preprocessing on the first attention map, inputs a first preprocessing result into the triangle attention module of the distance map prediction model, predicts the distance between residues, determines each residue whose distance meets a triangle distance constraint relationship, and performs attention weighting according to the determined residues meeting the triangle distance constraint relationship to obtain a second attention map corresponding to the first attention map;
a local structure attention module 307, which performs a second preprocessing on the second attention map, inputs a second preprocessing result into a residual mixing module of the distance map prediction model, performs filtering by the residual mixing module, performs size adjustment on a filtering result according to the second attention map, and performs local feature weighting on residues having a specified structural relationship in the adjusted filtering result to obtain a third attention map corresponding to the second attention map;
a restoration module 309, configured to input the first attention map and the third attention map into the residual convolution module of the distance map prediction model, perform a transposed convolution on the third attention map through the residual convolution module, resize the result according to the first attention map, and superimpose the adjusted result on the first attention map to obtain a fourth attention map corresponding to the third attention map;
The output module 311 inputs the fourth attention map to a regression prediction module of the distance map prediction model, and normalizes the fourth attention map by the regression prediction module to obtain a protein distance map corresponding to the fourth attention map.
Optionally, the triangular attention module 305 is configured to input the first attention map into the feedforward convolution module of the distance map prediction model and reduce its size through the pooling layer of the feedforward convolution module. The convolution layer of the feedforward convolution module adjusts the dimensions of the reduced result according to the input dimensions required by the triangular attention module of the distance map prediction model, and the adjustment result is determined as the first preprocessing result.
Optionally, the local structure attention module 307 is configured to input the second attention map into the feedforward mixing module of the distance map prediction model and compress it through the lightweight layer of the feedforward mixing module, so that the residual mixing module of the distance map prediction model can perform local feature weighting. The compressed second attention map is reduced in size by the pooling layer of the feedforward mixing module. The convolution layer of the feedforward mixing module adjusts the dimensions of the reduced result according to the input dimensions required by the residual mixing module of the distance map prediction model, and the adjustment result is determined as the second preprocessing result.
Optionally, the local structure attention module 307 is configured to reduce the dimensions of the second preprocessing result and increase its size through the transpose layer of the residual mixing module; to resize the output of the transpose layer, according to its difference from the second attention map, until it matches the size of the second attention map; and to superimpose the adjusted result on the third attention map through the superposition layer of the residual mixing module as the adjusted filtering result.
Optionally, the restoration module 309 is configured to input the third attention map into the residual convolution module of the distance map prediction model and adjust its dimensions according to those of the first attention map through the transpose layer of the residual convolution module. The filling layer of the residual convolution module resizes the dimension adjustment result according to its size difference from the first attention map. The resized result is superimposed on the first attention map through the superposition layer of the residual convolution module. The convolution layer of the residual convolution module convolves the superposition result, which is determined as the fourth attention map corresponding to the third attention map.
Optionally, the sequence confirming module 303 is configured to input the first attention map into the group pooling module of the distance map prediction model, which divides it into a preset number of sub-attention maps. The maximum value in each sub-attention map is extracted to determine the corresponding sub-feature map. The sub-feature maps are spliced to obtain a first attention map with the preset number of dimensions.
Optionally, the output module 311 is configured to input the result of the regression prediction module into a size adjustment module of the distance map prediction model, and integrally scale the attention map value of the result of the regression prediction module through the size adjustment module to obtain a protein residue distance map that is convenient to observe.
The present disclosure also provides an apparatus corresponding to the predicted protein distance map model training flowchart of fig. 7, as shown in fig. 9:
a sequence acquisition training module 401 that determines the residue sequence of the target protein;
the sequence confirmation training module 403 inputs the residue sequence into the distance map prediction model, and performs attention weighting on the residue sequence through the attention sub-network of the distance map prediction model to determine a first attention map corresponding to the residue sequence;
The triangle attention training module 405 performs a first preprocessing on the first attention map, inputs a first preprocessing result into a triangle attention module of the distance map prediction model, predicts distances between residues, determines residues with distances conforming to a triangle distance constraint relationship, and performs attention weighting according to the determined residues satisfying the triangle distance constraint relationship to obtain a second attention map corresponding to the first attention map;
the local structure attention training module 407 performs a second preprocessing on the second attention map, inputs a second preprocessing result into a residual mixing module of the distance map prediction model, performs filtering by the residual mixing module, performs size adjustment on a filtering result according to the second attention map, and performs local feature weighting on residues with a specified structural relationship in the adjusted filtering result to obtain a third attention map corresponding to the second attention map;
an attention map restoration training module 409, configured to input the first attention map and the third attention map into the residual convolution module of the distance map prediction model, perform a transposed convolution on the third attention map through the residual convolution module, resize the result according to the first attention map, and superimpose the adjusted result on the first attention map to obtain a fourth attention map corresponding to the third attention map;
The distance map output training module 411 inputs the fourth attention map into a regression prediction module of the distance map prediction model, normalizes the fourth attention map by the regression prediction module, and obtains a protein distance map corresponding to the fourth attention map;
the parameter updating training module 413 is used for performing supervised learning on the regression prediction module of the distance map prediction model by using the label of the protein distance map training set, and updating model parameters by adopting a preset optimization algorithm according to the calculated loss value;
the iteration training module 415 iterates the training method and determines the current iteration number; and stopping iteration if the current iteration times reach preset times to obtain a target model.
Optionally, the triangle attention training module 405 is configured to input the first attention map into the group pooling module of the distance map prediction model, which divides it into a preset number of sub-attention maps. The maximum value in each sub-attention map is extracted to determine the corresponding sub-feature map. The sub-feature maps are spliced to obtain a first attention map with the preset number of dimensions.
Optionally, the triangular attention training module 405 is configured to input the first attention map into the feedforward convolution module of the distance map prediction model, reduce its size through the pooling layer of the feedforward convolution module, adjust the dimensions of the reduced result through the convolution layer of the feedforward convolution module according to the input dimensions required by the triangular attention module of the distance map prediction model, and determine the adjustment result as the first preprocessing result.
Optionally, the local structure attention training module 407 is configured to compress the second attention map through the lightweight layer of the feedforward mixing module so that the residual mixing module of the distance map prediction model can perform local feature weighting, reduce the size of the compressed second attention map through the pooling layer of the feedforward mixing module, adjust the dimensions of the reduced result through the convolution layer of the feedforward mixing module according to the input dimensions required by the residual mixing module of the distance map prediction model, and determine the adjustment result as the second preprocessing result.
Optionally, the local structure attention training module 407 is configured to reduce the dimensions of the second preprocessing result and increase its size through the transpose layer of the residual mixing module, resize the transpose layer's output through the filling layer of the residual mixing module, according to its difference from the second attention map, until it matches the size of the second attention map, and superimpose the adjusted result on the third attention map through the superposition layer of the residual mixing module as the adjusted filtering result.
Optionally, the attention map restoration training module 409 is configured to input the third attention map into the residual convolution module of the distance map prediction model, adjust its dimensions according to those of the first attention map through the transpose layer of the residual convolution module, resize the dimension adjustment result through the filling layer of the residual convolution module according to its size difference from the first attention map, superimpose the resized result on the first attention map through the superposition layer of the residual convolution module, convolve the superposition result through the convolution layer of the residual convolution module, and determine the convolution result as the fourth attention map corresponding to the third attention map.
Optionally, the distance map output training module 411 is configured to input the result of the regression prediction module into a size adjustment module of the distance map prediction model, and integrally scale the attention map value of the result of the regression prediction module through the size adjustment module to obtain a protein residue distance map that is convenient to observe.
The present specification also provides a computer readable storage medium storing a computer program operable to perform a method of predicting a protein distance map as described above.
The present specification also provides a schematic structural diagram of an electronic device corresponding to FIG. 1, shown in FIG. 10. At the hardware level, as shown in FIG. 10, the electronic device includes a processor, an internal bus, a network interface, a memory, and a nonvolatile memory, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs it to implement the method for predicting a protein distance map described in FIG. 1. Of course, this specification does not exclude other implementations, such as logic devices or combinations of hardware and software; that is, the execution subject of the following processing flows is not limited to logic units and may also be hardware or logic devices.
In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). With the development of technology, however, improvements to many of today's method flows can be regarded as direct improvements to hardware circuit structures: designers almost always obtain a corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable gate array (Field Programmable Gate Array, FPGA), is an integrated circuit whose logic function is determined by a user's programming of the device. A designer "integrates" a digital system onto a PLD by programming, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled is written in a specific programming language called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing a given logic method flow can readily be obtained merely by briefly programming the method flow into an integrated circuit using one of the hardware description languages described above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller; examples include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely in computer-readable program code, it is entirely possible to logically program the method steps so that the controller implements the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included therein for performing various functions may also be regarded as structures within the hardware component. Indeed, means for performing the various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as divided into various units by function. Of course, when implementing the present specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a(n) …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, the embodiments are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively simple; for relevant details, refer to the corresponding parts of the description of the method embodiments.
The foregoing is merely an embodiment of the present specification and is not intended to limit the present specification. Various modifications and variations of the present specification will be apparent to those skilled in the art. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present specification shall be included within the scope of the claims of the present application.

Claims (13)

1. A method of predicting a protein distance map, comprising:
determining the residue sequence of the target protein;
inputting the residue sequence into a trained distance map prediction model, and carrying out attention weighting on the residue sequence through an attention sub-network of the distance map prediction model to determine a first attention map corresponding to the residue sequence;
the first attention map is subjected to first preprocessing, a first preprocessing result is input into a triangular attention module of the distance map prediction model, the distance between residues is predicted, the residues with the distances meeting the triangular distance constraint relation among the residues are determined, attention weighting is carried out according to the determined residues meeting the triangular distance constraint relation, and a second attention map corresponding to the first attention map is obtained;
Performing second preprocessing on the second attention map, inputting a second preprocessing result into a residual mixing module of the distance map prediction model, filtering by the residual mixing module, performing size adjustment on a filtering result according to the second attention map, and performing local feature weighting on residues with specified structural relations in the adjusted filtering result to obtain a third attention map corresponding to the second attention map;
inputting the first attention map and the third attention map into a residual convolution module of the distance map prediction model, performing transpose convolution on the third attention map through the residual convolution module, resizing the filtering result according to the first attention map, and superposing the adjustment result on the first attention map to obtain a fourth attention map corresponding to the third attention map;
and inputting the fourth attention map into a regression prediction module of the distance map prediction model, and normalizing the fourth attention map through the regression prediction module to obtain a protein distance map corresponding to the fourth attention map.
2. The method of claim 1, wherein performing the first preprocessing on the first attention map specifically comprises:
inputting the first attention map into a feed-forward convolution module of the distance map prediction model, and reducing the size of the first attention map through a pooling layer of the feed-forward convolution module;
and adjusting, through the convolution layer of the feed-forward convolution module, the dimension of the reduced result according to the dimension of the input attention map of the triangular attention module of the distance map prediction model, and determining the adjustment result as the first preprocessing result.
3. The method of claim 1, wherein performing the second preprocessing on the second attention map specifically comprises:
inputting the second attention map into a feed-forward mixing module of the distance map prediction model, and compressing the second attention map through a lightweight layer of the feed-forward mixing module so as to facilitate local feature weighting by the residual mixing module of the distance map prediction model;
reducing the size of the compressed second attention map through a pooling layer of the feed-forward mixing module;
and adjusting, through the convolution layer of the feed-forward mixing module, the dimension of the reduced result according to the dimension of the input attention map of the residual mixing module of the distance map prediction model, and determining the adjustment result as the second preprocessing result.
4. The method of claim 1, wherein filtering through the residual mixing module and resizing the filtering result according to the second attention map specifically comprises:
reducing the dimension of the second preprocessing result and increasing the size of the second preprocessing result through the transposition layer of the residual mixing module;
adjusting, through a fill layer of the residual mixing module, the size of the transpose layer processing result according to the difference between the transpose layer processing result and the second attention map, until the size of the transpose layer processing result is the same as the size of the second attention map;
and superposing the adjustment result with the third attention map through the superposition layer of the residual mixing module to serve as an adjusted filtering result.
5. The method of claim 1, wherein inputting the first attention map and the third attention map into a residual convolution module of the distance map prediction model, performing transpose convolution on the third attention map through the residual convolution module, resizing the filtering result according to the first attention map, and superposing the adjustment result on the first attention map to obtain a fourth attention map specifically comprises:
inputting the third attention map into a residual convolution module of the distance map prediction model, and adjusting the dimension of the third attention map according to the dimension of the first attention map through a transpose layer of the residual convolution module;
adjusting, through the fill layer of the residual convolution module, the dimension adjustment result according to the dimension difference between the dimension adjustment result and the first attention map and the dimension of the first attention map;
superposing the resizing result on the first attention map through the superposition layer of the residual convolution module;
and convolving the superposition result through a convolution layer of the residual convolution module, and determining the convolution result as a fourth attention map corresponding to the third attention map.
6. The method of claim 1, wherein before performing the first preprocessing on the first attention map, the method further comprises:
inputting the first attention map into a group pooling module of the distance map prediction model, and dividing, through the group pooling module, the first attention map into a preset number of sub-attention maps;
extracting the maximum value in each sub-attention map, and determining a sub-feature map corresponding to each sub-attention map;
and concatenating the sub-feature maps to obtain a first attention map having the preset number of dimensions.
7. The method of claim 1, wherein after the protein distance map corresponding to the fourth attention map is obtained, the method further comprises:
and inputting the result of the regression prediction module into a size adjustment module of the distance map prediction model, and scaling the attention map values of that result as a whole through the size adjustment module, to obtain a protein residue distance map that is convenient to observe.
8. A model training method for predicting a protein distance map, comprising:
determining the residue sequence of the target protein;
inputting the residue sequence into a trained distance map prediction model, and carrying out attention weighting on the residue sequence through an attention sub-network of the distance map prediction model to determine a first attention map corresponding to the residue sequence;
the first attention map is subjected to first preprocessing, a first preprocessing result is input into a triangular attention module of the distance map prediction model, the distance between residues is predicted, the residues with the distances meeting the triangular distance constraint relation among the residues are determined, attention weighting is carried out according to the determined residues meeting the triangular distance constraint relation, and a second attention map corresponding to the first attention map is obtained;
Performing second preprocessing on the second attention map, inputting a second preprocessing result into a residual mixing module of the distance map prediction model, filtering by the residual mixing module, performing size adjustment on a filtering result according to the second attention map, and performing local feature weighting on residues with specified structural relations in the adjusted filtering result to obtain a third attention map corresponding to the second attention map;
inputting the first attention map and the third attention map into a residual convolution module of the distance map prediction model, performing transpose convolution on the third attention map through the residual convolution module, resizing the filtering result according to the first attention map, and superposing the adjustment result on the first attention map to obtain a fourth attention map corresponding to the third attention map;
the fourth attention map is input into a regression prediction module of the distance map prediction model, and the protein distance map corresponding to the fourth attention map is obtained through normalization of the fourth attention map by the regression prediction module;
performing supervised learning on a regression prediction module of the distance map prediction model by using a label of a protein distance map training set, and updating model parameters by adopting a preset optimization algorithm according to a calculated loss value;
iterating the above training steps and determining the current number of iterations; and stopping the iteration when the current number of iterations reaches a preset number, to obtain a target model.
9. The method of claim 8, wherein before performing the first preprocessing on the first attention map, the training method further comprises:
inputting the first attention map into a group pooling module of the distance map prediction model, and dividing, through the group pooling module, the first attention map into a preset number of sub-attention maps;
extracting the maximum value in each sub-attention map, and determining a sub-feature map corresponding to each sub-attention map;
and concatenating the sub-feature maps to obtain a first attention map having the preset number of dimensions.
10. An apparatus for predicting a protein distance map, comprising:
a sequence acquisition module for determining the residue sequence of the target protein;
the sequence confirming module inputs the residue sequence into a trained distance map prediction model, and performs attention weighting on the residue sequence through an attention sub-network of the distance map prediction model to determine a first attention map corresponding to the residue sequence;
The triangular attention module is used for carrying out first preprocessing on the first attention map, inputting a first preprocessing result into the triangular attention module of the distance map prediction model, predicting the distance between residues, determining the residues with the distance conforming to the triangular distance constraint relation, and carrying out attention weighting according to the determined residues meeting the triangular distance constraint relation to obtain a second attention map corresponding to the first attention map;
the local structure attention module is configured to perform the second preprocessing on the second attention map, input the second preprocessing result into a residual mixing module of the distance map prediction model, filter through the residual mixing module, resize the filtering result according to the second attention map, and perform local feature weighting on residues having the specified structural relation in the adjusted filtering result, to obtain a third attention map corresponding to the second attention map;
an attention map restoration module, configured to input the first attention map and the third attention map into a residual convolution module of the distance map prediction model, perform transpose convolution on the third attention map through the residual convolution module, resize the filtering result according to the first attention map, and superpose the adjustment result on the first attention map to obtain a fourth attention map corresponding to the third attention map;
And the distance map output module is used for inputting the fourth attention map into the regression prediction module of the distance map prediction model, and normalizing the fourth attention map through the regression prediction module to obtain a protein distance map corresponding to the fourth attention map.
11. An apparatus for predictive protein distance map model training, comprising:
the sequence acquisition training module is used for determining the residue sequence of the target protein;
the sequence confirmation training module inputs the residue sequence into a trained distance map prediction model, and performs attention weighting on the residue sequence through an attention sub-network of the distance map prediction model to determine a first attention map corresponding to the residue sequence;
the triangular attention training module is used for carrying out first preprocessing on the first attention map, inputting a first preprocessing result into the triangular attention module of the distance map prediction model, predicting the distance between residues, determining the residues with the distance conforming to the triangular distance constraint relation, and carrying out attention weighting according to the determined residues meeting the triangular distance constraint relation to obtain a second attention map corresponding to the first attention map;
the local structure attention training module is configured to perform the second preprocessing on the second attention map, input the second preprocessing result into a residual mixing module of the distance map prediction model, filter through the residual mixing module, resize the filtering result according to the second attention map, and perform local feature weighting on residues having the specified structural relation in the adjusted filtering result, to obtain a third attention map corresponding to the second attention map;
an attention map recovery training module, configured to input the first attention map and the third attention map into a residual convolution module of the distance map prediction model, perform transpose convolution on the third attention map through the residual convolution module, resize the filtering result according to the first attention map, and superpose the adjustment result on the first attention map to obtain a fourth attention map corresponding to the third attention map;
the distance map output training module is used for inputting the fourth attention map into the regression prediction module of the distance map prediction model, and normalizing the fourth attention map through the regression prediction module to obtain a protein distance map corresponding to the fourth attention map;
The parameter updating training module is used for performing supervised learning on the regression prediction module of the distance map prediction model by using the label of the protein distance map training set, and updating model parameters by adopting a preset optimization algorithm according to the calculated loss value;
the iteration training module is configured to iterate the above training steps and determine the current number of iterations, and to stop the iteration when the current number of iterations reaches a preset number, to obtain a target model.
12. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any of the preceding claims 1-7.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any of the preceding claims 1-7 when executing the program.
CN202311155995.4A 2023-09-07 2023-09-07 Method for predicting protein distance map, storage medium and electronic equipment Pending CN117174162A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311155995.4A CN117174162A (en) 2023-09-07 2023-09-07 Method for predicting protein distance map, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311155995.4A CN117174162A (en) 2023-09-07 2023-09-07 Method for predicting protein distance map, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN117174162A (en) 2023-12-05

Family

ID=88944633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311155995.4A Pending CN117174162A (en) 2023-09-07 2023-09-07 Method for predicting protein distance map, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117174162A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination