CN115083513A - Method for constructing protein complex structure based on medium-resolution cryoelectron microscope image - Google Patents
- Publication number
- CN115083513A (application CN202210709043.1A)
- Authority
- CN
- China
- Prior art keywords
- main chain
- backbone
- conformation
- chain
- conformations
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
Abstract
The invention discloses a method for constructing a protein complex structure based on a medium-resolution cryoelectron microscope image. The method uses deep learning guidance to automatically construct a multi-chain protein complex structure from the medium-resolution cryoelectron microscope density map. By iteratively fitting, optimizing and assembling single-chain protein structures predicted from sequence, high-quality protein complex structures can be constructed without human intervention. Unlike the prior art, which fits protein chains to the cryoelectron microscope density map, the invention fits protein chains to a main chain probability map, in which the value at each grid point represents the probability of finding a main chain atom near that grid point. The main chain probability map is predicted from the input density map by a UNet++-based deep learning network model. Compared with the density map, the main chain probability map contains more accurate main chain atom position information, so the fitting accuracy can be improved.
Description
Technical Field
The invention belongs to the field of protein complex structure construction, and particularly relates to a method for constructing a protein complex structure based on a medium-resolution cryoelectron microscope image.
Background
Advances in cryoelectron microscope instrumentation, data collection and image reconstruction have produced a rapidly growing number of reconstructed cryoelectron microscope density maps. However, the ultimate goal of cryoelectron microscopy is not to reconstruct density maps but to determine atomic structures. For high-resolution maps, software originally designed for X-ray crystallography can be used to build high-quality atomic structures. For density maps of slightly lower resolution, de novo modeling can achieve satisfactory performance. For medium-resolution cryoelectron microscope density maps, however, establishing an accurate structural model remains a challenging and labor-intensive process, which is reflected in the large gap between the number of reconstructed medium-resolution density maps and the number of modeled three-dimensional structures. In the public database EMDB, more than 40% of medium-resolution density maps have no modeled three-dimensional structure.
The fundamental difficulty in building a protein model de novo from a medium-resolution density map is the large uncertainty inherent in such maps. This uncertainty is usually compensated for by prior knowledge, which in most cases takes the form of an initial template structure. Starting from a given initial template structure, an atomic model corresponding to the given density map can be constructed by fitting the template to the map and then optimizing the structure. The initial template structure can be taken from an existing high-resolution structure, or predicted by a structure prediction method such as homology modeling or deep learning. Rigid fitting and flexible fitting are the common techniques for fitting template structures into medium-resolution density maps. Rigid fitting searches the possible relative orientations between the structure and the density map while keeping the template structure rigid. The agreement between the fitted structure and the density map is measured by a scoring function such as cross-correlation or mutual information. To date, various rigid fitting tools have been developed, including EMfit, UCSF Chimera, gmfit, MultiFit, Situs, PowerFit, TEMPy, MOFIT, VESPER and Phenix. If the initial template structure deviates conformationally from the true structure, flexible fitting is often needed to refine the rigidly fitted structure so that it agrees better with the density map.
Existing fitting algorithms still have limited ability to model structures from medium-resolution density maps. First, experimentally determined density maps typically contain heterogeneous density signals and random noise, so scoring functions that measure how well a structure fits the density map may mislead the search and ranking steps of the fit. Existing methods have no choice but to secure robustness at the cost of scoring precision. Second, manual intervention is still necessary for most fitting algorithms, which makes fitting labor-intensive and difficult for non-expert users, and the fitting results depend strongly on the specific parameters used. Third, many methods are designed for single-chain protein fitting and therefore require the density map to be segmented before fitting. Complex structures can then only be constructed by manually combining the fitting results of multiple single chains. For some low-quality density maps, accurate segmentation is almost impossible, and in this case it is extremely difficult to build a reasonable complex model. Finally, proteins are flexible molecules and therefore often require flexible fitting and optimization during modeling, which is a laborious process. Optimization methods based on molecular dynamics have a complex initialization process, long computation times and parameter-dependent results, and errors may occur anywhere in the process. Lightweight optimization methods are therefore more common, but they cannot handle large conformational changes.
Disclosure of Invention
Aiming at the defects or improvement requirements of the prior art, the invention provides a method for constructing a protein complex structure based on a medium-resolution cryoelectron microscope image, so that the problems that the existing cryoelectron microscope fitting technology is seriously influenced by noise, needs manual intervention, can only perform single-chain protein fitting, is difficult to realize flexible fitting and the like are solved.
To achieve the above object, according to a first aspect of the present invention, there is provided a method for constructing a protein complex structure based on medium resolution cryoelectron microscopy, comprising:
S1, inputting the cryoelectron microscope image into a pre-trained prediction model to obtain a corresponding main chain probability map;
S2, predicting a corresponding atomic model from each of the L single-chain sequences;
S3, rigidly fitting each single-chain atomic model to the main chain probability map to obtain a plurality of rigidly fitted conformations, and retaining, for each single chain, the M rigidly fitted conformations with the highest main chain matching scores;
S4, splitting each single-chain atomic model into n short structural domains, and constructing an undirected graph V1 according to the connectivity of the n short structural domains; taking each short structural domain in turn as a seed domain, and performing domain optimization based on the simplex method on the M rigidly fitted conformations with the highest main chain matching scores, following the breadth-first search order on V1, to obtain M × n domain-optimized conformations for each single chain;
S5, selecting, for each single chain, the K conformations with the highest main chain matching scores from its M retained rigidly fitted conformations and its M × n domain-optimized conformations, to obtain L × K conformations, and constructing an undirected graph V2 in which each vertex is a conformation weighted by its main chain matching score and each edge connects conformations of two different single chains and carries the atomic collision score between them; removing from V2 the edges whose atomic collision score exceeds a threshold, and determining, based on the Bron-Kerbosch search algorithm, the combination of single chains and conformations with the highest total main chain matching score as the constructed protein complex structure.
According to a second aspect of the present invention, there is provided a system for constructing a protein complex structure based on medium resolution cryo-electron microscopy, comprising: a computer-readable storage medium and a processor;
the computer readable storage medium is used for storing executable instructions;
the processor is configured to read executable instructions stored in the computer-readable storage medium and execute the method according to the first aspect.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
1. The invention provides a method for constructing a protein complex structure based on a medium-resolution cryoelectron microscope image, which uses deep learning guidance to automatically construct a multi-chain protein complex structure from the medium-resolution density map. By iteratively fitting, optimizing and assembling single-chain protein structures predicted from sequence, high-quality protein complex structures can be constructed without human intervention.
2. In the method provided by the invention, the protein chain is not fitted directly into the original density map but to a main chain probability map. The value at each grid point in the main chain probability map represents the probability of finding a main chain atom near that grid point. The main chain probability map is predicted from the input density map using a UNet++-based deep learning network model. Compared with the density map, the main chain probability map contains more accurate main chain atom position information, so the fitting accuracy is greatly improved.
3. On the basis of rigid fitting, the method provided by the invention also provides a semi-flexible protein domain optimization strategy, so that the orientation of each protein domain can be optimized rapidly. A search method based on the maximum clique problem finds the highest-scoring combination among the different protein chain combinations, thereby yielding the final complex model. The evaluation results on the test set show that the complex models constructed by the modeling method of the invention are of high quality on every index, whether the PDB structure or the electron microscope density map is taken as the reference.
Drawings
FIG. 1 is a schematic diagram of a process for constructing a protein complex structure based on a medium-resolution cryoelectron micrograph according to an embodiment of the present invention;
fig. 2 (a) is a schematic diagram of a protein domain optimization process provided by an embodiment of the present invention, and fig. 2 (b) is a schematic diagram of an iterative modeling provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The embodiment of the invention provides a method for constructing a protein complex structure based on a medium-resolution cryoelectron microscope image, which comprises the following steps:
S1, inputting the cryoelectron microscope image into a pre-trained prediction model to obtain a corresponding main chain probability map.
Preferably, the prediction model is trained with cryoelectron microscope density maps as training samples and with the main chain probability maps of the corresponding PDB structures as labels.
Specifically, a deep-learning-based main chain probability prediction module is trained. The deep learning module adopts a UNet++ architecture. The training set consists of a plurality of experimental electron microscope density maps and the main chain probability maps derived from the corresponding PDB structures. The UNet++ network predicts a main chain probability map from an input electron microscope density map.
Preferably, as shown in fig. 1, the UNet++ network uses 3 encoder sub-blocks and 3 decoder sub-blocks with dense skip connections, and uses 3 x 3 three-dimensional convolutional layers. A three-dimensional max pooling layer with stride 2 is used for down-sampling, and a three-dimensional linear interpolation layer with a scale factor of 2 is used for up-sampling. The input to the network is an electron microscope density block of size 40 x 40 with a fixed grid spacing. Density blocks of this size are cut from the density map using a sliding window. Before this step, density maps with different grid spacings are all resampled to that common grid spacing using cubic interpolation. The output of the network is a main chain probability block of the same size. These main chain probability blocks are then recombined into the final main chain probability map by averaging the overlapping regions.
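The sliding-window cutting and overlap averaging described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the cubic block shape, the stride of 20 and the zero padding are all assumptions, and `predict_block` stands in for the trained network.

```python
import numpy as np

def window_starts(length, box, stride):
    """Start indices so that boxes of size `box` cover [0, length)."""
    if length <= box:
        return [0]
    starts = list(range(0, length - box + 1, stride))
    if starts[-1] != length - box:
        starts.append(length - box)  # last window sits flush with the edge
    return starts

def predict_by_blocks(density, predict_block, box=40, stride=20):
    """Cut overlapping cubes, predict each cube, and average the overlaps.

    Assumes stride <= box so that every voxel is covered at least once.
    """
    # Zero-pad maps that are smaller than one box along any axis.
    pad = [(0, max(0, box - s)) for s in density.shape]
    padded = np.pad(density, pad)
    total = np.zeros(padded.shape, dtype=float)
    count = np.zeros(padded.shape, dtype=float)
    for i in window_starts(padded.shape[0], box, stride):
        for j in window_starts(padded.shape[1], box, stride):
            for k in window_starts(padded.shape[2], box, stride):
                sl = (slice(i, i + box), slice(j, j + box), slice(k, k + box))
                total[sl] += predict_block(padded[sl])
                count[sl] += 1.0
    averaged = total / count
    nx, ny, nz = density.shape
    return averaged[:nx, :ny, :nz]  # drop the padding again
```

With an identity function in place of the network, the recombined map equals the input, which is a convenient sanity check for the bookkeeping.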
It should be noted that, during both training and practical use, the input electron microscope density map is the primary map, which is usually a sharpened map. Therefore, it is preferable to use the sharpened map as the input for predicting the main chain probability map. Simple RELION sharpening using an automatically determined B factor can significantly improve the prediction quality of the main chain probability map compared with an unsharpened half map.
To train the network model, 209 pairs of experimental electron microscope density maps and associated PDB protein three-dimensional structures are preferably used as the training set. For each density map in the training set, its target main chain probability map is generated from its associated PDB structure. During training, 20% of the maps are randomly selected from the training set as the validation set. The network is implemented with PyTorch 1.8.1 and CUDA 11.1. For each model, the network is trained for at most 300 epochs with a batch size of 160 blocks. An Adam optimizer is employed to minimize the prediction loss.
Two different loss functions are used to measure the difference between the predicted main chain probability block and the target main chain probability block: the smooth L1 loss function and the structural similarity (SSIM) loss function. The initial learning rate is set to 1e-3 and no L2 regularization is applied. A learning rate decay strategy is employed: if the average training loss does not decrease during 4 consecutive training epochs, the learning rate is reduced to 1/2 of its current value. The training process stops when the learning rate reaches the minimum value of 1e-5. The main chain probability map prediction model finally used is the network model with the lowest validation loss.
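The learning rate decay and stopping rule can be illustrated with a small framework-free sketch. The function name and the exact bookkeeping (comparing each epoch against the best loss so far, resetting the counter after each decay) are assumptions; the text only fixes the initial rate, the factor 1/2, the 4-epoch window and the 1e-5 floor.

```python
def run_schedule(epoch_losses, lr=1e-3, factor=0.5, patience=4, min_lr=1e-5):
    """Return the learning rate used in each epoch until training stops.

    The rate is halved whenever the average training loss has not improved
    for `patience` consecutive epochs; training stops once the rate has
    dropped to `min_lr` or below.
    """
    best = float("inf")
    stale = 0          # consecutive epochs without improvement
    history = []
    for loss in epoch_losses:
        history.append(lr)
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
        if stale >= patience:
            lr *= factor
            stale = 0
            if lr <= min_lr:
                break  # learning rate floor reached: stop training
    return history
```

Starting from 1e-3, seven halvings are needed before the rate falls below 1e-5, so a run whose loss never improves after the first epoch stops after 29 epochs under these assumptions.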
Preferably, the main chain probability map is generated from its associated PDB structure on a three-dimensional grid with a fixed grid spacing, according to the following formula:
wherein x is the position vector of a grid point; P(x) is the probability of finding a main chain atom in the vicinity of grid point x; the set A consists of the position vectors of all main chain atoms N, Cα and C; $k = (\pi/(2.4 + 0.8R))^2$, and R is the resolution of the cryoelectron microscope density map.
That is, to train the deep learning model, the target main chain probability map of a given cryoelectron microscope density map is generated from its associated PDB structure on a three-dimensional grid with a fixed grid spacing, according to the following formula:
wherein x is the position vector of a grid point; P(x) is the main chain probability of grid point x, i.e., the probability of finding a main chain atom in the vicinity of grid point x; the set A consists of the position vectors of all main chain atoms (N, Cα and C). The value of k depends on the resolution R of the cryoelectron microscope density map, i.e., $k = (\pi/(2.4 + 0.8R))^2$.
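As an illustration of how a target map could be generated, the sketch below computes k from the resolution and fills a grid with a Gaussian-shaped probability around each main chain atom. The formula for P(x) is not reproduced in the text, so the form used here, the maximum over atoms of exp(-k|x - a|^2), is purely an assumption, as are the function names and the grid origin at zero.

```python
import numpy as np

def gaussian_k(resolution):
    """k = (pi / (2.4 + 0.8 R))^2, with R the map resolution."""
    return (np.pi / (2.4 + 0.8 * resolution)) ** 2

def target_probability_map(atoms, shape, spacing, resolution):
    """Assumed form: P(x) = max over a in A of exp(-k |x - a|^2)."""
    k = gaussian_k(resolution)
    axes = [np.arange(n) * spacing for n in shape]
    # grid[i, j, l] is the position vector of grid point (i, j, l)
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    prob = np.zeros(shape)
    for a in np.asarray(atoms, dtype=float):
        d2 = np.sum((grid - a) ** 2, axis=-1)
        prob = np.maximum(prob, np.exp(-k * d2))
    return prob
```

Taking the maximum keeps every value in [0, 1]; a sum over atoms followed by clipping or normalization would be an equally plausible reading.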
Preferably, main chain points are used as a simplified representation of the main chain probability map; the main chain points may be generated from the main chain probability map by a mean shift clustering algorithm.
To improve computational efficiency, the main chain probability map predicted by the deep learning model is not used directly; instead, a simplified representation, namely main chain points, is adopted. The main chain points can be generated from the main chain probability map by the mean shift clustering algorithm. Specifically, a set of seed points is placed at all grid points of the main chain probability map whose value is greater than 0, and the seed points are moved to local maxima of the main chain probability map through multiple iterations:
wherein $x_n$ (n = 1, ..., N) is the position vector of a grid point, the kernel is a Gaussian kernel function, $p(x_n)$ is the main chain probability of grid point $x_n$, and t is the iteration number. The Gaussian kernel function depends on k, whose value depends on the resolution R of the cryoelectron microscope density map, i.e. $k = (\pi/(2.4 + 0.8R))^2$. The main chain probability of a seed point can be calculated on the main chain probability map according to the Gaussian kernel function. After the mean shift process has converged (that is, the seed points no longer move), seed points that are close to each other (within a certain distance threshold) are clustered, and the seed point with the highest main chain probability in each cluster is selected as its representative. The final set of main chain points Z consists of the representative points of all clusters, $z_i$ (i = 1, ..., L) ∈ Z.
S2, predicting a corresponding atomic model from each of the L single-chain sequences.
Specifically, the atomic model is predicted from each single-chain sequence using a protein structure prediction program. Preferably, AlphaFold2, which performs very well in protein structure prediction, is used as the prediction program. For the predicted single-chain models, the domains are assigned using SWORD.
S3, rigidly fitting each single-chain atomic model to the main chain probability map to obtain a plurality of rigidly fitted conformations, and retaining, for each single chain, the M rigidly fitted conformations with the highest main chain matching scores.
Preferably, the atomic model of each single chain is rigidly fitted to the backbone probability map separately based on global search by FFT and simplex method.
In the rigid fitting process, a Fast Fourier Transform (FFT) based fitting strategy is used to fit the protein model of each chain to the backbone probability map by global search, as shown in fig. 1. The search process involves a translation search and a rotation search to traverse the entire sampling space. All translational searches can be performed by a round of fast fourier transform, while rotational searches are performed by exploring a large set of rotational angles. Specifically, an FFT-based translation search is performed for each rotation of the protein structure.
Preferably, the rotational search is performed over a uniformly discretized Euler angle space with 15 degree intervals, for a total of 4392 rotations. Because the exhaustive global search is discretized to a fixed grid spacing and angle interval, the simplex method is then used to continuously refine each result of the discrete global search in order to further improve the fitting accuracy.
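The translational part of the search can be illustrated with the correlation theorem: for one fixed rotation, a single pair of FFTs scores every grid translation at once. The sketch below is a generic FFT cross-correlation, not the patent's scoring code; the function name is hypothetical, both grids are assumed to have the same shape, and shifts are cyclic.

```python
import numpy as np

def fft_translation_search(target, template):
    """Score every cyclic shift of `template` against `target` in one pass.

    corr[t] = sum over x of target[x + t] * template[x]
    """
    f_target = np.fft.fftn(target)
    f_template = np.fft.fftn(template)
    corr = np.fft.ifftn(f_target * np.conj(f_template)).real
    shift = np.unravel_index(np.argmax(corr), corr.shape)
    return tuple(int(s) for s in shift), float(corr.max())
```

In a full search this function would be called once per rotation of the chain model (4392 times for the 15 degree sampling described above), keeping the best-scoring poses for simplex refinement.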
The objective function in the fitting and optimization process is the main chain matching score.
The agreement between each fitted conformation and the main chain probability map is measured by calculating the main chain matching score; the rigid fitting results are ranked by this score, and the M rigidly fitted conformations with the highest main chain matching scores are retained.
Preferably, the backbone match score is calculated by the formula:
wherein the formula gives the backbone match score of conformation Y; z is the position vector of a backbone point, Z is the set of all backbone points, P(z) is the backbone probability of backbone point z, $y_q$ (q = 1, ..., Q) are the backbone atoms in conformation Y, $k = (\pi/(2.4 + 0.8R))^2$, R is the resolution of the cryoelectron microscope density map, and $\theta = (k/\pi)^{1.5}$.
That is, for the backbone atoms $y_q$ (q = 1, ..., Q) in the fitted conformation Y, the backbone match score s' can be calculated according to the following formula:
where z ∈ Z is the position vector of a backbone point (Z is the set of all backbone points) and P(z) is the backbone probability of that backbone point.
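A possible shape of the match score is sketched below. The formula itself is not reproduced in the text; the symbols k, θ, P(z) and $y_q$ suggest a Gaussian-weighted sum over atom and point pairs, so the sketch assumes S(Y) = Σ_q Σ_z θ P(z) exp(-k|y_q - z|^2). That form and the function name are assumptions.

```python
import numpy as np

def match_score(backbone_atoms, points, point_probs, resolution):
    """Assumed score: sum over atoms y_q and points z of
    theta * P(z) * exp(-k * |y_q - z|^2)."""
    k = (np.pi / (2.4 + 0.8 * resolution)) ** 2
    theta = (k / np.pi) ** 1.5
    y = np.asarray(backbone_atoms, dtype=float)
    z = np.asarray(points, dtype=float)
    p = np.asarray(point_probs, dtype=float)
    d2 = ((y[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    return float((theta * p[None, :] * np.exp(-k * d2)).sum())
```

Under this form, a conformation whose backbone atoms sit on high-probability points scores higher than the same conformation shifted away from them, which is the property the ranking step relies on.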
S4, splitting each single-chain atomic model into n short structural domains, and constructing an undirected graph V1 according to the connectivity of the n short structural domains; taking each short domain in turn as a seed domain, and performing domain optimization (that is, adjusting the position of each domain in the rigidly fitted conformation) based on the simplex method on the M rigidly fitted conformations with the highest main chain matching scores, following the breadth-first search order on V1, to obtain M × n domain-optimized conformations for each single chain.
In the undirected graph V1, any two domains that contain adjacent residues are connected by an undirected edge.
Preferably, in step S4, each of the L single-chain atomic models is split into n short domains based on the SWORD domain splitting method.
After the protein model of a chain has been rigidly fitted to the main chain probability map, a semi-flexible domain optimization strategy is adopted to further improve the agreement between the protein model and the main chain probability map; fig. 2 (a) shows a schematic diagram of the protein domain optimization process. Because of the conformational deviation between the predicted model and the reference PDB structure, fitting the predicted chain as one rigid body gives poor agreement with the reference structure. By segmenting the protein chain into small domains and locally optimizing the position of each domain in the main chain probability map, the quality of the fit can be greatly improved. The lower half of fig. 2 (a) compares the fitted structure before (left) and after (right) domain optimization. The upper half of fig. 2 (a) illustrates domain optimization in detail: each domain in the fitted structure is labeled with a unique number, and the protein chain is fitted well to the main chain probability map (right) by locally optimizing each domain separately, starting from the rigidly fitted conformation (left).
For each protein model, SWORD assigns n short domains. On this basis, a simple graph can be constructed in which any two domains that contain adjacent residues are connected by an undirected edge. Domain optimization is performed on the M highest-scoring fitted conformations from the rigid fit. Starting from one domain selected as the seed domain, all domains are optimized in turn. Specifically, starting from the seed domain, the current domain is optimized with the simplex method to find a locally optimal match, and then the neighboring domains of the current domain are optimized. This process is repeated until all domains have been locally optimized. The order of optimization is determined by breadth-first search (BFS) on the above graph. Because each domain is taken in turn as the seed domain, domain optimization yields n different models from each rigid fitting result. For each chain, the final fitting results are therefore the M × n domain-optimized models plus the M rigidly fitted models. The models are ranked according to their match scores.
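The domain graph and the breadth-first optimization order can be sketched as follows. The data layout (a dictionary from domain id to its residue numbers) and the function names are illustrative assumptions; the simplex optimization of each domain is left out.

```python
from collections import deque

def domain_graph(domains):
    """Connect two domains whenever they contain sequence-adjacent residues.

    `domains` maps a domain id to an iterable of residue numbers.
    """
    owner = {r: d for d, residues in domains.items() for r in residues}
    adjacency = {d: set() for d in domains}
    for r, d in owner.items():
        nxt = owner.get(r + 1)
        if nxt is not None and nxt != d:
            adjacency[d].add(nxt)
            adjacency[nxt].add(d)
    return adjacency

def bfs_order(adjacency, seed):
    """Order in which domains are optimized, starting from the seed domain."""
    order, seen, queue = [], {seed}, deque([seed])
    while queue:
        d = queue.popleft()
        order.append(d)
        for nb in sorted(adjacency[d]):
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return order
```

Calling `bfs_order` once per domain gives the n seed-specific optimization orders, and applying them to each of the M rigid fits produces the M × n domain-optimized models described above.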
S5, selecting, for each single chain, the K conformations with the highest main chain matching scores from its M retained rigidly fitted conformations and its M × n domain-optimized conformations, to obtain L × K conformations, and constructing an undirected graph V2 in which each vertex is a conformation weighted by its main chain matching score and each edge connects conformations of two different single chains and carries the atomic collision score between them; removing from V2 the edges whose atomic collision score exceeds a threshold, and determining, based on the Bron-Kerbosch search algorithm, the combination of single chains and conformations with the highest total main chain matching score as the constructed protein complex structure.
After rigid fitting and flexible optimization of the individual chains, the multiple single chains need to be assembled into a complex structure. To prevent collisions between different chains, the atomic collision scores between the various conformations of different chains are calculated first.
Preferably, the atomic collision score between conformation A and conformation B is calculated by the formula:
$C(A; B) = \frac{1}{n_A} \sum_{a \in A} c(a; B)$
wherein $n_A$ is the number of atoms in A, $c(a; B)$ is the collision score of atom a of conformation A relative to all atoms in conformation B, and $d_{clash}$ is the distance cutoff applied between atom a and any atom of conformation B.
That is, to prevent collisions between different conformations, the collision score c of one atom a in the atom set A of conformation A relative to all atoms b ∈ B of another conformation B is defined as follows:
It can be seen that if the distance between atom a and any atom of conformation B is below the cutoff distance $d_{clash}$, the collision score is fixed at 1.0. The collision score C of conformation A relative to conformation B is the average of the collision scores of all atoms of A, i.e. $C(A; B) = \frac{1}{n_A} \sum_{a \in A} c(a; B)$, wherein $n_A$ is the number of atoms in A.
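A minimal sketch of the collision score follows. The text fixes the per-atom score at 1.0 inside the cutoff and defines C(A; B) as the average over the atoms of A; the value 0 outside the cutoff, the default cutoff of 3.0 and the function name are assumptions.

```python
import numpy as np

def clash_score(conf_a, conf_b, d_clash=3.0):
    """C(A; B): fraction of atoms of A within d_clash of any atom of B."""
    a = np.asarray(conf_a, dtype=float)
    b = np.asarray(conf_b, dtype=float)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # c(a; B) = 1.0 inside the cutoff, assumed 0.0 otherwise
    per_atom = (dist.min(axis=1) <= d_clash).astype(float)
    return float(per_atom.mean())
```

Because the average runs over the atoms of A only, the score is not symmetric: C(A; B) and C(B; A) generally differ when the two conformations have different sizes.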
Based on the match score of each conformation and the collision scores between different chains, the problem of assembling multiple chains into a complex is converted into a maximum clique problem, as shown in fig. 1. First, an undirected graph is constructed in which each vertex is a conformation weighted by its match score, and each edge connects two conformations of two different chains and carries the atomic collision score between them. It should be noted that different fitted conformations of the same chain are not connected. Edges whose collision score exceeds a given threshold are then removed from the graph. After the graph has been constructed, the Bron-Kerbosch algorithm is used to find the chain combination with the highest total match score, which is taken as the constructed protein complex structure.
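The assembly step can be sketched with the basic Bron-Kerbosch recursion (without pivoting), which enumerates all maximal cliques; the clique with the highest total match score is then selected. The data layout, with vertices as (chain, conformation) tuples and collision scores in a dictionary keyed by vertex pairs, and the function names are assumptions, and scores are assumed to be positive.

```python
from itertools import combinations

def maximal_cliques(adj):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)        # r cannot be extended: maximal clique
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}              # v is done as a candidate
            x = x | {v}              # exclude v from later cliques

    expand(set(), set(adj), set())
    return cliques

def assemble(scores, clash, threshold):
    """Pick the compatible set of conformations with the highest total score.

    scores: {(chain, conf): match score}
    clash:  {frozenset({u, v}): collision score}, missing pairs count as 0.0
    """
    adj = {v: set() for v in scores}
    for u, v in combinations(scores, 2):
        if u[0] == v[0]:
            continue                 # conformations of one chain never connect
        if clash.get(frozenset((u, v)), 0.0) <= threshold:
            adj[u].add(v)
            adj[v].add(u)
    best = max(maximal_cliques(adj), key=lambda c: sum(scores[v] for v in c))
    return best, sum(scores[v] for v in best)
```

Since no two conformations of the same chain are connected, every clique contains at most one conformation per chain, which is what makes a clique a valid assembly.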
As shown in fig. 1, the figure comprises an upper part, a middle part and a lower part. The upper part is a training schematic of the deep-learning-based main chain probability prediction module. The middle part is the workflow of complex modeling. The inputs of the invention are an electron microscope density map and the sequences of multiple protein chains. The main chain probability map is predicted from the input density map using the trained deep learning model. An atomic model is predicted from each single-chain sequence using a structure prediction program. The models of the predicted chains are individually fitted to the main chain probability map by fast Fourier transform (FFT) based fitting and optimization. Using the fitting results for all single chains, the final protein complex structure is constructed by the Bron-Kerbosch maximum clique algorithm. Finally, the quality of the built model is estimated using the main chain matching score between the built model and the main chain probability map. The lower part is a schematic diagram of the Bron-Kerbosch maximum clique algorithm: the collision scores between the fitted conformations of different chains are computed, and two conformations are connected if their collision score is below a certain threshold and left unconnected otherwise. Based on these connections, a graph is generated whose vertices carry the fitting scores of the different conformations from the different chains. The Bron-Kerbosch search algorithm is used to find the highest-scoring combination of different chains as the final protein complex structure.
It is contemplated that some chains may not assemble into protein complexes by only one round of the Bron-Kerbosch algorithm. Therefore, the present invention employs an iterative strategy for composite assembly, preferably the method further comprises: s6, removing the region of the existing protein complex structure from the main chain probability map to update the main chain probability map, and returning to the step S2 until all single chains are assembled into the protein complex structure.
As shown in fig. 2 (b), the iterative modeling strategy is adopted, and after each modeling round, the density region of the existing structure is removed in the next modeling round of fitting. After each round of modeling, unassembled chains are iteratively assembled onto the complex under the direction of the updated backbone probability map. That is, after each round of Bron-Kerbosch algorithm, the main chain probability regions of the existing fitted structures are removed from the assembly of the next round. And fitting and optimizing the unassembled chains according to the removed main chain diagram, and iteratively assembling the unassembled chains into the complex structure to obtain the protein complex structure.
After each round of assembly of the protein complex by the Bron-Kerbosch algorithm, the collision score $c(z_i; D)$ between the current complex structure $D$ and each main chain point $z_i$ with main chain probability $P(z_i)$ ($i = 1, \dots, L$) is calculated based on the collision score described above (in this process, the set of main chain points can be treated as a set of atoms), and the main chain probabilities are then updated according to the following formula:
$P'(z_i; D) = P(z_i)\,(1.0 - c(z_i; D))$
By updating the main chain probability of the main chain points, the regions occupied by the assembled structure are removed from further structure assembly.
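As a minimal illustration of this update rule, the Python sketch below applies P'(z_i; D) = P(z_i)(1.0 - c(z_i; D)) to a list of main chain points. The function name and the clamping of collision scores to the range [0, 1] are assumptions made for the example, and the collision scores are taken as precomputed inputs.

```python
def update_main_chain_probabilities(probabilities, collision_scores):
    """Down-weight main chain points that overlap the assembled structure.

    probabilities:    list of P(z_i), one per main chain point
    collision_scores: list of c(z_i; D), one per main chain point
    """
    if len(probabilities) != len(collision_scores):
        raise ValueError("one collision score is required per main chain point")
    updated = []
    for p, c in zip(probabilities, collision_scores):
        # Assumption: scores outside [0, 1] are clamped so that the
        # updated probability stays between 0 and P(z_i).
        c = min(max(c, 0.0), 1.0)
        updated.append(p * (1.0 - c))
    return updated
```

A point fully covered by the assembled structure (collision score 1.0) drops to probability 0 and no longer attracts chains in the next round, while a point with collision score 0.0 keeps its original probability.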
The embodiment of the invention provides a system for constructing a protein complex structure based on a medium-resolution cryoelectron microscope image, comprising:
a computer-readable storage medium and a processor;
the computer-readable storage medium is used for storing executable instructions;
the processor is used for reading the executable instructions stored in the computer readable storage medium and executing the method according to any one of the above embodiments.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (10)
1. A method for constructing a protein complex structure based on a medium-resolution cryoelectron microscope image is characterized by comprising the following steps:
S1, inputting the cryoelectron microscope image into a pre-trained prediction model to obtain a corresponding main chain probability map;
S2, predicting a corresponding atomic model from each of the L single-chain sequences;
S3, rigidly fitting each single-chain atomic model to the main chain probability map to obtain a plurality of rigidly fitted conformations, and keeping the M rigidly fitted conformations with the highest main chain matching scores;
S4, splitting each single-chain atomic model into n short structural domains, and constructing an undirected graph V1 according to the connection mode of the n short structural domains; taking each short structural domain in turn as a seed domain, and performing structural domain optimization on the M rigidly fitted conformations with the highest main chain matching scores based on a simplex method, following a breadth-first search order on V1, to obtain M × n domain-optimized conformations of each single chain;
S5, for each single chain, selecting the K conformations with the highest main chain matching scores from its M rigidly fitted conformations and its M × n domain-optimized conformations, to obtain L × K conformations from which an undirected graph V2 is constructed, wherein each vertex of V2 carries the main chain matching score of a conformation, and each edge carries the atomic conflict score between conformations of two different single chains; and removing edges with atomic conflict scores exceeding a threshold value from the undirected graph, and determining, based on a Bron-Kerbosch search algorithm, the combination of single-chain conformations with the highest total main chain matching score as the constructed protein complex structure.
2. The method according to claim 1, wherein the prediction model is obtained by training with cryoelectron microscope images as training samples and the main chain probability maps of the PDB structures corresponding to the cryoelectron microscope images as labels.
3. The method of claim 2, wherein the main chain probability map is constructed from the PDB structure associated therewith on a three-dimensional grid with a fixed grid spacing, according to the following formula:
wherein x is the position vector of a grid point; P(x) is the probability of finding a main chain atom at grid point x; the set A consists of the position vectors of all main chain atoms N, Cα and C; $k = (\pi/(2.4 + 0.8R))^2$; and R is the resolution of the cryoelectron microscope image.
4. The method of claim 3, wherein the main chain probability map is represented in a simplified manner using main chain points; the main chain points are generated from the main chain probability map by a mean shift clustering algorithm.
5. The method of claim 1, wherein the main chain matching score is calculated by:
wherein the formula yields the main chain matching score of conformation Y; z is the position vector of a main chain point; Z is the set of all main chain points; P(z) is the main chain probability of main chain point z; $y_q$ ($q = 1, \dots, Q$) are the main chain atoms of conformation Y; $k = (\pi/(2.4 + 0.8R))^2$; R is the resolution of the cryoelectron microscope image; and $\theta = (k/\pi)^{1.5}$.
6. The method of claim 1, wherein the atomic conflict score between conformation A and conformation B is calculated by the formula:
7. The method of claim 1, wherein the method further comprises: s6, removing the region of the existing protein complex structure from the main chain probability map to update the main chain probability map, and returning to the step S2 until all single chains are assembled into the protein complex structure.
8. The method of claim 1, wherein in step S3, an FFT-based global search and simplex method are used to rigidly fit the atomic model of each single chain to the main chain probability map.
9. The method of claim 1 or 2, wherein in step S4, each of the L single-chain atomic models is split into n short structural domains based on a SWORD domain splitting method.
10. A system for constructing a protein complex structure based on medium resolution cryoelectron microscopy images, comprising: a computer-readable storage medium and a processor;
the computer-readable storage medium is used for storing executable instructions;
the processor is configured to read executable instructions stored in the computer-readable storage medium and execute the method according to any one of claims 1-9.
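The breadth-first optimization order used in step S4 of claim 1 can be sketched as follows. This is only an illustration under assumptions: the adjacency-list representation of the domain graph V1, the integer domain labels and the function names are hypothetical, and the simplex-based optimization of each domain is not shown. The sketch only computes the order in which domains would be visited from each seed domain.

```python
from collections import deque


def bfs_order(adjacency, seed):
    """Return the domains of V1 in breadth-first order starting from seed.

    adjacency: dict domain id -> list of connected domain ids
    seed:      domain id used as the seed domain
    """
    order = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


def all_seed_orders(adjacency):
    """Map every domain, taken in turn as the seed, to its visiting order."""
    return {seed: bfs_order(adjacency, seed) for seed in adjacency}


# Hypothetical chain of four domains connected linearly: 0 - 1 - 2 - 3.
domain_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
orders = all_seed_orders(domain_graph)
```

With n domains this yields n visiting orders per rigidly fitted conformation, which is consistent with the M × n domain-optimized conformations per chain stated in step S4.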
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210709043.1A CN115083513B (en) | 2022-06-21 | 2022-06-21 | Method for constructing protein complex structure based on medium-resolution cryoelectron microscope image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115083513A (en) | 2022-09-20 |
CN115083513B (en) | 2023-03-10 |
Family
ID=83253592
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202210709043.1A | Active | CN115083513B (en) | 2022-06-21 | 2022-06-21 | Method for constructing protein complex structure based on medium-resolution cryoelectron microscope image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115083513B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115497553A (en) * | 2022-09-29 | 2022-12-20 | 水木未来(杭州)科技有限公司 | Protein three-dimensional structure modeling method and device, electronic device and storage medium |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060121455A1 (en) * | 2003-04-14 | 2006-06-08 | California Institute Of Technology | COP protein design tool |
US20170103161A1 (en) * | 2015-10-13 | 2017-04-13 | The Governing Council Of The University Of Toronto | Methods and systems for 3d structure estimation |
US20190156911A1 (en) * | 2016-04-27 | 2019-05-23 | Massachusetts Institute Of Technology | Stable nanoscale nucleic acid assemblies and methods thereof |
CN111210869A (en) * | 2020-01-08 | 2020-05-29 | 中山大学 | Protein cryoelectron microscope structure analysis model training method and analysis method |
US20200300763A1 (en) * | 2017-12-05 | 2020-09-24 | Simon Fraser University | Methods for analysis of single molecule localization microscopy to define molecular architecture |
US20200333270A1 (en) * | 2017-10-06 | 2020-10-22 | The Governing Council Of The University Of Toronto | Methods and systems for 3d structure estimation using non-uniform refinement |
CN111968707A (en) * | 2020-08-07 | 2020-11-20 | 上海交通大学 | Energy-based atomic structure and electron density map multi-objective optimization fitting prediction method |
WO2020239822A1 (en) * | 2019-05-27 | 2020-12-03 | The European Molecular Biology Laboratory | Nucleic acid construct binding to influenza polymerase pb1 rna synthesis active site |
WO2020243839A1 (en) * | 2019-06-07 | 2020-12-10 | Structura Biotechnology Inc. | Methods and systems for determining variability of cryo-em protein structures |
WO2021178508A1 (en) * | 2020-03-05 | 2021-09-10 | University Of Washington | Rigid helical junctions for modular repeat protein sculpting and methods of use |
CN113990384A (en) * | 2021-08-12 | 2022-01-28 | 清华大学 | Deep learning-based frozen electron microscope atomic model structure building method and system and application |
WO2022040423A2 (en) * | 2020-08-19 | 2022-02-24 | University Of Pittsburgh-Of The Commonwealth System Of Higher Education | Coronavirus nanobodies and methods for their use and identification |
WO2022081920A1 (en) * | 2020-10-15 | 2022-04-21 | The Regents Of The University Of California | Systems for and methods of treatment selection |
US20220189579A1 (en) * | 2020-12-14 | 2022-06-16 | University Of Washington | Protein complex structure prediction from cryo-electron microscopy (cryo-em) density maps |
Non-Patent Citations (2)
Title |
---|
Jiahua He et al.: "Full-length de novo protein structure determination from cryo-EM maps using deep learning", Bioinformatics * |
Xiaogen Zhou et al.: "Progressive assembly of multi-domain protein structures from cryo-EM density maps", Nature Computational Science * |
Also Published As
Publication number | Publication date |
---|---|
CN115083513B (en) | 2023-03-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |