WO2023147706A1 - Neural network model training method and resolution estimation method for cryo-electron microscope density map - Google Patents
- Publication number
- WO2023147706A1 (international application PCT/CN2022/075408)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- value
- resolution
- cryo
- density map
- global
Classifications
- G06N3/04: Architecture, e.g. interconnection topology (G Physics > G06 Computing; calculating or counting > G06N Computing arrangements based on specific computational models > G06N3/00 Computing arrangements based on biological models > G06N3/02 Neural networks)
- G06N3/08: Learning methods (G Physics > G06 Computing; calculating or counting > G06N Computing arrangements based on specific computational models > G06N3/00 Computing arrangements based on biological models > G06N3/02 Neural networks)
- G06T7/207: Analysis of motion for motion estimation over a hierarchy of resolutions (G Physics > G06 Computing; calculating or counting > G06T Image data processing or generation, in general > G06T7/00 Image analysis > G06T7/20 Analysis of motion)
Definitions
- The present application relates to the technical field of resolution estimation for cryo-electron microscope (cryo-EM) density maps, and in particular to a neural network model training method and device, a cryo-EM density map resolution estimation method and device, a computer device, and a storage medium.
- Estimating the resolution of cryo-EM density maps is a critical step in determining atomic structures.
- The resolution of a cryo-EM density map includes a global resolution and a local resolution.
- Different algorithms are used to estimate the global resolution and the local resolution, and a given resolution estimation method can estimate only one kind of resolution, that is, either the global resolution or the local resolution.
- The global resolution can be estimated with the Fourier shell correlation (FSC) algorithm.
- The local resolution can be estimated with the ResMap algorithm.
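For readers unfamiliar with FSC, the NumPy sketch below computes the correlation between the Fourier transforms of two equally sized cubic maps (typically two half-maps), one spherical shell at a time. It is a simplified illustration, not the implementation used by this application or by any particular package. Shells are integer voxel radii, and only cubic maps are handled. In practice the global resolution is read off where the curve drops below a chosen threshold, and 0.143 is a commonly used value.

```python
import numpy as np


def fourier_shell_correlation(map1: np.ndarray, map2: np.ndarray) -> np.ndarray:
    """Return one FSC value per integer-radius Fourier shell (in voxels), shell 0 first.

    Sketch only: assumes two cubic maps of identical shape.
    """
    assert map1.shape == map2.shape and len(set(map1.shape)) == 1, "cubic maps of equal size"
    n = map1.shape[0]
    # Centre the zero frequency so shell radii can be measured from index n // 2.
    f1 = np.fft.fftshift(np.fft.fftn(map1))
    f2 = np.fft.fftshift(np.fft.fftn(map2))
    coords = np.arange(n) - n // 2
    zz, yy, xx = np.meshgrid(coords, coords, coords, indexing="ij")
    radius = np.rint(np.sqrt(zz**2 + yy**2 + xx**2)).astype(int)

    fsc = np.zeros(n // 2)
    for r in range(n // 2):  # shells up to the Nyquist radius
        shell = radius == r
        numerator = np.sum(f1[shell] * np.conj(f2[shell])).real
        denominator = np.sqrt(np.sum(np.abs(f1[shell]) ** 2) * np.sum(np.abs(f2[shell]) ** 2))
        fsc[r] = numerator / denominator if denominator > 0 else 0.0
    return fsc


rng = np.random.default_rng(0)
half_a = rng.normal(size=(32, 32, 32))
half_b = rng.normal(size=(32, 32, 32))
print(fourier_shell_correlation(half_a, half_a)[:3])  # identical maps: 1 in every shell (up to rounding)
print(fourier_shell_correlation(half_a, half_b)[:3])  # independent noise: values scattered around 0
```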
- One of the inputs of the Blocres method is a pair of half-maps. Therefore, to estimate the resolution of a cryo-EM density map downloaded from EMDB or obtained by other means, the half-maps must be obtained first. Half-maps are not always provided, so the input data for resolution estimation can be difficult to obtain or may require complex preparatory work.
- To address two problems in the prior art, namely that the input data of resolution estimation algorithms for cryo-EM density maps are difficult to obtain and that the calculation time is long, embodiments of the present application provide a neural network model training method and device, a cryo-EM density map resolution estimation method and device, a computer device, and a storage medium.
- A first aspect of the present application provides a training method for a neural network model, including: determining a mask value, a local resolution fluctuation value, and a global resolution value based on a first target cryo-EM density map, the first target cryo-EM density map being annotated with a mask value label, a local resolution fluctuation value label, and a global resolution value label; and training the neural network model based on the mask value, the local resolution fluctuation value, and the global resolution value, so that the mask value approaches the mask value label, the local resolution fluctuation value approaches the local resolution fluctuation value label, and the global resolution value approaches the global resolution value label.
- Determining the mask value, the local resolution fluctuation value, and the global resolution value based on the first target cryo-EM density map includes: performing residual-module-based encoding on the first target cryo-EM density map to obtain m feature maps; decoding the m feature maps to obtain an expected density map; determining the mask value and the local resolution fluctuation value based on the expected density map; and determining the global resolution value based on the top-level feature map among the m feature maps.
- Determining the mask value based on the expected density map includes: passing the expected density map sequentially through a convolution operation with a 3*3 kernel and a convolution operation with a 1*1 kernel to obtain the mask value.
- Determining the local resolution fluctuation value based on the expected density map includes: classifying the expected density map to obtain multiple first categories and a weight for each first category; and taking the sum of the products of each first category's weight and the first preset value it represents as the local resolution fluctuation value.
- Determining the global resolution value based on the top-level feature map among the m feature maps includes: classifying the top-level feature map to obtain multiple second categories and a weight for each second category; and taking the sum of the products of each second category's weight and the second preset value it represents as the global resolution value.
- Training the neural network model based on the mask value, the local resolution fluctuation value, and the global resolution value includes: determining a first loss function based on the mask value and the mask value label; determining a second loss function based on the local resolution fluctuation value and the local resolution fluctuation value label; determining a third loss function based on the global resolution value and the global resolution value label; determining a total loss function based on the first, second, and third loss functions; and updating the parameters of the neural network model based on the gradient of the total loss function.
- Before the mask value, the local resolution fluctuation value, and the global resolution value are determined based on the first target cryo-EM density map, the method further includes: cropping the cryo-EM density map to obtain the circumscribed cube of the biological macromolecule; and scaling the circumscribed cube to obtain the first target cryo-EM density map.
- A second aspect of the present application provides a neural-network-based method for estimating the resolution of a cryo-EM density map, including: determining a mask value, a local resolution fluctuation value, and a global resolution value based on a second target cryo-EM density map; and determining a local resolution value based on the mask value, the local resolution fluctuation value, and the global resolution value.
- A third aspect of the present application provides a neural network model training device, including: a first determination module, which determines a mask value, a local resolution fluctuation value, and a global resolution value based on the first target cryo-EM density map, the first target cryo-EM density map being annotated with a mask value label, a local resolution fluctuation value label, and a global resolution value label; and a training module, which trains the neural network model based on the mask value, the local resolution fluctuation value, and the global resolution value, so that the mask value approaches the mask value label, the local resolution fluctuation value approaches the local resolution fluctuation value label, and the global resolution value approaches the global resolution value label.
- A fourth aspect of the present application provides a neural-network-based cryo-EM density map resolution estimation device, including: a first determination module, which determines a mask value, a local resolution fluctuation value, and a global resolution value based on the second target cryo-EM density map; and a second determination module, which determines the local resolution value based on the mask value, the local resolution fluctuation value, and the global resolution value.
- A fifth aspect of the present application provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable by the processor. When the processor executes the computer program, it implements the steps of the neural network model training method, or of the neural-network-based cryo-EM density map resolution estimation method, provided by any of the above embodiments.
- A sixth aspect of the present application provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, it implements the steps of the neural network model training method, or of the neural-network-based cryo-EM density map resolution estimation method, provided by any of the above embodiments.
- With the neural network model training method and device, the cryo-EM density map resolution estimation method and device, the computer device, and the storage medium provided in this application, the mask value, the local resolution fluctuation value, and the global resolution value can be estimated simultaneously from a single cryo-EM density map. The local resolution value can then be determined from the mask value, the local resolution fluctuation value, and the global resolution value. This overcomes the limitation that conventional resolution estimation methods evaluate a cryo-EM density map along only one dimension, that is, global resolution or local resolution. In addition, the estimation method provided in this embodiment requires neither half-maps nor masks, and parameters do not need to be supplied or tuned manually.
- FIG. 1 is a schematic diagram of resolution distribution of training samples provided by an embodiment of the present application.
- FIG. 2 is a structure diagram of a neural network model provided by an embodiment of the present application.
- FIG. 3 is a flowchart of a training method for a neural network model provided by an embodiment of the present application.
- FIG. 4 is a schematic diagram of an execution process of step S310 provided by an embodiment of the present application.
- FIG. 5 is a schematic diagram of an execution process of step S320 provided by an embodiment of the present application.
- FIG. 6 is a logical framework of a method for estimating the resolution of a cryo-EM density map based on a neural network model provided by an embodiment of the present application.
- FIG. 7 is a flowchart of a method for estimating the resolution of a cryo-EM density map based on a neural network model provided by an embodiment of the present application.
- FIG. 8 compares, for each cryo-EM density map in the test set, the global resolution estimated by the CryoRes method with the global resolution published on EMDB.
- FIG. 9 compares, for each cryo-EM density map in the test set, the median local resolution estimated by the ResMap method with the global resolution estimated by the CryoRes method and the global resolution published on EMDB.
- FIG. 10 shows, for each cryo-EM density map in the test set, the IoU between the mask estimated by the CryoRes method and the mask label.
- FIG. 11 shows the confusion matrix of the IoU results between masks and mask labels.
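FIG. 10 and FIG. 11 are based on the IoU (intersection over union) between a predicted mask and its mask label. A minimal NumPy version of that metric for binary 3D masks is sketched below. The text does not say how empty masks are scored, so the choice made here is an assumption.

```python
import numpy as np


def mask_iou(pred_mask: np.ndarray, label_mask: np.ndarray) -> float:
    """Intersection over union of two binary 3D masks of the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    label = np.asarray(label_mask, dtype=bool)
    union = np.logical_or(pred, label).sum()
    if union == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return float(np.logical_and(pred, label).sum() / union)


pred = np.zeros((4, 4, 4), dtype=np.uint8)
label = np.zeros((4, 4, 4), dtype=np.uint8)
pred[:2] = 1    # slices 0-1: 32 voxels
label[1:3] = 1  # slices 1-2: 32 voxels, overlapping pred in slice 1 (16 voxels)
print(mask_iou(pred, label))  # 16 / 48
```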
- FIG. 12 is a structural block diagram of a neural network model training device provided by an embodiment of the present application.
- FIG. 13 is a structural block diagram of a device for estimating the resolution of a cryo-EM density map based on a neural network model provided by an embodiment of the present application.
- FIG. 14 is a structural block diagram of an electronic device provided by an embodiment of the present application.
- 3D U-Net takes a three-dimensional image as input. It is a fully convolutional network with a downsampling path, an upsampling path, and skip connections. Its downsampling and upsampling parts have fully symmetrical convolutional layers, and feature maps on the downsampling side skip the deeper layers and are concatenated to the corresponding upsampling side.
- In a residual network, a block computes $H(x) = F(x) + x$, where H(x) is the predicted value and F(x) = H(x) - x corresponds to the residual; hence the name residual network.
- Encoder-Decoder is a model architecture in deep learning.
- An Encoder is a network that receives input and outputs feature vectors. These feature vectors are actually another representation of the input features and information.
- Decoder is also a network (usually the same network structure as the encoder, but in the opposite direction), which takes the feature vector from the encoder and outputs the result that is the closest to the actual input or expected output.
- The Group Normalization (GN) algorithm divides the channel dimension into G groups, normalizes each group separately, and finally merges the normalized data of the G groups back into one feature map.
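Here is a NumPy sketch of the GN computation just described, for a single 3D feature map of shape (C, D, H, W). It omits the batch dimension. It also omits the learnable per-channel scale and shift that framework implementations add; for example, PyTorch's `torch.nn.GroupNorm` enables this affine step by default.

```python
import numpy as np


def group_norm(x: np.ndarray, num_groups: int, eps: float = 1e-5) -> np.ndarray:
    """Group Normalization of one (C, D, H, W) feature map.

    Sketch only: no batch dimension and no learnable per-channel scale/shift.
    """
    channels = x.shape[0]
    assert channels % num_groups == 0, "C must be divisible by the number of groups G"
    # Split the channel dimension into G groups of C // G channels each.
    grouped = x.reshape(num_groups, channels // num_groups, *x.shape[1:])
    # Statistics are computed per group, over its channels and all voxels.
    axes = tuple(range(1, grouped.ndim))
    mean = grouped.mean(axis=axes, keepdims=True)
    var = grouped.var(axis=axes, keepdims=True)
    normalized = (grouped - mean) / np.sqrt(var + eps)
    # Merge the G normalized groups back into a single feature map.
    return normalized.reshape(x.shape)


features = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(8, 4, 4, 4))
out = group_norm(features, num_groups=2)  # each group of 4 channels now has mean ~0, variance ~1
```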
- The rectified linear unit (ReLU) is a commonly used activation function in artificial neural networks. It usually refers to the nonlinear function represented by the ramp function, $f(x) = \max(0, x)$, and its variants.
- An activation function determines which neurons in a neural network are activated and passes the activated information to the next layer. It has nonlinearity, differentiability, and monotonicity.
- Three-dimensional reconstruction of a cryo-EM density map covers two cases.
- In the first case, the whole particle dataset is reconstructed in three dimensions, and the result is called a single map. In the second case, the particle data are randomly divided into two sub-datasets, and the results of reconstructing the two sub-datasets are called half-maps.
- Current algorithms for estimating the resolution of cryo-EM density maps can usually estimate only the local resolution or only the global resolution, so each algorithm serves a single function.
- The present application provides a training method for a neural network model. A model trained with this method can estimate the global resolution value and the local resolution value at the same time.
- Step 1: Prepare the training set.
- The cryo-EM density maps are actual experimental data downloaded from the Electron Microscopy Data Bank (EMDB).
- A total of 1523 cryo-EM density maps were selected, including density maps of proteins and density maps of nucleic acids.
- Of these, 1174 density maps were used as the training set and 349 as the test set for evaluating the model's estimation performance.
- FIG. 1 is a schematic diagram of resolution distribution of a training set and a test set provided by an embodiment of the present application.
- The global resolutions of all 1523 cryo-EM density maps are at least 1 Å and below 8 Å.
- The range from 1 Å to 8 Å is divided into six intervals: [1.0, 3.0), [3.0, 3.5), [3.5, 4.0), [4.0, 4.5), [4.5, 6.0), and [6.0, 8.0).
- The 1174 training maps are distributed across these intervals as follows: 112, 279, 298, 231, 130, and 124.
- The 349 test maps are distributed across these intervals as follows: 17, 81, 98, 61, 48, and 44.
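The interval bookkeeping above can be reproduced with `np.histogram`. This sketch only illustrates the binning. The example resolutions are made up and are not data from the application.

```python
import numpy as np

# Interval edges from the text: [1.0,3.0), [3.0,3.5), [3.5,4.0), [4.0,4.5), [4.5,6.0), [6.0,8.0)
EDGES = [1.0, 3.0, 3.5, 4.0, 4.5, 6.0, 8.0]


def count_per_interval(global_resolutions):
    """Count global resolutions (in angstrom) falling in each of the six intervals."""
    res = np.asarray(global_resolutions, dtype=float)
    assert np.all((res >= 1.0) & (res < 8.0)), "every map lies in [1, 8) angstrom"
    # np.histogram bins are half-open [lo, hi) except the last one, which is closed;
    # the assertion excludes 8.0, so all six bins behave as [lo, hi) here.
    counts, _ = np.histogram(res, bins=EDGES)
    return counts.tolist()


print(count_per_interval([1.0, 2.9, 3.0, 3.49, 3.5, 4.2, 4.5, 5.9, 6.0, 7.99]))
# [2, 2, 1, 1, 2, 2]
```

As a consistency check on the reported split, the training-set counts (112, 279, 298, 231, 130, 124) sum to 1174, and the test-set counts (17, 81, 98, 61, 48, 44) sum to 349.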
- Each cryo-EM density map has three types of labels: a global resolution value label, a local resolution fluctuation value label, and a mask value label.
- The global resolution value is a single number.
- The global resolution value published on the EMDB website is used as the global resolution value label of each cryo-EM density map, because it is the currently accepted accurate global resolution result.
- For the second type of label, the local resolution fluctuation value is used instead of the local resolution value.
- The Blocres method cuts the cryo-EM density map into small blocks with a sliding window. It uses the FSC method to compute the resolution of each block and takes that as the local resolution at the block's center, building up the local resolution of the entire map block by block.
- However, the Blocres method requires half-maps, which makes it difficult to assemble a larger training set.
- Therefore, the fluctuation value of the ResMap result is used as the local resolution fluctuation value label.
- ResMap can compute local resolution from a single map, but the local resolution it produces contains some error. Using the local resolution fluctuation value reduces this error and thereby improves the reliability of the label.
- The local resolution obtained by ResMap is a three-dimensional matrix in which some values are 100 and others are not. The average of all non-100 values is computed, and this average is subtracted from each non-100 value of the local resolution to obtain the local resolution fluctuation value.
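A minimal NumPy sketch of the label computation just described follows. The constant name `RESMAP_MARKER` is hypothetical. The text does not say what label the 100-valued voxels receive, so the sketch marks them as NaN and also returns a boolean `valid` mask that can be used to exclude them.

```python
import numpy as np

RESMAP_MARKER = 100.0  # value that marks excluded voxels in the ResMap output (per the text)


def local_resolution_fluctuation(local_res: np.ndarray):
    """Return (fluctuation, valid) for a ResMap local-resolution volume.

    fluctuation[v] = local_res[v] - mean(non-100 values) for every non-100 voxel v.
    Voxels equal to 100 are set to NaN; the text does not define the label there,
    so this choice is an assumption.
    """
    res = np.asarray(local_res, dtype=float)
    valid = res != RESMAP_MARKER
    fluctuation = np.full(res.shape, np.nan)
    fluctuation[valid] = res[valid] - res[valid].mean()
    return fluctuation, valid


volume = np.array([[[100.0, 3.0], [4.0, 5.0]]])  # mean of the non-100 values is 4.0
fluct, valid = local_resolution_fluctuation(volume)
print(fluct[valid])  # [-1.  0.  1.]
```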
- The mask value in this document is a three-dimensional matrix with the same dimensions as the three-dimensional density map. The mask is obtained from it by thresholding. Values less than 0 are set to 0, indicating the background, that is, regions without macromolecular information. Values greater than or equal to 0 are set to 1, indicating non-background regions, that is, regions containing macromolecular information.
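The thresholding rule above is a single comparison, and the NumPy sketch below applies it. Reading the raw mask values as logits is an assumption, since the text does not say whether a sigmoid is applied. Under that assumption, the cut at 0 is the same as a 0.5 cut on probabilities.

```python
import numpy as np


def threshold_mask(raw_mask: np.ndarray) -> np.ndarray:
    """Binarize mask-branch output: < 0 -> 0 (background), >= 0 -> 1 (macromolecule)."""
    return (np.asarray(raw_mask) >= 0).astype(np.uint8)


raw = np.array([-2.3, -0.01, 0.0, 0.7])
print(threshold_mask(raw))  # [0 0 1 1]
# If the raw values are logits (an assumption), the cut at 0 equals a cut at 0.5
# on sigmoid probabilities, because sigmoid(0) = 0.5.
probs = 1.0 / (1.0 + np.exp(-raw))
assert np.array_equal(threshold_mask(raw), (probs >= 0.5).astype(np.uint8))
```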
- The mask label is simulated from the Protein Data Bank (PDB) file corresponding to the density map.
- The width of the mask is, for example, 4 Å.
- Step 2: Build the neural network model.
- FIG. 2 is a structure diagram of a neural network model provided by an embodiment of the present application.
- The neural network model includes a Residual 3D-Unet module 21, a first branch module 22, a second branch module 23, and a third branch module 24.
- The Residual 3D-Unet module includes an encoding sub-module 211 and a decoding sub-module 212.
- The output of the encoding sub-module 211 is used as the input of the decoding sub-module 212.
- The output of the decoding sub-module 212 is used as the input of the first branch module 22 and the second branch module 23.
- The first branch module 22 outputs the mask value.
- The second branch module 23 outputs the local resolution fluctuation value.
- The output of the encoding sub-module 211 is also used as the input of the third branch module 24, which outputs the global resolution value.
- The encoding sub-module 211 includes at least one feature extraction unit and at least one downsampling unit, and these units are cascaded alternately.
- For example, the encoding sub-module 211 includes a first feature extraction unit, a downsampling unit, and a second feature extraction unit connected in sequence.
- The feature extraction unit is a residual sub-network including three convolutional layers.
- The downsampling unit is a max pooling layer.
- The decoding sub-module 212 includes at least one upsampling unit and at least one feature extraction unit, and these units are cascaded alternately.
- For example, the decoding sub-module 212 includes an upsampling unit and a feature extraction unit connected in sequence.
- The upsampling unit is a deconvolutional layer.
- The feature extraction unit is a residual sub-network consisting of three convolutional layers.
- The first branch module 22 includes a convolutional layer with a 3*3 kernel and a convolutional layer with a 1*1 kernel.
- The second branch module 23 adopts a classification + regression architecture.
- The second branch module 23 includes a convolutional layer with a 3*3 kernel, a convolutional layer with a 1*1 kernel, and a soft-Argmax layer.
- The third branch module 24 also adopts the classification + regression architecture.
- The third branch module 24 includes three convolutional layers with 3*3 kernels, two convolutional layers with 1*1 kernels, a global average pooling layer, and a soft-Argmax layer.
- A max pooling layer follows each of the first two 3*3 convolutional layers.
- FIG. 3 is a flowchart of a training method for a neural network model provided by an embodiment of the present application. As shown in Figure 3, the training method 300 includes:
- Step S310: Determine the mask value, the local resolution fluctuation value, and the global resolution value based on the first target cryo-EM density map, where the first target cryo-EM density map is annotated with a mask value label, a local resolution fluctuation value label, and a global resolution value label.
- Step S320: Train the neural network model based on the mask value, the local resolution fluctuation value, and the global resolution value, so that the mask value approaches the mask value label, the local resolution fluctuation value approaches the local resolution fluctuation value label, and the global resolution value approaches the global resolution value label.
- Here, the first target cryo-EM density maps are the 1174 training maps described above, after preprocessing.
- The preprocessing includes two steps. First, the cryo-EM density map is cropped to obtain the circumscribed cube of the biological macromolecule in the map. Second, the circumscribed cube is scaled to obtain the first target cryo-EM density map.
- The size of the first target cryo-EM density map is at most 248*248*248.
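Below is a minimal NumPy sketch of the two preprocessing steps: cropping to the circumscribed cube, then limiting the edge length to 248 voxels. The text does not specify several details, so these are all assumptions: the contour `threshold` used to locate the macromolecule, centring the cube on the molecule, nearest-neighbour resampling, and leaving maps that already fit unscaled.

```python
import numpy as np

MAX_EDGE = 248  # upper bound on each edge of the first target density map (from the text)


def crop_to_bounding_cube(density: np.ndarray, threshold: float) -> np.ndarray:
    """Crop to a cube enclosing all voxels with density above `threshold`.

    `threshold` is a hypothetical contour level; the text does not say how the
    extent of the macromolecule is found. The cube is shifted to stay inside the
    volume and may be truncated if the volume is thinner than the cube along an axis.
    """
    idx = np.argwhere(density > threshold)
    if idx.size == 0:
        return density
    lo, hi = idx.min(axis=0), idx.max(axis=0) + 1
    side = int((hi - lo).max())                # edge of the circumscribed cube
    start = (lo + hi) // 2 - side // 2         # centre the cube on the molecule
    start = np.maximum(np.minimum(start, np.array(density.shape) - side), 0)
    return density[tuple(slice(s, s + side) for s in start)]


def downscale_nearest(volume: np.ndarray, max_edge: int = MAX_EDGE) -> np.ndarray:
    """Nearest-neighbour resampling so that no edge exceeds max_edge.

    Assumption: only oversized maps are rescaled, and nearest-neighbour stands in
    for whatever interpolation the original method uses.
    """
    scale = min(1.0, max_edge / max(volume.shape))
    if scale == 1.0:
        return volume
    new_shape = [max(1, int(round(n * scale))) for n in volume.shape]
    index = [np.minimum((np.arange(m) / scale).astype(int), n - 1)
             for m, n in zip(new_shape, volume.shape)]
    return volume[np.ix_(*index)]


density = np.zeros((10, 10, 10))
density[2:5, 3:8, 4:6] = 1.0
cube = crop_to_bounding_cube(density, threshold=0.5)
print(cube.shape, cube.sum())  # (5, 5, 5) 30.0
target = downscale_nearest(cube)  # unchanged here: 5 <= 248
```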
- FIG. 4 is a schematic diagram of an execution process of step S310 provided by an embodiment of the present application. As shown in Figure 4, step S310 specifically includes:
- Step S311: Perform residual-module-based encoding on the first target cryo-EM density map to obtain m feature maps.
- This step is performed by the encoding sub-module 211.
- Each feature extraction unit in the encoding sub-module 211 outputs one feature map.
- First, feature extraction is performed on the first target cryo-EM density map to obtain the first feature map.
- The feature extraction process is based on a residual module.
- The first target cryo-EM density map first undergoes a GN operation and then a first convolution operation, yielding the first sub-feature map.
- The first sub-feature map then undergoes a GN operation, a second convolution operation, another GN operation, and a third convolution operation in sequence, yielding the second sub-feature map.
- A ReLU operation is applied to the first sub-feature map, and the rectified first sub-feature map is added to the second sub-feature map to obtain the first feature map.
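To make the data flow of one residual feature extraction unit concrete, here is a NumPy sketch of the steps above. It is an illustration under stated assumptions, not the authors' code. The "3*3" kernels are taken to be 3×3×3 in 3D, and there is no batch dimension. GN has no learnable scale or shift. The channel count and random weights are hypothetical, and the input GN uses `math.gcd` so that a single-channel density map is normalized as one group.

```python
import math

import numpy as np


def group_norm(x, num_groups, eps=1e-5):
    """GN over a (C, D, H, W) map, without learnable affine (simplification)."""
    assert x.shape[0] % num_groups == 0, "C must be divisible by the number of groups"
    g = x.reshape(num_groups, -1)  # consecutive channels form one group
    g = (g - g.mean(axis=1, keepdims=True)) / np.sqrt(g.var(axis=1, keepdims=True) + eps)
    return g.reshape(x.shape)


def conv3d_same(x, weight, bias):
    """'Same'-padded 3D convolution (cross-correlation, as in DL frameworks).

    x: (C_in, D, H, W); weight: (C_out, C_in, k, k, k) with odd k; bias: (C_out,).
    """
    k = weight.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p), (p, p)))
    d, h, w = x.shape[1:]
    out = np.zeros((weight.shape[0], d, h, w))
    for i in range(k):
        for j in range(k):
            for l in range(k):
                window = xp[:, i:i + d, j:j + h, l:l + w]
                out += np.einsum("oc,cdhw->odhw", weight[:, :, i, j, l], window)
    return out + bias[:, None, None, None]


def residual_feature_extraction(x, params, num_groups):
    """One feature extraction unit; params = [(w1, b1), (w2, b2), (w3, b3)]."""
    (w1, b1), (w2, b2), (w3, b3) = params
    # First sub-feature map: GN -> conv1 (gcd gives one group for a 1-channel input).
    s1 = conv3d_same(group_norm(x, math.gcd(num_groups, x.shape[0])), w1, b1)
    # Second sub-feature map: GN -> conv2 -> GN -> conv3.
    s2 = conv3d_same(group_norm(s1, num_groups), w2, b2)
    s2 = conv3d_same(group_norm(s2, num_groups), w3, b3)
    return np.maximum(s1, 0.0) + s2  # ReLU(first sub-feature map) + second sub-feature map


rng = np.random.default_rng(0)
x = rng.normal(size=(1, 6, 6, 6))  # single-channel density map patch
c = 4                              # hypothetical channel count
params = [
    (rng.normal(scale=0.1, size=(c, 1, 3, 3, 3)), np.zeros(c)),
    (rng.normal(scale=0.1, size=(c, c, 3, 3, 3)), np.zeros(c)),
    (rng.normal(scale=0.1, size=(c, c, 3, 3, 3)), np.zeros(c)),
]
first_feature_map = residual_feature_extraction(x, params, num_groups=2)
print(first_feature_map.shape)  # (4, 6, 6, 6)
```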
- Next, the first feature map is downsampled to obtain a downsampled feature map, and feature extraction is performed on the downsampled feature map to obtain the second feature map.
- This feature extraction is also based on the residual module. The specific process is the same as that for obtaining the first feature map and is not repeated here.
- The encoding sub-module 211 shown in FIG. 2 includes only two feature extraction units and one downsampling unit, so two feature maps are obtained. In other embodiments, the encoding sub-module 211 may include three feature extraction units and two downsampling units, four feature extraction units and three downsampling units, and so on. The embodiments of the present application do not limit the number of feature extraction units and downsampling units in the encoding sub-module 211.
- Step S311 can be summarized as follows. For the i-th feature map, when i equals 1, feature extraction is performed on the first target cryo-EM density map to obtain the first feature map. When i is greater than 1, the (i-1)-th feature map is downsampled to obtain a downsampled feature map, and feature extraction is performed on the downsampled feature map to obtain the i-th feature map.
- Here, i is a positive integer with 1 ≤ i ≤ m.
- The feature extraction process refers to the residual-module-based feature extraction process.
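The recursion in step S311 can be written as a short loop. In this NumPy sketch, `extract` is a placeholder for the residual feature extraction unit, and an identity function is passed only to show the shapes. The 2×2×2 non-overlapping max-pooling window is an assumption; the text says only that the downsampling unit is a max pooling layer.

```python
import numpy as np


def max_pool3d(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max pooling of a (C, D, H, W) map (window size is an assumption)."""
    c, d, h, w = x.shape
    d2, h2, w2 = d // size, h // size, w // size
    x = x[:, :d2 * size, :h2 * size, :w2 * size]  # drop voxels that do not fill a window
    return x.reshape(c, d2, size, h2, size, w2, size).max(axis=(2, 4, 6))


def encode(density_map: np.ndarray, extract, m: int) -> list:
    """Step S311: feature map 1 = extract(input); feature map i = extract(pool(map i - 1))."""
    feature_maps = []
    for i in range(1, m + 1):
        source = density_map if i == 1 else max_pool3d(feature_maps[-1])
        feature_maps.append(extract(source))
    return feature_maps


# Stand-in extractor for illustration only; the real unit is the residual module.
maps = encode(np.zeros((1, 8, 8, 8)), extract=lambda t: t, m=3)
print([fm.shape for fm in maps])  # [(1, 8, 8, 8), (1, 4, 4, 4), (1, 2, 2, 2)]
top_level = maps[-1]  # the top-level feature map feeds the global-resolution branch
```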
- Step S312: Decode the m feature maps to obtain an expected density map.
- Specifically, in step S312 the second feature map undergoes nonlinear rectification and deconvolution in sequence to obtain an upsampled feature map.
- Feature extraction is then performed on the sum of the upsampled feature map and the nonlinearly rectified feature map to obtain the expected density map.
- This feature extraction is based on the residual module. For details, see the process of obtaining the first feature map above.
- Step S313: Determine the mask value and the local resolution fluctuation value based on the expected density map.
- The mask value is determined from the expected density map by the first branch module 22.
- The expected density map is passed sequentially through a convolutional layer with a 3*3 kernel and a convolutional layer with a 1*1 kernel to obtain the mask value.
- The local resolution fluctuation value is determined from the expected density map by the second branch module 23.
- The expected density map is classified to obtain multiple first categories and a weight for each first category.
- Specifically, the expected density map is passed sequentially through a convolution operation with a 3*3 kernel and a convolution operation with a 1*1 kernel to obtain the first categories and their weights.
- The sum of the products of each first category's weight and the first preset value it represents is taken as the local resolution fluctuation value.
- The first preset values are set manually and can be chosen according to actual conditions.
- There are 37 first categories, and the first preset values they represent are: -5, -4.5, -4, -3.5, -3, -2.5, -2, -1.5, -1, -0.9, -0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5.
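The classification + regression (soft-Argmax) step above turns per-category weights into a continuous value. A NumPy sketch follows. It assumes the weights are a softmax over the 37 category scores at each voxel, which the text does not state explicitly. The names `FIRST_PRESET_VALUES` and `soft_argmax` are illustrative.

```python
import numpy as np

# The 37 first preset values from the text: -5 ... -1 in steps of 0.5,
# -0.9 ... 0.9 in steps of 0.1, then 1 ... 5 in steps of 0.5.
FIRST_PRESET_VALUES = np.concatenate([
    np.arange(-10, -1) / 2,   # -5.0, -4.5, ..., -1.0
    np.arange(-9, 10) / 10,   # -0.9, -0.8, ..., 0.9
    np.arange(2, 11) / 2,     # 1.0, 1.5, ..., 5.0
])


def soft_argmax(logits: np.ndarray, preset_values: np.ndarray) -> np.ndarray:
    """logits: (K, D, H, W) class scores; returns the (D, H, W) weighted sum of preset values.

    Assumption: the category weights are a softmax over the K classes.
    """
    z = logits - logits.max(axis=0, keepdims=True)  # subtract max for numerical stability
    weights = np.exp(z)
    weights /= weights.sum(axis=0, keepdims=True)
    return np.tensordot(preset_values, weights, axes=(0, 0))


logits = np.zeros((37, 2, 2, 2))
logits[np.argmin(np.abs(FIRST_PRESET_VALUES - 0.5))] = 50.0  # almost all weight on 0.5
print(np.round(soft_argmax(logits, FIRST_PRESET_VALUES), 3)[0, 0, 0])  # 0.5
```

The output is a weighted average rather than the single most likely category. It can therefore fall between the preset levels, and it stays differentiable, which keeps it usable with gradient-based training.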
- Step S314 determining a global resolution value based on the top-level feature map in the plurality of feature maps.
- step S314 is executed by the third branch module 24 .
- the top-level feature map is the second feature map, which is the output of the encoding sub-module 211 .
- the top-level feature map is classified to obtain multiple second categories and respective weights of the multiple second categories.
- the top-level feature map passes sequentially through three convolution operations with a 3×3 kernel, two convolution operations with a 1×1 kernel, and a global average pooling operation to obtain the multiple second categories and their respective weights.
- the first two of the 3×3 convolution operations are each followed by a pooling operation.
- the sum of the products of each second category's weight and the second preset value it represents is determined as the global resolution value.
- the second preset value is set manually and can be reasonably selected according to actual conditions.
- the number of second categories is ten.
- the second preset values respectively represented by the ten second categories are: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
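The global branch ends in global average pooling followed by a weighted sum over the ten preset values (1 to 10 Å).

The sketch below covers only that final step, using plain nested lists. It rests on the following assumptions:
- The convolution stack is omitted. Its output is taken to be one 3D score map per category.
- The weights are a softmax of the pooled scores.
- The helper names are hypothetical.

```python
import math

# The ten second preset values (global resolution, in angstroms).
GLOBAL_BINS = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

def global_average_pool(volume):
    """Average a 3D nested list (D x H x W) down to one number."""
    values = [v for plane in volume for row in plane for v in row]
    return sum(values) / len(values)

def global_resolution(class_volumes, bins=GLOBAL_BINS):
    """class_volumes: one 3D score map per category (assumed conv-stack output)."""
    if len(class_volumes) != len(bins):
        raise ValueError("need one score volume per category")
    scores = [global_average_pool(vol) for vol in class_volumes]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    # Weighted sum of the preset values, weights = softmax(pooled scores).
    return sum((e / total) * b for e, b in zip(exps, bins))

zeros = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
print(global_resolution([zeros] * 10))  # uniform weights -> about 5.5
```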
- FIG. 5 is a schematic diagram of an execution process of step S320 provided by an embodiment of the present application. As shown in Figure 5, step S320 specifically includes:
- Step S321 determining a first loss function based on the mask value and the mask value label.
- binary cross-entropy can be used as the first loss function: $Loss_{mask} = -\left[y\log\hat{y} + (1-y)\log(1-\hat{y})\right]$, where $\hat{y}$ is the output of the network and $y$ is the label value.
- Step S322 determining a second loss function based on the local resolution fluctuation value and the local resolution fluctuation value label.
- a log10-based loss can be used as the second loss function, computed from the network output $\hat{y}$ and the label value $y$.
- Step S323 determining a third loss function based on the global resolution value and the global resolution value label.
- MSE can be used as the third loss function: $Loss_{global} = (\hat{y} - y)^2$, where $\hat{y}$ is the output of the network and $y$ is the label value.
- Step S324 determining a total loss function based on the first loss function, the second loss function and the third loss function.
- $Loss_{all} = Loss_{global} + 10 \times Loss_{local} + Loss_{mask}$.
- Step S325 updating the parameters of the neural network model based on the gradient of the total loss function.
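The loss terms of steps S321 to S324 can be sketched in plain Python.

This is an illustrative sketch, not the authors' code, and it rests on the following assumptions:
- Binary cross-entropy and MSE are averaged over voxels or samples.
- Predictions are clamped away from 0 and 1 so that the logarithm stays finite.
- The log10-based local loss is taken as a precomputed input, because its formula is not given here.

In a deep learning framework, `total_loss` would be the scalar that is backpropagated in step S325.

```python
import math

def bce(preds, labels, eps=1e-7):
    """Mean binary cross-entropy; preds are mask probabilities, labels are 0/1."""
    total = 0.0
    for p, y in zip(preds, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(preds)

def mse(preds, labels):
    """Mean squared error, used here for the global resolution value."""
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(preds)

def total_loss(loss_global, loss_local, loss_mask, local_weight=10.0):
    """Loss_all = Loss_global + 10 * Loss_local + Loss_mask."""
    return loss_global + local_weight * loss_local + loss_mask

l_mask = bce([0.9, 0.2], [1, 0])
l_global = mse([3.4], [3.0])
l_local = 0.05  # placeholder for the (unspecified) log10-based local loss
print(total_loss(l_global, l_local, l_mask))
```

The weight of 10 on the local term follows the total-loss formula above. Presumably it balances the per-voxel fluctuation error against the other two terms.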
- the mask value, local resolution fluctuation value and global resolution value can be estimated simultaneously based on a cryo-electron microscope density map.
- This application tests the trained neural network model on a test set of 349 cryo-EM density maps.
- the test results show that the errors of local resolution estimation and global resolution estimation of the neural network model are both 0.44 angstroms, and the average Intersection over Union (IoU) of the mask value is 0.71.
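The mask IoU reported above can be computed as follows. IoU is the number of voxels marked as molecule in both masks, divided by the number marked in either mask.

This is a minimal sketch on flattened masks. It rests on the following assumptions:
- The predicted mask is binarized at a threshold of 0.5.
- Two empty masks count as IoU 1.0.
- The test-set figure is a plain mean over maps.

```python
def mask_iou(pred_mask, label_mask, threshold=0.5):
    """IoU of a predicted (probability) mask and a 0/1 label mask, both flattened."""
    pred = [p >= threshold for p in pred_mask]
    label = [bool(y) for y in label_mask]
    inter = sum(1 for p, y in zip(pred, label) if p and y)
    union = sum(1 for p, y in zip(pred, label) if p or y)
    return inter / union if union else 1.0  # two empty masks: perfect agreement

def mean_iou(pairs):
    """Average IoU over (pred_mask, label_mask) pairs, one pair per density map."""
    scores = [mask_iou(p, y) for p, y in pairs]
    return sum(scores) / len(scores)

print(mask_iou([0.9, 0.8, 0.1, 0.2], [1, 0, 0, 1]))  # 1 shared voxel / 3 in union
```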
- FIG. 6 is a logical framework of a method for estimating the resolution of a cryo-EM density map based on a neural network model provided by an embodiment of the present application.
- FIG. 7 is a flowchart of a method for estimating the resolution of a cryo-electron microscope density map (i.e., the CryoRes method) based on a neural network model provided by an embodiment of the present application. As shown in Figure 6 and Figure 7, the CryoRes method 700 includes:
- Step S710 preprocessing the cryo-electron microscope density map to obtain a second target cryo-electron microscope density map.
- the cryo-electron microscope density map here can be any cryo-electron microscope density map.
- the preprocessing process is, for example, cutting the cryo-electron microscope density map to obtain the circumscribed cube of the biomacromolecule in the cryo-electron microscope density map; scaling the size of the circumscribed cube of the biomacromolecule to obtain the second target cryo-electron microscope density map.
- the size of the second target cryo-EM density map is less than or equal to 248*248*248.
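The cropping step can be sketched on a nested-list volume. This is an illustration, not the authors' code, and it rests on the following assumptions:
- The macromolecule is taken to be every voxel above a chosen density threshold.
- The "circumscribed cube" is read as the bounding box expanded to a cube anchored at its minimum corner, with zero padding where the cube leaves the map.
- Only the scale factor that brings the side down to 248 voxels is computed. The resampling itself (for example trilinear interpolation) is not shown.
- The helper names are hypothetical.

```python
def bounding_cube(volume, threshold):
    """Return (start_zyx, side) of the smallest cube enclosing voxels > threshold."""
    coords = [(z, y, x)
              for z, plane in enumerate(volume)
              for y, row in enumerate(plane)
              for x, v in enumerate(row) if v > threshold]
    if not coords:
        raise ValueError("no voxel above the threshold")
    lo = [min(c[i] for c in coords) for i in range(3)]
    hi = [max(c[i] for c in coords) for i in range(3)]
    side = max(h - l + 1 for l, h in zip(lo, hi))  # longest extent -> cube side
    return tuple(lo), side

def crop_cube(volume, start, side, fill=0.0):
    """Cut a side^3 cube starting at `start`, padding with `fill` outside the map."""
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    z0, y0, x0 = start
    return [[[volume[z][y][x] if (z < D and y < H and x < W) else fill
              for x in range(x0, x0 + side)]
             for y in range(y0, y0 + side)]
            for z in range(z0, z0 + side)]

def scale_factor(side, max_size=248):
    """Factor that shrinks the cube to at most max_size voxels per axis."""
    return min(1.0, max_size / side)
```

For example, a molecule spanning 400 voxels would be resampled by a factor of 0.62. A molecule spanning 100 voxels would be left at its original size.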
- Step S720 determining a mask value, a local resolution fluctuation value and a global resolution value based on the second target cryo-EM density map.
- Step S730 determining a local resolution value based on the mask value, the local resolution fluctuation value and the global resolution value.
- the real value corresponding to each voxel in the second target cryo-EM density map can be obtained, which is the local resolution value of the voxel.
- the global resolution value and the local resolution fluctuation value are added element-wise (matrix addition) to obtain a first sum.
- the mask is multiplied by a first constant and a second constant is added to obtain a second sum.
- the first constant is -100 and the second constant is 100. The sum of the first sum and the second sum is determined as the local resolution value.
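Step S730 can be sketched per voxel. The local resolution is read as the first sum (global value plus fluctuation) plus the second sum (mask × (−100) + 100).

The sketch assumes a binarized mask. Inside the molecule (mask = 1) the offset is 0. In the background (mask = 0), 100 Å is added, which flags those voxels as effectively unresolved. The function name is hypothetical.

```python
def local_resolution(mask, fluctuation, global_res,
                     first_const=-100.0, second_const=100.0):
    """Per-voxel local resolution from flattened mask/fluctuation lists.

    first sum  = global_res + fluctuation   (global value broadcast to all voxels)
    second sum = mask * first_const + second_const
    result     = first sum + second sum
    """
    if len(mask) != len(fluctuation):
        raise ValueError("mask and fluctuation must have the same length")
    return [global_res + f + (m * first_const + second_const)
            for m, f in zip(mask, fluctuation)]

print(local_resolution([1, 0], [0.5, -0.5], 3.0))  # [3.5, 102.5]
```

With a soft (non-binarized) mask, the background offset would be fractional. This is why binarization is assumed here.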
- Table 1 shows the comparison results between the CryoRes method 700 shown in FIG. 6 and FIG. 7 and several conventional resolution estimation methods. As Table 1 shows, the method for estimating the resolution of a cryo-electron microscope density map provided in this embodiment can estimate the local resolution and the global resolution simultaneously from a single cryo-electron microscope density map. This overcomes the limitation of conventional resolution estimation methods, which can evaluate a cryo-EM density map along only one dimension, either global resolution or local resolution. In addition, the estimation method provided in this embodiment requires neither half-maps nor parameters such as masks.
- This application evaluates the performance of the CryoRes method 700 shown in FIG. 6 and FIG. 7 from three aspects, including: (1) local resolution; (2) global resolution; (3) mask.
- cryo-EM density maps were selected as test density maps to evaluate the performance of the CryoRes method 700.
- the first experimental density map is the cryo-EM structure of RelA bound to the 70S ribosome (EMDB: EMD-8108).
- the experimental density map was published in 2016, and its dimension is 400*400*400, and the voxel size is 1.34 Angstroms.
- the global resolution published on the EMDB website, determined by the Fourier shell correlation (FSC) cutoff at a threshold that is generally 0.143, is 3.0 angstroms.
- CryoRes, ResMap, and DeepRes each use the signal map as input to obtain the local resolution.
- Blocres and MonoRes each use half-maps as input to obtain the local resolution.
- the local resolution based on CryoRes ranges from 3.19 to 3.91 angstroms, and the average and standard deviation are 3.38 angstroms and 0.14 angstroms, respectively.
- the local resolution obtained based on Blocres ranged from 2.88 to 10.89 Å, with a mean and standard deviation of 3.39 Å and 0.77 Å, respectively.
- the local resolution obtained based on ResMap ranges from 2.9 to 5.9 Å, with a mean and standard deviation of 2.9 Å and 0.91 Å, respectively.
- the local resolution obtained based on MonoRes ranges from 2.68 to 8.93 Å, with a mean and standard deviation of 3.67 Å and 1.45 Å, respectively.
- the local resolution obtained based on DeepRes ranges from 2.68 to 6.64 Å, with a mean and standard deviation of 3.41 Å and 0.52 Å, respectively.
- the second experimental density map is the cryo-EM structure of ArfA and TtRF2 bound to the 70S ribosome (EMDB: EMD-3492).
- the experimental density map was published in 2016, and its dimension is 400*400*400, and the voxel size is 1.04 Angstroms.
- the global resolution published on the EMDB website, determined by the Fourier shell correlation (FSC) cutoff at a threshold that is generally 0.143, is 3.35 angstroms.
- CryoRes, ResMap, and DeepRes each use the signal map as input to obtain the local resolution.
- Blocres and MonoRes each use half-maps as input to obtain the local resolution.
- the local resolution based on CryoRes is in the range of 3.37-4.07 angstroms, and the average value and standard deviation are 3.57 angstroms and 0.12 angstroms, respectively.
- the local resolution obtained based on Blocres ranged from 3.17 to 11.27 Å, with a mean and standard deviation of 3.62 Å and 0.79 Å, respectively.
- the local resolution obtained based on ResMap ranged from 2.3 to 4.05 Å, and the mean and standard deviation were 2.3 Å and 0.26 Å, respectively.
- the local resolution obtained based on MonoRes ranged from 2.83 to 8.16 Å, with a mean and standard deviation of 4.08 Å and 1.1 Å, respectively.
- the local resolution obtained based on DeepRes ranges from 2.5 to 6.06 Å, with a mean and standard deviation of 2.91 Å and 0.49 Å, respectively.
- the third experimental density map is the cryo-EM structure of Gasdermin A3 membrane pores (EMDB: EMD-7450).
- the experimental density map was published in 2018. Its dimension is 380*380*380, and the voxel size is 1.0 Angstroms.
- the global resolution published on the EMDB website, determined by the Fourier shell correlation (FSC) cutoff at a threshold that is generally 0.143, is 4.4 angstroms.
- CryoRes, ResMap, and DeepRes each use the signal map as input to obtain the local resolution.
- Blocres and MonoRes each use half-maps as input to obtain the local resolution.
- the local resolution based on CryoRes is in the range of 3.58-4.46 angstroms, and the average value and standard deviation are 3.75 angstroms and 0.18 angstroms, respectively.
- the local resolution obtained based on Blocres ranged from 3.28 to 4.9 Å, with a mean and standard deviation of 3.7 Å and 0.31 Å, respectively.
- the local resolution obtained based on ResMap ranges from 2.2 to 2.45 Å, and the mean and standard deviation are 2.2 Å and 0.00 Å, respectively.
- the local resolution obtained based on MonoRes ranged from 2.0 to 7.31 Å, with a mean and standard deviation of 4.27 Å and 1.36 Å, respectively.
- the local resolution obtained based on DeepRes ranges from 3.45 to 8.24 Å, with a mean and standard deviation of 5.55 Å and 0.7 Å, respectively.
- the fourth experimental density map is the cryo-EM structure of the bacterial 30S-IF1-IF2-IF3-mRNA-tRNA pre-translation initiation complex (EMDB: EMD-4082).
- the experimental density map was published in 2016. Its dimensions are 260*260*260 and the voxel size is 1.34 Angstroms.
- the global resolution published on the EMDB website, determined by the Fourier shell correlation (FSC) cutoff at a threshold that is generally 0.143, is 8.3 angstroms.
- CryoRes, ResMap, and DeepRes each use the signal map as input to obtain the local resolution.
- Blocres and MonoRes each use half-maps as input to obtain the local resolution.
- the local resolution based on CryoRes is in the range of 7.57-9.05 angstroms, and the average value and standard deviation are 7.92 angstroms and 0.25 angstroms, respectively.
- the local resolution based on Blocres ranged from 6.48 to 33.96 Å, with a mean and standard deviation of 9.25 Å and 2.47 Å, respectively.
- the local resolution obtained based on ResMap ranged from 8.9 to 13.4 Å, with a mean and standard deviation of 11.15 Å and 1.05 Å, respectively.
- the local resolution obtained based on MonoRes ranges from 2.68 to 20.49 Å, with a mean and standard deviation of 8.5 Å and 4.59 Å, respectively.
- the local resolution obtained based on DeepRes ranges from 2.68 to 12.9 Å, with a mean and standard deviation of 8.69 Å and 1.05 Å, respectively.
- FIG. 8 shows a comparison result of the global resolution of each cryo-EM density map in the test set obtained based on the CryoRes method 700 and the global resolution of each cryo-EM density map published on EMDB.
- for most maps, the global resolution obtained with the CryoRes method 700 is close to the global resolution published on EMDB, with an error of less than 1 angstrom. For a few cryo-electron microscope density maps the error exceeds 1 angstrom, but it is generally within 2 angstroms.
- Figure 9 compares, for each cryo-EM density map in the test set, the median local resolution obtained with the ResMap method and the global resolution obtained with the CryoRes method against the global resolution published on EMDB.
- the ordinate in Fig. 9 is the difference between the median obtained by the ResMap method (or the global resolution obtained by the CryoRes method) and the global resolution published on EMDB. Comparing Fig. 8 and Fig. 9 shows that the error between the ResMap median and the EMDB global resolution is larger than the error between the global resolution obtained with the CryoRes method 700 and the EMDB global resolution.
- the error between the median local resolution obtained with the ResMap method and the global resolution published on EMDB is negatively correlated with the resolution of the cryo-EM density map; that is, the lower the resolution, the larger the error.
- in contrast, the error of the CryoRes method 700 shown in FIG. 8 fluctuates relatively little and is less affected by the resolution.
- cryo-EM density maps in the test set were evaluated, and the average IoU of 349 cryo-EM density maps in the test set was 0.74.
- FIG. 10 shows the IoU between the mask obtained with the CryoRes method 700 and the mask label for each cryo-EM density map in the test set. As Figure 10 shows, the IoU of most cryo-EM density maps is above 0.7. Cryo-EM density maps with a low IoU usually contain heavy noise or unresolved low-resolution structures.
- the mask label is derived from the PDB model corresponding to the map, not from the cryo-EM density map itself.
- the mask obtained with the CryoRes method 700 depends more on the cryo-EM density map itself, which results in a lower IoU for such maps; this behavior is as expected for the mask.
- Figure 11 shows the confusion matrix of the IoU results for masks and mask labels.
- a confusion matrix was computed from the mask results of the 349 cryo-EM density maps in the test set to evaluate how well the mask obtained with the CryoRes method 700 distinguishes biological macromolecules from the background. As Fig. 11 shows, the recognition rate at the macromolecule positions given by the mask label reaches 0.91, and the recognition rate at background positions reaches 0.92.
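The two recognition rates above correspond to the per-class recall of a 2×2 confusion matrix over voxels:
- macromolecule rate: TP / (TP + FN)
- background rate: TN / (TN + FP)

This is a minimal sketch, assuming binarized flattened masks. Pooling several maps by summing their counts is one possible choice; the text does not say how the 349 maps were aggregated.

```python
def recognition_rates(pred_mask, label_mask):
    """Confusion counts and per-class recall for binary (0/1) flattened masks."""
    tp = fn = fp = tn = 0
    for p, y in zip(pred_mask, label_mask):
        if y:            # voxel labelled as macromolecule
            if p:
                tp += 1
            else:
                fn += 1
        else:            # voxel labelled as background
            if p:
                fp += 1
            else:
                tn += 1
    fg = tp / (tp + fn) if tp + fn else float("nan")
    bg = tn / (tn + fp) if tn + fp else float("nan")
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn,
            "macromolecule_rate": fg, "background_rate": bg}

print(recognition_rates([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```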
- FIG. 12 is a structural block diagram of a neural network model training device provided by an embodiment of the present application.
- the training device 800 includes a first determination module 810 and a training module 820 .
- the first determination module 810 is used to determine the mask value, the local resolution fluctuation value, and the global resolution value based on the first target cryo-electron microscope density map, which is annotated with a mask value label, a local resolution fluctuation value label, and a global resolution value label.
- the training module 820 is used to train the neural network model based on the mask value, the local resolution fluctuation value, and the global resolution value, so that the mask value approaches the mask value label, the local resolution fluctuation value approaches the local resolution fluctuation value label, and the global resolution value approaches the global resolution value label.
- the first determination module 810 includes an encoding submodule, a decoding submodule, a first branch module, a second branch module and a third branch module.
- the coding sub-module is used to perform coding processing based on the residual module on the first target cryo-EM density map to obtain m feature maps.
- the decoding sub-module is used to decode the m feature maps to obtain the expected density map.
- a first branch module is used to determine a mask value based on the desired density map.
- the second branch module is used to determine the local resolution fluctuation value based on the desired density map.
- the third branch module is used to determine the global resolution value based on the top-level feature map in the m feature maps.
- the training module 820 includes a first determination submodule, a second determination submodule, a third determination submodule, a fourth determination submodule and an update module.
- the first determination submodule is used to determine the first loss function based on the mask value and the mask value label.
- the second determining submodule is used for determining a second loss function based on the local resolution fluctuation value and the local resolution fluctuation value label.
- the third determining submodule is used for determining a third loss function based on the global resolution value and the global resolution value label.
- the fourth determining submodule is used for determining the total loss function based on the first loss function, the second loss function and the third loss function.
- the update module is used to update the parameters of the neural network model based on the gradient of the total loss function.
- the mask value, local resolution fluctuation value and global resolution value can be estimated simultaneously based on a cryo-electron microscope density map.
- the training device for the neural network model provided in this embodiment belongs to the same application concept as the training method for the neural network model provided in the embodiments of the present application. It can execute the training method for the neural network model provided in any embodiment of the application, and it has the corresponding functional modules and beneficial effects. For technical details not described in this embodiment, refer to the training method of the neural network model provided in the embodiments of the present application; they are not repeated here.
- the present application also provides a device for estimating the resolution of the cryo-electron microscope density map based on the neural network model.
- Fig. 13 is a structural block diagram of a device for estimating the resolution of a cryo-electron microscope density map based on a neural network model provided by an embodiment of the present application.
- the estimation device 900 includes a preprocessing module 910 , a second determination module 920 and a third determination module 930 .
- the preprocessing module 910 is used for preprocessing the cryo-electron microscope density map to obtain the second target cryo-electron microscope density map.
- the second determining module 920 is configured to determine a mask value, a local resolution fluctuation value and a global resolution value based on the second target cryo-EM density map.
- the third determination module 930 is used for determining the local resolution value based on the mask value, the local resolution fluctuation value and the global resolution value.
- the local resolution value and the global resolution value can be estimated simultaneously from a single cryo-electron microscope density map. This overcomes the limitation of conventional resolution estimation methods, which can evaluate a cryo-EM density map along only one dimension, either global resolution or local resolution.
- the estimation method provided in this embodiment does not need to provide half-maps, and does not need to provide parameters such as masks.
- the device for estimating the resolution of a cryo-electron microscope density map based on the neural network model provided in this embodiment belongs to the same application concept as the method for estimating the resolution of a cryo-electron microscope density map based on the neural network model provided in the embodiments of the present application. It can execute that estimation method as provided in any embodiment of the application, and it has the corresponding functional modules and beneficial effects for executing it.
- Fig. 14 is a structural block diagram of an electronic device provided by an embodiment of the present application. As shown in FIG. 14 , electronic device 10 includes one or more processors 11 and memory 12 .
- Processor 11 may be a central processing unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in electronic device 10 to perform desired functions.
- Memory 12 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
- the volatile memory may include random access memory (RAM) and/or cache memory (cache), etc., for example.
- Non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, and the like.
- One or more computer program instructions can be stored on the computer-readable storage medium. The processor 11 can run the program instructions to implement the training method of the neural network model, the method for estimating the resolution of a cryo-EM density map based on the neural network model, and/or other desired functions.
- Various contents such as input signals, signal components, noise components, etc. may also be stored in the computer-readable storage medium.
- the electronic device 10 may further include: an input device 13 and an output device 14, and these components are interconnected through a bus system and/or other forms of connection mechanisms (not shown).
- the output device 14 can output various information to the outside, including determined distance information, direction information, and the like.
- Output devices 14 may include, for example, displays, speakers, printers, and communication networks and remote output devices to which they are connected, among others.
- the electronic device 10 may also include any other suitable components.
- embodiments of the present application may also be computer program products. These include computer program instructions that, when executed by a processor, cause the processor to perform the steps of the method for training a neural network model and of the method for estimating the resolution of a cryo-electron microscope density map based on the neural network model, according to the various embodiments of the present application described in the "Exemplary Methods" section of this specification.
- the computer program product can write program codes for executing the operations of the embodiments of the present application in any combination of one or more programming languages.
- the programming languages include object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as "C" or similar languages.
- the program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
- the embodiment of the present application may also be a computer-readable storage medium, on which computer program instructions are stored.
- when executed by the processor, the computer program instructions cause the processor 11 to perform the steps of the method for training a neural network model and of the method for estimating the resolution of a cryo-electron microscope density map based on the neural network model, according to the various embodiments of the present application described in the "Exemplary Method" section of this specification.
- the computer readable storage medium may employ any combination of one or more readable media.
- the readable medium may be a readable signal medium or a readable storage medium.
- a readable storage medium may include, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more conductors, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
- each component or each step can be decomposed and/or reassembled. These decompositions and/or recombinations should be considered equivalents of this application.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Analysing Materials By The Use Of Radiation (AREA)
Abstract
The present application provides a neural network model training method and apparatus, a resolution estimation method and apparatus for a cryo-electron microscope density map, a computer device, and a storage medium, used for solving the problems in the prior art that input data of a resolution estimation algorithm of a cryo-electron microscope density map is not easy to obtain, and the calculation time is long. The neural network model training method comprises: determining a mask value, a local resolution fluctuation value, and a global resolution value on the basis of a first target cryo-electron microscope density map, wherein a mask value label, a local resolution fluctuation value label, and a global resolution value label are annotated for the first target cryo-electron microscope density map; and training a neural network model on the basis of the mask value, the local resolution fluctuation value, and the global resolution value, so that the mask value converges to the mask value label, the local resolution fluctuation value converges to the local resolution fluctuation value label, and the global resolution value converges to the global resolution value label.
Description
The present application relates to the technical field of resolution estimation for cryo-electron microscope density maps, and in particular to a neural network model training method and device, a cryo-electron microscope density map resolution estimation method and device, computer equipment, and a storage medium.
Resolution estimation of cryo-EM density maps is a critical step in determining atomic structures. The resolution of a cryo-EM density map comprises a global resolution and a local resolution. Usually, different algorithms are used to estimate the global resolution and the local resolution, and a given resolution estimation method can estimate only one kind of resolution, either global or local. For example, the global resolution can be estimated with the Fourier shell correlation algorithm, and the local resolution can be estimated with the ResMap algorithm.
In conventional resolution estimation methods such as Blocres, one of the inputs is the pair of half-maps. Therefore, to estimate the resolution of a cryo-electron microscope density map downloaded from EMDB or obtained by other means, the half-maps must be obtained first. Because half-maps are not always provided, the input data for resolution estimation are difficult to obtain or require complex preparatory work.
Summary of the Invention
In view of this, the embodiments of the present application provide a neural network model training method and device, a cryo-electron microscope density map resolution estimation method and device, computer equipment, and a storage medium. They address two problems in the prior art: the input data of resolution estimation algorithms for cryo-electron microscope density maps are difficult to obtain, and the computation time is long.
The first aspect of the present application provides a training method for a neural network model, including:
- determining a mask value, a local resolution fluctuation value, and a global resolution value based on a first target cryo-electron microscope density map, where the first target cryo-electron microscope density map is annotated with a mask value label, a local resolution fluctuation value label, and a global resolution value label;
- training the neural network model based on the mask value, the local resolution fluctuation value, and the global resolution value, so that the mask value approaches the mask value label, the local resolution fluctuation value approaches the local resolution fluctuation value label, and the global resolution value approaches the global resolution value label.
In one embodiment, determining the mask value, the local resolution fluctuation value, and the global resolution value based on the first target cryo-electron microscope density map includes:
- encoding the first target cryo-electron microscope density map with residual-module-based encoding to obtain m feature maps;
- decoding the m feature maps to obtain an expected density map;
- determining the mask value and the local resolution fluctuation value based on the expected density map;
- determining the global resolution value based on the top-level feature map among the m feature maps.
In one embodiment, determining the mask value based on the expected density map includes passing the expected density map sequentially through a convolution operation with a 3×3 kernel and a convolution operation with a 1×1 kernel to obtain the mask value.
In one embodiment, determining the local resolution fluctuation value based on the expected density map includes:
- classifying the expected density map to obtain multiple first categories and a weight for each of them;
- taking the products of each first category's weight and the first preset value it represents, summed over the categories, as the local resolution fluctuation value.
In one embodiment, determining the global resolution value based on the top-level feature map among the multiple feature maps includes:
- classifying the top-level feature map to obtain multiple second categories and a weight for each of them;
- taking the products of each second category's weight and the second preset value it represents, summed over the categories, as the global resolution value.
在一个实施例中,基于掩膜值、局部分辨率波动值和全局分辨率值对神经网络模型进行训练包括:基于掩膜值和掩膜值标签确定第一损失函数;基于局部分辨率波动值和局部分辨率波动值标签确定第二损失函数;基于全局分辨率值和全局分辨率值标签确定第三损失函数;基于第一损失函数、第二损失函数和第三损失函数确定总损失函数;基于总损失函数的梯度更新神经网络模型的参数。In one embodiment, training the neural network model based on the mask value, the local resolution fluctuation value and the global resolution value includes: determining a first loss function based on the mask value and the mask value label; Determine the second loss function with the local resolution fluctuation value label; determine the third loss function based on the global resolution value and the global resolution value label; determine the total loss function based on the first loss function, the second loss function and the third loss function; Update the parameters of the neural network model based on the gradient of the total loss function.
在一个实施例中,在基于第一目标冷冻电镜密度图确定掩膜值、局部分辨率波动值和全局分辨率值之前,还包括:对冷冻电镜密度图进行切割,得到冷冻电镜密度图中的生物大分子外接立方体;对生物大分子外接立方体进行尺寸缩放,得到第一目标冷冻电镜密度图。In one embodiment, before determining the mask value, local resolution fluctuation value and global resolution value based on the first target cryo-electron microscope density map, it further includes: cutting the cryo-electron microscope density map to obtain the Biomacromolecule circumscribed cube; scale the biomacromolecule circumscribed cube to obtain the density map of the first cryo-electron microscope.
A second aspect of the present application provides a neural-network-based method for estimating the resolution of a cryo-EM density map, including: determining a mask value, a local resolution fluctuation value and a global resolution value based on a second target cryo-EM density map; and determining a local resolution value based on the mask value, the local resolution fluctuation value and the global resolution value.
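This passage does not state how the three outputs are combined into a local resolution value. Purely as a hypothetical sketch, assuming the local resolution inside the mask is recovered by adding the predicted fluctuation to the predicted global resolution (the inverse of how the fluctuation label is built by subtracting a mean from ResMap output), and assuming background voxels receive a placeholder value, the step could look like this. The function name, the additive rule and the placeholder of 100 are all assumptions, not the applicant's stated method.

```python
def combine_local_resolution(mask_values, fluctuations, global_resolution,
                             outside_value=100.0):
    """Hypothetical rule: local = global + fluctuation inside the mask.

    mask_values and fluctuations are flattened per-voxel lists of equal length.
    A mask value >= 0 marks a voxel as foreground (the same thresholding used
    for the mask); background voxels get outside_value, an assumed placeholder.
    """
    if len(mask_values) != len(fluctuations):
        raise ValueError("mask_values and fluctuations must have equal length")
    local = []
    for m, f in zip(mask_values, fluctuations):
        if m >= 0:
            local.append(global_resolution + f)
        else:
            local.append(outside_value)
    return local
```

For example, `combine_local_resolution([-1.0, 0.5, 2.0], [0.3, -0.5, 1.0], 3.5)` gives `[100.0, 3.0, 4.5]`.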
A third aspect of the present application provides a training apparatus for a neural network model, including: a first determination module configured to determine a mask value, a local resolution fluctuation value and a global resolution value based on a first target cryo-EM density map, the first target cryo-EM density map being annotated with a mask value label, a local resolution fluctuation value label and a global resolution value label; and a training module configured to train the neural network model based on the mask value, the local resolution fluctuation value and the global resolution value, so that the mask value approaches the mask value label, the local resolution fluctuation value approaches the local resolution fluctuation value label, and the global resolution value approaches the global resolution value label.
A fourth aspect of the present application provides a neural-network-based apparatus for estimating the resolution of a cryo-EM density map, including: a first determination module configured to determine a mask value, a local resolution fluctuation value and a global resolution value based on a second target cryo-EM density map; and a second determination module configured to determine a local resolution value based on the mask value, the local resolution fluctuation value and the global resolution value.
A fifth aspect of the present application provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable by the processor, characterized in that, when executing the computer program, the processor implements the steps of the neural network model training method provided by any of the above embodiments or of the neural-network-based cryo-EM density map resolution estimation method provided by any of the above embodiments.
A sixth aspect of the present application provides a computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the neural network model training method provided by any of the above embodiments or of the neural-network-based cryo-EM density map resolution estimation method provided by any of the above embodiments.
With the neural network model training method and apparatus, the cryo-EM density map resolution estimation method and apparatus, the computer device and the storage medium provided by the present application, the mask value, the local resolution fluctuation value and the global resolution value can be estimated simultaneously from a single cryo-EM density map. The local resolution value can then be determined from these three outputs. This overcomes the limitation of conventional resolution estimation methods, which evaluate a cryo-EM density map along only one dimension, either global resolution or local resolution. In addition, the estimation method provided in this embodiment requires neither half-maps nor a mask, and no parameters need to be supplied or tuned manually.
FIG. 1 is a schematic diagram of the resolution distribution of the training samples according to an embodiment of the present application.
FIG. 2 is an architecture diagram of a neural network model according to an embodiment of the present application.
FIG. 3 is a flowchart of a training method for a neural network model according to an embodiment of the present application.
FIG. 4 is a schematic diagram of the execution of step S310 according to an embodiment of the present application.
FIG. 5 is a schematic diagram of the execution of step S320 according to an embodiment of the present application.
FIG. 6 shows the logical framework of a method for estimating the resolution of a cryo-EM density map based on a neural network model according to an embodiment of the present application.
FIG. 7 is a flowchart of a method for estimating the resolution of a cryo-EM density map based on a neural network model according to an embodiment of the present application.
FIG. 8 compares, for each cryo-EM density map in the test set, the global resolution obtained by the CryoRes method with the global resolution published for that map on EMDB.
FIG. 9 compares the median local resolution of each cryo-EM density map in the test set obtained by the ResMap method, and the global resolution obtained by the CryoRes method, each against the global resolution published on EMDB.
FIG. 10 shows, for each cryo-EM density map in the test set, the IoU between the mask obtained by the CryoRes method and the mask label.
FIG. 11 shows the confusion matrix of the IoU results between the masks and the mask labels.
FIG. 12 is a structural block diagram of a training apparatus for a neural network model according to an embodiment of the present application.
FIG. 13 is a structural block diagram of an apparatus for estimating the resolution of a cryo-EM density map based on a neural network model according to an embodiment of the present application.
FIG. 14 is a structural block diagram of an electronic device according to an embodiment of the present application.
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
Before the neural network model training method and apparatus, the cryo-EM density map resolution estimation method and apparatus, the computer device and the storage medium provided by the present application are described, technical terms that may be used in the embodiments are briefly introduced to help those skilled in the art understand them.
Three-dimensional fully convolutional network (3D-UNet): a fully convolutional network that takes a three-dimensional image as input and consists of a downsampling path, an upsampling path and skip-connection-like structures. Its convolutional layers are fully symmetric between the downsampling and upsampling parts, and feature maps on the downsampling side can bypass the deeper layers and be concatenated to the corresponding upsampling side.
Residual network: a layer of a neural network can usually be viewed as y = H(x), whereas a residual block of a residual network can be expressed as H(x) = F(x) + x, i.e. F(x) = H(x) - x. In the identity mapping y = x, x is the observed value and H(x) is the predicted value, so F(x) corresponds to the residual between them; hence the name residual network.
Encoder-Decoder: a model architecture in deep learning. An encoder is a network that receives an input and outputs feature vectors, which are another representation of the features and information of the input. A decoder is also a network (usually with the same structure as the encoder but in the opposite direction) that takes the feature vectors from the encoder and outputs the result that best approximates the actual input or the expected output.
Group Normalization (GN): the channel dimension is first divided into G groups, each group is normalized separately, and the normalized data of the G groups are then merged back into a single feature map.
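A minimal pure-Python sketch of the group normalization just described, assuming the feature map is stored as a list of channels, each flattened to a list of voxel values. The learnable per-channel scale and shift that GN layers normally apply are omitted, and the function name is illustrative only.

```python
import math

def group_norm(channels, num_groups, eps=1e-5):
    """Group-normalize a feature map given as a list of flattened channels."""
    num_channels = len(channels)
    if num_channels % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = num_channels // num_groups
    out = []
    for g in range(num_groups):
        group = channels[g * per_group:(g + 1) * per_group]
        # Statistics are pooled over every voxel of every channel in the group.
        values = [v for ch in group for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        std = math.sqrt(var + eps)
        out.extend([[(v - mean) / std for v in ch] for ch in group])
    return out
```

With four channels and G = 2, the first two channels are normalized together and the last two together, so each group ends up with roughly zero mean and unit variance regardless of the other group's scale.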
Rectified Linear Unit (ReLU): a commonly used activation function in artificial neural networks, generally referring to nonlinear functions represented by the ramp function and its variants. When certain neurons in a neural network are activated, the activation function passes the activated information to the next layer; it is nonlinear, differentiable (almost everywhere, in the case of ReLU) and monotonic.
Three-dimensional reconstruction of a cryo-EM density map covers two cases. In the first, all of the particle data are reconstructed together, and the result is called a single map. In the second, the particle data are randomly split into two subsets that are reconstructed separately, and the two results are called half-maps.
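For illustration only, the random split that produces the two half-map subsets can be sketched as follows. The function name and the fixed seed are assumptions made so the example is reproducible; the reconstruction itself is not shown.

```python
import random

def split_into_half_sets(particle_ids, seed=0):
    """Randomly split particle identifiers into two disjoint halves."""
    ids = list(particle_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]
```

Each half would then be reconstructed independently to give one half-map.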
Exemplary method
As described in the background, current algorithms for estimating the resolution of cryo-EM density maps can usually estimate only the local resolution or only the global resolution, so their functionality is limited. In view of this, the present application provides a training method for a neural network model; a model trained with this method can be used to estimate both the global resolution value and the local resolution value.
The following describes, in order, preparing the training set, building the neural network model, training the model and testing the model.
Step 1: prepare the training set.
Prepare a plurality of cryo-EM density maps. In one example, they are real experimental data downloaded from the Electron Microscopy Data Bank (EMDB). This training run used 1523 cryo-EM density maps, including maps of proteins and maps of nucleic acids. Of these, 1174 maps were used as the training set and 349 as the test set for evaluating the model's estimation performance. FIG. 1 is a schematic diagram of the resolution distribution of the training set and the test set according to an embodiment of the present application. As shown in FIG. 1, the global resolutions of all 1523 maps are greater than or equal to 1 Å and less than 8 Å. The range from 1 to 8 Å is divided into six intervals: [1.0, 3.0), [3.0, 3.5), [3.5, 4.0), [4.0, 4.5), [4.5, 6.0) and [6.0, 8.0). The 1174 training maps are distributed over these intervals as 112, 279, 298, 231, 130 and 124, and the 349 test maps as 17, 81, 98, 61, 48 and 44.
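The interval assignment above can be sketched with the standard `bisect` module, assuming each map's global resolution is available as a float in angstroms. The edges and counts come from the text; the function names are illustrative.

```python
import bisect

# Interval edges in angstroms; each interval is left-closed, right-open.
EDGES = [1.0, 3.0, 3.5, 4.0, 4.5, 6.0, 8.0]

def interval_index(resolution):
    """Index k of the interval [EDGES[k], EDGES[k+1]) containing resolution."""
    if not EDGES[0] <= resolution < EDGES[-1]:
        raise ValueError(f"resolution {resolution} is outside [1.0, 8.0)")
    # bisect_right puts a value equal to an edge into the interval it opens.
    return bisect.bisect_right(EDGES, resolution) - 1

def resolution_histogram(resolutions):
    """Count how many global resolutions fall into each of the six intervals."""
    counts = [0] * (len(EDGES) - 1)
    for r in resolutions:
        counts[interval_index(r)] += 1
    return counts

# The per-interval counts quoted in the text add up to the set sizes.
TRAIN_COUNTS = [112, 279, 298, 231, 130, 124]
TEST_COUNTS = [17, 81, 98, 61, 48, 44]
assert sum(TRAIN_COUNTS) == 1174 and sum(TEST_COUNTS) == 349
```

Using `bisect_right` rather than `bisect_left` is what makes a resolution of exactly 3.0 Å land in [3.0, 3.5) instead of [1.0, 3.0).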
To train the neural network model in a supervised manner, labels must be made for each cryo-EM density map. Each map has three types of labels: a global resolution value label, a local resolution fluctuation value label and a mask value label.
The global resolution value is a single number. In one example, the global resolution value published on the EMDB website is used as the global resolution value label of the cryo-EM density map, because it is the currently accepted accurate global resolution result.
The embodiments of the present application use the local resolution fluctuation value, rather than the local resolution value itself, as the second type of label. The reason is that the most widely accepted method for estimating local resolution is currently Blocres, which cuts the cryo-EM density map into small blocks with a sliding window, computes the resolution of each block by FSC (Fourier shell correlation), takes that value as the local resolution at the block centre, and so builds up the local resolution of the whole map step by step. Blocres requires half-maps, which makes it difficult to obtain a large training set. In one example, the fluctuation of the ResMap result is used as the local resolution fluctuation value label. ResMap can compute local resolution from a single map, but the local resolution it produces contains some error; using the fluctuation value reduces the effect of this error and thus improves the reliability of the label. The local resolution produced by ResMap is a three-dimensional matrix in which some entries equal 100 and others do not. The mean of all non-100 entries is computed and subtracted from each non-100 entry, which gives the local resolution fluctuation value.
The mask value referred to here is a three-dimensional matrix with the same dimensions and size as the three-dimensional density map. A mask is obtained from the mask value by thresholding: values less than 0 are set to 0, indicating background regions containing no macromolecular information, and values greater than or equal to 0 are set to 1, indicating non-background regions containing macromolecular information. Multiplying the mask by the cryo-EM density map then extracts the particle region of the map. In one example, the mask is simulated from the Protein Data Bank (PDB) file corresponding to the density map, with a width of, for example, 4 Å.
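The label arithmetic above (subtracting the mean of the non-100 ResMap values, thresholding the mask value at 0, and multiplying mask and map) can be sketched on flattened voxel lists. Returning `None` for the excluded 100-valued voxels is an assumed convention, since the text does not say what label those voxels receive; the function names are illustrative.

```python
def fluctuation_label(local_resolution, placeholder=100.0):
    """Subtract the mean of the non-placeholder ResMap values from each of them.

    Entries equal to the placeholder are excluded from the mean and returned
    as None (an assumed convention for voxels without a label).
    """
    valid = [v for v in local_resolution if v != placeholder]
    if not valid:
        raise ValueError("no non-placeholder values")
    mean = sum(valid) / len(valid)
    return [None if v == placeholder else v - mean for v in local_resolution]

def threshold_mask(mask_values):
    """Values < 0 become 0 (background); values >= 0 become 1 (macromolecule)."""
    return [1 if v >= 0 else 0 for v in mask_values]

def apply_mask(density, mask):
    """Voxel-wise product that keeps only the particle region of the map."""
    return [d * m for d, m in zip(density, mask)]
```

For example, ResMap values `[100.0, 3.0, 5.0, 100.0, 4.0]` have a non-100 mean of 4.0, so the fluctuation label is `[None, -1.0, 1.0, None, 0.0]`.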
Step 2: build the neural network model.
FIG. 2 is an architecture diagram of the neural network model according to an embodiment of the present application. The model includes a Residual 3D-UNet module, a first branch module 22, a second branch module 23 and a third branch module 24. The Residual 3D-UNet module includes an encoding submodule 211 and a decoding submodule 212. The output of the encoding submodule 211 is the input of the decoding submodule 212, and the output of the decoding submodule 212 is the input of both the first branch module 22 and the second branch module 23. The first branch module 22 outputs the mask value, and the second branch module 23 outputs the local resolution fluctuation value. The output of the encoding submodule 211 is also the input of the third branch module 24, which outputs the global resolution value.
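The wiring of FIG. 2 can be summarized as a forward function over stand-in callables. This sketch shows only the routing between modules 211, 212, 22, 23 and 24; the parameter names are hypothetical and no real layers are implemented.

```python
def forward(density_map, encoder, decoder, mask_head, local_head, global_head):
    """Route data through the model as described for FIG. 2.

    encoder returns the list of feature maps (first to top level), and decoder
    maps that list to the expected density map. All five components are
    stand-in callables here, not real layers.
    """
    feature_maps = encoder(density_map)                # encoding submodule 211
    expected_map = decoder(feature_maps)               # decoding submodule 212
    mask_value = mask_head(expected_map)               # first branch module 22
    local_fluctuation = local_head(expected_map)       # second branch module 23
    global_resolution = global_head(feature_maps[-1])  # third branch module 24
    return mask_value, local_fluctuation, global_resolution
```

The key design point is that the global-resolution branch reads the top-level encoder feature map directly, while the two voxel-wise outputs share the decoder's expected density map.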
Specifically, the encoding submodule 211 includes at least one feature extraction unit and at least one downsampling unit, cascaded alternately. For example, as shown in FIG. 2, the encoding submodule 211 includes a first feature extraction unit, a downsampling unit and a second feature extraction unit connected in sequence. In one example, as shown in FIG. 2, each feature extraction unit is a residual subnetwork with three convolutional layers, and the downsampling unit is a max pooling layer.
The decoding submodule 212 includes at least one upsampling unit and at least one feature extraction unit, cascaded alternately. For example, as shown in FIG. 2, the decoding submodule 212 includes one upsampling unit and one feature extraction unit connected in sequence. In one example, as shown in FIG. 2, the upsampling unit is a deconvolution layer and the feature extraction unit is a residual subnetwork with three convolutional layers.
The first branch module 22 includes a convolutional layer with a 3×3 kernel and a convolutional layer with a 1×1 kernel.
The second branch module 23 uses a classification + regression architecture. For example, as shown in FIG. 2, it includes a convolutional layer with a 3×3 kernel, a convolutional layer with a 1×1 kernel and a soft-argmax layer.
The third branch module 24 also uses a classification + regression architecture. For example, as shown in FIG. 2, it includes three convolutional layers with 3×3 kernels, two convolutional layers with 1×1 kernels, a global average pooling layer and a soft-argmax layer, with a max pooling layer after each of the first two 3×3 convolutional layers.
Step 3: train the model.
FIG. 3 is a flowchart of a training method for a neural network model according to an embodiment of the present application. As shown in FIG. 3, the training method 300 includes:
Step S310: determine a mask value, a local resolution fluctuation value and a global resolution value based on a first target cryo-EM density map, the first target cryo-EM density map being annotated with a mask value label, a local resolution fluctuation value label and a global resolution value label.
Step S320: train the neural network model based on the mask value, the local resolution fluctuation value and the global resolution value, so that the mask value approaches the mask value label, the local resolution fluctuation value approaches the local resolution fluctuation value label, and the global resolution value approaches the global resolution value label.
In step S310, the first target cryo-EM density maps are obtained by preprocessing the 1174 training maps mentioned above. In one embodiment, the preprocessing includes: cropping the cryo-EM density map to obtain the bounding cube of the biological macromolecule in it; and rescaling the bounding cube to obtain the first target cryo-EM density map. In one example, the size of the first target cryo-EM density map is at most 248×248×248.
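A sketch of the cropping and rescaling bookkeeping, under two stated assumptions: the macromolecule is taken to be every voxel whose density exceeds a chosen threshold, and maps whose cube is already smaller than 248 voxels are left at their original size (the text only gives the upper bound). The resampling itself is not shown, and the function names are illustrative.

```python
def bounding_cube(volume, threshold):
    """Find the cube enclosing all voxels whose density exceeds threshold.

    volume is a nested list indexed [z][y][x]. Returns (origin, side), where
    origin is the (z, y, x) minimum corner and side is the largest of the three
    extents, so the box is a cube. The cube may extend past the volume edge;
    padding is left to the caller.
    """
    coords = [(z, y, x)
              for z, plane in enumerate(volume)
              for y, row in enumerate(plane)
              for x, v in enumerate(row)
              if v > threshold]
    if not coords:
        raise ValueError("no voxel above threshold")
    lo = tuple(min(c[i] for c in coords) for i in range(3))
    hi = tuple(max(c[i] for c in coords) + 1 for i in range(3))  # exclusive
    side = max(h - l for l, h in zip(lo, hi))
    return lo, side

def scale_factor(side, max_side=248):
    """Assumed rule: shrink the cube so its side is at most max_side voxels."""
    return min(1.0, max_side / side)
```

A 496-voxel cube would be resampled by a factor of 0.5, while a 100-voxel cube would be kept as is under the assumed rule.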
FIG. 4 is a schematic diagram of the execution of step S310 according to an embodiment of the present application. As shown in FIG. 4, step S310 specifically includes:
Step S311: perform residual-module-based encoding on the first target cryo-EM density map to obtain m feature maps.
Specifically, referring to FIG. 2, this step is performed by the encoding submodule 211, in which each feature extraction unit outputs one feature map.
First, features are extracted from the first target cryo-EM density map to obtain the first feature map. In one example, the feature extraction is based on a residual module: the first target cryo-EM density map passes through a GN operation and then a first convolution to give a first sub-feature map. The first sub-feature map passes sequentially through a GN operation, a second convolution, a GN operation and a third convolution to give a second sub-feature map. A ReLU operation is applied to the first sub-feature map, and the rectified first sub-feature map is added to the second sub-feature map to give the first feature map.
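The data flow of this residual feature extraction unit can be written as a composition of stand-in callables. The dictionary keys are hypothetical names; each entry would be a real GN, convolution, ReLU or addition layer in an actual network, and each GN gets its own entry because the three GN operations are separate layers.

```python
def residual_feature_extraction(x, layers):
    """Data flow of one residual feature extraction unit.

    layers maps names to callables: 'gn1', 'conv1', 'gn2', 'conv2', 'gn3',
    'conv3', 'relu' and 'add'. Each stands in for a real layer.
    """
    # First sub-feature map: GN, then the first convolution.
    sub1 = layers["conv1"](layers["gn1"](x))
    # Second sub-feature map: GN, conv2, GN, conv3 applied to sub1.
    sub2 = layers["conv3"](layers["gn3"](layers["conv2"](layers["gn2"](sub1))))
    # Skip path: ReLU of the first sub-feature map, added to the second.
    return layers["add"](layers["relu"](sub1), sub2)
```

With scalar stand-ins such as `conv1 = 2*v`, `conv2 = v + 1`, `conv3 = 3*v` and identity GN, an input of 1 gives `relu(2) + 9 = 11`.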
Second, the first feature map is downsampled to obtain a downsampled feature map.
Next, residual-module-based feature extraction is performed on the downsampled feature map to obtain the second feature map. The process is the same as that for obtaining the first feature map described above and is not repeated here.
It should be understood that the encoding submodule 211 shown in FIG. 2 includes only two feature extraction units and one downsampling unit, and therefore yields two feature maps. In other embodiments, the encoding submodule 211 may include three feature extraction units and two downsampling units, four feature extraction units and three downsampling units, and so on; the embodiments of the present application do not limit the number of feature extraction units and downsampling units in the encoding submodule 211.
From the above, step S311 can be summarized as follows. For the i-th feature map: when i = 1, feature extraction is performed on the first target cryo-EM density map to obtain the first feature map; when i > 1, the (i-1)-th feature map is downsampled to obtain a downsampled feature map, and feature extraction is performed on the downsampled feature map to obtain the i-th feature map. Here i is a positive integer with 1 ≤ i ≤ m, and feature extraction refers to residual-module-based feature extraction.
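The recursion of step S311 maps directly onto a loop. In this sketch `extract` and `downsample` are stand-in callables for the residual feature extraction unit and the max pooling layer; the names are illustrative.

```python
def encode(density_map, m, extract, downsample):
    """Produce m feature maps following the recursion of step S311."""
    if m < 1:
        raise ValueError("m must be at least 1")
    feature_maps = [extract(density_map)]        # i = 1
    for _ in range(2, m + 1):                    # i = 2, ..., m
        feature_maps.append(extract(downsample(feature_maps[-1])))
    return feature_maps
```

For example, with a 1-D pairwise max pool as `downsample` and doubling as `extract`, `encode([1, 5, 2, 8], 2, ...)` returns `[[2, 10, 4, 16], [20, 32]]`; the last element is the top-level feature map.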
Step S312: decode the m feature maps to obtain an expected density map.
Referring to FIG. 2, this step is performed by the decoding submodule 212. Taking m = 2 as an example, step S312 proceeds as follows: nonlinear rectification (ReLU) and deconvolution are applied in sequence to the second feature map to obtain an upsampled feature map; nonlinear rectification is applied to the first feature map to obtain a rectified feature map; and residual-module-based feature extraction is applied to the sum of the upsampled feature map and the rectified feature map to obtain the expected density map. The feature extraction process is the same as that for obtaining the first feature map described above and is not repeated here.
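The m = 2 decoding step can be sketched as below. The callables are stand-ins; in the test, nearest-neighbour repetition substitutes for the deconvolution purely to keep the example small, and the function name is illustrative.

```python
def decode_two_level(f1, f2, relu, deconv, extract):
    """Decoding for m = 2: upsample the top map and add the rectified skip map."""
    upsampled = deconv(relu(f2))   # ReLU, then deconvolution of the second map
    rectified = relu(f1)           # ReLU of the first (skip) feature map
    summed = [u + r for u, r in zip(upsampled, rectified)]
    return extract(summed)         # residual feature extraction on the sum
```

Note that the skip connection here is an element-wise addition, as the text describes, rather than the concatenation used in the generic 3D-UNet definition.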
Step S313: determine the mask value and the local resolution fluctuation value based on the expected density map.
Referring to FIG. 2, the mask value is determined by the first branch module 22: the expected density map passes sequentially through a convolutional layer with a 3×3 kernel and a convolutional layer with a 1×1 kernel to give the mask value. The local resolution fluctuation value is determined by the second branch module 23. First, the expected density map is classified to obtain a plurality of first categories and the weight of each; for example, the expected density map passes sequentially through a convolution with a 3×3 kernel and a convolution with a 1×1 kernel to obtain the first categories and their weights. Then, the product of the weights of the first categories and the first preset values they respectively represent (that is, their weighted sum) is determined as the local resolution fluctuation value.
The first preset values are set manually and can be chosen as appropriate for the actual situation. In one embodiment, there are 37 first categories, which represent the first preset values -5, -4.5, -4, -3.5, -3, -2.5, -2, -1.5, -1, -0.9, -0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5 and 5.
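The classification + regression step of the second branch can be sketched as a soft-argmax for a single voxel, assuming, as soft-argmax conventionally does, that the category weights are softmax probabilities over the 37 logits. The bin values are the first preset values listed above; the function names are illustrative.

```python
import math

# The 37 first preset values listed above, one per first category.
FLUCTUATION_BINS = [
    -5, -4.5, -4, -3.5, -3, -2.5, -2, -1.5, -1,
    -0.9, -0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1,
    0,
    0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9,
    1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5,
]

def softmax(logits):
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_argmax(logits, bin_values):
    """Weighted sum of bin values, weighted by softmax class probabilities."""
    if len(logits) != len(bin_values):
        raise ValueError("one logit per bin is required")
    weights = softmax(logits)
    return sum(w * v for w, v in zip(weights, bin_values))
```

Because the output is an expectation rather than a hard argmax, it can take values between the preset bins and remains differentiable, which is what allows this branch to be trained by gradient descent.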
步骤S314,基于多个特征图中的顶层特征图确定全局分辨率值。Step S314, determining a global resolution value based on the top-level feature map in the plurality of feature maps.
参阅图2,步骤S314由第三分支模块24执行。对于图2所示的神经网络模型而言,顶层特征图即为第二特征图,也即编码子模块211的输出。Referring to FIG. 2 , step S314 is executed by the third branch module 24 . For the neural network model shown in FIG. 2 , the top-level feature map is the second feature map, which is the output of the encoding sub-module 211 .
具体而言,首先,对顶层特征图进行分类,得到多个第二类别和多个第二类别各自的权重。例如,顶层特征图顺次经过三次卷积核为3*3的卷积操作、两次卷积核为1*1的卷积操作和全局平均池化操作,得到多个第二类别和多个第二类别各自的权重。其中,前两次卷积核为3*3的卷积操作后进行池化操作。其次,确定多个第二类别各自的权重和各自代表的第二预设值的乘积为全局分辨率值。Specifically, firstly, the top-level feature map is classified to obtain multiple second categories and respective weights of the multiple second categories. For example, the top-level feature map sequentially undergoes three convolution operations with a convolution kernel of 3*3, two convolution operations with a convolution kernel of 1*1, and a global average pooling operation to obtain multiple second categories and multiple The respective weights of the second categories. Among them, the first two convolution kernels are 3*3 convolution operations followed by pooling operations. Secondly, the product of the respective weights of the plurality of second categories and the second preset values represented by them is determined as the global resolution value.
The second preset values are set manually and can be chosen as appropriate for the application. In one embodiment there are ten second categories, and the second preset values they represent are, in order: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
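The tail of the global branch (global average pooling followed by the weighted sum over the ten second categories) can be sketched as below. This is a minimal sketch under stated assumptions: the last 1*1 convolution is assumed to emit one channel per second category, each channel is given as a flat list of voxel scores, and the weights are assumed to come from a softmax. `global_average_pool` and `global_resolution` are hypothetical names.

```python
import math

SECOND_PRESETS = list(range(1, 11))  # second preset values 1, 2, ..., 10


def global_average_pool(channels):
    """Average each channel's voxel scores; `channels` is a list of flat lists."""
    return [sum(ch) / len(ch) for ch in channels]


def global_resolution(channels, presets=SECOND_PRESETS):
    """Hypothetical helper: pooling -> softmax weights -> weighted sum of presets."""
    pooled = global_average_pool(channels)  # one score per second category
    m = max(pooled)
    exps = [math.exp(v - m) for v in pooled]  # stable softmax numerators
    total = sum(exps)
    return sum(e / total * p for e, p in zip(exps, presets))
```

With uniform scores the estimate is the mean preset value, 5.5; when one channel dominates, the estimate approaches that channel's preset.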
FIG. 5 is a schematic diagram of the execution of step S320 according to an embodiment of the present application. As shown in FIG. 5, step S320 specifically includes:
Step S321: determine a first loss function based on the mask value and the mask value label.
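Specifically, binary cross-entropy can be used as the first loss function:

$$\mathrm{Loss}_{\mathrm{mask}} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log\hat{y}_i + \left(1-y_i\right)\log\left(1-\hat{y}_i\right)\right]$$

where $\hat{y}_i$ is the network output, $y_i$ is the label value, and $N$ is the number of voxels.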
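Because the inference procedure later thresholds the mask value at 0, one plausible reading is that the network emits raw scores (logits) and the binary cross-entropy is applied after a sigmoid; this is an assumption, not something the text states. The minimal sketch below computes the loss both from probabilities and, in the numerically stable form, directly from logits; all function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_from_probs(probs, labels, eps=1e-12):
    """Binary cross-entropy averaged over voxels, given probabilities y_hat."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def bce_with_logits(logits, labels):
    """Same loss from raw scores x: max(x, 0) - x*y + log(1 + exp(-|x|))."""
    total = 0.0
    for x, y in zip(logits, labels):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(labels)
```

Under this reading, a score of 0 corresponds to a probability of 0.5, which is consistent with treating scores of 0 or more as macromolecule at inference time.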
Step S322: determine a second loss function based on the local resolution fluctuation value and the local resolution fluctuation value label.
Specifically, a log10-based loss can be used as the second loss function, where $\hat{y}$ is the network output and $y$ is the label value.
Step S323: determine a third loss function based on the global resolution value and the global resolution value label.
Specifically, the mean squared error (MSE) can be used as the third loss function:

$$\mathrm{Loss}_{\mathrm{global}} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2$$

where $\hat{y}_i$ is the network output, $y_i$ is the label value, and $N$ is the number of samples.
Step S324: determine a total loss function based on the first, second, and third loss functions.
The total loss function is:

$$\mathrm{Loss}_{\mathrm{all}} = \mathrm{Loss}_{\mathrm{global}} + 10\cdot\mathrm{Loss}_{\mathrm{local}} + \mathrm{Loss}_{\mathrm{mask}}$$
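The global-resolution MSE term and the weighted combination can be sketched as follows. This is a minimal sketch: the log10-based local loss is passed in as an already-computed number because its exact form is not reproduced in this text, and `mse` and `total_loss` are hypothetical names.

```python
def mse(preds, labels):
    """Mean squared error, used here for the global resolution value."""
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)


def total_loss(loss_global, loss_local, loss_mask, local_weight=10.0):
    """Loss_all = Loss_global + 10 * Loss_local + Loss_mask."""
    return loss_global + local_weight * loss_local + loss_mask
```

For example, with Loss_global = 0.5, Loss_local = 0.1 and Loss_mask = 0.2, the total is 0.5 + 1.0 + 0.2 = 1.7; the factor of 10 gives the local fluctuation term more influence on the gradient than the other two terms.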
Step S325: update the parameters of the neural network model based on the gradient of the total loss function.
An SGD optimizer with a momentum of 0.8 is used to update the network parameters from the gradient of the total loss function.
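One SGD-with-momentum update can be sketched as below. This is a minimal sketch using the common velocity form (v = momentum * v + g, then theta = theta - lr * v); the source specifies only momentum = 0.8, so the learning rate of 0.01 and the name `sgd_momentum_step` are placeholders.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.8):
    """One SGD-with-momentum update on flat parameter lists.

    lr=0.01 is a placeholder; only momentum=0.8 comes from the text.
    """
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_params = [p - lr * v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity
```

Starting from a parameter of 1.0 with zero velocity, a constant gradient of 2.0 and lr = 0.1, the first step gives 0.8 and the second gives 0.44, because the accumulated velocity (0.8 * 2.0 + 2.0 = 3.6) enlarges the second step.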
With the neural network model obtained by the training method of the embodiments of the present application, the mask value, the local resolution fluctuation value, and the global resolution value can be estimated simultaneously from a single cryo-EM density map.
Step 4: Model Testing
The trained neural network model was tested on a test set of 349 cryo-EM density maps. The results show that the errors of both the local resolution estimate and the global resolution estimate are 0.44 Å, and that the mean Intersection over Union (IoU) of the mask is 0.71.
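The IoU metric can be illustrated with a short sketch over binary voxel masks stored as flat 0/1 lists. This is a minimal sketch: the convention that two empty masks have an IoU of 1 is an assumption, and `mask_iou` and `mean_iou` are hypothetical names.

```python
def mask_iou(pred, label):
    """IoU of two binary masks given as flat sequences of 0/1 voxels."""
    inter = sum(1 for p, y in zip(pred, label) if p and y)
    union = sum(1 for p, y in zip(pred, label) if p or y)
    return inter / union if union else 1.0  # assumed: both empty -> 1


def mean_iou(pairs):
    """Average per-map IoU over a list of (pred, label) pairs."""
    return sum(mask_iou(p, y) for p, y in pairs) / len(pairs)
```

For instance, masks [1, 1, 0, 0] and [1, 0, 1, 0] share one voxel out of a three-voxel union, giving an IoU of 1/3.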
The present application also provides a method for estimating the resolution of a cryo-EM density map using the neural network model provided by any of the above embodiments. FIG. 6 shows the logical architecture of the neural-network-based method for estimating cryo-EM density map resolution according to an embodiment of the present application. FIG. 7 is a flowchart of this method (the CryoRes method) according to an embodiment of the present application. As shown in FIGS. 6 and 7, the CryoRes method 700 includes:
Step S710: preprocess the cryo-EM density map to obtain a second target cryo-EM density map. The input may be any cryo-EM density map.
For example, preprocessing consists of cropping the cryo-EM density map to the bounding cube that encloses the biological macromolecule, and then rescaling that cube to obtain the second target cryo-EM density map. In one example, the size of the second target cryo-EM density map is at most 248*248*248.
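The cropping and size computation can be sketched as below for a volume stored as a nested list indexed [z][y][x]. This is a minimal sketch under assumptions the text does not make: the macromolecule region is taken to be the voxels whose density exceeds a user-chosen threshold, scaling is uniform and only ever shrinks the volume, and the actual interpolation step is omitted. `bounding_box`, `crop` and `target_shape` are hypothetical names.

```python
def bounding_box(volume, threshold):
    """Inclusive index ranges of voxels with density > threshold (assumed rule)."""
    coords = [(z, y, x)
              for z, plane in enumerate(volume)
              for y, row in enumerate(plane)
              for x, v in enumerate(row) if v > threshold]
    if not coords:
        raise ValueError("no voxel above threshold")
    lo = tuple(min(c[i] for c in coords) for i in range(3))
    hi = tuple(max(c[i] for c in coords) for i in range(3))
    return lo, hi


def crop(volume, lo, hi):
    """Cut the [lo, hi] box (inclusive on every axis) out of the volume."""
    return [[row[lo[2]:hi[2] + 1] for row in plane[lo[1]:hi[1] + 1]]
            for plane in volume[lo[0]:hi[0] + 1]]


def target_shape(shape, max_side=248):
    """Shape after uniform downscaling so that no side exceeds max_side."""
    scale = min(1.0, max_side / max(shape))
    return tuple(max(1, round(s * scale)) for s in shape), scale
```

For a 400*400*400 box the sketch yields a 248*248*248 target with a scale factor of 0.62, while a box already within the limit is left unscaled.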
Step S720: determine the mask value, the local resolution fluctuation value, and the global resolution value based on the second target cryo-EM density map. This process is as described in the training method embodiments above and is not repeated here.
Step S730: determine the local resolution value based on the mask value, the local resolution fluctuation value, and the global resolution value.
From the local resolution fluctuation value, the global resolution value, and the mask value, a real value is obtained for each voxel of the second target cryo-EM density map; this value is the local resolution of that voxel.
Specifically, as shown in FIG. 6, the global resolution value and the local resolution fluctuation value are added element-wise to obtain a first sum. The mask value is thresholded to obtain a mask: values less than 0 are set to 0, marking background regions that contain no macromolecular information, and values greater than or equal to 0 are set to 1, marking non-background regions that contain macromolecular information. The first sum is multiplied by the mask to obtain a first product. The mask is multiplied by a first constant and a second constant is added to obtain a second sum; in one example, the first constant is -100 and the second constant is 100. The local resolution value is the sum of the second sum and the first product. With these constants, each macromolecule voxel receives the global resolution value plus its fluctuation value, and each background voxel receives 100.
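The combination step can be sketched per voxel as follows. This is a minimal sketch that follows the arithmetic described above; it assumes the mask values and fluctuation values are given as flat per-voxel lists and the global resolution as a single scalar broadcast to every voxel. `local_resolution` is a hypothetical name.

```python
def local_resolution(mask_values, fluctuations, global_res,
                     first_const=-100.0, second_const=100.0):
    """Per-voxel local resolution from the three network outputs.

    mask_values and fluctuations are flat per-voxel lists; global_res is a
    scalar that is broadcast to every voxel.
    """
    result = []
    for m_val, fluct in zip(mask_values, fluctuations):
        mask = 1.0 if m_val >= 0 else 0.0            # threshold the mask value at 0
        first_product = (global_res + fluct) * mask  # first sum times mask
        second_sum = mask * first_const + second_const
        result.append(second_sum + first_product)
    return result
```

For a global value of 3.2, voxels with mask values 0.3, -2.0 and 0.0 and fluctuations 0.5, 1.0 and -0.4 become 3.7, 100 (background) and 2.8.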
Table 1 compares the CryoRes method 700 shown in FIGS. 6 and 7 with several conventional resolution estimation methods. As Table 1 shows, the estimation method of this embodiment can estimate both the local resolution and the global resolution from a single cryo-EM density map, overcoming the limitation of conventional methods that evaluate a density map along only one dimension, either global resolution or local resolution. In addition, the method of this embodiment requires neither half-maps nor parameters such as a mask.
Table 1: Comparison of the CryoRes method 700 with several conventional resolution estimation methods (* in the table means preferred, i.e., recommended to be provided)
The performance of the CryoRes method 700 shown in FIGS. 6 and 7 was evaluated in three respects: (1) local resolution; (2) global resolution; (3) mask.
For (1) local resolution, four cryo-EM density maps were selected as test maps to evaluate the CryoRes method 700. The first test map is the cryo-EM structure of RelA bound to the 70S ribosome (EMDB: EMD-8108). It was published in 2016, has dimensions of 400*400*400, and a voxel size of 1.34 Å. The global resolution reported on the EMDB website, determined by the Fourier shell correlation (FSC) at a threshold (typically 0.143), is 3.0 Å. CryoRes, ResMap, and DeepRes each take the signal map as input to compute local resolution, whereas Blocres and MonoRes each take the half-maps as input. With CryoRes, the local resolution ranges from 3.19 to 3.91 Å, with a mean of 3.38 Å and a standard deviation of 0.14 Å. With Blocres, it ranges from 2.88 to 10.89 Å, with a mean of 3.39 Å and a standard deviation of 0.77 Å. With ResMap, it ranges from 2.9 to 5.9 Å, with a mean of 2.9 Å and a standard deviation of 0.91 Å. With MonoRes, it ranges from 2.68 to 8.93 Å, with a mean of 3.67 Å and a standard deviation of 1.45 Å. With DeepRes, it ranges from 2.68 to 6.64 Å, with a mean of 3.41 Å and a standard deviation of 0.52 Å.
The second test map is the cryo-EM structure of ArfA and TtRF2 bound to the 70S ribosome (EMDB: EMD-3492). It was published in 2016, has dimensions of 400*400*400, and a voxel size of 1.04 Å. The global resolution reported on the EMDB website, determined by FSC at a threshold (typically 0.143), is 3.35 Å. As before, CryoRes, ResMap, and DeepRes take the signal map as input, and Blocres and MonoRes take the half-maps. With CryoRes, the local resolution ranges from 3.37 to 4.07 Å, with a mean of 3.57 Å and a standard deviation of 0.12 Å. With Blocres, it ranges from 3.17 to 11.27 Å, with a mean of 3.62 Å and a standard deviation of 0.79 Å. With ResMap, it ranges from 2.3 to 4.05 Å, with a mean of 2.3 Å and a standard deviation of 0.26 Å. With MonoRes, it ranges from 2.83 to 8.16 Å, with a mean of 4.08 Å and a standard deviation of 1.1 Å. With DeepRes, it ranges from 2.5 to 6.06 Å, with a mean of 2.91 Å and a standard deviation of 0.49 Å.
The third test map is the cryo-EM structure of the Gasdermin A3 membrane pore (EMDB: EMD-7450). It was published in 2018, has dimensions of 380*380*380, and a voxel size of 1.0 Å. The global resolution reported on the EMDB website, determined by FSC at a threshold (typically 0.143), is 4.4 Å. As before, CryoRes, ResMap, and DeepRes take the signal map as input, and Blocres and MonoRes take the half-maps. With CryoRes, the local resolution ranges from 3.58 to 4.46 Å, with a mean of 3.75 Å and a standard deviation of 0.18 Å. With Blocres, it ranges from 3.28 to 4.9 Å, with a mean of 3.7 Å and a standard deviation of 0.31 Å. With ResMap, it ranges from 2.2 to 2.45 Å, with a mean of 2.2 Å and a standard deviation of 0.00 Å. With MonoRes, it ranges from 2.0 to 7.31 Å, with a mean of 4.27 Å and a standard deviation of 1.36 Å. With DeepRes, it ranges from 3.45 to 8.24 Å, with a mean of 5.55 Å and a standard deviation of 0.7 Å.
The fourth test map is the cryo-EM structure of the bacterial 30S-IF1-IF2-IF3-mRNA-tRNA translation pre-initiation complex (EMDB: EMD-4082). It was published in 2016, has dimensions of 260*260*260, and a voxel size of 1.34 Å. The global resolution reported on the EMDB website, determined by FSC at a threshold (typically 0.143), is 8.3 Å. As before, CryoRes, ResMap, and DeepRes take the signal map as input, and Blocres and MonoRes take the half-maps. With CryoRes, the local resolution ranges from 7.57 to 9.05 Å, with a mean of 7.92 Å and a standard deviation of 0.25 Å. With Blocres, it ranges from 6.48 to 33.96 Å, with a mean of 9.25 Å and a standard deviation of 2.47 Å. With ResMap, it ranges from 8.9 to 13.4 Å, with a mean of 11.15 Å and a standard deviation of 1.05 Å. With MonoRes, it ranges from 2.68 to 20.49 Å, with a mean of 8.5 Å and a standard deviation of 4.59 Å. With DeepRes, it ranges from 2.68 to 12.9 Å, with a mean of 8.69 Å and a standard deviation of 1.05 Å.
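Summary statistics of the kind reported above (range, mean and standard deviation of the local resolution) can be computed as in the sketch below. This is a minimal sketch under assumptions the text does not state: only voxels inside the mask are included, so that background voxels set to a fixed value do not dominate, and the population standard deviation is used. `summarize` is a hypothetical name.

```python
import statistics


def summarize(local_res, mask):
    """(min, max, mean, population SD) of local resolution inside the mask."""
    inside = [r for r, m in zip(local_res, mask) if m]
    return (min(inside), max(inside),
            statistics.mean(inside), statistics.pstdev(inside))
```

For example, local values [3.0, 3.5, 100.0, 4.0] with mask [1, 1, 0, 1] give a range of 3.0 to 4.0 Å and a mean of 3.5 Å; the masked-out background value of 100 is excluded.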
For (2) global resolution, the CryoRes method 700 was used to estimate the global resolution of the 349 cryo-EM density maps in the test set. The mean absolute error between these estimates and the global resolutions published on EMDB for the same 349 maps is 0.44 Å.
FIG. 8 compares, for each cryo-EM density map in the test set, the global resolution estimated by the CryoRes method 700 with the global resolution published on EMDB. As shown in FIG. 8, for most maps the CryoRes estimate is close to the EMDB value, with an error below 1 Å; for a few maps the error exceeds 1 Å but generally stays within 2 Å.
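The error figures used in this comparison (mean absolute error, and the share of maps within 1 Å and within 2 Å) can be computed as below. This is a minimal sketch on made-up illustrative values, not data from the test set; `global_error_stats` is a hypothetical name.

```python
def global_error_stats(predicted, published):
    """Mean absolute error and fraction of maps within 1 Å and within 2 Å."""
    errors = [abs(p - e) for p, e in zip(predicted, published)]
    n = len(errors)
    mae = sum(errors) / n
    within_1 = sum(e < 1.0 for e in errors) / n
    within_2 = sum(e < 2.0 for e in errors) / n
    return mae, within_1, within_2
```

With predicted values [3.0, 4.5, 8.0, 6.0] and published values [3.2, 4.0, 6.5, 6.0], the errors are 0.2, 0.5, 1.5 and 0 Å, giving an MAE of 0.55 Å, three of four maps within 1 Å, and all four within 2 Å.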
FIG. 9 compares, for each cryo-EM density map in the test set, both the median of the local resolutions obtained by the ResMap method and the global resolution obtained by the CryoRes method against the global resolution published by EMDB. The ordinate of FIG. 9 is the difference between each of these two values and the EMDB global resolution. Comparing FIGS. 8 and 9 shows that the error of the ResMap median relative to the EMDB global resolution is larger than the error of the CryoRes method 700. Moreover, as FIG. 9 shows, the error of the ResMap median is negatively correlated with the resolution of the map: the lower the resolution, the larger the error. By contrast, the error of the CryoRes method 700 shown in FIG. 8 fluctuates relatively little and is less affected by resolution.
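The two quantities on the ordinate of a FIG. 9-style comparison can be sketched for one map as below. This is a minimal sketch: the sign convention (estimate minus published value) is an assumption, the ResMap local resolutions are assumed to be given as a flat list of in-mask voxel values, and `figure9_differences` is a hypothetical name.

```python
import statistics


def figure9_differences(resmap_local, cryores_global, emdb_global):
    """Signed differences from the EMDB global resolution for one map.

    resmap_local: ResMap local resolutions of the map's voxels (flat list).
    """
    resmap_median = statistics.median(resmap_local)
    return resmap_median - emdb_global, cryores_global - emdb_global
```

Using the median rather than the mean makes the ResMap summary less sensitive to a few extreme voxel values.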
For (3) the mask, the cryo-EM density maps in the test set were evaluated; the mean IoU over the 349 maps in the test set is 0.74.
FIG. 10 shows, for each cryo-EM density map in the test set, the IoU between the mask obtained by the CryoRes method 700 and the mask label. As FIG. 10 shows, the IoU of most maps is above 0.7. Maps with a lower IoU are typically noisy or contain unresolved low-resolution structures. Because the mask label is derived from the corresponding PDB file rather than from the density map itself, whereas the mask obtained by the CryoRes method 700 depends more on the density map itself, a lower IoU for such maps is expected and consistent with the intended behavior of the mask.
FIG. 11 shows the confusion matrix between the masks and the mask labels. A confusion matrix was built from the results for the 349 cryo-EM density maps in the test set to evaluate how well the masks obtained by the CryoRes method 700 distinguish biological macromolecules from background. As FIG. 11 shows, the recognition rate is 0.91 for the macromolecule positions given by the mask labels and 0.92 for background positions.
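The two recognition rates correspond to the row-normalized entries of a binary confusion matrix. The minimal sketch below computes them from flat 0/1 voxel masks; pooling voxels across all maps (by concatenating their lists) is one possible reading of how the matrix was built, and `mask_recognition_rates` is a hypothetical name.

```python
def mask_recognition_rates(pred, label):
    """Row-normalized confusion-matrix rates for binary voxel masks.

    Returns (macromolecule rate, background rate): the fraction of label-1
    voxels predicted as 1, and the fraction of label-0 voxels predicted as 0.
    """
    tp = sum(1 for p, y in zip(pred, label) if p == 1 and y == 1)
    fn = sum(1 for p, y in zip(pred, label) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(pred, label) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(pred, label) if p == 1 and y == 0)
    return tp / (tp + fn), tn / (tn + fp)
```

Unlike IoU, these rates are computed separately for each class, so a large background region does not mask poor recognition of the macromolecule, or vice versa.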
Exemplary device
The present application also provides a training device for a neural network model. FIG. 12 is a structural block diagram of the neural network model training device according to an embodiment of the present application. As shown in FIG. 12, the training device 800 includes a first determination module 810 and a training module 820. The first determination module 810 is configured to determine the mask value, the local resolution fluctuation value, and the global resolution value based on a first target cryo-EM density map that is annotated with a mask value label, a local resolution fluctuation value label, and a global resolution value label. The training module 820 is configured to train the neural network model based on the mask value, the local resolution fluctuation value, and the global resolution value, so that the mask value approaches the mask value label, the local resolution fluctuation value approaches the local resolution fluctuation value label, and the global resolution value approaches the global resolution value label.
In one embodiment, the first determination module 810 includes an encoding sub-module, a decoding sub-module, a first branch module, a second branch module, and a third branch module. The encoding sub-module performs residual-module-based encoding on the first target cryo-EM density map to obtain m feature maps. The decoding sub-module decodes the m feature maps to obtain the expected density map. The first branch module determines the mask value based on the expected density map. The second branch module determines the local resolution fluctuation value based on the expected density map. The third branch module determines the global resolution value based on the top-level feature map among the m feature maps.
In one embodiment, the training module 820 includes a first determination sub-module, a second determination sub-module, a third determination sub-module, a fourth determination sub-module, and an update module. The first determination sub-module determines the first loss function based on the mask value and the mask value label. The second determination sub-module determines the second loss function based on the local resolution fluctuation value and the local resolution fluctuation value label. The third determination sub-module determines the third loss function based on the global resolution value and the global resolution value label. The fourth determination sub-module determines the total loss function based on the first, second, and third loss functions. The update module updates the parameters of the neural network model based on the gradient of the total loss function.
With the neural network model obtained by the training method of the embodiments of the present application, the mask value, the local resolution fluctuation value, and the global resolution value can be estimated simultaneously from a single cryo-EM density map.
The training device of this embodiment and the neural network model training method provided by the embodiments of the present application share the same inventive concept. The device can execute the training method provided by any embodiment of the present application and has the functional modules and beneficial effects corresponding to that method. For technical details not described in this embodiment, refer to the neural network model training method provided by the embodiments of the present application; they are not repeated here.
The present application also provides a device for estimating the resolution of a cryo-EM density map based on the neural network model. FIG. 13 is a structural block diagram of this estimation device according to an embodiment of the present application. As shown in FIG. 13, the estimation device 900 includes a preprocessing module 910, a second determination module 920, and a third determination module 930. The preprocessing module 910 preprocesses the cryo-EM density map to obtain the second target cryo-EM density map. The second determination module 920 determines the mask value, the local resolution fluctuation value, and the global resolution value based on the second target cryo-EM density map. The third determination module 930 determines the local resolution value based on the mask value, the local resolution fluctuation value, and the global resolution value.
With the method for estimating cryo-EM density map resolution provided by this embodiment, the local resolution value and the global resolution value can be estimated simultaneously from a single cryo-EM density map, overcoming the limitation of conventional methods that evaluate a density map along only one dimension, either global resolution or local resolution. In addition, the estimation method of this embodiment requires neither half-maps nor parameters such as a mask.
The estimation device of this embodiment and the neural-network-based method for estimating cryo-EM density map resolution provided by the embodiments of the present application share the same inventive concept. The device can execute the estimation method provided by any embodiment of the present application and has the functional modules and beneficial effects corresponding to that method. For technical details not described in this embodiment, refer to the estimation method provided by the embodiments of the present application; they are not repeated here.
Electronic device
FIG. 14 is a structural block diagram of an electronic device according to an embodiment of the present application. As shown in FIG. 14, the electronic device 10 includes one or more processors 11 and a memory 12.
The processor 11 may be a central processing unit (CPU) or another form of processing unit with data processing and/or instruction execution capabilities, and may control other components of the electronic device 10 to perform desired functions.
The memory 12 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. Non-volatile memory may include, for example, read-only memory (ROM), a hard disk, and flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 11 may execute these instructions to implement the neural network model training method and the neural-network-based method for estimating cryo-EM density map resolution of the embodiments of the present application described above, and/or other desired functions. Various contents such as input signals, signal components, and noise components may also be stored in the computer-readable storage medium.
In one example, the electronic device 10 may further include an input device 13 and an output device 14, which are interconnected through a bus system and/or another form of connection mechanism (not shown).
The output device 14 can output various information externally, including determined distance information, direction information, and the like. The output device 14 may include, for example, a display, a speaker, a printer, and a communication network together with the remote output devices connected to it.
For simplicity, FIG. 14 shows only some of the components of the electronic device 10 that are relevant to the present application; components such as buses and input/output interfaces are omitted. Depending on the specific application, the electronic device 10 may also include any other suitable components.
Computer program product and computer-readable storage medium
In addition to the methods and devices described above, embodiments of the present application may also be a computer program product including computer program instructions that, when executed by a processor, cause the processor to perform the steps of the neural network model training method and the neural-network-based method for estimating cryo-EM density map resolution according to the various embodiments described in the "Exemplary methods" section of this specification.
Program code for performing the operations of the embodiments of the present application may be written for the computer program product in any combination of one or more programming languages. These include object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
In addition, an embodiment of the present application may also be a computer-readable storage medium storing computer program instructions that, when executed by a processor, cause the processor 11 to perform the steps of the neural network model training method and the neural-network-based cryo-EM density map resolution estimation method according to the various embodiments of the present application described in the "Exemplary Methods" section of this specification.
The computer-readable storage medium may be any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of these. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present application have been described above in conjunction with specific embodiments. It should be noted, however, that the merits, advantages, and effects mentioned in the present application are merely examples and not limitations, and they should not be regarded as necessarily possessed by every embodiment of the present application. In addition, the specific details disclosed above serve only for illustration and ease of understanding, not as limitations; they do not require that the present application be implemented using those specific details.
The block diagrams of components, apparatuses, devices, and systems in the present application are merely illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown. As those skilled in the art will appreciate, these components, apparatuses, devices, and systems may be connected, arranged, or configured in any manner. Words such as "including", "comprising", and "having" are open-ended terms meaning "including but not limited to" and may be used interchangeably with that phrase. As used herein, the words "or" and "and" mean "and/or" and may be used interchangeably with it, unless the context clearly indicates otherwise. As used herein, the phrase "such as" means "such as but not limited to" and may be used interchangeably with it.
It should also be noted that in the apparatuses, devices, and methods of the present application, each component or step may be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalent solutions of the present application.
The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of the present application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
It should be understood that the qualifiers "first", "second", "third", "fourth", "fifth", and "sixth" used in describing the embodiments of the present application serve only to explain the technical solution more clearly and cannot be used to limit the scope of protection of the present application.
The foregoing description has been presented for the purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the present application to the forms disclosed herein. Although a number of example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.
Claims (12)
- 1. A training method for a neural network model, comprising:
  - determining a mask value, a local resolution fluctuation value, and a global resolution value based on a first target cryo-EM density map, the first target cryo-EM density map being annotated with a mask value label, a local resolution fluctuation value label, and a global resolution value label; and
  - training the neural network model based on the mask value, the local resolution fluctuation value, and the global resolution value, so that the mask value approaches the mask value label, the local resolution fluctuation value approaches the local resolution fluctuation value label, and the global resolution value approaches the global resolution value label.
- 2. The training method for a neural network model according to claim 1, wherein determining the mask value, the local resolution fluctuation value, and the global resolution value based on the first target cryo-EM density map comprises:
  - performing residual-module-based encoding on the first target cryo-EM density map to obtain m feature maps;
  - decoding the m feature maps to obtain an expected density map;
  - determining the mask value and the local resolution fluctuation value based on the expected density map; and
  - determining the global resolution value based on the top-level feature map among the m feature maps.
- 3. The training method for a neural network model according to claim 2, wherein determining the mask value based on the expected density map comprises: passing the expected density map sequentially through a convolution with a 3×3 kernel and a convolution with a 1×1 kernel to obtain the mask value.
- 4. The training method for a neural network model according to claim 2, wherein determining the local resolution fluctuation value based on the expected density map comprises:
  - classifying the expected density map to obtain a plurality of first categories and a weight for each of the first categories; and
  - determining, as the local resolution fluctuation value, the products of the weights of the first categories and the first preset values that the respective categories represent.
- 5. The training method for a neural network model according to claim 2, wherein determining the global resolution value based on the top-level feature map among the plurality of feature maps comprises:
  - classifying the top-level feature map to obtain a plurality of second categories and a weight for each of the second categories; and
  - determining, as the global resolution value, the products of the weights of the second categories and the second preset values that the respective categories represent.
- 6. The training method for a neural network model according to claim 1, wherein training the neural network model based on the mask value, the local resolution fluctuation value, and the global resolution value comprises:
  - determining a first loss function based on the mask value and the mask value label;
  - determining a second loss function based on the local resolution fluctuation value and the local resolution fluctuation value label;
  - determining a third loss function based on the global resolution value and the global resolution value label;
  - determining a total loss function based on the first loss function, the second loss function, and the third loss function; and
  - updating the parameters of the neural network model based on the gradient of the total loss function.
- 7. The training method for a neural network model according to claim 1, further comprising, before determining the mask value, the local resolution fluctuation value, and the global resolution value based on the first target cryo-EM density map:
  - cutting a cryo-EM density map to obtain the cube circumscribing the biological macromolecule in the cryo-EM density map; and
  - scaling the size of the circumscribing cube of the biological macromolecule to obtain the first target cryo-EM density map.
- 8. A neural-network-based method for estimating the resolution of a cryo-EM density map, comprising:
  - determining a mask value, a local resolution fluctuation value, and a global resolution value based on a second target cryo-EM density map; and
  - determining a local resolution value based on the mask value, the local resolution fluctuation value, and the global resolution value.
- 9. A training apparatus for a neural network model, comprising:
  - a first determining module configured to determine a mask value, a local resolution fluctuation value, and a global resolution value based on a first target cryo-EM density map, the first target cryo-EM density map being annotated with a mask value label, a local resolution fluctuation value label, and a global resolution value label; and
  - a training module configured to train the neural network model based on the mask value, the local resolution fluctuation value, and the global resolution value, so that the mask value approaches the mask value label, the local resolution fluctuation value approaches the local resolution fluctuation value label, and the global resolution value approaches the global resolution value label.
- 10. A neural-network-based apparatus for estimating the resolution of a cryo-EM density map, comprising:
  - a first determining module configured to determine a mask value, a local resolution fluctuation value, and a global resolution value based on a second target cryo-EM density map; and
  - a second determining module configured to determine a local resolution value based on the mask value, the local resolution fluctuation value, and the global resolution value.
- 11. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable by the processor, wherein the processor, when executing the computer program, implements the steps of the neural network model training method according to any one of claims 1 to 7, or the neural-network-based cryo-EM density map resolution detection method according to claim 8.
- 12. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the neural network model training method according to any one of claims 1 to 7, or the neural-network-based cryo-EM density map resolution detection method according to claim 8.
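Claim 2 splits the model into a residual encoder that produces m feature maps, a decoder that turns them into an expected density map, and a separate path from the top-level (coarsest) feature map to the global resolution. The NumPy sketch below only illustrates that data flow under stated assumptions: the residual module is a toy x + ReLU(Wx) with random weights in place of learned 3D convolutions, the encoder halves the resolution between levels by average pooling, and the decoder adds U-Net-style skip connections. None of these details, nor the function names, come from the claim.

```python
import numpy as np


def residual_block(x, rng):
    """Toy residual module: x + ReLU(W x) with random channel-mixing weights.

    Hypothetical stand-in; a trained model would use learned 3D convolutions here.
    """
    channels = x.shape[0]
    w = rng.standard_normal((channels, channels)) * 0.1
    fx = np.maximum(np.einsum("oc,c...->o...", w, x), 0.0)
    return x + fx  # the identity shortcut is what makes the block "residual"


def downsample(x):
    """2x average pooling along each spatial axis of a (C, D, H, W) array."""
    c, d, h, w = x.shape
    return x.reshape(c, d // 2, 2, h // 2, 2, w // 2, 2).mean(axis=(2, 4, 6))


def upsample(x):
    """2x nearest-neighbour upsampling along each spatial axis."""
    return x.repeat(2, axis=1).repeat(2, axis=2).repeat(2, axis=3)


def encode(volume, m=3, channels=4, seed=0):
    """Encode a cubic density map into m feature maps: finest first, top-level last."""
    rng = np.random.default_rng(seed)
    x = np.repeat(volume[None], channels, axis=0)  # (C, D, H, W)
    feats = []
    for level in range(m):
        x = residual_block(x, rng)
        feats.append(x)
        if level < m - 1:
            x = downsample(x)
    return feats


def decode(feats):
    """Upsample from the top-level map, adding each skip connection (assumed U-Net style)."""
    y = feats[-1]
    for skip in reversed(feats[:-1]):
        y = upsample(y) + skip
    return y  # the "expected density map" passed on to the mask and fluctuation heads


volume = np.random.default_rng(1).random((16, 16, 16))
feats = encode(volume, m=3)
print([f.shape for f in feats])  # [(4, 16, 16, 16), (4, 8, 8, 8), (4, 4, 4, 4)]
print(decode(feats).shape)       # (4, 16, 16, 16)
```

In this sketch `feats[-1]` plays the role of the top-level feature map from which claim 5 derives the global resolution value.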
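Claim 3 obtains the mask value by passing the expected density map through a 3×3 convolution and then a 1×1 convolution. Because a density map is a 3D volume, the sketch below assumes these are 3×3×3 and 1×1×1 kernels with zero "same" padding. The channel counts and the names `conv3d_same` and `mask_head` are hypothetical. The claim names no output activation, so none is applied.

```python
import numpy as np


def conv3d_same(x, w, b):
    """'Same'-padded 3D convolution (cross-correlation, as in deep-learning libraries).

    x: (C_in, D, H, W) input; w: (C_out, C_in, k, k, k) kernel; b: (C_out,) bias.
    """
    c_out, _, k, _, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p), (p, p)))  # zero padding keeps D, H, W
    _, d, h, wd = x.shape
    out = np.zeros((c_out, d, h, wd))
    for dz in range(k):
        for dy in range(k):
            for dx in range(k):
                patch = xp[:, dz:dz + d, dy:dy + h, dx:dx + wd]
                # mix input channels into output channels for this kernel offset
                out += np.einsum("oc,c...->o...", w[:, :, dz, dy, dx], patch)
    return out + b[:, None, None, None]


def mask_head(expected_map, w3, b3, w1, b1):
    """3x3x3 convolution followed by 1x1x1 convolution -> one mask value per voxel."""
    hidden = conv3d_same(expected_map, w3, b3)
    return conv3d_same(hidden, w1, b1)[0]


# With an identity 3x3x3 kernel and an all-ones 1x1x1 kernel,
# the head simply sums the input channels.
rng = np.random.default_rng(0)
x = rng.random((4, 8, 8, 8))
w3 = np.zeros((4, 4, 3, 3, 3))
for c in range(4):
    w3[c, c, 1, 1, 1] = 1.0
mask = mask_head(x, w3, np.zeros(4), np.ones((1, 4, 1, 1, 1)), np.zeros(1))
print(mask.shape, np.allclose(mask, x.sum(axis=0)))  # (8, 8, 8) True
```

The 1×1×1 stage only mixes channels, which is why it can reduce the multi-channel expected density map to a single mask value per voxel.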
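Claims 4 and 5 turn regression into classification: the network predicts a weight for each of several categories, each category represents a preset value, and the output is built from the weight-value products. The sketch below assumes softmax weights and sums the products over the categories, so the output is the expected value under the class weights; the claims themselves only speak of the products. Global average pooling plus a linear classifier for the global head, the bin ranges, and all names are likewise assumptions.

```python
import numpy as np


def softmax(logits, axis):
    z = logits - logits.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def local_fluctuation(class_logits, preset_values):
    """Claim 4 head: (K, D, H, W) class logits -> (D, H, W) fluctuation map."""
    weights = softmax(class_logits, axis=0)                   # per-voxel weights over K classes
    return np.tensordot(preset_values, weights, axes=(0, 0))  # sum_k weight_k * value_k


def global_resolution(top_feature_map, w_cls, preset_values):
    """Claim 5 head: (C, d, h, w) top-level features -> scalar global resolution."""
    pooled = top_feature_map.mean(axis=(1, 2, 3))  # global average pooling -> (C,)
    weights = softmax(w_cls @ pooled, axis=0)       # (K,) weights from a linear classifier
    return float(weights @ preset_values)


fluct_bins = np.linspace(-2.0, 2.0, 9)  # hypothetical preset fluctuation values (Å)
res_bins = np.linspace(2.0, 10.0, 17)   # hypothetical preset global resolutions (Å)
rng = np.random.default_rng(0)
fluct_map = local_fluctuation(rng.standard_normal((9, 8, 8, 8)), fluct_bins)
glob = global_resolution(rng.standard_normal((16, 2, 2, 2)),
                         rng.standard_normal((17, 16)), res_bins)
print(fluct_map.shape, round(glob, 2))
```

Because the weights are non-negative and sum to one, each output is a convex combination of the preset values, so it always stays within the range they span.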
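Claim 6 builds one loss per output and combines them into a total loss whose gradient updates the model. The claim does not name the individual losses or say how they are combined. This sketch assumes binary cross-entropy for the mask, L1 losses for the local fluctuation and the global resolution, and a weighted sum with hypothetical weights. In practice an automatic-differentiation framework would compute the gradient for the parameter update, which is not shown.

```python
import numpy as np


def bce(pred, target, eps=1e-7):
    """Binary cross-entropy between predicted mask probabilities and 0/1 labels."""
    p = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))


def total_loss(mask, mask_label, fluct, fluct_label, glob, glob_label,
               weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three per-output losses (loss choices and weights are assumptions)."""
    loss1 = bce(mask, mask_label)                        # first loss: mask value
    loss2 = float(np.mean(np.abs(fluct - fluct_label)))  # second loss: local fluctuation (L1)
    loss3 = abs(glob - glob_label)                       # third loss: global resolution (L1)
    w1, w2, w3 = weights
    return w1 * loss1 + w2 * loss2 + w3 * loss3


labels = np.array([0.0, 1.0, 1.0, 0.0])
print(total_loss(np.full(4, 0.5), labels, np.ones(4), np.zeros(4), 3.5, 3.0))
# log(2) + 1 + 0.5, i.e. about 2.193
```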
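Claim 7 prepares the input by cutting out the cube that circumscribes the biological macromolecule and rescaling it. The claim does not say how the macromolecule is delineated or which interpolation is used. The sketch assumes a density threshold, a cubic input box (usual for cryo-EM maps), and nearest-neighbour resampling to a hypothetical target edge length; the function names are illustrative.

```python
import numpy as np


def crop_bounding_cube(density, threshold):
    """Cut out the smallest cube enclosing every voxel above `threshold`.

    Assumes a cubic density box.
    """
    idx = np.argwhere(density > threshold)  # (N, 3) voxel coordinates
    if idx.size == 0:
        raise ValueError("no voxel exceeds the threshold")
    lo = idx.min(axis=0)
    hi = idx.max(axis=0) + 1
    side = int((hi - lo).max())  # cube edge = longest bounding-box extent
    center = (lo + hi) // 2
    # centre the cube on the molecule, shifting it back inside the box if needed
    start = np.clip(center - side // 2, 0, np.array(density.shape) - side)
    z, y, x = (int(s) for s in start)
    return density[z:z + side, y:y + side, x:x + side]


def resize_nearest(cube, size):
    """Nearest-neighbour resampling of an (n, n, n) cube to (size, size, size)."""
    n = cube.shape[0]
    src = (np.arange(size) * n) // size  # source index for each target index
    return cube[np.ix_(src, src, src)]


density = np.zeros((40, 40, 40))
density[10:20, 12:30, 15:25] = 1.0  # toy "macromolecule"
cube = crop_bounding_cube(density, threshold=0.5)
scaled = resize_nearest(cube, 32)  # hypothetical target edge length
print(cube.shape, scaled.shape)  # (18, 18, 18) (32, 32, 32)
```

Trilinear interpolation (for example `scipy.ndimage.zoom`) would be a smoother alternative to nearest-neighbour resampling; which one the authors use is not stated.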
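Claim 8 derives the local resolution from the mask value, the local resolution fluctuation value, and the global resolution value, but does not state the formula. One plausible reading, assumed here and not taken from the source, is that the fluctuation is an offset around the global value and that voxels outside the mask carry no estimate: local = global + fluctuation inside the mask, NaN elsewhere. The 0.5 mask threshold and the function name are also hypothetical.

```python
import numpy as np


def local_resolution(mask, fluctuation, global_resolution, mask_threshold=0.5):
    """Hypothetical combination: global value plus per-voxel offset, only inside the mask."""
    local = global_resolution + fluctuation
    return np.where(mask > mask_threshold, local, np.nan)  # NaN marks "outside the molecule"


out = local_resolution(np.array([0.9, 0.2, 0.7]), np.array([-0.5, 1.0, 0.3]), 3.0)
print(out)  # [2.5 nan 3.3]
```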
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/075408 WO2023147706A1 (en) | 2022-02-07 | 2022-02-07 | Neural network model training method and resolution estimation method for cryo-electron microscope density map |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/075408 WO2023147706A1 (en) | 2022-02-07 | 2022-02-07 | Neural network model training method and resolution estimation method for cryo-electron microscope density map |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023147706A1 (en) | 2023-08-10 |
Family ID: 87553150
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/075408 WO2023147706A1 (en) | Neural network model training method and resolution estimation method for cryo-electron microscope density map | 2022-02-07 | 2022-02-07 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023147706A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105069797A (en) * | 2015-08-13 | 2015-11-18 | 上海交通大学 | Method for detecting resolution of three-dimensional density picture of cryo-electron microscopy based on mask |
WO2019079251A1 (en) * | 2017-10-18 | 2019-04-25 | President And Fellows Of Harvard College | Modeling of molecular structures |
CN111210869A (en) * | 2020-01-08 | 2020-05-29 | 中山大学 | Protein cryoelectron microscope structure analysis model training method and analysis method |
CN113643230A (en) * | 2021-06-22 | 2021-11-12 | 清华大学 | Continuous learning method and system for identifying biomacromolecule particles of cryoelectron microscope |
2022-02-07: WO application PCT/CN2022/075408 (WO2023147706A1), status unknown
Non-Patent Citations (1)
Title |
---|
ZHAO HE, LI XUE-MING;SHEN YUAN: "Wavelet packet transform based local filter in cryo-EM single particle reconstruction", DIANZI-XIANWEI-XUEBAO = JOURNAL OF CHINESE ELECTRON MICROSCOPY SOCIETY, vol. 39, no. 3, 15 June 2020 (2020-06-15), pages 282 - 287, XP055966265, ISSN: 1000-6281, DOI: 10.3969/j.issn.1000-6281.2020.03.009 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | EP: The EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22924635; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |