WO2023147706A1

WO2023147706A1 - Neural network model training method and resolution estimation method for cryo-electron microscope density map

Info

Publication number: WO2023147706A1
Application number: PCT/CN2022/075408
Authority: WO
Inventors: 张强锋; 代沐芷; 徐魁
Original assignee: 清华大学
Priority date: 2022-02-07
Filing date: 2022-02-07
Publication date: 2023-08-10

Abstract

The present application provides a neural network model training method and apparatus, a resolution estimation method and apparatus for a cryo-electron microscope density map, a computer device, and a storage medium, used for solving the problems in the prior art that input data of a resolution estimation algorithm of a cryo-electron microscope density map is not easy to obtain, and the calculation time is long. The neural network model training method comprises: determining a mask value, a local resolution fluctuation value, and a global resolution value on the basis of a first target cryo-electron microscope density map, wherein a mask value label, a local resolution fluctuation value label, and a global resolution value label are annotated for the first target cryo-electron microscope density map; and training a neural network model on the basis of the mask value, the local resolution fluctuation value, and the global resolution value, so that the mask value converges to the mask value label, the local resolution fluctuation value converges to the local resolution fluctuation value label, and the global resolution value converges to the global resolution value label.

Description

Neural Network Model Training Method and Cryo-EM Density Map Resolution Estimation Method

Technical Field

The present application relates to the technical field of resolution estimation of cryo-electron microscope density maps, in particular to a neural network model training method and device, a cryo-electron microscope density map resolution estimation method and device, computer equipment, and storage media.

Background

Resolution estimation of cryo-EM density maps is a critical step in determining atomic structure. The resolution of a cryo-EM density map includes global resolution and local resolution. Usually, different algorithms are used to estimate the global resolution and the local resolution, and a given resolution estimation method can estimate only one kind of resolution, that is, either the global resolution or the local resolution. For example, the global resolution can be estimated by the Fourier shell correlation (FSC) algorithm, and the local resolution can be estimated by the ResMap algorithm.

Conventional resolution estimation methods often require input data that is hard to obtain. For example, one of the inputs of the Blocres method is half-maps. Therefore, when resolution estimation is to be performed on a cryo-electron microscope density map downloaded from EMDB or obtained by other means, the half-maps must first be obtained. However, half-maps are not always provided, so the input data for resolution estimation is either difficult to obtain or requires complex preparation work.

Summary of the Invention

In view of this, the embodiments of the present application provide a neural network model training method and device, a cryo-electron microscope density map resolution estimation method and device, computer equipment, and a storage medium, to solve the problems in the prior art that the input data of resolution estimation algorithms for cryo-electron microscope density maps is not easy to obtain and that the calculation time is long.

The first aspect of the present application provides a training method for a neural network model, including: determining a mask value, a local resolution fluctuation value, and a global resolution value based on a first target cryo-electron microscope density map, wherein the first target cryo-electron microscope density map is annotated with a mask value label, a local resolution fluctuation value label, and a global resolution value label; and training the neural network model based on the mask value, the local resolution fluctuation value, and the global resolution value, so that the mask value approaches the mask value label, the local resolution fluctuation value approaches the local resolution fluctuation value label, and the global resolution value approaches the global resolution value label.

In one embodiment, determining the mask value, the local resolution fluctuation value, and the global resolution value based on the first target cryo-electron microscope density map includes: performing residual-module-based encoding on the first target cryo-electron microscope density map to obtain m feature maps; decoding the m feature maps to obtain an expected density map; determining the mask value and the local resolution fluctuation value based on the expected density map; and determining the global resolution value based on the top-level feature map among the m feature maps.

In one embodiment, determining the mask value based on the expected density map includes: subjecting the expected density map, in sequence, to a convolution operation with a convolution kernel of 3*3 and a convolution operation with a convolution kernel of 1*1 to obtain the mask value.

In one embodiment, determining the local resolution fluctuation value based on the expected density map includes: classifying the expected density map to obtain multiple first categories and the respective weights of the multiple first categories; and determining the local resolution fluctuation value from the products of the respective weights of the multiple first categories and the first preset values they respectively represent.

In one embodiment, determining the global resolution value based on the top-level feature map among the m feature maps includes: classifying the top-level feature map to obtain multiple second categories and the respective weights of the multiple second categories; and determining the global resolution value from the products of the respective weights of the multiple second categories and the second preset values they respectively represent.

In one embodiment, training the neural network model based on the mask value, the local resolution fluctuation value, and the global resolution value includes: determining a first loss function based on the mask value and the mask value label; determining a second loss function based on the local resolution fluctuation value and the local resolution fluctuation value label; determining a third loss function based on the global resolution value and the global resolution value label; determining a total loss function based on the first loss function, the second loss function, and the third loss function; and updating the parameters of the neural network model based on the gradient of the total loss function.

In one embodiment, before determining the mask value, the local resolution fluctuation value, and the global resolution value based on the first target cryo-electron microscope density map, the method further includes: cropping a cryo-electron microscope density map to obtain the circumscribed cube of the biomacromolecule in the cryo-electron microscope density map; and scaling the circumscribed cube of the biomacromolecule to obtain the first target cryo-electron microscope density map.

The second aspect of the present application provides a method for estimating the resolution of a cryo-electron microscope density map based on a neural network, including: determining a mask value, a local resolution fluctuation value, and a global resolution value based on a second target cryo-electron microscope density map; and determining a local resolution value based on the mask value, the local resolution fluctuation value, and the global resolution value.

The third aspect of the present application provides a neural network model training device, including: a first determination module, which determines a mask value, a local resolution fluctuation value, and a global resolution value based on a first target cryo-electron microscope density map, wherein the first target cryo-electron microscope density map is annotated with a mask value label, a local resolution fluctuation value label, and a global resolution value label; and a training module, which trains the neural network model based on the mask value, the local resolution fluctuation value, and the global resolution value, so that the mask value approaches the mask value label, the local resolution fluctuation value approaches the local resolution fluctuation value label, and the global resolution value approaches the global resolution value label.

The fourth aspect of the present application provides a neural network-based cryo-electron microscope density map resolution estimation device, including: a first determination module, which determines a mask value, a local resolution fluctuation value, and a global resolution value based on a second target cryo-electron microscope density map; and a second determination module, which determines a local resolution value based on the mask value, the local resolution fluctuation value, and the global resolution value.

The fifth aspect of the present application provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable by the processor, wherein the processor, when executing the computer program, implements the steps of the neural network model training method provided by any of the above-mentioned embodiments, or the steps of the neural network-based method for estimating the resolution of a cryo-electron microscope density map provided by any of the above-mentioned embodiments.

The sixth aspect of the present application provides a computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the steps of the neural network model training method provided by any of the above-mentioned embodiments, or the steps of the neural network-based method for estimating the resolution of a cryo-electron microscope density map provided by any of the above-mentioned embodiments, are implemented.

According to the neural network model training method and device, cryo-electron microscope density map resolution estimation method and device, computer equipment, and storage medium provided in this application, the mask value, the local resolution fluctuation value, and the global resolution value can be estimated at the same time based on a single cryo-electron microscope density map. Subsequently, the local resolution value may be determined based on the mask value, the local resolution fluctuation value, and the global resolution value. This overcomes the limitation that conventional resolution estimation methods can evaluate cryo-EM density maps from only one dimension, that is, either global resolution or local resolution. At the same time, the estimation method provided in this embodiment does not require half-maps or masks to be provided, nor does it require parameters to be provided and adjusted manually.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of resolution distribution of training samples provided by an embodiment of the present application.

FIG. 2 is a structure diagram of a neural network model provided by an embodiment of the present application.

FIG. 3 is a flowchart of a training method for a neural network model provided by an embodiment of the present application.

FIG. 4 is a schematic diagram of an execution process of step S310 provided by an embodiment of the present application.

FIG. 5 is a schematic diagram of an execution process of step S320 provided by an embodiment of the present application.

FIG. 6 is a logical framework of a method for estimating the resolution of a cryo-EM density map based on a neural network model provided by an embodiment of the present application.

FIG. 7 is a flowchart of a method for estimating the resolution of a cryo-EM density map based on a neural network model provided by an embodiment of the present application.

FIG. 8 shows the comparison between the global resolution of each cryo-EM density map in the test set obtained by the CryoRes method and the global resolution of each cryo-EM density map published on EMDB.

FIG. 9 shows the comparison between the median of the local resolution of each cryo-EM density map in the test set obtained by the ResMap method, the global resolution obtained by the CryoRes method, and the global resolution published by EMDB.

FIG. 10 shows the IoU results of the mask and the mask label of each cryo-EM density map in the test set obtained by the CryoRes method.

FIG. 11 shows the confusion matrix of the IoU results for masks and mask labels.

FIG. 12 is a structural block diagram of a neural network model training device provided by an embodiment of the present application.

FIG. 13 is a structural block diagram of a device for estimating the resolution of a cryo-electron microscope density map based on a neural network model provided by an embodiment of the present application.

FIG. 14 is a structural block diagram of an electronic device provided by an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings in the embodiments of the application. Obviously, the described embodiments are only some, not all, of the embodiments of the application. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without creative effort fall within the scope of protection of this application.

Before introducing the neural network model training method and device, cryo-electron microscope density map resolution estimation method and device, computer equipment, and storage medium provided by this application, a brief introduction is first given to the technical terms that may be involved in the embodiments of this application, so that those skilled in the art can understand them.

Three-dimensional fully convolutional network (3D-UNet): a fully convolutional network whose input is a three-dimensional image and which consists of a downsampling part, an upsampling part, and skip connections. It is characterized by fully symmetrical convolutional layers in the downsampling and upsampling parts, and the feature map at the downsampling end can skip the deeper layers and be concatenated to the corresponding upsampling end.

Residual network: a layer of a neural network can usually be regarded as y=H(x), and a residual block of a residual network can be expressed as H(x)=F(x)+x, that is, F(x)=H(x)-x. For the identity mapping, y=x is the observed value and H(x) is the predicted value, so F(x) corresponds to the residual; hence the name residual network.

Encoder-Decoder: a model architecture in deep learning. The encoder is a network that receives the input and outputs feature vectors, which are another representation of the input features and information. The decoder is also a network (usually with the same structure as the encoder, but in the opposite direction); it takes the feature vectors from the encoder and outputs the result closest to the actual input or the expected output.

Group Normalization (GN): an algorithm that first divides the channel dimension into G groups, then normalizes each group separately, and finally merges the normalized data of the G groups into one feature map.

Rectified Linear Unit (ReLU): an activation function commonly used in artificial neural networks, usually referring to the nonlinear functions represented by the ramp function and its variants. An activation function passes the activated information on to the next layer when a certain part of the neurons in the neural network is activated. It has nonlinearity, differentiability, and monotonicity.

The three-dimensional reconstruction of a cryo-EM density map covers two cases. In the first case, the whole particle dataset is reconstructed in three dimensions, and the result is called a single map. In the second case, the particle data is randomly divided into two sub-datasets, and the results of three-dimensional reconstruction of the two sub-datasets are called half-maps.
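The Group Normalization step defined above can be illustrated with a minimal NumPy sketch. It assumes a feature map laid out as (channels, depth, height, width), omits the learned scale and shift that practical implementations usually add, and uses a hypothetical function name `group_norm`; it is not the implementation used in this application.

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    """Normalize x of shape (C, D, H, W) over G groups of consecutive channels."""
    channels = x.shape[0]
    if channels % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    grouped = x.reshape(num_groups, -1)            # one row per channel group
    mean = grouped.mean(axis=1, keepdims=True)
    var = grouped.var(axis=1, keepdims=True)
    normalized = (grouped - mean) / np.sqrt(var + eps)
    return normalized.reshape(x.shape)             # merge the groups back

rng = np.random.default_rng(0)
features = rng.normal(loc=3.0, scale=2.0, size=(4, 2, 2, 2))
out = group_norm(features, num_groups=2)
```

After the call, each of the two channel groups in `out` has approximately zero mean and unit variance, while the shape of the feature map is unchanged.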

Exemplary Method

As mentioned in the background, current algorithms for estimating the resolution of cryo-EM density maps can usually estimate only the local resolution or only the global resolution, so each serves a single function. In view of this, the present application provides a training method for a neural network model, and the neural network model obtained by this training method can be used to estimate the global resolution value and the local resolution value at the same time.

The following is a detailed description in the order of preparing the training set, building the neural network model, model training, and model testing.

Step 1, prepare the training set

Prepare multiple cryo-EM density maps. In one example, the cryo-EM density maps are actual experimental data downloaded from the Electron Microscopy Data Bank (EMDB). In this training process, 1523 cryo-electron microscope density maps were selected, including density maps of proteins and density maps of nucleic acids. Of these, 1174 were used as the training set and 349 as the test set to evaluate the estimation performance of the model. FIG. 1 is a schematic diagram of the resolution distribution of the training set and the test set provided by an embodiment of the present application. As shown in FIG. 1, the global resolutions of the 1523 cryo-EM density maps are all greater than or equal to 1 angstrom and less than 8 angstroms. The range from 1 to 8 angstroms is divided into 6 intervals, namely: [1.0, 3.0), [3.0, 3.5), [3.5, 4.0), [4.0, 4.5), [4.5, 6.0), and [6.0, 8.0). The 1174 training maps are distributed over these intervals as 112, 279, 298, 231, 130, and 124, respectively, and the 349 test maps as 17, 81, 98, 61, 48, and 44, respectively.

In order to supervise the training of the neural network model, labels must be made for each cryo-EM density map. Each cryo-EM density map has three types of labels, namely a global resolution value label, a local resolution fluctuation value label, and a mask value label.

The global resolution value is a single numerical value. In one example, the global resolution value published on the EMDB website is selected as the global resolution value label of the cryo-EM density map, because it is the currently recognized accurate global resolution result.

In the embodiment of the present application, the local resolution fluctuation value, rather than the local resolution value, is used as the second type of label. This is because the more widely recognized method for estimating local resolution at present is the Blocres method. The Blocres method uses a sliding window to cut the cryo-electron microscope density map into small blocks, uses the FSC method to obtain the resolution of each small block as the local resolution at the center of that block, and thus obtains the local resolution of the entire cryo-EM density map point by point. The Blocres method requires half-maps, which makes it difficult to assemble a large training set. In one example, the fluctuation value of the ResMap result is selected as the local resolution fluctuation value label. ResMap can obtain the local resolution from a single map, but the local resolution it obtains has certain errors. Selecting the local resolution fluctuation value can reduce the error of the local resolution, thereby improving the reliability of the label. The local resolution obtained by ResMap is a three-dimensional matrix in which some values are 100 and the others are not. The average of all non-100 values is taken, and this average is subtracted from each non-100 value in the local resolution to yield the local resolution fluctuation value.

The mask value mentioned herein is a three-dimensional matrix whose dimensions are consistent with those of the three-dimensional density map. The mask can be obtained from the mask value by subsequent threshold processing. Thresholding refers to setting a value less than 0 to 0, indicating the background area, that is, an area with no macromolecular information, and setting a value greater than or equal to 0 to 1, indicating a non-background area, that is, an area with macromolecular information. In this way, by multiplying the mask and the cryo-EM density map, the particle regions in the cryo-EM density map can be extracted. In one example, the mask is simulated based on the Protein Data Bank (PDB) file corresponding to the density map. The width of the mask is, for example, 4 angstroms.
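The label construction described above can be sketched in NumPy as follows. The function names are hypothetical, and setting the fluctuation to 0 at voxels holding the value 100 is an assumption made only so that the output has the same shape as the input; the text does not say how those voxels are stored.

```python
import numpy as np

RESMAP_FILL = 100.0  # per the text, some voxels of the ResMap result hold 100

def fluctuation_label(local_res):
    """Subtract the mean of all non-100 voxels from each non-100 voxel."""
    valid = local_res != RESMAP_FILL
    fluct = np.zeros_like(local_res, dtype=float)   # assumption: 0 where value is 100
    fluct[valid] = local_res[valid] - local_res[valid].mean()
    return fluct, valid

def threshold_mask(mask_value):
    """Values < 0 become 0 (background); values >= 0 become 1 (macromolecule)."""
    return (mask_value >= 0).astype(np.uint8)

local_res = np.array([[[100.0, 3.0], [4.0, 5.0]]])
fluct, valid = fluctuation_label(local_res)         # mean of 3, 4, 5 is 4

mask = threshold_mask(np.array([[[-0.7, 0.2], [0.0, -1.5]]]))
density = np.array([[[0.9, 0.8], [0.7, 0.6]]])
particle = density * mask                           # keeps only non-background voxels
```

In this toy example the three valid voxels receive fluctuation values of -1, 0, and 1, and multiplying by the mask zeroes the two background voxels of the density map.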

Step 2, build a neural network model

FIG. 2 is a structure diagram of a neural network model provided by an embodiment of the present application. The neural network model includes a Residual 3D-UNet module, a first branch module 22, a second branch module 23, and a third branch module 24. The Residual 3D-UNet module includes an encoding sub-module 211 and a decoding sub-module 212. The output of the encoding sub-module 211 is used as the input of the decoding sub-module 212, and the output of the decoding sub-module 212 is used as the input of the first branch module 22 and the second branch module 23. The first branch module 22 is used to output the mask value, and the second branch module 23 is used to output the local resolution fluctuation value. The output of the encoding sub-module 211 is also used as the input of the third branch module 24, and the third branch module 24 is used to output the global resolution value.

Specifically, the encoding sub-module 211 includes at least one feature extraction unit and at least one downsampling unit, which are cascaded alternately. For example, as shown in FIG. 2, the encoding sub-module 211 includes a first feature extraction unit, a downsampling unit, and a second feature extraction unit connected in sequence. In one example, as shown in FIG. 2, the feature extraction unit is a residual sub-network including three convolutional layers, and the downsampling unit is a max pooling layer.

The decoding sub-module 212 includes at least one upsampling unit and at least one feature extraction unit, which are cascaded alternately. For example, as shown in FIG. 2, the decoding sub-module 212 includes an upsampling unit and a feature extraction unit connected in sequence. In one example, as shown in FIG. 2, the upsampling unit is a deconvolutional layer, and the feature extraction unit is a residual sub-network consisting of three convolutional layers.

The first branch module 22 includes a convolution layer with a convolution kernel of 3*3 and a convolution layer with a convolution kernel of 1*1.

The second branch module 23 adopts a classification + regression architecture. For example, as shown in FIG. 2, the second branch module 23 includes a convolutional layer with a convolution kernel of 3*3, a convolutional layer with a convolution kernel of 1*1, and a soft-Argmax layer.

The third branch module 24 also adopts the classification + regression architecture. For example, as shown in FIG. 2, the third branch module 24 includes three convolutional layers with a convolution kernel of 3*3, two convolutional layers with a convolution kernel of 1*1, a global average pooling layer, and a soft-Argmax layer. A max pooling layer is placed after each of the first two convolutional layers with a convolution kernel of 3*3.

Step 3, model training

FIG. 3 is a flowchart of a training method for a neural network model provided by an embodiment of the present application. As shown in Figure 3, the training method 300 includes:

Step S310, determine the mask value, the local resolution fluctuation value, and the global resolution value based on the first target cryo-electron microscope density map, where the first target cryo-electron microscope density map is annotated with a mask value label, a local resolution fluctuation value label, and a global resolution value label.

Step S320, train the neural network model based on the mask value, the local resolution fluctuation value, and the global resolution value, so that the mask value approaches the mask value label, the local resolution fluctuation value approaches the local resolution fluctuation value label, and the global resolution value approaches the global resolution value label.

In step S310, the first target cryo-electron microscope density maps are the 1174 cryo-electron microscope density maps of the training set mentioned above, after preprocessing. In one embodiment, the preprocessing includes: cropping the cryo-electron microscope density map to obtain the circumscribed cube of the biomacromolecule in the cryo-electron microscope density map; and scaling the circumscribed cube of the biomacromolecule to obtain the first target cryo-electron microscope density map. In one example, the size of the first target cryo-EM density map is less than or equal to 248*248*248.
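The preprocessing described above can be sketched as follows. How the extent of the biomacromolecule is determined and which interpolation is used for scaling are not stated in the text, so the density threshold, the zero padding, and the nearest-neighbour resampling below are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def crop_circumscribed_cube(density, threshold=0.0):
    """Crop the smallest cube enclosing all voxels above `threshold` (assumed criterion)."""
    coords = np.argwhere(density > threshold)
    if coords.size == 0:
        raise ValueError("no voxel above threshold")
    lo = coords.min(axis=0)
    hi = coords.max(axis=0) + 1
    side = int((hi - lo).max())                     # cube side = longest box edge
    start = lo - (side - (hi - lo)) // 2            # roughly centre the box in the cube
    cube = np.zeros((side, side, side), dtype=density.dtype)
    src_lo = np.maximum(start, 0)                   # clip to the map boundaries
    src_hi = np.minimum(start + side, density.shape)
    dst_lo = src_lo - start
    dst_hi = dst_lo + (src_hi - src_lo)
    src = tuple(slice(a, b) for a, b in zip(src_lo, src_hi))
    dst = tuple(slice(a, b) for a, b in zip(dst_lo, dst_hi))
    cube[dst] = density[src]                        # zero padding elsewhere
    return cube

def downscale_to_limit(cube, max_side=248):
    """Nearest-neighbour resampling so that the cube side does not exceed max_side."""
    side = cube.shape[0]
    if side <= max_side:
        return cube
    idx = np.floor(np.arange(max_side) * side / max_side).astype(int)
    return cube[np.ix_(idx, idx, idx)]

density = np.zeros((10, 12, 8))
density[2:5, 3:9, 1:3] = 1.0                        # toy "macromolecule" of 3 x 6 x 2 voxels
cube = crop_circumscribed_cube(density)
small = downscale_to_limit(cube, max_side=4)
```

Here the toy macromolecule is enclosed in a 6*6*6 cube, and the small `max_side` is used only to exercise the scaling branch; with the default limit of 248 the cube would be returned unchanged.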

FIG. 4 is a schematic diagram of an execution process of step S310 provided by an embodiment of the present application. As shown in Figure 4, step S310 specifically includes:

Step S311, performing encoding processing based on the residual module on the first target cryo-EM density map to obtain m feature maps.

Specifically, referring to FIG. 2, this step is performed by the encoding sub-module 211. Each feature extraction unit in the encoding sub-module 211 outputs a feature map.

First, feature extraction is performed on the first target cryo-electron microscope density map to obtain the first feature map. In one example, the feature extraction process is performed based on a residual module. For example, the first target cryo-EM density map is first subjected to the GN operation and then to the first convolution operation to obtain the first sub-feature map. The first sub-feature map undergoes the GN operation, the second convolution operation, the GN operation, and the third convolution operation in sequence to obtain the second sub-feature map. The ReLU operation is performed on the first sub-feature map, and the ReLU-processed first sub-feature map is added to the second sub-feature map to obtain the first feature map.
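The data flow of this residual feature extraction unit can be sketched as below. Only the wiring follows the text; the convolution and normalization operators are passed in as callables, and the `toy_norm` and `toy_conv` stand-ins are hypothetical placeholders that exist only so the sketch runs without a deep learning framework.

```python
import numpy as np

def residual_unit(x, conv1, conv2, conv3, group_norm):
    """GN -> conv1 gives s1; s1 -> GN -> conv2 -> GN -> conv3 gives s2; output ReLU(s1) + s2."""
    s1 = conv1(group_norm(x))                        # first sub-feature map
    s2 = conv3(group_norm(conv2(group_norm(s1))))    # second sub-feature map
    return np.maximum(s1, 0.0) + s2                  # ReLU on s1, then the addition

# Hypothetical stand-ins (not real 3D convolutions or grouped normalization)
def toy_norm(t, eps=1e-5):
    return (t - t.mean()) / np.sqrt(t.var() + eps)

def toy_conv(scale):
    return lambda t: scale * t

x = np.arange(8, dtype=float).reshape(2, 2, 2)
y = residual_unit(x, toy_conv(1.0), toy_conv(0.5), toy_conv(2.0), toy_norm)
```

In a real model the three callables would be 3D convolution layers and the normalization would be Group Normalization; the output keeps the spatial shape of the input as long as the convolutions are padded accordingly.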

Second, the first feature map is down-sampled to obtain a down-sampled feature map.

Next, feature extraction is performed on the downsampled feature map to obtain the second feature map. The feature extraction is a feature extraction process based on the residual module. For the specific process, refer to the above-mentioned process of obtaining the first feature map, which is not repeated here.

It should be understood that the encoding sub-module 211 shown in FIG. 2 includes only two feature extraction units and one downsampling unit, so two feature maps can be obtained. In other embodiments, the encoding sub-module 211 may also include three feature extraction units and two downsampling units, or four feature extraction units and three downsampling units, and so on. In the embodiment of the present application, the number of feature extraction units and downsampling units in the encoding sub-module 211 is not limited.

Based on the process described above, the execution of step S311 can be summarized as follows. For the i-th feature map, when i is equal to 1, feature extraction is performed on the first target cryo-electron microscope density map to obtain the first feature map; when i is greater than 1, the (i-1)-th feature map is downsampled to obtain a downsampled feature map, and feature extraction is performed on the downsampled feature map to obtain the i-th feature map. Here, i is a positive integer greater than or equal to 1 and less than or equal to m, and the feature extraction refers to the feature extraction process based on the residual module.
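The iteration summarized above can be sketched as a short loop. The pooling window of 2 and the single-channel (D, H, W) layout are assumptions, and `extract_features` is a hypothetical callable standing in for the residual feature extraction unit.

```python
import numpy as np

def max_pool3d(x, k=2):
    """Non-overlapping k*k*k max pooling on a (D, H, W) array; sizes must be divisible by k."""
    d, h, w = x.shape
    return x.reshape(d // k, k, h // k, k, w // k, k).max(axis=(1, 3, 5))

def encode(density, extract_features, m):
    """Return m feature maps: extract once, then (downsample, extract) m - 1 times."""
    feature_maps = [extract_features(density)]       # i = 1
    for _ in range(2, m + 1):                        # i = 2 .. m
        pooled = max_pool3d(feature_maps[-1])        # downsample the (i-1)-th map
        feature_maps.append(extract_features(pooled))
    return feature_maps

density = np.arange(64, dtype=float).reshape(4, 4, 4)
maps = encode(density, extract_features=lambda t: t, m=2)
```

With m equal to 2 and an identity stand-in for feature extraction, the call returns a 4*4*4 map followed by a 2*2*2 map; the last element of the list plays the role of the top-level feature map.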

Step S312, decoding the m feature maps to obtain an expected density map.

Referring to FIG. 2, this step is performed by the decoding sub-module 212. Taking m equal to 2 as an example, step S312 is executed as follows: nonlinear rectification and deconvolution are performed in sequence on the second feature map to obtain an upsampled feature map; nonlinear rectification is performed on the first feature map to obtain a nonlinearly rectified feature map; and feature extraction is performed on the sum of the upsampled feature map and the nonlinearly rectified feature map to obtain the expected density map. The feature extraction is a feature extraction process based on the residual module. For the specific process, refer to the above-mentioned process of obtaining the first feature map, which is not repeated here.

Step S313, determining a mask value and a local resolution fluctuation value based on the expected density map.

Referring to FIG. 2, the process of determining the mask value based on the expected density map is performed by the first branch module 22. Specifically, the expected density map is passed in sequence through a convolutional layer with a convolution kernel of 3*3 and a convolutional layer with a convolution kernel of 1*1 to obtain the mask value. The process of determining the local resolution fluctuation value based on the expected density map is performed by the second branch module 23. Specifically, first, the expected density map is classified to obtain multiple first categories and the respective weights of the multiple first categories. For example, the expected density map is passed in sequence through a convolution operation with a convolution kernel of 3*3 and a convolution operation with a convolution kernel of 1*1 to obtain the multiple first categories and their respective weights. Second, the local resolution fluctuation value is determined from the products of the respective weights of the multiple first categories and the first preset values they respectively represent.

The first preset values are set manually and can be reasonably selected according to actual conditions. In one embodiment, the number of first categories is 37, and the first preset values represented by the 37 first categories are: -5, -4.5, -4, -3.5, -3, -2.5, -2, -1.5, -1, -0.9, -0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5.
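The classification + regression readout of the second branch can be sketched as below. It assumes that the soft-Argmax layer applies a softmax across the 37 classes at each voxel and sums the products of weight and preset value; the preset values are taken as the symmetric sequence from -5 to 5, and the function name and the class-first array layout are hypothetical.

```python
import numpy as np

# The 37 first preset values: -5 ... -1.5 (step 0.5), -1 ... 1 (step 0.1), 1.5 ... 5 (step 0.5)
BINS = np.concatenate([
    np.arange(-5.0, -1.0, 0.5),
    np.arange(-10, 11) / 10.0,
    np.arange(1.5, 5.5, 0.5),
])

def soft_argmax(logits, bin_values, axis=0):
    """Softmax over the class axis, then the weighted sum of the preset values."""
    shifted = logits - logits.max(axis=axis, keepdims=True)   # numerical stability
    weights = np.exp(shifted)
    weights /= weights.sum(axis=axis, keepdims=True)
    shape = [1] * logits.ndim
    shape[axis] = -1
    return (weights * bin_values.reshape(shape)).sum(axis=axis)

logits = np.zeros((37, 2, 2, 2))                    # (classes, D, H, W), assumed layout
peak = int(np.argmin(np.abs(BINS - 0.5)))           # index of the class representing 0.5
logits[peak, 0, 0, 0] = 50.0                        # one voxel strongly favours that class
fluct = soft_argmax(logits, BINS)
```

The voxel with the confident class yields a fluctuation value of about 0.5, while voxels with uniform weights yield about 0 because the preset values are symmetric around zero. The same readout with ten preset values would apply to the third branch.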

Step S314, determining a global resolution value based on the top-level feature map in the plurality of feature maps.

Referring to FIG. 2, step S314 is executed by the third branch module 24. For the neural network model shown in FIG. 2, the top-level feature map is the second feature map, which is the output of the encoding sub-module 211.

Specifically, first, the top-level feature map is classified to obtain multiple second categories and the respective weights of the multiple second categories. For example, the top-level feature map undergoes, in sequence, three convolution operations with a convolution kernel of 3*3, two convolution operations with a convolution kernel of 1*1, and a global average pooling operation to obtain the multiple second categories and their respective weights, where each of the first two convolution operations with a convolution kernel of 3*3 is followed by a pooling operation. Second, the global resolution value is determined from the products of the respective weights of the multiple second categories and the second preset values they respectively represent.

The second preset value is set manually and can be reasonably selected according to actual conditions. In one embodiment, the number of second categories is ten. The second preset values respectively represented by the ten second categories are: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

FIG. 5 is a schematic diagram of an execution process of step S320 provided by an embodiment of the present application. As shown in Figure 5, step S320 specifically includes:

Step S321, determining a first loss function based on the mask value and the mask value label.

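Specifically, binary cross-entropy can be used as the first loss function, and the formula is:

$$\mathrm{Loss}_{mask} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$$

where $\hat{y}$ is the output of the network, $y$ is the label value, and $N$ is the number of voxels.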

Step S322, determining a second loss function based on the local resolution fluctuation value and the local resolution fluctuation value label.

Specifically, a log10 loss can be used as the second loss function, where $\hat{y}$ is the output of the network and $y$ is the label value.

Step S323, determining a third loss function based on the global resolution value and the global resolution value label.

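Specifically, the mean squared error (MSE) can be used as the third loss function, and the formula is:

$$\mathrm{Loss}_{global} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$

where $\hat{y}$ is the output of the network, $y$ is the label value, and $n$ is the number of samples.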

Step S324, determining a total loss function based on the first loss function, the second loss function and the third loss function.

The formula of the total loss function is: Loss_all = Loss_global + 10·Loss_local + Loss_mask.
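The combination of the three losses can be sketched as follows. This is a minimal illustration under stated assumptions: the mask predictions are taken to be already mapped into (0, 1), the losses are averaged over their elements, and the local loss is passed in as a number because its exact form is not sketched here. The function names are hypothetical.

```python
import math


def binary_cross_entropy(pred, label, eps=1e-7):
    """Mean binary cross-entropy over paired predictions and 0/1 labels."""
    total = 0.0
    for p, y in zip(pred, label):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(pred)


def mean_squared_error(pred, label):
    """Mean of the squared differences between predictions and labels."""
    return sum((p - y) ** 2 for p, y in zip(pred, label)) / len(pred)


def total_loss(loss_global, loss_local, loss_mask, local_weight=10.0):
    """Loss_all = Loss_global + 10 * Loss_local + Loss_mask."""
    return loss_global + local_weight * loss_local + loss_mask


# Hypothetical values, for illustration only.
loss_mask = binary_cross_entropy([0.9, 0.2, 0.8], [1, 0, 1])
loss_global = mean_squared_error([3.4], [3.0])
loss_local = 0.05  # placeholder; the log10 loss itself is not computed here
loss_all = total_loss(loss_global, loss_local, loss_mask)
```

The factor of 10 is the only weight stated in the formula; the other two losses enter with weight 1.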

Step S325, updating the parameters of the neural network model based on the gradient of the total loss function.

A stochastic gradient descent (SGD) optimizer with momentum = 0.8 is used to update the network parameters from the gradient of the total loss function.
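One common form of the SGD-with-momentum update can be sketched as follows. Only the momentum of 0.8 comes from the text; the learning rate of 0.01, the exact update convention (velocity accumulated from the raw gradient) and the function name `sgd_momentum_step` are assumptions made for illustration.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.8):
    """One in-place update: v <- momentum * v + g, then p <- p - lr * v."""
    for i, g in enumerate(grads):
        velocity[i] = momentum * velocity[i] + g
        params[i] -= lr * velocity[i]
    return params, velocity


# Toy problem: minimise f(p) = p^2 (gradient 2p), starting from p = 1.0.
params, velocity = [1.0], [0.0]
for _ in range(200):
    grads = [2.0 * p for p in params]
    sgd_momentum_step(params, grads, velocity)
```

In the actual training, `grads` would be the gradient of the total loss function with respect to each network parameter rather than the gradient of this toy function.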

According to the neural network model obtained by the training method provided in the embodiment of the present application, the mask value, local resolution fluctuation value and global resolution value can be estimated simultaneously based on a cryo-electron microscope density map.

Step 4 Model Testing

This application uses a test set of 349 cryo-EM density maps to test the trained neural network model. The test results show that the errors of the local resolution estimation and the global resolution estimation of the neural network model are both 0.44 angstroms, and the average Intersection over Union (IoU) of the mask is 0.71.

The present application also provides a method for estimating the resolution of a cryo-electron microscope density map by using the neural network model provided by any one of the above-mentioned embodiments. FIG. 6 is a logical framework of a method for estimating the resolution of a cryo-EM density map based on a neural network model provided by an embodiment of the present application. FIG. 7 is a flowchart of a method for estimating the resolution of a cryo-electron microscope density map based on a neural network model (i.e., the CryoRes method) provided by an embodiment of the present application. As shown in FIG. 6 and FIG. 7, the CryoRes method 700 includes:

Step S710, preprocessing the cryo-electron microscope density map to obtain a second target cryo-electron microscope density map. The cryo-electron microscope density map here can be any cryo-electron microscope density map.

The preprocessing includes, for example, cropping the cryo-electron microscope density map to obtain the circumscribed cube of the biomacromolecule in the cryo-electron microscope density map, and scaling the circumscribed cube of the biomacromolecule to obtain the second target cryo-electron microscope density map. In one example, the size of the second target cryo-EM density map is less than or equal to 248*248*248.
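The geometry of this preprocessing can be sketched as follows. This is a minimal illustration under several assumptions: the voxels that contain the biomacromolecule are supplied by the caller (how they are identified is not specified here), the cube is axis-aligned, and the map is only ever scaled down to fit the 248 limit. The names `circumscribed_cube` and `scale_factor` are hypothetical, and the resampling itself is not shown.

```python
def circumscribed_cube(occupied, shape):
    """Return (start, side) of the smallest axis-aligned cube around the voxels.

    occupied: iterable of (z, y, x) indices assumed to contain macromolecule density.
    shape: (nz, ny, nx) of the full density map.
    """
    occupied = list(occupied)
    lows = [min(v[a] for v in occupied) for a in range(3)]
    highs = [max(v[a] for v in occupied) for a in range(3)]
    extents = [h - l + 1 for l, h in zip(lows, highs)]
    side = max(extents)
    if side > min(shape):
        raise ValueError("the cube does not fit inside the map")
    start = []
    for a in range(3):
        s = lows[a] - (side - extents[a]) // 2   # pad the shorter axes evenly
        s = max(0, min(s, shape[a] - side))      # shift the cube back inside the map
        start.append(s)
    return tuple(start), side


def scale_factor(side, max_side=248):
    """Factor that brings the cube side to at most max_side (no upsampling assumed)."""
    return min(1.0, max_side / side)


# Hypothetical occupied voxels in a 100*100*100 map.
start, side = circumscribed_cube(
    [(10, 12, 11), (40, 20, 30), (25, 60, 15)], (100, 100, 100)
)
```

The map would then be cropped to the `side`-voxel cube beginning at `start` and resampled by `scale_factor(side)` along each axis.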

Step S720, determining a mask value, a local resolution fluctuation value and a global resolution value based on the second target cryo-EM density map. For this process, reference may be made to the above-mentioned embodiment of the training method for the neural network model, which will not be repeated here.

Step S730, determining a local resolution value based on the mask value, the local resolution fluctuation value and the global resolution value.

According to the local resolution fluctuation value, the global resolution value and the mask value, the real value corresponding to each voxel in the second target cryo-EM density map can be obtained, which is the local resolution value of the voxel.

Specifically, as shown in FIG. 6, a matrix addition is performed on the global resolution value and the local resolution fluctuation value to obtain a first sum. The mask value is thresholded to obtain a mask: values less than 0 in the mask value are set to 0, indicating the background area, that is, an area with no macromolecular information; values greater than or equal to 0 are set to 1, indicating the non-background area, that is, an area with macromolecular information. The first sum is multiplied by the mask to obtain a first product. A second sum is obtained by multiplying the mask by a first constant and adding a second constant. In one example, the first constant is -100 and the second constant is 100, so that the second sum is 0 in the non-background area and 100 in the background area. The sum of the second sum and the first product is determined as the local resolution value.
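The combination of the three outputs can be sketched voxel by voxel as follows. This is a minimal illustration, assuming the voxels are given as flat lists and using the example constants -100 and 100; the function name `local_resolution` and the input values are hypothetical.

```python
def local_resolution(mask_values, fluctuations, global_resolution,
                     first_constant=-100.0, second_constant=100.0):
    """Combine mask value, local fluctuation and global value for each voxel."""
    result = []
    for m, f in zip(mask_values, fluctuations):
        mask = 1.0 if m >= 0 else 0.0        # threshold the mask value at 0
        first_sum = global_resolution + f     # global value + local fluctuation
        first_product = first_sum * mask
        second_sum = mask * first_constant + second_constant
        result.append(second_sum + first_product)
    return result


# Three hypothetical voxels: non-background, background, non-background.
values = local_resolution([2.3, -1.7, 0.0], [0.2, 0.4, -0.1], 3.5)
```

With these constants, a background voxel is assigned the fixed value 100, while a non-background voxel receives the global resolution value plus its local fluctuation.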

Table 1 shows the comparison results between the CryoRes method 700 shown in FIG. 6 and FIG. 7 and several conventional resolution estimation methods. It can be seen from Table 1 that, with the method for estimating the resolution of the cryo-electron microscope density map provided in this embodiment, the local resolution and the global resolution can be estimated simultaneously from a single cryo-electron microscope density map. This overcomes the limitation that conventional resolution estimation methods can evaluate a cryo-EM density map from only one dimension, that is, either global resolution or local resolution. In addition, the estimation method provided in this embodiment requires neither half-maps nor parameters such as masks.

Table 1 Comparison results between CryoRes method 700 and several conventional resolution estimation methods (* in the table means preferred, that is, it is more recommended to provide)

This application evaluates the performance of the CryoRes method 700 shown in FIG. 6 and FIG. 7 from three aspects, including: (1) local resolution; (2) global resolution; (3) mask.

For (1) local resolution, four cryo-EM density maps were selected as test density maps to evaluate the performance of the CryoRes method 700. The first experimental density map is the cryo-EM structure of RelA bound to the 70S ribosome (EMDB: EMD-8108). It was published in 2016, its dimensions are 400*400*400, and its voxel size is 1.34 Å. The global resolution published on the EMDB website, obtained from the Fourier Shell Correlation (FSC) curve at a cutoff threshold that is generally 0.143, is 3.0 Å. CryoRes, ResMap, and DeepRes each use the signal map as input to obtain the local resolution, whereas Blocres and MonoRes each use half-maps as input. The local resolution obtained with CryoRes ranges from 3.19 to 3.91 Å, with a mean and standard deviation of 3.38 Å and 0.14 Å, respectively. The local resolution obtained with Blocres ranges from 2.88 to 10.89 Å, with a mean and standard deviation of 3.39 Å and 0.77 Å, respectively. The local resolution obtained with ResMap ranges from 2.9 to 5.9 Å, with a mean and standard deviation of 2.9 Å and 0.91 Å, respectively. The local resolution obtained with MonoRes ranges from 2.68 to 8.93 Å, with a mean and standard deviation of 3.67 Å and 1.45 Å, respectively. The local resolution obtained with DeepRes ranges from 2.68 to 6.64 Å, with a mean and standard deviation of 3.41 Å and 0.52 Å, respectively.

The second experimental density map is the cryo-EM structure of ArfA and TtRF2 bound to the 70S ribosome (EMDB: EMD-3492). It was published in 2016, its dimensions are 400*400*400, and its voxel size is 1.04 Å. The global resolution published on the EMDB website, obtained from the Fourier Shell Correlation (FSC) curve at a cutoff threshold that is generally 0.143, is 3.35 Å. CryoRes, ResMap, and DeepRes each use the signal map as input to obtain the local resolution, whereas Blocres and MonoRes each use half-maps as input. The local resolution obtained with CryoRes ranges from 3.37 to 4.07 Å, with a mean and standard deviation of 3.57 Å and 0.12 Å, respectively. The local resolution obtained with Blocres ranges from 3.17 to 11.27 Å, with a mean and standard deviation of 3.62 Å and 0.79 Å, respectively. The local resolution obtained with ResMap ranges from 2.3 to 4.05 Å, with a mean and standard deviation of 2.3 Å and 0.26 Å, respectively. The local resolution obtained with MonoRes ranges from 2.83 to 8.16 Å, with a mean and standard deviation of 4.08 Å and 1.1 Å, respectively. The local resolution obtained with DeepRes ranges from 2.5 to 6.06 Å, with a mean and standard deviation of 2.91 Å and 0.49 Å, respectively.

The third experimental density map is the cryo-EM structure of Gasdermin A3 membrane pores (EMDB: EMD-7450). It was published in 2018, its dimensions are 380*380*380, and its voxel size is 1.0 Å. The global resolution published on the EMDB website, obtained from the Fourier Shell Correlation (FSC) curve at a cutoff threshold that is generally 0.143, is 4.4 Å. CryoRes, ResMap, and DeepRes each use the signal map as input to obtain the local resolution, whereas Blocres and MonoRes each use half-maps as input. The local resolution obtained with CryoRes ranges from 3.58 to 4.46 Å, with a mean and standard deviation of 3.75 Å and 0.18 Å, respectively. The local resolution obtained with Blocres ranges from 3.28 to 4.9 Å, with a mean and standard deviation of 3.7 Å and 0.31 Å, respectively. The local resolution obtained with ResMap ranges from 2.2 to 2.45 Å, with a mean and standard deviation of 2.2 Å and 0.00 Å, respectively. The local resolution obtained with MonoRes ranges from 2.0 to 7.31 Å, with a mean and standard deviation of 4.27 Å and 1.36 Å, respectively. The local resolution obtained with DeepRes ranges from 3.45 to 8.24 Å, with a mean and standard deviation of 5.55 Å and 0.7 Å, respectively.

The fourth experimental density map is the cryo-EM structure of the bacterial 30S-IF1-IF2-IF3-mRNA-tRNA pre-translation initiation complex (EMDB: EMD-4082). It was published in 2016, its dimensions are 260*260*260, and its voxel size is 1.34 Å. The global resolution published on the EMDB website, obtained from the Fourier Shell Correlation (FSC) curve at a cutoff threshold that is generally 0.143, is 8.3 Å. CryoRes, ResMap, and DeepRes each use the signal map as input to obtain the local resolution, whereas Blocres and MonoRes each use half-maps as input. The local resolution obtained with CryoRes ranges from 7.57 to 9.05 Å, with a mean and standard deviation of 7.92 Å and 0.25 Å, respectively. The local resolution obtained with Blocres ranges from 6.48 to 33.96 Å, with a mean and standard deviation of 9.25 Å and 2.47 Å, respectively. The local resolution obtained with ResMap ranges from 8.9 to 13.4 Å, with a mean and standard deviation of 11.15 Å and 1.05 Å, respectively. The local resolution obtained with MonoRes ranges from 2.68 to 20.49 Å, with a mean and standard deviation of 8.5 Å and 4.59 Å, respectively. The local resolution obtained with DeepRes ranges from 2.68 to 12.9 Å, with a mean and standard deviation of 8.69 Å and 1.05 Å, respectively.

For (2) global resolution, the global resolution of each of the 349 cryo-EM density maps in the test set was obtained with the CryoRes method 700. The mean absolute error between the global resolution obtained with the CryoRes method 700 and the global resolution published on EMDB for the 349 cryo-EM density maps was 0.44 angstroms.
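The mean absolute error used in this comparison can be computed as in the following minimal sketch. The function name is hypothetical and the three input pairs are made-up illustrative values, not results from the application.

```python
def mean_absolute_error(estimated, reference):
    """Average of |estimated - reference| over paired values."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(estimated)


# Hypothetical estimated and published global resolutions for three maps.
mae = mean_absolute_error([3.4, 4.0, 8.0], [3.0, 4.4, 8.3])
```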

FIG. 8 compares the global resolution of each cryo-EM density map in the test set obtained with the CryoRes method 700 against the global resolution of that map published on EMDB. As shown in FIG. 8, for most cryo-EM density maps the global resolution obtained with the CryoRes method 700 is close to the global resolution published on EMDB, with an error of less than 1 angstrom; for a few cryo-EM density maps the error is greater than 1 angstrom, but it is essentially within 2 angstroms.

FIG. 9 compares, for each cryo-EM density map in the test set, the median of the local resolution obtained with the ResMap method and the global resolution obtained with the CryoRes method against the global resolution published on EMDB. The ordinate in FIG. 9 indicates the difference between each of these two values and the global resolution published on EMDB. Comparing FIG. 8 and FIG. 9, it can be seen that the error between the median obtained with the ResMap method and the global resolution published on EMDB is larger than the error between the global resolution obtained with the CryoRes method 700 and the global resolution published on EMDB. In addition, as shown in FIG. 9, the error between the median of the local resolution obtained with the ResMap method and the global resolution published on EMDB is negatively correlated with the resolution of the cryo-EM density map; that is, the lower the resolution, the greater the error. By comparison, the error of the CryoRes method 700 shown in FIG. 8 is relatively stable and is less affected by the resolution.

For (3) masks, the cryo-EM density maps in the test set were evaluated, and the average IoU of 349 cryo-EM density maps in the test set was 0.74.
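The Intersection over Union between a predicted mask and a mask label can be computed as in the following minimal sketch, which assumes both masks are binary and given as flat sequences of equal length. The function name `mask_iou` and the example masks are hypothetical.

```python
def mask_iou(predicted, label):
    """IoU of two binary masks given as flat sequences of 0/1 values."""
    intersection = sum(1 for p, l in zip(predicted, label) if p and l)
    union = sum(1 for p, l in zip(predicted, label) if p or l)
    # Two empty masks agree completely, so treat that case as IoU = 1.
    return intersection / union if union else 1.0


# Two voxels are shared and four voxels are covered by at least one mask.
iou = mask_iou([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

The average reported for the test set would correspond to the mean of this value over all 349 density maps.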

FIG. 10 shows the IoU between the masks obtained with the CryoRes method 700 and the mask labels for each cryo-EM density map in the test set. It can be seen from FIG. 10 that the IoU of most cryo-EM density maps is above 0.7. Cryo-EM density maps with a low IoU usually contain heavy noise or unresolved low-resolution structures. The mask label is derived from the corresponding PDB file rather than from the cryo-EM density map itself, whereas the mask obtained with the CryoRes method 700 depends more on the cryo-EM density map itself. This leads to a lower IoU in such cases, which is consistent with what is expected of the mask.

FIG. 11 shows the confusion matrix of the masks and mask labels. The confusion matrix was computed over the 349 cryo-EM density maps in the test set to evaluate how well the mask obtained with the CryoRes method 700 recognizes the biological macromolecule and the background. It can be seen from FIG. 11 that the recognition rate of the macromolecule positions given by the mask label reaches 0.91, and the recognition rate of the background positions reaches 0.92.
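The two recognition rates can be derived from the confusion-matrix counts as in the following minimal sketch. It assumes that each rate is the fraction of label voxels of that class that the predicted mask assigns to the same class; this reading, the function name `recognition_rates` and the example masks are assumptions.

```python
def recognition_rates(predicted, label):
    """Return (macromolecule rate, background rate) relative to the mask label."""
    tp = fn = tn = fp = 0
    for p, l in zip(predicted, label):
        if l and p:
            tp += 1      # macromolecule voxel recognised as macromolecule
        elif l:
            fn += 1      # macromolecule voxel missed
        elif p:
            fp += 1      # background voxel marked as macromolecule
        else:
            tn += 1      # background voxel recognised as background
    macromolecule_rate = tp / (tp + fn) if tp + fn else 0.0
    background_rate = tn / (tn + fp) if tn + fp else 0.0
    return macromolecule_rate, background_rate


# Hypothetical six-voxel masks.
rates = recognition_rates([1, 1, 0, 0, 1, 0], [1, 1, 1, 0, 0, 0])
```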

Exemplary device

The present application also provides a training device for a neural network model. FIG. 12 is a structural block diagram of a neural network model training device provided by an embodiment of the present application. As shown in FIG. 12, the training device 800 includes a first determination module 810 and a training module 820. The first determination module 810 is used to determine the mask value, the local resolution fluctuation value and the global resolution value based on the first target cryo-electron microscope density map, where the first target cryo-electron microscope density map is annotated with a mask value label, a local resolution fluctuation value label and a global resolution value label. The training module 820 is used to train the neural network model based on the mask value, the local resolution fluctuation value and the global resolution value, so that the mask value approaches the mask value label, the local resolution fluctuation value approaches the local resolution fluctuation value label, and the global resolution value approaches the global resolution value label.

In one embodiment, the first determination module 810 includes an encoding submodule, a decoding submodule, a first branch module, a second branch module and a third branch module. The encoding submodule is used to perform encoding processing based on the residual module on the first target cryo-EM density map to obtain m feature maps. The decoding submodule is used to decode the m feature maps to obtain the desired density map. The first branch module is used to determine the mask value based on the desired density map. The second branch module is used to determine the local resolution fluctuation value based on the desired density map. The third branch module is used to determine the global resolution value based on the top-level feature map in the m feature maps.

In one embodiment, the training module 820 includes a first determination submodule, a second determination submodule, a third determination submodule, a fourth determination submodule and an update module. Wherein, the first determination submodule is used to determine the first loss function based on the mask value and the mask value label. The second determining submodule is used for determining a second loss function based on the local resolution fluctuation value and the local resolution fluctuation value label. The third determining submodule is used for determining a third loss function based on the global resolution value and the global resolution value label. The fourth determining submodule is used for determining the total loss function based on the first loss function, the second loss function and the third loss function. The update module is used to update the parameters of the neural network model based on the gradient of the total loss function.

The training device for the neural network model provided in this embodiment belongs to the same application concept as the training method for the neural network model provided in the embodiments of the present application. It can execute the training method for the neural network model provided in any embodiment of the present application, and has the functional modules and beneficial effects corresponding to executing that training method. For technical details not described in detail in this embodiment, refer to the training method of the neural network model provided in the embodiments of the present application, which will not be repeated here.

The present application also provides a device for estimating the resolution of the cryo-electron microscope density map based on the neural network model. Fig. 13 is a structural block diagram of a device for estimating the resolution of a cryo-electron microscope density map based on a neural network model provided by an embodiment of the present application. As shown in FIG. 13 , the estimation device 900 includes a preprocessing module 910 , a second determination module 920 and a third determination module 930 . Wherein, the preprocessing module 910 is used for preprocessing the cryo-electron microscope density map to obtain the second target cryo-electron microscope density map. The second determining module 920 is configured to determine a mask value, a local resolution fluctuation value and a global resolution value based on the second target cryo-EM density map. The third determination module 930 is used for determining the local resolution value based on the mask value, the local resolution fluctuation value and the global resolution value.

According to the method for estimating the resolution of the cryo-electron microscope density map provided in this embodiment, the local resolution value and the global resolution value can be estimated simultaneously from a single cryo-electron microscope density map. This overcomes the limitation that conventional resolution estimation methods can evaluate a cryo-EM density map from only one dimension, that is, either global resolution or local resolution. In addition, the estimation method provided in this embodiment requires neither half-maps nor parameters such as masks.

The device for estimating the resolution of the cryo-electron microscope density map based on the neural network model provided in this embodiment belongs to the same application concept as the method for estimating the resolution of the cryo-electron microscope density map based on the neural network model provided in the embodiments of the present application. It can execute the estimation method provided in any embodiment of the present application, and has the functional modules and beneficial effects corresponding to executing that estimation method. For technical details not described in detail in this embodiment, refer to the method for estimating the resolution of the cryo-electron microscope density map based on the neural network model provided in the embodiments of the present application, which will not be repeated here.

Electronic equipment

Fig. 14 is a structural block diagram of an electronic device provided by an embodiment of the present application. As shown in FIG. 14 , electronic device 10 includes one or more processors 11 and memory 12 .

Processor 11 may be a central processing unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in electronic device 10 to perform desired functions.

Memory 12 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions can be stored on the computer-readable storage medium, and the processor 11 can run the program instructions to implement the training method of the neural network model and the method for estimating the resolution of the cryo-EM density map based on the neural network model, and/or other desired functions. Various contents such as input signals, signal components, and noise components may also be stored in the computer-readable storage medium.

In one example, the electronic device 10 may further include: an input device 13 and an output device 14, and these components are interconnected through a bus system and/or other forms of connection mechanisms (not shown).

The output device 14 can output various information to the outside, including determined distance information, direction information, and the like. Output devices 14 may include, for example, displays, speakers, printers, and communication networks and remote output devices to which they are connected, among others.

Of course, for simplicity, only some of the components related to the present application in the electronic device 10 are shown in FIG. 14 , and components such as bus, input/output interface, etc. are omitted. In addition, according to specific application conditions, the electronic device 10 may also include any other suitable components.

Computer program product and computer readable storage medium

In addition to the methods and devices described above, embodiments of the present application may also be computer program products, which include computer program instructions that, when executed by a processor, cause the processor to perform the steps of the method for training a neural network model and the method for estimating the resolution of a cryo-electron microscope density map based on the neural network model according to various embodiments of the present application, as described in the "Exemplary Methods" section of this specification.

For the computer program product, program code for executing the operations of the embodiments of the present application may be written in any combination of one or more programming languages. The programming languages include object-oriented programming languages, such as Java and C++, and also include conventional procedural programming languages, such as "C" or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.

In addition, an embodiment of the present application may also be a computer-readable storage medium on which computer program instructions are stored. When executed by a processor, the computer program instructions cause the processor to execute the steps of the method for training a neural network model and the method for estimating the resolution of a cryo-electron microscope density map based on the neural network model according to various embodiments of the present application, as described in the "Exemplary Methods" section of this specification.

The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more conductors, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

The basic principles of the present application have been described above in conjunction with specific embodiments. It should be pointed out, however, that the advantages, benefits, and effects mentioned in the application are only examples rather than limitations, and it must not be assumed that every embodiment of the present application necessarily has them. In addition, the specific details disclosed above are only for the purpose of illustration and ease of understanding, rather than limitation, and they do not restrict the application to being implemented with those specific details.

The block diagrams of components, apparatuses, equipment, and systems involved in this application are only illustrative examples and are not intended to require or imply that they must be connected, arranged, and configured in the manner shown in the block diagrams. As will be appreciated by those skilled in the art, these components, apparatuses, equipment, and systems may be connected, arranged, and configured in any manner. Words such as "including", "comprising", "having" and the like are open-ended words meaning "including but not limited to" and may be used interchangeably therewith. As used herein, the words "or" and "and" refer to the word "and/or" and are used interchangeably therewith, unless the context clearly dictates otherwise. As used herein, the word "such as" refers to the phrase "such as but not limited to" and can be used interchangeably therewith.

It should also be pointed out that in the devices, equipment and methods of the present application, each component or each step can be decomposed and/or reassembled. These decompositions and/or recombinations should be considered equivalents of this application.

The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

It should be understood that the qualifiers "first", "second", "third", "fourth", "fifth" and "sixth" used in the description of the embodiments of the present application are only used to explain the technical solution more clearly and cannot be used to limit the scope of protection of this application.

The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the application to the forms disclosed herein. Although a number of example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions and sub-combinations thereof.

Claims

A training method for a neural network model, characterized in that it comprises:

Determining a mask value, a local resolution fluctuation value, and a global resolution value based on a first target cryo-EM density map marked with a mask value label, a local resolution fluctuation value label, and a global resolution value label;

The neural network model is trained based on the mask value, the local resolution fluctuation value and the global resolution value, so that the mask value approaches the mask value label, the local resolution fluctuation value approaches the local resolution fluctuation value label, and the global resolution value approaches the global resolution value label.
The training method of the neural network model according to claim 1, wherein said determining mask value, local resolution fluctuation value and global resolution value based on the first target cryo-electron microscope density map comprises:

performing coding processing based on a residual module on the first target cryo-electron microscope density map to obtain m feature maps;

Decoding the m feature maps to obtain an expected density map;

determining the mask value and the local resolution fluctuation value based on the expected density map;

The global resolution value is determined based on a top-level feature map of the m feature maps.
The training method of the neural network model according to claim 2, wherein said determining said mask value based on said expected density map comprises:

The expected density map undergoes a convolution operation with a convolution kernel of 3*3 and a convolution operation with a convolution kernel of 1*1 in sequence to obtain the mask value.
The training method of the neural network model according to claim 2, wherein said determining said local resolution fluctuation value based on said expected density map comprises:

classifying the expected density map to obtain a plurality of first categories and respective weights of the plurality of first categories;

Determining the product of the respective weights of the plurality of first categories and the first preset values represented by them as the local resolution fluctuation value.
The training method of the neural network model according to claim 2, wherein said determining the global resolution value based on the top-level feature map of the m feature maps comprises:

classifying the top-level feature map to obtain a plurality of second categories and respective weights of the plurality of second categories;

determining, as the global resolution value, the product of the respective weights of the plurality of second categories and the second preset values that the second categories respectively represent.
The training method of the neural network model according to claim 1, wherein said training the neural network model based on the mask value, the local resolution fluctuation value and the global resolution value comprises:
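The classify-then-weight step recited for the local resolution fluctuation value and the global resolution value can be illustrated with a minimal sketch. It assumes that the category weights come from a softmax over raw class scores and that the "product" is read as the sum, over all categories, of each weight multiplied by the preset value of its category; neither detail is stated in the claims. The names `softmax`, `weighted_value` and `GLOBAL_BINS`, and the bin values themselves, are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw class scores into weights that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_value(logits, preset_values):
    """Sum over categories of (category weight * preset value of that category)."""
    if len(logits) != len(preset_values):
        raise ValueError("one preset value is needed per category")
    weights = softmax(logits)
    return sum(w * v for w, v in zip(weights, preset_values))

# Hypothetical preset values (in angstroms) for five global resolution categories.
GLOBAL_BINS = [2.0, 4.0, 6.0, 8.0, 10.0]

# Equal scores give equal weights, so the result is the mean of the bins.
uniform_estimate = weighted_value([0.0] * 5, GLOBAL_BINS)
```

Under the same assumptions, the function could be applied once to the top-level feature map for the global value and once per voxel of the expected density map for the local fluctuation value, each with its own set of preset values.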

determining a first loss function based on the mask value and the mask value label;

determining a second loss function based on the local resolution fluctuation value and the local resolution fluctuation value label;

determining a third loss function based on the global resolution value and the global resolution value label;

determining an overall loss function based on the first loss function, the second loss function, and the third loss function;

updating the parameters of the neural network model based on the gradient of the overall loss function.
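How three per-output losses might be combined into one overall loss can be sketched as follows. The claim does not specify the individual loss functions or how they are combined, so this sketch assumes binary cross-entropy for the mask, mean squared error for the two resolution outputs, and a weighted sum with hypothetical weights; all function names are illustrative.

```python
import math

def binary_cross_entropy(pred, label, eps=1e-7):
    """Mean binary cross-entropy between mask probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(pred, label):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(pred)

def mean_squared_error(pred, label):
    """Mean squared difference between predictions and labels."""
    return sum((p - y) ** 2 for p, y in zip(pred, label)) / len(pred)

def overall_loss(mask, mask_label, fluct, fluct_label, glob, glob_label,
                 weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the first, second and third losses (weights are assumed)."""
    first = binary_cross_entropy(mask, mask_label)      # mask vs. mask label
    second = mean_squared_error(fluct, fluct_label)     # local fluctuation
    third = mean_squared_error([glob], [glob_label])    # global resolution
    w1, w2, w3 = weights
    return w1 * first + w2 * second + w3 * third

# Two voxels: mask loss ln(2), fluctuation loss 0.5, global loss 1.0.
example_loss = overall_loss([0.5, 0.5], [1, 0], [0.0, 1.0], [0.0, 0.0], 4.0, 3.0)
```

In an automatic-differentiation framework, the gradient of this scalar with respect to the model parameters would drive the parameter update; that step is not shown here.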
The training method of the neural network model according to claim 1, characterized in that, before said determining the mask value, the local resolution fluctuation value and the global resolution value based on the first target cryo-electron microscope density map, the method further comprises:

cropping a cryo-electron microscope density map to obtain a circumscribed cube of the biological macromolecule in the cryo-electron microscope density map;

performing size scaling on the circumscribed cube of the biological macromolecule to obtain the first target cryo-electron microscope density map.
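The cropping and scaling steps can be illustrated with a minimal sketch on a nested-list volume indexed as `volume[z][y][x]`. Several details are assumptions not stated in the claim: the macromolecule is located by a density threshold, the cube is anchored at the lowest occupied corner rather than centred, voxels that fall outside the original map are treated as zero, and scaling uses nearest-neighbour resampling. The names `bounding_cube` and `crop_and_scale` are hypothetical.

```python
def bounding_cube(volume, threshold):
    """Return (start indices, side) of a cube enclosing every voxel above threshold."""
    coords = [(z, y, x)
              for z, plane in enumerate(volume)
              for y, row in enumerate(plane)
              for x, value in enumerate(row)
              if value > threshold]
    if not coords:
        raise ValueError("no voxel exceeds the threshold")
    lows = [min(c[i] for c in coords) for i in range(3)]
    highs = [max(c[i] for c in coords) for i in range(3)]
    side = max(h - l + 1 for l, h in zip(lows, highs))  # longest extent
    return lows, side

def crop_and_scale(volume, threshold, out_size):
    """Crop the bounding cube and resample it to out_size**3 voxels."""
    (z0, y0, x0), side = bounding_cube(volume, threshold)
    dims = (len(volume), len(volume[0]), len(volume[0][0]))

    def sample(z, y, x):
        # Voxels beyond the original map are treated as empty (zero density).
        if z < dims[0] and y < dims[1] and x < dims[2]:
            return volume[z][y][x]
        return 0.0

    def src(i):
        # Map an output index to the nearest source offset inside the cube.
        return min(side - 1, int((i + 0.5) * side / out_size))

    return [[[sample(z0 + src(k), y0 + src(j), x0 + src(i))
              for i in range(out_size)]
             for j in range(out_size)]
            for k in range(out_size)]

# Toy 4x4x4 map with two occupied voxels stacked along z.
vol = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
vol[1][1][1] = 1.0
vol[2][1][1] = 1.0
scaled = crop_and_scale(vol, 0.5, 4)
```

A practical pipeline would more likely use an array library with interpolated resampling; the nested lists here only keep the sketch self-contained.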
A method for estimating the resolution of a cryo-electron microscope density map based on a neural network, characterized in that it comprises:

determining a mask value, a local resolution fluctuation value, and a global resolution value based on the second target cryo-electron microscope density map;

determining a local resolution value based on the mask value, the local resolution fluctuation value and the global resolution value.
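The claim does not state how the three outputs are combined into a local resolution value. Purely as an illustration, the sketch below assumes an additive rule: for voxels whose mask value reaches a threshold, the local resolution is the global resolution value plus the local fluctuation; voxels outside the mask receive no value. The rule, the 0.5 threshold and the name `local_resolution` are all assumptions.

```python
def local_resolution(mask, fluctuation, global_resolution, mask_threshold=0.5):
    """Assumed rule: inside the mask, local = global + fluctuation; outside, None."""
    return [global_resolution + f if m >= mask_threshold else None
            for m, f in zip(mask, fluctuation)]

# Three voxels; the middle one lies outside the mask.
example_local = local_resolution([0.9, 0.1, 0.7], [-0.5, 0.3, 1.0], 3.5)
```

Here `mask` and `fluctuation` are flat per-voxel lists of equal length, and `global_resolution` is a single number for the whole map.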
A training device for a neural network model, characterized in that it comprises:

A first determination module that determines the mask value, the local resolution fluctuation value and the global resolution value based on the first target cryo-electron microscope density map, wherein the first target cryo-electron microscope density map is annotated with a mask value label, a local resolution fluctuation value label and a global resolution value label;

A training module that trains the neural network model based on the mask value, the local resolution fluctuation value, and the global resolution value, so that the mask value approaches the mask value label, the local resolution fluctuation value approaches the local resolution fluctuation value label, and the global resolution value approaches the global resolution value label.
A neural network-based cryo-electron microscope density map resolution estimation device, characterized in that it includes:

The first determination module determines the mask value, local resolution fluctuation value and global resolution value based on the second target cryo-electron microscope density map;

The second determination module determines a local resolution value based on the mask value, the local resolution fluctuation value and the global resolution value.
A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable by the processor, characterized in that, when the processor executes the computer program, the steps of the training method of the neural network model according to any one of claims 1 to 7, or the steps of the neural network-based method for estimating the resolution of a cryo-electron microscope density map according to claim 8, are implemented.
A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the steps of the training method of the neural network model according to any one of claims 1 to 7, or the steps of the neural network-based method for estimating the resolution of a cryo-electron microscope density map according to claim 8, are implemented.