CN114580603B - Method for constructing single-particle level energy curved surface based on data of cryoelectron microscope - Google Patents
- Publication number
- CN114580603B (application CN202210484308.2A)
- Authority
- CN
- China
- Prior art keywords
- particle
- dimensional
- data
- electron density
- density map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/002—Biomolecular computers, i.e. using biomolecules, proteins, cells
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Chemical & Material Sciences (AREA)
- Organic Chemistry (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Analysing Materials By The Use Of Radiation (AREA)
Abstract
The invention discloses a method for constructing a single-particle-level energy surface from cryo-electron microscopy data, and belongs to the interdisciplinary field of data science and biology. The single-particle-level energy surface is obtained through grouped classification of the single-particle data, low-dimensional manifold mapping of a three-dimensional electron density map data set, similarity calculation between each single-particle image and its corresponding projection image, and training and prediction with a convolutional neural network. The resulting single-particle-level energy surface visually reflects the conformational distribution of biomolecules, extends the multi-conformation analysis of cryo-electron microscopy data, and improves the robustness of the data-processing results to high noise.
Description
Technical Field
The invention provides a method that uses a machine learning algorithm to construct a single-particle-level energy surface from cryo-electron microscopy data, and belongs to the interdisciplinary field of data science and biology.
Background
Cryo-electron microscopy is one of the important techniques for determining the structures of biological macromolecules and their complexes. Purified protein is rapidly frozen and dispersed in a thin layer of ice; data are then collected under the electron microscope to obtain two-dimensional projections of individual particles, and three-dimensional reconstruction yields an electron density map of the three-dimensional structure. The technique can resolve high-resolution structures of biomolecules under conditions close to their native physiological state, which helps in studying the working mechanisms of complex protein machines.
Biomolecules are generally intrinsically flexible, so their dynamic structural changes and conformational heterogeneity have long been a focus of structural biology. In the crystalline state, structural changes of biomolecules are constrained by the lattice, so crystallography typically provides only a static structure and limited kinetic parameters. The advantage of cryo-electron microscopy over crystallography is that it can capture the various states of biomolecules in solution and record projections of different conformations from different angles. Cryo-electron microscopy data therefore provide the basis for multi-conformation analysis of biomolecules. In cryo-electron microscopy data processing, some existing algorithms classify multiple conformations through cluster analysis, maximum-likelihood analysis, and other methods, but the resulting differences in biomolecular composition and conformation still need to be checked for plausibility by other techniques.
Disclosure of Invention
The invention aims to provide a method for constructing a single-particle-level energy surface from cryo-electron microscopy data, in order to describe the conformational distribution of biomolecules with high precision.
The method of the invention is as follows:
A method for constructing a single-particle-level energy surface based on cryo-electron microscopy data comprises the following steps, as shown in FIG. 1:
A. Dividing the cryo-electron microscopy image data into a plurality of groups of single-particle images, performing three-dimensional classification on each group of single-particle images separately, and reconstructing to generate a data set of three-dimensional electron density maps;
B. Mapping the three-dimensional electron density map data set to a low-dimensional manifold embedding with a deep manifold learning algorithm, implemented as follows:
B1. Extracting the structural detail features of each three-dimensional electron density map with a deep autoencoder network;
B2. Processing the three-dimensional electron density map data set and its detail features through manifold learning to obtain the low-dimensional manifold coordinates of each three-dimensional electron density map;
C. For each single-particle image $X_i$ ($i$ is the index of the single-particle image and $j$ is the pixel index), calculating its degree of similarity $S_i$ to the projection image $P_i$ of the three-dimensional electron density map at the corresponding angle, implemented as follows:
C1. Taking the angle information of each single-particle image $X_i$ and obtaining the projection image $P_i$ of the three-dimensional electron density map at that angle;
C2. The degree of similarity $S_i$ between the single-particle image $X_i$ and its corresponding projection image $P_i$ is defined by the following expression:
[equation not preserved]
where $\bar{X}_i$ and $\bar{P}_i$ denote the pixel averages of the single-particle image $X_i$ and its corresponding projection image $P_i$, respectively. The degree of similarity $S_i$ takes values in $[0, 1]$; the larger $S_i$ is, the more similar the single-particle image $X_i$ and its corresponding projection image $P_i$ are. The degree of similarity $S_i$ is calculated for each single-particle image $X_i$ and its corresponding projection image $P_i$;
D. Designing a convolutional neural network and training it on the whole single-particle data set, that is, the convolutional neural network learns, for the single-particle image data $X_i$, the low-dimensional manifold coordinates $Y_i$ of the corresponding three-dimensional electron density map, yielding the low-dimensional manifold coordinate mapping $\hat{Y}_i$ of the single-particle image $X_i$. The objective function $L$ of the convolutional neural network is defined as the Euclidean distance weighted by the degree of similarity:
[equation not preserved]
where the weight $w_i$ is the value of the degree of similarity between the single-particle image $X_i$ and its corresponding projection image $P_i$, and $k$ is the dimension index of the low-dimensional manifold coordinates;
E. Using the trained convolutional neural network to predict the coordinate values $\hat{Y}_i$ of each single-particle image $X_i$ on the energy surface, thereby obtaining the single-particle-level energy surface.
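The expressions referred to in steps C2 and D are not reproduced in this text. As a hedged illustration only, one pair of formulas consistent with the stated properties (use of pixel averages, a value range of [0, 1], and a similarity-weighted Euclidean distance taken over the coordinate dimensions) is sketched below. The symbols $X_{ij}$, $P_{ij}$, $S_i$, $Y_{ik}$, $\hat{Y}_{ik}$ and the choice of an absolute normalized cross-correlation are assumptions made for this sketch, not the patent's confirmed definitions.

```latex
% Assumed form: absolute normalized cross-correlation, which lies in [0, 1]
S_i = \frac{\left| \sum_j (X_{ij} - \bar{X}_i)(P_{ij} - \bar{P}_i) \right|}
           {\sqrt{\sum_j (X_{ij} - \bar{X}_i)^2 \, \sum_j (P_{ij} - \bar{P}_i)^2}}

% Assumed form: similarity-weighted Euclidean distance summed over all particles
L = \sum_i S_i \sqrt{\sum_k \left( \hat{Y}_{ik} - Y_{ik} \right)^2}
```

Under these assumed forms, a particle whose image correlates poorly with its reference projection contributes little to $L$, so noisy or misaligned particles have less influence on the regression.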
As a preferred scheme, in step B, the three-dimensional electron density map data set may be low-pass filtered with the same threshold before the low-dimensional manifold mapping, so as to improve the robustness of the low-dimensional manifold embedding in reflecting the conformational distribution and to provide higher-quality regression labels for the convolutional neural network.
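A minimal sketch of such a preprocessing step follows, assuming a hard spherical cutoff in Fourier space and a cutoff expressed in cycles per voxel. The function name `lowpass_filter_map`, the cutoff value, and the random test volumes are hypothetical; the patent does not specify the filter shape or the threshold.

```python
import numpy as np

def lowpass_filter_map(density, cutoff):
    """Zero all Fourier components above `cutoff` (cycles per voxel)."""
    # Frequency coordinate of every Fourier voxel along each axis.
    freqs = np.meshgrid(*[np.fft.fftfreq(n) for n in density.shape], indexing="ij")
    radius = np.sqrt(sum(f ** 2 for f in freqs))
    mask = radius <= cutoff  # hard spherical cutoff (an assumption)
    filtered = np.fft.ifftn(np.fft.fftn(density) * mask)
    return filtered.real

# Hypothetical data set: five random 16x16x16 volumes standing in for density maps.
rng = np.random.default_rng(0)
maps = [rng.normal(size=(16, 16, 16)) for _ in range(5)]

CUTOFF = 0.2  # the same threshold is applied to every map
filtered_maps = [lowpass_filter_map(m, CUTOFF) for m in maps]
```

Applying one shared cutoff keeps all maps at a comparable resolution, so that differences seen by the manifold learning step are less likely to come from differing noise levels.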
As a preferred scheme, in step B2, a well-performing manifold learning algorithm such as t-SNE or UMAP may be applied to obtain the low-dimensional manifold embedding of the three-dimensional electron density map data set, so as to improve the regression performance of the convolutional neural network.
As a preferred scheme, in step D, in order to take into account the degree of similarity between the single-particle image and the projection image of the three-dimensional electron density map at the corresponding angle, a decoder may be added to the convolutional neural network; the output of the decoding layer regresses the projection image of the three-dimensional electron density map at the corresponding angle, and the objective function $L$ is then defined as a weighted sum of the distance between the low-dimensional manifold coordinates and the distance between the decoded images.
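The combined objective can be sketched as follows, assuming Euclidean distances for both terms and two hypothetical mixing weights, `coord_weight` and `image_weight`. The patent states only that the objective is a weighted sum of the two distances; the distance type and the weight values here are assumptions.

```python
import numpy as np

def combined_objective(pred_coords, target_coords, decoded, projections,
                       coord_weight=1.0, image_weight=0.1):
    """Weighted sum of a coordinate distance and a decoded-image distance."""
    # Distance between predicted and target manifold coordinates, per particle.
    coord_dist = np.linalg.norm(pred_coords - target_coords, axis=1)
    # Distance between decoded images and reference projections, per particle.
    diff = (decoded - projections).reshape(len(decoded), -1)
    image_dist = np.linalg.norm(diff, axis=1)
    return float(np.sum(coord_weight * coord_dist + image_weight * image_dist))

# Two hypothetical particles with 2D coordinates and 2x2 images.
pred_coords = np.array([[0.0, 0.0], [3.0, 4.0]])
target_coords = np.zeros((2, 2))
decoded = np.ones((2, 2, 2))
projections = np.zeros((2, 2, 2))
loss = combined_objective(pred_coords, target_coords, decoded, projections)
```

In this toy case the coordinate distances are 0 and 5 and each image distance is 2, so the loss is 1.0 * 5 + 0.1 * 4 = 5.4.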
The invention has the following technical effects:
The invention provides a method for constructing a single-particle-level energy surface from cryo-electron microscopy single-particle image data. The single-particle-level energy surface is obtained through grouped classification of the single-particle data, low-dimensional manifold mapping of the three-dimensional electron density map data set, similarity calculation between each single-particle image and its corresponding projection image, and training and prediction with a convolutional neural network. The particle-level energy surface obtained with the convolutional neural network intuitively reflects the conformational distribution of biomolecules. By combining cryo-electron microscopy with a convolutional neural network, the invention visualizes the conformational space of the biomolecule under study with high precision, extends the multi-conformation analysis of cryo-electron microscopy data, and improves the robustness of the data-processing results to high noise.
Drawings
FIG. 1 is a flowchart of the construction of the energy surface from cryo-electron microscopy single-particle data according to the present invention.
FIG. 2 is a schematic diagram of the convolutional neural network used in the present invention to obtain the energy-surface coordinates of single particles.
FIG. 3 is a schematic diagram of an energy surface and the conformational distribution analysis obtained in an embodiment of the present invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: unless specifically stated otherwise, the sample pictures and values provided in these examples do not limit the scope of the invention.
Referring to fig. 2, the present invention uses convolutional neural networks to study the complex conformational distribution of a protein.
First, the cryo-electron microscopy single-particle data of the protein are divided into a plurality of groups, and three-dimensional classification is performed on each group separately to generate a three-dimensional electron density map data set. The single-particle data for this protein, totaling more than 2,500,000 single-particle images, were divided into 25 groups of about 100,000 single-particle images each; after three-dimensional classification, 250 three-dimensional electron density maps were generated.
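The grouping step can be illustrated with a short sketch, assuming the particles are assigned to groups at random (the text does not say how the groups are formed). The helper `split_into_groups` is hypothetical, and the example is scaled down by a factor of 100 to 25,000 particles so that it runs quickly.

```python
import random

def split_into_groups(n_particles, n_groups, seed=0):
    """Randomly partition particle indices into groups of near-equal size."""
    indices = list(range(n_particles))
    random.Random(seed).shuffle(indices)  # random assignment is an assumption
    # Group g takes every n_groups-th index, so group sizes differ by at most 1.
    return [indices[g::n_groups] for g in range(n_groups)]

# Scaled-down stand-in for 2,500,000 particles in 25 groups.
groups = split_into_groups(n_particles=25_000, n_groups=25)
group_sizes = [len(g) for g in groups]
```

With the numbers given in the embodiment, 250 maps from 25 groups corresponds to an average of 10 three-dimensional classes per group.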
Second, a deep manifold learning algorithm is designed to obtain the low-dimensional manifold embedding of the three-dimensional electron density map data set. The 250 three-dimensional electron density maps are fed into a deep autoencoder network for feature extraction. During training, the number of epochs is set to 50, each batch contains 4 three-dimensional electron density maps, and the parameters of the deep autoencoder network are updated iteratively with the Adam optimizer. After training, the autoencoder parameters with the smallest mean squared error at the decoding layer are used for feature extraction from the three-dimensional electron density map data set. Once the features of the three-dimensional electron density maps have been extracted, the low-dimensional manifold embedding of the data set is computed with the t-SNE manifold learning algorithm to obtain the low-dimensional manifold coordinates $Y$ of each three-dimensional electron density map. The low-dimensional coordinates should preserve the local distances of the high-dimensional data as far as possible; the dimension of the low-dimensional coordinates is set to 2, the perplexity to 30, and the maximum number of iterations to 1000.
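The training schedule and the selection of the best autoencoder parameters can be sketched as below. The helpers `steps_per_epoch` and `best_checkpoint` and the example `history` values are hypothetical, and the sketch assumes the final partial batch of each epoch is kept rather than dropped.

```python
import math

def steps_per_epoch(n_samples, batch_size):
    """Number of optimizer steps per epoch when the last partial batch is kept."""
    return math.ceil(n_samples / batch_size)

def best_checkpoint(history):
    """history: list of (epoch, mse, params); return the entry with the lowest MSE."""
    return min(history, key=lambda entry: entry[1])

n_steps = steps_per_epoch(250, 4)  # 250 maps, 4 maps per batch
total_steps = 50 * n_steps         # 50 epochs

# Hypothetical per-epoch reconstruction errors and parameter snapshots.
history = [(1, 0.90, "params_epoch_1"),
           (2, 0.40, "params_epoch_2"),
           (3, 0.55, "params_epoch_3")]
epoch, mse, params = best_checkpoint(history)
```

Here 250 maps in batches of 4 give 63 steps per epoch and 3,150 steps over 50 epochs; the snapshot from epoch 2 would be the one used for feature extraction.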
Third, the degree of similarity between each single-particle image and the corresponding projection image of its three-dimensional electron density map is calculated. That is, the angle information of each single-particle image is taken, and the projection image of the three-dimensional electron density map at that angle is obtained. For a single-particle image $X_i$ and its corresponding projection image $P_i$ ($i$ is the index of the single-particle image and $j$ is the pixel index), the degree of similarity $S_i$ is defined by the following expression:
[equation not preserved]
where $\bar{X}_i$ and $\bar{P}_i$ denote the pixel averages of the single-particle image $X_i$ and its corresponding projection image $P_i$, respectively. The degree of similarity $S_i$ takes values in $[0, 1]$; the larger $S_i$ is, the more similar the single-particle image $X_i$ and its corresponding projection image $P_i$ are. The degree of similarity $S_i$ is calculated for each single-particle image $X_i$ and its corresponding projection image $P_i$.
If $S_i$ equals 1, the single-particle image $X_i$ used in the calculation is identical to its corresponding projection image $P_i$; if $S_i$ equals 0, the single-particle image $X_i$ used in the calculation contains no structural information. For a typical single-particle image, the degree of similarity $S_i$ to its corresponding projection image is a value between 0 and 1 that measures how similar the two images are.
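As a hedged illustration of a similarity measure with these properties, the sketch below uses the absolute value of the mean-subtracted normalized cross-correlation, which always lies in [0, 1]. This choice and the function name `similarity` are assumptions; the patent's own expression is not reproduced in this text.

```python
import numpy as np

def similarity(image, projection):
    """Absolute normalized cross-correlation of two images (assumed form)."""
    x = image - image.mean()            # subtract the pixel average
    p = projection - projection.mean()
    denom = np.sqrt(np.sum(x ** 2) * np.sum(p ** 2))
    if denom == 0:
        return 0.0                      # a constant image carries no structure
    return float(abs(np.sum(x * p)) / denom)

rng = np.random.default_rng(1)
projection = rng.normal(size=(8, 8))
noisy_particle = projection + rng.normal(scale=2.0, size=(8, 8))

s_identical = similarity(projection, projection)
s_noisy = similarity(noisy_particle, projection)
s_flat = similarity(np.ones((8, 8)), projection)
```

One consequence of taking the absolute value is that a contrast-inverted copy of the projection also scores 1, so this form is only suitable when contrast inversion is not a concern.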
Fourth, a convolutional neural network is trained to regress the low-dimensional manifold embedding of the single particles, weighted by the degree of similarity $S_i$. The objective function $L$ of the convolutional neural network is defined as the Euclidean distance weighted by the degree of similarity $S_i$:
[equation not preserved]
With this objective function, the network learns, for the single-particle image data $X_i$, the low-dimensional manifold coordinates $Y_i$ of the corresponding three-dimensional electron density map, yielding the low-dimensional manifold coordinate mapping $\hat{Y}_i$ of the single-particle image $X_i$, where the weight $w_i$ is the value of the degree of similarity between the single-particle image $X_i$ and its corresponding projection image $P_i$. The convolutional neural network is trained on the whole single-particle data set. Because the degree of similarity between each single-particle image and its corresponding three-dimensional electron density map is taken into account when learning the low-dimensional coordinates of the three-dimensional electron density maps, the performance of the convolutional neural network in the low-dimensional manifold mapping of single particles is improved. During training of the convolutional neural network, the maximum number of epochs is set to 50, the batch size is set to 128 single-particle images, and the parameters of the convolutional neural network are updated iteratively until the loss function converges.
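A minimal sketch of a similarity-weighted objective for one batch follows. It assumes an unsquared Euclidean distance per particle and a plain sum over the batch; whether the original expression squares or averages the distance cannot be determined from this text, and the function name `weighted_distance_loss` is hypothetical.

```python
import numpy as np

def weighted_distance_loss(pred, target, weights):
    """Sum over particles of weight * Euclidean distance (assumed form)."""
    # Distance over the coordinate dimensions (axis 1) for each particle.
    dist = np.sqrt(np.sum((pred - target) ** 2, axis=1))
    return float(np.sum(weights * dist))

# Two hypothetical particles with 2D manifold coordinates.
pred = np.array([[3.0, 4.0], [1.0, 1.0]])     # network outputs
target = np.array([[0.0, 0.0], [1.0, 1.0]])   # coordinates of the matching 3D maps
weights = np.array([0.5, 0.9])                # similarity values used as weights

loss = weighted_distance_loss(pred, target, weights)
```

The first particle is 5 units from its target with weight 0.5 and the second is exactly on target, so the loss is 0.5 * 5 + 0.9 * 0 = 2.5. A particle with weight 0 would not influence the parameter update at all.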
Fifth, the trained convolutional neural network is used to generate the low-dimensional manifold coordinates $\hat{Y}_i$ of each single-particle image $X_i$, which are used to plot the single-particle-level energy surface. Each single-particle image in the cryo-electron microscopy data of the protein is fed into the trained convolutional neural network to obtain its low-dimensional manifold coordinates, i.e., its coordinate values $\hat{Y}_i$ on the energy surface, thereby obtaining the single-particle-level energy surface and visualizing the complex conformational space of the protein with high precision.
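One way to turn the predicted per-particle coordinates into a surface is to bin them on a regular 2D grid and record the fraction of particles in each bin. The sketch below assumes this histogram approach; the function name `occupancy_grid`, the bin count, and the random stand-in coordinates are hypothetical, and the patent does not state how the surface is rasterized.

```python
import numpy as np

def occupancy_grid(coords, bins=20):
    """Fraction of particles falling in each cell of a 2D grid."""
    counts, x_edges, y_edges = np.histogram2d(coords[:, 0], coords[:, 1], bins=bins)
    prob = counts / counts.sum()
    return prob, x_edges, y_edges

# Hypothetical predicted coordinates for 1,000 particles.
rng = np.random.default_rng(2)
coords = rng.normal(size=(1000, 2))

prob, x_edges, y_edges = occupancy_grid(coords, bins=20)
```

Densely populated cells correspond to frequently observed conformations, which is the quantity that an energy surface summarizes.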
Referring to FIG. 3, for another protein, the same method for constructing a single-particle-level energy surface from cryo-electron microscopy single-particle image data was applied, generating 600 three-dimensional electron density maps, and the conformational distribution of the protein was studied through the energy surface. The single-particle-level energy surface of the protein was plotted using the convolutional neural network. The analysis shows that the more stable a conformation is, the higher its probability of occurrence and the lower the corresponding energy on the energy surface; conversely, the less stable a conformation is, the lower its probability of occurrence and the higher the corresponding energy.
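The inverse relation between probability and energy described here matches the Boltzmann relation, in which the free energy of a state relative to the most populated state is -k_B T ln(n / n_max). The sketch below applies that relation to particle counts; the use of this specific formula, the temperature, and the function name `relative_free_energy` are assumptions, since the text does not give the conversion it uses.

```python
import math

K_B = 0.0019872041  # Boltzmann constant per mole (gas constant), kcal/(mol*K)

def relative_free_energy(counts, temperature=298.0):
    """Free energy of each state relative to the most populated one (kcal/mol)."""
    n_max = max(counts)
    return [-K_B * temperature * math.log(n / n_max) if n > 0 else math.inf
            for n in counts]

# Hypothetical particle counts for three conformational states.
counts = [1000, 100, 10]
energies = relative_free_energy(counts)
```

The most populated state is assigned zero energy, and each tenfold drop in population raises the energy by the same increment, so rarely observed conformations appear as high regions of the surface.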
It is finally noted that the disclosed embodiments are intended to aid in the further understanding of the invention, but that those skilled in the art will appreciate that: various substitutions and modifications are possible without departing from the spirit and scope of this disclosure and the appended claims. Therefore, the invention should not be limited to the embodiments disclosed, but the scope of the invention is defined by the appended claims.
Claims (6)
1. A method for constructing a single-particle-level energy surface based on cryo-electron microscopy data, comprising the following steps:
A. dividing the cryo-electron microscopy image data into a plurality of groups of single-particle images, performing three-dimensional classification on each group of single-particle images separately, and reconstructing to generate a data set of three-dimensional electron density maps;
B. mapping the three-dimensional electron density map data set to a low-dimensional manifold embedding with a deep manifold learning algorithm;
C. for each single-particle image $X_i$, calculating its degree of similarity to the projection image $P_i$ of the three-dimensional electron density map at the corresponding angle, where $i$ is the index of the single-particle image and $j$ is the pixel index, implemented as follows:
C1. taking the angle information of each single-particle image $X_i$ and obtaining the projection image $P_i$ of the three-dimensional electron density map at that angle;
C2. the degree of similarity $S_i$ between the single-particle image $X_i$ and its corresponding projection image $P_i$ is defined by the following expression:
[equation not preserved]
where $\bar{X}_i$ and $\bar{P}_i$ denote the pixel averages of the single-particle image $X_i$ and its corresponding projection image $P_i$, respectively;
D. training a convolutional neural network on the whole single-particle data set, the convolutional neural network learning, for the single-particle image data $X_i$, the low-dimensional manifold coordinates $Y_i$ of the corresponding three-dimensional electron density map, to obtain the low-dimensional manifold coordinate mapping $\hat{Y}_i$ of the single-particle image $X_i$, the objective function $L$ of the convolutional neural network being defined as the Euclidean distance weighted by the degree of similarity:
[equation not preserved]
where the weight $w_i$ is the value of the degree of similarity between the single-particle image $X_i$ and its corresponding projection image $P_i$; $k$ is the dimension index of the low-dimensional manifold coordinates;
2. The method for constructing a single-particle-level energy surface based on cryo-electron microscopy data as claimed in claim 1, wherein step B is implemented as follows:
B1. extracting the structural detail features of each three-dimensional electron density map with a deep autoencoder network;
3. The method for constructing a single-particle-level energy surface based on cryo-electron microscopy data as claimed in claim 2, wherein in step B2 the three-dimensional electron density map data set is preprocessed by low-pass filtering with the same threshold.
4. The method for constructing a single-particle-level energy surface based on cryo-electron microscopy data as claimed in claim 2, wherein in step B2 a t-SNE or UMAP manifold learning algorithm is used.
6. The method for constructing a single-particle-level energy surface based on cryo-electron microscopy data as claimed in claim 1, wherein a decoder is added in step D, the output of the decoding layer regresses the projection image of the three-dimensional electron density map at the corresponding angle, and the objective function $L$ is defined as a weighted sum of the distance between the low-dimensional manifold coordinates and the distance between the decoded images.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210484308.2A CN114580603B (en) | 2022-05-06 | 2022-05-06 | Method for constructing single-particle level energy curved surface based on data of cryoelectron microscope |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210484308.2A CN114580603B (en) | 2022-05-06 | 2022-05-06 | Method for constructing single-particle level energy curved surface based on data of cryoelectron microscope |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114580603A (en) | 2022-06-03 |
CN114580603B (en) | 2022-07-26 |
Family
ID=81769121
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210484308.2A Active CN114580603B (en) | 2022-05-06 | 2022-05-06 | Method for constructing single-particle level energy curved surface based on data of cryoelectron microscope |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114580603B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107092790A (en) * | 2017-04-19 | 2017-08-25 | Shanghai Jiao Tong University | Ice mapping three-dimensional density figure resolution detection method |
CN113643230A (en) * | 2021-06-22 | 2021-11-12 | Tsinghua University | Continuous learning method and system for identifying biomacromolecule particles of cryoelectron microscope |
CN114283217A (en) * | 2021-09-23 | 2022-04-05 | Tencent Technology (Shenzhen) Co., Ltd. | Method, device and equipment for training reconstruction model of three-dimensional electron microscope image |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9373059B1 (en) * | 2014-05-05 | 2016-06-21 | Atomwise Inc. | Systems and methods for applying a convolutional network to spatial data |
Also Published As
Publication number | Publication date |
---|---|
CN114580603A (en) | 2022-06-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110443281B (en) | Text classification self-adaptive oversampling method based on HDBSCAN (high-density binary-coded decimal) clustering | |
CN110889852B (en) | Liver segmentation method based on residual error-attention deep neural network | |
CN112001218B (en) | Three-dimensional particle class detection method and system based on convolutional neural network | |
CN113693563B (en) | Brain function network classification method based on hypergraph attention network | |
WO2023217290A1 (en) | Genophenotypic prediction based on graph neural network | |
CN114091628B (en) | Three-dimensional point cloud up-sampling method and system based on double branch network | |
Li et al. | Dictionary optimization and constraint neighbor embedding-based dictionary mapping for superdimension reconstruction of porous media | |
CN115311502A (en) | Remote sensing image small sample scene classification method based on multi-scale double-flow architecture | |
Maddumala | A Weight Based Feature Extraction Model on Multifaceted Multimedia Bigdata Using Convolutional Neural Network. | |
Zhang et al. | PM-ARNN: 2D-TO-3D reconstruction paradigm for microstructure of porous media via adversarial recurrent neural network | |
CN118038032A (en) | Point cloud semantic segmentation model based on super point embedding and clustering and training method thereof | |
CN114580603B (en) | Method for constructing single-particle level energy curved surface based on data of cryoelectron microscope | |
Peng et al. | Fully convolutional neural networks for tissue histopathology image classification and segmentation | |
CN115393378B (en) | Low-cost and efficient cell nucleus image segmentation method | |
CN103336781B (en) | A kind of medical image clustering method | |
Sun et al. | Spatial-temporal scientific data clustering via deep convolutional neural network | |
CN112991402A (en) | Cultural relic point cloud registration method and system based on improved differential evolution algorithm | |
Uddin et al. | Practical analysis of macromolecule identity from cryo-electron tomography images using deep learning | |
Senthilnathan et al. | Comparison and validation of stochastic microstructure characterization and reconstruction: Machine learning vs. deep learning methodologies | |
CN117373564B (en) | Method and device for generating binding ligand of protein target and electronic equipment | |
Liu et al. | PointFP: A Feature-Preserving Point Cloud Sampling | |
Zheng et al. | Localization and recognition of single particle image in microscopy micrographs based on region based convolutional neural networks | |
Hamitouche | Machine learning for determining continuous conformational transitions of biomolecular complexes from single-particle cryo-electron microscopy images | |
WO2023091180A1 (en) | System and method for automated microscope image acquisition and 3d analysis | |
Codex | Exploring Subgraph Modeling Techniques for Contextual-Point Relationships in 3D Point Cloud Segmentation: A Comprehensive Review |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||