CN106951918B - Single-particle image clustering method for analysis of cryoelectron microscope - Google Patents


Info

Publication number
CN106951918B
Authority
CN
China
Prior art keywords
class
image
similarity
network
classes
Prior art date
Legal status
Active
Application number
CN201710116076.4A
Other languages
Chinese (zh)
Other versions
CN106951918A (en)
Inventor
沈红斌
殷硕
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN201710116076.4A
Publication of CN106951918A
Application granted
Publication of CN106951918B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/69 Microscopic objects, e.g. biological cells or cellular parts

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a single-particle image clustering method for cryo-electron microscopy analysis. The method is used for single-particle image analysis and comprises the following steps. Step one: accept user input of an initial number of classes k_0, a final number of classes k_n, and the input data set; randomly partition the data set into k_0 classes, compute the class centers, and build a shared K-nearest-neighbor network for the input data set. Step two: perform one round of KMeans clustering; when measuring the similarity between an input image and a class center, add the class center to the network, update the network, and compute the network-based similarity between nodes. Step three: determine whether the current number of classes k equals the user-specified k_n; if so, output each class and its class-average image and exit; otherwise, split the largest class and return to step two.

Description

Single-particle image clustering method for analysis of cryoelectron microscope
Technical Field
The invention belongs to the technical field of structural biology analysis, and particularly relates to a single-particle image clustering method for analysis of a cryoelectron microscope.
Background
Cryo-electron microscopy (cryo-EM) is a technique in which a sample is placed in an ultra-cold environment, two-dimensional images of it are acquired with an electron microscope, and a three-dimensional model of the sample is generated from those images. Compared with the two mature structural biology methods, X-ray crystallography and nuclear magnetic resonance (NMR), cryo-EM can directly obtain the morphological and phase information of molecules, and it can analyze proteins that are unsuitable for X-ray crystallography or NMR. With improvements in biological sample preparation and electron microscope equipment, and with the development of digital image processing, electron microscopy has become a widely recognized, powerful means of studying biological macromolecules, supramolecular complexes and subcellular structures.
The most common cryo-EM method is single-particle analysis, which reconstructs a three-dimensional model from a large number of two-dimensional projection images. Because the signal-to-noise ratio (SNR) of electron microscope images is very low, a large amount of single-particle image data, on the order of thousands to tens of thousands of images, must be collected to obtain a reasonably accurate three-dimensional model. The images must therefore be clustered before three-dimensional reconstruction, so that the images in each class are projection views from the same projection direction. Because single-particle images have an extremely low SNR, often below 1/30, traditional image clustering algorithms are no longer applicable to them.
Most single-particle image clustering algorithms in common use today are variants of the KMeans algorithm. SPIDER first filters and denoises the images, then applies PCA dimensionality reduction in pixel space, and finally clusters with a splitting (divisive) KMeans method. EMAN2 extracts features from the images and then runs KMeans in the feature space. XMIPP runs splitting KMeans directly in pixel space, but with a clustering criterion of its own.
Whether clustering is performed in a feature space or in pixel space, the similarity measures of currently popular algorithms are pairwise: the similarity of two images is computed from those two images alone. Because single-particle images are so noisy, pairwise similarity is no longer reliable. Since the similarity measure is the most fundamental component of clustering, once it is inaccurate the subsequent steps lose their meaning.
Moreover, the input single-particle image data have a class structure: images belonging to the same class are relatively close to one another. Under the influence of noise, however, the distances between classes shrink and the distances within classes grow, which makes classification with traditional methods difficult.
Disclosure of Invention
The invention provides a single-particle image clustering method for cryo-electron microscopy analysis that uses a network-based approach, exploiting global structural information to suppress the influence of noise.
The single-particle image clustering method is used for single-particle image analysis and comprises the following steps:
step one: accept user input of an initial number of classes k_0, a final number of classes k_n, and the input data set; randomly partition the data set into k_0 classes, compute the class centers, and build a shared K-nearest-neighbor network for the input data set;
step two: perform one round of KMeans clustering; when measuring the similarity between an input image and a class center, add the class center to the network, update the network, and compute the network-based similarity between nodes;
step three: determine whether the current number of classes k equals the user-specified k_n; if so, output each class and its class-average image and exit; otherwise, split the largest class and return to step two.
Step two is implemented as follows:
run KMeans once: for each input image, compute the Jaccard similarity between the image and every class center, and assign the image to the class whose center has the highest similarity. After all images have been assigned, update the class centers and the shared K-nearest-neighbor networks, then reassign every image; repeat until convergence or until the number of iterations reaches a preset upper limit.
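The KMeans pass described above can be sketched as follows. This is a minimal Python/NumPy sketch, not the patented implementation: the names `kmeans_pass` and `class_centers` are hypothetical, class centers are assumed to be pixel-wise means of the member images, and the similarity is passed in as a callable so that the network-based Jaccard similarity (or any stand-in) can be plugged in.

```python
import numpy as np

def class_centers(images, labels, k, previous=None):
    """Class-average image for each class 0..k-1 (pixel-wise mean, assumed)."""
    centers = []
    for c in range(k):
        members = images[labels == c]
        if len(members) == 0 and previous is not None:
            centers.append(previous[c])  # keep the old center if a class empties
        else:
            centers.append(members.mean(axis=0))
    return np.stack(centers)

def kmeans_pass(images, labels, similarity, max_iter=20):
    """Assign-by-maximum-similarity KMeans, run until no label changes
    or max_iter is reached.

    images:     (n, d) array of flattened images
    labels:     (n,) initial class labels in 0..k-1, every class non-empty
    similarity: callable(images, centers) -> (n, k) similarity matrix
    """
    labels = np.asarray(labels).copy()
    k = int(labels.max()) + 1
    centers = class_centers(images, labels, k)
    for _ in range(max_iter):
        # Each image goes to the class whose center is most similar.
        new_labels = similarity(images, centers).argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # converged: no image changed class
        labels = new_labels
        centers = class_centers(images, labels, k, previous=centers)
    return labels, centers
```

In the method described here, `similarity` would rebuild each class's network around the current center before scoring, so the network is refreshed every time the centers change.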
When the shared K-nearest-neighbor network is built, relation (1) holds:
$$\mathrm{sim}(X_i, C) > \mathrm{sim}(X_i, X_j), \qquad \mathrm{sim}(C_i, C_j) > \mathrm{sim}(C_i, X_i) \qquad (1)$$
where C is a class-average image, X_i and X_j are any two input images, and sim is the pairwise similarity measure used to build the shared K-nearest-neighbor network.
Each class maintains its own shared K-nearest-neighbor network, obtained by adding the current class-center image to the original shared K-nearest-neighbor network.
The Jaccard similarity is measured as:
$$S_{xy} = \frac{|\Gamma(x) \cap \Gamma(y)|}{|\Gamma(x) \cup \Gamma(y)|}$$
where S_xy is the Jaccard similarity of two images x and y, and Γ(x) is the neighborhood of x.
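As a worked example of the Jaccard formula, two neighborhoods that share two of their four distinct members give S_xy = 2/4 = 0.5. A minimal Python sketch, assuming neighborhoods are represented as sets of node indices (the name `jaccard` and the value 0 for two empty neighborhoods are assumptions):

```python
def jaccard(gamma_x: set, gamma_y: set) -> float:
    """S_xy = |Γ(x) ∩ Γ(y)| / |Γ(x) ∪ Γ(y)| for two neighborhood sets."""
    union = gamma_x | gamma_y
    if not union:
        return 0.0  # assumed convention when both neighborhoods are empty
    return len(gamma_x & gamma_y) / len(union)

print(jaccard({1, 2, 3}, {2, 3, 4}))  # 0.5
```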
Further, when the largest class is split, the Jaccard similarity between each image in the class and the class-average image is computed and the similarity values are sorted in descending order; the top 50% form one class and the rest form another. The class centers and other information of the two classes are computed, the original class information is deleted, and the two newly generated classes are retained.
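The split rule can be sketched in NumPy as below. Assumptions: `split_largest` is a hypothetical name; `sim_to_own_average[i]` holds the Jaccard similarity of image i to the average of the class it currently belongs to; for an odd class size the more-similar half is rounded up; and the less-similar half receives a fresh label while the more-similar half keeps the old one, which is equivalent to deleting the original class and keeping the two new ones.

```python
import numpy as np

def split_largest(labels, sim_to_own_average):
    """Split the largest class by similarity to its class-average image.

    labels:             (n,) non-negative integer class labels
    sim_to_own_average: (n,) similarity of each image to its own class average
    Returns new labels: the more similar half keeps the old label, the less
    similar half gets a new label (current max label + 1).
    """
    labels = np.asarray(labels).copy()
    sim = np.asarray(sim_to_own_average, dtype=float)
    largest = np.bincount(labels).argmax()
    members = np.flatnonzero(labels == largest)
    # Rank the members of the largest class by similarity, highest first.
    ranked = members[np.argsort(-sim[members], kind="stable")]
    half = (len(ranked) + 1) // 2  # top 50%, rounded up (assumption)
    labels[ranked[half:]] = labels.max() + 1
    return labels

print(split_largest([0, 0, 0, 0, 1], [0.1, 0.9, 0.5, 0.3, 0.7]))  # [2 0 0 2 1]
```

After the split, the two new class averages would be recomputed from their members before the next KMeans pass.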
This is the first application of a clustering algorithm based on network similarity measurement to single-particle images; compared with other current methods in the field, it achieves higher accuracy with approximately the same running time. The invention aims to solve the problem of clustering single-particle images at low SNR.
Compared with prior-art methods, the invention has the following notable advantage: because it uses a network-based similarity measure, the algorithm remains applicable at low SNR.
Drawings
FIG. 1 is a system structure diagram of a single-particle image clustering algorithm based on network similarity measurement according to the present invention.
FIG. 2 shows four representative images of the data set in an embodiment of the present invention.
Fig. 3 shows the class-center images obtained in an embodiment of the present invention.
Fig. 4 shows the ground-truth class centers in an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings.
FIG. 1 shows a system structure diagram of the single-particle image clustering method of the present invention:
First, the class centers are initialized and a shared K-nearest-neighbor network is built for the input data. At the top level, the algorithm is a splitting (divisive) KMeans; at the detailed level, network-based similarity is used as the similarity measure inside KMeans. The steps are as follows:
Step one: accept user input of the initial number of classes k_0, the final number of classes k_n, and the input data set. Randomly partition the data set into k_0 classes and initialize the class centers. Build a shared K-nearest-neighbor network for the input data set.
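One way to realize the random initialization of step one, sketched under assumptions: the data set is randomly partitioned into k_0 near-equal groups (the source only says "randomly"), and each class center is the pixel-wise mean of its members. `random_init` is a hypothetical name.

```python
import numpy as np

def random_init(images, k0, seed=0):
    """Randomly partition images into k0 non-empty classes and return
    (labels, centers), where each center is its class's mean image."""
    n = len(images)
    if n < k0:
        raise ValueError("need at least k0 images")
    rng = np.random.default_rng(seed)
    labels = np.empty(n, dtype=int)
    # Image perm[j] gets label j % k0, so class sizes differ by at most one.
    labels[rng.permutation(n)] = np.arange(n) % k0
    centers = np.stack([images[labels == c].mean(axis=0) for c in range(k0)])
    return labels, centers
```

A balanced random partition is chosen here only because it guarantees that no initial class is empty; other random schemes would fit the description equally well.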
Step two: run KMeans once. For each input image, compute the Jaccard similarity between the image and every class center, and assign the image to the class whose center has the highest similarity. After all images have been assigned, update the class centers and the shared K-nearest-neighbor networks, then reassign every image; repeat until convergence or until the number of iterations reaches a preset upper limit.
Because the SNR of single-particle images is low while that of class-average images is high, the following relation holds when the shared K-nearest-neighbor network is built:
$$\mathrm{sim}(X_i, C) > \mathrm{sim}(X_i, X_j), \qquad \mathrm{sim}(C_i, C_j) > \mathrm{sim}(C_i, X_i) \qquad (1)$$
where C is a class-average image, X_i and X_j are any two input images, and sim is the pairwise similarity measure used to build the shared K-nearest-neighbor network; here a correlation measure is used.
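Relation (1) can be checked numerically under a toy model. The sketch below assumes that sim is the Pearson correlation between flattened images, that every class signal is a shared component (mimicking projections of the same molecule) plus a smaller class-specific component, and that noise is Gaussian with SNR 1/30, taking SNR as the ratio of signal variance to noise variance. All of these modeling choices are assumptions for illustration, not data from the embodiment.

```python
import numpy as np

def corr(a, b):
    """Pearson correlation between two flattened images (assumed sim)."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

rng = np.random.default_rng(0)
d, n_per_class, snr = 64 * 64, 60, 1 / 30
shared = rng.normal(size=d)  # structure common to all classes (toy assumption)
signals = [shared + 0.5 * rng.normal(size=d) for _ in range(2)]
noise_sd = np.sqrt(signals[0].var() / snr)
classes = [s + noise_sd * rng.normal(size=(n_per_class, d)) for s in signals]
averages = [c.mean(axis=0) for c in classes]  # class-average images

X_i, X_j = classes[0][0], classes[0][1]  # two noisy input images
C_i, C_j = averages[0], averages[1]      # two class averages
print("sim(X_i, C)   =", round(corr(X_i, C_i), 3))
print("sim(X_i, X_j) =", round(corr(X_i, X_j), 3))
print("sim(C_i, C_j) =", round(corr(C_i, C_j), 3))
print("sim(C_i, X_i) =", round(corr(C_i, X_i), 3))
```

Averaging 60 images cuts the noise variance by a factor of 60, so class averages correlate far better with images and with each other than two raw images do.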
Consequently, if all class-average images were added to the network of input images at once, then by relation (1), where sim(C_i, C_j) > sim(C_i, X_i), the class-average images would become each other's nearest neighbors and would necessarily be connected to one another. These unnecessary edges would interfere with the network and defeat the goal of measuring the similarity between class-average images and input images. The method therefore has each class maintain its own shared K-nearest-neighbor network, obtained by adding only the current class-center image to the original shared K-nearest-neighbor network.
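A minimal sketch of one such per-class network and the resulting similarity, under stated assumptions: sim is taken to be the Pearson correlation (`np.corrcoef`), Γ(x) is taken to be the set of the `n_neighbors` most similar nodes of x excluding x itself, and each per-class network is rebuilt from scratch rather than updated incrementally. The function names are hypothetical. Because only one center is present in each network, centers can never be each other's neighbors.

```python
import numpy as np

def knn_sets(sim, n_neighbors):
    """Γ(x) for every node: indices of its n_neighbors most similar nodes."""
    sim = np.array(sim, dtype=float)  # copy, so the caller's matrix is untouched
    np.fill_diagonal(sim, -np.inf)    # a node is not its own neighbor (assumed)
    nearest = np.argsort(-sim, axis=1)[:, :n_neighbors]
    return [set(row.tolist()) for row in nearest]

def similarity_to_center(images, center, n_neighbors):
    """Jaccard similarity of one class center to every input image, on a
    network built from the input images plus this single center only."""
    nodes = np.vstack([images, center[None, :]])
    gamma = knn_sets(np.corrcoef(nodes), n_neighbors)  # Pearson correlation as sim
    c = len(nodes) - 1  # index of the center node
    scores = []
    for i in range(len(images)):
        union = gamma[i] | gamma[c]
        scores.append(len(gamma[i] & gamma[c]) / len(union) if union else 0.0)
    return np.array(scores)

def network_similarity(images, centers, n_neighbors=10):
    """(n_images, n_classes) matrix; each column uses that class's own network."""
    return np.column_stack(
        [similarity_to_center(images, c, n_neighbors) for c in centers]
    )
```

The returned matrix has the (images, centers) to (n, k) shape needed for assigning each image to its most similar class center. Rebuilding the full correlation matrix per class costs O(n²d) and is only meant to keep the sketch short.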
The Jaccard similarity is measured as:
$$S_{xy} = \frac{|\Gamma(x) \cap \Gamma(y)|}{|\Gamma(x) \cup \Gamma(y)|}$$
where S_xy is the Jaccard similarity of two images x and y, and Γ(x) is the neighborhood of x.
Step three: determine whether the current number of classes has reached the user-specified k_n. If so, output each class and its class center and exit; otherwise, split the largest class, update the current number of classes, and return to step two.
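The control flow of steps two and three can be sketched as a loop. In this hedged sketch, `divisive_clustering`, `kmeans_pass` and `split_largest` are hypothetical names: the KMeans pass and the split rule are passed in as callables, the stopping test uses `>=` rather than strict equality so the loop cannot overshoot, and `max_rounds` is an added safety guard that the source does not mention.

```python
import numpy as np

def divisive_clustering(images, labels, k_n, kmeans_pass, split_largest,
                        max_rounds=1000):
    """Alternate a KMeans pass (step two) with splitting the largest class
    (step three) until the number of classes reaches k_n."""
    for _ in range(max_rounds):
        labels = kmeans_pass(images, labels)        # step two
        if len(np.unique(labels)) >= k_n:           # step three: done?
            break
        labels = split_largest(images, labels)      # split, back to step two
    classes = np.unique(labels)
    # Output: final labels and one class-average image per class.
    centers = np.stack([images[labels == c].mean(axis=0) for c in classes])
    return labels, centers
```

Keeping the two inner operations pluggable mirrors the description: the outer loop is plain divisive KMeans, and all of the noise robustness comes from the network-based similarity used inside the pass.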
When the largest class is split, the Jaccard similarity between each image in the class and the class-average image is computed and the similarity values are sorted in descending order; the top 50% form one class and the rest form another, and the class centers and other information of the two classes are computed. The original class information is then deleted, and the two newly generated classes are retained.
Example:
Consider a data set containing four classes of 60 images each (240 images in total), with an SNR of 1/30. One image from each class is shown in Fig. 2.
The software implementing the method of the invention produced the following output (rows: output classes; columns: true classes; entries: number of images):
                 True class 1   True class 2   True class 3   True class 4
Output class 1        55              1              0              0
Output class 2         4             54              3              0
Output class 3         0              5             54              0
Output class 4         1              0              3             60
The method therefore achieves an accuracy of 92.92% (223 of the 240 images are assigned to the output class matching their true class).
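The reported accuracy can be checked directly from the table. Output class i is matched to true class i, so the correctly clustered images are the diagonal entries; the short NumPy check below simply re-enters the table values.

```python
import numpy as np

# Rows: output classes 1-4; columns: true classes 1-4 (values from the table).
confusion = np.array([
    [55, 1, 0, 0],
    [4, 54, 3, 0],
    [0, 5, 54, 0],
    [1, 0, 3, 60],
])
correct = np.trace(confusion)  # images whose output class matches the true class
total = confusion.sum()
print(correct, total, f"{100 * correct / total:.2f}%")  # 223 240 92.92%
```

Each column sums to 60, consistent with four true classes of 60 images.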
The output class-center images are shown in Fig. 3.
The ground-truth class centers are shown in Fig. 4.
The results show that the method effectively clusters low-SNR single-particle images, reaching an accuracy of 92.92% on this data set.
The above embodiments do not limit the present invention in any way, and all technical solutions obtained by means of equivalent substitution or equivalent transformation fall within the protection scope of the present invention.

Claims (4)

1. A single-particle image clustering method is used for single-particle image analysis and is characterized by comprising the following steps:
step one: accepting user input of an initial number of classes k_0, a final number of classes k_n, and the input data set; randomly initializing the data set into k_0 classes, computing the class centers, and establishing a shared K-nearest-neighbor network for the input data set;
step two: performing one round of KMeans clustering, wherein, when the similarity between an input image and a class center is measured, the class center is added to the network, the network is updated, and the network-based similarity is used as the similarity measure in KMeans;
step three: determining whether the current number of classes k equals the user-input k_n; if so, outputting each class and its class-average image and exiting; otherwise, splitting the largest class and returning to step two.
2. The single-particle image clustering method of claim 1, wherein the specific implementation of the second step comprises:
performing one round of KMeans, namely, for each input image, computing the Jaccard similarity between the image and all class centers and assigning the image to the class represented by the class center with the maximum similarity; after all images have been assigned, updating the class centers and the shared K-nearest-neighbor network and reassigning each image; and repeating until convergence or until the number of iterations reaches a set upper limit;
wherein, when the shared K-nearest-neighbor network is established, the following relation (1) holds:
$$\mathrm{sim}(X_i, C) > \mathrm{sim}(X_i, X_j), \qquad \mathrm{sim}(C_i, C_j) > \mathrm{sim}(C_i, X_i) \qquad (1)$$
wherein C is a class-average image, X_i and X_j are any two input images, and sim is the pairwise similarity measure used when establishing the shared K-nearest-neighbor network;
each class maintains its own shared K-nearest-neighbor network, obtained by adding the current class-center image to the original shared K-nearest-neighbor network; and
the Jaccard similarity is measured as:
$$S_{xy} = \frac{|\Gamma(x) \cap \Gamma(y)|}{|\Gamma(x) \cup \Gamma(y)|}$$
wherein S_xy is the Jaccard similarity of two images x and y, and Γ(x) is the neighborhood of x.
3. The single-particle image clustering method according to claim 2, wherein, when the largest class is split, the Jaccard similarity between each image in the class and the class-average image is computed, the similarity values are sorted in descending order, the top 50% are taken as one class and the rest as another, the class centers and other information of the two classes are computed, the original class information is deleted, and the two newly generated classes are retained.
4. The method for clustering single particle images according to claim 1, wherein the single particle image analysis is used in a cryoelectron microscopy biological analysis method.
CN201710116076.4A 2017-03-01 2017-03-01 Single-particle image clustering method for analysis of cryoelectron microscope Active CN106951918B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710116076.4A CN106951918B (en) 2017-03-01 2017-03-01 Single-particle image clustering method for analysis of cryoelectron microscope

Publications (2)

Publication Number Publication Date
CN106951918A CN106951918A (en) 2017-07-14
CN106951918B true CN106951918B (en) 2020-04-28

Family

ID=59468153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710116076.4A Active CN106951918B (en) 2017-03-01 2017-03-01 Single-particle image clustering method for analysis of cryoelectron microscope

Country Status (1)

Country Link
CN (1) CN106951918B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108898180B (en) * 2018-06-28 2020-09-01 中国人民解放军国防科技大学 Depth clustering method for single-particle cryoelectron microscope images
CN111461054B (en) * 2020-04-14 2021-04-27 上海月新生科信息科技有限公司 Method for full-process automatic analysis of single particle analysis data of cryoelectron microscope
CN112465067B (en) * 2020-12-15 2022-07-15 上海交通大学 Cryoelectron microscope single-particle image clustering implementation method based on image convolution self-encoder

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069797A (en) * 2015-08-13 2015-11-18 上海交通大学 Method for detecting resolution of three-dimensional density picture of cryo-electron microscopy based on mask
CN105488509A (en) * 2015-11-19 2016-04-13 Tcl集团股份有限公司 Image clustering method and system based on local chromatic features
WO2016142674A1 (en) * 2015-03-06 2016-09-15 Micromass Uk Limited Cell population analysis

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8473279B2 (en) * 2008-05-30 2013-06-25 Eiman Al-Shammari Lemmatizing, stemming, and query expansion method and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
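Lou Jia (楼佳), "A divisive k-means clustering algorithm" (一种分裂式的 k-means 聚类算法), Journal of Hangzhou Dianzi University (杭州电子科技大学学报), 2009-08-31, pp. 54-57 *
Ma Chuang (马闯), "Research on clustering algorithms based on three kinds of nearest-neighbor networks" (基于三种近邻网络的聚类算法研究), Journal of Jiamusi University (佳木斯大学学报), 2014-09-30, pp. 779-782 *
Xu Houjin (许厚金), "A k-cmeans text clustering algorithm based on similar centers" (基于相似中心的 k-cmeans 文本聚类算法), Computer Engineering and Design (计算机工程与设计), 2010-12-31, pp. 1802-1805 *
Hu Xiangping (胡湘萍), "An optimized k-means initial center selection algorithm based on nearest-neighbor graphs" (基于近邻图的 k-means 初始中心选择调优算法), Computer Applications and Software (计算机应用与软件), 2014-04-30, pp. 178-181 *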

Also Published As

Publication number Publication date
CN106951918A (en) 2017-07-14

Similar Documents

Publication Publication Date Title
CN112669463B (en) Method for reconstructing curved surface of three-dimensional point cloud, computer device and computer-readable storage medium
Bai et al. Fast graph sampling set selection using gershgorin disc alignment
Kylberg et al. Segmentation of virus particle candidates in transmission electron microscopy images
CN110032761B (en) Classification method for single-particle imaging data of frozen electron microscope
CA3190344A1 (en) Methods for identifying cross-modal features from spatially resolved data sets
Dinh et al. Consistent feature selection for analytic deep neural networks
CN101061951A (en) Method and apparatus for classifying tissue using image data
CN106951918B (en) Single-particle image clustering method for analysis of cryoelectron microscope
Zeng et al. A study on multi-kernel intuitionistic fuzzy C-means clustering with multiple attributes
Porto et al. ML‐morph: A fast, accurate and general approach for automated detection and landmarking of biological structures in images
CN116012364B (en) SAR image change detection method and device
WO2020168648A1 (en) Image segmentation method and device, and computer-readable storage medium
CN113177592B (en) Image segmentation method and device, computer equipment and storage medium
CN103226595A (en) Clustering method for high dimensional data based on Bayes mixed common factor analyzer
Beagum et al. Nonparametric de‐noising filter optimization using structure‐based microscopic image classification
CN112634149A (en) Point cloud denoising method based on graph convolution network
AU2014328463A1 (en) Manifold diffusion of solutions for kinetic analysis of pharmacokinetic data
KR20180137386A (en) Community detection method and community detection framework apparatus
CN115311502A (en) Remote sensing image small sample scene classification method based on multi-scale double-flow architecture
CN110415339B (en) Method and device for calculating matching relation between input three-dimensional shapes
Hao et al. VP-Detector: A 3D multi-scale dense convolutional neural network for macromolecule localization and classification in cryo-electron tomograms
CN113920320A (en) Radar image target detection system for typical active interference
Sparling et al. Arbitrary image reinflation: A deep learning technique for recovering 3D photoproduct distributions from a single 2D projection
JP2008152619A (en) Data processor and data processing program
CN108846407B (en) Magnetic resonance image classification method based on independent component high-order uncertain brain network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant