WO2022178949A1 - Semantic segmentation method and apparatus for electron microtomography data, device, and medium - Google Patents


Info

Publication number
WO2022178949A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
data
electron
protein
training sample
Application number
PCT/CN2021/084568
Other languages
French (fr)
Chinese (zh)
Inventor
Sun Aolan (孙奥兰)
Wang Jianzong (王健宗)
Cheng Ning (程宁)
Original Assignee
Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Application filed by Ping An Technology (Shenzhen) Co., Ltd.
Publication of WO2022178949A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/69 Microscopic objects, e.g. biological cells or cellular parts

Definitions

  • The present application relates to the field of digital medical technology, and in particular to a method, apparatus, device and medium for semantic segmentation of electron microtomography data.
  • Electron microtomography data is an important type of 3D (three-dimensional) data in computational biology. Electron microtomography applies across a wide range of scales, from proteins at the molecular level and organelles at the subcellular level to tissue structures at the cellular level. It can reveal the three-dimensional spatial distribution and assembly of important molecular machines in the cellular environment, providing valuable information for understanding how these molecular machines interact. Semantic segmentation of cell electron microtomography data is of great significance for studying the spatial distribution and 3D morphology of macromolecular structures in cells.
  • The inventor realized that, in the prior art, datasets related to cell electron microtomography contain little data while each individual 3D volume is large. As a result, there is little research on semantic segmentation of cell electron microtomography data, models are difficult to train, and the GPU (graphics processing unit) hardware requirements are difficult to meet.
  • The main purpose of this application is to provide a semantic segmentation method, apparatus, device and medium for electron microtomography data. It aims to solve the following technical problems: because prior-art cell electron microtomography datasets contain little data while each single 3D volume is large, there is little research on semantic segmentation of such data, models are difficult to train, and GPU hardware requirements cannot be met.
  • The present application proposes a method for semantic segmentation of electron microtomography data, the method comprising:
  • performing, by a cellular protein semantic segmentation model, protein semantic segmentation on each of the plurality of electron microtomography data to be analyzed, to obtain a set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed. Adversarial training and semi-supervised learning are performed alternately based on a generation network and a discrimination network, and the generation network obtained from this alternating training is used as the cellular protein semantic segmentation model;
  • The present application also proposes a semantic segmentation apparatus for electron microtomography data, the apparatus comprising:
  • a data acquisition module, configured to acquire a plurality of electron microtomography data to be analyzed obtained by segmenting the same cell electron microtomography data to be segmented;
  • a protein semantic segmentation module, configured to perform, by using a cellular protein semantic segmentation model, protein semantic segmentation on each of the plurality of electron microtomography data to be analyzed, to obtain the set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed. Adversarial training and semi-supervised learning are performed alternately based on a generation network and a discrimination network, and the generation network obtained from this alternating training is used as the cellular protein semantic segmentation model;
  • a data splicing module, configured to perform data splicing according to the set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, to obtain the target protein semantic segmentation result corresponding to the plurality of electron microtomography data to be analyzed.
  • the present application also proposes a computer device, including a memory and a processor, the memory stores a computer program, and the processor implements the following method steps when executing the computer program:
  • performing, by a cellular protein semantic segmentation model, protein semantic segmentation on each of the plurality of electron microtomography data to be analyzed, to obtain a set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed. Adversarial training and semi-supervised learning are performed alternately based on a generation network and a discrimination network, and the generation network obtained from this alternating training is used as the cellular protein semantic segmentation model;
  • the present application also proposes a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the following method steps are implemented:
  • performing, by a cellular protein semantic segmentation model, protein semantic segmentation on each of the plurality of electron microtomography data to be analyzed, to obtain a set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed. Adversarial training and semi-supervised learning are performed alternately based on a generation network and a discrimination network, and the generation network obtained from this alternating training is used as the cellular protein semantic segmentation model;
  • In the semantic segmentation method, apparatus, device and medium for electron microtomography data of the present application, a plurality of electron microtomography data to be analyzed are obtained by segmenting the same cell electron microtomography data to be segmented. The cellular protein semantic segmentation model then performs protein semantic segmentation on each of them, yielding the set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed.
  • Data splicing is then performed on this set of protein semantic segmentation results, yielding the target protein semantic segmentation result corresponding to the plurality of electron microtomography data to be analyzed.
  • This reduces the amount of data the semantic segmentation model processes in each protein semantic segmentation pass, and thus lowers the GPU hardware requirements for model training. Adversarial training and semi-supervised learning are performed alternately based on the generation network and the discrimination network.
  • The generation network obtained from this alternating training is used as the cellular protein semantic segmentation model. Adversarial training effectively improves performance when training with a small amount of data and enhances the generalization of the model.
  • Semi-supervised learning uses unlabeled data to enhance model performance.
  • FIG. 1 is a schematic flowchart of a method for semantic segmentation of electron microtomography data according to an embodiment of the present application;
  • FIG. 2 is a schematic structural block diagram of a device for semantic segmentation of electron microtomography data according to an embodiment of the application;
  • FIG. 3 is a schematic structural block diagram of a computer device according to an embodiment of the present application.
  • The present application proposes a semantic segmentation method for electron microtomography data. The method is applied in the field of artificial intelligence technology and may also be applied in the field of digital medical technology.
  • In this method, the same cell electron microtomography data to be segmented is first divided into pieces, and protein semantic segmentation is then performed on each piece. This reduces the amount of data the model processes in each protein semantic segmentation pass and lowers the cost of protein semantic segmentation.
  • The model is obtained by alternating adversarial training and semi-supervised learning based on the generation network and the discrimination network. Semi-supervised learning uses unlabeled data to enhance model performance.
  • An embodiment of the present application provides a method for semantic segmentation of electron microtomography data, and the method includes:
  • S1 Acquire a plurality of electron microtomography data to be analyzed obtained by segmenting the same cell electron microtomography data to be segmented;
  • S3 Perform data splicing according to the set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, to obtain the target protein semantic segmentation result corresponding to the plurality of electron microtomography data to be analyzed.
  • In this embodiment, a plurality of electron microtomography data to be analyzed are obtained by segmenting the same cell electron microtomography data to be segmented.
  • The cellular protein semantic segmentation model performs protein semantic segmentation on each of these electron microtomography data to be analyzed, yielding the corresponding set of protein semantic segmentation results.
  • Data splicing on this set of protein semantic segmentation results yields the target protein semantic segmentation result corresponding to the plurality of electron microtomography data to be analyzed. This reduces the amount of data the cellular protein semantic segmentation model processes each time and lowers the GPU hardware requirements for model training.
  • Adversarial training and semi-supervised learning are performed alternately based on the generation network and the discrimination network, and the resulting generation network is used as the cellular protein semantic segmentation model. Adversarial training effectively improves performance when training with a small amount of data and enhances the generalization of the model. Semi-supervised learning uses unlabeled data to enhance model performance.
  • For S1, the plurality of electron microtomography data to be analyzed, obtained by segmenting the same cell electron microtomography data to be segmented, may be acquired in three ways. They may be acquired directly or obtained from a database. They may also be sent by a third-party application system, which obtains them by slicing the same cell electron microtomography data to be segmented.
  • The cell electron microtomography data to be segmented is electron microtomography data extracted from the tissue structure of cells.
  • For example, the cell electron microtomography data contains the spatial distribution of 12 kinds of proteins.
  • the method of sliding window is used to segment the same cell electron microtomography data to be segmented, and the data obtained from one segment is used as one electron microtomography data to be analyzed.
  • For example, suppose the cell electron microtomography data measures 200×512×512. The sliding window method divides it into small volumes of 50×64×64, so each electron microtomography data to be analyzed measures 50×64×64. These sizes are illustrative and not limiting.
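The sliding-window division can be sketched in plain Python. This is a minimal sketch under stated assumptions. The source gives only the volume and window sizes, so the stride, the handling of a leftover margin, and the helper names `window_starts`, `tile_origins` and `extract` are all hypothetical. By default the windows do not overlap (stride equal to the window size). If the stride leaves a remainder, one extra window aligned to the end of the axis is added.

```python
from itertools import product


def window_starts(length, window, stride):
    """Start offsets of sliding windows along one axis (assumes length >= window).

    If the regular stride leaves a remainder, one extra window aligned to
    the end of the axis is appended so that every voxel is covered.
    """
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] + window < length:
        starts.append(length - window)
    return starts


def tile_origins(volume_shape, window_shape, stride_shape=None):
    """All (d, h, w) window origins for a 3D volume."""
    if stride_shape is None:
        stride_shape = window_shape  # assumed default: non-overlapping tiles
    per_axis = [window_starts(n, k, s)
                for n, k, s in zip(volume_shape, window_shape, stride_shape)]
    return list(product(*per_axis))


def extract(volume, origin, window_shape):
    """Cut one sub-volume out of a nested [depth][height][width] list."""
    (d0, h0, w0), (kd, kh, kw) = origin, window_shape
    return [[row[w0:w0 + kw] for row in plane[h0:h0 + kh]]
            for plane in volume[d0:d0 + kd]]


origins = tile_origins((200, 512, 512), (50, 64, 64))
print(len(origins))   # 256 sub-volumes (4 x 8 x 8)
print(origins[:2])    # [(0, 0, 0), (0, 0, 64)]
```

With the example sizes, 200/50 = 4 and 512/64 = 8, so the volume divides evenly into 4×8×8 = 256 tiles. A real pipeline would more likely slice arrays than nested lists.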
  • Each of the plurality of electron microtomography data to be analyzed is input into the cellular protein semantic segmentation model for protein semantic segmentation. The model outputs one protein semantic segmentation result per electron microtomography data to be analyzed. Together, these results form the set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed.
  • In alternating adversarial training and semi-supervised learning, the generation network and the discrimination network repeatedly go through one round of adversarial training followed by one round of semi-supervised learning until the convergence condition is met. Adversarial training effectively improves performance when training with a small amount of data and enhances the generalization of the model. Semi-supervised learning uses unlabeled data to enhance model performance.
  • The generation network may be any prior-art network capable of semantic segmentation.
  • The discrimination network may be any prior-art network suitable for adversarial training.
  • The protein semantic segmentation result is the protein classification result of each voxel in the electron microtomography data to be analyzed.
  • For example, the protein classification result is one of 12 kinds of proteins; this number is not limiting.
  • The target protein semantic segmentation result is the protein semantic segmentation result of the cell electron microtomography data to be segmented.
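The source does not detail how data splicing (S3) reassembles the per-tile results. The sketch below is a minimal version resting on three assumptions. First, each tile's result is a nested list of per-voxel protein labels. Second, its origin comes from the sliding-window step. Third, where tiles overlap, the tile written last simply overwrites earlier ones. Averaging class probabilities before taking the argmax would be a common alternative. `stitch` is a hypothetical helper name.

```python
def stitch(volume_shape, tiles):
    """Reassemble per-tile voxel labels into one full-volume label grid.

    tiles: iterable of (origin, labels), where origin = (d0, h0, w0) and
    labels is a nested [depth][height][width] list for that sub-volume.
    Overlapping voxels take the value from the tile written last
    (a simplifying assumption, not taken from the source).
    """
    D, H, W = volume_shape
    out = [[[None] * W for _ in range(H)] for _ in range(D)]
    for (d0, h0, w0), labels in tiles:
        for dz, plane in enumerate(labels):
            for dy, row in enumerate(plane):
                # Slice assignment copies one row of labels into place.
                out[d0 + dz][h0 + dy][w0:w0 + len(row)] = row
    return out


left = ((0, 0, 0), [[[1, 1], [1, 1]], [[1, 1], [1, 1]]])
right = ((0, 0, 2), [[[2, 2], [2, 2]], [[2, 2], [2, 2]]])
full = stitch((2, 2, 4), [left, right])
print(full[0][0])  # [1, 1, 2, 2]
```

Any voxel still holding `None` after stitching means the tiles did not cover the whole volume. That makes this a cheap sanity check on the tiling.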
  • Before the step of performing protein semantic segmentation on each of the plurality of electron microtomography data to be analyzed, the method includes the following steps. That earlier step uses the cellular protein semantic segmentation model to obtain the corresponding set of protein semantic segmentation results.
  • S021 Obtain a labeled training sample set and an unlabeled training sample set derived from the same cell electron microtomography data training sample;
  • S022 Obtain a labeled training sample from the labeled training sample set as the target labeled training sample, and obtain an unlabeled training sample from the unlabeled training sample set as the target unlabeled training sample;
  • S023 Perform adversarial training on the generation network and the discrimination network according to the target labeled training sample, wherein the generation network adopts the segmentation network U-net++ and the discrimination network adopts a fully convolutional discriminator;
  • S024 Perform semi-supervised training on the generation network after adversarial training according to the target unlabeled training samples and the discriminant network after adversarial training;
  • S025 Repeat the step of obtaining the target labeled training sample and the target unlabeled training sample (S022). Continue until the alternating adversarial and semi-supervised training reaches the convergence condition. The generation network at that point is determined as the cellular protein semantic segmentation model.
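Steps S022 to S025 amount to a simple loop, sketched below under several assumptions. Each round is assumed to draw one labeled and one unlabeled sample at random; the source says only that one sample is obtained from each set. The actual network updates are delegated to caller-supplied callables. `alternate_training`, `adversarial_step`, `semi_step` and `converged` are hypothetical names, not part of any library.

```python
import random


def alternate_training(labeled, unlabeled, adversarial_step, semi_step,
                       converged, max_rounds=10_000, seed=0):
    """Alternate one adversarial step and one semi-supervised step (S022-S025).

    adversarial_step(sample) -> (gen_loss, disc_loss)
    semi_step(sample)        -> semi_loss
    converged(history)       -> True once the first convergence condition holds
    max_rounds stands in for the second (round-count) convergence condition.
    """
    rng = random.Random(seed)
    history = []
    for round_idx in range(1, max_rounds + 1):
        target_labeled = rng.choice(labeled)         # S022
        target_unlabeled = rng.choice(unlabeled)     # S022
        gen_loss, disc_loss = adversarial_step(target_labeled)  # S023
        semi_loss = semi_step(target_unlabeled)                 # S024
        history.append((gen_loss, disc_loss, semi_loss))
        if converged(history):                                  # S025
            break
    return round_idx, history


rounds, hist = alternate_training(
    labeled=[1, 2, 3], unlabeled=[4, 5],
    adversarial_step=lambda s: (1.0, 1.0),
    semi_step=lambda s: 0.5,
    converged=lambda h: len(h) >= 3)
print(rounds)  # 3
```

Keeping the loop separate from the update callables keeps the alternation order visible: one adversarial step, then one semi-supervised step, per round.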
  • This embodiment performs alternating adversarial training and semi-supervised learning based on the generation network and the discrimination network.
  • The generation network obtained from this alternating training is used as the cellular protein semantic segmentation model. Adversarial training effectively improves performance when training with a small amount of data and enhances the generalization of the model.
  • Semi-supervised learning uses unlabeled data to enhance model performance. Because the two alternate, the result of each adversarial training step is further enhanced by one semi-supervised step. This further improves the accuracy of the model.
  • For S021, the labeled and unlabeled training sample sets derived from the same cell electron microtomography data training sample may be input by the user, obtained from a database, or sent by a third-party application system.
  • the number of labeled training samples in the labeled training sample set is greater than the number of unlabeled training samples in the unlabeled training sample set.
  • The labeled training samples include electron microtomography sample data and protein calibration data. The protein calibration data is the protein classification result of each voxel in the electron microtomography sample data.
  • Each of the labeled training samples includes one electron microtomography sample data and one protein calibration data.
  • the unlabeled training samples include: electron microtomography sample data.
  • The method for adversarial training of the generation network and the discrimination network can be selected from the prior art. This training uses the electron microtomography sample data and protein calibration data of the target labeled training sample. It effectively improves performance when the amount of training data is small and enhances model generalization.
  • The generation network adopts the segmentation network U-net++, which sequentially includes 4 convolutional layers, 4 deconvolutional layers, and 12 1×1 convolution kernels.
  • The 4 convolutional layers are used for feature extraction.
  • The 4 deconvolutional layers are used for deconvolution to restore the resolution.
  • The 12 1×1 convolution kernels are used to obtain the classification probabilities of 12 categories (that is, the 12 proteins).
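The last stage of the network, 12 kernels of size 1×1 (1×1×1 for volumetric data), can be illustrated at a single voxel. A 1×1 kernel never reads neighbouring voxels, so it acts as a linear map from the feature channels to 12 class scores, applied identically at every voxel. This sketch makes three assumptions. Softmax is assumed to turn the scores into the "classification probabilities" the source mentions. The U-net++ body is omitted entirely. The function names are hypothetical.

```python
import math


def conv1x1(features, weights, bias):
    """A 1x1(x1) convolution evaluated at one voxel.

    features: F input channel values at this voxel
    weights:  K rows of F weights (one row per output class)
    bias:     K biases
    """
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_voxel(features, weights, bias):
    """Return (predicted class index, class probabilities) for one voxel."""
    probs = softmax(conv1x1(features, weights, bias))
    return max(range(len(probs)), key=probs.__getitem__), probs


# 12 output classes from 2 input channels; class i scores i * features[1].
weights = [[0.0, float(i)] for i in range(12)]
label, probs = classify_voxel([0.0, 1.0], weights, [0.0] * 12)
print(label)  # 11
```

The argmax per voxel is what the source calls the protein classification result of that voxel.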
  • The method for semi-supervised training of the adversarially trained generation network can be selected from the prior art. This training uses the target unlabeled training samples and the adversarially trained discrimination network. Semi-supervised training uses unlabeled data to enhance model performance.
  • steps S022 to S025 are repeatedly executed until the alternate training of adversarial and semi-supervised learning reaches the convergence condition.
  • The condition that the alternating adversarial and semi-supervised training converges is met in either of two cases. In the first, the loss value of adversarial training and the loss value of semi-supervised training both reach the first convergence condition. In the second, the number of rounds of alternating adversarial and semi-supervised training reaches the second convergence condition.
  • The adversarial and semi-supervised loss values reaching the first convergence condition means that three loss values all reach it: that of the generation network in adversarial training, that of the discrimination network in adversarial training, and that of semi-supervised training.
  • The first convergence condition means that two loss values computed in adjacent rounds for the same network satisfy the Lipschitz condition (Lipschitz continuity condition).
  • The second convergence condition concerns the number of rounds of alternating adversarial and semi-supervised training. This count is increased by 1 each time the generation network and the discrimination network complete one round. The second convergence condition is met when the count reaches it (for example, a preset maximum number of rounds).
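The source does not give the Lipschitz constant or the exact form of the first convergence condition. The sketch below therefore reads it, as an assumption, as |L_t - L_(t-1)| <= tol for each of the three losses (generator, discriminator, semi-supervised). It treats the second condition as a maximum round count. `tol`, `max_rounds` and both function names are hypothetical.

```python
def losses_settled(prev, curr, tol=1e-3):
    """Assumed reading of the first condition: the change between two
    adjacent loss values of the same network is bounded by tol."""
    return abs(curr - prev) <= tol


def training_converged(history, max_rounds, tol=1e-3):
    """history: one (gen_loss, disc_loss, semi_loss) tuple per round.

    True if the round count reached max_rounds (second condition), or if
    all three losses settled between the last two rounds (first condition).
    """
    if len(history) >= max_rounds:
        return True
    if len(history) < 2:
        return False
    prev, curr = history[-2], history[-1]
    return all(losses_settled(p, c, tol) for p, c in zip(prev, curr))


print(training_converged([(1.0, 1.0, 1.0), (1.0005, 1.0, 1.0)], max_rounds=100))  # True
```

Requiring all three losses to settle at once mirrors the source's statement that the generator, discriminator and semi-supervised losses must all reach the first condition.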
  • The above step of obtaining the labeled training sample set and the unlabeled training sample set, derived from the same cell electron microtomography data training sample, includes:
  • S0212 Use a sliding window method to segment the cell electron microtomography data training sample to obtain a plurality of electron microtomography sample data;
  • S0213 Divide the plurality of electron microtomography sample data according to a preset ratio to obtain a to-be-labeled training sample set and an unlabeled training sample set. The to-be-labeled training sample set contains more electron microtomography sample data than the unlabeled training sample set;
  • S0214 Perform protein semantic segmentation calibration on each electron microtomography sample data in the to-be-labeled training sample set to obtain the labeled training sample set.
  • This embodiment determines both a labeled training sample set and an unlabeled training sample set from a single cell electron microtomography data training sample. As a result, the model can be trained even with a small amount of data.
  • The cell electron microtomography data training sample is electron microtomography data extracted from the tissue structure of cells.
  • For example, the cell electron microtomography data training sample contains the spatial distribution of 12 kinds of proteins.
  • A sliding window method is used to segment the cell electron microtomography data training sample, and each piece obtained is used as one electron microtomography sample data.
  • Dividing the plurality of electron microtomography sample data according to a preset ratio means that one part of them forms the to-be-labeled training sample set and another part forms the unlabeled training sample set. Each electron microtomography sample data belongs to exactly one of the two sets.
  • For example, the preset ratio is set to 85:15. Then 85% of the electron microtomography sample data form the to-be-labeled training sample set, and the remaining 15% form the unlabeled training sample set.
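The 85:15 division in S0213 can be sketched as follows. The source does not say how samples are assigned to the two sets, so shuffling with a fixed seed before cutting is an assumption. `split_samples` is a hypothetical helper.

```python
import random


def split_samples(samples, labeled_ratio=0.85, seed=0):
    """Split sub-volumes into a to-be-labeled set and an unlabeled set.

    Every sample lands in exactly one of the two sets.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * labeled_ratio)
    return shuffled[:cut], shuffled[cut:]


to_label, unlabeled = split_samples(range(256))
print(len(to_label), len(unlabeled))  # 218 38
```

Consider 256 sub-volumes, which is what a 200×512×512 volume gives when cut into 50×64×64 tiles. They split into 218 to-be-labeled and 38 unlabeled samples, which satisfies the requirement that the labeled set be the larger one.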
  • The above step of adversarial training of the generation network and the discrimination network according to the target labeled training sample includes:
  • S0231 Input the electron microtomography sample data of the target labeled training sample into the generation network to perform protein semantic segmentation, and obtain a first training result;
  • S0232 Input the protein calibration data of the target labeled training sample and the first training result into the discrimination network for discrimination to obtain a first confidence result;
  • S0233 Use the protein calibration data of the target labeled training sample, the first training result, and the first confidence result to perform adversarial training on the generation network and the discriminant network.
  • This embodiment effectively improves the performance of training with a small amount of data based on adversarial training, and enhances the effect of model generalization.
  • the first training result is the protein classification result of each voxel point in the electron microtomography sample data of the target labeled training sample.
  • each confidence result in the first confidence result is a confidence level for each protein classification result in the first training result.
  • The above step of adversarial training of the generation network and the discrimination network includes the following steps. It uses the protein calibration data of the target labeled training sample, the first training result and the first confidence result.
  • S02331 Input the protein calibration data of the target labeled training sample and the first training result into a first loss function to calculate a first loss value of the generation network, and update the parameters of the generation network according to the first loss value;
  • S02332 Input the first confidence result into a second loss function for calculation, obtain a second loss value of the discriminant network, and update the parameters of the discriminant network according to the second loss value;
  • $X_n$ is the electron microtomography sample data of the target labeled training sample
  • $h$ is the length dimension of the electron microtomography sample data of the target labeled training sample
  • $w$ is the width dimension of the electron microtomography sample data of the target labeled training sample
  • $c$ is the channel dimension of the electron microtomography sample data of the target labeled training sample
  • $S(X_n)^{(h,w,c)}$ is the first training result
  • $\log(\cdot)$ is the logarithmic function
  • $Y_n^{(h,w,c)}$ is the protein calibration data of the target labeled training sample
  • $C$ is the number of cell protein species
  • $D(S(X_n))^{(h,w)}$ is the first confidence result.
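The formulas for the first and second loss functions are missing from this text; only their symbols survive. These symbols match the adversarial semi-supervised segmentation formulation of Hung et al. (2018), so the block below offers a plausible reconstruction under that assumption, not the patent's own equations. Three elements are assumptions not found in the source: the weight $\lambda_{adv}$, the target label $y_n$ (0 when the discriminator sees $S(X_n)$, 1 when it sees $Y_n$), and the term $D(Y_n)$. Here $c$ is read as the protein class index over the $C$ classes.

```latex
\begin{aligned}
\mathcal{L}_{1} &= -\sum_{h,w}\sum_{c \in C} Y_n^{(h,w,c)} \log S(X_n)^{(h,w,c)}
  \;-\; \lambda_{adv} \sum_{h,w} \log D(S(X_n))^{(h,w)} \\
\mathcal{L}_{2} &= -\sum_{h,w} \Big[ (1 - y_n) \log\big(1 - D(S(X_n))^{(h,w)}\big)
  + y_n \log D(Y_n)^{(h,w)} \Big]
\end{aligned}
```

In this reading, $\mathcal{L}_1$ is per-voxel cross-entropy against the protein calibration data, plus a term that rewards the generator when the discriminator rates its output as calibrated. $\mathcal{L}_2$ is the discriminator's binary cross-entropy.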
  • This example uses semi-supervised learning training to enhance the performance of the model with unlabeled data.
  • the method for updating the parameters of the generation network according to the first loss value can be selected from the prior art, and details are not described here.
  • the method for updating the parameters of the discriminating network according to the second loss value can be selected from the prior art, and details are not described here.
  • $h \times w \times c$ is the size of the electron microtomography sample data of the target labeled training sample, that is, its length $h$, width $w$ and number of channels $c$.
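The first and second losses can be sketched in plain Python over flattened voxels. This is a minimal sketch assuming the cross-entropy form of Hung et al. (2018); the source's own formulas are not preserved. The function names, `lambda_adv=0.01` and the small `EPS` guard against `log(0)` are all hypothetical choices.

```python
import math

EPS = 1e-12  # guards log(0); an implementation detail, not from the source


def cross_entropy(probs, onehot):
    """-sum over voxels and classes of Y * log S (voxels flattened)."""
    return -sum(y * math.log(p + EPS)
                for voxel_p, voxel_y in zip(probs, onehot)
                for p, y in zip(voxel_p, voxel_y))


def generator_loss(probs, onehot, conf_fake, lambda_adv=0.01):
    """Assumed first loss: cross-entropy plus an adversarial term that is
    small when D rates the generator's output as calibrated (close to 1)."""
    adv = -sum(math.log(d + EPS) for d in conf_fake)
    return cross_entropy(probs, onehot) + lambda_adv * adv


def discriminator_loss(conf_fake, conf_real):
    """Assumed second loss: binary cross-entropy with target 0 for
    D(S(X_n)) and target 1 for D(Y_n)."""
    fake = -sum(math.log(1 - d + EPS) for d in conf_fake)
    real = -sum(math.log(d + EPS) for d in conf_real)
    return fake + real


print(round(cross_entropy([[0.5, 0.5]], [[1, 0]]), 4))  # 0.6931 (= ln 2)
```

A framework implementation would compute the same quantities on tensors with built-in cross-entropy losses. The plain-Python form only makes each term explicit.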
  • the above-mentioned step of performing semi-supervised training on the generation network after adversarial training according to the target unlabeled training samples and the discriminant network after adversarial training includes:
  • S0241 Input the electron microtomography sample data of the target unlabeled training sample into the generation network to perform protein semantic segmentation to obtain a second training result;
  • S0242 Input the second training result into the discrimination network for discrimination to obtain a second confidence result;
  • S0244 Perform semi-supervised training on the generation network by using the reliable results and the second training results corresponding to the target unlabeled training samples.
  • This embodiment thus enhances the performance of the model with unlabeled data using semi-supervised learning training.
  • the electron microtomography sample data of the target unlabeled training sample is input into the generation network to perform protein semantic segmentation, and the result of the protein semantic segmentation is used as the second training result. That is, the second training result is the protein classification result of each voxel point in the electron microtomography sample data of the target unlabeled training sample.
  • each confidence result in the second confidence result is a confidence level for each protein classification result in the second training result.
  • the above-mentioned step of performing semi-supervised training on the generation network using the reliable result and the second training result corresponding to the target unlabeled training sample includes:
  • X_n is the electron microtomography sample data of the target unlabeled training sample;
  • h × w × c is the size of the electron microtomography sample data of the target unlabeled training sample;
  • S(X_n)^(h,w,c) is the second training result;
  • D(S(X_n))^(h,w) is the reliable result corresponding to the target unlabeled training sample;
  • log() is the logarithm function;
  • T_semi is the threshold that controls the sensitivity of the self-learning process;
  • I() is the indicator function;
  • I() is constant.
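These variables match the self-training loss of Hung et al. (2018), in which only voxels the discriminator trusts contribute a cross-entropy term against the generator's own hard prediction: L_semi = −Σ_{h,w} Σ_c I(D(S(X_n))^(h,w) > T_semi) · Ŷ_n^(h,w,c) · log S(X_n)^(h,w,c), where Ŷ_n is the one-hot encoding of the argmax of S(X_n). Assuming the loss here takes this form, which the variable list suggests but does not confirm, a NumPy sketch (hypothetical function name) is:

```python
import numpy as np


def semi_supervised_loss(probs, confidence, t_semi, eps=1e-7):
    """Masked self-training cross-entropy (sketch after Hung et al., 2018).

    probs:      S(X_n), shape (h, w, c), per-voxel class probabilities.
    confidence: D(S(X_n)), shape (h, w), discriminator confidence map.
    t_semi:     T_semi, the sensitivity threshold of the self-learning process.
    """
    num_classes = probs.shape[-1]
    pseudo = np.argmax(probs, axis=-1)
    y_hat = np.eye(num_classes)[pseudo]                    # one-hot Y_hat_n, a constant target
    indicator = (confidence > t_semi).astype(probs.dtype)  # I(D(S(X_n)) > T_semi)
    log_p = np.log(np.clip(probs, eps, 1.0))
    per_voxel = np.sum(y_hat * log_p, axis=-1)             # log-probability of the pseudo-label
    return -float(np.sum(indicator * per_voxel))


probs = np.array([[[0.8, 0.2], [0.4, 0.6]]])  # shape (1, 2, 2)
confidence = np.array([[0.9, 0.1]])
print(semi_supervised_loss(probs, confidence, t_semi=0.5))  # -log(0.8) ≈ 0.2231
```

Raising T_semi shrinks the set of voxels that contribute, which is how the threshold controls the sensitivity of self-learning.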
  • the parameters of the generation network may be updated according to the third loss value using any existing method, so the details are not described here.
  • the threshold T_semi that controls the sensitivity of the self-learning process may be obtained from a database or a third-party application system, or hard-coded in a program file implementing the present application.
  • the present application also proposes a semantic segmentation apparatus for electron microtomography data, the apparatus including:
  • a data acquisition module 100, configured to acquire a plurality of electron microtomography data to be analyzed, obtained by segmenting the same piece of cell electron microtomography data to be segmented;
  • a protein semantic segmentation module 200, configured to perform protein semantic segmentation on each of the plurality of electron microtomography data to be analyzed using a cell protein semantic segmentation model, to obtain a set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, wherein a generation network and a discriminant network undergo alternating adversarial and semi-supervised training, and the generation network obtained from this alternating training is used as the cell protein semantic segmentation model;
  • a data splicing module 300, configured to perform data splicing on the set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, to obtain the target protein semantic segmentation result corresponding to the plurality of electron microtomography data to be analyzed.
  • the present application also proposes a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of any one of the above methods when the processor executes the computer program.
  • the present application also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements the steps of any one of the methods described above.
  • a single piece of cell electron microtomography data to be segmented is split into a plurality of electron microtomography data to be analyzed; the cell protein semantic segmentation model performs protein semantic segmentation on each of them to obtain a set of protein semantic segmentation results, and this set is spliced to obtain the target protein semantic segmentation result corresponding to the plurality of electron microtomography data to be analyzed.
  • because the model processes only one piece at a time, the amount of data per protein semantic segmentation pass is reduced, which lowers the GPU hardware requirements of model training.
  • the generation network obtained by alternating adversarial and semi-supervised training with a discriminant network is used as the cell protein semantic segmentation model: adversarial training effectively improves performance when training data are scarce and strengthens the generalization of the model, while semi-supervised training uses unlabeled data to further improve the model.
  • an embodiment of the present application further provides a computer device.
  • the computer device may be a server, and its internal structure may be as shown in Fig. 3.
  • the computer device includes a processor, a memory, a network interface, and a database connected by a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium.
  • the database of the computer device is used to store data used by the semantic segmentation method for electron microtomography data.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • when executed by the processor, the computer program implements a semantic segmentation method for electron microtomography data.
  • the semantic segmentation method for electron microtomography data includes: acquiring a plurality of electron microtomography data to be analyzed, obtained by segmenting the same piece of cell electron microtomography data to be segmented; performing protein semantic segmentation on each of the plurality of electron microtomography data to be analyzed using a cell protein semantic segmentation model, to obtain a set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, wherein a generation network and a discriminant network undergo alternating adversarial and semi-supervised training, and the generation network obtained from this alternating training is used as the cell protein semantic segmentation model; and performing data splicing on the set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, to obtain the corresponding target protein semantic segmentation result.
  • An embodiment of the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements a semantic segmentation method for electron microtomography data, including the steps of: acquiring a plurality of electron microtomography data to be analyzed, obtained by segmenting the same piece of cell electron microtomography data to be segmented; performing protein semantic segmentation on each of the plurality of electron microtomography data to be analyzed using a cell protein semantic segmentation model, to obtain a set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, wherein a generation network and a discriminant network undergo alternating adversarial and semi-supervised training, and the generation network obtained from this alternating training is used as the cell protein semantic segmentation model; and performing data splicing on the set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, to obtain the corresponding target protein semantic segmentation result.
  • the semantic segmentation method for electron microtomography data performed above splits a single piece of cell electron microtomography data into a plurality of electron microtomography data to be analyzed, uses the cell protein semantic segmentation model to perform protein semantic segmentation on each of them to obtain a set of protein semantic segmentation results, and splices that set into the target protein semantic segmentation result.
  • This reduces the amount of data processed in each protein semantic segmentation pass of the model and thus lowers the GPU hardware requirements of model training.
  • Because the cell protein semantic segmentation model is the generation network obtained by alternating adversarial and semi-supervised training with a discriminant network, adversarial training effectively improves performance when training data are scarce and strengthens generalization, while semi-supervised training uses unlabeled data to further improve the model.
  • the computer storage medium can be non-volatile or volatile.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Analysing Materials By The Use Of Radiation (AREA)

Abstract

A semantic segmentation method and apparatus for electron microtomography data, a device, and a medium, relating to the technical field of digital healthcare. The method comprises: using a cell protein semantic segmentation model to separately perform protein semantic segmentation on each of a plurality of pieces of electron microtomography data to be analyzed, obtained by segmenting the same piece of cell electron microtomography data, to obtain a protein semantic segmentation result set, wherein a generative network and a discriminant network undergo alternating adversarial and semi-supervised training and the resulting generative network is used as the cell protein semantic segmentation model; and performing data splicing on the protein semantic segmentation result set to obtain a target protein semantic segmentation result. This reduces the volume of data the model processes at each pass, lowers the GPU hardware requirements of model training, and effectively improves training performance when little data is available.

Description

Semantic Segmentation Method, Apparatus, Device, and Medium for Electron Microtomography Data
This application claims priority to Chinese patent application No. 2021102196153, filed with the Chinese Patent Office on February 26, 2021 and entitled "Semantic Segmentation Method, Apparatus, Device, and Medium for Electron Microtomography Data", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of digital healthcare technology, and in particular to a semantic segmentation method, apparatus, device, and medium for electron microtomography data.
Background Art
Electron microtomography data is an important class of 3D (three-dimensional) data in computational biology. Electron microtomography applies across a very wide range of scales, from proteins at the molecular level to organelles at the subcellular level and tissue structures at the cellular level. It can be used to obtain the three-dimensional spatial distribution and assembly of important molecular machines in their native cellular environment, providing valuable information for understanding how these molecular machines interact. Semantic segmentation of cell electron microtomography data is therefore important for studying the spatial distribution and 3D morphology of macromolecular structures in cells.
The inventors realized that existing cell electron microtomography datasets contain few samples while each 3D volume is large. As a result, little research has addressed semantic segmentation of such data, models are difficult to train, and GPU (graphics processing unit) hardware struggles to support the task.
Technical Problem
In the prior art, cell electron microtomography datasets contain few samples and each 3D volume is large, which leads to little research on semantic segmentation of such data, difficult model training, and insufficient GPU hardware support.
Technical Solution
The main purpose of the present application is to provide a semantic segmentation method, apparatus, device, and medium for electron microtomography data, aiming to solve the technical problem in the prior art that cell electron microtomography datasets contain few samples and each 3D volume is large, which leads to little research on semantic segmentation of such data, difficult model training, and insufficient GPU hardware support.
To achieve the above purpose, the present application proposes a semantic segmentation method for electron microtomography data, the method comprising:
Acquiring a plurality of electron microtomography data to be analyzed, obtained by segmenting the same piece of cell electron microtomography data to be segmented;
Performing protein semantic segmentation on each of the plurality of electron microtomography data to be analyzed using a cell protein semantic segmentation model, to obtain a set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, wherein a generation network and a discriminant network undergo alternating adversarial and semi-supervised training, and the generation network obtained from this alternating training is used as the cell protein semantic segmentation model;
Performing data splicing on the set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, to obtain the target protein semantic segmentation result corresponding to the plurality of electron microtomography data to be analyzed.
The present application also proposes a semantic segmentation apparatus for electron microtomography data, the apparatus comprising:
A data acquisition module, configured to acquire a plurality of electron microtomography data to be analyzed, obtained by segmenting the same piece of cell electron microtomography data to be segmented;
A protein semantic segmentation module, configured to perform protein semantic segmentation on each of the plurality of electron microtomography data to be analyzed using a cell protein semantic segmentation model, to obtain a set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, wherein a generation network and a discriminant network undergo alternating adversarial and semi-supervised training, and the generation network obtained from this alternating training is used as the cell protein semantic segmentation model;
A data splicing module, configured to perform data splicing on the set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, to obtain the target protein semantic segmentation result corresponding to the plurality of electron microtomography data to be analyzed.
The present application also proposes a computer device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the following method steps when executing the computer program:
Acquiring a plurality of electron microtomography data to be analyzed, obtained by segmenting the same piece of cell electron microtomography data to be segmented;
Performing protein semantic segmentation on each of the plurality of electron microtomography data to be analyzed using a cell protein semantic segmentation model, to obtain a set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, wherein a generation network and a discriminant network undergo alternating adversarial and semi-supervised training, and the generation network obtained from this alternating training is used as the cell protein semantic segmentation model;
Performing data splicing on the set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, to obtain the target protein semantic segmentation result corresponding to the plurality of electron microtomography data to be analyzed.
The present application also proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following method steps:
Acquiring a plurality of electron microtomography data to be analyzed, obtained by segmenting the same piece of cell electron microtomography data to be segmented;
Performing protein semantic segmentation on each of the plurality of electron microtomography data to be analyzed using a cell protein semantic segmentation model, to obtain a set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, wherein a generation network and a discriminant network undergo alternating adversarial and semi-supervised training, and the generation network obtained from this alternating training is used as the cell protein semantic segmentation model;
Performing data splicing on the set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, to obtain the target protein semantic segmentation result corresponding to the plurality of electron microtomography data to be analyzed.
Beneficial Effects
The semantic segmentation method, apparatus, device, and medium for electron microtomography data of the present application split a single piece of cell electron microtomography data into a plurality of electron microtomography data to be analyzed, use a cell protein semantic segmentation model to perform protein semantic segmentation on each of them to obtain a set of protein semantic segmentation results, and splice that set into the target protein semantic segmentation result. This reduces the amount of data the model processes in each protein semantic segmentation pass and lowers the GPU hardware requirements of model training. The cell protein semantic segmentation model is the generation network obtained by alternating adversarial and semi-supervised training with a discriminant network: adversarial training effectively improves performance when training data are scarce and strengthens the generalization of the model, and semi-supervised training uses unlabeled data to further improve the model.
Description of Drawings
Fig. 1 is a schematic flowchart of a semantic segmentation method for electron microtomography data according to an embodiment of the present application;
Fig. 2 is a schematic structural block diagram of a semantic segmentation apparatus for electron microtomography data according to an embodiment of the present application;
Fig. 3 is a schematic structural block diagram of a computer device according to an embodiment of the present application.
The realization of the objectives, functional features, and advantages of the present application will be further described with reference to the embodiments and the accompanying drawings.
Embodiments of the Present Invention
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present application, not to limit it.
To solve the technical problem in the prior art that cell electron microtomography datasets contain few samples and each 3D volume is large, which leads to little research on semantic segmentation of such data, difficult model training, and insufficient GPU hardware support, the present application proposes a semantic segmentation method for electron microtomography data. The method is applied in the field of artificial intelligence and may also be applied in the field of digital healthcare. It first splits a single piece of cell electron microtomography data and then performs protein semantic segmentation on the resulting pieces, which reduces the amount of data the model processes at a time and lowers the GPU hardware requirements of model training. The model is obtained by alternating adversarial and semi-supervised training of a generation network and a discriminant network: adversarial training effectively improves performance when training data are scarce and strengthens generalization, while semi-supervised training uses unlabeled data to improve the model.
Referring to Fig. 1, an embodiment of the present application provides a semantic segmentation method for electron microtomography data, the method comprising:
S1: Acquiring a plurality of electron microtomography data to be analyzed, obtained by segmenting the same piece of cell electron microtomography data to be segmented;
S2: Performing protein semantic segmentation on each of the plurality of electron microtomography data to be analyzed using a cell protein semantic segmentation model, to obtain a set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, wherein a generation network and a discriminant network undergo alternating adversarial and semi-supervised training, and the generation network obtained from this alternating training is used as the cell protein semantic segmentation model;
S3: Performing data splicing on the set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, to obtain the target protein semantic segmentation result corresponding to the plurality of electron microtomography data to be analyzed.
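Steps S1 to S3 can be sketched end to end as below. The sketch assumes non-overlapping windows that tile the volume exactly, and uses a hypothetical `segment_patch` callable as a stand-in for the trained cell protein semantic segmentation model; the helper names are illustrative, not taken from the application.

```python
import numpy as np


def split_volume(volume, window):
    """S1: cut a 3D volume into non-overlapping sub-volumes (stride = window)."""
    dz, dy, dx = window
    depth, height, width = volume.shape
    if depth % dz or height % dy or width % dx:
        raise ValueError("this sketch assumes the window tiles the volume exactly")
    patches, origins = [], []
    for z in range(0, depth, dz):
        for y in range(0, height, dy):
            for x in range(0, width, dx):
                patches.append(volume[z:z + dz, y:y + dy, x:x + dx])
                origins.append((z, y, x))
    return patches, origins


def splice(results, origins, shape):
    """S3: write each per-patch label map back where its patch was cut from."""
    out = np.zeros(shape, dtype=results[0].dtype)
    for labels, (z, y, x) in zip(results, origins):
        dz, dy, dx = labels.shape
        out[z:z + dz, y:y + dy, x:x + dx] = labels
    return out


def segment_volume(volume, window, segment_patch):
    patches, origins = split_volume(volume, window)   # S1
    result_set = [segment_patch(p) for p in patches]  # S2: one result per patch
    return splice(result_set, origins, volume.shape)  # S3


# Usage with a stand-in "model" that labels positive voxels as class 1.
volume = np.random.default_rng(0).normal(size=(8, 16, 16))
labels = segment_volume(volume, (4, 8, 8), lambda p: (p > 0).astype(np.int64))
print(labels.shape)  # (8, 16, 16)
```

Because the stand-in acts voxel by voxel, splicing reproduces exactly what whole-volume processing would give. A convolutional network sees less context near patch borders, which is one reason overlapping windows are often used in practice.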
In this embodiment, a plurality of electron microtomography data to be analyzed are obtained by segmenting the same piece of cell electron microtomography data; the cell protein semantic segmentation model performs protein semantic segmentation on each of them to obtain a set of protein semantic segmentation results, and that set is spliced into the target protein semantic segmentation result. This reduces the amount of data the model processes in each protein semantic segmentation pass and lowers the GPU hardware requirements of model training. Because the cell protein semantic segmentation model is the generation network obtained by alternating adversarial and semi-supervised training with a discriminant network, adversarial training effectively improves performance when training data are scarce and strengthens generalization, while semi-supervised training uses unlabeled data to further improve the model.
For S1, the plurality of electron microtomography data blocks to be analyzed, obtained by splitting the same cell electron microtomography data, may be input by a user, retrieved from a database, or sent by a third-party application system.
The cell electron microtomography data to be split is electron microtomography data acquired from the tissue structure of cells. It contains the spatial distribution of 12 proteins.
A sliding window is used to split the cell electron microtomography data, and the data cut out at each window position is treated as one electron microtomography data block to be analyzed. For example, if the cell electron microtomography data measures 200×512×512, the sliding window may split it into small volumes of 50×64×64; that is, each block to be analyzed measures 50×64×64. These sizes are illustrative and not limiting.
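The sliding-window split can be sketched as below. This is a minimal illustration, not the applicant's code: it only computes the corner coordinate of every window. The stride is an assumption, since the source gives the volume and window sizes but no stride; by default the windows do not overlap, and the last window on an axis is shifted so that it ends at the volume edge. The function names are hypothetical.

```python
from itertools import product

def window_starts(length, size, stride):
    """Start offsets along one axis; the last window is shifted to end at the edge."""
    if size >= length:
        return [0]  # a single window covers the whole axis
    starts = list(range(0, length - size + 1, stride))
    if starts[-1] != length - size:
        starts.append(length - size)  # one extra window so the tail is covered
    return starts

def tile_origins(volume_shape, window, stride=None):
    """Return the (z, y, x) corner of every sub-volume cut by a 3-D sliding window."""
    stride = stride or window  # assumption: non-overlapping windows by default
    per_axis = [window_starts(n, w, s) for n, w, s in zip(volume_shape, window, stride)]
    return list(product(*per_axis))

origins = tile_origins((200, 512, 512), (50, 64, 64))
print(len(origins))  # 256 (4 x 8 x 8 windows)
print(origins[:2])   # [(0, 0, 0), (0, 0, 64)]
```

Each origin is exactly the kind of position data that the stitching in S3 needs, so it is natural to keep it alongside the block it describes.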
For S2, each electron microtomography data block to be analyzed is input into the cellular protein semantic segmentation model for protein semantic segmentation. The model outputs one protein semantic segmentation result per block, and all of these results together form the set of protein semantic segmentation results for the blocks.
Alternating adversarial and semi-supervised training of the generator and discriminator networks means that the two networks repeatedly go through one round of adversarial training followed by one round of semi-supervised learning until a convergence condition is met. Adversarial training improves performance when little training data is available and improves generalization; semi-supervised learning uses unlabeled data to improve the model further.
Any existing network capable of semantic segmentation may be chosen as the generator network.
Any existing network suitable for adversarial training may be chosen as the discriminator network.
The protein semantic segmentation result is the protein classification of each voxel in the electron microtomography data block to be analyzed. For example, each voxel is assigned one of the 12 proteins; this example is not limiting.
For S3, the position data carried by each electron microtomography data block to be analyzed is used to stitch all protein semantic segmentation results in the set back together, yielding the target protein semantic segmentation result. In other words, the target protein semantic segmentation result is the protein semantic segmentation result of the original cell electron microtomography data before it was split.
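Stitching with the recorded positions can be sketched as below, assuming each block's result is stored together with the (z, y, x) corner at which the block was cut. The source does not say how overlapping windows are merged; the per-voxel majority vote used here is an assumption, and `stitch` is a hypothetical name. Voxels not covered by any block are left as `None`.

```python
from collections import Counter

def stitch(volume_shape, tiles):
    """Reassemble per-block voxel labels into one label volume.

    tiles: list of (origin, labels), where origin is the (z, y, x) corner recorded
    when the block was cut and labels is a nested list [dz][dy][dx] of class ids.
    Overlapping voxels are resolved by majority vote (an assumption).
    """
    Z, Y, X = volume_shape
    votes = [[[Counter() for _ in range(X)] for _ in range(Y)] for _ in range(Z)]
    for (z0, y0, x0), labels in tiles:
        for dz, plane in enumerate(labels):
            for dy, row in enumerate(plane):
                for dx, cls in enumerate(row):
                    votes[z0 + dz][y0 + dy][x0 + dx][cls] += 1
    # most_common(1) returns the class with the most votes at each voxel
    return [[[c.most_common(1)[0][0] if c else None for c in row] for row in plane]
            for plane in votes]

print(stitch((1, 1, 4), [((0, 0, 0), [[[1, 1]]]), ((0, 0, 2), [[[3, 3]]])]))
# [[[1, 1, 3, 3]]]
```

With non-overlapping windows every voxel receives exactly one vote, so the vote reduces to copying each block's labels into place.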
In one embodiment, before the step of using the cellular protein semantic segmentation model to perform protein semantic segmentation on each electron microtomography data block to be analyzed and obtain the set of protein semantic segmentation results, the method includes:
S021: obtaining a labeled training sample set and an unlabeled training sample set derived from a single cell electron microtomography training sample to be split;
S022: taking one labeled training sample from the labeled training sample set as the target labeled training sample, and one unlabeled training sample from the unlabeled training sample set as the target unlabeled training sample;
S023: performing adversarial training on the generator network and the discriminator network using the target labeled training sample, where the generator network is the U-net++ segmentation network and the discriminator network is a fully convolutional discriminator;
S024: performing semi-supervised training on the adversarially trained generator network using the target unlabeled training sample and the adversarially trained discriminator network;
S025: repeating from the step of taking a target labeled training sample and a target unlabeled training sample until both the adversarial and the semi-supervised parts of the alternating training reach the convergence condition, and taking the generator network at that point as the cellular protein semantic segmentation model.
This embodiment trains the generator and discriminator networks by alternating adversarial and semi-supervised learning, and uses the resulting generator network as the cellular protein semantic segmentation model. Adversarial training improves performance when little training data is available and improves generalization, and semi-supervised learning exploits unlabeled data. Because the two alternate, the result of each adversarial training step is refined once by a semi-supervised step, which further improves model accuracy.
For S021, the labeled and unlabeled training sample sets derived from the same cell electron microtomography training sample may be input by a user, retrieved from a database, or sent by a third-party application system.
Optionally, the labeled training sample set contains more samples than the unlabeled training sample set.
Each labeled training sample consists of one piece of electron microtomography sample data and one piece of protein annotation data, where the protein annotation data gives the protein class of every voxel in that sample data.
Each unlabeled training sample consists of electron microtomography sample data only.
For S022, labeled training samples are taken one at a time, in order, from the labeled training sample set as the target labeled training sample, and unlabeled training samples are taken one at a time, in order, from the unlabeled training sample set as the target unlabeled training sample.
For S023, any existing method may be used to adversarially train the generator and discriminator networks with the electron microtomography sample data and protein annotation data of the target labeled training sample. Adversarial training improves performance when little training data is available and improves generalization.
Optionally, the generator network is the U-net++ segmentation network, which consists, in order, of 4 convolutional layers, 4 deconvolution layers, and 12 convolution kernels of size 1×1. The 4 convolutional layers extract features, the 4 deconvolution layers restore the spatial resolution, and the 12 1×1 kernels produce the classification probabilities of the 12 categories (that is, of the 12 proteins).
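The role of the 12 1×1 kernels can be illustrated without a deep-learning framework: a 1×1 convolution applies the same linear map to every voxel's feature vector, giving 12 scores, and a softmax (assumed here; the source only says "classification probabilities") turns them into class probabilities. The per-voxel class is then the most probable one. The sketch uses made-up weights, does not model the convolutional or deconvolution layers, and its function names are hypothetical.

```python
import math

NUM_CLASSES = 12  # the 12 protein categories named in the source

def conv1x1(features, weights, bias):
    """1x1 convolution: apply the same linear map to each voxel's feature vector.

    features: list of per-voxel feature vectors (length F)
    weights:  NUM_CLASSES rows of length F, one row per 1x1 kernel
    """
    return [[sum(w * f for w, f in zip(row, feat)) + b for row, b in zip(weights, bias)]
            for feat in features]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_voxels(features, weights, bias):
    """Return per-voxel class probabilities and the most probable class id."""
    probs = [softmax(scores) for scores in conv1x1(features, weights, bias)]
    labels = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return probs, labels

weights = [[float(k), 0.0] for k in range(NUM_CLASSES)]  # toy 2-channel features
probs, labels = classify_voxels([[1.0, 0.0], [-1.0, 0.0]], weights, [0.0] * NUM_CLASSES)
print(labels)  # [11, 0]
```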
For S024, any existing method may be used to perform semi-supervised training of the adversarially trained generator network using the target unlabeled training sample and the adversarially trained discriminator network, so that unlabeled data contribute to model performance.
For S025, steps S022 to S025 are repeated until both the adversarial and the semi-supervised parts of the alternating training reach the convergence condition.
The alternating training is considered converged when either the loss values of adversarial training and of semi-supervised training all satisfy the first convergence condition, or the number of rounds of alternating training satisfies the second convergence condition.
Here, the loss values of adversarial training and semi-supervised training all satisfying the first convergence condition means that the generator loss from adversarial training, the discriminator loss from adversarial training, and the semi-supervised loss all satisfy the first convergence condition.
The first convergence condition is that the loss values computed for the same network in two consecutive iterations satisfy the Lipschitz (continuity) condition.
The second convergence condition concerns the number of times the generator and discriminator networks have been used for alternating adversarial and semi-supervised training: each round of alternating training increases this count by 1.
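The alternation and stopping rule in S022 to S025 can be outlined as a loop. This is a hedged sketch, not the applicant's code: `adversarial_step` and `semi_step` are hypothetical callables standing in for the S023 and S024 updates, samples are taken in order and reused cyclically (the source only says they are taken in order), and the Lipschitz condition on consecutive losses is approximated as every loss changing by at most a tolerance `tol`, which is an assumption.

```python
from itertools import cycle

def alternate_training(labeled, unlabeled, adversarial_step, semi_step,
                       tol=1e-4, max_rounds=10_000):
    """Alternate one adversarial step and one semi-supervised step until convergence.

    adversarial_step(sample) -> (generator_loss, discriminator_loss)
    semi_step(sample) -> semi_supervised_loss
    Both are hypothetical callables that update the networks in place.
    """
    prev = None
    rounds = 0
    for lab, unl in zip(cycle(labeled), cycle(unlabeled)):
        losses = (*adversarial_step(lab), semi_step(unl))
        rounds += 1  # one adversarial + one semi-supervised step = one round
        # first condition (approximated): all three losses have stopped moving
        if prev is not None and all(abs(a - b) <= tol for a, b in zip(losses, prev)):
            return rounds, "loss"
        # second condition: the round count reached its limit
        if rounds >= max_rounds:
            return rounds, "max_rounds"
        prev = losses
    return rounds, "exhausted"  # only reached when a sample list is empty
```

Returning the reason for stopping makes it easy to tell whether training ended because the losses settled or because the round budget ran out.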
In one embodiment, the step of obtaining the labeled and unlabeled training sample sets from the same cell electron microtomography training sample includes:
S0211: obtaining the cell electron microtomography training sample;
S0212: splitting the cell electron microtomography training sample with a sliding window to obtain a plurality of pieces of electron microtomography sample data;
S0213: dividing the pieces of electron microtomography sample data according to a preset ratio into a to-be-labeled training sample set and an unlabeled training sample set, where the to-be-labeled training sample set contains more sample data than the unlabeled training sample set;
S0214: annotating each piece of electron microtomography sample data in the to-be-labeled training sample set with protein semantic segmentation labels, to obtain the labeled training sample set.
This embodiment derives both the labeled and the unlabeled training sample sets from a single cell electron microtomography training sample, so the model can be trained even when little data is available.
For S0211, the cell electron microtomography training sample may be input by a user, retrieved from a database, or sent by a third-party application system.
The cell electron microtomography training sample is electron microtomography data acquired from the tissue structure of cells. It contains the spatial distribution of 12 proteins.
For S0212, the cell electron microtomography training sample is split with a sliding window, and the data cut out at each window position is one piece of electron microtomography sample data.
For S0213, dividing by a preset ratio means that one portion of the electron microtomography sample data forms the to-be-labeled training sample set and the remaining portion forms the unlabeled training sample set. Each piece of sample data belongs to exactly one of the two sets.
Optionally, the preset ratio is 85:15: 85% of the electron microtomography sample data forms the to-be-labeled training sample set, and the remaining 15% forms the unlabeled training sample set.
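An 85:15 division into disjoint sets can be sketched as below. Random assignment with a fixed seed is an assumption; the source fixes only the ratio and requires each piece of sample data to land in exactly one set. `split_samples` is a hypothetical name.

```python
import random

def split_samples(samples, labeled_fraction=0.85, seed=0):
    """Divide samples into disjoint to-be-labeled and unlabeled sets.

    Shuffling with a seeded generator (an assumption) makes the split reproducible.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * labeled_fraction)
    return shuffled[:cut], shuffled[cut:]

to_label, unlabeled = split_samples(range(256))
print(len(to_label), len(unlabeled))  # 218 38
```

Slicing one shuffled list at a single cut point guarantees the two sets are disjoint and together cover every sample.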
For S0214, each piece of electron microtomography sample data in the to-be-labeled training sample set is annotated with protein semantic segmentation labels, and the annotation of one piece of sample data serves as its protein annotation data. All sample data in the to-be-labeled training sample set, together with their corresponding protein annotation data, form the labeled training sample set.
In one embodiment, the step of adversarially training the generator and discriminator networks using the target labeled training sample includes:
S0231: inputting the electron microtomography sample data of the target labeled training sample into the generator network for protein semantic segmentation, to obtain a first training result;
S0232: inputting the protein annotation data of the target labeled training sample and the first training result into the discriminator network for discrimination, to obtain a first confidence result;
S0233: adversarially training the generator and discriminator networks using the protein annotation data of the target labeled training sample, the first training result, and the first confidence result.
Through adversarial training, this embodiment improves performance when little training data is available and improves generalization.
For S0231, the electron microtomography sample data of the target labeled training sample is segmented by the generator network, and the segmentation output is the first training result. That is, the first training result is the protein classification of every voxel in that sample data.
For S0232, the protein annotation data of the target labeled training sample and the first training result are input into the discriminator network, yielding the first confidence result corresponding to the first training result. Each entry of the first confidence result is the confidence of the corresponding protein classification in the first training result.
For S0233, the generator network is trained with the protein annotation data of the target labeled training sample and the first training result, updating the generator parameters once; the discriminator network is trained with the first confidence result, updating the discriminator parameters once.
In one embodiment, the step of adversarially training the generator and discriminator networks using the protein annotation data of the target labeled training sample, the first training result, and the first confidence result includes:
S02331: feeding the protein annotation data of the target labeled training sample and the first training result into a first loss function to obtain a first loss value for the generator network, and updating the generator parameters according to the first loss value;
S02332: feeding the first confidence result into a second loss function to obtain a second loss value for the discriminator network, and updating the discriminator parameters according to the second loss value;
where the first loss function $L_{ce}$ is:

$$L_{ce} = -\sum_{h,w}\sum_{c=1}^{C} Y_n^{(h,w,c)} \log\left(S(X_n)^{(h,w,c)}\right)$$

and the second loss function $L_{adv}$ is:

$$L_{adv} = -\sum_{h,w} \log\left(D(S(X_n))^{(h,w)}\right)$$

Here $X_n$ is the electron microtomography sample data of the target labeled training sample; $h$, $w$ and $c$ index the width, height and channel dimensions of that sample data; $S(X_n)^{(h,w,c)}$ is the first training result; $\log()$ is the logarithm; $Y_n^{(h,w,c)}$ is the protein annotation data of the target labeled training sample; $C$ is the number of cellular protein classes; and $D(S(X_n))^{(h,w)}$ is the first confidence result.
This embodiment uses semi-supervised learning to improve model performance with unlabeled data.
For S02331, any existing method may be used to update the generator parameters according to the first loss value; details are omitted here.
For S02332, any existing method may be used to update the discriminator parameters according to the second loss value; details are omitted here.
The electron microtomography sample data of the target labeled training sample has size h×w×c, where h and w are its spatial extents and c is its number of channels.
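The arithmetic of the two losses in S02331 and S02332 can be sketched in plain Python, assuming the common forms used in adversarial semi-supervised segmentation: a per-position cross-entropy, $L_{ce} = -\sum Y \log S$, and $L_{adv} = -\sum \log D(S(X_n))$. Spatial positions are flattened into one list (an assumption for brevity), and `EPS` is an implementation guard, not part of the source. A real implementation would compute these with a deep-learning framework so that gradients can flow; this only shows the values.

```python
import math

EPS = 1e-12  # guards against log(0); not part of the source formulas

def ce_loss(pred, target):
    """L_ce = -sum over positions and classes of Y * log(S).

    pred[i][c]:   generator probability S(X_n) for class c at position i
    target[i][c]: one-hot protein annotation Y_n at position i
    """
    return -sum(y * math.log(p + EPS)
                for p_vec, y_vec in zip(pred, target)
                for p, y in zip(p_vec, y_vec))

def adv_loss(conf):
    """L_adv = -sum over positions of log D(S(X_n)); conf[i] is the discriminator output."""
    return -sum(math.log(d + EPS) for d in conf)

print(round(ce_loss([[0.5, 0.5], [0.9, 0.1]], [[1, 0], [1, 0]]), 4))  # 0.7985
```

Because the annotation is one-hot, only the log-probability of the annotated class contributes at each position.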
In one embodiment, the step of performing semi-supervised training of the adversarially trained generator network using the target unlabeled training sample and the adversarially trained discriminator network includes:
S0241: inputting the electron microtomography sample data of the target unlabeled training sample into the generator network for protein semantic segmentation, to obtain a second training result;
S0242: inputting the second training result into the discriminator network for discrimination, to obtain a second confidence result;
S0243: determining, from the second confidence result, the trustworthy result corresponding to the target unlabeled training sample;
S0244: performing semi-supervised training of the generator network using the trustworthy result corresponding to the target unlabeled training sample and the second training result.
This embodiment thus uses semi-supervised learning to improve model performance with unlabeled data.
For S0241, the electron microtomography sample data of the target unlabeled training sample is segmented by the generator network, and the segmentation output is the second training result. That is, the second training result is the protein classification of every voxel in that sample data.
For S0242, the second training result is input into the discriminator network to obtain the second confidence result. Each entry of the second confidence result is the confidence of the corresponding protein classification in the second training result.
For S0243, the second confidence result is binarized, and the entries of the binarized second confidence result that satisfy the confidence threshold form the trustworthy result corresponding to the target unlabeled training sample.
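The binarization in S0243 can be sketched as a per-voxel threshold test. Using a strict "greater than" comparison is an assumption (the source only says "satisfies the confidence threshold"), and `trust_mask` is a hypothetical name.

```python
def trust_mask(conf, threshold):
    """Binarize the discriminator's confidence map.

    conf: per-voxel confidences D(S(X_n)) in [0, 1]
    Returns 1 where the confidence exceeds the threshold (trusted), else 0.
    """
    return [1 if d > threshold else 0 for d in conf]

mask = trust_mask([0.95, 0.40, 0.81, 0.10], threshold=0.8)
print(mask, sum(mask))  # [1, 0, 1, 0] 2
```

Only the voxels marked 1 would then contribute to the semi-supervised update in S0244.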
For S0244, the generator network is trained in a semi-supervised manner using the trustworthy result corresponding to the target unlabeled training sample and the second training result, updating the generator parameters once.
In one embodiment, the step of performing semi-supervised training of the generator network using the trustworthy result corresponding to the target unlabeled training sample and the second training result includes:
S02441: feeding the trustworthy result corresponding to the target unlabeled training sample and the second training result into a third loss function to obtain a third loss value for the generator network, and updating the generator parameters according to the third loss value;
where the third loss function $L_{semi}$ is:

$$L_{semi} = -\sum_{h,w}\sum_{c=1}^{C} I\left(D(S(X_n))^{(h,w)} > T_{semi}\right) \cdot \hat{Y}_n^{(h,w,c)} \log\left(S(X_n)^{(h,w,c)}\right)$$

Here $X_n$ is the electron microtomography sample data of the target unlabeled training sample, of size h×w×c; $S(X_n)^{(h,w,c)}$ is the second training result; $D(S(X_n))^{(h,w)}$ is the trustworthy result corresponding to the target unlabeled training sample; $\log()$ is the logarithm; $T_{semi}$ is a threshold controlling the sensitivity of the self-learning process; $\hat{Y}_n^{(h,w,c)}$ is the self-learning target value; and $I()$ is the indicator function. The self-learning target value $\hat{Y}_n^{(h,w,c)}$ and the indicator function $I()$ are treated as constants.
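The semi-supervised loss in S02441 can be sketched in plain Python. The source only calls $\hat{Y}_n$ the self-learning target value; taking it as the one-hot encoding of the generator's own most probable class is an assumption borrowed from the common adversarial semi-supervised segmentation formulation. Positions are flattened into one list, `EPS` is an implementation guard, and the function names are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0)

def pseudo_label(p_vec):
    """Self-learning target (assumed): index of the generator's most probable class."""
    return max(range(len(p_vec)), key=p_vec.__getitem__)

def semi_loss(pred, conf, t_semi):
    """L_semi = -sum_i sum_c I(D_i > T_semi) * Yhat_ic * log S_ic.

    pred[i]: class probabilities S(X_n) at position i
    conf[i]: discriminator confidence D(S(X_n)) at position i
    Yhat and I() are plain numbers here, mirroring that they are treated as constants.
    """
    total = 0.0
    for p_vec, d in zip(pred, conf):
        if d <= t_semi:  # indicator is 0: untrusted position contributes nothing
            continue
        k = pseudo_label(p_vec)
        total -= math.log(p_vec[k] + EPS)  # one-hot Yhat keeps only the class-k term
    return total

print(round(semi_loss([[0.8, 0.2], [0.6, 0.4]], [0.9, 0.1], t_semi=0.2), 4))  # 0.2231
```

Raising $T_{semi}$ makes the update more conservative, since fewer positions pass the indicator.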
This embodiment thus uses semi-supervised learning to improve model performance with unlabeled data.
For S02441, any existing method may be used to update the generator parameters according to the third loss value; details are omitted here.
The threshold controlling the sensitivity of the self-learning process may be obtained from a database or from a third-party application system, or written into the program file that implements this application.
Referring to FIG. 2, this application also provides a semantic segmentation apparatus for electron microtomography data, the apparatus comprising:
a data acquisition module 100, configured to obtain a plurality of electron microtomography data blocks to be analyzed, obtained by splitting the same cell electron microtomography data;
a protein semantic segmentation module 200, configured to perform protein semantic segmentation on each electron microtomography data block to be analyzed using a cellular protein semantic segmentation model, to obtain the set of protein semantic segmentation results for the blocks, where the generator and discriminator networks are trained by alternating adversarial and semi-supervised learning and the resulting generator network is used as the cellular protein semantic segmentation model;
a data stitching module 300, configured to stitch the set of protein semantic segmentation results, to obtain the target protein semantic segmentation result for the blocks.
This application also provides a computer device comprising a memory and a processor, where the memory stores a computer program and the processor implements the steps of any of the above methods when executing the computer program.
This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of any of the above methods.
In this embodiment, a plurality of electron microtomography data to be analyzed are obtained by splitting the same piece of cell electron microtomography data; the cellular protein semantic segmentation model performs protein semantic segmentation on each of them to obtain a corresponding set of protein semantic segmentation results; and these results are stitched to obtain the target protein semantic segmentation result. This reduces the amount of data processed in each pass of the cellular protein semantic segmentation model and lowers the GPU hardware requirements for model training. In addition, the generator network and the discriminator network undergo alternating adversarial and semi-supervised training, and the resulting generator network is used as the cellular protein semantic segmentation model. Adversarial training effectively improves performance when training data is limited and enhances the generalization of the model, while semi-supervised training uses unlabeled data to further improve model performance.
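The adversarial and semi-supervised objectives referred to above correspond to the first, second, and third loss functions in claims 5 and 7 (L_ce, L_adv, L_semi). The NumPy sketch below evaluates them for a single 2-D sample. It assumes the standard adversarial semi-supervised segmentation form (pixel-wise cross-entropy, a summed log-confidence term, and a confidence-masked self-training cross-entropy), a channels-last (h, w, C) layout, one-hot labels, a self-learning target taken as the one-hot argmax of S(X_n), and an illustrative T_semi of 0.2. None of these specifics, nor the `EPS` guard, is stated in the text.

```python
import numpy as np

EPS = 1e-12  # numerical guard against log(0); an assumption, not from the text


def loss_ce(S: np.ndarray, Y: np.ndarray) -> float:
    """First loss L_ce: S (generator softmax) and Y (one-hot labels), shape (h, w, C)."""
    return float(-np.sum(Y * np.log(S + EPS)))


def loss_adv(D_out: np.ndarray) -> float:
    """Second loss L_adv: D_out is the discriminator confidence map, shape (h, w)."""
    return float(-np.sum(np.log(D_out + EPS)))


def loss_semi(S: np.ndarray, D_out: np.ndarray, t_semi: float = 0.2) -> float:
    """Third loss L_semi on an unlabeled sample (t_semi value is illustrative)."""
    num_classes = S.shape[-1]
    # Assumed self-learning target: one-hot argmax of the generator output.
    y_hat = np.eye(num_classes)[S.argmax(axis=-1)]            # (h, w, C)
    # Indicator I(D(S(X_n))^(h,w) > T_semi), broadcast over the class axis.
    mask = (D_out > t_semi).astype(S.dtype)[..., None]         # (h, w, 1)
    return float(-np.sum(mask * y_hat * np.log(S + EPS)))


if __name__ == "__main__":
    S = np.full((2, 2, 2), 0.5)
    Y = np.zeros((2, 2, 2))
    Y[..., 0] = 1.0
    print(loss_ce(S, Y), loss_adv(np.full((2, 2), 0.9)), loss_semi(S, np.ones((2, 2))))
```

Because the self-learning target and the indicator are treated as constants (as claim 7 states), L_semi reduces to a cross-entropy restricted to pixels the discriminator trusts. In an autodiff framework, the mask and pseudo-label would be computed outside the gradient path.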
Referring to FIG. 3, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data used by the method for semantic segmentation of electron microtomography data. The network interface of the computer device communicates with external terminals over a network connection. When executed by the processor, the computer program implements a method for semantic segmentation of electron microtomography data.
The method for semantic segmentation of electron microtomography data includes: acquiring a plurality of electron microtomography data to be analyzed, obtained by splitting the same piece of cell electron microtomography data to be split; performing, using a cellular protein semantic segmentation model, protein semantic segmentation on each of the plurality of electron microtomography data to be analyzed, to obtain a set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, wherein a generator network and a discriminator network undergo alternating adversarial and semi-supervised training, and the generator network obtained from this alternating training is used as the cellular protein semantic segmentation model; and stitching the set of protein semantic segmentation results to obtain the target protein semantic segmentation result corresponding to the plurality of electron microtomography data to be analyzed.
In this embodiment, a plurality of electron microtomography data to be analyzed are obtained by splitting the same piece of cell electron microtomography data; the cellular protein semantic segmentation model performs protein semantic segmentation on each of them to obtain a corresponding set of protein semantic segmentation results; and these results are stitched to obtain the target protein semantic segmentation result. This reduces the amount of data processed in each pass of the cellular protein semantic segmentation model and lowers the GPU hardware requirements for model training. In addition, the generator network and the discriminator network undergo alternating adversarial and semi-supervised training, and the resulting generator network is used as the cellular protein semantic segmentation model. Adversarial training effectively improves performance when training data is limited and enhances the generalization of the model, while semi-supervised training uses unlabeled data to further improve model performance.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements a method for semantic segmentation of electron microtomography data, including the steps of: acquiring a plurality of electron microtomography data to be analyzed, obtained by splitting the same piece of cell electron microtomography data to be split; performing, using a cellular protein semantic segmentation model, protein semantic segmentation on each of the plurality of electron microtomography data to be analyzed, to obtain a set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, wherein a generator network and a discriminator network undergo alternating adversarial and semi-supervised training, and the generator network obtained from this alternating training is used as the cellular protein semantic segmentation model; and stitching the set of protein semantic segmentation results to obtain the target protein semantic segmentation result corresponding to the plurality of electron microtomography data to be analyzed.
In the method for semantic segmentation of electron microtomography data executed above, a plurality of electron microtomography data to be analyzed are obtained by splitting the same piece of cell electron microtomography data; the cellular protein semantic segmentation model performs protein semantic segmentation on each of them to obtain a corresponding set of protein semantic segmentation results; and these results are stitched to obtain the target protein semantic segmentation result. This reduces the amount of data processed in each pass of the cellular protein semantic segmentation model and lowers the GPU hardware requirements for model training. In addition, the generator network and the discriminator network undergo alternating adversarial and semi-supervised training, and the resulting generator network is used as the cellular protein semantic segmentation model. Adversarial training effectively improves performance when training data is limited and enhances the generalization of the model, while semi-supervised training uses unlabeled data to further improve model performance.
The computer storage medium may be non-volatile or volatile.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, herein, the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, apparatus, article, or method that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, apparatus, article, or method. Without further limitation, an element defined by the phrase "including a..." does not exclude the presence of additional identical elements in the process, apparatus, article, or method that includes the element.
The above are merely preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (20)

  1. A method for semantic segmentation of electron microtomography data, wherein the method comprises:
    acquiring a plurality of electron microtomography data to be analyzed, obtained by splitting the same piece of cell electron microtomography data to be split;
    performing, using a cellular protein semantic segmentation model, protein semantic segmentation on each of the plurality of electron microtomography data to be analyzed, to obtain a set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, wherein a generator network and a discriminator network undergo alternating adversarial and semi-supervised training, and the generator network obtained from the alternating adversarial and semi-supervised training is used as the cellular protein semantic segmentation model; and
    stitching the set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, to obtain a target protein semantic segmentation result corresponding to the plurality of electron microtomography data to be analyzed.
  2. The method for semantic segmentation of electron microtomography data according to claim 1, wherein, before the step of performing, using a cellular protein semantic segmentation model, protein semantic segmentation on each of the plurality of electron microtomography data to be analyzed to obtain a set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, the method comprises:
    acquiring a labeled training sample set and an unlabeled training sample set obtained from the same training sample of cell electron microtomography data to be split;
    taking one labeled training sample from the labeled training sample set as a target labeled training sample, and taking one unlabeled training sample from the unlabeled training sample set as a target unlabeled training sample;
    performing adversarial training on the generator network and the discriminator network according to the target labeled training sample, wherein the generator network is the segmentation network U-net++ and the discriminator network is a fully convolutional discriminator;
    performing semi-supervised training on the adversarially trained generator network according to the target unlabeled training sample and the adversarially trained discriminator network; and
    repeating the step of taking one labeled training sample from the labeled training sample set as the target labeled training sample and taking one unlabeled training sample from the unlabeled training sample set as the target unlabeled training sample, until both the adversarial training and the semi-supervised training of the alternating training reach a convergence condition, and determining the generator network for which both have reached the convergence condition as the cellular protein semantic segmentation model.
  3. The method for semantic segmentation of electron microtomography data according to claim 2, wherein the step of acquiring a labeled training sample set and an unlabeled training sample set obtained from the same training sample of cell electron microtomography data to be split comprises:
    acquiring the cell electron microtomography data training sample;
    splitting the cell electron microtomography data training sample using a sliding window, to obtain a plurality of electron microtomography sample data;
    dividing the plurality of electron microtomography sample data according to a preset ratio, to obtain a to-be-labeled training sample set and an unlabeled training sample set, wherein the number of electron microtomography sample data in the to-be-labeled training sample set is greater than the number of electron microtomography sample data in the unlabeled training sample set; and
    annotating each electron microtomography sample data in the to-be-labeled training sample set with protein semantic segmentation labels, to obtain the labeled training sample set.
  4. The method for semantic segmentation of electron microtomography data according to claim 2, wherein the step of performing adversarial training on the generator network and the discriminator network according to the target labeled training sample comprises:
    inputting the electron microtomography sample data of the target labeled training sample into the generator network for protein semantic segmentation, to obtain a first training result;
    inputting the protein annotation data of the target labeled training sample and the first training result into the discriminator network for discrimination, to obtain a first confidence result; and
    performing adversarial training on the generator network and the discriminator network using the protein annotation data of the target labeled training sample, the first training result, and the first confidence result.
  5. The method for semantic segmentation of electron microtomography data according to claim 4, wherein the step of performing adversarial training on the generator network and the discriminator network using the protein annotation data of the target labeled training sample, the first training result, and the first confidence result comprises:
    inputting the protein annotation data of the target labeled training sample and the first training result into a first loss function to obtain a first loss value of the generator network, and updating the parameters of the generator network according to the first loss value;
    inputting the first confidence result into a second loss function to obtain a second loss value of the discriminator network, and updating the parameters of the discriminator network according to the second loss value;
    wherein the first loss function L_ce is calculated as:
    $$L_{ce} = -\sum_{h,w}\sum_{c} Y_n^{(h,w,c)} \log\left(S(X_n)^{(h,w,c)}\right)$$
    the second loss function L_adv is calculated as:
    $$L_{adv} = -\sum_{h,w} \log\left(D(S(X_n))^{(h,w)}\right)$$
    X_n is the electron microtomography sample data of the target labeled training sample; h is the width, w is the height, and c is the number of channels of the size of the electron microtomography sample data of the target labeled training sample; S(X_n)^{(h,w,c)} is the first training result; log() is the logarithmic function; Y_n^{(h,w,c)} is the protein annotation data of the target labeled training sample; C is the number of cellular protein types; and D(S(X_n))^{(h,w)} is the first confidence result.
  6. The method for semantic segmentation of electron microtomography data according to claim 2, wherein the step of performing semi-supervised training on the adversarially trained generator network according to the target unlabeled training sample and the adversarially trained discriminator network comprises:
    inputting the electron microtomography sample data of the target unlabeled training sample into the generator network for protein semantic segmentation, to obtain a second training result;
    inputting the second training result into the discriminator network for discrimination, to obtain a second confidence result;
    determining, according to the second confidence result, a trusted result corresponding to the target unlabeled training sample; and
    performing semi-supervised training on the generator network using the trusted result corresponding to the target unlabeled training sample and the second training result.
  7. The method for semantic segmentation of electron microtomography data according to claim 6, wherein the step of performing semi-supervised training on the generator network using the trusted result corresponding to the target unlabeled training sample and the second training result comprises:
    inputting the trusted result corresponding to the target unlabeled training sample and the second training result into a third loss function to obtain a third loss value of the generator network, and updating the parameters of the generator network according to the third loss value;
    wherein the third loss function L_semi is calculated as:
    $$L_{semi} = -\sum_{h,w}\sum_{c} I\left(D(S(X_n))^{(h,w)} > T_{semi}\right) \cdot \hat{Y}_n^{(h,w,c)} \log\left(S(X_n)^{(h,w,c)}\right)$$
    X_n is the electron microtomography sample data of the target unlabeled training sample; h×w×c is the size of the electron microtomography sample data of the target unlabeled training sample; S(X_n)^{(h,w,c)} is the second training result; D(S(X_n))^{(h,w)} is the trusted result corresponding to the target unlabeled training sample; log() is the logarithmic function; T_semi is the threshold that controls the sensitivity of the self-learning process; \hat{Y}_n is the self-learning target value; I() is the indicator function; and the self-learning target value \hat{Y}_n and the indicator function I() are constants.
  8. An apparatus for semantic segmentation of electron microtomography data, wherein the apparatus comprises:
    a data acquisition module, configured to acquire a plurality of electron microtomography data to be analyzed, obtained by splitting the same piece of cell electron microtomography data to be split;
    a protein semantic segmentation module, configured to perform, using a cellular protein semantic segmentation model, protein semantic segmentation on each of the plurality of electron microtomography data to be analyzed, to obtain a set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, wherein a generator network and a discriminator network undergo alternating adversarial and semi-supervised training, and the generator network obtained from the alternating adversarial and semi-supervised training is used as the cellular protein semantic segmentation model; and
    a data stitching module, configured to stitch the set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, to obtain a target protein semantic segmentation result corresponding to the plurality of electron microtomography data to be analyzed.
  9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the following method steps when executing the computer program:
    acquiring a plurality of electron microtomography data to be analyzed, obtained by splitting the same piece of cell electron microtomography data to be split;
    performing, using a cellular protein semantic segmentation model, protein semantic segmentation on each of the plurality of electron microtomography data to be analyzed, to obtain a set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, wherein a generator network and a discriminator network undergo alternating adversarial and semi-supervised training, and the generator network obtained from the alternating adversarial and semi-supervised training is used as the cellular protein semantic segmentation model; and
    stitching the set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, to obtain a target protein semantic segmentation result corresponding to the plurality of electron microtomography data to be analyzed.
  10. The computer device according to claim 9, wherein, before the step of performing, using a cellular protein semantic segmentation model, protein semantic segmentation on each of the plurality of electron microtomography data to be analyzed to obtain a set of protein semantic segmentation results corresponding to the plurality of electron microtomography data to be analyzed, the steps comprise:
    acquiring a labeled training sample set and an unlabeled training sample set obtained from the same training sample of cell electron microtomography data to be split;
    taking one labeled training sample from the labeled training sample set as a target labeled training sample, and taking one unlabeled training sample from the unlabeled training sample set as a target unlabeled training sample;
    performing adversarial training on the generator network and the discriminator network according to the target labeled training sample, wherein the generator network is the segmentation network U-net++ and the discriminator network is a fully convolutional discriminator;
    performing semi-supervised training on the adversarially trained generator network according to the target unlabeled training sample and the adversarially trained discriminator network; and
    repeating the step of taking one labeled training sample from the labeled training sample set as the target labeled training sample and taking one unlabeled training sample from the unlabeled training sample set as the target unlabeled training sample, until both the adversarial training and the semi-supervised training of the alternating training reach a convergence condition, and determining the generator network for which both have reached the convergence condition as the cellular protein semantic segmentation model.
  11. The computer device according to claim 10, wherein the step of obtaining the labeled training sample set and the unlabeled training sample set derived from the same cell electron microtomography training sample to be split comprises:
    obtaining the cell electron microtomography training sample;
    splitting the cell electron microtomography training sample using a sliding window to obtain a plurality of pieces of electron microtomography sample data;
    dividing the plurality of pieces of electron microtomography sample data according to a preset ratio into a to-be-labeled training sample set and an unlabeled training sample set, wherein the to-be-labeled training sample set contains more electron microtomography sample data than the unlabeled training sample set;
    annotating each piece of electron microtomography sample data in the to-be-labeled training sample set with protein semantic segmentation labels to obtain the labeled training sample set.
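The data preparation of claim 11 can be sketched as below, assuming the tomogram is a 3-D volume addressed as (z, y, x) and that consecutive windows are spaced by a stride. The helper names, the rule of shifting the last window back to the volume edge, and the 0.8 default ratio are illustrative assumptions; the claim only requires a sliding window and a preset ratio under which the to-be-labeled set is the larger one.

```python
import random
from typing import List, Sequence, Tuple


def window_starts(length: int, window: int, stride: int) -> List[int]:
    """Start offsets of a sliding window along one axis.

    If the last regular window stops short of the edge, one extra window is
    shifted back to end exactly at the edge (an assumed convention) so that
    every voxel is covered.
    """
    if window >= length:
        return [0]
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] + window < length:
        starts.append(length - window)
    return starts


def sliding_window_offsets(
    shape: Tuple[int, int, int],
    window: Tuple[int, int, int],
    stride: Tuple[int, int, int],
) -> List[Tuple[int, int, int]]:
    """(z, y, x) corner offsets of every sub-volume cut from a 3-D tomogram."""
    axes = [window_starts(n, w, s) for n, w, s in zip(shape, window, stride)]
    return [(z, y, x) for z in axes[0] for y in axes[1] for x in axes[2]]


def split_by_ratio(samples: Sequence, ratio_to_label: float = 0.8, seed: int = 0):
    """Shuffle, then split into a larger to-be-labeled set and a smaller
    unlabeled set according to a preset ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    k = round(len(shuffled) * ratio_to_label)
    return shuffled[:k], shuffled[k:]
```

For a 10-voxel axis with a 4-voxel window and stride 4, the offsets are 0, 4 and 6, so the last two windows overlap by two voxels; such overlaps have to be reconciled when the per-piece results are stitched back together.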
  12. The computer device according to claim 10, wherein the step of performing adversarial training on the generation network and the discriminant network according to the target labeled training sample comprises:
    inputting the electron microtomography sample data of the target labeled training sample into the generation network for protein semantic segmentation to obtain a first training result;
    inputting the protein annotation data of the target labeled training sample and the first training result into the discriminant network for discrimination to obtain a first confidence result;
    performing adversarial training on the generation network and the discriminant network using the protein annotation data of the target labeled training sample, the first training result, and the first confidence result.
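For the discriminant network to compare the protein annotation data with the first training result, both need the same per-pixel layout. Assuming the annotation is stored as a map of class indices (the claims do not fix a storage format), it can be one-hot encoded into the (h, w, c) form used by Y_n^(h,w,c) in claim 13. A minimal 2-D sketch with a hypothetical helper `one_hot`:

```python
from typing import List


def one_hot(label_map: List[List[int]], num_classes: int) -> List[List[List[float]]]:
    """Convert an (h, w) map of protein class indices into an (h, w, C)
    one-hot map matching the layout of the generator's probability output."""
    encoded = []
    for row in label_map:
        encoded_row = []
        for cls in row:
            if not 0 <= cls < num_classes:
                raise ValueError(f"class index {cls} outside 0..{num_classes - 1}")
            vec = [0.0] * num_classes
            vec[cls] = 1.0  # probability 1 for the annotated protein class
            encoded_row.append(vec)
        encoded.append(encoded_row)
    return encoded
```

Encoding the labels this way lets a single fully convolutional discriminator accept either a ground-truth map or a softmax output of identical shape.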
  13. The computer device according to claim 12, wherein the step of performing adversarial training on the generation network and the discriminant network using the protein annotation data of the target labeled training sample, the first training result, and the first confidence result comprises:
    inputting the protein annotation data of the target labeled training sample and the first training result into a first loss function to obtain a first loss value of the generation network, and updating the parameters of the generation network according to the first loss value;
    inputting the first confidence result into a second loss function to obtain a second loss value of the discriminant network, and updating the parameters of the discriminant network according to the second loss value;
    wherein the first loss function L_ce is calculated as:
    $$L_{ce} = -\sum_{h,w} \sum_{c \in C} Y_n^{(h,w,c)} \log\left(S(X_n)^{(h,w,c)}\right)$$
    the second loss function L_adv is calculated as:
    $$L_{adv} = -\sum_{h,w} \log\left(D(S(X_n))^{(h,w)}\right)$$
    where X_n is the electron microtomography sample data of the target labeled training sample; h is the width, w the height, and c the number of channels of the electron microtomography sample data of the target labeled training sample; S(X_n)^(h,w,c) is the first training result; log() is the logarithm function; Y_n^(h,w,c) is the protein annotation data of the target labeled training sample; C is the number of cell protein types; and D(S(X_n))^(h,w) is the first confidence result.
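The two losses of claim 13 can be computed as in the following sketch, written in plain Python over nested lists indexed [h][w][c] and [h][w]. It assumes the standard forms L_ce = -Σ_{h,w} Σ_c Y_n^(h,w,c) log S(X_n)^(h,w,c) and L_adv = -Σ_{h,w} log D(S(X_n))^(h,w), which are consistent with the variable definitions in claim 13. The function names are hypothetical, and the `eps` clamp that keeps log() finite is an addition not present in the claim.

```python
import math
from typing import List

Map3 = List[List[List[float]]]  # indexed [h][w][c]
Map2 = List[List[float]]        # indexed [h][w]


def cross_entropy_loss(pred: Map3, target: Map3, eps: float = 1e-12) -> float:
    """L_ce: per-pixel cross-entropy summed over pixels and protein classes.

    pred   -- S(X_n), the generator's class probabilities
    target -- Y_n, the one-hot protein annotation data
    """
    total = 0.0
    for pred_row, target_row in zip(pred, target):
        for p_vec, y_vec in zip(pred_row, target_row):
            for p, y in zip(p_vec, y_vec):
                total -= y * math.log(max(p, eps))
    return total


def adversarial_loss(confidence: Map2, eps: float = 1e-12) -> float:
    """L_adv: negative log of the discriminator's per-pixel confidence
    D(S(X_n)), summed over all pixels."""
    return -sum(math.log(max(d, eps)) for row in confidence for d in row)
```

Both functions return sums rather than means, mirroring the summations in the formulas; a practical implementation would usually normalize by the number of pixels.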
  14. The computer device according to claim 10, wherein the step of performing semi-supervised training on the adversarially trained generation network according to the target unlabeled training sample and the adversarially trained discriminant network comprises:
    inputting the electron microtomography sample data of the target unlabeled training sample into the generation network for protein semantic segmentation to obtain a second training result;
    inputting the second training result into the discriminant network for discrimination to obtain a second confidence result;
    determining, according to the second confidence result, a trustworthy result corresponding to the target unlabeled training sample;
    performing semi-supervised training on the generation network using the trustworthy result corresponding to the target unlabeled training sample and the second training result.
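Claim 14 leaves open how the trustworthy result is derived from the second confidence result. One common choice, assumed here rather than taken from the claim, is to threshold the discriminator's confidence map and use the generator's argmax predictions as pseudo-labels only where the confidence exceeds the threshold. The helper names and the 0.2 default threshold are hypothetical:

```python
import math
from typing import List, Tuple

ProbMap = List[List[List[float]]]  # [h][w][c] class probabilities
ConfMap = List[List[float]]        # [h][w] discriminator confidence


def trusted_pseudo_labels(
    probs: ProbMap, confidence: ConfMap, threshold: float = 0.2
) -> Tuple[List[List[int]], List[List[bool]]]:
    """Argmax pseudo-labels plus a mask of pixels whose confidence exceeds
    the threshold (the assumed 'trustworthy' region)."""
    labels = [[max(range(len(p)), key=p.__getitem__) for p in row] for row in probs]
    mask = [[d > threshold for d in row] for row in confidence]
    return labels, mask


def semi_supervised_loss(
    probs: ProbMap, confidence: ConfMap, threshold: float = 0.2, eps: float = 1e-12
) -> float:
    """Cross-entropy against the pseudo-labels, restricted to trusted pixels."""
    labels, mask = trusted_pseudo_labels(probs, confidence, threshold)
    total = 0.0
    for prob_row, label_row, mask_row in zip(probs, labels, mask):
        for p_vec, label, trusted in zip(prob_row, label_row, mask_row):
            if trusted:
                total -= math.log(max(p_vec[label], eps))
    return total
```

Masking out low-confidence pixels keeps the generator from reinforcing its own errors on unlabeled data; the threshold trades the amount of pseudo-labeled signal against its reliability.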
  15. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the following method steps:
    acquiring a plurality of pieces of electron microtomography data to be analyzed, obtained by splitting the same cell electron microtomography data to be split;
    performing protein semantic segmentation on each of the plurality of pieces of electron microtomography data to be analyzed using a cell protein semantic segmentation model, to obtain a set of protein semantic segmentation results corresponding to the plurality of pieces of electron microtomography data to be analyzed, wherein alternating adversarial and semi-supervised training is performed based on a generation network and a discriminant network, and the generation network obtained from this alternating training is used as the cell protein semantic segmentation model;
    stitching the data according to the set of protein semantic segmentation results corresponding to the plurality of pieces of electron microtomography data to be analyzed, to obtain a target protein semantic segmentation result corresponding to the plurality of pieces of electron microtomography data to be analyzed.
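The stitching step of claim 15 can be sketched as follows, assuming each piece carries its (z, y, x) offset in the full volume and per-class scores for every voxel. Where pieces overlap, this sketch sums the class scores (equivalent to averaging for the purpose of argmax) before choosing each voxel's class; the overlap rule and the function name `stitch` are assumptions, not details from the claim.

```python
from typing import List, Sequence, Tuple

Offset = Tuple[int, int, int]
Scores = List[List[List[List[float]]]]  # [z][y][x][c] per-class scores


def stitch(
    shape: Offset, pieces: Sequence[Tuple[Offset, Scores]], num_classes: int
) -> List[List[List[int]]]:
    """Reassemble per-piece class scores into one (z, y, x) label volume."""
    depth, height, width = shape
    acc = [[[[0.0] * num_classes for _ in range(width)]
            for _ in range(height)] for _ in range(depth)]
    hits = [[[0] * width for _ in range(height)] for _ in range(depth)]
    for (z0, y0, x0), scores in pieces:
        for dz, plane in enumerate(scores):
            for dy, row in enumerate(plane):
                for dx, vec in enumerate(row):
                    z, y, x = z0 + dz, y0 + dy, x0 + dx
                    hits[z][y][x] += 1
                    for c in range(num_classes):
                        acc[z][y][x][c] += vec[c]  # sum over overlapping pieces
    labels = []
    for z in range(depth):
        plane_labels = []
        for y in range(height):
            row_labels = []
            for x in range(width):
                if hits[z][y][x] == 0:
                    raise ValueError(f"voxel {(z, y, x)} not covered by any piece")
                v = acc[z][y][x]
                row_labels.append(max(range(num_classes), key=v.__getitem__))
            plane_labels.append(row_labels)
        labels.append(plane_labels)
    return labels


# Two overlapping 1x1x2 pieces covering a 1x1x3 volume; they disagree on the
# middle voxel (class 1 vs. class 0) and the summed scores resolve it to 0.
example = stitch((1, 1, 3), [((0, 0, 0), [[[[0.9, 0.1], [0.4, 0.6]]]]),
                             ((0, 0, 1), [[[[0.8, 0.2], [0.3, 0.7]]]])], 2)
print(example)  # [[[0, 0, 1]]]
```

Summing scores lets a confident prediction from one piece outweigh a weak one from another at a shared voxel, which a simple last-write-wins merge would not.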
  16. The computer-readable storage medium according to claim 15, wherein, before the step of performing protein semantic segmentation on each of the plurality of pieces of electron microtomography data to be analyzed using the cell protein semantic segmentation model to obtain the set of protein semantic segmentation results corresponding to the plurality of pieces of electron microtomography data to be analyzed, the method comprises:
    obtaining a labeled training sample set and an unlabeled training sample set derived from the same cell electron microtomography training sample to be split;
    obtaining one labeled training sample from the labeled training sample set as a target labeled training sample, and obtaining one unlabeled training sample from the unlabeled training sample set as a target unlabeled training sample;
    performing adversarial training on the generation network and the discriminant network according to the target labeled training sample, wherein the generation network is a U-net++ segmentation network and the discriminant network is a fully convolutional discriminator;
    performing semi-supervised training on the adversarially trained generation network according to the target unlabeled training sample and the adversarially trained discriminant network;
    repeating the step of obtaining one labeled training sample from the labeled training sample set as a target labeled training sample and one unlabeled training sample from the unlabeled training sample set as a target unlabeled training sample, until both the adversarial training and the semi-supervised training of the alternating training reach a convergence condition, and determining the generation network for which both have reached the convergence condition as the cell protein semantic segmentation model.
  17. The computer-readable storage medium according to claim 16, wherein the step of obtaining the labeled training sample set and the unlabeled training sample set derived from the same cell electron microtomography training sample to be split comprises:
    obtaining the cell electron microtomography training sample;
    splitting the cell electron microtomography training sample using a sliding window to obtain a plurality of pieces of electron microtomography sample data;
    dividing the plurality of pieces of electron microtomography sample data according to a preset ratio into a to-be-labeled training sample set and an unlabeled training sample set, wherein the to-be-labeled training sample set contains more electron microtomography sample data than the unlabeled training sample set;
    annotating each piece of electron microtomography sample data in the to-be-labeled training sample set with protein semantic segmentation labels to obtain the labeled training sample set.
  18. The computer-readable storage medium according to claim 16, wherein the step of performing adversarial training on the generation network and the discriminant network according to the target labeled training sample comprises:
    inputting the electron microtomography sample data of the target labeled training sample into the generation network for protein semantic segmentation to obtain a first training result;
    inputting the protein annotation data of the target labeled training sample and the first training result into the discriminant network for discrimination to obtain a first confidence result;
    performing adversarial training on the generation network and the discriminant network using the protein annotation data of the target labeled training sample, the first training result, and the first confidence result.
  19. The computer-readable storage medium according to claim 18, wherein the step of performing adversarial training on the generation network and the discriminant network using the protein annotation data of the target labeled training sample, the first training result, and the first confidence result comprises:
    inputting the protein annotation data of the target labeled training sample and the first training result into a first loss function to obtain a first loss value of the generation network, and updating the parameters of the generation network according to the first loss value;
    inputting the first confidence result into a second loss function to obtain a second loss value of the discriminant network, and updating the parameters of the discriminant network according to the second loss value;
    wherein the first loss function L_ce is calculated as:
    $$L_{ce} = -\sum_{h,w} \sum_{c \in C} Y_n^{(h,w,c)} \log\left(S(X_n)^{(h,w,c)}\right)$$
    the second loss function L_adv is calculated as:
    $$L_{adv} = -\sum_{h,w} \log\left(D(S(X_n))^{(h,w)}\right)$$
    where X_n is the electron microtomography sample data of the target labeled training sample; h is the width, w the height, and c the number of channels of the electron microtomography sample data of the target labeled training sample; S(X_n)^(h,w,c) is the first training result; log() is the logarithm function; Y_n^(h,w,c) is the protein annotation data of the target labeled training sample; C is the number of cell protein types; and D(S(X_n))^(h,w) is the first confidence result.
  20. The computer-readable storage medium according to claim 16, wherein the step of performing semi-supervised training on the adversarially trained generation network according to the target unlabeled training sample and the adversarially trained discriminant network comprises:
    inputting the electron microtomography sample data of the target unlabeled training sample into the generation network for protein semantic segmentation to obtain a second training result;
    inputting the second training result into the discriminant network for discrimination to obtain a second confidence result;
    determining, according to the second confidence result, a trustworthy result corresponding to the target unlabeled training sample;
    performing semi-supervised training on the generation network using the trustworthy result corresponding to the target unlabeled training sample and the second training result.
PCT/CN2021/084568 (priority date 2021-02-26, filed 2021-03-31): Semantic segmentation method and apparatus for electron microtomography data, device, and medium; published as WO2022178949A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110219615.3 2021-02-26
CN202110219615.3A CN112949646B (en) 2021-02-26 2021-02-26 Semantic segmentation method, device, equipment and medium for electron microtomography data

Publications (1)

Publication Number Publication Date
WO2022178949A1 (en) 2022-09-01

Family

ID=76246562

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/084568 WO2022178949A1 (en) 2021-02-26 2021-03-31 Semantic segmentation method and apparatus for electron microtomography data, device, and medium

Country Status (2)

Country Link
CN (1) CN112949646B (en)
WO (1) WO2022178949A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614921A (en) * 2018-12-07 2019-04-12 安徽大学 A kind of cell segmentation method for the semi-supervised learning generating network based on confrontation
US20190122120A1 (en) * 2017-10-20 2019-04-25 Dalei Wu Self-training method and system for semi-supervised learning with generative adversarial networks
CN109740560A (en) * 2019-01-11 2019-05-10 济南浪潮高新科技投资发展有限公司 Human cellular protein automatic identifying method and system based on convolutional neural networks
CN110097059A (en) * 2019-03-22 2019-08-06 中国科学院自动化研究所 Based on file and picture binary coding method, system, the device for generating confrontation network
CN110853703A (en) * 2019-10-16 2020-02-28 天津大学 Semi-supervised learning prediction method for protein secondary structure
CN111242922A (en) * 2020-01-13 2020-06-05 上海极链网络科技有限公司 Protein image classification method, device, equipment and medium

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN108549895A (en) * 2018-04-17 2018-09-18 深圳市唯特视科技有限公司 A kind of semi-supervised semantic segmentation method based on confrontation network
CN108734211B (en) * 2018-05-17 2019-12-24 腾讯科技(深圳)有限公司 Image processing method and device
CN109035269B (en) * 2018-07-03 2021-05-11 怀光智能科技(武汉)有限公司 Cervical cell pathological section pathological cell segmentation method and system
CN109949317B (en) * 2019-03-06 2020-12-11 东南大学 Semi-supervised image example segmentation method based on gradual confrontation learning
CN110517272B (en) * 2019-08-29 2022-03-25 电子科技大学 Deep learning-based blood cell segmentation method
CN111080645B (en) * 2019-11-12 2023-08-15 中国矿业大学 Remote sensing image semi-supervised semantic segmentation method based on generation type countermeasure network
CN112270686B (en) * 2020-12-24 2021-03-16 北京达佳互联信息技术有限公司 Image segmentation model training method, image segmentation device and electronic equipment

Also Published As

Publication number Publication date
CN112949646A (en) 2021-06-11
CN112949646B (en) 2023-12-19

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21927383

Country of ref document: EP

Kind code of ref document: A1