
Training method, using method, device, equipment and medium of segmentation model

Publication number: CN114299284A
Authority: CN (China)
Prior art keywords: medical image, sample, dimensional medical, dimensional, segmentation
Legal status: Pending
Application number: CN202111113808.7A
Other languages: Chinese (zh)
Inventors: 周昵昀, 严欣, 姚建华
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202111113808.7A
Publication of CN114299284A

Landscapes

  • Image Analysis (AREA)
  • Magnetic Resonance Imaging Apparatus (AREA)
  • Apparatus For Radiation Diagnosis (AREA)

Abstract

The application discloses a training method, a using method, a device, equipment and a medium of a segmentation model, belonging to the technical field of artificial intelligence. The method comprises the following steps: acquiring a sample three-dimensional medical image and a sample instance segmentation result; inputting the feature representation of the sample three-dimensional medical image into a semantic segmentation branch of a segmentation model, and outputting a semantic segmentation result; inputting the feature representation into a target detection branch of the segmentation model, and outputting position information and a classification result of a detection frame; obtaining a predicted instance segmentation result from the semantic segmentation result, the position information of the detection frame and the classification result; and training based on the error between the predicted instance segmentation result and the sample instance segmentation result to obtain a trained segmentation model. By combining the semantic segmentation branch and the target detection branch, the method realizes instance segmentation of the three-dimensional medical image; performing semantic segmentation on the three-dimensional medical image itself adds a dimension for describing the three-dimensional interest target and improves the ability to analyze biological structure information.

Description

Training method, using method, device, equipment and medium of segmentation model
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a training method, a using method, a device, equipment and a medium of a segmentation model.
Background
A three-dimensional cryoelectron microscope image contains biological structure information; by analyzing this information, the positions and three-dimensional structures of intracellular organelles and protein complexes can be obtained in a state close to their natural state in the cellular environment.
At present, biological structure information is mostly analyzed by selecting two-dimensional sections from the three-dimensional cryoelectron microscope image and performing segmentation on those sections.
Because biological structure information is highly complex, many problems arise when analyzing the positions and three-dimensional structures of the intracellular organelles and protein complexes contained in a three-dimensional cryoelectron microscope image. For example, when the image is segmented on two-dimensional sections, the segmentation result is discontinuous when expressed back in the three-dimensional cryoelectron microscope image. How to improve the segmentation effect on three-dimensional cryoelectron microscope images is therefore a problem to be solved urgently.
Disclosure of Invention
The application provides a training method, a using method, a device, equipment and a medium of a segmentation model, which can realize instance segmentation of a three-dimensional medical image. The technical solution is as follows:
according to an aspect of the present application, there is provided a training method of a segmentation model, the method including:
obtaining a sample three-dimensional medical image and a sample instance segmentation result, wherein the sample instance segmentation result comprises labels of n three-dimensional interest targets in the sample three-dimensional medical image, and n is an integer greater than 1;
inputting the feature representation of the sample three-dimensional medical image into a semantic segmentation branch of the segmentation model, and outputting a semantic segmentation result of the sample three-dimensional medical image; the semantic segmentation result is used for indicating three-dimensional position information of the three-dimensional interest target; inputting the feature representation of the sample three-dimensional medical image into a target detection branch of the segmentation model, and outputting the position information and classification result of a detection frame; the detection box is used for determining the three-dimensional interest target in the sample three-dimensional medical image;
obtaining a predicted instance segmentation result of the sample three-dimensional medical image according to the semantic segmentation result, the position information of the detection frame and the classification result;
and training the segmentation model based on the error between the predicted instance segmentation result and the sample instance segmentation result to obtain the trained segmentation model.
According to another aspect of the present application, there is provided a method of using a segmentation model, the method including:
acquiring an input three-dimensional medical image;
inputting the feature representation of the input three-dimensional medical image into a semantic segmentation branch of the segmentation model, and outputting a semantic segmentation result of the input three-dimensional medical image; the semantic segmentation result is used for indicating three-dimensional position information of the three-dimensional interest target; inputting the feature representation of the input three-dimensional medical image into a target detection branch of the segmentation model, and outputting position information and classification results of a detection frame; the detection box is used for determining the three-dimensional interest target in the input three-dimensional medical image;
and obtaining a prediction instance segmentation result of the input three-dimensional medical image according to the semantic segmentation result, the position information of the detection frame and the classification result.
According to another aspect of the present application, there is provided a training apparatus for a segmentation model, the apparatus including:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a sample three-dimensional medical image and a sample example segmentation result, the sample example segmentation result comprises labels of n three-dimensional interest targets in the sample three-dimensional medical image, and n is an integer greater than 1;
the first prediction module is used for inputting the feature representation of the sample three-dimensional medical image into a semantic segmentation branch of the segmentation model and outputting a semantic segmentation result of the sample three-dimensional medical image; the semantic segmentation result is used for indicating three-dimensional position information of the three-dimensional interest target;
the second prediction module is used for inputting the feature representation of the sample three-dimensional medical image into a target detection branch of the segmentation model and outputting the position information and the classification result of a detection frame; the detection box is used for determining the three-dimensional interest target in the sample three-dimensional medical image;
the determining module is used for obtaining a prediction example segmentation result of the sample three-dimensional medical image according to the semantic segmentation result, the position information of the detection frame and the classification result;
and the training module is used for training the segmentation model based on the error between the prediction example segmentation result and the sample example segmentation result to obtain the trained segmentation model.
In an alternative design of the present application, the semantic segmentation branch includes: an encoder and a decoder;
the first prediction module comprises:
the encoding unit is used for inputting the characteristic representation of the sample three-dimensional medical image into the encoder, and encoding the characteristic representation to obtain a hidden layer representation of the sample three-dimensional medical image;
and the decoding unit is used for inputting the hidden layer representation into the decoder and decoding to obtain a semantic segmentation result of the sample three-dimensional medical image.
In an alternative design of the present application, the encoder is an encoder of a U-type convolutional neural network U-Net, and the decoder is a decoder of the U-Net;
the encoding unit is further configured to: inputting the feature representation of the sample three-dimensional medical image into the encoder of the U-Net, and encoding to obtain a hidden layer representation of the sample three-dimensional medical image;
the decoding unit is further configured to: and inputting the hidden layer representation into a decoder of the U-Net, and decoding to obtain a semantic segmentation result of the sample three-dimensional medical image.
In an alternative design of the present application, the target detection branch is a Region Proposal Network (RPN);
the second prediction module is further to: and inputting the feature representation of the sample three-dimensional medical image into the RPN, and outputting the position information and the classification result of the detection frame.
In an alternative design of the present application, the apparatus further includes:
and the correction module is used for correcting the position information through region-of-interest alignment (RoI Align) to obtain the corrected position information.
In an optional design of the present application, the semantic segmentation result includes three-dimensional position information of at least one predicted interest target;
the determination module is further to: determining the predicted interest target located in the detection frame based on the position information of the detection frame and the three-dimensional position information of the at least one predicted interest target;
determining the classification result of the detection frame as a classification result of the predicted interest target;
and outputting a prediction instance segmentation result of the sample three-dimensional medical image, wherein the prediction instance segmentation result comprises at least one prediction instance, and each prediction instance corresponds to the three-dimensional position information and the classification result of one prediction interest target.
In an alternative design of the present application, the segmentation model further includes: a feature extraction network;
the device further comprises: and the feature extraction module is used for inputting the sample three-dimensional medical image into the feature extraction network and outputting the feature representation of the sample three-dimensional medical image.
In an optional design of the present application, the feature extraction network is a feature pyramid network FPN;
the feature extraction module is further to: inputting the sample three-dimensional medical image into the FPN, outputting a characteristic pyramid of the sample three-dimensional medical image, and taking the characteristic pyramid as a characteristic representation of the sample three-dimensional medical image.
In an alternative design of the present application, the apparatus further includes: the preprocessing module is used for preprocessing the sample three-dimensional medical image to obtain a preprocessed sample three-dimensional medical image;
wherein the preprocessing method comprises at least one of the following:
image filtering processing; image cropping processing; and image augmentation processing.
In an optional design of the present application, the preprocessing module is further configured to: and carrying out image filtering processing on the sample three-dimensional medical image by using a wiener filter to obtain the sample three-dimensional medical image after the image filtering processing.
In an alternative design of the present application, the preprocessing module is further configured to at least one of:
improving the brightness of the sample three-dimensional medical image to obtain the sample three-dimensional medical image after image augmentation processing;
improving the contrast of the sample three-dimensional medical image to obtain the sample three-dimensional medical image after image augmentation processing;
and adding noise to the sample three-dimensional medical image to obtain the sample three-dimensional medical image after image augmentation processing.
In an alternative design of the present application, the apparatus further includes:
a second acquisition module for acquiring an input three-dimensional medical image;
and the third prediction module is used for inputting the feature representation of the input three-dimensional medical image into the trained segmentation model to obtain a prediction instance segmentation result of the input three-dimensional medical image.
According to another aspect of the present application, there is provided an apparatus for using a segmentation model, the apparatus including:
an acquisition module for acquiring an input three-dimensional medical image;
the first prediction module is used for inputting the feature representation of the input three-dimensional medical image into a semantic segmentation branch of the segmentation model and outputting a semantic segmentation result of the input three-dimensional medical image; the semantic segmentation result is used for indicating three-dimensional position information of the three-dimensional interest target;
the second prediction module is used for inputting the feature representation of the input three-dimensional medical image into a target detection branch of the segmentation model and outputting the position information and the classification result of a detection frame; the detection box is used for determining the three-dimensional interest target in the input three-dimensional medical image;
and the determining module is used for obtaining a predicted instance segmentation result of the input three-dimensional medical image according to the semantic segmentation result, the position information of the detection frame and the classification result.
According to another aspect of the present application, there is provided a computer device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by the processor to implement a method of training and/or a method of using a segmentation model as described above.
According to another aspect of the present application, there is provided a computer readable storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement a method of training and/or using a segmentation model as described above.
According to another aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium, from which a processor reads and executes the computer instructions to implement the training method and/or using method of a segmentation model described above.
The beneficial effects brought by the technical solution provided by the application include at least the following:
the example segmentation of the three-dimensional medical image is realized by combining the semantic segmentation branch and the target detection branch; namely, the three-dimensional structure information and the classification result of each three-dimensional interest target can be analyzed. Selecting a three-dimensional medical image for semantic segmentation, and increasing the dimensionality for describing the three-dimensional interest target; the accurate three-dimensional structure information of the three-dimensional interest target is obtained, and the problem that an example segmentation result is discontinuous in a three-dimensional medical image is avoided. And by combining the target detection branches, the classification result of the three-dimensional interest target can be obtained, and the example segmentation of the three-dimensional medical image is realized. The three-dimensional medical image is selected for segmentation, so that the example segmentation effect is improved, and the capability of analyzing biological structure information is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a block diagram of a computer system for training and/or using a segmentation model provided by one embodiment of the present application;
FIG. 2 is a schematic illustration of a method of training and/or using a segmentation model provided by an exemplary embodiment of the present application;
FIG. 3 is a flow chart of a method for training a segmentation model provided by an exemplary embodiment of the present application;
FIG. 4 is a flow chart of a method for training a segmentation model provided by an exemplary embodiment of the present application;
FIG. 5 is a flow chart of a method for training a segmentation model provided by an exemplary embodiment of the present application;
FIG. 6 is a flow chart of a method for training a segmentation model provided by an exemplary embodiment of the present application;
FIG. 7 is a flowchart of a method for training a segmentation model provided by an exemplary embodiment of the present application;
FIG. 8 is a flow chart of a method for training a segmentation model provided by an exemplary embodiment of the present application;
FIG. 9 is a flowchart of a method for training a segmentation model provided by an exemplary embodiment of the present application;
FIG. 10 is a flow chart of a method for training a segmentation model provided by an exemplary embodiment of the present application;
FIG. 11 is a flow chart of a method for training a segmentation model provided by an exemplary embodiment of the present application;
FIG. 12 is a flow chart of a method for training a segmentation model provided by an exemplary embodiment of the present application;
FIG. 13 is a flow chart of a method for training a segmentation model provided by an exemplary embodiment of the present application;
FIG. 14 is a flow chart of a method for training a segmentation model provided by an exemplary embodiment of the present application;
FIG. 15 is a flowchart of a method for training a segmentation model provided by an exemplary embodiment of the present application;
FIG. 16 is a flow chart of a method of using a segmentation model provided by an exemplary embodiment of the present application;
FIG. 17 is a schematic of a two-dimensional cross-section of the raw data of the EMD-7151 dataset;
FIG. 18 is a two-dimensional cross-sectional visualization of an EMD-7151 dataset after an example segmentation by a segmentation model;
FIG. 19 is a three-dimensional image visualization of an EMD-7151 data set after an example segmentation by a segmentation model;
FIG. 20 is a schematic of a two-dimensional cross-section of the raw data of the EMD-7141 dataset;
FIG. 21 is a two-dimensional cross-sectional visualization of an EMD-7141 data set after an example segmentation by a segmentation model;
FIG. 22 is a three-dimensional image visualization of an EMD-7141 data set after an example segmentation by a segmentation model;
FIG. 23 is a block diagram of a training apparatus for a segmentation model provided in an exemplary embodiment of the present application;
FIG. 24 is a block diagram of an apparatus for using a segmentation model provided in an exemplary embodiment of the present application;
fig. 25 is a block diagram of a server according to an exemplary embodiment of the present application.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be understood that, although the terms first, second, etc. may be used herein to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first parameter may also be referred to as a second parameter, and similarly, a second parameter may also be referred to as a first parameter, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
FIG. 1 illustrates a schematic diagram of a computer system provided by one embodiment of the present application. The computer system can serve as a system architecture for the training method and/or using method of the segmentation model. The computer system may include: a terminal 100 and a server 200.
The terminal 100 may be an electronic device such as a mobile phone, a tablet computer, a vehicle-mounted terminal (car machine), a wearable device, a PC (Personal Computer), a door access device, an unmanned terminal, and the like. A client for running a target application may be installed in the terminal 100; the target application may be a game application or another application providing a training function of a segmentation model, which is not limited in the present application. The form of the target application is also not limited, and may include, but is not limited to, an App (application program) installed in the terminal 100, an applet, and the like, and may also be a web page.
The server 200 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The server 200 may be a background server of the target application program, and is configured to provide a background service for a client of the target application program.
In the training method of the segmentation model provided by the embodiment of the application, the execution subject of each step may be a computer device, and the computer device refers to an electronic device with data calculation, processing and storage capabilities. Taking the embodiment environment shown in fig. 1 as an example, the terminal 100 may execute the method for training the segmentation model (for example, the client installed and running in the terminal 100 is configured to execute the method for training the segmentation model), the server 200 may execute the method for training the segmentation model, or the terminal 100 and the server 200 may cooperate with each other to execute the method, which is not limited in this application.
In addition, the technical scheme of the application can be combined with the block chain technology. For example, the training method of the segmentation model disclosed in the present application, wherein some data (such as three-dimensional medical image, prediction example segmentation result, and the like) involved in the training method can be saved on the block chain. The terminal 100 and the server 200 may communicate with each other through a network, such as a wired or wireless network.
FIG. 2 illustrates a schematic diagram of a method for training and/or using a segmentation model provided by an exemplary embodiment of the present application. For a segmentation model, there are a training process 210 and a using process 230.
In the training process 210, the segmentation model 213 needs to be trained, and the segmentation model 213 includes: a feature extraction network 214, a semantic segmentation branch 216, and a target detection branch 218. Acquiring a sample three-dimensional medical image 211, and preprocessing the sample three-dimensional medical image 211 to obtain a processed sample three-dimensional medical image 212. The processed sample three-dimensional medical image 212 is input to a feature extraction network 214, outputting a feature representation 215 of the sample three-dimensional medical image.
The feature representation 215 is input to a semantic segmentation branch 216 and a target detection branch 218, respectively. Semantic segmentation branch 216 outputs semantic segmentation result 217, and target detection branch 218 outputs position information 219 of the detection box and classification result 220.
And obtaining a prediction example segmentation result 221 of the sample three-dimensional medical image according to the semantic segmentation result 217, the position information 219 of the detection frame and the classification result 220. Based on the error between the prediction example segmentation result 221 and the sample example segmentation result 222 of the sample three-dimensional medical image, the segmentation model 213 is trained to obtain a trained segmentation model 232.
In the using process 230, a trained segmentation model 232 needs to be used, and the trained segmentation model 232 includes: a trained feature extraction network 233, a trained semantic segmentation network 234, and a trained target detection network 235.
The input three-dimensional medical image 231 is acquired, and the input three-dimensional medical image 231 is preprocessed to obtain a preprocessed input three-dimensional medical image 231 a. The preprocessed input three-dimensional medical image 231a is input to the trained segmentation model 232, and a prediction instance segmentation result 236 of the input three-dimensional medical image 231 is obtained.
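For illustration only, the training process 210 and the using process 230 can be sketched in code. The following is a minimal, non-authoritative Python/PyTorch sketch; the names backbone, semantic, detector, assemble_instances and instance_loss are hypothetical placeholders for the feature extraction network 214, the two branches 216 and 218, and the steps detailed later in this description (assemble_instances is sketched further below; instance_loss stands for an error such as the cross-entropy sketched further below).

```python
# Hedged sketch of the flow in FIG. 2, assuming PyTorch. assemble_instances and
# instance_loss are hypothetical helpers, sketched later in this description.
import torch

def train_step(model, optimizer, sample_volume, sample_instances):
    feats = model["backbone"](sample_volume)        # feature representation 215
    sem = model["semantic"](feats)                  # semantic segmentation result 217
    boxes, classes = model["detector"](feats)       # box positions 219, classes 220
    pred = assemble_instances(sem, boxes, classes)  # predicted instance result 221
    loss = instance_loss(pred, sample_instances)    # error vs. sample labels 222
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def use_model(model, input_volume):                 # using process 230
    feats = model["backbone"](input_volume)
    sem = model["semantic"](feats)
    boxes, classes = model["detector"](feats)
    return assemble_instances(sem, boxes, classes)
```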
FIG. 3 is a flow chart illustrating a training method of a segmentation model according to the present application. The method may be performed by a computer device. The method comprises the following steps:
step 410: acquiring a sample three-dimensional medical image and a sample example segmentation result;
the three-dimensional medical image is a three-dimensional image for describing a biological structure, the three-dimensional medical image including at least one three-dimensional object of interest. A three-dimensional object of interest is a structure of an organism having a three-dimensional structure. Illustratively, the three-dimensional object of interest includes, but is not limited to, at least one of the following: cells, intracellular organelles, protein complexes. Illustratively, the three-dimensional medical image is a three-dimensional image obtained based on tomography, and the method for acquiring the three-dimensional medical image includes, but is not limited to, at least one of the following methods: cryoelectron Microscopy Tomography (Cryo-Electron Tomography, Cryo-ET), X-ray crystal Diffraction analysis (X-ray Diffraction Methods), Nuclear Magnetic Resonance (NMR), Scanning Tunneling Microscopy (Scanning Tunneling Microscopy). In the embodiments of the present application, a detailed description will be given taking an example in which a three-dimensional medical image is acquired based on a frozen electron microscope tomography. Cryoelectron microscopy tomography can resolve the three-dimensional structure of intracellular organelles and protein complexes in a near-natural state in a cellular environment.
The sample instance segmentation result comprises labels of n three-dimensional interest targets in the sample three-dimensional medical image, where n is an integer greater than 1. The label of a three-dimensional interest target comprises the position information of the target and the classification result of the target. The position information describes the position of the three-dimensional interest target in the three-dimensional medical image, namely the position information of the one or more pixel points that the target comprises. The classification result describes the category information of the three-dimensional interest target; exemplary categories of the three-dimensional interest target include, but are not limited to, at least one of the following: mitochondria, vesicles, tubulin and ribosomes.
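For illustration only, such a sample and its labels might be laid out in memory as follows; NumPy is assumed and the field names are hypothetical, not taken from the application.

```python
import numpy as np

sample = {
    # sample three-dimensional medical image, shape (depth, height, width)
    "volume": np.zeros((32, 128, 128), dtype=np.float32),
    # n labels (n > 1), one per three-dimensional interest target
    "instances": [
        {"mask": np.zeros((32, 128, 128), dtype=bool),  # voxel-wise position information
         "category": "mitochondria"},
        {"mask": np.zeros((32, 128, 128), dtype=bool),
         "category": "ribosome"},
    ],
}
```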
Step 420: inputting the feature representation of the sample three-dimensional medical image into a semantic segmentation branch of a segmentation model, and outputting a semantic segmentation result of the sample three-dimensional medical image;
the semantic segmentation result of the sample three-dimensional medical image is included in the three-dimensional medical image, and the category information of each pixel point is included in the semantic segmentation result. Illustratively, the category information of the pixel points includes, but is not limited to, at least one of the following cases: the pixel points are three-dimensional interest targets and the pixel points are three-dimensional background images.
The semantic segmentation branch is a machine learning model for predicting the semantic segmentation result of the sample three-dimensional medical image based on the feature representation of the sample three-dimensional medical image. In the present embodiment, no limitation is placed on the implementation of the semantic segmentation branch. Exemplary implementations of the semantic segmentation branch include, but are not limited to, at least one of the following models: Fully Convolutional Networks (FCN), the U-shaped convolutional neural network (U-Net) and SegNet. Those skilled in the art will appreciate that the above models can be used independently to construct the semantic segmentation branch, or combined with one another to construct it.
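As a minimal sketch only (PyTorch assumed; the class name SemanticBranch and all layer sizes are illustrative, not the disclosed design), an encoder-decoder semantic segmentation branch in the spirit of the models above might look like this:

```python
import torch.nn as nn

class SemanticBranch(nn.Module):
    """Toy 3D encoder-decoder; a stand-in for FCN / U-Net / SegNet, not the patented network."""
    def __init__(self, in_ch=16, num_classes=5):
        super().__init__()
        self.encoder = nn.Sequential(  # encode to a hidden-layer representation
            nn.Conv3d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(  # decode back to per-voxel class logits
            nn.ConvTranspose3d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose3d(32, num_classes, 2, stride=2),
        )

    def forward(self, feats):          # feats: (N, in_ch, D, H, W)
        hidden = self.encoder(feats)   # hidden-layer representation
        return self.decoder(hidden)    # semantic segmentation logits
```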
Step 430: inputting the feature representation of the sample three-dimensional medical image into a target detection branch of the segmentation model, and outputting the position information and classification result of the detection frame;
the position information of the detection frame is used for describing the position of the detection frame in the three-dimensional medical image, namely the position information of one or more pixel points included in the detection frame in the three-dimensional medical image. Illustratively, the shape of the detection frame is fixed; such as: the detection frame is a cube with the side length of 150 pixel points and/or the detection frame is a cuboid with the length-width-height ratio of 3:3: 4. The classification result of the detection frame is used to describe the category information of the detection frame, and exemplary categories of the detection frame include, but are not limited to, at least one of the following categories: mitochondria, vesicles, tubulin, ribosomes.
It should be noted that the position information of the detection frame is different from the position information of the three-dimensional interest target: the position information of the three-dimensional interest target can describe the three-dimensional structure information of the target, whereas the position information of the detection frame, output by the target detection branch, cannot describe three-dimensional structure information such as the boundary position and spatial structure of the three-dimensional interest target.
The target detection branch is a machine learning model for predicting the position information and classification result of the detection frame based on the feature representation of the sample three-dimensional medical image. In the present embodiment, no limitation is imposed on the implementation of the target detection branch. Exemplary implementations include, but are not limited to, at least one of the following models: the Region Proposal Network (RPN), Regions with CNN Features (R-CNN) and Region-based Fully Convolutional Networks (R-FCN). Those skilled in the art will appreciate that the above models can be used independently to construct the target detection branch, or combined with one another to construct it.
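Again purely as a sketch (PyTorch assumed; DetectionBranch, the anchor count and all layer sizes are illustrative, not the disclosed design), an RPN-style target detection branch predicting per-anchor box positions and classification scores might be:

```python
import torch.nn as nn

class DetectionBranch(nn.Module):
    """Bare-bones 3D RPN-style head; an illustration, not the patented design."""
    def __init__(self, in_ch=16, num_anchors=3, num_classes=5):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, 128, 3, padding=1)
        self.relu = nn.ReLU()
        # 6 coordinates per anchor box in 3D: (z, y, x, depth, height, width)
        self.box_head = nn.Conv3d(128, num_anchors * 6, 1)
        self.cls_head = nn.Conv3d(128, num_anchors * num_classes, 1)

    def forward(self, feats):
        h = self.relu(self.conv(feats))
        return self.box_head(h), self.cls_head(h)  # position information, classification logits
```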
It should be noted that, in the present embodiment, no limitation is made on the timing relationship between step 420 and step 430. Illustratively, step 420 may be performed before, after, or simultaneously with step 430.
Step 440: obtaining a predicted instance segmentation result of the sample three-dimensional medical image according to the semantic segmentation result, the position information of the detection frame and the classification result;
the prediction example segmentation result comprises labels of m three-dimensional interest objects in the sample three-dimensional medical image predicted by the segmentation model, wherein m is an integer larger than 1. The magnitude relationship between m and n is not subject to any limitation, and m may be greater than, less than, or equal to n.
The m three-dimensional interest objects are predicted by a segmentation model. The label comprises position information of the three-dimensional interest target obtained through prediction of the segmentation model and a classification result of the three-dimensional interest target obtained through prediction of the segmentation model. The prediction example segmentation result of the sample three-dimensional medical image is obtained based on the semantic segmentation result, the position information of the detection frame and the classification result.
Step 450: training the segmentation model based on the error between the predicted instance segmentation result and the sample instance segmentation result to obtain the trained segmentation model.
The segmentation model is trained to minimize the error between the predicted instance segmentation result and the sample instance segmentation result. In the present embodiment, no limitation is placed on the type of error selected when training the segmentation model. For example, the error between the predicted instance segmentation result and the sample instance segmentation result includes, but is not limited to, the following errors: Cross-Entropy Loss and 0-1 Loss (Zero-One Loss).
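For illustration, the two error types named above can be sketched as follows (PyTorch assumed). Note that the 0-1 loss is not differentiable, so in practice it would serve as an evaluation metric rather than a gradient signal; how the errors of the two branches are weighted is not specified here.

```python
import torch.nn.functional as F

def cross_entropy_error(voxel_logits, voxel_labels):
    # voxel_logits: (N, C, D, H, W); voxel_labels: (N, D, H, W) integer classes
    return F.cross_entropy(voxel_logits, voxel_labels)

def zero_one_error(voxel_logits, voxel_labels):
    # fraction of misclassified voxels; non-differentiable, metric only
    pred = voxel_logits.argmax(dim=1)
    return (pred != voxel_labels).float().mean()
```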
In summary, the method provided by this embodiment implements instance segmentation of a three-dimensional medical image by combining the semantic segmentation branch and the target detection branch; that is, the three-dimensional structure information and the classification result of each three-dimensional interest target can be analyzed. Performing semantic segmentation on the three-dimensional medical image itself adds a dimension for describing the three-dimensional interest target, so that accurate three-dimensional structure information of the target is obtained and the problem of the instance segmentation result being discontinuous in the three-dimensional medical image is avoided. By further combining the target detection branch, the classification result of the three-dimensional interest target can be obtained, realizing instance segmentation of the three-dimensional medical image. Segmenting the three-dimensional medical image directly improves both the instance segmentation effect and the ability to analyze biological structure information.
FIG. 4 is a flow chart illustrating a training method of a segmentation model according to the present application. The method may be performed by a computer device. The method comprises the following steps:
step 410, step 430, step 440, and step 450 refer to the steps in the embodiment shown in fig. 3, which are not described again in this embodiment. In this embodiment, the semantic segmentation branch includes: an encoder and a decoder.
Step 422: inputting the feature representation of the sample three-dimensional medical image into an encoder, and encoding to obtain a hidden layer representation of the sample three-dimensional medical image;
the encoder is used for encoding the characteristic representation of the sample three-dimensional medical image, and the encoding obtains the hidden layer representation of the sample three-dimensional medical image.
Step 424: and inputting the hidden layer representation into a decoder, and decoding to obtain a semantic segmentation result of the sample three-dimensional medical image.
The decoder is used for decoding the hidden layer representation of the sample three-dimensional medical image to obtain a semantic segmentation result of the sample three-dimensional medical image.
Optionally, an activation function may be used to process the semantic segmentation result of the sample three-dimensional medical image, so as to increase the nonlinearity of the semantic segmentation result. Illustratively, the activation function is the Rectified Linear Unit (ReLU).
It should be noted that, in the present embodiment, no limitation is imposed on the timing relationship between the semantic segmentation step and the target detection step, the semantic segmentation step includes step 422 and step 424, and the target detection step includes step 430. Illustratively, any of the semantic segmentation steps may be performed before, after, or simultaneously with the target detection step.
Illustratively, as shown in fig. 5, step 422 and step 424 include the following sub-steps:
In the embodiment shown in FIG. 5, the encoder is a U-Net encoder and the decoder is a U-Net decoder. U-Net addresses semantic segmentation tasks such as segmentation at the cell level, obtaining the category information of pixel points through a U-shaped convolutional network structure.
Step 422 a: inputting the feature representation of the sample three-dimensional medical image into a U-Net encoder, and encoding to obtain a hidden layer representation of the sample three-dimensional medical image;
and the encoder of the U-Net is used for encoding the characteristic representation of the sample three-dimensional medical image, and the encoding obtains the hidden layer representation of the sample three-dimensional medical image.
Step 424 a: and inputting the hidden layer representation into a U-Net decoder, and decoding to obtain a semantic segmentation result of the sample three-dimensional medical image.
And the decoder of the U-Net is used for decoding the hidden layer representation of the sample three-dimensional medical image to obtain a semantic segmentation result of the sample three-dimensional medical image.
It should be noted that, in the present embodiment, no limitation is imposed on the timing relationship between the semantic segmentation step and the target detection step, the semantic segmentation step includes steps 422a and 424a, and the target detection step includes step 430. Illustratively, any of the semantic segmentation steps may be performed before, after, or simultaneously with the target detection step.
In summary, in the method provided by this embodiment, the encoder-decoder structure is used to construct the semantic segmentation branch, making full use of the feature information contained in the feature representation of the sample three-dimensional medical image, ensuring the accuracy of the semantic segmentation result, and laying a good foundation for improving the instance segmentation effect on three-dimensional medical images.
FIG. 6 is a flow chart illustrating a training method of a segmentation model according to the present application. The method may be performed by a computer device. The method comprises the following steps:
step 410, step 420, step 440, and step 450 refer to the steps in the embodiment shown in fig. 3, which are not described again in this embodiment.
Step 432: and inputting the feature representation of the sample three-dimensional medical image into the RPN, and outputting the position information and classification result of the detection frame.
The RPN is a machine learning model for predicting the position information and classification result of the detection frame from the feature representation of the sample three-dimensional medical image. The RPN solves the task of extracting the detection frame, and the detection frame is extracted according to the feature representation of the sample three-dimensional medical image.
In this embodiment, no limitation is imposed on the method by which the RPN predicts the classification result of the detection frame. Illustratively, the RPN calculates Bounding Box (BBox) scores for the Regions of Interest (RoI) determined by the detection frame; an RoI whose overlap (IoU) with a ground-truth BBox is greater than 0.3 is determined as a positive sample, and an RoI whose overlap (IoU) with a ground-truth BBox is less than or equal to 0.3 is determined as a negative sample. Likewise, no limitation is imposed on the method by which the RPN predicts the position information of the detection frame. Illustratively, the RPN obtains the position information of the RoI determined by the detection frame from the BBox scores by using a Non-Maximum Suppression (NMS) algorithm.
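A hedged sketch of such IoU-based sample assignment for three-dimensional boxes follows (NumPy assumed; the (z1, y1, x1, z2, y2, x2) box convention is chosen here for illustration, and the 0.3 threshold follows the example above):

```python
import numpy as np

def iou_3d(a, b):
    # a, b: NumPy arrays (z1, y1, x1, z2, y2, x2)
    inter_dims = np.maximum(0.0, np.minimum(a[3:], b[3:]) - np.maximum(a[:3], b[:3]))
    inter = inter_dims.prod()
    vol_a = (a[3:] - a[:3]).prod()
    vol_b = (b[3:] - b[:3]).prod()
    return inter / (vol_a + vol_b - inter)

def assign_samples(rois, gt_boxes, thresh=0.3):
    # label each RoI as positive (1) or negative (0) by its best IoU with ground truth
    return [1 if max(iou_3d(roi, gt) for gt in gt_boxes) > thresh else 0
            for roi in rois]
```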
Optionally, an activation function may be used to process the position information and the classification result of the detection frame, so as to increase their nonlinearity. Illustratively, the activation function is the normalized exponential function (Softmax).
It should be noted that, in the present embodiment, no limitation is made on the timing relationship between step 420 and step 432. Illustratively, step 420 may be performed before, after, or simultaneously with step 432.
In summary, the method provided by this embodiment makes full use of the feature information contained in the feature representation of the sample three-dimensional medical image, ensures the accuracy of the position information and classification result of the detection frame output by the RPN, and lays a good foundation for improving the instance segmentation effect on three-dimensional medical images.
FIG. 7 is a flow chart illustrating a training method of a segmentation model according to the present application. The method may be performed by a computer device. The method comprises the following steps:
step 410, step 420, step 432, step 440, and step 450 refer to the steps in the embodiment shown in fig. 3, and are not described again in this embodiment.
Step 434: and correcting the position information through the alignment of the interest region to obtain the corrected position information.
Region-of-interest alignment (RoI Align) is used to correct the position information of the detection frame. Illustratively, boundary pixel points are determined according to the position information of the detection frame, and bilinear interpolation is performed using the feature information of the 4 real pixel points closest to each boundary pixel point, thereby ensuring the accuracy of the position information of the detection frame obtained from the boundary pixel point information.
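As an illustration of the interpolation step only (NumPy assumed, shown in two dimensions for brevity), sampling a feature map at a fractional coordinate from its 4 nearest real pixel points might look like:

```python
import numpy as np

def bilinear_sample(feat, y, x):
    # feat: 2D feature map; (y, x): fractional boundary-point coordinate,
    # assumed to lie at least one pixel inside the map border
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = y0 + 1, x0 + 1
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0] +
            (1 - wy) * wx * feat[y0, x1] +
            wy * (1 - wx) * feat[y1, x0] +
            wy * wx * feat[y1, x1])
```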
It should be noted that, in the present embodiment, no limitation is imposed on the timing relationship between the semantic segmentation step and the target detection step, the semantic segmentation step includes step 420, and the target detection step includes step 432 and step 434. Illustratively, any of the target detection steps may be performed before, after, or simultaneously with the semantic segmentation step.
In summary, the method provided by this embodiment corrects the position information of the detection frame by using region-of-interest alignment, avoids offset of the detection frame on the sample three-dimensional medical image, ensures the accuracy of determining the predicted interest target, and lays a good foundation for improving the instance segmentation effect on three-dimensional medical images.
FIG. 8 is a flow chart illustrating a method for training a segmentation model according to the present application. The method may be performed by a computer device. The method comprises the following steps:
step 410, step 420, step 430, and step 450 refer to the steps in the embodiment shown in fig. 3, which are not described again in this embodiment.
In this embodiment, the semantic segmentation result includes three-dimensional position information of at least one predicted interest target.
Step 442: determining a predicted interest target located in the detection frame based on the position information of the detection frame and the three-dimensional position information of the at least one predicted interest target;
the predicted interest target is a three-dimensional interest target obtained by the prediction of the segmentation model. And matching the position information of the detection frame with the three-dimensional position information of at least one predicted interest target, and determining the predicted interest target positioned in the detection frame.
Step 444: determining the classification result of the detection frame as the classification result of the predicted interest target;
the classification result of the predicted interest target is the classification result of the detection box output by the target detection branch.
Step 446: outputting a predicted instance segmentation result of the sample three-dimensional medical image.
The predicted instance segmentation result comprises at least one prediction instance, and each prediction instance corresponds to the three-dimensional position information and the classification result of one predicted interest target.
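A minimal sketch of steps 442 to 446 follows (NumPy assumed; the box convention and the helper name assemble_instances are illustrative): voxels of the semantic segmentation result that fall inside a detection frame are grouped into one prediction instance carrying that frame's classification result.

```python
import numpy as np

def assemble_instances(foreground_mask, boxes, classes):
    # foreground_mask: boolean (D, H, W) semantic result;
    # boxes: (z1, y1, x1, z2, y2, x2) per detection frame; classes: one label per frame
    instances = []
    for box, cls in zip(boxes, classes):
        z1, y1, x1, z2, y2, x2 = [int(v) for v in box]
        inst_mask = np.zeros_like(foreground_mask)
        inst_mask[z1:z2, y1:y2, x1:x2] = foreground_mask[z1:z2, y1:y2, x1:x2]
        if inst_mask.any():  # a predicted interest target inside this frame
            instances.append({"mask": inst_mask, "category": cls})
    return instances
```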
In summary, the method provided by this embodiment determines the predicted interest targets and their classification results, implements instance segmentation of the sample three-dimensional medical image, and enables the three-dimensional structure information and classification result of each predicted interest target to be analyzed. The problem of the instance segmentation result being discontinuous in the three-dimensional medical image is avoided, and the instance segmentation effect on the three-dimensional medical image is improved.
FIG. 9 is a flow chart illustrating a training method of a segmentation model according to the present application. The method may be performed by a computer device. The method comprises the following steps:
step 410, step 420, step 430, step 440, and step 450 refer to the steps in the embodiment shown in fig. 3, and are not described again in this embodiment.
Step 415: and inputting the sample three-dimensional medical image into a feature extraction network, and outputting the feature representation of the sample three-dimensional medical image.
The feature extraction network is a machine learning model for predicting a feature representation of the sample three-dimensional medical image from the sample three-dimensional medical image. In the present embodiment, no limitation is imposed on the implementation of the feature extraction network. Illustratively, the implementation of the feature extraction network includes, but is not limited to, at least one of the following models: feature Pyramid Networks (FPN), Spatial Pyramid Pooling Networks (SPP), Network In Network (NIN) structures, Convolutional Neural Networks (CNN) structures. Those skilled in the art will appreciate that the above models can be used independently to construct a feature extraction network; the models can be combined for use to construct a feature extraction network.
It should be noted that, in this embodiment, different feature extraction networks may be used to predict the feature representation of the sample three-dimensional medical image. Illustratively, a first feature extraction network is used for predicting a first feature representation input semantic segmentation branch of a sample three-dimensional medical image; and predicting a second feature of the sample three-dimensional medical image to represent an input target detection branch by using a second feature extraction network.
Illustratively, as shown in FIG. 10, step 415 includes the sub-steps of:
Step 415a: inputting the sample three-dimensional medical image into the FPN, outputting a feature pyramid of the sample three-dimensional medical image, and taking the feature pyramid as the feature representation of the sample three-dimensional medical image.
The FPN is a machine learning model for predicting a feature pyramid of the sample three-dimensional medical image from the sample three-dimensional medical image. The feature pyramid serves as a feature representation of the sample three-dimensional medical image.
Illustratively, a first feature representation of the sample three-dimensional medical image is input to the semantic segmentation branch of the segmentation model, the first feature representation being the first-level feature representation of the sample three-dimensional medical image extracted by the FPN.
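As a hedged two-level sketch (PyTorch assumed; FeaturePyramid and all channel counts are hypothetical), an FPN-style feature extraction network with lateral connections and top-down fusion might be:

```python
import torch.nn as nn
import torch.nn.functional as F

class FeaturePyramid(nn.Module):
    """Toy 3D FPN; an illustration of the idea, not the patented network."""
    def __init__(self, in_ch=1, ch=16):
        super().__init__()
        self.stage1 = nn.Conv3d(in_ch, ch, 3, stride=1, padding=1)   # bottom-up stages
        self.stage2 = nn.Conv3d(ch, ch * 2, 3, stride=2, padding=1)
        self.lat1 = nn.Conv3d(ch, ch, 1)                             # lateral 1x1x1 convs
        self.lat2 = nn.Conv3d(ch * 2, ch, 1)

    def forward(self, volume):                                    # volume: (N, in_ch, D, H, W)
        c1 = F.relu(self.stage1(volume))
        c2 = F.relu(self.stage2(c1))
        p2 = self.lat2(c2)                                        # coarser level
        p1 = self.lat1(c1) + F.interpolate(p2, scale_factor=2)    # top-down fusion
        return [p1, p2]   # p1: first-level feature representation
```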
In summary, the method provided by this embodiment supplies a feature extraction network and a method for obtaining the feature representation of the sample three-dimensional medical image, laying a foundation for the segmentation model to implement instance segmentation of the sample three-dimensional medical image.
FIG. 11 is a flow chart illustrating a method for training a segmentation model according to the present application. The method may be performed by a computer device. The method comprises the following steps:
step 410, step 420, step 430, step 440, and step 450 refer to the steps in the embodiment shown in fig. 3, and are not described again in this embodiment.
Step 411: and preprocessing the sample three-dimensional medical image to obtain a preprocessed sample three-dimensional medical image.
Next, a detailed description is given of the preprocessing method, which is not limited in this embodiment, and the preprocessing method includes, but is not limited to, at least one of the following methods:
image filtering processing;
the filter is used to filter the interference data in the original three-dimensional electron microscope image, and in this embodiment, no limitation is imposed on the type of the filter used. Illustratively, the filters used include, but are not limited to, at least one of the following: wiener Filter (Wiener Filter), Guided Filter (Guided Filter), Sobel Filter (Sobel Filter).
Image cropping processing;
the original three-dimensional electron microscope image is cropped into a three-dimensional electron microscope image with a proper size, and in this embodiment, no limitation is imposed on the size of the cropped original three-dimensional electron microscope image. For example, the original three-dimensional electron microscope image is cut into a three-dimensional electron microscope image with the length, width and height of 128 pixels, 128 pixels and 32 pixels in sequence, and/or the original three-dimensional electron microscope image is cut into a three-dimensional electron microscope image with the length, width and height of 100 pixels, 100 pixels and 100 pixels in sequence.
Image augmentation processing.
The attribute of the original three-dimensional electron microscope image is changed, and the image characteristics of the original three-dimensional electron microscope image are highlighted. Illustratively, the method of image augmentation processing includes, but is not limited to, at least one of the following methods: noise is added, image brightness is improved, and image contrast is improved.
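The filtering and cropping options can be sketched as follows (SciPy/NumPy assumed; scipy.signal.wiener is a generic Wiener filter rather than the CTF-based variant mentioned below, and the crop size follows the 128 x 128 x 32 example above). Augmentation is sketched separately further below.

```python
import numpy as np
from scipy.signal import wiener

def preprocess(volume: np.ndarray) -> np.ndarray:
    filtered = wiener(volume)             # image filtering processing
    cropped = filtered[:32, :128, :128]   # image cropping processing (depth, height, width)
    return cropped
```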
Illustratively, as shown in fig. 12, step 411 includes the following sub-steps:
step 411 a: and carrying out image filtering processing on the sample three-dimensional medical image by using a wiener filter to obtain the sample three-dimensional medical image after the image filtering processing.
Illustratively, the image filtering process is performed using a wiener filter, and a Transfer Function of the wiener filter is a Contrast Transfer Function (CTF).
Illustratively, as shown in fig. 13, step 411 includes the following sub-steps:
step 411 b: improving the brightness of the sample three-dimensional medical image to obtain the sample three-dimensional medical image after image augmentation processing;
such as: the brightness of the sample three-dimensional medical image is improved by 40%.
Step 411 c: improving the contrast of the sample three-dimensional medical image to obtain the sample three-dimensional medical image after image augmentation processing;
such as: the contrast of the sample three-dimensional medical image is improved by 30%.
Step 411 d: and adding noise to the sample three-dimensional medical image to obtain the sample three-dimensional medical image after image augmentation processing.
Illustratively, the type of noise added includes, but is not limited to, at least one of: gaussian noise, impulse noise, poisson noise.
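A minimal sketch of the three augmentation operations in steps 411b to 411d, using the example magnitudes above (brightness increased by 40%, contrast by 30%) plus Gaussian noise. It assumes intensities normalized to [0, 1]; the noise level and the exact brightness/contrast formulas are assumptions.

```python
import numpy as np

def augment(vol: np.ndarray, rng=None) -> np.ndarray:
    rng = rng or np.random.default_rng()
    out = vol * 1.40                              # raise brightness by 40% (as a gain)
    out = (out - out.mean()) * 1.30 + out.mean()  # raise contrast by 30% about the mean
    out = out + rng.normal(0.0, 0.05, out.shape)  # add Gaussian noise (sigma assumed)
    return np.clip(out, 0.0, 1.0)

augmented = augment(np.random.rand(32, 128, 128))
```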
It should be noted that step 411a, step 411b, and step 411c may all be executed, or only some of them may be executed; this embodiment imposes no limitation on this. Where any two or all of step 411a, step 411b, and step 411c are performed, this embodiment likewise imposes no limitation on the timing relationship among them. Those skilled in the art can understand that step 411a, step 411b, and step 411c can be implemented independently, or can be freely combined to form a new embodiment implementing the segmentation model training method of the present application.
In summary, given the low signal-to-noise ratio characteristic of the sample three-dimensional medical image, the method provided by this embodiment preprocesses the sample three-dimensional medical image, reducing the influence of interference data in the sample three-dimensional medical image on instance segmentation and laying a good foundation for improving the instance segmentation effect of three-dimensional medical images.
Those skilled in the art can understand that the above embodiments can be implemented independently, or can be freely combined to form a new embodiment implementing the segmentation model training method of the present application.
Next, a detailed description is given taking as an example a three-dimensional medical image acquired by cryoelectron microscopy tomography.
FIG. 14 is a flow chart illustrating a method for training a segmentation model according to the present application. The method may be performed by a computer device. The method comprises the following steps:
step 510: acquiring a three-dimensional cryoelectron microscope image of a sample and a sample example segmentation result;
cryoelectron microscopy tomography can resolve the three-dimensional structure of intracellular organelles and protein complexes in a near-natural state in a cellular environment. The sample example segmentation result comprises labels of n three-dimensional interest targets in the sample three-dimensional cryoelectron microscope image, wherein n is an integer larger than 1.
Step 520: preprocessing a sample three-dimensional cryoelectron microscope image to obtain a preprocessed sample three-dimensional cryoelectron microscope image;
given that the sample three-dimensional cryoelectron microscope image has a low signal-to-noise ratio, it is preprocessed to reduce the influence of interference data in the sample three-dimensional cryoelectron microscope image on instance segmentation of the three-dimensional cryoelectron microscope image.
Step 530: inputting the sample three-dimensional cryoelectron microscope image into a feature extraction network, and outputting feature representation of the sample three-dimensional cryoelectron microscope image;
a feature extraction network is provided, giving a way to obtain the feature representation of the sample three-dimensional cryoelectron microscope image. Illustratively, the feature extraction network is implemented as an FPN.
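A highly simplified sketch of what a 3D FPN-style feature extraction network could look like in PyTorch follows. The backbone, channel widths, and number of pyramid levels are all assumptions, since the embodiment does not fix the FPN's internal structure; only the bottom-up/lateral/top-down pattern is characteristic of an FPN.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN3D(nn.Module):
    def __init__(self, in_ch=1, ch=(16, 32, 64), out_ch=32):
        super().__init__()
        self.stages = nn.ModuleList()
        prev = in_ch
        for c in ch:  # bottom-up pathway: strided 3D convolutions
            self.stages.append(nn.Sequential(
                nn.Conv3d(prev, c, 3, stride=2, padding=1), nn.ReLU(inplace=True)))
            prev = c
        # 1x1x1 lateral connections onto a common channel width
        self.laterals = nn.ModuleList(nn.Conv3d(c, out_ch, 1) for c in ch)

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        pyramid = [self.laterals[-1](feats[-1])]
        for i in range(len(feats) - 2, -1, -1):  # top-down pathway with addition
            up = F.interpolate(pyramid[0], size=feats[i].shape[2:],
                               mode="trilinear", align_corners=False)
            pyramid.insert(0, self.laterals[i](feats[i]) + up)
        return pyramid  # finest-to-coarsest feature maps

fpn = TinyFPN3D()
levels = fpn(torch.randn(1, 1, 32, 128, 128))  # three 32-channel feature maps
```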
Step 540: inputting the feature representation of the sample three-dimensional cryoelectron microscope image into a semantic segmentation branch of a segmentation model, and outputting a semantic segmentation result of the sample cryoelectron microscope image;
the semantic segmentation result of the sample three-dimensional cryoelectron microscope image includes the category information of each pixel point in the three-dimensional cryoelectron microscope image. Illustratively, the category information of a pixel point includes, but is not limited to, at least one of the following cases: the pixel point belongs to a three-dimensional interest target, and the pixel point belongs to the three-dimensional background image.
Step 550: inputting the characteristic representation of the sample three-dimensional cryoelectron microscope image into a target detection branch of the segmentation model, and outputting the position information and classification result of the detection frame;
the position information of the detection frame describes the position of the detection frame in the three-dimensional cryoelectron microscope image, that is, the position information of the one or more pixel points included in the detection frame. Illustratively, the shape of the detection frame is fixed; for example, the detection frame is a cube with a side length of 150 pixel points, and/or the detection frame is a cuboid with a length-width-height ratio of 3:3:4. The classification result of the detection frame describes the category information of the detection frame; illustratively, the categories of the detection frame include, but are not limited to, at least one of the following: mitochondria, vesicles, tubulin, ribosomes.
Step 560: obtaining a prediction example segmentation result of the sample three-dimensional cryoelectron microscope image according to the semantic segmentation result, the position information of the detection frame and the classification result;
the prediction instance segmentation result comprises labels of m three-dimensional interest targets in the sample three-dimensional cryoelectron microscope image, as predicted by the segmentation model, where m is an integer greater than 1. No limitation is placed on the magnitude relationship between m and n; m may be greater than, less than, or equal to n.
Each label comprises the position information of the three-dimensional interest target predicted by the segmentation model and the classification result of the three-dimensional interest target predicted by the segmentation model. The prediction instance segmentation result of the sample three-dimensional cryoelectron microscope image is obtained based on the semantic segmentation result, the position information of the detection frame, and the classification result.
Step 570: and training the segmentation model based on the error between the prediction example segmentation result and the sample example segmentation result to obtain the trained segmentation model.
The purpose of training the segmentation model is to minimize the error between the prediction instance segmentation result and the sample instance segmentation result.
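A minimal, assumption-laden sketch of such a training step: the combined loss below (a per-voxel semantic loss plus detection-frame regression and classification losses, equally weighted) is one plausible way to measure the error between the prediction instance segmentation result and the sample instance segmentation result. The embodiment does not prescribe the loss decomposition, the loss functions, the optimizer, or the weighting; all are assumptions here, and `model` is any module returning the three branch outputs.

```python
import torch
import torch.nn as nn

seg_loss_fn = nn.CrossEntropyLoss()  # per-voxel semantic segmentation error
box_loss_fn = nn.SmoothL1Loss()      # detection-frame position error
cls_loss_fn = nn.CrossEntropyLoss()  # detection-frame classification error

def train_step(model, optimizer, volume, target):
    """One optimization step; `model` and `target` keys are assumed names."""
    seg_logits, box_pred, cls_logits = model(volume)
    loss = (seg_loss_fn(seg_logits, target["mask"])
            + box_loss_fn(box_pred, target["boxes"])
            + cls_loss_fn(cls_logits, target["labels"]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```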
In summary, the method provided by this embodiment realizes instance segmentation of the three-dimensional cryoelectron microscope image by combining the semantic segmentation branch and the target detection branch; that is, both the three-dimensional structure information and the classification result of each three-dimensional interest target can be obtained. Performing semantic segmentation on the three-dimensional cryoelectron microscope image adds a dimension for describing the three-dimensional interest target: the target's three-dimensional structure information is obtained, avoiding the problem of the instance segmentation result being discontinuous in the three-dimensional cryoelectron microscope image. Combined with the target detection branch, the classification result of each three-dimensional interest target is obtained, realizing instance segmentation of the three-dimensional cryoelectron microscope image, improving the instance segmentation effect, and improving the ability to analyze biological structure information.
FIG. 15 is a flow chart illustrating a method for training a segmentation model according to the present application. The method may be performed by a computer device. The method comprises the following steps:
step 460: acquiring an input three-dimensional medical image;
illustratively, the three-dimensional medical image is acquired based on cryo-electron microscopy tomography.
Step 470: inputting the feature representation of the input three-dimensional medical image into the trained segmentation model to obtain a prediction instance segmentation result of the input three-dimensional medical image.
The trained segmentation model comprises: the trained semantic segmentation branch and the trained target detection branch.
In summary, in the method provided by this embodiment, the trained segmentation model is used to obtain the prediction example segmentation result of the input three-dimensional medical image, so that the problem that the example segmentation result is discontinuous in the three-dimensional medical image is avoided, and the example segmentation effect of the three-dimensional medical image is improved.
FIG. 16 is a flow chart illustrating a method of using a segmentation model according to the present application. The method may be performed by a computer device. The method comprises the following steps:
step 610: acquiring an input three-dimensional medical image;
the three-dimensional medical image is a three-dimensional image for describing a biological structure, the three-dimensional medical image including at least one three-dimensional object of interest. A three-dimensional object of interest is a structure of an organism having a three-dimensional structure. Illustratively, the three-dimensional object of interest includes, but is not limited to, at least one of the following: cells, intracellular organelles, protein complexes. Illustratively, the three-dimensional medical image is a three-dimensional image obtained based on tomography, and the method for acquiring the three-dimensional medical image includes, but is not limited to, at least one of the following methods: cryoelectron Microscopy Tomography (Cryo-Electron Tomography, Cryo-ET), X-ray crystal Diffraction analysis (X-ray Diffraction Methods), Nuclear Magnetic Resonance (NMR), Scanning Tunneling Microscopy (Scanning Tunneling Microscopy).
Optionally, preprocessing the input three-dimensional medical image to obtain a preprocessed input three-dimensional medical image; wherein, the pretreatment method comprises at least one of the following steps: carrying out image filtering processing; image cropping processing; and (5) image augmentation processing.
Illustratively, preprocessing the input three-dimensional medical image to obtain a preprocessed input three-dimensional medical image includes: and carrying out image filtering processing on the input three-dimensional medical image by using a wiener filter to obtain the input three-dimensional medical image after the image filtering processing.
Illustratively, the input three-dimensional medical image is preprocessed to obtain a preprocessed input three-dimensional medical image, which includes at least one of: improving the brightness of the input three-dimensional medical image to obtain the input three-dimensional medical image after image augmentation processing;
improving the contrast of the input three-dimensional medical image to obtain the input three-dimensional medical image after image augmentation processing;
and adding noise to the input three-dimensional medical image to obtain the input three-dimensional medical image after image augmentation processing.
Optionally, the segmentation model further includes: a feature extraction network; the input three-dimensional medical image is input to a feature extraction network, and feature representation of the input three-dimensional medical image is output.
Further optionally, the feature extraction network is FPN; inputting the input three-dimensional medical image into the FPN, outputting a characteristic pyramid of the input three-dimensional medical image, and taking the characteristic pyramid as the characteristic representation of the input three-dimensional medical image.
Step 620: inputting the feature representation of the input three-dimensional medical image into a semantic segmentation branch of a segmentation model, and outputting a semantic segmentation result of the input three-dimensional medical image;
the semantic segmentation result of the input three-dimensional medical image includes the category information of each pixel point in the three-dimensional medical image. Illustratively, the category information of a pixel point includes, but is not limited to, at least one of the following cases: the pixel point belongs to a three-dimensional interest target, and the pixel point belongs to the three-dimensional background image.
The semantic segmentation branch is a machine learning model for predicting a semantic segmentation result of an input three-dimensional medical image from a feature representation of the input three-dimensional medical image. In the present embodiment, no limitation is made on the implementation manner of semantic division branching. Exemplary, implementations of semantic segmentation branching include, but are not limited to, at least one of the following models: full Convolution Networks (FCN), U-type convolutional neural Networks (U-Net), and segmented convolutional neural Networks (SegNet). Those skilled in the art will appreciate that the above model can be used independently to construct semantic segmentation branches; the models can be combined for use to construct semantic segmentation branches.
Optionally, the semantic segmentation branch includes: an encoder and a decoder; inputting the feature representation of the input three-dimensional medical image into an encoder, and encoding to obtain hidden layer representation of the input three-dimensional medical image; and inputting the hidden layer representation into a decoder, and decoding to obtain a semantic segmentation result of the input three-dimensional medical image.
Further optionally, the encoder is a U-Net encoder and the decoder is a U-Net decoder; inputting the feature representation of the input three-dimensional medical image into a U-Net encoder, and encoding to obtain the hidden layer representation of the input three-dimensional medical image; and inputting the hidden layer representation into a U-Net decoder, and decoding to obtain a semantic segmentation result of the input three-dimensional medical image.
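A minimal 3D U-Net-style encoder/decoder sketch for the semantic segmentation branch follows; the depth, channel widths, and two-class output (interest target vs. background) are illustrative assumptions, since the embodiment does not fix the U-Net's internal configuration.

```python
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(nn.Conv3d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))

class TinyUNet3D(nn.Module):
    def __init__(self, in_ch=32, num_classes=2):
        super().__init__()
        self.enc1 = block(in_ch, 32)
        self.down = nn.MaxPool3d(2)
        self.enc2 = block(32, 64)                  # bottleneck: hidden-layer representation
        self.up = nn.ConvTranspose3d(64, 32, 2, stride=2)
        self.dec1 = block(64, 32)                  # 64 = 32 skip + 32 upsampled channels
        self.head = nn.Conv3d(32, num_classes, 1)  # per-voxel class logits

    def forward(self, x):
        e1 = self.enc1(x)
        h = self.enc2(self.down(e1))               # encode to hidden representation
        d = self.dec1(torch.cat([self.up(h), e1], dim=1))  # decode with skip connection
        return self.head(d)

net = TinyUNet3D()
logits = net(torch.randn(1, 32, 16, 64, 64))       # shape (1, 2, 16, 64, 64)
```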
Step 630: inputting the feature representation of the input three-dimensional medical image into a target detection branch of the segmentation model, and outputting the position information and classification result of the detection frame;
the position information of the detection frame describes the position of the detection frame in the three-dimensional medical image, that is, the position information of the one or more pixel points included in the detection frame. Illustratively, the shape of the detection frame is fixed; for example, the detection frame is a cube with a side length of 150 pixel points, and/or the detection frame is a cuboid with a length-width-height ratio of 3:3:4. The classification result of the detection frame describes the category information of the detection frame; illustratively, the categories of the detection frame include, but are not limited to, at least one of the following: mitochondria, vesicles, tubulin, ribosomes.
It should be noted that the position information of the detection frame differs from the position information of the three-dimensional interest target: the latter can describe three-dimensional structure information of the target, whereas the former, being only the detection frame output by the target detection branch, cannot describe three-dimensional structure information such as the boundary position and spatial structure of the three-dimensional interest target.
The target detection branch is a machine learning model for predicting the position information and classification result of the detection frame from the feature representation of the input three-dimensional medical image. In the present embodiment, no limitation is imposed on the implementation manner of the target detection branch. Exemplary, target detection branch implementations include, but are not limited to, at least one of the following models: region Proposed Networks (RPN), Region with CNN Features (R-CNN), and Region-based full Convolutional Networks (R-FCN). Those skilled in the art will appreciate that the above models can be used independently to construct target detection branches; the above models may be used in combination with each other to construct a target detection branch.
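Since the target detection branch described here outputs both position information and a classification result, a detection head in the spirit of an RPN could look like the sketch below: for each of several anchor boxes at every voxel of a feature map it emits six box offsets and per-class scores. The anchor count, class count, and offset parameterization are assumptions.

```python
import torch
import torch.nn as nn

class TinyRPNHead3D(nn.Module):
    def __init__(self, in_ch=32, num_anchors=3, num_classes=5):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, in_ch, 3, padding=1)
        self.box = nn.Conv3d(in_ch, num_anchors * 6, 1)             # (x, y, z, w, h, d) offsets
        self.cls = nn.Conv3d(in_ch, num_anchors * num_classes, 1)   # per-class scores

    def forward(self, feat):
        h = torch.relu(self.conv(feat))
        return self.box(h), self.cls(h)

head = TinyRPNHead3D()
box_pred, cls_pred = head(torch.randn(1, 32, 16, 64, 64))
```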
It should be noted that, in the present embodiment, no limitation is made on the timing relationship between step 620 and step 630. Illustratively, step 620 may be performed before, after, or simultaneously with step 630.
Optionally, the target detection branch is an RPN, the feature representation of the input three-dimensional medical image is input to the RPN, and the position information and the classification result of the detection frame are output.
Optionally, the position information is corrected by region-of-interest alignment (ROI Align), obtaining corrected position information.
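For intuition only: torchvision ships a 2D roi_align operator, shown below; a true 3D variant for tomograms would need a custom kernel or slice-wise application, so this call is merely an analogy for the correction step described above.

```python
import torch
from torchvision.ops import roi_align

feat = torch.randn(1, 32, 64, 64)                    # a single 2D feature map
boxes = torch.tensor([[0.0, 4.3, 9.7, 20.1, 30.6]])  # (batch_index, x1, y1, x2, y2)
pooled = roi_align(feat, boxes, output_size=(7, 7),
                   spatial_scale=1.0, sampling_ratio=2)  # bilinear sub-pixel sampling
```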
Step 640: and obtaining a prediction instance segmentation result of the input three-dimensional medical image according to the semantic segmentation result, the position information of the detection frame and the classification result.
The prediction example segmentation result comprises labels of m three-dimensional interest objects in the input three-dimensional medical image predicted by the segmentation model, wherein m is an integer larger than 1.
The m three-dimensional interest objects are predicted by a segmentation model. The label comprises position information of the three-dimensional interest target obtained through prediction of the segmentation model and a classification result of the three-dimensional interest target obtained through prediction of the segmentation model. The position information of the three-dimensional interest target is used for describing the position of the three-dimensional interest target in the three-dimensional medical image, namely the position information of one or more pixel points included in the three-dimensional interest target in the three-dimensional medical image. The classification result of the three-dimensional interest object is used to describe category information of the three-dimensional interest object, and exemplary categories of the three-dimensional interest object include, but are not limited to, at least one of the following categories: mitochondria, vesicles, tubulin, ribosomes. The prediction case segmentation result of the input three-dimensional medical image is obtained based on the semantic segmentation result, the position information of the detection box and the classification result.
Illustratively, the semantic segmentation result includes three-dimensional position information of at least one predicted interest target;
determining a predicted interest target located in the detection frame based on the position information of the detection frame and the three-dimensional position information of the at least one predicted interest target;
determining the classification result of the detection frame as the classification result of the predicted interest target;
and outputting a prediction example segmentation result of the input three-dimensional medical image, wherein the prediction example segmentation result comprises at least one prediction example, and each prediction example corresponds to the three-dimensional position information and the classification result of one prediction interest target.
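One plausible reading of this fusion rule, sketched with SciPy: foreground voxels from the semantic segmentation result are grouped into connected components, and a component whose centroid falls inside a detection frame becomes a predicted instance carrying that frame's classification result. The centroid-in-box test is an assumption; the embodiment only states that the two branch outputs are combined.

```python
import numpy as np
from scipy import ndimage

def fuse(semantic_mask, boxes, classes):
    """semantic_mask: binary (z, y, x) array; boxes: (z0, y0, x0, z1, y1, x1) tuples."""
    labels, n = ndimage.label(semantic_mask)   # connected foreground components
    instances = []
    for comp in range(1, n + 1):
        component = labels == comp
        cz, cy, cx = ndimage.center_of_mass(component)
        for box, cls in zip(boxes, classes):
            z0, y0, x0, z1, y1, x1 = box
            if z0 <= cz <= z1 and y0 <= cy <= y1 and x0 <= cx <= x1:
                instances.append({"voxels": np.argwhere(component), "class": cls})
                break                           # one class per predicted instance
    return instances
```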
In summary, the method provided by this embodiment obtains the prediction instance segmentation result of the input three-dimensional medical image by using the trained segmentation model, avoids the problem of the instance segmentation result being discontinuous in the three-dimensional medical image, and improves the instance segmentation effect for three-dimensional medical images. The segmentation model was verified on the EMD-7141 and EMD-7151 data sets; the experimental results, shown in Table 1, indicate that both the accuracy and the segmentation performance of the segmentation model are good. The EMD-7141 data set comprises cryoelectron microscopy tomograms of Glutamate Dehydrogenase on porous carbon grids; the EMD-7151 data set comprises cryoelectron microscopy tomograms of T20S proteasomes on a porous carbon grid.
Table 1

| Data set | Average precision | Precision | Recall | F1 score | Dice coefficient |
|----------|-------------------|-----------|--------|----------|------------------|
| EMD-7151 | 41.36             | 0.5654    | 0.6102 | 0.5869   | 0.5290           |
| EMD-7141 | 46.68             | 0.7035    | 0.5543 | 0.6201   | 0.4256           |
Precision is the number of predicted interest targets whose classification result is correct divided by the total number of predicted interest targets. Average Precision is the average of the precision values along the precision-recall (P-R) curve. Recall is the number of predicted interest targets whose classification result is correct divided by the total number of real three-dimensional interest targets. The F1 score (F1-score) is the harmonic mean of precision and recall. The Dice Coefficient measures the segmentation performance of the segmentation model.
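These metrics follow directly from the definitions above; a short sketch, where TP, FP, and FN are counts of true positives, false positives, and false negatives over predicted targets, and the Dice coefficient is computed from binary masks:

```python
import numpy as np

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(p, r):
    return 2 * p * r / (p + r)  # harmonic mean of precision and recall

def dice(pred, truth):
    """pred, truth: binary masks of equal shape."""
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum())
```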
Fig. 17 shows a two-dimensional cross section of the raw data of the EMD-7151 data set, fig. 18 shows a two-dimensional cross-section visualization of the EMD-7151 data set after instance segmentation by the segmentation model, and fig. 19 shows a three-dimensional visualization of the EMD-7151 data set after instance segmentation by the segmentation model, in which a first predicted interest target 1, a second predicted interest target 2, a third predicted interest target 3, and a fourth predicted interest target 4 are marked.
Fig. 20 shows a two-dimensional cross section of the raw data of the EMD-7141 data set, fig. 21 shows a two-dimensional cross-section visualization of the EMD-7141 data set after instance segmentation by the segmentation model, and fig. 22 shows a three-dimensional visualization of the EMD-7141 data set after instance segmentation by the segmentation model, in which a first predicted interest target 1, a second predicted interest target 2, a third predicted interest target 3, and a fourth predicted interest target 4 are marked.
Fig. 23 is a block diagram illustrating a training apparatus for a segmentation model according to an exemplary embodiment of the present application. The device includes:
a first obtaining module 2302, configured to obtain a sample three-dimensional medical image and a sample instance segmentation result, where the sample instance segmentation result includes tags of n three-dimensional interest targets in the sample three-dimensional medical image, and n is an integer greater than 1;
a first prediction module 2304, configured to input the feature representation of the sample three-dimensional medical image into a semantic segmentation branch of the segmentation model, and output a semantic segmentation result of the sample three-dimensional medical image; the semantic segmentation result is used for indicating three-dimensional position information of the three-dimensional interest target;
a second prediction module 2306, configured to input the feature representation of the sample three-dimensional medical image into a target detection branch of the segmentation model, and output position information and a classification result of a detection frame; the detection box is used for determining the three-dimensional interest target in the sample three-dimensional medical image;
a determining module 2308, configured to obtain a prediction instance segmentation result of the sample three-dimensional medical image according to the semantic segmentation result, the position information of the detection box, and the classification result;
a training module 2310, configured to train the segmentation model based on an error between the prediction instance segmentation result and the sample instance segmentation result, to obtain a trained segmentation model.
In an alternative design of the present application, the semantic segmentation branch includes: an encoder and a decoder;
the first prediction module 2304, comprising:
an encoding unit 2304a, configured to input the feature representation of the sample three-dimensional medical image to the encoder, and encode the feature representation to obtain a hidden layer representation of the sample three-dimensional medical image;
a decoding unit 2304b, configured to input the hidden layer representation to the decoder, and decode to obtain a semantic segmentation result of the sample three-dimensional medical image.
In an alternative design of the present application, the encoder is an encoder of a U-type convolutional neural network U-Net, and the decoder is a decoder of the U-Net;
the encoding unit 2304a is further configured to: inputting the feature representation of the sample three-dimensional medical image into the encoder of the U-Net, and encoding to obtain a hidden layer representation of the sample three-dimensional medical image;
the decoding unit 2304b is further configured to: and inputting the hidden layer representation into a decoder of the U-Net, and decoding to obtain a semantic segmentation result of the sample three-dimensional medical image.
In an alternative design of the present application, the target detection branch is a regional proposal network, RPN;
the second prediction module 2306 is further configured to: and inputting the feature representation of the sample three-dimensional medical image into the RPN, and outputting the position information and the classification result of the detection frame.
In an alternative design of the present application, the apparatus further includes:
a correcting module 2312, configured to correct the position information by aligning the region of interest with the ROI Align, so as to obtain corrected position information.
In an optional design of the present application, the semantic segmentation result includes three-dimensional position information of at least one predicted interest target; the determining module 2308 is further configured to: determining the predicted interest target located in the detection frame based on the position information of the detection frame and the three-dimensional position information of the at least one predicted interest target;
determining the classification result of the detection frame as a classification result of the predicted interest target;
and outputting a prediction instance segmentation result of the sample three-dimensional medical image, wherein the prediction instance segmentation result comprises at least one prediction instance, and each prediction instance corresponds to the three-dimensional position information and the classification result of one prediction interest target.
In an alternative design of the present application, the segmentation model further includes: a feature extraction network;
the device further comprises: a feature extraction module 2314, configured to input the sample three-dimensional medical image to the feature extraction network, and output a feature representation of the sample three-dimensional medical image.
In an optional design of the present application, the feature extraction network is a feature pyramid network FPN;
the feature extraction module 2314 is further configured to:
inputting the sample three-dimensional medical image into the FPN, outputting a characteristic pyramid of the sample three-dimensional medical image, and taking the characteristic pyramid as a characteristic representation of the sample three-dimensional medical image.
In an alternative design of the present application, the apparatus further includes:
a preprocessing module 2316, configured to preprocess the sample three-dimensional medical image to obtain a preprocessed sample three-dimensional medical image; wherein the pretreatment method comprises at least one of the following steps: carrying out image filtering processing; image cropping processing; and (5) image augmentation processing.
In an alternative design of the present application, the preprocessing module 2316 is further configured to: and carrying out image filtering processing on the sample three-dimensional medical image by using a wiener filter to obtain the sample three-dimensional medical image after the image filtering processing.
In an alternative design of the present application, the preprocessing module 2316 is further configured to at least one of:
improving the brightness of the sample three-dimensional medical image to obtain the sample three-dimensional medical image after image augmentation processing;
improving the contrast of the sample three-dimensional medical image to obtain the sample three-dimensional medical image after image augmentation processing;
and adding noise to the sample three-dimensional medical image to obtain the sample three-dimensional medical image after image augmentation processing.
In an alternative design of the present application, the apparatus further includes:
a second obtaining module 2318 for obtaining an input three-dimensional medical image;
a third prediction module 2320, configured to input the feature representation of the input three-dimensional medical image into the trained segmentation model, so as to obtain a prediction instance segmentation result of the input three-dimensional medical image.
Fig. 24 is a block diagram illustrating an apparatus for using a segmentation model according to an exemplary embodiment of the present application. The device includes:
an obtaining module 2402, configured to obtain an input three-dimensional medical image;
a first prediction module 2404, configured to input the feature representation of the input three-dimensional medical image into a semantic segmentation branch of the segmentation model, and output a semantic segmentation result of the input three-dimensional medical image; the semantic segmentation result is used for indicating three-dimensional position information of the three-dimensional interest target;
a second prediction module 2406, configured to input the feature representation of the input three-dimensional medical image to a target detection branch of the segmentation model, and output position information and a classification result of a detection frame; the detection box is used for determining the three-dimensional interest target in the input three-dimensional medical image;
a determining module 2408, configured to obtain a prediction instance segmentation result of the input three-dimensional medical image according to the semantic segmentation result, the position information of the detection box, and the classification result.
It should be noted that, when the apparatus provided in the foregoing embodiments implements its functions, only the division into the above functional modules is illustrated as an example; in practical applications, the above functions may be allocated to different functional modules according to actual needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
An embodiment of the present application further provides a computer device, where the computer device includes: the device comprises a processor and a memory, wherein at least one instruction, at least one program, a code set or an instruction set is stored in the memory, and the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by the processor to realize the training method and/or the using method of the segmentation model provided by the method embodiments.
Optionally, the computer device is a server. Illustratively, fig. 25 is a block diagram of a server according to an exemplary embodiment of the present application. In general, the server 2500 includes: a processor 2501 and a memory 2502. The processor 2501 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 2501 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 2501 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 2501 may be integrated with a Graphics Processing Unit (GPU) which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 2501 may further include an Artificial Intelligence (AI) processor for processing computational operations related to machine learning.
Memory 2502 may include one or more computer-readable storage media, which may be non-transitory. Memory 2502 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 2502 is used to store at least one instruction for execution by the processor 2501 to implement the training method and/or using method of the segmentation model provided by the method embodiments herein.
In some embodiments, the server 2500 may further optionally include: an input interface 2503 and an output interface 2504. The processor 2501, the memory 2502, the input interface 2503 and the output interface 2504 may be connected by a bus or a signal line. Various peripheral devices may be connected to the input interface 2503 and the output interface 2504 via a bus, a signal line, or a circuit board. The Input interface 2503 and the Output interface 2504 can be used to connect at least one peripheral device related to Input/Output (I/O) to the processor 2501 and the memory 2502. In some embodiments, the processor 2501, memory 2502, and input and output interfaces 2503, 2504 are integrated on the same chip or circuit board; in some other embodiments, the processor 2501, the memory 2502, and any one or both of the input interface 2503 and the output interface 2504 may be implemented on a single chip or circuit board, which is not limited in this application.
Those skilled in the art will appreciate that the architecture shown in FIG. 25 does not constitute a limitation on the server 2500, and may include more or fewer components than shown, or combine certain components, or employ a different arrangement of components.
In an exemplary embodiment, a chip is further provided, which includes programmable logic circuits and/or program instructions, and when the chip is run on a computer device, is used for implementing the training method and/or the using method of the segmentation model according to the above aspects.
In an exemplary embodiment, a computer program product or computer program is also provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the computer device executes the training method and/or the using method of the segmentation model provided by the method embodiments.
In an exemplary embodiment, a computer-readable storage medium is further provided, in which at least one program code is stored, which when loaded and executed by a processor of a computer device, implements the method for training and/or using a segmentation model provided by the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments of the present application may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (18)

1. A method for training a segmentation model, the method comprising:
obtaining a sample three-dimensional medical image and a sample example segmentation result, wherein the sample example segmentation result comprises labels of n three-dimensional interest targets in the sample three-dimensional medical image, and n is an integer greater than 1;
inputting the feature representation of the sample three-dimensional medical image into a semantic segmentation branch of the segmentation model, and outputting a semantic segmentation result of the sample three-dimensional medical image; the semantic segmentation result is used for indicating three-dimensional position information of the three-dimensional interest target; inputting the feature representation of the sample three-dimensional medical image into a target detection branch of the segmentation model, and outputting the position information and classification result of a detection frame; the detection box is used for determining the three-dimensional interest target in the sample three-dimensional medical image;
obtaining a prediction example segmentation result of the sample three-dimensional medical image according to the semantic segmentation result, the position information of the detection frame and the classification result;
and training the segmentation model based on the error between the prediction example segmentation result and the sample example segmentation result to obtain the trained segmentation model.
2. The method of claim 1, wherein the semantic segmentation branch comprises: an encoder and a decoder;
the inputting the feature representation of the sample three-dimensional medical image into a semantic segmentation branch of the segmentation model and outputting the semantic segmentation result of the sample three-dimensional medical image comprises the following steps:
inputting the feature representation of the sample three-dimensional medical image into the encoder, and encoding to obtain a hidden layer representation of the sample three-dimensional medical image;
and inputting the hidden layer representation into the decoder, and decoding to obtain a semantic segmentation result of the sample three-dimensional medical image.
3. The method of claim 2, wherein the encoder is a U-convolutional neural network U-Net encoder and the decoder is a U-Net decoder;
the inputting the feature representation of the sample three-dimensional medical image into the encoder, and the encoding resulting in the hidden layer representation of the sample three-dimensional medical image, comprises:
inputting the feature representation of the sample three-dimensional medical image into the encoder of the U-Net, and encoding to obtain a hidden layer representation of the sample three-dimensional medical image;
the inputting the hidden layer representation into the decoder, and decoding to obtain a semantic segmentation result of the sample three-dimensional medical image includes:
and inputting the hidden layer representation into a decoder of the U-Net, and decoding to obtain a semantic segmentation result of the sample three-dimensional medical image.
4. The method according to claim 1, characterized in that the target detection branch is a regional proposal network, RPN;
the inputting the feature representation of the sample three-dimensional medical image into the target detection branch of the segmentation model, and outputting the position information and classification result of the detection frame comprise:
and inputting the feature representation of the sample three-dimensional medical image into the RPN, and outputting the position information and the classification result of the detection frame.
5. The method of claim 4, further comprising:
and correcting the position information by aligning the interest region with ROIAlign to obtain the corrected position information.
6. The method of claim 1, wherein the semantic segmentation result comprises three-dimensional position information of at least one predicted object of interest;
the obtaining a prediction example segmentation result of the sample three-dimensional medical image according to the semantic segmentation result, the position information of the detection frame and the classification result includes:
determining the predicted interest target located in the detection frame based on the position information of the detection frame and the three-dimensional position information of the at least one predicted interest target;
determining the classification result of the detection frame as a classification result of the predicted interest target;
and outputting a prediction instance segmentation result of the sample three-dimensional medical image, wherein the prediction instance segmentation result comprises at least one prediction instance, and each prediction instance corresponds to the three-dimensional position information and the classification result of one prediction interest target.
7. The method of any of claims 1 to 6, wherein the segmentation model further comprises: a feature extraction network;
the method further comprises the following steps:
inputting the sample three-dimensional medical image into the feature extraction network, and outputting a feature representation of the sample three-dimensional medical image.
8. The method of claim 7, wherein the feature extraction network is a Feature Pyramid Network (FPN);
the inputting the sample three-dimensional medical image into the feature extraction network and outputting a feature representation of the sample three-dimensional medical image comprises:
inputting the sample three-dimensional medical image into the FPN, outputting a characteristic pyramid of the sample three-dimensional medical image, and taking the characteristic pyramid as a characteristic representation of the sample three-dimensional medical image.
9. The method of any of claims 1 to 6, further comprising:
preprocessing the sample three-dimensional medical image to obtain a preprocessed sample three-dimensional medical image;
wherein the pretreatment method comprises at least one of the following steps:
carrying out image filtering processing;
image cropping processing;
and (5) image augmentation processing.
10. The method according to claim 9, wherein the preprocessing the sample three-dimensional medical image to obtain a preprocessed sample three-dimensional medical image comprises:
and carrying out image filtering processing on the sample three-dimensional medical image by using a wiener filter to obtain the sample three-dimensional medical image after the image filtering processing.
11. The method of claim 9, wherein the preprocessing the sample three-dimensional medical image to obtain a preprocessed sample three-dimensional medical image comprises at least one of:
improving the brightness of the sample three-dimensional medical image to obtain the sample three-dimensional medical image after image augmentation processing;
improving the contrast of the sample three-dimensional medical image to obtain the sample three-dimensional medical image after image augmentation processing;
and adding noise to the sample three-dimensional medical image to obtain the sample three-dimensional medical image after image augmentation processing.
12. The method of any of claims 1 to 6, further comprising:
acquiring an input three-dimensional medical image;
and inputting the feature representation of the input three-dimensional medical image into the trained segmentation model to obtain a prediction instance segmentation result of the input three-dimensional medical image.
13. A method of using a segmentation model, the method comprising:
acquiring an input three-dimensional medical image;
inputting the feature representation of the input three-dimensional medical image into a semantic segmentation branch of the segmentation model, and outputting a semantic segmentation result of the input three-dimensional medical image; the semantic segmentation result is used for indicating three-dimensional position information of the three-dimensional interest target; inputting the feature representation of the input three-dimensional medical image into a target detection branch of the segmentation model, and outputting position information and classification results of a detection frame; the detection box is used for determining the three-dimensional interest target in the input three-dimensional medical image;
and obtaining a prediction instance segmentation result of the input three-dimensional medical image according to the semantic segmentation result, the position information of the detection frame and the classification result.
14. An apparatus for training a segmentation model, the apparatus comprising:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a sample three-dimensional medical image and a sample example segmentation result, the sample example segmentation result comprises labels of n three-dimensional interest targets in the sample three-dimensional medical image, and n is an integer greater than 1;
the first prediction module is used for inputting the feature representation of the sample three-dimensional medical image into a semantic segmentation branch of the segmentation model and outputting a semantic segmentation result of the sample three-dimensional medical image; the semantic segmentation result is used for indicating three-dimensional position information of the three-dimensional interest target;
the second prediction module is used for inputting the feature representation of the sample three-dimensional medical image into a target detection branch of the segmentation model and outputting the position information and the classification result of a detection frame; the detection box is used for determining the three-dimensional interest target in the sample three-dimensional medical image;
the determining module is used for obtaining a prediction example segmentation result of the sample three-dimensional medical image according to the semantic segmentation result, the position information of the detection frame and the classification result;
and the training module is used for training the segmentation model based on the error between the prediction example segmentation result and the sample example segmentation result to obtain the trained segmentation model.
15. An apparatus for using a segmentation model, the apparatus comprising:
an acquisition module for acquiring an input three-dimensional medical image;
the first prediction module is used for inputting the feature representation of the input three-dimensional medical image into a semantic segmentation branch of the segmentation model and outputting a semantic segmentation result of the input three-dimensional medical image; the semantic segmentation result is used for indicating three-dimensional position information of the three-dimensional interest target;
the second prediction module is used for inputting the feature representation of the input three-dimensional medical image into a target detection branch of the segmentation model and outputting the position information and the classification result of a detection frame; the detection box is used for determining the three-dimensional interest target in the input three-dimensional medical image;
and the determining module is used for obtaining a prediction example segmentation result of the input three-dimensional medical image according to the semantic segmentation result, the position information of the detection frame and the classification result.
16. A computer device, characterized in that the computer device comprises: a processor and a memory, wherein at least one program is stored in the memory; the processor is configured to execute the at least one program in the memory to implement the training method of the segmentation model according to any one of claims 1 to 12 or the using method of the segmentation model according to claim 13.
17. A computer-readable storage medium, wherein the computer-readable storage medium stores executable instructions, which are loaded and executed by a processor, to implement a method for training a segmentation model according to any one of claims 1 to 12 or a method for using a segmentation model according to claim 13.
18. A computer program product or computer program, characterized in that it comprises computer instructions stored in a computer-readable storage medium, which are read by a processor and executed to implement the method for training a segmentation model according to any one of claims 1 to 12 or the method for using a segmentation model according to claim 13.
CN202111113808.7A 2021-09-23 2021-09-23 Training method, using method, device, equipment and medium of segmentation model Pending CN114299284A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111113808.7A CN114299284A (en) 2021-09-23 2021-09-23 Training method, using method, device, equipment and medium of segmentation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111113808.7A CN114299284A (en) 2021-09-23 2021-09-23 Training method, using method, device, equipment and medium of segmentation model

Publications (1)

Publication Number Publication Date
CN114299284A true CN114299284A (en) 2022-04-08

Family

ID=80963954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111113808.7A Pending CN114299284A (en) 2021-09-23 2021-09-23 Training method, using method, device, equipment and medium of segmentation model

Country Status (1)

Country Link
CN (1) CN114299284A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115019037A (en) * 2022-05-12 2022-09-06 北京百度网讯科技有限公司 Object segmentation method, training method and device of corresponding model and storage medium
CN115457024A (en) * 2022-10-10 2022-12-09 水木未来(杭州)科技有限公司 Method and device for processing cryoelectron microscope image, electronic equipment and storage medium
CN115578564A (en) * 2022-10-25 2023-01-06 北京医准智能科技有限公司 Example segmentation model training method and device, electronic equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40067086

Country of ref document: HK

SE01 Entry into force of request for substantive examination