CN116363152A - Image segmentation method, method and device for training image segmentation model

Image segmentation method, method and device for training image segmentation model

Info

Publication number
CN116363152A
CN116363152A
Authority
CN
China
Prior art keywords
image
segmentation
target area
model
segmented
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310268255.5A
Other languages
Chinese (zh)
Other versions
CN116363152B (en)
Inventor
张灵
王法凯
吕乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd
Priority to CN202310268255.5A
Publication of CN116363152A
Application granted
Publication of CN116363152B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/20 Scenes; Scene-specific elements in augmented reality scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20112 Image segmentation details
    • G06T2207/20132 Image cropping
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The embodiment of the application discloses an image segmentation method, a method and a device for training an image segmentation model. The main technical scheme comprises the following steps: acquiring an image to be segmented; inputting the image to be segmented into a first image segmentation model, and obtaining a first segmentation result output by the first image segmentation model, wherein the first segmentation result comprises the position information of at least one first target area in the image to be segmented; mixing the image information of the first target area into a non-first target area according to the first segmentation result to obtain an enhanced image to be segmented; inputting the enhanced image to be segmented into a second image segmentation model, and obtaining a second segmentation result output by the second image segmentation model, wherein the second segmentation result comprises the position information of at least one first target area in the image to be segmented; the first image segmentation model and the second image segmentation model are both trained in advance based on a deep learning model. The method and the device can improve the accuracy rate and recall rate of image segmentation.

Description

Image segmentation method, method and device for training image segmentation model
Technical Field
The present disclosure relates to the field of artificial intelligence and image processing technologies, and in particular, to a method for image segmentation, a method for training an image segmentation model, and an apparatus thereof.
Background
Image segmentation, the technique and process of dividing an image into a plurality of specific regions with unique properties and determining the target regions of interest, is a vital preprocessing step for image recognition and computer vision. Image segmentation has many applications in the medical field, the autonomous driving field, the satellite imaging field, and the like.
In some specialized fields, high accuracy and recall of image segmentation are required; for example, lesion detection in the medical field requires high accuracy and recall of the lesion segmentation results of medical images. Although there are related techniques for segmenting images using a deep learning model, the accuracy and recall of the segmentation results still need to be improved.
Disclosure of Invention
In view of this, the present application provides a method for image segmentation, a method and apparatus for training an image segmentation model, so as to improve the accuracy and recall of image segmentation results.
The application provides the following scheme:
In a first aspect, there is provided an image segmentation method, the method comprising:
acquiring an image to be segmented;
inputting the image to be segmented into a first image segmentation model, and obtaining a first segmentation result output by the first image segmentation model, wherein the first segmentation result comprises the position information of at least one first target area in the image to be segmented;
mixing the image information of the first target area into a non-first target area according to the first segmentation result to obtain an enhanced image to be segmented;
inputting the enhanced image to be segmented into a second image segmentation model, and obtaining a second segmentation result output by the second image segmentation model, wherein the second segmentation result comprises the position information of at least one first target area in the image to be segmented;
the first image segmentation model and the second image segmentation model are both trained in advance based on a deep learning model.
According to an implementation manner of the embodiment of the present application, before the image information of the first target area is mixed into the non-first target area according to the first segmentation result, the method further includes:
and filtering the first target region according to the region volume requirement and/or the shape requirement of each type of first target.
According to an implementation manner in the embodiment of the present application, the first segmentation result further includes position information of a second target area of a preset type in the image to be segmented;
the blending of the image information of the first target area into the non-first target area includes: mixing image information of the first target area into a non-first target area in the second target area;
the second image segmentation model corresponds to the second target;
the second image segmentation model segments the second target area to obtain the position information of the first target area in the second target area as the second segmentation result.
According to an implementation manner of the embodiment of the present application, the blending the image information of the first target area into the non-first target area includes:
and randomly cutting a first cutting area from a first target area, randomly cutting a second cutting area with the same size as the first cutting area from a non-first target area, and inserting the first cutting area into a position corresponding to the second cutting area in the non-first target area.
According to an implementation manner in the embodiment of the present application, inputting the image to be segmented into a first image segmentation model, and obtaining a first segmentation result output by the first image segmentation model includes:
Inputting the image to be segmented into M first image segmentation models, respectively applying scale factors on probability values of each image block belonging to a first target area, which are output by the M first image segmentation models, and obtaining M first segmentation results determined according to the probability values obtained after the scale factors are applied; wherein different first image segmentation models correspond to different scale factors, and M is a positive integer greater than 1;
the method further comprises the steps of: and respectively obtaining corresponding M second segmentation results based on the M first segmentation results, and carrying out fusion processing on the M second segmentation results to obtain a third segmentation result, wherein the third segmentation result comprises the position information of at least one first target area in the image to be segmented.
According to an implementation manner in an embodiment of the present application, the method further includes:
and carrying out fusion processing on the M first segmentation results and the third segmentation results to obtain a fourth segmentation result, wherein the fourth segmentation result comprises the position information of at least one first target area in the image to be segmented.
In a second aspect, there is provided an image segmentation method applied to a medical scene, the method comprising:
Acquiring a medical image;
inputting the medical image into a first image segmentation model, and obtaining a first segmentation result output by the first image segmentation model, wherein the first segmentation result comprises the position information of at least one focus area in the medical image;
mixing the image information of the focus area into a non-focus area according to the first segmentation result to obtain an enhanced medical image;
inputting the enhanced medical image into a second image segmentation model, and obtaining a second segmentation result output by the second image segmentation model, wherein the second segmentation result comprises the position information of at least one focus area in the medical image;
the first image segmentation model and the second image segmentation model are both trained in advance based on a deep learning model.
According to an implementation manner of the embodiment of the present application, the first segmentation result further includes location information of a preset type of organ region in the medical image;
the second image segmentation model corresponds to the organ;
the second image segmentation model segments the region of the preset type organ to obtain the position information of the focus region in the region of the preset type organ as the second segmentation result.
According to an implementation manner of the embodiment of the present application, inputting the medical image into a first image segmentation model, and obtaining a first segmentation result output by the first image segmentation model includes:
inputting the medical image into M first image segmentation models, respectively applying a scale factor on probability values of each image block which is output by the M first image segmentation models and belongs to a first target area, and obtaining M first segmentation results which are determined according to the probability values obtained after the scale factors are applied; wherein different first image segmentation models correspond to different scale factors, and M is a positive integer greater than 1;
the method further comprises the steps of: and respectively obtaining corresponding M second segmentation results based on the M first segmentation results, and carrying out fusion processing on the M second segmentation results to obtain a third segmentation result, wherein the third segmentation result comprises the position information of at least one focus area in the medical image.
In a third aspect, a method of training an image segmentation model is provided, the method comprising:
acquiring training data comprising a plurality of training samples, wherein each training sample comprises an enhanced image sample and a label of at least one first target area annotated for the image sample; the enhanced image sample is obtained by mixing the image information of the first target area in the image sample into a non-first target area;
training a second image segmentation model based on a deep learning model using the training data, the training target comprising: minimizing the difference between the second segmentation result output by the second image segmentation model for the enhanced image sample and the label annotated for the image sample.
In a fourth aspect, there is provided an image segmentation method performed by a cloud server, the method comprising:
acquiring an image to be segmented from a user terminal;
inputting the image to be segmented into a first image segmentation model, and obtaining a first segmentation result output by the first image segmentation model, wherein the first segmentation result comprises the position information of at least one first target area in the image to be segmented;
mixing the image information of the first target area into a non-first target area according to the first segmentation result to obtain an enhanced image to be segmented;
inputting the enhanced image to be segmented into a second image segmentation model, and obtaining a second segmentation result output by the second image segmentation model, wherein the second segmentation result comprises the position information of at least one first target area in the image to be segmented;
returning the information of the second segmentation result to the user terminal;
The first image segmentation model and the second image segmentation model are both trained in advance based on a deep learning model.
In a fifth aspect, there is provided an image segmentation method, the method comprising:
displaying an image to be segmented on a presentation picture of a virtual reality VR or augmented reality AR device;
inputting the image to be segmented into a first image segmentation model, and obtaining a first segmentation result output by the first image segmentation model, wherein the first segmentation result comprises the position information of at least one first target area in the image to be segmented;
mixing the image information of the first target area into a non-first target area according to the first segmentation result to obtain an enhanced image to be segmented;
inputting the enhanced image to be segmented into a second image segmentation model, and obtaining a second segmentation result output by the second image segmentation model, wherein the second segmentation result comprises the position information of at least one first target area in the image to be segmented;
driving the VR device or the AR device to render and display the second segmentation result;
the first image segmentation model and the second image segmentation model are both trained in advance based on a deep learning model.
In a sixth aspect, there is provided an image segmentation apparatus, the apparatus comprising:
an image acquisition unit configured to acquire an image to be segmented;
the first segmentation unit is configured to input the image to be segmented into a first image segmentation model, obtain a first segmentation result output by the first image segmentation model, and the first segmentation result comprises the position information of at least one first target area in the image to be segmented;
the image enhancement unit is configured to mix the image information of the first target area into the non-first target area according to the first segmentation result to obtain an enhanced image to be segmented;
the second segmentation unit is configured to input the enhanced image to be segmented into a second image segmentation model, and acquire a second segmentation result output by the second image segmentation model, wherein the second segmentation result comprises the position information of at least one first target area in the image to be segmented;
the first image segmentation model and the second image segmentation model are both trained in advance based on a deep learning model.
In a seventh aspect, there is provided an apparatus for training an image segmentation model, the apparatus comprising:
a sample acquisition unit configured to acquire training data comprising a plurality of training samples, each training sample comprising an enhanced image sample and a label of at least one first target area annotated for the image sample; the enhanced image sample is obtained by mixing the image information of the first target area in the image sample into a non-first target area;
a model training unit configured to train a second image segmentation model implemented based on a deep learning model using the training data, the training target comprising: minimizing the difference between the second segmentation result output by the second image segmentation model for the enhanced image sample and the label annotated for the image sample.
According to an eighth aspect, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of any of the first to fifth aspects described above.
According to a ninth aspect, there is provided an electronic device comprising:
one or more processors; and
a memory associated with the one or more processors, the memory for storing program instructions that, when read for execution by the one or more processors, perform the steps of the method of any one of the first to fifth aspects above.
According to a specific embodiment provided by the application, the application discloses the following technical effects:
1) According to the image segmentation method and device, through a two-stage image segmentation mode, based on a first segmentation result obtained by a first image segmentation model, image information of a first target area is mixed into a non-first target area to obtain an enhanced image to be segmented, and the enhanced image to be segmented is subjected to image segmentation again through a second image segmentation model, so that the accuracy rate and recall rate of image segmentation are improved.
2) According to the method and the device, the first target area and the second target area can be obtained through the first image segmentation model, then the second image segmentation model which corresponds to the second target and is more specific is used for carrying out more precise segmentation in the second target area, and the more precise second target area is obtained.
3) According to the method, different scale factors are applied to the first image segmentation model, so that a plurality of first segmentation results with different sensitivities are obtained, and a second segmentation result obtained based on the plurality of first segmentation results is fused, so that FN (false negative example) rate of the obtained segmentation results is reduced.
4) In the application, after the image information of the first target area in the image sample is mixed into the non-first target area, the image information is used for training the second image segmentation model, so that the segmentation effect of the second image segmentation model is improved, and the FP (false positive) rate of the image segmentation result is reduced.
5) The two-stage image segmentation mode can be applied to medical scenes and used for segmenting focus areas of medical images, so that the accuracy rate and recall rate of segmentation results are improved, and the accuracy of focus detection is further improved.
Of course, not all of the above-described advantages need be achieved at the same time in practicing any one of the products of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of a system architecture to which embodiments of the present application are applicable;
fig. 2 is a flowchart of an image segmentation method provided in an embodiment of the present application;
FIG. 3 is a flowchart of a method for training a second image segmentation model according to an embodiment of the present application;
fig. 4 is a flowchart of an image segmentation method applied to a medical scene according to an embodiment of the present application;
FIG. 5 is a schematic block diagram of an image segmentation apparatus provided in an embodiment of the present application;
FIG. 6 is a schematic block diagram of an apparatus for training an image segmentation model provided in an embodiment of the present application;
fig. 7 is a schematic block diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application are within the scope of the protection of the present application.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" as used herein merely describes an association between associated objects and means that three relationships may exist; for example, A and/or B may represent: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining" or "in response to detecting". Similarly, the phrase "if determined" or "if (stated condition or event) is detected" may be interpreted as "when determined" or "in response to determining" or "when (stated condition or event) is detected" or "in response to detecting (stated condition or event)", depending on the context.
In most of the conventional image segmentation methods, an image segmentation model is trained by using an image sample and a label of a target area corresponding to the image sample, and image segmentation is performed by the image segmentation model. However, the accuracy and recall rate of the image segmentation result obtained in some special scenes in this way are low, and particularly in similar complex scenes such as segmentation of focal areas of medical images, the accuracy and recall rate need to be improved.
In view of this, the present application provides a new image segmentation concept that adopts a two-stage image segmentation method to improve the accuracy and recall of image segmentation. To facilitate an understanding of the present application, a brief description of the system architecture to which it applies is first provided. Fig. 1 shows an exemplary system architecture to which embodiments of the present application may be applied. As shown in fig. 1, the architecture includes a model training device and an image segmentation device, and the embodiments of the present application mainly involve two image segmentation models, referred to as a first image segmentation model and a second image segmentation model, respectively.
The model training device is used for performing model training in an off-line stage. That is, after training data is obtained, model training may be performed by using the method provided in the embodiment of the present application to obtain a first image segmentation model and a second image segmentation model. The first image segmentation model and the second image segmentation model can be trained by using different model training devices, and only one model training device is shown in the system.
The image segmentation device is used for carrying out image segmentation processing on the image to be segmented by utilizing the trained first image segmentation model and the trained second image segmentation model on the line to obtain an image segmentation result.
The model training device and the image segmentation device may each be provided as an independent server, may be provided in the same server or server group, or may be provided in independent or shared cloud servers. A cloud server, also called a cloud computing server or cloud host, is a host product in a cloud computing service system, intended to overcome the defects of high management difficulty and weak service expansibility of traditional physical hosts and virtual private server (VPS, Virtual Private Server) services. The model training device and the image segmentation device may also be provided on a computer terminal with strong computing power.
In addition to online image segmentation, the image segmentation device may also perform image segmentation offline, for example performing image segmentation on a batch of images to be segmented.
It should be understood that the number of model training means, image segmentation means, first image segmentation model and second image segmentation model in fig. 1 is only illustrative. There may be any number of model training means, image segmentation means, first image segmentation model and second image segmentation model, as required by the implementation.
It should be noted that terms such as "first" and "second" in this disclosure do not imply any limitation in size, order, or number, and are merely used to distinguish items by name. For example, "first image segmentation model" and "second image segmentation model" distinguish the two models by name. For another example, "first segmentation result" and "second segmentation result" distinguish the two segmentation results by name. For another example, "first target" and "second target" distinguish the two targets by name.
Fig. 2 is a flowchart of an image segmentation method according to an embodiment of the present application, which may be performed by the image segmentation apparatus in the system shown in fig. 1. As shown in fig. 2, the method may include the steps of:
Step 202: and acquiring an image to be segmented.
Step 204: inputting an image to be segmented into a first image segmentation model, and obtaining a first segmentation result output by the first image segmentation model, wherein the first segmentation result comprises the position information of at least one first target area in the image to be segmented.
Step 206: and mixing the image information of the first target area into the non-first target area according to the first segmentation result to obtain the enhanced image to be segmented.
Step 208: inputting the enhanced image to be segmented into a second image segmentation model, and obtaining a second segmentation result output by the second image segmentation model, wherein the second segmentation result comprises the position information of at least one first target area in the image to be segmented; the first image segmentation model and the second image segmentation model are both obtained through pre-training based on a deep learning model.
From the above flow, according to the two-stage image segmentation method, based on the first segmentation result obtained by the first image segmentation model, the image information of the first target area is mixed into the non-first target area to obtain the enhanced image to be segmented, and the image segmentation is performed again on the enhanced image to be segmented by the second image segmentation model, so that the accuracy rate and recall rate of image segmentation are improved.
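Purely as an illustration of the flow of steps 202 to 208, a minimal Python/PyTorch sketch is given below; the model objects, the blend_fn helper, and the probability threshold are hypothetical placeholders for illustration and are not part of the claimed method.

```python
import torch

def two_stage_segmentation(image, first_model, second_model, blend_fn, threshold=0.5):
    """Sketch of the two-stage segmentation flow (steps 202-208).

    image        : tensor of shape (1, C, H, W) or (1, C, D, H, W)
    first_model  : pre-trained first image segmentation model (assumed to return logits)
    second_model : pre-trained second image segmentation model (assumed to return logits)
    blend_fn     : mixes first-target image information into non-first-target regions
    """
    with torch.no_grad():
        # Step 204: coarse segmentation by the first image segmentation model.
        first_prob = torch.sigmoid(first_model(image))
        first_mask = first_prob > threshold          # first segmentation result

        # Step 206: mix first-target information into non-first-target regions.
        enhanced = blend_fn(image, first_mask)

        # Step 208: segment the enhanced image with the second image segmentation model.
        second_prob = torch.sigmoid(second_model(enhanced))
        second_mask = second_prob > threshold        # second segmentation result
    return first_mask, second_mask
```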
The above steps are described in detail below. The images to be segmented referred to in the present application generally contain different objects in different fields of application. For example, in the field of face recognition, the image to be segmented is typically an image containing a face, and the purpose of image segmentation is to determine the location of a face region contained in the image. For another example, in the medical field, the image is typically a medical image, such as a contrast image, a CT (computed tomography) image, an MRI (Magnetic Resonance Imaging) image, an ultrasound examination image, or the like, including organs such as lung, liver, pancreas, colon, or including lesions such as cysts, tumors, tuberculosis, polyps, stones, inflammation, or the like.
The image to be segmented may be a two-dimensional image or a three-dimensional image, and may be either a grayscale image or a color image.
The step 204, namely, "inputting the image to be segmented into the first image segmentation model and obtaining the first segmentation result output by the first image segmentation model", will be described in detail with reference to the embodiments.
In this embodiment of the present application, the first image segmentation model is configured to perform an initial segmentation process on an image to be segmented, segment the image to be segmented into a plurality of regions, and identify a region corresponding to a first target, that is, a first target region.
If the image to be segmented is a two-dimensional image, the first image segmentation model can divide it into tiles, where each tile may comprise one pixel or a plurality of pixels. After extracting the feature representation of each tile, the first image segmentation model classifies each tile using its feature representation and determines the probability that the tile belongs to the preset first target; these probabilities are used to determine which tiles correspond to the first target, i.e., the position information of the first target area.
The first image segmentation model may employ, among other things, a U-Net, an FCN (Fully Convolutional Network), or the like. U-Net is preferred; it is a variant of FCN originally proposed for biomedical imaging problems and, owing to its effectiveness, has since been widely used in many image segmentation fields. Both U-Net and FCN employ an Encoder-Decoder structure, where the Encoder is responsible for feature extraction and the Decoder is responsible for classification for target prediction.
If the image to be segmented is a three-dimensional image, the first image segmentation model may divide it into tiles, where each tile may comprise one voxel or a plurality of voxels. After extracting the feature representation of each tile, the first image segmentation model classifies each tile using its feature representation and determines the probability that the tile belongs to the preset first target; these probabilities are used to determine which tiles correspond to the first target, i.e., the position information of the first target area.
The first image segmentation model may adopt a model such as 3D U-Net, or it may convert the three-dimensional image to be segmented into a sequence of two-dimensional slices (2D slices), segment each slice with a network such as U-Net or FCN, and then combine the outputs of the different two-dimensional slices into the output for the three-dimensional volume. 3D U-Net is preferred; unlike U-Net, 3D U-Net uses three-dimensional convolutions, which is reflected mainly in the channel variations, i.e., 3 channels.
In this step, a first image segmentation model obtained by training in advance is used. The first image segmentation model is trained in a currently common way, and the training process may comprise the following steps: training data comprising a plurality of first training samples is first acquired, wherein each first training sample comprises an image sample and a label of at least one first target area annotated for the image sample. The first image segmentation model is trained using this training data, with the training target comprising: minimizing the difference between the first segmentation result output by the first image segmentation model for the image sample and the label annotated for the image sample.
The image sample is an image sample containing a first target, and the first target is a target of a preset type. Taking the medical scenario as an example, the first target may be a preset type of organ, such as lung, liver, pancreas, colon, etc., or a preset type of lesion, such as cyst, tumor, tuberculosis, polyp, calculus, inflammation, etc. Accordingly, the image sample may be a medical image containing a preset type of organ or a medical image containing a preset type of lesion. Taking an autopilot scenario as an example, the first target may be a preset type of obstacle, e.g. a vehicle, a pedestrian, etc., and the image sample may accordingly be a picture containing the preset type of obstacle.
The labeling of the image sample may be performed manually. However, given the high cost and difficulty of obtaining such labeled samples (e.g., they often require expert knowledge), the number of such labeled samples is relatively small. In the embodiment of the application, an initial first image segmentation model can be trained on the basis of the limited first training samples; then unlabeled image samples are segmented using the initial first image segmentation model, and the obtained first segmentation results are used to label those image samples, forming pseudo labels. After the generated pseudo labels are evaluated, screened, or corrected, the initial first image segmentation model is further trained using the image samples with pseudo labels to obtain an updated first image segmentation model. The updated first image segmentation model is then used to generate further pseudo labels, which are used to further train the first image segmentation model. By analogy, after several training iterations, the final first image segmentation model is obtained.
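The iterative pseudo-label training described above could be organized roughly as in the sketch below; train_fn, screen_fn and model.predict are hypothetical stand-ins for the actual supervised trainer, the pseudo-label evaluation/screening step, and the model inference call.

```python
def self_training_first_model(labeled_pairs, unlabeled_images, train_fn, screen_fn, rounds=3):
    """Sketch of iterative pseudo-label training of the first image segmentation model.

    labeled_pairs    : list of (image, label) pairs with manual annotations
    unlabeled_images : list of image samples without annotations
    train_fn(pairs)  : trains a segmentation model on (image, label) pairs and returns it
    screen_fn(image, pseudo) : True if the pseudo label passes evaluation/screening/correction
    """
    model = train_fn(labeled_pairs)                      # initial first model on limited labels
    for _ in range(rounds):                              # several training iterations
        pseudo_pairs = []
        for image in unlabeled_images:
            pseudo = model.predict(image)                # first segmentation result as pseudo label
            if screen_fn(image, pseudo):
                pseudo_pairs.append((image, pseudo))
        model = train_fn(labeled_pairs + pseudo_pairs)   # further train with screened pseudo labels
    return model
```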
The labeling of the first target area in the image sample may be performed in a mask manner, i.e. the first target area is masked in a specific manner.
In addition to obtaining a first segmentation result of the image to be segmented using a single first image segmentation model as described above, a plurality of first segmentation results of different sensitivities may be obtained by adjusting the scale factor of the first image segmentation model, in order to reduce the FN (False Negative) rate of the image segmentation. FN refers to erroneously identifying a positive instance (a tile of the first target) as a negative instance (a tile of a non-first target).
As one of the realizable modes, inputting an image to be segmented into M first image segmentation models, respectively applying a scale factor on probability values of each block belonging to a first target area output by the M first image segmentation models, and obtaining M first segmentation results determined according to the probability values obtained after the scale factors are applied; wherein the different first image segmentation models correspond to different scale factors, and M is a positive integer greater than 1.
The first target region is typically identified based on the relationship between the probability value of each tile belonging to the first target region and a predetermined threshold: if the probability value is greater than the predetermined threshold, the tile is considered to belong to the first target region; otherwise it is not. Applying a scale factor to the probability values means applying the same scale factor f to each probability value output by one first image segmentation model, and applying different scale factors f to the probability values output by different first image segmentation models. Thus, the larger the applied scale factor, the larger the first target area obtained by the corresponding first image segmentation model; conversely, the smaller the applied scale factor, the smaller the first target area. This is what is meant by different sensitivities. An appropriate balance of sensitivities needs to be chosen; in embodiments of the present application, first segmentation results of several sensitivities may be chosen for the subsequent processing steps.
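As a rough numerical illustration of how different scale factors give first segmentation results of different sensitivities (assuming the model output is a per-tile probability map), consider the following sketch; the scale factor values and the random probability map are illustrative assumptions only.

```python
import numpy as np

def scaled_threshold(prob_map, scale_factor, threshold=0.5):
    """Apply a scale factor f to per-tile probabilities before thresholding.

    A larger f enlarges the predicted first target region (higher sensitivity),
    a smaller f shrinks it (lower sensitivity).
    """
    return np.clip(prob_map * scale_factor, 0.0, 1.0) > threshold

# hypothetical example with two first image segmentation models / scale factors
prob_map = np.random.rand(64, 64)                # stand-in probability map
sens_low  = scaled_threshold(prob_map, 0.8)      # low-sensitivity result (e.g. Sens1)
sens_high = scaled_threshold(prob_map, 1.5)      # high-sensitivity result (e.g. Sens4)
```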
In the present embodiment it is assumed that two scale factors f1 and f2 are applied to two first image segmentation models respectively, obtaining first segmentation results of two sensitivities, denoted Sens1 and Sens4, which represent a low-sensitivity first segmentation result and a high-sensitivity first segmentation result.
In addition, after the first segmentation result is obtained in step 204, an optimization process may be further performed on the first segmentation result. As one of the realizations, the first target region may be filtered according to region volume requirements and/or shape requirements of each type of first target.
In a specific scene, when the target regions of interest in the image have particular volume or shape requirements, first target regions that do not meet the region volume requirement and/or the shape requirement can be deleted from the segmented first target regions as a filtering process. For example, in a medical scene, regions with a very small volume (e.g., less than 0.1 cm³), such as stones or cysts, may be filtered out, i.e., considered to belong to the background area rather than the first target area. For example, in an unmanned-driving scene, the preset type of obstacle often has a specific shape, so obstacle areas that do not meet the shape requirement can be filtered out.
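A minimal sketch of volume-based filtering of segmented regions is shown below, assuming a binary mask and SciPy connected-component labelling; the per-voxel volume argument and the 0.1 cm³ threshold follow the example above and are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def filter_regions_by_volume(mask, voxel_volume_cm3, min_volume_cm3=0.1):
    """Drop connected first-target regions whose volume is below the requirement."""
    labeled, num_regions = ndimage.label(mask)          # connected components of the mask
    kept = np.zeros_like(mask, dtype=bool)
    for region_id in range(1, num_regions + 1):
        region = labeled == region_id
        if region.sum() * voxel_volume_cm3 >= min_volume_cm3:
            kept |= region                              # region meets the volume requirement
    return kept
```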
In addition, the first segmentation result obtained by segmentation of the first image segmentation model may further include position information of the second target region of the preset type, in addition to the position information of the first target region. In the medical field, the second target may be a preset type of organ and the first target may be a lesion.
The following describes the step 206 in detail, namely, "the image information of the first target area is mixed into the non-first target area according to the first segmentation result, so as to obtain the enhanced image to be segmented".
The "blending" referred to in this step means that a part of pixels in the image of the partial first target area is added in the non-first target area or a part of pixels in the non-first target area is replaced with a part of pixels in the image of the first target area without changing the respective image properties of the first target area and the non-first target area. For example, a first crop area may be randomly cropped from a first target area and a second crop area of the same size as the first crop area may be randomly cropped from a non-first target area, and the first crop area may be inserted into a position of the second crop area corresponding to the non-first target area, thereby forming a hybrid tile. In a scene of segmenting a focus area, such as a medical field, the feature information of the focus area is relatively fine and difficult to distinguish, and the mixed image blocks obtained through the processing of the step can enhance the information of a first target (such as a focus) of an image to be segmented, namely enhance the context awareness of the image to be segmented.
As one of the realizations, when the first clipping region is randomly clipped from the first target region, the clipped first clipping region includes the centroid of the first target region.
The number of mixed tiles to be formed can be preset, for example to an empirical or experimental value.
If the first segmentation result obtained by the first image segmentation model includes, in addition to the position information of the first target region, the position information of a second target region of a preset type, then in this step the image information of the first target region within the second target region may be mixed into the non-first target region within the second target region. Take the first target being a lesion and the second target being an organ as an example: a first crop region may be randomly cropped from a lesion region in the organ region, this crop region including the lesion centroid; a second crop region of the same size as the first crop region is randomly cropped from a non-lesion region in the organ region; and the first crop region is inserted at the position of the second crop region in the non-lesion region.
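A sketch of the crop-and-paste blending for a single-channel 2D image is given below; the crop size, the retry loop, and forcing the first crop window to contain the region centroid are illustrative assumptions rather than requirements of this application.

```python
import numpy as np

def blend_target_into_background(image, target_mask, crop=32, rng=None):
    """Paste a patch cropped from the first target region (around its centroid)
    over an equally sized, randomly chosen patch in the non-target region."""
    rng = rng or np.random.default_rng()
    h, w = target_mask.shape
    cy, cx = np.argwhere(target_mask).mean(axis=0).astype(int)   # centroid of the target region
    y0 = int(np.clip(cy - crop // 2, 0, h - crop))               # first crop contains the centroid
    x0 = int(np.clip(cx - crop // 2, 0, w - crop))
    patch = image[y0:y0 + crop, x0:x0 + crop].copy()
    ys, xs = np.where(~target_mask)                              # candidate non-target positions
    y1, x1 = 0, 0
    for _ in range(100):                                         # retry until the window avoids the target
        i = rng.integers(len(ys))
        y1 = int(np.clip(ys[i], 0, h - crop))
        x1 = int(np.clip(xs[i], 0, w - crop))
        if not target_mask[y1:y1 + crop, x1:x1 + crop].any():
            break
    enhanced = image.copy()
    enhanced[y1:y1 + crop, x1:x1 + crop] = patch                 # insert the first crop
    return enhanced
```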
The step 208, namely, "the enhanced image to be segmented is input into the second image segmentation model, and the second segmentation result output by the second image segmentation model is obtained" will be described in detail below with reference to the embodiment.
The second image segmentation model can adopt the same structure as the first image segmentation model, for example a U-Net or 3D U-Net network; however, since the second image segmentation model is trained on enhanced image samples, it can effectively reduce the influence of noise and has a stronger image segmentation capability.
Further, a more specific second image segmentation model may be employed, and segmentation may be performed in a more specific region according to the first segmentation result. As one realizable manner, the first segmentation result obtained by the first image segmentation model may include, in addition to the position information of the first target region, the position information of a second target region of a preset type. In the medical field, the second target may be a preset type of organ and the first target may be a lesion. The second image segmentation model may be a model corresponding to the second target, and there may be a plurality of such models. The second image segmentation model segments the second target area to obtain the position information of the first target area within the second target area as the second segmentation result.
For example, the first segmentation result obtained through the first image segmentation model includes a liver region and a lesion region. The focal area is here a coarser segmentation. A specific second image segmentation model for the liver may be selected, for example, to identify a liver lesion. More specifically, a plurality of second image segmentation models may be selected to segment lesions such as tumors of the liver, inflammations of the liver, and the like, respectively.
In this way, segmentation by the second image segmentation model is performed within the second target region using a more specific model, which reduces FP (False Positive) and improves the accuracy of image segmentation. FP refers to erroneously identifying a negative instance (a tile of a non-first target) as a positive instance (a tile of the first target).
The training process of the second image segmentation model may be as shown in fig. 3, comprising the steps of:
step 302: acquiring training data comprising a plurality of second training samples, wherein the second training samples comprise enhanced image samples and at least one first target area labeled by the image samples; the enhanced image sample is obtained by mixing the image information of the first target area in the image sample into the non-first target area.
In this step, the image sample and the label of at least one first target area to which the image sample is labeled may be first acquired. The image sample is an image sample containing a first target, and the first target is a target of a preset type. Taking the medical scenario as an example, the first target may be a preset type of organ, such as lung, liver, pancreas, colon, etc., or a preset type of lesion, such as cyst, tumor, tuberculosis, polyp, calculus, inflammation, etc. Accordingly, the image sample may be a medical image containing a preset type of organ or a medical image containing a preset type of lesion. Taking an autopilot scenario as an example, the first target may be a preset type of obstacle, e.g. a vehicle, a pedestrian, etc., and the image sample may accordingly be a picture containing the preset type of obstacle.
The labeling of the image sample may be manually performed. The image sample may be subjected to image segmentation by using the trained first image segmentation model, and then labeled according to the obtained first segmentation result.
And then mixing the image information of the first target area in the image sample into the non-first target area to obtain the enhanced image sample. For example, a first clipping region is randomly clipped from a first target region, and a second clipping region of the same size as the first clipping region is randomly clipped from a non-first target region, and the first clipping region is inserted into a position corresponding to the second clipping region in the non-first target region. The processing of this portion is similar to that of step 206 shown in fig. 2 and will not be described in detail herein.
Step 304: training a second image segmentation model based on a deep learning model using training data, the training targets comprising: the difference between the second segmentation result output by the second image segmentation model for the enhanced image sample and the labeled label of the image sample is minimized.
The enhanced image sample is input into a second image segmentation model, the second image segmentation model extracts the characteristic representation of each block from the enhanced image sample, the characteristic representation of each block is utilized to classify, the probability that each block belongs to a preset first target is determined, and the probability is used for determining the position information of the areas of which blocks correspond to the first target, namely the first target area, as a second segmentation result.
The training target is actually that the second segmentation result should be as consistent as possible with the label annotated for the image sample, that is, the segmented first target region should be consistent with the annotated first target region. A loss function is constructed according to this training target, and in each training iteration the value of the loss function is used to update the model parameters of the second image segmentation model, for example by gradient descent, until a preset training end condition is met. The training end condition may include, for example, the value of the loss function being less than or equal to a preset loss function threshold, the number of iterations reaching a preset threshold, etc.
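A minimal PyTorch-style training loop matching the stated training target is sketched below; the binary cross-entropy loss, the optimizer, and the stopping thresholds are assumptions for illustration, not requirements of this application.

```python
import torch
import torch.nn as nn

def train_second_model(model, loader, epochs=50, lr=1e-4, loss_threshold=1e-3):
    """Sketch: train the second segmentation model on (enhanced image, label mask) pairs."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.BCEWithLogitsLoss()                  # illustrative choice of loss
    for epoch in range(epochs):
        running = 0.0
        for enhanced_image, label_mask in loader:
            optimizer.zero_grad()
            logits = model(enhanced_image)              # second segmentation result (logits)
            loss = criterion(logits, label_mask.float())
            loss.backward()                             # gradient-descent style parameter update
            optimizer.step()
            running += loss.item()
        if running / max(len(loader), 1) <= loss_threshold:
            break                                       # preset training end condition met
    return model
```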
When training a specific second image segmentation model, specific classes of image samples and labels are used. For example, a second image segmentation model for segmenting liver tumors may employ a medical image sample of the liver and a tumor region labeled with the medical image sample. For another example, a second image segmentation model that segments liver inflammation may employ a medical image sample of the liver and an inflammation region labeled with the medical image sample.
As can be seen from the flow shown in fig. 3, the present application performs context-aware enhancement on the image sample by mixing the image information of the first target area into the non-first target area, so that the second image segmentation model obtained by training in this way can improve the accuracy rate and recall rate of image segmentation.
With continued reference to step 208 in fig. 2. If M first image segmentation models are used in step 204, M first segmentation results of different sensitivities are obtained. The steps 206 and 208 may be performed for the first segmentation results of different sensitivities, respectively, that is, enhancement processing is performed on the image to be processed based on the first segmentation results of different sensitivities, so as to obtain M enhanced images to be processed; and then, respectively carrying out segmentation processing on the M enhanced images to be processed by using a second image segmentation model to obtain M second segmentation results. And then carrying out fusion processing on the M second segmentation results to obtain a third segmentation result, wherein the third segmentation result comprises the position information of at least one first target area in the image to be segmented. Where the fusion process may be such as averaging, interpolation, voting mechanisms, etc.
As one of the realizations, the second segmentation result may be taken as the final segmentation result of the image to be processed.
As another possible way, the third segmentation result may be used as the final segmentation result of the image to be processed.
As still another possible way, the above M first segmentation results and the third segmentation result may be fused to obtain a fourth segmentation result, where the fourth segmentation result is used as a final segmentation result of the image to be processed. The fourth segmentation result comprises the position information of at least one first target region in the image to be segmented. Where the fusion process may be such as averaging, interpolation, voting mechanisms, etc.
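One plausible realisation of these fusion steps (producing the third and fourth segmentation results) is majority voting over binary masks, sketched below; averaging or interpolation could be substituted, and the specific voting rule and variable names are assumptions for illustration.

```python
import numpy as np

def fuse_masks(masks, min_votes=None):
    """Fuse binary segmentation masks by majority voting."""
    stacked = np.stack([np.asarray(m, dtype=np.uint8) for m in masks])
    if min_votes is None:
        min_votes = len(masks) // 2 + 1                  # simple majority
    return stacked.sum(axis=0) >= min_votes

# hypothetical usage with M = 2 sensitivities:
# third  = fuse_masks([second_result_1, second_result_2])        # third segmentation result
# fourth = fuse_masks([first_result_1, first_result_2, third])   # fourth segmentation result
```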
The method provided by the embodiment of the application can be applied to image segmentation of various application scenes, such as medical scenes, unmanned scenes, satellite imaging scenes and the like. The above method is described herein by way of example in the context of a medical scenario.
Deep learning models for automatic lesion detection have been applied to many organs; machine learning models can help automatically localize and classify organ lesions, which not only reduces medical costs (i.e., reduces reliance on expert knowledge) but also improves diagnostic performance. Machine learning models can surpass humans in acuity and recognition of suspicious patterns, improving the detection rate of early lesions. After the image segmentation model segments the medical image to obtain a lesion area, the lesion area can be used for statistical analysis in medical research and to assist diagnosis and treatment.
The flow of the image segmentation method applied to a medical scene according to the above-described method embodiment of the present application will be described below by taking detection of liver tumor as an example, as shown in fig. 4. First, a medical image is acquired. The medical image may be, for example, a contrast image, a CT image, an MRI image, an ultrasound examination image, or the like, in this example, a CT image of the liver, i.e., an image obtained by computed tomography of the liver.
The medical image, i.e. the liver CT image, is then input into the first image segmentation model. And acquiring a first segmentation result output by the first image segmentation model, wherein the first segmentation result comprises the position information of the focus area in the medical image. Furthermore, in order to better identify liver tumors, the first image segmentation model may also segment the second object, i.e. the liver region, simultaneously. In this example, the output first segmentation result includes a liver region and a lesion region in the liver CT image.
Here the first image segmentation model is pre-trained based on a deep learning model (e.g., a 3D U-Net model). CT images of some organs may be collected in advance and labeled with organs and lesions. Since labeling data for organ CT images are obtained by cooperative diagnosis by radiologists, liver specialists, case specialists, and the like, labeling is costly and such labeled data are difficult to obtain. On the basis of limited liver CT image annotation data (i.e., liver CT images together with annotations of the liver areas and lesion areas in them), an initial first image segmentation model is trained first; then the unlabeled liver CT images are segmented using the initial first image segmentation model, and the obtained first segmentation results are used to annotate the liver CT images, forming pseudo labels. After the generated pseudo labels are evaluated, screened, or corrected, the initial first image segmentation model is further trained using the liver CT images with pseudo labels to obtain an updated first image segmentation model. The updated first image segmentation model is then used to generate further pseudo labels, which are used to further train the first image segmentation model. By analogy, after several training iterations, the final first image segmentation model is obtained. After a pseudo label is obtained, area calculation and lesion matching algorithms may be used to verify or correct it; existing algorithms can be used for this part and are not described in detail herein.
It should be noted that the first image segmentation model may segment the liver region and the lesion region of the liver, and may likewise segment the regions and lesion regions of multiple organs.
In the above process, two first image segmentation models can be adopted, and different scale factors are applied to the probability, output by each image segmentation model, that each block in the liver CT image belongs to the lesion area, so as to obtain first segmentation results with different sensitivities, denoted Sens1 and Sens4.
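As an illustration of how results of different sensitivity can be derived from a probability output, the sketch below scales the foreground probabilities before thresholding. The factor values 1.0 and 4.0 merely echo the Sens1/Sens4 labels and are an assumption; the actual factors and threshold used by the embodiment are not specified here.

import numpy as np

def scaled_segmentation(prob_map, scale_factor, threshold=0.5):
    # prob_map: per-voxel probability of belonging to the lesion area,
    # as output by one of the first image segmentation models.
    # A larger scale factor raises the effective probability, making the
    # binarized result more sensitive (fewer false negatives).
    scaled = np.clip(prob_map * scale_factor, 0.0, 1.0)
    return scaled > threshold

# Example: two first segmentation results of different sensitivity from
# the same probability map.
prob = np.random.rand(64, 64, 64).astype(np.float32)
sens1 = scaled_segmentation(prob, scale_factor=1.0)
sens4 = scaled_segmentation(prob, scale_factor=4.0)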
On the basis of the obtained first segmentation results, the image information of the lesion area is mixed into the non-lesion area to obtain the enhanced medical images, i.e. enhanced liver CT images, corresponding to Sens1 and Sens4. Specifically, a three-dimensional first clipping region may be randomly clipped from the lesion area within the liver region, this region containing the centroid of the lesion area; a second clipping region of the same size as the first clipping region may be randomly clipped from a non-lesion area within the liver region; and the first clipping region may be inserted into the non-lesion area at the position of the second clipping region (along the z-axis at that location) to obtain an enhanced liver CT image.
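A minimal sketch of this crop-and-paste enhancement step is given below, assuming binary lesion and organ masks as NumPy arrays; boundary handling and validity checks are simplified, and the crop size is an illustrative value, not one fixed by the text.

import numpy as np

def mix_lesion_into_non_lesion(volume, lesion_mask, organ_mask, crop_size=(16, 16, 16)):
    # Copy a cube of image content containing the lesion centroid and paste it
    # at a randomly chosen non-lesion position inside the organ.
    out = volume.copy()
    size = np.array(crop_size)
    shape = np.array(volume.shape)

    # Source cube: centred on the lesion centroid, clamped to stay inside the volume.
    centroid = np.argwhere(lesion_mask).mean(axis=0).round().astype(int)
    sz, sy, sx = np.clip(centroid - size // 2, 0, shape - size)
    cube = out[sz:sz + size[0], sy:sy + size[1], sx:sx + size[2]].copy()

    # Target cube: a random voxel inside the organ but outside the lesion,
    # again clamped so the cube stays inside the volume.
    candidates = np.argwhere(organ_mask & ~lesion_mask)
    target = candidates[np.random.randint(len(candidates))]
    tz, ty, tx = np.clip(target - size // 2, 0, shape - size)

    out[tz:tz + size[0], ty:ty + size[1], tx:tx + size[2]] = cube
    return out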
The enhanced medical image, i.e. the enhanced liver CT image, is input into the second image segmentation model to obtain a second segmentation result output by the second image segmentation model. The second segmentation result includes the position information of at least one lesion area in the medical image, i.e. the lesion area in the liver CT image. The second image segmentation model is also trained in advance based on a deep learning model; it may be an image segmentation model dedicated to the liver, and there may be one or more such models. As one implementation, the second image segmentation model may be a segmentation model for a specific liver lesion, i.e. a lesion segmentation model dedicated to the liver, for example a second image segmentation model for tumors, a second image segmentation model for fatty liver, and so on, so that lesions such as liver tumors and fatty liver within the liver region can be re-classified at a finer granularity, thereby obtaining the second segmentation result.
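As an illustration of using several lesion-specific second models, the following sketch routes the enhanced liver CT image through a set of hypothetical per-lesion-type models and collects their outputs into one second segmentation result; the model names and the dictionary-based interface are assumptions, not details fixed by the text.

def second_stage_segmentation(enhanced_volume, lesion_models):
    # lesion_models: e.g. {"tumor": tumor_model, "fatty_liver": fatty_liver_model};
    # each model is assumed to map the enhanced volume to a binary lesion mask.
    results = {}
    for lesion_type, model in lesion_models.items():
        results[lesion_type] = model(enhanced_volume)
    return results  # the second segmentation result, organised per lesion type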
Second segmentation results are obtained for both the enhanced liver CT images corresponding to Sens1 and Sens4, and the second segmentation results can be fused to obtain a third segmentation result.
The first segmentation results Sens1 and Sens4 of the liver CT image and the third segmentation result can then be further fused to obtain a fourth segmentation result of the liver CT image as the final segmentation result. The final segmentation result contains the position information of the lesion in the liver: for example, if a liver tumor area is segmented, the final segmentation result contains the position information of the liver tumor area; if a fatty liver region is segmented, the final segmentation result contains the position information of the fatty liver region.
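The fusion operator itself is not specified in the text; purely as an illustration, the sketch below fuses binary masks by voxel-wise voting (the fuse_masks helper and the vote threshold are assumptions).

import numpy as np

def fuse_masks(masks, min_votes=None):
    # Voxel-wise voting fusion of binary segmentation masks.
    stack = np.stack([m.astype(np.uint8) for m in masks], axis=0)
    if min_votes is None:
        min_votes = (len(masks) // 2) + 1
    return stack.sum(axis=0) >= min_votes

# Third result: fuse the second segmentation results obtained from the
# Sens1- and Sens4-enhanced images.
# third = fuse_masks([second_sens1, second_sens4])
# Final (fourth) result: fuse the first results with the third result.
# final = fuse_masks([first_sens1, first_sens4, third])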
Based on the final segmentation result, information on the lesion area may be output for reference by a user (e.g., a patient or a physician). It should be noted that when applied to the medical field, the method provided in the present application only concerns the segmentation processing of the image; the segmentation result, for example the lesion area, may be output by marking it in the image, highlighting it in the image, or the like.
By outputting first segmentation results with different sensitivities, false negatives in liver tumor detection can be effectively reduced; and by mixing the image information of the lesion area into the non-lesion area to obtain the enhanced liver CT image and performing further segmentation on it, false positives in liver tumor detection can be effectively reduced.
As one possible embodiment, the image segmentation method described above may be executed by a cloud server, that is, the image segmentation function is integrated in the cloud and an image segmentation service is provided to users. A cloud server, also called a cloud computing server or cloud host, is a host product in a cloud computing service system, intended to overcome the drawbacks of high management difficulty and poor service scalability in traditional physical hosts and Virtual Private Server (VPS) services.
When a user wishes to perform image segmentation on a certain image, the image can be used as an image to be segmented, and the image to be segmented is uploaded to a cloud server through a user terminal.
The above-mentioned user terminal may be, but is not limited to: a mobile phone, a tablet, a notebook computer, a PDA (Personal Digital Assistant), a wearable device, a PC (Personal Computer), etc. The wearable device may include devices such as smart watches, smart glasses, smart bracelets, virtual reality devices, augmented reality devices, mixed reality devices (i.e., devices that can support both virtual reality and augmented reality), and so on.
The cloud server acquires the image to be segmented from the user terminal; inputs the image to be segmented into a first image segmentation model and obtains a first segmentation result output by the first image segmentation model, where the first segmentation result includes the position information of at least one first target area in the image to be segmented; mixes the image information of the first target area into a non-first target area according to the first segmentation result to obtain an enhanced image to be segmented; inputs the enhanced image to be segmented into a second image segmentation model and obtains a second segmentation result output by the second image segmentation model, where the second segmentation result includes the position information of at least one first target area in the image to be segmented; and returns the information of the second segmentation result to the user terminal.
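By way of illustration of the client side of this interaction only, the following sketch uploads an image to a hypothetical segmentation endpoint; the URL, field names, and response format are assumptions and not an interface defined by this application.

import requests

def segment_via_cloud(image_path, server_url="https://example.com/api/segment"):
    # Upload the image to be segmented; the cloud server runs the first model,
    # the enhancement step and the second model, then returns the second
    # segmentation result.
    with open(image_path, "rb") as f:
        response = requests.post(server_url, files={"image": f}, timeout=60)
    response.raise_for_status()
    return response.json()["segmentation"]  # hypothetical response field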
The specific implementation process of the first image segmentation model and the second image segmentation model may be referred to the relevant description in the above embodiment, which is not repeated herein.
As one possible embodiment, the above-described image segmentation method may be applied to VR (Virtual Reality) or AR (Augmented Reality) scenarios.
First, the image to be segmented is displayed on the presentation picture of a VR or AR device; the image segmentation apparatus inputs the image to be segmented into a first image segmentation model and obtains a first segmentation result output by the first image segmentation model, where the first segmentation result includes the position information of at least one first target area in the image to be segmented; mixes the image information of the first target area into a non-first target area according to the first segmentation result to obtain an enhanced image to be segmented; inputs the enhanced image to be segmented into a second image segmentation model and obtains a second segmentation result output by the second image segmentation model, where the second segmentation result includes the position information of at least one first target area in the image to be segmented; and drives the VR device or AR device to render and present the second segmentation result.
With the content of the above embodiments, the user can view the image to be segmented and the image segmentation result through the VR device or the AR device. The method for mapping the image to be segmented and the image segmentation result into the virtual space structure is not limited, and any method in the prior art can be adopted.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
According to an embodiment of another aspect, an image segmentation apparatus is provided. Fig. 5 shows a schematic block diagram of an image segmentation apparatus according to an embodiment. As shown in fig. 5, the apparatus 500 includes an image acquisition unit 501, a first segmentation unit 502, an image enhancement unit 503, and a second segmentation unit 504, and may further include an image fusion unit 505. The main functions of each constituent unit are as follows:
An image acquisition unit 501 is configured to acquire an image to be segmented.
The first segmentation unit 502 is configured to input the image to be segmented into a first image segmentation model, obtain a first segmentation result output by the first image segmentation model, and the first segmentation result includes position information of at least one first target region in the image to be segmented.
The image enhancement unit 503 is configured to mix the image information of the first target area into the non-first target area according to the first segmentation result, so as to obtain an enhanced image to be segmented.
And a second segmentation unit 504 configured to input the enhanced image to be segmented into a second image segmentation model, and obtain a second segmentation result output by the second image segmentation model, where the second segmentation result includes position information of at least one first target region in the image to be segmented.
The first image segmentation model and the second image segmentation model are both obtained through pre-training based on a deep learning model.
As one of the realizable ways, the first segmentation unit 502 is further configured to filter the first target region according to the region volume requirement and/or the shape requirement of each type of the first target.
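A minimal sketch of such filtering by region volume is given below, using connected-component analysis; the volume threshold is an illustrative value, not one given in the text, and shape-based rules could be added analogously.

import numpy as np
from scipy import ndimage

def filter_regions_by_volume(mask, min_voxels=30):
    # Remove first-target regions (connected components) whose volume is below
    # the requirement for their type.
    labeled, num = ndimage.label(mask)
    keep = np.zeros_like(mask, dtype=bool)
    for region_id in range(1, num + 1):
        component = labeled == region_id
        if component.sum() >= min_voxels:
            keep |= component
    return keep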
As another implementation manner, the first segmentation result may further include position information of a second target area of a preset type in the image to be segmented.
The image enhancement unit 503 may be specifically configured to: image information of the first target area is mixed into the non-first target area in the second target area.
In this case, the second image segmentation model may correspond to the second target. The second image segmentation model segments the second target area to obtain the position information of the first target area within the second target area as the second segmentation result.
As one of the realizations, the image enhancement unit 503 may be specifically configured to: and randomly cutting a first cutting area from a first target area, randomly cutting a second cutting area with the same size as the first cutting area from a non-first target area, and inserting the first cutting area into a position corresponding to the second cutting area in the non-first target area.
As one implementation, the first segmentation unit 502 inputs the image to be segmented into M first image segmentation models, applies a scale factor to the probability values, output by each of the M first image segmentation models, that each image block belongs to a first target area, and obtains M first segmentation results determined according to the probability values obtained after the scale factors are applied; different first image segmentation models correspond to different scale factors, and M is a positive integer greater than 1.
The image fusion unit 505 is configured to obtain corresponding M second segmentation results based on the M first segmentation results, and perform fusion processing on the M second segmentation results to obtain a third segmentation result, where the third segmentation result includes position information of at least one first target area in the image to be segmented.
Further, the image fusion unit 505 may further perform fusion processing on the M first segmentation results and the third segmentation result to obtain a fourth segmentation result as a final segmentation result, where the fourth segmentation result includes position information of at least one first target region in the image to be segmented.
FIG. 6 is a schematic block diagram of an apparatus for training an image segmentation model according to an embodiment of the present application, as shown in FIG. 6, the apparatus may include: a sample acquisition unit 601 and a model training unit 602. Wherein the main functions of each constituent unit are as follows:
a sample acquiring unit 601 configured to acquire training data including a plurality of training samples, the training samples including enhanced image samples and labels of at least one first target area to which the image samples are labeled; the enhanced image sample is obtained by mixing the image information of the first target area in the image sample into the non-first target area.
A model training unit 602, configured to train a second image segmentation model implemented based on a deep learning model using the training data, where the training objective includes minimizing the difference between the second segmentation result output by the second image segmentation model for the enhanced image sample and the label annotated on the image sample.
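The concrete loss function is not fixed by the text; one common way to express this training objective is a combination of cross-entropy and soft Dice loss, sketched below (the combination and its equal weighting are assumptions).

import torch
import torch.nn.functional as F

def segmentation_loss(logits, target, eps=1e-6):
    # logits: raw model output; target: binary label mask as a float tensor.
    ce = F.binary_cross_entropy_with_logits(logits, target)
    probs = torch.sigmoid(logits)
    inter = (probs * target).sum()
    dice = 1.0 - (2.0 * inter + eps) / (probs.sum() + target.sum() + eps)
    # Minimising this sum reduces the difference between the second
    # segmentation result and the annotated label.
    return ce + dice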
As one of the realizable ways, the sample acquiring unit 601, when mixing the image information of the first target area in the image sample into the non-first target area, is specifically configured to: and randomly cutting the first cutting area from the first target area, randomly cutting a second cutting area with the same size as the first cutting area from the non-first target area, and inserting the first cutting area into a position corresponding to the second cutting area in the non-first target area.
If the image sample is a two-dimensional image, the second image segmentation model may employ a model such as U-Net or FCN. If the image sample is a three-dimensional image, the second image segmentation model may adopt a model such as 3D U-Net, or the three-dimensional image sample may be converted into a sequence of two-dimensional slices (2D slices), each slice segmented with a network such as U-Net or FCN, and the outputs for the different two-dimensional slices then combined into the output for the three-dimensional volume.
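A minimal sketch of this slice-based alternative for three-dimensional samples, assuming model_2d is a callable that maps a 2D slice to a mask of the same spatial size:

import numpy as np

def segment_volume_with_2d_model(volume, model_2d):
    # Convert the 3D volume into a sequence of 2D slices along the first axis,
    # segment each slice with the 2D network, and stack the per-slice outputs
    # back into a 3D result.
    slice_masks = [model_2d(volume[z]) for z in range(volume.shape[0])]
    return np.stack(slice_masks, axis=0)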
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points. The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and provide corresponding operation entries for the user to select authorization or rejection.
In addition, the embodiment of the application further provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the method of any one of the foregoing method embodiments.
And an electronic device comprising:
one or more processors; and
a memory associated with the one or more processors for storing program instructions that, when read for execution by the one or more processors, perform the steps of the method of any of the preceding method embodiments.
The present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of any of the preceding method embodiments.
Fig. 7 illustrates an architecture of an electronic device, which may include a processor 710, a video display adapter 711, a disk drive 712, an input/output interface 713, a network interface 714, and a memory 720, among others. The processor 710, the video display adapter 711, the disk drive 712, the input/output interface 713, the network interface 714, and the memory 720 may be communicatively connected via a communication bus 730.
The processor 710 may be implemented by a general-purpose CPU, a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc. for executing related programs to implement the technical solutions provided herein.
The memory 720 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), static storage device, dynamic storage device, or the like. The memory 720 may store an operating system 721 for controlling the operation of the electronic device 700, and a Basic Input Output System (BIOS) 722 for controlling the low-level operation of the electronic device 700. In addition, a web browser 723, a data storage management system 724, and an image segmentation apparatus/model training apparatus 725, etc. may also be stored. The image segmentation apparatus/model training apparatus 725 may be an application program that specifically implements the operations of the foregoing steps in the embodiments of the present application. In general, when implemented in software or firmware, the relevant program code is stored in memory 720 and executed by processor 710.
The input/output interface 713 is used to connect with an input/output module to enable information input and output. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
The network interface 714 is used to connect communication modules (not shown) to enable communication interactions of the device with other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
Bus 730 includes a path to transfer information between various components of the device (e.g., processor 710, video display adapter 711, disk drive 712, input/output interface 713, network interface 714, and memory 720).
It should be noted that although the above devices illustrate only the processor 710, the video display adapter 711, the disk drive 712, the input/output interface 713, the network interface 714, the memory 720, the bus 730, etc., the device may include other components necessary to achieve proper operation in an implementation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the present application, and not all the components shown in the drawings.
From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general purpose hardware platform. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a computer program product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and include several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present application.
The foregoing has outlined the detailed description of the preferred embodiment of the present application, and the detailed description of the principles and embodiments of the present application has been provided herein by way of example only to facilitate the understanding of the method and core concepts of the present application; also, as will occur to those of ordinary skill in the art, many modifications are possible in view of the teachings of the present application, both in the detailed description and the scope of its applications. In view of the foregoing, this description should not be construed as limiting the application.

Claims (14)

1. An image segmentation method, the method comprising:
acquiring an image to be segmented;
inputting the image to be segmented into a first image segmentation model, and obtaining a first segmentation result output by the first image segmentation model, wherein the first segmentation result comprises the position information of at least one first target area in the image to be segmented;
mixing the image information of the first target area into a non-first target area according to the first segmentation result to obtain an enhanced image to be segmented;
inputting the enhanced image to be segmented into a second image segmentation model, and obtaining a second segmentation result output by the second image segmentation model, wherein the second segmentation result comprises the position information of at least one first target area in the image to be segmented;
the first image segmentation model and the second image segmentation model are both trained in advance based on a deep learning model.
2. The method of claim 1, further comprising, prior to blending image information of the first target region into the non-first target region in accordance with the first segmentation result:
and filtering the first target region according to the region volume requirement and/or the shape requirement of each type of first target.
3. The method according to claim 1, wherein the first segmentation result further comprises position information of a second target region of a preset type in the image to be segmented;
the blending of the image information of the first target area into the non-first target area includes: mixing image information of the first target area into a non-first target area in the second target area;
the second image segmentation model corresponds to the second target;
the second image segmentation model segments the second target area to obtain the position information of the first target area in the second target area as the second segmentation result.
4. The method of claim 1, wherein blending the image information of the first target region into the non-first target region comprises:
and randomly cutting a first cutting area from a first target area, randomly cutting a second cutting area with the same size as the first cutting area from a non-first target area, and inserting the first cutting area into a position corresponding to the second cutting area in the non-first target area.
5. The method according to any one of claims 1 to 4, wherein inputting the image to be segmented into a first image segmentation model, obtaining a first segmentation result output by the first image segmentation model comprises:
inputting the image to be segmented into M first image segmentation models, respectively applying scale factors on probability values of each image block belonging to a first target area, which are output by the M first image segmentation models, and obtaining M first segmentation results determined according to the probability values obtained after the scale factors are applied; wherein different first image segmentation models correspond to different scale factors, and M is a positive integer greater than 1;
the method further comprises the steps of: and respectively obtaining corresponding M second segmentation results based on the M first segmentation results, and carrying out fusion processing on the M second segmentation results to obtain a third segmentation result, wherein the third segmentation result comprises the position information of at least one first target area in the image to be segmented.
6. The method according to claim 5, wherein the method further comprises:
and carrying out fusion processing on the M first segmentation results and the third segmentation result to obtain a fourth segmentation result, wherein the fourth segmentation result comprises the position information of at least one first target area in the image to be segmented.
7. An image segmentation method applied to a medical scene, the method comprising:
acquiring a medical image;
inputting the medical image into a first image segmentation model, and obtaining a first segmentation result output by the first image segmentation model, wherein the first segmentation result comprises the position information of at least one focus area in the medical image;
mixing the image information of the focus area into a non-focus area according to the first segmentation result to obtain an enhanced medical image;
inputting the enhanced medical image into a second image segmentation model, and obtaining a second segmentation result output by the second image segmentation model, wherein the second segmentation result comprises the position information of at least one focus area in the medical image;
the first image segmentation model and the second image segmentation model are both trained in advance based on a deep learning model.
8. An image segmentation method performed by a cloud server, the method comprising:
acquiring an image to be segmented from a user terminal;
inputting the image to be segmented into a first image segmentation model, and obtaining a first segmentation result output by the first image segmentation model, wherein the first segmentation result comprises the position information of at least one first target area in the image to be segmented;
mixing the image information of the first target area into a non-first target area according to the first segmentation result to obtain an enhanced image to be segmented;
inputting the enhanced image to be segmented into a second image segmentation model, and obtaining a second segmentation result output by the second image segmentation model, wherein the second segmentation result comprises the position information of at least one first target area in the image to be segmented;
returning the information of the second segmentation result to the user terminal;
the first image segmentation model and the second image segmentation model are both trained in advance based on a deep learning model.
9. An image segmentation method, the method comprising:
displaying an image to be segmented on a presentation picture of a virtual reality VR or augmented reality AR device;
inputting the image to be segmented into a first image segmentation model, and obtaining a first segmentation result output by the first image segmentation model, wherein the first segmentation result comprises the position information of at least one first target area in the image to be segmented;
mixing the image information of the first target area into a non-first target area according to the first segmentation result to obtain an enhanced image to be segmented;
inputting the enhanced image to be segmented into a second image segmentation model, and obtaining a second segmentation result output by the second image segmentation model, wherein the second segmentation result comprises the position information of at least one first target area in the image to be segmented;
driving the VR device or the AR device to render and display the second segmentation result;
the first image segmentation model and the second image segmentation model are both trained in advance based on a deep learning model.
10. A method of training an image segmentation model, the method comprising:
acquiring training data comprising a plurality of training samples, wherein the training samples comprise enhanced image samples and labels of at least one first target area annotated on the image samples; the enhanced image sample is obtained by mixing the image information of the first target area in the image sample into a non-first target area;
training a second image segmentation model based on a deep learning model using the training data, wherein the training objective comprises: minimizing the difference between the second segmentation result output by the second image segmentation model for the enhanced image sample and the label annotated on the image sample.
11. An image segmentation apparatus, the apparatus comprising:
an image acquisition unit configured to acquire an image to be segmented;
the first segmentation unit is configured to input the image to be segmented into a first image segmentation model, obtain a first segmentation result output by the first image segmentation model, and the first segmentation result comprises the position information of at least one first target area in the image to be segmented;
the image enhancement unit is configured to mix the image information of the first target area into the non-first target area according to the first segmentation result to obtain an enhanced image to be segmented;
the second segmentation unit is configured to input the enhanced image to be segmented into a second image segmentation model, and acquire a second segmentation result output by the second image segmentation model, wherein the second segmentation result comprises the position information of at least one first target area in the image to be segmented;
the first image segmentation model and the second image segmentation model are both trained in advance based on a deep learning model.
12. An apparatus for training an image segmentation model, the apparatus comprising:
a sample acquisition unit configured to acquire training data including a plurality of training samples, the training samples including an enhanced image sample and a label of at least one first target area annotated on the image sample; the enhanced image sample is obtained by mixing the image information of the first target area in the image sample into a non-first target area;
a model training unit configured to train a second image segmentation model implemented based on a deep learning model using the training data, wherein the training objective comprises: minimizing the difference between the second segmentation result output by the second image segmentation model for the enhanced image sample and the label annotated on the image sample.
13. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method of any of claims 1 to 10.
14. An electronic device, comprising:
one or more processors; and
a memory associated with the one or more processors for storing program instructions that, when read for execution by the one or more processors, perform the steps of the method of any of claims 1 to 10.
CN202310268255.5A 2023-03-15 2023-03-15 Image segmentation method, method and device for training image segmentation model Active CN116363152B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310268255.5A CN116363152B (en) 2023-03-15 2023-03-15 Image segmentation method, method and device for training image segmentation model

Publications (2)

Publication Number Publication Date
CN116363152A true CN116363152A (en) 2023-06-30
CN116363152B CN116363152B (en) 2024-03-19

Family

ID=86935063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310268255.5A Active CN116363152B (en) 2023-03-15 2023-03-15 Image segmentation method, method and device for training image segmentation model

Country Status (1)

Country Link
CN (1) CN116363152B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210365741A1 (en) * 2019-05-08 2021-11-25 Tencent Technology (Shenzhen) Company Limited Image classification method, computer-readable storage medium, and computer device
CN111127486A (en) * 2019-12-25 2020-05-08 Oppo广东移动通信有限公司 Image segmentation method, device, terminal and storage medium
WO2022089221A1 (en) * 2020-10-30 2022-05-05 苏州瑞派宁科技有限公司 Medical image segmentation method and apparatus, and device, system and computer storage medium
US20220301163A1 (en) * 2021-03-16 2022-09-22 GE Precision Healthcare LLC Deep learning based medical system and method for image acquisition
CN113223004A (en) * 2021-05-07 2021-08-06 西安智诊智能科技有限公司 Liver image segmentation method based on deep learning
CN113344940A (en) * 2021-05-07 2021-09-03 西安智诊智能科技有限公司 Liver blood vessel image segmentation method based on deep learning
CN113706475A (en) * 2021-08-06 2021-11-26 福建自贸试验区厦门片区Manteia数据科技有限公司 Confidence coefficient analysis method and device based on image segmentation
CN114066900A (en) * 2021-11-12 2022-02-18 北京百度网讯科技有限公司 Image segmentation method and device, electronic equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BIN WANG 等: "A Unified Tensor Level Set for Image Segmentation", 《 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, PART B (CYBERNETICS)》, vol. 40, no. 3, pages 857 - 867, XP011345186, DOI: 10.1109/TSMCB.2009.2031090 *
FAKAI WANG 等: "Automatic Vertebra Localization and Identification in CT by Spine Rectification and Anatomically-constrained Optimization", 《2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》, pages 5276 - 5284 *
傅寰宇: "Research on Pulmonary Nodule Detection in CT Images Based on Convolutional Neural Networks", 《China Master's Theses Full-text Database, Medicine & Health Sciences》, no. 6, pages 072 - 113 *

Also Published As

Publication number Publication date
CN116363152B (en) 2024-03-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant