CN109461162B

CN109461162B - Method for segmenting target in image

Info

Publication number: CN109461162B
Application number: CN201811478643.1A
Authority: CN
Inventors: 张勇东; 闵少波; 谢洪涛
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2018-12-03
Filing date: 2018-12-03
Publication date: 2020-05-12
Anticipated expiration: 2038-12-03
Also published as: CN109461162A

Abstract

The invention discloses a method for segmenting a target in an image, which comprises the following steps: processing the input image through a trained multi-task full convolution network to obtain a segmentation result and a shape parameter prediction result; optimizing the shape parameter prediction result by separating the maximum pooling operation; and optimizing the segmentation result by using the optimized shape parameter prediction result based on the segmentation fusion strategy, thereby realizing the target segmentation. The method can realize target segmentation based on shape constraint, can smooth the segmentation edge and solve the adhesion problem by verifying on different biological data sets, and has obvious segmentation effect due to the traditional scheme.

Description

Method for segmenting target in image

Technical Field

The invention relates to the technical field of image processing, in particular to a method for segmenting a target in an image.

Background

The objective segmentation algorithm has gained wide attention in recent years, and the task is to segment an interested objective region in an image to obtain a label different from the background. Since the target segmentation is one of the bases of scene understanding, the task has wide application scenes in the fields of automatic driving, medical image analysis and the like.

Among many target segmentation methods, convolutional neural networks are widely used to extract semantic information of images. By simulating the structure of human visual perception, the convolutional neural network can autonomously learn the optimal feature expression according to task requirements, so that a better segmentation effect is achieved. However, the current methods still fail to solve the problem of rough edges and sticking in the target segmentation.

Disclosure of Invention

The invention aims to provide a method for segmenting an object in an image, which can smooth segmentation edges and solve the problem of adhesion.

The purpose of the invention is realized by the following technical scheme:

a method of object segmentation in an image, comprising:

processing the input image through a trained multi-task full convolution network to obtain a segmentation result and a shape parameter prediction result;

optimizing the shape parameter prediction result by separating the maximum pooling operation;

and optimizing the segmentation result by using the optimized shape parameter prediction result based on the segmentation fusion strategy, thereby realizing the target segmentation.

According to the technical scheme provided by the invention, the target segmentation can be realized based on shape constraint, the edges can be smoothly segmented and the adhesion problem can be solved by verifying on different biological data sets, and the segmentation effect is obviously superior to that of the traditional scheme.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.

FIG. 1 is a flowchart of a method for segmenting an object in an image according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a separation max-pooling operation provided by an embodiment of the present invention;

fig. 3 is a schematic diagram of a segmentation fusion policy provided in the embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention relates to a method for segmenting a target in an image, which can explicitly integrate shape prior knowledge of the target into a network structure; the device mainly comprises three parts: multitask full convolution network, maximum pooling for separation, and a segment fusion strategy. The multitask full convolution network employs a Full Convolution Network (FCN) model, typically based on VGG-16 feature extraction. To explicitly express shape constraints in the network, a multitask fully convolutional network can simultaneously segment an image and predict a set of shape parameters for each target object in the image. By defining the shape parameters differently, such as angle, length, width, etc., the shape parameters may describe a standard shape, such as an ellipse. The obtained segmentation result and the shape parameter can be mutually supplemented, and the obtained segmentation result and the shape parameter are optimized to each other, so that the problems of smooth edge and separation adhesion are finally achieved. However, shape parameters are difficult to predict accurately in practice, so we improve the prediction accuracy of shape parameters by separating the max pooling operations. By analyzing the relevance between the segmentation result and the parameter prediction, separating the maximum pooling can remove some unreliable shape predictions, and only more accurate shape parameters are reserved, so that the segmentation result is better optimized. And finally, optimizing the obtained segmentation result by using the predicted target shape parameters through a segmentation fusion strategy. Typically, lesions in biological data will cause the target shape to deviate significantly from our shape priors. In this case, the object shape resulting from the segmentation is therefore more reliable, since the object shape described by the shape parameters is too standard. For most normal data, the shape parameters result in shapes that are very referential. Based on the above consideration, the segmentation fusion strategy provided by the invention can be adaptive while preserving the variability of the segmentation result and the regularity of the shape parameter, so as to optimize the final target segmentation result as much as possible. The scheme of the embodiment of the invention mainly comprises the following three points:

1) shape constraints are effectively introduced into the network to be used as a target shape segmentation algorithm.

2) Separating the max-pooling operation enables optimization of the segmentation and parameter prediction portions in a multitasking network.

3) The segmentation fusion strategy can flexibly optimize the segmentation result by using the predicted shape constraint.

As shown in fig. 1, a method for segmenting an object in an image according to an embodiment of the present invention mainly includes the following steps:

1. and processing the input image through the trained multi-task full convolution network to obtain a segmentation result and a shape parameter prediction result.

In the embodiment of the invention, the multitask full convolution network (Multi-task FCN) comprises seven groups of convolutional layer structures, each group of structures comprises a plurality of convolutional layers and a ReLU activation function, and a maximum pooling layer is inserted between the groups; the convolution kernels in the first five groups of convolution block layers are the same in number and are sequentially connected in series, and the number of the convolution kernels in different groups is sequentially increased with the deepening of the network; let the first five convolutional block layers be simplified as ConvNet in FIG. 1, and their obtained feature map is defined as X_iThe segmentation result P and the shape parameter prediction { T } are then based on the feature map X by leaving two sets of convolution block layers, respectively_iAnd (6) obtaining a prediction.

Illustratively, the method can be realized by a VGG-16 structure, and the number of channels of the output characteristics of five groups of convolution structures is: 64,128,256,512,512.

In the embodiment of the present invention, each element in the segmentation result is a value of [0,1], if greater than 0.5, it indicates that the pixel belongs to the target region, and if less than 0.5, it indicates that the pixel value is the background region.

In the embodiment of the invention, in the training stage of the multitask full convolution network, the elliptical shape is assumed as the prior knowledge, and the predicted shape parameter of the ith pixel point is recorded as T_iThe shape parameter prediction result obtained in each training is { theta, mu_c,v_cA, b }; wherein θ represents the inclination angle of the ellipse; mu.s_c,v_cRepresents the center coordinates of the ellipse; a and b represent the length of the major and minor axes of the ellipse; finally, each pixel point has the 5 shape parameters which are respectively expressed as:

wherein, { mu, v } is the spatial coordinate of the pixel point, and H and W are the length and width of the image; wherein

In the embodiment of the invention, the target loss function of the multitask full convolution network is expressed as follows:

wherein N is the number of pixel points, P_iIs the partition prediction value, P, of the ith pixel point_i∈P；T_k.iRepresents T_iThe kth shape parameter of (1);

and

corresponding representation P_iAnd T_k.iThe true value of (d); λ is a balance parameter, L_clsIs softmax classification loss, L_cls＝-∑_iP_ilnP_i；L_regIs a common L in object detection₁Smoothing the constraint error:

in the training stage of the multitask full convolution network, data in the data set is subjected to data expansion operations such as folding, scaling, random cropping, and the like, and the data is shuffled, batched (for example, 8) and fixed in size, thereby forming a training set.

During training, a random gradient descent method is adopted as an optimizer to train network parameters; illustratively, the learning rate decay strategy selects an exponential decay, and the initial learning rate is 0.01. In addition, the ratio of Dropout in the regularization operation is 0.5, and the coefficient of the L2 penalty term is 0.0005.

For the initial values of all hyper-parameters in the network, a MSRA initialization method is used, and the principle is that the weight parameter of each layer in the network is initialized to meet the requirement

Wherein n is the number of the layer weight parameters. The penalty of regularization operation L2 in the network is just based on the Gaussian prior hypothesis of the network parameters to carry out penalty, so the initialization method can improve the network training efficiency and the network performance in the end-to-end training.

2. The shape parameter prediction results were optimized by Split Max Pooling (Split Max Pooling) operation. When we get the segmentation result P and the shape parameter prediction T, we use separate max pooling for optimization.

The pooling formula for the maximum pooling operation of the separation is:

wherein N is_iIs the area adjacent to the ith pixel point, exemplary N_iIs a region of 3 × 3 pixels near the pixel point i. By separating the maximum pooling operation from N_iMiddle maximum P_iAnd its corresponding T_iAnd propagating to the next layer of network (namely, the segmentation fusion strategy layer in the figure 1), thereby realizing the optimization of the shape parameters.

An example of a split max pooling operation is shown in FIG. 2: t and P are input and are respectively slid by windows with the size of 3 multiplied by 3, and only T (12) at the position corresponding to the maximum numerical value (0.7) in the window in P can be used as output and kept.

The split max-pooling operation also participates in the training process of the multitask full convolution network, wherein:

in the process of back propagation, for T_iThe expression of (a) is:

wherein L represents the target loss function of the aforementioned multitask full convolution network; m is N_iThe number of pixels in the window;

in the process of back propagation, for P_iThe expression of (a) is:

wherein α represents a hyperparameter, and the optimal value is determined by experimental analysis_iThe gradient of (a) is composed of two parts, one part is conducted from the direct output P' in fig. 2, and the other part is conducted from the output T in fig. two. Thus in the formula

The gradient of P' conduction from the output in FIG. 2 is expressed, the latter term

The gradient of T conduction.

As shown in fig. 2, P 'and P have the same content, that is, the segmentation prediction values of the pixel points are completely the same (P'_iContent of (A) and P_iSame), therefore, the gradient

And gradient of

The same applies.

As will be appreciated by those skilled in the art, in the forward propagation process, two outputs are generated after the input data, one output being itself and the other being the data affected by the input data; in particular, in the present invention, gradients are input

There will be two outputs, one of which is itself (denoted as self for distinction)

) And the other output is the input gradient

The gradient of the prediction of the affected shape parameter, i.e.

The expression is a back propagation process expression, and input and output are exchanged, so that the left side of the equal sign is output, and the right side of the equal sign is input, which is opposite to the forward propagation process.

3. And optimizing the segmentation result by using the optimized shape parameter prediction result based on a segmentation Fusion strategy (Piecewise Fusion), thereby realizing the target segmentation.

In the embodiment of the present invention, all the segmentation results P are not optimized because all the segmentation shapes tend to be standard, and therefore, only part of P is optimized_iUsing the shape parameters as optimization; setting two thresholds tau₁And τ₂And optimizing the pixel point between the two thresholds:

wherein:

dμ＝cos(θ)(μ-μ_c)+sin(θ)(ν-v_c)

dν＝-sin(θ)(μ-μ_c)+cos(θ)(ν-v_c)。

the piecewise fusion strategy is shown in FIG. 3, where a given threshold τ is₁And τ₂The values of (a) are merely examples.

To verify the effectiveness of the above protocol of embodiments of the present invention, experiments were performed on two biological reference datasets.

1) Synthetic vector dataset: the data set contained 100 high resolution (1019 x 1053) neurosynaptic electron microscope pictures with expert labeled tags as supervisory information. Through data cropping, 7322 training data and 1465 test data were finally generated. Our target object is the vesicular structure in the synapse. Most vesicle structures exhibit a more regular elliptical shape.

2) The glob Segmentation Change Contest: the data set contains image data of a human gland, including partially diseased and normal. Among them 85 pictures were used for training and 80 for testing. The normal human gland is elliptical in shape, while the diseased gland is less regular. The task is to segment all the gland regions in the target image.

After 240 epochs of training, the network achieved the best current results in both biobasis datasets. The two biomedical data sets (cryo-electron microscopy data and glandular cell data) are included, and the segmentation result IoU (intersection area is compared with union area) is 83.77% and 85.60% respectively; this effect is clearly due to the conventional solution.

Through the above description of the embodiments, it is clear to those skilled in the art that the above embodiments can be implemented by software, and can also be implemented by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.

The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method of object segmentation in an image, comprising:

optimizing the segmentation result by using the optimized shape parameter prediction result based on a segmentation fusion strategy, thereby realizing target segmentation;

the multitask full convolution network comprises seven groups of convolution layer structures, each group of structure comprises a plurality of convolution layers and a ReLU activation function, and a maximum pooling layer is inserted between each group; the convolution kernels in the first five groups of convolution block layers are the same in number and are sequentially connected in series, and the number of the convolution kernels in different groups is sequentially increased with the deepening of the network;

obtaining a feature map X by the first five sets of convolution block layers_iBy leaving two convolution block layers according to the feature map X respectively_iPredicting to obtain a segmentation result P and a shape parameter prediction { T };

in the training stage of the multitask full convolution network, the elliptical shape is assumed as the prior knowledge, and the predicted shape parameter of the ith pixel point is marked as T_iThe shape parameter prediction result obtained in each training is { theta, mu_c，v_cA, b }; wherein θ represents the inclination angle of the ellipse; mu.s_c，v_cRepresents the center coordinates of the ellipse; a, b represents the length of the long axis and the short axis of the ellipse, and { mu, v } is the space coordinate of the pixel point;

the pooling formula for the maximum pooling operation of the separation is:

wherein N is_iFor the area adjacent to the ith pixel point, separating the maximum pooling operation to obtain N_iMiddle maximum P_iAnd its corresponding T_iDownward propagation of the execution segment fusion policy, P_iThe segmentation predicted value of the ith pixel point is obtained;

in the process of back propagation, for T_iThe expression of (a) is:

wherein L represents a target loss function of the multitask full convolution network; m is N_iThe number of pixels in the window;

in the process of back propagation, for P_iThe expression of (a) is:

wherein α represents a hyperparameter, P'_iContent of (A) and P_iThe same;

the optimizing the segmentation result using the optimized shape parameter prediction result based on the segmentation fusion strategy comprises:

setting two thresholds tau₁And τ₂And optimizing the pixel point between the two thresholds:

wherein:

dμ＝cos(θ)(μ-μ_c)+sin(θ)(v-v_c)

dv＝-sin(θ)(μ-μ_c)+cos(θ)(v-v_c)。

2. the method of claim 1, wherein each element in the segmentation result is a value of [0,1], if greater than 0.5, the pixel belongs to the target region, and if less than 0.5, the pixel is a background region.

3. The method of claim 1, wherein each pixel has 5 shape parameters, which are respectively expressed as:

where H and W are the length and width of the image.

4. The method of claim 3, wherein the objective loss function of the multitask fully convolutional network is expressed as:

wherein N is the number of pixel points, P_i∈P；T_k.iRepresents T_iThe kth shape parameter of (1);

and

corresponding representation P_iAnd T_k.iThe true value of (d); λ is a balance parameter, L_clsIs softmax classification loss, L_regIs a common L in object detection₁And smoothing the constraint error.

5. The method for segmenting the target in the image according to the claim 3, characterized in that in the training stage of the multitask full convolution network, data in the data set is subjected to data augmentation operations including folding, stretching and/or random cropping, and then the data is subjected to disordering, batching and fixing size, so that the training set is formed;

during training, a random gradient descent method is adopted as an optimizer to train network parameters; for the initial values of all hyper-parameters in the network, the MSRA initialization method is used.