CN109461162B - Method for segmenting target in image - Google Patents
Method for segmenting target in image Download PDFInfo
- Publication number
- CN109461162B CN109461162B CN201811478643.1A CN201811478643A CN109461162B CN 109461162 B CN109461162 B CN 109461162B CN 201811478643 A CN201811478643 A CN 201811478643A CN 109461162 B CN109461162 B CN 109461162B
- Authority
- CN
- China
- Prior art keywords
- segmentation
- network
- shape parameter
- target
- parameter prediction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/12—Edge-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10056—Microscopic image
- G06T2207/10061—Microscopic image from scanning electron microscope
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Radiology & Medical Imaging (AREA)
- Quality & Reliability (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method for segmenting a target in an image, which comprises the following steps: processing the input image through a trained multi-task full convolution network to obtain a segmentation result and a shape parameter prediction result; optimizing the shape parameter prediction result by separating the maximum pooling operation; and optimizing the segmentation result by using the optimized shape parameter prediction result based on the segmentation fusion strategy, thereby realizing the target segmentation. The method can realize target segmentation based on shape constraint, can smooth the segmentation edge and solve the adhesion problem by verifying on different biological data sets, and has obvious segmentation effect due to the traditional scheme.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a method for segmenting a target in an image.
Background
The objective segmentation algorithm has gained wide attention in recent years, and the task is to segment an interested objective region in an image to obtain a label different from the background. Since the target segmentation is one of the bases of scene understanding, the task has wide application scenes in the fields of automatic driving, medical image analysis and the like.
Among many target segmentation methods, convolutional neural networks are widely used to extract semantic information of images. By simulating the structure of human visual perception, the convolutional neural network can autonomously learn the optimal feature expression according to task requirements, so that a better segmentation effect is achieved. However, the current methods still fail to solve the problem of rough edges and sticking in the target segmentation.
Disclosure of Invention
The invention aims to provide a method for segmenting an object in an image, which can smooth segmentation edges and solve the problem of adhesion.
The purpose of the invention is realized by the following technical scheme:
a method of object segmentation in an image, comprising:
processing the input image through a trained multi-task full convolution network to obtain a segmentation result and a shape parameter prediction result;
optimizing the shape parameter prediction result by separating the maximum pooling operation;
and optimizing the segmentation result by using the optimized shape parameter prediction result based on the segmentation fusion strategy, thereby realizing the target segmentation.
According to the technical scheme provided by the invention, the target segmentation can be realized based on shape constraint, the edges can be smoothly segmented and the adhesion problem can be solved by verifying on different biological data sets, and the segmentation effect is obviously superior to that of the traditional scheme.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
FIG. 1 is a flowchart of a method for segmenting an object in an image according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a separation max-pooling operation provided by an embodiment of the present invention;
fig. 3 is a schematic diagram of a segmentation fusion policy provided in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention relates to a method for segmenting a target in an image, which can explicitly integrate shape prior knowledge of the target into a network structure; the device mainly comprises three parts: multitask full convolution network, maximum pooling for separation, and a segment fusion strategy. The multitask full convolution network employs a Full Convolution Network (FCN) model, typically based on VGG-16 feature extraction. To explicitly express shape constraints in the network, a multitask fully convolutional network can simultaneously segment an image and predict a set of shape parameters for each target object in the image. By defining the shape parameters differently, such as angle, length, width, etc., the shape parameters may describe a standard shape, such as an ellipse. The obtained segmentation result and the shape parameter can be mutually supplemented, and the obtained segmentation result and the shape parameter are optimized to each other, so that the problems of smooth edge and separation adhesion are finally achieved. However, shape parameters are difficult to predict accurately in practice, so we improve the prediction accuracy of shape parameters by separating the max pooling operations. By analyzing the relevance between the segmentation result and the parameter prediction, separating the maximum pooling can remove some unreliable shape predictions, and only more accurate shape parameters are reserved, so that the segmentation result is better optimized. And finally, optimizing the obtained segmentation result by using the predicted target shape parameters through a segmentation fusion strategy. Typically, lesions in biological data will cause the target shape to deviate significantly from our shape priors. In this case, the object shape resulting from the segmentation is therefore more reliable, since the object shape described by the shape parameters is too standard. For most normal data, the shape parameters result in shapes that are very referential. Based on the above consideration, the segmentation fusion strategy provided by the invention can be adaptive while preserving the variability of the segmentation result and the regularity of the shape parameter, so as to optimize the final target segmentation result as much as possible. The scheme of the embodiment of the invention mainly comprises the following three points:
1) shape constraints are effectively introduced into the network to be used as a target shape segmentation algorithm.
2) Separating the max-pooling operation enables optimization of the segmentation and parameter prediction portions in a multitasking network.
3) The segmentation fusion strategy can flexibly optimize the segmentation result by using the predicted shape constraint.
As shown in fig. 1, a method for segmenting an object in an image according to an embodiment of the present invention mainly includes the following steps:
1. and processing the input image through the trained multi-task full convolution network to obtain a segmentation result and a shape parameter prediction result.
In the embodiment of the invention, the multitask full convolution network (Multi-task FCN) comprises seven groups of convolutional layer structures, each group of structures comprises a plurality of convolutional layers and a ReLU activation function, and a maximum pooling layer is inserted between the groups; the convolution kernels in the first five groups of convolution block layers are the same in number and are sequentially connected in series, and the number of the convolution kernels in different groups is sequentially increased with the deepening of the network; let the first five convolutional block layers be simplified as ConvNet in FIG. 1, and their obtained feature map is defined as XiThe segmentation result P and the shape parameter prediction { T } are then based on the feature map X by leaving two sets of convolution block layers, respectivelyiAnd (6) obtaining a prediction.
Illustratively, the method can be realized by a VGG-16 structure, and the number of channels of the output characteristics of five groups of convolution structures is: 64,128,256,512,512.
In the embodiment of the present invention, each element in the segmentation result is a value of [0,1], if greater than 0.5, it indicates that the pixel belongs to the target region, and if less than 0.5, it indicates that the pixel value is the background region.
In the embodiment of the invention, in the training stage of the multitask full convolution network, the elliptical shape is assumed as the prior knowledge, and the predicted shape parameter of the ith pixel point is recorded as TiThe shape parameter prediction result obtained in each training is { theta, muc,vcA, b }; wherein θ represents the inclination angle of the ellipse; mu.sc,vcRepresents the center coordinates of the ellipse; a and b represent the length of the major and minor axes of the ellipse; finally, each pixel point has the 5 shape parameters which are respectively expressed as:
wherein, { mu, v } is the spatial coordinate of the pixel point, and H and W are the length and width of the image; wherein
In the embodiment of the invention, the target loss function of the multitask full convolution network is expressed as follows:
wherein N is the number of pixel points, PiIs the partition prediction value, P, of the ith pixel pointi∈P;Tk.iRepresents TiThe kth shape parameter of (1);andcorresponding representation PiAnd Tk.iThe true value of (d); λ is a balance parameter, LclsIs softmax classification loss, Lcls=-∑iPilnPi;LregIs a common L in object detection1Smoothing the constraint error:
in the training stage of the multitask full convolution network, data in the data set is subjected to data expansion operations such as folding, scaling, random cropping, and the like, and the data is shuffled, batched (for example, 8) and fixed in size, thereby forming a training set.
During training, a random gradient descent method is adopted as an optimizer to train network parameters; illustratively, the learning rate decay strategy selects an exponential decay, and the initial learning rate is 0.01. In addition, the ratio of Dropout in the regularization operation is 0.5, and the coefficient of the L2 penalty term is 0.0005.
For the initial values of all hyper-parameters in the network, a MSRA initialization method is used, and the principle is that the weight parameter of each layer in the network is initialized to meet the requirementWherein n is the number of the layer weight parameters. The penalty of regularization operation L2 in the network is just based on the Gaussian prior hypothesis of the network parameters to carry out penalty, so the initialization method can improve the network training efficiency and the network performance in the end-to-end training.
2. The shape parameter prediction results were optimized by Split Max Pooling (Split Max Pooling) operation. When we get the segmentation result P and the shape parameter prediction T, we use separate max pooling for optimization.
The pooling formula for the maximum pooling operation of the separation is:
wherein N isiIs the area adjacent to the ith pixel point, exemplary NiIs a region of 3 × 3 pixels near the pixel point i. By separating the maximum pooling operation from NiMiddle maximum PiAnd its corresponding TiAnd propagating to the next layer of network (namely, the segmentation fusion strategy layer in the figure 1), thereby realizing the optimization of the shape parameters.
An example of a split max pooling operation is shown in FIG. 2: t and P are input and are respectively slid by windows with the size of 3 multiplied by 3, and only T (12) at the position corresponding to the maximum numerical value (0.7) in the window in P can be used as output and kept.
The split max-pooling operation also participates in the training process of the multitask full convolution network, wherein:
in the process of back propagation, for TiThe expression of (a) is:
wherein L represents the target loss function of the aforementioned multitask full convolution network; m is NiThe number of pixels in the window;
in the process of back propagation, for PiThe expression of (a) is:
wherein α represents a hyperparameter, and the optimal value is determined by experimental analysisiThe gradient of (a) is composed of two parts, one part is conducted from the direct output P' in fig. 2, and the other part is conducted from the output T in fig. two. Thus in the formulaThe gradient of P' conduction from the output in FIG. 2 is expressed, the latter termThe gradient of T conduction.
As shown in fig. 2, P 'and P have the same content, that is, the segmentation prediction values of the pixel points are completely the same (P'iContent of (A) and PiSame), therefore, the gradientAnd gradient ofThe same applies.
As will be appreciated by those skilled in the art, in the forward propagation process, two outputs are generated after the input data, one output being itself and the other being the data affected by the input data; in particular, in the present invention, gradients are inputThere will be two outputs, one of which is itself (denoted as self for distinction)) And the other output is the input gradientThe gradient of the prediction of the affected shape parameter, i.e.
The expression is a back propagation process expression, and input and output are exchanged, so that the left side of the equal sign is output, and the right side of the equal sign is input, which is opposite to the forward propagation process.
3. And optimizing the segmentation result by using the optimized shape parameter prediction result based on a segmentation Fusion strategy (Piecewise Fusion), thereby realizing the target segmentation.
In the embodiment of the present invention, all the segmentation results P are not optimized because all the segmentation shapes tend to be standard, and therefore, only part of P is optimizediUsing the shape parameters as optimization; setting two thresholds tau1And τ2And optimizing the pixel point between the two thresholds:
wherein:
dμ=cos(θ)(μ-μc)+sin(θ)(ν-vc)
dν=-sin(θ)(μ-μc)+cos(θ)(ν-vc)。
the piecewise fusion strategy is shown in FIG. 3, where a given threshold τ is1And τ2The values of (a) are merely examples.
To verify the effectiveness of the above protocol of embodiments of the present invention, experiments were performed on two biological reference datasets.
1) Synthetic vector dataset: the data set contained 100 high resolution (1019 x 1053) neurosynaptic electron microscope pictures with expert labeled tags as supervisory information. Through data cropping, 7322 training data and 1465 test data were finally generated. Our target object is the vesicular structure in the synapse. Most vesicle structures exhibit a more regular elliptical shape.
2) The glob Segmentation Change Contest: the data set contains image data of a human gland, including partially diseased and normal. Among them 85 pictures were used for training and 80 for testing. The normal human gland is elliptical in shape, while the diseased gland is less regular. The task is to segment all the gland regions in the target image.
After 240 epochs of training, the network achieved the best current results in both biobasis datasets. The two biomedical data sets (cryo-electron microscopy data and glandular cell data) are included, and the segmentation result IoU (intersection area is compared with union area) is 83.77% and 85.60% respectively; this effect is clearly due to the conventional solution.
Through the above description of the embodiments, it is clear to those skilled in the art that the above embodiments can be implemented by software, and can also be implemented by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (5)
1. A method of object segmentation in an image, comprising:
processing the input image through a trained multi-task full convolution network to obtain a segmentation result and a shape parameter prediction result;
optimizing the shape parameter prediction result by separating the maximum pooling operation;
optimizing the segmentation result by using the optimized shape parameter prediction result based on a segmentation fusion strategy, thereby realizing target segmentation;
the multitask full convolution network comprises seven groups of convolution layer structures, each group of structure comprises a plurality of convolution layers and a ReLU activation function, and a maximum pooling layer is inserted between each group; the convolution kernels in the first five groups of convolution block layers are the same in number and are sequentially connected in series, and the number of the convolution kernels in different groups is sequentially increased with the deepening of the network;
obtaining a feature map X by the first five sets of convolution block layersiBy leaving two convolution block layers according to the feature map X respectivelyiPredicting to obtain a segmentation result P and a shape parameter prediction { T };
in the training stage of the multitask full convolution network, the elliptical shape is assumed as the prior knowledge, and the predicted shape parameter of the ith pixel point is marked as TiThe shape parameter prediction result obtained in each training is { theta, muc,vcA, b }; wherein θ represents the inclination angle of the ellipse; mu.sc,vcRepresents the center coordinates of the ellipse; a, b represents the length of the long axis and the short axis of the ellipse, and { mu, v } is the space coordinate of the pixel point;
the pooling formula for the maximum pooling operation of the separation is:
wherein N isiFor the area adjacent to the ith pixel point, separating the maximum pooling operation to obtain NiMiddle maximum PiAnd its corresponding TiDownward propagation of the execution segment fusion policy, PiThe segmentation predicted value of the ith pixel point is obtained;
in the process of back propagation, for TiThe expression of (a) is:
wherein L represents a target loss function of the multitask full convolution network; m is NiThe number of pixels in the window;
in the process of back propagation, for PiThe expression of (a) is:
wherein α represents a hyperparameter, P'iContent of (A) and PiThe same;
the optimizing the segmentation result using the optimized shape parameter prediction result based on the segmentation fusion strategy comprises:
setting two thresholds tau1And τ2And optimizing the pixel point between the two thresholds:
wherein:
dμ=cos(θ)(μ-μc)+sin(θ)(v-vc)
dv=-sin(θ)(μ-μc)+cos(θ)(v-vc)。
2. the method of claim 1, wherein each element in the segmentation result is a value of [0,1], if greater than 0.5, the pixel belongs to the target region, and if less than 0.5, the pixel is a background region.
4. The method of claim 3, wherein the objective loss function of the multitask fully convolutional network is expressed as:
5. The method for segmenting the target in the image according to the claim 3, characterized in that in the training stage of the multitask full convolution network, data in the data set is subjected to data augmentation operations including folding, stretching and/or random cropping, and then the data is subjected to disordering, batching and fixing size, so that the training set is formed;
during training, a random gradient descent method is adopted as an optimizer to train network parameters; for the initial values of all hyper-parameters in the network, the MSRA initialization method is used.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811478643.1A CN109461162B (en) | 2018-12-03 | 2018-12-03 | Method for segmenting target in image |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811478643.1A CN109461162B (en) | 2018-12-03 | 2018-12-03 | Method for segmenting target in image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109461162A CN109461162A (en) | 2019-03-12 |
CN109461162B true CN109461162B (en) | 2020-05-12 |
Family
ID=65612421
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811478643.1A Active CN109461162B (en) | 2018-12-03 | 2018-12-03 | Method for segmenting target in image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109461162B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110378916B (en) * | 2019-07-03 | 2021-11-09 | 浙江大学 | TBM image slag segmentation method based on multitask deep learning |
CN112749801A (en) * | 2021-01-22 | 2021-05-04 | 上海商汤智能科技有限公司 | Neural network training and image processing method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103839244A (en) * | 2014-02-26 | 2014-06-04 | 南京第五十五所技术开发有限公司 | Real-time image fusion method and device |
CN105931226A (en) * | 2016-04-14 | 2016-09-07 | 南京信息工程大学 | Automatic cell detection and segmentation method based on deep learning and using adaptive ellipse fitting |
CN107492121A (en) * | 2017-07-03 | 2017-12-19 | 广州新节奏智能科技股份有限公司 | A kind of two-dimension human body bone independent positioning method of monocular depth video |
CN107506761A (en) * | 2017-08-30 | 2017-12-22 | 山东大学 | Brain image dividing method and system based on notable inquiry learning convolutional neural networks |
CN108335306A (en) * | 2018-02-28 | 2018-07-27 | 北京市商汤科技开发有限公司 | Image processing method and device, electronic equipment and storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103413347B (en) * | 2013-07-05 | 2016-07-06 | 南京邮电大学 | Based on the extraction method of monocular image depth map that prospect background merges |
US9972069B2 (en) * | 2014-10-06 | 2018-05-15 | Technion Research & Development Foundation Limited | System and method for measurement of myocardial mechanical function |
US10580131B2 (en) * | 2017-02-23 | 2020-03-03 | Zebra Medical Vision Ltd. | Convolutional neural network for segmentation of medical anatomical images |
CN108664971B (en) * | 2018-05-22 | 2021-12-14 | 中国科学技术大学 | Pulmonary nodule detection method based on 2D convolutional neural network |
-
2018
- 2018-12-03 CN CN201811478643.1A patent/CN109461162B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103839244A (en) * | 2014-02-26 | 2014-06-04 | 南京第五十五所技术开发有限公司 | Real-time image fusion method and device |
CN105931226A (en) * | 2016-04-14 | 2016-09-07 | 南京信息工程大学 | Automatic cell detection and segmentation method based on deep learning and using adaptive ellipse fitting |
CN107492121A (en) * | 2017-07-03 | 2017-12-19 | 广州新节奏智能科技股份有限公司 | A kind of two-dimension human body bone independent positioning method of monocular depth video |
CN107506761A (en) * | 2017-08-30 | 2017-12-22 | 山东大学 | Brain image dividing method and system based on notable inquiry learning convolutional neural networks |
CN108335306A (en) * | 2018-02-28 | 2018-07-27 | 北京市商汤科技开发有限公司 | Image processing method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109461162A (en) | 2019-03-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106897714B (en) | Video motion detection method based on convolutional neural network | |
CN109859190B (en) | Target area detection method based on deep learning | |
CN107729819B (en) | Face labeling method based on sparse fully-convolutional neural network | |
Hou et al. | An end-to-end 3d convolutional neural network for action detection and segmentation in videos | |
US20210326638A1 (en) | Video panoptic segmentation | |
CN108491766B (en) | End-to-end crowd counting method based on depth decision forest | |
CN114445670B (en) | Training method, device and equipment of image processing model and storage medium | |
Cai et al. | Classification complexity assessment for hyper-parameter optimization | |
CN111259779A (en) | Video motion detection method based on central point trajectory prediction | |
CN112990222B (en) | Image boundary knowledge migration-based guided semantic segmentation method | |
CN112347977B (en) | Automatic detection method, storage medium and device for induced pluripotent stem cells | |
CN111274994A (en) | Cartoon face detection method and device, electronic equipment and computer readable medium | |
Wang et al. | Face mask extraction in video sequence | |
CN112529005B (en) | Target detection method based on semantic feature consistency supervision pyramid network | |
CN109461162B (en) | Method for segmenting target in image | |
Xiong et al. | Contrastive learning for automotive mmWave radar detection points based instance segmentation | |
CN108564582B (en) | MRI brain tumor image automatic optimization method based on deep neural network | |
CN114511710A (en) | Image target detection method based on convolutional neural network | |
CN113297936A (en) | Volleyball group behavior identification method based on local graph convolution network | |
CN112528058A (en) | Fine-grained image classification method based on image attribute active learning | |
CN115249313A (en) | Image classification method based on meta-module fusion incremental learning | |
CN113239866B (en) | Face recognition method and system based on space-time feature fusion and sample attention enhancement | |
Lv et al. | Memory‐augmented neural networks based dynamic complex image segmentation in digital twins for self‐driving vehicle | |
CN116311387B (en) | Cross-modal pedestrian re-identification method based on feature intersection | |
CN117576149A (en) | Single-target tracking method based on attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |