CN111080666B

CN111080666B - Object segmentation method and device based on cyclic convolution

Info

Publication number: CN111080666B
Application number: CN201911374778.8A
Authority: CN
Inventors: 周晓巍; 鲍虎军; 彭思达
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2019-12-27
Filing date: 2019-12-27
Publication date: 2022-07-15
Anticipated expiration: 2039-12-27
Also published as: CN111080666A

Abstract

The invention discloses an object segmentation method and device, which are used for segmenting an object by predicting the contour line of the object in an image. In order to predict the contour line of an object, the invention provides a feature learning method based on cyclic convolution and a curve deformation method. The implementation of the invention comprises: constructing a feature vector for each node of the curve based on the initialized closed curve; performing feature learning on a feature vector sequence defined on a closed curve by using cyclic convolution; based on cyclic convolution, a deep neural network is provided for curve deformation; realizing object segmentation based on a curve deformation method; a target object containing a plurality of connected regions is processed. According to the method, through cyclic convolution, efficient feature learning on the curve is achieved, and the accuracy of the object segmentation method based on the contour line is improved.

Description

Object segmentation method and device based on cyclic convolution

Technical Field

The invention belongs to the technical field of computers, and particularly relates to an object segmentation method and device based on cyclic convolution.

Background

In related object segmentation techniques, conventional image processing methods obtain an object contour curve by optimizing an initial curve, but are prone to falling into local optima. Some recent deep learning methods directly regress the object contour curve, but the resulting segmentation is not accurate enough. Other implementations use graph convolution to perform feature learning on the initial curve in order to predict the object contour curve. However, generic graph convolution does not take full advantage of the topology of the curve, so feature learning is not necessarily efficient.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention aims to provide a method for learning features on a curve based on cyclic convolution, in which the learned features are used to deform the curve. The invention performs object segmentation based on this curve deformation method.

According to a first aspect of the present invention, there is provided a method for learning features on a curve based on cyclic convolution, comprising:

1. Generation of image features: given a picture on which object segmentation is to be performed, the picture is processed by a deep neural network to extract picture features. Like the picture itself, the picture features form a tensor. The resolution of the picture features is determined by the input picture and the neural network. Most existing deep neural networks can be used to extract the picture features.

2. Construction of features on the curve: a closed curve consisting of N nodes is given on the image. Based on the picture features, a feature vector can be constructed for each node.

3. Cyclic convolution-based feature learning on a curve: in graph theory, a closed curve is a cycle graph. In the cycle graph, N nodes form a closed chain and each node has degree 2, i.e., each node is an endpoint of two edges. When the curve is not closed, the feature vectors on the curve form a one-dimensional discrete signal, which can be convolved with a one-dimensional convolution kernel to realize signal processing. When the curve is closed, the feature vectors on the curve form a periodic one-dimensional discrete signal. In this case, the one-dimensional convolution on such a periodic sequence of feature vectors is called a cyclic convolution.

4. Cyclic convolution-based neural networks: standard one-dimensional convolution convolves a one-dimensional kernel with a one-dimensional discrete signal, whereas cyclic convolution convolves a one-dimensional kernel with a periodic one-dimensional discrete signal. Therefore, like one-dimensional convolution, cyclic convolution can be used to form a neural network layer, and this layer can be embedded into a common deep neural network for feature extraction.

According to a second aspect of the present invention, there is provided a curve deformation method comprising:

1. Offset prediction: given a picture and a closed curve, feature learning is performed on the curve using the method provided by the invention. After feature learning, each node has a feature vector carrying high-level semantic information. An offset can be predicted at each curve node using a regressor commonly used in deep learning, such as a multi-layer perceptron or 1x1 convolution. This offset points from the current node coordinates to the target node coordinates. In object segmentation, the target curve is the object contour, and the offset points from the curve node to the object edge point.

2. Deformation of the curve: after an offset has been predicted for each node, the offset is added to the coordinates of that node, which updates the node coordinates of the curve and thereby deforms the curve.
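The deformation step above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `deform_once`, `deform_repeatedly` and `toy_predictor` are hypothetical names, and the toy predictor (which moves every node halfway toward a fixed target) merely stands in for the learned regressor.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def deform_once(curve: List[Point], offsets: List[Point]) -> List[Point]:
    """Add one predicted offset to the coordinates of each curve node."""
    if len(curve) != len(offsets):
        raise ValueError("exactly one offset per node is required")
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(curve, offsets)]


def deform_repeatedly(
    curve: List[Point],
    predict_offsets: Callable[[List[Point]], List[Point]],
    iterations: int,
) -> List[Point]:
    """Apply the predict-then-add step several times in a row."""
    for _ in range(iterations):
        curve = deform_once(curve, predict_offsets(curve))
    return curve


# Hypothetical stand-in for the learned regressor: every node moves
# halfway toward a known target node.
target = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]


def toy_predictor(curve: List[Point]) -> List[Point]:
    return [((tx - x) / 2, (ty - y) / 2) for (x, y), (tx, ty) in zip(curve, target)]


start = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
result = deform_repeatedly(start, toy_predictor, iterations=3)
```

With this toy predictor the remaining distance to the target halves on each pass, which shows why repeating the step helps when the starting curve is far from the contour.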

According to a third aspect of the present invention, there is provided an object segmentation method comprising:

1. Object detector: the object segmentation method of the present invention can use most existing target object detectors. A deep-learning-based target object detector consists of two parts: a neural network backbone and a regressor. The backbone reads the target picture and outputs picture features. Based on the picture features, the regressor predicts the location and class of the objects in the picture. The position of an object is represented by a two-dimensional rectangular box, and the class of the object is represented by a one-hot vector.

2. Curve initialization: the invention initializes the curve from the two-dimensional rectangular box given by the target object detector. The midpoints of the four sides of the rectangular box are connected to obtain a quadrangle. The quadrangle is a closed curve, and its four corner points can be deformed into the four poles (extreme points) of the object by the curve deformation method. Based on the four poles of the object, the invention constructs an octagon that lies relatively close to the object and uses the octagon as the initialized curve.

3. Iterative curve deformation: as an octagon, the initialized curve has only 8 nodes. The invention resamples the octagon uniformly to obtain N nodes. The edge contour line of the target object is likewise sampled uniformly to obtain N nodes. The two curves are aligned according to the poles of the object, so that each node of the initialized curve has a target node. The curve deformation method provided by the invention can then deform the initialized curve to the contour line of the target object. Because the initialized curve may be far from the target curve, the curve deformation method can be applied several times, iteratively deforming the curve to obtain the final object contour line and thereby segmenting the object in the image.
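The uniform resampling mentioned in step 3 can be sketched as follows. This is a minimal illustration rather than the patented implementation: `resample_closed_curve` is a hypothetical helper name, and it assumes that "uniform" means equal spacing by arc length along the closed polygon, starting at the first vertex.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def resample_closed_curve(vertices: List[Point], n: int) -> List[Point]:
    """Return n points spaced evenly by arc length along a closed polygon."""
    m = len(vertices)
    # Edge i runs from vertex i to vertex (i + 1) mod m, which closes the loop.
    edges = [(vertices[i], vertices[(i + 1) % m]) for i in range(m)]
    lengths = [math.dist(a, b) for a, b in edges]
    step = sum(lengths) / n

    points = []
    edge, walked = 0, 0.0  # current edge index and arc length at its start
    for i in range(n):
        s = i * step
        # Advance to the edge that contains arc length s.
        while edge < m - 1 and s >= walked + lengths[edge]:
            walked += lengths[edge]
            edge += 1
        (ax, ay), (bx, by) = edges[edge]
        t = (s - walked) / lengths[edge] if lengths[edge] > 0 else 0.0
        points.append((ax + t * (bx - ax), ay + t * (by - ay)))
    return points


square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
samples = resample_closed_curve(square, 8)
```

The same routine could be applied to both the initialized octagon and the ground-truth contour, so that the two curves have the same number of nodes before they are aligned.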

According to a fourth aspect of the present invention, there is provided a method of processing an object containing a plurality of connected regions, comprising:

When the object is not occluded, it forms a single connected region, represented by one closed curve. When the object is occluded, it is divided into several connected regions, represented by several closed curves. If only one initialization curve is used, the object cannot be segmented completely. Therefore, within the complete two-dimensional rectangular box of the object, the rectangular boxes of all connected regions of the object are detected, and each of these boxes is then deformed into the contour line of its connected region. The contour lines inside the complete object rectangular box are merged to segment the complete object.

The beneficial effects of the invention are: the object is segmented by predicting the contour of the object in the image. In order to predict the contour line of an object, the invention provides a feature learning method based on cyclic convolution and a curve deformation method. According to the method, through cyclic convolution, efficient feature learning on the curve is achieved, and the accuracy of the object segmentation method based on the contour line is improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

Fig. 1 is a schematic diagram of a feature learning method and a deformation curve on a curve based on circular convolution according to an embodiment of the present invention.

FIG. 2 is a schematic diagram of feature learning on a curve using circular convolution according to an embodiment of the present invention.

Fig. 3 is a schematic structural diagram of a deep neural network constructed based on cyclic convolution according to an embodiment of the present invention.

Fig. 4 is a flow chart of object segmentation according to an embodiment of the present invention.

FIG. 5 is a schematic diagram of an embodiment of the present invention for processing an object containing a plurality of connected regions.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, embodiments accompanying figures are described in detail below.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced otherwise than as specifically described herein, and it will be appreciated by those skilled in the art that the present invention may be practiced without departing from the spirit and scope of the present invention and that the present invention is not limited by the specific embodiments disclosed below.

As shown in fig. 1, the present invention provides a method for learning features on a curve based on cyclic convolution, which performs curve deformation based on the learned features. The method comprises the following specific steps:

1. A target picture is input, and picture features F are extracted using an existing deep neural network.

2. An initialization curve is given; as shown in fig. 1, the curve surrounds the target object. The curve is a closed chain of N nodes and is represented mathematically as $\{x_i \mid i = 1, \dots, N\}$, where $x_i$ denotes the two-dimensional coordinates of the i-th node.

3. Constructing a feature vector for each node on the curve: each node has a two-dimensional coordinate $x_i$. For each node, the invention constructs a corresponding feature vector $[F(x_i); x'_i]$, where $[\cdot\,; \cdot]$ denotes the concatenation of two vectors, $F(x_i)$ is the feature extracted from the picture features $F$ at position $x_i$, and $x'_i$ is a translation-invariant version of $x_i$. Since the picture features $F$ form a three-dimensional tensor similar to a picture and $x_i$ is a two-dimensional coordinate, $F(x_i)$ can be obtained by interpolating $F$ at position $x_i$. $F(x_i)$ carries the high-level features that the network has learned from the picture, and these features describe the picture content. Besides semantic content, curve deformation also needs to know the relative distribution of the nodes, so the two-dimensional coordinates of each node are added to its feature vector. Because the curve deformation should not change with the position of the curve in the picture, the invention constructs translation-invariant two-dimensional coordinates $x'_i$ as follows: over all nodes $\{x_i \mid i = 1, \dots, N\}$, the smallest x-axis coordinate and the smallest y-axis coordinate are found, and these are then subtracted from the coordinates of every node.
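The construction of the per-node feature vector can be illustrated with a short plain-Python sketch. Several details here are assumptions and not taken from the description: the picture features are stored as nested lists indexed `[channel][row][column]` at the same resolution as the node coordinates, the interpolation is bilinear with clamping at the borders, and the helper names `bilinear_sample` and `node_features` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
FeatureMap = Sequence[Sequence[Sequence[float]]]  # [channel][row][column]


def bilinear_sample(feature_map: FeatureMap, x: float, y: float) -> List[float]:
    """Interpolate every channel of the feature map at position (x, y)."""
    height, width = len(feature_map[0]), len(feature_map[0][0])
    # Clamp to the valid range so nodes outside the map reuse border values.
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    wx, wy = x - x0, y - y0
    sampled = []
    for channel in feature_map:
        top = (1 - wx) * channel[y0][x0] + wx * channel[y0][x1]
        bottom = (1 - wx) * channel[y1][x0] + wx * channel[y1][x1]
        sampled.append((1 - wy) * top + wy * bottom)
    return sampled


def node_features(feature_map: FeatureMap, curve: List[Point]) -> List[List[float]]:
    """Concatenate F(x_i) with the translation-invariant coordinates x'_i."""
    min_x = min(x for x, _ in curve)
    min_y = min(y for _, y in curve)
    return [
        bilinear_sample(feature_map, x, y) + [x - min_x, y - min_y]
        for x, y in curve
    ]


# One-channel 2x2 feature map and a two-node "curve", for illustration only.
toy_map = [[[0.0, 1.0], [2.0, 3.0]]]
features = node_features(toy_map, [(0.5, 0.5), (1.0, 0.0)])
```

If the feature map has a lower resolution than the input picture, the node coordinates would first need to be scaled by the corresponding stride; that scaling is omitted from this sketch.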

4. Feature learning is performed on the curve by cyclic convolution, as shown in fig. 2. Each node carries one feature vector, so the N nodes give N feature vectors. In general, the N feature vectors can be regarded as a one-dimensional discrete function $f$, whose value $f_i$ is the feature vector at node $i$, and one-dimensional convolution can be applied to it directly. However, standard one-dimensional convolution does not take the periodicity of the closed curve into account and destroys the topology of the closed curve. The present invention therefore uses cyclic convolution to perform feature learning on the sequence of feature vectors defined on the curve. The sequence of feature vectors defined on the curve is a periodic signal $f_N$, which can be defined as:

$$(f_N)_i = f_{i \bmod N},$$

where $(f_N)_i$ denotes the i-th element of the periodic sequence of feature vectors $f_N$, and $i \bmod N$ denotes the remainder of $i$ divided by $N$.

The invention processes this periodic signal $f_N$ using a cyclic convolution, defined as:

$$(f_N * k)_i = \sum_{j=-r}^{r} (f_N)_{i+j}\, k_j,$$

where the symbol $*$ denotes a standard one-dimensional convolution, $k = (k_{-r}, \dots, k_r)$ is a learnable convolution kernel, and $k_j$ denotes the j-th parameter vector of the convolution kernel $k$. In this formula, the size of the convolution kernel is $2r + 1$.

Fig. 2 shows an example of a circular convolution on a curve. As shown in fig. 2, the lowest circular node sequence is a feature vector sequence defined on a curve, the middle 5 nodes represent one-dimensional convolution kernels of size 5, and the top is the result of the output of the circular convolution. The meaning of the convolution kernel is the same as that of a standard one-dimensional convolution, inner products are respectively made on 5 parameter vectors of the convolution kernel and feature vectors of 5 nodes taking the current node as the center, and the 5 inner product results are added to obtain the output of the current position.
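The operation described for Fig. 2 can be sketched in plain Python as follows. The function name `cyclic_conv` is hypothetical, and the sketch produces a single output channel; a real network layer would stack many such kernels and would normally be written with a deep learning library.

```python
from typing import List, Sequence


def cyclic_conv(
    features: Sequence[Sequence[float]],
    kernel: Sequence[Sequence[float]],
) -> List[float]:
    """Cyclic convolution of N feature vectors with a kernel of size 2r + 1.

    features[i] is the feature vector at node i; kernel[j + r] is the
    parameter vector k_j for j = -r, ..., r. Returns one value per node.
    """
    n = len(features)
    r = len(kernel) // 2
    if len(kernel) != 2 * r + 1:
        raise ValueError("kernel size must be odd (2r + 1)")
    output = []
    for i in range(n):
        total = 0.0
        for j in range(-r, r + 1):
            # The modulo wraps the index around, so the neighbours of the
            # first and last node are taken from the other end of the curve.
            neighbour = features[(i + j) % n]
            weights = kernel[j + r]
            total += sum(w * v for w, v in zip(weights, neighbour))
        output.append(total)
    return output


# Four nodes with one-dimensional features and a kernel of size 3 (r = 1).
signal = [[1.0], [2.0], [3.0], [4.0]]
difference_kernel = [[1.0], [0.0], [-1.0]]
response = cyclic_conv(signal, difference_kernel)
```

Because of the wrap-around indexing, rotating the starting node of the curve only rotates the output by the same amount, which is the property that ordinary zero-padded one-dimensional convolution lacks on a closed curve.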

5. Based on the learned features, the offsets are regressed using a multi-layer perceptron or 1x1 convolutions. Fig. 1 shows a schematic diagram of the offset regression. Specifically, the feature vector sequence defined on the curve is passed through a series of cyclic convolution network layers, which yields a learned feature vector at each node. For each node, a multi-layer perceptron or several 1x1 convolutional network layers can be used for feature transformation, mapping the features to the node offset space and predicting the offset of the node. The initialized curve has N nodes, and the object contour line also has N nodes. The nodes are in one-to-one correspondence, and the offset of each node is predicted in this way.

As shown in fig. 3, the present invention provides a deep neural network based on cyclic convolution as a curve deformer. The deep neural network comprises three parts: a network backbone, a feature fusion part and a prediction regression part. The network backbone performs feature learning on the curve and is composed of several cyclic convolution network layers. The feature fusion part pools the feature vectors learned by the backbone over all nodes, fusing the information of all nodes on the closed curve into a fusion vector, and then concatenates the fusion vector to the feature vector learned at each node. The prediction regression part uses a multi-layer perceptron or several 1x1 convolutional network layers to map the fused feature vector at each node to a two-dimensional offset pointing to the target node.
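The feature fusion and prediction regression parts can be sketched as below. The description only states that the node features are pooled, so the use of element-wise max pooling is an assumption; likewise, a single linear map per node (equivalent to one 1x1 convolution) stands in for the multi-layer perceptron, and `fuse_global` and `regress_offsets` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def fuse_global(node_features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pool over all nodes and concatenate the pooled vector to every node."""
    # Assumption: element-wise max pooling across the nodes of the curve.
    pooled = [max(column) for column in zip(*node_features)]
    return [list(vector) + pooled for vector in node_features]


def regress_offsets(
    fused: Sequence[Sequence[float]],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[Tuple[float, ...]]:
    """Map each fused feature vector to an offset with a shared linear layer.

    weights has one row per output coordinate (two rows for a 2-D offset);
    the same weights are applied at every node, as a 1x1 convolution would.
    """
    return [
        tuple(
            sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)
        )
        for vector in fused
    ]


learned = [[1.0, 5.0], [3.0, 2.0]]          # two nodes, two channels each
fused = fuse_global(learned)                 # four channels per node
offsets = regress_offsets(
    fused,
    weights=[[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    bias=[0.5, -0.5],
)
```

Concatenating the pooled vector gives every node access to a summary of the whole curve, while the shared regression weights keep the prediction independent of how many nodes the curve has.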

As shown in fig. 4, the present invention provides an object segmentation method based on curve deformation. A picture is input, and an initial closed curve is obtained using the target object detector. The closed curve can be a rectangular box or a rough closed curve around the object. When the closed curve is a rectangular box, the invention takes the midpoints of the four sides of the rectangle and connects them into a quadrangle. The quadrangle is input into the curve deformer provided by the invention, which deforms its four nodes and predicts the four poles of the object. The object poles are the topmost, bottommost, leftmost and rightmost pixel points on the edge of the object. Based on the predicted poles, an octagon can be constructed. Specifically, horizontal and vertical lines through the four poles form a rectangular box. At the upper pole, a line segment whose length is one quarter of the rectangle's side is extended along the horizontal direction; the lower, left and right poles are treated similarly, which gives four line segments. Connecting the four line segments yields the octagon. The octagon is sampled (preferably uniformly) to obtain N nodes, which are input into the curve deformer; the deformer deforms the N sampled nodes and predicts N nodes on the object contour line.
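The two geometric constructions in this flow, the quadrangle from the box midpoints and the octagon from the four poles, can be sketched as follows. The text does not fix every detail, so this sketch assumes image coordinates with y pointing down, segments centred on each pole with a total length of one quarter of the corresponding rectangle side, and clipping of the segments to the rectangle; `diamond_from_box` and `octagon_from_poles` are hypothetical names.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def diamond_from_box(x_min: float, y_min: float, x_max: float, y_max: float) -> List[Point]:
    """Connect the midpoints of the four box sides: top, left, bottom, right."""
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    return [(cx, y_min), (x_min, cy), (cx, y_max), (x_max, cy)]


def clip(value: float, low: float, high: float) -> float:
    return min(max(value, low), high)


def octagon_from_poles(top: Point, left: Point, bottom: Point, right: Point) -> List[Point]:
    """Build an octagon from the four object poles (extreme points)."""
    # The poles define the tight rectangle around the object.
    x_min, x_max = left[0], right[0]
    y_min, y_max = top[1], bottom[1]
    # Assumption: each segment is centred on its pole, so it reaches one
    # eighth of the side length in each direction (one quarter in total).
    half_w = (x_max - x_min) / 8
    half_h = (y_max - y_min) / 8
    return [
        (clip(top[0] + half_w, x_min, x_max), y_min),
        (clip(top[0] - half_w, x_min, x_max), y_min),
        (x_min, clip(left[1] - half_h, y_min, y_max)),
        (x_min, clip(left[1] + half_h, y_min, y_max)),
        (clip(bottom[0] - half_w, x_min, x_max), y_max),
        (clip(bottom[0] + half_w, x_min, x_max), y_max),
        (x_max, clip(right[1] + half_h, y_min, y_max)),
        (x_max, clip(right[1] - half_h, y_min, y_max)),
    ]


diamond = diamond_from_box(0.0, 0.0, 8.0, 8.0)
octagon = octagon_from_poles(*diamond)
```

In the actual flow the poles passed to the octagon construction would be the deformed quadrangle nodes predicted by the curve deformer; here the undeformed midpoints are reused only to keep the example self-contained.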

As shown in fig. 5, the present invention provides a method of processing an object comprising a plurality of connected regions. In fig. 5, a car is divided by pillars into three disconnected regions. The invention first uses a target object detector to detect the rectangular box of the complete object, and then detects the rectangular boxes of all connected regions inside that box. The rectangular box of each connected region can be used as an initial closed curve and is deformed into the contour line of that region through the flow shown in fig. 4. The contour lines of the three connected regions are then merged to complete the segmentation of the target object in the image.
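One possible way to merge the contour lines of several connected regions is to rasterize their union into a single binary mask. The description does not specify how the merge is carried out, so the following plain-Python sketch is only an assumption: it tests each pixel centre against every contour with the even-odd rule, and `point_in_polygon` and `merge_contours_to_mask` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(px: float, py: float, polygon: Sequence[Point]) -> bool:
    """Even-odd rule: count crossings of a horizontal ray starting at (px, py)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        # Only edges that straddle the ray's y coordinate can be crossed.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def merge_contours_to_mask(
    contours: Sequence[Sequence[Point]], height: int, width: int
) -> List[List[bool]]:
    """Mark a pixel as object if its centre lies inside any of the contours."""
    return [
        [
            any(point_in_polygon(col + 0.5, row + 0.5, c) for c in contours)
            for col in range(width)
        ]
        for row in range(height)
    ]


# Two visible parts of one object, separated by an occluder one pixel wide.
part_a = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
part_b = [(3.0, 0.0), (5.0, 0.0), (5.0, 2.0), (3.0, 2.0)]
mask = merge_contours_to_mask([part_a, part_b], height=2, width=5)
```

The resulting mask covers both visible parts and leaves the occluded column empty, which matches the idea of segmenting the complete object from the contours of its connected regions.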

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, it is relatively simple to describe, and reference may be made to some descriptions of the method embodiment for relevant points.

The foregoing is merely a preferred embodiment of the present invention, and although the present invention has been disclosed in the context of preferred embodiments, it is not intended to be limited thereto. Using the methods and techniques disclosed above, those skilled in the art can make many possible variations and modifications to the disclosed solution, or modify it into equivalent embodiments, without departing from the scope of the solution. Therefore, any simple modification, equivalent change or adaptation made to the above embodiments according to the technical essence of the present invention, provided it does not depart from the content of the technical solution of the present invention, still falls within the scope of protection of the technical solution of the present invention.

Claims

1. An object segmentation method based on cyclic convolution, the method comprising:

determining an initialized closed curve of a target image, constructing a feature vector for each node of the closed curve, and performing feature learning on the closed curve through cyclic convolution;

predicting an offset on each node of the closed curve by using a regressor, wherein the offset points to the object contour line node from the curve node; adding offset to each node coordinate to realize curve deformation;

iteratively deforming the curve to obtain a final object contour line, thereby segmenting the object in the image.

2. The object segmentation method according to claim 1, wherein the constructing a feature vector for each node of the closed curve comprises: extracting picture features of the target image by using a deep neural network, and constructing a feature vector for each node based on the picture features.

3. The object segmentation method according to claim 2, wherein the feature vector of each node is formed by splicing a feature extracted by the node at a corresponding two-dimensional coordinate in the picture feature and a two-dimensional coordinate with translation invariance of the node.

4. The object segmentation method according to claim 1, wherein the feature learning by cyclic convolution includes: the characteristic vector sequence on the closed curve is a periodic one-dimensional discrete signal, and the periodic signal is processed by using cyclic convolution to realize characteristic learning.

5. The object segmentation method according to claim 1, wherein the regressor is a multi-layer perceptron or a plurality of 1x1 convolutional network layers.

6. The object segmentation method according to claim 1, wherein the determining an initialized closed curve of the target image comprises: the position of a target object is predicted using a target object detector based on deep learning, and the position of the target object is represented by a closed curve surrounding the object.

7. The object segmentation method according to claim 6, wherein the position of the target object in the target image is represented by a two-dimensional rectangular frame, the midpoints of each side of the two-dimensional rectangular frame are connected to obtain a quadrangle, the quadrangle is subjected to curve deformation, four corner points of the quadrangle are deformed into four poles of the object, an octagon is constructed based on the four poles, the octagon is used as an initialized closed curve, and the initialized closed curve is sampled and then subjected to curve deformation to obtain an object contour line.

8. The object segmentation method according to claim 1, wherein when the object is occluded, rectangular frames of respective connected regions of the object are detected in a complete object two-dimensional rectangular frame, then the rectangular frames are deformed into the contour lines of the connected regions, and the contour lines in the complete object rectangular frames are merged to segment the complete object.

9. The object segmentation method according to claim 1, wherein the feature vectors learned at each node are pooled, information of all nodes on the closed curve is fused to obtain a fused vector, the fused vector is concatenated to the feature vector previously learned at each node, and predictive regression is then performed.

10. An apparatus for object segmentation based on cyclic convolution, the apparatus comprising:

the characteristic learning module: inputting an initialized closed curve of a target object, constructing a feature vector for each node of the closed curve, and performing feature learning on the closed curve through cyclic convolution;

a curve deformation module: for each node of the closed curve, inputting a feature vector obtained by learning of the node, predicting an offset on the node by using a regressor, and enabling the offset to point to an object contour line node from the curve node; adding offset to each node coordinate to realize curve deformation;

an object segmentation module: iteratively deforming the curve to obtain a final object contour line, thereby segmenting the object in the image.