CN110796666B - Texture segmentation algorithm based on shape descriptor and twin neural network - Google Patents

Texture segmentation algorithm based on shape descriptor and twin neural network

Info

Publication number
CN110796666B
Authority
CN (China)
Prior art keywords
region, descriptors, image, shape, texture
Legal status
Active
Application number
CN201910949663.0A
Other languages
Chinese (zh)
Other versions
CN110796666A (en)
Inventors
Li Weiping (李卫平)
Wu Haiyan (武海燕)
Current Assignee
Railway Police College
Original Assignee
Railway Police College
Priority date
2019-10-08
Filing date
2019-10-08
Application filed by Railway Police College
Publication of CN110796666A: 2020-02-14
Application granted; publication of CN110796666B: 2023-03-31

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a texture segmentation algorithm based on shape descriptors and a twin neural network. The image is assumed to be composed of regions, each with a fixed shape descriptor, and each shape descriptor is combined with a twin neural network. Texture segmentation is then posed as an optimization problem: once a region of interest is selected, the segmentation is the optimal solution, for which the learned shape descriptors are almost constant within each region, and this optimum is found by minimizing an energy. The texture segmentation algorithm outperforms competing algorithms on both contour and region indices and achieves good segmentation on images with complex geometric transformations or complex nuisances.

Description

Texture segmentation algorithm based on shape descriptor and twin neural network
Technical Field
The invention relates to a texture segmentation algorithm, in particular to a texture segmentation algorithm based on a shape descriptor and a twin neural network.
Background
Image texture can be described qualitatively in terms of physical quantities such as intensity, density, and orientation. Texture segmentation of images is a fundamental problem in computer vision, and its quality plays a key role in downstream tasks such as object classification and extraction. Image texture segmentation has accordingly become an important research direction in image analysis.
Common methods for texture segmentation fall roughly into two categories: edge-based methods and region-based methods. Edge-based methods locate edges from the responses of a filter bank and then post-process these responses to fill gaps and produce a segmentation. Although edge-based approaches can work well, turning edges into a segmentation remains difficult: it relies on hand-crafted heuristics, and the problem is not yet fully solved. Region-based methods instead divide the image into regions according to global intensity distributions. Because such distributions discard spatial relationships, these methods try to reintroduce them through distributions over pixel neighborhoods; for example, Gabor filter responses aggregated over large neighborhoods at multiple scales and orientations are grouped to form a texture segmentation.
However, region-based approaches face the following difficulty: neighborhoods that aggregate statistics across segmentation boundaries are hard to group correctly, so describing neighborhoods without any estimate of the segmentation is prone to error.
A natural way to segment textures is to construct, at each pixel, descriptors that are invariant to nuisance variations within a texture yet distinctive across different textures, so that the descriptors can be grouped to form the final segmentation. Existing methods treat descriptor estimation and segmentation as a joint problem and perform well on images of simple geometric figures; however, because their descriptors are hand-crafted, they are not invariant in complex image scenes. A local invariant descriptor is an image statistic at each pixel that describes its neighborhood in a way that is invariant to geometric and photometric perturbations. Typically, descriptors are computed by aggregating smoothed orientation gradients within a pixel neighborhood, and they play an important role in characterizing local texture properties, since a texture is composed of small marks that vary under small geometric and photometric perturbations but are otherwise stable. Careful construction of these descriptors is crucial: they drive low-level segmentation, which in turn supports higher-level tasks such as object detection and segmentation.
Existing shape descriptors aggregate orientation gradients within a predefined pixel neighborhood, which may contain image data from different texture regions, especially near texture boundaries. This creates ambiguity when grouping the descriptors, particularly near region boundaries; if the descriptors are grouped to form a segmentation, segmentation errors result, and the problem becomes more pronounced when large geometric transformations of the texture are present in the image.
Disclosure of Invention
In view of the above problems, the present invention aims to provide a texture segmentation algorithm, namely a texture segmentation algorithm based on shape descriptors and a twin neural network, that achieves good segmentation on images with complex geometric transformations or complex nuisances.
To achieve this purpose, the invention adopts the following technical scheme:
A texture segmentation algorithm based on shape descriptors and a twin neural network, comprising the following steps:
S1: assume the image is composed of N_r regions, each having a fixed shape descriptor;
S2: combine each shape descriptor with a twin neural network;
S3: select the region of interest and find the optimal solution according to the texture segmentation algorithm.
Further, the shape descriptor in step S1 is defined as the solution of the Poisson partial differential equation

$$\begin{cases} -\alpha_i \Delta u_{ij}(x) + u_{ij}(x) = J_j(x), & x \in R \\ \nabla u_{ij}(x) \cdot N(x) = 0, & x \in \partial R \end{cases}$$

where Ω is the domain of the image, R ⊂ Ω is a region of interest, J_j : Ω → ℝ for j = 1, ..., N_c is a channel of the image, ∇ is the gradient, Δ is the Laplacian, ∂R is the boundary of R, N is the outward unit normal of ∂R, α_i ∈ ℝ₊ is a scale for i = 1, ..., N_s, N_c is the number of channels, and N_s is the number of scales.
Further, the solution of the Poisson partial differential equation is determined by minimizing the energy

$$E(u_{ij}) = \int_R \big(J_j(x) - u_{ij}(x)\big)^2\,dx + \alpha_i \int_R \big|\nabla u_{ij}(x)\big|^2\,dx,$$

and the solutions over all scales and channels are collected into the vector u(x) = (u_{ij}(x)) ∈ ℝⁿ, where u_ij is a smoothed version of the image channel J_j.
Further, the specific operation steps of step S2 are:
S21: select two different shape descriptors u(x) and v(y);
S22: let the function f take a shape descriptor u(x) ∈ ℝⁿ at a given pixel as input and return a descriptor with m components, i.e., f : ℝⁿ → ℝᵐ with n = N_s × N_c;
S23: input u(x) and v(y) to f respectively, obtaining two shape descriptors with m components, and compute the weighted L² norm of their difference;
S24: pass the weighted L² norm of the shape descriptor difference through a Sigmoid function to obtain a result value; a value of 1 indicates that the descriptors come from different segmentation regions, and a value of 0 indicates that they come from the same region.
Further, in the twin network, for the shape descriptors u(x) and v(y) of pixels x and y, the metric is defined as

$$D\big(u(x), v(y)\big) = S\Big(\sum_{i=1}^{m} \omega_i \big(f(u(x))_i - f(v(y))_i\big)^2\Big),$$

where S is the Sigmoid function, ω_i ≥ 0 are weights for i = 1, ..., m, and f(u(x))_i is the i-th component of f(u(x)).
Further, the specific operation of step S3 is: let u_i(x) ∈ ℝⁿ be the basic shape descriptor of region R_i and a_i ∈ ℝᵐ the shape descriptor representing the region; the energy of the segmentation into regions R_i is

$$E\big(\{R_i\}, \{a_i\}\big) = \sum_{i=1}^{N_r} \int_{R_i} \big\|f(u_i(x)) - a_i\big\|_\omega^2\,dx + \beta \sum_{i=1}^{N_r} \int_{\partial R_i} ds,$$

where ‖z‖²_ω = Σ_k ω_k z_k². The minimizing a_i is obtained by solving this energy, giving

$$a_i = \frac{1}{|R_i|} \int_{R_i} f\big(u_i(x)\big)\,dx,$$

where |R_i| denotes the area of R_i, i.e., a_i is the average of the descriptors in the region.
Further, the energy of the segmentation into regions R_i is optimized by a gradient descent method, and the minimum energy is computed.
Further, the specific operation of minimizing the energy includes:
1) Initialize φ_i;
2) Set the regions: R_i = {x ∈ Ω : i = argmax_j φ_j(x)};
3) Compute the dilation D(R_i) of R_i;
4) Compute u_i in D(R_i), and compute a_i = (1/|R_i|) ∫_{R_i} f(u_i(x)) dx;
5) Compute B_i = D(R_i) ∩ D(Ω\R_i);
6) For x ∈ B_i, compute G_i(x) = ‖f(u_i(x)) − a_i‖²_ω, evaluating f with the neural network;
7) Update pixels x ∈ D(R_i) ∩ D(R_j) as
φ_i^{T+ΔT}(x) = φ_i^T(x) − ΔT (G_i(x) − G_j(x)) |∇φ_i^T(x)| + ΔT·β k_i |∇φ_i^T(x)|;
8) Update all other pixels [the update formula is given in the original only as an equation image];
9) Clip between 0 and 1: φ_i = max{0, min{1, φ_i}};
10) Repeat from step 2) until the regions converge.
Further, the auxiliary function appearing in the gradient of the segmentation energy satisfies a partial differential equation [both the function and its equation are given in the original only as equation images].
The beneficial effects of the invention are:
1. The proposed shape descriptors aggregate image statistics only within the region of interest, so they do not mix statistics across texture boundaries;
2. By using a neural network to learn descriptors derived from basic hand-crafted shape-tailored descriptors, the twin-network shape descriptors exhibit invariance to complex nuisances and better handle texture segmentation of images with complex nuisances;
3. The texture segmentation method based on shape descriptors and a twin neural network is effective and feasible; it outperforms competing algorithms on both contour and region indices and achieves good segmentation on images with complex geometric transformations or complex nuisances.
Drawings
FIG. 1 is a diagram showing a twin neural network structure according to the present invention.
FIG. 2 is a texture image artwork of the composite dataset of the present invention.
Fig. 3 is a true segmentation diagram of the synthetic texture image of fig. 2.
Fig. 4 is an image of the synthetic texture image of fig. 2 segmented using the MCG algorithm.
Fig. 5 is an image of the synthetic texture image of fig. 2 segmented using the gPb algorithm.
Fig. 6 is an image of the synthetic texture image of fig. 2 segmented using the CTF algorithm.
Fig. 7 is an image of the synthetic texture image of fig. 2 segmented using the STLD algorithm.
FIG. 8 is an image of the composite texture image of FIG. 2 segmented using the texture segmentation algorithm of the present invention.
Fig. 9 shows a real image original in a real texture dataset according to the present invention.
Fig. 10 is a true segmentation of the true image of fig. 9.
Fig. 11 is an image obtained by segmenting the real image in fig. 9 using the MCG algorithm.
Fig. 12 is an image of the real image in fig. 9 segmented using the gPb algorithm.
Fig. 13 is a picture obtained by segmenting the real picture in fig. 9 by using the CTF algorithm.
Fig. 14 is an image obtained by segmenting the real image in fig. 9 by using the STLD algorithm.
Fig. 15 is an image of the real image in fig. 9 segmented using the texture segmentation algorithm of the present invention.
Detailed Description
To help those skilled in the art better understand the technical solution of the present invention, the technical solution is further described below with reference to the drawings and the embodiments.
A texture segmentation algorithm based on shape descriptors and a twin neural network comprises the following steps:
S1: suppose the image is composed of N_r regions, each having a fixed shape descriptor;
Specifically, the shape descriptor is built from a shape-dependent gradient scale space, and the shape-dependent scale space is the solution of a Poisson partial differential equation. The scale space is defined on a region of interest of arbitrary shape and fuses no image information from outside that region. It is defined as follows:
Let Ω be the domain of the image and, for j = 1, ..., N_c (N_c being the number of channels), let J_j : Ω → ℝ be a channel of the image, e.g., color or an orientation gradient. Let R ⊂ Ω, which may have any shape, be the region of interest. The shape descriptor is defined as the solution of the Poisson partial differential equation

$$\begin{cases} -\alpha_i \Delta u_{ij}(x) + u_{ij}(x) = J_j(x), & x \in R \\ \nabla u_{ij}(x) \cdot N(x) = 0, & x \in \partial R \end{cases} \tag{1}$$

where ∇ is the gradient, Δ is the Laplacian, ∂R is the boundary of R, and N is the outward unit normal of ∂R. α_i ∈ ℝ₊ is a scale, also called a smoothing factor, for i = 1, ..., N_s, where N_s is the number of scales. Equation (1) can be computed by minimizing the energy

$$E(u_{ij}) = \int_R \big(J_j(x) - u_{ij}(x)\big)^2\,dx + \alpha_i \int_R \big|\nabla u_{ij}(x)\big|^2\,dx. \tag{2}$$

The solution of the partial differential equation balances fidelity to the image against smoothness; a larger α_i means a smoother solution. The solutions over all scales and channels are collected into the vector

$$u(x) = \big(u_{ij}(x)\big)_{i=1,\dots,N_s;\ j=1,\dots,N_c} \in \mathbb{R}^n, \qquad n = N_s \times N_c,$$

where u_ij is a smoothed version of channel J_j. Since the partial differential equation (1) is defined on the specific region of interest R, image information outside R does not affect the values of u_ij. This matters for region-based segmentation: the descriptors do not aggregate image data across region boundaries and therefore do not mix irrelevant statistics.
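For illustration only (this sketch is not part of the patent text), equation (1) can be discretized with 5-point finite differences on the region mask; dropping neighbors that fall outside the region implements the zero-flux Neumann boundary condition. Function and variable names here are assumptions:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def shape_tailored_descriptor(J, mask, alpha):
    """Solve -alpha * Laplace(u) + u = J on the region given by `mask`
    with Neumann boundary conditions, using a 5-point finite-difference
    discretization. J: (H, W) image channel; mask: (H, W) bool region.
    Returns u with values inside the region (NaN outside)."""
    H, W = J.shape
    idx = -np.ones((H, W), dtype=int)
    ys, xs = np.nonzero(mask)
    idx[ys, xs] = np.arange(len(ys))            # linear index per region pixel
    rows, cols, vals = [], [], []
    for k, (y, x) in enumerate(zip(ys, xs)):
        diag = 1.0
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < H and 0 <= nx < W and mask[ny, nx]:
                # neighbor inside the region: standard Laplacian stencil;
                # out-of-region neighbors are dropped, the discrete analogue
                # of the zero-flux Neumann condition on the region boundary
                diag += alpha
                rows.append(k); cols.append(idx[ny, nx]); vals.append(-alpha)
        rows.append(k); cols.append(k); vals.append(diag)
    A = sp.csc_matrix((vals, (rows, cols)), shape=(len(ys), len(ys)))
    u = np.full((H, W), np.nan)
    u[ys, xs] = spla.spsolve(A, J[ys, xs])
    return u
```

Solving this system once per channel and per scale yields the components u_ij of the descriptor vector u(x).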
S2: combining each shape descriptor with a twin neural network;
Specifically, the shape descriptor above has geometric invariance to simple geometric transformations because it contains a smoothing factor. To obtain invariance to more complex geometric transformations or to complex photometric transformations (e.g., illumination changes), the shape descriptors are here combined with a twin neural network.
Let the function f take a shape descriptor u(x) ∈ ℝⁿ at a given pixel as input and return a descriptor with m components, i.e., f : ℝⁿ → ℝᵐ with n = N_s × N_c, mapping the space of basic shape descriptors into another vector space with better invariance. Since the descriptors are ultimately used to distinguish descriptors of different regions, and both the same-region and different-region cases must be considered, the invention adopts a twin neural network for deep learning of the shape descriptors.
As shown in fig. 1, the network structure is symmetric: two different basic descriptors u(x) and v(y) are input to f respectively, and two descriptors with m components are output. The weighted L² norm of the descriptor difference is computed and finally passed through a Sigmoid function to obtain the result value. The metric of the neural network is D : ℝⁿ × ℝⁿ → [0, 1]; a value of 1 indicates that the descriptors come from different segmentation regions, and a value of 0 indicates that they come from the same region.
Specifically, in the twin network, the metric for the descriptors of pixels x and y is

$$D\big(u(x), v(y)\big) = S\Big(\sum_{i=1}^{m} \omega_i \big(f(u(x))_i - f(v(y))_i\big)^2\Big), \tag{3}$$

where S is the Sigmoid function, ω_i ≥ 0 are weights for i = 1, ..., m, and f(u(x))_i is the i-th component of f(u(x)).
Further, the data for training the network are generated from the ground-truth segmentations of the training set: given a training image, the shape descriptors are computed using its true segmentation. For any pair of pixels x and y lying in the same region or in truly adjacent regions of the same image, the training data are triples

$$\big(u_l(x),\, u_k(y),\, \ell_{lk}\big), \qquad \ell_{lk} = 0 \text{ if } l = k, \quad \ell_{lk} = 1 \text{ if } R_l \text{ and } R_k \text{ are adjacent}, \tag{4}$$

where u_l(x) is the shape descriptor of R_l computed at x, and u_k(y) is the shape descriptor of R_k computed at y. Since only adjacent regions need to be distinguished during segmentation, only adjacent regions are selected (an illustrative training sketch follows).
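A minimal PyTorch sketch of the twin structure and of training on such pairs; the layer sizes, the softplus reparametrization enforcing ω_i ≥ 0, and the bias inside the Sigmoid (which lets same-region pairs map near 0) are assumptions, not details fixed by the patent:

```python
import torch
import torch.nn as nn

class TwinDescriptorNet(nn.Module):
    """Twin (Siamese) branches sharing one fully connected layer f,
    followed by the weighted squared-L2 difference of equation (3)
    squashed by a Sigmoid."""
    def __init__(self, n=40, m=64):
        super().__init__()
        self.f = nn.Linear(n, m)                    # shared branch f: R^n -> R^m
        self.w_raw = nn.Parameter(torch.zeros(m))   # softplus gives w_i >= 0
        self.bias = nn.Parameter(torch.zeros(1))    # lets outputs fall below 0.5

    def forward(self, u, v):
        fu, fv = self.f(u), self.f(v)               # same weights: twin structure
        w = nn.functional.softplus(self.w_raw)
        d = (w * (fu - fv) ** 2).sum(dim=-1)        # weighted L2^2 of the difference
        return torch.sigmoid(d + self.bias)         # ~1 different regions, ~0 same

# training on descriptor pairs drawn from the ground-truth segmentation:
# label 0 for a pair in the same region, 1 for a pair in adjacent regions
net = TwinDescriptorNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()
u_batch, v_batch = torch.randn(16, 40), torch.randn(16, 40)  # placeholder pairs
labels = torch.randint(0, 2, (16,)).float()
loss = loss_fn(net(u_batch, v_batch), labels)
opt.zero_grad(); loss.backward(); opt.step()
```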
S3: and selecting the interested region, and solving the optimal solution according to a texture segmentation algorithm.
Specifically, a region of interest is selected, and texture segmentation within it is posed as an optimization problem: once the region is selected, the segmentation is the optimal solution, for which the learned shape descriptors are almost constant within each region. The shape descriptors and the twin neural network described above are applied in the texture segmentation algorithm: a Sigmoid result of 1 means the two compared pixels come from different regions, and a result of 0 means they come from the same region. Proceeding in this way, pixels of the same region are gradually merged, finally yielding the segmented texture image.
Specifically, let u_i(x) ∈ ℝⁿ be the basic shape descriptor of region R_i and a_i ∈ ℝᵐ the shape descriptor representing the region. The energy of the segmentation is

$$E\big(\{R_i\}, \{a_i\}\big) = \sum_{i=1}^{N_r} \int_{R_i} \big\|f(u_i(x)) - a_i\big\|_\omega^2\,dx + \beta \sum_{i=1}^{N_r} \int_{\partial R_i} ds, \tag{5}$$

where N_r is the number of regions, β > 0, and ‖z‖²_ω = Σ_k ω_k z_k² is the weighted norm of the metric above. The first term of equation (5) measures how close the descriptor of each pixel in a region is to the constant vector a_i; the second term penalizes boundary length (ds is the arc-length element), which induces spatial regularity of the segmentation. The optimal regions are therefore regions on which the descriptors are almost constant.
The minimizing a_i is obtained by solving equation (5), with the result

$$a_i = \frac{1}{|R_i|} \int_{R_i} f\big(u_i(x)\big)\,dx,$$

where |R_i| denotes the area of R_i; that is, a_i is the average of the descriptors over the region.
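The averaging formula follows from the first-order optimality condition of equation (5) in a_i; a short standard derivation (not verbatim from the patent), written per component k:

```latex
0 \;=\; \frac{\partial E}{\partial a_{i,k}}
  \;=\; -2\,\omega_k \int_{R_i}\bigl(f(u_i(x))_k - a_{i,k}\bigr)\,dx
\quad\Longrightarrow\quad
a_{i,k} \;=\; \frac{1}{|R_i|}\int_{R_i} f\bigl(u_i(x)\bigr)_k\,dx,
\qquad k = 1,\dots,m.
```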
Since the energy of equation (5) is non-convex in the regions (the descriptor u_i depends nonlinearly on R_i, and f is non-convex), a gradient descent method is used to optimize the energy. The gradient of E with respect to R_i is

$$\nabla E(x) = \big[\, G_i(x) + H_i(x) + \beta\, k_i(x) \,\big]\, N_i(x), \qquad G_i(x) = \big\|f(u_i(x)) - a_i\big\|_\omega^2, \tag{6}$$

where H_i(x) accounts for the dependence of the descriptors on the region, k_i is the signed curvature of ∂R_i, N_i is the outward unit normal of ∂R_i, tr is the trace, D is the derivative, A is a diagonal matrix, J is a vector of size n, and the auxiliary function η_i satisfies a partial differential equation [the expression of H_i in terms of tr, D, A and J, the diagonal entries of A, the entries of J, and the equation for η_i appear in the original only as equation images].
The first term G_i in equation (6) arises from the change of the integral as the boundary deforms, and the second term H_i arises from the change of the descriptors as the boundary changes. To simplify the implementation, the change of the descriptors can be ignored, since the numerical algorithm changes only a small fraction of the boundary at each iteration, and the resulting change in the descriptors u_i is small.
To implement the gradient descent numerically, each region can be represented by a relaxed indicator, or "level set," function φ_i : Ω → [0, 1], i = 1, ..., N_r. R_j is the region on which φ_j attains the maximum over all i = 1, ..., N_r, so the boundary evolution can be converted into an evolution of the φ_i, similar to level-set methods. To let the evolution move beyond the current boundary, the terms of the gradient are extended to a band around the boundary. The algorithm that minimizes the energy, computing the full gradient while ignoring the descriptor-variation term H_i, is as follows (an illustrative code sketch follows the list):
1) Initialize φ_i;
2) Set the regions: R_i = {x ∈ Ω : i = argmax_j φ_j(x)};
3) Compute the dilation D(R_i) of R_i;
4) Compute u_i in D(R_i), and compute a_i = (1/|R_i|) ∫_{R_i} f(u_i(x)) dx;
5) Compute B_i = D(R_i) ∩ D(Ω\R_i);
6) For x ∈ B_i, compute G_i(x) = ‖f(u_i(x)) − a_i‖²_ω, evaluating f with the neural network;
7) Update pixels x ∈ D(R_i) ∩ D(R_j) as
φ_i^{T+ΔT}(x) = φ_i^T(x) − ΔT (G_i(x) − G_j(x)) |∇φ_i^T(x)| + ΔT·β k_i |∇φ_i^T(x)|;
8) Update all other pixels [the update formula is given in the original only as an equation image];
9) Clip between 0 and 1: φ_i = max{0, min{1, φ_i}};
10) Repeat from step 2) until the regions converge.
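An illustrative sketch of this loop for two regions (N_r = 2), with the band restriction of step 5 omitted and the signed curvature k_i approximated by the Laplacian of φ_i (both common simplifications); `descriptor_fn` is a hypothetical helper returning the network outputs f(u_i(x)) on the dilated mask:

```python
import numpy as np
from scipy.ndimage import binary_dilation, laplace

def minimize_energy(phis, descriptor_fn, beta=0.5, dT=0.1, n_iter=100, band=3):
    # phis: (2, H, W) relaxed indicator functions with values in [0, 1]
    for _ in range(n_iter):
        labels = np.argmax(phis, axis=0)               # step 2: R_i from the phis
        G = []
        for i in range(2):
            Ri = labels == i
            Di = binary_dilation(Ri, iterations=band)  # step 3: dilation D(R_i)
            fu = descriptor_fn(Di)                     # step 4: f(u_i) on D(R_i), (H, W, m)
            a_i = fu[Ri].mean(axis=0)                  # a_i: average descriptor of R_i
            G.append(((fu - a_i) ** 2).sum(axis=-1))   # step 6: G_i(x), omega folded into f
        for i in range(2):
            j = 1 - i                                  # the competing region
            gy, gx = np.gradient(phis[i])
            grad_mag = np.hypot(gy, gx)                # |grad phi_i|
            # step 7, applied everywhere (band B_i of step 5 omitted in this
            # sketch), with laplace(phi) standing in for the curvature term:
            phis[i] += dT * (-(G[i] - G[j]) * grad_mag + beta * laplace(phis[i]))
            np.clip(phis[i], 0.0, 1.0, out=phis[i])    # step 9: clip to [0, 1]
    return np.argmax(phis, axis=0)                     # final segmentation labels
```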
The first embodiment is as follows:
in the twin neural network structure, two symmetrical networks each have a fully connected layer. The input base shape descriptors are 40-dimensional descriptors, i.e. RGB channels, gray scale and 4 directional gradients on 5 scales, where the 5 scales are α = (10, 20,30,40, 50), respectively. The output descriptor f of the twin network is the same as the number of hidden units used. The Sigmoid function of the two twin network weighted differences is used to compute the metric D for a pair of descriptors.
To compare texture segmentation results, this embodiment compares the texture segmentation algorithm of the invention with the MCG, gPb, CTF, and STLD algorithms. The boundary accuracy and region accuracy of the algorithms are then evaluated with the ODS and OIS indices; for all metrics, higher values indicate a segmentation closer to the ground truth.
This embodiment uses a Brodatz composite dataset consisting of 200 images, each composed of two texture regions with different shapes; 100 images are selected as the training set and the remaining 100 as the test set. The experimental results are shown in Table 1; as Table 1 shows, the texture segmentation algorithm of the invention performs best on both the contour and the region indices.
TABLE 1 Indices on the synthetic texture segmentation dataset
[The table is given in the original only as an image.]
A schematic diagram of the experimental results is shown in FIGS. 2-8. As FIGS. 2-8 show, the segmentation produced by the texture segmentation algorithm of the invention is closest to the true segmentation, so its segmentation effect is the most satisfactory.
The second embodiment is as follows:
experiments were performed on the real texture dataset. 128 images from the documents N.khan, M.Algarni, A.Yezzi, and G.Sundaramoorthi.shape-related local descriptors and the pair application to segmentation and tracking in Proceedings of the IEEE Conference on Computer Vision and Pattern registration, pages 3890-3899,2015 and 150 images from the Berkeley segmentation dataset were used as training sets, and the remaining images were then used as test sets. The algorithm proposed by the present invention is then initialized by a 5 x 5 standard block subdivision. The final segmentation result index is shown in table 2. As can be seen from the observation of Table 2, the texture segmentation algorithm of the present invention exhibits the best effect on both the contour index and the region index.
TABLE 2 Results on the real texture segmentation dataset
[The table is given in the original only as an image.]
One of the experimental results is shown in FIGS. 9-15. As FIGS. 9-15 show, the MCG algorithm produces overly large segmentation regions, and the gPb, CTF, and STLD algorithms are only mediocre on the edge contours. Overall, compared with the prior art, the segmentation produced by the texture segmentation algorithm of the invention is closer to the true segmentation in terms of both regions and edges.
To test the effect of the initial values on the robustness of the texture segmentation algorithm of the invention, experiments were performed with initializations of 3 × 3, 4 × 4, and 5 × 5 standard blocks; the results are shown in Table 3. As Table 3 shows, the texture segmentation algorithm of the invention is robust to the choice of initial value.
TABLE 3 Test results for different initial values
[The table is given in the original only as an image.]
In summary, the texture segmentation algorithm based on shape descriptors and a twin neural network provided by the invention is practical and beneficial. First, since the proposed descriptors aggregate image statistics only within the region of interest, they do not mix statistics across texture boundaries. Second, by using a neural network to learn descriptors derived from basic hand-crafted shape-tailored descriptors, the method exhibits invariance to complex nuisances. Experiments show that the proposed descriptors better handle texture segmentation of images with complex nuisances.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above, which merely illustrate its principles; various changes and modifications may be made without departing from the spirit and scope of the invention, and such changes and modifications fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (1)

1. A texture segmentation algorithm based on shape descriptors and a twin neural network, comprising the following steps:
S1: assume the image is composed of N_r regions, each having a fixed shape descriptor, the shape descriptor being defined as the solution of a Poisson partial differential equation;
the Poisson partial differential equation is

$$\begin{cases} -\alpha_i \Delta u_{ij}(x) + u_{ij}(x) = J_j(x), & x \in R \\ \nabla u_{ij}(x) \cdot N(x) = 0, & x \in \partial R \end{cases}$$

where Ω is the domain of the image, R ⊂ Ω is a region of interest, J_j : Ω → ℝ for j = 1, ..., N_c is a channel of the image, ∇ is the gradient, Δ is the Laplacian, ∂R is the boundary of R, N is the outward unit normal of ∂R, α_i ∈ ℝ₊ is a scale for i = 1, ..., N_s, N_c is the number of channels, and N_s is the number of scales;
the solution of the Poisson partial differential equation is determined by minimizing the energy

$$E(u_{ij}) = \int_R \big(J_j(x) - u_{ij}(x)\big)^2\,dx + \alpha_i \int_R \big|\nabla u_{ij}(x)\big|^2\,dx,$$

and the solutions over all scales and channels are collected into the vector u(x) = (u_{ij}(x)) ∈ ℝⁿ, where u_ij is a smoothed version of the image channel J_j;
S2: combine each shape descriptor with a twin neural network;
S3: select a region of interest and find the optimal solution according to the texture segmentation algorithm;
the specific operation steps of step S2 are:
S21: select two different shape descriptors u(x) and v(y);
S22: let the function f take a shape descriptor u(x) ∈ ℝⁿ at a given pixel as input and return a descriptor with m components, i.e., f : ℝⁿ → ℝᵐ with n = N_s × N_c;
S23: input u(x) and v(y) to f respectively, obtaining two shape descriptors with m components, and compute the weighted L² norm of their difference; in the twin network, for the shape descriptors u(x) and v(y) of pixels x and y, the metric is defined as

$$D\big(u(x), v(y)\big) = S\Big(\sum_{i=1}^{m} \omega_i \big(f(u(x))_i - f(v(y))_i\big)^2\Big),$$

where S is the Sigmoid function, ω_i ≥ 0 are weights for i = 1, ..., m, and f(u(x))_i is the i-th component of f(u(x));
S24: pass the weighted L² norm of the shape descriptor difference through the Sigmoid function to obtain a result value, a value of 1 indicating that the descriptors come from different segmentation regions and a value of 0 indicating that they come from the same region;
the specific operation of step S3 is: select a region of interest and pose texture segmentation within it as an optimization problem, i.e., once the region is selected, the segmentation is the optimal solution, for which the learned shape descriptors are almost constant within each region; the shape descriptors and the twin neural network of steps S1 and S2 are applied in the texture segmentation algorithm, where a Sigmoid result of 1 means the two compared pixels come from different regions and a result of 0 means they come from the same region; proceeding in this way, pixels of the same region are gradually merged, finally yielding the segmented texture image;
let u_i(x) ∈ ℝⁿ be the basic shape descriptor of region R_i and a_i ∈ ℝᵐ the shape descriptor representing the region; the energy of the segmentation into regions R_i is

$$E\big(\{R_i\}, \{a_i\}\big) = \sum_{i=1}^{N_r} \int_{R_i} \big\|f(u_i(x)) - a_i\big\|_\omega^2\,dx + \beta \sum_{i=1}^{N_r} \int_{\partial R_i} ds,$$

where ‖z‖²_ω = Σ_k ω_k z_k²; the minimizing a_i is obtained by solving this energy, giving

$$a_i = \frac{1}{|R_i|} \int_{R_i} f\big(u_i(x)\big)\,dx,$$

where |R_i| denotes the area of R_i, i.e., a_i is the average of the descriptors over the region;
a gradient descent method is applied to optimize the energy of the segmentation into regions R_i and to compute the minimum energy; the gradient with respect to R_i is

$$\nabla E(x) = \big[\, G_i(x) + H_i(x) + \beta\, k_i(x) \,\big]\, N_i(x), \qquad G_i(x) = \big\|f(u_i(x)) - a_i\big\|_\omega^2,$$

where H_i(x) accounts for the dependence of the descriptors on the region, k_i is the signed curvature of ∂R_i, N_i is the outward unit normal of ∂R_i, tr is the trace, D is the derivative, A is a diagonal matrix, J is a vector of size n, and the auxiliary function η_i satisfies a partial differential equation [the expression of H_i in terms of tr, D, A and J, the diagonal entries of A, the entries of J, and the equation for η_i appear in the original only as equation images];
The specific operations of minimizing energy include:
1) Initialization phi i
2) Setting an area: r i ={x∈Ω:i=argmax j φ j (x)};
3) Calculation of R i Expansion D (R) of i );
4) At D (R) i ) In calculating u i Calculating
Figure FDA0004003684440000032
5) Calculating B i =D(R i )∩D(Ω\R i );
6) For x ∈ B i Calculating
Figure FDA0004003684440000033
Evaluating f by a neural network;
7) Updating pixel x ∈ D (R) i )∩D(R j ) As follows
Figure FDA0004003684440000034
8) Update all other pixels to
Figure FDA0004003684440000035
9) Clipping is performed between 0 and 1: phi is a i =max{0,min{1,φ i }};
10 Repeat step 2) until the regions merge.
CN201910949663.0A (priority date 2019-10-08, filing date 2019-10-08) Texture segmentation algorithm based on shape descriptor and twin neural network; status: Active; granted as CN110796666B (en)

Priority Applications (1)

Application Number: CN201910949663.0A; Priority Date: 2019-10-08; Filing Date: 2019-10-08; Title: Texture segmentation algorithm based on shape descriptor and twin neural network

Applications Claiming Priority (1)

Application Number: CN201910949663.0A; Priority Date: 2019-10-08; Filing Date: 2019-10-08; Title: Texture segmentation algorithm based on shape descriptor and twin neural network

Publications (2)

CN110796666A (en): published 2020-02-14
CN110796666B (granted text): published 2023-03-31

Family

ID=69438836

Family Applications (1)

Application Number: CN201910949663.0A; Status: Active; Publication: CN110796666B (en); Priority Date: 2019-10-08; Filing Date: 2019-10-08; Title: Texture segmentation algorithm based on shape descriptor and twin neural network

Country Status (1)

CN: CN110796666B (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN108416780A * 2018-03-27, 2018-08-17, Fuzhou University: Object detection and matching method based on a twin region-of-interest pooling model
CN109934166A * 2019-03-12, 2019-06-25, Sun Yat-sen University: UAV image change detection method based on semantic segmentation and a twin neural network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Coarse-to-Fine Segmentation With Shape-Tailored Scale Spaces; Ganesh Sundaramoorthi et al.; arXiv:1603.07745v1; 2016-03-24; pages 1-21 *
Shape-Tailored Local Descriptors and their Application to Segmentation and Tracking; Naeemullah Khan et al.; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2015; pages 3890-3899 *
Face tracking based on a twin convolutional neural network; Wu Hanzhao; Computer Engineering and Applications; 2018; pages 175-179 *

Also Published As

Publication Number: CN110796666A (en); Publication Date: 2020-02-14

Similar Documents

Publication Publication Date Title
Mirmehdi et al. Segmentation of color textures
Xu et al. A sparse control model for image and video editing
CN111738265B (en) Semantic segmentation method, system, medium, and electronic device for RGB-D image
JP2019016114A (en) Image processing device, learning device, focus controlling device, exposure controlling device, image processing method, learning method and program
CN108961180A (en) infrared image enhancing method and system
Zhang et al. Multi-focus image fusion with alternating guided filtering
Gopinath et al. Adaptive graph convolution pooling for brain surface analysis
CN115641583B (en) Point cloud detection method, system and medium based on self-supervision and active learning
Tai et al. A multigrid algorithm for maxflow and min-cut problems with applications to multiphase image segmentation
CN101578632B (en) Soft edge smoothness prior and application on alpha channel super resolution
Deriche et al. Color image segmentation by combining the convex active contour and the Chan Vese model
Pan et al. Texture relative superpixel generation with adaptive parameters
CN110796666B (en) Texture segmentation algorithm based on shape descriptor and twin neural network
Popowicz et al. Overview of grayscale image colorization techniques
Chen et al. Dense motion estimation for smoke
Adamek et al. Using dempster-shafer theory to fuse multiple information sources in region-based segmentation
Hu et al. CVT-based 3D image segmentation and quality improvement of tetrahedral/hexahedral meshes using anisotropic Giaquinta-Hildebrandt operator
Xiao et al. Adaptive superpixel segmentation aggregating local contour and texture features
Arfan Jaffar A dynamic fuzzy genetic algorithm for natural image segmentation using adaptive mean shift
Murashov Method for combining image segmentation maps on the basis of information redundancy and variation of information minimization
Zheng et al. An improved NAMLab image segmentation algorithm based on the earth moving distance and the CIEDE2000 color difference formula
Fürtinger An approach to computing binary edge maps for the purpose of registering intensity modulated images
Li et al. Automatic Computation of Fundamental Matrix Based on Voting
CN110533593B (en) Method for quickly creating accurate trimap
Khan et al. A convex selective segmentation model based on a piece-wise constant metric-guided edge detector function

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant