CN114067101A - Image significance detection method of double-stream decoder based on information complementation - Google Patents
Image significance detection method of double-stream decoder based on information complementation
- Publication number
- CN114067101A (Application CN202111304064.7A)
- Authority
- CN
- China
- Prior art keywords
- image
- contour
- branch
- graph
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
An image saliency detection method based on a dual-stream decoder with complementary information comprises the following steps: step S1, decomposing the label image to obtain a corresponding main-body label map and contour-detail label map; step S2, applying random cropping, random rotation, normalization and grayscale conversion to the training-set images to enhance sample diversity; step S3, inputting an image, preprocessing it with a VGG16 backbone, and collecting image features of different sizes with a group of encoding blocks of different dimensions; step S4, feeding the five output feature maps obtained from the encoder into an embedding layer to unify their dimensions; step S5, passing the encoded features of the target main-body map and of the contour-detail map into a saliency branch and a contour branch respectively, and supervising the resulting main-body feature map and contour feature map on their respective branches; and step S6, adding and fusing the target main-body features and the contour-detail features obtained on the two main branches to obtain the final prediction map.
Description
Technical Field
The invention belongs to the technical field of image processing, and in particular relates to an image saliency detection method using a dual-stream decoder based on information complementation.
Background
Traditional saliency detection works by extracting hand-crafted features specific to the target. Such feature engineering exploits the specificity of object categories, but inevitably runs into the limitations of the underlying algorithm. Saliency detection is the task of segmenting the objects or regions of an image that are most visually distinctive; in other words, these studies aim to identify the subject of an image. Unlike finer-grained techniques such as semantic segmentation or instance segmentation, saliency detection always focuses on a few main regions. It is therefore often used as a first step in many tasks, such as target tracking, target recognition, and action classification.
The main difficulty of saliency detection is distinguishing salient objects and object edges in the image. Inspired by the correlation between salient objects and object contours, and in contrast to methods that learn from edge information or the overall map alone, the present method uses information complementation to obtain a target main-body map and a contour-detail map of the target. The two maps then jointly supervise the training of the model, which reduces the influence of the extreme imbalance in the pixel distribution of salient-object edges and improves performance.
Disclosure of Invention
The present invention overcomes the above-mentioned shortcomings of the prior art and provides a saliency detection method for a dual-stream decoder based on information complementation.
The invention builds on the VGG model, and the number of parameters the model contains is markedly smaller than that of other VGG-based methods; together with the feature fusion module and the loss function designed here, this achieves a better salient-object detection effect.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
an image significance detection method based on a double-stream decoder with complementary information comprises the following steps:
step S1, generating a contour detail map by expanding and eroding the protrusions and calculating the difference value between the label images, and searching the area in the closed contour by adopting a seed filling algorithm so as to obtain a target main body map;
step S2, random cutting, random rotation, normalization and graying processing are carried out on the training data set image to enhance the diversity of the sample;
s3, inputting an image with H multiplied by W dimension, preprocessing the image by using a VGG16 frame, and collecting image features with different sizes by using a group of encoding blocks with different dimensions;
step S4, inputting the five-layer output characteristic diagram obtained by the encoder into an Embedding layer, and unifying dimensions;
step S5, respectively transmitting the coded features of the target main body image and the coded features of the contour detail image into a salient branch and a contour image branch, and respectively supervising the obtained main body feature image and contour feature image on the respective branches, wherein on one hand, the information of the main body image and the contour feature image is interactively fused to achieve information complementation, and on the other hand, the main body image and the contour feature image are respectively transmitted into respective image decoders to enrich the features;
and step S6, adding and fusing the target main body characteristics and the contour detail characteristics obtained on the two main branches, and then obtaining a final prediction image through an up-sampling operation and a final connection operation.
The invention provides an image saliency detection method using a dual-stream decoder based on information complementation. It uses the ground-truth label map of an image to obtain a target main-body map and a contour-detail map: the main-body map mainly contains the subject information of the image, while the contour map mainly contains its edge-detail information. The training of the model is supervised jointly by the main-body map and the contour map, and the continuously optimized main-body and contour features are added and fused in the feature fusion module. Because the two maps contain complementary information, iteratively training on both sets of features yields a better salient-target prediction.
The invention has the following advantages: emphasizing the correlation between the salient object and the object contour, the ground-truth image is divided into two complementary sources of information that focus on different regions. The salient-target map and the target contour map jointly supervise the iterative training of the model, and the two kinds of features are added and fused, so that more information from both the main body and the contour edges of the salient target is exploited, yielding a better prediction.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of a feature interaction module of the decoder of the flowchart of fig. 1.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
The technical scheme of the invention is described below with reference to the accompanying drawings.
An image saliency detection method based on a dual-stream decoder with complementary information comprises the following steps:
step S1, generating a contour-detail map by dilating and eroding the label image and computing the difference, and finding the region inside the closed contour with a seed-filling algorithm, thereby obtaining a target main-body map;
step S2, applying random cropping, random rotation, normalization and grayscale conversion to the training-set images to enhance sample diversity;
step S3, inputting an image of size H × W, preprocessing it with a VGG16 backbone, and collecting image features of different sizes with a group of encoding blocks of different dimensions;
step S4, feeding the five output feature maps obtained from the encoder into an embedding layer to unify their dimensions;
step S5, passing the encoded features of the target main-body map and of the contour-detail map into a saliency branch and a contour branch respectively, and supervising the resulting main-body feature map and contour feature map on their respective branches; on the one hand, the information of the two maps is interactively fused to achieve information complementation, and on the other hand, each map is passed into its own image decoder to enrich the features;
and step S6, adding and fusing the target main-body features and the contour-detail features obtained on the two main branches, then obtaining the final prediction map through an upsampling operation and a final concatenation operation.
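As a concrete illustration of the augmentation pipeline of step S2, the following NumPy sketch applies a random crop, a random rotation (restricted to multiples of 90 degrees here for simplicity), normalization, and occasional grayscale conversion. The crop ratio and graying probability are illustrative choices, not values fixed by the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Step-S2 sketch: random crop, random rotation (multiples of 90
    degrees only, for simplicity), normalization, occasional graying.
    img has shape (H, W, 3)."""
    h, w, _ = img.shape
    ch, cw = int(h * 0.9), int(w * 0.9)              # illustrative crop ratio
    y = int(rng.integers(0, h - ch + 1))
    x = int(rng.integers(0, w - cw + 1))
    out = img[y:y + ch, x:x + cw]
    out = np.rot90(out, k=int(rng.integers(0, 4)))   # random rotation
    out = (out - out.mean()) / (out.std() + 1e-8)    # normalization
    if rng.random() < 0.1:                           # illustrative graying prob.
        out = out.mean(axis=2, keepdims=True).repeat(3, axis=2)
    return out

img = rng.random((64, 64, 3))
aug = augment(img)
```

A real training loop would apply this per sample each epoch; rotation by arbitrary angles would require interpolation, which is omitted here.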
Step S1, generating a contour-detail map by dilating and eroding the label image and computing the difference, and finding the region inside the closed contour with a seed-filling algorithm, thereby obtaining a target main-body map, specifically comprises:
S11, inputting a label image in which the salient target region is the white part and the background is the black part, with no constraint on size; the pixel value of the background is 0 and that of the foreground is 1;
S12, generating the contour-detail map by dilating and eroding the label image and computing the difference, and finding the region inside the closed contour with a seed-filling algorithm, thereby obtaining the target main-body map. The calculation formulas are:
X = E ⊖ B = {x | B(x) ⊆ E}    (1)
Y = E ⊕ B = {x | B(x) ∩ E ≠ ∅}    (2)
where X and Y denote the results of the erosion and dilation operations respectively, B(x) denotes the structuring element, and the operation is applied at every point x of the working space E.
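The decomposition of step S1 can be sketched with plain NumPy morphology. The seed-filling step is approximated here by simply taking the eroded interior as the body map, which is a simplification that holds for solid labels like the toy one below; a faithful implementation would flood-fill inside the closed contour.

```python
import numpy as np

def dilate(mask, k=3):
    """Binary dilation with a k x k square structuring element B."""
    pad = k // 2
    padded = np.pad(mask, pad)
    out = np.zeros_like(mask)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def erode(mask, k=3):
    """Binary erosion via duality: complement of dilating the complement."""
    return 1 - dilate(1 - mask, k)

def decompose_label(label, k=3):
    """Split a binary label map into a contour-detail map (dilation minus
    erosion) and a main-body map (here: the eroded interior, standing in
    for the seed-filled closed-contour region)."""
    contour = dilate(label, k) - erode(label, k)
    body = erode(label, k)
    return body, contour

label = np.zeros((8, 8), dtype=np.uint8)
label[2:6, 2:6] = 1                     # a 4 x 4 salient square
body, contour = decompose_label(label)
```

For the toy square, the contour map is a ring two pixels thick around the object boundary and the body map is its 2 × 2 interior, so the two maps cover complementary regions as the patent intends.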
Step S3 specifically comprises: loading the parameters of a pre-trained VGG16 model, inputting the data, obtaining the 5 layers of feature output excluding the fully connected layers, and collecting image features of different sizes with a group of encoding blocks of different dimensions, recorded as F = {F_i | i = 1, 2, 3, 4, 5};
In order to reduce the feature channels and the calculation amount, a channel pooling layer is added at the top of each feature map so that information can be transmitted through different channels, and the channel pooling layer is specifically defined as follows:
Embedding_i = cp(E_i)    (3)
where i ∈ [1, 5] indexes the feature maps, cp(·) denotes the channel pooling operation, X_j denotes the j-th channel of a feature map X, and the pooling layer collects the maximum value over each group of channels, N and M denoting the numbers of input and output channels respectively.
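A minimal stand-in for the channel pooling operation cp(·) of equation (3): the N input channels are split into M groups and the per-group maximum is kept, so every input channel contributes to some output channel. The even grouping scheme is an assumption; the patent text does not spell it out.

```python
import numpy as np

def channel_pool(x, m):
    """cp(.): reduce the N channels of x (shape (N, H, W)) to m output
    channels by taking the maximum over each group of N // m channels."""
    n, h, w = x.shape
    assert n % m == 0, "this sketch assumes N divisible by M"
    return x.reshape(m, n // m, h, w).max(axis=1)

x = np.arange(8 * 4 * 4, dtype=np.float32).reshape(8, 4, 4)
emb = channel_pool(x, 4)   # N = 8 input channels -> M = 4 output channels
```

Because the channels of `x` are monotonically increasing here, each output channel equals the larger member of its pair, e.g. `emb[0]` equals `x[1]`.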
Step S5 specifically comprises: each feature fusion module contains two branches, a main-body branch and a contour branch; each branch additionally spawns a corresponding new branch and is supervised by the main-body map and the contour map respectively; the resulting features are then superimposed and fused, and each new branch feeds the new features it obtains into the opposite main branch for feature fusion. Since the final output is predominantly the salient target, the main-body branch is the primary one; its result is fed into the next feature fusion module, and these modules can be stacked. The specific steps are:
S51, each feature fusion module adds two inputs, drawn respectively from the corresponding feature output layer of the encoder and from the module of the previous layer. For the first feature fusion module of the main-body branch, since there is no input feature from a previous module, the layer-5 output of the encoder serves as the input of the previous-layer decoder. The intermediate connection between branches is implemented as follows:
where A_i and B_i denote the main-body branch and the contour branch respectively, conv denotes the convolution operation with its subscript indicating the corresponding branch, and P_i^A and P_i^B denote the predictions of the related tasks, to which additional supervision is applied.
S52, each feature fusion module contains a main-body branch and a contour branch; each branch additionally spawns a corresponding new branch and is supervised by the main-body map and the contour map respectively; the resulting features are superimposed and fused, and each new branch feeds the new features it obtains into the opposite main branch for feature fusion, until the modules have been stacked 5 times. After the final input features are fused, a subject prediction map is output. The branch operations and final prediction are:
A_i = conv_{A_i}(upsample(concat(A_{i+1}, B'_i, Embedding_i)))    (7)
B_i = conv_{B_i}(concat(A'_i, upsample(B_{i+1})))    (8)
where upsample and concat denote the upsampling and concatenation operations respectively. For the final prediction, all features in the main-body branch are concatenated to balance the hierarchical information, specifically:
Final = conv_Final(concat([upsample(A_i), i = 1, 2, 3, 4, 5]))    (9)
where all A_i are upsampled to the input size before concatenation, and the final prediction is aggregated over the concatenated features.
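Equation (9) can be sketched as follows, with nearest-neighbour upsampling standing in for the model's upsample operation and a fixed weight vector standing in for the learned 1 × 1 convolution conv_Final; both stand-ins are illustrative assumptions, not the patent's trained layers.

```python
import numpy as np

def upsample_nn(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def final_prediction(feats, weights):
    """Equation (9) sketch: upsample every level A_i to the finest
    resolution, concatenate along the channel axis, and apply a 1 x 1
    convolution, modelled here as a weighted sum over the channels."""
    target = max(f.shape[1] for f in feats)
    up = [upsample_nn(f, target // f.shape[1]) for f in feats]
    stacked = np.concatenate(up, axis=0)           # concat([upsample(A_i)])
    return np.tensordot(weights, stacked, axes=1)  # conv_Final stand-in

feats = [np.ones((1, 4, 4)), 2 * np.ones((1, 2, 2))]   # two toy levels A_1, A_2
pred = final_prediction(feats, np.array([0.5, 0.5]))
```

With the toy inputs, the coarse level is upsampled 2x and the weighted sum gives a uniform 4 × 4 map; in the real model five levels are concatenated and the mixing weights are learned.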
Step S6 specifically comprises: comparing the three prediction maps obtained by each module (the subject prediction map, the contour-detail prediction map and the salient-target prediction map) with the ground-truth label, calculating the loss value, and back-propagating to update the weights of the model. The loss is calculated as follows:
the loss between the obtained main-body and contour-detail prediction maps and the corresponding label maps is computed with the binary cross-entropy function:
l_bce(x, y) = −[g(x, y) · log p(x, y) + (1 − g(x, y)) · log(1 − p(x, y))]
where g(x, y) ∈ [0, 1] is the value of the label map at pixel (x, y), p(x, y) is the value of the prediction map at (x, y), and l_bce(x, y) is the loss value.
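The binary cross-entropy loss of step S6 takes only a few lines of NumPy; the `eps` clipping is a standard numerical guard against log(0), not part of the patent text.

```python
import numpy as np

def bce_loss(pred, label, eps=1e-7):
    """Pixel-wise binary cross-entropy l_bce between the prediction
    p(x, y) and the label g(x, y), averaged over the image."""
    p = np.clip(pred, eps, 1 - eps)   # numerical guard against log(0)
    return -np.mean(label * np.log(p) + (1 - label) * np.log(1 - p))

g = np.array([[1.0, 0.0]])   # toy 1 x 2 label map
p = np.array([[0.9, 0.1]])   # toy prediction
loss = bce_loss(p, g)        # each pixel contributes -log(0.9)
```

In training, this loss would be evaluated for the subject, contour and final prediction maps of every stacked module and summed before back-propagation.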
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (5)
1. An image saliency detection method based on a dual-stream decoder with complementary information, comprising the following steps:
step S1, generating a contour-detail map by dilating and eroding the label image and computing the difference, and finding the region inside the closed contour with a seed-filling algorithm, thereby obtaining a target main-body map;
step S2, applying random cropping, random rotation, normalization and grayscale conversion to the training-set images to enhance sample diversity;
step S3, inputting an image of size H × W, preprocessing it with a VGG16 backbone, and collecting image features of different sizes with a group of encoding blocks of different dimensions;
step S4, feeding the five output feature maps obtained from the encoder into an embedding layer to unify their dimensions;
step S5, passing the encoded features of the target main-body map and of the contour-detail map into a saliency branch and a contour branch respectively, and supervising the resulting main-body feature map and contour feature map on their respective branches; on the one hand, the information of the two maps is interactively fused to achieve information complementation, and on the other hand, each map is passed into its own image decoder to enrich the features;
and step S6, adding and fusing the target main-body features and the contour-detail features obtained on the two main branches, then obtaining the final prediction map through an upsampling operation and a final concatenation operation.
2. The image saliency detection method based on a dual-stream decoder with complementary information according to claim 1, characterized in that step S1, generating a contour-detail map by dilating and eroding the label image and computing the difference, and finding the region inside the closed contour with a seed-filling algorithm, thereby obtaining a target main-body map, specifically comprises:
S11, inputting a label image in which the salient target region is the white part and the background is the black part, with no constraint on size; the pixel value of the background is 0 and that of the foreground is 1;
S12, generating the contour-detail map by dilating and eroding the label image and computing the difference, and finding the region inside the closed contour with a seed-filling algorithm, thereby obtaining the target main-body map, the calculation formulas being:
X = E ⊖ B = {x | B(x) ⊆ E}    (1)
Y = E ⊕ B = {x | B(x) ∩ E ≠ ∅}    (2)
where X and Y denote the results of the erosion and dilation operations respectively, B(x) denotes the structuring element, and the operation is applied at every point x of the working space E.
3. The image saliency detection method based on a dual-stream decoder with complementary information according to claim 1, characterized in that step S3 specifically comprises: loading the parameters of a pre-trained VGG16 model, inputting the data, obtaining the 5 layers of feature output excluding the fully connected layers, and collecting image features of different sizes with a group of encoding blocks of different dimensions, recorded as F = {F_i | i = 1, 2, 3, 4, 5};
in order to reduce the feature channels and the amount of computation, a channel pooling layer is added on top of each feature map so that information can be transmitted through different channels, the channel pooling layer being defined as:
Embedding_i = cp(E_i)    (3)
where i ∈ [1, 5] indexes the feature maps, cp(·) denotes the channel pooling operation, X_j denotes the j-th channel of a feature map X, and the pooling layer collects the maximum value over each group of channels, N and M denoting the numbers of input and output channels respectively.
4. The image saliency detection method based on a dual-stream decoder with complementary information according to claim 1, characterized in that step S5 specifically comprises: each feature fusion module contains two branches, a main-body branch and a contour branch; each branch additionally spawns a corresponding new branch and is supervised by the main-body map and the contour map respectively; the resulting features are then superimposed and fused, and each new branch feeds the new features it obtains into the opposite main branch for feature fusion; since the final output is predominantly the salient target, the main-body branch is the primary one, its result is fed into the next feature fusion module, and these modules can be stacked; the specific steps are:
S51, each feature fusion module adds two inputs, drawn respectively from the corresponding feature output layer of the encoder and from the module of the previous layer; for the first feature fusion module of the main-body branch, since there is no input feature from a previous module, the layer-5 output of the encoder serves as the input of the previous-layer decoder, the intermediate connection between branches being implemented as follows:
where A_i and B_i denote the main-body branch and the contour branch respectively, conv denotes the convolution operation with its subscript indicating the corresponding branch, and P_i^A and P_i^B denote the predictions of the related tasks, to which additional supervision is applied;
S52, each feature fusion module contains a main-body branch and a contour branch; each branch additionally spawns a corresponding new branch and is supervised by the main-body map and the contour map respectively; the resulting features are superimposed and fused, and each new branch feeds the new features it obtains into the opposite main branch for feature fusion, until the modules have been stacked 5 times; after the final input features are fused, a subject prediction map is output, the branch operations and final prediction being:
A_i = conv_{A_i}(upsample(concat(A_{i+1}, B'_i, Embedding_i)))    (7)
B_i = conv_{B_i}(concat(A'_i, upsample(B_{i+1})))    (8)
where upsample and concat denote the upsampling and concatenation operations respectively; for the final prediction, all features in the main-body branch are concatenated to balance the hierarchical information, specifically:
Final = conv_Final(concat([upsample(A_i), i = 1, 2, 3, 4, 5]))    (9)
where all A_i are upsampled to the input size before concatenation, and the final prediction is aggregated over the concatenated features.
5. The image saliency detection method based on a dual-stream decoder with complementary information according to claim 1, characterized in that step S6 specifically comprises: comparing the three prediction maps obtained by each module (the subject prediction map, the contour-detail prediction map and the salient-target prediction map) with the ground-truth label, calculating the loss value, and back-propagating to update the weights of the model, the loss being computed as follows:
the loss between the obtained main-body and contour-detail prediction maps and the corresponding label maps is computed with the binary cross-entropy function:
l_bce(x, y) = −[g(x, y) · log p(x, y) + (1 − g(x, y)) · log(1 − p(x, y))]
where g(x, y) ∈ [0, 1] is the value of the label map at pixel (x, y), p(x, y) is the value of the prediction map at (x, y), and l_bce(x, y) is the loss value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111304064.7A CN114067101A (en) | 2021-11-05 | 2021-11-05 | Image significance detection method of double-stream decoder based on information complementation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114067101A true CN114067101A (en) | 2022-02-18 |
Family
ID=80274312
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114332489A (en) * | 2022-03-15 | 2022-04-12 | 江西财经大学 | Image salient target detection method and system based on uncertainty perception |
CN114332489B (en) * | 2022-03-15 | 2022-06-24 | 江西财经大学 | Image salient target detection method and system based on uncertainty perception |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||