CN116805360B - Salient object detection method based on double-flow gating progressive optimization network - Google Patents

Salient object detection method based on double-flow gating progressive optimization network

Info

Publication number
CN116805360B
CN116805360B (application CN202311048090.7A)
Authority
CN
China
Prior art keywords
detail
features
fusion
global
gating
Prior art date
Legal status
Active
Application number
CN202311048090.7A
Other languages
Chinese (zh)
Other versions
CN116805360A (en)
Inventor
易玉根
张宁毅
黄龙军
周唯
谢更生
石艳娇
Current Assignee
Jiangxi Normal University
Original Assignee
Jiangxi Normal University
Priority date
Filing date
Publication date
Application filed by Jiangxi Normal University filed Critical Jiangxi Normal University
Priority to CN202311048090.7A priority Critical patent/CN116805360B/en
Publication of CN116805360A publication Critical patent/CN116805360A/en
Application granted granted Critical
Publication of CN116805360B publication Critical patent/CN116805360B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The application relates to a salient target detection method based on a double-flow gating progressive optimization network, belonging to the technical field of image processing and computer vision, and comprising the following steps: acquiring an image data set and acquiring a detail label for each image; preprocessing the images; constructing a double-flow gating progressive optimization network comprising a gating fusion network, a cross guiding module and a feature fusion module; training and testing the double-flow gating progressive optimization network, with the weighted sum of a global loss, a detail loss and a fusion loss serving as the detail-aware loss of the network; and detecting salient targets in images by adopting the trained double-flow gating progressive optimization network. The method can overcome the defects of feature dilution, noise interference and lack of interpretability existing in existing salient target detection methods, as well as the unbalanced pixel distribution of conventional edge labels, and can effectively improve the detection precision of salient target objects.

Description

Salient object detection method based on double-flow gating progressive optimization network
Technical Field
The application relates to the technical field of image processing and computer vision, in particular to a salient target detection method based on a double-flow gating progressive optimization network.
Background
Existing salient object detection methods are mostly based on encoder-decoder structures, where the encoder generates multi-layer features and the decoder combines features from different stages to achieve efficient localization and segmentation of salient objects. When the feature-extraction capability of the encoder is limited, the decoder plays an important role in aggregating these features and predicting the saliency map. Current research has proposed several ways of combining multi-layer features, such as projecting two adjacent layers of features into the same latent space through a multi-cascade decoder architecture and performing feature concatenation and cascade operations; or constructing an image pyramid for the original image and then performing element-wise addition on the output features of each pyramid layer before prediction. Both methods exploit the advantages of multi-layer features to extract low-level and high-level semantic information of the image and combine multi-scale semantic information to obtain better detection precision. However, these fusion approaches lack components for controlling the information flow, which may lead to the transfer of redundant information and compromise salient target detection performance; they suffer from three major problems:
(1) Feature dilution: some pixels or channels in the multi-layer features may contain invalid information, so that useful information is diluted and the final prediction result is affected;
(2) Noise interference: the low-level features contain complex details and noise information of the salient objects, and the simple feature aggregation method cannot effectively filter noise, so that non-salient information is mixed in the prediction graph, and accurate identification of the salient objects is affected;
(3) Lack of interpretability: it is difficult to determine which pixels or channels are critical to the final output result in the multi-layer feature aggregation process, making it difficult to interpret the predictive behavior of the model and judge the importance of the features.
In addition, predicting edge pixels with blurred boundaries is more difficult than predicting center pixels, presenting a significant challenge to segmentation of salient objects.
Disclosure of Invention
The invention aims to provide a salient target detection method based on a double-flow gating progressive optimization network, which can overcome the defects of feature dilution, noise interference and lack of interpretability existing in existing salient target detection methods, can also overcome the unbalanced pixel distribution of conventional edge labels, and can improve the detection precision of salient target objects by utilizing the complementary relation between a global branch and a detail branch.
The technical scheme adopted by the invention is as follows: a salient target detection method based on a double-flow gating progressive optimization network comprises the following steps:
s1: collecting an image data set for salient object detection, acquiring salient labels of each image in the image data set, and acquiring detail labels of each image by extracting edge information of the salient labels;
s2: preprocessing each image in the image data set, and dividing the image data set into a training set and a testing set;
s3: constructing a double-flow gating progressive optimization network GPONet, wherein the double-flow gating progressive optimization network GPONet comprises a gating fusion network, a cross guiding module and a feature fusion module;
the gating fusion network is used for extractingNGlobal features of individual phasesGAnd detail featuresEAnd respectively and complementarily fusing the global features of adjacent stages through the global branches and the detail branchesGOr detail featuresEGenerating a fused global featureG out And fused detail featuresE out
The cross guiding module is used for carrying out communication interaction and cross guidance between the fused global features G_out and the fused detail features E_out, complementarily generating cross-guided global features Ĝ and cross-guided detail features Ê, and generating a global prediction map and an edge prediction map according to the cross-guided global features Ĝ and the cross-guided detail features Ê;
the feature fusion module is used for fusing the global prediction graph and the edge prediction graph to generate a fusion feature graph, and generating a significant prediction graph containing a significant target by predicting the fusion feature graph;
s4: training the double-flow gating progressive optimization network GPONet by using a training set, and adopting global lossTo optimize the prediction of global branches, exploiting detail penalty +.>To optimize the prediction of detail branches and to employ fusion penalty +.>To optimize the prediction of the saliency prediction map; global loss->Loss of detail->And fusion loss->Weighted summation is carried out, and detail perception loss of GPONet serving as a double-flow gating progressive optimization network is ∈ ->The method comprises the steps of carrying out a first treatment on the surface of the Testing the network performance of the dual-flow gating progressive optimization network GPONet by using a test set;
s5: and detecting the remarkable target in the image by adopting the trained double-flow gating progressive optimization network GPONet.
Further, the preprocessing in step S2 includes data enhancement by image transformation, normalization of pixel values of the image, unification of image size, and conversion of the image into tensor data types.
Further, the gating fusion network comprises an encoder, a global branch and a detail branch; the encoder performs feature mapping on the input image to extract global features G and detail features E of N stages; both the global branch and the detail branch include N−1 gating fusion units, where N ≧ 2; the global features G and detail features E of the N stages are respectively processed by the N−1 gating fusion units, so that the features of adjacent stages among the N stages are complementarily fused;
the gating fusion unit obtains a first gating value according to the corresponding low-level features and high-level featuresAnd a second gating value->The specific formula is as follows:
wherein,F L represent the firstLThe characteristics of the phases are that,represent the firstLCharacterization of +1 stageF L+1 The features obtained after the up-sampling are used,F= {GorE"i.e. featuresFIs global featureGOr detail featuresEw 1 A weight parameter representing the learned first gating value,w 2 a weight parameter representing the learned second threshold value,brepresenting a learnable bias parameter, +.>Representing a Sigmoid activation function;
with a first gate valueAnd a second gating value->As a weight, according toLCharacteristics of the stagesF L And (d)LCharacterization of +1 stageF L+1 The operation obtains the firstLPost-phase fusion feature->The specific formula is as follows:
wherein,represent the first LLow-level activation of phase->Represent the firstLHigh-level activation of phase->Represent the firstLHigh semantic information of the stage.
Further, the cross guiding module comprises N cross guiding units; the fused global features G_out and fused detail features E_out of the same stage output by the gating fusion network are fed to the N cross guiding units respectively, and fusion features F_(E,G) are generated through convolution and concatenation operations, where F^L_(E,G) denotes the fusion features of the L-th stage, Cat(·) denotes the concatenation operation in the channel dimension, G_out^L denotes the fused global features of the L-th stage, E_out^L denotes the fused detail features of the L-th stage, w_FG and w_IG denote the first and second learnable parameters from the fused global features G_out^L of the L-th stage to the fusion features F^L_(E,G) of the L-th stage, and w_FE and w_IE denote the first and second learnable parameters from the fused detail features E_out^L of the L-th stage to the fusion features F^L_(E,G) of the L-th stage;

The fusion features F_(E,G) are mapped back to the global branch and the detail branch respectively by convolution operations, added to the fused global features G_out and fused detail features E_out, and finally passed through a convolution operation to generate the cross-guided global features Ĝ and the cross-guided detail features Ê, where w_OG and w_PG denote the first and second learnable parameters for mapping the fusion features F^L_(E,G) of the L-th stage back to the global branch, and w_OE and w_PE denote the first and second learnable parameters for mapping the fusion features F^L_(E,G) of the L-th stage back to the detail branch; the cross-guided global features Ĝ and detail features Ê generate a global prediction map and an edge prediction map through a prediction head operation.
Further, the feature fusion module concatenates, in the channel dimension, the cross-guided global features Ĝ and the cross-guided detail features Ê output by the cross guiding module to generate a fused feature map F_fuse, where L = 1, 2, …, N;

The fused feature map F_fuse is processed through a global average pooling layer and a fully connected layer to obtain the attention score S of the fused feature map, where GAP(·) denotes the global average pooling operation and FC(·) denotes the fully connected operation;

The fused feature map F_fuse is multiplied by the attention score S for weighting, the channel-weighted fusion result is calculated, and a saliency prediction map P containing the salient target is generated through a prediction head layer, where Head(·) denotes the prediction head operation.
Further, in step S4, each pixel of the image detail label is traversed with a convolution kernel, and whether the pixel belongs to an important detail pixel is determined according to the average value within the convolution kernel, where (x, y) denotes the pixel whose coordinates are (x, y), Detail Pixel denotes a detail pixel, Body Pixel denotes a foreground pixel, Background Pixel denotes a background pixel, and (x, y)_avg denotes the average of the pixel values of all pixels within the range centred on the pixel with coordinates (x, y) and with the convolution kernel size as radius;

A pixel weight matrix W is calculated according to the in-kernel average value (x, y)_avg, wherein the element W_(x,y) in the x-th row and y-th column of the pixel weight matrix W represents the weight of the pixel with coordinates (x, y), λ is a hyper-parameter of the pixel weight matrix, and Detail_(x,y) is the pixel value of the pixel with coordinates (x, y) in the detail label;

The global loss L_global, the detail loss L_detail and the fusion loss L_fuse are defined over the whole image, where H denotes the image height, W denotes the image width, P^g_(x,y) denotes the predicted value of the pixel with coordinates (x, y) in the global prediction map of the global branch, G_(x,y) denotes the true value of the pixel with coordinates (x, y) in the saliency label, P^e_(x,y) denotes the predicted value of the pixel with coordinates (x, y) in the edge prediction map of the detail branch, E_(x,y) denotes the true value of the pixel with coordinates (x, y) in the detail label, and P_(x,y) denotes the predicted value of the pixel with coordinates (x, y) in the saliency prediction map P containing the salient target; the global loss L_global, the detail loss L_detail and the fusion loss L_fuse are weighted and summed to obtain the detail-aware loss of the double-flow gating progressive optimization network GPONet:

$$\mathcal{L}_{dp} = \lambda_{1}\,\mathcal{L}_{global} + \lambda_{2}\,\mathcal{L}_{detail} + \lambda_{3}\,\mathcal{L}_{fuse}$$

where λ1 denotes the weight of the global loss, λ2 denotes the weight of the detail loss, and λ3 denotes the weight of the fusion loss.
Further, in step S4, the network performance of the double-flow gating progressive optimization network GPONet is evaluated by the F-score index F_β, the mean absolute error MAE, the enhanced alignment index E_ξ and the structural similarity index S_m, where Precision denotes the precision, Recall denotes the recall, and β² is a non-negative real number used to balance the importance of precision and recall; n denotes the number of pixels, P_i denotes the predicted value of the i-th pixel, G_i denotes the true value of the i-th pixel, i = 1, 2, …, n, W denotes the image width and H denotes the image height; φ(·) is the alignment function used to calculate the degree of alignment between the predicted target and the real target; S_r denotes the region similarity value, S_o denotes the boundary similarity value, and α denotes a weight parameter controlling the proportion of S_r and S_o.
The invention has the beneficial effects that:
(1) The invention refines the low-level features from top to bottom through the gating fusion network with gating fusion units, uses the first gating value g1 and the second gating value g2 to extract the complementary information of the low-level features and the high-level features, and fuses the features of adjacent stages in a complementary and interpretable manner, thereby avoiding problems such as feature dilution and noise interference caused by the transfer of redundant information;
(2) The invention provides a gating fusion unit for separating active features and effective features between adjacent layer features; the activation feature means respective activation of two adjacent layers of features, such as low-level features of shallow edge textures and high-level features of deep foreground identification; the effective characteristics mean effective information in the transmission process of two adjacent layers of characteristics, and the transmission of redundant information is overcome; the method can help to understand the operation process of the double-flow gating progressive optimization network GPONet in theory, and the details of the feature fusion process can be displayed in a visual mode, so that the interpretability of the double-flow gating progressive optimization network GPONet is improved;
(3) According to the method, the edge prediction graph is generated in the double-flow gating progressive optimization network GPONet by utilizing the detail labels of the image data, and the edge prediction quality of the predicted significant graph is obviously improved by fusing the supplementary information of the details, so that additional manual labeling or training is not needed;
(4) According to the application, applying the cross guiding module between different feature layers accelerates the fitting of the two branches, i.e. the global branch and the detail branch, and improves the prediction quality of the saliency prediction map.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a dual-flow gating progressive optimization network GPONet in an embodiment of the present application;
FIG. 3 is a schematic diagram of a gating fusion unit according to an embodiment of the present application;
FIG. 4 is a diagram showing a detail label in comparison with a conventional edge label in an embodiment of the present application;
FIG. 5 is a schematic diagram of a cross guiding unit according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a feature fusion module according to an embodiment of the present application;
FIG. 7 is a visual result of the internal activation value of the gating fusion unit according to an embodiment of the present application;
FIG. 8 is a graph of performance comparison of an embodiment of the present application with a conventional salient object detection method for detecting edges;
FIG. 9 is a graph comparing results of performing edge detection using conventional edge tags and detail tags;
fig. 10 is a schematic diagram of the results of a significant prediction graph obtained by the present embodiment and the existing significant target detection method.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will be more clearly understood, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, however, the present application may be practiced in other ways than those described herein, and therefore the present application is not limited to the specific embodiments disclosed below.
Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application pertains. The terms "first," "second," and the like in the description and in the claims, are not used for any order, quantity, or importance, but are used for distinguishing between different elements. Likewise, the terms "a" or "an" and the like do not denote a limitation of quantity, but rather denote the presence of at least one. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", etc. are used merely to indicate a relative positional relationship, which changes accordingly when the absolute position of the object to be described changes.
As shown in fig. 1, a salient target detection method based on a double-flow gating progressive optimization network includes the following steps:
s1: and acquiring an image data set for salient object detection, acquiring salient labels of each image in the image data set, and acquiring detail labels of each image by extracting edge information of the salient labels. The digital image data collected in the embodiments of the present invention are derived from common data sets commonly used in the five significant object detection fields, namely, a DUTS data set, an HKU-IS data set, a DUTS-OMRON data set, a PASCAL-S data set, and an ESCCD data set. These datasets contain diversity and richness of salient objects and backgrounds, covering various scenes and salient object types.
The DUTS dataset contains 10553 training images DUTS-TR and 5019 test images DUTS-TE. Both training and test sets contained very challenging scenarios in saliency detection, with accurate pixel-level truth manual annotation by 50 volunteers.
The ECSSD dataset contained 1000 images of a real world complex scene with common textures and structures and corresponding labels, the images were manually annotated by five volunteers and then averaged as the final result.
The PASCAL-S dataset contains 850 images, in which salient objects are annotated according to eye-movement data, forming the final dataset.
The DUT-OMRON dataset contains 5168 high-quality images, each containing one or more salient objects and a relatively complex background; each image was annotated with eye-movement data by five observers.
The HKU-IS dataset annotated significant objects in all 7320 images by three volunteers, culled images that were not labeled consistently, and retained 4447 challenging images with low contrast or multiple significant objects.
The embodiment of the invention uses a Canny operator to extract edge information from the image sample, generates an edge image, and only selects the interior and edge parts of the salient label as final detail labels.
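By way of non-limiting illustration, the following sketch shows one possible way to generate such a detail label from an image and its saliency label using the Canny operator; the function name, Canny thresholds and morphology settings are assumptions for illustration rather than the exact procedure of the embodiment.

```python
import cv2
import numpy as np

def make_detail_label(image_bgr: np.ndarray, saliency_label: np.ndarray) -> np.ndarray:
    """image_bgr: HxWx3 uint8 image; saliency_label: HxW uint8 mask in {0, 255}."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)                      # Canny edge map of the image
    salient = (saliency_label > 127).astype(np.uint8)
    inner_edges = edges * salient                          # keep edges inside the salient region
    kernel = np.ones((3, 3), np.uint8)
    boundary = cv2.morphologyEx(salient * 255, cv2.MORPH_GRADIENT, kernel)  # salient boundary
    detail = np.clip(inner_edges.astype(np.int32) + boundary.astype(np.int32), 0, 255)
    return detail.astype(np.uint8)
```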
S2: each image in the image dataset is preprocessed and divided into a training set and a testing set. In an embodiment of the invention, the preprocessing includes data enhancement by image transformation, normalization of pixel values of the image, unification of image size, and conversion of the image into tensor data types.
The data enhancement comprises three modes of horizontal overturn, vertical overturn and random clipping, the diversity of a data set is increased by transforming and amplifying images, the data volume is increased, and the generalization capability and the robustness of a network model are improved.
Normalization is to subtract the mean from the original image pixel value and divide by the standard deviation such that the pixel value has zero mean and unit variance, and if the label pixel value is between 0 and 255, then divide the label pixel value by 255, thereby scaling the image pixel value and the label pixel value to a fixed range. The training process can be more stable through normalization processing, and the convergence rate of the network model is accelerated.
The unified image size is used for adapting to the input requirement of a network model, the images are adjusted to be the same size, all the images are ensured to have the same input dimension, and batch processing is convenient.
The purpose of converting an image into a tensor data type is to convert the image data into the data type required by the network model so that the image data can be processed by the network model.
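By way of non-limiting illustration, the following sketch shows one possible preprocessing pipeline consistent with the above steps; the normalization statistics and the joint-transform structure are assumptions for illustration, and random cropping is omitted for brevity.

```python
import random
import torchvision.transforms.functional as TF

IMG_SIZE = 352  # images are unified to 352 x 352 during training

def paired_transform(image, saliency, detail):
    """Apply the same geometric augmentation to the image and both labels (PIL inputs)."""
    image, saliency, detail = (TF.resize(x, [IMG_SIZE, IMG_SIZE]) for x in (image, saliency, detail))
    if random.random() < 0.5:                               # horizontal flip
        image, saliency, detail = (TF.hflip(x) for x in (image, saliency, detail))
    if random.random() < 0.5:                               # vertical flip
        image, saliency, detail = (TF.vflip(x) for x in (image, saliency, detail))
    image = TF.normalize(TF.to_tensor(image),               # zero mean, unit variance per channel
                         mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    saliency = TF.to_tensor(saliency)                       # scales 0-255 labels to [0, 1]
    detail = TF.to_tensor(detail)
    return image, saliency, detail
```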
After the data preprocessing is completed, the data set is divided into a training set and a testing set according to the proportion of 7:3. The training set is used to train parameters of the network model and the test set is used to evaluate the performance of the network model on the new sample, i.e. the performance of the network model.
S3: constructing a double-flow gating progressive optimization network GPONet; the structure of the double-flow gating progressive optimization network GPONet is shown in FIG. 2 and comprises a gating fusion network, a cross guiding module and a feature fusion module. In FIG. 2, the marked symbols denote the gating fusion unit, the feature maps, the prediction head operation, channel concatenation and element-wise multiplication, respectively.
The gating fusion network is used for extracting global features G and detail features E of N stages, and for complementarily fusing the global features G or detail features E of adjacent stages through the global branch and the detail branch respectively, generating fused global features G_out and fused detail features E_out.
The gating fusion network comprises an encoder, a global branch and a detail branch. The encoder performs feature mapping on the input image to extract global features G and detail features E of N stages; both the global branch and the detail branch include N−1 gating fusion units, where N ≧ 2. The global features G and detail features E of the N stages are respectively processed by the N−1 gating fusion units, so that the features of adjacent stages among the N stages are complementarily fused and the transfer of redundant information is avoided. In the embodiment of the invention, the adopted encoder is a ResNet50 encoder or a PVTv2 encoder, which outputs four layers of features, i.e. N = 4. The gating fusion unit can activate and transfer the detail information of the lower layers, compensating for the lack of high-level semantic information in the low-level features caused by their limited receptive fields, so that the salient object can be accurately located and segmented.
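By way of non-limiting illustration, the following sketch shows one possible way to extract four stages of features with a ResNet50 encoder; whether the embodiment taps exactly these layers is an assumption.

```python
import torch
import torchvision

class ResNet50Encoder(torch.nn.Module):
    """Returns the outputs of the four residual stages as the N = 4 stage features."""
    def __init__(self):
        super().__init__()
        net = torchvision.models.resnet50(weights=None)     # or ImageNet-pretrained weights
        self.stem = torch.nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.stages = torch.nn.ModuleList([net.layer1, net.layer2, net.layer3, net.layer4])

    def forward(self, x):
        feats = []
        x = self.stem(x)
        for stage in self.stages:
            x = stage(x)
            feats.append(x)                                  # strides 4, 8, 16, 32; channels 256 to 2048
        return feats

features = ResNet50Encoder()(torch.randn(1, 3, 352, 352))
print([f.shape for f in features])
```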
As shown in FIG. 3, the gating fusion unit obtains a first gating value g1^L and a second gating value g2^L from the corresponding low-level features and high-level features; in FIG. 3, the marked symbols denote element-wise multiplication and element-wise addition, respectively. The first gating value g1^L and the second gating value g2^L are each obtained by a convolution operation followed by a Sigmoid function, which ensures that their values lie between 0 and 1. The specific formulas are as follows:

$$g_1^L = \sigma\big(w_1 * F_L + b\big) \; ; \quad (1)$$

$$g_2^L = \sigma\big(w_2 * \mathrm{Up}(F_{L+1}) + b\big) \; ; \quad (2)$$

where F_L denotes the features of the L-th stage, Up(F_{L+1}) denotes the features obtained by up-sampling the features F_{L+1} of the (L+1)-th stage, F ∈ {G, E}, i.e. the features F are either global features G or detail features E, w_1 denotes the learnable weight parameter of the first gating value, w_2 denotes the learnable weight parameter of the second gating value, b denotes a learnable bias parameter, and σ denotes the Sigmoid activation function, which maps output values into the interval [0, 1].

With the first gating value g1^L and the second gating value g2^L controlling the flow of features as weights, the low-level features and the high-level features are multiplied by them respectively to obtain the low-level activation A_low^L and the high-level activation A_high^L of the L-th stage. The specific formulas are as follows:

$$A_{low}^{L} = g_1^L \odot F_L \; ; \quad (3)$$

$$A_{high}^{L} = g_2^L \odot \mathrm{Up}(F_{L+1}) \; ; \quad (4)$$

Then, the high semantic information HSI_L of the L-th stage is obtained from the first gating value g1^L and the high-level activation A_high^L of the L-th stage. High semantic information HSI is rich in the high-level features but lacking in the low-level features; only when the high-level activation A_high^L of the L-th stage is relatively large and the first gating value g1^L is relatively small does the high semantic information HSI take a large value. The specific expression of the high semantic information HSI_L of the L-th stage is as follows:

$$HSI_{L} = \big(1 - g_1^L\big) \odot A_{high}^{L} \; ; \quad (5)$$

To avoid the gradient vanishing problem and to allow the low-level activation A_low^L of the L-th stage to be passed to subsequent layers for further fusion, the features F_L of the L-th stage, the low-level activation A_low^L of the L-th stage and the high semantic information HSI_L of the L-th stage are combined into the fused feature F_out^L of the L-th stage. The specific expression is as follows:

$$F_{out}^{L} = F_L + A_{low}^{L} + HSI_{L} \; . \quad (6)$$
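By way of non-limiting illustration, one possible implementation of a gating fusion unit following equations (1) to (6) is sketched below; the channel alignment, kernel sizes and module name are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFusionUnit(nn.Module):
    def __init__(self, low_ch: int, high_ch: int):
        super().__init__()
        self.reduce = nn.Conv2d(high_ch, low_ch, kernel_size=1)          # align channel numbers
        self.gate1 = nn.Conv2d(low_ch, low_ch, kernel_size=3, padding=1) # produces g1 from F_L
        self.gate2 = nn.Conv2d(low_ch, low_ch, kernel_size=3, padding=1) # produces g2 from Up(F_{L+1})

    def forward(self, f_low, f_high):
        f_high = self.reduce(f_high)
        f_high = F.interpolate(f_high, size=f_low.shape[-2:],            # up-sample stage L+1 to stage L
                               mode="bilinear", align_corners=False)
        g1 = torch.sigmoid(self.gate1(f_low))                            # first gating value
        g2 = torch.sigmoid(self.gate2(f_high))                           # second gating value
        a_low = g1 * f_low                                               # low-level activation, eq. (3)
        a_high = g2 * f_high                                             # high-level activation, eq. (4)
        hsi = (1.0 - g1) * a_high                                        # high semantic information, eq. (5)
        return f_low + a_low + hsi                                       # fused feature, eq. (6)
```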
the cross guiding module is used for fusing global featuresG out And fused detail featuresE out Communication interaction and cross guidance are carried out, and the fused global features are subjected toG out And fused detail featuresE out Complementary generation of cross-guided global featuresAnd detail features after cross guidance->. Since the complementary information between the global branch and the detail branch does not play a role in the interaction during the process of gating the converged network. The complementary information between the global branch and the detail branch can be promoted to be communicated with each other through the cross guiding module, so that the complementary information between the global branch and the detail branch is fully utilized in the process of detecting the obvious target. The cross guiding module is used for guiding the cross according to the global characteristic after cross guiding >And detail features after cross guidance->A global prediction graph and an edge prediction graph are generated.
The cross guiding module comprises N cross guiding units, as shown in FIG. 5; in the embodiment of the invention N = 4, so four cross guiding units are provided. In FIG. 5, the marked symbols denote channel concatenation, element-wise addition, and a convolution layer followed by a normalization layer and an activation function, respectively. The fused global features G_out and fused detail features E_out of the same stage output by the gating fusion network are fed to the N cross guiding units respectively, and fusion features F_(E,G) are generated through convolution and concatenation operations. The specific expression is as follows:

$$F^{L}_{(E,G)} = \mathrm{Cat}\big(w_{IG} * (w_{FG} * G^{L}_{out}),\; w_{IE} * (w_{FE} * E^{L}_{out})\big) \; ; \quad (7)$$

where F^L_(E,G) denotes the fusion features of the L-th stage, Cat(·) denotes the concatenation operation in the channel dimension, G_out^L denotes the fused global features of the L-th stage, E_out^L denotes the fused detail features of the L-th stage, w_FG denotes the first learnable parameter from the fused global features G_out^L of the L-th stage to the fusion features F^L_(E,G) of the L-th stage, w_IG denotes the second learnable parameter from the fused global features G_out^L of the L-th stage to the fusion features F^L_(E,G) of the L-th stage, w_FE denotes the first learnable parameter from the fused detail features E_out^L of the L-th stage to the fusion features F^L_(E,G) of the L-th stage, and w_IE denotes the second learnable parameter from the fused detail features E_out^L of the L-th stage to the fusion features F^L_(E,G) of the L-th stage.

The fusion features F_(E,G) are mapped back to the global branch and the detail branch respectively by convolution operations, added to the fused global features G_out and the fused detail features E_out, and finally passed through a convolution operation to generate the cross-guided global features Ĝ and the cross-guided detail features Ê. The specific expressions are as follows:

$$\hat{G}_{L} = w_{PG} * \big(w_{OG} * F^{L}_{(E,G)} + G^{L}_{out}\big) \; ; \quad (8)$$

$$\hat{E}_{L} = w_{PE} * \big(w_{OE} * F^{L}_{(E,G)} + E^{L}_{out}\big) \; ; \quad (9)$$

where w_OG denotes the first learnable parameter for mapping the fusion features F^L_(E,G) of the L-th stage back to the global branch, w_PG denotes the second learnable parameter for mapping the fusion features F^L_(E,G) of the L-th stage back to the global branch, w_OE denotes the first learnable parameter for mapping the fusion features F^L_(E,G) of the L-th stage back to the detail branch, and w_PE denotes the second learnable parameter for mapping the fusion features F^L_(E,G) of the L-th stage back to the detail branch. The cross-guided global features Ĝ_L and detail features Ê_L better represent the position information and edge information of the current stage, and the cross-guided global features Ĝ_L and detail features Ê_L of the same stage generate the global prediction map and the edge prediction map of each stage through a prediction head operation.
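By way of non-limiting illustration, one possible implementation of a cross guiding unit following equations (7) to (9) is sketched below; the layer configuration and module name are assumptions for illustration.

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

class CrossGuidanceUnit(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.from_global = nn.Sequential(conv_bn_relu(ch, ch), conv_bn_relu(ch, ch))  # w_FG, w_IG
        self.from_detail = nn.Sequential(conv_bn_relu(ch, ch), conv_bn_relu(ch, ch))  # w_FE, w_IE
        self.back_global = conv_bn_relu(2 * ch, ch)   # map the fusion features back, w_OG
        self.back_detail = conv_bn_relu(2 * ch, ch)   # map the fusion features back, w_OE
        self.refine_global = conv_bn_relu(ch, ch)     # final convolution, w_PG
        self.refine_detail = conv_bn_relu(ch, ch)     # final convolution, w_PE

    def forward(self, g_out, e_out):
        fusion = torch.cat([self.from_global(g_out), self.from_detail(e_out)], dim=1)  # eq. (7)
        g_hat = self.refine_global(g_out + self.back_global(fusion))                   # eq. (8)
        e_hat = self.refine_detail(e_out + self.back_detail(fusion))                   # eq. (9)
        return g_hat, e_hat
```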
FIG. 4 (a) is an original image, FIG. 4 (b) is a detail label extracted in the embodiment of the present invention, and FIG. 4 (c) is a conventional edge label. Owing to complex textures and the interference of surrounding pixels, the conventional labels generated by most existing methods, as shown in FIG. 4 (c), make edge pixels difficult to detect. Therefore, in the embodiment of the invention, edge information is extracted from the image sample with the Canny operator in step S1 to generate an edge image, and the detail label shown in FIG. 4 (b) is obtained; the detail label requires no additional manual labeling or training, extracts not only the edges of the salient object but also the detail textures inside it, and is conducive to generating a more accurate edge prediction map.
The feature fusion module is used for fusing the global prediction graph and the edge prediction graph to generate a fusion feature graph, and generating a significant prediction graph containing significant targets by predicting the fusion feature graph.
The structure of the feature fusion module is shown in FIG. 6, in which the marked symbol denotes element-wise multiplication. The feature fusion module takes the per-stage outputs of the cross guiding module, i.e. the cross-guided global features Ĝ and the cross-guided detail features Ê from which the global and edge prediction maps are generated, and concatenates them in the channel dimension to generate the fused feature map F_fuse. The specific formula is as follows:

$$F_{fuse} = \mathrm{Cat}\big(\hat{G}_1, \hat{E}_1, \ldots, \hat{G}_L, \hat{E}_L, \ldots, \hat{G}_N, \hat{E}_N\big) \; ; \quad (10)$$

where L = 1, 2, …, N. Since N = 4 in the embodiment of the invention, equation (10) can be expressed as equation (11):

$$F_{fuse} = \mathrm{Cat}\big(\hat{G}_1, \hat{E}_1, \hat{G}_2, \hat{E}_2, \hat{G}_3, \hat{E}_3, \hat{G}_4, \hat{E}_4\big) \; . \quad (11)$$

The fused feature map F_fuse is processed through a global average pooling layer and a fully connected layer to obtain the attention score S of the fused feature map; the attention score S represents the importance, or contribution, of the fused feature map to the final saliency prediction. The specific formula is as follows:

$$S = \mathrm{FC}\big(\mathrm{GAP}(F_{fuse})\big) \; ; \quad (12)$$

where GAP(·) denotes the global average pooling operation and FC(·) denotes the fully connected operation.

The fused feature map F_fuse is multiplied by the attention score S for weighting, the channel-weighted fusion result is calculated, and the saliency prediction map P containing the salient target is generated through a prediction head layer. The specific formula is as follows:

$$P = \mathrm{Head}\big(F_{fuse} \otimes S\big) \; ; \quad (13)$$

where Head(·) denotes the prediction head operation.
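By way of non-limiting illustration, one possible implementation of the feature fusion module following equations (10) to (13) is sketched below; the channel alignment, hidden width, sigmoid on the attention score and up-sampling choices are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusionModule(nn.Module):
    def __init__(self, stage_channels, hidden=64):
        super().__init__()
        self.align_g = nn.ModuleList([nn.Conv2d(c, hidden, 1) for c in stage_channels])
        self.align_e = nn.ModuleList([nn.Conv2d(c, hidden, 1) for c in stage_channels])
        total = 2 * len(stage_channels) * hidden                 # one global + one detail map per stage
        self.fc = nn.Linear(total, total)                        # produces the attention score S
        self.head = nn.Conv2d(total, 1, 3, padding=1)            # prediction head

    def forward(self, g_hats, e_hats):
        size = g_hats[0].shape[-2:]                              # resolution of the finest stage
        maps = []
        for ag, ae, g, e in zip(self.align_g, self.align_e, g_hats, e_hats):
            for f in (ag(g), ae(e)):
                maps.append(F.interpolate(f, size=size, mode="bilinear", align_corners=False))
        f_fuse = torch.cat(maps, dim=1)                          # fused feature map F_fuse, eq. (10)
        s = torch.sigmoid(self.fc(F.adaptive_avg_pool2d(f_fuse, 1).flatten(1)))  # GAP + FC, eq. (12)
        weighted = f_fuse * s[:, :, None, None]                  # channel-wise weighting
        return self.head(weighted)                               # saliency prediction map P, eq. (13)
```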
S4: the double-flow gating progressive optimization network GPONet is trained with the training set; a global loss L_global is adopted to optimize the prediction of the global branch, a detail loss L_detail is adopted to optimize the prediction of the detail branch, and a fusion loss L_fuse is adopted to optimize the prediction of the saliency prediction map; the global loss L_global, the detail loss L_detail and the fusion loss L_fuse are weighted and summed as the detail-aware loss L_dp of the double-flow gating progressive optimization network GPONet; the network performance of the double-flow gating progressive optimization network GPONet is tested with the test set. The specific method is as follows:
At the beginning of training, all images are resized to 352 × 352 and randomly cropped and flipped. To eliminate the influence of different encoders on the prediction performance, the embodiment of the invention selects two commonly used encoders, ResNet50 and PVTv2, to extract the global features and detail features of the image. Training uses the Adam optimizer with the default parameters betas = (0.9, 0.999), eps = 1e-8 and weight_decay = 0, together with a learning-rate warm-up strategy: the learning rate starts from 1e-7, reaches a peak of 1e-4 after one iteration period, and then gradually decreases to 0. During testing, the 352 × 352 output image is restored to the original size using bilinear interpolation.
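By way of non-limiting illustration, the following sketch shows one possible realization of the optimizer and learning-rate warm-up described above; the per-step linear decay after the peak is an assumption for illustration.

```python
import torch

def build_optimizer_and_scheduler(model, steps_per_epoch, total_epochs):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-7,
                                 betas=(0.9, 0.999), eps=1e-8, weight_decay=0)
    warmup_steps = steps_per_epoch                       # reach the 1e-4 peak after one iteration period
    total_steps = steps_per_epoch * total_epochs
    peak_factor = 1e-4 / 1e-7                            # LambdaLR multiplies the base lr (1e-7)

    def lr_lambda(step):
        if step < warmup_steps:                          # linear warm-up from 1e-7 to 1e-4
            return 1.0 + (peak_factor - 1.0) * step / warmup_steps
        remaining = max(total_steps - step, 0) / max(total_steps - warmup_steps, 1)
        return peak_factor * remaining                   # decay from the peak towards 0

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler                          # call scheduler.step() once per iteration
```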
The embodiment of the invention reflects the difference in prediction difficulty between pixels through the detail-aware loss L_dp. Each pixel of the image detail label is traversed with a 3 × 3 convolution kernel, and whether the pixel belongs to an important detail pixel is determined according to the average value within the convolution kernel. The judgment rule is as follows:

$$(x, y) \in \begin{cases} Detail\ Pixel, & 0 < (x, y)_{avg} < 1 \\ Body\ Pixel, & (x, y)_{avg} = 1 \\ Background\ Pixel, & (x, y)_{avg} = 0 \end{cases} \quad (14)$$

where (x, y) denotes the pixel whose coordinates are (x, y), Detail Pixel denotes a detail pixel, Body Pixel denotes a foreground pixel, Background Pixel denotes a background pixel, and (x, y)_avg denotes the average of the pixel values of all pixels within the range centred on the pixel with coordinates (x, y) and with the convolution kernel size as radius.
A pixel weight matrix W is calculated according to the average value (x, y)_avg within the 3 × 3 convolution kernel, wherein the element W_(x,y) in the x-th row and y-th column of the pixel weight matrix W represents the weight of the pixel with coordinates (x, y) and is calculated as follows:
; (15)
where λ is a hyper-parameter of the pixel weight matrix, taken as 0.5 in the embodiment of the invention, and Detail_(x,y) is the pixel value at coordinates (x, y) in the detail label.

The global loss L_global, the detail loss L_detail and the fusion loss L_fuse are expressed as follows:
; (16)
; (17)
; (18)
where H denotes the image height, W denotes the image width, P^g_(x,y) denotes the predicted value of the pixel with coordinates (x, y) in the global prediction map of the global branch, G_(x,y) denotes the true value of the pixel with coordinates (x, y) in the saliency label, P^e_(x,y) denotes the predicted value of the pixel with coordinates (x, y) in the edge prediction map of the detail branch, E_(x,y) denotes the true value of the pixel with coordinates (x, y) in the detail label, and P_(x,y) denotes the predicted value of the pixel with coordinates (x, y) in the saliency prediction map P containing the salient target.
The global loss L_global, the detail loss L_detail and the fusion loss L_fuse are weighted and summed to obtain the detail-aware loss L_dp of the double-flow gating progressive optimization network GPONet. The specific formulas are as follows:
; (19)
; (20)
where λ1 denotes the weight of the global loss, λ2 denotes the weight of the detail loss, and λ3 denotes the weight of the fusion loss.
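By way of non-limiting illustration, the following sketch shows one possible form of the detail-aware loss; the binary cross-entropy terms and the neighbourhood-based pixel weight are assumptions for illustration and do not reproduce equations (15) to (20) exactly.

```python
import torch
import torch.nn.functional as F

def detail_pixel_weight(detail_gt, lam=0.5):
    """Weight derived from the 3x3 neighbourhood mean of the detail label (assumed form)."""
    avg = F.avg_pool2d(detail_gt, kernel_size=3, stride=1, padding=1)
    return 1.0 + lam * (avg - detail_gt).abs()

def detail_aware_loss(global_pred, detail_pred, fuse_pred, saliency_gt, detail_gt,
                      weights=(1.0, 1.0, 1.0)):
    """global_pred / detail_pred / fuse_pred are logits of shape (B, 1, H, W)."""
    loss_global = F.binary_cross_entropy_with_logits(global_pred, saliency_gt)
    loss_detail = F.binary_cross_entropy_with_logits(detail_pred, detail_gt,
                                                     weight=detail_pixel_weight(detail_gt))
    loss_fuse = F.binary_cross_entropy_with_logits(fuse_pred, saliency_gt)
    return weights[0] * loss_global + weights[1] * loss_detail + weights[2] * loss_fuse
```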
In step S4, the network performance of the double-flow gating progressive optimization network GPONet is evaluated by the F-score index F_β, the mean absolute error MAE, the enhanced alignment index E_ξ and the structural similarity index S_m. The specific expressions are as follows:

$$F_{\beta} = \frac{(1 + \beta^{2}) \cdot Precision \cdot Recall}{\beta^{2} \cdot Precision + Recall} \; ; \quad (21)$$

$$MAE = \frac{1}{W \times H} \sum_{i=1}^{n} \left| P_i - G_i \right| \; ; \quad (22)$$

$$E_{\xi} = \frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} \phi\big(P_{(x,y)}, G_{(x,y)}\big) \; ; \quad (23)$$

$$S_m = \alpha \cdot S_o + (1 - \alpha) \cdot S_r \; ; \quad (24)$$

where Precision denotes the precision, Recall denotes the recall, and β² is a non-negative real number used to balance the importance of precision and recall, taken as 0.3 in the embodiment of the invention; n denotes the number of pixels, P_i denotes the predicted value of the i-th pixel, G_i denotes the true value of the i-th pixel, i = 1, 2, …, n, W denotes the image width and H denotes the image height; φ(·) is the alignment function used to calculate the degree of alignment between the predicted target and the real target; S_r denotes the region similarity value, S_o denotes the boundary similarity value, and α denotes a weight parameter controlling the proportion of S_r and S_o.
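By way of non-limiting illustration, the following sketch computes two of the above indexes, the F-score with β² = 0.3 and the mean absolute error MAE, for a single predicted saliency map; the enhanced alignment index and the structural similarity index require their alignment and structure terms and are omitted here.

```python
import numpy as np

def f_measure(pred, gt, beta2=0.3, threshold=0.5):
    """pred in [0, 1], gt binary, both HxW numpy arrays."""
    binary = pred >= threshold
    tp = np.logical_and(binary, gt > 0.5).sum()
    precision = tp / (binary.sum() + 1e-8)
    recall = tp / ((gt > 0.5).sum() + 1e-8)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)

def mae(pred, gt):
    """Mean absolute error between the prediction and the ground truth."""
    return np.abs(pred - gt.astype(np.float64)).mean()
```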
S5: salient targets in images are detected by adopting the trained double-flow gating progressive optimization network GPONet.
The technical effects of the embodiments of the present invention are described below with reference to specific experimental data:
in order to verify the effectiveness of the gating fusion network and the cross guiding unit in the embodiment of the present invention, the dual-flow gating progressive optimization network GPONet in the embodiment of the present invention is compared with a commonly used decoder module FPN, and the experimental results are shown in table 1. The three evaluation indexes of the embodiment of the invention on the DUTS data set and the ESCCD data set are all superior to the decoder module FPN, wherein the structural similarity indicates that the obvious object predicted by the embodiment of the invention has better structural consistency, andFscore indexThe average absolute error MAE shows that the embodiments of the present invention can predict a saliency map with higher accuracy. Furthermore, as can be seen from table 1, adding the cross-guiding unit further improves the predictive power of the network model. FPN decoder and cross-pilot unit The combined result is comparable to a single gated fusion network, and the combination of the gated fusion network and the cross-boot unit achieves the best predictive performance.
Table 1: comparison of test results of combinations of the FPN decoder, gating fusion network and cross guiding unit on the DUTS-TE data set and the ECSSD data set
In order to demonstrate the feature selective transfer flow of the gated fusion unit, the highest activation value in all channels of each pixel is selected, a single channel activation map is created, and the single channel activation map is displayed in a thermodynamic diagram form, so that the experimental result shown in fig. 7 is obtained. Fig. 7 (a) is a shallow pixel activation map, fig. 7 (b) is a deep pixel activation map, fig. 7 (c) is a shallow missing but deep rich pixel activation map, and fig. 7 (d) is an output of the gate fusion unit. Fig. 7 (a) shows the attention area of the low-level gating value, which contains detailed but cluttered information, fig. 7 (b) shows the attention area of the high-level gating value, indicating the approximate location of the salient object, and fig. 7 (c) shows the high-level semantic information lacking in fig. 7 (a) but enriched in fig. 7 (b); fig. 7 (d) shows the fusion characteristics after processing by the gated fusion unit. From left to right in fig. 7, the gated fusion units are shown integrating complementary information between adjacent stages; from top to bottom, it can be seen that the higher level features gradually enrich the details, enhancing the boundaries of salient objects.
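By way of non-limiting illustration, the following sketch shows one possible way to create such a single-channel activation map and display it as a heat map; the colormap and plotting details are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import torch

def show_activation_heatmap(feature_map: torch.Tensor, title: str = ""):
    """feature_map: (C, H, W) tensor taken from one stage of the network."""
    single_channel = feature_map.detach().max(dim=0).values   # highest activation over channels per pixel
    plt.imshow(single_channel.cpu().numpy(), cmap="jet")      # display as a heat map
    plt.title(title)
    plt.axis("off")
    plt.show()
```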
In order to verify the superiority of the detail tag in the embodiment of the invention relative to the traditional edge tag, the detail tag and the traditional edge tag are used for training a double-flow gating progressive optimization network GPONet under the same model architecture, which is hereinafter referred to as a GPON_dt network model and a GPON_eg network model respectively, and an RCSB network model and an ITSD network model are selected as additional comparison objects. The lack of a special edge detection branch in the ITSD method may lead to poor edge detection results, but in order to compare the importance of different dual-flow network models to the edge detection task, the ITSD method is used as a control group. The four network models were used for edge detection tasks and comparative experiments were performed to obtain the experimental results shown in fig. 8 and table 2. Fig. 8 (a) is an original image, fig. 8 (b) is a true significant edge graph, fig. 8 (c) is a significant edge detection result of a gpon_dt network model obtained by training a dual-flow gated and progressively optimized network GPONet using a detail tag, fig. 8 (d) is a significant edge detection result of a gpon_eg network model obtained by training a dual-flow gated and progressively optimized network GPONet using a conventional edge tag, fig. 8 (e) is a significant edge detection result of an existing RCSB method, and fig. 8 (f) is a significant edge detection result of an existing ITSD method.
As shown in FIG. 8, both the GPON_dt and GPON_eg network models are superior to the other two models in the salient edge pixel segmentation task, but the GPON_eg network model is often disturbed by other textures or blurred boundary pixels, resulting in poor segmentation of detail pixels, edge pixels and other non-salient interfering pixels. The GPON_dt network model, whose labels provide consistent internal details, can output edge structures consistent with the salient object, thereby eliminating the interference of other non-salient pixels and guiding the accurate segmentation of detail pixels. Table 2 records the maximum value maxF and the mean value meanF of the F-score index. As can be seen from Table 2, the mean value meanF of the GPON_dt network model is competitive and its maximum value maxF is the highest.
TABLE 2 comparison of quantitative analysis results for different edge detection methods on DUTS-TE data sets
In addition, a set of samples whose edge-detection mean absolute error MAE is smaller than the average value was selected from the DUTS test set to demonstrate the higher detection accuracy of detail labels on difficult samples; this set contains nearly half of the DUTS-TEST data set. These challenging samples were re-evaluated using the GPON_dt network model and the GPON_eg network model, and the test results are shown in FIG. 9. FIG. 9 (a) is the original image, FIG. 9 (b) is the true saliency map corresponding to the original image, FIG. 9 (c) is the detection result of the GPON_eg network model, and FIG. 9 (d) is the detection result of the GPON_dt network model. The edge detection results of the samples in FIG. 9 show that the GPON_dt network model trained with detail labels can identify hidden salient objects through internal texture structures, making the model focus more on the overall structure of salient objects. In contrast, the GPON_eg network model trained with edge labels is more susceptible to color or texture changes, resulting in incomplete object segmentation. This means that detail labels better reflect the overall characteristics of salient objects, while the accuracy of edge labels is affected by color differences, illustrating the superiority of detail labels in image edge detection.

The GPONet_R network model and the GPONet_T network model are network models obtained by training the double-flow gating progressive optimization network GPONet with the residual network ResNet and the attention model Transformer as encoders, respectively. According to the different encoders, the comparison experiments are divided into two groups: the GPONet_R network model and the GPONet_T network model are respectively compared with existing salient target detection methods on the DUTS data set, the HKU-IS data set, the PASCAL-S data set, the ECSSD data set and the DUT-OMRON data set, yielding the experimental results shown in Tables 3 to 7. In Tables 3 to 7, larger values of the F-score index F_β, the enhanced alignment index E_ξ and the structural similarity index S_m indicate better network performance, and a smaller value of the mean absolute error MAE indicates better network performance.
Table 3: comparison of Performance of the embodiment of the invention with the commonly used salient target detection method on DUTS-TE data set
Table 4: comparison of Performance of the inventive examples with the commonly used salient target detection method on HKU-IS data set
Table 5: performance comparison of the embodiment of the invention with the commonly used salient target detection method on the PASCAL-S dataset
Table 6: performance comparison of the embodiment of the invention with the conventional salient object detection method on ECSSD data set
Table 7: comparison of performance of embodiments of the present invention with conventional salient object detection methods on DUT-OMRON datasets
As can be seen from Tables 3 to 7, the GPONet_T network model obtains the best scores on the F-score index F_β and the structural similarity index S_m on the DUTS-TE data set, the PASCAL-S data set and the ECSSD data set, with only small gaps to the second-best method in the mean absolute error MAE and the enhanced alignment index E_ξ on the DUTS-TE data set and the PASCAL-S data set. Among network models with a residual network as encoder, the GPONet_R network model is comparable to the existing best model on the structural similarity index S_m, indicating that the proposed model has good structural consistency. Among network models with the attention model Transformer as encoder, the mean absolute error MAE and the enhanced alignment index E_ξ of the GPONet_T network model on the DUTS data set and the PASCAL-S data set are second only to the PGNet network model; this is because the PGNet network model uses training images at 4K-8K resolution. Overall, the GPONet_T network model is superior to existing models on the indexes of the other data sets and significantly exceeds the VST network model.
Fig. 10 is a schematic diagram of the result of a saliency prediction map obtained by the embodiment of the present invention and the existing saliency target detection method, where (a) in fig. 10 is an original image, (b) in fig. 10 is a true saliency map corresponding to the original image, fig. 10 (c) is a saliency prediction map of a dual-flow-gated progressive optimization network GPONet according to the embodiment of the present invention, fig. 10 (d) is a saliency prediction map of a PGNet network model, and fig. 10 (e) is a saliency prediction map of a VST network model. As shown in fig. 10, the embodiment of the present invention can accurately locate a significant object (e.g., bird) consistent with the segmentation structure, while other methods are affected by non-significant information (e.g., bird nest) or unclear contours (e.g., bird wings), resulting in inaccurate segmentation results.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. The salient target detection method based on the double-flow gating progressive optimization network is characterized by comprising the following steps of:
s1: collecting an image data set for salient object detection, acquiring salient labels of each image in the image data set, and acquiring detail labels of each image by extracting edge information of the salient labels;
s2: preprocessing each image in the image data set, and dividing the image data set into a training set and a testing set;
s3: constructing a double-flow gating progressive optimization network GPONet, wherein the double-flow gating progressive optimization network GPONet comprises a gating fusion network, a cross guiding module and a feature fusion module;
the gating fusion network is used for extractingNGlobal features of individual phasesGAnd detail featuresEAnd respectively and complementarily fusing the global features of adjacent stages through the global branches and the detail branches GOr detail featuresEGenerating a fused global featureG out And fused detail featuresE out The method comprises the steps of carrying out a first treatment on the surface of the The gating fusion network comprises an encoder, a global branch and a detail branch, and an input image is subjected to characteristic mapping by the encoder to extractNGlobal features of individual phasesGAnd detail featuresEBoth global and detail branches includeN-1 gated fusion unit, wherein,N≧2;Nglobal features of individual phasesGAnd detail featuresERespectively viaN-1 gated fusion unit process to beNThe features of the adjacent stages are complementarily fused;
the gating fusion unit obtains a first gating value according to the corresponding low-level features and high-level featuresAnd a second gating valueThe specific formula is as follows:
wherein,F L represent the firstLThe characteristics of the phases are that,represent the firstLCharacterization of +1 stageF L+1 The features obtained after the up-sampling are used,F = {G or E"i.e. featuresFIs global featureGOr detail featuresEw 1 A weight parameter representing the learned first gating value,w 2 a weight parameter representing the learned second threshold value,brepresenting a learnable bias parameter, +.>Representing a Sigmoid activation function;
with a first gate valueAnd a second gating value->As a weight, according toLCharacteristics of the stagesF L And (d)LCharacterization of +1 stageF L+1 The operation obtains the firstLPost-phase fusion feature- >The specific formula is as follows:
wherein,represent the firstLLow-level activation of phase->Represent the firstLHigh-level activation of phase->Represent the firstLHigh semantic information of the stage;
the cross guiding module is used for fusing global featuresG out And fused detail featuresE out Communication interaction and cross guidance are carried out, and the fused global features are subjected toG out And fused detail featuresE out Complementary generation of cross-guided global featuresAnd detail features after cross guidance->And according to the global feature after cross-guidance +.>And detail features after cross guidance->Generating a global prediction graph and an edge prediction graph;
the feature fusion module is used for fusing the global prediction graph and the edge prediction graph to generate a fusion feature graph, and generating a significant prediction graph containing a significant target by predicting the fusion feature graph;
s4: training the double-flow gating progressive optimization network GPONet by using a training set, and adopting global lossTo optimize the prediction of global branches, exploiting detail penalty +.>To optimize the prediction of detail branches and to employ fusion penalty +.>To optimize the prediction of the saliency prediction map; global loss->Loss of detail->And fusion loss->Weighted summation is carried out, and detail perception loss of GPONet serving as a double-flow gating progressive optimization network is ∈ - >The method comprises the steps of carrying out a first treatment on the surface of the Testing the network performance of the dual-flow gating progressive optimization network GPONet by using a test set;
s5: and detecting the remarkable target in the image by adopting the trained double-flow gating progressive optimization network GPONet.
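For illustration only, the following Python sketch shows one possible realization of the gated fusion unit of claim 1. The 1x1-convolution gates, bilinear up-sampling, equal channel counts across stages, and the names GatedFusionUnit, gate_low, and gate_high are editorial assumptions; the claim itself specifies only learnable weights w_1 and w_2, a bias b, a Sigmoid activation, and gate-weighted fusion of adjacent stages.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFusionUnit(nn.Module):
    """Fuses the L-th stage feature with the up-sampled (L+1)-th stage feature."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate_low = nn.Conv2d(channels, channels, kernel_size=1)   # plays the role of w_1 and b
        self.gate_high = nn.Conv2d(channels, channels, kernel_size=1)  # plays the role of w_2 and b

    def forward(self, f_low: torch.Tensor, f_high: torch.Tensor) -> torch.Tensor:
        # Up-sample the (L+1)-th stage feature to the spatial size of the L-th stage.
        f_high = F.interpolate(f_high, size=f_low.shape[-2:], mode="bilinear", align_corners=False)
        g1 = torch.sigmoid(self.gate_low(f_low))    # first gating value
        g2 = torch.sigmoid(self.gate_high(f_high))  # second gating value
        # Gate-weighted complementary fusion of the adjacent stages (assumed form).
        return g1 * f_low + g2 * f_high

# Example: top-down fusion of N = 4 stages along one branch (global or detail).
feats = [torch.randn(1, 64, 56 // 2 ** i, 56 // 2 ** i) for i in range(4)]
units = [GatedFusionUnit(64) for _ in range(len(feats) - 1)]  # N - 1 gated fusion units
fused = feats[-1]
for unit, f_low in zip(reversed(units), reversed(feats[:-1])):
    fused = unit(f_low, fused)

In this reading, the unit is applied N-1 times so that high-level semantics are progressively injected into lower-level, higher-resolution features, which matches the progressive character of the claimed network.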
2. The salient object detection method based on the dual-flow gating progressive optimization network according to claim 1, wherein the preprocessing in step S2 comprises data enhancement by image transformation, normalization of image pixel values, unification of image sizes, and conversion of the images into a tensor data type.
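A minimal sketch of the preprocessing named in claim 2, assuming the torchvision library; the 352x352 target size, the horizontal-flip augmentation, and the ImageNet normalization statistics are illustrative assumptions, since the claim only lists the operation types.

from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((352, 352)),            # unification of image sizes (size assumed)
    transforms.RandomHorizontalFlip(p=0.5),   # data enhancement by image transformation
    transforms.ToTensor(),                    # conversion of the image into a tensor data type
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),  # normalization of pixel values (statistics assumed)
])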
3. The salient object detection method based on the dual-flow gating progressive optimization network according to claim 2, wherein the cross-guidance module comprises N cross-guidance units; the fused global features G_out and fused detail features E_out of the same stage output by the gated fusion network are fed to the N cross-guidance units, which generate fused features F_(E,G) through convolution and concatenation operations, wherein F_L(E,G) denotes the fused feature of the L-th stage, obtained by superposing, in the channel dimension, the fused global feature of the L-th stage and the fused detail feature of the L-th stage; w_FG denotes the first learnable parameter from the fused global feature of the L-th stage to the fused feature F_L(E,G) of the L-th stage, w_IG denotes the second learnable parameter from the fused global feature of the L-th stage to the fused feature F_L(E,G), w_FE denotes the first learnable parameter from the fused detail feature of the L-th stage to the fused feature F_L(E,G), and w_IE denotes the second learnable parameter from the fused detail feature of the L-th stage to the fused feature F_L(E,G);
the fused feature F_(E,G) is mapped back to the global branch and the detail branch respectively by convolution operations, added to the fused global feature G_out and the fused detail feature E_out respectively, and finally passed through a further convolution operation to generate the cross-guided global feature and the cross-guided detail feature, wherein w_OG denotes the first learnable parameter for mapping the fused feature F_L(E,G) of the L-th stage back to the global branch, w_PG denotes the second learnable parameter for mapping the fused feature F_L(E,G) back to the global branch, w_OE denotes the first learnable parameter for mapping the fused feature F_L(E,G) back to the detail branch, and w_PE denotes the second learnable parameter for mapping the fused feature F_L(E,G) back to the detail branch; the cross-guided global feature and the cross-guided detail feature are passed through a prediction head operation to generate the global prediction map and the edge prediction map.
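For illustration only, a Python sketch of one cross-guidance unit as described in claim 3. The 3x3 kernel sizes, the single shared fusion convolution, and the single-channel prediction heads are editorial assumptions; the claim specifies only concatenation, convolution, the learnable mappings back to each branch, and the residual addition to G_out and E_out.

import torch
import torch.nn as nn

class CrossGuidanceUnit(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)   # w_FG/w_IG/w_FE/w_IE (assumed as one conv)
        self.to_global = nn.Conv2d(channels, channels, 3, padding=1)  # w_OG/w_PG: map F(E,G) back to the global branch
        self.to_detail = nn.Conv2d(channels, channels, 3, padding=1)  # w_OE/w_PE: map F(E,G) back to the detail branch
        self.refine_g = nn.Conv2d(channels, channels, 3, padding=1)   # final convolution on the global branch
        self.refine_e = nn.Conv2d(channels, channels, 3, padding=1)   # final convolution on the detail branch
        self.head_g = nn.Conv2d(channels, 1, 1)                       # global prediction head
        self.head_e = nn.Conv2d(channels, 1, 1)                       # edge prediction head

    def forward(self, g_out: torch.Tensor, e_out: torch.Tensor):
        f_eg = self.fuse(torch.cat([g_out, e_out], dim=1))   # fused feature F(E,G)
        g_cg = self.refine_g(g_out + self.to_global(f_eg))   # cross-guided global feature
        e_cg = self.refine_e(e_out + self.to_detail(f_eg))   # cross-guided detail feature
        return g_cg, e_cg, self.head_g(g_cg), self.head_e(e_cg)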
4. The salient object detection method based on the dual-flow gating progressive optimization network according to claim 3, wherein the feature fusion module connects, in the channel dimension, the cross-guided global features and the cross-guided detail features output by the cross-guidance module to generate a fused feature map F_fuse, where L = 1, 2, …, N;
the fused feature map F_fuse is processed by a global average pooling layer and a fully connected layer to obtain an attention score S of the fused feature map, i.e., S is obtained by applying a global average pooling operation followed by a fully connected operation to F_fuse;
the fused feature map F_fuse is multiplied by the attention score S for weighting, the channel-weighted fusion result is computed, and a saliency prediction map P containing the salient object is generated from the channel-weighted fusion result through a prediction head layer.
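A minimal Python sketch of the feature fusion module of claim 4, assuming the cross-guided features of the N stages have already been resized to a common resolution; the sigmoid on the attention score, the channel counts, and the single-channel prediction head are editorial assumptions.

import torch
import torch.nn as nn

class FeatureFusionModule(nn.Module):
    def __init__(self, channels_per_stage: int, num_stages: int):
        super().__init__()
        fused_channels = channels_per_stage * num_stages
        self.gap = nn.AdaptiveAvgPool2d(1)                       # global average pooling layer
        self.fc = nn.Linear(fused_channels, fused_channels)      # fully connected layer producing the score S
        self.head = nn.Conv2d(fused_channels, 1, kernel_size=1)  # prediction head layer

    def forward(self, feature_maps):
        f_fuse = torch.cat(feature_maps, dim=1)                   # F_fuse: connection in the channel dimension
        b, c, _, _ = f_fuse.shape
        s = torch.sigmoid(self.fc(self.gap(f_fuse).view(b, c)))   # attention score S (sigmoid assumed)
        weighted = f_fuse * s.view(b, c, 1, 1)                    # channel-weighted fusion result
        return torch.sigmoid(self.head(weighted))                 # saliency prediction map P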
5. The salient object detection method based on the dual-flow gating progressive optimization network according to claim 4, wherein in step S4, each pixel of the image detail label is traversed by a convolution kernel, and whether the pixel belongs to an important detail pixel is judged according to the average value within the convolution kernel; in the judgment rule, (x, y) denotes the pixel with coordinates (x, y), Detail Pixel denotes a detail pixel, Body Pixel denotes a foreground pixel, Background Pixel denotes a background pixel, and (x, y)_avg denotes the average value of the pixel values of all pixels within the convolution-kernel range centered on the pixel with coordinates (x, y), the radius of the range being the convolution kernel size;
a pixel weight matrix is calculated from the in-kernel average value (x, y)_avg, wherein the element in row x and column y of the pixel weight matrix represents the weight value of the pixel with coordinates (x, y), the calculation involving a hyper-parameter of the pixel weight matrix, and Detail(x, y) denotes the pixel value at coordinates (x, y) in the detail label;
in the expressions of the global loss, the detail loss, and the fusion loss, H denotes the image height and W denotes the image width; the global loss compares the predicted value at coordinates (x, y) in the global prediction map of the global branch with the true value at coordinates (x, y) in the saliency label, the detail loss compares the predicted value at coordinates (x, y) in the edge prediction map of the detail branch with the true value at coordinates (x, y) in the detail label, and the fusion loss compares the predicted value at coordinates (x, y) in the saliency prediction map P containing the salient object with the saliency label; the global loss, the detail loss, and the fusion loss are weighted and summed to obtain the detail-aware loss of the dual-flow gating progressive optimization network GPONet, the summation using a weight of the global loss, a weight of the detail loss, and a weight of the fusion loss.
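For illustration only, a Python sketch of the detail-aware loss structure of claim 5. The exact decision thresholds, the weight-matrix formula, and the per-term loss forms are not reproduced in the text above, so the local-average weighting rule, the binary-cross-entropy terms, and the parameters kernel_size, lam, w_global, w_detail, and w_fuse are editorial assumptions.

import torch
import torch.nn.functional as F

def detail_weight_matrix(detail_label: torch.Tensor, kernel_size: int = 5, lam: float = 5.0):
    # All tensors are assumed to have shape (B, 1, H, W) with values in [0, 1].
    # (x, y)_avg: mean of the detail label inside the kernel window centered on each pixel.
    local_avg = F.avg_pool2d(detail_label, kernel_size, stride=1, padding=kernel_size // 2)
    # Pixels whose neighborhood mixes edge and non-edge values are treated as
    # important detail pixels and receive larger weights (assumed weighting rule).
    return 1.0 + lam * torch.abs(local_avg - detail_label)

def detail_perception_loss(global_pred, detail_pred, fused_pred, saliency_gt, detail_gt,
                           w_global: float = 1.0, w_detail: float = 1.0, w_fuse: float = 1.0):
    loss_global = F.binary_cross_entropy(global_pred, saliency_gt)              # global loss
    weights = detail_weight_matrix(detail_gt)
    loss_detail = (weights * F.binary_cross_entropy(detail_pred, detail_gt,
                                                    reduction="none")).mean()   # weighted detail loss
    loss_fuse = F.binary_cross_entropy(fused_pred, saliency_gt)                 # fusion loss
    return w_global * loss_global + w_detail * loss_detail + w_fuse * loss_fuse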
6. The salient object detection method based on the dual-flow gating progressive optimization network according to claim 5, wherein in step S4, the F-score index, the mean absolute error MAE, the enhanced alignment index, and the structural similarity index are used to evaluate the network performance of the dual-flow gating progressive optimization network GPONet, wherein Precision denotes the precision, Recall denotes the recall, and a non-negative real-number parameter adjusts the balance of importance between precision and recall; n denotes the number of pixels, P_i denotes the predicted value of the i-th pixel, G_i denotes the true value of the i-th pixel, i = 1, 2, …, n; W denotes the image width, H denotes the image height, and an alignment function computes the degree of alignment between the predicted object and the true object; S_r denotes the region similarity value, S_o denotes the boundary similarity value, and a weight parameter controls the relative proportion of S_r and S_o.
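A minimal Python sketch of two of the evaluation indices in claim 6, the mean absolute error MAE and the F-score. The 0.5 binarization threshold and the beta-squared value of 0.3 are common choices in the salient object detection literature and are assumed here rather than taken from the patent text; the enhanced alignment index and the structural similarity index are omitted.

import torch

def mae(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    # Mean absolute error over all n = W * H pixels.
    return torch.mean(torch.abs(pred - gt))

def f_score(pred: torch.Tensor, gt: torch.Tensor, beta2: float = 0.3, eps: float = 1e-8) -> torch.Tensor:
    # Precision and Recall are computed on the binarized prediction (threshold assumed).
    pred_bin = (pred >= 0.5).float()
    tp = (pred_bin * gt).sum()
    precision = tp / (pred_bin.sum() + eps)
    recall = tp / (gt.sum() + eps)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + eps)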
CN202311048090.7A 2023-08-21 2023-08-21 Obvious target detection method based on double-flow gating progressive optimization network Active CN116805360B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311048090.7A CN116805360B (en) 2023-08-21 2023-08-21 Obvious target detection method based on double-flow gating progressive optimization network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311048090.7A CN116805360B (en) 2023-08-21 2023-08-21 Obvious target detection method based on double-flow gating progressive optimization network

Publications (2)

Publication Number Publication Date
CN116805360A CN116805360A (en) 2023-09-26
CN116805360B true CN116805360B (en) 2023-12-05

Family

ID=88080836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311048090.7A Active CN116805360B (en) 2023-08-21 2023-08-21 Obvious target detection method based on double-flow gating progressive optimization network

Country Status (1)

Country Link
CN (1) CN116805360B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117407772B (en) * 2023-12-13 2024-03-26 江西师范大学 Method and system for classifying training multi-element time sequence data by supervising and comparing learning network model

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10503966B1 (en) * 2018-10-11 2019-12-10 Tindei Network Technology (Shanghai) Co., Ltd. Binocular pedestrian detection system having dual-stream deep learning neural network and the methods of using the same
CN113240613A (en) * 2021-06-07 2021-08-10 北京航空航天大学 Image restoration method based on edge information reconstruction
CN113269787A (en) * 2021-05-20 2021-08-17 浙江科技学院 Remote sensing image semantic segmentation method based on gating fusion
US11222217B1 (en) * 2020-08-14 2022-01-11 Tsinghua University Detection method using fusion network based on attention mechanism, and terminal device
CN114140622A (en) * 2021-12-06 2022-03-04 许昌三维测绘有限公司 Real-time significance detection image method based on double-branch network
CN114445442A (en) * 2022-01-28 2022-05-06 杭州电子科技大学 Multispectral image semantic segmentation method based on asymmetric cross fusion
CN116543228A (en) * 2023-05-22 2023-08-04 天津大学 Infrared image vehicle detection method based on single-stage network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801057B (en) * 2021-04-02 2021-07-13 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and storage medium
US20230237785A1 (en) * 2022-01-21 2023-07-27 NavInfo Europe B.V. Deep Learning Based Multi-Sensor Detection System for Executing a Method to Process Images from a Visual Sensor and from a Thermal Sensor for Detection of Objects in Said Images

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10503966B1 (en) * 2018-10-11 2019-12-10 Tindei Network Technology (Shanghai) Co., Ltd. Binocular pedestrian detection system having dual-stream deep learning neural network and the methods of using the same
US11222217B1 (en) * 2020-08-14 2022-01-11 Tsinghua University Detection method using fusion network based on attention mechanism, and terminal device
CN113269787A (en) * 2021-05-20 2021-08-17 浙江科技学院 Remote sensing image semantic segmentation method based on gating fusion
CN113240613A (en) * 2021-06-07 2021-08-10 北京航空航天大学 Image restoration method based on edge information reconstruction
CN114140622A (en) * 2021-12-06 2022-03-04 许昌三维测绘有限公司 Real-time significance detection image method based on double-branch network
CN114445442A (en) * 2022-01-28 2022-05-06 杭州电子科技大学 Multispectral image semantic segmentation method based on asymmetric cross fusion
CN116543228A (en) * 2023-05-22 2023-08-04 天津大学 Infrared image vehicle detection method based on single-stage network

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Deep Gate Recurrent Neural Network; Yuan Gao et al.; arXiv; full text *
Learning Gating ConvNet for Two-Stream based Methods in Action Recognition; Jiagang Zhu et al.; arXiv; full text *
Two Stream LSTM: A Deep Fusion Framework for Human Action Recognition; Harshala Gammulle; 2017 IEEE Winter Conference on Applications of Computer Vision; full text *
Image semantic segmentation based on deep feature fusion; Zhou Pengcheng; Gong Shengrong; Zhong Shan; Bao Zongming; Dai Xinghua; Computer Science (No. 02); full text *
A review of the application of deep learning in minimally invasive surgery video analysis; Shi Pan; Zhao Zijian; Chinese Journal of Biomedical Engineering (No. 04); full text *

Also Published As

Publication number Publication date
CN116805360A (en) 2023-09-26

Similar Documents

Publication Publication Date Title
CN109543606B (en) Human face recognition method with attention mechanism
CN110428432B (en) Deep neural network algorithm for automatically segmenting colon gland image
CN111612008B (en) Image segmentation method based on convolution network
CN110059741A (en) Image-recognizing method based on semantic capsule converged network
CN110827312B (en) Learning method based on cooperative visual attention neural network
CN110795982A (en) Apparent sight estimation method based on human body posture analysis
CN113609896A (en) Object-level remote sensing change detection method and system based on dual-correlation attention
CN111738344A (en) Rapid target detection method based on multi-scale fusion
CN111797841B (en) Visual saliency detection method based on depth residual error network
CN116805360B (en) Obvious target detection method based on double-flow gating progressive optimization network
CN109087337B (en) Long-time target tracking method and system based on hierarchical convolution characteristics
CN113297961B (en) Target tracking method based on boundary feature fusion twin-cycle neural network
CN113297988A (en) Object attitude estimation method based on domain migration and depth completion
CN113378675A (en) Face recognition method for simultaneous detection and feature extraction
CN115410081A (en) Multi-scale aggregated cloud and cloud shadow identification method, system, equipment and storage medium
CN115375737A (en) Target tracking method and system based on adaptive time and serialized space-time characteristics
CN111126155A (en) Pedestrian re-identification method for generating confrontation network based on semantic constraint
CN109658523A (en) The method for realizing each function operation instruction of vehicle using the application of AR augmented reality
CN113076806A (en) Structure-enhanced semi-supervised online map generation method
Pei et al. FGO-Net: Feature and Gaussian Optimization Network for visual saliency prediction
CN114882372A (en) Target detection method and device
CN115410089A (en) Self-adaptive local context embedded optical remote sensing small-scale target detection method
CN114693951A (en) RGB-D significance target detection method based on global context information exploration
CN114387489A (en) Power equipment identification method and device and terminal equipment
CN112446292A (en) 2D image salient target detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant