CN114067101A - Image significance detection method of double-stream decoder based on information complementation - Google Patents
Image significance detection method of double-stream decoder based on information complementation
- Publication number
- CN114067101A (Application CN202111304064.7A)
- Authority
- CN
- China
- Prior art keywords
- image
- contour
- branch
- graph
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
An image saliency detection method based on a dual-stream decoder with complementary information comprises the following steps: step S1, decomposing the label image to obtain a corresponding main-body label map and contour-detail label map; step S2, applying random cropping, random rotation, normalization and grayscale conversion to the training-set images to enhance sample diversity; step S3, inputting an image, preprocessing it with a VGG16 backbone, and collecting image features of different sizes with a group of encoding blocks of different dimensions; step S4, feeding the five output feature maps obtained from the encoder into an embedding layer to unify their dimensions; step S5, passing the encoded features of the target main-body map and of the contour-detail map into a saliency branch and a contour branch respectively, and supervising the resulting main-body feature map and contour feature map on their respective branches; and step S6, adding and fusing the target main-body features and the contour-detail features obtained on the two main branches to obtain the final prediction map.
Description
Technical Field
The invention belongs to the technical field of image processing, and in particular relates to an image saliency detection method using a dual-stream decoder based on information complementation.
Background
Traditional saliency detection works by extracting hand-crafted features specific to the target. Such feature engineering exploits the specificity of object categories, but inevitably runs into the limitations of the underlying algorithm. Saliency detection is the task of segmenting the objects or regions of an image that are most visually distinctive; in other words, these studies aim to identify the subject of an image. Unlike finer-grained techniques such as semantic segmentation or instance segmentation, saliency detection always focuses on a few main regions. It is therefore often used as a first step in many tasks, such as target tracking, target recognition, and action classification.
The main difficulty of saliency detection is distinguishing salient objects and object edges in the image. Inspired by the correlation between salient objects and object contours, and in contrast to methods that learn from edge information or the overall map alone, the present method uses information complementation to obtain a target main-body map and a contour-detail map of the target. The two maps then jointly supervise the training of the model, which reduces the influence of the extreme imbalance in the pixel distribution of salient-object edges and improves performance.
Disclosure of Invention
The present invention overcomes the above-mentioned shortcomings of the prior art and provides a saliency detection method for a dual-stream decoder based on information complementation.
The invention builds on the VGG model, and the number of parameters the model contains is markedly smaller than that of other VGG-based methods; together with the feature fusion module and the loss function designed here, this achieves a better salient-object detection effect.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
an image significance detection method based on a double-stream decoder with complementary information comprises the following steps:
step S1, generating a contour detail map by expanding and eroding the protrusions and calculating the difference value between the label images, and searching the area in the closed contour by adopting a seed filling algorithm so as to obtain a target main body map;
step S2, random cutting, random rotation, normalization and graying processing are carried out on the training data set image to enhance the diversity of the sample;
s3, inputting an image with H multiplied by W dimension, preprocessing the image by using a VGG16 frame, and collecting image features with different sizes by using a group of encoding blocks with different dimensions;
step S4, inputting the five-layer output characteristic diagram obtained by the encoder into an Embedding layer, and unifying dimensions;
step S5, respectively transmitting the coded features of the target main body image and the coded features of the contour detail image into a salient branch and a contour image branch, and respectively supervising the obtained main body feature image and contour feature image on the respective branches, wherein on one hand, the information of the main body image and the contour feature image is interactively fused to achieve information complementation, and on the other hand, the main body image and the contour feature image are respectively transmitted into respective image decoders to enrich the features;
and step S6, adding and fusing the target main body characteristics and the contour detail characteristics obtained on the two main branches, and then obtaining a final prediction image through an up-sampling operation and a final connection operation.
The invention provides an image saliency detection method using a dual-stream decoder based on information complementation. It uses the ground-truth label map of an image to obtain a target main-body map and a contour-detail map: the main-body map mainly contains the subject information of the image, while the contour map mainly contains its edge-detail information. The training of the model is supervised jointly by the main-body map and the contour map, and the continuously optimized main-body and contour features are added and fused in the feature fusion module. Because the two maps contain complementary information, iteratively training on both sets of features yields a better salient-target prediction.
The invention has the following advantages: emphasizing the correlation between the salient object and the object contour, the ground-truth image is divided into two complementary sources of information that focus on different regions. The salient-target map and the target contour map jointly supervise the iterative training of the model, and the two kinds of features are added and fused, so that more information from both the main body and the contour edges of the salient target is exploited, yielding a better prediction.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of a feature interaction module of the decoder of the flowchart of fig. 1.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
The technical scheme of the invention is described below with reference to the accompanying drawings.
An image saliency detection method based on a dual-stream decoder with complementary information comprises the following steps:
step S1, generating a contour-detail map by dilating and eroding the label image and computing the difference, and finding the region inside the closed contour with a seed-filling algorithm, thereby obtaining a target main-body map;
step S2, applying random cropping, random rotation, normalization and grayscale conversion to the training-set images to enhance sample diversity;
step S3, inputting an image of size H × W, preprocessing it with a VGG16 backbone, and collecting image features of different sizes with a group of encoding blocks of different dimensions;
step S4, feeding the five output feature maps obtained from the encoder into an embedding layer to unify their dimensions;
step S5, passing the encoded features of the target main-body map and of the contour-detail map into a saliency branch and a contour branch respectively, and supervising the resulting main-body feature map and contour feature map on their respective branches; on the one hand, the information of the two maps is interactively fused to achieve information complementation, and on the other hand, each map is passed into its own image decoder to enrich the features;
and step S6, adding and fusing the target main-body features and the contour-detail features obtained on the two main branches, then obtaining the final prediction map through an upsampling operation and a final concatenation operation.
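As a concrete illustration of the augmentation pipeline of step S2, the following NumPy sketch applies a random crop, a random rotation (restricted to multiples of 90 degrees here for simplicity), normalization, and occasional grayscale conversion. The crop ratio and graying probability are illustrative choices, not values fixed by the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Step-S2 sketch: random crop, random rotation (multiples of 90
    degrees only, for simplicity), normalization, occasional graying.
    img has shape (H, W, 3)."""
    h, w, _ = img.shape
    ch, cw = int(h * 0.9), int(w * 0.9)              # illustrative crop ratio
    y = int(rng.integers(0, h - ch + 1))
    x = int(rng.integers(0, w - cw + 1))
    out = img[y:y + ch, x:x + cw]
    out = np.rot90(out, k=int(rng.integers(0, 4)))   # random rotation
    out = (out - out.mean()) / (out.std() + 1e-8)    # normalization
    if rng.random() < 0.1:                           # illustrative graying prob.
        out = out.mean(axis=2, keepdims=True).repeat(3, axis=2)
    return out

img = rng.random((64, 64, 3))
aug = augment(img)
```

A real training loop would apply this per sample each epoch; rotation by arbitrary angles would require interpolation, which is omitted here.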
Step S1, generating a contour-detail map by dilating and eroding the label image and computing the difference, and finding the region inside the closed contour with a seed-filling algorithm, thereby obtaining a target main-body map, specifically comprises:
S11, inputting a label image in which the salient target region is the white part and the background is the black part, with no constraint on size; the pixel value of the background is 0 and that of the foreground is 1;
S12, generating the contour-detail map by dilating and eroding the label image and computing the difference, and finding the region inside the closed contour with a seed-filling algorithm, thereby obtaining the target main-body map. The calculation formulas are:
X = E ⊖ B = {x | B(x) ⊆ E}    (1)
Y = E ⊕ B = {x | B(x) ∩ E ≠ ∅}    (2)
where X and Y denote the results of the erosion and dilation operations respectively, B(x) denotes the structuring element, and the operation is applied at every point x of the working space E.
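The decomposition of step S1 can be sketched with plain NumPy morphology. The seed-filling step is approximated here by simply taking the eroded interior as the body map, which is a simplification that holds for solid labels like the toy one below; a faithful implementation would flood-fill inside the closed contour.

```python
import numpy as np

def dilate(mask, k=3):
    """Binary dilation with a k x k square structuring element B."""
    pad = k // 2
    padded = np.pad(mask, pad)
    out = np.zeros_like(mask)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def erode(mask, k=3):
    """Binary erosion via duality: complement of dilating the complement."""
    return 1 - dilate(1 - mask, k)

def decompose_label(label, k=3):
    """Split a binary label map into a contour-detail map (dilation minus
    erosion) and a main-body map (here: the eroded interior, standing in
    for the seed-filled closed-contour region)."""
    contour = dilate(label, k) - erode(label, k)
    body = erode(label, k)
    return body, contour

label = np.zeros((8, 8), dtype=np.uint8)
label[2:6, 2:6] = 1                     # a 4 x 4 salient square
body, contour = decompose_label(label)
```

For the toy square, the contour map is a ring two pixels thick around the object boundary and the body map is its 2 × 2 interior, so the two maps cover complementary regions as the patent intends.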
Step S3 specifically comprises: loading the parameters of a pre-trained VGG16 model, inputting the data, obtaining the 5 layers of feature output excluding the fully connected layers, and collecting image features of different sizes with a group of encoding blocks of different dimensions, recorded as F = {F_i | i = 1, 2, 3, 4, 5};
In order to reduce the feature channels and the calculation amount, a channel pooling layer is added at the top of each feature map so that information can be transmitted through different channels, and the channel pooling layer is specifically defined as follows:
Embedding_i = cp(E_i)    (3)
where i ∈ [1, 5] indexes the feature maps, cp(·) denotes the channel pooling operation, X_j denotes the j-th channel of a feature map X, and the pooling layer collects the maximum value over each group of channels, N and M denoting the numbers of input and output channels respectively.
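A minimal stand-in for the channel pooling operation cp(·) of equation (3): the N input channels are split into M groups and the per-group maximum is kept, so every input channel contributes to some output channel. The even grouping scheme is an assumption; the patent text does not spell it out.

```python
import numpy as np

def channel_pool(x, m):
    """cp(.): reduce the N channels of x (shape (N, H, W)) to m output
    channels by taking the maximum over each group of N // m channels."""
    n, h, w = x.shape
    assert n % m == 0, "this sketch assumes N divisible by M"
    return x.reshape(m, n // m, h, w).max(axis=1)

x = np.arange(8 * 4 * 4, dtype=np.float32).reshape(8, 4, 4)
emb = channel_pool(x, 4)   # N = 8 input channels -> M = 4 output channels
```

Because the channels of `x` are monotonically increasing here, each output channel equals the larger member of its pair, e.g. `emb[0]` equals `x[1]`.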
Step S5 specifically comprises: each feature fusion module contains two branches, a main-body branch and a contour branch; each branch additionally spawns a corresponding new branch and is supervised by the main-body map and the contour map respectively; the resulting features are then superimposed and fused, and each new branch feeds the new features it obtains into the opposite main branch for feature fusion. Since the final output is predominantly the salient target, the main-body branch is the primary one; its result is fed into the next feature fusion module, and these modules can be stacked. The specific steps are:
S51, each feature fusion module adds two inputs, drawn respectively from the corresponding feature output layer of the encoder and from the module of the previous layer. For the first feature fusion module of the main-body branch, since there is no input feature from a previous module, the layer-5 output of the encoder serves as the input of the previous-layer decoder. The intermediate connection between branches is implemented as follows:
where A_i and B_i denote the main-body branch and the contour branch respectively, conv denotes the convolution operation with its subscript indicating the corresponding branch, and P_i^A and P_i^B denote the predictions of the related tasks, to which additional supervision is applied.
S52, each feature fusion module contains a main-body branch and a contour branch; each branch additionally spawns a corresponding new branch and is supervised by the main-body map and the contour map respectively; the resulting features are superimposed and fused, and each new branch feeds the new features it obtains into the opposite main branch for feature fusion, until the modules have been stacked 5 times. After the final input features are fused, a subject prediction map is output. The branch operations and final prediction are:
A_i = conv_{A_i}(upsample(concat(A_{i+1}, B'_i, Embedding_i)))    (7)
B_i = conv_{B_i}(concat(A'_i, upsample(B_{i+1})))    (8)
where upsample and concat denote the upsampling and concatenation operations respectively. For the final prediction, all features in the main-body branch are concatenated to balance the hierarchical information, specifically:
Final = conv_Final(concat([upsample(A_i), i = 1, 2, 3, 4, 5]))    (9)
where all A_i are upsampled to the input size before concatenation, and the final prediction is aggregated over the concatenated features.
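Equation (9) can be sketched as follows, with nearest-neighbour upsampling standing in for the model's upsample operation and a fixed weight vector standing in for the learned 1 × 1 convolution conv_Final; both stand-ins are illustrative assumptions, not the patent's trained layers.

```python
import numpy as np

def upsample_nn(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def final_prediction(feats, weights):
    """Equation (9) sketch: upsample every level A_i to the finest
    resolution, concatenate along the channel axis, and apply a 1 x 1
    convolution, modelled here as a weighted sum over the channels."""
    target = max(f.shape[1] for f in feats)
    up = [upsample_nn(f, target // f.shape[1]) for f in feats]
    stacked = np.concatenate(up, axis=0)           # concat([upsample(A_i)])
    return np.tensordot(weights, stacked, axes=1)  # conv_Final stand-in

feats = [np.ones((1, 4, 4)), 2 * np.ones((1, 2, 2))]   # two toy levels A_1, A_2
pred = final_prediction(feats, np.array([0.5, 0.5]))
```

With the toy inputs, the coarse level is upsampled 2x and the weighted sum gives a uniform 4 × 4 map; in the real model five levels are concatenated and the mixing weights are learned.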
Step S6 specifically comprises: comparing the three prediction maps obtained by each module (the subject prediction map, the contour-detail prediction map and the salient-target prediction map) with the ground-truth label, calculating the loss value, and back-propagating to update the weights of the model. The loss is calculated as follows:
the loss between the obtained main-body and contour-detail prediction maps and the corresponding label maps is computed with the binary cross-entropy function:
l_bce(x, y) = −[g(x, y) · log p(x, y) + (1 − g(x, y)) · log(1 − p(x, y))]
where g(x, y) ∈ [0, 1] is the value of the label map at pixel (x, y), p(x, y) is the value of the prediction map at (x, y), and l_bce(x, y) is the loss value.
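The binary cross-entropy loss of step S6 takes only a few lines of NumPy; the `eps` clipping is a standard numerical guard against log(0), not part of the patent text.

```python
import numpy as np

def bce_loss(pred, label, eps=1e-7):
    """Pixel-wise binary cross-entropy l_bce between the prediction
    p(x, y) and the label g(x, y), averaged over the image."""
    p = np.clip(pred, eps, 1 - eps)   # numerical guard against log(0)
    return -np.mean(label * np.log(p) + (1 - label) * np.log(1 - p))

g = np.array([[1.0, 0.0]])   # toy 1 x 2 label map
p = np.array([[0.9, 0.1]])   # toy prediction
loss = bce_loss(p, g)        # each pixel contributes -log(0.9)
```

In training, this loss would be evaluated for the subject, contour and final prediction maps of every stacked module and summed before back-propagation.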
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (5)
1. An image saliency detection method based on a dual-stream decoder with complementary information, comprising the following steps:
step S1, generating a contour-detail map by dilating and eroding the label image and computing the difference, and finding the region inside the closed contour with a seed-filling algorithm, thereby obtaining a target main-body map;
step S2, applying random cropping, random rotation, normalization and grayscale conversion to the training-set images to enhance sample diversity;
step S3, inputting an image of size H × W, preprocessing it with a VGG16 backbone, and collecting image features of different sizes with a group of encoding blocks of different dimensions;
step S4, feeding the five output feature maps obtained from the encoder into an embedding layer to unify their dimensions;
step S5, passing the encoded features of the target main-body map and of the contour-detail map into a saliency branch and a contour branch respectively, and supervising the resulting main-body feature map and contour feature map on their respective branches; on the one hand, the information of the two maps is interactively fused to achieve information complementation, and on the other hand, each map is passed into its own image decoder to enrich the features;
and step S6, adding and fusing the target main-body features and the contour-detail features obtained on the two main branches, then obtaining the final prediction map through an upsampling operation and a final concatenation operation.
2. The image saliency detection method based on a dual-stream decoder with complementary information according to claim 1, characterized in that step S1, generating a contour-detail map by dilating and eroding the label image and computing the difference, and finding the region inside the closed contour with a seed-filling algorithm, thereby obtaining a target main-body map, specifically comprises:
S11, inputting a label image in which the salient target region is the white part and the background is the black part, with no constraint on size; the pixel value of the background is 0 and that of the foreground is 1;
S12, generating the contour-detail map by dilating and eroding the label image and computing the difference, and finding the region inside the closed contour with a seed-filling algorithm, thereby obtaining the target main-body map, the calculation formulas being:
X = E ⊖ B = {x | B(x) ⊆ E}    (1)
Y = E ⊕ B = {x | B(x) ∩ E ≠ ∅}    (2)
where X and Y denote the results of the erosion and dilation operations respectively, B(x) denotes the structuring element, and the operation is applied at every point x of the working space E.
3. The image saliency detection method based on a dual-stream decoder with complementary information according to claim 1, characterized in that step S3 specifically comprises: loading the parameters of a pre-trained VGG16 model, inputting the data, obtaining the 5 layers of feature output excluding the fully connected layers, and collecting image features of different sizes with a group of encoding blocks of different dimensions, recorded as F = {F_i | i = 1, 2, 3, 4, 5};
in order to reduce the feature channels and the amount of computation, a channel pooling layer is added on top of each feature map so that information can be transmitted through different channels, the channel pooling layer being defined as:
Embedding_i = cp(E_i)    (3)
where i ∈ [1, 5] indexes the feature maps, cp(·) denotes the channel pooling operation, X_j denotes the j-th channel of a feature map X, and the pooling layer collects the maximum value over each group of channels, N and M denoting the numbers of input and output channels respectively.
4. The image saliency detection method based on a dual-stream decoder with complementary information according to claim 1, characterized in that step S5 specifically comprises: each feature fusion module contains two branches, a main-body branch and a contour branch; each branch additionally spawns a corresponding new branch and is supervised by the main-body map and the contour map respectively; the resulting features are then superimposed and fused, and each new branch feeds the new features it obtains into the opposite main branch for feature fusion; since the final output is predominantly the salient target, the main-body branch is the primary one, its result is fed into the next feature fusion module, and these modules can be stacked; the specific steps are:
S51, each feature fusion module adds two inputs, drawn respectively from the corresponding feature output layer of the encoder and from the module of the previous layer; for the first feature fusion module of the main-body branch, since there is no input feature from a previous module, the layer-5 output of the encoder serves as the input of the previous-layer decoder, the intermediate connection between branches being implemented as follows:
where A_i and B_i denote the main-body branch and the contour branch respectively, conv denotes the convolution operation with its subscript indicating the corresponding branch, and P_i^A and P_i^B denote the predictions of the related tasks, to which additional supervision is applied;
S52, each feature fusion module contains a main-body branch and a contour branch; each branch additionally spawns a corresponding new branch and is supervised by the main-body map and the contour map respectively; the resulting features are superimposed and fused, and each new branch feeds the new features it obtains into the opposite main branch for feature fusion, until the modules have been stacked 5 times; after the final input features are fused, a subject prediction map is output, the branch operations and final prediction being:
A_i = conv_{A_i}(upsample(concat(A_{i+1}, B'_i, Embedding_i)))    (7)
B_i = conv_{B_i}(concat(A'_i, upsample(B_{i+1})))    (8)
where upsample and concat denote the upsampling and concatenation operations respectively; for the final prediction, all features in the main-body branch are concatenated to balance the hierarchical information, specifically:
Final = conv_Final(concat([upsample(A_i), i = 1, 2, 3, 4, 5]))    (9)
where all A_i are upsampled to the input size before concatenation, and the final prediction is aggregated over the concatenated features.
5. The image saliency detection method based on a dual-stream decoder with complementary information according to claim 1, characterized in that step S6 specifically comprises: comparing the three prediction maps obtained by each module (the subject prediction map, the contour-detail prediction map and the salient-target prediction map) with the ground-truth label, calculating the loss value, and back-propagating to update the weights of the model, the loss being computed as follows:
the loss between the obtained main-body and contour-detail prediction maps and the corresponding label maps is computed with the binary cross-entropy function:
l_bce(x, y) = −[g(x, y) · log p(x, y) + (1 − g(x, y)) · log(1 − p(x, y))]
where g(x, y) ∈ [0, 1] is the value of the label map at pixel (x, y), p(x, y) is the value of the prediction map at (x, y), and l_bce(x, y) is the loss value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111304064.7A CN114067101A (en) | 2021-11-05 | 2021-11-05 | Image significance detection method of double-stream decoder based on information complementation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114067101A true CN114067101A (en) | 2022-02-18 |
Family
ID=80274312
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114332489A (en) * | 2022-03-15 | 2022-04-12 | 江西财经大学 | Image salient target detection method and system based on uncertainty perception |
CN114332489B (en) * | 2022-03-15 | 2022-06-24 | 江西财经大学 | Image salient target detection method and system based on uncertainty perception |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||