CN114241288A - Method for detecting significance of remote sensing target guided by selective edge information

Method for detecting significance of remote sensing target guided by selective edge information

Info

Publication number
CN114241288A
Authority
CN
China
Prior art keywords
edge
information
image
output
pdc
Prior art date
Legal status
Withdrawn
Application number
CN202111536484.8A
Other languages
Chinese (zh)
Inventor
Yan Chenggang
Wang Lingbo
Sun Yaoqi
Zhang Jiyong
Li Zongpeng
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202111536484.8A priority Critical patent/CN114241288A/en
Publication of CN114241288A publication Critical patent/CN114241288A/en
Legal status: Withdrawn (current)


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a selective-edge-information-guided method for detecting the saliency of remote sensing targets. In the invention, the network autonomously extracts edge information and autonomously selects and discriminates among the extracted edge features: positive edge feature information is retained, while poor edge feature information is optimized or discarded, which effectively improves the saliency segmentation results for remote sensing targets.

Description

Method for detecting significance of remote sensing target guided by selective edge information
Technical Field
The invention belongs to the field of computer vision and relates to salient object detection and remote sensing image analysis. In particular, it concerns a method for detecting the saliency of remote sensing targets guided by selective edge information.
Background
With the rapid development of deep learning and neural networks, the field of computer vision has made unprecedented strides. As a classical branch of computer vision, object detection has been widely studied and discussed, with great progress in directions such as salient object detection, pedestrian re-identification, and image data evaluation. In everyday life, face recognition, license plate scanning, the Skynet surveillance project, and similar applications all rely on object detection technology.
The human visual system can quickly search for and locate objects of interest in natural scenes; this visual attention mechanism is central to how people process visual information in daily life. With the flood of data brought by the internet, quickly extracting important information from massive image and video collections has become a key problem in computer vision. Introducing this visual attention mechanism, i.e., visual saliency, into computer vision tasks can bring significant help and improvement to visual information processing. The purpose of salient object detection is to locate the most attention-grabbing and visually distinctive object or region in an image, and it is widely applied in fields such as image segmentation, object relocation, and foreground annotation of target images. Salient object detection in remote sensing images differs from conventional saliency detection tasks: remote sensing images, acquired by means such as aerial photography, aerial scanning, and microwave radar, take vehicles such as airplanes and ships, and targets such as roads, rivers, and sports venues, as the main segmentation objects, and these targets are often small and complex within the image. Salient object detection in remote sensing imagery is of great value in fields such as nature conservation and radar detection.
U-Net is currently one of the most popular image segmentation networks. It consists mainly of a downsampling part and an upsampling part: image features are extracted along the downsampling path, and the feature information is restored along the upsampling path, so that the network finally outputs a full-resolution image. A minimal sketch of this encoder-decoder idea is shown below.
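A minimal PyTorch sketch of the downsample/upsample structure with one skip connection; this is an illustrative reduction for context, not the full U-Net architecture:

```python
import torch
import torch.nn as nn

class MiniUNet(nn.Module):
    """Toy U-Net-style network: downsample to extract features,
    upsample to restore resolution, with one skip connection."""
    def __init__(self, ch=16):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)
        self.down2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(ch * 2, ch, 2, stride=2)
        self.head = nn.Conv2d(ch * 2, 1, 1)  # skip concat doubles the channels

    def forward(self, x):
        f1 = self.down1(x)              # full-resolution features
        f2 = self.down2(self.pool(f1))  # half-resolution features
        u = self.up(f2)                 # upsample back to full resolution
        return self.head(torch.cat([f1, u], dim=1))
```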
With the rapid development of deep neural networks, both the accuracy and the speed of salient object detection in remote sensing images have improved greatly, and it has also been found that adding edge information to target detection can further improve the results. It is worth noting, however, that edge information extracted autonomously by a network does not always improve the final detection performance: in the autonomous segmentation and extraction process, because the detection task is difficult, the target objects are complex, and the targets are numerous, the extracted edge features often contain a large amount of noise. Likewise, even an edge detection network whose segmentations are prepared manually in advance cannot guarantee that every edge picture will have a positive effect on the final segmentation result.
Disclosure of Invention
To address these problems in the prior art, the invention provides a method for detecting the saliency of remote sensing targets guided by selective edge information.
The technical solution adopted by the invention to solve this problem comprises the following steps:
step 1, preprocessing image data to obtain a preprocessed image training set:
the training data set employs an EORSSD image data set. Image preprocessing is carried out on a data set to be trained, image-related noise interference is removed to enable data to be more accurate, an image training set only containing edge contours is obtained by using a matlab tool, and then the obtained image training set is amplified.
Step 2, establishing a target significance detection network:
the network structure adopts an encoder-decoder (encoding-decoding) structure and comprises an encoding part and a decoding part; the encoding part adopts ResNet34, the decoding part adopts two partial decoding structures PDC (partial decoder) and PDC-with edge, the output results of the two partial decoding structures are compared and evaluated with a true value graph (group route) through a Quality evaluation module (Quality evaluation module), a more optimal result is selected to be output as a Coarse image (Coarse map), and the Coarse image is input into an optimized convolution block (reference), and a final result is output.
Step 3, training the target saliency detection network with the image training set obtained in step 1;
specifically, the method comprises the following steps: the encoding (encoder) part comprises five volume blocks, wherein the five volume blocks adopt ResNet34, and the information of the first and second volume blocks is also respectively input into two Edge-information extraction block groups (Edge-extract1, 2) in addition to being propagated downwards, so as to extract shallow rich Edge features (Edge features). The information of the fifth convolution block is inputted into two partial decoding structures PDC and PDC-with edge, wherein the PDC decodes only the information of the encoded main path to obtain a rough segmentation result graph, and the PDC-with edge connects the information of the encoded main path with the edge feature extracted from the shallow layer, thereby outputting a rough segmentation result graph. Inputting the two rough segmentation result graphs into a quality evaluation module, respectively comparing the two rough segmentation result graphs with a group route, selecting a result graph with a better evaluation result, inputting the result graph into an optimized volume block (Refinement), and outputting a final result, namely a high-quality output image.
Finally, the output is constrained with a hybrid loss function, while the edge information is constrained with the SSIM loss alone.
Further, the mapping computed by each extraction block in an edge information extraction block group is defined as:
x_{i+1} = Bn(ReLU(conv_{3×3}(x_i)))
where x_i denotes the input feature information, x_{i+1} the output feature information, conv_{3×3} a convolution with a 3×3 kernel, ReLU the activation function, and Bn batch normalization. Each edge information extraction block group performs three extraction steps, i.e., each group contains 3 extraction blocks. Finally, the feature information extracted by the two extraction block groups is concatenated (Concatenate) and fed into a decoder to obtain an edge map (Edge Map).
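A sketch of one extraction block and a three-block group in PyTorch, applying the operations in the order written above (convolution, then ReLU, then batch normalization); the channel widths are assumptions:

```python
import torch
import torch.nn as nn

class ExtractBlock(nn.Module):
    """x_{i+1} = Bn(ReLU(conv3x3(x_i))), in the order given in the text."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return self.bn(self.relu(self.conv(x)))

class EdgeExtractGroup(nn.Module):
    """One edge information extraction block group: 3 chained blocks
    (Edge-extract1 / Edge-extract2 in the figure)."""
    def __init__(self, in_ch, width=64):
        super().__init__()
        self.blocks = nn.Sequential(
            ExtractBlock(in_ch, width),
            ExtractBlock(width, width),
            ExtractBlock(width, width),
        )

    def forward(self, x):
        return self.blocks(x)
```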
Furthermore, the quality evaluation module has a 5-layer structure: the first four layers are convolutional layers with kernel size 3 and stride 2, and the fifth layer is a fully connected layer that produces a 0 (bad) or 1 (good) result. Using result comparison, the quality evaluation module computes the similarity to the ground-truth map of the output f_1(x) of the PDC branch and of the output f_2(x) of the PDC-with-edge branch; the branch with the higher similarity is labeled 1 and the other 0, and this label serves as supervision for selecting the output branch. It should be noted that the input picture is concatenated with its corresponding edge map before being fed into the quality evaluation module, so that the network learns to accept or reject edge features autonomously.
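A sketch of the quality evaluation module following the 5-layer description above; the channel widths and the exact set of concatenated inputs (image + edge map + branch prediction) are assumptions:

```python
import torch
import torch.nn as nn

class QualityEvalModule(nn.Module):
    """Four 3x3/stride-2 conv layers followed by a fully connected layer
    that scores a coarse map as good (close to 1) or bad (close to 0)."""
    def __init__(self, in_ch=5, in_size=256):
        super().__init__()
        chs = [in_ch, 16, 32, 64, 64]          # widths are assumptions
        layers = []
        for i in range(4):
            layers += [nn.Conv2d(chs[i], chs[i + 1], 3, stride=2, padding=1),
                       nn.ReLU(inplace=True)]
        self.convs = nn.Sequential(*layers)
        side = in_size // 16                   # four stride-2 convs halve 4 times
        self.fc = nn.Linear(64 * side * side, 1)

    def forward(self, image, edge_map, pred):
        # Concatenate the picture, its edge map, and a branch prediction
        x = torch.cat([image, edge_map, pred], dim=1)
        return torch.sigmoid(self.fc(self.convs(x).flatten(1)))
```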
Further, the hybrid loss function is as follows:
l = l_{bce} + l_{ssim} + l_{iou}

l_{bce} = -\sum_{(x,y)} \big[ G(x,y)\log T(x,y) + (1 - G(x,y))\log(1 - T(x,y)) \big]

l_{ssim} = 1 - \frac{(2\mu_G \mu_T + C_1)(2\sigma_{GT} + C_2)}{(\mu_G^2 + \mu_T^2 + C_1)(\sigma_G^2 + \sigma_T^2 + C_2)}

l_{iou} = 1 - \frac{\sum_{(x,y)} T(x,y)\,G(x,y)}{\sum_{(x,y)} \big[ T(x,y) + G(x,y) - T(x,y)\,G(x,y) \big]}

where G(x, y) and T(x, y) are the ground-truth and predicted values at each position, μ and σ denote mean and standard deviation, H and W are the height and width of the picture (the sums run over all H×W positions), and C_1 = 0.01² and C_2 = 0.03² prevent the denominators from being zero.
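A sketch of this hybrid loss in PyTorch. For brevity the SSIM term uses whole-image statistics rather than the usual windowed SSIM, so treat it as an approximation of the loss described above:

```python
import torch
import torch.nn.functional as F

def hybrid_loss(pred, gt, c1=0.01 ** 2, c2=0.03 ** 2):
    """l = l_bce + l_ssim + l_iou; `pred` and `gt` are maps in [0, 1]."""
    l_bce = F.binary_cross_entropy(pred, gt)

    # SSIM from global statistics (windowed SSIM is the common choice)
    mu_p, mu_g = pred.mean(), gt.mean()
    var_p, var_g = pred.var(), gt.var()
    cov = ((pred - mu_p) * (gt - mu_g)).mean()
    ssim = ((2 * mu_p * mu_g + c1) * (2 * cov + c2)) / \
           ((mu_p ** 2 + mu_g ** 2 + c1) * (var_p + var_g + c2))
    l_ssim = 1 - ssim

    # Soft IoU over all positions
    inter = (pred * gt).sum()
    union = (pred + gt - pred * gt).sum()
    l_iou = 1 - inter / (union + 1e-8)

    return l_bce + l_ssim + l_iou
```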
Further, the specific method in step 3 is as follows:
the optimization is performed by using an Adam optimizer, and the learning rate is set to 0.001, beta is (0.9,0.999), eps is 1e-8, and weight _ decay is 0. The input image resize is 256 × 256, epoch is set to 120, and batch size is 8.
The invention has the following beneficial effects:
the invention notices that in the task of detecting the significance of the remote sensing target, due to the characteristics of large quantity of the remote sensing targets, small targets, high shadow overlapping degree and the like, not all the network autonomous extraction and even the artificially extracted edge information can play a positive feedback effect on the final result output by the network, and the inferior edge information is harmful to the training process and even the convergence optimization process of the network, thereby damaging the final segmentation result. According to the invention, the network autonomously extracts the edge information and autonomously selects and distinguishes the extracted edge information, forward edge characteristic information is reserved, bad edge characteristic information is optimized or eliminated, and the significance segmentation result of the remote sensing target is effectively improved.
Drawings
Fig. 1 is a schematic diagram of an overall network structure according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific embodiments.
As shown in fig. 1, a method for detecting the saliency of a remote sensing target guided by selective edge information specifically includes the following steps:
step 1, preprocessing image data to obtain a preprocessed image training set:
the training data set employs an EORSSD image data set. The method comprises the steps of conducting image preprocessing on a data set to be trained, firstly removing image-related noise interference to enable data to be more accurate, obtaining an image training set only containing edge contours by using a matlab tool, and then amplifying the obtained image training set, so that a better training effect is obtained.
Step 2, establishing a target significance detection network:
the main structure of the network is shown in fig. 1, the method adopts an encoder-decoder method, a network structure encoding part adopts ResNet34, a decoding part adopts two partial decoding structures PDC (partial decoder) and PDC-with edge, the output results of the two partial decoding structures are compared and evaluated by a Quality evaluation module (Quality evaluation module) and a true value graph (group parameter), and a more optimal result is selected and output as a Coarse image (Coarse map).
Specifically: the encoding (encoder) part comprises five convolution blocks, all taken from ResNet34. Besides being propagated downwards, the information from the first and second convolution blocks is also fed into two edge information extraction block groups (Edge-extract1, Edge-extract2) to extract the rich shallow edge features (Edge features). The mapping computed by each extraction block in an edge information extraction block group is defined as:
x_{i+1} = Bn(ReLU(conv_{3×3}(x_i)))
where x_i denotes the input feature information, x_{i+1} the output feature information, conv_{3×3} a convolution with a 3×3 kernel, ReLU the activation function, and Bn batch normalization. The two edge information extraction block groups each perform three extraction steps, that is, Edge-extract1 and Edge-extract2 each contain 3 extraction blocks. Finally, the feature information extracted by the two extraction block groups is concatenated (Concatenate) and fed into a decoder to obtain an edge map (Edge Map).
The information from the fifth convolution block is fed into the two partial decoding structures PDC and PDC-with-edge: PDC decodes only the information from the main encoding path to obtain a coarse segmentation map, while PDC-with-edge concatenates the main-path information with the shallow edge features and likewise outputs a coarse segmentation map. The two coarse segmentation maps are fed into the quality evaluation module, compared against the ground truth, and the final output is selected accordingly. The quality evaluation module has a 5-layer structure: the first four layers are convolutional layers with kernel size 3 and stride 2, and the fifth layer is a fully connected layer that produces a 0 (bad) or 1 (good) result. Using result comparison, the module computes the similarity to the ground-truth map of the output f_1(x) of the PDC branch and of the output f_2(x) of the PDC-with-edge branch; the branch with the higher similarity is labeled 1 and the other 0, and this label serves as supervision for selecting the output branch. It should be noted that the input picture is concatenated with its corresponding edge map before being fed into the quality evaluation module, so that the network learns to accept or reject edge features autonomously.
The feature information selected by the quality evaluation module is fed into a Residual Refinement Module to obtain a high-quality output image.
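A sketch of a residual refinement step; the patent names the module but not its layers, so the layout below (a small convolution stack predicting a residual that is added to the coarse map) is an assumption:

```python
import torch
import torch.nn as nn

class ResidualRefineModule(nn.Module):
    """Refine a 1-channel coarse map by predicting and adding a residual."""
    def __init__(self, width=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, 1, 3, padding=1),
        )

    def forward(self, coarse):
        # refined = coarse + predicted residual, squashed back to [0, 1]
        return torch.sigmoid(coarse + self.body(coarse))
```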
Finally, the final output is constrained with a hybrid loss function, while the edge information is constrained with the SSIM loss alone. The hybrid loss function is as follows:
l = l_{bce} + l_{ssim} + l_{iou}

l_{bce} = -\sum_{(x,y)} \big[ G(x,y)\log T(x,y) + (1 - G(x,y))\log(1 - T(x,y)) \big]

l_{ssim} = 1 - \frac{(2\mu_G \mu_T + C_1)(2\sigma_{GT} + C_2)}{(\mu_G^2 + \mu_T^2 + C_1)(\sigma_G^2 + \sigma_T^2 + C_2)}

l_{iou} = 1 - \frac{\sum_{(x,y)} T(x,y)\,G(x,y)}{\sum_{(x,y)} \big[ T(x,y) + G(x,y) - T(x,y)\,G(x,y) \big]}

where G(x, y) and T(x, y) are the ground-truth and predicted values at each position, μ and σ denote mean and standard deviation, H and W are the height and width of the picture (the sums run over all H×W positions), and C_1 = 0.01² and C_2 = 0.03² prevent the denominators from being zero.
Step 3, data input and training:
the network uses Adam optimizer, learning rate is set to 0.001, beta is (0.9,0.999), eps is 1e-8, and weight _ decay is 0. The invention sets the input image resize to 256 × 256, epoch to 120, and batch size to 8.
The invention provides a novel edge-information-guided remote sensing target saliency detection network with an autonomous selection capability. The network selects and optimizes the edge information, retaining edge feature information that gives positive feedback to the final segmentation result and optimizing or screening out edge feature information that gives negative feedback, and it reaches a high level of performance on the EORSSD test data set.

Claims (6)

1. A method for detecting the significance of a remote sensing target guided by selective edge information is characterized by comprising the following steps:
step 1, preprocessing image data to obtain a preprocessed image training set:
the training data set adopts an EORSSD image data set; carrying out image preprocessing on a data set to be trained, firstly removing image-related noise interference to enable the data to be more accurate, obtaining an image training set only containing edge contours by using a matlab tool, and then amplifying the obtained image training set;
step 2, establishing a target significance detection network:
the network adopts an encoder-decoder structure comprising an encoding part and a decoding part; the encoding part uses ResNet34, and the decoding part uses two partial decoding structures, PDC and PDC-with-edge; the outputs of the two partial decoders are compared against the ground-truth map by a quality evaluation module, the better result is selected and output as a coarse map, and the coarse map is fed into a refinement convolution block to produce the final result;
and step 3, training the target saliency detection network with the image training set obtained in step 1.
2. The method for detecting the saliency of remote sensing targets guided by selective edge information according to claim 1, wherein the target saliency detection network in step 2 has a specific structure as follows:
the coding part comprises five rolling blocks, the five rolling blocks all adopt ResNet34, the information of the first and the second rolling blocks is transmitted downwards and also respectively input into the two edge information extraction block groups, thereby extracting the edge characteristics rich in shallow layer; the information of the fifth convolution block is input into two partial decoding structures PDC and PDC-with edge, wherein the PDC only decodes the information of the encoded main path to obtain a rough segmentation result graph, and the PDC-with edge connects the information of the encoded main path with the edge characteristics extracted from the shallow layer, so that a rough segmentation result graph is also output; inputting the two rough segmentation result graphs into a quality evaluation module, respectively comparing the two rough segmentation result graphs with a group route, selecting a result graph with a better evaluation result, inputting the result graph into an optimized volume block, and outputting a final result, namely a high-quality output image;
and finally, the output is restrained by using a mixed loss function, and the edge information is restrained only by using a ssim loss function.
3. The method for detecting the significance of the remote sensing target guided by the selective edge information as claimed in claim 2, wherein the information extracted from each extraction block included in the edge information extraction block group is defined as:
x_{i+1} = Bn(ReLU(conv_{3×3}(x_i)))
wherein x_i denotes the input feature information, x_{i+1} the output feature information, conv_{3×3} a convolution with a 3×3 kernel, ReLU the activation function, and Bn batch normalization; each edge information extraction block group performs three extraction steps, i.e., contains 3 extraction blocks; finally, the feature information extracted by the two extraction block groups is concatenated and fed into a decoder to obtain the edge map.
4. The method for detecting the significance of the remote sensing target guided by the selective edge information as claimed in claim 3, wherein the quality evaluation module has a 5-layer structure: the first four layers are convolutional layers with kernel size 3 and stride 2, and the fifth layer is a fully connected layer producing 0 or 1 results; using result comparison, the quality evaluation module computes the similarity to the ground-truth map of the output f_1(x) of the PDC branch and of the output f_2(x) of the PDC-with-edge branch; the branch with the higher similarity is defined as 1 and the other as 0, and this serves as supervision for selecting the output branch; the input picture is concatenated with its corresponding edge map before being fed into the quality evaluation module, so that the network learns to accept or reject edge features autonomously.
5. The method of claim 4, wherein the hybrid loss function is as follows:
l = l_{bce} + l_{ssim} + l_{iou}

l_{bce} = -\sum_{(x,y)} \big[ G(x,y)\log T(x,y) + (1 - G(x,y))\log(1 - T(x,y)) \big]

l_{ssim} = 1 - \frac{(2\mu_G \mu_T + C_1)(2\sigma_{GT} + C_2)}{(\mu_G^2 + \mu_T^2 + C_1)(\sigma_G^2 + \sigma_T^2 + C_2)}

l_{iou} = 1 - \frac{\sum_{(x,y)} T(x,y)\,G(x,y)}{\sum_{(x,y)} \big[ T(x,y) + G(x,y) - T(x,y)\,G(x,y) \big]}

where G(x, y) and T(x, y) are the ground-truth and predicted values at each position, μ and σ denote mean and standard deviation, H and W are the height and width of the picture (the sums run over all H×W positions), and C_1 = 0.01² and C_2 = 0.03² prevent the denominators from being zero.
6. The method for detecting the significance of the remote sensing target guided by the selective edge information as claimed in claim 5, wherein the specific method of step 3 is as follows:
an Adam optimizer is adopted for optimization, the learning rate is set to be 0.001, beta is equal to (0.9,0.999), eps is equal to 1e-8, and weight _ decay is equal to 0; the input image resize is 256 × 256, epoch is set to 120, and batch size is 8.
CN202111536484.8A 2021-12-15 2021-12-15 Method for detecting significance of remote sensing target guided by selective edge information Withdrawn CN114241288A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111536484.8A CN114241288A (en) 2021-12-15 2021-12-15 Method for detecting significance of remote sensing target guided by selective edge information


Publications (1)

Publication Number Publication Date
CN114241288A 2022-03-25

Family

ID=80756640

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111536484.8A Withdrawn CN114241288A (en) 2021-12-15 2021-12-15 Method for detecting significance of remote sensing target guided by selective edge information

Country Status (1)

Country Link
CN (1) CN114241288A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115019068A (en) * 2022-05-26 2022-09-06 杭州电子科技大学 Progressive salient object identification method based on coding and decoding framework
CN115019068B (en) * 2022-05-26 2024-02-23 杭州电子科技大学 Progressive salient target identification method based on coding and decoding architecture


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220325