CN111666842A - Shadow detection method based on a dual-stream atrous convolutional neural network


Info

Publication number
CN111666842A
Authority
CN
China
Prior art keywords: shadow, layer, pooling, channel, network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010449023.6A
Other languages
Chinese (zh)
Other versions
CN111666842B (en)
Inventor
李大威
王思凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Donghua University
Original Assignee
Donghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Donghua University
Priority to CN202010449023.6A
Publication of CN111666842A
Application granted
Publication of CN111666842B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a shadow detection method based on a dual-stream atrous (dilated) convolutional neural network, comprising the following steps: the shadowed image is input into the network as three RGB channels; image features are extracted separately by a pooling channel and a residual channel; global and local features are fused on the feature maps by a multi-level atrous pooling module; the pooling channel upsamples the feature maps in decoder fashion to the size of the input image, while the residual channel continuously preserves low-dimensional features, and the two channels are fused after being upsampled to the input size; the network is trained with a cross-entropy loss function to obtain the set of weights with the lowest loss value; these weights are used to detect shadows in test images, and an argmax function generates the shadow binary map. The invention achieves higher shadow detection accuracy and better preservation of shadow edges. The method can be used to remove falsely detected figure and target shadow pixels after common algorithms such as target detection and change detection.

Description

Shadow detection method based on a dual-stream atrous convolutional neural network
Technical Field
The invention relates to the technical field of deep learning and image processing, and in particular to a robust shadow detection method for single images based on a semantic segmentation network.
Background
Shadow detection labels the shadow regions of a color image pixel by pixel. The task is to train a designed network structure to obtain the set of weights with the highest accuracy, and then use those weights to detect the shadow regions in a single image; the shadow regions are called the foreground and marked white, while the remaining regions are called the background and marked black. Shadows are ubiquitous in natural scenes and arise whenever an object blocks the path of light from a light source. In most cases, however, shadows in an image interfere with image processing: in tasks such as foreground detection and segmentation, a shadow is often mistaken for the target, because a shadow accompanies the detected target and, like the target, differs significantly from the background in color, which greatly reduces detection accuracy. If shadows can be detected before a machine vision task is carried out, the accuracy of the task can be greatly improved. Shadow detection has therefore long been a key task in the field of machine vision.
Research on shadow detection has developed along two essentially different lines: hand-crafted features and deep learning. In early traditional algorithms, researchers analyzed the structural and color characteristics of shadows from an optical and image processing perspective, which requires the algorithm designer to possess extensive optical and image processing knowledge, for example analyzing color histograms in the HSV color space or analyzing how the intensity of ambient light and the transparency of objects affect shadows. At the present stage, researchers use deep learning, i.e., they design detectors based on convolutional neural networks to detect shadows in images, extracting a feature map from each image with sampling operations such as convolution and pooling so that the network learns the structural features of shadows by itself, thereby realizing end-to-end shadow detection.
Disclosure of Invention
The purpose of the invention is to accurately detect the shadow regions in a single image.
In order to achieve the above object, the technical solution of the present invention is to provide a shadow detection method based on a dual-stream atrous convolutional neural network, characterized by comprising the following steps:
step S1: single images from the training set are input into the designed network in sequence, each as three RGB channels;
step S2: inside the network, the input image is first processed by two channels, a pooling channel and a residual channel, wherein: the pooling channel is downsampled in encoder fashion by atrous convolution modules, gradually extracting high-dimensional features; the residual channel is composed of several cross-stream residual modules, which extract features by convolution through the atrous convolution module and superimpose the feature map of the previous layer with the feature information of the corresponding pooling-channel layer, so as to preserve low-dimensional features;
Step S3: sending feature maps obtained from the front four layers of the pooling channel into a multi-level cavity pooling module, pooling the feature maps obtained from the front three layers into the same size according to cavity convolutions with different expansion rates to obtain a first part three-layer feature map, performing global average pooling on the feature map of the fourth layer, performing bilinear interpolation on the feature map of the fourth layer to obtain a second part feature map, performing feature fusion on the feature map of the fourth layer to obtain the final output of a down-sampling part;
step S4: the output of the multi-level atrous pooling module is upsampled by a decoder built from atrous convolution modules, in a process completely symmetrical to the downsampling; the image is finally upsampled to the same size as the input image;
step S5: after the input layer, hidden layers and output layer of the network are determined, all the images and labels in the data set are sent to the network for training according to steps S1 to S4; the labels are shadow binary maps of the same size as the images, with shadow and non-shadow regions marked per pixel; the number of training epochs is determined from the convergence trend of the loss function obtained during training. The loss computation comprises two steps: the first step denotes the logits value of a sample by x and converts it into a probability with the softmax function,

y_i = e^(x_i) / Σ_j e^(x_j);

the second step calculates the loss value with the weighted cross-entropy formula, loss = -z × Σ y' × log(y), where y' is the label, y is the logits probability computed in the first step, and z is a user-defined weight;
step S6: the saved weights are used to test the images to be detected; once the weight parameters are determined, an input image produces a shadow feature map and a non-shadow feature map at the network output, and the two feature maps are then converted into the detected shadow binary map by an argmax function.
Preferably, in step S2, the selected atrous convolution module comprises four layers: the first layer is an ordinary convolution with a 3 × 3 kernel; the second layer is an atrous convolution with dilation rate 3, whose effective receptive field equals that of a 7 × 7 kernel; the third layer is the same as the second layer; the fourth layer is the same as the first layer.
Preferably, the loss function in step S5 is weighted cross-entropy loss.
Preferably, in step S6, after a single color image is input into the designed network, a shadow feature map and a non-shadow feature map are output and stored in array form. The argmax function compares the values detected at corresponding pixels in the two feature maps: if the foreground value of a pixel is larger, the pixel is considered foreground, i.e., a required shadow pixel, labeled 255 and displayed as a white portion of the shadow binary map; likewise for the background, i.e., non-shadow pixels, labeled 0 and displayed as a black portion of the shadow binary map. The detected shadow binary map is obtained in this way.
The invention adopts a dual-stream network structure: the residual stream keeps low-level image features stable during learning, while the pooling stream extracts and fuses features from low level to deep level. By introducing atrous convolution, the network detects both large-area shadows and fragmented shadows well. The method can be used to remove falsely detected figure and target shadow pixels after common algorithms such as target detection and change detection; the invention can also be used alone as an image shadow region detector.
Owing to this technical scheme, the invention has the following advantages and positive effects compared with the prior art:
1) The invention extracts features with atrous convolution modules. Unlike ordinary convolution, atrous convolution enlarges the receptive field while retaining the pixel position information of the features, which greatly reduces the missed detections caused by shadows covering regions of different texture. For example, when the shadow of a vehicle covers an asphalt road with a white lane line, other detectors easily miss the shadow on the white lane line, whereas our detector detects the whole shadow region.
2) The multi-level atrous pooling module extracts global features and fuses them with local features, so that the network captures the local and global characteristics of shadows simultaneously and can better judge whether a dark region is a shadow or merely a dark-colored region. For example, other detectors easily detect a person wearing black clothes together with the shadow as shadow, whereas our detector distinguishes the black clothes from the shadow well.
3) To reduce the computational complexity of the network, the invention designs a cross-stream residual module, which adds together the residual-stream output feature map of the previous stage, the output of that feature map after passing through an atrous convolution module, and the corresponding pooling-stream output feature map upsampled to the same size, thereby preserving low-dimensional features and accelerating training.
4) A large amount of training data is used, which strengthens the robustness of the detector: shadows on a wide variety of textures are detected accurately, and the method fully adapts to changes of the detection scene.
5) Detection is fast: the detection time for a single 480 × 480 color image is only 0.12 s, so the method is suitable for shadow detection in video sequences.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is the network structure diagram of the proposed network, which is composed of three modules, namely the atrous convolution module (ACM), the multi-level atrous pooling module (MLAPM) and the cross-stream residual module (CSRM); the legend is in the upper right corner.
Fig. 3 is the internal structure of the atrous convolution module, where rate is the dilation rate of a convolution kernel, i.e., the size of the holes injected into the kernel; with a rate of 3, a 3 × 3 kernel performs atrous convolution with a receptive field equal to that of an ordinary 7 × 7 kernel. In this module we use ordinary convolution for the first and fourth layers and atrous convolution with rate 3 for the second and third layers.
Fig. 4 is the internal structure of the multi-level atrous pooling module, which consists of two parts: the first part fuses the feature maps of the first three network layers, applying atrous convolutions with dilation rates of different sizes and then sampling the three groups of feature maps to the same size; the second part is a simplified multi-scale pyramid pooling, which obtains global features by global average pooling and then upsamples them to the same size as the first part. Finally the features of the two parts are fused into one group.
Fig. 5 is the cross-stream residual module, which fuses the information of the pooling channel with that of the residual channel and, using the feature-preservation mechanism of residual networks, retains the features of the previous stage so as to keep the local features complete.
FIG. 6 shows shadow detection results of the present invention on the SBU and ISTD data sets, where input is the color image under test and ground truth is the manually annotated shadow map.
Fig. 7 shows shadow detection on the "Bungalows" video sequence of the CDnet2012 data set, with five randomly selected frames. The second row shows the WeSamBe foreground detection algorithm, which detects the foreground object together with its shadow; the shadow, however, is a false detection. The third row shows our shadow detection algorithm applied on top of the WeSamBe result, with the detected shadow regions shown in light gray.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
As shown in fig. 1, the present embodiment discloses a shadow detection method based on a dual-stream atrous convolutional neural network, with the following specific steps:
step S1: a single color image is acquired and input into the designed network as three RGB channels;
step S2: inside the network the input image is first processed by two channels, a pooling channel and a residual channel. The pooling channel is downsampled in encoder fashion by atrous convolution modules, gradually extracting high-dimensional features such as semantics. The residual channel consists of several cross-stream residual modules: features are extracted by convolution through the atrous convolution module, and the feature map of the previous layer is superimposed with the feature information of the corresponding pooling-channel layer to preserve low-dimensional features. Notably, the selected atrous convolution module contains four layers: the first layer is an ordinary convolution with a 3 × 3 kernel; the second layer is an atrous convolution with dilation rate 3, whose receptive field equals that of a 7 × 7 kernel; the third layer is the same as the second, and the fourth the same as the first.
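To make step S2 concrete, the following is a minimal PyTorch sketch of the atrous convolution module and the cross-stream residual module as just described; the class names, the ReLU activations and the assumption of equal channel counts across the three added terms are illustrative choices, not the exact patented implementation:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AtrousConvModule(nn.Module):
        # Four-layer ACM: ordinary 3x3 convolutions in layers 1 and 4,
        # 3x3 atrous convolutions with dilation rate 3 (7x7 effective
        # receptive field) in layers 2 and 3. ReLU is an assumption.
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, 3, padding=3, dilation=3), nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, 3, padding=3, dilation=3), nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

        def forward(self, x):
            return self.body(x)

    class CrossStreamResidualModule(nn.Module):
        # CSRM: previous-stage residual-stream map + its ACM output + the
        # corresponding pooling-stream map upsampled to the same size.
        def __init__(self, ch):
            super().__init__()
            self.acm = AtrousConvModule(ch, ch)

        def forward(self, res_prev, pool_feat):
            pool_up = F.interpolate(pool_feat, size=res_prev.shape[2:],
                                    mode='bilinear', align_corners=False)
            return res_prev + self.acm(res_prev) + pool_up

With dilation 3 and padding 3, each 3 × 3 atrous layer keeps the spatial size unchanged, so the residual sum in the CSRM is well defined.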
Step S3: the feature maps from the first four layers of the pooling channel are sent to the multi-level atrous pooling module. The first part pools the feature maps of the first three layers to the same size using atrous convolutions with different dilation rates; the second part applies global average pooling to the fourth-layer feature map and bilinearly interpolates it to the same size as the first part after pooling. Finally, the four groups of feature maps are fused to obtain the final output of the downsampling stage.
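A sketch of the multi-level atrous pooling module under the same assumptions; the patent only states "different dilation rates", so the rates (1, 2, 4), the 1 × 1 fusion convolution and fusing at the deepest map's resolution are illustrative guesses:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiLevelAtrousPooling(nn.Module):
        # Part 1: the first three encoder maps pass through atrous
        # convolutions with different dilation rates and are resampled
        # to a common size. Part 2: the fourth map is globally average
        # pooled and bilinearly upsampled. All four groups are fused.
        def __init__(self, chs, out_ch, rates=(1, 2, 4)):
            super().__init__()
            self.branches = nn.ModuleList(
                nn.Conv2d(c, out_ch, 3, padding=r, dilation=r)
                for c, r in zip(chs[:3], rates))
            self.gap_proj = nn.Conv2d(chs[3], out_ch, 1)
            self.fuse = nn.Conv2d(4 * out_ch, out_ch, 1)

        def forward(self, f1, f2, f3, f4):
            size = f4.shape[2:]  # fuse at the deepest map's resolution
            outs = [F.interpolate(b(f), size=size, mode='bilinear',
                                  align_corners=False)
                    for b, f in zip(self.branches, (f1, f2, f3))]
            gap = F.adaptive_avg_pool2d(f4, 1)  # global average pooling
            outs.append(F.interpolate(self.gap_proj(gap), size=size,
                                      mode='bilinear', align_corners=False))
            return self.fuse(torch.cat(outs, dim=1))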
Step S4: the output feature map of the multi-level atrous pooling module is upsampled by a decoder built from atrous convolution modules, completely symmetrically to the downsampling process, and the image is finally upsampled to the same size as the input image.
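A corresponding decoder stage could look as follows, reusing the AtrousConvModule sketch above; the factor-2 bilinear upsampling mirrors an assumed factor-2 downsampling per encoder stage:

    import torch.nn as nn
    import torch.nn.functional as F

    class DecoderStage(nn.Module):
        # One upsampling stage, symmetric to an encoder stage:
        # bilinear 2x upsampling followed by an atrous convolution module.
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.acm = AtrousConvModule(in_ch, out_ch)  # from the sketch above

        def forward(self, x):
            x = F.interpolate(x, scale_factor=2, mode='bilinear',
                              align_corners=False)
            return self.acm(x)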
Step S5: after the input layer, hidden layers and output layer of the network are determined, the images and labels in the data set (binary maps of the same size as the images, with shadow and non-shadow regions marked per pixel) are all sent to the network for training according to the above four steps. The number of training epochs can be determined from the convergence trend of the loss function obtained during training. The loss function is the weighted cross entropy, computed in two steps: the first step denotes the logits value of a sample by x and converts it into a probability with the softmax function,

y_i = e^(x_i) / Σ_j e^(x_j);

the second step calculates the loss value with the weighted cross-entropy formula, loss = -z × Σ y' × log(y), where y' is the label, y is the logits probability computed in the first step, and z is a user-defined weight.
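The two-step loss of step S5 can be written, under the same assumptions (two-channel logits, softmax, and a mean reduction over pixels, which the patent does not specify), as:

    import torch
    import torch.nn.functional as F

    def weighted_cross_entropy(logits, label, z=1.0):
        # logits: (N, 2, H, W) raw network output; label: (N, H, W),
        # 1 = shadow, 0 = non-shadow. z is the user-defined weight.
        # Step 1: convert the logits x to probabilities y via softmax.
        y = F.softmax(logits, dim=1)
        # Step 2: weighted cross entropy, loss = -z * sum(y' * log(y)).
        y_true = F.one_hot(label.long(), num_classes=2).permute(0, 3, 1, 2).float()
        loss = -z * (y_true * torch.log(y.clamp_min(1e-8))).sum()
        return loss / label.numel()  # mean over pixels (reduction is a choice)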
Step S6: the saved weights are used to test the image to be detected. With the weight parameters fixed, the input image yields two feature maps, which are converted into the detected shadow binary map by the argmax function.
The step S6 of converting the two feature maps into a shadow binary map specifically includes the following steps:
step 6.1: the image to be detected is fed into the network and the output is stored as an array; the network divides the pixels into two classes: foreground (shadow) and background (non-shadow);
step 6.2: the argmax function compares the values at corresponding positions in the two feature maps. If the foreground value of a pixel is larger, the pixel is regarded as foreground, i.e., the required shadow region, marked 255 and displayed white; likewise the background, i.e., the non-shadow region, is marked 0 and displayed black. The detected shadow binary map is obtained in this way.
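Step 6.2 corresponds to the following sketch; the channel order (shadow = channel 1) is an assumption:

    import torch

    def to_shadow_binary_map(logits):
        # logits: (1, 2, H, W) output feature maps (non-shadow, shadow).
        pred = torch.argmax(logits, dim=1)        # per-pixel class index
        mask = (pred == 1).to(torch.uint8) * 255  # 255 = shadow, 0 = background
        return mask.squeeze(0).cpu().numpy()      # (H, W) uint8 binary map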

Claims (4)

1. A shadow detection method based on a dual-stream atrous convolutional neural network, characterized by comprising the following steps:
step S1: single images from the training set are input into the designed network in sequence, each as three RGB channels;
step S2: inside the network, the input image is first processed by two channels, a pooling channel and a residual channel, wherein: the pooling channel is downsampled in encoder fashion by atrous convolution modules, gradually extracting high-dimensional features; the residual channel is composed of several cross-stream residual modules, which extract features by convolution through the atrous convolution module and superimpose the feature map of the previous layer with the feature information of the corresponding pooling-channel layer, so as to preserve low-dimensional features;
step S3: the feature maps obtained from the first four layers of the pooling channel are sent to a multi-level atrous pooling module; the feature maps of the first three layers are pooled to the same size by atrous convolutions with different dilation rates, giving the three-layer feature maps of the first part, while the feature map of the fourth layer is globally average pooled and then bilinearly interpolated to the same size, giving the feature map of the second part; the four groups of feature maps are then fused to obtain the final output of the downsampling stage;
step S4: the output of the multi-level atrous pooling module is upsampled by a decoder built from atrous convolution modules, in a process completely symmetrical to the downsampling; the image is finally upsampled to the same size as the input image;
step S5: after the input layer, hidden layers and output layer of the network are determined, all the images and labels in the data set are sent to the network for training according to steps S1 to S4; the labels are shadow binary maps of the same size as the images, with shadow and non-shadow regions marked per pixel; the number of training epochs is determined from the convergence trend of the loss function obtained during training; the loss computation comprises two steps: the first step denotes the logits value of a sample by x and converts it into a probability with the softmax function, y_i = e^(x_i) / Σ_j e^(x_j); the second step calculates the loss value with the weighted cross-entropy formula, loss = -z × Σ y' × log(y), where y' is the label, y is the logits probability computed in the first step, and z is a user-defined weight;
step S6: the saved weights are used to test the images to be detected; once the weight parameters are determined, an input image produces a shadow feature map and a non-shadow feature map at the network output, and the two feature maps are then converted into the detected shadow binary map by an argmax function.
2. The shadow detection method based on the dual-stream atrous convolutional neural network of claim 1, characterized in that in step S2 the selected atrous convolution module comprises four layers: the first layer is an ordinary convolution with a 3 × 3 kernel; the second layer is an atrous convolution with dilation rate 3, whose effective receptive field equals that of a 7 × 7 kernel; the third layer is the same as the second layer; the fourth layer is the same as the first layer.
3. The shadow detection method based on the dual-stream atrous convolutional neural network of claim 1, characterized in that the loss function in step S5 is the weighted cross-entropy loss.
4. The shadow detection method based on the dual-stream atrous convolutional neural network of claim 1, characterized in that in step S6, after the single color image is input into the designed network, a shadow feature map and a non-shadow feature map are output and stored in array form; the argmax function compares the values detected at corresponding pixels in the two feature maps: if the foreground value of a pixel is larger, the pixel is considered foreground, i.e., a required shadow pixel, marked 255 and displayed as a white portion of the shadow binary map; likewise for the background, i.e., non-shadow pixels, marked 0 and displayed as a black portion of the shadow binary map, so that the detected shadow binary map is obtained.
CN202010449023.6A, priority date 2020-05-25, filing date 2020-05-25: Shadow detection method based on a dual-stream atrous convolutional neural network (Active; granted as CN111666842B)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010449023.6A (granted as CN111666842B), priority date 2020-05-25, filing date 2020-05-25: Shadow detection method based on a dual-stream atrous convolutional neural network

Publications (2)

Publication Number Publication Date
CN111666842A, published 2020-09-15
CN111666842B, granted 2022-08-26

Family

ID=72384497

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010449023.6A Active CN111666842B (en) 2020-05-25 2020-05-25 Shadow detection method based on double-current-cavity convolution neural network

Country Status (1)

Country Link
CN: CN111666842B

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108986124A (en) * 2018-06-20 2018-12-11 天津大学 In conjunction with Analysis On Multi-scale Features convolutional neural networks retinal vascular images dividing method
US20190130575A1 (en) * 2017-10-30 2019-05-02 Beijing Curacloud Technology Co., Ltd. Systems and methods for image segmentation using a scalable and compact convolutional neural network
CN109711448A (en) * 2018-12-19 2019-05-03 华东理工大学 Based on the plant image fine grit classification method for differentiating key field and deep learning
CN109711413A (en) * 2018-12-30 2019-05-03 陕西师范大学 Image, semantic dividing method based on deep learning
CN110084249A (en) * 2019-04-24 2019-08-02 哈尔滨工业大学 The image significance detection method paid attention to based on pyramid feature
CN110232394A (en) * 2018-03-06 2019-09-13 华南理工大学 A kind of multi-scale image semantic segmentation method
CN110781776A (en) * 2019-10-10 2020-02-11 湖北工业大学 Road extraction method based on prediction and residual refinement network
CN110852267A (en) * 2019-11-11 2020-02-28 复旦大学 Crowd density estimation method and device based on optical flow fusion type deep neural network
CN111028235A (en) * 2019-11-11 2020-04-17 东北大学 Image segmentation method for enhancing edge and detail information by utilizing feature fusion
CN111079649A (en) * 2019-12-17 2020-04-28 西安电子科技大学 Remote sensing image ground feature classification method based on lightweight semantic segmentation network
CN111179244A (en) * 2019-12-25 2020-05-19 汕头大学 Automatic crack detection method based on cavity convolution
CN111192245A (en) * 2019-12-26 2020-05-22 河南工业大学 Brain tumor segmentation network and method based on U-Net network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIANG-CHIEH CHEN, GEORGE PAPANDREOU, IASONAS KOKKINOS, KEVIN MURPHY ET AL.: "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs", IEEE Transactions on Pattern Analysis and Machine Intelligence *
PINGPING ZHANG, WEI LIU, YINJIE LEI, HONGYU WANG, HUCHUAN LU: "RAPNet: Residual Atrous Pyramid Network for Importance-Aware Street Scene Parsing", IEEE Transactions on Image Processing *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257766A (en) * 2020-10-16 2021-01-22 中国科学院信息工程研究所 Shadow recognition detection method under natural scene based on frequency domain filtering processing
CN112257766B (en) * 2020-10-16 2023-09-29 中国科学院信息工程研究所 Shadow recognition detection method in natural scene based on frequency domain filtering processing
CN112949829A (en) * 2021-03-05 2021-06-11 深圳海翼智新科技有限公司 Feature graph pooling method, data processing method and computing device
CN113065578A (en) * 2021-03-10 2021-07-02 合肥市正茂科技有限公司 Image visual semantic segmentation method based on double-path region attention coding and decoding
CN113065578B (en) * 2021-03-10 2022-09-23 合肥市正茂科技有限公司 Image visual semantic segmentation method based on double-path region attention coding and decoding
CN113052775A (en) * 2021-03-31 2021-06-29 华南理工大学 Image shadow removing method and device
CN113052775B (en) * 2021-03-31 2023-05-23 华南理工大学 Image shadow removing method and device
CN113178010A (en) * 2021-04-07 2021-07-27 湖北地信科技集团股份有限公司 High-resolution image shadow region restoration and reconstruction method based on deep learning
CN113920124A (en) * 2021-06-22 2022-01-11 西安理工大学 Brain neuron iterative segmentation method based on segmentation and error guidance
CN113870124A (en) * 2021-08-25 2021-12-31 西北工业大学 Dual-network mutual excitation learning shadow removing method based on weak supervision

Also Published As

Publication number Publication date
CN111666842B 2022-08-26

Similar Documents

Publication Publication Date Title
CN111666842B (en) Shadow detection method based on a dual-stream atrous convolutional neural network
CN109934200B (en) RGB color remote sensing image cloud detection method and system based on improved M-Net
TWI744283B (en) Method and device for word segmentation
CN114841972B (en) Transmission line defect identification method based on saliency map and semantic embedded feature pyramid
CN113887459B (en) Open-pit mining area stope change area detection method based on improved Unet +
CN111797712B (en) Remote sensing image cloud and cloud shadow detection method based on multi-scale feature fusion network
CN110033040B (en) Flame identification method, system, medium and equipment
CN106846339A (en) Image detection method and device
CN110309808B (en) Self-adaptive smoke root node detection method in large-scale space
CN111738054B (en) Behavior anomaly detection method based on space-time self-encoder network and space-time CNN
CN109840483B (en) Landslide crack detection and identification method and device
CN111681273A (en) Image segmentation method and device, electronic equipment and readable storage medium
CN111680690B (en) Character recognition method and device
CN114155527A (en) Scene text recognition method and device
CN114742799B (en) Industrial scene unknown type defect segmentation method based on self-supervision heterogeneous network
CN110555464A (en) Vehicle color identification method based on deep learning model
CN113516126A (en) Adaptive threshold scene text detection method based on attention feature fusion
CN114022408A (en) Remote sensing image cloud detection method based on multi-scale convolution neural network
CN115861756A (en) Earth background small target identification method based on cascade combination network
CN113159158A (en) License plate correction and reconstruction method and system based on generation countermeasure network
CN117557774A (en) Unmanned aerial vehicle image small target detection method based on improved YOLOv8
Cho et al. Modified perceptual cycle generative adversarial network-based image enhancement for improving accuracy of low light image segmentation
CN111931689B (en) Method for extracting video satellite data identification features on line
CN117392392B (en) Rubber cutting line identification and generation method
CN113610857B (en) Apple grading method and system based on residual error network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant