CN114037666A - Shadow detection method assisted by data set expansion and shadow image classification


Info

Publication number
CN114037666A
CN114037666A (application CN202111261591.4A)
Authority
CN
China
Prior art keywords
shadow
image
network
classification
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111261591.4A
Other languages
Chinese (zh)
Inventor
李国权
文凌云
黄正文
夏瑞阳
林金朝
庞宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202111261591.4A priority Critical patent/CN114037666A/en
Publication of CN114037666A publication Critical patent/CN114037666A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention claims a shadow detection method assisted by data set expansion and shadow image classification, belonging to the field of image processing. The method comprises the following steps: 1. designing a ShadowGAN network structure based on the generative adversarial network, generating shadow images, and expanding the original data set with the generated shadow images; 2. adding a shadow classification module to an existing shadow detection network model; 3. combining step 1 and step 2 to further improve detection accuracy. The invention provides a data set expansion method for shadow detection and a shadow detection network model assisted by shadow image classification. A generative adversarial network designed with deep neural networks expands the data set of shadow images captured in natural environments, and the classification-assisted shadow detection model identifies shadow regions in shadow images more accurately.

Description

Shadow detection method assisted by data set expansion and shadow image classification
Technical Field
The invention belongs to the field of image processing, and particularly relates to a shadow detection method.
Background
The presence of shadows can interfere with computer vision tasks such as object detection, object tracking, and semantic segmentation. On the other hand, shadows also carry information such as the illumination direction, the camera position, and the geometry of objects. Image shadow detection is therefore an important step.
Early shadow detection methods were designed on the basis of imaging models or hand-crafted features. They place high demands on image quality and must satisfy particular lighting conditions, such as Lambertian surfaces or Planckian illuminants; such methods are difficult to adapt to varied lighting and more complex environments.
More recently, shadow detection methods based on convolutional neural networks (CNNs) have achieved higher accuracy and better generalization. CNN-based methods require pixel-level image annotation, and collecting such a data set is time-consuming and expensive. Existing shadow detection data sets, such as ISTD and SBU, contain 1,870 sample pairs (1,330 for training and 540 for testing) and 4,727 sample pairs (4,089 for training and 638 for testing), respectively. The ISTD data set is smaller than the SBU data set, which to some extent limits the performance of models trained on it. On the other hand, the ISTD data set includes both shadow images and shadow-free images, and how to use the shadow-free images to improve shadow detection performance is a problem worth studying.
The closest prior art found by search is application CN201910256619.1, a method for removing shadows from a single image based on a generative adversarial network. It first designs a generative adversarial network and trains it on a shadow image data set, then trains the discriminator and generator adversarially, and finally the generator recovers a realistic shadow-removed image. That method contains only one generative adversarial network; a shadow detection sub-network and a shadow removal sub-network are designed inside the generator, a cross-stitch module adaptively fuses low-level features across the two tasks, and shadow detection serves merely as an auxiliary task to improve shadow removal performance. CN201910256619.1 thus focuses mainly on image shadow removal, with shadow detection as a secondary sub-network, whereas the present invention focuses on the shadow detection task and improves its accuracy. CN201910256619.1 uses a generative adversarial network to remove shadows; the present invention uses a generative adversarial network to expand the shadow detection data set. In addition, CN201910256619.1 trains its shadow detection sub-network with shadow images and shadow masks only, and does not fully exploit the role of shadow-free images in shadow detection. If a model can accurately identify shadows, the network can distinguish shadow from non-shadow regions in an image; furthermore, such a model should also be able to distinguish shadow images from shadow-free images.
Therefore, a shadow image classification module is added to the shadow detection network, and the network is further constrained by a binary cross-entropy loss. Since image classification depends on the semantic features learned by the network, adding this module guides the network to learn more robust shadow semantics, giving the model higher accuracy.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art. A shadow detection method assisted by data set expansion and shadow image classification is provided. The technical scheme of the invention is as follows:
a method of shadow detection assisted by data set augmentation and shadow image classification, comprising the steps of:
Step 1: randomly selecting a shadow-free image and a shadow mask from an original training set (containing shadow images, shadow-free images and shadow masks) as input to the ShadowGAN generator to obtain new shadow image samples, thereby expanding the existing shadow detection data set;
Step 2: adding a shadow image classification task into the shadow detection network; the network model after adding shadow image classification is abstracted into three parts: a feature extraction network, a shadow detection module and a shadow image classification module; the feature extraction network has a pyramid structure and extracts features such as shadow edges and semantics; the input of the shadow detection module is the feature maps of the feature pyramid, used to predict the shadow mask; and the shadow image classification module judges whether a shadow region exists in the image, classifying images into shadow-free images and shadow images;
Step 3: combining the methods of step 1 and step 2, training the shadow detection network with the added shadow image classification task on the expanded data set;
Step 4: inputting a shadow image into the model trained in step 3 to obtain the predicted shadow mask.
Further, the ShadowGAN network adjusts the bidirectional generation network model of Cycle-GAN into a unidirectional generation network, so that the network learns the conversion from the shadow-free image to the shadow image, and an $\ell_1$ loss between the real shadow image and the generated shadow image is added; the network is named ShadowGAN.
Further, data expansion using the ShadowGAN can be specifically divided into two stages: in the training stage, the input of the generator is a shadow-free image and a shadow mask image, and the shadow mask designates the shadow-generated area and outputs the shadow-generated area as a shadow image; the input of the discriminator is the generated shadow image and the real shadow image, the output is the judgment result of the category of the input image, 1 represents the shadow image, and 0 represents the non-shadow image;
in the testing stage, a shadow-free image and a shadow mask are randomly selected from the training set as input to a generator, and the generator generates shadows on the shadow-free image according to the regions designated by the shadow mask. By the method, 1,330 new shadow images can be obtained, and finally, the newly generated images are added into the original data set to achieve the purpose of data set expansion.
Further, in the data expansion by ShadowGAN, the generator and the discriminator are trained adversarially. After adding the $\ell_1$ loss, the loss function of the network is:

$$\mathcal{L}(G,D)=\mathbb{E}_{I_s\sim P_{data}}\big[\log D(I_s)\big]+\mathbb{E}_{(I_f,I_{mask})\sim P_{data}}\big[\log\big(1-D(G(I_f\oplus I_{mask}))\big)\big]+\lambda\,\mathbb{E}\big[\|I_s-G(I_f\oplus I_{mask})\|_1\big]$$

where $I_s$, $I_f$ and $I_{mask}$ respectively denote a real shadow image, a shadow-free image and the corresponding shadow mask, $P_{data}$ denotes the data distribution, $\oplus$ denotes the merge operation, $\|\cdot\|_1$ denotes the $\ell_1$ loss, $\lambda$ is a weighting coefficient, $G$ is the generator and $D$ is the discriminator.
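A minimal PyTorch sketch of this adversarial-plus-$\ell_1$ objective (the function names and the weighting `lam` are assumptions for illustration; the patent does not give a weighting value):

```python
import torch
import torch.nn.functional as F

def generator_loss(d_fake, fake_shadow, real_shadow, lam=100.0):
    """Generator objective: fool the discriminator, plus an L1 term
    pulling the generated shadow image toward the real shadow image.
    `lam` is an assumed weighting coefficient."""
    adv = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    l1 = F.l1_loss(fake_shadow, real_shadow)
    return adv + lam * l1

def discriminator_loss(d_real, d_fake):
    """Discriminator objective: label real shadow images 1,
    generated shadow images 0."""
    return (F.binary_cross_entropy(d_real, torch.ones_like(d_real))
            + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
```

The two losses are optimized alternately, as is usual for adversarial training.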
Further, in step 2, the feature extraction network is a ResNeXt101 model pre-trained on ImageNet, whose parameters are updated during training; a fully connected layer with 1000-dimensional input and 2-dimensional output is appended after the fully connected layer of ResNeXt101. During training, when the shadow detection module is activated, the input of the network is a shadow image and a shadow mask, and the output is the shadow detection result; when the shadow classification module is activated, the input is shadow and shadow-free images, and the output is the classification result of whether the image is a shadow image.
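A sketch of the added binary classification head; a dummy backbone stands in for the ImageNet-pretrained ResNeXt101 so the example is self-contained (the class and variable names are assumptions):

```python
import torch
import torch.nn as nn

class ShadowClassifier(nn.Module):
    """Wraps a backbone whose original head emits 1000 logits (e.g. an
    ImageNet ResNeXt101) and appends a 1000 -> 2 fully connected layer,
    converting the 1000-class network into a shadow / shadow-free
    binary classifier."""
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone
        self.fc = nn.Linear(1000, 2)  # added binary classification head

    def forward(self, x):
        return self.fc(self.backbone(x))

# stand-in for the real ResNeXt101 feature extractor
dummy_backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 1000))
model = ShadowClassifier(dummy_backbone)
logits = model(torch.zeros(4, 3, 8, 8))  # shape (4, 2)
```

In practice the backbone would be `torchvision`'s ResNeXt101 with pre-trained weights, updated jointly during training.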
Further, the feature extraction network is in an active state during the whole training process.
Further, the loss function of the shadow detection network comprises two parts. The first part is the shadow detection loss $\mathcal{L}_{det}$: a loss value is computed at each pixel with the binary cross-entropy loss, and the loss of the whole shadow mask is the sum of the binary cross-entropy losses over all pixels;
the second part is a shadow classification loss function
Figure BDA0003325943150000042
In order to classify shadow images, a two-classification cross-loss function is used, as follows:
Figure BDA0003325943150000043
where y is a label of the image, 1 represents that the image is a shadow image, 0 represents that the image is a shadow-free image,
Figure BDA0003325943150000044
representing the image class predicted by the network.
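The two loss terms can be sketched in PyTorch as follows (the function names are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def detection_loss(pred_mask, gt_mask):
    """Shadow detection loss: binary cross-entropy at every pixel,
    summed over the whole shadow mask."""
    return F.binary_cross_entropy(pred_mask, gt_mask, reduction='sum')

def classification_loss(y_hat, y):
    """Shadow classification loss: binary cross-entropy between the
    predicted shadow probability y_hat and the image label y
    (1 = shadow image, 0 = shadow-free image)."""
    return -(y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat))
```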
Further, in order to implement end-to-end training, the training strategy is: in each round of training, shadow detection is trained once and shadow classification is trained twice, where $i$ denotes the iteration index of the alternating updates.
In the testing phase, the shadow image classification module is discarded, the input of the model is the shadow image, and the output of the model is the predicted shadow mask.
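The 1:2 alternating schedule can be sketched as follows (the step callables are assumed stand-ins for the actual per-batch optimizer updates):

```python
def train_round(detection_step, classification_step):
    """One training round under the proposed schedule: a single shadow
    detection update followed by two shadow classification updates."""
    detection_step()
    for _ in range(2):
        classification_step()

# example: count how often each task is trained over 5 rounds
counts = {"det": 0, "cls": 0}
for _ in range(5):
    train_round(lambda: counts.__setitem__("det", counts["det"] + 1),
                lambda: counts.__setitem__("cls", counts["cls"] + 1))
```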
The invention has the following advantages and beneficial effects:
the innovation of the invention is mainly embodied in step 1 and step 2. The innovation point is mainly to provide a method for expanding a shadow detection data set and a shadow detection method assisted by shadow image classification. Existing shadow detection dataset enhancement methods are limited to geometric transformations on the original image and label, such as cropping and image flipping. The present invention proposes data augmentation using generative countermeasure networks, which may be compatible with existing data enhancement methods. In addition, existing research is limited to training shadow detection networks using only shadow images and shadow masks, without fully exploiting the effect of non-shadow images on shadow detection. The semantic features of the shadow are considered to be very important for shadow detection, if one model can accurately identify the shadow, the network can distinguish shadow and non-shadow areas in the image, and further, the model should have the capability of identifying shadow images and non-shadow images. Therefore, the shadow image classification module is added in the shadow detection network, and the network is further constrained by a two-classification cross loss function. In order to realize the end-to-end training of the network, the invention defines a training strategy of a shadow detection network assisted by shadow image classification. Since the image classification depends on the semantic features learned by the network, the addition of the shadow image classification can guide the network to learn more robust semantic features of the shadow, so that the model has higher accuracy.
Drawings
FIG. 1 is a flow chart of a shadow detection method with data set expansion and shadow image classification assistance according to a preferred embodiment of the present invention.
FIG. 2 is a schematic diagram of the structure of ShadowGAN according to the present invention.
Fig. 3 is a schematic diagram of a shadow detection model network structure assisted by shadow image classification according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
as shown in fig. 1, a shadow detection method based on data set expansion and shadow image classification assistance includes the following steps:
step 1: performing data set expansion on an existing shadow detection data set;
step 2: adding a shadow image classification task into a shadow detection network;
Step 3: combining the methods of step 1 and step 2, training the shadow detection network with the added shadow image classification task on the expanded data set;
Step 4: inputting a shadow image into the model trained in step 3 to obtain the predicted shadow mask.
Further, step 1 specifically comprises: the bidirectional generation network model of Cycle-GAN is adjusted into a unidirectional generation network, so that the network can focus on learning the conversion from the shadow-free image to the shadow image, and an $\ell_1$ loss between the real shadow image and the generated shadow image is added; the network is named ShadowGAN. Data augmentation using ShadowGAN can be divided into two stages. In the training stage, the input of the generator is a shadow-free image and a shadow mask; the shadow mask designates the region in which a shadow is generated, and the output is a shadow image. The input of the discriminator is the generated shadow image and the real shadow image, and the output is the judgment of the category of the input image, where 1 indicates a shadow image and 0 a shadow-free image. The generator and the discriminator are trained adversarially, and after adding the $\ell_1$ loss, the loss function of the network is:

$$\mathcal{L}(G,D)=\mathbb{E}_{I_s\sim P_{data}}\big[\log D(I_s)\big]+\mathbb{E}_{(I_f,I_{mask})\sim P_{data}}\big[\log\big(1-D(G(I_f\oplus I_{mask}))\big)\big]+\lambda\,\mathbb{E}\big[\|I_s-G(I_f\oplus I_{mask})\|_1\big]$$

where $I_s$, $I_f$ and $I_{mask}$ respectively denote a real shadow image, a shadow-free image and the corresponding shadow mask, $P_{data}$ denotes the data distribution, $\oplus$ denotes the merge operation, $\|\cdot\|_1$ denotes the $\ell_1$ loss, $\lambda$ is a weighting coefficient, $G$ is the generator and $D$ is the discriminator.
In the testing stage, a shadow-free image and a shadow mask are randomly selected from the training set as input to a generator, and the generator generates shadows on the shadow-free image according to the regions designated by the shadow mask. By the method, 1,330 new shadow images can be obtained, and finally, the newly generated images are added into the original data set to achieve the purpose of data set expansion.
Step 2 specifically comprises the following. For better illustration, the network model after adding shadow image classification is abstracted into three parts: a feature extraction network F, a shadow detection module D and a shadow image classification module C. F is a ResNeXt101 model pre-trained on ImageNet, whose parameters are updated during training. In order to convert the original ResNeXt101 network from a 1000-class network into a 2-class (shadow image and shadow-free image) network, a fully connected layer with 1000-dimensional input and 2-dimensional output is appended after the fully connected layer of ResNeXt101. During training, when the shadow detection module is activated, the input of the network is a shadow image and a shadow mask, and the output is the shadow detection result; when the shadow classification module is activated, the input is shadow and shadow-free images, and the output is the classification result of whether the image is a shadow image. It should be noted that the feature extraction network is active throughout the training process. The loss function of the network consists of two parts. The first part is the shadow detection loss $\mathcal{L}_{det}$: a loss value is computed at each pixel with the binary cross-entropy loss, and the loss of the whole shadow mask is the sum of the binary cross-entropy losses over all pixels.
The second part is the shadow classification loss $\mathcal{L}_{cls}$. To realize shadow image classification, the invention adopts the binary cross-entropy loss:

$$\mathcal{L}_{cls}=-\big[y\log\hat{y}+(1-y)\log(1-\hat{y})\big]$$

where $y$ is the label of the image (1 indicates a shadow image, 0 a shadow-free image) and $\hat{y}$ is the image class predicted by the network.
In order to realize end-to-end training, the invention proposes the following training strategy: in each round of training, shadow detection is trained once and shadow classification is trained twice, where $i$ denotes the iteration index of the alternating updates.
In the testing phase, the classification module is discarded, the input to the model is the shadow image, and the output of the model is the predicted shadow mask.
As shown in fig. 2, an example of the present invention provides a shadow image generation method based on generation of a countermeasure network, including:
the bidirectional generation network of Cycle-GAN is adjusted to a unidirectional generation network so that the network can focus on learning the conversion of the shadowless image to the shadow image.
An $\ell_1$ loss is added between the real shadow image and the generated shadow image, so that the generated image has more realistic details.
As shown in fig. 3, an example of the present invention provides a shadow detection method based on shadow image classification assistance, where the method includes:
and a shadow image classification module is added, a full connection layer with 1000-dimensional input and 2-dimensional output is added to the last layer of ResNeXt101, and the network is converted from 1000 classification to two classification, so that the network can learn more robust shadow features, non-shadow features are inhibited, and the shadow features are enhanced.
And adjusting a training strategy, wherein in each round of training, the shadow detection task is trained for 1 time, and the shadow classification task is trained for 2 times.
In order to verify the effectiveness of the proposed shadow detection method assisted by data set expansion and shadow image classification, the BDRAR and DSC shadow detection models are adopted as base networks for the experiments. The PyTorch deep learning framework is used; the training environment is Ubuntu 16.04, CUDA 10.0, cuDNN 7.6.5, GPU (Titan V × 4), Python 3.6.14.
For quantitative evaluation of the shadow detection results, the Balanced Error Rate (BER) is used as the evaluation index, calculated as:

$$BER=\left(1-\frac{1}{2}\left(\frac{T_p}{N_p}+\frac{T_n}{N_n}\right)\right)\times 100$$

In addition, PE denotes the error of the shadow region and NE the error of the non-shadow region, calculated as:

$$PE=\left(1-\frac{T_p}{N_p}\right)\times 100,\qquad NE=\left(1-\frac{T_n}{N_n}\right)\times 100$$

where $T_p$, $N_p$, $T_n$ and $N_n$ respectively denote the number of pixels correctly detected as shadow, the number of shadow pixels in the label, the number of pixels correctly detected as non-shadow, and the number of non-shadow pixels in the label.
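Under these definitions, the metrics can be computed as follows (a sketch; the function and argument names are assumptions):

```python
def ber_metrics(tp, n_p, tn, n_n):
    """Balanced Error Rate and per-class errors for shadow detection.
    tp: pixels correctly detected as shadow; n_p: shadow pixels in the
    label; tn: pixels correctly detected as non-shadow; n_n: non-shadow
    pixels in the label. All returned values are percentages."""
    pe = (1 - tp / n_p) * 100   # shadow-region error
    ne = (1 - tn / n_n) * 100   # non-shadow-region error
    ber = (pe + ne) / 2         # balanced error rate
    return ber, pe, ne
```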
Results of the experiment
In this example, the evaluation index BER is used to evaluate the shadow detection performance of the model. Table 1 shows the test results of BDRAR and DSC trained on the original data set and on the expanded data set; as Table 1 shows, the ShadowGAN proposed by the invention can effectively expand the data set.
Table 1: Test results of BDRAR and DSC trained on the original and augmented data sets (table image not reproduced)
Similarly, for BDRAR and DSC, the original models and the models with shadow image classification were trained on the original data set; the test results are shown in Table 2.
Table 2: Test results of BDRAR and DSC with shadow image classification trained on the original data set (table image not reproduced)
In order to verify the compatibility of the proposed data set expansion method with the shadow image classification module, BDRAR and DSC with the shadow image classification module added were trained on the expanded data set, with the following results:
Table 3: Results of data set expansion combined with the shadow image classification module (table image not reproduced)
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises it.
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.

Claims (8)

1. A shadow detection method assisted by data set expansion and shadow image classification is characterized by comprising the following steps:
step 1: randomly selecting a shadow-free image and a shadow mask from an original training set (containing shadow images, shadow-free images and shadow masks) as input to the ShadowGAN generator to obtain new shadow image samples, thereby expanding the existing shadow detection data set;
step 2: adding a shadow image classification task into the shadow detection network; the network model after adding shadow image classification is abstracted into three parts: a feature extraction network, a shadow detection module and a shadow image classification module; the feature extraction network has a pyramid structure and extracts features including shadow edges and semantics; the input of the shadow detection module is the feature maps of the feature pyramid, used to predict the shadow mask; and the shadow image classification module judges whether a shadow region exists in the image, classifying images into shadow-free images and shadow images;
step 3: combining the methods of step 1 and step 2, training the shadow detection network with the added shadow image classification task on the expanded data set;
step 4: inputting a shadow image into the model trained in step 3 to obtain the predicted shadow mask.
2. The shadow detection method assisted by data set expansion and shadow image classification as claimed in claim 1, wherein the ShadowGAN network is obtained by adapting the bidirectional generation network model of Cycle-GAN into a unidirectional generation network that learns the transformation from the shadow-free image to the shadow image, with an $\ell_1$ loss added between the real shadow image and the generated shadow image; the network is named ShadowGAN.
3. The method of claim 2, wherein the data expansion using ShadowGAN is divided into two stages: in the training stage, the input of the generator is a shadow-free image and a shadow mask image, and the shadow mask designates the shadow-generated area and outputs the shadow-generated area as a shadow image; the input of the discriminator is the generated shadow image and the real shadow image, the output is the judgment result of the category of the input image, 1 represents the shadow image, and 0 represents the non-shadow image;
in the testing stage, a shadow-free image and a shadow mask are randomly selected from the training set as input to a generator, and the generator generates shadows on the shadow-free image according to the regions designated by the shadow mask. By the method, 1,330 new shadow images can be obtained, and finally, the newly generated images are added into the original data set to achieve the purpose of data set expansion.
4. The shadow detection method assisted by data set expansion and shadow image classification as claimed in claim 3, wherein in the data expansion the generator and the discriminator are trained adversarially, and after adding the $\ell_1$ loss, the loss function of the network is:

$$\mathcal{L}(G,D)=\mathbb{E}_{I_s\sim P_{data}}\big[\log D(I_s)\big]+\mathbb{E}_{(I_f,I_{mask})\sim P_{data}}\big[\log\big(1-D(G(I_f\oplus I_{mask}))\big)\big]+\lambda\,\mathbb{E}\big[\|I_s-G(I_f\oplus I_{mask})\|_1\big]$$

where $I_s$, $I_f$ and $I_{mask}$ respectively denote a real shadow image, a shadow-free image and the corresponding shadow mask, $P_{data}$ denotes the data distribution, $\oplus$ denotes the merge operation, $\|\cdot\|_1$ denotes the $\ell_1$ loss, $\lambda$ is a weighting coefficient, $G$ is the generator and $D$ is the discriminator.
5. The shadow detection method assisted by data set expansion and shadow image classification as claimed in any one of claims 1 to 4, wherein in step 2 the feature extraction network is a ResNeXt101 model pre-trained on ImageNet, whose parameters are updated during training, and a fully connected layer with 1000-dimensional input and 2-dimensional output is appended after the fully connected layer of ResNeXt101; during training, when the shadow detection module is activated, the input of the network is a shadow image and a shadow mask, and the output is the shadow detection result; when the shadow classification module is activated, the input is shadow and shadow-free images, and the output is the classification result of whether the image is a shadow image.
6. The method of claim 5, wherein the feature extraction network is activated during the whole training process.
7. The method of claim 5, wherein the loss function of the shadow detection network comprises two parts. The first part is the shadow detection loss L_det: a loss value is computed at each pixel with the binary cross-entropy loss, and the loss of the whole shadow mask is the sum of the binary cross-entropy losses over all pixels. The second part is the shadow classification loss L_cls, which classifies shadow images using the binary cross-entropy loss, as follows:

L_cls = −[y·log ŷ + (1 − y)·log(1 − ŷ)]

where y is the label of the image (1 indicates a shadow image and 0 a shadow-free image) and ŷ is the image class predicted by the network.
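The two loss parts of claim 7 — a per-pixel binary cross-entropy summed over the whole mask, and an image-level binary cross-entropy — can be sketched directly. The tiny 2×2 mask below is a made-up example for illustration.

```python
import numpy as np

def bce(y, p, eps=1e-7):
    """Binary cross-entropy for label y in {0,1} and prediction p in (0,1)."""
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def detection_loss(gt_mask, pred_mask):
    """Shadow detection loss: sum of per-pixel binary cross-entropy
    over the whole shadow mask, as stated in the claim."""
    return float(np.sum(bce(gt_mask, pred_mask)))

def classification_loss(y, y_hat):
    """Shadow classification loss: image-level binary cross-entropy;
    y = 1 for a shadow image, 0 for a shadow-free image."""
    return float(bce(y, y_hat))

gt = np.array([[1.0, 0.0], [0.0, 1.0]])     # toy ground-truth mask
pred = np.array([[0.9, 0.1], [0.2, 0.8]])   # toy predicted mask
print(round(detection_loss(gt, pred), 3))
print(round(classification_loss(1, 0.9), 3))
```

Both parts use the same binary cross-entropy; they differ only in whether it is applied per pixel (and summed) or once per image.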
8. The method of claim 5, wherein the training strategy achieves end-to-end training as follows: in each training round, one shadow detection update and two shadow classification updates are performed, i.e.

θ ← θ − η·∇_θ L_det    (once per round)

θ ← θ − η·∇_θ L_cls    (twice per round)

where θ denotes the network parameters, η the learning rate, and L_det and L_cls the shadow detection and shadow classification losses, respectively. In the testing phase, the shadow image classification module is discarded; the input of the model is a shadow image and the output of the model is the predicted shadow mask.
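The 1:2 alternation between detection and classification updates described in claim 8 can be sketched as a simple training loop. The step functions here are hypothetical placeholders; in practice each would run a forward/backward pass with the corresponding loss.

```python
# Per-round schedule from the claim: one shadow detection update,
# then two shadow classification updates.
calls = []

def detection_step():
    """Placeholder for one optimization step on the detection loss."""
    calls.append("det")

def classification_step():
    """Placeholder for one optimization step on the classification loss."""
    calls.append("cls")

def train(num_rounds):
    for _ in range(num_rounds):
        detection_step()           # shadow detection trained once
        for _ in range(2):         # shadow classification trained twice
            classification_step()

train(num_rounds=3)
print(calls.count("det"), calls.count("cls"))  # 3 6
```

At test time the classification branch would simply never be invoked, matching the claim's statement that the classification module is discarded.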
CN202111261591.4A 2021-10-28 2021-10-28 Shadow detection method assisted by data set expansion and shadow image classification Pending CN114037666A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111261591.4A CN114037666A (en) 2021-10-28 2021-10-28 Shadow detection method assisted by data set expansion and shadow image classification

Publications (1)

Publication Number Publication Date
CN114037666A (en) 2022-02-11

Family

ID=80135649

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111261591.4A Pending CN114037666A (en) 2021-10-28 2021-10-28 Shadow detection method assisted by data set expansion and shadow image classification

Country Status (1)

Country Link
CN (1) CN114037666A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2616321A (en) * 2022-03-04 2023-09-06 Samsung Electronics Co Ltd Method and device for image shadow detection and removal
GB2616321B (en) * 2022-03-04 2024-04-03 Samsung Electronics Co Ltd Method and device for image shadow detection and removal
CN114626468A (en) * 2022-03-17 2022-06-14 小米汽车科技有限公司 Method and device for generating shadow in image, electronic equipment and storage medium
CN114626468B (en) * 2022-03-17 2024-02-09 小米汽车科技有限公司 Method, device, electronic equipment and storage medium for generating shadow in image

Similar Documents

Publication Publication Date Title
CN110738207B (en) Character detection method for fusing character area edge information in character image
Chandio et al. Precise single-stage detector
CN111259724A (en) Method and system for extracting relevant information from image and computer program product
Esmaeili et al. Fast-at: Fast automatic thumbnail generation using deep neural networks
CN111738055B (en) Multi-category text detection system and bill form detection method based on same
Sajanraj et al. Indian sign language numeral recognition using region of interest convolutional neural network
Shen et al. Vehicle detection in aerial images based on lightweight deep convolutional network and generative adversarial network
CN114037666A (en) Shadow detection method assisted by data set expansion and shadow image classification
Li et al. Robust deep neural networks for road extraction from remote sensing images
CN114202743A (en) Improved fast-RCNN-based small target detection method in automatic driving scene
Shrivastava et al. Deep learning model for text recognition in images
Ye et al. A unified scheme of text localization and structured data extraction for joint OCR and data mining
CN117557886A (en) Noise-containing tag image recognition method and system integrating bias tags and passive learning
Gao et al. Traffic sign detection based on ssd
Nguyen et al. CDeRSNet: Towards high performance object detection in Vietnamese document images
El Abbadi Scene Text detection and Recognition by Using Multi-Level Features Extractions Based on You Only Once Version Five (YOLOv5) and Maximally Stable Extremal Regions (MSERs) with Optical Character Recognition (OCR)
Nebili et al. Augmented convolutional neural network models with relative multi-head attention for target recognition in infrared images
CN112418207A (en) Weak supervision character detection method based on self-attention distillation
Li A deep learning-based text detection and recognition approach for natural scenes
Zhang et al. Small object detection using deep convolutional networks: applied to garbage detection system
CN116543391A (en) Text data acquisition system and method combined with image correction
Wang et al. Human reading knowledge inspired text line extraction
CN114998866A (en) Traffic sign identification method based on improved YOLOv4
Chauhan et al. Hand-written characters recognition using siamese network design
Meena Deshpande License Plate Detection and Recognition using YOLO v4

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination