CN106447721B - Image shadow detection method and device - Google Patents


Info

Publication number
CN106447721B
Authority
CN
China
Prior art keywords
image
shadow
detected
sample
sample image
Prior art date
Legal status
Active
Application number
CN201610817703.2A
Other languages
Chinese (zh)
Other versions
CN106447721A (en)
Inventor
姚聪
周舒畅
周昕宇
何蔚然
印奇
Current Assignee
Beijing Kuangshi Technology Co Ltd
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Kuangshi Technology Co Ltd
Beijing Megvii Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Kuangshi Technology Co Ltd, Beijing Megvii Technology Co Ltd filed Critical Beijing Kuangshi Technology Co Ltd
Priority to CN201610817703.2A
Publication of CN106447721A
Application granted
Publication of CN106447721B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides an image shadow detection method and device. The image shadow detection method comprises the following steps: acquiring an image to be detected; and processing the image to be detected by using a full convolution network to obtain a detection result regarding the position information of the shadow area in the image to be detected. According to the image shadow detection method and device provided by the embodiment of the invention, the shadow area in an image can be detected effectively and accurately by using the full convolution network, thereby improving the accuracy and reliability of image recognition in image recognition tasks. In addition, the method features high processing speed and a small model size, so it can be conveniently deployed on mobile devices such as smartphones and tablet computers.

Description

Image shadow detection method and device
Technical Field
The present invention relates to the field of image processing, and more particularly, to a method and an apparatus for detecting image shadows.
Background
For many image recognition tasks (e.g., image classification, text recognition, object tracking, etc.), shadows in the image are a serious interfering factor that can affect the accuracy and stability (i.e., reliability) of image recognition. Therefore, if shadows can be detected and the areas where they are located can be predicted before the image recognition task is executed, the difficulty of the image recognition task can be effectively reduced, and the accuracy and stability of image recognition can be improved. However, a mature technique or system that can accurately detect shadows in an image is currently lacking.
Disclosure of Invention
The present invention has been made in view of the above problems. The invention provides an image shadow detection method and device.
According to an aspect of the present invention, there is provided an image shadow detection method. The image shadow detection method comprises the following steps: acquiring an image to be detected; and processing the image to be detected by using a full convolution network to obtain a detection result of the position information of the shadow area in the image to be detected.
Illustratively, the detection result comprises a shadow probability map, and the pixel value of each pixel in the shadow probability map represents the probability that a shadow exists at the pixel with the same coordinate as the pixel in the image to be detected.
Illustratively, the image shadow detection method further comprises: acquiring training data, wherein the training data comprises a sample image set and mask information respectively corresponding to each sample image in the sample image set, and the mask information is used for indicating the position of a shadow region in the corresponding sample image; and carrying out neural network training by using the training data to obtain the full convolution network.
Illustratively, the mask information is a binary image of the same size as the corresponding sample image, pixels of the binary image having the same pixel coordinates as those located inside the shadow region of the corresponding sample image have a first pixel value, and pixels of the binary image having the same pixel coordinates as those located outside the shadow region of the corresponding sample image have a second pixel value.
Illustratively, the acquiring training data includes: acquiring an initial image set, wherein initial images in the initial image set correspond to sample images in the sample image set in a one-to-one mode; generating a predetermined number of shadow regions for each initial image in the set of initial images; for each initial image in the set of initial images, superimposing the initial image with the predetermined number of shaded regions to obtain a sample image corresponding to the initial image; and for each initial image in the initial image set, obtaining mask information corresponding to the sample image corresponding to the initial image according to the superposition position of the predetermined number of shadow areas in the initial image.
Illustratively, the generating the predetermined number of shadow regions includes: generating a plurality of image blocks; randomly selecting a number of image blocks equal to the predetermined number from the plurality of image blocks; and randomly setting the pixel values of the pixels in each of the selected image blocks to values within a preset shadow value range to obtain the predetermined number of shadow areas.
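The block-generation and superposition steps above can be sketched as follows. The rectangular block shape, the block-size limits, and the multiplicative darkening are illustrative assumptions; the patent specifies only randomly selected image blocks whose pixel values fall within a preset shadow value range.

```python
import random

def generate_shadow_regions(img_w, img_h, count, shadow_range=(0.3, 0.7),
                            seed=None):
    """Pick `count` random rectangular image blocks, each with a random
    darkening factor drawn from an assumed 'shadow value range'.
    All names and hyper-parameters here are illustrative, not the
    patent's exact procedure."""
    rng = random.Random(seed)
    regions = []
    for _ in range(count):
        w = rng.randint(img_w // 8, img_w // 2)
        h = rng.randint(img_h // 8, img_h // 2)
        x = rng.randint(0, img_w - w)
        y = rng.randint(0, img_h - h)
        factor = rng.uniform(*shadow_range)  # smaller factor = darker shadow
        regions.append((x, y, w, h, factor))
    return regions

def superimpose(image, regions):
    """Darken each region's pixels in a grayscale image (list of rows) and
    build the corresponding mask (255 inside a shadow region, 0 outside)."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    mask = [[0] * w for _ in range(h)]
    for (x, y, rw, rh, factor) in regions:
        for j in range(y, y + rh):
            for i in range(x, x + rw):
                out[j][i] = int(out[j][i] * factor)
                mask[j][i] = 255
    return out, mask
```

Superimposing the generated regions yields both the sample image and its mask information in one pass, matching the pairing the training step requires.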
Illustratively, before the training of the neural network using the training data to obtain the full convolution network, the image shadow detection method further includes: scaling a size of a sample image in the sample image set to a standard size.
Illustratively, the scaling the size of the sample image in the sample image set to a standard size comprises: for each sample image in the set of sample images, scaling the greater of the height and width of the sample image to a standard size while keeping the aspect ratio of the sample image unchanged.
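A minimal sketch of this scaling rule follows; the standard size of 256 is an assumed value, since the patent does not fix a number.

```python
def standard_scale_size(width, height, standard=256):
    """Scale the larger of width/height to `standard` while keeping the
    aspect ratio unchanged; returns the new size and the scale factor."""
    scale = standard / max(width, height)
    return round(width * scale), round(height * scale), scale
```

For example, a 512x256 sample image would be scaled to 256x128, so every training input shares the same maximum dimension.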
Illustratively, the acquiring training data includes: receiving the sample image set and annotation data corresponding to each sample image in the sample image set, wherein the annotation data comprises contour data indicating a contour of a shadow region in the corresponding sample image; and generating mask information corresponding to each sample image in the sample image set from the profile data corresponding to the sample image.
Illustratively, before the processing the image to be detected by using the full convolution network, the image shadow detection method further includes: and scaling the size of the image to be detected into a standard size.
Illustratively, the scaling the size of the image to be detected to a standard size includes: and scaling the larger one of the height and the width of the image to be detected to a standard size while keeping the aspect ratio of the image to be detected unchanged.
Illustratively, the processing the image to be detected by using the full convolution network includes: inputting the standard-size image to be detected into the full convolution network to obtain a primary result regarding the position information of the shadow area in the standard-size image to be detected; and obtaining the detection result according to the scaling ratio of the image to be detected and the primary result.
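One plausible reading of the final step is mapping pixel positions found at the standard size back into the original image's coordinate frame; the function name and coordinate convention are illustrative assumptions.

```python
def map_result_to_original(pixel_coords, scale):
    """Map shadow pixel coordinates detected in the standard-size image
    back to the original image, given the scale factor used when resizing
    (standard / max(orig_w, orig_h))."""
    inv = 1.0 / scale
    return [(round(x * inv), round(y * inv)) for (x, y) in pixel_coords]
```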
Illustratively, the full convolution network is configured to have the following network structure: an input layer, followed by two convolutional layers, then one max pooling layer, followed by three convolutional layers.
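The layer sequence can be traced to see how it transforms spatial resolution. The patent names only the layer order, so the kernel size, 'same' padding, and 2x2 pooling window below are assumptions.

```python
# Assumed hyper-parameters: 3x3 'same'-padded stride-1 convolutions, 2x2
# max pooling with stride 2. Entries are (kind, kernel, stride).
LAYERS = [
    ("conv", 3, 1), ("conv", 3, 1),                  # two convolutional layers
    ("pool", 2, 2),                                  # one max pooling layer
    ("conv", 3, 1), ("conv", 3, 1), ("conv", 3, 1),  # three convolutional layers
]

def output_size(n, layers=LAYERS):
    """Trace one spatial dimension through the network. With 'same'-padded
    stride-1 convolutions, only the pooling layer changes resolution, so a
    WxH input yields roughly a (W//2)x(H//2) output map."""
    for kind, _k, stride in layers:
        if kind == "pool":
            n = n // stride  # max pooling halves the dimension
        # stride-1 'same' convolution keeps the size unchanged
    return n
```

Because every layer is convolutional or pooling (no fully connected layers), the network accepts inputs of varying size and produces a spatially corresponding output map, which is what makes the per-pixel shadow probability map possible.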
According to another aspect of the present invention, there is provided an image shadow detection apparatus including: the image acquisition module is used for acquiring an image to be detected; and the processing module is used for processing the image to be detected by utilizing a full convolution network so as to obtain a detection result of the position information of the shadow area in the image to be detected.
Illustratively, the detection result comprises a shadow probability map, and the pixel value of each pixel in the shadow probability map represents the probability that a shadow exists at the pixel with the same coordinate as the pixel in the image to be detected.
Illustratively, the image shadow detection apparatus further includes: a data acquisition module, configured to acquire training data, where the training data includes a sample image set and mask information respectively corresponding to each sample image in the sample image set, and the mask information is used to indicate a position of a shadow region in a corresponding sample image; and a training module for carrying out neural network training by utilizing the training data to obtain the full convolution network.
Illustratively, the mask information is a binary image of the same size as the corresponding sample image, pixels of the binary image having the same pixel coordinates as those located inside the shadow region of the corresponding sample image have a first pixel value, and pixels of the binary image having the same pixel coordinates as those located outside the shadow region of the corresponding sample image have a second pixel value.
Illustratively, the data acquisition module includes: the initial image acquisition sub-module is used for acquiring an initial image set, wherein initial images in the initial image set correspond to sample images in the sample image set one by one; a shadow region generation submodule for generating a predetermined number of shadow regions for each initial image in the set of initial images; a superimposing sub-module for superimposing, for each initial image of the set of initial images, the initial image with the predetermined number of shadow regions to obtain a sample image corresponding to the initial image; and the mask information obtaining submodule is used for obtaining mask information corresponding to the sample image corresponding to the initial image according to the superposition positions of the preset number of shadow areas in the initial image for each initial image in the initial image set.
Illustratively, the shadow region generation submodule includes: an image block generation unit configured to generate a plurality of image blocks; an image block selecting unit configured to randomly select a number of image blocks equal to the predetermined number from the plurality of image blocks; and a pixel value setting unit for randomly setting pixel values of pixels in each of the selected image blocks to values within a preset shadow value range to obtain the predetermined number of shadow areas.
Illustratively, the image shadow detection apparatus further includes: a first scaling module, configured to scale a size of a sample image in the sample image set to a standard size before the training module performs neural network training using the training data to obtain the full convolution network.
Illustratively, the first scaling module comprises: a first scaling sub-module for scaling, for each sample image of the set of sample images, the greater of the height and width of the sample image to a standard size while keeping the aspect ratio of the sample image unchanged.
Illustratively, the data acquisition module includes: a receiving submodule for receiving the sample image set and annotation data corresponding to each sample image in the sample image set, respectively, wherein the annotation data comprises contour data for indicating a contour of a shadow region in the corresponding sample image; and a mask information generation submodule for generating mask information corresponding to each sample image in the sample image set from the profile data corresponding to the sample image.
Illustratively, the image shadow detection apparatus further includes: a second scaling module for scaling the size of the image to be detected to a standard size before the processing module processes the image to be detected by using the full convolution network.
Illustratively, the second scaling module comprises: and the second scaling submodule is used for scaling the larger one of the height and the width of the image to be detected to a standard size while keeping the aspect ratio of the image to be detected unchanged.
Illustratively, the processing module includes: an input sub-module for inputting the standard-size image to be detected into the full convolution network to obtain a primary result regarding the position information of the shadow area in the standard-size image to be detected; and a detection result obtaining sub-module for obtaining the detection result according to the scaling ratio of the image to be detected and the primary result.
Illustratively, the full convolution network is configured to have the following network structure: an input layer, followed by two convolutional layers, then one max pooling layer, followed by three convolutional layers.
According to the image shadow detection method and device provided by the embodiment of the invention, the shadow area in an image can be detected effectively and accurately by using the full convolution network, thereby improving the accuracy and reliability of image recognition in image recognition tasks. In addition, the method features high processing speed and a small model size, so it can be conveniently deployed on mobile devices such as smartphones and tablet computers.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in more detail embodiments of the present invention with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 shows a schematic block diagram of an example electronic device for implementing an image shadow detection method and apparatus in accordance with embodiments of the invention;
FIG. 2 shows a schematic flow diagram of an image shadow detection method according to an embodiment of the invention;
FIG. 3 shows a schematic diagram of an image to be detected and a corresponding shadow probability map, according to one embodiment of the invention;
FIG. 4 shows a schematic block diagram of the training steps of a full convolutional network according to one embodiment of the present invention;
FIG. 5 is a flow diagram illustrating the steps of obtaining training data according to one embodiment of the present invention;
FIG. 6 shows a schematic block diagram of an image shadow detection arrangement according to an embodiment of the invention; and
FIG. 7 shows a schematic block diagram of an image shadow detection system according to one embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, exemplary embodiments according to the present invention will be described in detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of embodiments of the invention and not all embodiments of the invention, with the understanding that the invention is not limited to the example embodiments described herein. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the invention described herein without inventive step, shall fall within the scope of protection of the invention.
In order to solve the above-mentioned problems, embodiments of the present invention provide a method and an apparatus for detecting a shadow in an image based on a full convolution network.
First, an exemplary electronic device 100 for implementing the image shadow detection method and apparatus according to an embodiment of the present invention is described with reference to fig. 1.
As shown in FIG. 1, electronic device 100 includes one or more processors 102, one or more storage devices 104, an input device 106, an output device 108, and an image capture device 110, which are interconnected via a bus system 112 and/or other form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 1 are exemplary only, and not limiting, and the electronic device may have other components and structures as desired.
The processor 102 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 100 to perform desired functions.
The storage device 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory (cache). The non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by processor 102 to implement client-side functionality (implemented by the processor) and/or other desired functionality in embodiments of the invention described below. Various applications and various data, such as various data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images and/or sounds) to an external (e.g., user), and may include one or more of a display, a speaker, etc.
The image capture device 110 may capture an image to be detected for shadow detection and store the captured image in the memory device 104 for use by other components. The image capture device 110 may be a camera. It should be understood that the image capture device 110 is merely an example, and the electronic device 100 may not include the image capture device 110. In this case, the image to be detected may be acquired by other image acquisition means and transmitted to the electronic apparatus 100, or the electronic apparatus 100 may download via a network or directly acquire the image to be detected from a local storage device (for example, the above-mentioned storage device 104).
Illustratively, an exemplary electronic device for implementing the image shadow detection method and apparatus according to embodiments of the present invention may be implemented on various devices having data calculation and processing capabilities, e.g., it may be implemented on a mobile device such as a smartphone, tablet, etc., a personal computer, or a remote server.
According to one aspect of the present invention, an image shadow detection method is provided. FIG. 2 shows a schematic block diagram of an image shadow detection method 200 according to one embodiment of the invention. As shown in fig. 2, the image shadow detection method 200 includes the following steps.
In step S210, an image to be detected is acquired.
The image to be detected may be any image for which shadow detection is required, such as an image for image recognition. The image to be detected can be an original image acquired by an image acquisition device such as a camera or the like, or an original image downloaded via a network or stored locally, or can be an image obtained after preprocessing the original image.
In step S220, the image to be detected is processed by using the full convolution network to obtain a detection result regarding the position information of the shadow area in the image to be detected.
The image may be represented in RGB, HSV, etc. color space. For an image with a shadow, the luminance component of the shadow region in the HSV space is generally lower than the luminance component of the non-shadow region in the HSV space, and thus, for example, the range of the shadow region may be determined according to the luminance value of each pixel in the image. That is, "shadow" and "shadow region" may be divided and defined based on parameters such as pixel luminance values, for example, the entire image may be divided into shadow regions and non-shadow regions.
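The brightness intuition in this paragraph can be sketched as a naive per-pixel baseline; the 0.35 threshold and the per-pixel rule are illustrative assumptions, and this simple heuristic is precisely what the learned full convolution network approach is meant to improve on.

```python
import colorsys

def shadow_mask_by_value(rgb_image, v_threshold=0.35):
    """Convert each RGB pixel to HSV and flag it as 'shadow' when the V
    (brightness) component falls below an assumed threshold. `rgb_image`
    is a list of rows of (r, g, b) tuples in 0..255."""
    mask = []
    for row in rgb_image:
        mask_row = []
        for (r, g, b) in row:
            _, _, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            mask_row.append(v < v_threshold)  # True = candidate shadow pixel
        mask.append(mask_row)
    return mask
```

Such a fixed threshold fails whenever dark objects are not shadows or shadows fall on bright surfaces, which motivates learning the shadow/non-shadow division from data instead.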
The image to be detected can be input into the trained full convolution network. The full convolution network may be represented as a semantic segmentation model M, which includes the network structure of the full convolution network and its parameters. Through the processing of the full convolution network, the detection result regarding the position information of the shadow area in the image to be detected can be obtained at the output end of the full convolution network.
According to one embodiment of the invention, the detection result may comprise a shadow probability map P, the pixel value of each pixel in the shadow probability map P representing the probability that a shadow exists at the same pixel as the pixel coordinate in the image to be detected.
The shadow probability map P has the same size as the image to be detected, and pixels having the same coordinates can be regarded as pixels corresponding to each other. In one example, the shadow probability map P is a grayscale image, and the pixel values of its pixels are the grayscale values of its pixels. In the case of using white pixels to represent a shadow region and black pixels to represent a non-shadow region, the greater the pixel value of a certain pixel in the shadow probability map P, the greater the probability that a shadow exists at the corresponding pixel representing the image to be detected. FIG. 3 shows a schematic diagram of an image to be detected and a corresponding shadow probability map, according to one embodiment of the invention. Referring to fig. 3, the left side shows an original image to be detected, which is an image collected for an identification card, and the right side shows a detection result output after processing the image to be detected by using a full convolution network, that is, the shadow probability map. As shown in fig. 3, a shadow region exists at the lower left corner of the to-be-detected image on the left side, the shadow region appears obviously white at the corresponding position in the shadow probability map on the right side, the remaining positions corresponding to the non-shadow regions appear obviously black, and the boundary between the corresponding position of the shadow region and the corresponding position of the non-shadow region is clear. In the embodiment shown in fig. 3, the larger the pixel value of a certain pixel in the shadow probability map P, the greater the probability that a shadow exists at the corresponding pixel representing the image to be detected. Meanwhile, the shadow probability map P shown in fig. 3 can clearly and definitely determine the position of the shadow area in the image to be detected.
Of course, it is understood that, in the case where the shadow region is represented by black pixels and the non-shadow region by white pixels, the smaller the pixel value of a pixel in the shadow probability map P, the greater the probability that a shadow exists at the corresponding pixel of the image to be detected, which is the opposite of the above case. In this case, the pixel colors of the black and white portions of the shadow probability map on the right side of fig. 3 would be interchanged.
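A downstream task might consume the shadow probability map P described above roughly as follows; the thresholding step and the 0.5 cut-off are assumed post-processing, not something the patent specifies.

```python
def locate_shadow(prob_map, threshold=0.5):
    """Turn a shadow probability map (rows of values in [0, 1]) into a set
    of shadow pixel coordinates and a bounding box by thresholding."""
    pixels = [(x, y)
              for y, row in enumerate(prob_map)
              for x, p in enumerate(row) if p >= threshold]
    if not pixels:
        return pixels, None  # no shadow detected
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return pixels, (min(xs), min(ys), max(xs), max(ys))
```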
According to the image shadow detection method provided by the embodiment of the invention, whether the shadow exists can be directly detected by utilizing the trained full convolution network, and the area where the shadow is located can be accurately segmented, so that the method has the characteristics of high precision and strong adaptability, and the precision and reliability of image identification in related image identification tasks can be greatly improved. In addition, the image shadow detection method provided by the embodiment of the invention has the characteristics of high processing speed and small model volume, so that the method can be conveniently deployed on mobile equipment such as a smart phone, a tablet computer and the like.
Illustratively, the image shadow detection method according to embodiments of the present invention may be implemented in a device, apparatus or system having a memory and a processor.
The image shadow detection method provided by the embodiment of the invention can be deployed at an image acquisition end, for example, a smart phone end. Alternatively, the image shadow detection method according to the embodiment of the present invention may also be distributively deployed at the server side (or cloud side) and the client side. For example, an image to be detected may be collected at a client, the client transmits the collected image to be detected to a server (or a cloud), and the server (or the cloud) performs image shadow detection.
According to an embodiment of the present invention, the image shadow detection method 200 may further include a training step of a neural network. Fig. 4 shows a schematic block diagram of a training step S400 of a neural network according to one embodiment of the present invention.
As shown in fig. 4, the training step S400 of the neural network may include the following steps.
In step S410, training data is obtained, wherein the training data includes a sample image set and mask information respectively corresponding to each sample image in the sample image set, and the mask information is used for indicating a position of a shadow region in the corresponding sample image.
The sample image may be any suitable shadow-containing image. The sample image may be an original image acquired by a camera, or an original image downloaded via a network or stored locally, or an image obtained after preprocessing the original image.
The sample image set may include any number of sample images, including but not limited to the case of including only one sample image. Illustratively, a large number (e.g., more than 5000) of sample images may be collected in advance and input to the electronic device 100, and the electronic device 100 performs neural network training using the sample images.
Each sample image has corresponding mask information. In one example, the sample image may be an originally acquired natural image containing a shadow, in which case, a region where the shadow is located in the sample image may be manually marked to obtain marking data, and the mask information may be generated based on the marking data. In another example, the sample images may be synthesized images including shadow regions synthesized using images containing no or substantially no shadows, in which case mask information corresponding to each sample image may be generated according to the synthesis situation.
The mask information may be in any suitable form, primarily for indicating the location of the shaded region in the corresponding sample image. In one example, the mask information may be contour data regarding the contour of a shadow region in the corresponding sample image. The contour data may include coordinates of points on the contour of the shaded region. The position of the shadow area can be known by marking the outline of the shadow area. In another example, the mask information may be a binary image of the same size as the corresponding sample image, and this example will be described in detail below. The mask information in the above form is only an example and not a limitation, and the present invention is not limited thereto.
In step S420, a neural network is trained with the training data to obtain the full convolution network.
Each sample image and its corresponding mask information (e.g., a binary image) are taken as input, and a neural network is trained to obtain the full convolution network. The full convolution network has initial parameters, which may be empirical values; these parameters are continuously updated during the training process, and the desired final parameters are obtained when training is completed. In step S420, the full convolution network may be trained by a conventional neural network training method, for example by using a back propagation algorithm. The full convolution network finally obtained after training can be used for image shadow detection.
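The update-parameters-until-convergence idea can be illustrated with a toy stand-in: a one-feature logistic model trained by gradient descent on per-pixel binary cross-entropy against the mask labels. This replaces the full convolution network with two scalar parameters and is in no way the patented training procedure; it only shows how initial parameters are iteratively driven toward final ones by the supervision signal.

```python
import math

def predict(w, b, x):
    """Logistic output p = sigmoid(w*x + b)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def train_pixel_classifier(samples, epochs=500, lr=0.5):
    """Full-batch gradient descent on binary cross-entropy. `samples` is a
    list of (brightness in [0, 1], label) pairs, label 1 = shadow pixel.
    Hyper-parameters are arbitrary illustrative choices."""
    w, b = 0.0, 0.0                      # initial parameters (cf. the text)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in samples:
            p = predict(w, b, x)
            gw += (p - y) * x            # dBCE/dw for one sample
            gb += (p - y)                # dBCE/db for one sample
        w -= lr * gw / len(samples)      # parameter update step
        b -= lr * gb / len(samples)
    return w, b
```

After training, dark pixels receive high shadow probability and bright pixels low probability, mirroring in miniature how back propagation fits the full convolution network to the mask supervision.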
By the training method, the full convolution network with higher detection precision can be obtained through training.
According to the embodiment of the present invention, the mask information is a binary image of the same size as the corresponding sample image, pixels of the binary image having the same pixel coordinates as those located inside the shadow region of the corresponding sample image have the first pixel values, and pixels of the binary image having the same pixel coordinates as those located outside the shadow region of the corresponding sample image have the second pixel values.
The mask information may be obtained by performing binarization processing on the sample image. Image binarization sets the grayscale value of each pixel in the image to either 0 or 255, so that the whole image presents a clear black-and-white effect. In one example, a binary image may be obtained by setting the grayscale value of the pixels inside the shadow region in the sample image to 255 and the grayscale values of the remaining pixels to 0. In this way, in the binary image the shadow region appears white, the rest appears black, and the boundary between the shadow region and the non-shadow region is very clear, thereby segmenting the shadow region. The binary image obtained according to this example is similar in appearance to the right-hand image shown in fig. 3. In another example, a binary image may be obtained by setting the grayscale value of the pixels inside the shadow region in the sample image to 0 and the grayscale values of the remaining pixels to 255. In this way, in the binary image the shadow region appears black and the rest appears white, which likewise segments the shadow region.
Thus the binary image has the same size as the sample image, with pixels in one-to-one correspondence, and can be regarded as converted from the sample image; the difference is that the binary image emphasizes only the shadow region and discards the remaining information contained in the image.
Since a binary image contains only two grayscale values, it is very simple, its data size is small, and it highlights the outline of the target of interest. In addition, the amount of computation required to process a binary image is relatively small. Therefore, using a binary image to record the position of the shadow region in the sample image is a simple and efficient approach, and it facilitates the subsequent training of the full convolution network.
As described above, the sample image may be obtained by means of image synthesis, which is described below in conjunction with fig. 5. Fig. 5 shows a flowchart of the step of acquiring training data (step S410) according to one embodiment of the present invention. According to the present embodiment, step S410 may include the following steps.
In step S412, an initial image set is obtained, wherein the initial images in the initial image set correspond to the sample images in the sample image set in a one-to-one manner.
The initial image may be any suitable natural image that contains as little shadow as possible; preferably, the initial image contains no shadow at all. The fewer shadows the initial image contains, the higher the detection accuracy of the trained full convolution network. The initial image may be an original image acquired by an image acquisition device such as a camera, an original image downloaded via a network or stored locally, or an image obtained after preprocessing such an original image.
A large number (e.g., more than 5000) of initial images may be collected and denoted as the set S = {I1, I2, ..., IN}, i.e., the initial image set, where N represents the number of initial images.
In step S414, a predetermined number of shaded regions are generated for each initial image in the initial image set.
For each initial image Ik (k = 1, ..., N) in the set S, H shadow regions may be generated, where H is the predetermined number described herein and is a configurable parameter. The number of shadow regions generated in step S414 may be set as needed, which is not limited by the present invention. In one example, the value range of the predetermined number may be [1, 5]. It should be understood that the value range of the predetermined number may be set according to actual requirements. In general, there are not too many regions in an image where shadows exist, so the number of shadow regions generated in step S414 does not need to be large. The number of shadow regions generated (i.e., the predetermined number) may or may not be the same for any two different initial images in the initial image set.
In one example, step S414 may include: generating a plurality of image blocks; randomly selecting a number of image blocks equal to the predetermined number from the plurality of image blocks; and randomly setting pixel values of pixels in each of the selected image blocks to values within a preset shadow value range to obtain a predetermined number of shadow areas.
The number of generated image blocks may equal the number of required shadow regions, or may be larger. A number of image blocks equal to the predetermined number may then be selected from the generated image blocks, so that the selected image blocks correspond one-to-one to the predetermined number of shadow regions. Image blocks may be obtained by partitioning an image (which may be an arbitrary blank image) in any way. For example, a straight line may be randomly generated on a blank image, dividing the image into two parts; one of the parts may then be randomly selected as the image block, in which case the predetermined number is 1. If both parts divided by the straight line are taken as image blocks, the predetermined number is 2. According to another example, figures of arbitrary shape, such as circles, squares, or triangles, can be generated on a blank image, and the image block enclosed by the outline of each figure is a desired image block. As many such figures can be generated as the desired predetermined number requires. For example, if the predetermined number is 5, five figures may be generated, where the shape and/or size of any two different figures may be the same or different.
The preset shadow value range may be any suitable range, and the present invention is not limited in this regard. For example, the preset shadow value range may be [-255, -128], where values within this range refer to grayscale values. The pixel value of each pixel in an image block may be set randomly, i.e., the pixel values of different pixels may differ. However, it should be understood that all pixels in the same image block may be set to the same pixel value, and pixels in any two different image blocks may also be set to the same pixel value.
In the above manner, a predetermined number of shadow areas can be automatically generated.
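As a concrete illustration of the generation step, the following NumPy sketch produces a given number of shadow regions as additive grayscale layers. It uses randomly placed rectangles rather than the lines, circles, or triangles mentioned above, and `generate_shadow_regions` with its `shadow_range` default are illustrative assumptions, not details fixed by the patent:

```python
import numpy as np

def generate_shadow_regions(height, width, count, shadow_range=(-255, -128), seed=None):
    """Generate `count` rectangular shadow regions for an image of the given
    size.  Each region is an additive grayscale layer: pixels inside a
    randomly placed rectangle take random values drawn from `shadow_range`,
    and the remaining pixels are zero (no effect on the image)."""
    rng = np.random.default_rng(seed)
    layers = []
    for _ in range(count):
        layer = np.zeros((height, width), dtype=np.int16)
        y0, x0 = rng.integers(0, height // 2), rng.integers(0, width // 2)
        y1, x1 = rng.integers(y0 + 1, height + 1), rng.integers(x0 + 1, width + 1)
        lo, hi = shadow_range
        layer[y0:y1, x0:x1] = rng.integers(lo, hi + 1, size=(y1 - y0, x1 - x0))
        layers.append(layer)
    return layers

layers = generate_shadow_regions(64, 64, count=3, seed=0)
```

Here `count` plays the role of the predetermined number, and each layer can later be superimposed on an initial image.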
In step S416, for each initial image in the initial image set, the initial image is superimposed with a predetermined number of shaded regions to obtain a sample image corresponding to the initial image.
The sample image can be synthesized by superimposing the shadow regions on the initial image. The superimposition position of each shadow region can be set arbitrarily. The generated shadow regions are superimposed on the initial image Ik to obtain a synthesized image (i.e., a sample image) I'k, and the synthesized images together form the sample image set.
In step S418, for each initial image in the initial image set, mask information corresponding to the sample image corresponding to the initial image is obtained according to the superimposition positions of the predetermined number of shadow regions in the initial image.
In the case where the superimposition positions of the shadow regions in the initial image are determined (which may be before, after, or at the same time as superimposing the shadow regions on the initial image), the mask information described herein may be determined according to the superimposition positions of the shadow regions. For example, in the case where the mask information is a binary image, the grayscale value of the pixels of the sample image within the superimposition positions of the shadow regions may be set to 255, and the grayscale value of the pixels outside the superimposition positions may be set to 0, thereby obtaining a binary image Lk. The binary image Lk obtained in this manner is an image of the same size as the sample image I'k; for the shadow regions of I'k, the pixels at the corresponding positions in Lk have grayscale value 255, and the pixels at the remaining positions have grayscale value 0.
Through the above steps, a set T = {(I'1, L1), (I'2, L2), ..., (I'N, LN)} may be constructed, which comprises the sample image set and the mask information corresponding to each sample image in the sample image set; this set T is the training data.
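The construction of one sample/mask pair can be sketched as follows. Treating the shadow regions as additive grayscale offsets, and the name `synthesize_sample`, are assumptions for illustration rather than details fixed by the patent:

```python
import numpy as np

def synthesize_sample(initial_image, shadow_layers):
    """Superimpose additive shadow layers on an initial grayscale image and
    build the matching binary mask: 255 inside any shadow region, 0 elsewhere."""
    sample = initial_image.astype(np.int16)
    mask = np.zeros(initial_image.shape, dtype=np.uint8)
    for layer in shadow_layers:
        sample = sample + layer
        mask[layer != 0] = 255
    return np.clip(sample, 0, 255).astype(np.uint8), mask

image = np.full((8, 8), 200, dtype=np.uint8)
layer = np.zeros((8, 8), dtype=np.int16)
layer[2:5, 2:5] = -150                       # shadow offset inside a 3x3 block
sample, mask = synthesize_sample(image, [layer])
# Inside the block the grayscale drops from 200 to 50; the mask marks the
# same block with 255, giving one (I'k, Lk) pair of the set T.
```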
The required training data is obtained by adopting the mode of automatically synthesizing the sample image, so that the complicated process of manually collecting and marking the data can be omitted, and the time cost and the labor cost can be reduced. In addition, since the shadow area is generated by calculation, the position of the shadow area is controllable, and the accuracy of the mask information for indicating the position of the shadow area is high, so that the detection accuracy of the trained full convolution network is high.
According to the embodiment of the present invention, before step S420, the image shadow detection method 200 may further include: the sizes of the sample images in the sample image set are scaled to a standard size.
The full convolution network can process images of various sizes, so sample images of their original sizes can be input directly into the full convolution network for training. Of course, the sample image may also be scaled to a standard size, i.e., normalized, before being input into the full convolution network. An image that is too small makes shadow regions inconvenient to identify, while an image that is too large increases the amount of computation; therefore, the sample image can be normalized to a suitable size and then input into the full convolution network for training.
According to an embodiment of the present invention, scaling the size of the sample image in the sample image set to a standard size may include: for each sample image in the sample image set, scaling the greater of the height and width of the sample image to a standard size while keeping the aspect ratio of the sample image unchanged.
For each sample image, the height and width of the sample image may be compared, and for the larger of the two, scaled to a standard size with the aspect ratio unchanged. The standard size may be set as needed, for example, it may be 160 pixels, 192 pixels, 224 pixels, 256 pixels, and the like.
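This aspect-ratio-preserving scaling rule can be expressed compactly; `scale_to_standard` is an illustrative helper, and rounding the shorter side to the nearest integer is an assumption:

```python
def scale_to_standard(height, width, standard=224):
    """Scale the larger of (height, width) to `standard` while keeping the
    aspect ratio; returns the new height, new width, and the scale factor."""
    scale = standard / max(height, width)
    return round(height * scale), round(width * scale), scale

h, w, s = scale_to_standard(480, 640, standard=224)
# The width (640) is the larger side, so it becomes 224 and the height
# becomes 168, preserving the 4:3 aspect ratio.
```

The returned scale factor is the ratio that step S220 later needs in order to map the network output back to the original image size.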
It is to be noted that, in the above-described embodiment in which the mask information is a binary image, the binary image is to be kept in conformity with the size of the sample image. Therefore, if the sample image is scaled, the binary image may be scaled accordingly, or after the sample image is scaled, the binary image may be generated based on the scaled sample image.
According to one embodiment of the present invention, the shadow area can be obtained by labeling. In this case, step S410 may include: receiving a sample image set and annotation data corresponding to each sample image in the sample image set, wherein the annotation data comprises contour data indicating a contour of a shadow region in the corresponding sample image; and generating mask information corresponding to each sample image in the sample image set from the profile data corresponding to the sample image.
As described above, the training data may be obtained by way of manual collection and labeling. In this case, the sample image may be a natural image containing a shadow. After a large number of sample images are collected, the shadow regions on each sample image can be labeled by a labeling person, for example, typically by labeling points on the outline of the shadow regions, from which labeling data can be obtained. After receiving the sample image and the corresponding annotation data, the electronic device 100 can determine the position of the shadow region in the sample image according to the annotation data, so that the mask information can be generated. For example, after receiving the sample image and the corresponding annotation data, the grayscale value of the pixel inside the outline of the shadow region in the sample image may be set to 255 and the grayscale value of the pixel outside the outline of the shadow region in the sample image may be set to 0 according to the annotation data to obtain a binary image.
According to one embodiment of the invention, the full convolutional network is configured to have the following network structure: an input layer, followed by two convolutional layers, connected to a max pooling layer, followed by three convolutional layers.
The full convolutional network is a deep neural network structure mainly comprising convolutional layers, pooling layers, and up-sampling layers. In this embodiment, equivalent convolutional layers are used to replace the fully-connected layers of a traditional network, which can simplify the operation to a certain extent and reduce the amount of data computation. Of course, a full convolutional network according to an embodiment of the present invention may also be implemented without replacing fully-connected layers with equivalent convolutional layers, i.e., still using conventional fully-connected layers.
Receiving training data or an image to be detected by an input layer of the full convolution network; the next are two convolutional layers, illustratively, each of which has a number of filters of 6 and a filter size of 3x 3; subsequently, a max-pooling layer (max-pooling layer) is attached; the next are two convolutional layers, illustratively, each of which has a number of filters of 12 and a filter size of 3x 3; subsequently, connecting a maximum pooling layer; the next are three convolutional layers, illustratively, each of the three convolutional layers has a number of filters of 16 and a filter size of 3x 3; subsequently, connecting a maximum pooling layer; the next are three convolutional layers, each of which has a number of filters of 24 and a filter size of 3x 3; subsequently, connecting a maximum pooling layer; the next are three convolutional layers, each of which has a number of filters of 32 and a filter size of 3x 3.
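To make the exemplary stack above easier to follow, the sketch below traces the feature-map size through it. The assumptions (not stated explicitly in the text) are that the 3x3 convolutions use padding 1 and therefore preserve spatial size, and that each max pooling layer is 2x2 with stride 2:

```python
# Layer stack from the exemplary structure: (kind, number_of_filters) pairs.
STACK = (
    [("conv", 6)] * 2 + [("pool", None)] +
    [("conv", 12)] * 2 + [("pool", None)] +
    [("conv", 16)] * 3 + [("pool", None)] +
    [("conv", 24)] * 3 + [("pool", None)] +
    [("conv", 32)] * 3
)

def trace_shapes(height, width):
    """Trace the spatial size of the feature map through the stack, assuming
    size-preserving 3x3 convolutions (padding 1) and size-halving 2x2 max
    pooling (stride 2)."""
    shapes = [(height, width)]
    for kind, _ in STACK:
        h, w = shapes[-1]
        shapes.append((h, w) if kind == "conv" else (h // 2, w // 2))
    return shapes

shapes = trace_shapes(224, 224)
# The four pooling layers each halve the resolution, so a 224x224 input
# yields a 14x14 feature map (an overall downsampling factor of 16).
```

The up-sampling layers mentioned above would then restore this coarse map to the input resolution so the output probability map matches the image pixel-for-pixel.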
The sample set T may be sent to a designed full convolution network for training to obtain the model M. In the training process, a sample image and corresponding mask information are input into the model M each time, the initial learning rate is 0.00000001, and the learning rate is reduced to 1/10 after 10000 iterations. After 100000 iterations, the training process is terminated and a trained full convolution network can be obtained. Subsequently, when the shadow in the image needs to be detected, the image to be detected can be input into the trained full convolution network for processing, and the shadow probability map is output.
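The learning-rate schedule just described (initial rate 0.00000001, divided by 10 every 10000 iterations, training terminated after 100000 iterations) can be written as a small helper; `learning_rate` is an illustrative name:

```python
def learning_rate(iteration, initial=1e-8, decay=0.1, step=10000, max_iter=100000):
    """Learning rate per the exemplary schedule: start at `initial` and
    multiply by `decay` every `step` iterations; training terminates at
    `max_iter` iterations."""
    if iteration >= max_iter:
        raise ValueError("training terminates after max_iter iterations")
    return initial * decay ** (iteration // step)

# The first 10000 iterations run at 1e-8, iterations 10000-19999 at 1e-9,
# and so on, until training stops at iteration 100000.
```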
It should be understood that the network structure of the full convolutional network described above is merely an example and not a limitation, and the full convolutional network may have any other suitable network structure, wherein the convolutional layers, the pooling layers, and the like may have other suitable numbers, and the filters in each layer may have other suitable numbers and sizes. The maximum pooling layer may be replaced with another pooling layer such as an average pooling layer.
According to an embodiment of the present invention, before step S220, the image shadow detection method 200 may further include: and scaling the size of the image to be detected to be a standard size.
As described above, the full convolution network can process images of various sizes, and thus, an image to be detected of an original size can be directly input to the full convolution network for processing. Of course, similar to scaling the sample image, the image to be detected may also be scaled during processing of the image to be detected using the full convolution network. Similarly, since the image is too small to be convenient for identifying the shadow area and too large to be calculated, the image to be detected can be normalized to be of a proper size and then input into the full convolution network for processing. It is noted that the scale of the image to be detected may be the same as or different from the scale of any of the sample images. It is to be understood that the scaling may be the same or different for different sample images in the sample image set.
According to an embodiment of the present invention, the scaling the size of the image to be detected to the standard size may include: and scaling the larger one of the height and the width of the image to be detected to a standard size while keeping the aspect ratio of the image to be detected unchanged.
The height and width of the image to be detected can be compared and scaled to a standard size for the larger of the two, while the aspect ratio is unchanged. The standard size may be set as needed, for example, it may be 160 pixels, 192 pixels, 224 pixels, 256 pixels, and the like.
According to an embodiment of the present invention, step S220 may include: inputting the standard-size image to be detected into a full convolution network to obtain a primary result of position information of a shadow area in the standard-size image to be detected; and obtaining a detection result according to the scaling and the primary result of the image to be detected.
If the image to be detected is scaled before being processed by the full convolution network, its size is changed, so the result (the primary result) output by the full convolution network corresponds to the scaled image rather than the original image to be detected; the detection result corresponding to the original image can then be obtained from the scaling ratio of the image to be detected and the primary result. For example, suppose the full convolution network outputs a shadow probability map and the image to be detected was enlarged by a factor of 2. The shadow probability map directly output by the network may then be scaled inversely, i.e., reduced by a factor of 2, to obtain the final detection result: a shadow probability map whose pixels correspond one-to-one to the pixels of the original image to be detected.
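Mapping the primary result back to the original size amounts to resizing the probability map by the inverse of the scale factor. Below is a minimal nearest-neighbour resize in NumPy; the interpolation choice and the name `resize_nearest` are assumptions, and any standard image-resize routine would serve equally well:

```python
import numpy as np

def resize_nearest(prob_map, out_height, out_width):
    """Nearest-neighbour resize of a 2-D probability map, used here to map
    the network output back to the original (pre-scaling) image size."""
    in_h, in_w = prob_map.shape
    rows = np.arange(out_height) * in_h // out_height
    cols = np.arange(out_width) * in_w // out_width
    return prob_map[rows][:, cols]

# A 4x4 primary result restored to the 8x8 original image size,
# i.e. undoing a 2x downscale applied before the network.
scaled_output = np.arange(16, dtype=np.float64).reshape(4, 4) / 16.0
restored = resize_nearest(scaled_output, 8, 8)
```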
According to another aspect of the present invention, there is provided an image shadow detecting apparatus. FIG. 6 shows a schematic block diagram of an image shadow detection apparatus 600 according to an embodiment of the invention.
As shown in fig. 6, the image shadow detection apparatus 600 according to the embodiment of the present invention includes an image acquisition module 610 and a processing module 620. The various modules may perform the various steps/functions of the image shadow detection method described above in connection with fig. 2-5, respectively. Only the main functions of the respective blocks of the image shadow detection apparatus 600 will be described below, and details that have been described above will be omitted.
The image obtaining module 610 is used for obtaining an image to be detected. The image acquisition module 610 may be implemented by the processor 102 in the electronic device shown in fig. 1 executing program instructions stored in the storage 104.
The processing module 620 is configured to process the image to be detected by using a full convolution network to obtain a detection result regarding the position information of the shadow area in the image to be detected. The processing module 620 may be implemented by the processor 102 in the electronic device shown in fig. 1 executing program instructions stored in the storage 104.
According to the embodiment of the invention, the detection result comprises a shadow probability map, and the pixel value of each pixel in the shadow probability map represents the probability that a shadow exists at the pixel with the same coordinate in the image to be detected.
According to the embodiment of the present invention, the image shadow detection apparatus 600 further includes: a data acquisition module, configured to acquire training data, where the training data includes a sample image set and mask information respectively corresponding to each sample image in the sample image set, and the mask information is used to indicate a position of a shadow region in a corresponding sample image; and the training module is used for carrying out neural network training by utilizing the training data to obtain the full convolution network.
According to an embodiment of the present invention, the mask information is a binary image of the same size as the corresponding sample image, pixels of the binary image having the same pixel coordinates as those located inside the shadow area of the corresponding sample image have a first pixel value, and pixels of the binary image having the same pixel coordinates as those located outside the shadow area of the corresponding sample image have a second pixel value.
According to an embodiment of the present invention, the data acquisition module includes: an initial image acquisition sub-module for acquiring an initial image set, wherein initial images in the initial image set correspond to sample images in the sample image set one by one; a shadow region generation submodule for generating a predetermined number of shadow regions for each initial image in the set of initial images; a superimposing sub-module for superimposing, for each initial image of the set of initial images, the initial image with the predetermined number of shadow regions to obtain a sample image corresponding to the initial image; and a mask information obtaining submodule for obtaining, for each initial image in the initial image set, mask information corresponding to the sample image corresponding to the initial image according to the superimposition positions of the predetermined number of shadow regions in the initial image.
According to an embodiment of the present invention, the shadow region generation submodule includes: an image block generation unit configured to generate a plurality of image blocks; an image block selecting unit configured to randomly select a number of image blocks equal to the predetermined number from the plurality of image blocks; and a pixel value setting unit for randomly setting pixel values of pixels in each of the selected image blocks to values within a preset shadow value range to obtain the predetermined number of shadow areas.
According to the embodiment of the present invention, the image shadow detection apparatus 600 further includes: a first scaling module, configured to scale a size of a sample image in the sample image set to a standard size before the training module performs neural network training using the training data to obtain the full convolution network.
According to an embodiment of the present invention, the first scaling module includes: a first scaling sub-module for scaling, for each sample image of the set of sample images, the greater of the height and width of the sample image to a standard size while keeping the aspect ratio of the sample image unchanged.
According to an embodiment of the present invention, the data acquisition module includes: a receiving submodule for receiving the sample image set and annotation data corresponding to each sample image in the sample image set, respectively, wherein the annotation data comprises contour data for indicating a contour of a shadow region in the corresponding sample image; and a mask information generation submodule for generating mask information corresponding to each sample image in the sample image set from the contour data corresponding to the sample image.
According to the embodiment of the present invention, the image shadow detection apparatus 600 further includes: and the second scaling module is used for scaling the size of the image to be detected to a standard size before the processing module utilizes the full convolution network to process the image to be detected.
According to an embodiment of the present invention, the second scaling module includes: and the second scaling submodule is used for scaling the larger one of the height and the width of the image to be detected to a standard size while keeping the aspect ratio of the image to be detected unchanged.
According to an embodiment of the present invention, the processing module 620 includes: an input sub-module for inputting the image to be detected in a standard size into the full convolution network to obtain a primary result regarding position information of a shadow region in the image to be detected in a standard size; and the detection result obtaining submodule is used for obtaining the detection result according to the scaling of the image to be detected and the primary result.
According to an embodiment of the invention, the full convolutional network is configured to have the following network structure: an input layer, followed by two convolutional layers, connected to a max pooling layer, followed by three convolutional layers.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
FIG. 7 shows a schematic block diagram of an image shadow detection system 700 according to one embodiment of the invention. Image shadow detection system 700 includes an image acquisition device 710, a storage device 720, and a processor 730.
The image acquisition device 710 is used for acquiring an image to be detected. The image acquisition device 710 is optional, and the image shadow detection system 700 may not include it.
The storage 720 stores program codes for implementing respective steps in the image shadow detection method according to the embodiment of the present invention.
The processor 730 is configured to run the program codes stored in the storage 720 to execute the corresponding steps of the image shadow detection method according to the embodiment of the invention, and is configured to implement the image acquisition module 610 and the processing module 620 in the image shadow detection apparatus 600 according to the embodiment of the invention.
In one embodiment, the program code, when executed by the processor 730, causes the image shadow detection system 700 to perform the steps of: acquiring an image to be detected; and processing the image to be detected by using a full convolution network to obtain a detection result of the position information of the shadow area in the image to be detected.
In one embodiment, the detection result comprises a shadow probability map, and the pixel value of each pixel in the shadow probability map represents the probability that a shadow exists at the pixel with the same coordinate in the image to be detected.
In one embodiment, the program code, when executed by the processor 730, further causes the image shadow detection system 700 to perform: acquiring training data, wherein the training data comprises a sample image set and mask information respectively corresponding to each sample image in the sample image set, and the mask information is used for indicating the position of a shadow region in the corresponding sample image; and carrying out neural network training by using the training data to obtain the full convolution network.
In one embodiment, the mask information is a binary image of the same size as the corresponding sample image, pixels of the binary image having the same pixel coordinates as those located inside the shadow region of the corresponding sample image have a first pixel value, and pixels of the binary image having the same pixel coordinates as those located outside the shadow region of the corresponding sample image have a second pixel value.
In one embodiment, the steps of acquiring training data performed by the image shadow detection system 700 when the program code is executed by the processor 730 include: acquiring an initial image set, wherein initial images in the initial image set correspond to sample images in the sample image set in a one-to-one mode; generating a predetermined number of shadow regions for each initial image in the set of initial images; for each initial image in the set of initial images, superimposing the initial image with the predetermined number of shaded regions to obtain a sample image corresponding to the initial image; and for each initial image in the initial image set, obtaining mask information corresponding to the sample image corresponding to the initial image according to the superposition position of the predetermined number of shadow areas in the initial image.
In one embodiment, the program code when executed by the processor 730 causes the image shadow detection system 700 to perform the step of generating a predetermined number of shadow regions comprising: generating a plurality of image blocks; randomly selecting a number of image blocks equal to the predetermined number from the plurality of image blocks; and randomly setting the pixel values of the pixels in each of the selected image blocks to values within a preset shadow value range to obtain the predetermined number of shadow areas.
In one embodiment, before the program code when executed by the processor 730 causes the image shadow detection system 700 to perform the step of neural network training using the training data to obtain the full convolutional network, the program code when executed by the processor 730 further causes the image shadow detection system 700 to perform: scaling a size of a sample image in the sample image set to a standard size.
In one embodiment, the program code, when executed by the processor 730, causes the image shadow detection system 700 to perform the step of scaling the size of a sample image in the sample image set to a standard size comprising: for each sample image in the set of sample images, scaling the greater of the height and width of the sample image to a standard size while keeping the aspect ratio of the sample image unchanged.
In one embodiment, the steps of acquiring training data performed by the image shadow detection system 700 when the program code is executed by the processor 730 include: receiving the sample image set and annotation data corresponding to each sample image in the sample image set, wherein the annotation data comprises contour data indicating a contour of a shadow region in the corresponding sample image; and generating mask information corresponding to each sample image in the sample image set from the profile data corresponding to the sample image.
In one embodiment, before the program code when executed by the processor 730 causes the image shadow detection system 700 to perform the step of processing the image to be detected using a full convolutional network, the program code when executed by the processor 730 further causes the image shadow detection system 700 to perform: and scaling the size of the image to be detected into a standard size.
In one embodiment, the program code when executed by the processor 730 causes the image shadow detection system 700 to perform the step of scaling the size of the image to be detected to a standard size comprising: and scaling the larger one of the height and the width of the image to be detected to a standard size while keeping the aspect ratio of the image to be detected unchanged.
In one embodiment, the program code, when executed by the processor 730, causes the image shadow detection system 700 to perform processing of the image to be detected using a full convolutional network, including: inputting the image to be detected with a standard size into the full convolution network to obtain a primary result of position information on a shadow area in the image to be detected with a standard size; and obtaining the detection result according to the scaling of the image to be detected and the primary result.
In one embodiment, the full convolutional network is configured to have the following network structure: an input layer, followed by two convolutional layers, connected to a max pooling layer, followed by three convolutional layers.
Further, according to an embodiment of the present invention, there is also provided a storage medium on which program instructions are stored which, when executed by a computer or a processor, execute the respective steps of the image shadow detection method according to an embodiment of the present invention and implement the respective modules of the image shadow detection apparatus according to an embodiment of the present invention. The storage medium may include, for example, a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, or any combination of the above storage media.
In one embodiment, the computer program instructions, when executed by a computer or a processor, may cause the computer or the processor to implement the respective functional modules of the image shadow detection apparatus according to the embodiment of the present invention and/or may perform the image shadow detection method according to the embodiment of the present invention.
In one embodiment, the computer program instructions, when executed by a computer, cause the computer to perform the steps of: acquiring an image to be detected; and processing the image to be detected by using a full convolution network to obtain a detection result of the position information of the shadow area in the image to be detected.
In one embodiment, the detection result comprises a shadow probability map, and the pixel value of each pixel in the shadow probability map represents the probability that a shadow exists at the pixel with the same coordinate in the image to be detected.
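As a small illustration of this detection result, a (hypothetical) probability map can be thresholded into a binary shadow decision per pixel; the 0.5 threshold is an illustrative choice not taken from this disclosure:

```python
import numpy as np

# Hypothetical 4x4 shadow probability map produced by the network: each
# value is the probability that the pixel with the same coordinates in
# the image to be detected lies inside a shadow.
prob_map = np.array([[0.1, 0.2, 0.8, 0.9],
                     [0.1, 0.3, 0.7, 0.9],
                     [0.0, 0.1, 0.2, 0.3],
                     [0.0, 0.0, 0.1, 0.2]])

# Thresholding yields a binary shadow mask (0.5 is an illustrative choice).
shadow_mask = prob_map > 0.5
```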
In one embodiment, the computer program instructions, when executed by a computer, further cause the computer to perform: acquiring training data, wherein the training data comprises a sample image set and mask information respectively corresponding to each sample image in the sample image set, and the mask information is used for indicating the position of a shadow region in the corresponding sample image; and performing neural network training using the training data to obtain the full convolution network.
In one embodiment, the mask information is a binary image of the same size as the corresponding sample image, pixels of the binary image whose pixel coordinates fall inside the shadow region of the corresponding sample image have a first pixel value, and pixels of the binary image whose pixel coordinates fall outside the shadow region of the corresponding sample image have a second pixel value.
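For instance, the mask for a hypothetical 6x8 sample image whose shadow region occupies rows 2-4 and columns 3-6 could be built as below, with 255 and 0 as illustrative choices for the first and second pixel values:

```python
import numpy as np

mask = np.zeros((6, 8), dtype=np.uint8)  # second pixel value (0) everywhere
mask[2:5, 3:7] = 255                     # first pixel value inside the shadow region
```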
In one embodiment, the computer program instructions, when executed by a computer, cause the computer to perform the step of obtaining training data comprising: acquiring an initial image set, wherein initial images in the initial image set correspond one-to-one to sample images in the sample image set; generating a predetermined number of shadow regions for each initial image in the set of initial images; for each initial image in the set of initial images, superimposing the initial image with the predetermined number of shadow regions to obtain a sample image corresponding to the initial image; and, for each initial image in the initial image set, obtaining mask information corresponding to the sample image corresponding to the initial image according to the superposition positions of the predetermined number of shadow regions in the initial image.
In one embodiment, the computer program instructions, when executed by a computer, cause the computer to perform the step of generating a predetermined number of shadow regions comprising: generating a plurality of image blocks; randomly selecting a number of image blocks equal to the predetermined number from the plurality of image blocks; and randomly setting the pixel values of the pixels in each of the selected image blocks to values within a preset shadow value range to obtain the predetermined number of shadow areas.
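The synthetic-shadow generation and superposition described in the two embodiments above can be sketched as follows. The rectangular block shape, the block-size limits, and the reading of the "preset shadow value range" as multiplicative darkening factors are all illustrative assumptions; this disclosure leaves the block shape and the exact superposition method open:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_shadow_sample(img, n_shadows=2, shadow_range=(0.3, 0.7)):
    """Superimpose `n_shadows` random rectangular shadow blocks on a color
    image `img` and return the shadowed sample image plus its binary mask."""
    h, w = img.shape[:2]
    out = img.astype(np.float64)
    mask = np.zeros((h, w), dtype=np.uint8)
    for _ in range(n_shadows):
        bh = int(rng.integers(h // 4, h // 2))    # random block size
        bw = int(rng.integers(w // 4, w // 2))
        y = int(rng.integers(0, h - bh))          # random block position
        x = int(rng.integers(0, w - bw))
        # Pixel values inside the block are drawn from the preset shadow
        # value range, read here as per-pixel darkening factors.
        block = rng.uniform(*shadow_range, size=(bh, bw, 1))
        out[y:y + bh, x:x + bw] *= block
        mask[y:y + bh, x:x + bw] = 255            # record superposition position
    return out.astype(np.uint8), mask
```

Each generated sample image thus comes with pixel-accurate mask information for free, which is what makes this data-synthesis route attractive for training.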
In one embodiment, before the computer program instructions, when executed by a computer, cause the computer to perform the step of training a neural network with the training data to obtain the full convolutional network, the computer program instructions, when executed by a computer, further cause the computer to perform: scaling a size of a sample image in the sample image set to a standard size.
In one embodiment, the computer program instructions, when executed by a computer, cause the computer to perform the step of scaling the size of a sample image of the sample image set to a standard size, comprising: for each sample image in the set of sample images, scaling the greater of the height and width of the sample image to a standard size while keeping the aspect ratio of the sample image unchanged.
In one embodiment, the computer program instructions, when executed by a computer, cause the computer to perform the step of obtaining training data comprising: receiving the sample image set and annotation data corresponding to each sample image in the sample image set, wherein the annotation data comprises contour data indicating a contour of a shadow region in the corresponding sample image; and generating mask information corresponding to each sample image in the sample image set from the contour data corresponding to that sample image.
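Generating the mask from an annotated contour amounts to rasterizing a polygon. A self-contained sketch using even-odd ray casting is shown below; the (x, y) vertex format and the 255/0 mask values are illustrative assumptions, and a library rasterizer would serve equally well:

```python
import numpy as np

def contour_to_mask(height, width, contour):
    """Rasterize a closed polygonal contour (sequence of (x, y) vertices)
    into a binary mask: 255 inside the shadow region, 0 outside
    (even-odd rule, evaluated at pixel centers)."""
    xs, ys = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    inside = np.zeros((height, width), dtype=bool)
    n = len(contour)
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]
        crosses = (y0 > ys) != (y1 > ys)               # edge straddles the scan row
        x_at = x0 + (ys - y0) * (x1 - x0) / (y1 - y0 + 1e-12)
        inside ^= crosses & (xs < x_at)                # toggle on each crossing
    return np.where(inside, 255, 0).astype(np.uint8)
```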
In one embodiment, before the computer program instructions, when executed by a computer, cause the computer to perform the step of processing the image to be detected using a full convolutional network, the computer program instructions, when executed by a computer, further cause the computer to perform: scaling the size of the image to be detected to a standard size.
In one embodiment, the computer program instructions, when executed by a computer, cause the computer to perform the step of scaling the size of the image to be detected to a standard size by: scaling the larger one of the height and the width of the image to be detected to the standard size while keeping the aspect ratio of the image to be detected unchanged.
In one embodiment, the computer program instructions, when executed by a computer, cause the computer to perform processing of the image to be detected using a full convolution network by: inputting the image to be detected of the standard size into the full convolution network to obtain a primary result regarding position information of a shadow region in the image to be detected of the standard size; and obtaining the detection result according to the scaling of the image to be detected and the primary result.
In one embodiment, the full convolutional network is configured to have the following network structure: an input layer, followed by two convolutional layers, then one max pooling layer, followed by three convolutional layers.
The modules in the image shadow detection system according to the embodiment of the invention may be implemented by a processor of an electronic device implementing image shadow detection according to the embodiment of the invention running computer program instructions stored in a memory, or may be implemented when computer instructions stored in a computer readable storage medium of a computer program product according to the embodiment of the invention are run by a computer.
According to the image shadow detection method and the image shadow detection device, a trained full convolution network can directly detect whether a shadow exists and accurately segment the region where the shadow is located. The method therefore offers high precision and strong adaptability, and can greatly improve the accuracy and reliability of image recognition in related image recognition tasks. In addition, the image shadow detection method and device feature fast processing and a small model size, and can therefore be conveniently deployed on mobile devices such as smart phones and tablet computers.
Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the foregoing illustrative embodiments are merely exemplary and are not intended to limit the scope of the invention thereto. Various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present invention. All such changes and modifications are intended to be included within the scope of the present invention as set forth in the appended claims.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another device, or some features may be omitted, or not executed.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the method of the present invention should not be construed to reflect the intent: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
It will be understood by those skilled in the art that all of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where such features are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some of the blocks in an image shadow detection apparatus according to an embodiment of the invention. The present invention may also be embodied as apparatus programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.
The above description is only of specific embodiments of the present invention, and the protection scope of the present invention is not limited thereto. Any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed by the present invention, and such changes or substitutions shall be covered by the protection scope of the present invention. The protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (24)

1. An image shadow detection method, comprising:
acquiring an image to be detected; and
processing the image to be detected by using a full convolution network so as to obtain a detection result of position information of a shadow region in the image to be detected at an output end of the full convolution network, wherein the detection result comprises a shadow probability map, and a pixel value of each pixel in the shadow probability map represents the probability that a shadow exists at the pixel position in the image to be detected having the same coordinates as that pixel.
2. The image shadow detection method according to claim 1, wherein the image shadow detection method further comprises:
acquiring training data, wherein the training data comprises a sample image set and mask information respectively corresponding to each sample image in the sample image set, and the mask information is used for indicating the position of a shadow region in the corresponding sample image; and
carrying out neural network training by using the training data to obtain the full convolution network.
3. The image shadow detection method according to claim 2, wherein the mask information is a binary image of the same size as the corresponding sample image, pixels of the binary image whose pixel coordinates fall inside the shadow region of the corresponding sample image have a first pixel value, and pixels of the binary image whose pixel coordinates fall outside the shadow region of the corresponding sample image have a second pixel value.
4. The image shadow detection method of claim 2, wherein the acquiring training data comprises:
acquiring an initial image set, wherein initial images in the initial image set correspond one-to-one to sample images in the sample image set;
for each initial image of the set of initial images,
generating a predetermined number of shadow regions;
superimposing the initial image with the predetermined number of shaded regions to obtain a sample image corresponding to the initial image; and
obtaining mask information corresponding to the sample image corresponding to the initial image according to the superposition positions of the predetermined number of shadow regions in the initial image.
5. The image shadow detection method of claim 4, wherein the generating a predetermined number of shadow regions comprises:
generating a plurality of image blocks;
randomly selecting a number of image blocks equal to the predetermined number from the plurality of image blocks; and
randomly setting pixel values of pixels in each of the selected image blocks to values within a preset shadow value range to obtain the predetermined number of shadow areas.
6. The image shadow detection method of claim 2, wherein prior to the training of the neural network with the training data to obtain the full convolution network, the image shadow detection method further comprises:
scaling a size of a sample image in the sample image set to a standard size.
7. The image shadow detection method of claim 6, wherein the scaling of the sizes of the sample images in the sample image set to a standard size comprises:
for each sample image in the set of sample images, scaling the greater of the height and width of the sample image to a standard size while keeping the aspect ratio of the sample image unchanged.
8. The image shadow detection method of claim 2, wherein the acquiring training data comprises:
receiving the sample image set and annotation data corresponding to each sample image in the sample image set, wherein the annotation data comprises contour data indicating a contour of a shadow region in the corresponding sample image; and
generating mask information corresponding to each sample image in the sample image set from the contour data corresponding to that sample image.
9. The image shadow detection method according to claim 1, wherein before the processing the image to be detected using the full convolution network, the image shadow detection method further comprises:
scaling the size of the image to be detected to a standard size.
10. The image shadow detection method according to claim 9, wherein the scaling of the size of the image to be detected to a standard size comprises:
scaling the larger one of the height and the width of the image to be detected to the standard size while keeping the aspect ratio of the image to be detected unchanged.
11. The image shadow detection method according to claim 9, wherein the processing the image to be detected using a full convolution network comprises:
inputting the image to be detected with a standard size into the full convolution network to obtain a primary result of position information on a shadow area in the image to be detected with a standard size; and
obtaining the detection result according to the scaling of the image to be detected and the primary result.
12. The image shadow detection method of claim 1, wherein the full convolution network is configured to have a network structure of: an input layer, followed by two convolutional layers, then one max pooling layer, followed by three convolutional layers.
13. An image shadow detection apparatus comprising:
an image acquisition module for acquiring an image to be detected; and
a processing module for processing the image to be detected by using a full convolution network so as to obtain a detection result of position information of a shadow region in the image to be detected at the output end of the full convolution network, wherein the detection result comprises a shadow probability map, and the pixel value of each pixel in the shadow probability map represents the probability that a shadow exists at the pixel position in the image to be detected having the same coordinates as that pixel.
14. The image shadow detection device according to claim 13, wherein the image shadow detection device further comprises:
a data acquisition module, configured to acquire training data, where the training data includes a sample image set and mask information respectively corresponding to each sample image in the sample image set, and the mask information is used to indicate a position of a shadow region in a corresponding sample image; and
a training module for carrying out neural network training by utilizing the training data to obtain the full convolution network.
15. The image shadow detection apparatus according to claim 14, wherein the mask information is a binary image of the same size as the corresponding sample image, pixels of the binary image whose pixel coordinates fall inside the shadow region of the corresponding sample image have a first pixel value, and pixels of the binary image whose pixel coordinates fall outside the shadow region of the corresponding sample image have a second pixel value.
16. The image shadow detection device according to claim 14, wherein the data acquisition module includes:
an initial image acquisition sub-module for acquiring an initial image set, wherein initial images in the initial image set correspond one-to-one to sample images in the sample image set;
a shadow region generation submodule for generating a predetermined number of shadow regions for each initial image in the set of initial images;
a superimposing sub-module for superimposing, for each initial image of the set of initial images, the initial image with the predetermined number of shadow regions to obtain a sample image corresponding to the initial image; and
a mask information obtaining sub-module for obtaining, for each initial image in the initial image set, mask information corresponding to the sample image corresponding to the initial image according to the superposition positions of the predetermined number of shadow regions in the initial image.
17. The image shadow detection apparatus of claim 16, wherein the shadow region generation sub-module comprises:
an image block generation unit configured to generate a plurality of image blocks;
an image block selecting unit configured to randomly select a number of image blocks equal to the predetermined number from the plurality of image blocks; and
a pixel value setting unit for randomly setting pixel values of pixels in each of the selected image blocks to values within a preset shadow value range to obtain the predetermined number of shadow areas.
18. The image shadow detection device according to claim 14, wherein the image shadow detection device further comprises:
a first scaling module, configured to scale a size of a sample image in the sample image set to a standard size before the training module performs neural network training using the training data to obtain the full convolution network.
19. The image shadow detection device of claim 18, wherein the first scaling module comprises:
a first scaling sub-module for scaling, for each sample image of the set of sample images, the greater of the height and width of the sample image to a standard size while keeping the aspect ratio of the sample image unchanged.
20. The image shadow detection device according to claim 14, wherein the data acquisition module includes:
a receiving submodule for receiving the sample image set and annotation data corresponding to each sample image in the sample image set, respectively, wherein the annotation data comprises contour data for indicating a contour of a shadow region in the corresponding sample image; and
a mask information generation sub-module for generating mask information corresponding to each sample image in the sample image set according to the contour data corresponding to that sample image.
21. The image shadow detection device according to claim 13, wherein the image shadow detection device further comprises:
a second scaling module for scaling the size of the image to be detected to a standard size before the processing module processes the image to be detected using the full convolution network.
22. The image shadow detection apparatus of claim 21, wherein the second scaling module comprises:
a second scaling sub-module for scaling the larger one of the height and the width of the image to be detected to the standard size while keeping the aspect ratio of the image to be detected unchanged.
23. The image shadow detection device of claim 21, wherein the processing module comprises:
an input sub-module for inputting the image to be detected in a standard size into the full convolution network to obtain a primary result regarding position information of a shadow region in the image to be detected in a standard size; and
a detection result obtaining sub-module for obtaining the detection result according to the scaling of the image to be detected and the primary result.
24. The image shadow detection apparatus according to claim 13, wherein the full convolution network is configured to have a network structure of: an input layer, followed by two convolutional layers, then one max pooling layer, followed by three convolutional layers.
CN201610817703.2A 2016-09-12 2016-09-12 Image shadow detection method and device Active CN106447721B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610817703.2A CN106447721B (en) 2016-09-12 2016-09-12 Image shadow detection method and device

Publications (2)

Publication Number Publication Date
CN106447721A CN106447721A (en) 2017-02-22
CN106447721B true CN106447721B (en) 2021-08-10



Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104463865A (en) * 2014-12-05 2015-03-25 浙江大学 Human image segmenting method
CN105354565A (en) * 2015-12-23 2016-02-24 北京市商汤科技开发有限公司 Full convolution network based facial feature positioning and distinguishing method and system


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
全卷积网络：从图像级理解到像素级理解 [Fully convolutional networks: from image-level understanding to pixel-level understanding], link: https://zhuanlan.zhihu.com/p/20872103; 果果是枚开心果; Zhihu Column (《知乎专栏》); 2016-05-09; full text *
基于多尺度特征学习的阴影检测 [Shadow detection based on multi-scale feature learning]; 张永库 et al.; Computer Applications and Software (《计算机应用与软件》); May 2016; page 186, left column, paragraph 3 to page 187, left column, paragraph 3 *


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 100190 Beijing, Haidian District Academy of Sciences, South Road, No. 2, block A, No. 313

Applicant after: MEGVII INC.

Applicant after: Beijing maigewei Technology Co., Ltd.

Address before: 100190 Beijing, Haidian District Academy of Sciences, South Road, No. 2, block A, No. 313

Applicant before: MEGVII INC.

Applicant before: Beijing aperture Science and Technology Ltd.

GR01 Patent grant
GR01 Patent grant