CN111507910A - Single image reflection removing method and device and storage medium - Google Patents

Single image reflection removing method and device and storage medium

Info

Publication number
CN111507910A
CN111507910A
Authority
CN
China
Prior art keywords
image
reflection
loss function
network
background image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010193974.1A
Other languages
Chinese (zh)
Other versions
CN111507910B (en)
Inventor
田治仁
张贵峰
李锐海
廖永力
张巍
龚博
王俊锞
黄增浩
朱登杰
何锦强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Research Institute of Southern Power Grid Co Ltd
Original Assignee
Research Institute of Southern Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Research Institute of Southern Power Grid Co Ltd filed Critical Research Institute of Southern Power Grid Co Ltd
Priority to CN202010193974.1A priority Critical patent/CN111507910B/en
Publication of CN111507910A publication Critical patent/CN111507910A/en
Application granted granted Critical
Publication of CN111507910B publication Critical patent/CN111507910B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/90 Dynamic range modification of images or parts thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a single image reflection removal method, device and storage medium. The method comprises the following steps: capturing a background image and a corresponding reflection image manually, and superimposing the background image and the reflection image to obtain a reflective image; inputting the reflective image into a pre-trained VGG-19 network for hypercolumn feature extraction to obtain a feature set; inputting the feature set into a preset generation network to obtain a predicted background image and a predicted reflection image; inputting the predicted background image and the background image into a preset discrimination network to compute the discrimination loss function of the discrimination network; training the generation network and the discrimination network through repeated iterations until the joint loss function of the generation network and the discrimination loss function both converge; and selecting several reflective images for reflection removal so as to quantitatively evaluate the reflection removal effect. The method extracts high-level perceptual information from the image and incorporates it into the training of the generative adversarial network, thereby effectively solving the single image reflection removal problem.

Description

Single image reflection removing method and device and storage medium
Technical Field
The invention relates to the technical field of image processing, in particular to a method and a device for removing reflection of a single image and a storage medium.
Background
Reflection removal from a single image usually relies on predetermined prior information. The most common approach exploits the sparsity of natural image gradients to separate the image layers by finding minimal edges and corners, for example by combining a gradient-sparsity constraint with a data fidelity term in the Laplacian domain to suppress image reflections. However, this approach relies on low-level heuristics and breaks down in situations that require high-level analysis of the image content. Another prior assumes that the reflection layer is generally out of focus and smooth; algorithms based on this assumption fail when the reflection itself has strong contrast. None of these methods can effectively exploit the high-level perceptual information of the image or remove high-contrast reflections.
Disclosure of Invention
The embodiments of the present invention aim to provide a single image reflection removal method, device and storage medium that effectively extract high-level perceptual information from an image, incorporate that information into network training, and exploit the advantages of generative adversarial networks, thereby effectively solving the single image reflection removal problem and achieving a satisfactory result even on high-contrast reflective images.
In order to achieve the above object, an embodiment of the present invention provides a single image reflection removal method, comprising the following steps:
capturing a background image and a corresponding reflection image manually, and superimposing the background image and the reflection image to obtain a reflective image;
inputting the reflective image into a pre-trained VGG-19 network for hypercolumn feature extraction to obtain a feature set;
inputting the feature set into a preset generation network to obtain a predicted background image and a predicted reflection image, wherein the joint loss function of the generation network comprises a reconstruction loss function over the hypercolumn feature space, an adversarial loss function and a separation loss function;
inputting the predicted background image and the background image into a preset discrimination network to compute the discrimination loss function of the discrimination network;
training the generation network and the discrimination network through repeated iterations until the joint loss function and the discrimination loss function both converge; and
selecting several reflective images for reflection removal so as to quantitatively evaluate the reflection removal effect.
Preferably, obtaining the reflective image by superimposing the background image and the reflection image specifically comprises:
acquiring a first gray value of the background image;
acquiring a second gray value of the reflection image; and
weighting the first gray value and the second gray value to obtain the reflective image.
Preferably, the convolutional layers of the VGG-19 network comprise conv1_2, conv2_2, conv3_2, conv4_2 and conv5_2.
Preferably, the generation network comprises an input layer with a 1 × 1 convolution kernel and 8 dilated convolution layers with 3 × 3 convolution kernels, wherein the last dilated convolution layer generates two three-channel RGB images through a linear transformation.
Preferably, the joint loss function of the generation network comprises a reconstruction loss function over the hypercolumn feature space, an adversarial loss function and a separation loss function, specifically:
the reconstruction loss function over the hypercolumn feature space is expressed as
$L_{feat}(\theta)=\sum_{I\in\Omega}\sum_{l}\lambda_{l}\,\big\|\Phi_{l}(T)-\Phi_{l}\big(f_{T}(I;\theta)\big)\big\|_{1}$
wherein $L_{feat}(\theta)$ is the reconstruction loss function over the hypercolumn feature space; $I$, $T$ and $f_{T}(I;\theta)$ are the reflective image, the background image and the predicted background image, respectively; $\lambda_{l}$ is the influence weight of the $l$-th convolutional layer; $\Omega$ is the set of training images; $\|\cdot\|_{1}$ denotes the 1-norm of the vector output by the network convolution, i.e. the sum of the absolute values of its elements; $\Phi_{l}(x)$ denotes the convolution output of the $l$-th selected convolutional layer of the VGG-19 network; and $\theta$ denotes the parameters of the generation network;
the adversarial loss function is expressed as
$L_{adv}(\theta)=-\sum_{I\in\Omega}\log D\big(I,f_{T}(I;\theta)\big)$
wherein $L_{adv}(\theta)$ is the adversarial loss function, and $D(I,x)$, obtained from the output of the discrimination network, denotes the probability that $x$ is the background image corresponding to the reflective image $I$;
the separation loss function is expressed as
$L_{excl}(\theta)=\sum_{I\in\Omega}\sum_{n=1}^{N}\big\|\Psi\big(f_{T}^{n}(I;\theta),f_{R}^{n}(I;\theta)\big)\big\|_{F}$
with
$\Psi(f_{T},f_{R})=\tanh\big(\lambda_{T}\,|\nabla f_{T}|\big)\odot\tanh\big(\lambda_{R}\,|\nabla f_{R}|\big)$
wherein $L_{excl}(\theta)$ is the separation loss function; $\lambda_{T}$ and $\lambda_{R}$ are the first and second normalization parameters, respectively; $\|\cdot\|_{F}$ is the Frobenius norm; $\odot$ denotes element-wise multiplication; $n$ is the image down-sampling level, $1\le n\le N$, with $N$ the maximum down-sampling level; $f_{R}(I;\theta)$ is the predicted reflection image; $|\nabla f_{T}|$ is the modulus of the gradient of the predicted background image; and $|\nabla f_{R}|$ is the modulus of the gradient of the predicted reflection image;
the joint loss function of the generation network is $L(\theta)=w_{1}L_{feat}(\theta)+w_{2}L_{adv}(\theta)+w_{3}L_{excl}(\theta)$, wherein $L(\theta)$ is the joint loss function, and $w_{1}$, $w_{2}$ and $w_{3}$ are the coefficients of the reconstruction loss function over the hypercolumn feature space, the adversarial loss function and the separation loss function, respectively.
Preferably, the discrimination loss function of the discrimination network is $L_{disc}(\theta)=\log D\big(I,f_{T}(I;\theta)\big)-\log D(I,T)$, wherein $L_{disc}(\theta)$ is the discrimination loss function.
Preferably, selecting several reflective images for reflection removal so as to quantitatively evaluate the reflection removal effect specifically comprises:
selecting several reflective images for reflection removal, and calculating the peak signal-to-noise ratio and the structural similarity between the predicted background image generated by the generation network and the background image, so as to quantitatively evaluate the reflection removal effect.
Another embodiment of the present invention provides a single image reflection removal apparatus, comprising:
an image set acquisition module, configured to capture a background image and a corresponding reflection image manually, and superimpose the background image and the reflection image to obtain a reflective image;
a feature extraction module, configured to input the reflective image into a pre-trained VGG-19 network for hypercolumn feature extraction to obtain a feature set;
a prediction generation module, configured to input the feature set into a preset generation network to obtain a predicted background image and a predicted reflection image, wherein the joint loss function of the generation network comprises a reconstruction loss function over the hypercolumn feature space, an adversarial loss function and a separation loss function;
a discrimination module, configured to input the predicted background image and the background image into a preset discrimination network to compute the discrimination loss function of the discrimination network;
a training module, configured to train the generation network and the discrimination network through repeated iterations until the joint loss function and the discrimination loss function both converge; and
an evaluation module, configured to select several reflective images for reflection removal so as to quantitatively evaluate the reflection removal effect.
The invention correspondingly provides an apparatus using the single image reflection removal method, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the single image reflection removal method according to any one of the above.
Another embodiment of the present invention provides a computer-readable storage medium comprising a stored computer program, wherein, when the computer program runs, the apparatus on which the computer-readable storage medium resides is controlled to execute the single image reflection removal method according to any one of the above.
Compared with the prior art, the single image reflection removal method, apparatus and storage medium provided by the embodiments of the present invention effectively extract high-level perceptual information from the image by means of a deep convolutional neural network and, combined with the optimization properties of a generative adversarial network, obtain a predicted background image closer to the real background image, thereby effectively solving the reflection problem in image acquisition and achieving a satisfactory reflection removal effect even on high-contrast reflective images.
Drawings
FIG. 1 is a schematic flow chart of a single image reflection removal method according to an embodiment of the present invention;
FIG. 2 is a simplified flow diagram of the single image reflection removal method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a generation network according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a discrimination network according to an embodiment of the present invention;
FIG. 5 is a reflection removal comparison over 4 sets of reflective images, background images, predicted background images and reflection images according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a single image reflection removal apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an apparatus using the single image reflection removal method according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, which is a schematic flow chart of a single image reflection removal method according to an embodiment of the present invention, the method comprises steps S1 to S6:
S1, capturing a background image and a corresponding reflection image manually, and superimposing the background image and the reflection image to obtain a reflective image;
S2, inputting the reflective image into a pre-trained VGG-19 network for hypercolumn feature extraction to obtain a feature set;
S3, inputting the feature set into a preset generation network to obtain a predicted background image and a predicted reflection image, wherein the joint loss function of the generation network comprises a reconstruction loss function over the hypercolumn feature space, an adversarial loss function and a separation loss function;
S4, inputting the predicted background image and the background image into a preset discrimination network to compute the discrimination loss function of the discrimination network;
S5, training the generation network and the discrimination network through repeated iterations until the joint loss function and the discrimination loss function both converge;
S6, selecting several reflective images for reflection removal so as to quantitatively evaluate the reflection removal effect.
Specifically, a background image and a corresponding reflection image are captured manually, and a reflective image is obtained by superimposing the two. Because reflection-free originals are difficult to obtain in real life, the background image is produced artificially as follows: an indoor scene is chosen as the background, with the target object placed on one side of a transparent glass pane (preferably the darker side) and the camera lens on the other side. The positions of the object and the lens are then fixed, and a reflection-free background image is captured. An outdoor scene may be chosen as the reflection image. After the background image and the reflection image are obtained, both are resized to the same size H × W × 3 and then superimposed to obtain the reflective image. The final data set contains 2000 reflective images together with their corresponding background and reflection images.
The reflective image is input into the pre-trained VGG-19 network for hypercolumn feature extraction to obtain a feature set. The hypercolumn features have 1472 dimensions in total; the three channels of the reflective image are then concatenated with the hypercolumn features to form a 1475-dimensional feature set, denoted Φ(x), where x is the input reflective image.
The feature set is input into the preset generation network to obtain a predicted background image and a predicted reflection image; the joint loss function of the generation network comprises a reconstruction loss function over the hypercolumn feature space, an adversarial loss function and a separation loss function.
and inputting the predicted background image and the background image into a preset identification network to calculate and obtain an identification loss function of the identification network. The authentication network is introduced to judge the two input images and output the probability that the two images are derived from the data set.
Training of the generation network and the discrimination network is completed through repeated iterations until the joint loss function and the discrimination loss function both converge. During training, the output probability of the discrimination network feeds into the joint loss function of the generation network, thereby optimizing the generation network. In general, when the output probability of the discrimination network reaches 0.5, both functions have converged. Preferably, the training parameters are: max_epoch = 250 and batch_size = 1; the optimizer is the Adam algorithm with a learning rate of 10^-4.
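For illustration only, this alternating training procedure can be sketched as the loop below. It is a minimal sketch under stated assumptions: the names generator, discriminator, hypercolumn_features, loader, phi, lambdas and the loss helpers are hypothetical and refer to the illustrative sketches given later in this description, not to anything fixed by the patent itself.

```python
import torch

# Hypothetical alternating training loop: Adam, lr 1e-4,
# max_epoch = 250, batch_size = 1, as stated above.
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

for epoch in range(250):
    for reflective, background in loader:  # batch_size = 1
        pred_bg, pred_rf = generator(hypercolumn_features(reflective))

        # Discriminator step: minimize L_disc (prediction detached inside)
        d_opt.zero_grad()
        disc_loss(discriminator, reflective, pred_bg, background).backward()
        d_opt.step()

        # Generator step: minimize the joint loss L(theta)
        g_opt.zero_grad()
        l_feat = reconstruction_loss(pred_bg, background, phi, lambdas)
        l_adv = adversarial_loss(discriminator, reflective, pred_bg)
        l_excl = exclusion_loss(pred_bg, pred_rf)
        joint_loss(l_feat, l_adv, l_excl).backward()
        g_opt.step()
```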
To assess the merits of the method, several reflective images are selected for reflection removal so that the reflection removal effect can be quantitatively evaluated.
To more clearly understand the implementation of the method of the present invention, refer to FIG. 2, a simplified flow diagram of the single image reflection removal method according to this embodiment of the present invention.
The single image reflection removal method provided by embodiment 1 of the present invention effectively extracts high-level perceptual information from the image by means of a deep convolutional neural network and, combined with the optimization properties of a generative adversarial network, obtains a predicted background image closer to the real background image, thereby effectively solving the reflection problem in image acquisition while remaining satisfactory on high-contrast reflective images.
As an improvement of the above scheme, obtaining the reflective image by superimposing the background image and the reflection image specifically comprises:
acquiring a first gray value of the background image;
acquiring a second gray value of the reflection image; and
weighting the first gray value and the second gray value to obtain the reflective image.
That is, the reflective image is obtained by acquiring the first gray value of the background image and the second gray value of the reflection image and computing their weighted sum, expressed mathematically as $I=(1-\alpha)T+\alpha R$, wherein $I$ is the reflective image, $T$ is the background image, $R$ is the reflection image, and $\alpha\in[0,1]$ is the weighting parameter of the reflection image; preferably, $\alpha=0.5$.
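As an illustration of this weighting, a minimal NumPy sketch follows; the function name and the final clipping step are illustrative assumptions, not part of the patent text.

```python
import numpy as np

def make_reflective_image(background: np.ndarray, reflection: np.ndarray,
                          alpha: float = 0.5) -> np.ndarray:
    """I = (1 - alpha) * T + alpha * R for arrays in [0, 1] that were
    already resized to the same H x W x 3 shape."""
    assert background.shape == reflection.shape
    blended = (1.0 - alpha) * background + alpha * reflection
    return np.clip(blended, 0.0, 1.0)  # keep the result a valid image
```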
As an improvement of the above scheme, the convolutional layers of the VGG-19 network comprise conv1_2, conv2_2, conv3_2, conv4_2 and conv5_2.
Specifically, the selected convolutional layers of the VGG-19 network are conv1_2, conv2_2, conv3_2, conv4_2 and conv5_2. The VGG-19 network is pre-trained on ImageNet and is used to extract hypercolumn features from the input image; the advantage of hypercolumn features is that the input gains useful features abstracting the visual perception of a large data set (e.g., ImageNet). The hypercolumn feature of a given pixel location is the stack of activations at that location across the selected convolutional layers of the network.
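A minimal PyTorch sketch of this extraction follows, assuming torchvision's VGG-19 and its layer indexing; the index list and the bilinear upsampling back to input resolution are implementation assumptions consistent with the 1475-dimensional feature set described above.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Assumed indices of conv1_2, conv2_2, conv3_2, conv4_2, conv5_2 in
# torchvision's VGG-19 `features` Sequential.
LAYER_IDS = (2, 7, 12, 21, 30)  # channels 64+128+256+512+512 = 1472

def hypercolumn_features(image: torch.Tensor) -> torch.Tensor:
    """image: 1 x 3 x H x W in [0, 1]. Returns a 1 x 1475 x H x W stack:
    the 3 input channels concatenated with 1472 hypercolumn channels."""
    vgg = vgg19(weights="IMAGENET1K_V1").features.eval()  # build once in practice
    h, w = image.shape[2:]
    feats, x = [image], image
    with torch.no_grad():
        for i, layer in enumerate(vgg):
            x = layer(x)
            if i in LAYER_IDS:
                # Upsample each activation map back to the input size so
                # the per-pixel stacks line up.
                feats.append(F.interpolate(x, size=(h, w), mode="bilinear",
                                           align_corners=False))
    return torch.cat(feats, dim=1)
```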
As an improvement of the above scheme, the generation network comprises an input layer with a 1 × 1 convolution kernel and 8 dilated convolution layers with 3 × 3 convolution kernels, wherein the last dilated convolution layer generates two three-channel RGB images through a linear transformation.
Specifically, referring to FIG. 3, which is a schematic structural diagram of the generation network according to this embodiment of the present invention, the generation network comprises an input layer with a 1 × 1 convolution kernel and 8 dilated convolution layers with 3 × 3 convolution kernels, where the last dilated convolution layer generates two three-channel RGB maps through a linear transformation. The input layer reduces the 1475-dimensional features output by the VGG-19 network to 64 dimensions, the dilation rates of the 8 dilated convolution layers range from 1 to 128, and all intermediate layers of the generation network have 64 feature channels.
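A minimal PyTorch sketch of such a generator follows. The geometric dilation schedule (1, 2, 4, ..., 128) and the ReLU activations are assumptions; the text above fixes only the 1 × 1 input layer, the 8 dilated 3 × 3 layers with dilations from 1 to 128, the 64-channel intermediate width, and the linear final layer producing two RGB maps.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Sketch of the generation network of FIG. 3: a 1x1 input convolution
    reducing 1475 hypercolumn channels to 64, seven dilated 3x3 convolutions
    with ReLU, and a final dilated 3x3 convolution acting as a linear
    transformation to 6 channels (two three-channel RGB maps)."""

    def __init__(self, in_channels: int = 1475, width: int = 64):
        super().__init__()
        layers = [nn.Conv2d(in_channels, width, kernel_size=1),
                  nn.ReLU(inplace=True)]
        for d in (1, 2, 4, 8, 16, 32, 64):  # assumed geometric schedule
            layers += [nn.Conv2d(width, width, kernel_size=3,
                                 padding=d, dilation=d),
                       nn.ReLU(inplace=True)]
        # Eighth dilated layer: linear output, no activation.
        layers.append(nn.Conv2d(width, 6, kernel_size=3,
                                padding=128, dilation=128))
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor):
        out = self.net(x)
        return out[:, :3], out[:, 3:]  # predicted background, predicted reflection
```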
As an improvement of the above scheme, the joint loss function of the generation network comprises a reconstruction loss function over the hypercolumn feature space, an adversarial loss function and a separation loss function, specifically:
the reconstruction loss function over the hypercolumn feature space is expressed as
$L_{feat}(\theta)=\sum_{I\in\Omega}\sum_{l}\lambda_{l}\,\big\|\Phi_{l}(T)-\Phi_{l}\big(f_{T}(I;\theta)\big)\big\|_{1}$
wherein $L_{feat}(\theta)$ is the reconstruction loss function over the hypercolumn feature space; $\Phi_{l}$ denotes the $l$-th selected convolutional layer of the VGG-19 network; $I$, $T$ and $f_{T}(I;\theta)$ are the reflective image, the background image and the predicted background image, respectively; $\lambda_{l}$ is the influence weight of the $l$-th convolutional layer; $\Omega$ is the set of training images; $\|\cdot\|_{1}$ denotes the 1-norm of the vector output by the network convolution, i.e. the sum of the absolute values of its elements; and $\theta$ denotes the parameters of the generation network;
the adversarial loss function is expressed as
$L_{adv}(\theta)=-\sum_{I\in\Omega}\log D\big(I,f_{T}(I;\theta)\big)$
wherein $L_{adv}(\theta)$ is the adversarial loss function, and $D(I,x)$, obtained from the output of the discrimination network, denotes the probability that $x$ is the background image corresponding to the reflective image $I$;
the separation loss function is expressed as
$L_{excl}(\theta)=\sum_{I\in\Omega}\sum_{n=1}^{N}\big\|\Psi\big(f_{T}^{n}(I;\theta),f_{R}^{n}(I;\theta)\big)\big\|_{F}$
with
$\Psi(f_{T},f_{R})=\tanh\big(\lambda_{T}\,|\nabla f_{T}|\big)\odot\tanh\big(\lambda_{R}\,|\nabla f_{R}|\big)$
wherein $L_{excl}(\theta)$ is the separation loss function; $\lambda_{T}$ and $\lambda_{R}$ are the first and second normalization parameters, respectively; $\|\cdot\|_{F}$ is the Frobenius norm; $\odot$ denotes element-wise multiplication; $n$ is the image down-sampling level, $1\le n\le N$, with $N$ the maximum down-sampling level; $f_{R}(I;\theta)$ is the predicted reflection image; $|\nabla f_{T}|$ is the modulus of the gradient of the predicted background image; and $|\nabla f_{R}|$ is the modulus of the gradient of the predicted reflection image;
the joint loss function of the generation network is $L(\theta)=w_{1}L_{feat}(\theta)+w_{2}L_{adv}(\theta)+w_{3}L_{excl}(\theta)$, wherein $L(\theta)$ is the joint loss function, and $w_{1}$, $w_{2}$ and $w_{3}$ are the coefficients of the reconstruction loss function over the hypercolumn feature space, the adversarial loss function and the separation loss function, respectively.
Specifically, the reconstruction loss function over the hypercolumn feature space, also called the feature reconstruction loss, measures the distance in hypercolumn space between the predicted background image generated by the generation network and the background image T. Typically, the distance between the predicted image and the target image is computed at the selected VGG-19 network layers. The reconstruction loss function over the hypercolumn feature space is expressed as
$L_{feat}(\theta)=\sum_{I\in\Omega}\sum_{l}\lambda_{l}\,\big\|\Phi_{l}(T)-\Phi_{l}\big(f_{T}(I;\theta)\big)\big\|_{1}$
wherein $L_{feat}(\theta)$ is the reconstruction loss function over the hypercolumn feature space; $I$, $T$ and $f_{T}(I;\theta)$ are the reflective image, the background image and the predicted background image, respectively; $\lambda_{l}$ is the influence weight of the $l$-th convolutional layer; $\Omega$ is the set of training images; $\|\cdot\|_{1}$ denotes the 1-norm, i.e. the sum of the absolute values of the elements of the vector; $\Phi_{l}(x)$ denotes the convolution output of the $l$-th selected convolutional layer of the VGG-19 network; and $\theta$ denotes the parameters of the generation network.
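A per-image sketch of this loss in PyTorch follows; `phi` and `lambdas` are hypothetical stand-ins for the five selected VGG-19 layer activations and their weights $\lambda_{l}$.

```python
import torch

def reconstruction_loss(pred_bg: torch.Tensor, target_bg: torch.Tensor,
                        phi, lambdas) -> torch.Tensor:
    """L_feat for one image: sum_l lambda_l * ||phi_l(T) - phi_l(f_T)||_1.
    The caller sums over the training set Omega. `phi(x)` is assumed to
    return the list of selected VGG-19 layer activations of x."""
    loss = pred_bg.new_zeros(())
    for lam, f_t, f_p in zip(lambdas, phi(target_bg), phi(pred_bg)):
        loss = loss + lam * (f_t - f_p).abs().sum()
    return loss
```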
The purpose of the adversarial loss function is to make the predicted background map $f_{T}(I;\theta)$ differ more from the reflective image $I$. The adversarial loss function is expressed as
$L_{adv}(\theta)=-\sum_{I\in\Omega}\log D\big(I,f_{T}(I;\theta)\big)$
wherein $L_{adv}(\theta)$ is the adversarial loss function, and $D(I,x)$, obtained from the output of the discrimination network, denotes the probability that $x$ is the background image corresponding to the reflective image $I$.
The separation loss function, also called the exclusion loss, is designed from the observation that reflections manifest at image edges: in a reflective image, the edges of the background layer and of the reflection layer generally do not overlap, so an edge in the reflective image I can only be produced by the background image or by the reflection image, not by the superposition of both. The invention therefore minimizes the gradient-space correlation between the reflection layer and the background layer predicted by the generation network, computing the image edge correlation as a separation loss function from normalized gradient information evaluated at multiple resolutions of the two layers.
The separation loss function is expressed as
$L_{excl}(\theta)=\sum_{I\in\Omega}\sum_{n=1}^{N}\big\|\Psi\big(f_{T}^{n}(I;\theta),f_{R}^{n}(I;\theta)\big)\big\|_{F}$
with
$\Psi(f_{T},f_{R})=\tanh\big(\lambda_{T}\,|\nabla f_{T}|\big)\odot\tanh\big(\lambda_{R}\,|\nabla f_{R}|\big)$
wherein $L_{excl}(\theta)$ is the separation loss function; $\lambda_{T}$ and $\lambda_{R}$ are the first and second normalization parameters, respectively; $\|\cdot\|_{F}$ is the Frobenius norm; $\odot$ denotes element-wise multiplication; $n$ is the image down-sampling level, $1\le n\le N$, with $N$ the maximum down-sampling level; $f_{R}(I;\theta)$ is the predicted reflection image; $|\nabla f_{T}|$ is the modulus of the gradient of the predicted background image; and $|\nabla f_{R}|$ is the modulus of the gradient of the predicted reflection image. Both $f_{T}$ and $f_{R}$ are down-sampled by a factor of $2^{n-1}$ through bilinear interpolation. Preferably, $N=3$.
The joint loss function of the generation network is $L(\theta)=w_{1}L_{feat}(\theta)+w_{2}L_{adv}(\theta)+w_{3}L_{excl}(\theta)$, wherein $L(\theta)$ is the joint loss function, and $w_{1}$, $w_{2}$ and $w_{3}$ are the coefficients of the reconstruction loss function over the hypercolumn feature space, the adversarial loss function and the separation loss function, respectively, balancing the influence of each loss on the generation network. Preferably, $w_{1}=20$, $w_{2}=100$, $w_{3}=1$.
As an improvement of the above scheme, the discrimination loss function of the discrimination network is $L_{disc}(\theta)=\log D\big(I,f_{T}(I;\theta)\big)-\log D(I,T)$, wherein $L_{disc}(\theta)$ is the discrimination loss function.
It should be noted that the discrimination network is constructed as follows. First, the predicted background map and the background image input to the discrimination network are concatenated by channel to obtain a stacked input image: if both have size C × W × H, where C is the number of channels and W and H are the width and height of the image, respectively, the stacked image has dimension 2C × W × H. The stacked input is then passed through several cascaded down-sampling units, which turn it into progressively smaller feature maps. Each down-sampling unit consists of a convolutional layer with stride 2, a batch normalization layer and a nonlinear activation layer: the stride-2 convolution halves the size of its input, performing the down-sampling; the batch normalization layer normalizes the data to zero mean and unit variance, stabilizing and accelerating the convergence of the model; and the nonlinear activation layer increases the expressive power of the network. Finally, the resulting feature map is passed through a linear regression unit and normalized into the output probability of the discrimination network.
Specifically, the discrimination loss function of the discrimination network is $L_{disc}(\theta)=\log D\big(I,f_{T}(I;\theta)\big)-\log D(I,T)$, wherein $L_{disc}(\theta)$ is the discrimination loss function and $D(I,x)$ denotes the probability that $x$ is the background image corresponding to the reflective image $I$; that is, $D\big(I,f_{T}(I;\theta)\big)$ is the probability that the predicted background map $f_{T}(I;\theta)$ is drawn from the background images of the data set, and $D(I,T)$ is the probability that the background image $T$ is drawn from the background images of the data set.
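A PyTorch sketch of such a discrimination network and its loss follows; the number of down-sampling units, the channel widths, the LeakyReLU activation and the sigmoid squashing of a global average are assumptions beyond the structure stated above.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Sketch of FIG. 4: channel-concatenate the reflective image and a
    candidate background (2C x H x W), pass the stack through cascaded
    down-sampling units (stride-2 conv + batch norm + activation), and
    squash the result into one probability."""

    def __init__(self, channels: int = 3, width: int = 64, n_down: int = 4):
        super().__init__()
        layers, c_in = [], 2 * channels
        for i in range(n_down):
            c_out = width * 2 ** i
            layers += [nn.Conv2d(c_in, c_out, kernel_size=4,
                                 stride=2, padding=1),  # halves H and W
                       nn.BatchNorm2d(c_out),
                       nn.LeakyReLU(0.2, inplace=True)]
            c_in = c_out
        layers.append(nn.Conv2d(c_in, 1, kernel_size=1))
        self.net = nn.Sequential(*layers)

    def forward(self, reflective: torch.Tensor, candidate: torch.Tensor):
        x = torch.cat([reflective, candidate], dim=1)  # 2C-channel stack
        logits = self.net(x)
        return torch.sigmoid(logits.mean(dim=(2, 3)))  # one probability

def disc_loss(D, reflective, pred_bg, true_bg, eps: float = 1e-8):
    """L_disc = log D(I, f_T(I; theta)) - log D(I, T); the prediction is
    detached so only the discrimination network is updated."""
    return (torch.log(D(reflective, pred_bg.detach()) + eps)
            - torch.log(D(reflective, true_bg) + eps)).mean()
```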
As an improvement of the above scheme, selecting several reflective images for reflection removal so as to quantitatively evaluate the reflection removal effect specifically comprises:
selecting several reflective images for reflection removal, and calculating the peak signal-to-noise ratio and the structural similarity between the predicted background image generated by the generation network and the background image, so as to quantitatively evaluate the reflection removal effect.
Specifically, several reflective images are selected for reflection removal, and the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) between the predicted background image generated by the generation network and the background image are calculated to quantitatively evaluate the reflection removal effect. Referring to FIG. 5, a reflection removal comparison over 4 sets of reflective images, background images, predicted background images and reflection images is provided by this embodiment of the present invention; Table 1 gives the corresponding quantitative evaluation of the reflection removal metrics. As can be seen from FIG. 5 and Table 1, the method of the present invention performs well on single image reflection removal.
TABLE 1. Quantitative evaluation of the image reflection removal effect

Reflective image / background image | PSNR/SSIM | Background image / predicted background image | PSNR/SSIM
First reflective image / first background image | 15.93/0.54 | First background image / first predicted background image | 23.85/0.82
Second reflective image / second background image | 14.70/0.53 | Second background image / second predicted background image | 25.40/0.87
Third reflective image / third background image | 14.45/0.54 | Third background image / third predicted background image | 23.86/0.83
Fourth reflective image / fourth background image | 15.52/0.58 | Fourth background image / fourth predicted background image | 22.74/0.79
Mean | 15.15/0.55 | Mean | 23.96/0.83
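The PSNR/SSIM figures in Table 1 can be computed with standard implementations; a scikit-image sketch follows (assuming scikit-image >= 0.19 for the channel_axis argument).

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(pred_bg: np.ndarray, true_bg: np.ndarray):
    """PSNR and SSIM between a predicted and a real background image,
    both H x W x 3 arrays scaled to [0, 1]."""
    psnr = peak_signal_noise_ratio(true_bg, pred_bg, data_range=1.0)
    ssim = structural_similarity(true_bg, pred_bg, channel_axis=-1,
                                 data_range=1.0)
    return psnr, ssim
```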
Referring to FIG. 6, which is a schematic structural diagram of a single image reflection removal apparatus according to an embodiment of the present invention, the apparatus comprises:
an image set acquisition module 11, configured to capture a background image and a corresponding reflection image manually, and superimpose the background image and the reflection image to obtain a reflective image;
a feature extraction module 12, configured to input the reflective image into a pre-trained VGG-19 network for hypercolumn feature extraction to obtain a feature set;
a prediction generation module 13, configured to input the feature set into a preset generation network to obtain a predicted background image and a predicted reflection image, wherein the joint loss function of the generation network comprises a reconstruction loss function over the hypercolumn feature space, an adversarial loss function and a separation loss function;
a discrimination module 14, configured to input the predicted background image and the background image into a preset discrimination network to compute the discrimination loss function of the discrimination network;
a training module 15, configured to train the generation network and the discrimination network through repeated iterations until the joint loss function and the discrimination loss function both converge; and
an evaluation module 16, configured to select several reflective images for reflection removal so as to quantitatively evaluate the reflection removal effect.
The single image reflection removal apparatus provided in this embodiment of the present invention can implement the entire flow of the single image reflection removal method described in any of the above embodiments; the functions and technical effects of its modules and units are the same as those of the method described above and are not repeated here.
Referring to FIG. 7, the apparatus using the single image reflection removal method according to an embodiment of the present invention comprises a processor 10, a memory 20, and a computer program stored in the memory 20 and configured to be executed by the processor 10, wherein the processor 10, when executing the computer program, implements the single image reflection removal method according to any of the above embodiments.
Illustratively, the computer program may be divided into one or more modules/units, which are stored in the memory 20 and executed by the processor 10 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, used to describe the execution of the computer program in the single image reflection removal method. For example, the computer program may be divided into an image set acquisition module, a feature extraction module, a prediction generation module, a discrimination module, a training module and an evaluation module, whose specific functions are as follows:
the image set acquisition module 11 is configured to acquire a background image and a corresponding reflection image through manual shooting, and obtain a reflection image according to superposition of the background image and the reflection image;
the feature extraction module 12 is configured to input the reflection image into a pre-trained VGG-19 network to perform supercolumn feature extraction, so as to obtain a feature set;
the prediction generation module 13 is configured to input the feature set into a preset generation network to obtain a prediction background image and a prediction reflection image; wherein the joint loss function of the generation network comprises a reconstruction loss function, a countermeasure loss function and a separation loss function of the supercolumn feature space;
the identification module 14 is configured to input the prediction background image and the background image into a preset identification network to calculate an identification loss function of the identification network;
a training module 15, configured to complete training of the generation network and the discrimination network by performing multiple iterative computations until both the joint loss function and the discrimination loss function converge;
and the evaluation module 16 is used for selecting a plurality of reflection images to perform reflection removing treatment so as to quantitatively evaluate the reflection removing effect.
The apparatus using the single image reflection removal method may be a desktop computer, a notebook computer, a palmtop computer, a cloud server or another computing device, and may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that FIG. 7 is merely an example of the apparatus using the single image reflection removal method and does not limit it; the apparatus may include more or fewer components than shown, combine certain components, or use different components; for example, it may further include input and output devices, network access devices, buses, and the like.
The processor 10 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor 10 may be any conventional processor. The processor 10 is the control center of the apparatus using the single image reflection removal method, connecting the parts of the entire apparatus through various interfaces and lines.
The memory 20 may be used to store the computer programs and/or modules, and the processor 10 implements the various functions of the apparatus using the single image reflection removal method by running or executing the computer programs and/or modules stored in the memory 20 and calling the data stored therein. The memory 20 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required by at least one function, and the data storage area may store data created according to program use. In addition, the memory 20 may include high-speed random access memory and non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
If the integrated modules of the apparatus using the single image reflection removal method are implemented as software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method of the above embodiments may also be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
An embodiment of the present invention further provides a computer-readable storage medium comprising a stored computer program, wherein, when the computer program runs, the apparatus on which the computer-readable storage medium resides is controlled to execute the single image reflection removal method described in any of the above embodiments.
To sum up, the single image reflection removal method, apparatus and storage medium provided by the embodiments of the present invention treat reflection separation as a layer separation and evaluation task: the convolutional layers of the generation network use dilated convolutions to enlarge the receptive field without losing detail, and the loss function fully accounts for the high-level features and gradient characteristics of the image as well as the difference between the predicted background image and the reflective image. The high-level features are obtained through a VGG-19 network and abstract the visual perception of the data set. The discrimination network's loss function is designed from the difference between the predicted background image and the input reflective image, making the predicted background image more similar to the background image. As a result, the invention removes reflections from images well, and its effect on high-contrast reflective images is particularly satisfactory.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. A single image reflection removal method, comprising the following steps:
capturing a background image and a corresponding reflection image manually, and superimposing the background image and the reflection image to obtain a reflective image;
inputting the reflective image into a pre-trained VGG-19 network for hypercolumn feature extraction to obtain a feature set;
inputting the feature set into a preset generation network to obtain a predicted background image and a predicted reflection image, wherein the joint loss function of the generation network comprises a reconstruction loss function over the hypercolumn feature space, an adversarial loss function and a separation loss function;
inputting the predicted background image and the background image into a preset discrimination network to compute the discrimination loss function of the discrimination network;
training the generation network and the discrimination network through repeated iterations until the joint loss function and the discrimination loss function both converge; and
selecting several reflective images for reflection removal so as to quantitatively evaluate the reflection removal effect.
2. The single image reflection removal method according to claim 1, wherein obtaining the reflective image by superimposing the background image and the reflection image specifically comprises:
acquiring a first gray value of the background image;
acquiring a second gray value of the reflection image; and
weighting the first gray value and the second gray value to obtain the reflective image.
3. The single image reflection removal method according to claim 1, wherein the convolutional layers of the VGG-19 network comprise conv1_2, conv2_2, conv3_2, conv4_2 and conv5_2.
4. The single image reflection removal method according to claim 1, wherein the generation network comprises an input layer with a 1 × 1 convolution kernel and 8 dilated convolution layers with 3 × 3 convolution kernels, wherein the last dilated convolution layer generates two three-channel RGB images through a linear transformation.
5. The single image reflection removal method according to claim 1, wherein the joint loss function of the generation network comprises a reconstruction loss function over the hypercolumn feature space, an adversarial loss function and a separation loss function, specifically:
the reconstruction loss function over the hypercolumn feature space is expressed as
$L_{feat}(\theta)=\sum_{I\in\Omega}\sum_{l}\lambda_{l}\,\big\|\Phi_{l}(T)-\Phi_{l}\big(f_{T}(I;\theta)\big)\big\|_{1}$
wherein $L_{feat}(\theta)$ is the reconstruction loss function over the hypercolumn feature space; $I$, $T$ and $f_{T}(I;\theta)$ are the reflective image, the background image and the predicted background image, respectively; $\lambda_{l}$ is the influence weight of the $l$-th convolutional layer; $\Omega$ is the set of training images; $\|\cdot\|_{1}$ denotes the 1-norm of the vector output by the network convolution, i.e. the sum of the absolute values of its elements; $\Phi_{l}(x)$ denotes the convolution output of the $l$-th selected convolutional layer of the VGG-19 network; and $\theta$ denotes the parameters of the generation network;
the adversarial loss function is expressed as
$L_{adv}(\theta)=-\sum_{I\in\Omega}\log D\big(I,f_{T}(I;\theta)\big)$
wherein $L_{adv}(\theta)$ is the adversarial loss function, and $D(I,x)$, obtained from the output of the discrimination network, denotes the probability that $x$ is the background image corresponding to the reflective image $I$;
the separation loss function is expressed as
$L_{excl}(\theta)=\sum_{I\in\Omega}\sum_{n=1}^{N}\big\|\Psi\big(f_{T}^{n}(I;\theta),f_{R}^{n}(I;\theta)\big)\big\|_{F}$
with
$\Psi(f_{T},f_{R})=\tanh\big(\lambda_{T}\,|\nabla f_{T}|\big)\odot\tanh\big(\lambda_{R}\,|\nabla f_{R}|\big)$
wherein $L_{excl}(\theta)$ is the separation loss function; $\lambda_{T}$ and $\lambda_{R}$ are the first and second normalization parameters, respectively; $\|\cdot\|_{F}$ is the Frobenius norm; $\odot$ denotes element-wise multiplication; $n$ is the image down-sampling level, $1\le n\le N$, with $N$ the maximum down-sampling level; $f_{R}(I;\theta)$ is the predicted reflection image; $|\nabla f_{T}|$ is the modulus of the gradient of the predicted background image; and $|\nabla f_{R}|$ is the modulus of the gradient of the predicted reflection image;
the joint loss function of the generation network is $L(\theta)=w_{1}L_{feat}(\theta)+w_{2}L_{adv}(\theta)+w_{3}L_{excl}(\theta)$, wherein $L(\theta)$ is the joint loss function, and $w_{1}$, $w_{2}$ and $w_{3}$ are the coefficients of the reconstruction loss function over the hypercolumn feature space, the adversarial loss function and the separation loss function, respectively.
6. The single image reflection removal method according to claim 5, wherein the discrimination loss function of the discrimination network is $L_{disc}(\theta)=\log D\big(I,f_{T}(I;\theta)\big)-\log D(I,T)$, wherein $L_{disc}(\theta)$ is the discrimination loss function.
7. The single image reflection removal method according to claim 1, wherein selecting several reflective images for reflection removal so as to quantitatively evaluate the reflection removal effect specifically comprises:
selecting several reflective images for reflection removal, and calculating the peak signal-to-noise ratio and the structural similarity between the predicted background image generated by the generation network and the background image, so as to quantitatively evaluate the reflection removal effect.
8. A single image reflection removal apparatus, comprising:
an image set acquisition module, configured to capture a background image and a corresponding reflection image manually, and superimpose the background image and the reflection image to obtain a reflective image;
a feature extraction module, configured to input the reflective image into a pre-trained VGG-19 network for hypercolumn feature extraction to obtain a feature set;
a prediction generation module, configured to input the feature set into a preset generation network to obtain a predicted background image and a predicted reflection image, wherein the joint loss function of the generation network comprises a reconstruction loss function over the hypercolumn feature space, an adversarial loss function and a separation loss function;
a discrimination module, configured to input the predicted background image and the background image into a preset discrimination network to compute the discrimination loss function of the discrimination network;
a training module, configured to train the generation network and the discrimination network through repeated iterations until the joint loss function and the discrimination loss function both converge; and
an evaluation module, configured to select several reflective images for reflection removal so as to quantitatively evaluate the reflection removal effect.
9. An apparatus using a single image reflection removal method, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the single image reflection removal method according to any one of claims 1 to 7.
10. A computer-readable storage medium, comprising a stored computer program, wherein, when the computer program runs, the apparatus on which the computer-readable storage medium resides is controlled to perform the single image reflection removal method according to any one of claims 1 to 7.
CN202010193974.1A 2020-03-18 2020-03-18 Single image antireflection method, device and storage medium Active CN111507910B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010193974.1A CN111507910B (en) 2020-03-18 2020-03-18 Single image antireflection method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010193974.1A CN111507910B (en) 2020-03-18 2020-03-18 Single image antireflection method, device and storage medium

Publications (2)

Publication Number Publication Date
CN111507910A true CN111507910A (en) 2020-08-07
CN111507910B CN111507910B (en) 2023-06-06

Family

ID=71864034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010193974.1A Active CN111507910B (en) 2020-03-18 2020-03-18 Single image antireflection method, device and storage medium

Country Status (1)

Country Link
CN (1) CN111507910B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109993124A (en) * 2019-04-03 2019-07-09 深圳市华付信息技术有限公司 Based on the reflective biopsy method of video, device and computer equipment
CN110188776A (en) * 2019-05-30 2019-08-30 京东方科技集团股份有限公司 Image processing method and device, the training method of neural network, storage medium
CN110473154A (en) * 2019-07-31 2019-11-19 西安理工大学 A kind of image de-noising method based on generation confrontation network
CN110675336A (en) * 2019-08-29 2020-01-10 苏州千视通视觉科技股份有限公司 Low-illumination image enhancement method and device
CN110827217A (en) * 2019-10-30 2020-02-21 维沃移动通信有限公司 Image processing method, electronic device, and computer-readable storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085671A (en) * 2020-08-19 2020-12-15 北京影谱科技股份有限公司 Background reconstruction method and device, computing equipment and storage medium
CN112198483A (en) * 2020-09-28 2021-01-08 上海眼控科技股份有限公司 Data processing method, device and equipment for satellite inversion radar and storage medium
CN112634161A (en) * 2020-12-25 2021-04-09 南京信息工程大学滨江学院 Reflected light removing method based on two-stage reflected light eliminating network and pixel loss
CN112907466A (en) * 2021-02-01 2021-06-04 南京航空航天大学 Nondestructive testing reflection interference removing method and device and computer readable storage medium
CN112802076A (en) * 2021-03-23 2021-05-14 苏州科达科技股份有限公司 Reflection image generation model and training method of reflection removal model
WO2022222080A1 (en) * 2021-04-21 2022-10-27 浙江大学 Single-image reflecting layer removing method based on position perception
CN114926705A (en) * 2022-05-12 2022-08-19 网易(杭州)网络有限公司 Cover design model training method, medium, device and computing equipment
CN114926705B (en) * 2022-05-12 2024-05-28 网易(杭州)网络有限公司 Cover design model training method, medium, device and computing equipment

Also Published As

Publication number Publication date
CN111507910B (en) 2023-06-06

Similar Documents

Publication Publication Date Title
CN111507910A (en) Single image reflection removing method and device and storage medium
US10891537B2 (en) Convolutional neural network-based image processing method and image processing apparatus
Yuan et al. Factorization-based texture segmentation
Zheng Gradient descent algorithms for quantile regression with smooth approximation
KR20220125377A (en) Method for Distinguishing a Real Three-Dimensional Object from a Two-Dimensional Spoof of the Real Object
CN110023989B (en) Sketch image generation method and device
CN111340077B (en) Attention mechanism-based disparity map acquisition method and device
CN110929836B (en) Neural network training and image processing method and device, electronic equipment and medium
CN112561028A (en) Method for training neural network model, and method and device for data processing
CN111223128A (en) Target tracking method, device, equipment and storage medium
CN113744136A (en) Image super-resolution reconstruction method and system based on channel constraint multi-feature fusion
CN111899203A (en) Real image generation method based on label graph under unsupervised training and storage medium
Jiang et al. Fast and high quality image denoising via malleable convolution
TW202004568A (en) Full exponential operation method applied to deep neural network, computer apparatus, and computer-readable recording medium reducing the operation complexity and circuit complexity, increasing the operation speed of the deep neural network and reducing the occupation of memory space.
US20140089365A1 (en) Object detection method, object detector and object detection computer program
CN114581318A (en) Low-illumination image enhancement method and system
WO2024078112A1 (en) Method for intelligent recognition of ship outfitting items, and computer device
Fang et al. Learning explicit smoothing kernels for joint image filtering
Zhao et al. Saliency map-aided generative adversarial network for raw to rgb mapping
Piriyatharawet et al. Image denoising with deep convolutional and multi-directional LSTM networks under Poisson noise environments
CN116543433A (en) Mask wearing detection method and device based on improved YOLOv7 model
US20220284545A1 (en) Image processing device and operating method thereof
Lu et al. Kernel estimation for motion blur removal using deep convolutional neural network
Xu et al. Blind image deblurring via the weighted schatten p-norm minimization prior
CN111027670A (en) Feature map processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant