CN111563577B - Unet-based intrinsic image decomposition method for skip layer frequency division and multi-scale identification - Google Patents

Unet-based intrinsic image decomposition method for skip layer frequency division and multi-scale identification

Info

Publication number
CN111563577B
CN111563577B (application number CN202010319106.3A)
Authority
CN
China
Prior art keywords
image
map
layer
generator
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010319106.3A
Other languages
Chinese (zh)
Other versions
CN111563577A (en
Inventor
蒋晓悦
方阳
王鼎
李煜祥
冯晓毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202010319106.3A priority Critical patent/CN111563577B/en
Publication of CN111563577A publication Critical patent/CN111563577A/en
Application granted granted Critical
Publication of CN111563577B publication Critical patent/CN111563577B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/70 Denoising; Smoothing
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an intrinsic image decomposition method based on skip-layer frequency division and multi-scale discrimination with Unet. A Unet-based generative adversarial network is constructed, consisting of a generator and a discriminator: the generator decomposes an image into a reflection map and an illumination map, and the discriminator judges whether an image is real or generated and guides the generator toward outputs that are indistinguishable from real ones. The designed network effectively alleviates the problems caused by passing encoder features directly to the decoder. On the one hand, a frequency decomposition constraint is added to the skip connections of the reflection-map Unet, so that the network can learn the importance of different features and obtain more suitable feature maps. On the other hand, by adding frequency decomposition and channel compression to the skip connections of the illumination-map Unet, not only are more appropriate feature maps obtained, but the problem of excessive high-frequency components in the illumination map is also alleviated.

Description

Unet-based intrinsic image decomposition method for skip layer frequency division and multi-scale identification
Technical Field
The invention belongs to the field of image processing, and particularly relates to an intrinsic image decomposition method.
Background
Image recognition applications have appeared in many areas of daily life, such as face recognition, target tracking and autonomous driving. During imaging, however, environmental factors such as illumination intensity, the incident angle of light and shadow occlusion can degrade image quality, making recognition more difficult and less accurate. One way to address this problem is to extract from the image features that do not change with environmental factors, namely intrinsic images. An intrinsic image captures the inherent properties of an object that are independent of the environment, including color, texture and material; these properties do not vary with environmental changes. If the intrinsic information of an object (color, texture, material, etc.) can be separated from the environment and the environment-dependent part of the image filtered out, a more accurate description of the object can be obtained. Intrinsic image decomposition extracts these inherent features by decomposing an image into a reflection map, carrying texture, color and material, and an illumination map, carrying shape and lighting information. The reflection map does not change with the environment, so the decomposed reflection map can serve as input to other image understanding tasks, greatly reducing the difficulty of image analysis and giving image understanding illumination-invariant robustness.
Intrinsic image decomposition algorithms fall mainly into two categories: the first is intrinsic image decomposition based on Retinex theory, and the second is intrinsic image decomposition based on deep learning.
Intrinsic image decomposition based on Retinex theory decomposes an image according to its local gradient changes: large gradient changes are assumed to be caused by different materials on the object surface, that is, different colors due to material reflectance, while small gradient changes are assumed to be caused by illumination. This theory assumes that illumination varies slowly and uniformly, without abrupt changes. In reality, however, occlusion and similar effects can cause abrupt illumination changes and therefore large illumination gradients, so the Retinex assumption no longer holds.
Intrinsic image decomposition methods based on deep learning largely overcome the above problems, but they do not consider the frequency properties of the intrinsic images when designing the network structure. On the one hand, the feature maps passed from the encoder to the reflection-map decoder through skip connections are not combined well, and some high-frequency features have too large an influence on the reflection map. On the other hand, the illumination map obtained after decomposition should contain fewer high-frequency components, yet the encoder passes high-frequency components directly through the skip connections, leaving considerable high-frequency noise in the illumination map.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides an intrinsic image decomposition method based on skip-layer frequency division and multi-scale discrimination with Unet. A Unet-based generative adversarial network is constructed, consisting of a generator and a discriminator: the generator decomposes an image into a reflection map and an illumination map, and the discriminator judges whether an image is real or generated and guides the generator toward outputs that are indistinguishable from real ones. The designed network effectively alleviates the problems caused by passing encoder features directly to the decoder. On the one hand, a frequency decomposition constraint is added to the skip connections of the reflection-map Unet, so that the network can learn the importance of different features and obtain more suitable feature maps. On the other hand, by adding frequency decomposition and channel compression to the skip connections of the illumination-map Unet, not only are more appropriate feature maps obtained, but the problem of excessive high-frequency components in the illumination map is also alleviated.
To achieve this purpose, the invention provides an intrinsic image decomposition method based on Unet skip-layer frequency division and multi-scale discrimination, comprising the following steps:
Step 1: construct a training image sample library
Randomly extract B images from the image data set; from each image, randomly sample M small images (patches) of size N x N and horizontally flip them to obtain another M small images, so that each image yields 2 x M small images; all 2M x B small images obtained from the B extracted images form the training image sample library;
step 2: structure generator
Step 2-1: structural reflection diagram generator
In the Unet network, adding a frequency decomposition submodule into each hop layer, wherein the input of the frequency decomposition submodule is a characteristic diagram output by an Unet network encoder, the output of the frequency decomposition submodule is a new characteristic diagram after frequency decomposition, and the new characteristic diagram is input to a decoder of the Unet network; the Unet network at this time is a constructed reflection map generator, and an image is input in the reflection map generator and is output as a reflection map of the input image;
the frequency decomposition sub-module performs the following frequency decomposition process:
Denote the feature map of the i-th layer of the Unet encoder as F_i, of size c x h x w, where c is the number of channels, h the height and w the width of the feature map. The feature map is first subjected to global max pooling:

F_max^i = GlobalMaxPool(F_i)   (1)

where F_max^i in formula (1) denotes the feature map obtained after the global max pooling operation;

the result of formula (1) is then passed through a fully connected layer FC_1, which compresses the number of channels of the feature map:

F_fc1^i = FC_1(F_max^i)   (2)

where F_fc1^i in formula (2) denotes the feature map obtained through fully connected layer FC_1;

the result of formula (2) is then passed through a ReLU activation function layer:

F_relu^i = ReLU(F_fc1^i)   (3)

where F_relu^i in formula (3) denotes the feature map obtained after the ReLU activation function layer;

the result of formula (3) is then passed through a fully connected layer FC_2, which restores the original number of channels of the feature map:

F_fc2^i = FC_2(F_relu^i)   (4)

where F_fc2^i in formula (4) denotes the feature map obtained through fully connected layer FC_2;

the result of formula (4) is then passed through a sigmoid activation function layer to obtain normalized weight parameters:

s^i = sigmoid(F_fc2^i)   (5)

finally, the normalized weight parameters s^i are multiplied with the i-th layer feature map F_i to obtain the new feature map after frequency decomposition:

F'_i = s^i ⊙ F_i   (6)

where ⊙ denotes channel-wise multiplication.
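For clarity, the following is a minimal PyTorch sketch of one possible implementation of the frequency decomposition submodule described by formulas (1) to (6); the module and variable names and the channel reduction ratio are illustrative assumptions, not values fixed by the text.

```python
import torch
import torch.nn as nn

class FrequencyDecomposition(nn.Module):
    """Skip-layer frequency decomposition submodule (formulas (1)-(6)).

    Re-weights the channels of an encoder feature map before it is passed
    through the skip connection to the decoder.
    """
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveMaxPool2d(1)                     # (1) global max pooling
        self.fc1 = nn.Linear(channels, channels // reduction)   # (2) compress channels
        self.relu = nn.ReLU(inplace=True)                       # (3)
        self.fc2 = nn.Linear(channels // reduction, channels)   # (4) restore channels
        self.sigmoid = nn.Sigmoid()                             # (5) normalized weights

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = f.shape
        s = self.pool(f).view(b, c)                             # (1)
        s = self.relu(self.fc1(s))                              # (2)-(3)
        s = self.sigmoid(self.fc2(s))                           # (4)-(5)
        return f * s.view(b, c, 1, 1)                           # (6) channel-wise re-weighting


if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 32)                  # example encoder feature map
    print(FrequencyDecomposition(64)(x).shape)      # torch.Size([2, 64, 32, 32])
```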
step 2-2: structured illumination pattern generator
In the Unet network, adding a frequency decomposition submodule and a channel compression submodule to each hop layer; the input of the frequency decomposition submodule is a characteristic diagram output by the Unet network encoder, the frequency decomposition submodule carries out frequency decomposition on the characteristic diagram according to formulas (1) to (6) in the step 2-1 and outputs a new characteristic diagram to the channel compression submodule; the channel compression submodule of each jump layer sets different compression ratios according to the position of the layer at the Unet network encoder to perform channel compression and outputs a final characteristic diagram to a decoder of the Unet network; the Unet network at this time is a constructed illumination map generator, and an image is input in the illumination map generator and output as an illumination map of the input image;
Step 3: construct the discriminator
The discriminator consists of a four-layer convolutional neural network; when the reflection map generator or the illumination map generator is trained, the reflection map or illumination map it outputs is input to the discriminator, which compares the input map with the label image and outputs the probability that the two are consistent;
the reflection map generator and the illumination map generator are each combined with a discriminator so that the corresponding generator can be trained;
Step 4: define the loss functions
Step 4-1: define the generator loss function as:
L_G = L_GAN-G + L_mse + L_cos + L_bf + L_feat   (7)
where L_GAN-G denotes the inherent (adversarial) loss function, L_mse the mean square error function, L_cos the cosine loss function, L_bf the cross bilateral filtering loss function, and L_feat the feature loss function;
The inherent (adversarial) loss function L_GAN-G is calculated according to formula (8), where W_i denotes the normalization weight of the i-th network layer, i denotes the layer index, fake_output_i denotes the probability that the output image is fake, ones denotes a probability of 1, and x denotes the number of network layers;
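As an illustration only, the sketch below shows one common way such a per-layer weighted adversarial objective is written in PyTorch; the mean-squared (LSGAN-style) comparison and the name layer_weights are assumptions, since formula (8) is given only as an image.

```python
import torch

def inherent_loss_g(fake_outputs, layer_weights):
    """Assumed LSGAN-style generator loss, weighted per discriminator layer.

    fake_outputs: list of per-layer discriminator outputs for the generated image.
    layer_weights: list of W_i values (e.g. [4, 1, 1, 4] as in the embodiment).
    """
    loss = 0.0
    for w, out in zip(layer_weights, fake_outputs):
        # push each layer's output toward the all-ones ("real") target
        loss = loss + w * torch.mean((out - torch.ones_like(out)) ** 2)
    return loss
```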
The mean square error function L_mse is calculated according to formula (9), where fake_image_i denotes the output generated from the i-th-from-last layer of the decoder and true_image_j denotes the image label obtained by scaling the input image by a factor of j;
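A minimal sketch of the multi-scale mean square error term, assuming the decoder outputs are compared with correspondingly downscaled labels and combined with the scale weights 1, 0.8 and 0.6 mentioned in the embodiment; formula (9) itself is given only as an image.

```python
import torch
import torch.nn.functional as F

def multiscale_mse(fake_images, label, scale_weights=(1.0, 0.8, 0.6)):
    """fake_images: outputs at full, 1/2 and 1/4 resolution (last three decoder layers)."""
    loss = 0.0
    for w, fake in zip(scale_weights, fake_images):
        # resize the label to the resolution of this decoder output
        target = F.interpolate(label, size=fake.shape[-2:], mode="bilinear",
                               align_corners=False)
        loss = loss + w * F.mse_loss(fake, target)
    return loss
```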
The cosine loss function L_cos is calculated according to formula (10), where fake_region_k denotes the k-th region of the generated image, true_region_k denotes the k-th region of the label image, and y denotes the number of image regions;
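A sketch of one possible region-wise cosine term, assuming the image is split into a 2x2 grid of blocks (y = 4, as in the embodiment) and each block is compared with the corresponding label block; formula (10) is given only as an image, so the exact form is an assumption.

```python
import torch
import torch.nn.functional as F

def region_cosine_loss(fake, label, grid=2):
    """Penalizes low cosine similarity between corresponding image regions."""
    b, c, h, w = fake.shape
    loss, count = 0.0, 0
    for i in range(grid):
        for j in range(grid):
            fr = fake[:, :, i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
            tr = label[:, :, i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
            sim = F.cosine_similarity(fr.reshape(b, -1), tr.reshape(b, -1), dim=1)
            loss = loss + (1.0 - sim).mean()
            count += 1
    return loss / count
```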
The cross bilateral filtering loss function L_bf is calculated according to formula (11); the output of the bilateral filter and its normalization weight are

J_p = (1 / W_p) * Σ_{q ∈ N(p)} G_s(||p - q||) * G_r(||C_p - C_q||) * C_q   (12)

W_p = Σ_{q ∈ N(p)} G_s(||p - q||) * G_r(||C_p - C_q||)   (13)

where bf denotes bilateral filtering, C denotes the label image, {A, S} denotes the set consisting of the reflection map and the illumination map, J_p denotes the output of the bilateral filter at the p-th pixel, C_p denotes the value of the p-th pixel of the label image, N_p denotes the total number of the pixel p and its neighboring pixels, W_p denotes the normalization weight, q denotes the index of a neighboring pixel of p, N(p) denotes the set of neighboring pixel positions of the p-th pixel, G_s denotes the spatial Gaussian kernel, p denotes the index of the p-th pixel, G_r denotes the range Gaussian kernel, and C_q denotes the value of the neighboring pixel q;
The feature loss function L_feat is calculated according to formula (14), where l denotes the l-th layer of the network, F_l denotes the number of channels of the l-th layer feature map, H_l denotes the height of the l-th layer feature map, W_l denotes the width of the l-th layer feature map, φ_l denotes the feature activation value of the l-th layer, and ŷ denotes the generator output image;
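A hedged sketch of a feature (perceptual) loss of the kind formula (14) describes, using a pretrained VGG16 as the feature extractor; the choice of VGG16 and of which layers to compare are assumptions, since the patent does not name the feature network.

```python
import torch
import torch.nn as nn
from torchvision import models

class FeatureLoss(nn.Module):
    """Compares feature activations of the generated image and the label image,
    normalized by channels x height x width of each layer (in the spirit of formula (14))."""
    def __init__(self, layer_ids=(3, 8, 15, 22)):  # assumed VGG16 ReLU layer indices
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        self.layer_ids = set(layer_ids)

    def forward(self, fake, label):
        loss, x, y = 0.0, fake, label
        for idx, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if idx in self.layer_ids:
                f, h, w = x.shape[1], x.shape[2], x.shape[3]
                loss = loss + torch.sum((x - y) ** 2) / (f * h * w)
            if idx >= max(self.layer_ids):
                break
        return loss
```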
Step 4-2: the discriminator loss function is defined according to formula (15), where zeros denotes a probability of 0 and real_output_i denotes the probability that the output image is real;
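For illustration, a sketch of a matching discriminator objective under the same LSGAN-style assumption as above: outputs for generated images are pushed toward the zeros target and outputs for real (label) images toward ones; formula (15) is given only as an image, so the exact form and weighting are assumptions.

```python
import torch

def inherent_loss_d(fake_outputs, real_outputs, layer_weights):
    """fake_outputs / real_outputs: per-layer discriminator outputs for generated
    and label images; layer_weights: W_i values (e.g. [4, 1, 1, 4])."""
    loss = 0.0
    for w, fake, real in zip(layer_weights, fake_outputs, real_outputs):
        loss = loss + w * torch.mean(fake ** 2)                            # fake toward zeros
        loss = loss + w * torch.mean((real - torch.ones_like(real)) ** 2)  # real toward ones
    return loss
```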
Step 5: network training
Using the training image sample library constructed in step 1, train the combination of the reflection map generator and its discriminator and the combination of the illumination map generator and its discriminator separately; the network parameters are updated with the Adam optimization method, and training stops when the loss functions defined in step 4 reach their minimum, yielding the final reflection map generator and illumination map generator;
Step 6: input the original image to be processed into the reflection map generator or the illumination map generator obtained in step 5; the output image is the reflection map or the illumination map obtained by decomposing the original image.
The invention has the beneficial effects that: by adopting the Unet-based intrinsic image decomposition method with skip-layer frequency division and multi-scale discrimination, the problem that a plain skip connection treats all features of a feature map as equally important and the problem of high-frequency noise being introduced into the illumination map decoder are both alleviated.
Drawings
FIG. 1 is a flow chart of the intrinsic image decomposition method of the present invention.
Fig. 2 is a schematic diagram of the frequency decomposition submodule structure of the present invention.
FIG. 3 is a schematic diagram of a reflection map generator network architecture in accordance with the present invention.
Fig. 4 is a schematic diagram of the network architecture of the illumination map generator of the present invention.
Fig. 5 is a schematic diagram of the network structure of the discriminator of the present invention.
FIG. 6 is an illustration of the results of the method of the present invention, wherein FIG. 6(a) is the original image, FIG. 6(b) is the reflectance map, and FIG. 6(c) is the illumination map.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
As shown in fig. 1, the present invention provides an intrinsic image decomposition method based on Unet skip-layer frequency division and multi-scale discrimination, which comprises the following steps:
Step 1: construct a training image sample library
Randomly extract B images from the image data set; from each image, randomly sample M small images (patches) of size N x N and horizontally flip them to obtain another M small images, so that each image yields 2 x M small images; all 2M x B small images obtained from the B extracted images form the training image sample library;
step 2: structure generator
Step 2-1: structural reflection diagram generator
In the Unet network, adding a frequency decomposition submodule into each hop layer, wherein the input of the frequency decomposition submodule is a characteristic diagram output by an Unet network encoder, the output of the frequency decomposition submodule is a new characteristic diagram after frequency decomposition, and the new characteristic diagram is input to a decoder of the Unet network; the Unet network at this time is a constructed reflection map generator, and an image is input in the reflection map generator and is output as a reflection map of the input image;
the frequency decomposition sub-module performs the following frequency decomposition process:
Denote the feature map of the i-th layer of the Unet encoder as F_i, of size c x h x w, where c is the number of channels, h the height and w the width of the feature map. The feature map is first subjected to global max pooling:

F_max^i = GlobalMaxPool(F_i)   (1)

where F_max^i in formula (1) denotes the feature map obtained after the global max pooling operation;

the result of formula (1) is then passed through a fully connected layer FC_1, which compresses the number of channels of the feature map:

F_fc1^i = FC_1(F_max^i)   (2)

where F_fc1^i in formula (2) denotes the feature map obtained through fully connected layer FC_1;

the result of formula (2) is then passed through a ReLU activation function layer:

F_relu^i = ReLU(F_fc1^i)   (3)

where F_relu^i in formula (3) denotes the feature map obtained after the ReLU activation function layer;

the result of formula (3) is then passed through a fully connected layer FC_2, which restores the original number of channels of the feature map:

F_fc2^i = FC_2(F_relu^i)   (4)

where F_fc2^i in formula (4) denotes the feature map obtained through fully connected layer FC_2;

the result of formula (4) is then passed through a sigmoid activation function layer to obtain normalized weight parameters:

s^i = sigmoid(F_fc2^i)   (5)

finally, the normalized weight parameters s^i are multiplied with the i-th layer feature map F_i to obtain the new feature map after frequency decomposition:

F'_i = s^i ⊙ F_i   (6)

where ⊙ denotes channel-wise multiplication.
step 2-2: structured illumination pattern generator
In the Unet network, adding a frequency decomposition submodule and a channel compression submodule to each hop layer; the input of the frequency decomposition submodule is a characteristic diagram output by the Unet network encoder, the frequency decomposition submodule carries out frequency decomposition on the characteristic diagram according to formulas (1) to (6) in the step 2-1 and outputs a new characteristic diagram to the channel compression submodule; the channel compression submodule of each jump layer sets different compression ratios according to the position of the layer at the Unet network encoder to perform channel compression and outputs a final characteristic diagram to a decoder of the Unet network; the Unet network at this time is a constructed illumination map generator, and an image is input in the illumination map generator and output as an illumination map of the input image;
Step 3: construct the discriminator
The discriminator consists of a four-layer convolutional neural network; when the reflection map generator or the illumination map generator is trained, the reflection map or illumination map it outputs is input to the discriminator, which compares the input map with the label image and outputs the probability that the two are consistent;
the reflection map generator and the illumination map generator are each combined with a discriminator so that the corresponding generator can be trained;
Step 4: define the loss functions
Step 4-1: define the generator loss function as:
L_G = L_GAN-G + L_mse + L_cos + L_bf + L_feat   (7)
where L_GAN-G denotes the inherent (adversarial) loss function, L_mse the mean square error function, L_cos the cosine loss function, L_bf the cross bilateral filtering loss function, and L_feat the feature loss function;
The inherent (adversarial) loss function L_GAN-G is calculated according to formula (8), where W_i denotes the normalization weight of the i-th network layer, i denotes the layer index, fake_output_i denotes the probability that the output image is fake, ones denotes a probability of 1, and x denotes the number of network layers;
The mean square error function L_mse is calculated according to formula (9), where fake_image_i denotes the output generated from the i-th-from-last layer of the decoder and true_image_j denotes the image label obtained by scaling the input image by a factor of j;
The cosine loss function L_cos is calculated according to formula (10), where fake_region_k denotes the k-th region of the generated image, true_region_k denotes the k-th region of the label image, and y denotes the number of image regions;
The cross bilateral filtering loss function L_bf is calculated according to formula (11); the output of the bilateral filter and its normalization weight are

J_p = (1 / W_p) * Σ_{q ∈ N(p)} G_s(||p - q||) * G_r(||C_p - C_q||) * C_q   (12)

W_p = Σ_{q ∈ N(p)} G_s(||p - q||) * G_r(||C_p - C_q||)   (13)

where bf denotes bilateral filtering, C denotes the label image, {A, S} denotes the set consisting of the reflection map and the illumination map, J_p denotes the output of the bilateral filter at the p-th pixel, C_p denotes the value of the p-th pixel of the label image, N_p denotes the total number of the pixel p and its neighboring pixels, W_p denotes the normalization weight, q denotes the index of a neighboring pixel of p, N(p) denotes the set of neighboring pixel positions of the p-th pixel, G_s denotes the spatial Gaussian kernel, p denotes the index of the p-th pixel, G_r denotes the range Gaussian kernel, and C_q denotes the value of the neighboring pixel q;
The feature loss function L_feat is calculated according to formula (14), where l denotes the l-th layer of the network, F_l denotes the number of channels of the l-th layer feature map, H_l denotes the height of the l-th layer feature map, W_l denotes the width of the l-th layer feature map, φ_l denotes the feature activation value of the l-th layer, and ŷ denotes the generator output image;
Step 4-2: the discriminator loss function is defined according to formula (15), where zeros denotes a probability of 0 and real_output_i denotes the probability that the output image is real;
Step 5: network training
Using the training image sample library constructed in step 1, train the combination of the reflection map generator and its discriminator and the combination of the illumination map generator and its discriminator separately; the network parameters are updated with the Adam optimization method, and training stops when the loss functions defined in step 4 reach their minimum, yielding the final reflection map generator and illumination map generator;
Step 6: input the original image to be processed into the reflection map generator or the illumination map generator obtained in step 5; the output image is the reflection map or the illumination map obtained by decomposing the original image.
Example:
(1) constructing a training image sample library
The MPI image data set is adopted. The commonly used scenes of the MPI data set fall into 9 major categories, each major category contains two subclasses, and each subclass contains 50 pictures. Two splitting modes are used when constructing the training image sample library: an image-split mode and a scene-split mode.
In the image-split mode, half of the images of each of the 18 subclasses of the data set are extracted; each image has size 1024x436, 10 small images of size 256x256 are randomly sampled from it, and these are then horizontally flipped, so each image yields 20 small images. The training data set therefore contains 9000 (18x25x20) small images of size 256x256, and the test data set uses 450 (18x25) full images of size 1024x436.
In the scene-split mode, one subclass of each major category is used for training and the other for testing; two defective subclasses, "bandage_1" and "shaman_3", are removed, and small images are obtained in the same way as in the image-split mode, giving 9000 (9x50x20) small images of size 256x256 in the training data set and 350 (7x50) full images of size 1024x436 in the test data set.
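The patch extraction described above (random 256x256 crops plus horizontal flips) can be sketched as follows; the file path handling and the use of Pillow/NumPy are assumptions for illustration.

```python
import random
import numpy as np
from PIL import Image

def sample_patches(image_path, num_patches=10, size=256):
    """Returns num_patches random crops plus their horizontal flips (2*num_patches total)."""
    img = np.array(Image.open(image_path))          # e.g. a 436x1024x3 MPI frame
    h, w = img.shape[:2]
    patches = []
    for _ in range(num_patches):
        top = random.randint(0, h - size)
        left = random.randint(0, w - size)
        patch = img[top:top + size, left:left + size]
        patches.append(patch)
        patches.append(patch[:, ::-1].copy())       # horizontal flip
    return patches
```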
(2) As shown in figs. 2 to 5, the reflection map generator and the illumination map generator are constructed by the method of step 2; the Unet network used has four layers. The encoder of the reflection map generator's Unet uses a convolutional layer, a batch normalization layer and a LeakyReLU activation function layer as its downsampling block; the stride of the convolutional layer is 2, so the feature map size is halved after each convolution operation. The output of each activation function layer in the encoder is fed through the skip connection into a frequency decomposition submodule, and the channel progression of the encoder is [3, 32, 64, 128, 256]. The output of the frequency decomposition submodule is fed to the decoder, whose convolutional layers have stride 1.
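A sketch of the downsampling block described above (stride-2 convolution, batch normalization, LeakyReLU) with the encoder channel progression [3, 32, 64, 128, 256]; the kernel size and LeakyReLU slope are assumptions.

```python
import torch.nn as nn

def down_block(in_ch, out_ch):
    """Stride-2 convolution halves the spatial size, as in the reflection-map encoder."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

# assumed encoder: channels 3 -> 32 -> 64 -> 128 -> 256, each level feeding a skip connection
encoder = nn.ModuleList(
    [down_block(i, o) for i, o in [(3, 32), (32, 64), (64, 128), (128, 256)]]
)
```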
The channel compression submodule of the illumination map generator is implemented with a convolutional layer of stride 1, so the feature map size is unchanged while the number of channels is compressed by different ratios: because the illumination map should contain few high-frequency components, skip layers carrying mostly high-frequency features use a large compression ratio, while layers carrying mostly low-frequency features use a small compression ratio.
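The channel compression submodule can be sketched as a stride-1 1x1 convolution that keeps the spatial size and reduces the channel count by a layer-dependent ratio; the 1x1 kernel and the example ratios below are assumptions, not values given in the text.

```python
import torch.nn as nn

def channel_compression(in_ch, ratio):
    """Stride-1 convolution: spatial size unchanged, channels reduced by `ratio`."""
    return nn.Conv2d(in_ch, max(in_ch // ratio, 1), kernel_size=1, stride=1)

# example: compress high-frequency (shallow) skip layers more strongly than deep ones
compress = nn.ModuleList([
    channel_compression(32, 8),    # shallow layer, mostly high frequency: large ratio
    channel_compression(64, 4),
    channel_compression(128, 2),
    channel_compression(256, 1),   # deep layer, mostly low frequency: small ratio
])
```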
(3) The discriminator is a four-layer convolutional neural network whose convolutional layers have stride 2, so the feature map size is halved after each convolutional layer; the channel progressions of the four convolutional layers are 3 to 64, 64 to 128, 128 to 256, and 256 to 512, and the output of each convolutional layer is additionally compressed into a single-channel feature probability map. When the discriminator judges an image as real, all single-channel feature probability maps should be close to 1; when it judges an image as fake, they should be close to 0.
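A sketch of the four-layer discriminator described above, with stride-2 convolutions, channel progression 3-64-128-256-512, and a single-channel probability map taken from each layer; kernel sizes, activations and the absence of normalization layers are assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleDiscriminator(nn.Module):
    """Four stride-2 conv layers; each layer also emits a 1-channel probability map."""
    def __init__(self):
        super().__init__()
        chans = [3, 64, 128, 256, 512]
        self.blocks = nn.ModuleList()
        self.heads = nn.ModuleList()
        for in_ch, out_ch in zip(chans[:-1], chans[1:]):
            self.blocks.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 4, stride=2, padding=1),
                nn.LeakyReLU(0.2, inplace=True)))
            self.heads.append(nn.Sequential(
                nn.Conv2d(out_ch, 1, 3, padding=1),
                nn.Sigmoid()))

    def forward(self, x):
        probs = []
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            probs.append(head(x))   # single-channel feature probability map per layer
        return probs
```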
(4) The generator loss function is calculated according to step 4; in the inherent (adversarial) loss, the weights of the first and last layers of the generator network are set to 4 and the weights of the two middle layers are set to 1.
When the mean square error is calculated, the feature maps of the last three layers of the decoder are taken to generate full-, half- and quarter-resolution versions of the original image, so that three different scales are constrained; the weights of the three scales are 1, 0.8 and 0.6, respectively.
When the cosine loss is calculated, in order to better preserve edge features and keep the edges of the generated image consistent with those of the label image, the input image is divided into 4 blocks and the cosine similarity between each block and the corresponding label block is enforced.
When the discriminator loss function is calculated, the weights of the first and last layers are 4 and the weights of the two middle layers are 1.
(5) Training uses samples from the training image sample library; the reflection map and the illumination map use separate generators and discriminators with identical network structures and are trained separately. The networks are optimized with the Adam method; the generator and the discriminator require different Adam optimizers, with optimizer parameters beta set to (0.5, 0.999), learning rate 0.0005, weight_decay 0.0001, and batch size 20. The generator and the discriminator are trained alternately (TTUR), with the discriminator trained 5 times for every generator update.
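The optimizer configuration and the alternating schedule described above (5 discriminator updates per generator update) might be wired up as in the following sketch; the generator, discriminator, dataloader and loss callables are passed in as parameters and are only placeholders for the components defined earlier.

```python
import torch

def train(G, D, dataloader, loss_g, loss_d, d_steps_per_g=5):
    """Alternating training with the Adam settings given in the embodiment."""
    opt_g = torch.optim.Adam(G.parameters(), lr=0.0005, betas=(0.5, 0.999),
                             weight_decay=0.0001)
    opt_d = torch.optim.Adam(D.parameters(), lr=0.0005, betas=(0.5, 0.999),
                             weight_decay=0.0001)
    for step, (image, label) in enumerate(dataloader):    # batch size 20
        fake = G(image)
        # discriminator update
        opt_d.zero_grad()
        loss_d(D(fake.detach()), D(label)).backward()
        opt_d.step()
        # generator update once every d_steps_per_g steps (5 : 1 schedule)
        if step % d_steps_per_g == 0:
            opt_g.zero_grad()
            loss_g(D(fake), fake, label).backward()
            opt_g.step()
```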
(6) As shown in fig. 6, the original image to be processed is input into the reflection map generator or the illumination map generator, respectively, and the output image is the reflection map or the illumination map obtained by decomposing the original image.
To quantitatively evaluate the performance of the method of the invention, tests were performed on the MPI intrinsic image data set and compared with the algorithm provided in Fan et al., "Revisiting Deep Intrinsic Image Decompositions," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2018; the comparison results are shown in Table 1 (bold indicates the best index values).
TABLE 1 Performance indices of several intrinsic image decomposition methods
(Table 1 is provided as an image and is not reproduced here.)
As can be seen from Table 1, the method of the invention achieves the best performance on MSE, LMSE and DSSIM and improves markedly over existing methods on these indices, fully demonstrating the effectiveness and practicality of the method of the invention.

Claims (1)

1. An intrinsic image decomposition method based on Unet skip layer frequency division and multi-scale discrimination is characterized by comprising the following steps:
Step 1: construct a training image sample library
Randomly extract B images from the image data set; from each image, randomly sample M small images (patches) of size N x N and horizontally flip them to obtain another M small images, so that each image yields 2 x M small images; all 2M x B small images obtained from the B extracted images form the training image sample library;
step 2: structure generator
Step 2-1: structural reflection diagram generator
In the Unet network, adding a frequency decomposition submodule into each hop layer, wherein the input of the frequency decomposition submodule is a characteristic diagram output by an Unet network encoder, the output of the frequency decomposition submodule is a new characteristic diagram after frequency decomposition, and the new characteristic diagram is input to a decoder of the Unet network; the Unet network at this time is a constructed reflection map generator, and an image is input in the reflection map generator and is output as a reflection map of the input image;
the frequency decomposition sub-module performs the following frequency decomposition process:
Denote the feature map of the i-th layer of the Unet encoder as F_i, of size c x h x w, where c is the number of channels, h the height and w the width of the feature map. The feature map is first subjected to global max pooling:

F_max^i = GlobalMaxPool(F_i)   (1)

where F_max^i in formula (1) denotes the feature map obtained after the global max pooling operation;

the result of formula (1) is then passed through a fully connected layer FC_1, which compresses the number of channels of the feature map:

F_fc1^i = FC_1(F_max^i)   (2)

where F_fc1^i in formula (2) denotes the feature map obtained through fully connected layer FC_1;

the result of formula (2) is then passed through a ReLU activation function layer:

F_relu^i = ReLU(F_fc1^i)   (3)

where F_relu^i in formula (3) denotes the feature map obtained after the ReLU activation function layer;

the result of formula (3) is then passed through a fully connected layer FC_2, which restores the original number of channels of the feature map:

F_fc2^i = FC_2(F_relu^i)   (4)

where F_fc2^i in formula (4) denotes the feature map obtained through fully connected layer FC_2;

the result of formula (4) is then passed through a sigmoid activation function layer to obtain normalized weight parameters:

s^i = sigmoid(F_fc2^i)   (5)

finally, the normalized weight parameters s^i are multiplied with the i-th layer feature map F_i to obtain the new feature map after frequency decomposition:

F'_i = s^i ⊙ F_i   (6)

where ⊙ denotes channel-wise multiplication.
step 2-2: structured illumination pattern generator
In the Unet network, adding a frequency decomposition submodule and a channel compression submodule to each hop layer; the input of the frequency decomposition submodule is a characteristic diagram output by the Unet network encoder, the frequency decomposition submodule carries out frequency decomposition on the characteristic diagram according to formulas (1) to (6) in the step 2-1 and outputs a new characteristic diagram to the channel compression submodule; the channel compression submodule of each jump layer sets different compression ratios according to the position of the layer at the Unet network encoder to perform channel compression and outputs a final characteristic diagram to a decoder of the Unet network; the Unet network at this time is a constructed illumination map generator, and an image is input in the illumination map generator and output as an illumination map of the input image;
Step 3: construct the discriminator
The discriminator consists of a four-layer convolutional neural network; when the reflection map generator or the illumination map generator is trained, the reflection map or illumination map it outputs is input to the discriminator, which compares the input map with the label image and outputs the probability that the two are consistent;
the reflection map generator and the illumination map generator are each combined with a discriminator so that the corresponding generator can be trained;
Step 4: define the loss functions
Step 4-1: define the generator loss function as:
L_G = L_GAN-G + L_mse + L_cos + L_bf + L_feat   (7)
where L_GAN-G denotes the inherent (adversarial) loss function, L_mse the mean square error function, L_cos the cosine loss function, L_bf the cross bilateral filtering loss function, and L_feat the feature loss function;
The inherent (adversarial) loss function L_GAN-G is calculated according to formula (8), where W_i denotes the normalization weight of the i-th network layer, i denotes the layer index, fake_output_i denotes the probability that the output image is fake, ones denotes a probability of 1, and x denotes the number of network layers;
The mean square error function L_mse is calculated according to formula (9), where fake_image_i denotes the output generated from the i-th-from-last layer of the decoder and true_image_j denotes the image label obtained by scaling the input image by a factor of j;
The cosine loss function L_cos is calculated according to formula (10), where fake_region_k denotes the k-th region of the generated image, true_region_k denotes the k-th region of the label image, and y denotes the number of image regions;
The cross bilateral filtering loss function L_bf is calculated according to formula (11); the output of the bilateral filter and its normalization weight are

J_p = (1 / W_p) * Σ_{q ∈ N(p)} G_s(||p - q||) * G_r(||C_p - C_q||) * C_q   (12)

W_p = Σ_{q ∈ N(p)} G_s(||p - q||) * G_r(||C_p - C_q||)   (13)

where bf denotes bilateral filtering, C denotes the label image, {A, S} denotes the set consisting of the reflection map and the illumination map, J_p denotes the output of the bilateral filter at the p-th pixel, C_p denotes the value of the p-th pixel of the label image, N_p denotes the total number of the pixel p and its neighboring pixels, W_p denotes the normalization weight, q denotes the index of a neighboring pixel of p, N(p) denotes the set of neighboring pixel positions of the p-th pixel, G_s denotes the spatial Gaussian kernel, p denotes the index of the p-th pixel, G_r denotes the range Gaussian kernel, and C_q denotes the value of the neighboring pixel q;
The feature loss function L_feat is calculated according to formula (14), where l denotes the l-th layer of the network, F_l denotes the number of channels of the l-th layer feature map, H_l denotes the height of the l-th layer feature map, W_l denotes the width of the l-th layer feature map, φ_l denotes the feature activation value of the l-th layer, and ŷ denotes the generator output image;
Step 4-2: the discriminator loss function is defined according to formula (15), where zeros denotes a probability of 0 and real_output_i denotes the probability that the output image is real;
Step 5: network training
Using the training image sample library constructed in step 1, train the combination of the reflection map generator and its discriminator and the combination of the illumination map generator and its discriminator separately; the network parameters are updated with the Adam optimization method, and training stops when the loss functions defined in step 4 reach their minimum, yielding the final reflection map generator and illumination map generator;
Step 6: input the original image to be processed into the reflection map generator or the illumination map generator obtained in step 5; the output image is the reflection map or the illumination map obtained by decomposing the original image.
CN202010319106.3A 2020-04-21 2020-04-21 Unet-based intrinsic image decomposition method for skip layer frequency division and multi-scale identification Active CN111563577B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010319106.3A CN111563577B (en) 2020-04-21 2020-04-21 Unet-based intrinsic image decomposition method for skip layer frequency division and multi-scale identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010319106.3A CN111563577B (en) 2020-04-21 2020-04-21 Unet-based intrinsic image decomposition method for skip layer frequency division and multi-scale identification

Publications (2)

Publication Number Publication Date
CN111563577A CN111563577A (en) 2020-08-21
CN111563577B true CN111563577B (en) 2022-03-11

Family

ID=72071688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010319106.3A Active CN111563577B (en) 2020-04-21 2020-04-21 Unet-based intrinsic image decomposition method for skip layer frequency division and multi-scale identification

Country Status (1)

Country Link
CN (1) CN111563577B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112150432A (en) * 2020-09-22 2020-12-29 电子科技大学 Optical excitation infrared nondestructive testing method based on generation countermeasure network
CN113034353B (en) * 2021-04-09 2024-07-12 西安建筑科技大学 Intrinsic image decomposition method and system based on cross convolution neural network
CN113573047B (en) * 2021-07-16 2022-07-01 北京理工大学 Video quality evaluation method based on eigen-map decomposition and motion estimation

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101794389A (en) * 2009-12-30 2010-08-04 中国科学院计算技术研究所 Illumination pretreatment method of facial image
CN104598933A (en) * 2014-11-13 2015-05-06 上海交通大学 Multi-feature fusion based image copying detection method
CN105956995A (en) * 2016-04-19 2016-09-21 浙江大学 Face appearance editing method based on real-time video proper decomposition
CN106156503A (en) * 2016-07-05 2016-11-23 中国矿业大学 A kind of multi-scale entropy characterizing method of anchor system internal flaw distribution
CN106157264A (en) * 2016-06-30 2016-11-23 北京大学 Large area image uneven illumination bearing calibration based on empirical mode decomposition
CN108171741A (en) * 2017-12-22 2018-06-15 河南科技大学 A kind of image texture decomposition method based on adaptive multidirectional empirical mode decomposition
CN108805188A (en) * 2018-05-29 2018-11-13 徐州工程学院 A kind of feature based recalibration generates the image classification method of confrontation network
CN109249546A (en) * 2017-07-13 2019-01-22 长春工业大学 A kind of vibration rotary cutting apparatus and its Identification of Chatter method in place
CN110018517A (en) * 2019-05-07 2019-07-16 西安石油大学 A kind of multiple dimensioned ground micro-seismic inverse time interference localization method
CN110148083A (en) * 2019-05-17 2019-08-20 东南大学 Image interfusion method based on fast B EMD and deep learning
CN110503614A (en) * 2019-08-20 2019-11-26 东北大学 A kind of Magnetic Resonance Image Denoising based on sparse dictionary study
CN110675381A (en) * 2019-09-24 2020-01-10 西北工业大学 Intrinsic image decomposition method based on serial structure network
CN110728633A (en) * 2019-09-06 2020-01-24 上海交通大学 Multi-exposure high-dynamic-range inverse tone mapping model construction method and device

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101794389A (en) * 2009-12-30 2010-08-04 中国科学院计算技术研究所 Illumination pretreatment method of facial image
CN104598933A (en) * 2014-11-13 2015-05-06 上海交通大学 Multi-feature fusion based image copying detection method
CN105956995A (en) * 2016-04-19 2016-09-21 浙江大学 Face appearance editing method based on real-time video proper decomposition
CN106157264A (en) * 2016-06-30 2016-11-23 北京大学 Large area image uneven illumination bearing calibration based on empirical mode decomposition
CN106156503A (en) * 2016-07-05 2016-11-23 中国矿业大学 A kind of multi-scale entropy characterizing method of anchor system internal flaw distribution
CN109249546A (en) * 2017-07-13 2019-01-22 长春工业大学 A kind of vibration rotary cutting apparatus and its Identification of Chatter method in place
CN108171741A (en) * 2017-12-22 2018-06-15 河南科技大学 A kind of image texture decomposition method based on adaptive multidirectional empirical mode decomposition
CN108805188A (en) * 2018-05-29 2018-11-13 徐州工程学院 A kind of feature based recalibration generates the image classification method of confrontation network
CN110018517A (en) * 2019-05-07 2019-07-16 西安石油大学 A kind of multiple dimensioned ground micro-seismic inverse time interference localization method
CN110148083A (en) * 2019-05-17 2019-08-20 东南大学 Image interfusion method based on fast B EMD and deep learning
CN110503614A (en) * 2019-08-20 2019-11-26 东北大学 A kind of Magnetic Resonance Image Denoising based on sparse dictionary study
CN110728633A (en) * 2019-09-06 2020-01-24 上海交通大学 Multi-exposure high-dynamic-range inverse tone mapping model construction method and device
CN110675381A (en) * 2019-09-24 2020-01-10 西北工业大学 Intrinsic image decomposition method based on serial structure network

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
Intrinsic Image Decomposition for Image Enhancement;V.S. ASWATHY等;《2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI)》;20181203;20-23 *
Intrinsic Image Decomposition Using Multi-Scale Measurements and Sparsity;Shouhong Ding等;《COMPUTER GRAPHICS》;20171231;第36卷(第6期);251-261 *
Intrinsic Image Decomposition: A Comprehensive Review;Yupeng Ma等;《Image and Graphics》;20171230;626-638 *
Intrinsic Image Transformation via Scale Space Decomposition;Lechao Cheng等;《2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition》;20181217;656-665 *
Analysis and Research on Key Technologies of Facial Expression Image Recognition; Lu Yang; China Doctoral Dissertations Full-text Database, Information Science and Technology; 20200215; vol. 2020, no. 2; I138-75 *
Research on Image Fusion Algorithms Based on Multi-scale Analysis; Yin Xiang; China Master's Theses Full-text Database, Information Science and Technology; 20190115; vol. 2019, no. 1; I138-3382 *
Research on Image Inpainting Methods Based on Exemplars and Deep Learning; Qiang Zhenping; China Doctoral Dissertations Full-text Database, Information Science and Technology; 20200115; vol. 2020, no. 1; I138-121 *
Research on Face Image Classification Based on Sparse Representation and Discriminant Analysis Algorithms; Liu Zi; China Doctoral Dissertations Full-text Database, Information Science and Technology; 20180715; vol. 2018, no. 7; I138-41 *

Also Published As

Publication number Publication date
CN111563577A (en) 2020-08-21

Similar Documents

Publication Publication Date Title
Xie et al. Hyperspectral image super-resolution using deep feature matrix factorization
CN111563577B (en) Unet-based intrinsic image decomposition method for skip layer frequency division and multi-scale identification
Zhang et al. Regions of interest detection in panchromatic remote sensing images based on multiscale feature fusion
Xie et al. Deep convolutional networks with residual learning for accurate spectral-spatial denoising
Espinal et al. Wavelet-based fractal signature analysis for automatic target recognition
Starovoitov et al. Texture periodicity detection: Features, properties, and comparisons
Sahu et al. Trends and prospects of techniques for haze removal from degraded images: A survey
CN114863223B (en) Hyperspectral weak supervision classification method combining denoising autoencoder and scene enhancement
Zhang et al. Unleashing the power of self-supervised image denoising: A comprehensive review
CN114266957A (en) Hyperspectral image super-resolution restoration method based on multi-degradation mode data augmentation
CN114170418A (en) Automobile wire harness connector multi-feature fusion image retrieval method by searching images through images
Dumka et al. Advanced digital image processing and its applications in big data
Hussain et al. Image denoising to enhance character recognition using deep learning
CN107133579A (en) Based on CSGF (2D)2The face identification method of PCANet convolutional networks
Zhu et al. Rggid: A robust and green gan-fake image detector
CN117876793A (en) Hyperspectral image tree classification method and device
Wu et al. Review of imaging device identification based on machine learning
CN113902013A (en) Hyperspectral classification method based on three-dimensional convolutional neural network and superpixel segmentation
CN114463379A (en) Dynamic capturing method and device for video key points
Zhang et al. Infrared Small Target Detection Based on Four-Direction Overlapping Group Sparse Total Variation.
Shao et al. No-Reference image quality assessment based on edge pattern feature in the spatial domain
Gunawan et al. Classification of Japanese fagaceae wood based on microscopic image analysis
CN117612020B (en) SGAN-based detection method for resisting change of remote sensing image element of neural network
Maheswarreddy et al. Region of interest extraction based on hybrid salient detection for remote sensing image
Foucher et al. Global semantic classification of scenes using ridgelet transform

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant