CN111563577B - Intrinsic image decomposition method based on Unet skip-layer frequency division and multi-scale discrimination - Google Patents
- Publication number: CN111563577B (application CN202010319106.3A)
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06N3/045: Combinations of networks
- G06N3/048: Activation functions
- G06N3/08: Learning methods
- G06T5/70: Denoising; Smoothing
- G06T2207/20081: Training; Learning
- G06T2207/20084: Artificial neural networks [ANN]
Abstract
The invention provides an intrinsic image decomposition method based on Unet skip-layer frequency division and multi-scale discrimination. A Unet-based generative adversarial network is constructed, consisting of a generator and a discriminator: the generator decomposes an image into a reflection map and an illumination map, and the discriminator judges whether a generated image is real, guiding the generator to produce images realistic enough to pass for real ones. The designed network effectively alleviates the problems caused by passing encoder features directly to the decoder. On the one hand, a frequency decomposition constraint added to the skip connections of the reflection-map Unet lets the network learn the importance of different features and obtain more suitable feature maps. On the other hand, adding frequency decomposition and channel compression to the skip connections of the illumination-map Unet not only yields more suitable feature maps but also resolves the problem of excess high-frequency components in the illumination map.
Description
Technical Field
The invention belongs to the field of image processing, and particularly relates to an intrinsic image decomposition method.
Background
Image recognition is now applied in many parts of daily life, such as face recognition, target tracking, and autonomous driving. During imaging, however, environmental factors such as illumination intensity, the incident angle of light, and shadow occlusion can degrade the image, making recognition harder and less accurate. One way to address this is to extract features that do not change with environmental factors, i.e., intrinsic images. An intrinsic image captures the inherent properties of an object independent of the environment, including color, texture, and material; these properties do not vary as the environment changes. If the intrinsic information of an object can be separated from the environment and the environment-dependent part of the image filtered out, a more accurate feature description of the object is obtained. Intrinsic image decomposition extracts these inherent features by decomposing an image into a reflection map carrying texture, color, and material, and an illumination map carrying shape and lighting information. Because the reflection map does not change with the environment, it can serve as input to other image understanding tasks, greatly reducing the difficulty of image analysis and making image understanding robust to illumination changes.
By algorithm type, intrinsic image decomposition methods fall into two classes: the first is based on Retinex theory, and the second on deep learning.
Intrinsic image decomposition based on Retinex theory decomposes an image according to its local gradient changes. Large gradient changes are assumed to be caused by different materials on the object's surface, i.e., differences in material reflectance produce large color changes and hence large gradients, while small gradient changes are attributed to illumination. The theory assumes that illumination varies slowly and uniformly, without abrupt changes. In reality, however, occlusion and shadows cause sudden illumination changes and thus large illumination gradients, so Retinex theory breaks down.
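The Retinex assumption described above can be sketched as a simple gradient-thresholding rule. The following toy example, a minimal NumPy sketch with an illustrative threshold value not taken from the patent, attributes large log-image gradients to reflectance and small ones to illumination:

```python
import numpy as np

def retinex_split(log_img, thresh=0.1):
    """Toy sketch of the classical Retinex assumption: large gradients of
    the log image are attributed to reflectance (material) changes, small
    gradients to slowly varying illumination. `thresh` is an illustrative
    threshold, not a value from the patent."""
    gx = np.diff(log_img, axis=1)
    gy = np.diff(log_img, axis=0)
    refl_gx = np.where(np.abs(gx) > thresh, gx, 0.0)   # reflectance gradients
    illum_gx = gx - refl_gx                            # illumination gradients
    refl_gy = np.where(np.abs(gy) > thresh, gy, 0.0)
    illum_gy = gy - refl_gy
    return (refl_gx, refl_gy), (illum_gx, illum_gy)
```

A shadow edge produces a large gradient that this rule misclassifies as reflectance, which is exactly the failure mode described above.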
Intrinsic image decomposition methods based on deep learning largely avoid these problems, but their network designs ignore the frequency properties of the intrinsic images. On the one hand, the feature maps that the encoder passes to the reflection-map decoder through skip connections are not combined well, and some high-frequency features have an outsized influence on the reflection map. On the other hand, the illumination map obtained after decomposition should contain few high-frequency components, yet the encoder transmits them directly through the skip connections, leaving considerable high-frequency noise in the illumination map.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides an intrinsic image decomposition method based on Unet skip-layer frequency division and multi-scale discrimination. A Unet-based generative adversarial network is constructed, consisting of a generator and a discriminator: the generator decomposes an image into a reflection map and an illumination map, and the discriminator judges whether a generated image is real, guiding the generator to produce images realistic enough to pass for real ones. The designed network effectively alleviates the problems caused by passing encoder features directly to the decoder. On the one hand, a frequency decomposition constraint added to the skip connections of the reflection-map Unet lets the network learn the importance of different features and obtain more suitable feature maps. On the other hand, adding frequency decomposition and channel compression to the skip connections of the illumination-map Unet not only yields more suitable feature maps but also resolves the problem of excess high-frequency components in the illumination map.
In order to achieve this purpose, the invention provides an intrinsic image decomposition method based on Unet skip-layer frequency division and multi-scale discrimination, comprising the following steps:
Step 1: Construct a training image sample library
Randomly select B images from a test image data set; from each image randomly sample M patches of size N×N and horizontally flip them to obtain M new patches, giving 2·M patches per image; the 2·M·B patches obtained from the B images form the training image sample library;
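A minimal sketch of this sample-library construction, assuming NumPy arrays for images (function and parameter names are illustrative, not from the patent):

```python
import numpy as np

def build_sample_library(images, m=10, n=256, seed=0):
    """Sketch of step 1: from each image, sample m random n-by-n patches
    and add their horizontal flips, yielding 2*m patches per image.
    (Function and parameter names are illustrative, not from the patent.)"""
    rng = np.random.default_rng(seed)
    library = []
    for img in images:  # img: (H, W, C) array with H, W >= n
        h, w = img.shape[:2]
        for _ in range(m):
            top = rng.integers(0, h - n + 1)
            left = rng.integers(0, w - n + 1)
            patch = img[top:top + n, left:left + n]
            library.append(patch)
            library.append(patch[:, ::-1])  # horizontal flip
    return library
```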
Step 2: Construct the generators
Step 2-1: Construct the reflection map generator
In the Unet network, a frequency decomposition submodule is added to each skip layer. The input of the submodule is the feature map output by the Unet encoder; its output is the new, frequency-decomposed feature map, which is fed to the Unet decoder. The resulting Unet network is the constructed reflection map generator: an image is input and its reflection map is output;
The frequency decomposition submodule performs the following process:
Define the feature map of the i-th layer of the Unet encoder as $F_i \in \mathbb{R}^{c\times h\times w}$, where $c$ is the number of channels, $h$ the height, and $w$ the width of the feature map. The feature map is first subjected to global max pooling:
$$F_i^{gp} = \mathrm{GlobalMaxPool}(F_i)\in\mathbb{R}^{c\times 1\times 1} \tag{1}$$
where $F_i^{gp}$ denotes the feature map obtained after the global max pooling operation;
the result of formula (1) is then passed through a fully connected layer $FC_1$, which compresses the number of channels of the feature map:
$$F_i^{fc_1} = FC_1(F_i^{gp})\in\mathbb{R}^{(c/r)\times 1\times 1} \tag{2}$$
where $r$ is the channel reduction ratio; the result of formula (2) then passes through a ReLU activation layer:
$$F_i^{relu} = \mathrm{ReLU}(F_i^{fc_1}) \tag{3}$$
where $F_i^{relu}$ is the feature map obtained after the ReLU activation layer;
the result of formula (3) is passed through a fully connected layer $FC_2$, which restores the original channel count of the feature map:
$$F_i^{fc_2} = FC_2(F_i^{relu})\in\mathbb{R}^{c\times 1\times 1} \tag{4}$$
then the result of formula (4) is passed through a sigmoid activation layer to obtain the normalized weight parameters:
$$W_i = \mathrm{sigmoid}(F_i^{fc_2}) \tag{5}$$
finally, the normalized weight parameters $W_i$ are multiplied channel-wise with the i-th layer feature map $F_i$ to obtain the new feature map after frequency decomposition:
$$\tilde F_i = W_i \otimes F_i \tag{6}$$
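The pipeline of formulas (1) through (6) can be sketched as a small NumPy function; the weight matrices stand in for the learned fully connected layers, and the symbol names and reduction ratio are illustrative assumptions:

```python
import numpy as np

def frequency_decompose(F, w1, w2):
    """Channel-attention-style frequency decomposition on an encoder
    feature map F of shape (c, h, w), following formulas (1)-(6).
    w1: (c//r, c) weights of FC1 (channel compression),
    w2: (c, c//r) weights of FC2 (channel restoration).
    Weight matrices and the reduction ratio r are illustrative."""
    c, h, w = F.shape
    gp = F.reshape(c, -1).max(axis=1)       # (1) global max pooling -> (c,)
    fc1 = w1 @ gp                           # (2) FC1 compresses channels
    relu = np.maximum(fc1, 0.0)             # (3) ReLU activation
    fc2 = w2 @ relu                         # (4) FC2 restores channels
    W = 1.0 / (1.0 + np.exp(-fc2))          # (5) sigmoid -> per-channel weights
    return W[:, None, None] * F             # (6) reweight the feature map
```

Because the sigmoid weights lie in (0, 1), each channel of the output is a damped copy of the input channel, which is how the module suppresses less important (e.g. overly high-frequency) features.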
step 2-2: structured illumination pattern generator
In the Unet network, a frequency decomposition submodule and a channel compression submodule are added to each skip layer. The input of the frequency decomposition submodule is the feature map output by the Unet encoder; it performs frequency decomposition according to formulas (1) to (6) of step 2-1 and passes the new feature map to the channel compression submodule. The channel compression submodule of each skip layer sets a different compression ratio according to the layer's position in the Unet encoder, performs channel compression, and outputs the final feature map to the Unet decoder. The resulting Unet network is the constructed illumination map generator: an image is input and its illumination map is output;
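A minimal sketch of such a channel compression submodule, modeling the stride-1 convolution as a 1x1 channel-mixing matrix (the random kernel stands in for learned weights; names are illustrative):

```python
import numpy as np

def channel_compress(F, ratio):
    """Sketch of the channel compression submodule: a stride-1, 1x1
    convolution that keeps the spatial size and shrinks the channel
    count by `ratio` (the per-skip-layer compression ratio)."""
    c, h, w = F.shape
    c_out = max(1, c // ratio)
    kernel = np.random.default_rng(0).standard_normal((c_out, c))  # 1x1 conv weights
    # a 1x1 convolution is just a matrix multiply over the channel axis
    out = np.tensordot(kernel, F, axes=([1], [0]))   # -> (c_out, h, w)
    return out
```

Fewer output channels at skip layers that carry mainly high-frequency features gives the decoder less high-frequency content to reintroduce into the illumination map.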
Step 3: Construct the discriminator
The discriminator consists of a four-layer convolutional neural network. When the reflection map generator or the illumination map generator is trained, its output reflection map or illumination map is fed into the discriminator, which compares the input map against the label image and outputs the probability that the two are consistent;
the reflection map generator and the illumination map generator are each combined with the discriminator for training;
Step 4: Define the loss functions
Step 4-1: define the generator loss function as:
$$L_G = L_{GAN\text{-}G} + L_{mse} + L_{cos} + L_{bf} + L_{feat} \tag{7}$$
where $L_{GAN\text{-}G}$ denotes the adversarial loss of the generator, $L_{mse}$ the mean square error loss, $L_{cos}$ the cosine loss, $L_{bf}$ the cross-bilateral filtering loss, and $L_{feat}$ the feature loss;
the adversarial loss $L_{GAN\text{-}G}$ is computed as:
$$L_{GAN\text{-}G} = \sum_{i=1}^{x} W_i\,\mathrm{MSE}\bigl(fake\_output_i,\ ones\bigr) \tag{8}$$
where $W_i$ denotes the normalized weight of the $i$-th network layer, $i$ the layer index, $fake\_output_i$ the probability map that the output image is fake, $ones$ an all-ones map, and $x$ the number of network layers;
the mean square error loss $L_{mse}$ is computed as:
$$L_{mse} = \sum_{j} \lambda_j\,\mathrm{MSE}\bigl(fake\_image_j,\ true\_image_j\bigr) \tag{9}$$
where $fake\_image_j$ denotes the image generated from the $j$-th-from-last feature map of the decoder, $true\_image_j$ the image label obtained by scaling the input image by a factor of $j$, and $\lambda_j$ the per-scale weight;
the cosine loss $L_{cos}$ is computed as:
$$L_{cos} = \sum_{k=1}^{y}\bigl(1 - \cos(fake\_region_k,\ true\_region_k)\bigr) \tag{10}$$
where $fake\_region_k$ denotes the $k$-th block region of the generated image, $true\_region_k$ the $k$-th block region of the label image, and $y$ the number of image regions;
the cross-bilateral filtering loss $L_{bf}$ constrains each decomposed map in the set $\{A, S\}$ (the reflection map and the illumination map) against its cross-bilaterally filtered version bf, with the filter output given by:
$$J_p = \frac{1}{W_p}\sum_{q\in N(p)} G_{\sigma_s}\bigl(\|p-q\|\bigr)\, G_{\sigma_r}\bigl(|C_p - C_q|\bigr)\, C_q \tag{11}$$
where $C$ denotes the label image, $J_p$ the output of the bilateral filter at pixel $p$, $C_p$ the value of the $p$-th pixel of the label image, $W_p$ the normalization weight, $q$ the index of a neighboring pixel of $p$, $N(p)$ the set of neighboring pixel positions of the $p$-th pixel, $G_{\sigma_s}$ the spatial Gaussian kernel, $G_{\sigma_r}$ the range Gaussian kernel, and $C_q$ the value of neighboring pixel $q$;
the feature loss $L_{feat}$ is computed as:
$$L_{feat} = \sum_{l}\frac{1}{F_l H_l W_l}\,\bigl\|\phi_l(G(x)) - \phi_l(y)\bigr\|_2^2 \tag{12}$$
where $l$ denotes the $l$-th layer of the network, $F_l$ the number of channels of the $l$-th layer feature map, $H_l$ its height, $W_l$ its width, $\phi_l(\cdot)$ the feature activation of the $l$-th layer, $G(x)$ the generator output image, and $y$ the label image;
Step 4-2: the discriminator loss function is defined as:
$$L_D = \sum_{i=1}^{x} W_i\Bigl(\mathrm{MSE}\bigl(real\_output_i,\ ones\bigr) + \mathrm{MSE}\bigl(fake\_output_i,\ zeros\bigr)\Bigr) \tag{13}$$
where $zeros$ denotes an all-zeros map (probability 0) and $real\_output_i$ the probability map that the input image is real;
Step 5: Network training
Using the training image sample library constructed in step 1, train the combination of the reflection map generator and the discriminator and the combination of the illumination map generator and the discriminator separately; update the network parameters with the Adam optimization method and stop training when the loss function values defined in step 4 reach their minimum, obtaining the final reflection map generator and illumination map generator;
Step 6: Input the original image to be processed into the reflection map generator or illumination map generator obtained in step 5; the output image is the reflection map or illumination map decomposed from the original image.
The invention has the following beneficial effects: by adopting the intrinsic image decomposition method based on Unet skip-layer frequency division and multi-scale discrimination, it resolves both the problem that native skip connections treat every feature of the feature map equally and the problem of high-frequency noise being fed into the illumination map decoder.
Drawings
FIG. 1 is a flow chart of the intrinsic image decomposition method of the present invention.
Fig. 2 is a schematic diagram of the frequency decomposition submodule structure of the present invention.
FIG. 3 is a schematic diagram of a reflection map generator network architecture in accordance with the present invention.
Fig. 4 is a schematic diagram of the network architecture of the illumination map generator of the present invention.
Fig. 5 is a schematic diagram of the network structure of the discriminator of the present invention.
FIG. 6 is an illustration of the results of the method of the present invention, wherein FIG. 6(a) is the original image, FIG. 6(b) is the reflectance map, and FIG. 6(c) is the illumination map.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
As shown in fig. 1, the present invention provides an intrinsic image decomposition method based on Unet skip-layer frequency division and multi-scale discrimination, following the steps set forth above.
Example:
(1) Construct the training image sample library
The MPI image data set is used. The commonly used scenes of the MPI data set fall into 9 major categories, each major category containing two subcategories of 50 images each. Two split modes are used when constructing the training image sample library: image-split and scene-split.
In image-split mode, half of the images are taken from each of the 18 subcategories of the data set. Each image is 1024x436; 10 patches of size 256x256 are randomly sampled from each image and then horizontally flipped, so each image yields 20 patches. The training set totals 9000 (18x25x20) patches of size 256x256, and the test set uses 450 (18x25) full images of size 1024x436.
In scene-split mode, one subcategory from each major category is used for training and the other for testing; the two defective subcategories "bandage_1" and "shaman_3" are removed. Patches are acquired in the same way as in image-split mode, giving 9000 (9x50x20) training patches of size 256x256 and 350 (7x50) test images of size 1024x436.
(2) As shown in fig. 2 to 5, the reflection map generator and the illumination map generator are constructed by the method of step 2; the Unet used is a four-layer network. The encoder of the reflection map generator's Unet uses a convolution layer, a batch normalization layer and a LeakyReLU activation layer as its downsampling block; the convolution stride is 2, so the feature map size is halved after each convolution. The output of each activation layer in the encoder is fed through the skip connection into a frequency decomposition submodule, and the encoder channel progression is [3, 32, 64, 128, 256]. The output of the frequency decomposition submodule is fed to the decoder, whose convolution stride is 1.
The channel compression submodule of the illumination map generator is implemented with a convolution layer of stride 1, which leaves the feature map size unchanged and compresses the number of channels in different proportions. Since the illumination map contains few high-frequency components, skip layers carrying mainly high-frequency features use a large compression ratio, while those carrying the more abundant low-frequency features use a small one.
(3) The discriminator is a four-layer convolutional neural network whose convolutions have stride 2, so the feature map size is halved after each layer. The channel transitions of the four layers are 3 to 64, 64 to 128, 128 to 256 and 256 to 512, and the output of each convolution layer is compressed into a single-channel feature probability map. When the discriminator judges an image real, all single-channel probability maps should approach 1; when it judges it fake, they should approach 0.
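The shape bookkeeping of this four-layer discriminator can be checked with a short sketch; kernel size 4 and padding 1 are illustrative assumptions, since the embodiment states only the stride and channel counts:

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    """Output spatial size of one stride-2 convolution; kernel size 4 and
    padding 1 are illustrative assumptions (the embodiment states only
    the stride and the channel counts)."""
    return (size + 2 * pad - kernel) // stride + 1

channels = [3, 64, 128, 256, 512]
size = 256  # a 256x256 training patch
shapes = []
for c_in, c_out in zip(channels, channels[1:]):
    size = conv_out(size)
    shapes.append((c_out, size, size))
# shapes: [(64, 128, 128), (128, 64, 64), (256, 32, 32), (512, 16, 16)]
```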
(4) The generator loss function is computed according to step 4; in the adversarial loss, the weights of the first and last network layers are set to 4 and those of the two middle layers to 1.
When computing the mean square error, the feature maps of the last three decoder layers are used to generate full-, half- and quarter-resolution images, constraining three different scales with weights 1, 0.8 and 0.6 respectively.
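A minimal sketch of this multi-scale constraint; strided subsampling stands in for the label scaling, and the function name is illustrative:

```python
import numpy as np

def multiscale_mse(outputs, label, weights=(1.0, 0.8, 0.6)):
    """Sketch of the multi-scale MSE term: the full-, half- and
    quarter-resolution decoder outputs are each compared against a
    correspondingly downscaled label, with weights 1, 0.8 and 0.6.
    Strided subsampling is an illustrative stand-in for proper
    image scaling."""
    loss = 0.0
    for out, wgt in zip(outputs, weights):
        step = label.shape[0] // out.shape[0]
        small = label[::step, ::step]          # naive downscale of the label
        loss += wgt * np.mean((out - small) ** 2)
    return loss
```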
To better preserve edge features when computing the cosine loss and keep the edges of the generated image consistent with the label image, the input image is divided into 4 blocks, and the cosine similarity between each generated block and the corresponding label block is enforced.
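The block-wise cosine constraint can be sketched as follows; the exact aggregation over blocks is an assumption, since the text states only that each block's cosine similarity with its label block is enforced:

```python
import numpy as np

def cosine_block_loss(fake, true, blocks=2):
    """Sketch of the cosine loss: split both images into blocks x blocks
    regions (2x2 = 4 blocks in the embodiment) and penalize
    1 - cosine similarity per region pair, averaged over regions."""
    h, w = fake.shape[:2]
    bh, bw = h // blocks, w // blocks
    loss = 0.0
    for i in range(blocks):
        for j in range(blocks):
            f = fake[i*bh:(i+1)*bh, j*bw:(j+1)*bw].ravel()
            t = true[i*bh:(i+1)*bh, j*bw:(j+1)*bw].ravel()
            cos = f @ t / (np.linalg.norm(f) * np.linalg.norm(t) + 1e-8)
            loss += 1.0 - cos
    return loss / (blocks * blocks)
```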
When calculating the discriminator loss function, the weights of the first layer and the last layer are 4, and the weights of the two middle layers are 1.
(5) Training uses samples from the training image sample library. The reflection map and the illumination map use separate generators and discriminators of identical network structure, trained independently. The networks are optimized with the Adam method; the generator and discriminator require separate Adam optimizers, with parameters beta = (0.5, 0.999), learning rate 0.0005, weight_decay 0.0001 and batch size 20. The generator and discriminator are trained alternately (TTUR), with 5 discriminator updates per generator update.
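The 5-to-1 alternating update schedule can be sketched as a plain Python helper (names are illustrative):

```python
def training_schedule(total_iters, d_steps_per_g=5):
    """Sketch of the alternating schedule: per iteration the discriminator
    is updated d_steps_per_g times for every generator update (the 5:1
    ratio stated in the embodiment). Returns the update sequence."""
    seq = []
    for _ in range(total_iters):
        seq.extend(["D"] * d_steps_per_g)  # discriminator updates
        seq.append("G")                    # one generator update
    return seq
```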
(6) As shown in fig. 6, the original image to be processed is input into the reflection map generator or the illumination map generator, and the output image is the reflection map or the illumination map obtained by decomposing the original image.
To quantitatively evaluate the performance of the method of the invention, tests were performed on the MPI intrinsic image dataset and compared with the algorithm of Fan et al., "Revisiting Deep Intrinsic Image Decompositions," Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, IEEE, 2018. The comparison results are shown in table 1 (bold indicates the best value for each index).
TABLE 1 Performance indices of several intrinsic image decomposition methods
As can be seen from Table 1, the method of the invention achieves the best performance on MSE, LMSE and DSSIM, improving considerably on the existing method in every index, which fully demonstrates the effectiveness and practicability of the method of the invention.
Claims (1)
1. An intrinsic image decomposition method based on Unet skip layer frequency division and multi-scale discrimination is characterized by comprising the following steps:
step 1: constructing a training image sample library
Randomly extract B images from a test image data set; from each image, randomly sample M small images of size N×N and horizontally flip each of them to obtain another M small images, giving 2×M small images per image; all 2×M×B small images obtained from the B extracted images form the training image sample library;
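A minimal sketch of this sample-library construction, assuming images are NumPy arrays in H×W×3 layout:

```python
# Build the training sample library: M random N x N crops per image,
# plus the horizontal flip of each crop, giving 2*M*B patches in total.
import numpy as np

def build_sample_library(images, M, N, rng=None):
    """images: list of H x W x 3 arrays; returns (2*M*len(images), N, N, 3)."""
    rng = rng or np.random.default_rng(0)
    patches = []
    for img in images:
        H, W = img.shape[:2]
        for _ in range(M):
            y = rng.integers(0, H - N + 1)   # random top-left corner
            x = rng.integers(0, W - N + 1)
            crop = img[y:y + N, x:x + N]
            patches.append(crop)
            patches.append(crop[:, ::-1])    # horizontal flip of the crop
    return np.stack(patches)

lib = build_sample_library([np.zeros((32, 48, 3))] * 5, M=4, N=16)
print(lib.shape)  # (40, 16, 16, 3): 2 * 4 * 5 patches
```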
step 2: structure generator
Step 2-1: structural reflection diagram generator
In the Unet network, a frequency decomposition submodule is added to each skip layer; the input of the frequency decomposition submodule is a feature map output by the Unet encoder, its output is a new feature map after frequency decomposition, and this new feature map is fed to the decoder of the Unet network; the resulting Unet network is the constructed reflection map generator, which takes an image as input and outputs the reflection map of that image;
The frequency decomposition submodule performs the following frequency decomposition process:
Define the feature map of the i-th layer of the Unet encoder as $X_i \in \mathbb{R}^{c \times h \times w}$, where c is the number of channels, h the height and w the width of the feature map, and apply global max pooling to it:
$$X_i^{g} = \mathrm{GMP}(X_i) \qquad (1)$$
In formula (1), $X_i^{g} \in \mathbb{R}^{c}$ is the feature vector obtained after the global max pooling operation GMP;
The result of formula (1) is then passed through a fully connected layer $FC_1$, which compresses the number of channels of the feature map:
$$X_i^{fc_1} = FC_1(X_i^{g}) \qquad (2)$$
The result of formula (2) then passes through a ReLU activation function layer:
$$X_i^{relu} = \mathrm{ReLU}(X_i^{fc_1}) \qquad (3)$$
In formula (3), $X_i^{relu}$ is the feature vector obtained after the ReLU activation function layer;
The result of formula (3) is passed through a fully connected layer $FC_2$, which restores the initial number of channels of the feature map:
$$X_i^{fc_2} = FC_2(X_i^{relu}) \qquad (4)$$
The result of formula (4) is then processed by a sigmoid activation function layer to obtain normalized weight parameters:
$$w_i = \mathrm{sigmoid}(X_i^{fc_2}) \qquad (5)$$
Finally, the normalized weight parameters $w_i$ are multiplied channel-wise with the i-th layer feature map $X_i$ to obtain the new feature map after frequency decomposition:
$$\hat{X}_i = w_i \odot X_i \qquad (6)$$
step 2-2: structured illumination pattern generator
In the Unet network, a frequency decomposition submodule and a channel compression submodule are added to each skip layer; the input of the frequency decomposition submodule is a feature map output by the Unet encoder; it performs frequency decomposition on the feature map according to formulas (1) to (6) of step 2-1 and outputs a new feature map to the channel compression submodule; the channel compression submodule of each skip layer sets a different compression ratio according to the layer's position in the Unet encoder, performs channel compression and outputs the final feature map to the decoder of the Unet network; the resulting Unet network is the constructed illumination map generator, which takes an image as input and outputs the illumination map of that image;
and step 3: structure discriminator
The discriminator consists of a four-layer convolutional neural network; when the reflection map generator or the illumination map generator is trained, the reflection map or illumination map it outputs is input into the discriminator, which compares the input map with the label image and outputs the probability that the map is consistent with the label image;
The reflection map generator and the illumination map generator are each combined with a discriminator for training;
and 4, step 4: defining a loss function
Step 4-1: define the generator loss function as:
$$L_G = L_{GAN\text{-}G} + L_{mse} + L_{cos} + L_{bf} + L_{feat} \qquad (7)$$
where $L_{GAN\text{-}G}$ is the intrinsic (adversarial) loss function, $L_{mse}$ the mean square error function, $L_{cos}$ the cosine loss function, $L_{bf}$ the cross bilateral filtering loss function, and $L_{feat}$ the feature loss function;
The intrinsic loss function $L_{GAN\text{-}G}$ is calculated as follows:
$$L_{GAN\text{-}G} = \sum_{i=1}^{x} W_i \left\| fake\_output_i - ones \right\|^2 \qquad (8)$$
where $W_i$ is the normalization weight parameter of the i-th layer, i is the layer index, $fake\_output_i$ is the probability map that the output image is false, $ones$ is a map of all 1s, and x is the number of network layers;
mean square error function LmseThe calculation formula of (a) is as follows:
in the formula, fake _ imageiOutput, true _ image, representing the characteristic map of the i-last layer of the decoderjAn image tag representing scaling an input image by j times;
cosine loss function LcosThe calculation formula of (a) is as follows:
in the formula, fake _ regionkThe kth block region, true _ region, representing the generated imagekA k-th block area representing a label image, y representing the number of image areas;
cross bilateral filter loss function LbfThe calculation formula of (a) is as follows:
wherein bf represents double sideband filtering, C represents label image, { A, S } represents reflection map and illumination map set, JpRepresenting the output of the bilateral filter, CpValue, N, representing the p-th pixel of the label imagepDenotes the total number of p pixels and neighboring pixels, WpDenotes the normalized weight, q denotes the sequence number of the neighboring pixel of p, n (p) denotes the set of neighboring pixel positions of the p-th pixel,representing a spatial gaussian kernel, p represents the serial number of the p-th pixel,denotes the range Gaussian kernel, CqRepresents the value of the neighboring pixel q;
characteristic loss function LfeatThe calculation formula of (a) is as follows:
where l denotes the l-th layer of the network, FlNumber of channels, H, representing the characteristic diagram of the l-th layerlDenotes the height of the ith layer profile, W denotes the width of the ith layer profile,a feature activation value representing the ith layer of the image,a representation generator output image;
step 4-2: the discriminator loss function is defined as follows:
where zeros represents a probability of 0,representing the probability that the output image is true;
and 5: network training
Using the training image sample library constructed in step 1, train the combination of the reflection map generator and its discriminator and the combination of the illumination map generator and its discriminator respectively; update the network parameters with the Adam optimization method, and stop training when the loss function values defined in step 4 reach their minimum, obtaining the final reflection map generator and illumination map generator;
step 6: and inputting the original image to be processed into the step 5 to obtain a reflection map generator or an illumination map generator, wherein the output image is a reflection map or an illumination map obtained by decomposing the original image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010319106.3A CN111563577B (en) | 2020-04-21 | 2020-04-21 | Unet-based intrinsic image decomposition method for skip layer frequency division and multi-scale identification |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111563577A CN111563577A (en) | 2020-08-21 |
CN111563577B true CN111563577B (en) | 2022-03-11 |
Family
ID=72071688
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010319106.3A Active CN111563577B (en) | 2020-04-21 | 2020-04-21 | Unet-based intrinsic image decomposition method for skip layer frequency division and multi-scale identification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111563577B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112150432A (en) * | 2020-09-22 | 2020-12-29 | 电子科技大学 | Optical excitation infrared nondestructive testing method based on generation countermeasure network |
CN113034353B (en) * | 2021-04-09 | 2024-07-12 | 西安建筑科技大学 | Intrinsic image decomposition method and system based on cross convolution neural network |
CN113573047B (en) * | 2021-07-16 | 2022-07-01 | 北京理工大学 | Video quality evaluation method based on eigen-map decomposition and motion estimation |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101794389A (en) * | 2009-12-30 | 2010-08-04 | 中国科学院计算技术研究所 | Illumination pretreatment method of facial image |
CN104598933A (en) * | 2014-11-13 | 2015-05-06 | 上海交通大学 | Multi-feature fusion based image copying detection method |
CN105956995A (en) * | 2016-04-19 | 2016-09-21 | 浙江大学 | Face appearance editing method based on real-time video proper decomposition |
CN106156503A (en) * | 2016-07-05 | 2016-11-23 | 中国矿业大学 | A kind of multi-scale entropy characterizing method of anchor system internal flaw distribution |
CN106157264A (en) * | 2016-06-30 | 2016-11-23 | 北京大学 | Large area image uneven illumination bearing calibration based on empirical mode decomposition |
CN108171741A (en) * | 2017-12-22 | 2018-06-15 | 河南科技大学 | A kind of image texture decomposition method based on adaptive multidirectional empirical mode decomposition |
CN108805188A (en) * | 2018-05-29 | 2018-11-13 | 徐州工程学院 | A kind of feature based recalibration generates the image classification method of confrontation network |
CN109249546A (en) * | 2017-07-13 | 2019-01-22 | 长春工业大学 | A kind of vibration rotary cutting apparatus and its Identification of Chatter method in place |
CN110018517A (en) * | 2019-05-07 | 2019-07-16 | 西安石油大学 | A kind of multiple dimensioned ground micro-seismic inverse time interference localization method |
CN110148083A (en) * | 2019-05-17 | 2019-08-20 | 东南大学 | Image interfusion method based on fast B EMD and deep learning |
CN110503614A (en) * | 2019-08-20 | 2019-11-26 | 东北大学 | A kind of Magnetic Resonance Image Denoising based on sparse dictionary study |
CN110675381A (en) * | 2019-09-24 | 2020-01-10 | 西北工业大学 | Intrinsic image decomposition method based on serial structure network |
CN110728633A (en) * | 2019-09-06 | 2020-01-24 | 上海交通大学 | Multi-exposure high-dynamic-range inverse tone mapping model construction method and device |
Non-Patent Citations (8)
Title |
---|
Intrinsic Image Decomposition for Image Enhancement;V.S. ASWATHY等;《2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI)》;20181203;20-23 * |
Intrinsic Image Decomposition Using Multi-Scale Measurements and Sparsity;Shouhong Ding等;《COMPUTER GRAPHICS》;20171231;第36卷(第6期);251-261 * |
Intrinsic Image Decomposition: A Comprehensive Review;Yupeng Ma等;《Image and Graphics》;20171230;626-638 * |
Intrinsic Image Transformation via Scale Space Decomposition;Lechao Cheng等;《2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition》;20181217;656-665 * |
Analysis and Research on Key Technologies of Facial Expression Image Recognition; Lu Yang; China Doctoral Dissertations Full-text Database, Information Science and Technology; 20200215; vol. 2020, no. 2; I138-75 *
Research on Image Fusion Algorithms Based on Multi-scale Analysis; Yin Xiang; China Master's Theses Full-text Database, Information Science and Technology; 20190115; vol. 2019, no. 1; I138-3382 *
Research on Exemplar-based and Deep-learning-based Image Inpainting Methods; Qiang Zhenping; China Doctoral Dissertations Full-text Database, Information Science and Technology; 20200115; vol. 2020, no. 1; I138-121 *
Research on Face Image Classification Based on Sparse Representation and Discriminant Analysis Algorithms; Liu Zi; China Doctoral Dissertations Full-text Database, Information Science and Technology; 20180715; vol. 2018, no. 7; I138-41 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||