CN113034353A

CN113034353A - Essential image decomposition method and system based on cross convolution neural network

Info

Publication number: CN113034353A
Application number: CN202110385353.8A
Authority: CN
Inventors: 权炜; 孙燕平; 于军琪; 董芳楠
Original assignee: Xian University of Architecture and Technology
Current assignee: Xian University of Architecture and Technology
Priority date: 2021-04-09
Filing date: 2021-04-09
Publication date: 2021-06-25
Anticipated expiration: 2041-04-09
Also published as: CN113034353B

Abstract

The invention discloses an essential image decomposition method and system based on a cross convolution neural network, wherein the method comprises the following steps: inputting an original image to be decomposed into a trained GoogLeNet-VGG19 cross convolution neural network model to obtain a light map and a reflection map obtained by decomposing the original image; the GoogLeNet-VGG19 cross convolution neural network model is formed by cross fusion of an illumination map generation network and a reflection map generation network; the illumination map generation network is constructed on the basis of a GoogleLeNet convolutional neural network, and the reflection map generation network is constructed on the basis of a VGG19 convolutional neural network; and training the light map generation network and the reflection map generation network by adopting an Adam optimization method. In the invention, the images of the result of the intrinsic image decomposition are kept consistent on the reflectivity of the same object, the protection on edge information and the removal on noise are better, the image quality is higher, and the details and the definition are closer to the true value images.

Description

Essential image decomposition method and system based on cross convolution neural network

Technical Field

The invention belongs to the technical field of image processing, and particularly relates to an essential image decomposition method and system based on a cross convolution neural network.

Background

Intrinsic image decomposition was first proposed by Barrow and Tenenbaum in 1978, and the intrinsic image solving problem is to recover brightness and reflectivity information in a scene corresponding to all pixel points from an image to form a light map and a reflection map respectively. The intrinsic image decomposition is mainly divided into two types according to the algorithm type, the first type is the intrinsic image decomposition based on Retinex theory, and the second type is the intrinsic image decomposition based on deep learning. The conventional intrinsic image decomposition method Retinex assumes that the larger gradient in the image is caused by the object reflectivity, and the smaller gradient belongs to the illumination variation. Because the Retinex method is completely based on the gradient, the Retinex method establishes local constraints.

Another common constraint at present is that the natural image contains a small number of colors and the color distribution is structured, which is called global color sparsity, i.e. the image of the reflectivity layer is required to contain only several colors. Because only local constraints can be established by the gradient-based method, the obtained reflectivity layer image may have a phenomenon of global inconsistency, that is, the reflectivity of two pixels of the same material which are far away from each other is inconsistent, and multiple images in the same scene increase to provide strict requirements for the input of the essential image method. After the gradient values of the reflectivity and brightness images are estimated, the gradient images are integrated by means of Weiss to solve the reflection map and the illumination map. However, this method requires a large number of samples to train the classifier, which is time-consuming, and the obtained intrinsic image has a large error at the edge, and the finally obtained intrinsic image is blurred at the edge.

The intrinsic image decomposition method based on deep learning improves the problems to a certain extent, but has a plurality of defects, such as Narihira and the like, because of the defects of network design, the image is down-sampled to an excessively small scale, so that a large amount of recovered information is lost, and the output result is fuzzy; fan et al integrate a filter in the network to flatten the reflective layer, remove residual noise and geometry information, but ignore the protection of image details resulting in jagged edges.

Disclosure of Invention

The present invention is directed to a method and a system for intrinsic image decomposition based on a cross-convolution neural network, so as to solve one or more of the above technical problems. In the invention, the images of the result of the intrinsic image decomposition are kept consistent on the reflectivity of the same object, the protection on edge information and the removal on noise are better, the image quality is higher, and the details and the definition are closer to the true value images.

In order to achieve the purpose, the invention adopts the following technical scheme:

the invention discloses an essential image decomposition method based on a cross convolution neural network, which comprises the following steps:

inputting an original image to be decomposed into a trained GoogLeNet-VGG19 cross convolution neural network model to obtain a light map and a reflection map obtained by decomposing the original image;

the GoogLeNet-VGG19 cross convolution neural network model is formed by cross fusion of an illumination map generation network and a reflection map generation network; the illumination map generation network is constructed on the basis of a GoogleLeNet convolutional neural network, and the reflection map generation network is constructed on the basis of a VGG19 convolutional neural network;

the method for acquiring the well-trained GoogLeNet-VGG19 cross convolution neural network model comprises the step of training a light map generation network and a reflection map generation network by adopting an Adam optimization method.

The invention has the further improvement that the steps of constructing the illumination map generation network based on the GoogleLeNet convolutional neural network specifically comprise:

respectively adding 1 ReLU activation function after 4 convolution operations on the second layer of the GoogleLeNet convolutional neural network initiation 3a, and outputting the 4 ReLU activation functions to the DepthConcato layer of the initiation 3a together;

respectively adding 1 ReLU activation function after 4 convolution operations on the second layer of the GoogleLeNet convolutional neural network initiation 3b, and outputting the 4 ReLU activation functions to the DepthConcato layer of the initiation 3b together;

in the GoogleLeNet convolutional neural network initiation 4a, 2 convolution operations of a first layer are connected with 2 convolution operations of a second layer to form 2 connecting channels; adding 1 ReLU activation function and a MaxPool operation combination on 2 connection channels respectively, wherein the ReLU activation function is in front of the MaxPool operation combination;

in the GoogleLeNet convolutional neural network initiation 4b, 2 convolution operations of a first layer are connected with 2 convolution operations of a second layer to form 2 connecting channels; adding 1 ReLU activation function and a MaxPool operation combination on 2 connection channels respectively, wherein the ReLU activation function is in front of the MaxPool operation combination;

connecting the output of the DepthConcat layer of the GoogleLeNet convolutional neural network initiation 4b to the DepthConcat layer of the initiation 4d in a jumping manner; directly connecting the convolution operation output after the AveragePool operation of the first layer of the GoogleLeNet convolution neural network initiation 4e to the DepthCocatat layer of the initiation 4 e; connecting the output of the DepthConcat layer of the GoogleLeNet convolutional neural network initiation 4e to the DepthConcat layer of the initiation 5b in a jumping manner;

respectively adding 1 ReLU activation function after 4 convolution operations on the third layer of the GoogleLeNet convolutional neural network initiation 5a, and outputting the 4 ReLU activation functions to the DepthConcatin layer of the initiation 5a together;

respectively adding 1 ReLU activation function after 4 convolution operations of the second layer of the GoogleLeNet convolutional neural network initiation 5b, wherein the 4 ReLU activation functions are jointly output to the DepthConcat layer of the initiation 3 b;

an FC layer is added after the FC layer of the google lenet convolutional neural network.

The further improvement of the invention is that the step of constructing the reflection map generation network based on the VGG19 convolutional neural network specifically comprises the following steps:

performing Concat operation on a first MaxBoool output result and a second MaxBoool output result of the VGG19 convolutional neural network, and inputting the obtained results to a fifth layer of the VGG19 convolutional neural network;

performing Concat operation on a third MaxBoool output result and a fourth MaxBoool output result of the VGG19 convolutional neural network, and inputting the obtained results into a thirteenth layer of the VGG19 convolutional neural network;

deleting the seventeenth layer and the eighteenth layer of the VGG19 convolutional neural network; and adding two layers with the same structure as the sixteenth layer after the sixteenth layer of the VGG19 convolutional neural network to form a seventeenth layer and an eighteenth layer of the modified VGG19 convolutional neural network.

The further improvement of the invention is that the step of cross-fusion of the illumination map generation network and the reflection map generation network specifically comprises:

connecting the depthConcat layer output of the GoogleLeNet convolutional neural network initiation 4e to the thirteenth layer of the VGG19 convolutional neural network;

the convolution operation connecting the fourth MaxPool output of the VGG19 convolutional neural network to the second layer of the google lenet convolutional neural network initiation 5 a.

The invention is further improved in that the Loss function Loss1 of the illumination map generation network has the expression:

in the formula, X is an input image,

for predicting images, H, W and C are respectively the height, width and channel number of the input image, x and y represent the pixel coordinates of the image, C represents the channel, and mu_iDenotes the weight at the i-th scale, X⁽ⁱ⁾Representing the image at the i-th scale,

representing the predicted image at the i-th scale generated by the improved google convolutional neural network.

The invention is further improved in that the expression of the Loss function Loss2 of the reflection map generation network is as follows:

in the formula, Y represents an input image,

representing the estimated value, C, of the input image after being processed by the modified VGG19 network_j,H_j,W_jRespectively representing the channel number, height and width of the j-th layer output characteristic diagram, V_j(.) indicates the output of the activation function when the layer j processes the picture, j indicating the number of layers.

The further improvement of the invention is that the step of training the light map generation network and the reflection map generation network by adopting an Adam optimization method specifically comprises the following steps:

taking an image in a pre-constructed training image sample library as a sample, and training a light map generation network and a reflection map generation network simultaneously by adopting an Adam optimization method;

in the training process, inputting the illumination pattern output by the illumination pattern generation network into an identification network, identifying the probability that the illumination pattern output by the network is consistent with the label image of the training sample, and reversely updating the network parameters of the illumination pattern generation network; inputting the reflection map output by the reflection map generation network into an identification network, identifying the probability that the reflection map output by the network is consistent with the label image of the training sample, and reversely updating the network parameters of the reflection map generation network;

when the Loss function Loss1 reaches the minimum, stopping training the illumination map generation network to obtain a final illumination map generation network; when the Loss function Loss2 reaches the minimum, stopping training the reflection map generation network to obtain a final reflection map generation network;

the identification network is a multilayer convolutional neural network and comprises six same layers; each layer is in turn a convolution operation, a Sigmoid activation function, and MaxPool.

A further refinement of the invention consists in that the Adam optimization parameter beta is set to (0.9,0.999), the learning rate is 0.005, the weight attenuation is 0.0001, epoch is 100 and banksize is 20.

The invention discloses an essential image decomposition system based on a cross convolution neural network, which comprises:

the decomposition module is used for inputting an original image to be decomposed into a trained GoogLeNet-VGG19 cross convolution neural network model to obtain a light map and a reflection map obtained by decomposing the original image;

A further development of the invention is that,

the steps of constructing the illumination map generation network based on the GoogleLeNet convolutional neural network specifically comprise:

adding an FC layer behind the FC layer of the GoogleLeNet convolutional neural network;

the step of constructing the reflection map generation network based on the VGG19 convolutional neural network specifically comprises the following steps:

deleting the seventeenth layer and the eighteenth layer of the VGG19 convolutional neural network; adding two layers with the same structure as the sixteenth layer after the sixteenth layer of the VGG19 convolutional neural network to form a seventeenth layer and an eighteenth layer of the VGG19 convolutional neural network after modification;

the step of performing cross fusion on the illumination map generation network and the reflection map generation network specifically comprises the following steps:

Compared with the prior art, the invention has the following beneficial effects:

the invention provides an essential image decomposition method based on an improved GoogLeNet-VGG19 cross convolution neural network, which comprises the steps of firstly constructing a training image sample library, then carrying out improved construction of an illumination map generation network based on the traditional GoogLeNet convolution neural network, carrying out improved construction of a reflection map generation network based on the traditional VGG19 convolution neural network, and carrying out cross fusion on the illumination map generation network and the reflection map generation network; next, constructing a recognition network; and finally, training the illumination map generation network and the reflection map generation network by adopting an Adam optimization method to obtain the final illumination map generation network and reflection map generation network. The result image of the essential image decomposition keeps consistent on the reflectivity of the same object, has better performance on the aspects of protecting edge information and removing noise, has higher image quality, and is closer to a true value image in the aspects of detail and definition.

Compared with the prior art, the system has the advantages that the images output by the method are consistent in reflectivity of the same object, the system is good in protection of edge information and removal of noise, and the image quality is high; the generated result is closer to the true value image in the aspects of detail and definition.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below; it is obvious that the drawings in the following description are some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.

FIG. 1 is a schematic flow chart of an essential image decomposition method based on an improved GoogLeNet-VGG19 cross-convolution neural network according to an embodiment of the present invention;

FIG. 2 is a diagram illustrating an intrinsic image decomposition result according to an embodiment of the present invention; fig. 2 (a) is a schematic diagram of an original image, fig. 2 (b) is a schematic diagram of an illumination map obtained after decomposition, and fig. 2 (c) is a schematic diagram of a reflection map obtained after decomposition.

Detailed Description

In order to make the purpose, technical effect and technical solution of the embodiments of the present invention clearer, the following clearly and completely describes the technical solution of the embodiments of the present invention with reference to the drawings in the embodiments of the present invention; it is to be understood that the described embodiments are only some of the embodiments of the present invention. Other embodiments, which can be derived by one of ordinary skill in the art from the disclosed embodiments without inventive faculty, are intended to be within the scope of the invention.

Referring to fig. 1, an intrinsic image decomposition method based on an improved google lenet-VGG19 cross convolution neural network according to an embodiment of the present invention includes the following steps:

step 1: constructing a training image sample library;

taking out P images and corresponding light map images and reflection images from the essence image database by adopting the public essence image database; then, randomly cutting the P images to cut a plurality of image blocks with specified sizes; then, image processing is carried out on the image blocks, namely horizontal overturning, vertical overturning, rotation and mirror image are carried out randomly for expanding the database; the image blocks after image processing and the illumination patterns and reflection patterns corresponding to the image blocks form a training image sample library;

step 2: the illumination map generation network is constructed by adopting an improved GoogleLeNet convolution neural network, and the method specifically comprises the following steps:

step 2-1: respectively adding 1 ReLU activation function after 4 convolution operations on the second layer of the GoogleLeNet convolutional neural network initiation 3a, wherein 4 ReLU activation functions are totally added, and the 4 ReLU activation functions are jointly output to a DepthConcat layer of the initiation 3 a;

step 2-2: respectively adding 1 ReLU activation function after 4 convolution operations on the second layer of the GoogleLeNet convolutional neural network initiation 3b, wherein 4 ReLU activation functions are totally added, and the 4 ReLU activation functions are jointly output to a DepthConcat layer of the initiation 3 b;

step 2-3: in the GoogleLeNet convolutional neural network initiation 4a, 2 convolution operations of a first layer are connected with 2 convolution operations of a second layer to form 2 connecting channels; adding 1 ReLU activation function and MaxPool operation combination on the 2 connection channels respectively, wherein 2 ReLU activation functions and MaxPool operation combinations are added in total; the ReLU activation function is in front, and the MaxPool operation is in back;

step 2-4: in the GoogleLeNet convolutional neural network initiation 4b, 2 convolution operations of a first layer are connected with 2 convolution operations of a second layer to form 2 connecting channels; adding 1 ReLU activation function and MaxPool operation combination on the 2 connection channels respectively, wherein 2 ReLU activation functions and MaxPool operation combinations are added in total; the ReLU activation function is in front, and the MaxPool operation is in back;

step 2-5: connecting the output of the DepthConcat layer of the GoogleLeNet convolutional neural network initiation 4b to the DepthConcat layer of the initiation 4d in a jumping manner;

step 2-6: directly connecting the convolution operation output after the AveragePool operation of the first layer of the GoogleLeNet convolution neural network initiation 4e to the DepthCocatat layer of the initiation 4 e;

step 2-7: connecting the output of the DepthConcat layer of the GoogleLeNet convolutional neural network initiation 4e to the DepthConcat layer of the initiation 5b in a jumping manner;

step 2-8: respectively adding 1 ReLU activation function after 4 convolution operations on the third layer of the GoogleLeNet convolutional neural network initiation 5a, wherein 4 ReLU activation functions are totally added, and the 4 ReLU activation functions are jointly output to a DepthConcat layer of the initiation 5 a;

step 2-9: adding 1 ReLU activation function after 4 convolution operations of the second layer of the GoogleLeNet convolutional neural network initiation 5b, wherein 4 ReLU activation functions are totally added, and the 4 ReLU activation functions are jointly output to the DepthCocatat layer of initiation 3 b;

step 2-10: a new FC layer is added behind the FC layer of the GoogleLeNet convolutional neural network;

step 2-11: forming an improved GoogLeNet convolutional neural network through the operations from the step 2-1 to the step 2-10;

and step 3: the reflection map generation network is constructed by adopting the improved VGG19 convolutional neural network, and the method comprises the following specific steps:

step 3-1: performing Concat operation on a first MaxBoool output result and a second MaxBoool output result of the VGG19 convolutional neural network, and inputting the obtained results to a fifth layer of the VGG19 convolutional neural network;

step 3-2: performing Concat operation on a third MaxBoool output result and a fourth MaxBoool output result of the VGG19 convolutional neural network, and inputting the obtained results into a thirteenth layer of the VGG19 convolutional neural network;

step 3-3: deleting the seventeenth layer and the eighteenth layer of the VGG19 convolutional neural network;

step 3-4: adding two same layers after the sixteenth layer of the VGG19 convolutional neural network to form a new seventeenth layer and an eighteenth layer; the structure of the new seventeenth layer and the eighteenth layer is completely the same as that of the sixteenth layer;

step 3-5: constructing a modified VGG19 convolutional neural network through the operations of step 3-1 to step 3-4;

and 4, step 4: the illumination map generation network and the reflection map generation network are crossed and fused;

step 4-1: connecting the depthConcat layer output of the GoogleLeNet convolutional neural network initiation 4e to the thirteenth layer of the VGG19 convolutional neural network;

step 4-2: a convolution operation connecting the fourth MaxPool output of the VGG19 convolutional neural network to the second layer of the google convolutional neural network initiation 5 a;

and 5: constructing a recognition network;

the identification network is a multilayer convolutional neural network and comprises six same layers; each layer is sequentially convolution operation, Sigmoid activation function and MaxPool;

step 6: defining a loss function;

step 6-1: defining an illumination map generation network Loss function Loss 1:

wherein, X is an input image,

representing a predicted image at the ith scale generated by the improved GoogleLeNet convolutional neural network;

step 6-2: defining a reflection map to generate a network Loss function Loss 2:

wherein, Y represents an input image,

representing the estimated value, C, of the input image after being processed by the modified VGG19 network_j,H_j,W_jRespectively representing the channel number, height and width of the j-th layer output characteristic diagram, V_j(.) indicates the output of the activation function when the j-th layer processes the image, j indicates the layer number;

and 7: network training;

taking the images in the training image sample library constructed in the step 1 as samples, and training a light map generation network and a reflection map generation network simultaneously by adopting an Adam optimization method;

and 8: and (4) respectively inputting the original image to be decomposed into the final illumination map generation network and the final reflection map generation network obtained in the step (7), wherein the output image is the illumination map and the reflection map obtained by decomposing the original image.

In an embodiment of the present invention, the size of the image block with the size specified in step 1 is 224 × 224.

In the embodiment of the present invention, the parameters set when the network is trained in step 7 are as follows: adam optimization parameter beta is set to (0.9,0.999), learning rate is 0.005, weight attenuation is 0.0001, epoch is 100, and banksize is 20.

Compared with the problem that the image decomposed by the existing method has a lot of noises and the image edge is fuzzy, the method provided by the embodiment of the invention has the advantages that the output image keeps consistent on the reflectivity of the same object, the performance in the aspects of protecting the edge information and removing the noises is better, and the image quality is higher; the method of the invention can generate results which are closer to the true value image in the aspects of detail and definition.

The invention provides an essential image decomposition system based on a cross convolution neural network, which comprises:

Referring to fig. 1 and fig. 2, an intrinsic image decomposition method based on the improved google lenet-VGG19 cross-convolution neural network according to an embodiment of the present invention includes the following steps:

(1) constructing a training image sample library

And (3) taking 1000 images from the MPCal essential image data set, randomly cutting 50 image blocks of 224X 224 in each image, and then randomly turning the image blocks horizontally, vertically, rotating and mirroring to change the 50 image blocks into 200 image blocks. At this time, the total number of image blocks is 20 ten thousand. Meanwhile, in the illumination map and the reflection map corresponding to 1000 images, the illumination pattern block and the reflection pattern block corresponding to the 20 ten thousand image blocks are found. The image blocks, the corresponding illumination image blocks and the reflection image blocks form a training image sample library.

(2) The illumination map generation network and the reflection map generation network constructed by the method are trained simultaneously by using a training image sample library, an Adam optimization method is adopted, Adam optimization parameters beta are set to be (0.9,0.999), the learning rate is 0.005, the weight attenuation is 0.0001, epoch is 100, and batch size is 20. And when the loss functions of the two generated networks are minimum, stopping training to obtain a final light map generation network and a final reflection map generation network. In the training process, inputting the illumination pattern output by the illumination pattern generation network into an identification network, identifying the probability that the illumination pattern output by the network is consistent with the label image of the training sample, and reversely updating the network parameters of the illumination pattern generation network; inputting the reflection map output by the reflection map generation network into an identification network, identifying the probability that the reflection map output by the network is consistent with the label image of the training sample, and reversely updating the network parameters of the reflection map generation network; the generation network and the recognition network adopt a TTUR training method, and the ratio of the training times of the recognition network to the training times of the generation network is 3 to 1.

(3) As shown in fig. 2, the original image to be processed (shown in (a) of fig. 2) is respectively input into the final illumination map generation network and reflection map generation network, and the output image is the illumination map and reflection map obtained by decomposing the original image (shown in (b) and (c) of fig. 2). The decomposition result shows that the intrinsic image decomposition result of the method has less noise, the image edge is clear, and the integral definition and quality of the image reach higher levels, thereby fully explaining the effectiveness and the practicability of the method.

In summary, the embodiment of the present invention provides an essential image decomposition method based on an improved google lenet-VGG19 cross convolution neural network, which includes first constructing a training image sample library, then performing an improved structure on an illumination map generation network based on a conventional google lenet convolution neural network, performing an improved structure on a reflection map generation network based on a conventional VGG19 convolution neural network, and performing cross fusion on the illumination map generation network and the reflection map generation network; next, constructing a recognition network; and finally, training the illumination map generation network and the reflection map generation network by adopting an Adam optimization method to obtain the final illumination map generation network and reflection map generation network. The result image of the essential image decomposition keeps consistent on the reflectivity of the same object, has better performance on the aspects of protecting edge information and removing noise, has higher image quality, and is closer to a true value image in the aspects of detail and definition.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art can make modifications and equivalents to the embodiments of the present invention without departing from the spirit and scope of the present invention, which is set forth in the claims of the present application.

Claims

1. An essential image decomposition method based on a cross convolution neural network is characterized by comprising the following steps:

2. The method for essential image decomposition based on cross-convolution neural network as claimed in claim 1, wherein the step of constructing the illumination map generation network based on the google lenet convolution neural network specifically includes:

3. The method of claim 2, wherein the step of constructing the reflection map generation network based on the VGG19 convolutional neural network specifically comprises:

4. The method according to claim 3, wherein the step of performing cross fusion on the illumination map generation network and the reflection map generation network specifically comprises:

5. The method for essential image decomposition based on the cross-convolution neural network as claimed in claim 4, wherein the expression of the Loss function Loss1 of the illumination map generation network is:

in the formula, X is an input image,

6. The method for essential image decomposition based on the cross-convolution neural network as claimed in claim 5, wherein the expression of the Loss function Loss2 of the reflection map generation network is as follows:

in the formula, Y represents an input image,

7. The method of claim 6, wherein the step of training the histogram generation network and the reflectogram generation network by using an Adam optimization method specifically comprises:

8. The method of claim 7, wherein Adam optimization parameter beta is set to (0.9,0.999), learning rate is 0.005, weight attenuation is 0.0001, epoch is 100, and banksize is 20.

9. An essential image decomposition system based on a cross-convolution neural network, comprising:

10. The cross-convolution neural network-based intrinsic image decomposition system of claim 9,