CN115797236A - Image harmonious processing method and device - Google Patents


Info

Publication number
CN115797236A
Authority
CN
China
Prior art keywords
image
network
training
lookup table
harmony
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111051117.9A
Other languages
Chinese (zh)
Inventor
陈维强
高雪松
孙萁浩
马琳杰
梁京
丛文艳
陶新昊
牛力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Group Holding Co Ltd
Original Assignee
Hisense Group Holding Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Group Holding Co Ltd filed Critical Hisense Group Holding Co Ltd
Priority to CN202111051117.9A priority Critical patent/CN115797236A/en
Publication of CN115797236A publication Critical patent/CN115797236A/en
Pending legal-status Critical Current


Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a method and device for image harmonization, which can improve the expression capability of a 3D lookup table while occupying fewer memory and computing resources. The method comprises the following steps: performing feature extraction on a synthetic image to be processed to obtain synthetic features; determining the weight of each sub-network in a trained three-dimensional lookup table network by using the synthetic features, wherein the three-dimensional lookup table network is obtained by training an initial three-dimensional lookup table network using training synthetic graphs as input and the corresponding training original graphs as output; determining a prediction network in the three-dimensional lookup table network according to the sub-networks and their weights; and performing harmonization on the synthetic image according to the prediction network to obtain a harmony map of the synthetic image.

Description

Image harmonious processing method and device
Technical Field
The invention relates to the technical field of image synthesis, in particular to a method and equipment for image harmony processing.
Background
Image composition (image composition) is a common image-processing operation in which the foreground of one image (the object the user is interested in) and the background of another image (the scene behind the main subject, representing the spatio-temporal environment the subject is in) are combined into a composite image. However, because the foreground and the background are photographed under different conditions (such as illumination and weather), there is an obvious mismatch in brightness, color, and so on, so the resulting composite image looks inharmonious.
Image harmonization (image harmonization) aims to adjust the foreground in the composite image so that it looks harmonious with the background. Although the 3D lookup table is a fairly common image-enhancement tool, existing 3D lookup tables are generally designed by hand and applied to individual tasks. For batch image harmonization tasks, a handful of hand-designed 3D lookup tables falls far short in expressive capability, and requiring experts to design the tables also consumes considerable human resources.
Disclosure of Invention
The invention provides a method and device for image harmonization that use a trained 3D lookup table network to harmonize a synthetic image, improving the expression capability of the 3D lookup table while occupying fewer memory and computing resources.
In a first aspect, an embodiment of the present invention provides an image harmony processing method, including:
performing feature extraction on a synthetic image to be processed to obtain synthetic features;
determining the weight of each sub-network in the trained three-dimensional lookup table network by using the synthesis characteristics, wherein the three-dimensional lookup table network is obtained by training an initial three-dimensional lookup table network by using a training synthesis graph as input and a training original graph corresponding to the training synthesis graph as output;
determining a prediction network in the three-dimensional lookup table network according to the sub-networks and the weights of the sub-networks;
and carrying out harmony processing on the synthetic image according to the prediction network to obtain a harmony map of the synthetic image.
The embodiment of the invention combines deep learning with the three-dimensional lookup table to train a dynamic three-dimensional lookup table network. The synthetic features extracted from the synthetic image determine the weight of each sub-network in the network, i.e., the weight of each three-dimensional lookup table; the weights and the corresponding three-dimensional lookup tables are then combined into a new prediction network (i.e., a new three-dimensional lookup table), which harmonizes the synthetic image. This improves the expression capability of the 3D lookup table while occupying fewer memory and computing resources.
In some embodiments, the performing feature extraction on the composite image to be processed to obtain a composite feature includes:
carrying out binarization processing on the composite image to obtain a mask image of the foreground;
and performing feature extraction on the synthesized image and the mask image to obtain the synthesized feature.
In some embodiments, the harmonizing the synthesized image according to the prediction network to obtain a harmonizing map of the synthesized image includes:
carrying out harmony processing on the synthetic image according to the prediction network to obtain an initial harmony map;
and generating a harmony map of the synthetic image according to the foreground in the initial harmony map and the background of the synthetic image.
In some embodiments, the harmonizing the synthesized image according to the prediction network to obtain an initial harmonizing map includes:
determining, from the prediction network, a predicted color corresponding to each of the three primary colors in the composite image;
and processing each three primary colors in the synthetic image into corresponding predicted colors to obtain the initial harmony map.
In some embodiments, said determining, from said prediction network, a predicted color corresponding to each of the three primary colors in said composite image comprises:
and if the three primary colors in the composite image are not contained in the prediction network, determining the prediction colors corresponding to the three primary colors from the prediction network according to a trilinear interpolation method.
In some embodiments, the generating a harmony map of a synthetic image from a foreground in the initial harmony map and a background of the synthetic image comprises:
pasting the foreground in the initial harmony map into the background of the synthetic image to generate the harmony map of the synthetic image.
In some embodiments, before performing the feature extraction on the composite image to be processed, the method further includes:
if the resolution of the composite image is higher than a threshold value, downsampling the composite image;
and determining the image subjected to the downsampling processing as the composite image to be processed.
In some embodiments, the trained three-dimensional network of look-up tables is determined by:
determining a loss value according to the training original image corresponding to the training synthetic image and the training harmonic image obtained by harmonizing the training synthetic image through an initial three-dimensional lookup table network;
determining a loss weight of a loss value corresponding to the training composite map according to a comparison result of a pixel value of a foreground in the training composite map and a threshold value, wherein the loss weight is inversely proportional to the size of the pixel value;
determining a loss function according to the loss value and the loss weight corresponding to each training synthetic graph;
and if the function value of the loss function is smaller than a preset value, determining the initial three-dimensional lookup table network as the trained three-dimensional lookup table network.
In a second aspect, an embodiment of the present invention provides an apparatus for image harmonizing processing, including a processor and a memory, where the memory is used for storing a program executable by the processor, and the processor is used for reading the program in the memory and executing the following steps:
performing feature extraction on a synthetic image to be processed to obtain synthetic features;
determining the weight of each sub-network in the trained three-dimensional lookup table network by using the synthesis characteristics, wherein the three-dimensional lookup table network is obtained by training an initial three-dimensional lookup table network by using a training synthesis graph as input and a training original graph corresponding to the training synthesis graph as output;
determining a prediction network in the three-dimensional lookup table network according to the sub-networks and the weights of the sub-networks;
and carrying out harmony processing on the synthetic image according to the prediction network to obtain a harmony map of the synthetic image.
In some embodiments, the processor is configured to perform:
carrying out binarization processing on the synthetic image to obtain a mask image of the foreground;
and performing feature extraction on the synthetic image and the mask image to obtain the synthetic feature.
In some embodiments, the processor is configured to perform:
performing harmony processing on the synthetic image according to the prediction network to obtain an initial harmony map;
and generating a harmony map of the synthetic image according to the foreground in the initial harmony map and the background of the synthetic image.
In some embodiments, the processor is configured to perform:
determining, from the prediction network, a predicted color corresponding to each of the three primary colors in the composite image;
and processing each three primary colors in the synthetic image into corresponding predicted colors to obtain the initial harmony map.
In some embodiments, the processor is configured to perform:
and if the three primary colors in the composite image are not contained in the prediction network, determining the prediction colors corresponding to the three primary colors from the prediction network according to a trilinear interpolation method.
In some embodiments, the processor is configured to perform:
pasting the foreground in the initial harmony map into the background of the synthetic image to generate the harmony map of the synthetic image.
In some embodiments, before the feature extraction of the composite image to be processed, the processor is further configured to perform:
if the resolution of the composite image is higher than a threshold value, downsampling the composite image;
and determining the image subjected to the downsampling processing as the composite image to be processed.
In some embodiments, the processor is configured to determine the trained three-dimensional network of look-up tables by:
determining a loss value according to the training original image corresponding to the training synthetic image and the training harmonic image obtained by harmonizing the training synthetic image through an initial three-dimensional lookup table network;
determining a loss weight of a loss value corresponding to the training composite map according to a comparison result of a pixel value of a foreground in the training composite map and a threshold value, wherein the loss weight is inversely proportional to the size of the pixel value;
determining a loss function according to the loss value and the loss weight corresponding to each training synthetic graph;
and if the function value of the loss function is smaller than a preset value, determining the initial three-dimensional lookup table network as the trained three-dimensional lookup table network.
In a third aspect, an embodiment of the present invention further provides an apparatus for image harmonization processing, where the apparatus includes:
the feature extraction unit is used for extracting features of the synthetic image to be processed to obtain synthetic features;
a weight determining unit, configured to determine, by using the synthesis feature, a weight of each sub-network in a trained three-dimensional lookup table network, where the three-dimensional lookup table network is obtained by training an initial three-dimensional lookup table network by using a training synthesis graph as an input and a training original graph corresponding to the training synthesis graph as an output;
the harmony processing unit is used for determining a prediction network in the three-dimensional lookup table network according to the sub-networks and the weights of the sub-networks, and carrying out harmony processing on the synthetic image according to the prediction network to obtain an initial harmony map;
and the synthesis generation unit is used for generating a harmony map of the synthesized image according to the foreground in the initial harmony map and the background of the synthesized image.
In a fourth aspect, an embodiment of the present invention further provides a computer storage medium, on which a computer program is stored, where the computer program is used to implement the steps of the method in the first aspect when the computer program is executed by a processor.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a schematic diagram of a 3D lookup table according to an embodiment of the present invention;
FIG. 2A is a schematic diagram of a network architecture for a first image harmonization process according to an embodiment of the present invention;
FIG. 2B is a schematic diagram of a network architecture for a second image harmonization process according to an embodiment of the present invention;
FIG. 2C is a schematic diagram of a network architecture for a third image harmonization process according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for image harmonization processing according to an embodiment of the present invention;
FIG. 4 is a block diagram illustrating a method for computing a three-dimensional lookup table according to an embodiment of the present invention;
FIG. 5 is a detailed flowchart of an embodiment of the present invention for implementing image harmonization process;
FIG. 6 is a schematic diagram of an apparatus for image harmonization processing according to an embodiment of the present invention;
fig. 7 is a schematic diagram of an apparatus for image harmonization processing according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The term "and/or" in the embodiments of the present invention describes an association relationship of associated objects, and indicates that three relationships may exist, for example, a and/or B may indicate: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The application scenario described in the embodiment of the present invention is for more clearly illustrating the technical solution of the embodiment of the present invention, and does not form a limitation on the technical solution provided in the embodiment of the present invention, and it can be known by a person skilled in the art that with the occurrence of a new application scenario, the technical solution provided in the embodiment of the present invention is also applicable to similar technical problems. In the description of the present invention, the term "plurality" means two or more unless otherwise specified.
Image composition (image composition) is a common image-processing operation in which the foreground of one image (the object the user is interested in) and the background of another image (the scene behind the main subject, representing the spatio-temporal environment the subject is in) are combined into a composite image. However, because the foreground and the background are photographed under different conditions (such as illumination and weather), there is an obvious mismatch in brightness, color, and so on, so the resulting composite image looks inharmonious.
Image harmonization (image harmonization) aims to adjust the foreground in the composite image so that it looks harmonious with the background. At present, the mainstream approach is to retouch images manually with Photoshop (PS) and similar software; although the effect is good, the labor cost is high and the approach cannot harmonize images in batches. Deep-learning-based harmonization methods fix the sizes of the input and output images and produce low-resolution results, so they struggle with the higher-resolution images common in everyday use. The 3D lookup table, by contrast, is a fairly common image-enhancement tool that is simple to use: it can be understood as a function whose input is a triple of primary-color values and whose output is a new color. The 3D lookup table itself is a three-dimensional structure whose coordinates represent the values of the input three primary colors (RGB) and whose value at each coordinate is the value to be output, as shown in fig. 1, where for intuitiveness the values at the corresponding points are labeled directly with the corresponding output colors. The 3D lookup table therefore implements a direct color-to-color mapping that is independent of spatial position in the picture and depends only on the input color.
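The direct color-to-color mapping of a 3D lookup table can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the tiny grid size and the identity mapping are assumptions chosen for brevity.

```python
import numpy as np

N = 8  # tiny grid for illustration; the patent later uses 33 grid points per axis
lut = np.zeros((N, N, N, 3), dtype=np.float32)

# Fill with the identity mapping: each grid point stores its own normalized color.
r, g, b = np.meshgrid(np.arange(N), np.arange(N), np.arange(N), indexing="ij")
lut[..., 0] = r / (N - 1)
lut[..., 1] = g / (N - 1)
lut[..., 2] = b / (N - 1)

def lookup(lut, rgb):
    """Nearest-grid-point lookup: the output depends only on the input
    color, never on spatial position in the image."""
    n = lut.shape[0]
    idx = np.clip(np.round(np.asarray(rgb) * (n - 1)).astype(int), 0, n - 1)
    return lut[idx[0], idx[1], idx[2]]
```

With the identity table, `lookup` simply returns (a quantized version of) the input color; a hand-designed table would instead store, say, contrast-reduced outputs at each grid point.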
However, existing 3D lookup tables are generally designed by hand for individual tasks and specific problems; for example, if the problem is to reduce contrast, an expert designs a 3D lookup table mapping that reduces contrast. For batch image harmonization tasks, a few hand-designed 3D lookup tables fall far short in expressive capability, and requiring experts to design the tables also consumes considerable human resources.
The embodiment of the invention improves the expression capability of the 3D lookup table by combining deep learning with 3D lookup tables and training a 3D lookup table network on pairs of training synthetic graphs and training original graphs. Features are extracted from the training synthetic graph and fed into the sub-networks; a prediction network is determined from the weights of the sub-networks, and those weights are trained against the corresponding training original graph. Different synthetic graphs input to the three-dimensional lookup table network yield different sub-network weights, the prediction network is determined from the sub-networks and their weights, and the prediction network then harmonizes the synthetic image to finally obtain the harmony map.
In some embodiments, as shown in fig. 2A, the network architecture for image harmonization processing provided by this embodiment includes a feature extraction network 200 and a three-dimensional lookup table network 201, where the three-dimensional lookup table network 201 includes: a plurality of sub-networks, each sub-network for representing a respective three-dimensional look-up table, a prediction network for representing a new three-dimensional look-up table determined from the respective three-dimensional look-up table and corresponding weights.
In some embodiments, as shown in fig. 2B, a network structure for low resolution can also be designed: after the feature extraction network 200 (equivalent to an encoder), a decoder 202 is added to restore the extracted features, yielding a harmonized image. This simple network structure processes low-resolution synthetic images and also provides feature extraction for high-resolution synthetic images. To reduce computation, as shown in fig. 2C, a high-resolution synthetic image may first be downsampled to a fixed-size low-resolution image before feature extraction.
In implementation, the composite image is input into the feature extraction network 200 to obtain synthetic features; the synthetic features are fed into the sub-networks to determine the weight of each sub-network; the sub-networks are weighted-summed with their corresponding weights to obtain the prediction network, which can be understood as a predicted 3D lookup table; the prediction network harmonizes the synthetic image; and the foreground of the resulting initial harmony map is combined with the background of the synthetic image to obtain the harmony map of the synthetic image.
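The fusion step of the pipeline above reduces to a weighted sum of the basis tables. A minimal numpy sketch (K = 4 basis tables and 33 grid points per axis follow the examples in the text; the random tables and weights are placeholders for the learned ones):

```python
import numpy as np

K, N = 4, 33                            # 4 basis 3D LUTs, 33 grid points per axis
rng = np.random.default_rng(0)
basis_luts = rng.random((K, N, N, N, 3)).astype(np.float32)

# Per-image weights, normally predicted from the synthetic features.
weights = np.array([0.1, 0.5, 0.3, 0.1], dtype=np.float32)

# The prediction network is the weighted sum of the basis tables:
# one new 3D LUT, recomputed for every input image.
prediction_lut = np.tensordot(weights, basis_luts, axes=1)  # shape (N, N, N, 3)
```

Because only the small weight vector changes per image, the cost of building the prediction table is one weighted sum over a few hundred KB of data.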
It should be noted that in this embodiment the synthetic features extracted from the synthetic image are used only to determine the weights of the sub-networks, i.e., the weights of the 3D lookup tables. Because the prediction network is itself a 3D lookup table, the harmonization step does not use the synthetic features; instead, the three-primary-color information of the synthetic image is used to look up the predicted colors.
As shown in fig. 3, the implementation flow of the method for image harmonization processing provided in this embodiment is as follows:
step 300, performing feature extraction on a synthetic image to be processed to obtain synthetic features;
in some embodiments, the feature extraction network used for feature extraction in this embodiment is a lightweight neural network based on a U-net architecture. The feature extraction network in this embodiment is mainly used for a lightweight network that solves a harmonization task of a low-resolution synthesized image, and a loss function of the feature extraction network is determined based on an original image corresponding to a result obtained by inputting the synthesized image into the feature extraction network and the synthesized image.
In implementation, during harmonization by the three-dimensional lookup table network, the foreground to be harmonized and the background to be left untouched must be distinguished in the synthetic image. Feature extraction is therefore also performed on the mask map of the foreground: the resulting synthetic features include features of both the foreground and the background, and the features extracted from the foreground mask map indicate which synthetic features belong to the foreground and which to the background.
In some embodiments, the synthetic features can be obtained by:
step 1-1, carrying out binarization processing on the composite image to obtain a mask image of the foreground;
in implementation, each pixel of the composite image is binarized according to whether the pixel belongs to the foreground or the background, so as to obtain a mask image of the foreground of the composite image.
And step 1-2, performing feature extraction on the synthetic image and the mask image to obtain the synthetic features.
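Steps 1-1 and 1-2 can be sketched as follows. This is an assumed illustration: here the foreground is marked by a per-pixel label array, whereas a real compositing pipeline would typically carry the paste mask over from the composition step.

```python
import numpy as np

# Hypothetical per-pixel labels for a 3x3 composite: nonzero = foreground.
labels = np.array([[0, 0, 1],
                   [0, 1, 1],
                   [0, 0, 0]])

# Step 1-1: binarize to get the foreground mask map (1 = foreground, 0 = background).
mask = (labels > 0).astype(np.uint8)

# Step 1-2 (schematic): the composite image and the mask are then fed
# together into the feature extraction network; stacking them channel-wise
# is one common way to do that.
composite = np.zeros((3, 3, 3), dtype=np.float32)        # toy H x W x 3 image
net_input = np.concatenate([composite, mask[..., None]], axis=-1)  # H x W x 4
```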
Generally, the overall features of an image have little to do with its details; likewise, when a portrait is harmonized, the facial details of the portrait yield similar extracted features no matter how finely they are extracted, and the final image transformation is similar. Therefore, if a high-definition composite image (a composite image with high resolution) is input in this embodiment, then to increase the running speed and reduce the amount of computation, the high-definition image to be harmonized and the high-definition mask of its foreground may each be downsampled to a fixed low resolution, giving a low-resolution image to be harmonized and a corresponding low-resolution mask. Reducing the size of the image to be processed increases the running speed and reduces the amount of computation.
In some embodiments, before performing the feature extraction on the composite image to be processed, the method further includes:
if the resolution of the synthetic image is higher than a threshold value, downsampling the synthetic image;
and determining the image subjected to the downsampling processing as the composite image to be processed.
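The threshold check and downsampling step can be sketched as below. The nearest-neighbor resampling, the target size of 256x256, and the pixel-count threshold are assumptions for illustration; a production system would more likely use bilinear or area resampling.

```python
import numpy as np

def downsample(img, out_h, out_w):
    """Nearest-neighbor downsample of an H x W x C image (sketch only)."""
    h, w = img.shape[:2]
    ys = (np.arange(out_h) * h) // out_h   # source row for each output row
    xs = (np.arange(out_w) * w) // out_w   # source column for each output column
    return img[ys][:, xs]

hi_res = np.zeros((1024, 768, 3), dtype=np.uint8)   # toy high-resolution composite
THRESHOLD = 256 * 256                               # hypothetical resolution threshold

to_process = hi_res
if hi_res.shape[0] * hi_res.shape[1] > THRESHOLD:
    # Resolution above the threshold: downsample to the fixed low size first.
    to_process = downsample(hi_res, 256, 256)
```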
Step 301, determining weights of sub-networks in a trained three-dimensional lookup table network by using the synthesis characteristics, wherein the three-dimensional lookup table network is obtained by training an initial three-dimensional lookup table network by using a training synthetic graph as input and a training original graph corresponding to the training synthetic graph as output;
in an implementation, the three-dimensional lookup table network includes sub-networks and a prediction network, wherein each sub-network is used for representing each basic 3D lookup table, for example, 4 3D lookup tables running in parallel, the prediction network is used for representing a new 3D lookup table generated according to each 3D lookup table and corresponding weights, and the prediction network is dynamic, the prediction network is determined according to the weights of each sub-network and each sub-network, and different synthetic images have different weights corresponding to each sub-network, and the obtained prediction network is different, so the three-dimensional lookup table network in this embodiment is dynamically changed and is different according to different synthetic images.
In some embodiments, the sub-network weights can be determined from the synthetic features with a very lightweight network whose number of outputs equals the number of sub-networks (basic 3D lookup tables). The network structure can consist of an average-pooling layer between two convolutional layers; this attention-like structure greatly reduces the amount of computation while maintaining the accuracy of feature extraction.
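A schematic of the tail of such a weight head, under stated assumptions: the convolution layers are omitted, and only global average pooling, a final projection, and a softmax normalization are shown. The feature size 64x16x16, the projection matrix, and the softmax are all illustrative choices, not details taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
feat = rng.random((64, 16, 16))   # C x H x W feature map from the extractor (assumed)

# Global average pooling collapses the spatial dimensions to one vector.
pooled = feat.mean(axis=(1, 2))   # shape (64,)

# Linear projection to K = 4 outputs: one scalar per basis 3D LUT.
W = rng.random((4, 64))           # placeholder for learned weights
logits = W @ pooled

# Softmax (an assumption; the patent does not specify the normalization)
# turns the logits into per-table mixing weights.
weights = np.exp(logits - logits.max())
weights /= weights.sum()
```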
It should be noted that during harmonization the background of the synthetic image should always remain unchanged. For synthetic images whose foregrounds differ greatly in size, if no weight information is added, an image with a very small foreground obviously incurs a smaller loss value than one with a very large foreground, and the harmonization of its foreground is harder to train. To overcome this, the loss function in this embodiment adopts a weighted mean square error loss: the weight of a picture is inversely proportional to the size of its foreground (which may be understood as the number of foreground pixels), so the larger the foreground, the smaller the weight of the synthetic image.
In implementation, the trained three-dimensional lookup table network is determined by:
step 2-1, determining a loss value according to a training original graph corresponding to the training synthetic graph and a training harmonic graph obtained by harmonizing the training synthetic graph through an initial three-dimensional lookup table network;
step 2-2, determining a loss weight of a loss value corresponding to the training composite map according to a comparison result of a pixel value of a foreground in the training composite map and a threshold value, wherein the loss weight is inversely proportional to the size of the pixel value;
the threshold value may be freely selected, and may be generally 1000.
Step 2-3, determining a loss function according to the loss value and the loss weight corresponding to each training synthetic graph;
and 2-4, if the function value of the loss function is smaller than a preset value, determining the initial three-dimensional lookup table network as the trained three-dimensional lookup table network.
Because the background in this embodiment is obtained by pasting, only the foreground of the synthetic image is considered when computing the loss. To address the problem that pictures with different foreground sizes would otherwise contribute unequally, this embodiment adopts the weighted mean square error loss as the loss function.
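Steps 2-1 to 2-3 can be sketched as a per-image weighted MSE. The exact form of the weight is an assumption (the text only states it is inversely proportional to foreground size); here a simple reciprocal of the foreground pixel count is used.

```python
import numpy as np

def weighted_mse(harmonized, original, mask, eps=1e-6):
    """Loss for one training pair: MSE over foreground pixels only,
    weighted inversely to the foreground size so that small-foreground
    composites are not drowned out by large ones (assumed weighting)."""
    fg = mask.astype(bool)
    per_image_loss = np.mean((harmonized[fg] - original[fg]) ** 2)
    weight = 1.0 / (fg.sum() + eps)       # inversely proportional to foreground size
    return weight * per_image_loss

# Toy pair: harmonized output is all ones, original all zeros, 1-pixel foreground.
h = np.ones((4, 4, 3))
o = np.zeros((4, 4, 3))
m = np.zeros((4, 4)); m[0, 0] = 1
loss = weighted_mse(h, o, m)
```

Summing this quantity over all training pairs gives the batch loss of step 2-3; training stops once it drops below the preset value of step 2-4.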
Step 302, determining a prediction network in the three-dimensional lookup table network according to each sub-network and the weight of each sub-network;
and 303, performing harmony processing on the synthetic image according to the prediction network to obtain a harmony map of the synthetic image.
The core of the embodiment of the invention is to combine the sub-networks and their corresponding weights into a new 3D lookup table (the prediction network) and obtain the harmonization result directly with the corresponding lookup-table algorithm.
In some embodiments, the prediction network is obtained by performing weighted summation according to the sub-networks and the weights of the sub-networks.
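This weighted summation can be sketched as follows; the number of sub-networks, the grid size, and the storage layout are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# K basis 3D lookup tables (sub-networks) with S scales per dimension
# and an RGB output stored at every grid point.
K, S = 4, 33
basis_luts = rng.random((K, S, S, S, 3))

# Weights predicted from the synthetic image's features, one per sub-network.
weights = np.array([0.1, 0.4, 0.3, 0.2])

# The prediction network is simply the weighted sum of the basis tables.
prediction_lut = np.tensordot(weights, basis_luts, axes=1)  # (S, S, S, 3)
```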
In some examples, the synthetic image is harmonized according to the prediction network to obtain an initial harmony map, and the harmony map of the synthetic image is then generated from the foreground of the initial harmony map and the background of the synthetic image. In implementation, a predicted color is determined from the prediction network for each three-primary-color triple in the synthetic image; each triple in the synthetic image is replaced by its predicted color to obtain the initial harmony map; the foreground of the initial harmony map is then composited onto the background of the synthetic image to generate the final harmony map.
In general, each primary color in a computer image is an integer between 0 and 255, so in theory a 3D lookup table could have 256 scales per dimension, letting the result for every primary-color triple in any input synthetic image be found directly. Taking 256 values per dimension, however, causes the following problems. First, storage: with several such 3D lookup tables the memory footprint is measured in GB, which is impractical for real use. Second, if every point learns its mapping value independently, without constraints enforcing that similar inputs remain similar after mapping, the table overfits easily and adjacent points can produce widely different outputs. Third, such direct mapping hinders gradient propagation, making it hard to train jointly with deep learning.
Therefore, this embodiment does not use direct mapping; each dimension takes only a set number of values (fewer than 256), for example 33 scales per dimension. At this scale the storage is only on the order of MB, and expanding the 33 scales per dimension into a grid cuts the original color space into small cubes, so the three-primary-color input of each pixel of a synthetic image falls inside one small cube.
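A back-of-the-envelope comparison of the two grid sizes (the entry counts come from the text; float32 RGB storage is an assumption about the format):

```python
# Storage comparison for a dense vs. a sparse 3D lookup table.
full_entries = 256 ** 3          # one entry per possible RGB triple
small_entries = 33 ** 3          # 33 scales per dimension

bytes_per_entry = 3 * 4          # RGB output, 4 bytes (float32) per channel
full_mb = full_entries * bytes_per_entry / 2**20
small_mb = small_entries * bytes_per_entry / 2**20
print(f"256^3 table: {full_mb:.0f} MB, 33^3 table: {small_mb:.2f} MB")
```

A single dense table already takes roughly 192 MB under this assumption, so several of them together reach the GB range mentioned above, while the 33-scale table stays well under 1 MB.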
For primary-color triples inside the same small cube, their looked-up predicted colors should be relatively close, so the embodiment obtains the corresponding predicted colors as follows:
if the three primary colors in the synthetic image are not contained in the prediction network, the predicted colors corresponding to the three primary colors are determined from the prediction network by trilinear interpolation.
In some embodiments, as shown in fig. 4, the three-dimensional space of the prediction network (the new 3D lookup table) is cut into small cubes; if the three primary colors in the synthesized image do not coincide with a vertex of any cube but lie inside one, the output predicted color corresponding to the three primary colors is calculated from the colors of the 8 vertices of that cube and the input three primary colors of the synthesized image.
Wherein, assuming the side length of each cube is s and a primary-color triple (x, y, z) of the synthesized image lies at an interior point of the cube whose lower vertex is (i, j, k), the fractional coordinates (x_d, y_d, z_d) of the point inside the cube are:

x_d = (x - i)/s, y_d = (y - j)/s, z_d = (z - k)/s
The output value c (x, y, z) of the interior point, i.e., the predicted color, is calculated by the following formula:
c(x,y,z) = (1-x_d)(1-y_d)(1-z_d)·c(i,j,k) + x_d(1-y_d)(1-z_d)·c(i+s,j,k) + (1-x_d)y_d(1-z_d)·c(i,j+s,k) + (1-x_d)(1-y_d)z_d·c(i,j,k+s) + x_d·y_d(1-z_d)·c(i+s,j+s,k) + x_d(1-y_d)z_d·c(i+s,j,k+s) + (1-x_d)y_d·z_d·c(i,j+s,k+s) + x_d·y_d·z_d·c(i+s,j+s,k+s);
where c () represents the output value, i.e. the predicted color, found from the prediction network for the value in brackets.
The interpolation used in this embodiment ensures that the outputs of adjacent points within the same cube are similar, and the computation parallelizes across the image on a Graphics Processing Unit (GPU), greatly improving speed.
It should be noted that the cube size in the prediction network reflects the parameter count and complexity of the 3D lookup table: the larger the cubes, the fewer the parameters, the less that can be learned, the weaker the expressive power, and the more prone the table is to underfitting; the smaller the cubes, the more parameters, the more that can be learned, the stronger the expressive power, and the more prone it is to overfitting. The scheme therefore balances the two with a middle-ground strategy: a modest number of scales (33) per dimension divides the three-dimensional space of the prediction network.
Because both the foreground and the background are harmonized in the initial harmony map, and the background of the harmonized result should stay consistent with the background of the synthetic image, only the foreground of the initial harmony map is kept and combined with the background of the synthetic image to obtain the final harmony map.
In some embodiments, pasting the foreground in the initial harmony map into the background of the composite image generates a harmony map for the composite image.
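With a binary foreground mask, the pasting step reduces to mask-based blending (a sketch; the function name and the mask convention are assumptions):

```python
import numpy as np

def paste_foreground(initial_harmony, composite, mask):
    """Keep the harmonized foreground and restore the original background.

    initial_harmony, composite: (H, W, 3) arrays; mask: (H, W) binary
    foreground mask (1 = foreground, 0 = background).
    """
    m = mask[..., None].astype(initial_harmony.dtype)
    return m * initial_harmony + (1.0 - m) * composite
```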
The embodiment of the invention uses several basic 3D lookup tables (sub-networks). These basic tables are weighted and summed, with the weight of each determined from the features of the synthetic image, to obtain a new 3D lookup table (the prediction network) corresponding to that synthetic image. The input synthetic image is then processed with the trilinear interpolation algorithm, and finally only the foreground of the harmonized initial result is kept and pasted into the background of the synthetic image from before the harmonization, yielding the final harmony result. By combining deep learning with 3D lookup tables, the weights of the tables are trained from the features of the synthetic image, so the tables learn the parameters of the harmonization and the 3D lookup table is designed dynamically.
The harmonization method of this embodiment does not consider the spatial information of the image, so when the foreground is harmonized, background points with the same color as foreground pixels are harmonized too, changing the background color, which runs counter to the goal of keeping the background unchanged as far as possible; conversely, if the 3D lookup table network were trained to change the background as little as possible, the foreground changes would be heavily constrained. To resolve this, the embodiment adds the background-pasting operation: the harmonized foreground result is taken out and pasted onto the un-harmonized background image, so the 3D lookup table can focus on learning the foreground mapping.
The highlight of this embodiment is that each synthesized image has its own set of weights; fusing the deep learning technique with the conventional technique realizes a dynamically designed 3D lookup table, greatly improving expressive power while retaining the speed and memory advantages of lookup tables. The dynamic weight-prediction method of this embodiment is therefore essential, and the weight-prediction network is a simple and efficient structure that cannot be replaced by a plain convolutional neural network or a plain autoencoder.
In some embodiments, as shown in fig. 5, an embodiment of the present invention further provides a detailed implementation flow of the image harmonization process, as follows:
Step 500, down-sampling the synthetic image whose resolution is higher than a threshold to obtain the synthetic image to be processed;
step 501, performing binarization processing on the synthetic image to obtain a mask image of the foreground, and performing feature extraction on the synthetic image and the mask image to obtain the synthetic features.
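Steps 500-501 can be sketched as follows; stride-based downsampling and a fixed gray-level threshold are simplifying assumptions (a real pipeline would use proper interpolation, and the foreground mask typically comes with the dataset rather than from thresholding):

```python
import numpy as np

def preprocess(composite, max_side=1024, fg_threshold=0.5):
    """Naive sketch of steps 500-501 for a float image in [0, 1]."""
    h, w = composite.shape[:2]
    stride = max(1, -(-max(h, w) // max_side))       # ceiling division
    small = composite[::stride, ::stride]            # step 500: downsample
    mask = (small.mean(axis=-1) > fg_threshold).astype(np.uint8)  # step 501
    # The feature extractor takes the image concatenated with its mask.
    net_input = np.concatenate([small, mask[..., None]], axis=-1)
    return small, mask, net_input
```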
Step 502, determining the weight of each sub-network in the trained three-dimensional lookup table network by using the synthetic features;
the three-dimensional lookup table network is obtained by training an initial three-dimensional lookup table network by taking a training composite graph as input and taking a training original graph corresponding to the training composite graph as output;
determining the trained three-dimensional look-up table network by:
determining a loss value according to the training original image corresponding to the training synthetic image and the training harmonic image obtained by harmonizing the training synthetic image through an initial three-dimensional lookup table network;
determining a loss weight for the loss value corresponding to the training synthetic image according to a comparison of the number of foreground pixels in the training synthetic image with a threshold, wherein the loss weight is inversely proportional to that number;
determining a loss function according to the loss value and the loss weight corresponding to each training synthetic graph;
and if the function value of the loss function is smaller than a preset value, determining the initial three-dimensional lookup table network as the trained three-dimensional lookup table network.
Step 503, determining a prediction network in the three-dimensional lookup table network according to each sub-network and the weight of each sub-network;
step 504, determining a predicted color corresponding to each three primary colors in the composite image from the prediction network; and processing each three primary colors in the synthetic image into corresponding predicted colors to obtain an initial harmony map.
In practice, if the three primary colors in the composite image are not included in the prediction network, the prediction colors corresponding to the three primary colors are determined from the prediction network according to a trilinear interpolation method.
Step 505, pasting the foreground in the initial harmony map into the background of the synthetic image to generate the harmony map.
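The whole flow of steps 502-505 can be strung together in a toy sketch; every shape, value, and helper below is an illustrative assumption rather than the patent's actual network, and step 504 is reduced to nearest-vertex lookup to keep the sketch short:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the trained components.
K, S, s = 4, 33, 8
basis_luts = rng.random((K, S, S, S, 3))

def predict_weights(image):
    # Placeholder for the weight-prediction network: softmax over K scores.
    scores = image.reshape(-1)[:K]
    e = np.exp(scores - scores.max())
    return e / e.sum()

composite = rng.random((16, 16, 3))            # synthetic image in [0, 1]
mask = rng.random((16, 16)) > 0.5              # foreground mask

weights = predict_weights(composite)                        # step 502
lut = np.tensordot(weights, basis_luts, axes=1)             # step 503
# Step 504 would use per-pixel trilinear interpolation; nearest grid
# vertex is used here only for brevity.
idx = np.clip((composite * 255 / s).astype(int), 0, S - 1)
initial = lut[idx[..., 0], idx[..., 1], idx[..., 2]]
harmony = np.where(mask[..., None], initial, composite)     # step 505
```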
Based on the same inventive concept, the embodiment of the present invention further provides an apparatus for image harmonization processing, and since the apparatus is an apparatus in the method in the embodiment of the present invention, and the principle of the apparatus for solving the problem is similar to that of the method, the implementation of the apparatus may refer to the implementation of the method, and repeated details are omitted.
As shown in fig. 6, the apparatus includes a processor 600 and a memory 601, the memory 601 is used for storing programs executable by the processor 600, and the processor 600 is used for reading the programs in the memory 601 and executing the following steps:
performing feature extraction on a synthetic image to be processed to obtain synthetic features;
determining the weight of each sub-network in the trained three-dimensional lookup table network by using the synthesis characteristics, wherein the three-dimensional lookup table network is obtained by training an initial three-dimensional lookup table network by using a training synthesis graph as input and a training original graph corresponding to the training synthesis graph as output;
determining a prediction network in the three-dimensional lookup table network according to the sub-networks and the weights of the sub-networks;
and carrying out harmony processing on the synthetic image according to the prediction network to obtain a harmony map of the synthetic image.
In some embodiments, the processor 600 is specifically configured to perform:
carrying out binarization processing on the synthetic image to obtain a mask image of the foreground;
and performing feature extraction on the synthesized image and the mask image to obtain the synthesized feature.
In some embodiments, the processor 600 is specifically configured to perform:
carrying out harmony processing on the synthetic image according to the prediction network to obtain an initial harmony map;
and generating a harmony map of the synthetic image according to the foreground in the initial harmony map and the background of the synthetic image.
In some embodiments, the processor 600 is specifically configured to perform:
determining, from the prediction network, predicted colors corresponding to respective three primary colors in the composite image;
and processing each three primary colors in the synthetic image into corresponding predicted colors to obtain the initial harmony map.
In some embodiments, the processor 600 is specifically configured to perform:
and if the three primary colors in the synthetic image are not contained in the prediction network, determining the predicted colors corresponding to the three primary colors from the prediction network according to a trilinear interpolation method.
In some embodiments, the processor 600 is specifically configured to perform:
pasting the foreground in the initial harmony map into the background of the synthetic image to generate the harmony map of the synthetic image.
In some embodiments, before performing the feature extraction on the composite image to be processed, the processor 600 is further specifically configured to perform:
if the resolution of the composite image is higher than a threshold value, downsampling the composite image;
and determining the image subjected to the downsampling processing as the composite image to be processed.
In some embodiments, the processor 600 is specifically configured to determine the trained three-dimensional look-up table network by:
determining a loss value according to the training original image corresponding to the training synthetic image and the training harmonic image obtained by harmonizing the training synthetic image through an initial three-dimensional lookup table network;
determining a loss weight for the loss value corresponding to the training synthetic image according to a comparison of the number of foreground pixels in the training synthetic image with a threshold, wherein the loss weight is inversely proportional to that number;
determining a loss function according to the loss value and the loss weight corresponding to each training synthetic graph;
and if the function value of the loss function is smaller than a preset value, determining the initial three-dimensional lookup table network as the trained three-dimensional lookup table network.
Based on the same inventive concept, the embodiment of the present invention further provides an apparatus for image harmonization processing, and since the apparatus is the apparatus in the method in the embodiment of the present invention, and the principle of the apparatus for solving the problem is similar to that of the method, the implementation of the apparatus can refer to the implementation of the method, and repeated details are omitted.
As shown in fig. 7, the apparatus includes:
a feature extraction unit 700, configured to perform feature extraction on a composite image to be processed to obtain a composite feature;
a weight determining unit 701, configured to determine, by using the synthesis feature, a weight of each sub-network in a trained three-dimensional lookup table network, where the three-dimensional lookup table network is obtained by training an initial three-dimensional lookup table network by using a training composite graph as an input and a training original graph corresponding to the training composite graph as an output;
a harmony processing unit 702, configured to determine a prediction network in the three-dimensional lookup table network according to the sub-networks and weights of the sub-networks;
and a synthesis generating unit 703, configured to perform a harmony processing on the synthesized image according to the prediction network to obtain a harmony map of the synthesized image.
Based on the same inventive concept, an embodiment of the present invention further provides a computer storage medium, on which a computer program is stored, and the program, when executed by a processor, implements the steps of:
performing feature extraction on a synthetic image to be processed to obtain synthetic features;
determining the weight of each sub-network in the trained three-dimensional lookup table network by using the synthesis characteristics, wherein the three-dimensional lookup table network is obtained by training an initial three-dimensional lookup table network by using a training synthesis graph as input and a training original graph corresponding to the training synthesis graph as output;
determining a prediction network in the three-dimensional lookup table network according to the sub-networks and the weights of the sub-networks;
and carrying out harmony processing on the synthetic image according to the prediction network to obtain a harmony map of the synthetic image.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A method for harmonically processing images, the method comprising:
performing feature extraction on a synthetic image to be processed to obtain synthetic features;
determining the weight of each sub-network in the trained three-dimensional lookup table network by using the synthesis characteristics, wherein the three-dimensional lookup table network is obtained by training an initial three-dimensional lookup table network by using a training synthesis graph as input and a training original graph corresponding to the training synthesis graph as output;
determining a prediction network in the three-dimensional lookup table network according to the sub-networks and the weights of the sub-networks;
and carrying out harmony processing on the synthetic image according to the prediction network to obtain a harmony map of the synthetic image.
2. The method according to claim 1, wherein the performing feature extraction on the composite image to be processed to obtain composite features comprises:
carrying out binarization processing on the synthetic image to obtain a mask image of the foreground;
and performing feature extraction on the synthesized image and the mask image to obtain the synthesized feature.
3. The method according to claim 1, wherein the harmonizing the synthesized image according to the prediction network to obtain a harmonizing graph of the synthesized image comprises:
carrying out harmony processing on the synthetic image according to the prediction network to obtain an initial harmony map;
and generating a harmony map of the synthetic image according to the foreground in the initial harmony map and the background of the synthetic image.
4. The method of claim 3, wherein the harmonizing the synthesized image according to the prediction network to obtain an initial harmony map comprises:
determining, from the prediction network, a predicted color corresponding to each of the three primary colors in the composite image;
and processing each three primary colors in the synthetic image into corresponding predicted colors to obtain the initial harmony map.
5. The method of claim 4, wherein said determining, from the predictive network, a predicted color corresponding to each of the three primary colors in the composite image comprises:
and if the three primary colors in the composite image are not contained in the prediction network, determining the prediction colors corresponding to the three primary colors from the prediction network according to a trilinear interpolation method.
6. The method of claim 2, wherein generating the harmony map for the composite image from the foreground in the initial harmony map and the background of the composite image comprises:
pasting the foreground in the initial harmony map into the background of the synthetic image to generate the harmony map of the synthetic image.
7. The method according to any one of claims 1 to 6, wherein before the extracting the features of the composite image to be processed, the method further comprises:
if the resolution of the synthetic image is higher than a threshold value, downsampling the synthetic image;
and determining the image subjected to the downsampling processing as the composite image to be processed.
8. The method of any of claims 1 to 6, wherein the trained three-dimensional look-up table network is determined by:
determining a loss value according to the original training graph corresponding to the training composite graph and a training harmony graph obtained by carrying out harmony processing on the training composite graph through an initial three-dimensional lookup table network;
determining a loss weight for the loss value corresponding to the training composite map according to a comparison of the number of foreground pixels in the training composite map with a threshold, wherein the loss weight is inversely proportional to that number;
determining a loss function according to the loss value and the loss weight corresponding to each training synthetic graph;
and if the function value of the loss function is smaller than a preset value, determining the initial three-dimensional lookup table network as the trained three-dimensional lookup table network.
9. An apparatus for harmonisation of images, comprising a processor and a memory, said memory being adapted to store a program executable by said processor, said processor being adapted to read the program in said memory and to perform the steps of the method according to any of claims 1 to 8.
10. A computer storage medium on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the steps of the method according to any one of claims 1 to 8.
CN202111051117.9A 2021-09-08 2021-09-08 Image harmonious processing method and device Pending CN115797236A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111051117.9A CN115797236A (en) 2021-09-08 2021-09-08 Image harmonious processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111051117.9A CN115797236A (en) 2021-09-08 2021-09-08 Image harmonious processing method and device

Publications (1)

Publication Number Publication Date
CN115797236A true CN115797236A (en) 2023-03-14

Family

ID=85416812

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111051117.9A Pending CN115797236A (en) 2021-09-08 2021-09-08 Image harmonious processing method and device

Country Status (1)

Country Link
CN (1) CN115797236A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117390600A (en) * 2023-12-08 2024-01-12 中国信息通信研究院 Detection method for depth synthesis information
CN117390600B (en) * 2023-12-08 2024-02-13 中国信息通信研究院 Detection method for depth synthesis information

Similar Documents

Publication Publication Date Title
US11763168B2 (en) Progressive modification of generative adversarial neural networks
US11631239B2 (en) Iterative spatio-temporal action detection in video
US10922793B2 (en) Guided hallucination for missing image content using a neural network
CN110868580B (en) Motion adaptive rendering using variable rate shading
US10970816B2 (en) Motion blur and depth of field reconstruction through temporally stable neural networks
US20190147296A1 (en) Creating an image utilizing a map representing different classes of pixels
US20200151288A1 (en) Deep Learning Testability Analysis with Graph Convolutional Networks
US11836597B2 (en) Detecting visual artifacts in image sequences using a neural network model
US10964000B2 (en) Techniques for reducing noise in video
US11790609B2 (en) Reducing level of detail of a polygon mesh to decrease a complexity of rendered geometry within a scene
US11496773B2 (en) Using residual video data resulting from a compression of original video data to improve a decompression of the original video data
US10614613B2 (en) Reducing noise during rendering by performing parallel path space filtering utilizing hashing
US20210287096A1 (en) Microtraining for iterative few-shot refinement of a neural network
EP3678037A1 (en) Neural network generator
US20200242739A1 (en) Convolutional blind-spot architectures and bayesian image restoration
DE102021121109A1 (en) RECOVERY OF THREE-DIMENSIONAL MODELS FROM TWO-DIMENSIONAL IMAGES
US20210256759A1 (en) Performance of ray-traced shadow creation within a scene
DE102022113244A1 (en) Joint shape and appearance optimization through topology scanning
CN115379185B (en) Motion adaptive rendering using variable rate coloring
CN115797236A (en) Image harmonious processing method and device
US11069095B1 (en) Techniques for efficiently sampling an image
CN116051593A (en) Clothing image extraction method and device, equipment, medium and product thereof
CN117953092A (en) Creating images using mappings representing different types of pixels
US20240153202A1 (en) Temporal denoiser quality in dynamic scenes
Li et al. Real-Time Volume Rendering with Octree-Based Implicit Surface Representation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination