CN113269792B

CN113269792B - Image later-stage harmony processing method, system and terminal

Info

Publication number: CN113269792B
Application number: CN202110494193.0A
Authority: CN
Inventors: 宋利; 凌军; 解蓉
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2021-05-07
Filing date: 2021-05-07
Publication date: 2023-07-21
Anticipated expiration: 2041-05-07
Also published as: CN113269792A

Abstract

The invention discloses an image post-harmony processing method, system, and terminal. The method comprises: indicating the region to be processed with a foreground mask segmentation map to obtain a foreground mask region map, and scaling the foreground mask region map to the feature size of each normalization layer; having a region-adaptive instance normalization layer extract statistical features from the background region of the input image feature map according to the scaled foreground mask region map and apply them to the features of the foreground region through inverse modulation, so that the foreground and the background have similar statistics at the feature level; and applying the region-adaptive instance normalization layer to an image harmony mapping network, inputting the foreground image of the input image into the network, and outputting an image whose foreground has been adjusted. The invention gives the foreground and the background a more consistent visual style, can be applied to existing image processing software, and achieves better performance.

Description

Image later-stage harmony processing method, system and terminal

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to a method, a system, and a terminal for image late-stage harmony processing.

Background

Image compositing, also called image combination, is a common operation in image editing and data augmentation. The combination of a foreground image and a background image can be expressed by the following formula:

$$I_c = M \circ I_f + (1 - M) \circ I_b$$

wherein $I_f$ is the foreground image, $M$ is the mask of the foreground image, $I_b$ is the background image, $I_c$ is the composite image, and $\circ$ is the Hadamard product. Typically, when a foreground object $I_f$ is pasted onto a background, discordance visible to the naked eye appears, for example in color, illumination, and edges.

In this image editing task, generating a realistic composite image usually requires a professional designer to carefully observe the differences between the images with the aid of an image editing tool (such as Photoshop) and to adjust the color and illumination of the pasted foreground so that the foreground blends naturally with the background. This process demands a certain level of skill and takes considerable time for a single picture. To reduce the manual burden, the image harmonization task has been proposed, which aims to automatically adjust the foreground and background so that the composite blends together.

In practical applications, when a new foreground object image is pasted to a new background image, the visual features of the foreground-background become incompatible and discordant, so that the user can easily judge the authenticity thereof.

A large number of earlier image harmonization methods exist for improving the realism of composite images. Conventional methods transfer hand-crafted statistics, such as color or texture features, from an existing image to the foreground being adjusted, or employ Poisson blending. These methods are effective only on simple examples, whose foregrounds are often already fairly harmonious. In recent years, more and more deep learning methods have been proposed to achieve end-to-end image harmonization. These algorithms all learn a neural mapping network from a global perspective and do not really consider realism from the perspective of visual style consistency. They typically either do not consider the relationship between the foreground image and the background image jointly, or convolve the foreground and the background separately, or introduce additional modules to expand the network capacity; in each case it remains difficult to make the visual style of the foreground compatible with the background.

Therefore, to make a composite image look more realistic, a more consistent visual style between the foreground and the background must be ensured, and a new image post-harmony processing technique is urgently needed.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a method, a system and a terminal for image later-stage harmony processing, which can ensure that a more continuous visual style exists between a foreground and a background, can be applied to the existing image processing software or harmony algorithm, and can achieve better performance improvement.

In order to solve the technical problems, the invention is realized by the following technical scheme:

the invention provides an image later-stage harmony processing method, which comprises the following steps:

S11: indicating the region to be processed with the foreground mask segmentation map to obtain a foreground mask region map, and scaling the foreground mask region map to the feature size of each normalization layer;

S12: the region-adaptive instance normalization layer extracts statistical features from the background region of the input image feature map according to the scaled foreground mask region map and applies them to the features of the foreground region through inverse modulation, so that the foreground features are inversely modulated by the background features and the foreground and the background have similar statistics at the feature level;

S13: applying the region-adaptive instance normalization layer obtained in S12 to an image harmony mapping network, inputting the foreground image of the input image into the image harmony mapping network, and outputting an image whose foreground has been adjusted, so that the foreground image and the background image are compatible in visual style and appearance; a new image with a compatible, harmonious foreground and background is thereby obtained, that is, the image looks real and free of any sense of incongruity.
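Step S11 can be illustrated with a minimal sketch. The text does not say how the mask is scaled; nearest-neighbour resampling is assumed here because it keeps the mask binary. The function name `resize_mask_nearest` and the three feature sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

def resize_mask_nearest(mask, out_h, out_w):
    """Nearest-neighbour resize of a binary (H, W) mask to (out_h, out_w)."""
    in_h, in_w = mask.shape
    rows = (np.arange(out_h) * in_h) // out_h   # source row for each output row
    cols = (np.arange(out_w) * in_w) // out_w   # source column for each output column
    return mask[rows[:, None], cols[None, :]]

# Full-resolution foreground mask: 1 inside the pasted object, 0 elsewhere.
mask = np.zeros((8, 8), dtype=np.float32)
mask[2:6, 2:6] = 1.0

# Hypothetical feature sizes of three normalization layers (assumption).
masks_per_layer = [resize_mask_nearest(mask, s, s) for s in (8, 4, 2)]
```

Each entry of `masks_per_layer` then has the same spatial size as the feature map of the layer it accompanies.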

Preferably, the step S12 further includes:

s121: under the instruction of the scaled foreground mask region map, respectively carrying out feature normalization on a foreground region and a background region of the input image feature map;

s122: extracting statistical features from the normalized background region channel by channel, wherein the statistical features comprise the mean and the standard deviation along each channel and are not influenced by the foreground features;

s123: multiplying the standard deviation statistical feature obtained in the step S122 by the feature of the normalized foreground region, and adding the average statistical feature obtained in the step S122.
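Steps S121 to S123 can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the patented implementation: features are laid out as (H, W, C), both masks are non-empty, the background statistics are taken from the input features under the background mask, and the background part of the output is simply its normalized features with no learned scale or shift. The function names are hypothetical.

```python
import numpy as np

EPS = 1e-7  # the small constant named in the text

def masked_mean_std(feat, mask):
    """Channel-wise mean/std of feat (H, W, C) over pixels where mask (H, W) is 1."""
    m = mask[..., None]
    count = m.sum()                                  # number of pixels in the region
    mean = (feat * m).sum(axis=(0, 1)) / count
    var = (m * (feat - mean) ** 2).sum(axis=(0, 1)) / count
    return mean, np.sqrt(var + EPS)

def region_adaptive_in(feat, fg_mask):
    """Normalize both regions, then re-modulate the foreground with background statistics."""
    bg_mask = 1.0 - fg_mask
    mu_f, sd_f = masked_mean_std(feat, fg_mask)
    mu_b, sd_b = masked_mean_std(feat, bg_mask)      # S122: background statistics
    fg_norm = (feat - mu_f) / sd_f                   # S121: foreground normalization
    bg_norm = (feat - mu_b) / sd_b                   # S121: background normalization
    fg_mod = sd_b * fg_norm + mu_b                   # S123: inverse modulation
    m = fg_mask[..., None]
    return m * fg_mod + (1.0 - m) * bg_norm

rng = np.random.default_rng(0)
feat = rng.normal(size=(16, 16, 8))
feat[4:12, 4:12] = feat[4:12, 4:12] * 3.0 + 5.0      # foreground with different statistics
fg_mask = np.zeros((16, 16))
fg_mask[4:12, 4:12] = 1.0
out = region_adaptive_in(feat, fg_mask)
```

After the call, the foreground region of `out` has, per channel, approximately the mean and standard deviation that the background region had in `feat`.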

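Preferably, the normalization in S121 is as follows:

$$\hat{F}^{i}_{h,w,c} = \frac{F^{i}_{h,w,c} - \mu^{i}_{f,c}}{\sigma^{i}_{f,c}}$$

wherein

$$\mu^{i}_{f,c} = \frac{1}{\#\{M^{i}=1\}} \sum_{h,w} F^{i}_{h,w,c} \circ M^{i}_{h,w}$$

$$\sigma^{i}_{f,c} = \sqrt{\frac{1}{\#\{M^{i}=1\}} \sum_{h,w} M^{i}_{h,w} \circ \left(F^{i}_{h,w,c} - \mu^{i}_{f,c}\right)^{2} + \epsilon}$$

wherein $F^{i}$ is the input feature of the i-th normalization layer, $M^{i}$ is the scaled foreground mask region map of the i-th layer, the subscripts h, w, c denote the feature value at that position, $\epsilon$ is a small constant set to $10^{-7}$, $\#\{M^{i}=1\}$ is the number of pixels with non-zero values in the i-th layer foreground mask, and $\circ$ is the Hadamard product;

the mean statistic $\mu^{i}_{b,c}$ and the standard deviation statistic $\sigma^{i}_{b,c}$ in S122 are given by:

$$\mu^{i}_{b,c} = \frac{1}{\#\{\bar{M}^{i}=1\}} \sum_{h,w} F^{i}_{h,w,c} \circ \bar{M}^{i}_{h,w}$$

$$\sigma^{i}_{b,c} = \sqrt{\frac{1}{\#\{\bar{M}^{i}=1\}} \sum_{h,w} \bar{M}^{i}_{h,w} \circ \left(F^{i}_{h,w,c} - \mu^{i}_{b,c}\right)^{2} + \epsilon}$$

wherein $F^{i}$ is the input feature of the i-th normalization layer, $\bar{M}^{i}$ is the scaled background mask region map of the i-th layer, calculated as $\bar{M}^{i} = 1 - M^{i}$, the subscripts h, w, c denote the feature value at that position, $\epsilon$ is a small constant set to $10^{-7}$, $\#\{\bar{M}^{i}=1\}$ is the number of pixels with non-zero values in the background mask of the layer, and $\circ$ is the Hadamard product;

the inverse modulation formula of S123 is:

$$\tilde{F}^{i}_{h,w,c} = \sigma^{i}_{b,c}\,\hat{F}^{i}_{h,w,c} + \mu^{i}_{b,c}$$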

preferably, the step S13 further includes:

s131: encoding the input image and extracting its deep features with an encoder network, wherein the region-adaptive instance normalization layer may be used in the encoder to extract statistical features from the background features of the encoder features and apply them to the foreground features;

s132: decoding and mapping the deep features of the encoder and the decoder with a decoder network, wherein the region-adaptive instance normalization layer may be used in the decoder to extract statistical features from the background features of the decoder features and apply them to the foreground features;

s133: concatenating the result obtained in S131 with the result obtained in S132 through skip connections, which helps reduce the information loss caused by strided convolution and preserves the high-resolution features of the image;

s134: and decoding the features processed by the S133 into images.
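The role of the skip connection in S133 can be sketched with array shapes alone. The strided convolution and the decoder layer are replaced here by simple stand-ins (pixel subsampling and nearest-neighbour up-sampling); these stand-ins and the function names are assumptions made only to show where resolution is lost and how concatenation restores access to the encoder detail.

```python
import numpy as np

def downsample_stride2(feat):
    """Stand-in for a stride-2 convolution: keeps every second pixel."""
    return feat[::2, ::2, :]

def upsample_nearest2(feat):
    """Stand-in for a decoder up-sampling layer (nearest neighbour, factor 2)."""
    return feat.repeat(2, axis=0).repeat(2, axis=1)

enc = np.random.default_rng(0).normal(size=(8, 8, 4))   # encoder feature, full resolution
deep = downsample_stride2(enc)                           # (4, 4, 4): detail is discarded
dec = upsample_nearest2(deep)                            # (8, 8, 4): detail is not recovered
skip = np.concatenate([enc, dec], axis=-1)               # (8, 8, 8): encoder detail kept
```

The decoder feature alone cannot reproduce `enc`, while the concatenated feature still contains it in its first four channels.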

Preferably, the step S133 further includes:

s51: learning a channel attention weight from the features obtained in S133 and multiplying the features obtained in S133 by this weight, so that the channels of the encoder features and the decoder features are rebalanced and the model performance is further improved;

further, step S134 then becomes: decoding the features processed in S51 into images.
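The text does not specify the form of the channel attention in S51. One common choice, assumed here purely for illustration, is a squeeze-and-excitation style gate: global average pooling, two small linear layers, and a sigmoid. The function name, the reduction width, and the random matrices standing in for learned weights are all hypothetical.

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """feat: (H, W, C); w1: (C, C_r); w2: (C_r, C). Returns re-weighted features and weights."""
    squeezed = feat.mean(axis=(0, 1))                 # one value per channel, shape (C,)
    hidden = np.maximum(squeezed @ w1, 0.0)           # ReLU
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2)))    # sigmoid, one weight per channel
    return feat * weights, weights

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 8, 8))      # concatenated encoder and decoder features
w1 = rng.normal(size=(8, 2))           # stand-in for learned weights (assumption)
w2 = rng.normal(size=(2, 8))
out, weights = channel_attention(feat, w1, w2)
```

Each channel of the concatenated feature is scaled by a single weight between 0 and 1, which is the channel rebalancing the step describes.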

The invention also provides an image later-stage harmony processing system, which comprises: a foreground image segmentation mask map, a region self-adaptive instance normalization layer and an image harmony mapping network; wherein:

the foreground image segmentation mask map is used for indicating a region to be processed by utilizing the foreground mask segmentation map to obtain a foreground mask region map, and the foreground mask region map is scaled to be the same as the feature size of each normalization layer;

the region self-adaptive instance normalization layer is used for extracting statistical features from a background region of the input image feature map according to the scaled foreground mask region map, and applying the statistical features to features of the foreground region through inverse modulation, so that the foreground and the background have similar statistics at the feature level;

the image harmonious mapping network is used for applying the area self-adaptive instance normalization layer to the image harmonious mapping network, inputting a foreground image of an input image into the image harmonious mapping network and outputting an image with the foreground image adjusted.

Preferably, the region adaptive instance normalization layer further includes: a foreground and background feature normalization module, a background feature extraction module, and a foreground feature inverse modulation module; wherein:

the foreground and background feature normalization module is used for respectively carrying out feature normalization on a foreground region and a background region of the input image feature map under the instruction of the scaled foreground mask region map;

the background feature extraction module is used for extracting statistical features from the normalized background region in a channel mode, and comprises the following steps: average statistics and standard deviation statistics along the channel;

the foreground feature inverse modulation module is used for multiplying the standard deviation statistical feature obtained by the background feature extraction module by the feature of the normalized foreground region and adding the average statistical feature obtained by the background feature extraction module.

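Preferably, the normalization formula in the foreground and background feature normalization module is as follows:

$$\hat{F}^{i}_{h,w,c} = \frac{F^{i}_{h,w,c} - \mu^{i}_{f,c}}{\sigma^{i}_{f,c}}, \quad \mu^{i}_{f,c} = \frac{1}{\#\{M^{i}=1\}} \sum_{h,w} F^{i}_{h,w,c} \circ M^{i}_{h,w}, \quad \sigma^{i}_{f,c} = \sqrt{\frac{1}{\#\{M^{i}=1\}} \sum_{h,w} M^{i}_{h,w} \circ \left(F^{i}_{h,w,c} - \mu^{i}_{f,c}\right)^{2} + \epsilon}$$

the mean statistic $\mu^{i}_{b,c}$ and the standard deviation statistic $\sigma^{i}_{b,c}$ in the background feature extraction module are given by:

$$\mu^{i}_{b,c} = \frac{1}{\#\{\bar{M}^{i}=1\}} \sum_{h,w} F^{i}_{h,w,c} \circ \bar{M}^{i}_{h,w}, \quad \sigma^{i}_{b,c} = \sqrt{\frac{1}{\#\{\bar{M}^{i}=1\}} \sum_{h,w} \bar{M}^{i}_{h,w} \circ \left(F^{i}_{h,w,c} - \mu^{i}_{b,c}\right)^{2} + \epsilon}$$

wherein $F^{i}$ is the input feature of the i-th normalization layer, $\bar{M}^{i}$ is the scaled background mask region map of the i-th layer, calculated as $\bar{M}^{i} = 1 - M^{i}$, the subscripts h, w, c denote the feature value at that position, $\epsilon$ is a small constant set to $10^{-7}$, $\#\{\bar{M}^{i}=1\}$ is the number of pixels with non-zero values in the background mask of the layer, and $\circ$ is the Hadamard product;

the inverse modulation formula in the foreground feature inverse modulation module is:

$$\tilde{F}^{i}_{h,w,c} = \sigma^{i}_{b,c}\,\hat{F}^{i}_{h,w,c} + \mu^{i}_{b,c}$$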

preferably, the image harmony mapping network further comprises: an encoder, a decoder, a skip connection, and a network output layer; wherein:

the encoder is used for encoding and extracting deep features of an input image by utilizing an encoder network, and the regional self-adaptive instance normalization layer can be utilized in the encoder to extract statistical features from background features of the encoder features and apply the statistical features to foreground features;

the decoder is used for decoding and mapping deep features of the encoder and the decoder by utilizing a decoder network, and the regional adaptive instance normalization layer can be utilized in the decoder to extract statistical features from background features of the decoder features and apply the statistical features to foreground features;

the skip connection is used for concatenating the features of the encoder network with the features of the decoder network, so that the information loss caused by strided convolution is reduced and the high-resolution features of the image are maintained;

the network output layer is used for decoding the features subjected to the crossing connection processing into images.

Preferably, the image harmony mapping network further comprises: a channel attention mechanism, when the channel attention mechanism is included, the network output layer is used for decoding the features processed by the channel attention mechanism into images;

the channel attention mechanism is used for learning a channel attention weight from the characteristics of the encoder and the decoder after the cascade connection, and multiplying the channel attention weight by the characteristics of the cascade connection, so that the rebalancing of the channels is realized in the encoder and the decoder, and the model performance is further improved.

The invention also provides an image later-stage harmony processing terminal, which comprises: a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor is adapted to perform the method described above when executing the program.

Compared with the prior art, the embodiment of the invention has at least one of the following advantages:

(1) The image later-stage harmony processing method, system and terminal provided by the invention can process composite images; the foreground mask map indicates to the network which part is the foreground image to be adjusted, so that the network model can better learn and predict a foreground that is more harmonious with the background;

(2) According to the image later-period harmony processing method, the image later-period harmony processing system and the terminal, the feature vector associated with the visual feature of the background image is explicitly extracted from the background feature through the area self-adaptive instance normalization layer, and the feature vector is further applied to the foreground in a reverse modulation mode, so that the foreground and the background have consistency on the feature level of the network; the method for explicitly constructing the relation between the foreground and background images improves the performance of the model;

(3) The image later-stage harmony processing method, the system and the terminal provided by the invention have the advantages that the area self-adaptive instance normalization layer can be applied to the existing image processing software or harmony algorithm, and better performance improvement can be achieved;

(4) According to the image later-stage harmony processing method, system and terminal provided by the invention, only the foreground region is edited and adjusted, and the final output new foreground part is directly recombined with the original background image through a formula, so that the background information of the input combined image is ensured not to be changed.

Drawings

Embodiments of the present invention are further described below with reference to the accompanying drawings:

FIG. 1 is a flow chart of a method for post-harmonic processing of images according to an embodiment of the invention;

FIG. 2 is a flowchart of the normalization of the region adaptive example according to a preferred embodiment of the present invention;

FIG. 3 is a schematic structural diagram of normalization of an area adaptive example according to an embodiment of the present invention;

FIG. 4 is an example of an image harmony map network architecture according to an embodiment of the present invention;

fig. 5 is a diagram showing an image harmony effect according to an embodiment of the present invention.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present invention.

Fig. 1 is a flowchart of an image post-harmony processing method according to an embodiment of the present invention.

Referring to fig. 1, the image post-harmonic processing method of the present embodiment includes:

s11: indicating a region to be processed by using the foreground mask segmentation map to obtain a foreground mask region map, and scaling the foreground mask region map to be the same as the characteristic size of each normalization layer;

S13: applying the region-adaptive instance normalization layer obtained in S12 to an image harmony mapping network, inputting the foreground image of the input image into the image harmony mapping network, and outputting an image whose foreground has been adjusted, so that the foreground image and the background image are compatible in visual style and appearance; a new image with a compatible, harmonious foreground and background is thereby obtained, that is, the image looks real and free of any sense of incongruity.

In a preferred embodiment, as shown in fig. 2, S12 further includes:

s121: in order to stabilize network training, under the guidance of the scaled foreground mask region map, feature normalization is performed separately on the foreground region and the background region of the input image feature map, so that the foreground features and the background features each have zero mean and unit variance;

s122: extracting statistical features from the normalized background region channel by channel, wherein the statistical features comprise the mean and the standard deviation along each channel and are not influenced by the foreground features;

s123: multiplying the standard deviation statistic obtained in S122 by the features of the normalized foreground region and adding the mean statistic obtained in S122, so that the channel mean and variance of the foreground features are consistent with the channel mean and variance of the background features, thereby realizing the inverse modulation of the foreground features by the background features.
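As a short check of why the modulation in S123 aligns the statistics, the following derivation may help. The symbols are chosen here for illustration: $\hat{F}_{h,w,c}$ is the normalized foreground feature, $\mu_{b,c}$ and $\sigma_{b,c}$ are the background mean and standard deviation of channel c, and $N_f$ is the number of foreground pixels. Assuming the normalized foreground has zero mean and, up to the small constant, unit variance in each channel:

```latex
\frac{1}{N_f}\sum_{(h,w)\in\mathrm{fg}} \left(\sigma_{b,c}\,\hat{F}_{h,w,c} + \mu_{b,c}\right)
  = \sigma_{b,c}\cdot 0 + \mu_{b,c} = \mu_{b,c},
\qquad
\frac{1}{N_f}\sum_{(h,w)\in\mathrm{fg}} \left(\sigma_{b,c}\,\hat{F}_{h,w,c}\right)^{2}
  = \sigma_{b,c}^{2}\cdot\frac{1}{N_f}\sum_{(h,w)\in\mathrm{fg}} \hat{F}_{h,w,c}^{2}
  \approx \sigma_{b,c}^{2}.
```

The modulated foreground therefore has channel mean $\mu_{b,c}$ and channel variance approximately $\sigma_{b,c}^{2}$, matching the background.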

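In a preferred embodiment, as shown in fig. 3, the normalization in S121 is as follows:

$$\hat{F}^{i}_{h,w,c} = \frac{F^{i}_{h,w,c} - \mu^{i}_{f,c}}{\sigma^{i}_{f,c}}, \quad \mu^{i}_{f,c} = \frac{1}{\#\{M^{i}=1\}} \sum_{h,w} F^{i}_{h,w,c} \circ M^{i}_{h,w}, \quad \sigma^{i}_{f,c} = \sqrt{\frac{1}{\#\{M^{i}=1\}} \sum_{h,w} M^{i}_{h,w} \circ \left(F^{i}_{h,w,c} - \mu^{i}_{f,c}\right)^{2} + \epsilon}$$

the mean statistic $\mu^{i}_{b,c}$ and the standard deviation statistic $\sigma^{i}_{b,c}$ in S122 are given by:

$$\mu^{i}_{b,c} = \frac{1}{\#\{\bar{M}^{i}=1\}} \sum_{h,w} F^{i}_{h,w,c} \circ \bar{M}^{i}_{h,w}, \quad \sigma^{i}_{b,c} = \sqrt{\frac{1}{\#\{\bar{M}^{i}=1\}} \sum_{h,w} \bar{M}^{i}_{h,w} \circ \left(F^{i}_{h,w,c} - \mu^{i}_{b,c}\right)^{2} + \epsilon}$$

wherein $F^{i}$ is the input feature of the i-th normalization layer, $M^{i}$ is the scaled foreground mask region map of the i-th layer, $\bar{M}^{i}$ is the scaled background mask region map of the i-th layer, calculated as $\bar{M}^{i} = 1 - M^{i}$, the subscripts h, w, c denote the feature value at that position, $\epsilon$ is a small constant set to $10^{-7}$, $\#\{\bar{M}^{i}=1\}$ is the number of pixels with non-zero values in the background mask of the layer, and $\circ$ is the Hadamard product;

the inverse modulation formula of S123 is:

$$\tilde{F}^{i}_{h,w,c} = \sigma^{i}_{b,c}\,\hat{F}^{i}_{h,w,c} + \mu^{i}_{b,c}$$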

in a preferred embodiment, S13 further comprises:

s132: decoding and mapping deep features of the encoder and the decoder by using a decoder network, wherein statistical features can be extracted from background features of the decoder features by using a region adaptive instance normalization layer and applied to foreground features;

s133: the encoder features and the decoder features are concatenated through skip connections, which reduces the information loss caused by strided convolution, preserves the high-resolution features of the image, and improves the quality of the final composite image;

s134: the features after the processing of S133 are decoded into images.

In a preferred embodiment, between S133 and S134, further comprises:

s51: a channel attention weight is learned from the features obtained in the step S133, and the features obtained in the step S133 are multiplied by the channel attention weight, so that channel rebalancing is realized in the encoder features and the decoder features, and the model performance is further improved; further, S134 at this time decodes the feature processed in S51 into an image.

In a preferred embodiment, the structure of the image harmony mapping network is shown in fig. 4; the network need not have the number of layers shown in the figure and may have any number of layers. The foreground-background combination module in the figure recombines the optimized foreground image produced by the image harmony mapping network with the background part of the input image, ensuring that the background image is not changed by the image harmonization network.
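The foreground-background combination step can be sketched as a masked blend of the network output with the input composite. This is a minimal sketch assuming images of shape (H, W, 3) and a binary (H, W) foreground mask; the function name `recombine` and the random arrays standing in for real images are hypothetical.

```python
import numpy as np

def recombine(network_output, input_composite, fg_mask):
    """Keep the network output inside the foreground and the original pixels elsewhere."""
    m = fg_mask[..., None]                         # (H, W) -> (H, W, 1) for broadcasting
    return m * network_output + (1.0 - m) * input_composite

rng = np.random.default_rng(0)
inp = rng.random((6, 6, 3))                        # input composite image
net = rng.random((6, 6, 3))                        # stand-in for the network output
fg_mask = np.zeros((6, 6))
fg_mask[2:4, 2:4] = 1.0
out = recombine(net, inp, fg_mask)
```

Because the background pixels are copied from the input composite, whatever the network predicts outside the mask has no effect on the final image.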

In a preferred embodiment, the method for automatic, intelligent harmonization of composite images provided by the invention combines: a foreground image segmentation mask map, a region-adaptive instance normalization layer, and an image harmony mapping network; the foreground region to be adjusted is indicated by the foreground mask map, so that the model can distinguish background features from foreground features.

In the preferred embodiment, the model performance is significantly improved by replacing the general normalization module with a region-adaptive normalization module, and explicitly extracting features from the background image by the region-adaptive instance normalization module and applying the features to the foreground image.

In an embodiment of the present invention, there is also provided an image post-harmony processing system including: a foreground image segmentation mask map, a region self-adaptive instance normalization layer and an image harmony mapping network; wherein:

the region self-adaptive example normalization layer is used for extracting statistical features from a background region of the input image feature map according to the scaled foreground mask region map, and applying the statistical features to features of the foreground region through inverse modulation, so that the foreground and the background have similar statistical features in feature parts;

the image harmony mapping network is used for applying the area self-adaptive instance normalization layer to the image harmony mapping network, inputting a foreground image of an input image into the image harmony mapping network, and outputting an image with the foreground image adjusted.

In a preferred embodiment, the region adaptive instance normalization layer further comprises: a foreground and background feature normalization module, a background feature extraction module, and a foreground feature inverse modulation module.

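In a preferred embodiment, the normalization formula in the foreground and background feature normalization module is:

$$\hat{F}^{i}_{h,w,c} = \frac{F^{i}_{h,w,c} - \mu^{i}_{f,c}}{\sigma^{i}_{f,c}}, \quad \mu^{i}_{f,c} = \frac{1}{\#\{M^{i}=1\}} \sum_{h,w} F^{i}_{h,w,c} \circ M^{i}_{h,w}, \quad \sigma^{i}_{f,c} = \sqrt{\frac{1}{\#\{M^{i}=1\}} \sum_{h,w} M^{i}_{h,w} \circ \left(F^{i}_{h,w,c} - \mu^{i}_{f,c}\right)^{2} + \epsilon}$$

the mean statistic $\mu^{i}_{b,c}$ and the standard deviation statistic $\sigma^{i}_{b,c}$ in the background feature extraction module are given by:

$$\mu^{i}_{b,c} = \frac{1}{\#\{\bar{M}^{i}=1\}} \sum_{h,w} F^{i}_{h,w,c} \circ \bar{M}^{i}_{h,w}, \quad \sigma^{i}_{b,c} = \sqrt{\frac{1}{\#\{\bar{M}^{i}=1\}} \sum_{h,w} \bar{M}^{i}_{h,w} \circ \left(F^{i}_{h,w,c} - \mu^{i}_{b,c}\right)^{2} + \epsilon}$$

with $\bar{M}^{i} = 1 - M^{i}$; the inverse modulation formula in the foreground feature inverse modulation module is:

$$\tilde{F}^{i}_{h,w,c} = \sigma^{i}_{b,c}\,\hat{F}^{i}_{h,w,c} + \mu^{i}_{b,c}$$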

in a preferred embodiment, the image harmony mapping network further comprises: an encoder, a decoder, a skip connection, and a network output layer; wherein:

the encoder is used for encoding and extracting deep features of the input image by utilizing the encoder network, and the statistical features can be extracted from background features of the encoder features by utilizing the region self-adaptive instance normalization layer in the encoder and applied to foreground features;

the decoder is used for decoding and mapping deep features of the encoder and the decoder by utilizing a decoder network, wherein the statistical features can be extracted from background features of the decoder features by utilizing a region self-adaptive instance normalization layer and applied to foreground features;

the cross-connect is used to concatenate features of the encoder network with features of the decoder network through a cross-connect operation;

the network output layer is used for decoding the features subjected to the cross connection processing into images.

In a preferred embodiment, the image harmony mapping network further comprises: channel attention mechanism, when included, the network output layer is used to decode the features processed by the channel attention mechanism into images. The channel attention mechanism is used to learn a channel attention weight from the characteristics of the encoder and decoder after cascading, and then multiply the channel attention weight by the characteristics of the cascade.

In another embodiment, the present invention further provides an image post-harmony processing terminal, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, performs the image post-harmony processing method of any of the above embodiments.

In the above embodiment, the memory is used to store the program. The memory may include volatile memory, for example random-access memory (RAM) such as static random-access memory (SRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM); the memory may also include non-volatile memory, such as flash memory. The memory stores computer programs (e.g., application programs or functional modules that implement the methods described above), computer instructions, and the like, which may be stored in one or more memories in a partitioned manner. The above computer programs, computer instructions, data, and the like may be invoked by a processor.

The processor and the memory may be separate structures or may be integrated structures that are integrated together. When the processor and the memory are separate structures, the memory and the processor may be connected by a bus coupling.

The results of the image post-harmony processing method of the above embodiment were evaluated. The iHarmony4 dataset was selected as training and test data, and DIH, S²AM, DoveNet, and similar methods, taken as the current state of the art, were compared quantitatively and qualitatively with the method of the embodiment of the invention.

With respect to quantitative evaluation, the objective metrics used include mean squared error (MSE), peak signal-to-noise ratio (PSNR), foreground estimation error, and the like. Consistent with previous methods, the model is first trained on the training set and then tested on the test set. All quantitative results are shown in Table 1, with bold numbers indicating the best performance. It can be seen that, benefiting from the region-adaptive instance normalization and its design, the method of this embodiment performs better than DIH, S²AM, DoveNet, and similar methods. Although DoveNet performs slightly better than RainNet on the Hday2night sub-dataset, its overall performance on HCOCO, HAdobe5k, and HFlickr is not as good as that of the method of the present invention.
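The metrics named above can be sketched as follows. The exact definitions used in the evaluation are not given in the text; this sketch assumes 8-bit images (peak value 255) and interprets the foreground estimation error as the mean squared error restricted to foreground pixels. The function names and the toy arrays are hypothetical.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error over all pixels and channels."""
    return float(np.mean((pred - target) ** 2))

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(pred, target)
    return float("inf") if err == 0 else float(10.0 * np.log10(max_val ** 2 / err))

def foreground_mse(pred, target, fg_mask):
    """MSE restricted to pixels where the (H, W) foreground mask is 1 (assumed definition)."""
    m = np.broadcast_to(fg_mask[..., None].astype(bool), pred.shape)
    return float(np.mean((pred[m] - target[m]) ** 2))

target = np.zeros((4, 4, 3))
pred = target.copy()
fg_mask = np.zeros((4, 4))
fg_mask[1:3, 1:3] = 1
pred[1:3, 1:3] += 10.0                      # error of 10 on every foreground value

print(mse(pred, target))                    # 25.0
print(foreground_mse(pred, target, fg_mask))  # 100.0
print(psnr(pred, target))
```

In this toy case the error is confined to a quarter of the image, so the foreground-only error is four times the whole-image MSE, which is why a foreground-restricted metric is informative for harmonization.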

Table 1: Quantitative performance comparison on the four sub-datasets of iHarmony4

Further, for subjective evaluation, 11 volunteers were invited to participate in subjective experiments. In each trial, each volunteer was asked to select, from 5 given images in shuffled order (the input composite image, the DIH result, the S²AM result, the DoveNet result, and the result of the method of this embodiment), the image they considered most realistic. The experiment used a randomized crossover design; each user performed 99 trials and thus viewed 495 images in total. The experimental results are shown in Table 2. From the results in the table, the DIH method and the S²AM method were selected with almost the same probability, and DoveNet was selected slightly more often than these two; the input images, being the least harmonious, received the fewest votes. Overall, the method of the embodiment of the invention (RainNet) was selected more often and in a higher proportion than the other methods, which indicates that its overall performance on the task of harmonizing real composite images is better and more stable than that of the other methods.

Table 2 subjective experimental comparison of the method of this example with other methods

To compare generation quality, the experiments further compare qualitatively the performance of the different methods on image harmonization. As can be seen from fig. 5, the proposed method produces results with better visual style consistency between the foreground image and the background image; the foreground blends better into the background and is visually closer to the real harmonious image. For example, in the second row of fig. 5, the image is underexposed, whereas the foreground image (balloon) has significantly higher brightness and contrast than the background, so the foreground and background do not appear harmonious. The DIH and DoveNet methods do not adjust the foreground well enough to be compatible with the dark background image, and S²AM produces the least realistic result because it performs different convolution operations on the foreground and the background. The RainNet provided by the invention achieves the result closest to the real image, with good consistency between the visual style of the foreground and that of the background.

In an embodiment, the region self-adaptive instance normalization method proposed in this embodiment is applied to the earlier DIH network, which improves the performance of that network, as shown in Table 3, and further demonstrates the benefit of the method of the invention.

Table 3. Performance improvement from applying the region self-adaptive instance normalization method of this embodiment to the DIH network

Method	PSNR (dB)
DIH	33.36
DIH + region self-adaptive instance normalization	33.84 (+0.48)
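
Table 3 reports quality as PSNR in dB. As an illustration of how such a figure is typically computed, the following is a minimal sketch assuming 8-bit images held as NumPy arrays and the standard mean-squared-error definition of PSNR. The function name `psnr` and the `max_value` parameter are hypothetical and are not taken from this document.

```python
import numpy as np

def psnr(reference, result, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    Assumes the usual definition 10 * log10(max_value**2 / MSE); the
    8-bit peak value of 255 is an assumption, not stated in the source.
    """
    reference = np.asarray(reference, dtype=np.float64)
    result = np.asarray(result, dtype=np.float64)
    mse = np.mean((reference - result) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

# The gain reported in Table 3 is the difference of the two PSNR values.
gain_db = round(33.84 - 33.36, 2)
```

Because PSNR is logarithmic, a fixed gain in dB corresponds to a fixed ratio of mean squared errors rather than a fixed difference.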

The embodiments disclosed herein were chosen and described in detail in order to best explain the principles of the invention and its practical application, and are not intended to limit the invention. Any modifications or variations within the scope of the description that would be apparent to a person skilled in the art are intended to be included within the scope of the invention.

Claims

1. An image post-harmony processing method, comprising:

S12: the region self-adaptive instance normalization layer extracts statistical features from a background region of the same input image feature map according to the scaled foreground mask region map, and applies the statistical features to features of the foreground region through inverse modulation, so that the foreground and the background of the same image have similar statistical features at the feature level;

S13: applying the region self-adaptive instance normalization layer obtained in step S12 to an image harmony mapping network, inputting a foreground image of an input image into the image harmony mapping network, and outputting an image with the foreground image adjusted;

step S12 further includes:

S121: under the guidance of the scaled foreground mask region map, respectively carrying out feature normalization on a foreground region and a background region of the same input image feature map;

S122: extracting statistical features channel-wise from the normalized background region, wherein the statistical features comprise: average statistics and standard deviation statistics along the channel;

S123: multiplying the standard deviation statistical features obtained in step S122 by the features of the normalized foreground region, and adding the average statistical features obtained in step S122;

in S121, the normalization formula is:

in S122, the formulas for the average statistics and the standard deviation statistics are:

in S123, the inverse modulation formula is:

the image harmony mapping network comprises an encoder and a decoder, and step S13 further includes:

S131: using an encoder network to encode an input image and extract its deep features, extracting statistical features from background features of the encoder features by using the region self-adaptive instance normalization layer in the encoder, and applying the statistical features to foreground features;

S132: decoding and mapping deep features of the encoder and the decoder by using a decoder network, extracting statistical features from background features of the decoder features by using the region self-adaptive instance normalization layer in the decoder, and applying the statistical features to foreground features;

S133: concatenating the result obtained in step S131 with the result obtained in step S132 through a skip connection, so as to reduce information loss caused by strided convolution and maintain the high-resolution features of the image;

S134: decoding the features processed in step S133 into an image;

after step S133, the method further includes:

S51: learning a channel attention weight from the features obtained in step S133, and multiplying the channel attention weight by the features obtained in step S133;

further, step S134 is: decoding the features processed in step S51 into an image.
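
Steps S121 to S123 can be illustrated with the following minimal NumPy sketch. It assumes a single feature map of shape (C, H, W) and a binary foreground mask already scaled to (H, W). The names `masked_stats` and `region_norm_sketch` and the `eps` stabilizer are hypothetical; the background statistics are taken to be the channel-wise mean and standard deviation used to normalize the background region, which is one reading of S122; and any learned parameters of the actual layer are omitted.

```python
import numpy as np

def masked_stats(feat, mask, eps=1e-5):
    """Channel-wise mean and std of feat (C, H, W) over pixels where mask (H, W) is 1."""
    count = max(float(mask.sum()), 1.0)            # number of pixels in the region
    mean = (feat * mask).sum(axis=(1, 2)) / count  # shape (C,)
    centered = (feat - mean[:, None, None]) * mask
    var = (centered ** 2).sum(axis=(1, 2)) / count
    return mean, np.sqrt(var + eps)

def region_norm_sketch(feat, fg_mask, eps=1e-5):
    """Sketch of S121-S123 under the assumptions stated in the lead-in."""
    bg_mask = 1.0 - fg_mask
    fg_mean, fg_std = masked_stats(feat, fg_mask, eps)
    bg_mean, bg_std = masked_stats(feat, bg_mask, eps)  # S122: background statistics

    # S121: normalize each region with its own channel-wise statistics.
    fg_norm = (feat - fg_mean[:, None, None]) / fg_std[:, None, None]
    bg_norm = (feat - bg_mean[:, None, None]) / bg_std[:, None, None]

    # S123: inverse modulation, i.e. scale the normalized foreground by the
    # background std and shift it by the background mean.
    fg_out = fg_norm * bg_std[:, None, None] + bg_mean[:, None, None]

    # Assumption: the normalized background is passed through unchanged.
    return fg_out * fg_mask + bg_norm * bg_mask
```

In this sketch, the foreground positions of the output carry channel-wise statistics that match those of the original background, which is the effect S12 describes.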

2. An image post-harmony processing system, comprising: a foreground image segmentation mask map, a region self-adaptive instance normalization layer and an image harmony mapping network; wherein

the region self-adaptive instance normalization layer is used for extracting statistical features from a background region of the same input image feature map according to the scaled foreground mask region map, and applying the statistical features to features of the foreground region through inverse modulation, so that the foreground and the background of the same image have similar statistical features at the feature level;

the image harmony mapping network applies the region self-adaptive instance normalization layer, takes a foreground image of an input image as input, and outputs an image after the foreground image is adjusted;

the region self-adaptive instance normalization layer further comprises: a foreground and background feature normalization module, a background feature extraction module and a foreground feature inverse modulation module; wherein

the foreground and background feature normalization module is used for respectively carrying out feature normalization on a foreground region and a background region of the same input image feature map under the guidance of the scaled foreground mask region map;

the foreground feature inverse modulation module is used for multiplying the standard deviation statistical features obtained by the background feature extraction module by the features of the normalized foreground region and adding the average statistical features obtained by the background feature extraction module;

the normalization formula in the foreground and background characteristic normalization module is as follows:

the image harmony mapping network further comprises: an encoder, a decoder, a skip connection, and a network output layer; wherein

the encoder is used for encoding and extracting deep features of an input image by utilizing an encoder network, and statistical features are extracted from background features of the encoder features by utilizing the region self-adaptive instance normalization layer in the encoder and are applied to foreground features;

the decoder is used for decoding and mapping deep features of the encoder and the decoder by utilizing a decoder network, and statistical features are extracted from background features of the decoder features by utilizing the region self-adaptive instance normalization layer in the decoder and are applied to foreground features;

the skip connection is used for concatenating features of the encoder network with features of the decoder network through a skip connection operation;

the network output layer is used for decoding the features processed by the skip connection into images;

further, the system further comprises a channel attention mechanism; when the channel attention mechanism is included, the network output layer is used for decoding the features processed by the channel attention mechanism into images;

the channel attention mechanism is used for learning a channel attention weight from the concatenated features of the encoder and the decoder, and multiplying the channel attention weight by the concatenated features of the encoder and the decoder.
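
The concatenation of encoder and decoder features followed by channel attention can be sketched as below. The claim only states that a channel attention weight is learned and multiplied by the concatenated features; the squeeze-and-excitation form used here (global average pooling, two linear maps, sigmoid) is a swapped-in assumption, and `w1`, `w2`, `channel_attention` and `skip_and_attend` are hypothetical names standing in for learned parameters and network components.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Reweight channels of feat (C, H, W).

    w1 has shape (C // r, C) and w2 has shape (C, C // r); both are
    hypothetical learned weights (squeeze-and-excitation style).
    """
    squeezed = feat.mean(axis=(1, 2))        # global average pool, shape (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)  # ReLU bottleneck
    weights = sigmoid(w2 @ hidden)           # one weight in (0, 1) per channel
    return feat * weights[:, None, None], weights

def skip_and_attend(enc_feat, dec_feat, w1, w2):
    """Concatenate encoder and decoder features along channels, then reweight."""
    concatenated = np.concatenate([enc_feat, dec_feat], axis=0)
    return channel_attention(concatenated, w1, w2)
```

Here the encoder and decoder feature maps are assumed to share the same spatial size, as is required for concatenation along the channel axis.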

3. An image post-harmony processing terminal, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor is adapted to perform the method of claim 1 when the program is executed by the processor.