CN116645302A - Image enhancement method, device, intelligent terminal and computer readable storage medium - Google Patents


Info

Publication number
CN116645302A
Authority
CN
China
Prior art keywords
image
coefficient matrix
pixel
processed
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210137405.4A
Other languages
Chinese (zh)
Inventor
李逸群
俞大海
凌健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TCL Technology Group Co Ltd
Original Assignee
TCL Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TCL Technology Group Co Ltd filed Critical TCL Technology Group Co Ltd
Priority to CN202210137405.4A priority Critical patent/CN116645302A/en
Publication of CN116645302A publication Critical patent/CN116645302A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
      • G06 - COMPUTING; CALCULATING OR COUNTING
        • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 5/00 - Image enhancement or restoration
            • G06T 5/50 - using two or more images, e.g. averaging or subtraction
            • G06T 5/90 - Dynamic range modification of images or parts thereof
              • G06T 5/92 - based on global image properties
              • G06T 5/94 - based on local image properties, e.g. for local contrast enhancement
          • G06T 7/00 - Image analysis
            • G06T 7/90 - Determination of colour characteristics
          • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
            • G06T 2207/10 - Image acquisition modality
              • G06T 2207/10024 - Color image
            • G06T 2207/20 - Special algorithmic details
              • G06T 2207/20004 - Adaptive image processing
                • G06T 2207/20008 - Globally adaptive
                • G06T 2207/20012 - Locally adaptive
              • G06T 2207/20081 - Training; Learning
              • G06T 2207/20084 - Artificial neural networks [ANN]
            • G06T 2207/30 - Subject of image; Context of image processing
              • G06T 2207/30196 - Human being; Person
        • G06F - ELECTRIC DIGITAL DATA PROCESSING
          • G06F 17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
            • G06F 17/10 - Complex mathematical operations
              • G06F 17/16 - Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
        • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 - Computing arrangements based on biological models
            • G06N 3/02 - Neural networks
              • G06N 3/08 - Learning methods
                • G06N 3/088 - Non-supervised learning, e.g. competitive learning
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
          • Y02T 10/00 - Road transport of goods or passengers
            • Y02T 10/10 - Internal combustion engine [ICE] based vehicles
              • Y02T 10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an image enhancement method, an image enhancement device, an intelligent terminal and a computer readable storage medium. The method comprises the following steps: determining a bilateral network transformation coefficient matrix and a guidance map corresponding to the acquired image to be processed; determining a pixel-level transformation coefficient matrix corresponding to each pixel point in the image to be processed according to the bilateral network transformation coefficient matrix and the guidance map; and performing image processing on the image to be processed according to the pixel-level transformation coefficient matrix to obtain a target enhanced image. The invention can solve the problem of unnatural transitions at the boundary between the portrait area and the background area.

Description

Image enhancement method, device, intelligent terminal and computer readable storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image enhancement method, an image enhancement device, an intelligent terminal, and a computer readable storage medium.
Background
Image enhancement algorithms are widely used in intelligent terminals because they provide better shooting and display effects for users' pictures. As the most common shooting scene, the portrait is one of the enhancement fields that intelligent terminal manufacturers care about most. Existing image enhancement algorithms first perform portrait segmentation on the image and then process the portrait area and the background area separately; on the one hand this increases the processing time of the algorithm, and on the other hand unnatural transitions easily appear at the boundary between the portrait area and the background area.
Accordingly, there is a need for improvement and development in the art.
Disclosure of Invention
The invention provides an image enhancement method, an image enhancement device, an intelligent terminal and a computer readable storage medium, and aims to solve the existing problem of unnatural transitions at the boundary between the portrait area and the background area.
The technical scheme adopted by the invention for solving the problems is as follows:
in a first aspect, an embodiment of the present invention provides an image enhancement method, including:
determining a bilateral network transformation coefficient matrix and a guidance map corresponding to the acquired image to be processed;
determining a pixel-level transformation coefficient matrix corresponding to each pixel point in the image to be processed according to the bilateral network transformation coefficient matrix and the guidance map;
and performing image processing on the image to be processed according to the pixel-level transformation coefficient matrix to obtain a target enhanced image.
In a second aspect, an embodiment of the present invention further provides an image enhancement apparatus, including:
the first determining module is used for determining a bilateral network transformation coefficient matrix and a guidance map corresponding to the acquired image to be processed;
the second determining module is used for determining a pixel-level transformation coefficient matrix corresponding to each pixel point in the image to be processed according to the bilateral network transformation coefficient matrix and the guidance map;
and the image enhancement module is used for performing image processing on the image to be processed according to the pixel-level transformation coefficient matrix to obtain a target enhanced image.
In a third aspect, an embodiment of the present invention provides an intelligent terminal, where the intelligent terminal includes a processor, a memory, and a computer program stored in the memory and capable of running on the processor, and the processor implements the steps of the image enhancement method when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the above-described image enhancement method.
The invention has the beneficial effects that: the pixel-level transformation coefficient matrix is determined according to the bilateral network transformation coefficient matrix and the guidance map, and the image to be processed is enhanced according to the pixel-level transformation coefficient matrix, so that adaptive, differentiated enhancement of color, contrast and dynamic range for the portrait area and the background area in the image can be realized; portrait segmentation is not needed, the algorithm processing time is short, and the problem of unnatural transitions at the boundary between the portrait area and the background area can be avoided.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required by the embodiments or the description of the prior art are briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that other drawings may be obtained from these drawings by those skilled in the art without inventive effort.
Fig. 1 is a schematic flow chart of an image enhancement method according to an embodiment of the present invention;
FIG. 2 is a flowchart of an embodiment of an image enhancement method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a self-attention module according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a depth separable convolution unit provided by an embodiment of the present invention;
fig. 5 is a schematic block diagram of an image enhancement apparatus provided by an embodiment of the present invention;
fig. 6 is a schematic block diagram of an internal structure of an intelligent terminal according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the invention.
It should be noted that, if directional indications (such as up, down, left, right, front and rear) are involved in the embodiments of the present invention, the directional indications are merely used to explain the relative positional relationship, movement conditions, etc. between the components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indications change accordingly.
Conventional image enhancement algorithms include color enhancement, contrast enhancement and dynamic range enhancement. Conventional color enhancement mainly performs an overall transformation of the colors of the whole image through a preset LUT (look-up table) in the ISP (typically a UV 2D-LUT in the YUV domain or an RGB 3D-LUT in the RGB domain); conventional dynamic range enhancement is realized by adjusting Gamma curves in the ISP; and conventional contrast enhancement is achieved by global/local tone mapping algorithms. As can be seen from the above, conventional image enhancement algorithms cannot achieve adaptive enhancement according to the scene information of the picture.
As the most common photographic scene, the portrait is one of the enhancement fields that major intelligent terminal manufacturers care about most. Because conventional image enhancement algorithms cannot effectively extract semantic information and cannot enhance images locally, a conventional enhancement algorithm for images containing portraits generally first performs portrait segmentation on the image and then applies conventional image enhancement separately to the portrait area and the background area. On the one hand, this increases the processing time of the algorithm, which is not conducive to deployment in terminal products; on the other hand, limited by the accuracy of portrait segmentation, unnatural transitions easily appear at the boundary between the portrait area and other areas.
In order to solve the problems in the prior art, this embodiment provides an image enhancement method which determines a pixel-level transformation coefficient matrix according to a bilateral network transformation coefficient matrix and a guidance map, and enhances the image to be processed according to the pixel-level transformation coefficient matrix. In this way, adaptive, differentiated enhancement of color, contrast and dynamic range for the portrait area and the background area in the image can be realized; portrait segmentation is not required, the algorithm processing time is short, and the problem of unnatural transitions at the boundary between the portrait area and the background area can be avoided.
Exemplary method
The embodiment of the invention provides an image enhancement method which can be applied to an intelligent terminal. As shown in fig. 1 in particular, the method comprises:
step S100, determining a bilateral network transformation coefficient matrix and a guide graph corresponding to the acquired image to be processed.
Specifically, the image to be processed may be acquired by a camera of an electronic device (for example, a smartphone), or may be an image acquired by a camera of another electronic device and obtained through a network, Bluetooth, infrared, or other means. The guidance map is an image with the same size as the image to be processed. The bilateral network transformation coefficient matrix is a color transformation coefficient matrix that is generated from the image to be processed, has a lower resolution than the image to be processed, and is used for enhancing the image to be processed. In order to enhance the image to be processed, after the image to be processed is obtained, this embodiment determines the bilateral network transformation coefficient matrix corresponding to the image to be processed. Since the resolution of the bilateral network transformation coefficient matrix is lower than that of the image to be processed, a guidance map with the same size as the image to be processed is determined at the same time, so that the bilateral network transformation coefficient matrix can be applied to the image to be processed through the guidance map in the subsequent steps.
In a specific embodiment, the step of determining, in step S100, a bilateral network transformation coefficient matrix and a guidance map corresponding to the image to be processed includes:
step S110, compressing the acquired image to be processed to obtain a target compressed image;
and step S120, inputting the target compressed image into a self-attention convolutional neural network for processing and outputting a bilateral network transformation coefficient matrix corresponding to the image to be processed, and inputting the image to be processed into a shallow neural network for processing and outputting a guidance map corresponding to the image to be processed.
The image enhancement method in this embodiment is applied to an image enhancement model. As shown in fig. 2, the image enhancement model includes a self-attention convolutional neural network and a shallow neural network: the bilateral network transformation coefficient matrix is output by the self-attention convolutional neural network, and the guidance map is output by the shallow neural network. Specifically, after the image to be processed is acquired, it is input into the shallow neural network of the image enhancement model, and the guidance map corresponding to the image to be processed is output by the shallow neural network. At the same time, the image to be processed is compressed to obtain a target compressed image, the target compressed image is input into the self-attention convolutional neural network of the image enhancement model, and the bilateral network transformation coefficient matrix corresponding to the image to be processed is output by the self-attention convolutional neural network. In this embodiment, the self-attention convolutional neural network and the shallow neural network respectively output the bilateral network transformation coefficient matrix and the guidance map corresponding to the image to be processed, and the image to be processed is enhanced through the bilateral network transformation coefficient matrix and the guidance map; this lightweight network structure makes it more feasible to run the image enhancement method on mobile terminal products.
The shallow neural network that generates the guidance map can be composed in various ways, for example as an affine transformation network or a convolutional neural network. In a specific embodiment, the shallow neural network comprises a fifth convolution layer, a sixth convolution layer and a fourth activation layer, where the output of the fifth convolution layer is the input of the sixth convolution layer, and the output of the sixth convolution layer is the input of the fourth activation layer. In a specific application, after the image to be processed is input into the shallow neural network, feature extraction is performed by the fifth convolution layer to obtain a second feature map, feature extraction is performed on the second feature map by the sixth convolution layer to obtain a third feature map, and the third feature map is passed through the fourth activation layer to output a guidance map whose size is the same as that of the image to be processed and in which the value of each pixel lies between 0 and 1. In a specific embodiment, the convolution kernel size of the fifth convolution layer is 3×3 (kernel size=3x3), the stride is 1 (stride=1), the output size equals the input size (padding='SAME'), and the number of convolution kernels is 16 (filter number=16); the convolution kernel size of the sixth convolution layer is 1×1 (kernel size=1x1), the stride is 1 (stride=1), the output size equals the input size (padding='SAME'), and the number of convolution kernels is 1 (filter number=1); the activation function in the fourth activation layer is a sigmoid activation function.
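As an illustration of this structure, the following is a minimal PyTorch sketch of such a shallow guidance network with the layer settings listed above (3×3 convolution with 16 filters, then a 1×1 convolution with 1 filter, then a sigmoid). The class and variable names are illustrative, and no activation is placed between the two convolution layers since none is described.

```python
import torch
import torch.nn as nn

class GuidanceNet(nn.Module):
    """Shallow network: full-resolution image in, single-channel guidance map in [0, 1] out."""
    def __init__(self):
        super().__init__()
        # fifth convolution layer: 3x3 kernel, stride 1, 'SAME' padding, 16 filters
        self.conv5 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
        # sixth convolution layer: 1x1 kernel, stride 1, 1 filter
        self.conv6 = nn.Conv2d(16, 1, kernel_size=1, stride=1, padding=0)

    def forward(self, x):                  # x: (N, 3, H, W) image to be processed
        x = self.conv5(x)                  # second feature map, (N, 16, H, W)
        x = self.conv6(x)                  # third feature map, (N, 1, H, W)
        return torch.sigmoid(x)            # fourth activation layer -> guidance map

guide = GuidanceNet()(torch.rand(1, 3, 512, 512))   # (1, 1, 512, 512), values in (0, 1)
```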
In a specific embodiment, in step S120, the target compressed image is input into the self-attention convolutional neural network for processing, and a bilateral network transformation coefficient matrix corresponding to the image to be processed is output, including:
step S121, inputting a target compressed image into a self-attention convolutional neural network, and extracting features of the target compressed image through a self-attention module to obtain a first feature map;
step S122, local feature extraction and global feature extraction are respectively carried out on the first feature map through a first feature extraction module, so as to obtain a local feature map and a global feature map;
step S123, carrying out feature fusion on the global feature map and the local feature map through a first feature fusion module to obtain a fusion feature map;
and step S124, performing feature expansion and splitting on the fusion feature map through the first convolution layer to obtain a bilateral network transformation coefficient matrix corresponding to the image to be processed.
Specifically, the self-attention convolutional neural network comprises a self-attention module, a first feature extraction module, a first feature fusion module and a first convolution layer that are sequentially cascaded. After the target compressed image is input into the self-attention convolutional neural network, feature extraction is first performed on it by the self-attention module to obtain a first feature map. The first feature map is then input into the first feature extraction module, which performs local feature extraction and global feature extraction to obtain a local feature map and a global feature map. The local feature map and the global feature map are then input into the first feature fusion module for feature fusion; since the local feature map comprises a plurality of channels, the fusion is performed by adding the global feature map to each channel of the local feature map, and the fused features are passed through an activation function, such as a Relu activation function, to obtain the fusion feature map. Finally, the first convolution layer performs feature expansion and splitting on the fusion feature map to obtain the bilateral network transformation coefficient matrix corresponding to the image to be processed. For example, the target compressed image passes through the self-attention module and the first feature extraction module, which output a local feature map T(i, j, c) and a global feature map S(c'); S(c') is added to each channel of T(i, j, c) for fusion, the fused features are passed through an activation function such as Relu to obtain a 32×32×64 fusion feature map, and finally this 32×32×64 fusion feature map is expanded to 32×32×96 and then split into a 32×32×8×12 bilateral network transformation coefficient matrix.
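The fusion, expansion and splitting step described above can be sketched as follows. This is a hedged example: the 32×32×64 local feature map, the 64-dimensional global feature vector and the 32×32×8×12 output follow the example in the text, while the layer names and the use of a 1×1 convolution as the first convolution layer are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoeffHead(nn.Module):
    """Fuse local and global features, then expand and split into a bilateral grid of
    3x4 colour transform matrices (here 8 luminance bins per grid cell)."""
    def __init__(self, channels=64, luma_bins=8, coeff_dim=12):
        super().__init__()
        self.luma_bins = luma_bins
        self.coeff_dim = coeff_dim
        # first convolution layer: expands the 64 fused channels to 8 * 12 = 96 channels
        self.expand = nn.Conv2d(channels, luma_bins * coeff_dim, kernel_size=1)

    def forward(self, local_feat, global_feat):
        # local_feat: (N, 64, 32, 32); global_feat: (N, 64)
        fused = local_feat + global_feat[:, :, None, None]   # add global vector to every channel position
        fused = F.relu(fused)                                # (N, 64, 32, 32) fusion feature map
        coeffs = self.expand(fused)                          # (N, 96, 32, 32)
        n = coeffs.shape[0]
        # split into (N, 8, 12, 32, 32): luminance bins x 3x4 matrices over the 32x32 grid
        return coeffs.view(n, self.luma_bins, self.coeff_dim, *coeffs.shape[-2:])

grid = CoeffHead()(torch.rand(2, 64, 32, 32), torch.rand(2, 64))   # (2, 8, 12, 32, 32)
```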
In a specific embodiment, as shown in fig. 3, the self-attention module comprises a second convolution layer, a first activation layer, a third convolution layer, a second activation layer, and a first feature fusion layer that are sequentially cascaded, where the output of the second convolution layer is the input of the first activation layer, the output of the first activation layer is the input of the third convolution layer, the output of the third convolution layer is the input of the second activation layer, and the output of the second activation layer together with the input of the second convolution layer are the inputs of the first feature fusion layer. Specifically, after the target compressed image is input into the self-attention module, feature extraction is performed on it by the second convolution layer to obtain a fourth feature map; the fourth feature map passes through the first activation layer and is then input into the third convolution layer for feature extraction, obtaining a fifth feature map whose pixel values lie between 0 and 1 and whose width, height and depth are the same as those of the target compressed image; the fifth feature map passes through the second activation layer and is then input into the first feature fusion layer, which fuses the fifth feature map with the target compressed image along the depth direction and outputs the first feature map. In a specific embodiment, the second convolution layer has a convolution kernel size of 3×3 (kernel size=3x3), a stride of 1 (stride=1), an output size equal to the input size (padding='SAME') and 3 convolution kernels (filter number=3); the third convolution layer has a convolution kernel size of 3×3 (kernel size=3x3), a stride of 1 (stride=1), an output size equal to the input size (padding='SAME') and 3 convolution kernels (filter number=3); the first activation layer uses Relu as the activation function, and the second activation layer uses sigmoid as the activation function.
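A sketch of the self-attention module under these settings is shown below. The text describes the final step as feature fusion along the depth direction, which is read here as channel-wise concatenation; element-wise multiplication with the input would be the other common choice for an attention map, so treat that detail as an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentionModule(nn.Module):
    """Self-attention block: two 3x3 convolutions produce a [0, 1] map with the same shape
    as the input, which is then fused with the input along the channel (depth) axis."""
    def __init__(self, channels=3):
        super().__init__()
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)  # second conv layer
        self.conv3 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)  # third conv layer

    def forward(self, x):                       # x: target compressed image, (N, 3, H, W)
        a = F.relu(self.conv2(x))               # first activation layer (fourth feature map)
        a = torch.sigmoid(self.conv3(a))        # second activation layer -> fifth feature map in [0, 1]
        return torch.cat([x, a], dim=1)         # first feature fusion layer -> first feature map

out = SelfAttentionModule()(torch.rand(1, 3, 256, 256))   # (1, 6, 256, 256)
```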
In a specific embodiment, the first feature extraction module comprises a fourth convolution layer, a plurality of depth separable convolution units, a first feature extraction unit, and a second feature extraction unit connected in parallel with the first feature extraction unit, where the output of the fourth convolution layer is the input of the plurality of depth separable convolution units, and the output of the plurality of depth separable convolution units is the input of the first feature extraction unit and of the second feature extraction unit. Specifically, after the first feature map output by the self-attention module is input into the first feature extraction module, features are extracted by the fourth convolution layer to obtain a sixth feature map; the sixth feature map is then input into the plurality of depth separable convolution units, which extract features to obtain a seventh feature map; the seventh feature map is then input into the first feature extraction unit and the second feature extraction unit respectively, to perform local feature extraction (such as local detail extraction) and global feature extraction (such as extraction of scene information like the person and background), obtaining the local feature map and the global feature map. In a specific embodiment, the fourth convolution layer has a convolution kernel size of 3×3 (kernel size=3x3), a stride of 1 (stride=1), an output size equal to the input size (padding='SAME') and 8 convolution kernels (filter number=8), and the activation function used by the fourth convolution layer is the Relu function.
In a specific embodiment, as shown in fig. 4, each depth separable convolution unit comprises a first depth separable convolution layer, a third activation layer, a second depth separable convolution layer, a second feature fusion layer, and a third depth separable convolution layer cascaded with the second feature fusion layer. The output of the first depth separable convolution layer is the input of the third activation layer, the output of the third activation layer is the input of the second depth separable convolution layer, the input of the first depth separable convolution layer is also the input of the third depth separable convolution layer, and the outputs of the second and third depth separable convolution layers are the inputs of the second feature fusion layer. Specifically, after the sixth feature map output by the fourth convolution layer is input into the depth separable convolution units, feature extraction is performed separately by the first depth separable convolution layer and the third depth separable convolution layer; the feature map output by the first depth separable convolution layer passes through the third activation layer and is then input into the second depth separable convolution layer for feature extraction, and the feature map output by the second depth separable convolution layer and the feature map output by the third depth separable convolution layer are fused by the second feature fusion layer. For example, for an H×W×C input, the first depth separable convolution layer (3×3×(C/2), stride 2) outputs an (H/2)×(W/2)×(C/2) feature map; after the third activation layer, the second depth separable convolution layer (3×3×(2C), stride 1) outputs an (H/2)×(W/2)×(2C) feature map; in parallel, the third depth separable convolution layer (3×3×(2C), stride 2) outputs an (H/2)×(W/2)×(2C) feature map; and the two (H/2)×(W/2)×(2C) feature maps are fused by the second feature fusion layer.
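The following is a rough PyTorch sketch of one depth separable convolution unit. Because the channel counts in the example above are not fully consistent, the widths used here (C/2 on the main path, 2C at the fusion) and the use of element-wise addition as the second feature fusion layer are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def dw_separable(in_ch, out_ch, stride):
    """Depthwise-separable convolution: 3x3 depthwise conv followed by 1x1 pointwise conv."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride, padding=1, groups=in_ch),
        nn.Conv2d(in_ch, out_ch, kernel_size=1),
    )

class DSConvUnit(nn.Module):
    """One depth separable convolution unit: a two-layer main path (stride 2 then stride 1)
    fused with a stride-2 shortcut path; halves (H, W) and doubles C."""
    def __init__(self, channels):
        super().__init__()
        self.dw1 = dw_separable(channels, channels // 2, stride=2)      # first depth separable conv layer
        self.dw2 = dw_separable(channels // 2, 2 * channels, stride=1)  # second depth separable conv layer
        self.dw3 = dw_separable(channels, 2 * channels, stride=2)       # third (shortcut) depth separable conv layer

    def forward(self, x):                        # x: (N, C, H, W)
        main = self.dw2(F.relu(self.dw1(x)))     # third activation layer between dw1 and dw2
        shortcut = self.dw3(x)
        return main + shortcut                   # second feature fusion layer, (N, 2C, H/2, W/2)

y = DSConvUnit(8)(torch.rand(1, 8, 256, 256))    # (1, 16, 128, 128)
```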
In a specific embodiment, the first feature extraction unit comprises a plurality of seventh convolution layers connected in series, and the second feature extraction unit comprises a plurality of eighth convolution layers connected in series followed by a plurality of fully connected layers. The seventh convolution layers have a convolution kernel size of 3×3 (kernel size=3x3), a stride of 1 (stride=1), an output size equal to the input size (padding='SAME') and 64 convolution kernels (filter number=64); the eighth convolution layers have a convolution kernel size of 3×3 (kernel size=3x3), a stride of 2 (stride=2), 'SAME' padding (padding='SAME') and 64 convolution kernels (filter number=64); and the plurality of fully connected layers comprises three fully connected layers with 256, 128 and 64 units respectively.
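A sketch of the two parallel feature extraction units under the settings above; the number of stacked convolution layers, the 32×32 input resolution and the ReLU activations are assumptions, while the filter counts and the 256/128/64 fully connected layers follow the text.

```python
import torch
import torch.nn as nn

class LocalFeatures(nn.Module):
    """First feature extraction unit: stacked 3x3, stride-1, 64-filter convolutions
    that keep the spatial resolution (local detail)."""
    def __init__(self, in_ch=64, n_layers=2):
        super().__init__()
        layers = []
        for i in range(n_layers):
            layers += [nn.Conv2d(in_ch if i == 0 else 64, 64, 3, stride=1, padding=1), nn.ReLU()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):                 # (N, 64, 32, 32) -> (N, 64, 32, 32)
        return self.net(x)

class GlobalFeatures(nn.Module):
    """Second feature extraction unit: 3x3, stride-2, 64-filter convolutions followed by
    fully connected layers of 256, 128 and 64 units (scene-level information)."""
    def __init__(self, in_ch=64, spatial=32, n_convs=2):
        super().__init__()
        convs = []
        for i in range(n_convs):
            convs += [nn.Conv2d(in_ch if i == 0 else 64, 64, 3, stride=2, padding=1), nn.ReLU()]
        self.convs = nn.Sequential(*convs)
        flat = 64 * (spatial // (2 ** n_convs)) ** 2
        self.fc = nn.Sequential(
            nn.Linear(flat, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 64),
        )

    def forward(self, x):                 # (N, 64, 32, 32) -> (N, 64)
        return self.fc(self.convs(x).flatten(1))

feat = torch.rand(1, 64, 32, 32)
local_map, global_vec = LocalFeatures()(feat), GlobalFeatures()(feat)   # (1, 64, 32, 32), (1, 64)
```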
Step S200, determining a pixel-level transformation coefficient matrix corresponding to each pixel point in the image to be processed according to the bilateral network transformation coefficient matrix and the guidance map.
The pixel level transformation coefficient matrix is a color transformation matrix corresponding to each pixel point in the image to be processed, and is used for performing image processing on each pixel point in the image to be processed.
In one embodiment, step S200 specifically includes:
step S210, obtaining the map size information of the guidance map and the pixel value of each pixel point in the guidance map;
step S220, determining a pixel-level transformation coefficient matrix corresponding to each pixel point in the image to be processed according to the map size information, the pixel values and the bilateral network transformation coefficient matrix.
The map size information of the guidance map consists of its length and width. When determining the pixel-level transformation coefficient matrix corresponding to each pixel point in the image to be processed, this embodiment first obtains the map size information of the guidance map and the pixel value of each pixel point in the guidance map, and then determines the pixel-level transformation coefficient matrix corresponding to each pixel point in the image to be processed according to the map size information, the pixel values and the bilateral network transformation coefficient matrix.
In one embodiment, step S220 specifically includes:
step S221, determining first position information of each pixel point in the guidance map within the bilateral network transformation coefficient matrix according to the map size information and the bilateral grid size information, and determining second position information of each pixel point in the guidance map within the bilateral network transformation coefficient matrix according to the pixel value and the luminance information;
step S222, determining the pixel-level transformation coefficient matrix corresponding to each pixel point in the image to be processed according to the first position information and the second position information.
The bilateral network transformation coefficient matrix obtained in this embodiment has four dimensions: the first and second dimensions represent the bilateral grid size information, the third dimension represents the luminance information, and the fourth dimension represents the color transformation matrices. For example, when the bilateral network transformation coefficient matrix is 32x32x8x12, the first two dimensions represent a 32x32 bilateral grid, the third dimension represents 8 luminance levels, and the fourth dimension represents a 3×4 color transformation matrix. When determining the pixel-level transformation coefficient matrix corresponding to each pixel in the image to be processed, each pixel point (i, j) in the guidance map is first mapped to a grid cell of the bilateral network transformation coefficient matrix according to the map size information and the bilateral grid size information, which gives the first position information (i', j'), i.e. the position of each pixel point of the guidance map in the two-dimensional space of the bilateral network transformation coefficient matrix; the position of each pixel point of the guidance map in the third dimension of the bilateral network transformation coefficient matrix, i.e. the second position information, is then determined according to the pixel value and the luminance information. The calculation formula of the second position information is S = D × (A - 1) + 1, where S is the second position information, D is the pixel value of the pixel point, and A is the luminance information. For example, when D is 0.6 and A is 8, S = 0.6 × (8 - 1) + 1 = 5.2.
After obtaining, for each pixel point in the guidance map, its position in the two-dimensional space of the bilateral network transformation coefficient matrix (the first position information) and its position in the third dimension (the second position information), the pixel-level transformation coefficient matrix corresponding to each pixel point in the image to be processed can be determined according to the first position information and the second position information. The calculation formula of the pixel-level transformation coefficient matrix is P = |S1 - S| × M(i', j', S2, :) + |S2 - S| × M(i', j', S1, :), where (i', j') is the first position information corresponding to pixel point (i, j) in the guidance map, S is the second position information corresponding to pixel point (i, j) in the guidance map, S1 and S2 are the two integers closest to S with S1 > S2, M(i', j', S1, :) is the color transformation matrix of the bilateral network transformation coefficient matrix at two-dimensional position (i', j') and third-dimension position S1, and M(i', j', S2, :) is the color transformation matrix at two-dimensional position (i', j') and third-dimension position S2. For example, when S = 5.2, S1 = 6 and S2 = 5, and the pixel-level transformation coefficient matrix corresponding to pixel point (i, j) in the guidance map is P = |6 - 5.2| × M(i', j', 5, :) + |5 - 5.2| × M(i', j', 6, :).
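The slicing procedure of steps S221 and S222 can be sketched as follows with NumPy. The nearest-cell mapping from pixel (i, j) to grid cell (i', j') and the handling of the 1-based index S are assumptions; the interpolation follows the formula P = |S1 - S| × M(i', j', S2, :) + |S2 - S| × M(i', j', S1, :) above.

```python
import numpy as np

def slice_coefficients(grid, guide):
    """Per-pixel slicing of a bilateral grid of transform coefficients.

    grid  : (Gh, Gw, A, 12) bilateral-grid coefficient matrix (e.g. 32 x 32 x 8 x 12)
    guide : (H, W) guidance map with values in [0, 1]
    returns (H, W, 3, 4) pixel-level transform coefficient matrices
    """
    Gh, Gw, A, _ = grid.shape
    H, W = guide.shape
    out = np.empty((H, W, 3, 4), dtype=grid.dtype)
    for i in range(H):
        for j in range(W):
            # first position information: nearest grid cell for pixel (i, j)
            gi = min(int(i * Gh / H), Gh - 1)
            gj = min(int(j * Gw / W), Gw - 1)
            # second position information: S = D * (A - 1) + 1 (1-based, as in the text)
            s = guide[i, j] * (A - 1) + 1
            s1, s2 = int(np.ceil(s)), int(np.floor(s))       # two nearest integers, s1 >= s2
            if s1 == s2:                                      # S already integral
                p = grid[gi, gj, s1 - 1]
            else:
                # P = |S1 - S| * M(i', j', S2, :) + |S2 - S| * M(i', j', S1, :)
                p = abs(s1 - s) * grid[gi, gj, s2 - 1] + abs(s2 - s) * grid[gi, gj, s1 - 1]
            out[i, j] = p.reshape(3, 4)
    return out

coeffs = slice_coefficients(np.random.rand(32, 32, 8, 12), np.random.rand(64, 64))
```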
And step S300, performing image processing on the image to be processed according to the pixel-level transformation coefficient matrix to obtain a target enhanced image.
After the pixel-level transformation coefficient matrix corresponding to each pixel point in the image to be processed is obtained, the pixel-level transformation coefficient matrices are applied to the image to be processed pixel by pixel, so that each pixel point in the image to be processed is processed according to its own pixel-level transformation coefficient matrix, giving the target enhanced image. Because the image to be processed is processed with a separate transformation coefficient matrix for every pixel point, adaptive, differentiated enhancement of color, contrast and dynamic range for the portrait area and other areas of the image to be processed can be achieved, and the problem of unnatural transitions at boundary positions caused by portrait segmentation is avoided.
In one embodiment, the step S300 specifically includes:
step S310, obtaining an original color matrix corresponding to each pixel point in an image to be processed;
and step S320, performing a matrix multiplication operation on the original color matrix and the pixel-level transformation coefficient matrix to obtain a target enhanced image.
The original color matrix corresponding to each pixel point is the column vector formed by the values of its R, G and B channels, which can be expressed as [R, G, B, 1]^T. After determining the pixel-level transformation coefficient matrix corresponding to each pixel point, this embodiment further obtains the original color matrix corresponding to each pixel point in the image to be processed, and then performs a matrix multiplication of the pixel-level transformation coefficient matrix corresponding to each pixel point with its original color matrix to obtain the target enhanced image.
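Step S320 then reduces to one 3×4 matrix-vector product per pixel, as in the following sketch; the clipping of the result to [0, 1] is an assumption.

```python
import numpy as np

def apply_coefficients(image, coeffs):
    """Apply the per-pixel 3x4 colour transform: [R', G', B']^T = P @ [R, G, B, 1]^T.

    image  : (H, W, 3) image to be processed, RGB in [0, 1]
    coeffs : (H, W, 3, 4) pixel-level transform coefficient matrices
    returns (H, W, 3) target enhanced image
    """
    H, W, _ = image.shape
    ones = np.ones((H, W, 1), dtype=image.dtype)
    rgb1 = np.concatenate([image, ones], axis=-1)       # original colour matrix [R, G, B, 1]^T per pixel
    # einsum performs the 3x4 matrix multiplication for every pixel at once
    enhanced = np.einsum('hwcd,hwd->hwc', coeffs, rgb1)
    return np.clip(enhanced, 0.0, 1.0)

out = apply_coefficients(np.random.rand(64, 64, 3), np.random.rand(64, 64, 3, 4))
```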
In a specific embodiment, before step S110, the method further includes:
Step M110, inputting training images in a preset training sample set into a preset network model for processing, and outputting predicted images corresponding to the training images; the preset training sample set comprises training images, target images corresponding to the training images, and portrait mask images;
Step M120, training the preset network model based on the predicted image, the target image, the portrait mask image and a loss function of the preset network model to obtain the image enhancement model; the loss function includes a first loss function based on pixel-level RGB vectors, a second loss function based on the luminance channel histogram distribution, and a third loss function that weights regions differently according to the portrait mask image.
In order to generate the image enhancement model, this embodiment presets a network model whose structure is the same as that of the image enhancement model, and trains it with a training sample set prepared in advance. The preset training sample set comprises training images, target images corresponding to the training images and portrait mask images. The training images may be acquired by a camera of an electronic device (such as a smartphone), or may be images acquired by cameras of other electronic devices and obtained through a network, Bluetooth, infrared or other means; the target image corresponding to a training image is the enhanced image corresponding to that training image, which can be obtained by retouching the training image; and the portrait mask image can be obtained by inputting the training image into a pre-trained portrait segmentation model. In order to improve the performance of the image enhancement model, the training sample set may include training images acquired under different lighting conditions, such as indoor and outdoor images, urban and natural scenes, and backlit and front-lit images; the training images may include people of different genders and ages, and the proportion of the frame occupied by the portrait may vary. After the training sample set is prepared, it is divided into a training set and a test set according to a preset ratio, for example 85% training set and 15% test set. To obtain better model performance, both the training set and the test set should cover all classified scenes, and the picture contents of the training set and the test set should be completely independent, i.e. a picture with exactly the same shooting scene and content as one in the training set must not appear in the test set.
After the training sample set is prepared, the training images in the training sample set are input into the preset network model, the predicted images corresponding to the training images are output by the preset network model, and the preset network model is trained based on the predicted images, the target images, the portrait mask images and the loss function of the preset network model until the training state of the preset network model meets a preset condition, yielding the image enhancement model. Specifically, when training the preset network model, a loss value is first determined according to the predicted image, the target image, the portrait mask image and the loss function; when the loss value does not meet the preset condition, the model parameters of the preset network model are corrected according to a preset parameter learning rate, and the steps of outputting the predicted image through the preset network model and determining the loss value are repeated until the loss value meets the preset condition. The preset condition may be that the loss value is smaller than a preset first threshold, or that the difference between two successive loss values is smaller than a preset second threshold. Alternatively, whether training of the preset network model is finished can be decided by comparing the peak signal-to-noise ratio (Peak Signal to Noise Ratio, PSNR) over the whole training set: training is considered finished when the difference between the PSNR values obtained in the last two rounds is smaller than a preset third threshold. The preset network model has few training parameters, low requirements on hardware computing power, a small memory footprint and a high running speed, which makes it convenient to deploy the image enhancement method on mobile terminals.
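A minimal training loop consistent with this description might look as follows; the optimizer, learning rate, number of epochs and stopping threshold are illustrative, and loss_fn stands for a combined loss such as the sketch shown after the next paragraph.

```python
import torch

def train(model, loader, loss_fn, epochs=100, lr=1e-4, threshold=1e-4):
    """Minimal training loop: stop when the loss change between epochs falls below a threshold."""
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    prev_loss = None
    for epoch in range(epochs):
        total = 0.0
        for train_img, target_img, mask in loader:        # training image, target image, portrait mask
            pred = model(train_img)                        # predicted image
            loss = loss_fn(pred, target_img, mask)
            optim.zero_grad()
            loss.backward()
            optim.step()
            total += loss.item()
        total /= len(loader)
        if prev_loss is not None and abs(prev_loss - total) < threshold:
            break                                          # training condition met
        prev_loss = total
    return model
```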
In order to improve the image enhancement model in terms of local contrast, global brightness, dynamic range and so on, and thus achieve a more natural enhancement effect, the loss function used for training the preset network model in this embodiment includes, in addition to a basic mean square loss function (mse loss), a first loss function based on pixel-level RGB vectors and a second loss function based on the luminance channel histogram distribution. In order for the image enhancement model to enhance the portrait area and other areas differently, the loss function further includes a third loss function that weights regions differently according to the portrait mask image. The loss function of the preset network model is L = w_mse·L_mse + w_angle·L_angle + w_Y_KS_hist·L_Y_KS_hist + w_HR·L_HR, where L_mse = ||Y' - Y||² is the mean square loss function, L_angle = cos(Y', Y) is the first loss function, L_Y_KS_hist is the second loss function, and L_HR = ||mask(α)(Y' - Y)||² is the third loss function. Y' is the predicted image and Y is the target image; the second loss function is computed from the number of pixels of the predicted image and of the target image whose luminance values fall into the k-th segment of the luminance channel; mask(·) is the portrait mask image corresponding to the target image, α is the weight of the portrait area relative to the non-portrait area, and w_mse, w_angle, w_Y_KS_hist and w_HR are the weights of L_mse, L_angle, L_Y_KS_hist and L_HR respectively. In one embodiment, the ratio of L_mse, L_angle, L_Y_KS_hist and L_HR is 1:1:0.1:1, and α is 4.
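A hedged sketch of this combined loss: the exact form of the KS histogram term is not spelled out above, so the cumulative-histogram statistic below, the 1 - cosine implementation of the angle term, the Rec.601 luminance and the binary thresholding of the mask are all assumptions; the weights 1:1:0.1:1 and α = 4 follow the text.

```python
import torch
import torch.nn.functional as F

def luminance(img):
    """Rec.601 luma from an (N, 3, H, W) RGB image (one common choice for the Y channel)."""
    r, g, b = img[:, 0], img[:, 1], img[:, 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def enhancement_loss(pred, target, mask, alpha=4.0, bins=32,
                     w_mse=1.0, w_angle=1.0, w_hist=0.1, w_hr=1.0):
    """Combined loss: mse + pixel-level RGB angle + luminance-histogram distance
    + portrait-weighted mse. torch.histc is non-differentiable and is used here only to
    illustrate the statistic; a soft histogram would be needed for actual training."""
    # basic mean square loss L_mse
    l_mse = F.mse_loss(pred, target)

    # first loss: angle between per-pixel RGB vectors (implemented as 1 - cosine similarity)
    l_angle = (1.0 - F.cosine_similarity(pred, target, dim=1)).mean()

    # second loss: distance between luminance-channel histogram distributions
    hp = torch.histc(luminance(pred), bins=bins, min=0.0, max=1.0)
    ht = torch.histc(luminance(target), bins=bins, min=0.0, max=1.0)
    cdf_p = torch.cumsum(hp / hp.sum(), dim=0)
    cdf_t = torch.cumsum(ht / ht.sum(), dim=0)
    l_hist = (cdf_p - cdf_t).abs().max()          # KS-style statistic (an assumption)

    # third loss: portrait regions weighted by alpha relative to non-portrait regions
    weights = torch.where(mask > 0.5, torch.full_like(mask, alpha), torch.ones_like(mask))
    l_hr = (weights * (pred - target) ** 2).mean()

    return w_mse * l_mse + w_angle * l_angle + w_hist * l_hist + w_hr * l_hr

loss = enhancement_loss(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64),
                        torch.rand(1, 1, 64, 64))
```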
Exemplary apparatus
As shown in fig. 5, an embodiment of the present application provides an image enhancement apparatus, including:
the first determining module 510 is configured to determine a bilateral network transformation coefficient matrix and a guidance map corresponding to the acquired image to be processed;
the second determining module 520 is configured to determine a pixel-level transformation coefficient matrix corresponding to each pixel point in the image to be processed according to the bilateral network transformation coefficient matrix and the guidance map;
the image enhancement module 530 is configured to perform image processing on the image to be processed according to the pixel-level transform coefficient matrix, so as to obtain a target enhanced image.
In some embodiments of the present application, the first determining module 510 is specifically configured to:
compressing the acquired image to be processed to obtain a target compressed image;
the target compressed image is input into a self-attention convolutional neural network for processing and a bilateral network transformation coefficient matrix corresponding to the image to be processed is output, and the image to be processed is input into a shallow neural network for processing and a guidance map corresponding to the image to be processed is output.
In some embodiments of the present application, the first determining module 510 is specifically further configured to:
inputting the target compressed image into the self-attention convolutional neural network, and performing feature extraction on the target compressed image through the self-attention module to obtain a first feature map;
performing local feature extraction and global feature extraction on the first feature map respectively through the first feature extraction module to obtain a local feature map and a global feature map;
performing feature fusion on the global feature map and the local feature map through the first feature fusion module to obtain a fusion feature map;
and performing feature expansion and splitting on the fusion feature map through the first convolution layer to obtain a bilateral network transformation coefficient matrix corresponding to the image to be processed.
In some embodiments of the present application, the second determining module 520 is specifically configured to:
acquiring the map size information of the guidance map and the pixel value of each pixel point in the guidance map;
and determining a pixel-level transformation coefficient matrix corresponding to each pixel point in the image to be processed according to the map size information, the pixel values and the bilateral network transformation coefficient matrix.
In some embodiments of the present application, the second determining module 520 is specifically further configured to:
determining first position information of each pixel point in the guidance map within the bilateral network transformation coefficient matrix according to the map size information and the bilateral grid size information, and determining second position information of each pixel point in the guidance map within the bilateral network transformation coefficient matrix according to the pixel value and the luminance information;
and determining a pixel-level transformation coefficient matrix corresponding to each pixel point in the image to be processed according to the first position information and the second position information.
In some embodiments of the present application, the image enhancement module 530 is specifically configured to:
acquiring an original color matrix corresponding to each pixel point in an image to be processed;
and performing a matrix multiplication operation on the original color matrix and the pixel-level transformation coefficient matrix to obtain a target enhanced image.
In some embodiments of the present application, the image enhancement apparatus 500 further includes:
the image processing module is used for inputting training images in a preset training sample set into a preset network model for processing and outputting predicted images corresponding to the training images; the preset training sample set comprises a training image, a target image corresponding to the training image and a portrait mask image;
the model training module is used for training the preset network model based on the predicted image, the target image, the portrait mask image and the loss function of the preset network model to obtain an image enhancement model; the loss function includes a first loss function based on pixel-level RGB vectors, a second loss function based on the luminance channel histogram distribution, and a third loss function that weights regions differently according to the portrait mask image.
Based on the above embodiment, the present invention also provides an intelligent terminal, and a functional block diagram thereof may be shown in fig. 6. The intelligent terminal comprises a processor, a memory, a network interface, a display screen and a temperature sensor which are connected through a system bus. The processor of the intelligent terminal is used for providing computing and control capabilities. The memory of the intelligent terminal comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the intelligent terminal is used for communicating with an external terminal through network connection. The computer program is executed by a processor to implement an image enhancement method. The display screen of the intelligent terminal can be a liquid crystal display screen or an electronic ink display screen, and a temperature sensor of the intelligent terminal is arranged in the intelligent terminal in advance and used for detecting the running temperature of internal equipment.
It will be appreciated by those skilled in the art that the schematic block diagram shown in fig. 6 is merely a block diagram of a portion of the structure associated with the present inventive arrangements and is not limiting of the smart terminal to which the present inventive arrangements are applied, and that a particular smart terminal may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, an intelligent terminal is provided that includes a processor, a memory, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the following steps:
determining a bilateral network transformation coefficient matrix and a guidance map corresponding to the acquired image to be processed;
determining a pixel-level transformation coefficient matrix corresponding to each pixel point in the image to be processed according to the bilateral network transformation coefficient matrix and the guidance map;
and performing image processing on the image to be processed according to the pixel-level transformation coefficient matrix to obtain a target enhanced image.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program, which may be stored on a non-transitory computer readable storage medium and which, when executed, may comprise the steps of the above-described embodiments of the methods. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.
In summary, the invention discloses an image enhancement method, an image enhancement device, an intelligent terminal and a computer readable storage medium. A pixel-level transformation coefficient matrix is determined according to a bilateral network transformation coefficient matrix and a guidance map, and the image to be processed is enhanced according to the pixel-level transformation coefficient matrix, so that adaptive, differentiated enhancement of color, contrast and dynamic range for the portrait area and the background area in the image can be realized; portrait segmentation of the image is not required, the algorithm processing time is short, and the problem of unnatural transitions at the boundary between the portrait area and the background area does not occur.
It is to be understood that the invention is not limited in its application to the examples described above, but is capable of modification and variation in light of the above teachings by those skilled in the art, and that all such modifications and variations are intended to be included within the scope of the appended claims.

Claims (14)

1. An image enhancement method, comprising:
determining a bilateral network transformation coefficient matrix and a guide graph corresponding to the acquired image to be processed;
determining a pixel-level transformation coefficient matrix corresponding to each pixel point in the image to be processed according to the bilateral network transformation coefficient matrix and the guide graph;
and carrying out image processing on the image to be processed according to the pixel-level transformation coefficient matrix to obtain a target enhanced image.
2. The method according to claim 1, wherein the method is applied to an image enhancement model including a self-attention convolutional neural network and a shallow neural network, and the determining a bilateral network transformation coefficient matrix and a guide graph corresponding to the acquired image to be processed includes:
compressing the acquired image to be processed to obtain a target compressed image;
inputting the target compressed image into the self-attention convolutional neural network for processing, outputting a bilateral network transformation coefficient matrix corresponding to the image to be processed, inputting the image to be processed into the shallow neural network for processing, and outputting a guide graph corresponding to the image to be processed.
3. The method according to claim 2, wherein the self-attention convolutional neural network includes a self-attention module, a first feature extraction module, a first feature fusion module, and a first convolutional layer that are sequentially cascaded, the inputting the target compressed image into the self-attention convolutional neural network for processing, and outputting a bilateral network transformation coefficient matrix corresponding to the image to be processed includes:
inputting the target compressed image into the self-attention convolutional neural network, and performing feature extraction on the target compressed image through the self-attention module to obtain a first feature map;
extracting local features and global features of the first feature map respectively through the first feature extraction module, to obtain a local feature map and a global feature map;
performing feature fusion on the global feature map and the local feature map through the first feature fusion module to obtain a fusion feature map;
and carrying out feature expansion and splitting on the fusion feature map through the first convolution layer to obtain a bilateral network transformation coefficient matrix corresponding to the image to be processed.
4. The method of claim 3, wherein the self-attention module comprises a second convolution layer, a first activation layer, a third convolution layer, a second activation layer, and a first feature fusion layer in cascade, an output term of the second convolution layer being an input term of the first activation layer, an output term of the first activation layer being an input term of the third convolution layer, an output term of the third convolution layer being an input term of the second activation layer, an output term of the second activation layer and an input term of the second convolution layer being an input term of the first feature fusion layer.
5. The method according to claim 3, wherein the first feature extraction module comprises a fourth convolution layer, a plurality of depth-separable convolution units, a first feature extraction unit, and a second feature extraction unit connected in parallel with the first feature extraction unit, which are sequentially cascaded, the output terms of the fourth convolution layer being the input terms of the plurality of depth-separable convolution units, and the output terms of the plurality of depth-separable convolution units being the input terms of the first feature extraction unit and the second feature extraction unit.
6. The method of claim 5, wherein each of the depth-separable convolutional units comprises a first depth-separable convolutional layer, a third active layer, a second depth-separable convolutional layer, a second feature fusion layer, and a third depth-separable convolutional layer concatenated with the second feature fusion layer, an output term of the first depth-separable convolutional layer being an input term of the third active layer, an output term of the third active layer being an input term of the second depth-separable convolutional layer, an input term of the first depth-separable convolutional layer being an input term of the third depth-separable convolutional layer, an output term of the second depth-separable convolutional layer and an output term of the third depth-separable convolutional layer being an input term of the second feature fusion layer.
7. The method according to claim 1, wherein the size of the guide graph is the same as the size of the image to be processed, and the pixel value of each pixel point in the guide graph is in the range of 0 to 1.
8. The method according to claim 1, wherein the determining a pixel-level transformation coefficient matrix corresponding to each pixel point in the image to be processed according to the bilateral network transformation coefficient matrix and the guide graph includes:
acquiring the graph size information of the guide graph and the pixel value of each pixel point in the guide graph;
and determining a pixel-level transformation coefficient matrix corresponding to each pixel point in the image to be processed according to the graph size information, the pixel value and the bilateral network transformation coefficient matrix.
9. The method according to claim 8, wherein the bilateral network transformation coefficient matrix includes bilateral grid size information and luminance information, and the determining a pixel-level transformation coefficient matrix corresponding to each pixel point in the image to be processed according to the graph size information, the pixel value and the bilateral network transformation coefficient matrix includes:
determining first position information of each pixel point in the guide graph in the bilateral network transformation coefficient matrix according to the graph size information and the bilateral grid size information, and determining second position information of each pixel point in the guide graph in the bilateral network transformation coefficient matrix according to the pixel value and the luminance information;
and determining a pixel-level transformation coefficient matrix corresponding to each pixel point in the image to be processed according to the first position information and the second position information.
10. The method according to claim 1, wherein the performing image processing on the image to be processed according to the pixel-level transform coefficient matrix to obtain a target enhanced image includes:
acquiring an original color matrix corresponding to each pixel point in the image to be processed;
and performing matrix multiplication operation on the original color matrix and the pixel-level transformation coefficient matrix to obtain a target enhanced image.
11. The method according to any one of claims 2-6, wherein before compressing the acquired image to be processed to obtain the target compressed image, the method further comprises:
inputting training images in a preset training sample set into a preset network model for processing, and outputting predicted images corresponding to the training images; wherein the preset training sample set comprises a training image, a target image corresponding to the training image, and a portrait mask image;
training the preset network model based on the predicted image, the target image, the portrait mask image and a loss function of the preset network model to obtain the image enhancement model; wherein the loss function includes a first loss function based on pixel-level RGB vectors, a second loss function based on a luminance channel histogram distribution, and a third loss function that distinguishes region weights by means of the portrait mask image.
12. An image enhancement apparatus, comprising:
the first determining module is used for determining a bilateral network transformation coefficient matrix and a guide graph corresponding to the acquired image to be processed;
the second determining module is used for determining a pixel-level transformation coefficient matrix corresponding to each pixel point in the image to be processed according to the bilateral network transformation coefficient matrix and the guide graph;
and the image enhancement module is used for carrying out image processing on the image to be processed according to the pixel-level transformation coefficient matrix to obtain a target enhanced image.
13. An intelligent terminal, comprising a processor, a memory and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the image enhancement method according to any one of claims 1 to 11 when executing the computer program.
14. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the image enhancement method according to any of the preceding claims 1-11.
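By way of illustration, the following is a minimal PyTorch sketch of one possible reading of the self-attention module recited in claim 4, assuming 3x3 convolutions, ReLU and Sigmoid activations, and elementwise multiplication as the first feature fusion layer; the channel count is illustrative and none of these choices is fixed by the claim.

```python
# Sketch of the self-attention module of claim 4: second convolution layer ->
# first activation layer -> third convolution layer -> second activation layer,
# whose output is fused with the module input. Kernel sizes, activations and
# the multiplicative fusion are illustrative assumptions.
import torch
import torch.nn as nn

class SelfAttentionModule(nn.Module):
    def __init__(self, channels: int = 16):
        super().__init__()
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)  # second convolution layer
        self.act1 = nn.ReLU(inplace=True)                         # first activation layer
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)  # third convolution layer
        self.act2 = nn.Sigmoid()                                  # second activation layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attention = self.act2(self.conv3(self.act1(self.conv2(x))))
        # First feature fusion layer: fuse the second activation layer's output
        # with the second convolution layer's input (here by elementwise multiplication).
        return x * attention
```

With a Sigmoid as the second activation layer, the module produces a per-position attention map that re-weights its own input, which is one common way to realise the recited fusion of the second activation layer's output with the second convolution layer's input.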
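Similarly, the composite loss described in claim 11 can be sketched as follows, assuming L1 distances, a differentiable soft histogram for the luminance channel and a simple portrait/background weighting scheme; the bin count, the soft-histogram sigma, the Rec. 601 luma weights and the lambda factors are all assumptions made for illustration.

```python
# Sketch of the three-part loss of claim 11: a pixel-level RGB loss, a
# luminance-histogram loss, and a portrait-mask region-weighted loss.
import torch

def soft_histogram(luma: torch.Tensor, bins: int = 32, sigma: float = 0.02) -> torch.Tensor:
    """luma: N x H x W tensor in [0, 1]; returns an N x bins differentiable histogram."""
    centers = torch.linspace(0.0, 1.0, bins, device=luma.device)
    diff = luma.flatten(1).unsqueeze(-1) - centers                # N x HW x bins
    weights = torch.exp(-0.5 * (diff / sigma) ** 2)
    hist = weights.mean(dim=1)
    return hist / (hist.sum(dim=1, keepdim=True) + 1e-8)

def enhancement_loss(pred, target, mask, w_portrait=2.0, w_background=1.0,
                     lambda_hist=0.5, lambda_mask=1.0):
    """pred/target: N x 3 x H x W in [0, 1]; mask: N x 1 x H x W portrait mask."""
    # First loss function: pixel-level RGB distance.
    loss_rgb = (pred - target).abs().mean()
    # Second loss function: luminance-channel histogram distance (Rec. 601 luma, an assumption).
    luma_w = torch.tensor([0.299, 0.587, 0.114], device=pred.device).view(1, 3, 1, 1)
    loss_hist = (soft_histogram((pred * luma_w).sum(1)) -
                 soft_histogram((target * luma_w).sum(1))).abs().mean()
    # Third loss function: region-weighted distance using the portrait mask.
    region_w = mask * w_portrait + (1.0 - mask) * w_background
    loss_mask = (region_w * (pred - target).abs()).mean()
    return loss_rgb + lambda_hist * loss_hist + lambda_mask * loss_mask
```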
CN202210137405.4A 2022-02-15 2022-02-15 Image enhancement method, device, intelligent terminal and computer readable storage medium Pending CN116645302A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210137405.4A CN116645302A (en) 2022-02-15 2022-02-15 Image enhancement method, device, intelligent terminal and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210137405.4A CN116645302A (en) 2022-02-15 2022-02-15 Image enhancement method, device, intelligent terminal and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN116645302A true CN116645302A (en) 2023-08-25

Family

ID=87621683

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210137405.4A Pending CN116645302A (en) 2022-02-15 2022-02-15 Image enhancement method, device, intelligent terminal and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116645302A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117408925A (en) * 2023-11-02 2024-01-16 沐曦科技(成都)有限公司 Image enhancement method based on neural network model
CN117408925B (en) * 2023-11-02 2024-05-31 沐曦科技(成都)有限公司 Image enhancement method based on neural network model

Similar Documents

Publication Publication Date Title
US11882357B2 (en) Image display method and device
CN110827200B (en) Image super-resolution reconstruction method, image super-resolution reconstruction device and mobile terminal
US20230214976A1 (en) Image fusion method and apparatus and training method and apparatus for image fusion model
CN108694705B (en) Multi-frame image registration and fusion denoising method
CN112602088B (en) Method, system and computer readable medium for improving quality of low light images
Afifi et al. Cie xyz net: Unprocessing images for low-level computer vision tasks
CN110958469A (en) Video processing method and device, electronic equipment and storage medium
CN114627034A (en) Image enhancement method, training method of image enhancement model and related equipment
Lv et al. Low-light image enhancement via deep Retinex decomposition and bilateral learning
CN116681636A (en) Light infrared and visible light image fusion method based on convolutional neural network
CN114708615B (en) Human body detection method based on image enhancement in low-illumination environment, electronic equipment and storage medium
CN115294055A (en) Image processing method, image processing device, electronic equipment and readable storage medium
CN116645302A (en) Image enhancement method, device, intelligent terminal and computer readable storage medium
CN110717962A (en) Dynamic photo generation method and device, photographing equipment and storage medium
CN113409247B (en) Multi-exposure fusion image quality evaluation method
CN114331927A (en) Image processing method, storage medium and terminal equipment
CN111539975B (en) Method, device, equipment and storage medium for detecting moving object
CN113111770B (en) Video processing method, device, terminal and storage medium
CN115205168A (en) Image processing method, device, electronic equipment, storage medium and product
CN114445277A (en) Depth image pixel enhancement method and device and computer readable storage medium
CN110009563B (en) Image processing method and device, electronic device and storage medium
CN116128707A (en) Image processing method and device, electronic equipment and computer readable storage medium
CN115311149A (en) Image denoising method, model, computer-readable storage medium and terminal device
CN113034358B (en) Super-resolution image processing method and related device
CN117635478B (en) Low-light image enhancement method based on spatial channel attention

Legal Events

Date Code Title Description
PB01 Publication