CN116563359A - Training method of highlight processing model, image processing method and device - Google Patents

Training method of highlight processing model, image processing method and device

Info

Publication number
CN116563359A
CN116563359A
Authority
CN
China
Prior art keywords
highlight
image
texture
free
processing model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310501294.5A
Other languages
Chinese (zh)
Inventor
周代国
谢嘉仪
彭鑫
向晨
丁倩
罗飞
周富
谌金垚
肖春霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Beijing Xiaomi Pinecone Electronic Co Ltd
Xiaomi Technology Wuhan Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Beijing Xiaomi Pinecone Electronic Co Ltd
Xiaomi Technology Wuhan Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd, Beijing Xiaomi Pinecone Electronic Co Ltd, Xiaomi Technology Wuhan Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to CN202310501294.5A priority Critical patent/CN116563359A/en
Publication of CN116563359A publication Critical patent/CN116563359A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/40Analysis of texture
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • G06T5/94Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a training method of a highlight processing model, an image processing method and a device. The method comprises the following steps: a sample highlight image and a real highlight-free image corresponding to the sample highlight image are obtained, and the sample highlight image is input into the highlight processing model. A learning network is added between the highlight detection network and the texture restoration network in the model, so that the learning network learns global target background texture features of the highlight region and the highlight-free region based on the output of the highlight detection network, and guides the texture restoration network to perform texture restoration using the target background texture features, thereby reconstructing the nonlinear change of the highlight region and generating a predicted highlight-free image. The highlight processing model is then trained based on the difference between the first highlight-free image and the real highlight-free image and the difference between the predicted highlight-free image and the real highlight-free image, so that the trained highlight processing model can output high-quality, texture-rich highlight-free images.

Description

Training method of highlight processing model, image processing method and device
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a training method of a highlight processing model, an image processing method and an image processing device.
Background
With the popularity of electronic devices, users often shoot with electronic devices such as mobile phones. When the shot scene is irradiated by overly strong ambient light, local highlight reflection occurs in the captured image. Such highlight reflection hinders the recognition of useful information such as document text and patterns and gives the user a poor visual impression. For this reason, it is important to study highlight processing of images for the case where highlight reflection impairs the information to be recognized but the information is still visually discernible.
In the related art, the texture of an image is damaged when highlight processing is performed, and the texture restoration effect is poor.
Disclosure of Invention
The present application aims to solve, at least to some extent, one of the technical problems in the related art.
Therefore, the application provides a training method for a highlight processing model, an image processing method and corresponding devices, which improve the training effect of the highlight processing model.
In one aspect, an embodiment of the present application provides a training method for a highlight processing model, including:
acquiring a sample highlight image and a real no-highlight image corresponding to the sample highlight image;
inputting the sample highlight image into a highlight detection network of a highlight processing model to obtain a highlight mask image;
Subtracting the sample highlight image and the highlight mask image to obtain a first highlight-free image without texture restoration;
inputting the first highlight-free image, the highlight mask image and the sample highlight image into a learning network of the highlight processing model to obtain a target background texture feature map;
inputting the target background texture feature map into a texture restoration network of the highlight processing model to obtain a predicted highlight-free image after texture restoration;
and performing model training on the highlight processing model according to the difference between the first highlight-free image and the real highlight-free image and the difference between the predicted highlight-free image and the real highlight-free image to obtain the trained highlight processing model.
Another embodiment of the present application provides an image processing method, including:
acquiring a highlight image to be processed;
inputting the highlight image into a trained highlight processing model so that the highlight processing model performs highlight elimination and texture reconstruction on the highlight image to obtain a highlight-free image; the highlight processing model is trained by the training method according to the foregoing aspect, so as to obtain the trained highlight processing model.
Another embodiment of the present application proposes a training device for a highlight processing model, including:
the acquisition module is used for acquiring a sample highlight image and a real highlight-free image corresponding to the sample highlight image;
the identification module is used for inputting the sample highlight image into a highlight detection network of a highlight processing model to obtain a highlight mask image;
the processing module is used for subtracting the sample highlight image and the highlight mask image to obtain a first highlight-free image which is not subjected to texture restoration;
the learning module is used for inputting the first highlight-free image, the highlight mask image and the sample highlight image into a learning network of the highlight processing model to obtain a target background texture feature map;
the restoration module is used for inputting the target background texture feature map into a texture restoration network of the highlight processing model to obtain a predicted highlight-free image after texture restoration;
and the training module is used for carrying out model training on the highlight processing model according to the difference between the first highlight-free image and the real highlight-free image and the difference between the predicted highlight-free image and the real highlight-free image so as to obtain the trained highlight processing model.
Another embodiment of the present application proposes an image processing apparatus including:
the acquisition module is used for acquiring the highlight image to be processed;
the processing module is used for inputting the highlight image into a trained highlight processing model so that the highlight processing model can perform highlight elimination and texture reconstruction on the highlight image to obtain a highlight-free image; the highlight processing model is trained by the model training device, so that the trained highlight processing model is obtained.
In another aspect, an embodiment of the present application proposes an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the program to implement a method according to the foregoing method embodiment.
In another aspect, embodiments of the present application provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as described in the previous method embodiments.
In another aspect, embodiments of the present application provide a computer program product having a computer program stored thereon, which when executed by a processor implements a method as described in the previous method embodiments.
According to the training method, the image processing method and the device for the highlight processing model, the sample highlight image and the real highlight-free image corresponding to the sample highlight image are obtained, and the sample highlight image is input into the highlight processing model. A learning network is added between the highlight detection network and the texture restoration network in the model, so that the learning network learns global target background texture features of the highlight region and the highlight-free region based on the output of the highlight detection network, and guides the texture restoration network to perform texture restoration using the target background texture features, thereby reconstructing the nonlinear change of the highlight region and generating a high-quality, texture-rich predicted highlight-free image. Model training is then performed on the highlight processing model based on the difference between the first highlight-free image and the real highlight-free image and the difference between the predicted highlight-free image and the real highlight-free image, so as to obtain the trained highlight processing model. In this way, the trained highlight processing model can output high-quality, texture-rich highlight-free images, and the model training effect is improved.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
fig. 1 is a schematic flow chart of a training method of a highlight processing model according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a sample highlight image according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of another method for training a highlight processing model according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a highlight processing model according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a U-shaped network according to an embodiment of the present application;
FIG. 6 is a flowchart of another method for training a highlight processing model according to an embodiment of the present disclosure;
fig. 7A is a schematic structural diagram of a learning network according to an embodiment of the present application;
fig. 7B is a schematic diagram of a test scenario provided in an embodiment of the present application;
fig. 8 is a schematic flow chart of an image processing method according to an embodiment of the present application;
fig. 9 is a schematic view of a highlight scene according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a training device for a highlight processing model according to an embodiment of the present application;
Fig. 11 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 12 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present application and are not to be construed as limiting the present application.
The following describes a training method, an image processing method and an apparatus for a highlight processing model according to the embodiments of the present application with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a training method of a highlight processing model according to an embodiment of the present application.
The training method of the highlight processing model in the embodiment of the present application is executed by a training apparatus for the highlight processing model. The apparatus may be provided in an electronic device, and the electronic device may be a server or a terminal device, which is not limited in this embodiment.
As shown in fig. 1, the method may include the steps of:
step 101, acquiring a sample highlight image and a real no-highlight image corresponding to the sample highlight image.
The sample highlight image refers to an image containing a highlight region. A highlight is typically formed by the combined action of specular reflection and diffuse reflection; that is, the sample highlight image contains both specular reflection and diffuse reflection components, where the diffuse reflection reflects the properties of the illuminated object and the specular reflection reflects the properties of the illumination.
The sample highlight image can be generated in the following implementation manners:
As one implementation manner, real images are collected, for example by network searching, such as images of real cards (an identity card, a social security card, etc.). Then, as one implementation manner, the real image and a highlight picture are fused through image processing software to obtain a fused image containing a highlight, namely the sample highlight image. As another fusing method, Poisson fusion may be used to realize the highlight fusion (a minimal code sketch of this fusion is given after the implementation manners below). As an example, fig. 2 shows a sample highlight image obtained by fusing a clear identity card image with a highlight mask.
It should be noted that, the sample highlight image obtained by fusion may include a plurality of highlight regions with different positions and/or highlight intensities, and the target fusion region of the highlight mask includes at least one of contents such as characters, textures, figures, etc. as much as possible, so that the model obtained by training may be suitable for the application requirements of various scenes, which is not limited in the embodiment of the present application.
As another implementation manner, shooting is performed on a scene with strong light or a specially built highlight scene, so as to obtain a sample highlight image containing highlight.
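Returning to the fusion-based manner, the following is a minimal sketch of Poisson-fusion sample generation using OpenCV's seamlessClone; the file names, mask shape and fusion position are assumptions chosen for illustration and are not part of this application.

    import cv2
    import numpy as np

    # Read a real card image and a bright patch used as the highlight layer (file names are hypothetical).
    background = cv2.imread("id_card.png")                # real highlight-free image
    highlight_layer = cv2.imread("highlight_patch.png")   # synthetic highlight picture

    # Binary mask marking the part of the highlight layer to blend in.
    mask = np.zeros(highlight_layer.shape[:2], dtype=np.uint8)
    cv2.circle(mask,
               (highlight_layer.shape[1] // 2, highlight_layer.shape[0] // 2),
               min(highlight_layer.shape[:2]) // 3, 255, -1)

    # Target fusion position inside the card, chosen so the highlight covers text or texture.
    center = (background.shape[1] // 2, background.shape[0] // 3)

    # Poisson fusion blends the highlight layer into the real image, yielding a sample highlight image.
    sample_highlight = cv2.seamlessClone(highlight_layer, background, mask, center, cv2.NORMAL_CLONE)
    cv2.imwrite("sample_highlight.png", sample_highlight)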
And 102, inputting the sample highlight image into a highlight detection network of a highlight processing model to obtain a highlight mask image.
In an embodiment of the present application, the highlight processing model includes a highlight detection network, a learning network, and a texture restoration network. The highlight detection network is used for identifying the highlight region in the input sample highlight image so as to obtain the highlight distribution in the sample highlight image, where the highlight distribution includes the position and intensity of the highlight in the background region; the highlight intensity indicates the degree to which the highlight covers information, and the greater the highlight intensity, the more strongly the highlight covers the information. A highlight mask image is then generated according to the highlight distribution in the sample highlight image. The highlight mask image is a gray-scale image: the gray value of the non-highlight region is 0, i.e., black, and the gray value of each highlight region is determined from the highlight intensity as a value between 0 and 255, displayed as the corresponding gray or white.
And step 103, subtracting the sample highlight image and the highlight mask image to obtain a first highlight-free image without texture restoration.
In this embodiment of the present application, the sample highlight image contains a highlight region, and the highlight mask image contains the highlight position and highlight intensity. The sample highlight image and the highlight mask image are subtracted to obtain a first highlight-free image on which texture restoration has not been performed. As one implementation manner, each pixel unit in the sample highlight image is subtracted from the corresponding pixel unit in the highlight mask image to obtain the first highlight-free image, where the first highlight-free image is a highlight-free image without texture restoration.
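As a minimal sketch (not part of the application text), assuming the sample highlight image and the highlight mask image are already normalized tensors of matching spatial size, the subtraction of step 103 could look as follows; the clamping to the valid range is an added safeguard, not something stated above.

    import torch

    def remove_highlight_by_mask(sample_highlight: torch.Tensor,
                                 highlight_mask: torch.Tensor) -> torch.Tensor:
        # sample_highlight: (B, 3, H, W) RGB image in [0, 1]
        # highlight_mask:   (B, 1, H, W) gray-scale mask in [0, 1]; brighter means stronger highlight
        # Subtract the mask pixel unit by pixel unit to obtain the first highlight-free image
        # (no texture restoration applied yet).
        first_no_highlight = sample_highlight - highlight_mask   # mask broadcasts over the color channels
        return first_no_highlight.clamp(0.0, 1.0)                # keep values in the valid range (assumption)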
And 104, inputting the first highlight-free image, the highlight mask image and the sample highlight image into a learning network of a highlight processing model to obtain a target background texture feature map.
In this embodiment, the learning network is connected to the highlight detection network and the texture restoration network; that is, a learning network is added between the highlight detection network and the texture restoration network, and the input of the learning network includes the output of the highlight detection network. Based on the input first highlight-free image, the learning network learns the texture features of the highlight-free region to obtain a first background texture feature map; based on the input highlight mask image and sample highlight image, it learns the texture features of the highlight region to obtain a second background texture feature map. The learned first background texture feature map and second background texture feature map are then fused to obtain the target background texture feature map, which guides the subsequent texture restoration network to perform texture restoration.
And 105, inputting the target background texture feature map into a texture restoration network of the highlight processing model to obtain a predicted highlight-free image after texture restoration.
In the embodiment of the application, the texture restoration network operates on the input target background texture feature map. Since the target background texture feature map contains the background texture features of both the highlight region and the non-highlight region, the nonlinear change of the highlight region can be reconstructed, and a high-quality, texture-rich highlight-free image is generated.
And 106, performing model training on the highlight processing model according to the difference between the first highlight-free image and the real highlight-free image and the difference between the predicted highlight-free image and the real highlight-free image to obtain a trained highlight processing model.
In the embodiment of the present application, according to the difference between the first no-highlight image and the real no-highlight image, and according to the difference between the predicted no-highlight image and the real no-highlight image, a target loss function is determined, that is, the target loss function includes a loss value of the highlight detection network and a loss value of the texture restoration network, so that according to the target loss function, a model parameter adjustment is performed on the highlight processing model to obtain a trained highlight processing model.
It should be noted that, the foregoing steps 101 to 106 need to be repeated multiple times, and each time may use a different sample highlight image, so that training is stopped when the loss function is smaller than the threshold value, or training is stopped when the number of repeated executions is greater than the threshold value. And taking the highlight processing model obtained after the last model parameter adjustment as a trained highlight processing model.
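A rough sketch of one such training loop, reflecting the stopping criteria above; the three sub-networks, the data loader and the combined loss function are assumed to be defined elsewhere, and the threshold values are placeholders rather than values from this application.

    import torch

    def train_highlight_model(detect_net, learn_net, repair_net, loader, total_loss_fn,
                              loss_threshold=1e-3, max_steps=100_000, lr=1e-4):
        params = (list(detect_net.parameters()) + list(learn_net.parameters())
                  + list(repair_net.parameters()))
        optimizer = torch.optim.Adam(params, lr=lr)
        step = 0
        for sample_highlight, real_no_highlight in loader:               # a different sample pair each time
            mask = detect_net(sample_highlight)                          # step 102: highlight mask image
            first_no_highlight = (sample_highlight - mask).clamp(0, 1)   # step 103: subtraction
            target_texture = learn_net(first_no_highlight, mask, sample_highlight)  # step 104
            predicted_no_highlight = repair_net(target_texture)          # step 105: texture restoration
            loss = total_loss_fn(first_no_highlight, predicted_no_highlight, real_no_highlight)  # step 106
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
            if loss.item() < loss_threshold or step >= max_steps:        # stop on loss or iteration threshold
                break
        return detect_net, learn_net, repair_net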
In the training method of the highlight processing model of the embodiment of the application, a sample highlight image and a real highlight-free image corresponding to the sample highlight image are obtained; the sample highlight image is input into the highlight detection network of the highlight processing model to obtain a highlight mask image; the sample highlight image and the highlight mask image are subtracted to obtain a first highlight-free image which has not undergone texture restoration; the first highlight-free image, the highlight mask image and the sample highlight image are input into the learning network of the highlight processing model to obtain a target background texture feature map; the target background texture feature map is input into the texture restoration network of the highlight processing model to obtain a predicted highlight-free image after texture restoration; and model training is performed on the highlight processing model according to the difference between the first highlight-free image and the real highlight-free image and the difference between the predicted highlight-free image and the real highlight-free image, so as to obtain the trained highlight processing model. By adding a learning network between the highlight detection network and the texture restoration network, the learning network learns global background features covering both the highlight region and the highlight-free region based on the output of the highlight detection network, and these features are used as the input of the texture restoration network to guide texture restoration and reconstruct the nonlinear change of the highlight region, thereby generating a high-quality, texture-rich predicted highlight-free image. Training the model on the difference between the first highlight-free image and the real highlight-free image and the difference between the predicted highlight-free image and the real highlight-free image ensures that the trained highlight processing model can output high-quality, texture-rich highlight-free images.
Based on the above embodiments, fig. 3 is a flow chart of another training method of a highlight processing model according to the embodiments of the present application, as shown in fig. 3, the method includes the following steps:
step 301, acquiring a sample highlight image and a real no-highlight image corresponding to the sample highlight image.
The principle of step 301 is the same as that of the previous embodiment, and will not be described again here.
And 302, inputting the sample highlight image into a highlight detection network of a highlight processing model to obtain a highlight mask image.
As an example, fig. 4 is a schematic structural diagram of a highlight processing model provided in an embodiment of the present application, and as shown in fig. 4, the highlight processing model includes a highlight detection network, a learning network, and a texture restoration network, where an output of the highlight detection network is used as an input of the learning network, and an output of the learning network is used as an input of the texture restoration network. The principle of the output of each network may be the same with reference to the explanation of the foregoing embodiments, and will not be repeated here.
The highlight detection network is a U-shaped network comprising an encoder and a decoder. As an example, as shown in fig. 5, the encoder comprises a plurality of processing layers, each processing layer comprising two convolution layers and a pooling layer, where the pooling layer uses max pooling. The decoder comprises a plurality of processing layers, each comprising a deconvolution layer and two convolution layers. As shown in fig. 5, each processing layer of the encoder is connected to the corresponding processing layer of the decoder through a skip connection, so that low-dimensional features and high-dimensional features can be fused and the processing effect is improved. Similarly, the texture restoration network is also a U-shaped network comprising an encoder and a decoder, and its network structure is not repeated here.
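The following is a minimal PyTorch sketch of such a U-shaped network, following the description above (two convolutions plus max pooling per encoder layer, a deconvolution plus two convolutions per decoder layer, and skip connections between corresponding layers); the channel counts, depth and output head are assumptions for illustration.

    import torch
    import torch.nn as nn

    def double_conv(in_ch, out_ch):
        # Two 3x3 convolutions, as in each processing layer of the encoder and decoder.
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    class UNet(nn.Module):
        def __init__(self, in_ch=3, out_ch=1, base=48):
            super().__init__()
            self.enc1 = double_conv(in_ch, base)
            self.enc2 = double_conv(base, base * 2)
            self.enc3 = double_conv(base * 2, base * 4)
            self.pool = nn.MaxPool2d(2)                                     # max pooling in the encoder
            self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)  # deconvolution layer
            self.dec2 = double_conv(base * 4, base * 2)
            self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
            self.dec1 = double_conv(base * 2, base)
            self.head = nn.Conv2d(base, out_ch, 1)

        def forward(self, x):
            e1 = self.enc1(x)
            e2 = self.enc2(self.pool(e1))
            e3 = self.enc3(self.pool(e2))
            d2 = self.dec2(torch.cat([self.up2(e3), e2], dim=1))  # skip connection: fuse low/high-dim features
            d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
            return torch.sigmoid(self.head(d1))                   # e.g. a gray-scale highlight mask

Used as the highlight detection network the output would be a one-channel mask; used as the texture restoration network the input and output channel counts would be changed accordingly.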
Step 303, subtracting the sample highlight image and the highlight mask image to obtain a first highlight-free image without texture restoration.
And step 304, inputting the first highlight-free image, the highlight mask image and the sample highlight image into a learning network of a highlight processing model to obtain a target background texture feature map.
And 305, inputting the target background texture feature map into a texture restoration network of the highlight processing model to obtain a predicted highlight-free image after texture restoration.
The principles of steps 303 to 305 may be the same as those of the previous embodiments, and are not repeated here.
Step 306, determining an objective loss function based on the difference between the first non-highlight image and the actual non-highlight image, and based on the difference between the predicted non-highlight image and the actual non-highlight image.
In one implementation manner of the embodiment of the present application, a first loss function of a highlight detection network is determined according to a difference between a first highlight-free image and a real highlight-free image, a second loss function of a texture restoration network is determined according to a difference between a predicted highlight-free image and a real highlight-free image, and the first loss function and the second loss function are weighted and synthesized according to a set first weight coefficient to obtain a target loss function. In the embodiment of the application, the loss function is optimized, so that the loss function comprises the output loss of the highlight detection network and also comprises the output loss of the texture restoration network, the highlight detection network and the texture restoration network are optimized at the same time, and the effect of optimizing the highlight processing model is improved.
The method comprises the steps of determining a first loss function, determining a first content difference between a first highlight-free image and a real highlight-free image and a first texture difference between the first highlight-free image and the real highlight-free image, acquiring a set second weight coefficient, and carrying out weighted synthesis on the first content difference and the first texture difference according to the second weight coefficient to obtain the first loss function of the highlight detection network. In the embodiment of the application, the first non-highlight image is determined based on the highlight mask image output by the highlight detection network, the first loss function comprises the content loss and the texture loss of the image between the first non-highlight image and the real non-highlight image, and the highlight detection network is guided based on the first loss function to keep similarity on the content and the texture of the image, so that the accuracy of parameter adjustment of the highlight detection network is improved.
For the determination of the second loss function, as an implementation manner, a second content difference between the predicted highlight-free image and the real highlight-free image, and a second texture difference between the predicted highlight-free image and the real highlight-free image are determined; a set third weight coefficient is acquired; and the second content difference and the second texture difference are weighted and combined according to the third weight coefficient to obtain the second loss function of the texture restoration network. In the embodiment of the application, the texture restoration network outputs the predicted highlight-free image, and the second loss function includes the content loss and the texture loss of the image between the predicted highlight-free image and the real highlight-free image. Guiding the texture restoration network based on the second loss function keeps the image similar in both content and texture, which improves the accuracy of parameter adjustment of the texture restoration network.
In one implementation of the present embodiments, the content loss may be measured using a Charbonnier penalty function, and the texture loss may be evaluated using the structural similarity index measure (Structure Similarity Index Measure, SSIM) function.
Wherein, the objective Loss function Loss can be expressed as:
Loss = Loss1 + α·Loss2
where Loss1 is the first loss function, Loss2 is the second loss function, I_pre1 is the first highlight-free image, I_gt is the real highlight-free image, λ1 is the second weight coefficient, λ2 is the third weight coefficient, I_pre2 is the predicted highlight-free image, α is the first weight coefficient, and ε is a set value, typically 0.001.
Note that for Loss1, both the content loss calculated by the Charbonnier penalty function and the texture loss calculated by the SSIM function are computed in the red-green-blue (RGB) color space, while for Loss2, the content loss calculated by the Charbonnier penalty function is computed in the CIE-Lab color space and the texture loss calculated by the SSIM function is computed in the RGB space. Because the CIE-Lab space better matches the human visual system, using the content loss computed in the CIE-Lab space during background texture restoration reduces the color difference problem produced by texture restoration.
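As a sketch of how such a combined loss could be computed: the split of Loss1 and Loss2 into a Charbonnier content term and an SSIM texture term weighted by λ1 and λ2 is reconstructed from the surrounding text as an assumption, and the SSIM and RGB-to-CIE-Lab helpers are assumed to be supplied by the caller (for example from the pytorch_msssim and kornia packages).

    import torch

    def charbonnier(x, y, eps=1e-3):
        # Content loss: a smooth, differentiable variant of the L1 distance, with eps as the set value.
        return torch.sqrt((x - y) ** 2 + eps ** 2).mean()

    def total_loss(first_no_hl, pred_no_hl, real_no_hl,
                   ssim_fn, rgb_to_lab, alpha=1.0, lam1=1.0, lam2=1.0):
        # Loss1: supervises the highlight detection stage; both terms computed in RGB space.
        loss1 = charbonnier(first_no_hl, real_no_hl) \
                + lam1 * (1.0 - ssim_fn(first_no_hl, real_no_hl))
        # Loss2: supervises the texture restoration stage; Charbonnier in CIE-Lab, SSIM in RGB.
        loss2 = charbonnier(rgb_to_lab(pred_no_hl), rgb_to_lab(real_no_hl)) \
                + lam2 * (1.0 - ssim_fn(pred_no_hl, real_no_hl))
        return loss1 + alpha * loss2   # Loss = Loss1 + α * Loss2

For instance, ssim_fn could be pytorch_msssim.ssim with data_range=1.0 and rgb_to_lab could be kornia.color.rgb_to_lab; the weight values α, λ1 and λ2 shown here are placeholders, not values disclosed in this application.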
And 307, performing model parameter adjustment on the highlight processing model according to the target loss function to obtain a trained highlight processing model.
In the embodiment of the application, the model parameters of the highlight processing model are adjusted according to the target loss function to obtain the trained highlight processing model; the parameter adjustment is realized through repeated iterative training until training of the model is completed. Since the target loss function is formed by weighting and superposing the two loss functions and optimizing them jointly, the network is guided to maintain similarity in both image content and image texture structure, which optimizes the highlight elimination and texture restoration effects.
In the method for training the highlight processing model, the target loss function of the highlight processing model comprises a first loss function corresponding to the highlight detection network and a second loss function corresponding to the texture restoration network, the target loss function is formed by weighting and superposing the two loss functions, and the highlight processing model is subjected to joint optimization, so that the highlight processing model is guided to maintain similarity in image content and image texture structure at the same time, and the highlight processing model obtained through training can optimize highlight elimination and texture restoration effects.
Based on the above embodiments, another method for training a highlight model is provided in the embodiments of the present application, and fig. 6 is a schematic flow chart of another method for training a highlight model provided in the embodiments of the present application, as shown in fig. 6, the method includes the following steps:
Step 601, acquiring a sample highlight image and a real no-highlight image corresponding to the sample highlight image.
Step 602, inputting the sample highlight image into a highlight detection network of a highlight processing model to obtain a highlight mask image.
And 603, subtracting the sample highlight image and the highlight mask image to obtain a first highlight-free image without texture restoration.
The principle of steps 601 to 603 is the same with reference to the explanation in the foregoing embodiments, and will not be repeated here.
Step 604, inputting the first highlight-free image into a first feature extraction layer of the learning network for feature extraction, so as to obtain a first background texture feature map.
As an example, fig. 7A is a schematic structural diagram of a learning network provided in an embodiment of the present application. As shown in fig. 7A, the learning network includes a first feature extraction layer, a second feature extraction layer, and a third feature extraction layer, where the input of the first feature extraction layer is the first highlight-free image and the output of the first feature extraction layer is the first background texture feature map. The first background texture feature map carries the background texture features of the highlight-free region.
And step 605, inputting the high-light mask image and the sample high-light image into a second feature extraction layer of the learning network for processing to obtain a second background texture feature map.
The second feature extraction layer is used for carrying out feature extraction on the position and the intensity of a highlight region in the highlight mask image, so as to obtain a second background texture feature image, wherein the second background texture feature image carries the background texture features of the highlight region.
As an implementation manner, the second feature extraction layer includes a first feature extraction sub-layer and a second feature extraction sub-layer. The highlight mask image is input into the first feature extraction sub-layer for feature extraction to obtain a highlight feature map, where the highlight feature map includes highlight position features and highlight intensity features. The sample highlight image is input into the second feature extraction sub-layer for feature extraction to obtain a sample feature map. Then, each pixel unit in the sample feature map is multiplied, pixel unit by pixel unit, by the corresponding pixel unit in the highlight feature map to obtain the second background texture feature map. As an implementation manner, the gray value of each pixel unit in the sample feature map and the gray value of the corresponding pixel unit in the highlight feature map may be dot-multiplied pixel unit by pixel unit, so that the background texture features of the non-highlight region are removed and the background texture features of the highlight region are retained.
The pixel unit includes at least one pixel point.
Step 606, fusing the first background texture feature map and the second background texture feature map to obtain a target background texture feature map.
In the embodiment of the application, the first background texture feature map and the second background texture feature map are fused through a fusion function in a selective fusion module (Selective Fusion Block, SFB) to obtain a target background texture feature map.
As an example, taking fig. 7A as an example, the first feature extraction layer comprises a convolution layer with a convolution kernel of 3, 48 output channels and a convolution step of 1; S denotes the Sigmoid operator, and ⊙ denotes the pixel-wise dot multiplication operation. The highlight detection network learns the highlight distribution corresponding to the sample highlight image I_highlight to obtain the highlight mask image I_hd, and the corresponding image de-highlighting result, i.e., the first highlight-free image I_hr, is generated by subtracting I_hd. I_hd, I_hr and I_highlight are taken as input; the highlight distribution learned from I_hd guides the network to extract accurate background texture information from I_highlight, yielding the second background texture feature map f_ti, which contains the highlight-region texture features and assists in restoring the background texture. With reference to fig. 7A, the computation of the target background feature map can be expressed as:
f_hr = F_B(I_hr);
f_ti = F_M(I_hd) ⊙ F_E(I_highlight);
f_ALM = F_SFB(f_ti, f_hr).
Here F_E(·) is the feature embedding function of the second feature extraction sub-layer, comprising two convolution layers, which learns an embedded representation of I_highlight. F_B(·) is the first feature extraction layer, comprising one convolution layer, which extracts the feature information f_hr of I_hr. F_M(·) is the mask generating function of the first feature extraction sub-layer, comprising three convolution layers and a Sigmoid operator, which obtains the highlight position and intensity information of I_highlight. The pixel-wise multiplication then learns the background texture information f_ti from the embedded representation of I_highlight, assisting the background texture reconstruction process. F_SFB(·) denotes the fusion function in the selective fusion block (Selective Fusion Block, SFB). Finally, the learning network (adaptive learning module, ALM) outputs the enhanced background feature representation f_ALM, and the texture reconstruction sub-network takes f_ALM as input to achieve high-quality texture reconstruction through a U-shaped network of an encoder and a decoder.
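A rough PyTorch sketch of this learning network; the 48-channel width and 3x3 kernels follow the description above, while the internal form of the Selective Fusion Block (sketched here as concatenation followed by a convolution) is an assumption, since the text does not specify it.

    import torch
    import torch.nn as nn

    class LearningNetwork(nn.Module):
        def __init__(self, ch=48):
            super().__init__()
            # F_B: first feature extraction layer, one 3x3 convolution (48 output channels, stride 1).
            self.f_b = nn.Conv2d(3, ch, 3, padding=1)
            # F_E: feature embedding of the second feature extraction sub-layer, two convolutions.
            self.f_e = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
                                     nn.Conv2d(ch, ch, 3, padding=1))
            # F_M: mask generating function, three convolutions followed by a Sigmoid operator.
            self.f_m = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(inplace=True),
                                     nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                                     nn.Conv2d(ch, ch, 3, padding=1), nn.Sigmoid())
            # F_SFB: selective fusion block, sketched as concatenation plus a convolution (assumed form).
            self.f_sfb = nn.Conv2d(ch * 2, ch, 3, padding=1)

        def forward(self, i_hr, i_hd, i_highlight):
            f_hr = self.f_b(i_hr)                               # background texture of the non-highlight region
            f_ti = self.f_m(i_hd) * self.f_e(i_highlight)       # pixel-wise multiplication keeps highlight-region texture
            f_alm = self.f_sfb(torch.cat([f_ti, f_hr], dim=1))  # fused target background texture feature map
            return f_alm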
Step 607, inputting the target background texture feature map into a texture restoration network of the highlight processing model to obtain a predicted highlight-free image after texture restoration.
Step 607 may be the same as the above embodiment, and is not repeated here.
Step 608, performing model training on the highlight processing model according to the difference between the first highlight-free image and the real highlight-free image and according to the difference between the predicted highlight-free image and the real highlight-free image, so as to obtain a trained highlight processing model.
Step 608 may refer to the explanation of the foregoing embodiments, and the principles are the same, which is not repeated here.
As an example, as shown in fig. 7B, the images in fig. 7B are obtained by inputting highlight images from the test set into the trained highlight processing model. It can be seen from the figure that after the images are processed by the highlight processing model, the textures of the highlight regions are reasonably restored and the color information of the non-highlight regions is maintained.
In the training method of the highlight processing model described above, a two-stage neural network model is used: two U-shaped networks, namely a highlight detection network and a texture restoration network, respectively complete the image highlight detection task and the image texture restoration task. After the highlight detection network, a learning network separately learns the background texture features of the highlight region and of the non-highlight region, so as to guide the subsequent texture restoration network to perform texture restoration. The loss function of the model is obtained by weighting and superposing the loss functions of the two networks for joint optimization, which guides the network to maintain similarity in both image content and image texture structure and optimizes the highlight elimination and texture restoration effects, so that the trained highlight processing model has better highlight elimination and texture restoration capabilities.
Based on the above embodiments, fig. 8 is a flowchart of an image processing method according to an embodiment of the present application, as shown in fig. 8, the method includes the following steps:
step 801, a highlight image to be processed is acquired.
The highlight image is an image containing a highlight; the reasons for highlight formation are the same as explained in the foregoing embodiments and are not repeated here.
Step 802, inputting the highlight image into a trained highlight processing model, so that the highlight processing model performs highlight elimination and texture reconstruction on the highlight image to obtain a highlight-free image.
The highlight processing model is trained by the model training method described in the foregoing method embodiment to obtain a trained highlight processing model, where the training method of the highlight processing model may refer to the explanation in the foregoing embodiment, and the principle is the same and will not be repeated here.
For example, a picture of the face of an identity card that contains a highlight but whose text texture is still visible under the highlight is input into the highlight processing model; the model can then output a picture of the card face in which the highlight is removed and the text texture information and color information are restored, so that the texture of the highlight region is reasonably restored and the color information of the non-highlight region is maintained.
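A minimal inference sketch for this use case, assuming the trained model was saved as a whole with torch.save and accepts a normalized RGB tensor; the file names and preprocessing are illustrative assumptions.

    import torch
    from torchvision import io
    from torchvision.transforms.functional import convert_image_dtype

    # Load the trained highlight processing model (hypothetical checkpoint path).
    model = torch.load("highlight_model.pt", map_location="cpu")
    model.eval()

    # Read the highlight image to be processed and normalize it to [0, 1].
    image = io.read_image("id_card_with_highlight.png")             # (3, H, W) uint8 tensor
    image = convert_image_dtype(image, torch.float32).unsqueeze(0)  # add batch dimension

    with torch.no_grad():
        no_highlight = model(image).clamp(0, 1)                     # highlight elimination + texture reconstruction

    io.write_png((no_highlight.squeeze(0) * 255).to(torch.uint8), "id_card_no_highlight.png")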
As an example, as shown in fig. 9, the left image in fig. 9 is a highlight image, and the highlight-free image obtained after highlight elimination and texture restoration by using the highlight processing model obtained by training is the right image in fig. 9, so that highlight elimination is realized, high-precision texture restoration is realized, and color restoration is realized.
In the image processing method of the embodiment of the application, when the highlight processing model obtained by training with the training method of the foregoing embodiments is used to process a highlight image, similarity can be maintained in both image content and image texture structure at the same time, the highlight elimination and texture restoration effects are optimized, and the color difference problem is alleviated.
In order to achieve the above embodiments, the embodiments of the present application further provide a training device for a highlight processing model.
Fig. 10 is a schematic structural diagram of a training device for a highlight processing model according to an embodiment of the present application.
As shown in fig. 10, the apparatus may include:
the obtaining module 1001 is configured to obtain a sample highlight image, and a real no-highlight image corresponding to the sample highlight image.
And the recognition module 1002 is configured to input the sample highlight image into a highlight detection network of a highlight processing model, so as to obtain a highlight mask image.
And a processing module 1003, configured to subtract the sample highlight image and the highlight mask image, so as to obtain a first highlight-free image without performing texture repair.
The learning module 1004 is configured to input the first non-highlight image, the highlight mask image, and the sample highlight image into a learning network of the highlight processing model, and obtain a target background texture feature map.
And a repair module 1005, configured to input the target background texture feature map into a texture repair network of the highlight processing model, to obtain a predicted highlight-free image after texture repair.
And a training module 1006, configured to perform model training on the highlight processing model according to the difference between the first highlight-free image and the real highlight-free image, and according to the difference between the predicted highlight-free image and the real highlight-free image, so as to obtain the trained highlight processing model.
Further, in an implementation manner of the embodiment of the present application, the training module 1006 is specifically configured to:
determining a target loss function based on the difference between the first non-highlight image and the actual non-highlight image and based on the difference between the predicted non-highlight image and the actual non-highlight image;
And according to the target loss function, carrying out model parameter adjustment on the highlight processing model to obtain a trained highlight processing model.
In one implementation of the embodiment of the present application, the training module 1006 is specifically configured to:
determining a first loss function of the highlight detection network according to the difference between the first highlight-free image and the real highlight-free image; determining a second loss function of the texture repair network according to the difference between the predicted highlight-free image and the real highlight-free image; and carrying out weighted synthesis on the first loss function and the second loss function according to the set first weight coefficient to obtain the target loss function.
In one implementation of the embodiment of the present application, the training module 1006 is specifically configured to:
determining a first content difference between the first non-highlight image and the real non-highlight image, and a first texture difference between the first non-highlight image and the real non-highlight image; acquiring a set second weight coefficient; and according to the second weight coefficient, carrying out weighted synthesis on the first content difference and the first texture difference to obtain a first loss function of the highlight detection network.
In one implementation of the embodiment of the present application, the training module 1006 is specifically configured to:
determining a second content difference between the predicted highlight-free image and the true highlight-free image, and a second texture difference between the predicted highlight-free image and the true highlight-free image; acquiring a set third weight coefficient; and according to the third weight coefficient, carrying out weighted synthesis on the second content difference and the second texture difference to obtain a second loss function of the texture repair network.
In one implementation of the embodiment of the present application, the learning module 1004 is specifically configured to:
inputting the first highlight-free image into a first feature extraction layer of the learning network to perform feature extraction to obtain a first background texture feature map; the first background texture feature map carries background texture features of a highlight-free area; inputting the highlight mask image and the sample highlight image into a second feature extraction layer of the learning network for processing to obtain a second background texture feature map; wherein the second background texture feature map carries background texture features of a highlight region; and fusing the first background texture feature map and the second background texture feature map to obtain the target background texture feature map.
In one implementation manner of the embodiment of the present application, the second feature extraction layer includes a first feature extraction sub-layer and a second feature extraction sub-layer, and the learning module 1004 is specifically configured to:
inputting the highlight mask image into the first feature extraction sub-layer for feature extraction to obtain a highlight feature map; the highlight characteristic map comprises highlight position characteristics and highlight intensity characteristics; inputting the sample highlight image into the second feature extraction sub-layer for feature extraction to obtain a sample feature map; and carrying out pixel unit-by-pixel unit dot multiplication on each pixel unit in the sample feature map and a corresponding pixel unit in the highlight feature map to obtain a second background texture feature map.
In one implementation of the embodiment of the present application, the processing module 1003 is specifically configured to:
and subtracting each pixel unit in the sample highlight image from the corresponding pixel unit in the highlight mask image to obtain the first highlight-free image.
It should be noted that the foregoing explanation of the method embodiment is also applicable to the apparatus of this embodiment, and will not be repeated here.
In the training device of the highlight processing model of the embodiment of the application, a sample highlight image and a real highlight-free image corresponding to the sample highlight image are obtained; the sample highlight image is input into the highlight detection network of the highlight processing model to obtain a highlight mask image; the sample highlight image and the highlight mask image are subtracted to obtain a first highlight-free image which has not undergone texture restoration; the first highlight-free image, the highlight mask image and the sample highlight image are input into the learning network of the highlight processing model to obtain a target background texture feature map; the target background texture feature map is input into the texture restoration network of the highlight processing model to obtain a predicted highlight-free image after texture restoration; and model training is performed on the highlight processing model according to the difference between the first highlight-free image and the real highlight-free image and the difference between the predicted highlight-free image and the real highlight-free image, so as to obtain the trained highlight processing model. By adding a learning network between the highlight detection network and the texture restoration network, the learning network learns global background features covering both the highlight region and the highlight-free region based on the output of the highlight detection network, and these features are used as the input of the texture restoration network to guide texture restoration and reconstruct the nonlinear change of the highlight region, thereby generating a high-quality, texture-rich predicted highlight-free image. Training the model on the difference between the first highlight-free image and the real highlight-free image and the difference between the predicted highlight-free image and the real highlight-free image ensures that the trained highlight processing model can output high-quality, texture-rich highlight-free images.
In order to achieve the above embodiments, the embodiments of the present application further provide an image processing apparatus.
Fig. 11 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
As shown in fig. 11, the apparatus may include:
the acquisition module 1101 is configured to acquire a highlight image to be processed.
The processing module 1102 is configured to input the highlight image into a trained highlight processing model, so that the highlight processing model performs highlight elimination and texture reconstruction on the highlight image to obtain a highlight-free image; the highlight processing model is trained by the model training device according to the previous embodiment, so as to obtain the trained highlight processing model.
It should be noted that the foregoing explanation of the method embodiment is also applicable to the apparatus of this embodiment, and will not be repeated here.
In the image processing apparatus of the embodiment of the application, when the highlight processing model obtained by training with the training method of the foregoing embodiments is used to process a highlight image, similarity can be maintained in both image content and image texture structure at the same time, the highlight elimination and texture restoration effects are optimized, and the color difference problem after restoration is alleviated.
In order to implement the above embodiment, the application further proposes an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to the above method embodiment when executing the program.
In order to implement the above-mentioned embodiments, the present application also proposes a non-transitory computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, implements a method as described in the foregoing method embodiments.
In order to implement the above-described embodiments, the present application also proposes a computer program product having a computer program stored thereon, which, when being executed by a processor, implements a method as described in the method embodiments described above.
Fig. 12 is a block diagram of an electronic device according to an embodiment of the present application. For example, electronic device 800 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.
Referring to fig. 12, an electronic device 800 may include one or more of the following components: a processing component 803, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 803 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 803 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 803 may include one or more modules that facilitate interactions between the processing component 803 and other components. For example, the processing component 803 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 803.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power component 806 provides power to the various components of the electronic device 800. Power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 803 and peripheral interface modules, such as a keyboard, a click wheel, or buttons. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect an on/off state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor assembly 814 may also detect a change in position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 4G, or 5G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium is also provided, such as the memory 804 including instructions executable by the processor 820 of the electronic device 800 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "plurality" is at least two, such as two, three, etc., unless explicitly defined otherwise.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application also includes additional implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art to which the embodiments of the present application pertain.
Logic and/or steps represented in the flowcharts or otherwise described herein, for example, an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, as the program may be electronically captured, for instance by optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, the steps or methods may be implemented using any one of the following techniques known in the art, or a combination thereof: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the application; changes, modifications, substitutions, and variations may be made to the above embodiments by those of ordinary skill in the art within the scope of the application.

Claims (13)

1. A method of training a highlight processing model, comprising:
acquiring a sample highlight image and a real highlight-free image corresponding to the sample highlight image;
inputting the sample highlight image into a highlight detection network of a highlight processing model to obtain a highlight mask image;
subtracting the highlight mask image from the sample highlight image to obtain a first highlight-free image without texture repair;
inputting the first highlight-free image, the highlight mask image and the sample highlight image into a learning network of the highlight processing model to obtain a target background texture feature map;
inputting the target background texture feature map into a texture repair network of the highlight processing model to obtain a predicted highlight-free image after texture repair;
and performing model training on the highlight processing model according to the difference between the first highlight-free image and the real highlight-free image and the difference between the predicted highlight-free image and the real highlight-free image, to obtain the trained highlight processing model.
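By way of a non-limiting illustration only, the training sequence of claim 1 can be sketched in PyTorch-style Python as follows; the three sub-networks (detection_net, learning_net, repair_net), the optimizer, and the target loss helper target_loss_fn (sketched after claim 3 below) are assumptions introduced for clarity and are not prescribed by the claim.

```python
def training_step(sample_highlight, real_highlight_free,
                  detection_net, learning_net, repair_net,
                  optimizer, target_loss_fn):
    """One illustrative training iteration following the order of claim 1."""
    # Highlight detection network -> highlight mask image.
    highlight_mask = detection_net(sample_highlight)

    # Subtract the mask from the sample image (see claim 8) to obtain the
    # first highlight-free image, which has not undergone texture repair.
    first_highlight_free = sample_highlight - highlight_mask

    # Learning network -> target background texture feature map.
    background_features = learning_net(first_highlight_free,
                                       highlight_mask,
                                       sample_highlight)

    # Texture repair network -> predicted highlight-free image.
    predicted_highlight_free = repair_net(background_features)

    # Train on both differences against the real highlight-free image.
    loss = target_loss_fn(first_highlight_free, predicted_highlight_free,
                          real_highlight_free)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```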
2. The method of claim 1, wherein said performing model training on the highlight processing model according to the difference between the first highlight-free image and the real highlight-free image and the difference between the predicted highlight-free image and the real highlight-free image to obtain the trained highlight processing model comprises:
determining a target loss function according to the difference between the first highlight-free image and the real highlight-free image and the difference between the predicted highlight-free image and the real highlight-free image;
and according to the target loss function, carrying out model parameter adjustment on the highlight processing model to obtain a trained highlight processing model.
3. The method of claim 2, wherein said determining a target loss function according to the difference between the first highlight-free image and the real highlight-free image and the difference between the predicted highlight-free image and the real highlight-free image comprises:
determining a first loss function of the highlight detection network according to the difference between the first highlight-free image and the real highlight-free image;
determining a second loss function of the texture repair network according to the difference between the predicted highlight-free image and the real highlight-free image;
and carrying out weighted synthesis on the first loss function and the second loss function according to the set first weight coefficient to obtain the target loss function.
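As a hedged illustration of claims 2 and 3, the sketch below weights a detection-branch loss against a repair-branch loss; the value of the first weight coefficient and the complementary (1 − w) weighting are assumptions, and content_texture_loss is the helper sketched after claim 5 below.

```python
def target_loss_fn(first_highlight_free, predicted_highlight_free,
                   real_highlight_free, first_weight=0.5):
    """Illustrative target loss of claims 2-3; first_weight is an assumed value."""
    # First loss: highlight detection network, judged on the unrepaired
    # first highlight-free image versus the real highlight-free image.
    first_loss = content_texture_loss(first_highlight_free, real_highlight_free)
    # Second loss: texture repair network, judged on the predicted
    # highlight-free image versus the real highlight-free image.
    second_loss = content_texture_loss(predicted_highlight_free, real_highlight_free)
    # Weighted synthesis according to the set first weight coefficient.
    return first_weight * first_loss + (1.0 - first_weight) * second_loss
```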
4. The method of claim 3, wherein said determining a first loss function of the highlight detection network according to the difference between the first highlight-free image and the real highlight-free image comprises:
determining a first content difference between the first highlight-free image and the real highlight-free image, and a first texture difference between the first highlight-free image and the real highlight-free image;
acquiring a set second weight coefficient;
and according to the second weight coefficient, carrying out weighted synthesis on the first content difference and the first texture difference to obtain a first loss function of the highlight detection network.
5. The method of claim 3, wherein said determining a second loss function of the texture repair network according to the difference between the predicted highlight-free image and the real highlight-free image comprises:
determining a second content difference between the predicted highlight-free image and the real highlight-free image, and a second texture difference between the predicted highlight-free image and the real highlight-free image;
acquiring a set third weight coefficient;
and according to the third weight coefficient, carrying out weighted synthesis on the second content difference and the second texture difference to obtain a second loss function of the texture repair network.
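Claims 4 and 5 construct their component losses in the same way, as a weighted synthesis of a content difference and a texture difference. The sketch below uses pixel-wise L1 for the content term and an image-gradient L1 for the texture term; these metrics and the weight value are assumptions, since the claims do not fix specific distance functions.

```python
import torch.nn.functional as F

def content_texture_loss(pred, target, weight=0.8):
    """Illustrative weighted synthesis of a content difference and a texture
    difference (claims 4 and 5); weight plays the role of the second or third
    weight coefficient and its value here is assumed."""
    # Content difference: pixel-wise L1 distance between the two images.
    content_diff = F.l1_loss(pred, target)

    # Texture difference: L1 distance between horizontal and vertical image
    # gradients, used here as a simple stand-in for a texture-structure term.
    def gradients(x):
        dx = x[..., :, 1:] - x[..., :, :-1]
        dy = x[..., 1:, :] - x[..., :-1, :]
        return dx, dy

    pred_dx, pred_dy = gradients(pred)
    tgt_dx, tgt_dy = gradients(target)
    texture_diff = F.l1_loss(pred_dx, tgt_dx) + F.l1_loss(pred_dy, tgt_dy)

    # Weighted synthesis of the content difference and the texture difference.
    return weight * content_diff + (1.0 - weight) * texture_diff
```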
6. The method of claim 1, wherein said inputting the first highlight-free image, the highlight mask image, and the sample highlight image into a learning network of the highlight processing model to obtain a target background texture feature map comprises:
inputting the first highlight-free image into a first feature extraction layer of the learning network to perform feature extraction to obtain a first background texture feature map; the first background texture feature map carries background texture features of a highlight-free area;
inputting the highlight mask image and the sample highlight image into a second feature extraction layer of the learning network for processing to obtain a second background texture feature map; wherein the second background texture feature map carries background texture features of a highlight region;
and fusing the first background texture feature map and the second background texture feature map to obtain the target background texture feature map.
7. The method of claim 6, wherein the second feature extraction layer comprises a first feature extraction sub-layer and a second feature extraction sub-layer, and said inputting the highlight mask image and the sample highlight image into the second feature extraction layer of the learning network for processing to obtain a second background texture feature map comprises:
inputting the highlight mask image into the first feature extraction sub-layer for feature extraction to obtain a highlight feature map; wherein the highlight feature map comprises highlight position features and highlight intensity features;
inputting the sample highlight image into the second feature extraction sub-layer for feature extraction to obtain a sample feature map;
and carrying out pixel unit point multiplication on each pixel unit in the sample feature map and a corresponding pixel unit in the highlight feature map to obtain the second background texture feature map.
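A compact, non-limiting sketch of the learning network of claims 6 and 7 follows; the single-convolution feature extraction layers, the channel counts, the one-channel mask, and the concatenation-based fusion are assumptions used only to show how the two branches and the pixel-unit point multiplication fit together.

```python
import torch
import torch.nn as nn

class LearningNetwork(nn.Module):
    """Illustrative learning network for claims 6-7; layer shapes are assumed."""
    def __init__(self, channels=32):
        super().__init__()
        # First feature extraction layer: background texture of the
        # highlight-free area (claim 6), fed with the first highlight-free image.
        self.first_extract = nn.Conv2d(3, channels, 3, padding=1)
        # Second feature extraction layer, split into two sub-layers (claim 7).
        self.mask_extract = nn.Conv2d(1, channels, 3, padding=1)    # highlight mask image
        self.sample_extract = nn.Conv2d(3, channels, 3, padding=1)  # sample highlight image
        # Fusion of the two background texture feature maps (assumed: concat + conv).
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, first_highlight_free, highlight_mask, sample_highlight):
        # First background texture feature map (highlight-free area).
        first_feat = self.first_extract(first_highlight_free)
        # Highlight feature map carrying highlight position and intensity features.
        highlight_feat = self.mask_extract(highlight_mask)
        # Sample feature map extracted from the sample highlight image.
        sample_feat = self.sample_extract(sample_highlight)
        # Pixel-unit point multiplication (claim 7) -> second background
        # texture feature map for the highlight region.
        second_feat = sample_feat * highlight_feat
        # Fuse both maps into the target background texture feature map.
        return self.fuse(torch.cat([first_feat, second_feat], dim=1))
```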
8. The method of any of claims 1-7, wherein said subtracting the highlight mask image from the sample highlight image to obtain said first highlight-free image without texture repair comprises:
subtracting the corresponding pixel unit in the highlight mask image from each pixel unit in the sample highlight image to obtain the first highlight-free image.
9. An image processing method, comprising:
acquiring a highlight image to be processed;
inputting the highlight image into a trained highlight processing model so that the highlight processing model performs highlight elimination and texture reconstruction on the highlight image to obtain a highlight-free image; wherein the highlight processing model is trained using the model training method according to any one of claims 1-8 to obtain the trained highlight processing model.
10. A training device for a highlight processing model, comprising:
the acquisition module is used for acquiring a sample highlight image and a real highlight-free image corresponding to the sample highlight image;
the identification module is used for inputting the sample highlight image into a highlight detection network of a highlight processing model to obtain a highlight mask image;
the processing module is used for subtracting the highlight mask image from the sample highlight image to obtain a first highlight-free image which is not subjected to texture repair;
the learning module is used for inputting the first highlight-free image, the highlight mask image and the sample highlight image into a learning network of the highlight processing model to obtain a target background texture feature map;
the restoration module is used for inputting the target background texture feature map into a texture repair network of the highlight processing model to obtain a predicted highlight-free image after texture repair;
and the training module is used for carrying out model training on the highlight processing model according to the difference between the first highlight-free image and the real highlight-free image and the difference between the predicted highlight-free image and the real highlight-free image so as to obtain the trained highlight processing model.
11. An image processing apparatus, comprising:
the acquisition module is used for acquiring the highlight image to be processed;
the processing module is used for inputting the highlight image into a trained highlight processing model, so that the highlight processing model performs highlight elimination and texture reconstruction on the highlight image to obtain a highlight-free image; wherein the highlight processing model is trained using the model training apparatus of claim 10 to obtain the trained highlight processing model.
12. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1-8 or the method of claim 9 when executing the program.
13. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method according to any one of claims 1-8 or the method according to claim 9.
CN202310501294.5A 2023-04-27 2023-04-27 Training method of highlight processing model, image processing method and device Pending CN116563359A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310501294.5A CN116563359A (en) 2023-04-27 2023-04-27 Training method of highlight processing model, image processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310501294.5A CN116563359A (en) 2023-04-27 2023-04-27 Training method of highlight processing model, image processing method and device

Publications (1)

Publication Number Publication Date
CN116563359A true CN116563359A (en) 2023-08-08

Family

ID=87495848

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310501294.5A Pending CN116563359A (en) 2023-04-27 2023-04-27 Training method of highlight processing model, image processing method and device

Country Status (1)

Country Link
CN (1) CN116563359A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117333382A (en) * 2023-09-07 2024-01-02 广东奥普特科技股份有限公司 Training of reflective erasing network, and reflective erasing method, device and equipment thereof

Similar Documents

Publication Publication Date Title
CN109670397B (en) Method and device for detecting key points of human skeleton, electronic equipment and storage medium
CN110084775B (en) Image processing method and device, electronic equipment and storage medium
US9924226B2 (en) Method and device for processing identification of video file
KR101727169B1 (en) Method and apparatus for generating image filter
CN111553864B (en) Image restoration method and device, electronic equipment and storage medium
WO2018120662A1 (en) Photographing method, photographing apparatus and terminal
CN107133354B (en) Method and device for acquiring image description information
CN107995500B (en) Video watermark recognition methods, device and terminal
CN105528765A (en) Method and device for processing image
CN106778773A (en) The localization method and device of object in picture
CN104517271B (en) Image processing method and device
US20180211097A1 (en) Method and device for acquiring feature image, and user authentication method
CN112258404A (en) Image processing method, image processing device, electronic equipment and storage medium
CN108551552A (en) Image processing method, device, storage medium and mobile terminal
CN112634160A (en) Photographing method and device, terminal and storage medium
CN116563359A (en) Training method of highlight processing model, image processing method and device
CN110929616B (en) Human hand identification method and device, electronic equipment and storage medium
CN107424130B (en) Picture beautifying method and device
CN112734627B (en) Training method of image style migration model, image style migration method and device
CN112184876B (en) Image processing method, image processing device, electronic equipment and storage medium
CN113012052B (en) Image processing method and device, electronic equipment and storage medium
CN114463212A (en) Image processing method and device, electronic equipment and storage medium
CN112950503A (en) Training sample generation method and device and truth value image generation method and device
CN114120034A (en) Image classification method and device, electronic equipment and storage medium
CN111582258A (en) Image processing method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination