CN114529469A - Training method, device, equipment and medium of image enhancement model and image enhancement method, device, equipment and medium - Google Patents

Training method, device, equipment and medium of image enhancement model and image enhancement method, device, equipment and medium

Info

Publication number
CN114529469A
CN114529469A
Authority
CN
China
Prior art keywords
network
image
target image
target
image enhancement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210151826.2A
Other languages
Chinese (zh)
Inventor
钟志育
李斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Lewubian Education Technology Co ltd
Original Assignee
Guangzhou Lewubian Education Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Lewubian Education Technology Co ltd filed Critical Guangzhou Lewubian Education Technology Co ltd
Priority to CN202210151826.2A priority Critical patent/CN114529469A/en
Publication of CN114529469A publication Critical patent/CN114529469A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a training method, device, equipment and medium of an image enhancement model, and an image enhancement method, device, equipment and medium. The method comprises: acquiring a target image sample set, the target image sample set comprising a plurality of first target images and a plurality of second target images, wherein the contrast of the first target images is lower than that of the second target images; inputting target images from the target image sample set into a cycle-consistent generative adversarial network (CycleGAN) model to obtain predicted images, wherein each generation network of the CycleGAN model comprises a plurality of U-type networks; training the parameters of the CycleGAN model according to a loss function formed from the predicted images and the target images; and returning to execute the inputting operation until a target image enhancement model is obtained. By improving the generation networks of the conventional CycleGAN model, the image enhancement precision and the enhancement effect can be improved.

Description

Training method, device, equipment and medium of image enhancement model and image enhancement method, device, equipment and medium
Technical Field
The invention relates to the technical field of image processing, and in particular to a training method of an image enhancement model, an image enhancement method, and corresponding devices, equipment and media.
Background
In real life, documents often need to be scanned. Many applications on the market can implement scanning, and the core step of document scanning is to obtain a high-contrast gray image through image enhancement processing. During image enhancement, noise is removed, but some information in the picture may also be lost.
Current common image enhancement methods fall mainly into traditional methods and machine learning methods. The drawback of traditional methods is that it is difficult to comprehensively cover all kinds of conditions, and the actual effect depends strongly on the choice of parameters. Existing machine learning methods include supervised machine learning methods and unsupervised learning methods. Supervised methods need a large amount of manually labeled paired training data to obtain good results, and manually labeling data with clear images is difficult and costly. Unsupervised methods generally use a generative adversarial network, and their image enhancement effect still needs improvement.
Disclosure of Invention
The invention provides a training method of an image enhancement model, an image enhancement method, and corresponding devices, equipment and media, which are used for solving the problem that the image enhancement effect of conventional image enhancement models needs improvement; by improving the conventional generative adversarial network, the image enhancement precision and the image enhancement effect are improved.
According to an aspect of the present invention, there is provided a training method of an image enhancement model, including:
acquiring a target image sample set; the target image sample set comprises: a plurality of first target images and a plurality of second target images; the contrast of the first target image is lower than the contrast of the second target image;
inputting target images in the target image sample set into a cycle-consistent generative adversarial network (CycleGAN) model to obtain predicted images; wherein each generation network of the CycleGAN model comprises: a plurality of U-type networks;
training the parameters of the CycleGAN model according to a loss function formed from the predicted images and the target images;
and returning to execute the operation of inputting the target images in the target image sample set into the CycleGAN model to obtain predicted images, until a target image enhancement model is obtained.
According to another aspect of the present invention, there is provided an image enhancement method including:
acquiring an image to be processed;
and carrying out image enhancement processing on the image to be processed through a target image enhancement model to obtain an enhanced image.
According to another aspect of the present invention, there is provided a training apparatus for an image enhancement model, comprising:
the acquisition module is used for acquiring a target image sample set; the target image sample set comprises: a plurality of first target images and a plurality of second target images; the contrast of the first target image is lower than the contrast of the second target image;
the prediction module is used for inputting the target images in the target image sample set into a cycle-consistent generative adversarial network (CycleGAN) model to obtain predicted images; wherein each generation network of the CycleGAN model comprises: a plurality of U-type networks;
the training module is used for training the parameters of the CycleGAN model according to the loss function formed from the predicted images and the target images;
and the iteration module is used for returning to execute the operation of inputting the target images in the target image sample set into the CycleGAN model to obtain predicted images, until a target image enhancement model is obtained.
According to another aspect of the present invention, there is provided an image enhancement apparatus including:
the acquisition module is used for acquiring an image to be processed;
and the processing module is used for carrying out image enhancement processing on the image to be processed through the target image enhancement model to obtain an enhanced image.
According to another aspect of the present invention, there is provided a computer apparatus comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform a method of training an image enhancement model or a method of image enhancement according to any of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to implement a training method or an image enhancement method of an image enhancement model according to any one of the embodiments of the present invention when the computer instructions are executed.
According to the technical scheme of the embodiment of the invention, a target image sample set is acquired, the target image sample set comprising a plurality of first target images and a plurality of second target images, wherein the contrast of the first target images is lower than that of the second target images; target images in the target image sample set are input into a cycle-consistent generative adversarial network (CycleGAN) model to obtain predicted images, wherein each generation network of the CycleGAN model comprises a plurality of U-type networks; the parameters of the CycleGAN model are trained according to a loss function formed from the predicted images and the target images; and the inputting operation is executed again until a target image enhancement model is obtained. This solves the problem that the image enhancement effect of conventional image enhancement models needs improvement and, by improving the conventional generative adversarial network, achieves the technical effect of improving the image enhancement precision and the enhancement effect.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flowchart of a training method of an image enhancement model according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a cycle-consistent generative adversarial network;
FIG. 3 is a schematic diagram of a generation network of the cycle-consistent generative adversarial network model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a conventional U-Net network;
fig. 5 is a schematic network structure diagram of a U-type network according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a hole convolution block according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an upsampling block in a generation network according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a downsample block in a generation network according to an embodiment of the present invention;
FIG. 9 is a flowchart of an image enhancement method according to a second embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a training apparatus for an image enhancement model according to a third embodiment of the present invention;
fig. 11 is a schematic structural diagram of an image enhancement apparatus according to a fourth embodiment of the present invention;
fig. 12 is a schematic structural diagram of a computer device for implementing the training method of the image enhancement model according to the embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used are interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example one
Fig. 1 is a flowchart of a training method of an image enhancement model provided by an embodiment of the present invention. This embodiment is applicable to the case of training an image enhancement model; the method can be performed by a training apparatus for the image enhancement model, which can be implemented in the form of hardware and/or software and can be configured in a computer device. As shown in fig. 1, the method includes:
s110, obtaining a target image sample set; the target image sample set includes: a plurality of first target images and a plurality of second target images; the contrast of the first target image is lower than the contrast of the second target image.
The target image sample set is a training sample set for training the image enhancement model and may include a plurality of first target images and a plurality of second target images, where the contrast of the first target images is lower than that of the second target images. Specifically, the first target image may be a low-contrast image, and the second target image may be a high-contrast image. It should be noted that the first target images and the second target images do not need to be paired.
For example, the first target image may be obtained by scanning a paper document with a scanning device, by photographing the paper document with a computer device having an image acquisition function, or by preprocessing the photographed image, for example by filtering or cropping. The second target image may be obtained by converting an electronic document in pdf or Word format corresponding to the paper document into a picture format such as png with a fixed resolution. The embodiment of the invention does not limit the way of obtaining the target image sample set. It should be noted that the first target image has lower contrast but generally has higher resolution.
S120, inputting target images in the target image sample set into a cycle-consistent generative adversarial network (CycleGAN) model to obtain predicted images; wherein each generation network of the CycleGAN model includes: a plurality of U-type networks.
The cycle-consistent generative adversarial network CycleGAN is an unsupervised learning model derived from the generative adversarial network GAN. A generative adversarial network generally comprises a generation network and a discrimination network: during training, the generation network is responsible for generating prediction data that is hard to distinguish from real data, while the discrimination network tries to distinguish the prediction data from the real data, and the two are continuously pitted against each other. As shown in FIG. 2, the cycle-consistent generative adversarial network employs two generation networks G and F and two discrimination networks D_X and D_Y: the first generation network G maps from the X domain to the Y domain, the second generation network F maps from the Y domain to the X domain, the first discrimination network D_X determines whether F(y) generated by the second generation network F belongs to the X domain, and the second discrimination network D_Y determines whether G(x) generated by the first generation network G belongs to the Y domain. X and Y represent two different domains: the X domain may be taken as the sample image domain and the Y domain as the predicted image domain. The image output by the first generation network may be used as an input image for the second generation network, and the output image of the second generation network should match the original input image. For example, the X domain may represent the first target images and the Y domain the second target images.
In the cycle-consistent generative adversarial network model, the two generation networks have the same structure. In a conventional CycleGAN model, the commonly used generation network is the U-shaped network U-Net: U-Net is a fully convolutional neural network whose left half is a feature extraction network formed by a plurality of downsampling blocks and whose right half is a feature fusion network formed by a plurality of upsampling blocks.
The conventional CycleGAN model is generally suited to processing lower-resolution images; for processing higher-resolution images, the two generation networks of the CycleGAN model in the embodiment of the present invention each comprise a plurality of U-type networks, and the plurality of U-type networks are connected to form the generation network, thereby improving the accuracy of the generative model.
For example, the plurality of U-type networks may be connected pairwise: the input and the output of the current U-type network are together connected to the input of the next U-type network, and the output of each U-type network serves as one output of the generation network, so that the generation network has a plurality of outputs, the number of which equals the number of U-type networks it contains. The number of U-type networks may be chosen according to the accuracy requirement of the actual application scenario, and the embodiment of the present invention places no limitation on this.
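As a rough illustration only (the patent itself specifies no code), the following PyTorch sketch shows one way two chained U-type networks could form such a generation network. The class names, the channel-wise concatenation and the `UNet` stand-in are assumptions, not details fixed by the embodiment:

```python
import torch
import torch.nn as nn


class UNet(nn.Module):
    """Stand-in for the U-type network described below (downsampling
    blocks, a hole convolution block, upsampling blocks); a single
    convolution is used here only so the sketch runs."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        return self.body(x)


class TwoUNetGenerator(nn.Module):
    """Generation network formed by two chained U-type networks: the input
    and the output of the first U-type network together form the input of
    the second (channel-wise concatenation is an assumption), and both
    outputs are returned so the loss can use them jointly."""

    def __init__(self, in_ch=3, out_ch=3):
        super().__init__()
        self.unet1 = UNet(in_ch, out_ch)            # prior result
        self.unet2 = UNet(in_ch + out_ch, out_ch)   # refined prediction

    def forward(self, x):
        y1 = self.unet1(x)
        y2 = self.unet2(torch.cat([x, y1], dim=1))
        return y1, y2  # y2 is the predicted image
```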
Specifically, a target image in the target image sample set is input into the cycle-consistent generative adversarial network model to obtain a predicted image; the input target image may be a first target image or a second target image.
S130, training the parameters of the CycleGAN model according to a loss function formed from the predicted image and the target image.
Specifically, a first target image is input into the CycleGAN model, the predicted image obtained by the model is matched with the first target image to form a loss function between the predicted image and the first target image, and the parameters of each generation network in the model are trained; this is the forward training cycle from the X domain to the Y domain of the cycle-consistent adversarial network. Similarly, in the reverse training cycle from the Y domain to the X domain, a second target image is input into the CycleGAN model, the resulting predicted image is matched with the second target image to form a loss function between them, and the parameters of each generation network in the model are trained.
Illustratively, the loss function formed from the predicted image and the target image may include: a first adversarial loss corresponding to the first generation network and a second adversarial loss corresponding to the second generation network. The first adversarial loss L_GAN(G, D_Y, X, Y) is:
L_GAN(G, D_Y, X, Y) = E_{y~pdata(y)}[log D_Y(y)] + E_{x~pdata(x)}[log(1 - D_Y(G(x)))];
the second adversarial loss is:
L_GAN(F, D_X, Y, X) = E_{x~pdata(x)}[log D_X(x)] + E_{y~pdata(y)}[log(1 - D_X(F(y)))];
wherein D_X denotes the first discrimination network, D_Y denotes the second discrimination network, G denotes the first generation network, and F denotes the second generation network.
Since, in the unsupervised training of the cycle-consistent adversarial network model, the first generation network G and the second generation network F can easily fool the discrimination networks, training CycleGAN with the GAN losses alone cannot guarantee cycle consistency; an additional cycle consistency loss is required to enforce this property. It is defined from the difference between the input value x and the reconstruction F(G(x)) and the difference between the input value y and the reconstruction G(F(y)): the greater the difference, the greater the distance between the predicted image and the target image. Ideally, the network minimizes this loss. The cycle consistency loss is:
L_cyc(G, F) = E_{x~pdata(x)}[||F(G(x)) - x||_1] + E_{y~pdata(y)}[||G(F(y)) - y||_1];
wherein L_cyc(G, F) is the cycle consistency loss, the first term measures the difference between the input value x and the reconstruction F(G(x)), and the second term measures the difference between the input value y and the reconstruction G(F(y)). The cycle consistency loss may be an L1-norm loss function, i.e. a mean absolute error function; it may also be a picture structural similarity function (SSIM) or a multi-scale structural similarity function (MS-SSIM); a weighted sum of several such functions is also possible.
In summary, the loss function formed from the predicted image and the target image is:
L(G, F, D_X, D_Y) = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X) + λ·L_cyc(G, F);
wherein L(G, F, D_X, D_Y) is the total loss function, L_GAN(G, D_Y, X, Y) is the first adversarial loss, L_GAN(F, D_X, Y, X) is the second adversarial loss, and λ is the weight of the cycle consistency loss.
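The three loss terms above can be sketched in PyTorch as follows; this is a minimal illustration, assuming the standard binary-cross-entropy form of the GAN objective with discriminators that output logits, and the default weight `lam=10.0` is the common CycleGAN choice rather than a value stated in this embodiment:

```python
import torch
import torch.nn.functional as F_nn


def adversarial_loss(d_real, d_fake):
    """L_GAN = E[log D(real)] + E[log(1 - D(fake))], written as a binary
    cross-entropy objective over discriminator logits."""
    real = F_nn.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
    fake = F_nn.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    return real + fake


def cycle_consistency_loss(x, x_rec, y, y_rec):
    """L_cyc(G, F) = E[||F(G(x)) - x||_1] + E[||G(F(y)) - y||_1]."""
    return F_nn.l1_loss(x_rec, x) + F_nn.l1_loss(y_rec, y)


def total_loss(loss_gan_g, loss_gan_f, loss_cyc, lam=10.0):
    """L = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X) + lambda * L_cyc."""
    return loss_gan_g + loss_gan_f + lam * loss_cyc
```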
S140, returning to execute the operation of inputting the target image into the CycleGAN model to obtain a predicted image, until a target image enhancement model is obtained.
Specifically, the target images in the target image sample set are sequentially input into the CycleGAN model to obtain predicted images, and the operation of training the parameters of the CycleGAN model according to the loss function formed from the predicted images and the target images is performed iteratively until the loss function converges or a preset number of training iterations is reached; the CycleGAN model at that point is used as the target image enhancement model for image enhancement.
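A single training iteration of the forward and reverse cycles might look like the following sketch, which reuses `adversarial_loss` and `cycle_consistency_loss` from the previous sketch; the separate generator/discriminator optimizers and the convention that G and F return the predicted image directly (for the two-U-type generator above, its second output) are assumptions:

```python
import torch
import torch.nn.functional as F_nn


def train_step(x, y, G, F, D_X, D_Y, opt_g, opt_d, lam=10.0):
    """One forward (X->Y) plus reverse (Y->X) training cycle."""
    # Generator update: fool the discriminators and stay cycle-consistent.
    fake_y, fake_x = G(x), F(y)
    rec_x, rec_y = F(fake_y), G(fake_x)
    g_adv = F_nn.binary_cross_entropy_with_logits(
        D_Y(fake_y), torch.ones_like(D_Y(fake_y)))
    f_adv = F_nn.binary_cross_entropy_with_logits(
        D_X(fake_x), torch.ones_like(D_X(fake_x)))
    g_loss = g_adv + f_adv + lam * cycle_consistency_loss(x, rec_x, y, rec_y)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

    # Discriminator update: separate real samples from generated ones.
    d_loss = (adversarial_loss(D_Y(y), D_Y(fake_y.detach()))
              + adversarial_loss(D_X(x), D_X(fake_x.detach())))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    return g_loss.item(), d_loss.item()
```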
According to the training method of the image enhancement model provided by the embodiment of the invention, a target image sample set is acquired, comprising a plurality of first target images and a plurality of second target images, the contrast of the first target images being lower than that of the second target images; target images in the target image sample set are input into a cycle-consistent generative adversarial network (CycleGAN) model to obtain predicted images, wherein each generation network of the CycleGAN model comprises a plurality of U-type networks; the parameters of the CycleGAN model are trained according to a loss function formed from the predicted images and the target images; and the inputting operation is executed again until a target image enhancement model is obtained. By improving the generation networks of the conventional CycleGAN model, a target image enhancement model comprising a plurality of U-type networks is obtained, so that the image enhancement precision and the enhancement effect can be improved.
On the basis of the above embodiment, the structure of the CycleGAN model is further refined. It should be noted here that, for brevity, only the differences from the above embodiment are described in the modified embodiment.
In one embodiment, the CycleGAN model comprises: a first generation network from the sample image domain to the predicted image domain and a second generation network from the predicted image domain to the sample image domain; the first generation network and the second generation network are each composed of a first U-type network and a second U-type network, the input end and the output end of the first U-type network being respectively connected to the input end of the second U-type network; the first U-type network and the second U-type network have the same network structure.
In this embodiment, on the basis of the CycleGAN structure shown in fig. 2, the first generation network G and the second generation network F are each formed by connecting two U-type networks having the same network structure.
Specifically, as shown in fig. 3, the two identical U-type networks are a first U-type network and a second U-type network; the input end and the output end of the first U-type network are respectively connected to the input end of the second U-type network, so that the input and the output of the first U-type network together serve as the input of the second U-type network.
When the predicted image is output, the output image of the second U-type network is taken as the predicted image, and the output of the first U-type network serves only as a prior result. During parameter training of the CycleGAN model, the outputs of the first U-type network and of the second U-type network are jointly used to calculate the loss function value and optimize the generation network.
In this embodiment, two connected U-type networks serve as the generation network of the CycleGAN model, which improves the image enhancement precision of the model; the method is therefore suitable for document-photo enhancement scenarios with extremely high precision requirements, yielding clearly legible results from photographed documents.
In one embodiment, the network structure of the U-type network includes: a first preset number of consecutive upsampling blocks and a first preset number of consecutive downsampling blocks; wherein the consecutive upsampling blocks and the consecutive downsampling blocks are connected by a hole convolution block, and the hole convolution block includes: a second preset number of hole (dilated) convolutions.
Illustratively, fig. 4 shows the network structure of a conventional U-Net, which requires a deep sampling process of 7 downsampling and 7 upsampling steps for a 256 × 256 × 3 image so that the output of the U-Net can cover a large enough field of view of the picture information. Using two such U-type networks as the generation network of the CycleGAN model would make the network too deep, significantly increasing the hardware requirements for a photograph of the same size. Therefore, to reduce the occupation of hardware resources, a hole convolution block is added between the downsampling convolutions and the upsampling convolutions, so that a single convolution obtains a very large field of view; this guarantees the field of view that deep sampling provides in the conventional U-Net while reducing the number of sampling steps and thus the consumption of hardware resources.
Specifically, as shown in fig. 5, the network structure of the U-type network includes: a first preset number of consecutive upsampling blocks and a first preset number of consecutive downsampling blocks, with the consecutive upsampling blocks and the consecutive downsampling blocks connected by a hole convolution block. The hole convolution block is formed by a second preset number of hole convolutions with concatenation (splicing) operations.
In the forward propagation process, the output of the downsampling blocks is used as the input of the first hole convolution; within the hole convolution block, the input and the output of each hole convolution are concatenated, the output of each concatenation is used as the input of the next hole convolution, and the output of the last concatenation is used as the output of the hole convolution block, which is then fed into the consecutive upsampling blocks.
It should be noted that the first preset number of upsampling and downsampling blocks and the second preset number of hole convolutions in the hole convolution block are related to each other; the second preset number may be determined by the size of the feature map after the picture has passed through the first preset number of consecutive downsampling blocks. The first preset number and the second preset number may be set according to actual requirements, and the embodiment of the present invention is not limited thereto.
Exemplarily, fig. 6 is a schematic structural diagram of a hole convolution block. As shown in fig. 6, the block is formed by concatenating 4 hole convolutions; each hole convolution has a stride of 1 and a 3 × 3 kernel, and the spacing between two adjacent sampled values in the hole convolution is set by a dilation coefficient L; when L is 1, the hole convolution is the same as an ordinary convolution. For example, the coefficients of the four hole convolutions in the hole convolution block shown in fig. 6 are 1, 2, 3 and 5, respectively.
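A minimal PyTorch sketch of such a hole convolution block, following the concatenation scheme described above with dilation coefficients 1, 2, 3 and 5; the channel counts are assumptions:

```python
import torch
import torch.nn as nn


class HoleConvBlock(nn.Module):
    """Hole (dilated) convolution block: four 3x3 convolutions with
    stride 1 and dilation rates 1, 2, 3 and 5; the input and output of
    each convolution are concatenated and fed to the next one, and the
    last concatenation is the block output (5x the input channels here)."""

    def __init__(self, channels):
        super().__init__()
        self.convs = nn.ModuleList()
        in_ch = channels
        for rate in (1, 2, 3, 5):
            # padding == dilation keeps the spatial size fixed for a 3x3 kernel
            self.convs.append(nn.Conv2d(in_ch, channels, kernel_size=3,
                                        stride=1, padding=rate, dilation=rate))
            in_ch += channels  # the next convolution sees the concatenation

    def forward(self, x):
        feat = x
        for conv in self.convs:
            feat = torch.cat([feat, conv(feat)], dim=1)
        return feat
```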
In this embodiment, the consecutive upsampling blocks and the consecutive downsampling blocks are connected by a hole convolution block, which guarantees the field of view provided by deep sampling in the conventional U-Net while reducing the number of sampling steps and the consumption of hardware resources.
Pictures generated by generative network models often carry checkerboard-like shading patterns. This is especially obvious when enhancing document photos: in the generated sharp image the artifact is most prominent where the fonts are dense, that is, in the parts of the picture where the color brightness changes frequently and severely. This problem also arises with the U-Net generation network employed in the conventional CycleGAN model.
To address this problem, in one embodiment, the upsampling block comprises: a first activation unit, a bilinear interpolation unit, a convolution unit and a first normalization unit; and the downsampling block comprises: a second activation unit, a hole convolution unit, a maximum pooling unit and a second normalization unit.
Fig. 7 is a schematic structural diagram of the upsampling block in the generation network provided by an embodiment of the present invention. The checkerboard-like shading in pictures generated by the CycleGAN model is often due to the transposed convolution (deconvolution) adopted in the upsampling block of the generation network: when the kernel size and the stride of the transposed convolution are not exactly matched, overlapping regions occur and cause shading in the result, and reasonably choosing the kernel size and stride requires complicated testing and training. Therefore, the embodiment of the invention improves the upsampling block: the input target picture is first upsampled by the bilinear interpolation unit so that its length and width are doubled, and an ordinary convolution then replaces the original transposed convolution operation. Bilinear interpolation is a common interpolation method for size conversion in image processing and is not described in detail here.
Specifically, as shown in fig. 7, the upsampling block in the embodiment of the present invention includes: a first activation unit, a bilinear interpolation unit, a convolution unit and a first normalization unit. The activation function in the first activation unit may be, for example, a ReLU function; the convolution unit has a 5 × 5 kernel with a stride of 1.
Accordingly, the downsampling block is also adjusted to match the improved upsampling block. The downsampling block in the embodiment of the present invention is shown in fig. 8 and specifically includes a second activation unit, a hole convolution unit, a maximum pooling unit and a second normalization unit. The second activation unit may use, for example, the LeakyReLU activation function, a common improvement of ReLU that adds a small gradient in the interval below 0 and can to some extent prevent the death of network nodes. The normalization unit is a common layer following a convolution and can generally accelerate the convergence of the network and improve its stability.
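Both sampling blocks can be sketched as follows; the normalization type (instance normalization here), the dilation rate of the downsampling convolution and the pooling size are illustrative assumptions, since the embodiment does not fix them:

```python
import torch.nn as nn


class UpsampleBlock(nn.Module):
    """Activation -> bilinear x2 -> 5x5 convolution (stride 1) -> norm.
    Bilinear interpolation plus an ordinary convolution replaces transposed
    convolution to avoid checkerboard artifacts."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(in_ch, out_ch, kernel_size=5, stride=1, padding=2),
            nn.InstanceNorm2d(out_ch),  # normalization type is an assumption
        )

    def forward(self, x):
        return self.block(x)


class DownsampleBlock(nn.Module):
    """LeakyReLU -> hole (dilated) convolution -> max pooling -> norm."""

    def __init__(self, in_ch, out_ch, dilation=2):
        super().__init__()
        self.block = nn.Sequential(
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1,
                      padding=dilation, dilation=dilation),
            nn.MaxPool2d(kernel_size=2),
            nn.InstanceNorm2d(out_ch),
        )

    def forward(self, x):
        return self.block(x)
```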
In one embodiment, the loss function comprises: a first adversarial loss corresponding to the first generation network, a second adversarial loss corresponding to the second generation network, and a cycle consistency loss between the first generation network and the second generation network; wherein the cycle consistency loss comprises a weighted sum, with a first preset weight, of a multi-scale picture structural similarity function and a mean absolute error function.
Specifically, image enhancement is not a symmetric, mutually invertible process: some information is inevitably lost during image enhancement, i.e. while the first generator G generates its image. Ideally, all noise would be discarded, including sensor noise introduced during image acquisition and blocking artifacts caused by lossy compression during image storage. However, while the second generator F generates its image, the cycle loss pushes F to produce a picture close to the original, that is, an image that includes the various kinds of noise. This is certainly a significant disturbance for the entire model.
For this reason, the cycle consistency loss comprises a weighted sum, with a first preset weight, of a multi-scale picture structural similarity function and a mean absolute error function. The multi-scale structural similarity function is an image quality metric built on the assumption that the human eye extracts structured information when viewing a picture, and it reflects the degree of image similarity as perceived by humans better than the mean absolute error function (L1 loss) does.
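A sketch of this combined cycle consistency term, assuming the third-party pytorch_msssim package and image tensors scaled to [0, 1]; the default weight `alpha=0.84` is borrowed from Zhao et al.'s study of loss functions for image restoration and is not a value given in this embodiment:

```python
import torch.nn.functional as F_nn
from pytorch_msssim import ms_ssim  # third-party package, assumed available


def cycle_loss_ms_ssim_l1(reconstructed, original, alpha=0.84):
    """Weighted sum of an MS-SSIM term and an L1 (mean absolute error)
    term; alpha is the first preset weight."""
    ssim_term = 1.0 - ms_ssim(reconstructed, original, data_range=1.0)
    l1_term = F_nn.l1_loss(reconstructed, original)
    return alpha * ssim_term + (1.0 - alpha) * l1_term
```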
In one embodiment, the acquiring a target image sample set comprises:
acquiring a paper document and an electronic document;
carrying out image acquisition on the paper document through image acquisition equipment to obtain a first target image in a target image sample set;
and converting the electronic document into a picture format to obtain a second target image in the target image sample set.
The image acquisition equipment may be equipment with an image acquisition function, such as a scanning device or a camera, and may also provide certain image preprocessing functions such as filtering.
Specifically, a document photo of the paper document is collected by the image acquisition equipment and used as a first target image in the target image sample set; the electronic document in Word, pdf or a similar format is converted into a fixed-resolution picture format such as JPEG or PNG and used as a second target image in the target image sample set.
To reduce interference with model training, the first target image may be preprocessed, for example cropped, before being input into the model, so that it has a fixed size, for example 256 × 256, and contains as little non-document area as possible.
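For illustration, the sample-set preparation described above might be scripted as follows, assuming the third-party pdf2image and Pillow packages; the paths, DPI and the simple center-crop strategy are placeholders, not details from the embodiment:

```python
from pathlib import Path

from pdf2image import convert_from_path  # third-party, assumed available
from PIL import Image


def build_second_targets(pdf_path, out_dir, dpi=150):
    """Convert each page of an electronic document into a fixed-resolution
    PNG to serve as a high-contrast second target image."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, page in enumerate(convert_from_path(pdf_path, dpi=dpi)):
        page.save(out_dir / f"page_{i:04d}.png")


def preprocess_first_target(photo_path, size=256):
    """Center-crop and resize a document photo to a fixed 256x256 input,
    keeping as little non-document area as possible (the cropping strategy
    is a simple stand-in for the preprocessing the embodiment mentions)."""
    img = Image.open(photo_path).convert("RGB")
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    return img.crop((left, top, left + side, top + side)).resize((size, size))
```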
Example two
Fig. 9 is a flowchart of an image enhancement method according to a second embodiment of the present invention. This embodiment is applicable to image enhancement of document images; the method may be executed by an image enhancement apparatus, which may be implemented in the form of hardware and/or software and configured in a computer device. As shown in fig. 9, the method includes:
and S210, acquiring an image to be processed.
Specifically, the image to be processed may be acquired by a camera, or may be obtained by screenshot capture.
S220, carrying out image enhancement processing on the image to be processed through the target image enhancement model to obtain an enhanced image.
The target image enhancement model is obtained by iteratively training a cycle-consistent generative adversarial network (CycleGAN) model on a target image sample set. Each generation network of the CycleGAN model includes: a plurality of U-type networks. In the CycleGAN model, the two generation networks have the same structure. In a conventional CycleGAN model, the commonly used generation network is the U-shaped network U-Net: U-Net is a fully convolutional neural network whose left half is a feature extraction network formed by a plurality of downsampling blocks and whose right half is a feature fusion network formed by a plurality of upsampling blocks.
The conventional CycleGAN model is generally suited to processing lower-resolution images; for processing higher-resolution images, the two generation networks of the CycleGAN model in the embodiment of the present invention each comprise a plurality of U-type networks, connected to form the generation network, thereby improving the accuracy of the generative model.
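Applying the trained model at inference time can be sketched as follows, assuming the two-U-type generator interface from the training-method embodiment above and torchvision for tensor conversion:

```python
import torch
from torchvision.transforms.functional import to_pil_image, to_tensor


def enhance(image, generator_g, device="cpu"):
    """Apply the trained X->Y generation network to a to-be-processed
    image (a PIL image here); with the two-U-type generator sketched
    earlier, only the second output is kept as the enhanced image."""
    generator_g.eval().to(device)
    x = to_tensor(image).unsqueeze(0).to(device)
    with torch.no_grad():
        _, y = generator_g(x)  # (prior result, predicted image)
    return to_pil_image(y.squeeze(0).clamp(0.0, 1.0).cpu())
```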
The image enhancement method provided by the embodiment of the invention acquires an image to be processed and performs image enhancement processing on it through the target image enhancement model to obtain an enhanced image. By improving the generative adversarial network in the conventional image enhancement model, a target image enhancement model comprising a plurality of U-type networks is obtained, and the image enhancement precision and the enhancement effect are improved.
EXAMPLE III
Fig. 10 is a schematic structural diagram of a training apparatus for an image enhancement model according to a third embodiment of the present invention. As shown in fig. 10, the apparatus includes: an acquisition module 310, a prediction module 320, a training module 330, and an iteration module 340;
the obtaining module 310 is configured to obtain a target image sample set; the target image sample set comprises: a plurality of first target images and a plurality of second target images; the contrast of the first target image is lower than the contrast of the second target image;
a prediction module 320, configured to input target images in the target image sample set into a cycle-consistent generative adversarial network (CycleGAN) model to obtain predicted images; wherein each generation network of the CycleGAN model comprises: a plurality of U-type networks;
a training module 330, configured to train the parameters of the CycleGAN model according to the loss function formed from the predicted images and the target images;
and an iteration module 340, configured to return to and execute the operation of inputting the target images in the target image sample set into the CycleGAN model to obtain predicted images, until a target image enhancement model is obtained.
Optionally, the CycleGAN model includes: a first generation network from the sample image domain to the predicted image domain and a second generation network from the predicted image domain to the sample image domain;
the first generation network and the second generation network are each composed of a first U-type network and a second U-type network, the input end and the output end of the first U-type network being respectively connected to the input end of the second U-type network; the first U-type network and the second U-type network have the same network structure.
Optionally, the network structure includes: a first preset number of consecutive upsampling blocks and a first preset number of consecutive downsampling blocks;
wherein, the consecutive upsampling blocks are connected with the consecutive downsampling blocks by using a hole convolution block, and the hole convolution block includes: a second predetermined number of hole convolutions.
Optionally, the upsampling block comprises: the device comprises a first activation unit, a bilinear interpolation unit, a convolution unit and a first normalization unit; the downsampling block includes: the device comprises a second activation unit, a hole convolution unit, a maximum pooling unit and a second normalization unit.
Optionally, the loss function includes: a first adversarial loss corresponding to the first generation network, a second adversarial loss corresponding to the second generation network, and a cycle consistency loss between the first generation network and the second generation network;
wherein the cycle consistency loss comprises a weighted sum, with a first preset weight, of a multi-scale picture structural similarity function and a mean absolute error function.
Optionally, the obtaining module 310 is specifically configured to:
acquiring a paper document and an electronic document;
acquiring images of the paper documents through image acquisition equipment to obtain a first target image in the target image sample set;
and converting the electronic document into a picture format to obtain a second target image in the target image sample set.
The training device for the image enhancement model provided by the embodiment of the invention can execute the training method for the image enhancement model provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example four
Fig. 11 is a schematic structural diagram of an image enhancement apparatus according to a fourth embodiment of the present invention. As shown in fig. 11, the apparatus includes: an acquisition module 410 and a processing module 420;
the acquiring module 410 is configured to acquire an image to be processed;
and the processing module 420 is configured to perform image enhancement processing on the image to be processed through the target image enhancement model to obtain an enhanced image.
The image enhancement device provided by the embodiment of the invention can execute the image enhancement method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
EXAMPLE five
FIG. 12 shows a structural schematic diagram of a computer device 10 that may be used to implement an embodiment of the invention. Computer devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The computer device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 12, the computer device 10 includes at least one processor 11 and a memory communicatively connected to the at least one processor 11, such as a read-only memory (ROM) 12 and a random access memory (RAM) 13, wherein the memory stores a computer program executable by the at least one processor; the processor 11 can perform various appropriate actions and processes according to the computer program stored in the ROM 12 or the computer program loaded from the storage unit 18 into the RAM 13. In the RAM 13, various programs and data required for the operation of the computer device 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
A number of components in the computer device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the computer device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 11 performs the various methods and processes described above, such as a training method of an image enhancement model or an image enhancement method.
In some embodiments, the method of training the image enhancement model or the method of image enhancement may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the computer device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the training method of the image enhancement model or the image enhancement method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured by any other suitable means (e.g., by means of firmware) to perform a training method of the image enhancement model or an image enhancement method.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), load programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Computer programs for implementing the methods of the present invention can be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium may include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service are overcome.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired result of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A training method for an image enhancement model, comprising:
acquiring a target image sample set; wherein the target image sample set comprises: a plurality of first target images and a plurality of second target images; and the contrast of the first target images is lower than the contrast of the second target images;
inputting the target images in the target image sample set into a cycle generative adversarial network model to obtain a predicted image; wherein a generation network of the cycle generative adversarial network model comprises: a plurality of U-shaped networks;
training parameters of the cycle generative adversarial network model according to a loss function formed from the predicted image and the target image; and
returning to the operation of inputting the target images in the target image sample set into the cycle generative adversarial network model to obtain predicted images, until a target image enhancement model is obtained.
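As an illustrative sketch only (not the claimed implementation), the iterative training recited in claim 1 could be realized in PyTorch along the following lines; all names (G_ab, G_ba, D_a, D_b, the optimizers, and the least-squares adversarial objective) are assumptions introduced for illustration:

    import torch
    import torch.nn.functional as F

    def adv_loss(pred, is_real):
        # Least-squares adversarial loss; one common choice, not fixed by the claims.
        target = torch.ones_like(pred) if is_real else torch.zeros_like(pred)
        return F.mse_loss(pred, target)

    def train_step(G_ab, G_ba, D_a, D_b, real_a, real_b, g_opt, d_opt, cycle_fn):
        # Domain A = low-contrast first target images;
        # domain B = high-contrast second target images.
        fake_b = G_ab(real_a)   # predicted high-contrast image
        fake_a = G_ba(real_b)   # predicted low-contrast image
        rec_a = G_ba(fake_b)    # A -> B -> A reconstruction
        rec_b = G_ab(fake_a)    # B -> A -> B reconstruction

        # Generator update: adversarial terms plus cycle-consistency terms.
        g_loss = (adv_loss(D_b(fake_b), True) + adv_loss(D_a(fake_a), True)
                  + cycle_fn(rec_a, real_a) + cycle_fn(rec_b, real_b))
        g_opt.zero_grad(); g_loss.backward(); g_opt.step()

        # Discriminator update on real images and detached predicted images.
        d_loss = (adv_loss(D_a(real_a), True) + adv_loss(D_a(fake_a.detach()), False)
                  + adv_loss(D_b(real_b), True) + adv_loss(D_b(fake_b.detach()), False))
        d_opt.zero_grad(); d_loss.backward(); d_opt.step()
        return g_loss.item(), d_loss.item()

Repeating train_step over the sample set until the loss converges would yield the trained generator G_ab as the target image enhancement model.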
2. The method of claim 1, wherein the cycle generative adversarial network model comprises: a first generation network from a sample image domain to a predicted image domain and a second generation network from the predicted image domain to the sample image domain;
the first generation network and the second generation network each consist of a first U-shaped network and a second U-shaped network, wherein the input end of the first U-shaped network and the output end of the first U-shaped network are each connected to the input end of the second U-shaped network; and the first U-shaped network and the second U-shaped network have the same network structure.
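One possible reading of the connection recited in claim 2 is sketched below; the class name TwoStageGenerator and the channel-wise concatenation used to merge the first network's input and output are assumptions, since the claim does not state how the two signals are combined:

    import torch
    import torch.nn as nn

    class TwoStageGenerator(nn.Module):
        # The input end and the output end of the first U-shaped network both
        # feed the input end of the second U-shaped network (claim 2); here
        # they are concatenated along the channel dimension (an assumption).
        def __init__(self, unet1, unet2):
            super().__init__()
            self.unet1 = unet1
            self.unet2 = unet2

        def forward(self, x):
            y = self.unet1(x)                            # output of the first U-shaped network
            return self.unet2(torch.cat([x, y], dim=1))  # input + output fed to the second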
3. The method of claim 2, wherein the network structure comprises: a first preset number of consecutive upsampling blocks and a first preset number of consecutive downsampling blocks;
wherein the consecutive upsampling blocks are connected to the consecutive downsampling blocks by a dilated convolution block, and the dilated convolution block comprises: a second preset number of dilated convolutions.
4. The method of claim 3, wherein the upsampling block comprises: a first activation unit, a bilinear interpolation unit, a convolution unit, and a first normalization unit; and the downsampling block comprises: a second activation unit, a dilated convolution unit, a max-pooling unit, and a second normalization unit.
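A minimal PyTorch sketch of the blocks recited in claims 3 and 4; the activation function (ReLU), the normalization type (instance normalization), the kernel sizes, and the dilation rates are assumptions, as the claims leave these unspecified:

    import torch.nn as nn

    class UpBlock(nn.Module):
        # Upsampling block (claim 4):
        # activation -> bilinear interpolation -> convolution -> normalization.
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.block = nn.Sequential(
                nn.ReLU(inplace=True),                               # first activation unit
                nn.Upsample(scale_factor=2, mode="bilinear",
                            align_corners=False),                    # bilinear interpolation unit
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # convolution unit
                nn.InstanceNorm2d(out_ch),                           # first normalization unit
            )

        def forward(self, x):
            return self.block(x)

    class DownBlock(nn.Module):
        # Downsampling block (claim 4):
        # activation -> dilated convolution -> max pooling -> normalization.
        def __init__(self, in_ch, out_ch, dilation=2):
            super().__init__()
            self.block = nn.Sequential(
                nn.ReLU(inplace=True),                               # second activation unit
                nn.Conv2d(in_ch, out_ch, kernel_size=3,
                          padding=dilation, dilation=dilation),      # dilated convolution unit
                nn.MaxPool2d(2),                                     # max-pooling unit
                nn.InstanceNorm2d(out_ch),                           # second normalization unit
            )

        def forward(self, x):
            return self.block(x)

    class DilatedBridge(nn.Module):
        # Bridge (claim 3): a "second preset number" of consecutive dilated
        # convolutions connecting the two sampling paths.
        def __init__(self, ch, num_convs=2):
            super().__init__()
            self.block = nn.Sequential(*[
                nn.Conv2d(ch, ch, kernel_size=3, padding=2, dilation=2)
                for _ in range(num_convs)])

        def forward(self, x):
            return self.block(x)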
5. The method of any one of claims 2-4, wherein the loss function comprises: a first adversarial loss corresponding to the first generation network, a second adversarial loss corresponding to the second generation network, and a cycle-consistency loss between the first generation network and the second generation network;
wherein the cycle-consistency loss comprises a weighted sum, weighted by a first preset weight, of a multi-scale structural similarity (MS-SSIM) function and a mean absolute error (MAE) function.
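A hedged sketch of the cycle-consistency loss of claim 5; the weight value 0.84 is a common choice from the image-restoration literature rather than a value given in the claims, and the MS-SSIM implementation is taken from the third-party pytorch_msssim package for illustration:

    import torch.nn.functional as F
    from pytorch_msssim import ms_ssim  # third-party package, assumed available

    def cycle_consistency_loss(reconstructed, original, alpha=0.84):
        # Weighted sum of an MS-SSIM term and a mean-absolute-error term;
        # alpha plays the role of the "first preset weight". Images are
        # assumed to be scaled to [0, 1].
        ms_ssim_term = 1.0 - ms_ssim(reconstructed, original, data_range=1.0)
        mae_term = F.l1_loss(reconstructed, original)  # mean absolute error
        return alpha * ms_ssim_term + (1.0 - alpha) * mae_term

This function could be passed as cycle_fn to the training-step sketch given after claim 1.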
6. An image enhancement method, comprising:
acquiring an image to be processed; and
performing image enhancement processing on the image to be processed by using the target image enhancement model trained by the method of any one of claims 1-5, to obtain an enhanced image.
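A brief usage sketch of the inference flow in claim 6; the preprocessing (RGB conversion and scaling to [0, 1] via ToTensor) is an assumption about the model's expected input format:

    import torch
    from PIL import Image
    from torchvision import transforms

    def enhance_image(model, image_path):
        # Acquire the image to be processed and run it through the trained
        # target image enhancement model.
        model.eval()
        x = transforms.ToTensor()(Image.open(image_path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            y = model(x).clamp(0.0, 1.0)   # enhanced image tensor
        return transforms.ToPILImage()(y.squeeze(0))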
7. An apparatus for training an image enhancement model, comprising:
an acquisition module, configured to acquire a target image sample set; wherein the target image sample set comprises: a plurality of first target images and a plurality of second target images; and the contrast of the first target images is lower than the contrast of the second target images;
a prediction module, configured to input the target images in the target image sample set into a cycle generative adversarial network model to obtain predicted images; wherein a generation network of the cycle generative adversarial network model comprises: a plurality of U-shaped networks;
a training module, configured to train parameters of the cycle generative adversarial network model according to a loss function formed from the predicted images and the target images; and
an iteration module, configured to return to the operation of inputting the target images in the target image sample set into the cycle generative adversarial network model to obtain predicted images, until a target image enhancement model is obtained.
8. An image enhancement apparatus, comprising:
an acquisition module, configured to acquire an image to be processed; and
a processing module, configured to perform image enhancement processing on the image to be processed by using the target image enhancement model trained by the method of any one of claims 1-5, to obtain an enhanced image.
9. A computer device, characterized in that the computer device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the training method of an image enhancement model according to any one of claims 1-5 or the image enhancement method according to claim 6.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions which, when executed, cause a processor to perform the training method of an image enhancement model according to any one of claims 1-5 or the image enhancement method according to claim 6.
CN202210151826.2A 2022-02-18 2022-02-18 Training method, device, equipment and medium of image enhancement model and image enhancement method, device, equipment and medium Pending CN114529469A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210151826.2A CN114529469A (en) 2022-02-18 2022-02-18 Training method, device, equipment and medium of image enhancement model and image enhancement method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN114529469A true CN114529469A (en) 2022-05-24

Family

ID=81622697

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210151826.2A Pending CN114529469A (en) 2022-02-18 2022-02-18 Training method, device, equipment and medium of image enhancement model and image enhancement method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN114529469A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109255769A (en) * 2018-10-25 2019-01-22 厦门美图之家科技有限公司 Training method, training model and image enhancement method of an image enhancement network
US20200286208A1 (en) * 2019-03-08 2020-09-10 International Business Machines Corporation Neural network based enhancement of intensity images
CN111161191A (en) * 2019-12-31 2020-05-15 华南理工大学 Image enhancement method
CN111524205A (en) * 2020-04-23 2020-08-11 北京信息科技大学 Image coloring processing method and device based on loop generation countermeasure network
CN112488243A (en) * 2020-12-18 2021-03-12 北京享云智汇科技有限公司 Image translation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN, YUSHENG: "Deep Photo Enhancer: Unpaired Learning for Image Enhancement from Photographs with GANs", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 23 June 2018, pages 3-6 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115456860A (en) * 2022-11-09 2022-12-09 深圳市唯特视科技有限公司 Image enhancement method and device based on FPGA, helmet, equipment and medium

Similar Documents

Publication Publication Date Title
WO2018166438A1 (en) Image processing method and device and electronic device
CN112132959B (en) Digital rock core image processing method and device, computer equipment and storage medium
CN108154222B (en) Deep neural network training method and system and electronic equipment
Li et al. Multifocus Image Fusion Using Wavelet‐Domain‐Based Deep CNN
Vitoria et al. Semantic image inpainting through improved wasserstein generative adversarial networks
Zhu et al. Self‐guided filter for image denoising
CN111340820B (en) Image segmentation method and device, electronic equipment and storage medium
CN112651451B (en) Image recognition method, device, electronic equipment and storage medium
CN114255337A (en) Method and device for correcting document image, electronic equipment and storage medium
CN116721460A (en) Gesture recognition method, gesture recognition device, electronic equipment and storage medium
CN114529469A (en) Training method, device, equipment and medium of image enhancement model and image enhancement method, device, equipment and medium
CN113837965B (en) Image definition identification method and device, electronic equipment and storage medium
CN114708301A (en) Motion artifact identification method and device, storage medium and electronic equipment
CN116703925B (en) Bearing defect detection method and device, electronic equipment and storage medium
CN113221842A (en) Model training method, image recognition method, device, equipment and medium
CN113569855A (en) Tongue picture segmentation method, equipment and storage medium
CN116403064B (en) Picture processing method, system, equipment and medium
CN117593187A (en) Remote sensing image super-resolution reconstruction method based on meta-learning and transducer
CN114863450B (en) Image processing method, device, electronic equipment and storage medium
CN116129417A (en) Digital instrument reading detection method based on low-quality image
CN113989152A (en) Image enhancement method, device, equipment and storage medium
Li et al. Research on Image Denoising and Super‐Resolution Reconstruction Technology of Multiscale‐Fusion Images
CN114943995A (en) Training method of face recognition model, face recognition method and device
Gao et al. Multiscale phase congruency analysis for image edge visual saliency detection
CN115019057A (en) Image feature extraction model determining method and device and image identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination