CN116205820A - Image enhancement method, target identification method, device and medium - Google Patents


Info

Publication number
CN116205820A
Authority
CN
China
Prior art keywords
image
model
network
diffusion
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310315222.1A
Other languages
Chinese (zh)
Inventor
王浩然
汪磊
李瑮
毛晓蛟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Keda Technology Co Ltd
Original Assignee
Suzhou Keda Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Keda Technology Co Ltd filed Critical Suzhou Keda Technology Co Ltd
Priority to CN202310315222.1A
Publication of CN116205820A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/001Model-based coding, e.g. wire frame
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/002Image coding using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The application relates to an image enhancement method, a target identification method, a device, and a medium, belonging to the field of computer technology. The method comprises: inputting a target image to be enhanced into a pre-trained image enhancement network; encoding the target image with a self-encoder in the image enhancement network; inputting the obtained encoding features into a first diffusion model in the image enhancement network for enhancement; and fusing the obtained enhanced encoding features with the encoding features and inputting the fused features into a self-decoder in the image enhancement network to obtain an enhanced image corresponding to the target image. When the image enhancement network is trained, the output result of the self-encoder is corrected based on a preset lookup table, and the image enhancement network is trained based on the corrected encoding result. The method solves the problem that conventional neural network models for image enhancement have poor enhancement effects on images with many missing features; it improves the adaptability of the network to extreme scenes and the enhancement effect on the target image.

Description

Image enhancement method, target identification method, device and medium
Technical Field
The application relates to an image enhancement method, a target identification method, equipment and a medium, and belongs to the technical field of computers.
Background
Image enhancement refers to techniques that process features in an image to improve its visual effect and quality. Image enhancement can be applied in scenes such as indoor monitoring and checkpoint (bayonet) monitoring, where it can raise the brightness of low-illumination images. As the preset quality requirement rises, image enhancement generally also needs to restore detail information such as the color and texture of the image.
To improve the enhancement effect, conventional image enhancement algorithms are implemented based on neural network models, including but not limited to the following:
First kind: convolutional neural networks (Convolutional Neural Networks, CNN). During training, low-quality images (i.e., images that require enhancement) are used as input and high-quality images (i.e., images that do not require enhancement) are used as training targets, and the network is iteratively trained with a loss function. During image enhancement, the target image to be enhanced is input into the trained CNN, and the enhanced image is output.
Second kind: generative adversarial networks (Generative Adversarial Networks, GAN). Low-quality images are used as input and high-quality images as training targets, and iterative training is carried out through the adversarial game between the generator and the discriminator. During image enhancement, the target image to be enhanced is input into the trained generator, and the enhanced image is output.
However, conventional neural network models for image enhancement have poor enhancement effects on images with many missing features; for example, effective enhancement cannot be achieved under low illumination or when the target to be identified is partially missing or occluded.
Disclosure of Invention
The application provides an image enhancement method, a target identification method, equipment and a medium, which can improve the enhancement effect on images with more missing features. The application provides the following technical scheme:
in a first aspect, there is provided an image enhancement method, the method comprising:
acquiring a target image to be enhanced;
inputting the target image into a pre-trained image enhancement network, encoding the target image through a self-encoder in the image enhancement network, inputting the obtained encoding features into a first diffusion model in the image enhancement model for enhancement, merging the obtained enhanced encoding features and the encoding features, and inputting the merged encoding features into a self-decoder in the image enhancement model to obtain an enhanced image corresponding to the target image;
the output result of the self-encoder is corrected based on a preset lookup table during training of the image enhancement network, and the image enhancement network is trained based on the corrected encoding result.
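As a rough illustration of the flow in this first aspect, the sketch below wires a self-encoder, a first diffusion model, and a self-decoder together in PyTorch. The class and method names (for example `diffusion.sample`) and the additive fusion of the enhanced and original features are assumptions for illustration, not part of the disclosure.

```python
# Illustrative sketch of the first-aspect inference pipeline (assumptions: PyTorch-style
# modules; class/method names and the additive fusion are hypothetical).
import torch
import torch.nn as nn

class ImageEnhancementNetwork(nn.Module):
    def __init__(self, encoder: nn.Module, diffusion: nn.Module, decoder: nn.Module, total_steps: int = 1000):
        super().__init__()
        self.encoder = encoder        # self-encoder: target image -> encoding features
        self.diffusion = diffusion    # first diffusion model, operating in the feature domain
        self.decoder = decoder        # self-decoder: fused features -> enhanced image
        self.total_steps = total_steps

    @torch.no_grad()
    def forward(self, target_image: torch.Tensor) -> torch.Tensor:
        coding = self.encoder(target_image)                          # encode the target image
        enhanced = self.diffusion.sample(coding, self.total_steps)   # enhanced encoding features
        fused = enhanced + coding                                    # fuse enhanced and original features
        return self.decoder(fused)                                   # enhanced image
```

Note that, consistent with the description, the preset lookup table does not appear in this inference sketch; it is only used during training.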
Optionally, the training process of the image enhancement network includes:
acquiring a first image, wherein the first image meets a preset quality requirement;
iteratively training an initial self-encoder and an initial self-decoder based on the first image and the preset lookup table to learn first model parameters of the self-encoder and the self-decoder, so as to obtain a trained self-encoder and a trained self-decoder;
acquiring a second image and a third image, wherein the quality of the second image is higher than that of the third image, and the quality of the second image meets the preset quality requirement;
and fixing the first model parameters, and performing iterative training on an initial diffusion model based on the second image, the third image and the preset lookup table to learn the second model parameters of the first diffusion model so as to obtain the first diffusion model.
Optionally, the fixing the first model parameter, performing iterative training on an initial diffusion model based on the second image, the third image and the preset lookup table to learn the second model parameter of the first diffusion model, so as to obtain the first diffusion model, including:
Respectively inputting the second image and the third image into the trained self-encoder to obtain encoded data corresponding to the second image and encoded data corresponding to the third image;
correcting the coded data corresponding to the second image by using the preset lookup table to obtain corrected coded data;
inputting the coded data corresponding to the third image and the diffusion step number t into an initial diffusion model to obtain a model output result; t is a positive integer;
after the model output result and the coded data corresponding to the third image are fused, inputting the trained self-decoder to obtain a network output result corresponding to the third image;
inputting the coded data corresponding to the second image into the trained self-decoder to obtain a network output result corresponding to the second image;
determining loss information of the initial diffusion model based on the model output result, the corrected coding data, the network output result corresponding to the second image and the network output result corresponding to the third image;
and carrying out iterative training on the initial diffusion model based on the loss information to obtain the second model parameters.
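The listed steps can be pictured as one training iteration roughly like the sketch below, assuming PyTorch and frozen first model parameters; the helper `lut.correct` and the simplified two-term loss are placeholders for the loss information detailed in the following paragraphs.

```python
# Illustrative single training iteration for the initial diffusion model.
# Assumptions: PyTorch; the encoder/decoder (first model parameters) are frozen;
# the loss shown is a simplified stand-in for the loss information described below.
import torch
import torch.nn.functional as F

def diffusion_train_step(encoder, decoder, diffusion, lut, second_img, third_img, t, optimizer):
    with torch.no_grad():
        z_hq = encoder(second_img)        # encoded data of the second (high-quality) image
        z_lq = encoder(third_img)         # encoded data of the third (degraded) image
        z_hq_corr = lut.correct(z_hq)     # correct with the preset lookup table (hypothetical helper)
        out_hq = decoder(z_hq)            # network output corresponding to the second image
    model_out = diffusion(z_lq, t)        # model output for diffusion step t
    out_lq = decoder(model_out + z_lq)    # network output for the third image (fused, frozen decoder)
    loss = F.mse_loss(model_out, z_hq_corr) + F.mse_loss(out_lq, out_hq)  # simplified loss information
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```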
Optionally, the determining the loss information of the initial diffusion model based on the model output result, the modified encoded data, the network output result corresponding to the second image, and the network output result corresponding to the third image includes:
acquiring actual noise added when the coded data corresponding to the third image is taken as input to perform t-step forward diffusion;
determining a t-th diffusion result obtained when the initial diffusion model carries out t-th diffusion based on the corrected coding data;
determining the model output result obtained after t-step reverse diffusion is carried out on the t-step diffusion result;
determining the predictive noise added during forward diffusion based on the model output result and the t-th diffusion result;
determining a first loss value based on the actual noise and the predicted noise;
determining a second loss value based on the network output result corresponding to the second image, the network output result corresponding to the third image, the intermediate result of the self-decoder when determining the network output result corresponding to the second image, and the intermediate result of the self-decoder when determining the network output result corresponding to the third image;
The loss information is determined based on the first loss value and the second loss value.
Optionally, each second image corresponds to at least two third images;
accordingly, the determining the loss information based on the first loss value and the second loss value includes:
obtaining a t-th diffusion result corresponding to different third images;
determining a third loss value based on the at least two third images and the t-th diffusion result corresponding to each third image;
the loss information is determined based on the first loss value, the second loss value, and the third loss value.
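A hedged sketch of one possible consistency term is shown below; penalizing deviation of the t-th diffusion results (one per third image degraded from the same second image) from their mean is an illustrative choice, not the disclosed formulation.

```python
# Illustrative third (consistency) loss: the t-th diffusion results obtained for different
# third images of the same second image are pulled toward each other.
import torch
import torch.nn.functional as F

def consistency_loss(t_step_results):
    # t_step_results: list of tensors, one t-th diffusion result per third image
    stacked = torch.stack(t_step_results)          # (num_third_images, ...)
    mean = stacked.mean(dim=0, keepdim=True)
    return F.mse_loss(stacked, mean.expand_as(stacked))
```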
Optionally, the iterative training is performed on the initial diffusion model based on the second image, the third image and the preset lookup table, so as to learn the second model parameters of the first diffusion model, and after obtaining the first diffusion model, the method further includes:
acquiring a fourth image and a fifth image, wherein the quality of the fourth image is higher than that of the fifth image, and the quality of the fourth image meets the preset quality requirement;
fixing the model parameters of the self-encoder and the second model parameters, iteratively training the trained self-decoder based on the fourth image and the fifth image to adjust the model parameters of the self-decoder, and determining the self-decoder based on the adjusted model parameters.
Optionally, the acquiring the second image and the third image includes:
acquiring the second image;
and carrying out degradation processing on the second image by using a pre-trained degradation model to obtain the third image, wherein the degradation model is obtained by using a degradation image set through training, and the degradation images in the degradation image set comprise images corresponding to different image enhancement scenes to be carried out.
Optionally, the pre-trained degradation model includes a second diffusion model, and the degradation image used by the second diffusion model is obtained by image acquisition based on the different image enhancement scenes to be performed.
Optionally, the pre-trained degradation model includes a cycle-consistency generative adversarial network, and the targets in the high-quality images of the degradation image set it uses may be the same as or different from the targets in the degradation images.
Optionally, the acquiring the second image and the third image includes:
acquiring the second image;
for each second image, performing superposition degradation operation on the second image according to at least two preset degradation strategies and superposition probabilities corresponding to each degradation strategy;
and outputting the third image under the condition that the number of times of the superposition and degradation operation of the second image reaches the preset number of times.
Optionally, the method further comprises:
determining a teacher model based on the first diffusion model;
the teacher model is used for guiding a pre-established continuous-time student model to carry out distillation training, a trained student network is obtained, the student network is used for carrying out noise sampling in the calculation process of the first diffusion model, and noise data obtained by sampling is used for carrying out reverse diffusion, so that the enhanced coding characteristics are obtained;
wherein the continuous-time student model has a learnable parameter for matching the output of the teacher model at any time step.
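A minimal sketch of such a distillation step is given below, assuming PyTorch; the continuous-time sampling of the step and the mean-squared matching objective are illustrative assumptions.

```python
# Illustrative distillation step: the continuous-time student is trained to match the
# output of the teacher (built from the first diffusion model) at a sampled time step.
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, noisy_features, optimizer, total_steps=1000):
    t = torch.rand(noisy_features.shape[0], device=noisy_features.device) * total_steps
    with torch.no_grad():
        target = teacher(noisy_features, t)        # teacher output at (continuous) time step t
    prediction = student(noisy_features, t)        # student output via its learnable time parameterization
    loss = F.mse_loss(prediction, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```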
Optionally, the first diffusion model is built based on a U-Net neural network, the U-Net neural network comprises a contracted path and an expanded path, the contracted path and the expanded path comprise at least two residual modules, and each residual module comprises a first convolution module and a second convolution module which are sequentially connected;
and after the time feature matrix of the diffusion step t which is currently input to the first diffusion model and subjected to embedded transformation is converted into scalar parameters, the scalar parameters are fused with the output result of the first convolution module, and the fused result is input to the second convolution module.
Optionally, after the target image is input into a pre-trained image enhancement network to obtain an enhanced image corresponding to the target image, the method further includes:
and inputting the enhanced image into a pre-trained target recognition network to obtain a target recognition result.
Optionally, before inputting the enhanced image into a pre-trained target recognition network, the method further includes:
acquiring a fifth image and a target label corresponding to the fifth image;
inputting the fifth image into the image enhancement network to obtain an enhanced fifth image;
and learning model parameters of the target identification network by using the enhanced fifth image and the target label to obtain the target identification network.
In a second aspect, there is provided a target recognition method, the method comprising:
acquiring an image to be identified;
inputting the image to be identified into a pre-trained target identification network to obtain a target identification result corresponding to the image to be identified;
the target recognition network is obtained by training a fifth image based on the image enhancement network by using the enhanced fifth image and a target label corresponding to the fifth image; the image enhancement network comprises a self-encoder, a first diffusion model and a self-decoder which are sequentially connected, wherein the output result of the self-encoder is corrected based on a preset lookup table when the image enhancement network is trained, and the image enhancement network is trained based on the corrected encoding result.
In a third aspect, an electronic device is provided, the device comprising a processor and a memory; the memory stores a program that is loaded and executed by the processor to implement the image enhancement method provided in the first aspect; or the object recognition method provided in the second aspect.
In a fourth aspect, there is provided a computer readable storage medium having stored therein a program for implementing the image enhancement method provided in the first aspect when executed by a processor; or the object recognition method provided in the second aspect.
The beneficial effects of this application include at least the following. The target image is encoded to obtain encoding features, the encoding features are input into the first diffusion model for enhancement, and an enhanced image is then generated based on the enhanced encoding features. On the one hand, because a diffusion model has better enhancement performance than neural networks such as CNN or GAN, introducing the first diffusion model into the image enhancement network can improve the enhancement effect on the target image. On the other hand, the input of a conventional diffusion model is a low-quality image, and the diffusion model itself has no feature extraction capability, so for a target image with many missing features it cannot predict the corresponding residual image to achieve noise reduction; in this embodiment, the target image is first encoded by the self-encoder and the encoding features are input into the first diffusion model, so that even for an image with many missing features the first diffusion model can obtain more image features after the encoder's processing, which improves the adaptability of the first diffusion model, and therefore of the whole network, to extreme scenes (that is, scenes in which the acquired image has many missing features). This solves the problem that conventional neural network models for image enhancement have poor enhancement effects on images with many missing features, and improves the enhancement effect on the target image.
In addition, the coding characteristics of the encoder can be corrected by introducing a preset lookup table in the network training process, and because the preset lookup table is set based on the target characteristics of the target to be identified, the self-encoder can learn model parameters conforming to the target characteristics during the network training, and the first diffusion model and the self-decoder can train based on the coding data conforming to the target characteristics, so that the rationality of the whole network output result is improved.
Meanwhile, the searching process of the preset lookup table is slow, and the trained self-encoder can output the encoding features conforming to the target features, so that the encoding features output by the encoder are not corrected by the preset lookup table in the image enhancement process, and the image enhancement efficiency can be improved under the condition of ensuring the rationality of a network.
In addition, when the initial diffusion model and the first diffusion model are established, the vector of the time feature matrix is converted into scalar parameters in the calculation process, and the scalar parameters and the output result of the first convolution model are fused and calculated, so that the calculated amount can be reduced, and the calculation speed of the model can be improved.
In addition, the model calculation speed can be further improved through model modification, model simplification and the like.
In addition, the accuracy of the overall recovery target feature of the network can be improved and the accuracy of image enhancement can be improved by training through the preset lookup table in the training process of the self-encoder, the first diffusion model and the self-decoder.
In addition, when determining the loss information corresponding to the initial diffusion model, the model output result is combined, and the output result of the whole network is combined, so that the model performance of the diffusion model can be improved.
In addition, consistency loss is added during the training of the first diffusion model, so that the problem that the accuracy of the subsequent target recognition is low due to the fact that the difference of enhancement results of different degraded images corresponding to the same image is large can be prevented, and the target recognition effect of the enhanced image is improved.
In addition, by using the degradation model to acquire the degradation image when training the first diffusion model, on the one hand, the acquisition efficiency of the degradation image can be improved, and on the other hand, since the degradation model can realize various degradation factors simulating the real scene, the authenticity of the degradation image can also be improved.
In addition, by establishing the degradation model based on the diffusion model, the authenticity of the degraded image can be improved, since the diffusion model can learn actual degradation factors. Alternatively, the degradation model is built through a cycle-consistency generative adversarial network, which does not require paired high-quality and low-quality images, so the complexity of collecting training data can be reduced and training efficiency improved. Alternatively, the high-quality image is degraded by probability superposition, which requires no pre-trained neural network model, so degradation efficiency can be improved.
In addition, after the trained self-decoder is obtained, the model parameters of the self-decoder are finely adjusted again, so that the problem that the accuracy of the subsequent target recognition is low due to the fact that the difference between the high-quality image output by the self-decoder and the original low-quality image is large can be prevented, and the accuracy of the target recognition of the enhanced image can be improved.
In addition, by retraining the target recognition network with the enhanced image, the applicability of the recognition network to the enhanced image can be improved, thereby improving the recognition accuracy.
In addition, noise sampling is carried out in the calculation process of the first diffusion model by using the trained student network, so that the sampling efficiency of the diffusion model can be improved, and the calculation speed of the model can be improved.
In addition, the enhanced image to be identified or the image to be identified which is not processed by the image enhancement model can be input into the target identification network, and the universality of the target identification network can be improved under the condition that the target identification accuracy of the target identification network is ensured.
The foregoing description is only an overview of the technical solutions of the present application, and in order to make the technical means of the present application more clearly understood, it can be implemented according to the content of the specification, and the following detailed description of the preferred embodiments of the present application will be given with reference to the accompanying drawings.
Drawings
FIG. 1 is a flow chart of an image enhancement method provided by one embodiment of the present application;
FIG. 2 is a schematic diagram of the computational principle of a diffusion model provided in one embodiment of the present application;
FIG. 3 is a schematic diagram of an image enhancement process provided by one embodiment of the present application;
FIG. 4 is a flow chart of a training method for an image enhancement model provided in one embodiment of the present application;
FIG. 5 is a flow chart of a method of object identification provided in one embodiment of the present application;
FIG. 6 is a block diagram of an image enhancement apparatus provided in one embodiment of the present application;
FIG. 7 is a block diagram of an object recognition device provided in one embodiment of the present application;
FIG. 8 is a block diagram of a training apparatus for image enhancement models provided in one embodiment of the present application;
fig. 9 is a block diagram of an electronic device provided in one embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and embodiments. The following examples are illustrative of the present application but are not intended to limit its scope.
Optionally, the image enhancement method provided in each embodiment is used in an electronic device, where the electronic device is a device with computing capability, such as a terminal or a server, and the terminal may be a mobile phone, a computer, a tablet computer, a scanner, an electronic eye, a monitoring camera, and the embodiment does not limit the type of the electronic device.
Fig. 1 is a flowchart of an image enhancement method according to an embodiment of the present application, where the method includes at least the following steps:
step 101, obtaining a target image to be enhanced.
The target image to be enhanced refers to an image to be input to the image enhancement model.
In one example, the target image may be acquired by the electronic device (where the electronic device has image acquisition capabilities), or may be transmitted by another device; the target image may be a frame of image in a video stream, or may be a single picture, and the implementation of the target image is not limited in this embodiment.
In another example, the target image is obtained after extracting a target area from an original image acquired by the electronic device. Since the enhanced image is generally used for object recognition, and the object to be recognized during object recognition generally occupies only a part of the original image, by extracting the target region from the original image, the redundant part of the original image can be removed, the calculation amount of the image enhancement model can be reduced, and the calculation accuracy of the image enhancement model can be improved. At this time, the extracted target image includes the target to be identified.
Optionally, in the present application, the object to be identified includes, but is not limited to: the implementation manner of the target to be identified is not limited in this embodiment, such as a face area, a head-shoulder area, a whole car area, or a license plate area.
For example, the electronic device extracts a 256×256 sRGB picture from the original image to obtain the target image. In actual implementation, the size of the target image may be smaller or larger; this embodiment does not limit the size of the target image.
Means of target region extraction include, but are not limited to:
first kind: and determining a target area based on the preset position of the original image, and cutting the target area to obtain a target image. In some scenarios, since the position of the object to be identified in the original image is relatively fixed, such as: in the vehicle bayonet, the target to be identified generally appears in the central area of the original image, so that the accuracy of extracting the target image can be ensured and the extraction efficiency can be improved by determining the target area based on the preset position.
The preset position is determined based on a fixed position where the object to be identified appears in the original image, and the preset position may be an image center point, an image upper edge center, or an image lower edge, etc., which is not limited by the implementation manner of the preset position in this embodiment.
Second kind: and receiving a selection operation of a target area in the original image, and cutting the area indicated by the selection operation to obtain the target image. Selection operations include, but are not limited to: the present embodiment does not limit the implementation of the selection operation for the frame selection of the target area or the operation of moving the clipping window of a fixed size. At this time, the accuracy of extracting the target region can be improved by selecting the target region of the original image by the user.
Third kind: performing target detection on the original image to obtain a target area; and cutting the target area to obtain a target image.
The manner in which the target region is extracted is not specifically described in this embodiment.
Optionally, the target image is an image that does not meet a preset quality requirement. If the target image is required to be subjected to target recognition after being enhanced, the preset quality requirement can be determined based on the recognition capability of the target recognition network, namely, the image meeting the preset quality requirement can be recognized by the target recognition network, and the image not meeting the preset quality requirement cannot be recognized by the target recognition network.
In other embodiments, the preset quality requirement may also be determined based on a user setting, and the setting manner of the preset quality requirement is not limited in this embodiment.
Wherein the preset quality requirements include, but are not limited to, at least one of the following: the image brightness is greater than the brightness threshold; the image resolution is greater than a resolution threshold; the image definition is greater than the definition threshold; the image noise level is lower than a preset noise level; in other embodiments, the preset quality requirement may be other requirements determined based on parameters affecting the image quality, and the specific content of the preset quality requirement is not limited in this embodiment.
At this time, the electronic device may compare the current image with the preset quality requirement; if the preset quality requirement is satisfied, the current image does not need to be processed as a target image; if the preset quality requirement is not satisfied, the current image is taken as the target image and step 102 is triggered.
In actual implementation, the electronic device may directly process the current image as the target image without comparing the current image with the preset quality requirement, and at this time, the preset quality requirement is not required to be set in the electronic device.
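For illustration, a preset quality check might look like the sketch below; the brightness and resolution criteria and their threshold values are assumptions, and further criteria such as sharpness or noise level could be added in the same way. If the requirement is not met, the current image is taken as the target image and step 102 is triggered.

```python
# Illustrative check against a preset quality requirement (criteria and thresholds are
# assumptions for demonstration only).
import numpy as np

def meets_quality_requirement(image, brightness_threshold=80.0, min_resolution=(128, 128)):
    arr = np.asarray(image, dtype=np.float32)
    luminance = arr.mean(axis=-1) if arr.ndim == 3 else arr        # rough per-pixel brightness
    brightness_ok = luminance.mean() > brightness_threshold        # image brightness > threshold
    h, w = luminance.shape[:2]
    resolution_ok = h >= min_resolution[0] and w >= min_resolution[1]  # resolution > threshold
    return brightness_ok and resolution_ok
```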
Step 102, inputting the target image into a pre-trained image enhancement network, encoding the target image by a self-encoder in the image enhancement network, inputting the obtained encoding features into a first diffusion model in an image enhancement model for enhancement, and inputting the obtained enhanced encoding features and the obtained encoding features into a self-decoder in the image enhancement model after fusion, so as to obtain an enhanced image corresponding to the target image.
The output result of the encoder is corrected based on a preset lookup table during training of the image enhancement network, and the image enhancement network is trained based on the corrected encoding result.
The diffusion model (including the first diffusion model in the present embodiment) is a process of gradually denoising data from pure noise data through neural network learning, and referring to fig. 2, the diffusion model generally includes two diffusion processes:
1. A fixed forward diffusion process: in this step, Gaussian noise is gradually added to the input data x_0 until a pure-noise image x_T is obtained.
2. A learnable reverse diffusion process: in this step, the pure-noise image is progressively denoised until the image x_0 output by the model is obtained.
Specifically, for a diffusion model of T steps, each step has an index t. During forward diffusion, some Gaussian noise is randomly generated at each step from the data input to the diffusion model, and the generated noise is gradually added to the input data; when T is sufficiently large, the resulting noisy image approximates a Gaussian noise image, for example T = 1000 in the Denoising Diffusion Probabilistic Model (DDPM). During reverse diffusion, starting from the noise image x_T (the real image plus noise during training, random noise during sampling), a neural network learns the noise added from x_{t-1} to x_t, and the image is then gradually denoised to obtain the final image to be generated.
In a conventional diffusion model, an image to be noise reduced is generally input to the diffusion model. However, if the image is acquired in an extreme environment, the feature loss of the image is serious, and at this time, the noise reduction effect of the diffusion model is poor.
In this embodiment, instead of directly inputting the target image to the first diffusion model, the target image is encoded first, and the obtained encoding feature is input to the first diffusion model, where the first diffusion model fits the residual error of the target image in the feature domain, so that the adaptability to the extreme environment can be improved.
Specifically, referring to the calculation process of the image enhancement model shown in Fig. 3: the target picture I is fed into the self-encoder to obtain the coding feature I_z; the coding feature I_z, the random noise code N_t, and the current diffusion step number t (initialized to the total diffusion step number T) are fed into the first diffusion model to obtain the noise feature z_t added at each step of diffusion for the current step number; the residual image N_{t-1} obtained by the current reverse diffusion is calculated based on z_t; whether t equals 1 is then determined; if not, N_t is replaced by N_{t-1}, t = t − 1, and the coding feature I_z is fed unchanged into the first diffusion model again, looping the diffusion to obtain the corresponding residual image N_{t-1}; if yes, I_z + N_0 is fed into the self-decoder to obtain the enhanced image I′.
The residual image N_{t-1} predicted at each reverse diffusion step is computed by the following formula:

N_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( N_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}} \, z_t \right) + \sigma_t z

where, when t = T, N_t is a Gaussian noise matrix, and when t < T, N_t is the residual image output by the previous reverse diffusion step; \sigma_t is a hyperparameter associated with the t-th diffusion step and represents the variance parameter of the t-th diffusion step; when t is greater than 1 (i.e., t = T, …, 2), z is a random Gaussian matrix sampled from \mathcal{N}(0, I), where \mathcal{N}(0, I) denotes the standard normal distribution; when t = 1, z = 0; \alpha_t = 1 - \beta_t denotes the diffusion coefficient of the t-th diffusion step and \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s, where \beta_t is the Gaussian distribution parameter of the t-th diffusion step; z_t denotes the noise feature added at each step of diffusion for the current diffusion step number.
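Under the formula above, the loop of Fig. 3 can be sketched roughly as follows, assuming a standard DDPM-style noise schedule and an illustrative call signature for the first diffusion model.

```python
# Illustrative reverse-diffusion loop for the process of Fig. 3 (assumptions: the betas
# follow a standard DDPM schedule; the diffusion model's call signature is hypothetical).
import torch

def enhance_features(diffusion, I_z, T, betas):
    alphas = 1.0 - betas
    alphas_bar = torch.cumprod(alphas, dim=0)
    N_t = torch.randn_like(I_z)                       # N_T: Gaussian noise matrix
    for t in range(T, 0, -1):
        z_t = diffusion(I_z, N_t, t)                  # predicted per-step noise feature
        coef = (1 - alphas[t - 1]) / torch.sqrt(1 - alphas_bar[t - 1])
        mean = (N_t - coef * z_t) / torch.sqrt(alphas[t - 1])
        z = torch.randn_like(N_t) if t > 1 else torch.zeros_like(N_t)
        sigma_t = torch.sqrt(betas[t - 1])            # one common choice of variance parameter
        N_t = mean + sigma_t * z                      # N_{t-1}
    return I_z + N_t                                   # fused enhanced coding features for the self-decoder
```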
Optionally, if the target image is an image extracted from the original image, after obtaining the enhanced image, the enhanced image may be pasted back to the target area corresponding to the original image.
In one example, after pasting the enhanced image back to the original image, the color difference and/or brightness difference between the enhanced image and the original image may also be compared; and correcting the color and/or brightness of the enhanced image under the condition that the color difference and/or brightness difference is larger than a preset threshold value so as to reduce the color difference and/or brightness difference.
Such as: in fig. 3, the enhanced image output from the decoder is input to a color correction module to correct the color and/or brightness of the enhanced image, so as to obtain a corrected image.
According to the above, the enhanced image output by the image enhancement network is generally used for target recognition, but for a target image with more missing features, the coded features output after the encoder encodes the target image may deviate greatly from the target features of the target to be recognized. At this time, there is a problem in that the result of the image enhancement network output is poor in rationality.
In this embodiment, the image enhancement network includes three parts connected in sequence: a self-encoder, a first diffusion model, and a self-decoder; the method has the advantages that the coding characteristics of the encoder can be corrected by introducing the preset lookup table in the network training process, and because the preset lookup table is set based on the target characteristics of the target to be identified, model parameters conforming to the target characteristics can be learned by the self-encoder during network training, and the first diffusion model and the self-decoder can be trained based on the coding data conforming to the target characteristics, so that the rationality of the whole network output result is improved.
Generally, the searching process of the preset lookup table is slower, and the trained self-encoder can output the coding features conforming to the target features, so in the embodiment, the coding features output by the encoder are not corrected by the preset lookup table in the image enhancement process, and the image enhancement efficiency can be improved under the condition of ensuring the rationality of the network. In other words, in this embodiment, the preset lookup table is only used in the network training process, and the preset lookup table is not present in the image enhancement process.
Illustratively, the preset lookup table may be implemented as a codebook (codebook) structure, in which feature data of the target feature is pre-stored. The specific process of training the image enhancement network using the preset lookup table is described in the following embodiments, which are not described herein.
In summary, in the image enhancement method provided in this embodiment, the target image is encoded to obtain encoding features, the encoding features are input into the first diffusion model for enhancement, and an enhanced image is then generated based on the enhanced encoding features. On the one hand, because a diffusion model has better enhancement performance than neural networks such as CNN or GAN, introducing the first diffusion model into the image enhancement network can improve the enhancement effect on the target image. On the other hand, the input of a conventional diffusion model is a low-quality image, and the diffusion model itself has no feature extraction capability, so for a target image with many missing features it cannot predict the corresponding residual image to achieve noise reduction; in this embodiment, the target image is first encoded by the self-encoder and the encoding features are input into the first diffusion model, so that even for an image with many missing features the first diffusion model can obtain more image features after the encoder's processing, which improves the adaptability of the first diffusion model, and therefore of the whole network, to extreme scenes (that is, scenes in which the acquired image has many missing features). This solves the problem that conventional neural network models for image enhancement have poor enhancement effects on images with many missing features, and improves the enhancement effect on the target image.
In addition, the coding characteristics of the encoder can be corrected by introducing a preset lookup table in the network training process, and because the preset lookup table is set based on the target characteristics of the target to be identified, the self-encoder can learn model parameters conforming to the target characteristics during the network training, and the first diffusion model and the self-decoder can train based on the coding data conforming to the target characteristics, so that the rationality of the whole network output result is improved.
Meanwhile, the searching process of the preset lookup table is slow, and the trained self-encoder can output the encoding features conforming to the target features, so that the encoding features output by the encoder are not corrected by the preset lookup table in the image enhancement process, and the image enhancement efficiency can be improved under the condition of ensuring the rationality of a network.
Based on the above embodiments, a training process of the image enhancement network is described below. Referring to fig. 4, the training process includes the following steps:
in step 401, a first image is acquired.
The first image refers to an image used to train the self-encoder and the self-decoder, and the number of first images is generally large. The first image meets the preset quality requirement. The first image may be acquired by the electronic device, obtained from an existing image database, or downloaded from a network; this embodiment does not limit the manner in which the first image is acquired.
Step 402, performing iterative training on the initial self-encoder and the initial self-decoder based on the first image and a preset lookup table to learn first model parameters of the self-encoder and the self-decoder, thereby obtaining a trained self-encoder and a trained self-decoder.
The initial self-encoder (same model structure as the trained self-encoder) and the initial self-decoder (same model structure as the trained self-decoder) are built based on a U-Net neural network.
Each first image serves as both an input for training the self-encoder and the self-decoder and a training target, thereby training the image restoration capabilities of the self-encoder and the self-decoder.
Performing iterative training on the initial self-encoder and the initial self-decoder based on the first image and the preset lookup table to learn the first model parameters of the self-encoder and the self-decoder, so as to obtain the trained self-encoder and the trained self-decoder, includes: inputting the first image into the initial self-encoder to obtain encoded data; determining the template encoded data in the preset lookup table with the highest similarity to the encoded data, so as to obtain corrected encoded data; inputting the corrected encoded data into the initial self-decoder to obtain a decoding result; and determining loss information corresponding to the initial self-encoder and the initial self-decoder based on the first image, the decoding result, and the corrected encoded data, so as to iteratively train the initial self-encoder and the initial self-decoder based on the loss information and learn the first model parameters, thereby obtaining the trained self-encoder and the trained self-decoder.
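Assuming the preset lookup table is stored as a codebook matrix and similarity is measured by Euclidean distance (both assumptions), the correction step might be sketched as follows.

```python
# Illustrative correction of the encoder output with the preset lookup table (codebook):
# each encoded feature vector is replaced by the template entry with the highest similarity.
import torch

def correct_with_codebook(encoded, codebook):
    # encoded: (N, C) encoder outputs; codebook: (K, C) pre-stored target-feature templates
    dists = torch.cdist(encoded, codebook)    # pairwise distances between features and templates
    idx = dists.argmin(dim=1)                 # most similar template per feature
    return codebook[idx]                      # corrected (template) encoded data
```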
In one example, the first model parameters are derived by phased iterative training, with different loss functions used at different stages. Illustratively, the iterative training phase of the first model parameters includes a first phase and a second phase, the first phase using a first loss function and the second phase using a second loss function.
The first loss function is used for training the image reconstruction capability of the self-encoder and the self-decoder and training the consistency of the coding result of the self-encoder and the searching result from the preset lookup table.
The first loss function includes a first term for determining loss of image reconstruction from the encoder and the self decoder, a second term for determining loss of a preset lookup table lookup process, and a third term for constraining output of the self encoder to be consistent with the lookup result.
Schematically, the first loss function L_{ac} is represented by the following formula:

L_{ac} = \lVert x_{hq} - \hat{x}_{hq} \rVert^2 + \lVert \mathrm{sg}[E(x_{hq})] - \mathrm{codebook}[E(x_{hq})] \rVert^2 + \lVert E(x_{hq}) - \mathrm{sg}[\mathrm{codebook}[E(x_{hq})]] \rVert^2

where x_{hq} is the first image input to the initial self-encoder and \hat{x}_{hq} is the decoding result obtained after x_{hq} passes through the self-encoder and the self-decoder; sg denotes the stop-gradient operation; codebook denotes the lookup (or mapping) process of the preset lookup table; E(x_{hq}) denotes the encoded data initially output by the self-encoder; and codebook[E(x_{hq})] denotes the corrected encoded data.
The second loss function is used to train the consistency of the image output from the decoder with the original image.
Schematically, the second loss function is an adversarial loss. In this case, the second loss function L_{gan} is represented by the following formula:

L_{gan} = \log D(x_{hq}) + \log\left(1 - D(\hat{x}_{hq})\right)

where D denotes a separate discriminator used for adversarial training with the self-encoder and self-decoder; D(x_{hq}) denotes the discrimination result of the discriminator for the first image; and D(\hat{x}_{hq}) denotes the discrimination result of the discriminator for the decoding result obtained after the first image is processed sequentially by the self-encoder and the self-decoder.
In the iterative training process of the first stage, first model parameters are learned based on the loss information output by the first loss function.
In the iterative training process of the second stage, the first model parameters are continuously learned based on the loss information output by the first loss function and the second loss function.
Optionally, learning the first model parameter based on the loss information output by the first loss function and the second loss function includes: acquiring a first weight of a first loss function and a second weight of a second loss function; determining the sum of the product of the loss information output by the first loss function and the first weight and the product of the loss information output by the second loss function and the second weight to obtain a weighted sum; the first model parameters are learned based on the weighted sum.
The first weight is greater than the second weight; for example, the first weight may be 1 and the second weight may be 0.01. In actual implementation, the values of the first weight and the second weight may be other values.
In other embodiments, the sum of the loss information output by the first loss function and the loss information output by the second loss function may be directly calculated, and the usage of the first loss function and the second loss function is not limited in this embodiment.
Optionally, the iteration count threshold of the first-stage iterative training is the same as that of the second stage, for example 25k iterations each; alternatively, the two thresholds differ, for example the first-stage threshold is smaller than the second-stage threshold. This embodiment does not limit the iteration count threshold of each stage.
Step 403, acquiring a second image and a third image, wherein the quality of the second image is higher than that of the third image, and the quality of the second image meets the preset quality requirement.
The second image and the third image refer to images used to train the first diffusion model. The data set to which the second image belongs may be the same as or different from the data set to which the first image belongs.
In one example, the second image is acquired in an acquisition environment that meets the preset quality requirement, and the third image is acquired in an acquisition environment that does not meet the preset quality requirement. However, the number of second and third images required is often very large, which makes acquiring them in this way inefficient.
In another example, acquiring the second image and the third image includes: acquiring a second image; performing degradation treatment on the second image by using a pre-trained degradation model to obtain a third image; the degradation model is obtained by training a degradation image set, and the degradation images in the degradation image set comprise images corresponding to different image enhancement scenes to be performed.
Because the degradation image set comprises images corresponding to different image enhancement scenes to be performed, various degradation factors can be covered, and a degradation model obtained through training of the degradation image set can also cover the different degradation factors, so that a third image obtained through degradation of the degradation model can also cover the different degradation factors, different degradation conditions in a simulated reality environment are realized, and the model performance of a first diffusion model obtained through the third image is ensured.
Optionally, implementations of the degradation model include, but are not limited to, the following:
first kind: the pre-trained degradation model comprises a second diffusion model, and the degradation image used by the second diffusion model is obtained by image acquisition based on different scenes to be subjected to image enhancement.
The second diffusion model is used for degrading the image meeting the preset quality requirement into an image not meeting the preset quality requirement. In training, the image input to the second diffusion model is an image satisfying the preset quality requirement (i.e., hereinafter, a high-quality image), and the training target is a residual image between an image not satisfying the preset quality requirement (i.e., a degraded image) and an image satisfying the preset quality requirement.
In the mode, the degradation image set for training the degradation model is acquired based on a real scene, at the moment, the degradation factors caused by the environment and the degradation factors caused during image shooting can be acquired, various degradation conditions can be covered, the richness and the authenticity of the degradation image set are ensured, and therefore the model performance of the obtained first diffusion model is improved.
Second kind: the pre-trained degradation model includes a cycle-consistency generative adversarial network, and the targets in the high-quality images of the degradation image set it uses may be the same as or different from the targets in the degradation images. In other words, the degraded images do not need to correspond one-to-one with the high-quality images in the degradation image set, which can improve the acquisition efficiency of the second and third images.
In yet another example, acquiring the second image and the third image includes: acquiring a second image; for each second image, performing superposition degradation operation on the second image according to at least two preset degradation strategies and superposition probabilities corresponding to each degradation strategy; and outputting a third image when the number of the superposition reducing operation of the second image reaches a preset number.
The method for performing superposition and degradation operation on the second image according to at least two preset degradation strategies and superposition probabilities corresponding to each degradation strategy comprises the following steps: randomly generating superposition probability corresponding to each degradation strategy; and performing superposition operation on the second image by using a degradation strategy with superposition probability larger than a probability threshold value to obtain a second image subjected to degradation.
Alternatively, the preset number of times is 1 or at least two, and in an actual shooting scene, the same degradation factor may be introduced a plurality of times, and therefore, by setting the preset number of times to at least two, the simulation effect of the degradation operation may be improved, and the authenticity of the third image may be improved.
Illustratively, the at least two degradation strategies cover various degradation factors that cause image degradation, including but not limited to at least two of the following: brightness increase or decrease, Gaussian noise addition, Poisson noise addition, Gaussian blur, motion blur, and JPEG compression degradation.
By acquiring the degraded image in the above way, the acquisition efficiency of the third image can be improved, so that the training sample of the first diffusion model is increased, and the performance of the first diffusion model after training is improved.
In addition, the same degradation can be used for different second images, such as: the second diffusion model is used for degradation, or a different degradation mode may be used for degradation, such as: one part uses the second diffusion model to degrade, the other part uses the probability superposition mode to degrade, and the embodiment does not limit the degradation mode of the second image.
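A minimal sketch of the probability-superposition degradation described above is given below; the probability threshold, repeat count, and strategy set are assumptions based on the description, not fixed values from the disclosure.

```python
# Illustrative probability-superposition degradation of a second image.
import random

def degrade(second_image, strategies, prob_threshold=0.5, preset_times=2):
    img = second_image
    for _ in range(preset_times):                 # superpose degradation the preset number of times
        for apply_strategy in strategies:         # e.g. add_gaussian_noise, motion_blur, jpeg_compress
            p = random.random()                   # randomly generated superposition probability
            if p > prob_threshold:                # apply only strategies whose probability exceeds the threshold
                img = apply_strategy(img)
    return img                                     # third (degraded) image
```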
Step 404, fixing the first model parameters, and performing iterative training on the initial diffusion model based on the second image, the third image and the preset lookup table to learn the second model parameters of the first diffusion model, so as to obtain the first diffusion model.
The traditional diffusion model is built on a U-Net neural network whose structure generally comprises, connected in sequence: a first convolution module, four consecutive contraction paths, a center module, four consecutive expansion paths, and a second convolution module. The first convolution module and the second convolution module have the same structure, each consisting of a two-dimensional convolution layer and a Mish layer. Each contraction path is formed by two consecutive residual modules followed by a down-sampling layer; the center module consists of two consecutive residual modules; each expansion path is mainly formed by two consecutive residual modules followed by an up-sampling layer. The residual modules of the contraction paths, the center module and the expansion paths share the same structure: each residual module consists of two consecutive convolution modules, the output of the first convolution module in the residual module is added to the time feature matrix t_e of the current t-th diffusion step and then input into the second convolution module, and the result processed by the two convolution modules is added back through the residual connection to form the output of the residual module. The convolution modules in the residual modules of the contraction paths, the center module and the expansion paths have the same structure as the first convolution module and the second convolution module.
The output of the first convolution module is added to the low-resolution image information I_e output by the low-resolution encoder and then input into the first contraction path. The output of the first contraction path is, on the one hand, input into the second contraction path and, on the other hand, added to the output of the third expansion path and then input into the fourth expansion path; the output of the second contraction path is, on the one hand, input into the third contraction path and, on the other hand, added to the output of the second expansion path and then input into the third expansion path; the output of the third contraction path is, on the one hand, input into the fourth contraction path and, on the other hand, added to the output of the first expansion path and then input into the second expansion path; the output of the fourth contraction path is, on the one hand, input into the center module and, on the other hand, added to the output of the center module and then input into the first expansion path.
However, the conventional diffusion model generally involves a large amount of computation, which leads to low training efficiency and low computational efficiency. In addition, when such a diffusion model is directly applied to the image enhancement scene of the present application, the electronic device may lack an operating environment for the model and thus be unable to support it.
Based on the above technical problems, in the present embodiment, a conventional diffusion model is improved, and the improvement is described below.
In the first aspect, the first diffusion model and the initial diffusion model are established based on a U-Net neural network, where the U-Net neural network comprises a contraction path and an expansion path, the contraction path and the expansion path comprise at least two residual modules, and each residual module comprises a first convolution module and a second convolution module connected in sequence. The time feature matrix, obtained by applying an embedding transformation to the diffusion step t currently input to the first diffusion model, is converted into scalar parameters; the scalar parameters are fused with the output result of the first convolution module, and the fused result is input into the second convolution module.
For the other parts of the first diffusion model and the initial diffusion model, and for the related description of the contraction path and the expansion path, reference is made to the conventional diffusion model described above; details are not repeated here. Unlike the conventional diffusion model, this embodiment no longer adds the time feature matrix t_e directly to the output result of the first convolution module. Instead, the time feature matrix t_e is converted into scalar parameters, and the scalar parameters are calculated with the output result of the first convolution module, which reduces the amount of computation and improves computational efficiency without affecting the fusion effect.
In one example, the time feature matrix is converted into scalar parameters by inputting the time feature matrix into a convolution layer and a fully connected layer connected in sequence, to obtain the scalar parameters shift and scale.
Accordingly, the scalar parameters are fused with the output result of the first convolution module by adding shift to the product of scale and the output result of the first convolution module, to obtain the fused result.
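A minimal PyTorch sketch of this conversion and fusion is given below. The module name, the use of a 1x1 convolution, the vector-shaped time embedding and the per-channel shapes of shift and scale are assumptions for illustration; the application only specifies a convolution layer followed by a fully connected layer producing shift and scale, fused as scale times the feature plus shift.

```python
import torch
import torch.nn as nn


class TimeScalarFusion(nn.Module):
    """Convert the time feature into scalar parameters (shift, scale) and fuse them
    with the output of the first convolution module as: scale * x + shift."""

    def __init__(self, time_channels: int, feat_channels: int):
        super().__init__()
        self.conv = nn.Conv1d(time_channels, time_channels, kernel_size=1)   # convolution layer
        self.fc = nn.Linear(time_channels, 2 * feat_channels)                # fully connected layer -> shift, scale

    def forward(self, x: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W), output of the first convolution module
        # t_emb: (B, time_channels), embedded diffusion step t
        h = self.conv(t_emb.unsqueeze(-1)).squeeze(-1)
        shift, scale = self.fc(h).chunk(2, dim=1)
        shift = shift[:, :, None, None]
        scale = scale[:, :, None, None]
        return scale * x + shift          # fused result fed into the second convolution module
```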
In a second aspect, the present embodiment may also make model modifications to the U-Net neural network.
Generally, the network structure of the first convolution module and the second convolution module includes a convolution layer, a normalization layer, and an activation layer connected in sequence. The activation layer is implemented with a SiLU (Sigmoid Linear Unit) activation function. In this embodiment, the amount of model computation can be further reduced by replacing the SiLU with a rectified linear unit (Rectified Linear Unit, ReLU).
In addition, in some embodiments an attention module is connected after the first convolution module and the second convolution module; this attention module may be a QKV (Query-Key-Value) attention mechanism. In this embodiment, the QKV attention mechanism is replaced with a channel attention mechanism, which further reduces the amount of model computation.
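A channel attention block in this spirit can be sketched as the squeeze-and-excitation style module below; the reduction ratio and the exact layer layout are assumptions, since the application only states that the QKV attention is replaced by channel attention.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Lightweight channel attention used in place of QKV self-attention."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # global spatial average per channel
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),                   # ReLU, matching the SiLU -> ReLU replacement above
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.mlp(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                  # reweight channels instead of computing QKV attention
```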
In a third aspect, the present embodiment may further perform model reduction on a U-Net neural network.
Schematically, the number of channels in the first two of the four contraction paths and in the last two of the four expansion paths is reduced to half of the original number, which reduces model complexity and the amount of model computation.
The above improvements can reduce the amount of model computation without affecting the performance of the image enhancement model.
When training the initial diffusion model, fixing the first model parameters and iteratively training the initial diffusion model based on the second image, the third image and the preset lookup table, so as to learn the second model parameters of the first diffusion model and obtain the first diffusion model, includes the following steps: inputting the second image and the third image respectively into the trained self-encoder to obtain encoded data z_hq corresponding to the second image and encoded data z_lq corresponding to the third image; correcting the encoded data z_hq corresponding to the second image by using the preset lookup table to obtain corrected encoded data codebook[z_hq]; inputting the encoded data z_lq corresponding to the third image and the diffusion step number t into the initial diffusion model to obtain a model output result; fusing the model output result with the encoded data z_lq corresponding to the third image and inputting the fused result into the trained self-decoder to obtain a network output result corresponding to the third image; inputting the encoded data z_hq corresponding to the second image into the trained self-decoder to obtain a network output result corresponding to the second image; determining loss information of the initial diffusion model based on the model output result, the corrected encoded data, the network output result corresponding to the second image and the network output result corresponding to the third image; and iteratively training the initial diffusion model based on the loss information to obtain the second model parameters.
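The data flow of one such training iteration can be sketched as follows. Every handle (self_encoder, lookup_correct, diffusion_model, self_decoder) is an assumed stand-in for the corresponding component, the fusion of the model output with z_lq is shown as a simple addition for illustration, and the loss terms themselves are computed as described below.

```python
import torch


def training_step(second_img, third_img, t, self_encoder, lookup_correct,
                  diffusion_model, self_decoder):
    """One iteration of training the initial diffusion model; the first model parameters
    (self-encoder and self-decoder) are assumed frozen via requires_grad_(False)."""
    with torch.no_grad():
        z_hq = self_encoder(second_img)        # encoded data of the second (high-quality) image
        z_lq = self_encoder(third_img)         # encoded data of the third (degraded) image

    codebook_z_hq = lookup_correct(z_hq)       # correct z_hq with the preset lookup table

    model_out = diffusion_model(z_lq, t)       # model output for the degraded encoding at diffusion step t

    out_lq = self_decoder(model_out + z_lq)    # network output for the third image (fusion shown as addition)
    out_hq = self_decoder(z_hq)                # network output for the second image

    return codebook_z_hq, model_out, out_hq, out_lq   # inputs to the loss information described below
```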
Wherein determining the loss information of the initial diffusion model based on the model output result, the modified encoded data, the network output result corresponding to the second image, and the network output result corresponding to the third image includes:
acquiring the actual noise ε added when the encoded data z_lq corresponding to the third image is taken as input and forward-diffused for t steps; determining, based on the corrected encoded data codebook[z_hq], the t-th-step diffusion result obtained when the initial diffusion model performs t-step diffusion; determining the model output result obtained after t-step reverse diffusion is performed on the t-th-step diffusion result; determining, based on the model output result and the t-th-step diffusion result, the predicted noise ε_σ added during forward diffusion; determining a first loss value based on the actual noise and the predicted noise; determining a second loss value based on the network output result corresponding to the second image, the network output result corresponding to the third image, the intermediate result of the self-decoder when determining the network output result corresponding to the second image, and the intermediate result of the self-decoder when determining the network output result corresponding to the third image; and determining the loss information based on the first loss value and the second loss value.
Schematically, the calculation of the first loss value L_1 is expressed by the following formula:

$$L_1 = \left\| \epsilon - \epsilon_\sigma \right\|_2^2, \qquad z_t = \sqrt{\bar{\alpha}_t}\,\mathrm{codebook}[z_{hq}] + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \qquad \epsilon_\sigma = \frac{z_t - \sqrt{\bar{\alpha}_t}\, z_0}{\sqrt{1 - \bar{\alpha}_t}}$$

wherein codebook[z_hq] represents the corrected encoded data, z_t represents the t-th-step diffusion result, x_0 represents the encoded data corresponding to the third image (the input to the diffusion model), z_0 is the model output result, ε is the actual noise, ε_σ is the predicted noise, and $\bar{\alpha}_t$ denotes the diffusion coefficient of the t-th diffusion step.
The intermediate result of the self-decoder refers to the result output by a specified hidden layer of the self-decoder, or may be the results output by all hidden layers; this embodiment does not limit the source of the intermediate result.
The second loss value is determined based on a perceptual loss function of the self-decoder. Schematically, the second loss value L_p is calculated by the following formula:

$$L_p = \sum_{j} \frac{\gamma_j}{C_j H_j W_j} \left\| \phi_j\!\left(z_{hq}\right) - \phi_j\!\left(z_{lq}\right) \right\|_2^2$$

wherein φ_j represents the j-th network layer of the self-decoder; j ranges over the hidden layers whose output results are taken and the output layer of the self-decoder; C_j, H_j and W_j are respectively the number of channels, the length and the width of the j-th layer of the self-decoder; j is a positive integer; γ_j is the weight of the output result of the j-th layer (which may be an intermediate result or the network output result), the weights corresponding to different network layers being the same or different, and schematically the γ of the output layer being larger than the γ of the hidden layers; z_hq represents the encoded data corresponding to the second image, and z_lq represents the encoded data corresponding to the third image.
In one example, determining the loss information based on the first loss value and the second loss value includes: acquiring weights corresponding to the first loss value and the second loss value; and determining a weighted sum of the first loss value and the second loss value based on the weight to obtain loss information.
For example, the weight of the first loss value is 1 and the weight of the second loss value is 0.01; in other embodiments, the weights corresponding to the first loss value and the second loss value may take other values, and this embodiment does not limit how the weights are set.
In other embodiments, the loss information may be a sum of the first loss value and the second loss value, and the present embodiment is not limited to the manner of determining the loss information.
In some scenarios, target recognition may also need to be performed on the enhanced image. A general target recognition approach is to input the enhanced image into a pre-trained target recognition network to obtain a target recognition result.
Because the image enhancement model performs information completion when the target to be recognized lacks information, and the completed information has a degree of randomness, the capability for target recognition may be reduced to some extent. Based on this, in order to improve the recognition capability of the target recognition network, in one example of this embodiment at least two third images corresponding to each second image may be used when training the first diffusion model, so that consistency between different third images during denoising can be learned. That is, each second image corresponds to at least two third images. Accordingly, determining the loss information based on the first loss value and the second loss value includes: obtaining the t-th-step diffusion results corresponding to the different third images; determining a third loss value based on the at least two third images and the t-th-step diffusion result corresponding to each third image; and determining the loss information based on the first loss value, the second loss value and the third loss value.
Schematically, the calculation process of the third loss value L_s is represented by the following formula:

$$L_s = \left\| \epsilon_\phi(y_t, x_{w\text{-}turb}, t) - \epsilon_\delta(y_t, x_{w\text{-}turb}, t) + cond_\phi - cond_\delta \right\|^2$$

wherein ε corresponds to the t-th-step diffusion results of the third images obtained after the same second image is degraded to different degrees, cond denotes the third images corresponding to the different degrees of degradation, and the subscripts φ and δ index the two differently degraded versions.
The loss information is determined based on the first loss value, the second loss value, and the third loss value, and may be determined based on a weighted sum or a summation, and the determination manner of the loss information is not limited in this embodiment.
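Following the formula above, the consistency term can be sketched as below; the variable names are assumptions, the two branches correspond to the two differently degraded third images of the same second image, and all shapes are assumed to be broadcast-compatible.

```python
import torch


def consistency_third_loss(noise_pred_a, noise_pred_b, cond_a, cond_b):
    """Third loss value: penalize divergence between the predictions obtained for two
    differently degraded versions of the same second image, including the condition difference."""
    return torch.sum((noise_pred_a - noise_pred_b + cond_a - cond_b) ** 2)
```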
In order to improve the recognition capability of the target recognition network, in another example of this embodiment, after the first diffusion model is trained, the model parameters of the self-decoder may be further fine-tuned to improve the adaptation capability of the output result of the self-decoder to the target recognition network.
Specifically, after step 404, the method further includes: acquiring a fourth image and a fifth image; fixing the model parameters of the self-encoder and the second model parameters, iteratively training the trained self-decoder based on the fourth image and the fifth image to adjust the model parameters of the self-decoder, and determining the self-decoder based on the adjusted model parameters.
The quality of the fourth image is higher than that of the fifth image, and the quality of the fourth image meets the preset quality requirement. The data set to which the fourth image belongs is the same as or different from the training set to which the second image belongs, and the data set to which the fifth image belongs is the same as or different from the training set to which the third image belongs.
Illustratively, the target recognition network includes an encoder. Accordingly, iteratively training the trained self-decoder based on the fourth image and the fifth image to adjust the model parameters of the self-decoder includes:
inputting the fourth image I_hq into the encoder in the target recognition network to obtain encoded data corresponding to the fourth image; inputting the fifth image I_lq into the encoder in the target recognition network to obtain encoded data corresponding to the fifth image; inputting the encoded data corresponding to the fourth image and the encoded data corresponding to the fifth image into the first diffusion model and the trained self-decoder respectively, to obtain the current decoding result of the self-decoder; fine-tuning the model parameters of the self-decoder; inputting the encoded data corresponding to the fourth image and the encoded data corresponding to the fifth image into the first diffusion model and the fine-tuned self-decoder respectively, to obtain the adjusted decoding result of the self-decoder; and determining loss information based on the current decoding result and the adjusted decoding result, and iteratively fine-tuning the model parameters based on the loss information to obtain the adjusted model parameters.
In one example, the loss information is determined based on the encoded data corresponding to the fourth image, the intermediate result output by the encoder when determining the encoded data corresponding to the fourth image, the encoded data corresponding to the fifth image, the intermediate result output by the encoder when determining the encoded data corresponding to the fifth image, and the current decoding result and the adjusted decoding result. For example, the calculation process of the loss information L_id is represented by the following formula:
$$L_{id} = \sum_{i} \frac{\gamma'_i}{C_i H_i W_i} \left\| \psi_i(x) - \psi_i(x_{id}) \right\|_2^2$$

wherein ψ_i represents the i-th network layer of the encoder in the target recognition network; I_lq is the fifth image and I_hq is the fourth image, from whose encoded data the decoding results are obtained; x is the current decoding result and x_id is the adjusted decoding result; i ranges over the hidden layers whose output results are taken and the output layer of the encoder in the target recognition network; C_i, H_i and W_i are respectively the number of channels, the length and the width of the i-th layer of the encoder; i is a positive integer; γ'_i is the weight of the output result of the i-th layer (which may be an intermediate result or the network output result), the weights corresponding to different network layers being the same or different, and schematically the γ' of the output layer being larger than the γ' of the hidden layers.
In order to improve the recognition capability of the target recognition network, in another example of this embodiment, before inputting the enhanced image into the pre-trained target recognition network, the method further includes: acquiring a fifth image and a target label corresponding to the fifth image; inputting the fifth image into an image enhancement network to obtain an enhanced fifth image; and learning model parameters of the target recognition network by using the enhanced fifth image and the target tag to obtain the target recognition network.
Retraining the target recognition network with the enhanced images improves the adaptability of the target recognition network to enhanced images, thereby improving the accuracy of target recognition.
Optionally, the fifth image is an image that does not meet the preset quality requirement.
In addition, the initial parameters of the target recognition network in this embodiment can already achieve target recognition on images meeting the preset quality requirement; that is, the target recognition network is obtained by training on images meeting the preset quality requirement and their corresponding target labels, and the enhanced fifth image is then used to adjust the initial parameters of the target recognition network so as to improve its adaptability to enhanced images.
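A minimal fine-tuning loop for this retraining step might look like the following sketch; the network handles, the cross-entropy criterion, the optimizer and its settings are assumptions for illustration rather than choices made by this application.

```python
import torch
import torch.nn as nn


def finetune_recognizer(recognizer, enhance_net, loader, epochs: int = 5, lr: float = 1e-4):
    """Adjust the target recognition network's initial parameters using enhanced fifth images
    and their target labels."""
    criterion = nn.CrossEntropyLoss()
    optim = torch.optim.Adam(recognizer.parameters(), lr=lr)
    enhance_net.eval()
    for _ in range(epochs):
        for fifth_img, target_label in loader:
            with torch.no_grad():
                enhanced = enhance_net(fifth_img)     # enhanced fifth image
            logits = recognizer(enhanced)
            loss = criterion(logits, target_label)
            optim.zero_grad()
            loss.backward()
            optim.step()
    return recognizer
```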
In summary, in this embodiment, when the initial diffusion model and the first diffusion model are established, the time feature matrix is converted into scalar parameters during calculation and fused with the output result of the first convolution module, which reduces the amount of computation and improves the calculation speed of the model.
In addition, the model calculation speed can be further improved through model modification, model simplification and the like.
In addition, using the preset lookup table during the training of the self-encoder, the first diffusion model and the self-decoder improves the accuracy with which the overall network recovers target features, and thus improves the accuracy of image enhancement.
In addition, when determining the loss information of the initial diffusion model, both the model output result and the output result of the whole network are taken into account, which improves the model performance of the diffusion model.
In addition, a consistency loss is added when training the first diffusion model, which prevents large differences between the enhancement results of different degraded images corresponding to the same image from lowering the accuracy of subsequent target recognition, thereby improving the target recognition effect on enhanced images.
In addition, by using the degradation model to acquire the degradation image when training the first diffusion model, on the one hand, the acquisition efficiency of the degradation image can be improved, and on the other hand, since the degradation model can realize various degradation factors simulating the real scene, the authenticity of the degradation image can also be improved.
In addition, by establishing the degradation model based on a diffusion model, the diffusion model can learn actual degradation factors, so the authenticity of the degraded images can be improved. Alternatively, when the degradation model is built as a cycle-consistency generative adversarial network, paired high-quality and low-quality images are not required, which reduces the complexity of acquiring training data and improves training efficiency. Alternatively, degrading high-quality images by the probability superposition mode requires no neural network model to be trained in advance, which improves degradation efficiency.
In addition, after the trained self-decoder is obtained, the model parameters of the self-decoder are finely adjusted again, so that the problem that the accuracy of the subsequent target recognition is low due to the fact that the difference between the high-quality image output by the self-decoder and the original low-quality image is large can be prevented, and the accuracy of the target recognition of the enhanced image can be improved.
In addition, by retraining the target recognition network with the enhanced image, the applicability of the recognition network to the enhanced image can be improved, thereby improving the recognition accuracy.
Optionally, based on the above embodiments, in step 102 the first diffusion model needs to use a sampler for noise sampling during calculation. A traditional sampling method is to sample from the model with a discrete-time denoising diffusion implicit model (Denoising Diffusion Implicit Models, DDIM) sampler. However, the diffusion process of the first diffusion model generally has one thousand steps or more, which makes the calculation efficiency of the model low. Based on this, before step 102 the method further includes: determining a teacher model based on the first diffusion model; and using the teacher model to guide a pre-established continuous-time student model through distillation training to obtain a trained student network, where the student network performs noise sampling during the calculation of the first diffusion model, so that the sampled noise data is used for reverse diffusion to obtain the enhanced coding features. The continuous-time student model has a learnable parameter, and the learnable parameter is used to match the output of the teacher model at any time step.
The distillation training adopts a two-step distillation mode, so as to improve sampling efficiency.
In this embodiment, noise sampling is performed in the first diffusion model calculation process by using the trained student network, so that the sampling efficiency of the diffusion model can be improved, and the model calculation speed can be improved.
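For reference, the conventional DDIM sampler mentioned above performs, at each selected time step, a deterministic update of the form sketched below. This is the standard DDIM update (eta = 0), not the distilled student network of this application; alpha_bar denotes the cumulative diffusion coefficient schedule.

```python
def ddim_step(z_t, noise_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update from step t to the previous selected step."""
    # Predict the clean latent from the current noisy latent and the predicted noise.
    z0_pred = (z_t - (1.0 - alpha_bar_t) ** 0.5 * noise_pred) / (alpha_bar_t ** 0.5)
    # Move to the previous step along the deterministic (eta = 0) trajectory.
    return (alpha_bar_prev ** 0.5) * z0_pred + (1.0 - alpha_bar_prev) ** 0.5 * noise_pred
```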
Based on the foregoing embodiments, the present application further provides a target recognition method, and fig. 5 is a flowchart of the target recognition method provided in one embodiment of the present application, where the method at least includes the following steps:
step 501, an image to be identified is acquired.
The image to be identified may be an enhanced image output in step 102, or may be an image that is not processed by the image enhancement model, which is not limited in type in this embodiment.
Step 502, inputting the image to be identified into a pre-trained target identification network to obtain a target identification result corresponding to the image to be identified.
In one example, the target recognition network is obtained by training with the enhanced fifth image, obtained by enhancing the fifth image with the image enhancement network, and the target label corresponding to the fifth image. The image enhancement network comprises a self-encoder, a first diffusion model and a self-decoder connected in sequence; when the image enhancement network is trained, the output result of the self-encoder is corrected based on a preset lookup table, and the image enhancement network is trained based on the corrected encoding result.
The related description of this example is detailed in the embodiment shown in fig. 4, and this embodiment is not repeated here.
In summary, in this embodiment, the enhanced image to be identified or the image to be identified that is not processed by the image enhancement model may be input into the target identification network, so that the versatility of the target identification network may be improved while the accuracy of target identification of the target identification network is ensured.
Fig. 6 is a block diagram of an image enhancement apparatus provided in one embodiment of the present application. The device at least comprises the following modules: an image acquisition module 610 and an image enhancement module 620.
An image acquisition module 610, configured to acquire a target image to be enhanced;
the image enhancement module 620 is configured to input the target image into a pre-trained image enhancement network, encode the target image by a self-encoder in the image enhancement network, input the obtained encoding feature into a first diffusion model in the image enhancement model for enhancement, and input the obtained enhanced encoding feature and the encoding feature into a self-decoder in the image enhancement model after fusion, so as to obtain an enhanced image corresponding to the target image;
The output result of the self-encoder is corrected based on a preset lookup table during training of the image enhancement network, and the image enhancement network is trained based on the corrected encoding result.
For relevant details reference is made to the method embodiments described above.
It should be noted that: in the image enhancement device provided in the above embodiment, only the division of the above functional modules is used for illustration, and in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the image enhancement device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the image enhancement device and the image enhancement method provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the image enhancement device and the image enhancement method are detailed in the method embodiments and are not repeated herein.
Fig. 7 is a block diagram of an object recognition apparatus provided in one embodiment of the present application. The device at least comprises the following modules: an image acquisition module 710 and a target recognition module 720.
An image acquisition module 710, configured to acquire an image to be identified;
the target recognition module 720 is configured to input the image to be recognized into a pre-trained target recognition network, so as to obtain a target recognition result corresponding to the image to be recognized;
The target recognition network is obtained by training a fifth image based on the image enhancement network by using the enhanced fifth image and a target label corresponding to the fifth image; the image enhancement network comprises a self-encoder, a first diffusion model and a self-decoder which are sequentially connected, the output result of the self-encoder is corrected based on a preset lookup table when the image enhancement network is trained, and the target recognition network is trained based on the corrected encoding result.
For relevant details reference is made to the method embodiments described above.
It should be noted that: in the object recognition device provided in the above embodiment, only the division of the above functional modules is used for illustration, and in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the object recognition device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the target recognition device and the target recognition method provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the target recognition device and the target recognition method are detailed in the method embodiments and are not repeated herein.
Fig. 8 is a block diagram of a training device of an image enhancement network according to an embodiment of the present application, where the image enhancement network includes a self-encoder, a first diffusion model and a self-decoder that are sequentially connected, so that after an input target image is encoded by using the self-encoder, an obtained encoding feature is input into the first diffusion model to be enhanced, and after the obtained enhanced encoding feature and the encoding feature are fused, the obtained enhanced encoding feature and the obtained encoding feature are input into the self-decoder, and an enhanced image corresponding to the target image is obtained. The device at least comprises the following modules: a first acquisition module 810, a first training module 820, a second acquisition module 830, and a second training module 840.
A first obtaining module 810, configured to obtain a first image, where the first image meets a preset quality requirement;
a first training module 820, configured to iteratively train an initial self-encoder and an initial self-decoder based on the first image and the preset lookup table, so as to learn first model parameters of the self-encoder and the self-decoder, and obtain a trained self-encoder and a trained self-decoder;
a second obtaining module 830, configured to obtain a second image and a third image, where a quality of the second image is higher than a quality of the third image, and a quality of the second image meets the preset quality requirement;
The second training module 840 is configured to fix the first model parameters, and perform iterative training on the initial diffusion model based on the second image, the third image, and the preset lookup table, so as to learn the second model parameters of the first diffusion model, and obtain the first diffusion model.
For relevant details reference is made to the method embodiments described above.
It should be noted that: in the training device of the image enhancement network provided in the above embodiment, only the division of the above functional modules is used for illustration when training the image enhancement network, and in practical application, the above functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the training device of the image enhancement network is divided into different functional modules to complete all or part of the functions described above. In addition, the training device of the image enhancement network provided in the above embodiment and the training method embodiment of the image enhancement network belong to the same concept, and detailed implementation processes of the training device and the training method embodiment of the image enhancement network are detailed in the method embodiment and are not repeated here.
Fig. 9 is a block diagram of an electronic device provided in one embodiment of the present application. The device comprises at least a processor 901 and a memory 902.
Processor 901 may include one or more processing cores such as: 4 core processors, 8 core processors, etc. The processor 901 may be implemented in at least one hardware form of DSP (Digital Signal Processing ), FPGA (Field-Programmable Gate Array, field programmable gate array), PLA (Programmable Logic Array ). The processor 901 may also include a main processor and a coprocessor, the main processor being a processor for processing data in an awake state, also referred to as a CPU (Central Processing Unit ); a coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 901 may integrate a GPU (Graphics Processing Unit, image processor) for rendering and drawing of content required to be displayed by the display screen. In some embodiments, the processor 901 may also include an AI (Artificial Intelligence ) processor for processing computing operations related to machine learning.
The memory 902 may include one or more computer-readable storage media, which may be non-transitory. The memory 902 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 902 is used to store at least one instruction for execution by processor 901 to implement the image enhancement or object recognition methods provided by the method embodiments herein.
In some embodiments, the electronic device may further optionally include: a peripheral interface and at least one peripheral. The processor 901, memory 902, and peripheral interfaces may be connected by buses or signal lines. The individual peripheral devices may be connected to the peripheral device interface via buses, signal lines or circuit boards. Illustratively, peripheral devices include, but are not limited to: radio frequency circuitry, touch display screens, audio circuitry, and power supplies, among others.
Of course, the electronic device may also include fewer or more components, as the present embodiment is not limited in this regard.
Optionally, the application further provides a computer readable storage medium, in which a program is stored, the program being loaded and executed by a processor to implement the image enhancement or object recognition method of the above-mentioned method embodiment.
Optionally, the application further provides a computer product, which includes a computer readable storage medium, where a program is stored, and the program is loaded and executed by a processor to implement the image enhancement or object recognition method of the above method embodiment.
The technical features of the above-described embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above-described embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples merely represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the invention. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application is to be determined by the claims appended hereto.

Claims (13)

1. A method of image enhancement, the method comprising:
acquiring a target image to be enhanced;
inputting the target image into a pre-trained image enhancement network, encoding the target image through a self-encoder in the image enhancement network, inputting the obtained encoding features into a first diffusion model in the image enhancement model for enhancement, merging the obtained enhanced encoding features and the encoding features, and inputting the merged encoding features into a self-decoder in the image enhancement model to obtain an enhanced image corresponding to the target image;
the output result of the self-encoder is corrected based on a preset lookup table during training of the image enhancement network, and the image enhancement network is trained based on the corrected encoding result.
2. The method of claim 1, wherein the training process of the image enhancement network comprises:
acquiring a first image, wherein the first image meets a preset quality requirement;
iteratively training an initial self-encoder and an initial self-decoder based on the first image and the preset lookup table to learn first model parameters of the self-encoder and the self-decoder, so as to obtain a trained self-encoder and a trained self-decoder;
acquiring a second image and a third image, wherein the quality of the second image is higher than that of the third image, and the quality of the second image meets the preset quality requirement;
and fixing the first model parameters, and performing iterative training on an initial diffusion model based on the second image, the third image and the preset lookup table to learn the second model parameters of the first diffusion model so as to obtain the first diffusion model.
3. The method of claim 2, wherein the fixing the first model parameters, iteratively training an initial diffusion model based on the second image, the third image, and the preset lookup table to learn second model parameters of the first diffusion model, and obtaining the first diffusion model, comprises:
Respectively inputting the second image and the third image into the trained self-encoder to obtain encoded data corresponding to the second image and encoded data corresponding to the third image;
correcting the coded data corresponding to the second image by using the preset lookup table to obtain corrected coded data;
inputting the coded data corresponding to the third image and the diffusion step number t into an initial diffusion model to obtain a model output result; t is a positive integer;
after the model output result and the coded data corresponding to the third image are fused, inputting the trained self-decoder to obtain a network output result corresponding to the third image;
inputting the coded data corresponding to the second image into the trained self-decoder to obtain a network output result corresponding to the second image;
determining loss information of the initial diffusion model based on the model output result, the corrected coding data, the network output result corresponding to the second image and the network output result corresponding to the third image;
and carrying out iterative training on the initial diffusion model based on the loss information to obtain the second model parameters.
4. The method of claim 3, wherein the determining the loss information for the initial diffusion model based on the model output result, the modified encoded data, the network output result for the second image, and the network output result for the third image comprises:
acquiring actual noise added when the coded data corresponding to the third image is taken as input to perform t-step forward diffusion;
determining a t-th diffusion result obtained when the initial diffusion model carries out t-th diffusion based on the corrected coding data;
determining the model output result obtained after t-step reverse diffusion is carried out on the t-step diffusion result;
determining the predictive noise added during forward diffusion based on the model output result and the t-th diffusion result;
determining a first loss value based on the actual noise and the predicted noise;
determining a second loss value based on the network output result corresponding to the second image, the network output result corresponding to the third image, the intermediate result of the self-decoder when determining the network output result corresponding to the second image, and the intermediate result of the self-decoder when determining the network output result corresponding to the third image;
The loss information is determined based on the first loss value and the second loss value.
5. The method of claim 4, wherein each second image corresponds to at least two third images;
accordingly, the determining the loss information based on the first loss value and the second loss value includes:
obtaining a t-th diffusion result corresponding to different third images;
determining a third loss value based on the at least two third images and the t-th diffusion result corresponding to each third image;
the loss information is determined based on the first loss value, the second loss value, and the third loss value.
6. The method of claim 2, wherein the iteratively training an initial diffusion model based on the second image, the third image, and the predetermined lookup table to learn second model parameters of the first diffusion model, further comprises, after deriving the first diffusion model:
acquiring a fourth image and a fifth image, wherein the quality of the fourth image is higher than that of the fifth image, and the quality of the fourth image meets the preset quality requirement;
fixing the model parameters of the self-encoder and the second model parameters, iteratively training the trained self-decoder based on the fourth image and the fifth image to adjust the model parameters of the self-decoder, and determining the self-decoder based on the adjusted model parameters.
7. The method of claim 2, wherein the acquiring the second image and the third image comprises:
acquiring the second image;
and carrying out degradation processing on the second image by using a pre-trained degradation model to obtain the third image, wherein the degradation model is obtained by using a degradation image set through training, and the degradation images in the degradation image set comprise images corresponding to different image enhancement scenes to be carried out.
8. The method of claim 7, wherein the pre-trained degradation model comprises a second diffusion model using degradation images obtained from image acquisition based on the different image enhancement scenes to be performed; or
the pre-trained degradation model includes a cyclic consistency generation countermeasure network that uses a degradation image set with a same or different target than a target in the degradation image.
9. The method according to any one of claims 1 to 8, wherein the first diffusion model is built based on a U-Net neural network comprising a contracted path and an expanded path, the contracted path and the expanded path comprising at least two residual modules, each residual module comprising a first convolution module and a second convolution module connected in sequence;
And after the time feature matrix of the diffusion step t which is currently input to the first diffusion model and subjected to embedded transformation is converted into scalar parameters, the scalar parameters are fused with the output result of the first convolution module, and the fused result is input to the second convolution module.
10. The method according to any one of claims 1 to 8, wherein the inputting the target image into a pre-trained image enhancement network, after obtaining an enhanced image corresponding to the target image, further comprises:
acquiring a fifth image and a target label corresponding to the fifth image;
inputting the fifth image into the image enhancement network to obtain an enhanced fifth image;
and learning model parameters of the target identification network by using the enhanced fifth image and the target label to obtain the target identification network.
11. A method of target identification, the method comprising:
acquiring an image to be identified;
inputting the image to be identified into a pre-trained target identification network to obtain a target identification result corresponding to the image to be identified;
the target recognition network is obtained by training a fifth image based on the image enhancement network by using the enhanced fifth image and a target label corresponding to the fifth image; the image enhancement network comprises a self-encoder, a first diffusion model and a self-decoder which are sequentially connected, wherein the output result of the self-encoder is corrected based on a preset lookup table when the image enhancement network is trained, and the image enhancement network is trained based on the corrected encoding result.
12. An electronic device comprising a processor and a memory; the memory having stored therein a program loaded and executed by the processor to implement the image enhancement method according to any one of claims 1 to 10; or to implement the object recognition method as claimed in claim 11.
13. A computer-readable storage medium, characterized in that the storage medium has stored therein a program for implementing the image enhancement method according to any one of claims 1 to 10 when executed by a processor; or to implement the object recognition method as claimed in claim 11.
CN202310315222.1A 2023-03-28 2023-03-28 Image enhancement method, target identification method, device and medium Pending CN116205820A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310315222.1A CN116205820A (en) 2023-03-28 2023-03-28 Image enhancement method, target identification method, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310315222.1A CN116205820A (en) 2023-03-28 2023-03-28 Image enhancement method, target identification method, device and medium

Publications (1)

Publication Number Publication Date
CN116205820A true CN116205820A (en) 2023-06-02

Family

ID=86509581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310315222.1A Pending CN116205820A (en) 2023-03-28 2023-03-28 Image enhancement method, target identification method, device and medium

Country Status (1)

Country Link
CN (1) CN116205820A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116524195A (en) * 2023-06-20 2023-08-01 中国科学院深圳先进技术研究院 Semantic segmentation method, semantic segmentation device, electronic equipment and storage medium
CN116524195B (en) * 2023-06-20 2023-12-01 中国科学院深圳先进技术研究院 Semantic segmentation method, semantic segmentation device, electronic equipment and storage medium
CN116758261A (en) * 2023-06-26 2023-09-15 清华大学 Broadband forward-looking imaging radar target identification method based on generation countermeasure network
CN116757965A (en) * 2023-08-16 2023-09-15 小米汽车科技有限公司 Image enhancement method, device and storage medium
CN116757965B (en) * 2023-08-16 2023-11-21 小米汽车科技有限公司 Image enhancement method, device and storage medium

Similar Documents

Publication Publication Date Title
CN110197229B (en) Training method and device of image processing model and storage medium
CN116205820A (en) Image enhancement method, target identification method, device and medium
CN113658051A (en) Image defogging method and system based on cyclic generation countermeasure network
CN110689599A (en) 3D visual saliency prediction method for generating countermeasure network based on non-local enhancement
CN114339409B (en) Video processing method, device, computer equipment and storage medium
CN113934890B (en) Method and system for automatically generating scene video by characters
CN112598579A (en) Image super-resolution method and device for monitoring scene and storage medium
CN111047543A (en) Image enhancement method, device and storage medium
US20220156987A1 (en) Adaptive convolutions in neural networks
CN112887728A (en) Electronic device, control method and system of electronic device
Yuan et al. Single image dehazing via NIN-DehazeNet
Yu et al. Luminance attentive networks for hdr image and panorama reconstruction
CN114723760B (en) Portrait segmentation model training method and device and portrait segmentation method and device
CN116958534A (en) Image processing method, training method of image processing model and related device
CN110516598B (en) Method and apparatus for generating image
CN116206314A (en) Model training method, formula identification method, device, medium and equipment
Zhu et al. Generative high-capacity image hiding based on residual CNN in wavelet domain
US11887277B2 (en) Removing compression artifacts from digital images and videos utilizing generative machine-learning models
CN115984949B (en) Low-quality face image recognition method and equipment with attention mechanism
CN116935166A (en) Model training method, image processing method and device, medium and equipment
CN115115972A (en) Video processing method, video processing apparatus, computer device, medium, and program product
Zhang et al. Adaptive coding unit size convolutional neural network for fast 3D-HEVC depth map intracoding
CN113781324A (en) Old photo repairing method
KR20220130498A (en) Method and apparatus for image outpainting based on deep-neural network
Yang et al. An end‐to‐end perceptual enhancement method for UHD portrait images

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination