CN115131218A - Image processing method, image processing device, computer readable medium and electronic equipment


Info

Publication number
CN115131218A
Authority
CN
China
Prior art keywords
image
sample
loss function
network
output
Prior art date
Legal status
Pending
Application number
CN202110320718.9A
Other languages
Chinese (zh)
Inventor
刘恩雨
李松南
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110320718.9A
Publication of CN115131218A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/77 Retouching; Inpainting; Scratch removal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4007 Scaling of whole images or parts thereof based on interpolation, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30168 Image quality inspection


Abstract

An embodiment of the present application provides an image processing method and apparatus, a computer-readable medium, and an electronic device. The image processing method comprises the following steps: acquiring an image to be processed; inputting the image to be processed into an image generation network, wherein the image generation network is trained with a joint loss function constructed from a loss value between an output image of the image generation network and a target image expected to be generated, and from an authenticity judgment result between the output image and the target image; and acquiring a repaired image, output by the image generation network, for the image to be processed. With this technical solution, an image can be repaired and its texture and detail increased, so that the repaired image is more vivid and the image repairing effect is improved.

Description

Image processing method and device, computer readable medium and electronic equipment
Technical Field
The present application relates to the field of computer and communication technologies, and in particular, to an image processing method and apparatus, a computer-readable medium, and an electronic device.
Background
With the continuous development of artificial intelligence in image processing technology, computer devices can use machine learning techniques to process images or videos in a personalized way, such as image restoration, image enhancement and image segmentation, so as to obtain image processing results that meet users' actual needs.
In image processing technology based on artificial intelligence, image restoration is an important research direction: various kinds of degradation in an image can be repaired and enhanced. However, the related art does not achieve a good restoration effect when repairing images.
Disclosure of Invention
Embodiments of the present application provide an image processing method, an image processing apparatus, a computer-readable medium, and an electronic device, which can repair an image at least to a certain extent and increase image texture and detail, making the repaired image more vivid and improving the image repairing effect.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to an aspect of an embodiment of the present application, there is provided an image processing method including: acquiring an image to be processed; inputting the image to be processed into an image generation network, wherein the image generation network is trained with a joint loss function constructed from a loss value between an output image of the image generation network and a target image expected to be generated, and from an authenticity judgment result between the output image and the target image; and acquiring a repaired image, output by the image generation network, for the image to be processed.
According to an aspect of an embodiment of the present application, there is provided a training method for an image generation network, including: acquiring a plurality of image pairs, wherein each image pair comprises a sample image and a degraded image corresponding to the sample image; inputting the degraded image corresponding to the sample image into a network to be trained to obtain a generated image output by the network to be trained; constructing a joint loss value according to the loss value between the generated image and the sample image and the authenticity judgment result between the generated image and the sample image; and adjusting parameters of the network to be trained according to the joint loss value to obtain an image generation network.
According to an aspect of an embodiment of the present application, there is provided an image processing apparatus including: a first acquisition unit configured to acquire an image to be processed; a first input unit configured to input the image to be processed into an image generation network, wherein the image generation network is trained with a joint loss function constructed from a loss value between an output image of the image generation network and a target image expected to be generated, and from an authenticity judgment result between the output image and the target image; and a second acquisition unit configured to acquire a repaired image, output by the image generation network, for the image to be processed.
In some embodiments of the present application, based on the foregoing solution, the image processing apparatus further includes: a third acquisition unit configured to acquire a plurality of image pairs, each image pair comprising a sample image and a degraded image corresponding to the sample image; a second input unit configured to input the degraded image corresponding to the sample image into a network to be trained to obtain a generated image output by the network to be trained; and a first construction unit configured to construct the joint loss function according to the generated image and the sample image, and to adjust parameters of the network to be trained according to the joint loss function to obtain the image generation network.
In some embodiments of the present application, based on the foregoing solution, the first building unit includes: an input subunit, configured to input the generated image and the sample image to a pre-trained discrimination network, and determine a first loss function according to an output result of the discrimination network; a calculation subunit configured to calculate an image information difference from the generated image and the sample image, and construct a second loss function from the image information difference; a construction subunit configured to construct the joint loss function from the first loss function and the second loss function.
In some embodiments of the present application, based on the foregoing solution, the input subunit is configured to: acquiring a plurality of authenticity judgment results output by the judgment network; carrying out logarithmic operation on each authenticity judgment result to obtain a plurality of operation results; and obtaining the first loss function according to the sum of the operation results.
In some embodiments of the present application, based on the foregoing solution, the third obtaining unit includes: an acquisition subunit configured to acquire a plurality of sample images; a degradation processing subunit configured to perform image degradation processing on each sample image to obtain a degraded image corresponding to each sample image; and a generating subunit configured to generate the image pairs from the sample images and their corresponding degraded images.
In some embodiments of the present application, based on the foregoing solution, the degradation processing subunit is configured to perform image degradation processing on each sample image, including at least one of the following: blurring each sample image, where the blurring process comprises one or more of Gaussian blur and motion blur; downsampling each sample image; enlarging each sample image by interpolation, where the interpolation comprises one or more of bilinear interpolation, bicubic interpolation and nearest-neighbor interpolation; adding noise to each sample image; and compressing each sample image.
In some embodiments of the present application, based on the foregoing solution, the obtaining subunit is configured to: acquiring a plurality of initial images with image backgrounds and image sizes meeting preset sizes; and screening out images of image quality categories with image quality higher than a set value from the plurality of initial images according to the image quality categories of the initial images to obtain the plurality of sample images.
In some embodiments of the present application, based on the foregoing solution, the image processing apparatus further includes: a third input unit, configured to input the plurality of initial images into an image classification model, where the image classification model includes a feature extraction layer and a full connection layer; the feature extraction unit is configured to perform feature extraction on each initial image based on the feature extraction layer to obtain a target feature vector corresponding to each initial image; and the processing unit is configured to perform full connection processing on the target feature vectors through the full connection layer to obtain the image quality categories corresponding to the initial images.
In some embodiments of the present application, based on the foregoing solution, the processing unit is configured to: performing full-connection processing on the target feature vector through the full-connection layer; normalizing the output of the full connection layer to obtain the prediction probability of each initial image corresponding to each image quality category; and taking the image quality category corresponding to the maximum prediction probability in the prediction probabilities as the image quality category corresponding to each initial image.
In some embodiments of the present application, based on the foregoing scheme, the image classification model is trained by: acquiring a training sample set comprising a plurality of sample images, wherein the sample images carry image quality labeling categories; inputting the plurality of sample images into the image classification model to obtain image quality prediction categories corresponding to the sample images output by the image classification model; constructing a target loss function of the image classification model based on the image quality labeling category and the image quality prediction category; and training the image classification model based on the target loss function to obtain a trained image classification model.
In some embodiments of the present application, based on the foregoing scheme, constructing the target loss function of the image classification model based on the image quality labeling category and the image quality prediction category includes: acquiring the difference between the image quality prediction category corresponding to each sample image and the image quality labeling category corresponding to that sample image; and constructing the target loss function of the image classification model according to the sum of all the obtained differences.
According to an aspect of an embodiment of the present application, there is provided a training apparatus for an image generation network, including: a fourth acquisition unit configured to acquire a plurality of image pairs, each image pair comprising a sample image and a degraded image corresponding to the sample image; a fourth input unit configured to input the degraded image corresponding to the sample image into a network to be trained to obtain a generated image output by the network to be trained; a second construction unit configured to construct a joint loss value according to a loss value between the generated image and the sample image and an authenticity judgment result between the generated image and the sample image; and an adjusting unit configured to adjust the parameters of the network to be trained according to the joint loss value to obtain an image generation network.
According to an aspect of an embodiment of the present application, there is provided a computer-readable medium on which a computer program is stored, the computer program, when executed by a processor, implementing the image processing method or the training method of an image generation network as described in the above embodiments.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the image processing method or the training method of the image generation network as described in the above embodiments.
According to an aspect of embodiments herein, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the image processing method or the training method of the image generation network provided in the above-described various optional embodiments.
In the technical solutions provided in some embodiments of the present application, after an image to be processed is obtained, it may be input into an image generation network that performs the image processing; the image generation network generates a repaired image corresponding to the input image to be processed, and is obtained by training with a joint loss function. On the one hand, the loss value between the output image of the image generation network and the target image drives the output image to be close to, or the same as, the target image. On the other hand, because the joint loss function also takes into account the authenticity judgment result between the output image and the target image, the parameters of the image generation network can be adjusted according to this judgment result, ensuring that the output image is judged as true as far as possible; the output image thus approaches the target image so that the fake passes for the real, the missing parts of the image to be processed can be completed to a certain extent, and the function of repairing the image to be processed is realized. Therefore, the technical scheme of the embodiments of the application can increase image texture and detail, making the repaired image more vivid and achieving a better image repairing effect.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 is a schematic diagram illustrating an implementation environment in which embodiments of the present application may be applied;
FIG. 2 shows a flow diagram of an image processing method according to an embodiment of the present application;
FIG. 3 illustrates a flow diagram for deriving an image generation network from image pairs according to one embodiment of the present application;
FIG. 4 illustrates a flow diagram for constructing a joint loss function from a generated image and a sample image according to one embodiment of the present application;
FIG. 5 shows a flow diagram for acquiring a plurality of image pairs according to an embodiment of the present application;
FIG. 6 illustrates a flow chart of inputting an initial image into an image classification model, resulting in an image quality class corresponding to the initial image, according to an embodiment of the present application;
FIG. 7 shows a schematic structural diagram of an image classification model according to an embodiment of the present application;
FIG. 8 is a flowchart illustrating a full join process through a full join layer to obtain an image quality class corresponding to an initial image according to an embodiment of the application;
FIG. 9 shows a flow diagram of training an image classification model according to an embodiment of the present application;
FIGS. 10A-10B illustrate schematic views of a comparison of a face image to be processed before and after image processing according to an embodiment of the present application;
FIG. 11 shows a flow diagram of a method of training an image generation network according to an embodiment of the present application;
FIG. 12 shows a block diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 13 shows a block diagram of a training apparatus of an image generation network according to an embodiment of the present application;
FIG. 14 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
It is to be noted that the terms used in the specification and claims of the present application and the above-described drawings are only for describing the embodiments and are not intended to limit the scope of the present application. It will be understood that the terms "comprises," "comprising," "includes," "including," "has," "having," and the like, when used herein, specify the presence of stated features, integers, steps, operations, elements, components, and/or groups thereof, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It will be further understood that, although the terms first, second, third, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element without departing from the scope of the present application; similarly, a second element could be termed a first element. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It should be understood that in the present application, "at least one" means one or more, "a plurality" means two or more.
Referring to fig. 1, fig. 1 shows a schematic diagram of a scene to which the technical solution of the embodiments of the present application may be applied. The terminal 102 communicates with the server 104 via a network. The server 104 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, web services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data and artificial intelligence platforms. The terminal 102 may be, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a desktop computer, or a wearable device.
Those skilled in the art will appreciate that the number of terminals described above is merely illustrative. There may be any number of terminals, for example, only one terminal, or tens or hundreds of terminal devices, or more, according to implementation requirements. The number and the type of the terminal devices are not limited in the embodiments of the present application.
In an embodiment of the present application, after acquiring the to-be-processed image, the terminal 102 may send the to-be-processed image to the server 104; the server 104 is installed with an image generation network, the image generation network is obtained by training according to a joint loss function, the joint loss function is constructed according to a loss value between an output image of the image generation network and a target image expected to be generated and an authenticity judgment result between the output image and the target image, and the server 104 can input an image to be processed into the image generation network and then acquire a restored image output by the image generation network and aiming at the image to be processed.
In an embodiment of the present application, the server 104 may train an image generation network from a plurality of image pairs: it acquires the plurality of image pairs, each comprising a sample image and a degraded image corresponding to the sample image; inputs the degraded image corresponding to the sample image into the network to be trained to obtain a generated image output by the network to be trained; constructs a joint loss function according to the loss value between the generated image and the sample image and the authenticity judgment result between the generated image and the sample image; and finally adjusts the parameters of the network to be trained according to the joint loss function to obtain the image generation network.
It should be noted that the image processing method provided in the embodiments of the present application is generally executed by the server 104, and accordingly, the image processing apparatus is generally disposed in the server 104. However, in other embodiments of the present application, the terminal may also have functions similar to the server, so as to execute the image processing scheme provided by the embodiments of the present application.
The image processing method provided in the embodiment of the present application may relate to technologies such as a machine learning technology in artificial intelligence, and the artificial intelligence technology and the machine learning technology are explained first below.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines can perceive, reason and make decisions.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operating/interactive systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
Computer Vision (CV) technology is a science that studies how to make machines "see"; it uses cameras and computers, instead of human eyes, to identify, track and measure targets, and further processes the resulting images so that they become more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers simulate or implement human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and formal learning.
With the research and development of artificial intelligence technology, the artificial intelligence technology is developed and researched in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical services, smart customer service and the like.
The scheme provided by the embodiment of the application relates to an artificial intelligence image processing technology and an image recognition technology, and is specifically explained by the following embodiments:
in the embodiment of the application, an image processing method is provided, which can repair an image to be processed at least to a certain extent, and increase image texture and details, so that the repaired image is more vivid, and a better image repairing effect is achieved. The execution subject of the image processing method provided by this embodiment may be a device having a calculation processing function, for example, a server, a terminal, or a server and a terminal that are executed together, where the terminal and the server may be the terminal 102 and the server 104 shown in fig. 1.
The implementation details of the technical solution of the embodiment of the present application are set forth in detail below:
fig. 2 shows a flowchart of an image processing method according to an embodiment of the present application; the method may be performed by a server, which may be the server 104 shown in fig. 1, but it may also be performed by a terminal, such as the terminal 102 shown in fig. 1. Referring to fig. 2, the method includes:
step S210, acquiring an image to be processed;
step S220, inputting the image to be processed into an image generation network, where the image generation network is obtained by training according to a joint loss function, and the joint loss function is constructed according to a loss value between an output image of the image generation network and a target image expected to be generated, and an authenticity judgment result between the output image and the target image.
And step S230, acquiring a repaired image output by the image generation network and aiming at the image to be processed.
In step S210, an image to be processed is acquired.
Specifically, in the present application, the image to be processed refers to an image that needs image processing, for example image enhancement or image restoration. The image may be obtained from a preset database or from other sources: for example, it may be retrieved from a search engine, or captured from a video frame. This embodiment does not limit the type of the image to be processed, which may be a face image, a text image, a building image, a watermarked image, or the like.
When step S210 is implemented by the terminal in fig. 1, the image to be processed may be acquired from the server. In some embodiments, the image to be processed may also be an image captured by the terminal. When step S210 is implemented by the server in fig. 1, the image to be processed may be uploaded to the server by the terminal.
Step S220, inputting the image to be processed into an image generation network, where the image generation network is obtained by training according to a joint loss function, and the joint loss function is constructed according to a loss value between an output image of the image generation network and a target image expected to be generated, and an authenticity judgment result between the output image and the target image.
Specifically, after the image to be processed is acquired, it may be input into an image generation network and processed by that network. The image generation network is trained with a joint loss function and can generate, from the input image to be processed, a corresponding repaired image, i.e., an image in which the image to be processed has been repaired, sharpened and enhanced.
Before the image generation network is used to generate the repair image, the image generation network needs to be trained according to the joint loss function. The construction of the joint loss function includes two parts: the first is a loss value between an output image of the image generation network and a target image expected to be generated, and the second is a authenticity judgment result between the output image of the image generation network and the target image expected to be generated.
This embodiment trains the image generation network with a joint loss function. On the one hand, the joint loss function includes a loss value between the output image of the image generation network and the target image expected to be generated, and this loss value drives the output image to be close to, or the same as, the target image. On the other hand, the joint loss function also includes an authenticity judgment result between the output image and the target image, so the parameters of the image generation network can be adjusted according to this judgment result, ensuring that the output image is judged as true as far as possible; the output image thus approaches the target image so that the fake passes for the real, the missing parts of the image to be processed can be completed to a certain extent, and the function of repairing the image to be processed is realized.
It should be noted that a characteristic of the image generation network in this embodiment is that its input and output have the same size and are three-channel images of a fixed size; therefore, the image to be processed needs to be randomly cropped (for example, to a size of 256 × 256) before being input to the image generation network, as the sketch below illustrates.
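A minimal sketch of the random crop, assuming NumPy arrays in H × W × C layout; 256 × 256 is the fixed input size mentioned above:

```python
import random
import numpy as np

def random_crop(img: np.ndarray, size: int = 256) -> np.ndarray:
    h, w = img.shape[:2]
    top = random.randint(0, h - size)       # assumes h >= size and w >= size
    left = random.randint(0, w - size)
    return img[top:top + size, left:left + size]
```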
And step S230, acquiring a repaired image output by the image generation network and aiming at the image to be processed.
Specifically, the image to be processed is input to the image generation network and processed by it, so that the repaired image for the image to be processed, output by the image generation network, is obtained. The repaired image is an image in which the image to be processed has been repaired, sharpened and enhanced.
In the technical scheme of this embodiment, the image generation network performs the image processing: it generates a repaired image corresponding to the input image to be processed, and it is trained with a joint loss function. On the one hand, the loss value between the output image of the image generation network and the target image drives the output image to be close to, or the same as, the target image. On the other hand, because the joint loss function also takes into account the authenticity judgment result between the output image and the target image, the parameters of the image generation network can be adjusted according to this judgment result, ensuring that the output image is judged as true as far as possible; the output image thus approaches the target image so that the fake passes for the real, the missing parts of the image to be processed can be completed to a certain extent, and the function of repairing the image to be processed is realized. Therefore, the technical scheme of the embodiment of the application can increase image texture and detail, make the repaired image more vivid, and achieve a better image repairing effect.
In an embodiment of the present application, fig. 3 shows a flowchart of obtaining an image generation network according to an image pair according to an embodiment of the present application, and as shown in fig. 3, the flowchart of obtaining an image generation network according to an image pair may specifically include steps S310 to S330, which are specifically described as follows:
in step S310, a plurality of image pairs are acquired, each image pair including a sample image and a degraded image corresponding to the sample image.
Specifically, the image generation network may be obtained by acquiring a plurality of image pairs. Each image pair comprises a sample image and a degraded image corresponding to the sample image; compared with the sample image, the corresponding degraded image has poorer image quality, lower definition and more noise. The degraded image corresponding to the sample image is used as the input sample, while the sample image is used as the verification sample for judging whether the performance of the image generation network is stable. That is, the degraded image in each image pair plays the role of the image to be processed, and the sample image is the desired processing result.
In step S320, the degraded image corresponding to the sample image is input to the network to be trained, and a generated image output by the network to be trained is obtained.
After the plurality of image pairs are obtained, the degraded images corresponding to the sample images can be input into the network to be trained, and the network to be trained can output and generate images according to the input degraded images. The main characteristic of the network to be trained is that the input image and the output image of the network have the same size and are three-channel images with fixed sizes.
In some embodiments, to obtain more image detail, the size may be fixed to 256 × 256, and thus, the sample image may be an image with a size of 256 × 256.
In some embodiments, the network to be trained may be a network with skip connections; of course, other generation networks may also be chosen, which is not specifically limited here.
In step S330, a joint loss function is constructed according to the generated image and the sample image, and parameters of the network to be trained are adjusted according to the joint loss function, so as to obtain an image generation network.
After the degraded image corresponding to the sample image is input into the network to be trained and the generated image output by that network is obtained, a joint loss function can be constructed from the output generated image and the sample image corresponding to the input degraded image. The parameters of the network to be trained can then be updated by backpropagation based on the joint loss function, and after multiple iterations of training, an image generation network with a converged joint loss function and stable performance, used for repairing the image to be processed, can be obtained.
In an embodiment of the present application, fig. 4 shows a flowchart of constructing a joint loss function according to a generated image and a sample image according to an embodiment of the present application, and as shown in fig. 4, constructing a joint loss function according to a generated image and a sample image may specifically include steps S410 to S430, which are described in detail as follows:
and S410, inputting the generated image and the sample image into a pre-trained discrimination network, and determining a first loss function according to an output result of the discrimination network.
The image generation network processes the degraded image and outputs a generated image. The discrimination network receives the generated image and the sample image corresponding to the degraded image, and judges whether a given image (either the sample image or the generated image) is real or fake. The training target of the discrimination network is to judge the sample image as real and the generated image as fake, while the training target of the image generation network is to process the degraded image into a generated image that the discrimination network judges as real, i.e., a generated image so close to the sample image that the fake passes for the real. Through this adversarial training of the image generation network and the discrimination network, better image generation network parameters can be obtained through optimization, so that the generated image approaches the sample image.
Based on this, a loss function for training the image generation network can be constructed through the discrimination network: the generated image and the sample image are used as the input of the discrimination network, and the first loss function is then determined according to the output result of the discrimination network.
In an embodiment of the present application, after the generated image and the sample image are input to the pre-trained discrimination network, a plurality of authenticity discrimination results output by the discrimination network may be obtained, i.e., the authenticity discrimination result for the generated image and the authenticity discrimination result for the sample image are obtained respectively. Determining the first loss function from these results may specifically include: performing a logarithm operation on each authenticity discrimination result to obtain a plurality of operation results, and then obtaining the first loss function from the sum of the operation results. The first loss function L_GAN can be expressed as shown in equation (1):

L_GAN = -∑ log D(I_t, G(I_s))    (1)

where D denotes the pre-trained discrimination network, I_t is the sample image, I_s is the degraded image, G(I_s) is the generated image, and D(I_t, G(I_s)) is the authenticity discrimination result output by the discrimination network.
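A minimal sketch of equation (1), assuming a PyTorch discriminator D that scores a (target, generated) pair with a value in (0, 1); the names D, target and generated are illustrative, not from the patent:

```python
import torch

def gan_loss(D: torch.nn.Module,
             target: torch.Tensor,       # sample image I_t, shape (N, 3, H, W)
             generated: torch.Tensor     # generator output G(I_s), same shape
             ) -> torch.Tensor:
    eps = 1e-8                           # guard against log(0)
    scores = D(target, generated)        # authenticity scores in (0, 1)
    return -(scores + eps).log().sum()   # L_GAN = -sum log D(I_t, G(I_s))
```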
Step S420, calculating an image information difference according to the generated image and the sample image, and constructing a second loss function according to the image information difference.
In this embodiment, in addition to determining the first loss function from the output result of the discrimination network, it is also possible to calculate an image information difference from the generated image and the sample image, and construct the second loss function from the image information difference.
It can be understood that, since the degraded image corresponding to the sample image is used as the input sample, the generated image as the output sample and the sample image as the verification sample, the generated image and the sample image should be close both in low-level pixel values and in high-level abstract features. To ensure that the generated image and the sample image are consistent in deep semantics, the generated image and the sample image may be compared and a second loss function constructed from the comparison result; the parameters of the image generation network can then be adjusted based on this second loss function, so that the generated image output by the network is close to, or the same as, the sample image. Specifically, an image information difference can be calculated from the generated image and the sample image, and the second loss function L_1 constructed from that difference. The expression of the second loss function may be as shown in equation (2):

L_1 = ||I_t - G(I_s)||    (2)

where I_t is the sample image, I_s is the degraded image, and G(I_s) is the generated image.
And step S430, constructing a joint loss function according to the first loss function and the second loss function.
After the first loss function and the second loss function are determined, the joint loss function can finally be constructed from them. Illustratively, the joint loss function may be denoted L, with the expression shown in equation (3):

L = L_1 * k_1 + L_GAN * k_2    (3)

where L_GAN is the first loss function, L_1 is the second loss function, and k_1 and k_2 are constants. Experiments showed that the image generation network obtained with k_1 = 1 and k_2 = 0.0001 has the best performance and stability and the best image repairing effect.
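A minimal sketch of the joint loss in equation (3), reusing the assumed PyTorch setup of the previous sketch; k1 = 1 and k2 = 0.0001 follow the values reported above:

```python
import torch

def joint_loss(D: torch.nn.Module,
               target: torch.Tensor,
               generated: torch.Tensor,
               k1: float = 1.0,
               k2: float = 1e-4) -> torch.Tensor:
    l1 = (target - generated).abs().sum()               # L_1 = ||I_t - G(I_s)||
    l_gan = -(D(target, generated) + 1e-8).log().sum()  # L_GAN, equation (1)
    return l1 * k1 + l_gan * k2                         # L = L_1*k_1 + L_GAN*k_2
```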
In an embodiment of the present application, fig. 5 shows a flowchart of acquiring a plurality of image pairs according to an embodiment of the present application, and as shown in fig. 5, acquiring the plurality of image pairs may specifically include steps S510 to S530, which are specifically described as follows:
step S510, a plurality of sample images are acquired.
Specifically, the sample image may be an image obtained by screening an existing image set, and the screened sample image is an image with high image quality.
And step S520, performing image degradation processing on each sample image to obtain a degradation image corresponding to each sample image.
After obtaining a plurality of sample images, image degradation processing may be performed on each sample image to obtain a degradation image corresponding to each sample image, where the image degradation processing is to reduce the quality of the sample image.
In one embodiment of the present application, the image degradation processing performed on each sample image may include at least one of the following: blurring each sample image, where the blurring process comprises one or more of Gaussian blur and motion blur; downsampling each sample image; enlarging each sample image by interpolation, where the interpolation comprises one or more of bilinear interpolation, bicubic interpolation and nearest-neighbor interpolation; adding noise to each sample image; and compressing each sample image.
The above-mentioned image degradation processing method will be briefly described below.
(1) Fuzzification processing
The blurring process may include one or more of Gaussian blur and motion blur. Gaussian blur is essentially a data-smoothing operation; motion blur is the visible streaking dragged behind fast-moving objects in a still image or in a sequence of frames such as a film or an animation.
Gaussian blur is an image blur filter that uses a normal distribution to compute the transform of each pixel in the image. The N-dimensional normal distribution equation is shown in equation (4):

G(r) = 1 / (2πσ²)^(N/2) · e^(−r² / (2σ²))    (4)

In two dimensions, it can be defined as shown in equation (5):

G(u, v) = 1 / (2πσ²) · e^(−(u² + v²) / (2σ²))    (5)
where r is the blur radius and σ is the standard deviation of the normal distribution. In two-dimensional space, the contour lines of the surface generated by this formula are concentric circles, normally distributed outward from the center. A convolution matrix is built from the pixels whose distribution value is non-zero and convolved with the original image, so that the value of each pixel becomes a weighted average of the values of its neighboring pixels. The original pixel has the largest Gaussian value and therefore the largest weight, and neighboring pixels receive smaller and smaller weights the farther they are from the original pixel.
In Gaussian blur, the chosen size of the Gaussian kernel determines the degree of blur: the larger the kernel, the stronger the blur. Through experiments, when Gaussian blur is applied to each image, the kernel size is randomly chosen among the odd numbers from 3 to 13, as in the sketch below.
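A minimal sketch of the Gaussian-blur degradation, assuming OpenCV; the odd kernel sizes in [3, 13] follow the experiment described above:

```python
import random
import cv2

def random_gaussian_blur(img):
    k = random.choice([3, 5, 7, 9, 11, 13])    # random odd kernel size
    return cv2.GaussianBlur(img, (k, k), 0)    # sigma derived from kernel size
```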
The principle of motion blur is as follows: assume a sharp planar picture y(x), of which only the blurred image (y * psf)(x) can be observed, where psf(x) is a known point spread function (PSF) and * denotes convolution. Assuming the convolution is discrete and noisy, the observed image can be expressed as shown in equation (6):

z(x) = (y * psf)(x) + ε(x)    (6)

where ε(x) is noise and x ranges over the n_1 × n_2 grid X = {(k_1, k_2): k_1 = 1, 2, ..., n_1, k_2 = 1, 2, ..., n_2}.

The simplest motion blur model uses a linear point spread function and can be expressed in discrete convolution form as shown in equations (7) and (8):

psf(x_1, x_2) = 1/L, if (x_1, x_2) lies on the line segment of length L through the origin along the motion direction    (7)
psf(x_1, x_2) = 0, otherwise    (8)

where L is the length of the kernel, determined by the motion rate, and the slope of the segment is determined by the motion direction. This model assumes that all pixels in the picture share the same motion. Through experiments, when motion blur is applied to each image, L is taken in the range 0-12 and the motion direction is randomly selected in the range 0-90 degrees.
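A minimal sketch of the motion-blur degradation, assuming OpenCV and NumPy: a line kernel of length L is rotated to a random angle and normalized, approximating the 1/L point spread function of equations (7) and (8):

```python
import random
import numpy as np
import cv2

def random_motion_blur(img):
    L = random.randint(1, 12)                     # kernel length (motion rate)
    angle = random.uniform(0, 90)                 # motion direction in degrees
    size = L if L % 2 else L + 1                  # odd kernel size >= L
    kernel = np.zeros((size, size), dtype=np.float32)
    start = (size - L) // 2
    kernel[size // 2, start:start + L] = 1.0      # centered segment of length L
    center = ((size - 1) / 2.0, (size - 1) / 2.0)
    rot = cv2.getRotationMatrix2D(center, angle, 1.0)
    kernel = cv2.warpAffine(kernel, rot, (size, size))
    kernel /= max(kernel.sum(), 1e-8)             # normalize to 1/L weights
    return cv2.filter2D(img, -1, kernel)
```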
(2) Down sampling process
Downsampling, also known as image reduction, has two main purposes: 1. fitting the image to the size of the display area; 2. generating a thumbnail of the corresponding image. For an image of size M × N, downsampling it by a factor of s yields an image of size (M/s) × (N/s).
(3) Interpolation amplification process
The image is then enlarged back to its original size, with the enlargement method randomly selected from the following three: bilinear interpolation, bicubic interpolation and nearest-neighbor interpolation. The sketch below combines the downsampling and enlargement steps.
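A minimal sketch of the downsample-then-enlarge degradation, assuming OpenCV; the set of downsampling factors is an illustrative assumption:

```python
import random
import cv2

def random_resize_degradation(img):
    h, w = img.shape[:2]
    s = random.choice([2, 3, 4])                 # assumed downsampling factors
    small = cv2.resize(img, (w // s, h // s))    # s-times downsampling
    interp = random.choice([cv2.INTER_LINEAR,    # bilinear
                            cv2.INTER_CUBIC,     # bicubic
                            cv2.INTER_NEAREST])  # nearest neighbor
    return cv2.resize(small, (w, h), interpolation=interp)
```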
(4) Increasing noise processing
The probability density of Gaussian noise follows a Gaussian distribution, whose expression is shown in equation (9):

p(z) = 1 / (√(2π) σ) · e^(−(z − μ)² / (2σ²))    (9)

where μ denotes the mean of the distribution, σ the standard deviation, and σ² the variance. In embodiments of the present disclosure, μ and σ may be determined randomly; after the parameters are determined, noise drawn from this probability distribution is added to the color values of the individual pixels in the image, and the resulting color values are finally scaled back to [0, 255]. In this way, Gaussian noise can be added.
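A minimal sketch of adding Gaussian noise, assuming NumPy and uint8 images; the sampling ranges for the mean and standard deviation are illustrative assumptions:

```python
import random
import numpy as np

def add_gaussian_noise(img):
    mu = random.uniform(0, 5)        # random mean (assumed range)
    sigma = random.uniform(1, 25)    # random standard deviation (assumed range)
    noise = np.random.normal(mu, sigma, img.shape)
    noisy = img.astype(np.float64) + noise
    return np.clip(noisy, 0, 255).astype(np.uint8)   # scale back to [0, 255]
```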
The probability density of Poisson noise follows a Poisson distribution, whose expression is shown in equation (10):

P(X = k) = (λ^k / k!) · e^(−λ)    (10)

where the parameter λ may be determined randomly. After the parameter is determined, the color values of the pixels in the image may be processed according to the probability distribution of Poisson noise to add Poisson noise.
Salt-and-pepper noise adds black and white pixels to the image at random. The number of such pixels can be controlled by a signal-to-noise ratio, which may be determined randomly. Once the signal-to-noise ratio is specified, the total number of pixels to corrupt can be computed from it; positions are then sampled at random within the image, the pixel value at each sampled position is set to 255 or 0, and the procedure is repeated until the required number of pixels has been processed. In this way, salt-and-pepper noise can be added to the image.
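A minimal sketch of adding salt-and-pepper noise, assuming NumPy; interpreting the signal-to-noise ratio as the fraction of pixels left untouched is an assumption:

```python
import random
import numpy as np

def add_salt_pepper_noise(img):
    snr = random.uniform(0.9, 0.99)        # random signal-to-noise ratio
    out = img.copy()
    h, w = img.shape[:2]
    n_corrupt = int(h * w * (1 - snr))     # number of pixels to corrupt
    for _ in range(n_corrupt):
        y, x = random.randrange(h), random.randrange(w)
        out[y, x] = 255 if random.random() < 0.5 else 0   # salt or pepper
    return out
```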
(5) Compression process
The image is stored in JPEG format; note that the JPEG compression quality parameter needs to be set within [50, 100], and its value is selected at random.
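A minimal sketch of the JPEG-compression degradation, assuming OpenCV: encode at a random quality in [50, 100], then decode back:

```python
import random
import cv2

def jpeg_compress(img):
    quality = random.randint(50, 100)    # random JPEG quality parameter
    ok, buf = cv2.imencode('.jpg', img, [cv2.IMWRITE_JPEG_QUALITY, quality])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)
```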
In step S530, an image pair is generated from each sample image and the degraded image corresponding to each sample image.
After the image degradation processing is performed to obtain the degraded image corresponding to each sample image, an image pair may be generated from each sample image and the degraded image corresponding to each sample image.
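Tying these steps together, a sketch of generating one image pair, assuming the degradation helpers defined in the previous sketches; applying a random subset of degradations is an illustrative choice, not specified by the patent:

```python
import random

def make_image_pair(sample):
    ops = [random_gaussian_blur, random_motion_blur,
           random_resize_degradation, add_gaussian_noise,
           add_salt_pepper_noise, jpeg_compress]
    degraded = sample
    for op in random.sample(ops, k=random.randint(1, len(ops))):
        degraded = op(degraded)            # accumulate degradations
    return sample, degraded                # (verification sample, input sample)
```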
In some embodiments, the sample image acquired in step S510 may be an image with high image quality screened from an existing image set, or an image with an image background and an image size satisfying a preset size.
In this embodiment, the step of acquiring a plurality of sample images may specifically include: firstly, acquiring a plurality of initial images with image backgrounds and image sizes meeting preset sizes; then, according to the image quality category of each initial image, an image of an image quality category with an image quality higher than a set value is screened out from the plurality of initial images to obtain a plurality of sample images.
The preset size may be set according to actual requirements, for example, a width and height of at least 500 × 500 pixels. The purpose of this screening is to ensure that images meeting the preset size have clearer detail, which benefits network training; furthermore, images with backgrounds are screened from the existing image set because backgrounds may contain a large amount of useful information, so retaining them provides more background information for network training.

After the plurality of initial images having image backgrounds and meeting the preset size are acquired, images of image quality categories whose quality is higher than a set value can be screened out from them according to the image quality category of each initial image, so as to obtain the plurality of sample images. The image quality categories are obtained by partitioning the range of image quality; different categories correspond to different levels of image quality.
In some embodiments, different image quality categories may correspond to different image quality ranges; for example, an image quality score between 0 and 30 may be defined as poor, between 30 and 60 as medium, and between 60 and 100 as good.

Accordingly, when screening sample images, images whose image quality category lies above a set value can be selected from the plurality of initial images. For example, if the set value is 60, images in the good category are selected and used as the sample images.
In an embodiment of the present application, the image quality category of each initial image may be determined by an image classification model. Fig. 6 shows a flowchart of inputting the initial images into the image classification model to obtain the image quality category corresponding to each initial image; as shown in fig. 6, the method may specifically include steps S610 to S630, which are described in detail as follows:
step S610, inputting a plurality of initial images into an image classification model, wherein the image classification model comprises a feature extraction layer and a full connection layer.
Step S620, performing feature extraction on each initial image based on the feature extraction layer to obtain a target feature vector corresponding to each initial image.

Step S630, performing full-connection processing on the target feature vectors through the fully connected layer to obtain the image quality category corresponding to each initial image.
As described above, the initial image is an image having an image background and an image size satisfying a preset size, and after a plurality of initial images are acquired, the plurality of initial images may be input to the image classification model to classify each initial image, so as to obtain an image quality category of each initial image.
Fig. 7 is a schematic structural diagram of a component of an image classification model according to an embodiment of the present application, and a classification process of multiple initial images by the image classification model is described below with reference to fig. 7:
referring to fig. 7, the image classification model provided in the embodiment of the present application includes an input layer, a feature extraction layer, and a full connection layer. In actual implementation, each initial image is input to an image classification model through an input layer, and the initial image is subjected to feature extraction through a feature extraction layer of the image classification model to obtain a target feature vector of the initial image; and connecting the target characteristic vectors corresponding to the initial images through a full connection layer of the image classification model to obtain the prediction probability of each initial image corresponding to each image quality category, and then determining the image quality category corresponding to each initial image according to the prediction probability of each initial image corresponding to each image quality category.
Here, in practical applications, the feature extraction layer of the image classification model may be formed by any network capable of extracting features from an image, such as a Convolutional Neural Network (CNN) comprising convolutional layers, pooling layers, and fully connected layers.
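By way of illustration, the following PyTorch sketch pairs a small convolutional feature extraction layer with a fully connected layer; the backbone depth, channel widths, and the three quality classes are assumptions, not a prescription of the embodiment:

```python
import torch
import torch.nn as nn

class ImageQualityClassifier(nn.Module):
    """Feature extraction layer followed by a fully connected layer."""

    def __init__(self, num_classes=3):                    # e.g. poor / medium / good
        super().__init__()
        self.features = nn.Sequential(                    # feature extraction layer
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                      # pooled feature map
        )
        self.fc = nn.Linear(64, num_classes)              # fully connected layer

    def forward(self, x):
        feat = self.features(x).flatten(1)                # target feature vector
        return self.fc(feat)                              # per-class scores (logits)
```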
Based on the above embodiment, the image classification model includes the feature extraction layer and the fully connected layer, which learn the relevance between image quality categories during training. An initial image is acquired and input into the trained image classification model; feature extraction is performed on it by the feature extraction layer to obtain the corresponding target feature vector, and the target feature vector is then fed to the fully connected layer for full-connection processing to obtain the image quality category of the initial image. Because the initial images are classified based on the learned relevance among image quality categories, classification accuracy is improved.
In some embodiments, as shown in fig. 8, step S630 may specifically include steps S810 to S830, which are described in detail as follows:
Step S810, performing full-connection processing on the target feature vector through the fully connected layer.
In this embodiment, the target feature vector corresponding to each initial image, as output by the feature extraction layer, may be obtained, and these target feature vectors are connected through the fully connected layer to obtain the output of the fully connected layer.
Step S820, normalizing the output of the fully connected layer to obtain the prediction probability of each initial image corresponding to each image quality category.
Step S830, the image quality category corresponding to the maximum prediction probability in the prediction probabilities is used as the image quality category corresponding to each initial image.
Specifically, the output of the full connection layer may be normalized to obtain the prediction probability of each initial image corresponding to each image quality category. Next, a maximum prediction probability of the prediction probabilities may be determined, an image quality category corresponding to the maximum prediction probability may be determined, and the image quality category corresponding to the maximum prediction probability may be used as the image quality category corresponding to each initial image.
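In PyTorch terms, steps S810 to S830 reduce to a few lines. This continues the illustrative classifier sketched above; the stand-in batch is an assumption:

```python
import torch

model = ImageQualityClassifier()        # illustrative classifier sketched earlier
batch = torch.randn(4, 3, 224, 224)     # stand-in batch of initial images
logits = model(batch)                   # output of the fully connected layer (S810)
probs = torch.softmax(logits, dim=1)    # normalization -> prediction probabilities (S820)
pred_class = probs.argmax(dim=1)        # category with the maximum probability (S830)
```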
In an embodiment of the present application, fig. 9 shows a flowchart of training the image classification model. As shown in fig. 9, the training may specifically include steps S910 to S940, which are described in detail as follows:
step S910, a training sample set containing a plurality of sample images is obtained, and the sample images carry image quality labeling categories.
Before model training, a training sample set for model training needs to be constructed, the training sample set includes a plurality of sample images, and the sample images may be stored locally, stored by other devices, acquired from a network, or photographed in real time, but not limited thereto.
Each sample image in the training sample set carries an image quality annotation class, which may be an image quality class manually annotated in advance, and illustratively, the annotated image quality class may be: poor, medium and good.
Step S920, inputting the plurality of sample images into the image classification model to obtain the image quality prediction categories corresponding to the sample images output by the image classification model.
Specifically, when model training is carried out, a plurality of sample images are input into an image classification model through an input layer, and feature extraction is carried out on the sample images through a feature extraction layer of the image classification model to obtain feature vectors of the sample images; the feature vectors corresponding to the sample images are connected through the full connection layer of the image classification model to obtain the prediction probability of the sample images corresponding to each image quality category, and then the image quality prediction categories corresponding to the sample images can be determined according to the prediction probability of the sample images corresponding to each image quality category.
Step S930, constructing a target loss function of the image classification model based on the image quality annotation categories and the image quality prediction categories.
The image classification model further comprises a Loss Function (Loss Function), and the Loss Function is used for representing the degree of inconsistency between the image quality prediction category and the image quality labeling category. It can be understood that the loss functions have various types, and the corresponding types of the loss functions can be selected according to requirements in practical application.
In some embodiments, the difference between the image quality prediction category corresponding to each sample image and the image quality annotation category corresponding to each sample image may be obtained; and then, according to the obtained difference sum of all differences, constructing a target loss function of the image classification model.
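A minimal sketch of such a target loss follows. Cross-entropy is used here as one concrete measure of the per-sample difference, which the text leaves open; the summation over samples mirrors the "difference sum" described above:

```python
import torch.nn.functional as F

def target_loss(logits, labels):
    """Sum over the batch of a per-sample difference between the image quality
    prediction and the image quality annotation (cross-entropy by assumption)."""
    return F.cross_entropy(logits, labels, reduction='sum')
```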
Step S940, training the image classification model based on the target loss function to obtain the trained image classification model.
Further, the model parameters in the image classification model can be adjusted through the determined target loss function, so that the loss between the image quality prediction category and the image quality annotation category predicted by the image classification model after the model parameters are adjusted tends to be converged.
Accordingly, the model parameters of the image classification model may be adjusted as follows:
when the value of the target loss function is determined to exceed a preset threshold, a corresponding error signal is determined based on the loss function of the image classification model; the error signal is then propagated backward through the image classification model, and the model parameters of the image classification model are adjusted during the propagation.
To describe backward propagation: training sample data is input to the input layer of the neural network model, passes through the hidden layers, and finally reaches the output layer, which outputs the result; this is the forward propagation process of the model. Because the output of the neural network model differs from the actual result, the error between the output result and the actual result is computed and propagated backward from the output layer through the hidden layers until it reaches the input layer, and during this backward pass the values of the model parameters are adjusted according to the error. This process is iterated continuously until convergence.
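The parameter-adjustment procedure can be sketched as the following PyTorch loop, continuing the illustrative names above; the stand-in data, optimizer, learning rate, and threshold value are all assumptions:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data; in practice the loader yields (sample image, annotated category) pairs.
data = TensorDataset(torch.randn(16, 3, 224, 224), torch.randint(0, 3, (16,)))
loader = DataLoader(data, batch_size=4)

model = ImageQualityClassifier()                            # sketched earlier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # assumed optimizer choice
threshold = 0.1                                             # assumed preset threshold

for images, labels in loader:
    logits = model(images)                 # forward propagation
    loss = target_loss(logits, labels)     # target loss sketched above
    if loss.item() > threshold:            # loss value exceeds the preset threshold
        optimizer.zero_grad()
        loss.backward()                    # propagate the error signal backward
        optimizer.step()                   # adjust model parameters along the way
```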
In an embodiment of the present application, figs. 10A-10B show a comparison of a face image to be processed before and after image processing. As shown in fig. 10A, the face image to be processed is blurred and of low definition; after it is processed by the image generation network, a face image with high definition, clear facial texture, and rich facial detail is obtained, as shown in fig. 10B.
According to the image processing method provided by the embodiments of the present application, while the image characteristics of the image to be processed are preserved, the repaired image achieves high definition, rich image detail, and accurate noise removal, making the repaired image more vivid and the restoration effect better.

In addition, the image processing method in the embodiments of the present application can also be applied to video frame images: the restoration effect does not differ between preceding and following video frames, so the video remains coherent and no flicker occurs.
In an embodiment of the present application, fig. 11 shows a flowchart of training an image generation network. As shown in fig. 11, the training process may specifically include steps S1110 to S1140, which are described in detail as follows:
in step S1110, a plurality of image pairs are acquired, each image pair including a sample image and a degraded image corresponding to the sample image.
When training an image generation network, a plurality of image pairs serving as training samples need to be acquired in advance. Each image pair includes a sample image and a degraded image corresponding to the sample image; compared with the sample image, the degraded image has poorer quality, lower definition, and more noise. The degraded image serves as the input sample, while the sample image serves as the verification sample used to judge whether the performance of the image generation network is stable. That is, the degraded image in each image pair plays the role of the image to be processed, and the sample image is the desired result of processing.
In step S1120, the degraded image corresponding to the sample image is input to the network to be trained, and a generated image output by the network to be trained is obtained.
After the plurality of image pairs are obtained, the degraded images corresponding to the sample images can be input into the network to be trained, and the network to be trained can output and generate images according to the input degraded images. The main characteristic of the network to be trained is that the input image and the output image of the network have the same size and are three-channel images with fixed sizes.
In some embodiments, to obtain more image detail, the size may be fixed to 256 × 256, and thus, the sample image may be an image with a size of 256 × 256.
In some embodiments, the network to be trained may select a network with a hopping connection operation, and of course, the network to be trained may also select other alternative generation networks, which is not specifically limited herein.
In step S1130, a joint loss value is constructed from the loss value between the generated image and the sample image and the authenticity discrimination result between the generated image and the sample image.
It can be understood that, since the degraded image corresponding to the sample image serves as the input sample, the generated image as the output sample, and the sample image as the verification sample, the generated image and the sample image should be close both in low-level pixel values and in high-level abstract features, so as to ensure that the two are consistent in deep semantics. Therefore, on one hand, the generated image and the sample image may be compared and a loss value constructed from the difference between them; on the other hand, a loss value may be constructed from the authenticity judgment result between the generated image and the sample image; finally, the two are combined into a joint loss value.
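The following sketch combines the two terms. The L1 pixel loss, the sigmoid-output discriminator disc, and the weight lambda_adv are assumptions, while the logarithm over the authenticity judgment mirrors the first loss function described elsewhere in this application:

```python
import torch
import torch.nn.functional as F

def joint_loss(generated, sample, disc, lambda_adv=0.01):
    """Joint loss = image-difference term + authenticity (adversarial) term."""
    pixel_loss = F.l1_loss(generated, sample)          # loss between the two images
    authenticity = disc(generated)                     # scores in (0, 1), assumed
    adv_loss = -torch.log(authenticity + 1e-8).mean()  # log of judgment results
    return pixel_loss + lambda_adv * adv_loss
```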
In step S1140, parameters of the network to be trained are adjusted according to the joint loss value, so as to obtain an image generation network.
Further, after the joint loss value is constructed, the parameters of the network to be trained can be updated by backpropagation based on the joint loss value. After multiple iterations of training, an image generation network whose joint loss value has converged and whose performance is stable can be obtained for repairing images to be processed.
Embodiments of the apparatus of the present application are described below, which may be used to perform the image processing methods in the above-described embodiments of the present application. For details that are not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the image processing method described above in the present application.
Fig. 12 shows a block diagram of an image processing apparatus according to an embodiment of the present application, and referring to fig. 12, an image processing apparatus 1200 according to an embodiment of the present application includes: a first acquisition unit 1202, a first input unit 1204, and a second acquisition unit 1206.
The first obtaining unit 1202 is configured to obtain an image to be processed; the first input unit 1204 is configured to input the image to be processed to an image generation network, where the image generation network is trained according to a joint loss function, and the joint loss function is constructed according to a loss value between an output image of the image generation network and a target image expected to be generated, and an authenticity judgment result between the output image and the target image; the second obtaining unit 1206 is configured to obtain a repair image for the image to be processed, which is output by the image generation network.
In some embodiments of the present application, the image processing apparatus further comprises: the third acquisition unit is configured to acquire a plurality of image pairs, and each image pair comprises a sample image and a degraded image corresponding to the sample image; the second input unit is configured to input the degraded image corresponding to the sample image into a network to be trained to obtain a generated image output by the network to be trained; the first construction unit is configured to construct the joint loss function according to the generated image and the sample image, and adjust parameters of the network to be trained according to the joint loss function to obtain the image generation network.
In some embodiments of the present application, the first building unit comprises: an input subunit, configured to input the generated image and the sample image to a pre-trained discrimination network, and determine a first loss function according to an output result of the discrimination network; a calculation subunit configured to calculate an image information difference from the generated image and the sample image, and construct a second loss function from the image information difference; a construction subunit configured to construct the joint loss function from the first loss function and the second loss function.
In some embodiments of the present application, the input subunit is configured to: obtaining a plurality of authenticity judging results output by the judging network; carrying out logarithmic operation on each authenticity judgment result to obtain a plurality of operation results; and obtaining the first loss function according to the sum of the operation results.
In some embodiments of the present application, the third obtaining unit includes: an acquisition subunit configured to acquire a plurality of sample images; the quality degradation processing subunit is configured to perform image quality degradation processing on each sample image to obtain a quality degradation image corresponding to each sample image; a generating subunit configured to generate the image pair according to the respective sample images and the degraded images corresponding to the respective sample images.
In some embodiments of the present application, the degradation processing subunit is configured to perform image degradation processing on each sample image, including at least one of: blurring each sample image, wherein the blurring treatment comprises one or more of Gaussian blur and motion blur; performing down-sampling processing on each sample image; carrying out interpolation amplification processing on each sample image, wherein the interpolation amplification processing comprises one or more of bilinear interpolation, bicubic interpolation and nearest neighbor interpolation; performing noise-adding processing on each sample image; and compressing each sample image.
In some embodiments of the present application, the obtaining subunit is configured to: acquiring a plurality of initial images with image backgrounds and image sizes meeting preset sizes; and screening images of image quality categories with image quality higher than a set value from the plurality of initial images according to the image quality categories of the initial images to obtain the plurality of sample images.
In some embodiments of the present application, the image processing apparatus further comprises: a third input unit, configured to input the plurality of initial images into an image classification model, where the image classification model includes a feature extraction layer and a full connection layer; the feature extraction unit is configured to perform feature extraction on each initial image based on the feature extraction layer to obtain a target feature vector corresponding to each initial image; and the processing unit is configured to perform full connection processing on the target feature vectors through the full connection layer to obtain the image quality categories corresponding to the initial images.
In some embodiments of the present application, the processing unit is configured to: performing full connection processing on the target feature vector through the full connection layer; normalizing the output of the full connection layer to obtain the prediction probability of each image quality category corresponding to each initial image; and taking the image quality category corresponding to the maximum prediction probability in the prediction probabilities as the image quality category corresponding to each initial image.
In some embodiments of the present application, the image classification model is trained by: acquiring a training sample set comprising a plurality of sample images, wherein the sample images carry image quality labeling categories; inputting the plurality of sample images into the image classification model to obtain image quality prediction categories corresponding to the sample images output by the image classification model; constructing a target loss function of the image classification model based on the image quality labeling category and the image quality prediction category; and training the image classification model based on the target loss function to obtain a trained image classification model.
In some embodiments of the present application, constructing an objective loss function of the image classification model based on the image quality annotation class and the image quality prediction class comprises: acquiring the difference between the image quality prediction category corresponding to each sample image and the image quality marking category corresponding to each sample image; and constructing a target loss function of the image classification model according to the obtained difference sum of all the differences.
FIG. 13 shows a block diagram of a training apparatus of an image generation network according to an embodiment of the present application.
Referring to fig. 13, an image generation network training apparatus 1300 according to an embodiment of the present application includes: a fourth acquisition unit 1302, a fourth input unit 1304, a second construction unit 1306 and an adjustment unit 1308.
The fourth obtaining unit 1302 is configured to obtain a plurality of image pairs, where each image pair includes a sample image and a degraded image corresponding to the sample image; the fourth input unit 1304 is configured to input the degraded image corresponding to the sample image into a network to be trained, so as to obtain a generated image output by the network to be trained; the second constructing unit 1306 is configured to construct a joint loss value according to a loss value between the generated image and the sample image and an authenticity judgment result between the generated image and the sample image; the adjusting unit 1308 is configured to adjust the parameters of the network to be trained according to the joint loss value, so as to obtain an image generation network.
FIG. 14 shows a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of the present application.
It should be noted that the computer system 1400 of the electronic device shown in fig. 14 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 14, a computer system 1400 includes a Central Processing Unit (CPU) 1401, which can perform various appropriate actions and processes, such as executing the methods described in the above embodiments, according to a program stored in a Read-Only Memory (ROM) 1402 or a program loaded from a storage portion 1408 into a Random Access Memory (RAM) 1403. In the RAM 1403, various programs and data necessary for system operation are also stored. The CPU 1401, the ROM 1402, and the RAM 1403 are connected to each other via a bus 1404. An Input/Output (I/O) interface 1405 is also connected to the bus 1404.

The following components are connected to the I/O interface 1405: an input portion 1406 including a keyboard, a mouse, and the like; an output portion 1407 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage portion 1408 including a hard disk and the like; and a communication portion 1409 including a network interface card such as a Local Area Network (LAN) card, a modem, and the like. The communication portion 1409 performs communication processing via a network such as the Internet. A drive 1410 is also connected to the I/O interface 1405 as necessary. A removable medium 1411, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1410 as necessary, so that a computer program read therefrom is installed into the storage portion 1408 as needed.
In particular, according to embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program containing program code for performing the methods illustrated by the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 1409 and/or installed from the removable medium 1411. When the computer program is executed by the Central Processing Unit (CPU) 1401, the various functions defined in the system of the present application are executed.
It should be noted that the computer readable media shown in the embodiments of the present application may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with a computer program embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. The computer program embodied on the computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware, and the described units may also be disposed in a processor. The names of these units do not, in any case, constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the above embodiments.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations that follow, in general, the principles of the application and include such departures from the present disclosure as come within known or customary practice in the art to which the application pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (15)

1. An image processing method, characterized in that the method comprises:
acquiring an image to be processed;
inputting the image to be processed into an image generation network, wherein the image generation network is obtained by training according to a joint loss function, and the joint loss function is constructed according to a loss value between an output image of the image generation network and a target image expected to be generated and an authenticity judgment result between the output image and the target image;
and acquiring a repaired image which is output by the image generation network and aims at the image to be processed.
2. The method of claim 1, further comprising:
acquiring a plurality of image pairs, wherein each image pair comprises a sample image and a degraded image corresponding to the sample image;
inputting the degraded image corresponding to the sample image into a network to be trained to obtain a generated image output by the network to be trained;
and constructing the joint loss function according to the generated image and the sample image, and adjusting the parameters of the network to be trained according to the joint loss function to obtain the image generation network.
3. The method of claim 2, wherein constructing the joint loss function from the generated image and the sample image comprises:
inputting the generated image and the sample image into a pre-trained discrimination network, and determining a first loss function according to an output result of the discrimination network;
calculating an image information difference according to the generated image and the sample image, and constructing a second loss function according to the image information difference;
and constructing the joint loss function according to the first loss function and the second loss function.
4. The method of claim 3, wherein determining a first loss function based on the output of the discrimination network comprises:
acquiring a plurality of authenticity judgment results output by the judgment network;
carrying out logarithmic operation on each authenticity judgment result to obtain a plurality of operation results;
and obtaining the first loss function according to the sum of the operation results.
5. The method of any of claims 2 to 4, wherein the acquiring a plurality of image pairs comprises:
acquiring a plurality of sample images;
performing image degradation processing on each sample image to obtain a degraded image corresponding to each sample image;
and generating the image pair according to the sample images and the degraded images corresponding to the sample images.
6. The method of claim 5, wherein performing image degradation on each sample image comprises at least one of:
performing blurring processing on each sample image, wherein the blurring processing comprises one or more of Gaussian blur and motion blur;
carrying out downsampling processing on each sample image;
carrying out interpolation amplification processing on each sample image, wherein the interpolation amplification processing comprises one or more of bilinear interpolation, bicubic interpolation and nearest neighbor interpolation;
performing noise increasing processing on each sample image;
and compressing each sample image.
7. The method of claim 5, wherein said obtaining a plurality of sample images comprises:
acquiring a plurality of initial images which have image backgrounds and the sizes of which meet preset sizes;
and screening out images of image quality categories with image quality higher than a set value from the plurality of initial images according to the image quality categories of the initial images to obtain the plurality of sample images.
8. The method of claim 7, further comprising:
inputting the plurality of initial images into an image classification model, wherein the image classification model comprises a feature extraction layer and a full connection layer;
extracting the features of each initial image based on the feature extraction layer to obtain a target feature vector corresponding to each initial image;
and carrying out full connection processing on the target characteristic vectors through the full connection layer to obtain the image quality categories corresponding to the initial images.
9. The method according to claim 8, wherein performing full-concatenation processing on the target feature vector through the full-concatenation layer to obtain an image quality category corresponding to each of the initial images includes:
performing full-connection processing on the target feature vector through the full-connection layer;
normalizing the output of the full connection layer to obtain the prediction probability of each image quality category corresponding to each initial image;
and taking the image quality category corresponding to the maximum prediction probability in the prediction probabilities as the image quality category corresponding to each initial image.
10. The method of claim 9, wherein the image classification model is trained by:
acquiring a training sample set containing a plurality of sample images, wherein the sample images carry image quality labeling categories;
inputting the plurality of sample images into the image classification model to obtain image quality prediction categories corresponding to the sample images output by the image classification model;
constructing a target loss function of the image classification model based on the image quality annotation class and the image quality prediction class;
and training the image classification model based on the target loss function to obtain a trained image classification model.
11. The method of claim 10, wherein constructing the target loss function of the image classification model based on the image quality labeling class and the image quality prediction class comprises:
acquiring the difference between the image quality prediction category corresponding to each sample image and the image quality annotation category corresponding to each sample image;
and constructing a target loss function of the image classification model according to the obtained difference sum of all the differences.
12. A method of training an image generation network, the method comprising:
acquiring a plurality of image pairs, wherein each image pair comprises a sample image and a degraded image corresponding to the sample image;
inputting the degraded image corresponding to the sample image into a network to be trained to obtain a generated image output by the network to be trained;
constructing a joint loss value according to the loss value between the generated image and the sample image and the authenticity judgment result between the generated image and the sample image;
and adjusting parameters of the network to be trained according to the joint loss value to obtain an image generation network.
13. An image processing apparatus, characterized in that the apparatus comprises:
a first acquisition unit configured to acquire an image to be processed;
the image processing device comprises a first input unit, a second input unit and a third input unit, wherein the first input unit is configured to input the image to be processed into an image generation network, the image generation network is obtained through training according to a joint loss function, the joint loss function is constructed according to a loss value between an output image of the image generation network and a target image expected to be generated, and an authenticity judgment result between the output image and the target image;
a second acquisition unit configured to acquire a repair image for the image to be processed output by the image generation network.
14. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the image processing method of any one of claims 1 to 11 or the training method of the image generation network of claim 12.
15. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the image processing method of any one of claims 1 to 11 or the training method of the image generation network of claim 12.
CN202110320718.9A 2021-03-25 2021-03-25 Image processing method, image processing device, computer readable medium and electronic equipment Pending CN115131218A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110320718.9A CN115131218A (en) 2021-03-25 2021-03-25 Image processing method, image processing device, computer readable medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110320718.9A CN115131218A (en) 2021-03-25 2021-03-25 Image processing method, image processing device, computer readable medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN115131218A true CN115131218A (en) 2022-09-30

Family

ID=83373800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110320718.9A Pending CN115131218A (en) 2021-03-25 2021-03-25 Image processing method, image processing device, computer readable medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115131218A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109146856A (en) * 2018-08-02 2019-01-04 深圳市华付信息技术有限公司 Picture quality assessment method, device, computer equipment and storage medium
CN109559287A (en) * 2018-11-20 2019-04-02 北京工业大学 A kind of semantic image restorative procedure generating confrontation network based on DenseNet
CN109685072A (en) * 2018-12-22 2019-04-26 北京工业大学 A kind of compound degraded image high quality method for reconstructing based on generation confrontation network
CN110363716A (en) * 2019-06-25 2019-10-22 北京工业大学 One kind is generated based on condition and fights network combined degraded image high quality method for reconstructing
CN111488865A (en) * 2020-06-28 2020-08-04 腾讯科技(深圳)有限公司 Image optimization method and device, computer storage medium and electronic equipment

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024078308A1 (en) * 2022-10-13 2024-04-18 腾讯科技(深圳)有限公司 Image optimization method and apparatus, electronic device, medium, and program product
CN115439894A (en) * 2022-11-08 2022-12-06 荣耀终端有限公司 Method, electronic device, program product, and medium for training fingerprint matching model
CN116433501A (en) * 2023-02-08 2023-07-14 阿里巴巴(中国)有限公司 Image processing method and device
CN116433501B (en) * 2023-02-08 2024-01-09 阿里巴巴(中国)有限公司 Image processing method and device
CN116309151A (en) * 2023-03-06 2023-06-23 腾讯科技(深圳)有限公司 Parameter generation method, device and storage medium of picture decompression distortion network
CN116309151B (en) * 2023-03-06 2024-08-09 腾讯科技(深圳)有限公司 Parameter generation method, device and storage medium of picture decompression distortion network

Similar Documents

Publication Publication Date Title
CN111488865B (en) Image optimization method and device, computer storage medium and electronic equipment
US10943145B2 (en) Image processing methods and apparatus, and electronic devices
Li et al. Benchmarking single-image dehazing and beyond
Kim et al. Fully deep blind image quality predictor
WO2021139324A1 (en) Image recognition method and apparatus, computer-readable storage medium and electronic device
CN115131218A (en) Image processing method, image processing device, computer readable medium and electronic equipment
CN111325851A (en) Image processing method and device, electronic equipment and computer readable storage medium
CN113066034B (en) Face image restoration method and device, restoration model, medium and equipment
CN115205150A (en) Image deblurring method, device, equipment, medium and computer program product
CN114339409A (en) Video processing method, video processing device, computer equipment and storage medium
Guo et al. Blind detection of glow-based facial forgery
CN114972016A (en) Image processing method, image processing apparatus, computer device, storage medium, and program product
CN118251698A (en) Novel view synthesis of robust NERF model for sparse data
CN116977200A (en) Processing method and device of video denoising model, computer equipment and storage medium
Xiang et al. Crowd density estimation method using deep learning for passenger flow detection system in exhibition center
Li et al. A review of advances in image inpainting research
CN117079313A (en) Image processing method, device, equipment and storage medium
Bak et al. Camera motion detection for story and multimedia information convergence
CN114841887A (en) Image restoration quality evaluation method based on multi-level difference learning
CN115311152A (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN113763313A (en) Text image quality detection method, device, medium and electronic equipment
CN115222606A (en) Image processing method, image processing device, computer readable medium and electronic equipment
CN112990215B (en) Image denoising method, device, equipment and storage medium
Richards et al. Deep Fake Face Detection using Convolutional Neural Networks
CN116049660B (en) Data processing method, apparatus, device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination