CN113128389A - Image processing method and device, computer readable storage medium and electronic device - Google Patents

Image processing method and device, computer readable storage medium and electronic device

Info

Publication number
CN113128389A
CN113128389A (application CN202110400357.9A)
Authority
CN
China
Prior art keywords: image, processed, key area, area mask, processing method
Prior art date
Legal status
Pending
Application number
CN202110400357.9A
Other languages
Chinese (zh)
Inventor
杨健榜
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110400357.9A
Publication of CN113128389A
Legal status: Pending


Classifications

    • G06T 5/77
    • G06F 18/25 Fusion techniques (Pattern recognition: Analysing)
    • G06T 5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06V 40/161 Detection; Localisation; Normalisation (Human faces)
    • G06V 40/168 Feature extraction; Face representation (Human faces)
    • G06T 2207/10004 Still image; Photographic image (Image acquisition modality)
    • G06T 2207/20221 Image fusion; Image merging (Image combination)
    • G06T 2207/30201 Face (Subject of image: Human being; Person)

Abstract

The disclosure provides an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device, and relates to the technical field of image processing. The image processing method includes: performing key point detection on an image to be processed to determine a key area mask map of the image to be processed; performing downsampling on the image to be processed in combination with the key area mask map to obtain an intermediate feature map; and performing upsampling on the intermediate feature map to obtain a restored image corresponding to the image to be processed. The present disclosure can improve the image restoration effect.

Description

Image processing method and device, computer readable storage medium and electronic device
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device.
Background
In the field of image processing technology, captured images may be unclear or distorted because of the scene, the capturing device, and other factors. In such cases, the image can be restored to improve its quality.
At present, image restoration algorithms generally suffer from insufficient detail expression and a poor restoration effect.
Disclosure of Invention
The present disclosure provides an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device, thereby overcoming, at least to some extent, the problem of poor image restoration effect.
According to a first aspect of the present disclosure, there is provided an image processing method including: performing key point detection on an image to be processed to determine a key area mask map of the image to be processed; performing downsampling on the image to be processed in combination with the key area mask map to obtain an intermediate feature map; and performing upsampling on the intermediate feature map to obtain a restored image corresponding to the image to be processed.
According to a second aspect of the present disclosure, there is provided an image processing apparatus comprising: a key point detection module configured to perform key point detection on an image to be processed to determine a key area mask map of the image to be processed; a downsampling module configured to perform downsampling on the image to be processed in combination with the key area mask map to obtain an intermediate feature map; and an upsampling module configured to perform upsampling on the intermediate feature map to obtain a restored image corresponding to the image to be processed.
According to a third aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image processing method described above.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising a processor; a memory for storing one or more programs which, when executed by the processor, cause the processor to implement the image processing method described above.
In the technical solutions provided by some embodiments of the present disclosure, key point detection is performed on an image to be processed to determine a key area mask map of the image to be processed; the image to be processed is downsampled in combination with the key area mask map to obtain an intermediate feature map; and the intermediate feature map is upsampled to obtain a restored image corresponding to the image to be processed. Because restoration incorporates the key area mask map, the details of the key areas can be recovered better, the expressiveness of detail is improved, and the overall image restoration effect benefits.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
FIG. 1 shows a schematic diagram of an exemplary system architecture for an image processing scheme to which embodiments of the present disclosure are applied;
FIG. 2 illustrates a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure;
FIG. 3 schematically shows a flow chart of an image processing method according to an exemplary embodiment of the present disclosure;
FIG. 4 shows a network architecture diagram for image processing of an embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of a network for generating a key area mask map according to an embodiment of the disclosure;
FIG. 6 shows a schematic structural diagram of a discriminator network of an embodiment of the present disclosure;
FIG. 7 is a diagram schematically illustrating a comparison of image restoration effects using the image processing method according to an exemplary embodiment of the present disclosure;
fig. 8 schematically shows a block diagram of an image processing apparatus according to an exemplary embodiment of the present disclosure;
fig. 9 schematically shows a block diagram of an image processing apparatus according to another exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the steps. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation. In addition, all of the following terms "first", "second", and "third" are used for distinguishing purposes only, and should not be construed as limiting the present disclosure.
FIG. 1 shows a schematic diagram of an exemplary system architecture for an image processing scheme of an embodiment of the present disclosure.
As shown in fig. 1, the system architecture may include a terminal device 1001 and a server 1002. The terminal device 1001 and the server 1002 may be connected via a network, and the connection type of the network may include, for example, a wired line, a wireless communication link, or an optical fiber cable.
It should be understood that the number of terminal devices 1001 and servers 1002 is merely illustrative. There may be any number of terminal devices and servers, as desired for implementation. For example, the server 1002 may be a server cluster composed of a plurality of servers, or the like. The server 1002 may also be referred to as a cloud or cloud server.
The terminal device 1001 may interact with the server 1002 through the network to receive or transmit messages and the like. Although fig. 1 illustrates a smartphone as an example, the terminal device 1001 may also be a tablet computer, a smart wearable device, a personal computer, or another device with a shooting function. The terminal device 1001 may also be referred to as a terminal, a mobile terminal, a smart terminal, and so on.
In the case where the terminal device 1001 executes the image processing process according to the exemplary embodiments of the present disclosure, the terminal device 1001 may first perform key point detection on an image to be processed to determine a key area mask map of the image to be processed, and then downsample the image to be processed in combination with the key area mask map to obtain an intermediate feature map; next, the terminal device 1001 may upsample the intermediate feature map to obtain a restored image corresponding to the image to be processed.
Taking a face image as the image to be processed as an example, the terminal device 1001 may determine the rectangular-box coordinates of the eyes, nose, mouth, and so on in the face image based on a key point detection algorithm, thereby forming a key area mask map. The terminal device 1001 may treat this mask as attention and incorporate it into the downsampling process; that is, the face image is downsampled in combination with the key area mask map to obtain an intermediate feature map. The terminal device 1001 then upsamples the intermediate feature map to obtain the restored face image.
In addition, the face image may itself be cropped from an original image. The present disclosure does not limit the source of the face image or the original image; for example, it may be captured by the terminal device 1001 through its camera module, or acquired from another device or a server, such as an old photograph.
In the case where the server 1002 executes the image processing procedure according to the exemplary embodiments of the present disclosure, the server 1002 may first acquire an image to be processed sent by the terminal device 1001, perform key point detection on it to determine its key area mask map, and then downsample the image to be processed in combination with the key area mask map to obtain an intermediate feature map; next, the server 1002 may upsample the intermediate feature map to obtain a restored image corresponding to the image to be processed.
After the server 1002 determines the restored image, it may send the restored image to the terminal device 1001, which saves and/or displays it.
Similarly, when the image to be processed is a face image, the server 1002 may generate a restored face image.
In this image restoration scheme, the key area mask is used as attention during restoration, so the details of the key areas can be recovered better, the expressiveness of detail is improved, and the overall restoration effect benefits. Taking a face image as an example, details of the eyes, nose, mouth, and so on can be restored better, improving the expressiveness of the image.
It should be noted that any step of processing the image in the scheme may be executed by the terminal device 1001 or the server 1002, and the present disclosure is not limited thereto.
FIG. 2 shows a schematic diagram of an electronic device suitable for use in implementing exemplary embodiments of the present disclosure. The terminal device of the exemplary embodiment of the present disclosure may be configured as in fig. 2. It should be noted that the electronic device shown in fig. 2 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
The electronic device of the present disclosure includes at least a processor and a memory for storing one or more programs, which when executed by the processor, cause the processor to implement the image processing method of the exemplary embodiments of the present disclosure.
Specifically, as shown in fig. 2, the electronic device 200 may include: a processor 210, an internal memory 221, an external memory interface 222, a Universal Serial Bus (USB) interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 271, a receiver 272, a microphone 273, an earphone interface 274, a sensor module 280, a display 290, a camera module 291, an indicator 292, a motor 293, a button 294, and a Subscriber Identity Module (SIM) card interface 295. The sensor module 280 may include a depth sensor, a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
It is to be understood that the illustrated structure of the embodiments of the present disclosure does not constitute a specific limitation to the electronic device 200. In other embodiments of the present disclosure, electronic device 200 may include more or fewer components than shown, or combine certain components, or split certain components, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 210 may include one or more processing units, such as: the Processor 210 may include an Application Processor (AP), a modem Processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband Processor, and/or a Neural Network Processor (NPU), and the like. The different processing units may be separate devices or may be integrated into one or more processors. Additionally, a memory may be provided in processor 210 for storing instructions and data.
The electronic device 200 may implement a shooting function through the ISP, the camera module 291, the video codec, the GPU, the display screen 290, the application processor, and the like. In some embodiments, the electronic device 200 may include 1 or N camera modules 291, where N is a positive integer greater than 1, and if the electronic device 200 includes N cameras, one of the N cameras is a main camera.
The images to be processed or the original images mentioned in the present disclosure can be captured by the camera module 291.
Internal memory 221 may be used to store computer-executable program code, including instructions. The internal memory 221 may include a program storage area and a data storage area. The external memory interface 222 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the electronic device 200.
The present disclosure also provides a computer-readable storage medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device.
A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable storage medium may transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The computer-readable storage medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method as described in the embodiments below.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
The following description is given taking as an example a case where a terminal device executes the image processing method of the exemplary embodiment of the present disclosure, and in this case, the image processing apparatus may be configured in the terminal device.
Fig. 3 schematically shows a flowchart of an image processing method of an exemplary embodiment of the present disclosure. Referring to fig. 3, the image processing method may include the following steps.
Step S32: perform key point detection on the image to be processed to determine a key area mask map of the image to be processed.
In the exemplary embodiments of the present disclosure, the image to be processed is any image to be restored. For example, the image to be processed may be a low-resolution, noisy, and/or blurred image. The present disclosure does not limit the content of the image to be processed; for example, it may contain a human face, animals and plants, or any other object.
The terminal device can acquire the image to be processed directly. In one embodiment, the terminal device may capture the image to be processed with its own camera module; for example, the camera may be pointed at a human face to obtain a face image to be processed. In another embodiment, the terminal device may obtain the image to be processed from its own album, another device, or a server, for example an old photograph obtained from a server.
When the image to be processed is a face image, the terminal device may also acquire an original image and crop the image to be processed from it.
First, the terminal device captures an original image with its camera module or acquires one from another device. Next, the terminal device performs an alignment operation on the original image using standard face data and determines the region of the image to be processed on the original image. The standard face data are the pre-configured coordinates of a standard face, for example standard eye, nose, and mouth coordinates. These coordinates are used to align the original image with the standard face: the eyes in the original image are brought into correspondence with the eyes of the standard face, the nose with the nose, and the mouth with the mouth.
The terminal device then crops the original image based on the region of the image to be processed, obtaining the image to be processed and a background image consisting of the part of the original image outside the image to be processed.
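As an illustration of this alignment-and-crop step, the following sketch uses OpenCV with five-point landmarks. The template coordinates, crop size, and helper names are assumptions of the example, not values given in this disclosure; paste_back also previews the background-splicing step described later.

```python
import cv2
import numpy as np

# Illustrative "standard face" coordinates for a 512x512 crop: left eye,
# right eye, nose tip, left and right mouth corners. These template values
# are an assumption, not numbers from this disclosure.
STANDARD_FACE = np.float32([
    [192, 230], [320, 230], [256, 300], [206, 370], [306, 370],
])

def align_and_crop(original, landmarks, size=512):
    """Align a detected face to the standard-face template and crop it.
    `landmarks` holds the five corresponding points detected in `original`.
    The affine matrix is returned so the restored face can be warped back."""
    m, _ = cv2.estimateAffinePartial2D(np.float32(landmarks), STANDARD_FACE)
    face = cv2.warpAffine(original, m, (size, size))
    return face, m

def paste_back(original, restored_face, m):
    """Inverse-warp the restored crop and composite it over the background."""
    inv = cv2.invertAffineTransform(m)
    h, w = original.shape[:2]
    warped = cv2.warpAffine(restored_face, inv, (w, h))
    region = cv2.warpAffine(
        np.full(restored_face.shape[:2], 255, np.uint8), inv, (w, h))
    out = original.copy()
    out[region > 0] = warped[region > 0]
    return out
```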
It should be noted that the original image may contain more than one image to be processed. In a face-restoration scenario, if the original image contains no face image, the terminal device may prompt the user accordingly.
After determining the image to be processed, the terminal device may perform key point detection on it to determine the key area mask map of the image to be processed. The key area mask map can be understood as a single-channel feature map containing the coordinate information of the key areas.
In the exemplary embodiments of the present disclosure, the mask map may also simply be called a Mask. Still taking a face image as an example, when the key areas are defined as the eyes, nose, and mouth, the determined key area mask map contains the rectangular-box coordinates of the eyes, nose, and mouth in the face, and the remaining area of the image to be processed is treated as background.
In addition, it should be noted that the key areas can be set flexibly; the present disclosure does not limit them.
Specifically, the image to be processed may be processed with a key point detection network to obtain the key area mask map. The present disclosure does not limit the specific structure of the key point detection network; for example, it may be built on cascaded shape regression, on deep learning, or on a parametric shape model.
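A minimal sketch of turning detected key points into such a mask map follows. The per-part grouping of the landmarks and the box padding are assumptions of the example, not requirements of this disclosure.

```python
import numpy as np

def key_area_mask(landmarks_by_part, height, width, pad=4):
    """Build a single-channel mask that is 1 inside the rectangular box of
    each key area (e.g. eyes, nose, mouth) and 0 over the background.
    `landmarks_by_part` maps a part name to an (N, 2) array of (x, y)
    key points belonging to that part."""
    mask = np.zeros((height, width), dtype=np.float32)
    for pts in landmarks_by_part.values():
        pts = np.asarray(pts, dtype=np.float32)
        x0, y0 = pts.min(axis=0) - pad
        x1, y1 = pts.max(axis=0) + pad
        x0, y0 = max(int(x0), 0), max(int(y0), 0)
        x1, y1 = min(int(x1), width), min(int(y1), height)
        mask[y0:y1, x0:x1] = 1.0   # rectangular key-area box
    return mask
```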
Step S34: perform downsampling on the image to be processed in combination with the key area mask map to obtain an intermediate feature map.
In the exemplary embodiments of the present disclosure, the downsampling processing may consist of one or more downsampling operations.
If only one downsampling operation is performed, the feature map it produces may be fused with the key area mask map determined in step S32 to obtain the intermediate feature map. Specifically, the fusion is a dot-product operation between the feature map and the key area mask map.
If the downsampling processing consists of multiple downsampling operations, they are executed sequentially in cascade.
Specifically, the feature map obtained by the i-th downsampling operation may be fused with the key area mask map, and the (i+1)-th downsampling operation is then performed on the fused feature map, until all downsampling operations have been executed, yielding the intermediate feature map. Here i is a positive integer. That is, the result of every downsampling operation is fused with the key area mask map before the next downsampling operation runs.
In this way, downsampling proceeds with the key areas acting as attention, which helps restore the details of those areas.
As before, the fusion is a dot-product operation. For the i-th fusion, the terminal device may compare the size of the key area mask map with the size of the feature map obtained by the i-th downsampling operation.
If the two sizes match, i.e. the key area mask map and the i-th feature map have the same spatial size, for example both 512 × 512, they can be fused directly.
If the sizes do not match, the key area mask map is first resized to match the feature map obtained by the i-th downsampling operation, and that feature map is then fused with the resized key area mask map.
For example, if the key area mask map is 512 × 512 and the feature map obtained by the i-th downsampling operation is 256 × 256, the mask map may be scaled to 1/2 of its size before the dot product is computed, finally yielding the intermediate feature map.
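The per-scale fusion can be sketched as below in PyTorch; the interpolation mode is an assumption, since this disclosure only requires that the mask be resized to the feature map before the dot product.

```python
import torch
import torch.nn.functional as F

def fuse_with_mask(feat, mask):
    """Element-wise (dot) product of a feature map with the key-area mask.
    feat: (B, C, H, W) feature map from the i-th downsampling operation.
    mask: (B, 1, H0, W0) key-area mask; resized first when the spatial
    sizes differ, e.g. a 512x512 mask is scaled to 256x256."""
    if mask.shape[-2:] != feat.shape[-2:]:
        mask = F.interpolate(mask, size=feat.shape[-2:], mode='nearest')
    return feat * mask  # broadcasts over the channel dimension
```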
Step S36: perform upsampling on the intermediate feature map to obtain a restored image corresponding to the image to be processed.
After the intermediate feature map is determined, the terminal device may upsample it to obtain the restored image corresponding to the image to be processed. Taking a face image to be processed as an example, step S36 yields the restored face image.
If the image to be processed was cropped from an original image, then after the restored image is obtained it can be stitched together with the background image (the part of the original image outside the image to be processed) to obtain an image, corresponding to the original image, in which the face region is restored.
According to some embodiments of the disclosure, the terminal device may restore the image to be processed through a trained key point detection network and the trained generator network of a generative adversarial network, obtaining the restored image corresponding to the image to be processed. The generator network includes an encoder and a decoder.
It should be noted that the training process of the network model may be performed by the terminal device. In addition, the training process may also be executed by a server or other devices, and the terminal device may directly obtain the trained network model to execute the exemplary scheme of the present disclosure.
These embodiments will be described below with reference to fig. 4.
As shown in fig. 4, the image to be processed may be input to the key point detection network, which performs key point detection on it to determine the key area mask map of the image to be processed.
In addition, the image to be processed is also input to the encoder of the generator network, and the encoder downsamples it in combination with the key area mask map determined by the key point detection network to obtain the intermediate feature map.
Fig. 5 shows a schematic structural diagram of an encoder according to an embodiment of the present disclosure. As shown in fig. 5, the key area mask map of the face and the feature maps extracted by the encoder may be fused by dot product at several different scales.
Next, the intermediate feature map is input to the decoder of the generator network, which upsamples it to obtain the restored image corresponding to the image to be processed. Still referring to fig. 5, encoder features can be passed to the decoder at the same scale through skip connections: the generator network adds a feature map produced by the encoder's downsampling to the decoder feature map of the same scale, and then produces the restored image.
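A compact PyTorch sketch of such a generator follows. The depth, channel widths, and activation choices are illustrative assumptions; only the mask fusion after each downsampling step and the same-scale additive skip connections come from the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskGuidedGenerator(nn.Module):
    """Encoder-decoder in the spirit of Figs. 4 and 5: the key-area mask
    is multiplied into the features after every downsampling step, and
    encoder features are added back to the decoder at the same scale."""

    def __init__(self, ch=(64, 128, 256)):
        super().__init__()
        self.downs, prev = nn.ModuleList(), 3
        for c in ch:
            self.downs.append(nn.Sequential(
                nn.Conv2d(prev, c, 3, stride=2, padding=1),
                nn.LeakyReLU(0.2, inplace=True)))
            prev = c
        ups_out = list(ch[::-1][1:]) + [ch[0]]      # (128, 64, 64)
        self.ups = nn.ModuleList(
            nn.Sequential(
                nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1),
                nn.LeakyReLU(0.2, inplace=True))
            for c_in, c_out in zip(ch[::-1], ups_out))
        self.to_rgb = nn.Conv2d(ch[0], 3, 3, padding=1)

    def forward(self, x, mask):
        skips = []
        for down in self.downs:                     # encoder
            x = down(x)
            m = F.interpolate(mask, size=x.shape[-2:], mode='nearest')
            x = x * m                               # fuse mask at this scale
            skips.append(x)
        for up, skip in zip(self.ups[:-1], skips[-2::-1]):
            x = up(x) + skip                        # same-scale skip add
        x = self.ups[-1](x)                         # back to input size
        return torch.tanh(self.to_rgb(x))

# usage: out = MaskGuidedGenerator()(torch.randn(1, 3, 512, 512),
#                                    torch.ones(1, 1, 512, 512))
```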
In the training phase, the parameters of the generative adversarial network are updated in combination with the output of the discriminator network.
Fig. 6 schematically shows the structure of a discriminator network according to an embodiment of the present disclosure. As shown in fig. 6, the discriminator network applies, for example, 32× downsampling and outputs a probability feature map whose elements are probability values between 0 and 1; the probability feature map is then averaged to obtain a "true" or "false" discrimination result.
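A sketch consistent with that description is given below; the number of stages follows from the 32× downsampling, while the channel widths and kernel sizes are assumptions.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Five stride-2 convolutions give 2**5 = 32x downsampling; each
    element of the output map is a probability in (0, 1), and the mean
    over the map is the final real/fake score, as described for Fig. 6."""

    def __init__(self, widths=(64, 128, 256, 512, 512)):
        super().__init__()
        layers, prev = [], 3
        for w in widths:
            layers += [nn.Conv2d(prev, w, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            prev = w
        layers += [nn.Conv2d(prev, 1, 3, padding=1), nn.Sigmoid()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        prob_map = self.net(x)                 # (B, 1, H/32, W/32)
        return prob_map.mean(dim=(1, 2, 3))    # averaged score per image
```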
The training process of the generative adversarial network used by the present disclosure is described below, taking one set of training samples as an example.
First, the terminal device acquires a first sample image, i.e. an image of relatively high quality. The terminal device may apply image degradation to the first sample image to obtain a second sample image whose quality is lower than that of the first. The degradation may include, for example, adding noise and blurring the image.
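One way to realize the degradation is sketched below with OpenCV; the concrete operations and parameters are assumptions, since this disclosure only names noise addition and blurring as examples.

```python
import cv2
import numpy as np

def degrade(img, noise_sigma=8.0, blur_ksize=5, down=2):
    """Produce a low-quality second sample from a high-quality first
    sample: blur, downscale-and-upscale to lose detail, add noise."""
    h, w = img.shape[:2]
    out = cv2.GaussianBlur(img, (blur_ksize, blur_ksize), 0)
    out = cv2.resize(out, (w // down, h // down))
    out = cv2.resize(out, (w, h), interpolation=cv2.INTER_LINEAR)
    noise = np.random.normal(0.0, noise_sigma, out.shape)
    return np.clip(out.astype(np.float32) + noise, 0, 255).astype(np.uint8)
```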
Next, the first sample image is input to the key point detection network to obtain the key area mask map of the first sample image. (It is readily understood that the first sample image could instead be annotated manually.)
The second sample image is input to the encoder, which downsamples it in combination with the key area mask map of the first sample image to obtain an intermediate sample feature map.
The intermediate sample feature map is then input to the decoder to obtain a third sample image; the third sample image is input to the discriminator network of the generative adversarial network, and the generative adversarial network is trained in combination with the discriminator's output.
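Putting the pieces together, one training step on such a sample pair might look as follows; the optimizer setup and the binary-cross-entropy targets are assumptions, and loss_fn stands for the combined loss defined next.

```python
import torch

def train_step(g, d, loss_fn, g_opt, d_opt, first, second, mask):
    """One adversarial update on a (first, second) sample pair, where
    `first` is the high-quality image, `second` its degraded version,
    and `mask` the key-area mask map of `first`."""
    bce = torch.nn.BCELoss()
    third = g(second, mask)                       # third sample image

    # Discriminator: score the real image toward 1, the generated toward 0.
    d_opt.zero_grad()
    real_t = torch.ones(first.size(0))
    fake_t = torch.zeros(first.size(0))
    d_loss = bce(d(first), real_t) + bce(d(third.detach()), fake_t)
    d_loss.backward()
    d_opt.step()

    # Generator: combined adversarial / pixel / perceptual loss.
    g_opt.zero_grad()
    g_loss = loss_fn(third, first, d(third))
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```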
In the exemplary embodiments of the present disclosure, the following loss function is configured in order to improve the restoration effect:
L_total = α · L_gan + β · L_pixel + λ · L_perceptual

where L_total is the total loss; L_gan is the adversarial loss on the distribution of image pixel values, ensuring that the image produced by the generator network has a pixel-value distribution similar to that of the corresponding high-quality image; L_pixel is the L1-norm loss between the generated image and the high-quality image, ensuring that the two are similar at the pixel level; and L_perceptual is the perceptual loss, computed as the L2-norm distance between the features of the generated image and of the corresponding high-quality image on a VGG-19 network pre-trained on the ImageNet dataset. α, β, and λ are the weights of the respective terms.
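The loss can be sketched as below. The VGG-19 feature layer, the weight values, and the non-saturating form of the adversarial term are assumptions; the disclosure fixes only the three-term structure of L_total.

```python
import torch
import torch.nn as nn
from torchvision import models

class RestorationLoss(nn.Module):
    """L_total = alpha*L_gan + beta*L_pixel + lambda*L_perceptual."""

    def __init__(self, alpha=0.1, beta=1.0, lam=1.0, vgg_layer=26):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
        self.vgg = vgg.features[:vgg_layer].eval()   # fixed feature extractor
        for p in self.vgg.parameters():
            p.requires_grad_(False)
        self.alpha, self.beta, self.lam = alpha, beta, lam

    def forward(self, fake, real, d_fake_score):
        # Push the discriminator's score on the generated image toward 1.
        l_gan = -torch.log(d_fake_score + 1e-8).mean()
        l_pixel = (fake - real).abs().mean()                 # L1 norm
        feat_f, feat_r = self.vgg(fake), self.vgg(real)      # VGG-19 features
        l_percep = ((feat_f - feat_r) ** 2).mean()           # L2 distance
        return self.alpha * l_gan + self.beta * l_pixel + self.lam * l_percep
```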
To verify the effect of the image processing method, 358 low-quality portraits were restored. The restoration effect is good, with details such as eyes, mouths, and teeth recovered well. Fig. 7 compares the results of restoring three low-quality portraits with the image processing method of the present disclosure; the restored images are clearer and express facial detail more strongly.
In addition, the BRISQUE image quality assessment algorithm was used to score the 358 images before and after restoration in batch. To evaluate the restoration effect at different resolutions, the evaluation was run at two resolutions; the scores are shown in Table 1:
TABLE 1

Image size | Original data set | Restored data set
512 × 512  | 57.6              | 49.36
256 × 256  | 42.186            | 37.45
A smaller score indicates better image quality; the quality of the restored images is therefore clearly improved by the image processing procedure of the present disclosure.
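For reference, batch scoring of this kind can be sketched as below, assuming the third-party piq package (PyTorch Image Quality); the disclosure names only the BRISQUE algorithm, not any particular implementation.

```python
import torch
import torch.nn.functional as F
import piq  # assumed third-party implementation of BRISQUE

def brisque_score(images, size):
    """Average BRISQUE score of a batch at the given evaluation size.
    `images` is a float tensor (B, 3, H, W) scaled to [0, 1]; a lower
    score indicates better perceptual quality."""
    resized = F.interpolate(images, size=(size, size),
                            mode='bilinear', align_corners=False)
    return piq.brisque(resized, data_range=1.0).item()
```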
It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Further, an image processing apparatus is also provided in the present exemplary embodiment.
Fig. 8 schematically shows a block diagram of an image processing apparatus of an exemplary embodiment of the present disclosure. Referring to fig. 8, the image processing apparatus 8 according to an exemplary embodiment of the present disclosure may include a keypoint detection module 81, a downsampling module 83, and an upsampling module 85.
Specifically, the key point detection module 81 may be configured to perform key point detection on the image to be processed to determine a key area mask map of the image to be processed; the downsampling module 83 may be configured to downsample the image to be processed in combination with the key area mask map to obtain an intermediate feature map; and the upsampling module 85 may be configured to upsample the intermediate feature map to obtain a restored image corresponding to the image to be processed.
According to an exemplary embodiment of the present disclosure, the downsampling processing includes multiple downsampling operations. In this case, the downsampling module 83 may be configured to: fuse the feature map obtained by the i-th downsampling operation with the key area mask map, and perform the (i+1)-th downsampling operation on the fused feature map, until all downsampling operations have been executed, to obtain the intermediate feature map, where i is a positive integer.
According to an exemplary embodiment of the present disclosure, when fusing the feature map obtained by the i-th downsampling operation with the key area mask map, the downsampling module 83 may be configured to: compare the size of the key area mask map with the size of the feature map obtained by the i-th downsampling operation; and if the sizes do not match, resize the key area mask map to match that feature map and fuse the feature map with the resized key area mask map.
According to an exemplary embodiment of the present disclosure, the image to be processed is a face image. In this case, the key point detection module 81 may be further configured to: acquire an original image; perform an alignment operation on the original image using standard face data to determine the region of the image to be processed on the original image; and crop the original image based on that region to obtain the image to be processed and a background image of the original image outside the image to be processed.
According to an exemplary embodiment of the present disclosure, the upsampling module 85 may be further configured to: after the restored image corresponding to the image to be processed is obtained, stitch the restored image together with the background image to obtain an image, corresponding to the original image, in which the face region is restored.
According to an exemplary embodiment of the present disclosure, the image to be processed is restored through the trained key point detection network and the trained generator network of a generative adversarial network, obtaining the restored image corresponding to the image to be processed; the generator network includes an encoder and a decoder. In this case, the key point detection network performs key point detection on the image to be processed to determine its key area mask map; the encoder downsamples the image to be processed in combination with the key area mask map determined by the key point detection network to obtain the intermediate feature map; and the decoder upsamples the intermediate feature map to obtain the restored image corresponding to the image to be processed.
It will be appreciated that the keypoint detection module 81 may be configured to include a keypoint detection network, the downsampling module 83 may be configured to include an encoder, and the upsampling module 85 may be configured to include a decoder.
According to an exemplary embodiment of the present disclosure, referring to fig. 9, the image processing apparatus 9 may further include a network training module 91 in addition to the modules of the image processing apparatus 8.
Specifically, the network training module 91 may be configured to: apply image degradation to a first sample image to obtain a second sample image; input the first sample image into the key point detection network to obtain a key area mask map of the first sample image; input the second sample image into the encoder and downsample it with the encoder in combination with the key area mask map of the first sample image to obtain an intermediate sample feature map; input the intermediate sample feature map into the decoder to obtain a third sample image; and input the third sample image into the discriminator network of the generative adversarial network and train the generative adversarial network in combination with the output result of the discriminator network.
Since each functional module of the image processing apparatus according to the embodiment of the present disclosure is the same as that in the embodiment of the method described above, it is not described herein again.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims (10)

1. An image processing method, comprising:
performing key point detection on an image to be processed to determine a key area mask map of the image to be processed;
performing downsampling on the image to be processed in combination with the key area mask map to obtain an intermediate feature map;
and performing upsampling on the intermediate feature map to obtain a restored image corresponding to the image to be processed.
2. The image processing method according to claim 1, wherein the downsampling processing includes multiple downsampling operations, and performing downsampling on the image to be processed in combination with the key area mask map to obtain the intermediate feature map comprises:
fusing the feature map obtained by the i-th downsampling operation with the key area mask map, and performing the (i+1)-th downsampling operation on the fused feature map, until all of the downsampling operations have been executed, to obtain the intermediate feature map;
wherein i is a positive integer.
3. The image processing method according to claim 2, wherein fusing the feature map obtained by the i-th downsampling operation with the key area mask map comprises:
comparing the size of the key area mask map with the size of the feature map obtained by the i-th downsampling operation;
and if the size of the key area mask map does not match the size of the feature map obtained by the i-th downsampling operation, resizing the key area mask map to match the size of that feature map, and fusing the feature map obtained by the i-th downsampling operation with the resized key area mask map.
4. The image processing method according to claim 1, wherein the image to be processed is a face image, and the image processing method further comprises:
acquiring an original image;
performing an alignment operation on the original image using standard face data to determine the region of the image to be processed on the original image;
and cropping the original image based on the region of the image to be processed on the original image to obtain the image to be processed and a background image of the original image outside the image to be processed.
5. The image processing method according to claim 4, wherein, after the restored image corresponding to the image to be processed is obtained, the image processing method further comprises:
stitching the restored image together with the background image to obtain an image, corresponding to the original image, in which the face region is restored.
6. The image processing method according to any one of claims 1 to 5, wherein the image to be processed is restored through a trained key point detection network and a trained generator network of a generative adversarial network to obtain the restored image corresponding to the image to be processed, the generator network comprising an encoder and a decoder;
the key point detection network performs key point detection on the image to be processed to determine the key area mask map of the image to be processed;
the encoder performs downsampling on the image to be processed in combination with the key area mask map determined by the key point detection network to obtain the intermediate feature map;
and the decoder performs upsampling on the intermediate feature map to obtain the restored image corresponding to the image to be processed.
7. The image processing method according to claim 6, further comprising:
performing image degradation on a first sample image to obtain a second sample image;
inputting the first sample image into the key point detection network to obtain a key area mask map of the first sample image;
inputting the second sample image into the encoder, and performing downsampling on the second sample image with the encoder in combination with the key area mask map of the first sample image to obtain an intermediate sample feature map;
inputting the intermediate sample feature map into the decoder to obtain a third sample image;
and inputting the third sample image into a discriminator network of the generative adversarial network, and training the generative adversarial network in combination with an output result of the discriminator network.
8. An image processing apparatus, comprising:
a key point detection module configured to perform key point detection on an image to be processed to determine a key area mask map of the image to be processed;
a downsampling module configured to perform downsampling on the image to be processed in combination with the key area mask map to obtain an intermediate feature map;
and an upsampling module configured to perform upsampling on the intermediate feature map to obtain a restored image corresponding to the image to be processed.
9. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the image processing method of any one of claims 1 to 7.
10. An electronic device, comprising:
a processor;
a memory for storing one or more programs which, when executed by the processor, cause the processor to implement the image processing method of any one of claims 1 to 7.
CN202110400357.9A 2021-04-14 2021-04-14 Image processing method and device, computer readable storage medium and electronic device Pending CN113128389A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110400357.9A CN113128389A (en) 2021-04-14 2021-04-14 Image processing method and device, computer readable storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110400357.9A CN113128389A (en) 2021-04-14 2021-04-14 Image processing method and device, computer readable storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN113128389A 2021-07-16

Family

ID=76776281

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110400357.9A Pending CN113128389A (en) 2021-04-14 2021-04-14 Image processing method and device, computer readable storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN113128389A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190378247A1 (en) * 2018-06-07 2019-12-12 Beijing Kuangshi Technology Co., Ltd. Image processing method, electronic device and non-transitory computer-readable recording medium
CN111507914A (en) * 2020-04-10 2020-08-07 北京百度网讯科技有限公司 Training method, repairing method, device, equipment and medium of face repairing model
CN111553864A (en) * 2020-04-30 2020-08-18 深圳市商汤科技有限公司 Image restoration method and device, electronic equipment and storage medium



Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination