CN114299089A - Image processing method, image processing device, electronic equipment and storage medium


Info

Publication number
CN114299089A
CN114299089A (application CN202111614202.1A)
Authority
CN
China
Prior art keywords: region of interest, image processing, attribute information, image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111614202.1A
Other languages
Chinese (zh)
Inventor
蔡晓霞
丁予康
戴宇荣
闻兴
徐宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202111614202.1A priority Critical patent/CN114299089A/en
Publication of CN114299089A publication Critical patent/CN114299089A/en
Pending legal-status Critical Current

Abstract

The present disclosure relates to an image processing method, an image processing apparatus, an electronic device, and a storage medium. The image processing method includes: acquiring an input image; performing region-of-interest detection in the input image, extracting the region of interest from the input image, and performing first image processing on the extracted region of interest, the first image processing being region-of-interest restoration processing for reducing distortion of the region of interest; acquiring attribute information related to the region of interest, wherein the attribute information includes at least one of the following: first attribute information of the region of interest itself and second attribute information of an occlusion region related to the region of interest; and performing second image processing on the region of interest on which the first image processing has been performed, based on the acquired attribute information, to obtain an output image, wherein the second image processing is used for improving the image quality of the region of interest after the first image processing.

Description

Image processing method, image processing device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of signal processing, and in particular, to an image processing method and apparatus, an electronic device, and a storage medium.
Background
Image content in circulation is often forwarded and edited many times, so its overall quality is relatively low; in particular, if the quality of the region of interest that human eyes focus on is degraded, the user's perception of the image content is greatly affected. For example, face information is the information that draws the viewer's attention in video consumption scenarios. In short-video content in particular, human faces carry greater weight and users are more sensitive to face information, so the quality of face information in short-video scenarios requires special attention. Short video content is often forwarded and edited many times, and the overall video picture quality is low; the face region often suffers from various problems such as noise, blur, and coding distortion. A region-of-interest restoration technique can restore the region of interest to a certain extent, for example, a region of interest such as the facial features that human eyes focus on. However, since the existing region-of-interest restoration technique only processes the severely distorted region of interest, that is, it only focuses on reducing the distortion of the region of interest itself, the restored region of interest may be poorly consistent with the background region, or an occlusion region related to the region of interest that should not be processed may be processed, so that high image processing quality still cannot be obtained. For example, when background noise is heavy, inter-frame flicker or discontinuity and color shift may occur between a region of interest (e.g., a face region) and the nearby background region, and when the region of interest is occluded, the occluded region is prone to distortion.
Disclosure of Invention
The present disclosure provides an image processing method, an image processing apparatus, an electronic device, and a storage medium, so as to at least solve the problem in the related art that high image processing quality still cannot be obtained because only the distortion of the image itself is processed.
According to a first aspect of embodiments of the present disclosure, there is provided an image processing method including: acquiring an input image; performing region-of-interest detection in the input image, extracting the region of interest from the input image, and performing first image processing on the extracted region of interest, wherein the first image processing is region-of-interest restoration processing for reducing distortion of the region of interest; acquiring attribute information related to the region of interest, wherein the attribute information includes at least one of the following: first attribute information of the region of interest itself and second attribute information of an occlusion region related to the region of interest; and performing second image processing on the region of interest on which the first image processing has been performed, based on the acquired attribute information, to obtain an output image, wherein the second image processing is used for improving the image quality of the region of interest after the first image processing.
Optionally, the performing, based on the acquired attribute information, second image processing on the region of interest after the performing of the first image processing includes performing one of the following operations: performing color correction on the region of interest on which the first image processing has been performed, based on the first attribute information; fusing the region of interest on which the first image processing has been performed with the input image based on the first attribute information and/or the second attribute information; color correction is performed on the region of interest after the first image processing has been performed based on the first attribute information, and the region of interest after the color correction has been performed is fused with the input image based on the first attribute information and/or the second attribute information.
Optionally, the acquiring attribute information related to the region of interest includes: acquiring first attribute information of the region of interest based on the extracted region of interest; and/or performing occlusion detection in the input image and acquiring second attribute information of the occlusion region based on an occlusion detection result and a region-of-interest detection result.
Optionally, the obtaining first attribute information of the region of interest based on the extracted region of interest includes: acquiring position information of each sub-region of the region of interest based on the extracted region of interest; and acquiring the first attribute information according to the acquired position information of each sub-region.
Optionally, the performing occlusion detection in the input image and acquiring second attribute information of the occlusion region based on an occlusion detection result and a region-of-interest detection result includes: performing occlusion detection in the input image to obtain position information of all occlusion regions in the input image; acquiring the second attribute information based on the position information of all the occlusion regions and the position information of the region of interest.
Optionally, the obtaining the second attribute information based on the position information of all the occlusion regions and the position information of the region of interest includes: obtaining a first mask map of all the occlusion regions based on the position information of all the occlusion regions; acquiring position information of a template of the region of interest; and obtaining a second mask map of the occlusion region related to the region of interest as the second attribute information, based on the first mask map, the region-of-interest template position information, and the position information of the region of interest.
Optionally, the performing color correction on the region of interest after the first image processing is performed based on the first attribute information includes: and respectively performing color correction on each sub-area of the region of interest after the first image processing is performed based on the first attribute information.
Optionally, the fusing the region of interest after the first image processing is performed with the input image based on the first attribute information and/or the second attribute information includes: obtaining a third mask map of the region of interest based on the first attribute information; fusing the region of interest on which the first image processing has been performed with the input image based on the second mask map and the third mask map.
Optionally, the obtaining a third mask map of the region of interest based on the first attribute information includes: obtaining a fourth mask map of the region of interest based on the first attribute information; acquiring the position information of the template of the region of interest; and obtaining the third mask map based on the fourth mask map, the region-of-interest template position information, and the position information of the region of interest.
Optionally, the fusing the region of interest on which the first image processing has been performed with the input image based on the second mask map and the third mask map includes: fusing the second mask map and the third mask map to obtain a fused mask map; and fusing the region of interest on which the first image processing has been performed with the input image based on the fused mask map.
Optionally, the fusing the region of interest after the first image processing is performed with the input image based on the first attribute information and/or the second attribute information includes: obtaining a third mask map of the region of interest based on the first attribute information; fusing the region of interest after the first image processing has been performed with the input image based on the third mask map.
Optionally, the fusing the region of interest after the first image processing is performed with the input image based on the first attribute information and/or the second attribute information includes: and fusing the region of interest subjected to the first image processing with the input image based on the second mask map.
Optionally, the performing, based on the first attribute information, color correction on each sub-region of the region of interest on which the first image processing has been performed includes: estimating color correction parameters of the sub-regions of the region of interest, respectively, based on the first attribute information; and performing color correction on each sub-region using its estimated color correction parameters.
According to a second aspect of embodiments of the present disclosure, there is provided an image processing apparatus including: an image acquisition unit configured to acquire an input image; and an image processing unit configured to: perform region-of-interest detection in the input image, extract the region of interest from the input image, and perform first image processing on the extracted region of interest, wherein the first image processing is region-of-interest restoration processing for reducing distortion of the region of interest; acquire attribute information related to the region of interest, wherein the attribute information includes at least one of the following: first attribute information of the region of interest itself and second attribute information of an occlusion region related to the region of interest; and perform second image processing on the region of interest on which the first image processing has been performed, based on the acquired attribute information, to obtain an output image, wherein the second image processing is used for improving the image quality of the region of interest after the first image processing.
Optionally, the performing, based on the acquired attribute information, second image processing on the region of interest after the performing of the first image processing includes performing one of the following operations: performing color correction on the region of interest on which the first image processing has been performed, based on the first attribute information; fusing the region of interest on which the first image processing has been performed with the input image based on the first attribute information and/or the second attribute information; color correction is performed on the region of interest after the first image processing has been performed based on the first attribute information, and the region of interest after the color correction has been performed is fused with the input image based on the first attribute information and/or the second attribute information.
Optionally, the acquiring attribute information related to the region of interest includes: acquiring first attribute information of the region of interest based on the extracted region of interest; and/or performing occlusion detection in the input image and acquiring second attribute information of the occlusion region based on an occlusion detection result and a region-of-interest detection result.
Optionally, the obtaining first attribute information of the region of interest based on the extracted region of interest includes: acquiring position information of each sub-region of the region of interest based on the extracted region of interest; and acquiring the first attribute information according to the acquired position information of each sub-region.
Optionally, the performing occlusion detection in the input image and acquiring second attribute information of the occlusion region based on an occlusion detection result and a region-of-interest detection result includes: performing occlusion detection in the input image to obtain position information of all occlusion regions in the input image; acquiring the second attribute information based on the position information of all the occlusion regions and the position information of the region of interest.
Optionally, the obtaining the second attribute information based on the position information of all the occlusion regions and the position information of the region of interest includes: obtaining a first mask map of all the occlusion regions based on the position information of all the occlusion regions; acquiring position information of a template of the region of interest; and obtaining a second mask map of the occlusion region related to the region of interest as the second attribute information, based on the first mask map, the region-of-interest template position information, and the position information of the region of interest.
Optionally, the performing color correction on the region of interest after the first image processing is performed based on the first attribute information includes: and respectively performing color correction on each sub-area of the region of interest after the first image processing is performed based on the first attribute information.
Optionally, the fusing the region of interest after the first image processing is performed with the input image based on the first attribute information and/or the second attribute information includes: obtaining a third mask map of the region of interest based on the first attribute information; fusing the region of interest on which the first image processing has been performed with the input image based on the second mask map and the third mask map.
Optionally, the obtaining a third mask map of the region of interest based on the first attribute information includes: obtaining a fourth mask map of the region of interest based on the first attribute information; acquiring the position information of the template of the region of interest; and obtaining the third mask map based on the fourth mask map, the region-of-interest template position information, and the position information of the region of interest.
Optionally, the fusing the region of interest on which the first image processing has been performed with the input image based on the second mask map and the third mask map includes: fusing the second mask map and the third mask map to obtain a fused mask map; and fusing the region of interest on which the first image processing has been performed with the input image based on the fused mask map.
Optionally, the fusing the region of interest after the first image processing is performed with the input image based on the first attribute information and/or the second attribute information includes: obtaining a third mask map of the region of interest based on the first attribute information; fusing the region of interest after the first image processing has been performed with the input image based on the third mask map.
Optionally, the fusing the region of interest after the first image processing is performed with the input image based on the first attribute information and/or the second attribute information includes: and fusing the region of interest subjected to the first image processing with the input image based on the second mask map.
Optionally, the performing, based on the first attribute information, color correction on each sub-region of the region of interest on which the first image processing has been performed includes: estimating color correction parameters of the sub-regions of the region of interest, respectively, based on the first attribute information; and performing color correction on each sub-region using its estimated color correction parameters.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus, including: at least one processor; at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the image processing method as described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions, which when executed by at least one processor, cause the at least one processor to perform the image processing method as described above.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer instructions, characterized in that the computer instructions, when executed by a processor, implement the image processing method as described above.
The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects. According to the image processing method of the embodiments of the present disclosure, after region-of-interest restoration processing for reducing distortion is performed on the extracted region of interest, attribute information about the region of interest is acquired, and second image processing for improving the image quality of the region of interest on which the first image processing has been performed is performed based on the acquired attribute information (including at least one of first attribute information of the region of interest itself and second attribute information of an occlusion region related to the region of interest). By using the first attribute information of the region of interest when performing the second image processing, the problem of poor consistency between the restored region of interest and the background region is alleviated; in addition, by taking the second attribute information of the occlusion region related to the region of interest into account when performing the second image processing, the occlusion region is prevented from being processed and thereby distorted. As a result, an output image of higher quality can be obtained.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is an exemplary system architecture to which exemplary embodiments of the present disclosure may be applied;
FIG. 2 is a flow chart of an image processing method of an exemplary embodiment of the present disclosure;
fig. 3 is a schematic diagram of an example of an image processing method according to an exemplary embodiment of the present disclosure;
FIG. 4 is a diagram illustrating an operation of generating an occlusion region mask diagram shown in FIG. 3;
FIG. 5 is a schematic diagram illustrating the color correction operation shown in FIG. 3;
fig. 6 is a diagram illustrating an operation of generating a face region mask map shown in fig. 3;
FIG. 7 is a schematic diagram illustrating the face region backfill fusion operation shown in FIG. 3;
fig. 8 is a block diagram showing an image processing apparatus of an exemplary embodiment of the present disclosure;
fig. 9 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The embodiments described in the following examples do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
It should be noted that the expression "at least one of the items" in the present disclosure covers three parallel cases: "any one of the items", "a combination of any plurality of the items", and "all of the items". For example, "including at least one of A and B" covers the following three parallel cases: (1) including A; (2) including B; (3) including A and B. As another example, "performing at least one of step one and step two" covers the following three parallel cases: (1) performing step one; (2) performing step two; (3) performing step one and step two.
Fig. 1 illustrates an exemplary system architecture 100 in which exemplary embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few. A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages (e.g., image or video data upload requests, image or video data download requests), etc. Various communication client applications, such as audio and video communication software, audio and video recording software, instant messaging software, conference software, mailbox clients, social platform software, and the like, may be installed on the terminal devices 101, 102, and 103. Further, various image or video shooting editing applications may also be installed on the terminal apparatuses 101, 102, and 103. The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen and capable of playing, recording, editing, etc. audio and video, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, etc. When the terminal device 101, 102, 103 is software, it may be installed in the electronic devices listed above, it may be implemented as a plurality of software or software modules (for example, to provide distributed services), or it may be implemented as a single software or software module. And is not particularly limited herein.
The terminal devices 101, 102, 103 may be equipped with an image capture component (e.g., a camera) to capture image or video data. In practice, the smallest visual unit that makes up a video is a frame. Each frame is a static image, and temporally successive sequences of frames are composited together to form a motion video. Further, the terminal devices 101, 102, 103 may also be equipped with a component (e.g., a speaker) for converting an electric signal into sound for playback, and with a device (e.g., a microphone) for converting an analog audio signal into a digital audio signal to pick up sound. In addition, the terminal devices 101, 102, 103 can perform voice or video communication with each other.
The server 105 may be a server providing various services, such as a background server providing support for multimedia applications installed on the terminal devices 101, 102, 103. The background server can analyze, store, and otherwise process received data such as audio and video data upload requests; it can also receive audio and video data download requests sent by the terminal devices 101, 102, 103 and feed the requested audio and video data back to the terminal devices 101, 102, 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the image processing method provided by the embodiment of the present disclosure is generally executed by a terminal device, but may also be executed by a server, or may also be executed by cooperation of the terminal device and the server. Accordingly, the image processing apparatus may be provided in the terminal device, the server, or both the terminal device and the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation, and the disclosure is not limited thereto.
Fig. 2 is a flowchart of an image processing method of an exemplary embodiment of the present disclosure.
Referring to fig. 2, in step S210, an input image is acquired. Any image acquisition method may be employed to acquire the input image, and the exemplary embodiments of the present disclosure do not set any limit to the manner in which the input image is acquired.
In step S220, region-of-interest detection is performed in the input image, the region of interest is extracted from the input image, and first image processing is performed on the extracted region of interest. Here, the first image processing is region-of-interest restoration processing for reducing distortion of the region of interest. As an example, the region of interest may be a human face region, but is not limited thereto. In the case where the region of interest is a face region, the first image processing is face region restoration processing. Hereinafter, for clarity, the image processing method of fig. 2 will be described in conjunction with the example of fig. 3. In the example of fig. 3, it is assumed that the region of interest is a face region.
Specifically, in step S220, a region of interest in the input image may be detected using any object detection method, and after performing the region of interest detection, the region of interest may be extracted from the input image based on the region of interest detection result, and then, the first image processing may be performed with respect to the extracted region of interest. As an example, the above-described region of interest detection result may be position information of the region of interest. For example, region of interest detection may be performed to obtain region of interest keypoint information and to obtain location information of the region of interest based on the keypoint information. Alternatively, the extracted region of interest may be subjected to the above-described first image processing after being mapped to a template of a predetermined size.
As shown in fig. 3, the position information of the face region is obtained by performing face region detection on the input image. After the face position information is obtained, the face region is extracted and mapped onto a fixed-size template through an affine transformation (also referred to as aligning the face region to the fixed-size template). Face region restoration is then performed on the aligned face region to obtain a clearer face image.
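As an illustration of this alignment step, the following is a minimal sketch assuming OpenCV and NumPy; the 256x256 template size, the five-point keypoint layout, and all names here are illustrative assumptions rather than details given by the disclosure:

```python
import cv2
import numpy as np

TEMPLATE_SIZE = (256, 256)  # assumed fixed template size (width, height)
# Assumed canonical five-point layout (eye centers, nose tip, mouth corners)
# on the template; real systems derive this from their keypoint detector.
TEMPLATE_POINTS = np.float32([
    [89, 104], [167, 104], [128, 152], [99, 196], [157, 196],
])

def align_face(image: np.ndarray, keypoints: np.ndarray):
    """Warp the detected face onto the fixed-size template.

    Returns the aligned crop and the 2x3 affine matrix; the matrix is kept
    so that restored faces and template-size masks can be warped back later.
    """
    affine, _ = cv2.estimateAffinePartial2D(
        keypoints.astype(np.float32), TEMPLATE_POINTS)
    aligned = cv2.warpAffine(image, affine, TEMPLATE_SIZE)
    return aligned, affine
```

The affine matrix is returned alongside the aligned crop because the later steps (occlusion mask generation and backfill fusion) warp template-size results back to the input-image coordinates.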
In step S230, attribute information about the region of interest is acquired. Here, the attribute information includes at least one of the following attribute information: first attribute information of the region of interest itself, and second attribute information of an occlusion region associated with the region of interest.
Specifically, according to an exemplary embodiment, the obtaining of the attribute information of the region of interest may include: acquiring first attribute information of the region of interest based on the extracted region of interest; and/or performing occlusion detection in the input image and acquiring second attribute information of the occlusion region based on an occlusion detection result and a region-of-interest detection result. That is, only the first attribute information, only the second attribute information, or both the first attribute information and the second attribute information may be acquired. Here, the occlusion region related to the region of interest may be a text region, but is not limited thereto.
For example, the first attribute information of the region of interest may be acquired as follows: first, position information of each sub-region of the region of interest is acquired based on the extracted region of interest, and then the first attribute information is acquired from the acquired position information of each sub-region. Obtaining the first attribute information from the position information of each sub-region makes it convenient to process each sub-region separately and in a targeted manner during the subsequent second image processing, thereby improving the subsequent image processing quality. For example, as mentioned later, when the first attribute information is obtained from the acquired position information of each sub-region, each sub-region can be corrected separately according to the first attribute information, thereby improving the color correction effect.
For example, as shown in fig. 3, after the face region is extracted through face region extraction, face region attribute detection may be performed on the extracted face region to obtain position information of each sub-region of the face region (e.g., sub-regions of facial features, skin, hair, etc.). Specifically, the category of each sub-region of the face region may be first identified, and the position information of each sub-region may be acquired. Subsequently, a category labeling information map of the face region can be further obtained according to the acquired position information of each sub-region, and the category labeling information map is used as the first attribute information of the face region. The above face region attribute detection module can detect the attribute information of the face region through deep learning, and can label the regions of facial features, skin, hair and the like.
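As a hypothetical illustration of how such a category label map (the first attribute information) can be consumed, the sketch below splits it into one boolean mask per sub-region; the label IDs are assumptions for illustration, not values from the disclosure:

```python
import numpy as np

# Assumed label IDs in the category label map; the disclosure only says the
# map labels sub-regions such as facial features, skin, and hair.
SUB_REGIONS = {"skin": 1, "eyebrows": 2, "eyes": 3, "lips": 4, "hair": 5}

def split_sub_regions(label_map: np.ndarray) -> dict:
    """Return one boolean mask per labeled face sub-region."""
    return {name: label_map == label for name, label in SUB_REGIONS.items()}
```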
According to the exemplary embodiment of the present disclosure, in step S230, only the first attribute information of the region of interest may be acquired, only the second attribute information of the occlusion region related to the region of interest may be acquired, or both of them may be acquired.
Regarding the acquisition of the second attribute information: for example, the second attribute information of the occlusion region may be acquired by performing occlusion detection in the input image and using the occlusion detection result and the region-of-interest detection result. Specifically, this may include: performing occlusion detection in the input image to obtain the position information of all occlusion regions in the input image; and acquiring the second attribute information based on the position information of all the occlusion regions and the position information of the region of interest. As an example, the second attribute information may be a mask map of the occlusion region related to the region of interest, but is not limited thereto. When the second attribute information is such a mask map, acquiring it based on the position information of all occlusion regions and the position information of the region of interest includes, for example: first, obtaining a first mask map of all the occlusion regions based on the position information of all the occlusion regions; second, acquiring the position information of the template of the region of interest; and finally, obtaining a second mask map of the occlusion region related to the region of interest as the second attribute information, based on the first mask map, the region-of-interest template position information, and the position information of the region of interest. Here, the first mask map and the second mask map may have the same size as the input image. For example, in the first mask map, the values at the positions of all the occlusion regions are 1 and the values of the other regions may be 0; in the second mask map, the value at the position of the occlusion region related to the region of interest is 1 and the values of the other regions are 0.
For example, in the example of fig. 3, occlusion detection is also performed on the input image in order to obtain the position information of all occlusion regions. The occlusion region may be a text region. As an example, the occlusion detection may be traditional text region detection or deep-learning-based text region detection, which labels all text regions in the image to generate a whole text region mask map. Subsequently, the second attribute information of the occlusion region may be obtained based on the occlusion detection result and the region-of-interest detection result. For example, in the example of fig. 3, the occlusion detection result is the whole text region mask map described above, the face region detection result may be face keypoint information, and the occlusion region mask map may be generated based on the occlusion detection result and the region-of-interest detection result. When the occlusion region is a text region, this mask map is the mask map of the text region related to the face region.
Fig. 4 is a schematic diagram illustrating the operation of generating the occlusion region mask map shown in fig. 3. As shown in fig. 4, in the mask map generation operation of fig. 3, face keypoint information is first acquired as the face region detection result, and the template face position information is acquired. The face position information is obtained from the face keypoint information, and the template face position information is obtained from the face template image. By mapping the face position information onto the template face position information, affine transformation parameters, for example an affine transformation matrix, can be obtained. Next, affine transformation is performed, using the affine transformation matrix, on the whole text region mask map obtained as the text region detection result, generating a text region template mask map. Here, the size of the whole text region mask map is the same as that of the original input image, and the size of the text region template mask map is the same as that of the face template image. After the text region template mask map is obtained, affine transformation is performed on it (specifically, using an inverse affine transformation matrix derived from the affine transformation matrix) to restore it to the size of the input image. Note that at this point the content beyond the face template image is zeroed out. Then, binarization is performed on the mask map restored to the input image size to obtain a text region mask map related only to the face region, which serves as the finally generated mask map of the occlusion region.
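The following sketch is one hypothetical rendering of this round trip, again assuming OpenCV; the mask is assumed to be a 0/255 uint8 image, and warpAffine's default zero border fill performs the zeroing outside the template mentioned above:

```python
import cv2
import numpy as np

def face_related_text_mask(text_mask: np.ndarray, affine: np.ndarray,
                           template_size=(256, 256)) -> np.ndarray:
    """Keep only the text regions that overlap the face template region.

    text_mask: whole-image text mask (uint8, 0 or 255), same size as input.
    affine:    the 2x3 face alignment matrix from the alignment step.
    """
    h, w = text_mask.shape[:2]
    # Forward warp into template space; text outside the template is cropped.
    template_mask = cv2.warpAffine(text_mask, affine, template_size)
    # Inverse warp back to input-image size; the zero border fill zeroes out
    # everything beyond the face template.
    inverse = cv2.invertAffineTransform(affine)
    restored = cv2.warpAffine(template_mask, inverse, (w, h))
    # Binarize interpolated values to obtain the final occlusion mask.
    _, binary = cv2.threshold(restored, 127, 255, cv2.THRESH_BINARY)
    return binary
```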
After the attribute information about the region of interest is acquired, finally, in step S240, second image processing is performed, based on the acquired attribute information, on the region of interest on which the first image processing has been performed, resulting in an output image. As described above, the first image processing may be region-of-interest restoration processing; however, because the restoration processing addresses only the distortion of the region of interest itself, high image processing quality may still be difficult to achieve. For this reason, after acquiring the attribute information about the region of interest, the image processing method of the exemplary embodiments of the present disclosure further performs the second image processing, based on the acquired attribute information, on the region of interest on which the first image processing has been performed, to obtain a higher-quality output image. Here, the second image processing is for improving the image quality of the region of interest after the first image processing has been performed. Specifically, for example, performing the second image processing on the region of interest based on the acquired attribute information includes performing one of the following operations: performing color correction on the region of interest on which the first image processing has been performed, based on the first attribute information; fusing the region of interest on which the first image processing has been performed with the input image, based on the first attribute information and/or the second attribute information; or performing color correction on the region of interest on which the first image processing has been performed, based on the first attribute information, and fusing the color-corrected region of interest with the input image, based on the first attribute information and/or the second attribute information.
Color shift can be further improved by performing color correction, based on the first attribute information, on the region of interest on which the first image processing has been performed. By fusing this region of interest with the input image based on the first attribute information, inter-frame flicker or discontinuity occurring between the region of interest (e.g., a face region) and the nearby background region can be alleviated. By fusing it with the input image based on the second attribute information, distortion of the occlusion region can be prevented when the region of interest is occluded. If the region of interest is fused with the input image based on both the first attribute information and the second attribute information, the inter-frame flicker or discontinuity problem can be alleviated and, at the same time, the occlusion region can be prevented from being distorted. Furthermore, the region of interest on which the first image processing has been performed may first be color-corrected based on the first attribute information and then fused with the input image based on the first attribute information and/or the second attribute information, whereby not only can color shift be improved, but the inter-frame flicker or discontinuity problem can also be solved and distortion of the occlusion region prevented.
Next, the color correction operation will be described with reference to fig. 5 in conjunction with fig. 3. Specifically, for example, color correction may be performed separately, based on the first attribute information, on each sub-region of the region of interest on which the first image processing has been performed. For example, the color correction parameters of the sub-regions of the region of interest may first be estimated based on the first attribute information, and then color correction may be performed on each sub-region using its estimated color correction parameters.
As shown in fig. 3, after the attribute information of the face region is acquired through face region attribute detection, color correction may be performed, based on this attribute information (the above-mentioned first attribute information), on the face region on which the face region restoration processing has been performed, to help correct color cast introduced by the restoration processing. For example, the color gamuts of the facial features and the skin differ greatly, and the color cast introduced by the face region restoration processing is nonlinear; if the facial features and the skin are treated as a whole and corrected with uniform parameters, sub-regions with smaller areas, such as the mouth and the eyebrows, are easily dominated by the color cast of the large-area regions, and the result is still prone to a certain degree of color cast. Fig. 5 is a schematic diagram illustrating the color correction operation shown in fig. 3. As shown in fig. 5, the color correction parameters of each face sub-region (the skin, lip, eyebrow, hair, and eye regions) can be estimated separately using the face region attribute information, and color correction is performed on each sub-region separately based on its estimated parameters, thereby correcting the nonlinear color shift in a locally linear manner.
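The disclosure does not fix a particular estimator for the per-sub-region color correction parameters; the sketch below assumes a simple per-channel mean/standard-deviation (gain-and-offset) match against the pre-restoration aligned face, purely as an illustration:

```python
import numpy as np

def correct_sub_region(restored: np.ndarray, reference: np.ndarray,
                       mask: np.ndarray) -> np.ndarray:
    """Match the per-channel mean/std of one sub-region to the reference face."""
    if not mask.any():  # sub-region absent in this face
        return restored
    out = restored.astype(np.float32)
    ref = reference.astype(np.float32)
    for c in range(3):
        src = out[..., c][mask]
        dst = ref[..., c][mask]
        gain = dst.std() / (src.std() + 1e-6)  # per-channel scale
        out[..., c][mask] = (src - src.mean()) * gain + dst.mean()
    return np.clip(out, 0, 255).astype(np.uint8)
```

Applied once per mask produced from the attribute label map (e.g., the split_sub_regions sketch above), this corrects each sub-region with its own parameters instead of one global correction.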
Next, the above-described fusion operation will be described with reference to fig. 6 and 7 in conjunction with fig. 3. As mentioned above, the second attribute information may be, for example, a second mask map of the occlusion region related to the region of interest. In this case, fusing the region of interest on which the first image processing has been performed with the input image based on the first attribute information and/or the second attribute information may include: fusing the region of interest on which the first image processing has been performed with the input image based on the second mask map. By fusing on the basis of the second mask map, the occlusion region related to the region of interest can be effectively distinguished when the region of interest is occluded, so that the occlusion region is not processed during image fusion and is thus prevented from being distorted.
Optionally, the fusing the region of interest after the first image processing is performed with the input image based on the first attribute information and/or the second attribute information may include: obtaining a third mask map of the region of interest based on the first attribute information; fusing the region of interest after the first image processing has been performed with the input image based on the third mask map. By fusing the region of interest after the first image processing is performed with the input image based on the third mask map, the region of interest can be effectively distinguished from other background regions during fusion, so that the image fusion effect is more natural, and inter-frame flicker or discontinuity occurring between the region of interest (e.g., a face region) and the background region near the region of interest is improved.
Optionally, the fusing the region of interest on which the first image processing has been performed with the input image based on the first attribute information and/or the second attribute information may include: obtaining a third mask map of the region of interest based on the first attribute information; and fusing the region of interest on which the first image processing has been performed with the input image based on the second mask map and the third mask map. By fusing on the basis of both the second mask map and the third mask map, not only is inter-frame flicker or discontinuity between the region of interest (e.g., a face region) and the nearby background region alleviated, but distortion of the occlusion region related to the region of interest is also prevented.
Above, it was mentioned that the third mask map of the region of interest is obtained based on the first attribute information. For example, obtaining the third mask map of the region of interest based on the first attribute information includes: obtaining a fourth mask map of the region of interest based on the first attribute information; acquiring the position information of the template of the region of interest; and obtaining the third mask map based on the fourth mask map, the region-of-interest template position information, and the position information of the region of interest. Here, the fourth mask map has the same size as the region-of-interest template, and the third mask map has the same size as the original input image.
For example, as shown in the example of fig. 3, after obtaining attribute information of a face region by performing face region attribute detection, from the attribute information, a mask map (referred to as "face region mask map" in fig. 3) of the entire face region (including hair) can be obtained. For example, as mentioned above, the attribute information of the face region may be a class label information map of the face region, a face region mask map having the same size as the template size may be obtained by binarizing the class label information map, and a face region mask map having the same size as the original input image may be obtained based on the face region mask map.
Fig. 6 is a schematic diagram illustrating the operation of generating the face region mask map shown in fig. 3. Referring to fig. 6, a face region mask map having the same size as the template may be transformed into a face region mask map having the same size as the original input image, by an operation similar to the generation of the occlusion region mask map. For example, after the face keypoint information is obtained, the face position information may be derived from it, and then, based on the obtained face position information and the template face position information, a set of affine transformation parameters, for example an inverse affine transformation matrix, may be obtained. Using the inverse affine transformation matrix, affine transformation can be performed on the template-size face region mask map to restore it to the size of the original input image, yielding the final face region mask map. Note that at this point the content beyond the template image is zeroed out.
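A hypothetical sketch of these two steps, under the same OpenCV assumptions as above: binarize the template-size category label map into the fourth mask map, then warp it back with the inverse alignment transform to obtain the third mask map:

```python
import cv2
import numpy as np

def face_region_mask(label_map: np.ndarray, affine: np.ndarray,
                     image_size: tuple) -> np.ndarray:
    """Binarize the label map (fourth mask map) and warp it back to the
    input-image size (third mask map)."""
    fourth_mask = (label_map > 0).astype(np.uint8) * 255  # template size
    inverse = cv2.invertAffineTransform(affine)
    w, h = image_size
    return cv2.warpAffine(fourth_mask, inverse, (w, h))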
In addition, it was mentioned above that the region of interest after having performed the first image processing can be fused with the input image based on the second mask map and the third mask map. Specifically, for example, the second mask map and the third mask map may be fused first to obtain a fused mask map, and then the region of interest after the first image processing is performed may be fused with the input image based on the fused mask map.
For example, as shown in fig. 3, after obtaining the occlusion region mask map and the face region mask map, the extracted region of interest on which the face restoration and the color correction have been performed may be fused with the input image based on the occlusion region mask map and the face region mask map (in fig. 3, this fusion operation is referred to as "face region backfill fusion").
Fig. 7 is a schematic diagram illustrating the face region backfill fusion operation shown in fig. 3. Face region backfill fusion fuses the obtained face region processing result with the original input image. In the fusion operation, the occlusion region mask map and the face region mask map can be fused and their common region extracted, yielding the final fused mask map. In addition, the fused mask map can optionally be feathered, so that the transition between the face region and the surrounding regions is more natural. Subsequently, the region of interest on which the face region restoration processing has been performed may be fused with the original input image based on the feathered mask map. Since the size of the image produced by the face region restoration processing in fig. 3 is the same as the template size, affine transformation can be performed on it before the fusion to obtain a restored image having the same size as the input image. Next, the input image and the restored image of the same size may be fused based on the feathered mask map. The fused image may be used as the output image.
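A hypothetical sketch of this backfill fusion follows; it assumes the restored face has already been warped back to the input-image size, reads the mask fusion as keeping the face region while excluding the occluding text region, and uses a Gaussian blur for the feathering, all of which are illustrative choices rather than details fixed by the disclosure:

```python
import cv2
import numpy as np

def backfill_fuse(input_image: np.ndarray, restored_image: np.ndarray,
                  face_mask: np.ndarray, occlusion_mask: np.ndarray,
                  feather_ksize: int = 21) -> np.ndarray:
    """Blend the restored face back into the input image.

    All masks are uint8 (0 or 255) at the input-image size; restored_image
    is the restoration result already warped back to the input-image size.
    """
    # Keep face pixels, drop pixels covered by face-related text.
    fused_mask = cv2.bitwise_and(face_mask, cv2.bitwise_not(occlusion_mask))
    # Feathering: blur the binary mask (kernel size must be odd) so the
    # seam between face and background blends naturally.
    alpha = cv2.GaussianBlur(fused_mask, (feather_ksize, feather_ksize), 0)
    alpha = (alpha.astype(np.float32) / 255.0)[..., None]
    out = (alpha * restored_image.astype(np.float32)
           + (1.0 - alpha) * input_image.astype(np.float32))
    return out.astype(np.uint8)
```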
Optionally, as shown in fig. 3, the fused image may be further subjected to global image processing to further enhance and repair the quality of the overall image, so as to obtain an output image. Here, depending on the subsequent specific application of the fused image, the global processing suitable for the subsequent specific application may be performed, and the present disclosure is not limited in any way thereto.
In the above, the image processing method according to the exemplary embodiment of the present disclosure has been described with reference to fig. 2 to 7, and according to the above image processing method, by acquiring the attribute information about the extracted region of interest after performing the region of interest recovery processing on the region of interest, and performing the second image processing on the region of interest after having performed the first image processing based on the acquired attribute information, it is possible to obtain an output image of higher quality.
The image processing method described above can be applied to video scenarios. For example, it effectively alleviates the problems that, when text occlusion exists, the text region is easily distorted; that, when background noise is heavy, the noise level of the region of interest easily becomes inconsistent with the surrounding background (i.e., inter-frame flicker and discontinuity); and that color cast may even occur relative to the original input.
Fig. 8 is a block diagram illustrating an image processing apparatus according to an exemplary embodiment of the present disclosure.
Referring to fig. 8, the image processing apparatus 800 may include an image acquisition unit 810 and an image processing unit 820. Specifically, the image acquisition unit 810 may be configured to acquire an input image. The image processing unit 820 may be configured to: perform region-of-interest detection in the input image, extract the region of interest from the input image, and perform first image processing on the extracted region of interest, wherein the first image processing is region-of-interest restoration processing for reducing distortion of the region of interest; acquire attribute information related to the region of interest; and perform second image processing on the region of interest on which the first image processing has been performed, based on the acquired attribute information, to obtain an output image, wherein the second image processing is used for improving the image quality of the region of interest after the first image processing.
Since the image processing method shown in fig. 2 can be performed by the image processing apparatus 800 shown in fig. 8, with the image acquisition unit 810 performing the operation corresponding to step S210 in fig. 2 and the image processing unit 820 performing the operations corresponding to steps S220 to S240 in fig. 2, any relevant details of the operations performed by the units in fig. 8 can be found in the corresponding description of fig. 2 and are not repeated here.
Further, it should be noted that although the image processing apparatus 800 is described above as being divided into units for respectively performing the respective processes, it is clear to those skilled in the art that the processes performed by the respective units described above may also be performed without any specific division of the units by the image processing apparatus 800 or without explicit demarcation between the units. Further, the image processing apparatus 800 may also include other units, for example, a storage unit and the like.
Fig. 9 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Referring to fig. 9, an electronic device 900 may include at least one memory 901 and at least one processor 902, the at least one memory 901 storing computer-executable instructions that, when executed by the at least one processor 902, cause the at least one processor 902 to perform an image processing method according to an embodiment of the present disclosure.
By way of example, the electronic device may be a PC, a tablet device, a personal digital assistant, a smartphone, or any other device capable of executing the above set of instructions. The electronic device need not be a single electronic device but can be any collection of devices or circuits that can execute the above instructions (or instruction sets) individually or jointly. The electronic device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In an electronic device, a processor may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special-purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
The processor may execute instructions or code stored in the memory, which may also store data. The instructions and data may also be transmitted or received over a network via a network interface device, which may employ any known transmission protocol.
The memory may be integral to the processor, e.g., RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the memory may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The memory and the processor may be operatively coupled or may communicate with each other, such as through an I/O port, a network connection, etc., so that the processor can read files stored in the memory.
In addition, the electronic device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device may be connected to each other via a bus and/or a network.
According to an embodiment of the present disclosure, there may also be provided a computer-readable storage medium storing instructions, wherein the instructions, when executed by at least one processor, cause the at least one processor to perform an image processing method according to an exemplary embodiment of the present disclosure. Examples of the computer-readable storage medium here include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc storage, hard disk drive (HDD), solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card, or an eXtreme Digital (XD) card), magnetic tape, a floppy disk, a magneto-optical data storage device, an optical data storage device, a hard disk, a solid-state disk, and any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide them to a processor or computer so that the processor or computer can execute the computer program. The instructions in the computer-readable storage medium or the computer program described above may be run in an environment deployed on computer apparatuses such as clients, hosts, proxy devices, and servers; further, in one example, the computer program and any associated data, data files, and data structures are distributed across networked computer systems such that they are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
According to an embodiment of the present disclosure, there may also be provided a computer program product including computer instructions which, when executed by a processor, implement an image processing method according to an exemplary embodiment of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. An image processing method, comprising:
acquiring an input image;
performing region-of-interest detection in the input image, extracting the region of interest from the input image, and performing first image processing on the extracted region of interest, wherein the first image processing is region-of-interest restoration processing for improving distortion of the region of interest;
acquiring attribute information related to the region of interest, wherein the attribute information comprises at least one of the following attribute information: first attribute information of the region of interest and second attribute information of an occlusion region related to the region of interest;
and performing second image processing on the region of interest after the first image processing is performed based on the acquired attribute information to obtain an output image, wherein the second image processing is used for improving the image quality of the region of interest after the first image processing is performed.
2. The image processing method according to claim 1, wherein the performing second image processing on the region of interest on which the first image processing has been performed based on the acquired attribute information includes performing one of:
performing color correction on the region of interest on which the first image processing has been performed, based on the first attribute information;
fusing the region of interest on which the first image processing has been performed with the input image based on the first attribute information and/or the second attribute information;
color correction is performed on the region of interest after the first image processing has been performed based on the first attribute information, and the region of interest after the color correction has been performed is fused with the input image based on the first attribute information and/or the second attribute information.
3. The image processing method according to claim 2, wherein the acquiring of the attribute information about the region of interest includes:
acquiring first attribute information of the region of interest based on the extracted region of interest; and/or
performing occlusion detection in the input image, and acquiring second attribute information of the occlusion region based on an occlusion detection result and a region-of-interest detection result.
4. The image processing method according to claim 3, wherein the acquiring first attribute information of the region of interest based on the extracted region of interest includes:
acquiring position information of each sub-region of the region of interest based on the extracted region of interest;
and acquiring the first attribute information according to the acquired position information of each sub-region.
5. The image processing method according to claim 3, wherein said performing occlusion detection in the input image and acquiring second attribute information of the occlusion region based on an occlusion detection result and a region-of-interest detection result comprises:
performing occlusion detection in the input image to obtain position information of all occlusion regions in the input image;
acquiring the second attribute information based on the position information of all the occlusion regions and the position information of the region of interest.
6. The image processing method according to claim 5, wherein the acquiring the second attribute information based on the position information of all the occlusion regions and the position information of the region of interest includes:
obtaining first mask maps of all the occlusion regions based on the position information of all the occlusion regions;
acquiring template position information of the region of interest;
and obtaining, as the second attribute information, a second mask map of the occlusion region related to the region of interest based on the first mask maps, the template position information of the region of interest, and the position information of the region of interest.
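One possible reading of claims 5 and 6 in Python, with axis-aligned boxes standing in for the detected position information; the intersection-then-crop logic is an interpretation, and template alignment beyond a simple crop is omitted. All names are illustrative assumptions.

    import numpy as np

    def second_mask_for_roi(occlusion_boxes, roi_box, frame_shape):
        h, w = frame_shape
        # First mask map: union of all detected occlusion regions.
        first_mask = np.zeros((h, w), dtype=np.uint8)
        for x0, y0, x1, y1 in occlusion_boxes:
            first_mask[y0:y1, x0:x1] = 1
        # Mask of the region of interest itself.
        rx0, ry0, rx1, ry1 = roi_box
        roi_mask = np.zeros((h, w), dtype=np.uint8)
        roi_mask[ry0:ry1, rx0:rx1] = 1
        # Keep only occlusions overlapping the ROI, then crop so the second
        # mask map aligns with the extracted (template-positioned) region.
        relevant = first_mask & roi_mask
        return relevant[ry0:ry1, rx0:rx1]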
7. The image processing method according to claim 4, wherein the color correcting the region of interest on which the first image processing has been performed based on the first attribute information includes:
and respectively performing color correction on each sub-region of the region of interest on which the first image processing has been performed, based on the first attribute information.
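The disclosure does not fix a particular correction algorithm for claim 7. As a hypothetical example, each sub-region (e.g. the eyes or mouth within a face region) could be corrected by matching its per-channel mean and standard deviation to the corresponding area of the original input, which tends to reduce color cast relative to the input:

    import numpy as np

    def color_correct_sub_regions(restored_roi, original_roi, sub_region_boxes):
        out = restored_roi.astype(np.float32)   # astype copies, so 'out' is writable
        ref = original_roi.astype(np.float32)
        for x0, y0, x1, y1 in sub_region_boxes:  # per-sub-region position info
            for c in range(out.shape[2]):        # per color channel
                src = out[y0:y1, x0:x1, c]
                tgt = ref[y0:y1, x0:x1, c]
                # Shift and scale so the sub-region's statistics follow
                # those of the original input.
                out[y0:y1, x0:x1, c] = (src - src.mean()) / (src.std() + 1e-6) \
                                       * tgt.std() + tgt.mean()
        return np.clip(out, 0, 255).astype(np.uint8)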
8. An image processing apparatus comprising:
an image acquisition unit configured to acquire an input image;
an image processing unit configured to:
performing region-of-interest detection in the input image, extracting the region of interest from the input image, and performing first image processing on the extracted region of interest, wherein the first image processing is region-of-interest restoration processing for improving distortion of the region of interest;
acquiring attribute information related to the region of interest, wherein the attribute information comprises at least one of the following attribute information: first attribute information of the region of interest and second attribute information of an occlusion region related to the region of interest;
and performing second image processing on the region of interest after the first image processing is performed based on the acquired attribute information to obtain an output image, wherein the second image processing is used for improving the image quality of the region of interest after the first image processing is performed.
9. An electronic device, comprising:
at least one processor;
at least one memory storing computer-executable instructions,
wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the image processing method of any one of claims 1 to 7.
10. A computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform the image processing method of any one of claims 1 to 7.
CN202111614202.1A 2021-12-27 2021-12-27 Image processing method, image processing device, electronic equipment and storage medium Pending CN114299089A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111614202.1A CN114299089A (en) 2021-12-27 2021-12-27 Image processing method, image processing device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111614202.1A CN114299089A (en) 2021-12-27 2021-12-27 Image processing method, image processing device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114299089A true CN114299089A (en) 2022-04-08

Family

ID=80969819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111614202.1A Pending CN114299089A (en) 2021-12-27 2021-12-27 Image processing method, image processing device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114299089A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385789A (en) * 2023-04-07 2023-07-04 北京百度网讯科技有限公司 Image processing method, training device, electronic equipment and storage medium
CN116385789B (en) * 2023-04-07 2024-01-23 北京百度网讯科技有限公司 Image processing method, training device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US10755173B2 (en) Video deblurring using neural networks
CN110503703B (en) Method and apparatus for generating image
US10452920B2 (en) Systems and methods for generating a summary storyboard from a plurality of image frames
CN111476871B (en) Method and device for generating video
CN112954450B (en) Video processing method and device, electronic equipment and storage medium
US10706512B2 (en) Preserving color in image brightness adjustment for exposure fusion
JP6946566B2 (en) Static video recognition
WO2019227429A1 (en) Method, device, apparatus, terminal, server for generating multimedia content
CN114331820A (en) Image processing method, image processing device, electronic equipment and storage medium
CN113033677A (en) Video classification method and device, electronic equipment and storage medium
CN111967397A (en) Face image processing method and device, storage medium and electronic equipment
CN114299089A (en) Image processing method, image processing device, electronic equipment and storage medium
WO2019242409A1 (en) Qr code generation method and apparatus for terminal device
CN112188259B (en) Method and device for audio and video synchronization test and correction and electronic equipment
CN107995538B (en) Video annotation method and system
CN110599525A (en) Image compensation method and apparatus, storage medium, and electronic apparatus
US10885343B1 (en) Repairing missing frames in recorded video with machine learning
CN114268833A (en) Live broadcast content acquisition and processing method and device, electronic equipment and medium
CN114157895A (en) Video processing method and device, electronic equipment and storage medium
CN116264606A (en) Method, apparatus and computer program product for processing video
CN113194270A (en) Video processing method and device, electronic equipment and storage medium
CN114298931A (en) Image processing method, image processing device, electronic equipment and storage medium
CN110599437A (en) Method and apparatus for processing video
CN110706169A (en) Star portrait optimization method and device and storage device
US20140293296A1 (en) Printing frames of a video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination