CN117830077A - Image processing method and device and electronic equipment - Google Patents

Image processing method and device and electronic equipment

Info

Publication number
CN117830077A
CN117830077A
Authority
CN
China
Prior art keywords
target
image
hidden vector
feature map
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311611982.3A
Other languages
Chinese (zh)
Inventor
苏婧文
王凡祎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202311611982.3A
Publication of CN117830077A
Legal status: Pending


Abstract

The embodiment of the present application discloses an image processing method, an image processing apparatus, and an electronic device. The method includes the following steps: acquiring a feature map of an image to be processed; performing noise-adding processing on the feature map multiple times to obtain a target feature map and a first target hidden vector corresponding to the target feature map, where each noise-adding pass operates on the noised feature map obtained by the previous pass, and the target feature map is the feature map obtained by the last noise-adding pass; performing denoising processing on the first target hidden vector multiple times based on a target effect, the feature map, and a first mask map to obtain a second target hidden vector, where the first mask map is a mask map of a target foreground object in the image to be processed; and obtaining a corresponding target image based on the second target hidden vector. In this way, the obtained target image can both exhibit the target effect to be added and retain good fidelity, thereby improving the quality of special effects added to images.

Description

Image processing method and device and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an image processing method, an image processing device, and an electronic device.
Background
Image processing is a technique that involves manipulating or processing an image to improve the visual effect of the image. Image processing techniques may be applied in a variety of fields including computer vision, medical imaging, digital image processing, remote sensing image analysis, machine vision, and the like. As a way of image processing, special effects may be added to the image. For example, a corresponding light effect may be added to the image. However, there is a need for improvement in the manner of adding special effects to images, and for improvement in the quality of images after adding special effects.
Disclosure of Invention
In view of the above, the present application proposes an image processing method, an image processing apparatus, and an electronic device to address the above problems.
In a first aspect, the present application provides an image processing method, the method including: acquiring a feature map of an image to be processed; performing noise-adding processing on the feature map multiple times to obtain a target feature map and a first target hidden vector corresponding to the target feature map, where each noise-adding pass operates on the noised feature map obtained by the previous pass, and the target feature map is the feature map obtained by the last noise-adding pass; performing denoising processing on the first target hidden vector multiple times based on a target effect, the feature map, and a first mask map to obtain a second target hidden vector, where the first mask map is a mask map of a target foreground object in the image to be processed; and obtaining a corresponding target image based on the second target hidden vector.
In a second aspect, the present application provides an image processing apparatus, the apparatus comprising: a feature map acquisition unit, configured to acquire a feature map of an image to be processed; a noise-adding unit, configured to perform noise-adding processing on the feature map multiple times to obtain a target feature map and a first target hidden vector corresponding to the target feature map, where each noise-adding pass operates on the noised feature map obtained by the previous pass, and the target feature map is the feature map obtained by the last noise-adding pass; a denoising unit, configured to perform denoising processing on the first target hidden vector multiple times based on a target effect, the feature map, and a first mask map to obtain a second target hidden vector, where the first mask map is a mask map of a target foreground object in the image to be processed; and an image generation unit, configured to obtain a corresponding target image based on the second target hidden vector.
In a third aspect, the present application provides an electronic device comprising a processor and a memory; one or more programs are stored in the memory and configured to be executed by the processor to implement the method described above.
In a fourth aspect, the present application provides a computer readable storage medium having program code stored therein, wherein the program code, when executed by a processor, performs the above-described method.
According to the image processing method, the image processing apparatus, and the electronic device provided by the present application, after the feature map of the image to be processed is acquired, the feature map can be subjected to noise-adding processing multiple times to obtain the target feature map and the first target hidden vector corresponding to the target feature map. Then, the first target hidden vector is denoised multiple times based on the target effect, the feature map, and the first mask map to obtain the second target hidden vector, and the corresponding target image is obtained based on the second target hidden vector. In this way, the original details of the target foreground object in the image to be processed can be preserved by combining the first mask map. As a result, the target image obtained from the second target hidden vector not only exhibits the target effect to be added but also has good fidelity, thereby improving the quality of special effects added to images.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of an application scenario of an image processing method according to an embodiment of the present application;
fig. 2 is a schematic diagram illustrating another application scenario of the image processing method according to the embodiment of the present application;
FIG. 3 is a flowchart of an image processing method according to an embodiment of the present application;
FIG. 4 shows a schematic diagram of a forward diffusion process in an embodiment of the present application;
FIG. 5 shows a schematic diagram of a reverse diffusion process in an embodiment of the present application;
FIG. 6 shows a schematic diagram of a first mask image in an embodiment of the present application;
fig. 7 shows a flowchart of an image processing method according to another embodiment of the present application;
FIG. 8 is a schematic diagram showing the principle of operation of a target diffusion model of the present application;
FIG. 9 shows a schematic diagram of combining ControlNet with Stable Diffusion in the present application;
Fig. 10 shows a flowchart of an image processing method according to another embodiment of the present application;
fig. 11 shows a flowchart of an image processing method according to still another embodiment of the present application;
FIG. 12 shows a schematic diagram of candidate objects in an embodiment of the present application;
FIG. 13 shows a schematic diagram of identifying candidate objects in an embodiment of the present application;
FIG. 14 shows a flow chart of an image processing method proposed by the present application;
fig. 15 shows a block diagram of an image processing apparatus according to an embodiment of the present application;
fig. 16 shows a block diagram of another electronic device of the present application for performing an image processing method according to an embodiment of the present application;
fig. 17 shows a storage unit for storing or carrying program code for implementing the image processing method according to the embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
Image processing is a technique that involves performing various operations and analyses on an image. It is widely applied in many fields, including computer vision, medical image analysis, digital image processing, remote sensing image processing, and the like. By processing an image, various tasks can be performed, such as image restoration, image enhancement, feature extraction, and object detection and recognition.
As one way of image processing, special effects may be added to an image. For example, a corresponding light effect may be added. However, the inventors found in their research that the related ways of adding special effects to images leave room for improvement, as does the quality of the resulting images. Accordingly, the present application proposes an image processing method, an image processing apparatus, and an electronic device that can address these problems. In the method, after the feature map of the image to be processed is acquired, the feature map can be subjected to noise-adding processing multiple times to obtain a target feature map and a first target hidden vector corresponding to the target feature map. Then, the first target hidden vector is denoised multiple times based on the target effect, the feature map, and the first mask map to obtain a second target hidden vector, and a corresponding target image is obtained based on the second target hidden vector.
In this way, the original details of the target foreground object in the image to be processed can be preserved by combining the first mask map. After the first target hidden vector is obtained through multiple noise-adding passes on the feature map, and the second target hidden vector is obtained through multiple denoising passes on the first target hidden vector, the target image obtained from the second target hidden vector not only exhibits the target effect to be added but also has good fidelity, thereby improving the quality of special effects added to images.
Before describing embodiments of the present application in further detail, embodiments of the present application are described with reference to an application environment.
The application scenario according to the embodiment of the present application will be described first.
In the embodiment of the application, the provided image processing method can be executed by the electronic device. In this manner, which is performed by the electronic device, all steps in the image processing method provided in the embodiment of the present application may be performed by the electronic device. For example, as shown in fig. 1, all steps in the image processing method provided in the embodiment of the present application may be executed by a processor of the electronic device 100.
Alternatively, the image processing method provided in the embodiment of the present application may be executed by a server. Correspondingly, in this manner executed by the server, the server may start executing steps in the image processing method provided in the embodiment of the present application in response to the trigger instruction. The triggering instruction may be sent by an electronic device used by a user, or may be triggered locally by a server in response to some automation event.
In addition, the image processing method provided by the embodiment of the present application may be cooperatively executed by the electronic device and the server. In this manner, part of the steps in the image processing method are executed by the electronic device, and the other part of the steps are executed by the server. For example, as shown in fig. 2, the method may proceed as follows: after acquiring the image to be processed, the electronic device 100 may transmit it to the server 200; after receiving the image to be processed, the server 200 may perform the subsequent steps to obtain a target image and then transmit the target image to the electronic device 100; upon receiving the target image, the electronic device 100 may store, display, or share it.
It should be noted that the division of steps between the electronic device and the server is not limited to the above examples; in practical applications, the steps performed by each may be dynamically adjusted according to the actual situation.
It should be noted that, the electronic device 100 may be a tablet computer, a smart watch, a smart voice assistant, or other devices besides the smart phone shown in fig. 1 and 2. The server 200 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud computing, cloud storage, network services, cloud communication, middleware services, CDNs (Content Delivery Network, content delivery networks), and artificial intelligence platforms. In the case where the image processing method provided in the embodiment of the present application is executed by a server cluster or a distributed system formed by a plurality of physical servers, different steps in the image processing method may be executed by different physical servers, or may be executed by servers built based on the distributed system in a distributed manner.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Referring to fig. 3, an image processing method provided in an embodiment of the present application includes:
S110: And acquiring a feature map of the image to be processed.
In the embodiment of the present application, the image to be processed may be understood as an image to which a target effect is to be added. The target effect may be of various kinds, for example, a light effect, a season effect, or the like. The light effect may specifically include a top light, a rim light, a Tyndall effect, or the like. In embodiments of the present application, the target effect may be manually entered by a user. For example, an input interface may be provided in the electronic device, in which case the user may input a desired target effect through the input interface. Furthermore, the target effect may be an effect selected by the user from among effects recommended by the electronic device. The recommended effects may be preconfigured or may be determined by the electronic device according to the content of the image to be processed.
In the embodiment of the application, the image to be processed can be determined in various manners.
As one way, a preview image currently displayed by the electronic device may be taken as the image to be processed. In this way, the electronic device may collect images in real time through the camera and display them in the preview area, and the user may operate the electronic device to use the image (a frame of picture) displayed in the preview area as the image to be processed. Alternatively, the electronic device may obtain the image to be processed from the album. In this way, the electronic device may display the pictures in the album when the album is opened, and select a picture from the album as the image to be processed in response to the user's selection. As yet another way, the electronic device may take a picture transmitted from another device as the image to be processed. For example, in the case where the user of the electronic device chats with friends through an instant messaging program, a friend may send a picture through the program, and the electronic device may use the received picture as the image to be processed.
After the image to be processed is determined, a feature map corresponding to the image to be processed can be obtained. In this feature map, each element represents color and texture information of a corresponding position in the image to be processed. As one approach, the image to be processed may be processed by an image encoder to obtain the corresponding feature map. Optionally, the image encoder may be the VAE encoder in a target diffusion model (e.g., Stable Diffusion). For example, the image to be processed may be encoded by the VAE encoder in the target diffusion model to obtain a feature map of a specific size in the hidden space, for example 128 × 128 × 4.
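As an illustrative sketch (not part of the original disclosure), the relationship between the size of the image to be processed and the size of the hidden-space feature map can be expressed as follows. The 8× downsampling factor, the 4 latent channels, and the function name `latent_shape` are assumptions chosen to match typical Stable Diffusion-style VAE encoders:

```python
def latent_shape(height, width, downsample=8, channels=4):
    """Shape of the hidden-space feature map produced by a VAE-style
    image encoder: spatial dimensions divided by the downsampling
    factor, plus a fixed number of latent channels."""
    return (height // downsample, width // downsample, channels)

# Under these assumptions, a 1024x1024 image to be processed would map
# to a 128x128x4 feature map in the hidden space.
shape = latent_shape(1024, 1024)
```

This only illustrates the bookkeeping of sizes; the actual encoder is a learned neural network.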
S120: and carrying out noise adding processing on the feature map for multiple times to obtain a target feature map and obtain a first target hidden vector corresponding to the target feature map, wherein the object subjected to the noise adding processing at the last time is the feature map subjected to the noise adding processing obtained at the previous time, and the target feature map is the feature map obtained through the noise adding processing at the last time.
After the feature map corresponding to the image to be processed is obtained, a forward diffusion process can be performed based on the feature map. In the forward diffusion process, the feature map can be subjected to noise adding processing for multiple times to obtain a target feature map. Wherein the added noise is random noise during each noise adding process. The random noise may be gaussian noise.
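The repeated noise-adding described above corresponds to the standard forward diffusion process. As a mathematical sketch in conventional diffusion-model notation (the symbols below are standard usage, not notation taken from the patent text), each pass and its one-step closed form can be written as:

```latex
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right),
\qquad
x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,
\quad \epsilon \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),
```

where $\beta_t$ is the per-step noise-schedule variance and $\bar{\alpha}_t = \prod_{s=1}^{t}(1-\beta_s)$; the second identity allows the noised feature map at step $t$ to be sampled from $x_0$ in a single step.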
Illustratively, as shown in FIG. 4, x0 denotes the feature map of the image to be processed; x1 denotes the feature map after the first noise-adding process, in which noise is added to x0; x2 denotes the feature map after the second noise-adding process, in which noise is added to x1; and so on, until xn is obtained through multiple noise-adding processes, where xn denotes the feature map obtained by the last noise-adding process. After the target feature map is obtained, a first target hidden vector corresponding to the target feature map can be obtained based on it.
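The x0 → x1 → … → xn chain of FIG. 4 can be sketched in code. This is a minimal toy implementation assuming the standard closed form of forward diffusion with Gaussian noise; the noise schedule, the toy values, and the function name are illustrative assumptions, not the patent's actual implementation:

```python
import math
import random

def add_noise(x0, t, betas, rng):
    # Closed-form forward diffusion: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps,
    # where abar_t is the cumulative product of (1 - beta_s) for s <= t
    # and eps is standard Gaussian noise.
    abar = 1.0
    for s in range(t + 1):
        abar *= 1.0 - betas[s]
    return [math.sqrt(abar) * v + math.sqrt(1.0 - abar) * rng.gauss(0.0, 1.0)
            for v in x0]

rng = random.Random(0)
betas = [0.01 * (i + 1) for i in range(10)]     # toy linear noise schedule
x0 = [0.5, -0.2, 1.0]                           # stand-in for the feature map
xn = add_noise(x0, len(betas) - 1, betas, rng)  # the "target feature map"
```

Each additional pass drives the feature map closer to pure Gaussian noise, which is what makes the later reverse (denoising) process possible.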
S130: and denoising the first target hidden vector for a plurality of times based on the target effect, the feature map and the first mask map to obtain a second target hidden vector, wherein the first mask image is a mask map of a target foreground object in the image to be processed.
After the first target hidden vector is obtained through the above multiple noise-adding processes, a reverse diffusion process can be performed based on the first target hidden vector. In the reverse diffusion process, denoising processing is performed a plurality of times starting from the first target hidden vector. Illustratively, as shown in fig. 5, the reverse diffusion process may be regarded as the opposite of the forward diffusion process.
In the process of denoising each time, the hidden vector obtained by the previous denoising process can be obtained, then the hidden vector obtained by the previous denoising process is updated based on the feature map, the first mask map and the second mask map, and then the updated hidden vector is denoised, so that the hidden vector obtained by the present denoising process is obtained. Wherein, the hidden vector obtained by the last denoising process can be used as a second target hidden vector. The second mask map is a mask map of contents except the target foreground object in the image to be processed, and the second mask map can be obtained through the first mask map. It should be noted that, in the embodiment of the present application, the mask image may be understood as a binary image having the same size as the original image (for example, the image to be processed). Only pixels of two colors, black and white, may be included in the binary image.
It should be noted that the mask map may include a mask area, where the mask area may be used to identify an area where a corresponding object is located in the original image. For example, in the embodiment of the present application, the first mask map may be a mask map of a target foreground object in the image to be processed, and in this case, a mask region corresponding to the target foreground object may be corresponding to the first mask map. Illustratively, as shown in fig. 6, the target foreground object 10 is included in the image to be processed shown in fig. 6, in which case the first mask map may be as shown in the right image of fig. 6. In the first mask diagram shown in fig. 6, the mask areas corresponding to the target foreground object are each black in color, and the other areas are each white in color. Alternatively, in the embodiment of the present application, the colors of the mask areas corresponding to the target foreground object in the first mask map may be white, and the colors of the other areas may be black.
Correspondingly, the second mask map may be a mask map of an area other than the target foreground object in the image to be processed. The first mask map and the second mask map are combined in the process of denoising each time, so that the hidden vector obtained in denoising each time can be better fused with the foreground object and the background content of the target.
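To illustrate the relationship between the two mask maps, a minimal sketch of deriving the second mask map from the first (assuming binary masks in which 1 marks the masked region; the concrete arrays are invented for illustration):

```python
# First mask map: 1 marks the target foreground object, 0 everything else.
first_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]

# Second mask map: the complement of the first, marking all content
# in the image to be processed except the target foreground object.
second_mask = [[1 - v for v in row] for row in first_mask]
```

The two masks partition the image: every position belongs to exactly one of the foreground region or the background region.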
Also, in the embodiment of the present application, one purpose of processing an image to be processed is to add a target effect to the image to be processed. The target effect in the embodiment of the application characterizes the special effect which needs to be added. In the process of denoising each time, the hidden vector obtained by denoising each time can be processed based on the target effect, so that the hidden vector obtained by denoising each time can carry information corresponding to the target effect, and further, after the finally obtained second target hidden vector is converted into an image, the converted image can show the target effect. Alternatively, the denoising process may be performed on the first target hidden vector multiple times based on the img2img inpainting algorithm in the target diffusion model.
S140: and obtaining a corresponding target image based on the second target hidden vector.
After the second target hidden vector is obtained, the second target hidden vector may be converted to obtain a target image.
For example, in the case where the feature map is obtained by encoding the image to be processed through the VAE encoder in the target diffusion model, the second target hidden vector may be decoded through the VAE decoder in the target diffusion model to obtain the target image.
According to the image processing method described above, the original details of the target foreground object in the image to be processed can be preserved by combining the first mask map. After the first target hidden vector is obtained through multiple noise-adding passes on the feature map, and the second target hidden vector is obtained through multiple denoising passes on the first target hidden vector, the target image obtained based on the second target hidden vector can exhibit the target effect to be added while retaining good fidelity, thereby improving the quality of special effects added to images.
Referring to fig. 7, an image processing method provided in an embodiment of the present application includes:
S210: And acquiring a feature map of the image to be processed.
S220: and carrying out noise adding processing on the feature map for multiple times to obtain a target feature map and obtain a first target hidden vector corresponding to the target feature map, wherein the object subjected to the noise adding processing at the last time is the feature map subjected to the noise adding processing obtained at the previous time, and the target feature map is the feature map obtained through the noise adding processing at the last time.
S230: and denoising the first target hidden vector for a plurality of times based on the target effect, the image constraint information, the target diffusion model, the feature map and the first mask map to obtain a second target hidden vector.
As one way, the method provided by the embodiment of the present application may be performed by the target diffusion model. In this case, the image to be processed may be input into the target diffusion model and encoded by the image encoder of the target diffusion model to obtain the corresponding feature map. Correspondingly, the feature map may be subjected to multiple noise-adding passes by the target diffusion model to obtain the first target hidden vector, and the target diffusion model may then perform multiple denoising passes based on the first target hidden vector to obtain the second target hidden vector.
In this embodiment, during denoising by the target diffusion model, the hidden vector obtained in each denoising pass is further processed in combination with the image constraint information. In the embodiment of the present application, the image constraint information may characterize features of the content in the image to be processed. The features may include depth features, contour features, or the like.
As one way, at least one of depth constraint information and contour constraint information of the image to be processed may be acquired as the image constraint information. Optionally, the depth constraint information is obtained based on a depth map of the image to be processed and a target neural network, and the contour constraint information is obtained based on a line drawing of the image to be processed and the target neural network. The depth map (Depth Map) and the line drawing (Line Drawing) are two image representation methods. Each pixel in the depth map represents the distance from the observer to a point in the scene; the depth map may be used to understand and interpret shapes and structures in the image. A line drawing is a simplified representation of an image that contains its contour or line information.
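As a toy illustration of what a line drawing captures, the following sketch marks pixels where the image intensity changes sharply. This crude horizontal-gradient edge detector, its threshold, and the sample values are invented for illustration; it is not the extraction method the patent relies on:

```python
def line_drawing(gray, threshold=0.5):
    # Mark a pixel as an edge when the intensity difference to its
    # right-hand neighbour exceeds the threshold.
    h, w = len(gray), len(gray[0])
    return [[1 if x + 1 < w and abs(gray[y][x + 1] - gray[y][x]) > threshold else 0
             for x in range(w)] for y in range(h)]

gray = [[0.0, 0.0, 1.0, 1.0],
        [0.0, 0.0, 1.0, 1.0]]
edges = line_drawing(gray)  # edges appear at the 0 -> 1 intensity boundary
```

A real line-drawing extractor would use a learned model or a full 2-D edge detector, but the principle is the same: keep only the contour information and discard color and texture.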
In the case that the target neural network is combined with the target diffusion model, the target neural network can encode the image constraint information and add the encoded image constraint information into the UNet network of the target diffusion model, so that the generation result of the target diffusion model is more controllable. Optionally, the target neural network may be a ControlNet model. For example, in the embodiment of the present application, the depth constraint information of the image to be processed may be obtained through a ControlNet-depth model, and the contour constraint information of the image to be processed may be obtained through a ControlNet-lineart model.
The principle of operation of the target diffusion model in embodiments of the present application may be illustrated in fig. 8, where the role of each node is explained by the lower diagram of the figure. In the case where the image constraint information includes multiple kinds of information, the image constraint information may be encoded and then input into the UNet network. As shown in fig. 8, the condition constraint information may include information such as the first mask map in addition to the image constraint information.
In the case where the target diffusion model is Stable Diffusion, the manner of combining the ControlNet model with Stable Diffusion may be as shown in fig. 9. The SD Encoder module 1 is SD Encoder Block_1, the SD Encoder module 2 is SD Encoder Block_2, the SD Encoder module 3 is SD Encoder Block_3, and the SD Encoder module 4 is SD Encoder Block_4. The SD Middle module is SD Middle Block. The SD Decoder module 1 is SD Decoder Block_1, the SD Decoder module 2 is SD Decoder Block_2, the SD Decoder module 3 is SD Decoder Block_3, and the SD Decoder module 4 is SD Decoder Block_4. The convolution module is a zero convolution. The Text Encoder is the text encoder, and the Time Encoder is the time encoder.
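In ControlNet, the convolution modules that connect the control branch to the Stable Diffusion UNet are zero convolutions: 1×1 convolutions whose weights and biases are initialised to zero, so that at the start of training the control branch contributes nothing and the base model's output is unchanged. A minimal sketch of that property, using a flat feature vector and a scalar weight for simplicity (all names and values are invented for illustration):

```python
def zero_conv_1x1(x, weight=0.0, bias=0.0):
    # A 1x1 convolution over a flat feature vector degenerates to an
    # elementwise affine map: y[i] = weight * x[i] + bias.
    return [weight * v + bias for v in x]

features = [0.3, -1.2, 0.7]        # output of a Stable Diffusion encoder block
control = zero_conv_1x1(features)  # zero-initialised branch contributes nothing
merged = [f + c for f, c in zip(features, control)]
```

Because the merged activations equal the base activations at initialisation, adding the control branch cannot degrade the pretrained model before training begins; the constraints take effect only as the zero-convolution weights are learned.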
S240: and obtaining a corresponding target image based on the second target hidden vector.
According to the image processing method provided by this embodiment, the target image obtained based on the second target hidden vector can exhibit the target effect to be added while retaining good fidelity, thereby improving the quality of special effects added to images. In addition, in this embodiment, the image constraint information is further combined when performing the multiple denoising passes on the first target hidden vector; since the image constraint information may include at least one of depth constraint information and contour constraint information of the image to be processed, the finally obtained target image can be more natural and coordinated.
Referring to fig. 10, an image processing method provided in an embodiment of the present application includes:
s310: and acquiring a feature map of the image to be processed.
S320: and carrying out noise adding processing on the feature map for multiple times to obtain a target feature map and obtain a first target hidden vector corresponding to the target feature map, wherein the object subjected to the noise adding processing at the last time is the feature map subjected to the noise adding processing obtained at the previous time, and the target feature map is the feature map obtained through the noise adding processing at the last time.
S330: and adding noise corresponding to the current denoising process to the feature map in the current denoising process to obtain a first feature map.
In each denoising process, the noise corresponding to that denoising process is added to the feature map obtained by directly encoding the image to be processed, so as to obtain the first feature map. That is, the noise added in a given denoising process is the same as the noise added in the corresponding noise adding process.
S340: and obtaining the updated hidden vector in the current denoising process based on the target effect, the first feature map, the first mask map and the input hidden vector in the current denoising process.
As one way, based on the target effect, the first feature map, the first mask map, and the input hidden vector in the current denoising process, obtaining the updated hidden vector in the current denoising process includes: multiplying the first feature map with the first mask map to obtain a first vector; multiplying the input hidden vector in the current denoising process with a second mask map to obtain a second vector, wherein the second mask map is a mask map of contents except for a target foreground object in the image to be processed; based on the first vector, the second vector and the target effect, the updated hidden vector in the current denoising process is obtained.
As one way, the updated hidden vector in the current denoising process may be obtained based on the following formula:
E=Di*C+E*(1-C)
the E on the left side of the equal sign represents the updated hidden vector, and the E on the right side represents the input hidden vector in the current denoising process. Di represents the first feature map obtained by adding the noise corresponding to the current denoising process to the feature map, and Di*C corresponds to the aforementioned first vector, that is, the foreground region of the image to be processed with the noise of the current denoising process added. The foreground region in the image to be processed may be understood as the region where the target foreground object is located. 1-C represents the second mask map, i.e., the mask map of the background (content other than the target foreground object) in the image to be processed, and E*(1-C) corresponds to the aforementioned second vector, that is, the background portion directly reuses the hidden vector from the previous denoising process.
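For illustration only, the update formula above can be sketched as an elementwise blend. The sketch below assumes the hidden vector, the noised feature map, and the mask are flattened into plain lists; the function name update_hidden_vector is hypothetical.

```python
def update_hidden_vector(d, e, c):
    """Elementwise blend E = Di*C + E*(1-C).

    d: noised feature map Di (foreground source)
    e: input hidden vector of the current denoising process
    c: first mask map (1 inside the target foreground, 0 elsewhere)
    """
    return [di * ci + ei * (1.0 - ci) for di, ei, ci in zip(d, e, c)]

# Foreground positions (c == 1) take the freshly noised feature map, while
# background positions (c == 0) keep the hidden vector from the previous pass.
updated = update_hidden_vector([10.0, 20.0], [1.0, 2.0], [1.0, 0.0])
```

The blend is what lets the background accumulate the target effect across passes while the foreground is repeatedly pinned back to the (noised) original content.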
Optionally, based on the first vector, the second vector, and the target effect, obtaining the updated hidden vector in the current denoising process may include: based on the first vector and the second vector, the hidden vector to be processed is obtained, and based on the target effect and the target diffusion model, the hidden vector to be processed is subjected to effect processing, so that the hidden vector updated in the current denoising process is obtained.
S350: and removing noise corresponding to the current denoising process from the updated hidden vector to obtain an output hidden vector.
It should be noted that, in the process of obtaining the first feature map, the noise corresponding to the current denoising process is added to the feature map; in the current denoising process, the same noise is then removed from the updated hidden vector, so that the target foreground object can be restored, that is, in the output hidden vector, no added noise remains on the target foreground object.
S360: if the denoising times meet the target conditions, the hidden vector output in the current denoising process is used as a second target hidden vector, if the denoising times do not meet the target conditions, the next denoising process is started, the hidden vector output in the current denoising process is used as an input hidden vector in the next denoising process, and the input hidden vector in the first denoising process is the first target hidden vector.
It should be noted that, in the multiple denoising processes, the noise removed each time corresponds to the noise added in the multiple noise adding processes. In other words, the noise removed in a denoising process is the same as the noise added in the corresponding noise adding process. Furthermore, the second target hidden vector obtained through the multiple denoising processes not only has the originally added noise removed, but also completes the addition of the target effect.
If the sequence number of the current denoising process is the same as that of a noise adding process, the current denoising process corresponds to that noise adding process.
Illustratively, in the first of the foregoing multiple noise adding processes, the added noise is noise S1; correspondingly, in the first denoising process of the multiple denoising processes, the noise added to the feature map is also noise S1, and after the updated hidden vector in the first denoising process is obtained, noise S1 is removed from the updated hidden vector. Similarly, in the second noise adding process the added noise is noise S2; in the corresponding second denoising process, the noise added to the feature map is noise S2, and after the updated hidden vector in the second denoising process is obtained, noise S2 is removed from it. The second target hidden vector is thus obtained through the multiple denoising processes.
The target condition may be that the number of times of denoising processing reaches a target number of times. Wherein the target number of times is the same as the number of times of the aforementioned multiple noise addition processing. For example, if the number of times of the plurality of noise adding processes is N, the target condition may be that the number of times of the plurality of noise removing processes reaches N.
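For illustration only, the control flow of S330 to S370 can be sketched as follows. This is a skeleton under stated assumptions, not the actual implementation: apply_effect is a hypothetical stand-in for the effect processing performed by the diffusion model, each tensor is flattened into a plain list, and the target condition is simply that all N per-step noises have been consumed.

```python
def denoise_loop(first_target_hidden, feature_map, mask, noises, apply_effect):
    """Run the masked denoising passes of S330-S370; the output hidden
    vector of each pass becomes the input hidden vector of the next."""
    e = list(first_target_hidden)  # input hidden vector of the first pass
    for s in noises:               # noises[i]: noise of the i-th noise adding process
        d = [f + s for f in feature_map]                                  # S330
        e = [di * ci + ei * (1 - ci) for di, ei, ci in zip(d, e, mask)]   # S340 blend
        e = apply_effect(e)                                               # S340 effect
        e = [ei - s * ci for ei, ci in zip(e, mask)]                      # S350
    return e                                        # S360/S370: target condition met

# With an identity "effect", the foreground is restored exactly to the feature
# map value, while the background keeps the hidden vector it started with.
out = denoise_loop([100.0, 200.0], [5.0, 7.0], [1.0, 0.0], [0.5, 0.25], lambda e: e)
```

The point of the sketch is the invariant: whatever noise S_i is added when forming the first feature map is removed again in the same pass, so the foreground carries no residual noise in the output hidden vector.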
S370: and obtaining a corresponding target image based on the second target hidden vector.
According to the image processing method provided by this embodiment, the original details of the target foreground object in the image to be processed can be retained by combining the first mask map. In each denoising process, the updated hidden vector is obtained from the input hidden vector of that process, after which the updated hidden vector is denoised and information related to the target effect is added. After the second target hidden vector is obtained through the multiple denoising processes, the target image obtained based on the second target hidden vector can not only present the target effect to be added but also has good fidelity, thereby improving the quality of adding special effects to the image. In addition, in this embodiment, in each denoising process, the hidden vectors are fused by combining the first mask map and the second mask map to obtain the updated hidden vector, so that the updated hidden vector better fuses the foreground object and the background in the image to be processed, thereby alleviating artifacts at the junction of the foreground object and the background.
Referring to fig. 11, an image processing method provided in an embodiment of the present application includes:
s410: and acquiring a feature map of the image to be processed.
S420: and carrying out noise adding processing on the feature map for multiple times to obtain a target feature map and obtain a first target hidden vector corresponding to the target feature map, wherein the object subjected to the noise adding processing at the last time is the feature map subjected to the noise adding processing obtained at the previous time, and the target feature map is the feature map obtained through the noise adding processing at the last time.
S430: and denoising the first target hidden vector for a plurality of times based on the target effect, the feature map and the first mask map to obtain a second target hidden vector, wherein the first mask image is a mask map of a target foreground object in the image to be processed.
S440: obtaining a corresponding initial output image based on the second target hidden vector;
s450: and replacing the target foreground object in the initial output image with the target foreground object in the image to be processed based on the first mask map so as to obtain a target image.
In the embodiment of the present application, the target foreground object may be understood as all foreground objects in the image to be processed, and may also be understood as part of the foreground objects in the image to be processed. There are a number of ways in which the target foreground object may be determined.
As one way, the target foreground object in the image to be processed may be determined by the user. For example, in the case where a plurality of foreground objects are included in the image to be processed, the plurality of objects may each be taken as an object candidate. By way of example, as shown in fig. 12, by recognizing fig. 12, the person 10 and the person 20 in fig. 12 can be taken as candidates, and further the user can select one person from the person 10 and the person 20 as a target foreground object. In the embodiment of the present application, the manner in which the user selects the target foreground object from the candidate objects is not specifically limited. For example, as shown in fig. 13, the candidate object may be identified in the image to be processed by a dotted frame, and if it is detected that the dotted frame is clicked, the candidate object identified by the clicked dotted frame may be taken as the target foreground object.
As one way, the target foreground object in the image to be processed may be determined by the current processing scene. Alternatively, the electronic device may enter the specified processing scene in response to the operation of the user, and then, in the specified processing scene, an object corresponding to the image to be processed may be taken as the target foreground object. For example, the specified processing scene may be a person fidelity scene in which the electronic device may treat a person in the foreground objects of the image to be processed as the target foreground object.
It should be noted that, in the target image obtained by converting the second target hidden vector, the color style information (e.g., hue) of the target foreground object may be inconsistent with that of the background in the target image. To alleviate this problem, as one way, in obtaining the target image, the target foreground object may be acquired from the image to be processed based on the first mask map, the color style information of the target foreground object may be adjusted to obtain an adjusted target foreground object, and the target foreground object in the initial output image may be replaced with the adjusted target foreground object to obtain the target image. The initial output image is an image obtained by directly decoding the second target hidden vector. In this way, not only can the target foreground object be kept faithful, but also the color style information of the foreground object in the target image can be kept consistent with that of the background.
Alternatively, the color style information may include hue. In this case, adjusting the color style information of the target foreground object may include: obtaining first color information based on color values of a plurality of color channels of an image to be processed, obtaining second color information based on color values of a plurality of color channels of an area outside a target foreground object in a target image, and adjusting the tone of the target foreground object based on the first color information and the second color information. Optionally, in the case that the color mode of the image to be processed is an RGB mode, the plurality of color channels are an R channel, a G channel, and a B channel. Wherein the RGB mode is only exemplary and the application does not limit the color model of the image to be processed. For example, in addition to the RGB mode, the CMYK mode is also possible.
Optionally, the first color information includes a first color average value, and the second color information includes a second color average value, in which case the hue of the target foreground object may be adjusted based on a ratio of the second color average value to the first color average value. As a mean value calculation mode, the respective color mean value of a plurality of color channels in the image to be processed can be obtained first, and then the respective color mean value of the plurality of color channels is averaged to obtain a first color mean value. Correspondingly, the color average value of each of the plurality of color channels in the region outside the target foreground object in the target image can be obtained first, and then the color average value of each of the plurality of color channels is averaged to obtain a second color average value.
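For illustration only, the mean calculation described above (per-channel means, then the average of those means) can be sketched as follows; the function name mean_of_channel_means is hypothetical and the pixels are assumed to be plain (R, G, B) tuples.

```python
def mean_of_channel_means(pixels):
    """Average of the per-channel means over a set of RGB pixels,
    as described for the first/second color mean (A_mean / F_mean)."""
    n = len(pixels)
    channel_means = [sum(p[i] for p in pixels) / n for i in range(3)]
    return sum(channel_means) / 3.0

# First color mean: computed over the image to be processed; the second color
# mean would be computed the same way over the background region of the target image.
a_mean = mean_of_channel_means([(10, 20, 30), (30, 40, 50)])
```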
As one way, the target image may be acquired based on the following formula:
G=A*C*F_mean/A_mean+F*(1-C)
wherein, A represents the image to be processed, F represents the initial output image, G represents the target image, C represents the first mask image, A_mean represents the first color mean, and F_mean represents the second color mean.
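For illustration only, the formula above can be sketched as an elementwise composite; the function name composite is hypothetical, and the images and mask are assumed to be flattened into plain lists of matching length.

```python
def composite(a, f, c, a_mean, f_mean):
    """G = A*C*F_mean/A_mean + F*(1-C), elementwise.

    Foreground pixels (c == 1) come from the original image A, scaled by
    F_mean/A_mean so their tone follows the generated background; background
    pixels (c == 0) come from the initial output image F unchanged.
    """
    ratio = f_mean / a_mean
    return [ai * ci * ratio + fi * (1.0 - ci) for ai, fi, ci in zip(a, f, c)]

g = composite([100.0, 60.0], [40.0, 80.0], [1.0, 0.0], a_mean=50.0, f_mean=25.0)
```

The single scalar ratio F_mean/A_mean applies a global tone shift to the pasted-back foreground, which is what keeps the overall hue of the target image consistent.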
It should be noted that the foregoing color tone is only one of the color style information, and the electronic device may support adjustment of other types of color style information.
According to the image processing method, the target image obtained based on the second target hidden vector can present the target effect to be added and also has good fidelity, so that the quality of adding special effects to the image is improved. In this embodiment, after the initial output image is obtained, the target foreground object in the initial output image is replaced with the target foreground object in the image to be processed, or with the color-adjusted target foreground object from the image to be processed, to obtain the target image, so that the obtained target image has a better fidelity effect. In addition, in the process of adjusting the color of the target foreground object, the first color information and the second color information can be combined, so that through adaptive light and shadow adjustment of the foreground and the background, the generated target image has more realism and stereoscopic impression, providing a more realistic visual experience for the user.
An image processing flow according to an embodiment of the present application will be described below with reference to fig. 14.
As shown in fig. 14, after the input original a is obtained, the original a may be taken as an image to be processed. Then, corresponding first mask map, feature map, and image constraint information may be acquired based on the original map a, respectively.
In the process of acquiring the first mask map, the subject matting model may be used to process the image to be processed to obtain a mask map C, where mask map C can be understood as the mask map directly output by the subject matting model. Mask map C may then be downsampled (Resize) to 128×128 to adapt to the target diffusion model, and the mask map downsampled to 128×128 can be understood as the first mask map.
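For illustration only, the downsampling step can be sketched as a nearest-neighbor resize; the function name resize_mask is hypothetical, the mask is assumed to be a 2-D list of 0/1 values, and nearest-neighbor is only one possible resampling choice (it keeps the mask strictly binary).

```python
def resize_mask(mask, out_h, out_w):
    """Nearest-neighbor resize of a 2-D 0/1 mask, e.g. down to 128x128
    to match the spatial resolution expected by the diffusion model."""
    in_h, in_w = len(mask), len(mask[0])
    return [[mask[r * in_h // out_h][col * in_w // out_w]
             for col in range(out_w)]
            for r in range(out_h)]

small = resize_mask([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]], 2, 2)
```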
In the process of obtaining the image constraint information, a corresponding depth map and a corresponding line manuscript map (as shown in fig. 14) can be obtained based on the original image A. Then, the depth constraint information is obtained by combining the depth map and the ControlNet-depth model, the contour constraint information is obtained through the line manuscript map and the ControlNet-lineart model, and the depth constraint information and the contour constraint information are used as the image constraint information.
In the process of obtaining the feature map, the image to be processed can be processed through the VAE encoder so as to obtain a corresponding feature map D. The size of the obtained feature map D may be 128×128×4.
Then, noise adding processing can be performed multiple times based on the feature map D to obtain the hidden vector E, where the hidden vector E may be understood as the first target hidden vector in the embodiment of the present application, and Gaussian noise is added in each noise adding process. After the first target hidden vector, the image constraint information, and the first mask map are obtained, denoising processing can be performed multiple times through the Unet of the target diffusion model. In each denoising process, the hidden vector can be updated based on the formula E = Di*C + E*(1-C), so as to obtain the updated hidden vector of that denoising process, and the updated hidden vector is then denoised.
After multiple denoising processes, the second target hidden vector shown in the foregoing embodiment may be obtained, and then the second target hidden vector may be decoded by the VAE decoder, so as to obtain the image F. Wherein the image F can be understood as the initial output image described previously. After obtaining image F, the average a_mean of the plurality of color channels in the background areas in image a and image F may be calculated, where a_mean may be understood as the aforementioned first color average, and f_mean may be understood as the aforementioned second color average. Then, according to the formula: g=a×c×f_mean/a_mean+f (1-C) to calculate an image G, and the image G is taken as a target image.
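For illustration only, the flow of fig. 14 from noise adding through masked denoising can be tied together as follows. This is a sketch under stated assumptions: fidelity_pipeline and apply_effect are hypothetical names, apply_effect stands in for the effect processing of the diffusion model (which, in the actual method, also denoises the background), the VAE encode/decode steps are omitted, and Gaussian noise is drawn per pass as described above.

```python
import random

def fidelity_pipeline(feature_map, mask, n_steps, apply_effect):
    """Noise adding stage (feature map D -> hidden vector E) followed by
    the masked denoising stage, with the per-pass noises reused."""
    noises = [random.gauss(0.0, 1.0) for _ in range(n_steps)]
    e = list(feature_map)
    for s in noises:              # noise adding: each pass noises the previous result
        e = [x + s for x in e]    # after the last pass, e is the first target hidden vector
    for s in noises:              # denoising: reuse the noise of the corresponding pass
        d = [f + s for f in feature_map]
        e = [di * ci + ei * (1 - ci) for di, ei, ci in zip(d, e, mask)]
        e = apply_effect(e)
        e = [ei - s * ci for ei, ci in zip(e, mask)]
    return e

# With an identity effect, the masked foreground ends up restored to the
# original feature value regardless of the random noise that was drawn.
out = fidelity_pipeline([5.0, 7.0], [1.0, 0.0], 3, lambda e: e)
```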
Referring to fig. 15, an image processing apparatus 500 according to an embodiment of the present application, the apparatus 500 includes:
a feature map acquiring unit 510, configured to acquire a feature map of an image to be processed.
The noise adding unit 520 is configured to perform noise adding processing on the feature map multiple times to obtain a target feature map, and obtain a first target hidden vector corresponding to the target feature map, where the object of each noise adding process after the first is the noised feature map obtained from the previous noise adding process, and the target feature map is the feature map obtained by the last noise adding process.
The denoising unit 530 is configured to denoise the first target hidden vector multiple times based on the target effect, the feature map and the first mask map to obtain a second target hidden vector, where the first mask image is a mask map of a target foreground object in the image to be processed.
The image generating unit 540 is configured to obtain a corresponding target image based on the second target hidden vector.
As a way, the denoising unit 530 is specifically configured to perform denoising processing on the first target hidden vector multiple times based on the target effect, the image constraint information, the target diffusion model, the feature map, and the first mask map, so as to obtain a second target hidden vector, where the image constraint information characterizes the feature of the content in the image to be processed. Optionally, the denoising unit 530 is further specifically configured to obtain at least one of depth constraint information and contour constraint information of the image to be processed as image constraint information; the depth constraint information is obtained based on a depth map of the image to be processed and the target neural network, and the contour constraint information is obtained based on a line manuscript map of the image to be processed and the target neural network.
As one way, the denoising unit 530 is specifically configured to add noise corresponding to the current denoising process to the feature map during the current denoising process, so as to obtain a first feature map; based on the target effect, the first feature map, the first mask map and the input hidden vector in the current denoising process, obtaining an updated hidden vector in the current denoising process; removing noise corresponding to the current denoising process from the updated hidden vector to obtain an output hidden vector; if the denoising times meet the target conditions, the hidden vector output in the current denoising process is used as a second target hidden vector, if the denoising times do not meet the target conditions, the next denoising process is started, the hidden vector output in the current denoising process is used as an input hidden vector in the next denoising process, and the input hidden vector in the first denoising process is the first target hidden vector.
Optionally, the denoising unit 530 is specifically configured to multiply the first feature map with the first mask map to obtain a first vector; multiplying the input hidden vector in the current denoising process with a second mask map to obtain a second vector, wherein the second mask map is a mask map of contents except for a target foreground object in the image to be processed; based on the first vector, the second vector and the target effect, the updated hidden vector in the current denoising process is obtained.
Optionally, the denoising unit 530 is specifically configured to obtain a hidden vector to be processed based on the first vector and the second vector; and performing effect processing on the hidden vector to be processed based on the target effect and the target diffusion model to obtain the updated hidden vector in the current denoising process.
As one way, the image generating unit 540 is specifically configured to obtain a corresponding initial output image based on the second target hidden vector; and replacing the target foreground object in the initial output image with the target foreground object in the image to be processed based on the first mask map so as to obtain a target image.
Optionally, the image generating unit 540 is specifically configured to obtain, based on the first mask map, the target foreground object from the image to be processed; adjusting color style information of the target foreground object to obtain an adjusted target foreground object; and replacing the target foreground object in the initial output image with the adjusted target foreground object to obtain a target image.
Optionally, the color style information includes a hue, in which case the image generating unit 540 is specifically configured to obtain the first color information based on color values of a plurality of color channels of the image to be processed; obtaining second color information based on color values of a plurality of color channels of a region outside the target foreground object in the target image; the hue of the target foreground object is adjusted based on the first color information and the second color information.
Optionally, the first color information includes a first color average value, and the second color information includes a second color average value, in which case the image generating unit 540 is specifically configured to adjust the hue of the target foreground object based on a ratio of the second color average value to the first color average value.
According to the image processing apparatus provided by this embodiment, the original details of the target foreground object in the image to be processed can be retained by combining the first mask map. After the first target hidden vector is obtained by performing noise adding processing multiple times based on the feature map, and the second target hidden vector is obtained by performing denoising processing multiple times based on the first target hidden vector, the target image obtained based on the second target hidden vector can not only present the target effect to be added but also has good fidelity, thereby improving the quality of adding special effects to the image.
It should be noted that, in the present application, the device embodiment and the foregoing method embodiment correspond to each other, and specific principles in the device embodiment may refer to the content in the foregoing method embodiment, which is not described herein again.
An electronic device provided in the present application will be described with reference to fig. 16.
Referring to fig. 16, based on the above-mentioned image processing method and apparatus, an electronic device 1000 capable of executing the above-mentioned image processing method is further provided in the embodiments of the present application. The electronic device 1000 comprises one or more (only one is shown in the figure) processors 105, a memory 104, an audio playback module 106 and an audio acquisition means 108 coupled to each other. The memory 104 stores therein a program capable of executing the contents of the foregoing embodiments, and the processor 105 can execute the program stored in the memory 104.
Wherein the processor 105 may include one or more processing cores. The processor 105 utilizes various interfaces and lines to connect various portions of the overall electronic device 1000, performs various functions of the electronic device 1000, and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 104, and by invoking data stored in the memory 104. Alternatively, the processor 105 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 105 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processor (Graphics Processing Unit, GPU), a modem, etc. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; the modem is used to handle wireless communications. It will be appreciated that the modem may not be integrated into the processor 105 and may instead be implemented solely by a single communication chip.
The Memory 104 may include random access memory (Random Access Memory, RAM) or read-only memory (Read-Only Memory, ROM). Memory 104 may be used to store instructions, programs, code sets, or instruction sets. The memory 104 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function (e.g., a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the foregoing method embodiments, etc.
Further, the electronic device 1000 may include a network module 110 and a sensor module 112 in addition to the devices shown above.
The network module 110 is configured to implement information interaction between the electronic device 1000 and other devices, for example, may establish a connection with other audio playing devices or other electronic devices, and perform information interaction based on the established connection. As one way, the network module 110 of the electronic device 1000 is a radio frequency module, which is configured to receive and transmit electromagnetic waves, and implement mutual conversion between the electromagnetic waves and the electrical signals, so as to communicate with a communication network or other devices. The radio frequency module may include various existing circuit elements for performing these functions, such as an antenna, a radio frequency transceiver, a digital signal processor, an encryption/decryption chip, a Subscriber Identity Module (SIM) card, memory, and the like. For example, the radio frequency module may interact with external devices through transmitted or received electromagnetic waves.
The sensor module 112 may include at least one sensor. Specifically, the sensor module 112 may include, but is not limited to: pressure sensors, motion sensors, acceleration sensors, and other sensors.
Wherein the pressure sensor may detect a pressure generated by pressing against the electronic device 1000. That is, the pressure sensor detects a pressure generated by contact or pressing between the user and the electronic device 1000, for example, a pressure generated by contact or pressing between the user's ear and the electronic device 1000. Thus, the pressure sensor may be used to determine whether contact or pressure has occurred between the user and the electronic device 1000, as well as the magnitude of the pressure.
The acceleration sensor may detect the magnitude of acceleration in each direction (typically, three axes), and may detect the magnitude and direction of gravity when stationary, and may be used for applications that recognize the posture of the electronic device 1000 (such as landscape/portrait switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and so on. In addition, the electronic device 1000 may further be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, etc., which will not be described herein.
The audio acquisition device 108 is used for acquiring audio signals. Optionally, the audio capturing device 108 includes a plurality of audio capturing devices, which may be microphones.
Referring to fig. 17, a block diagram of a computer readable storage medium according to an embodiment of the present application is shown. The computer readable medium 800 has stored therein program code which can be invoked by a processor to perform the methods described in the method embodiments described above.
The computer readable storage medium 800 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer readable storage medium 800 comprises a non-volatile computer readable medium (non-transitory computer-readable storage medium). The computer readable storage medium 800 has storage space for program code 810 that performs any of the method steps described above. The program code can be read from or written to one or more computer program products. Program code 810 may be compressed, for example, in a suitable form.
In summary, after the feature map of the image to be processed is obtained, the feature map may be subjected to noise adding processing multiple times to obtain a target feature map, and a first target hidden vector corresponding to the target feature map is obtained. And then, denoising the first target hidden vector for a plurality of times based on the target effect, the feature map and the first mask map to obtain a second target hidden vector, so as to obtain a corresponding target image based on the second target hidden vector. Therefore, original details of the target foreground object in the image to be processed can be reserved by combining the first mask image, and further, after the first target hidden vector is obtained by carrying out noise adding processing for multiple times based on the feature image and the second target hidden vector is obtained by carrying out noise removing processing for multiple times based on the first target hidden vector, the target image obtained based on the second target hidden vector not only can embody the target effect to be increased, but also has better fidelity, so that the quality of adding special effects to the image is improved.
The embodiments of the present application use AIGC (Artificial Intelligence Generated Content) technology to add and change image-level light and shadow effects in a real image (the image to be processed). Through adaptive light and shadow adjustment of the foreground (e.g., the target foreground object) and the background, the generated image appears more realistic and three-dimensional, providing the user with a more lifelike visual experience.
In the embodiment of the application, based on an inpainting algorithm, the background is changed according to the effect description (target effect) while the foreground is kept unchanged as far as possible, so as to obtain an initial output image with a smooth edge transition between foreground and background. Then, based on the first mask map, the target foreground object in that image is replaced with the target foreground object in the original image (the image to be processed), and the replaced foreground is redrawn so that it approaches the background tone. The result is a target image with artifact-free fused edges, a consistent overall tone, and a target foreground object whose content is preserved.
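A minimal sketch of the replace-and-tone-match step (function and variable names are hypothetical; the patent's redrawing uses the diffusion model, whereas this sketch only applies the mean-ratio tone shift of claims 9-10):

```python
import numpy as np

def replace_and_tone_match(initial_output, source, fg_mask):
    """Paste the source foreground back into the generated image, first
    shifting its tone by the per-channel ratio of the generated background
    mean (second color mean) to the source mean (first color mean)."""
    fg3 = fg_mask[..., None]                          # H x W x 1 for broadcasting
    bg = initial_output * (1.0 - fg3)                 # generated content outside the foreground
    first_mean = source.reshape(-1, 3).mean(axis=0)   # channel means of the image to be processed
    second_mean = bg.reshape(-1, 3).sum(axis=0) / (1.0 - fg_mask).sum()
    adjusted_fg = source * (second_mean / first_mean) # pull foreground toward background tone
    return np.clip(bg + adjusted_fg * fg3, 0.0, 1.0)
```

For example, pasting a mid-gray foreground into a darker generated background scales the foreground down by the same channel-wise ratio, so the composite shares one overall tone.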
The method provided by the embodiments of the present application can be used in professional photography to improve the aesthetics and artistic effect of photos, and can also be applied to the creation of family photo albums so that each photo is presented at its best. In addition, the method can be applied in fields such as film, television and advertising to improve picture quality and visual effect.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (13)

1. An image processing method, the method comprising:
acquiring a feature map of an image to be processed;
carrying out noise adding processing on the feature map a plurality of times to obtain a target feature map, and obtaining a first target hidden vector corresponding to the target feature map, wherein the object of each noise adding processing other than the first is the noise-added feature map obtained by the previous noise adding processing, and the target feature map is the feature map obtained by the last noise adding processing;
denoising the first target hidden vector a plurality of times based on a target effect, the feature map and a first mask map to obtain a second target hidden vector, wherein the first mask map is a mask map of a target foreground object in the image to be processed; and
obtaining a corresponding target image based on the second target hidden vector.
2. The method of claim 1, wherein denoising the first target hidden vector a plurality of times based on the target effect, the feature map, and the first mask map to obtain a second target hidden vector, comprises:
denoising the first target hidden vector a plurality of times based on the target effect, image constraint information, a target diffusion model, the feature map and the first mask map to obtain the second target hidden vector, wherein the image constraint information characterizes features of the content in the image to be processed.
3. The method according to claim 2, wherein the method further comprises:
acquiring at least one of depth constraint information and contour constraint information of the image to be processed as the image constraint information;
the depth constraint information is obtained based on a depth map of the image to be processed and a target neural network, and the contour constraint information is obtained based on a line manuscript map of the image to be processed and the target neural network.
4. The method of claim 1, wherein denoising the first target hidden vector a plurality of times based on the target effect, the feature map, and the first mask map to obtain a second target hidden vector, comprises:
In the current denoising process, adding noise corresponding to the current denoising process to the feature map to obtain a first feature map;
based on the target effect, the first feature map, the first mask map and the input hidden vector in the current denoising process, obtaining an updated hidden vector in the current denoising process;
removing noise corresponding to the current denoising process from the updated hidden vector to obtain an output hidden vector;
if the number of denoising operations satisfies a target condition, taking the hidden vector output by the current denoising process as the second target hidden vector; if the number of denoising operations does not satisfy the target condition, entering the next denoising process, and taking the hidden vector output by the current denoising process as the input hidden vector of the next denoising process, wherein the input hidden vector of the first denoising process is the first target hidden vector.
5. The method of claim 4, wherein the obtaining the updated hidden vector in the current denoising process based on the target effect, the first feature map, the first mask map, and the input hidden vector in the current denoising process comprises:
multiplying the first feature map by the first mask map to obtain a first vector;
multiplying the input hidden vector of the current denoising process by a second mask map to obtain a second vector, wherein the second mask map is a mask map of the content other than the target foreground object in the image to be processed; and
obtaining the updated hidden vector of the current denoising process based on the first vector, the second vector and the target effect.
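In symbols (notation assumed for illustration, not taken from the claim), with first mask map $M$, first feature map $F_t$, input hidden vector $z_t$, and second mask map $1 - M$, the update of claims 5-6 can be written as:

```latex
v_1 = M \odot F_t, \qquad
v_2 = (1 - M) \odot z_t, \qquad
\tilde{z}_t = g\!\left(v_1 + v_2,\; e\right)
```

where $g(\cdot, e)$ denotes the effect processing of claim 6 performed with the target diffusion model under the target effect $e$.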
6. The method of claim 5, wherein the obtaining the updated hidden vector in the current denoising process based on the first vector, the second vector and the target effect comprises:
obtaining a hidden vector to be processed based on the first vector and the second vector;
and performing effect processing on the hidden vector to be processed based on the target effect and the target diffusion model to obtain the updated hidden vector in the current denoising process.
7. The method of claim 1, wherein the obtaining a corresponding target image based on the second target hidden vector comprises:
obtaining a corresponding initial output image based on the second target hidden vector;
and replacing the target foreground object in the initial output image with the target foreground object in the image to be processed based on the first mask map so as to obtain a target image.
8. The method of claim 7, wherein the replacing the target foreground object in the initial output image with the target foreground object in the image to be processed based on the first mask map to obtain a target image comprises:
acquiring a target foreground object from the image to be processed based on the first mask map;
adjusting the color style information of the target foreground object to obtain an adjusted target foreground object;
and replacing the target foreground object in the initial output image with the adjusted target foreground object to obtain a target image.
9. The method of claim 8, wherein the color style information comprises hue, and wherein the adjusting the color style information of the target foreground object comprises:
obtaining first color information based on color values of a plurality of color channels of the image to be processed;
obtaining second color information based on color values of a plurality of color channels of an area except a target foreground object in the target image;
and adjusting the tone of the target foreground object based on the first color information and the second color information.
10. The method of claim 9, wherein the first color information comprises a first color average and the second color information comprises a second color average; the adjusting the hue of the target foreground object based on the first color information and the second color information includes:
and adjusting the tone of the target foreground object based on the ratio of the second color mean to the first color mean.
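Written out (symbols assumed for illustration), with first color mean $\mu_1$ computed over the image to be processed and second color mean $\mu_2$ computed over the target image outside the target foreground object, the per-channel adjustment of claim 10 is:

```latex
\mathrm{fg}'_{c} = \mathrm{fg}_{c} \cdot \frac{\mu_{2,c}}{\mu_{1,c}}, \qquad c \in \{R, G, B\}
```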
11. An image processing apparatus, characterized in that the apparatus comprises:
a feature map acquisition unit, configured to acquire a feature map of an image to be processed;
a noise adding unit, configured to carry out noise adding processing on the feature map a plurality of times to obtain a target feature map and a first target hidden vector corresponding to the target feature map, wherein the object of each noise adding processing other than the first is the noise-added feature map obtained by the previous noise adding processing, and the target feature map is the feature map obtained by the last noise adding processing;
a denoising unit, configured to denoise the first target hidden vector a plurality of times based on a target effect, the feature map and a first mask map to obtain a second target hidden vector, wherein the first mask map is a mask map of a target foreground object in the image to be processed; and
an image generation unit, configured to obtain a corresponding target image based on the second target hidden vector.
12. An electronic device comprising a processor and a memory; one or more programs are stored in the memory and configured to be executed by the processor to implement the method of any of claims 1-10.
13. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a program code, wherein the program code, when being executed by a processor, performs the method of any of claims 1-10.
Application CN202311611982.3A, filed 2023-11-28 (priority date 2023-11-28): Image processing method and device and electronic equipment — published as CN117830077A (status: pending)

Publications (1)

CN117830077A — published 2024-04-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination