CN116485944A - Image processing method and device, computer readable storage medium and electronic equipment


Info

Publication number
CN116485944A
Authority
CN
China
Legal status
Pending
Application number
CN202310294651.5A
Other languages
Chinese (zh)
Inventor
王瑞琛
马建
陈宸
鲁浩楠
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202310294651.5A
Publication of CN116485944A

Classifications

    • G06T 11/60: 2D [Two Dimensional] image generation; Editing figures and text; Combining figures or text
    • G06T 3/4038: Geometric image transformations in the plane of the image; Scaling of whole images or parts thereof; Image mosaicing, e.g. composing plane images from plane sub-images
    • G06T 7/11: Image analysis; Segmentation; Region-based segmentation

Abstract

The present disclosure provides an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device, and relates to the technical field of image processing. The image processing method comprises the following steps: acquiring an original image and a mask image, wherein the mask image is used for identifying a region to be edited of the original image; determining a cutting frame according to the region to be edited, cutting the original image by using the cutting frame to obtain a sub-image to be edited, and cutting the mask image by using the cutting frame to obtain a mask sub-image; generating an edited sub-image based on the sub-image to be edited, the mask sub-image and the text information; and fusing the edited sub-image with the original image. The present disclosure can improve image editing effects.

Description

Image processing method and device, computer readable storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of image processing technology, and in particular, to an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device.
Background
Image editing is a function in high demand among users: an image can be edited and processed according to the user's needs, for example by adding, deleting or replacing objects in the image, modifying the properties (e.g., color, hairstyle) of an object, or creatively replacing the image background.
However, current image editing algorithms may suffer from poor image editing effects.
Disclosure of Invention
The present disclosure provides an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device, thereby overcoming the problem of poor image editing effects at least to some extent.
According to a first aspect of the present disclosure, there is provided an image processing method including: acquiring an original image and a mask image, wherein the mask image is used for identifying a region to be edited of the original image; determining a cutting frame according to the region to be edited, cutting the original image by using the cutting frame to obtain a sub-image to be edited, and cutting the mask image by using the cutting frame to obtain a mask sub-image; generating an edited sub-image based on the sub-image to be edited, the mask sub-image and the text information; and fusing the edited sub-image with the original image.
According to a second aspect of the present disclosure, there is provided an image processing apparatus including: the image acquisition module is used for acquiring an original image and a mask image, wherein the mask image is used for identifying a region to be edited of the original image; the image clipping module is used for determining a clipping frame according to the region to be edited, clipping the original image by using the clipping frame to obtain a sub-image to be edited, and clipping the mask image by using the clipping frame to obtain a mask sub-image; the image generation module is used for generating an edited sub-image based on the sub-image to be edited, the mask sub-image and the text information; and the image fusion module is used for fusing the edited sub-image with the original image.
According to a third aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described image processing method.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising a processor; and a memory for storing one or more programs which, when executed by the processor, cause the processor to implement the image processing method described above.
In some embodiments of the present disclosure, a crop frame is determined according to the region to be edited identified by the mask image, the original image and the mask image are each cropped with the crop frame to obtain a sub-image to be edited and a mask sub-image, an edited sub-image is generated based on the sub-image to be edited, the mask sub-image and the text information, and the edited sub-image is fused with the original image to obtain an edited image corresponding to the original image. Compared with text-guided image editing schemes in some technologies, in which the editing operation targets the whole original image, the editing operation of the present disclosure targets a sub-image determined by a cropping operation. Narrowing the processing range of image editing allows the subsequent algorithm to focus on the editing region, which effectively reduces editing failures and editing errors and improves the image editing effect; in particular, for images with a small editing region or a complex background, the disclosed scheme can significantly optimize the image editing effect.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort. In the drawings:
FIG. 1 shows a schematic diagram of a system architecture implementing an image processing scheme of an embodiment of the present disclosure;
FIG. 2 shows a schematic diagram of inputs and outputs of an image processing scheme of an embodiment of the present disclosure;
FIG. 3 schematically illustrates a schematic diagram of an image processing scheme of one embodiment of the present disclosure;
FIG. 4 schematically illustrates a schematic diagram of editing failures that may occur after the scheme illustrated in FIG. 3 is employed;
FIG. 5 schematically illustrates a flow chart of an image processing method of an exemplary embodiment of the present disclosure;
FIG. 6 schematically illustrates a schematic diagram of a mask image of an embodiment of the present disclosure;
FIG. 7 schematically illustrates a diagram of a process of determining a crop box in accordance with one embodiment of the present disclosure;
FIG. 8 schematically illustrates a schematic diagram of a rectangular box of an embodiment of the present disclosure;
FIG. 9 schematically illustrates a process of determining a crop box in accordance with another embodiment of the present disclosure;
FIG. 10 schematically illustrates a flow chart corresponding to FIG. 9 for determining a crop box;
FIG. 11 schematically illustrates a schematic diagram of cropping an original image with a crop box to generate a sub-image to be edited in an embodiment of the present disclosure;
FIG. 12 schematically illustrates a schematic diagram of cropping a mask image with a crop box to generate a mask sub-image in accordance with an embodiment of the disclosure;
FIG. 13 shows a schematic diagram of the manner in which edited sub-images are generated;
FIG. 14 schematically illustrates an architecture diagram of the image editing model of FIG. 13;
FIG. 15 shows a schematic representation of an image corresponding to the scene of FIG. 4 obtained by means of direct fusion stitching;
FIG. 16 shows a schematic diagram of generating a fill image according to an embodiment of the present disclosure;
FIG. 17 illustrates a schematic diagram of a transparency mask map of an embodiment of the disclosure;
FIG. 18 illustrates a schematic diagram of a process of image fusion in accordance with some embodiments of the present disclosure;
FIG. 19 shows a schematic representation of an image corresponding to FIG. 15 obtained using a transparency fusion algorithm;
FIG. 20 schematically shows a flowchart of the entire processing procedure of the image processing method of the embodiment of the present disclosure;
FIG. 21 schematically illustrates an effect diagram of an image generated using an exemplary image processing scheme of the present disclosure;
FIG. 22 schematically shows a block diagram of an image processing apparatus according to an exemplary embodiment of the present disclosure;
FIG. 23 schematically shows a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only and not necessarily all steps are included. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations. In addition, all of the following terms "first," "second," "third," etc. are used for distinguishing purposes only and should not be taken as a limitation of the present disclosure.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which an image processing scheme of an embodiment of the present disclosure may be applied.
As shown in fig. 1, the system architecture may include a terminal device 11 and a server 12, where the terminal device 11 and the server 12 may be connected by a wired, wireless communication link, or an optical fiber cable, etc.
The embodiments of the present disclosure are not limited in the type of terminal device 11, for example, the terminal device 11 may include, but is not limited to, a smart phone, a tablet computer, a portable computer, a smart wearable device, and the like.
In an example in which the terminal device 11 implements the image processing scheme of the embodiment of the present disclosure, first, the terminal device 11 may acquire an original image and a mask image for identifying a region to be edited of the original image. The original image may be an image acquired by the terminal device 11 by means of an image capturing module configured thereon, or an image stored in an album of the terminal device 11, or an image acquired by the terminal device 11 from another device such as the server 12, and the mask image may be an image generated in response to an input operation by the user. Next, the terminal device 11 may determine a crop frame according to the region to be edited, crop the original image with the crop frame to obtain a sub-image to be edited, and crop the mask image with the crop frame to obtain a mask sub-image. Subsequently, the terminal device 11 may generate an edited sub-image based on the sub-image to be edited, the mask sub-image, and the text information. Then, the terminal device 11 may fuse the edited sub-image with the original image to generate an image subjected to image editing corresponding to the original image.
In an example where the server 12 implements the image processing scheme of the presently disclosed embodiments, first, the server 12 may acquire an original image and a mask image for identifying a region to be edited of the original image. Wherein the server 12 may acquire the original image and the mask image from the terminal device 11, or the server 12 may acquire only the mask image from the terminal device 11 and the original image from the storage unit of the server 12. Next, the server 12 may determine a crop box from the region to be edited, crop the original image with the crop box to obtain a sub-image to be edited, and crop the mask image with the crop box to obtain a mask sub-image. Subsequently, the server 12 may generate an edited sub-image based on the sub-image to be edited, the mask sub-image, and the text information. The server 12 may then fuse the edited sub-image with the original image to generate an image corresponding to the original image that has undergone image editing. In addition, the server 12 may also feed back the edited image to the terminal device 11 for viewing, saving, etc. by the user.
Fig. 2 shows a schematic diagram of inputs and outputs of an image processing scheme of an embodiment of the present disclosure. Referring to fig. 2, the input of the image processing scheme of the embodiment of the present disclosure includes an original image, a mask image, and text information. It will be appreciated that both the image and the information are data-bearing, and thus the original image may also be referred to as original image data, the mask image may also be referred to as mask image data, and the text information may also be referred to as text data. The image processing process of the embodiment of the disclosure is adopted to process the original image, the mask image and the text information, and the output of the image processing scheme is the edited image corresponding to the original image.
Fig. 3 schematically illustrates an image processing scheme of one embodiment of the present disclosure. Referring to fig. 3, an original image 31, a mask image 32, and text information 33 (user input "Godzilla") are input into an image editing model 300, and the image editing task is performed by the image editing model 300 to generate an edited image 34. It can be seen that, compared with the original image 31 before editing, an object (or content) corresponding to the text information 33 has been edited into the edited image 34 at the position corresponding to the mask.
The scheme shown in fig. 3 of the present disclosure can achieve a good photo-retouching effect. However, with the processing procedure of fig. 3, editing failures and poor editing effects may still occur in some scenes, for example when the background of the original image is complex or the editing region in the mask image is small.
As shown in fig. 4, the original image 41 includes an editing region 400 corresponding to the mask image. After text such as "one stone" is input and the above-described processing is performed, no stone appears in the edited image 42, i.e., the image editing fails. This is because the editing region is too small relative to the entire original image, and the semantics of the input text are not salient enough relative to the semantics of the whole image; the image editing model therefore cannot understand the generation target for the editing region, which leads to the editing failure.
In order to solve the above-mentioned problem to improve the image editing effect, the present disclosure further provides a new image processing scheme.
As described above with reference to fig. 1, the image processing scheme of the embodiment of the present disclosure may be performed by a terminal device or a server, however, for convenience of description, the image processing scheme of the embodiment of the present disclosure is described below by taking the terminal device as an example. That is, the respective steps of the following image processing method may be performed by the terminal device, and the following image processing apparatus may be configured in the terminal device.
Fig. 5 schematically shows a flowchart of an image processing method of an exemplary embodiment of the present disclosure. Referring to fig. 5, the image processing method may include the steps of:
s52, acquiring an original image and a mask image, wherein the mask image is used for identifying a region to be edited of the original image.
In the exemplary embodiment of the present disclosure, the original image may be an image acquired by the terminal device by means of the camera module configured thereon, or may be an image in the album of the terminal device itself, or may be an image acquired by the terminal device from another device. In addition, the original image may be an image of a real scene, or may be a virtually constructed image (such as an animation). The present disclosure is not limited in terms of source, size, content, etc. of the original image.
The mask image may also be referred to as a masking image, and is mainly used to shield part of an image so that only the corresponding image region is processed. The mask image of the embodiment of the present disclosure may be a binary image of the same size as the original image. For example, a region with the value 0 is a region where image editing is not needed and a region with the value 1 is a region where image editing is needed; in these embodiments, the region with the value 1 is determined as the region to be edited of the original image. Conversely, a region with the value 1 may be a region where image editing is not needed and a region with the value 0 a region where image editing is needed; in those embodiments, the region with the value 0 is determined as the region to be edited of the original image. It will be appreciated that the role of the mask image of the present disclosure includes indicating which region or regions in the original image need to be edited.
Fig. 6 schematically shows a schematic diagram of a mask image of an embodiment of the present disclosure. Referring to fig. 6, a mask image 60 includes therein a region 600 to be edited for identifying an original image.
For the manner in which the mask image is acquired, in some embodiments of the present disclosure, it may be drawn manually by a user. For example, the mask image is determined in response to a drawing operation by a user through an application program for realizing image drawing configured on the terminal device.
In other embodiments of the present disclosure, the terminal device may generate the mask image in response to a region selection operation for the original image. Specifically, in the case that the original image is displayed on the display interface of the terminal device, the user may select the region to be edited from the original image through clicking, touching, sliding or other operation modes, and the terminal device may generate the mask image based on the region to be edited selected by the user.
In still other embodiments of the present disclosure, in the case that input text information is acquired, first, the terminal device may extract an object feature contained in the text information; for example, if the text information is "place a Godzilla", the object feature is "Godzilla". Then, the terminal device determines, from the original image, a region that matches the object feature contained in the text information; for example, the terminal device determines, through image analysis, a region of the original image in which the Godzilla can be placed. If there are several matching regions, one region may be selected at random or the region closest to the image center may be selected, which is not limited by the present disclosure. Subsequently, the terminal device may generate the mask image based on the region of the original image that matches the object feature contained in the text information, for example the region in which the Godzilla can be placed.
It will be appreciated that in embodiments where the mask image is determined based on text information, the mask image is adaptively generated from the text information without manual manipulation by the user.
S54, determining a cutting frame according to the area to be edited, cutting the original image by using the cutting frame to obtain a sub-image to be edited, and cutting the mask image by using the cutting frame to obtain the mask sub-image.
In an exemplary embodiment of the present disclosure, the crop box contains coordinate position information of an area to be cropped in the image, that is, the crop box may be applied to the image to implement a cropping operation of the coordinate position area corresponding to the crop box. In addition, it can be appreciated that in some embodiments of the present disclosure, the cropping operation based on the cropping frame does not destroy the image itself, which is equivalent to extracting the corresponding region from the image, which is unchanged. In other embodiments of the present disclosure, the content of the cropped region in the image is missing after undergoing a crop box based cropping operation.
The terminal device may determine the minimum circumscribed frame of the area to be edited, and determine the crop frame according to the minimum circumscribed frame of the area to be edited.
According to some embodiments of the present disclosure, the terminal device may directly determine the minimum circumscribed frame of the area to be edited as a crop frame. That is, the coordinate position of the crop frame coincides with the coordinate position of the minimum circumscribed frame of the region to be edited.
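As an illustration of this simplest variant, the following is a minimal sketch (assuming the mask is held as a NumPy array in which the area to be edited is marked with 1) of how the minimum circumscribed rectangle could be obtained and used directly as the crop frame; the function and variable names are hypothetical:

```python
import numpy as np

def min_bounding_box(mask: np.ndarray) -> tuple:
    """Return (x1, y1, x2, y2): the minimum circumscribed rectangle of the
    non-zero (to-be-edited) pixels in a binary mask image."""
    ys, xs = np.nonzero(mask)                  # rows / columns of edit pixels
    x1, x2 = int(xs.min()), int(xs.max())      # leftmost / rightmost columns
    y1, y2 = int(ys.min()), int(ys.max())      # topmost / bottommost rows
    return x1, y1, x2, y2

# In this variant the crop frame simply coincides with the rectangle:
# crop_box = min_bounding_box(mask_image)
```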
Considering that the more image background the subsequent model algorithm can see, the better its processing effect tends to be, directly determining the crop frame as the minimum circumscribed frame may cause the editing result to blend poorly into the image background, so that the generated edited image looks unnatural. In order to optimize the image editing effect, the terminal device may expand the minimum circumscribed frame of the area to be edited to obtain the crop frame.
According to other embodiments of the present disclosure, the terminal device may perform equal-scale amplification on the minimum circumscribed frame of the area to be edited to determine the crop frame.
Referring to fig. 7, for the region 700 to be edited contained in the mask image 70, first, a minimum bounding box 701 of the region 700 to be edited may be determined, and as shown in fig. 7, the minimum bounding box 701 is a minimum bounding rectangle of the region 700 to be edited. Next, the minimum bounding box 701 may be scaled up to obtain a crop box 702.
The embodiment of the disclosure also provides a mode of equal-proportion amplification, which comprises the following steps:
referring to fig. 8, for an original image of height H and width W, one quadruple (x 1 ,y 1 ,x 2 ,y 2 ) To represent the minimum bounding rectangle, and the four values represent the abscissa and ordinate of the upper left and lower right corners of the minimum bounding rectangle, respectively. In this case, the crop box is determined by the following formula:
x′ 1 =max(0,x 1 -W′);y′ 1 =max(0,y 1 -H′);x′ 2 =min(W,x 2 +W′);y′ 2 =min(H,y 2 +H′)
Crop box= (x' 1 ,x′ 2 ,y′ 1 ,y′ 2 )
Here, W' and H' denote the horizontal and vertical expansion amounts, and α is a control variable input by the user (or a default constant built into the algorithm) that indicates the importance ratio of the area to be edited.
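A possible implementation of this equal-proportion enlargement is sketched below. The text above does not preserve how the expansion amounts W' and H' are derived from α, so their computation here (box size scaled by (1 − α)/(2α)) is only an assumption; the clamping to the image borders follows the formulas above, and the names are illustrative.

```python
def enlarge_crop_box(box, img_w, img_h, alpha=0.5):
    """Symmetrically enlarge the minimum circumscribed rectangle and clamp
    the result to the image borders, yielding the crop frame."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    # Assumed derivation of W' and H' from the control variable alpha.
    w_margin = int(w * (1 - alpha) / (2 * alpha))
    h_margin = int(h * (1 - alpha) / (2 * alpha))
    x1p = max(0, x1 - w_margin)
    y1p = max(0, y1 - h_margin)
    x2p = min(img_w, x2 + w_margin)
    y2p = min(img_h, y2 + h_margin)
    return x1p, y1p, x2p, y2p
```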
According to still other embodiments of the present disclosure, the terminal device may determine an area ratio of the area to be edited in the minimum circumscribed frame of the area to be edited, and expand the minimum circumscribed frame of the area to be edited according to the area ratio of the area to be edited in the minimum circumscribed frame of the area to be edited, so as to determine the crop frame.
Referring to fig. 9, for the region to be edited 900 included in the mask image 90, first, a minimum circumscribed frame 901 of the region to be edited 900 may be determined, and as shown in fig. 9, the minimum circumscribed frame 901 is a minimum circumscribed rectangle of the region to be edited 900. Next, an area-wise expansion may be performed on the minimum bounding box 901 to determine the crop box 902.
Specifically, the area ratio of the area to be edited within its minimum circumscribed frame can be determined first. With a quadruple (x1, y1, x2, y2) representing the minimum circumscribed rectangle, the area ratio of the area to be edited within the minimum circumscribed frame can be determined with the following formula:
area ratio = S(area to be edited) / [(x2 - x1) × (y2 - y1)]
where S(area to be edited) denotes the area of the area to be edited; if the area to be edited is marked with the value 1 in the mask image, all pixels with the value 1 can be counted, and their number gives the area of the area to be edited.
If the area ratio of the area to be edited within its minimum circumscribed frame is greater than or equal to the ratio threshold, one expansion operation is performed on the minimum circumscribed frame to obtain an intermediate circumscribed frame of the area to be edited. The ratio threshold may be α, which is not repeated here. The term "one expansion operation" denotes an expansion operation that is performed exactly once, and may specifically refer to the following update:
x1 = max(0, x1 - 1), x2 = min(x2 + 1, W), y1 = max(0, y1 - 1), y2 = min(y2 + 1, H)
where the quadruple (x1, y1, x2, y2) obtained after this one expansion operation is the intermediate circumscribed frame.
Then, the terminal device may determine the area ratio of the area to be edited within the intermediate circumscribed frame. If this area ratio is greater than or equal to the ratio threshold, the intermediate circumscribed frame is expanded once more, and the intermediate circumscribed frame is updated with the frame obtained after that expansion operation. If the area ratio of the area to be edited within the intermediate circumscribed frame is smaller than the ratio threshold, or the size of the intermediate circumscribed frame is the same as the size of the original image, the intermediate circumscribed frame is determined as the crop frame.
To better illustrate aspects of determining a crop box by area ratio in accordance with some embodiments of the present disclosure, FIG. 10 schematically illustrates a flow chart of a process for determining a crop box.
In step S1002, the terminal device may determine an area ratio of the area to be edited in the minimum circumscribed frame of the area to be edited.
In step S1004, the terminal device may determine whether the area ratio of the area to be edited in the minimum circumscribed frame of the area to be edited is greater than or equal to the ratio threshold. If the area ratio is smaller than the ratio threshold, step S1006 is executed; if it is greater than or equal to the ratio threshold, step S1008 is performed.
In step S1006, the terminal device takes the minimum circumscribed frame as a crop frame.
In step S1008, the terminal device may perform an expansion operation on the minimum circumscribed frame to obtain an intermediate circumscribed frame of the area to be edited.
In step S1010, the terminal device may determine an area ratio of the area to be edited in the middle circumscribed frame of the area to be edited.
In step S1012, the terminal device may determine whether the area ratio of the area to be edited in the intermediate circumscribed frame of the area to be edited is greater than or equal to the above-described ratio threshold. If the area ratio is smaller than the ratio threshold, step S1020 is performed; if it is greater than or equal to the ratio threshold, step S1014 is performed.
In step S1014, the terminal device may perform an expansion operation on the intermediate circumscribed frame once. It is understood that the one expansion operation in this step is the same as the one expansion operation in step S1008.
In step S1016, the terminal device may determine whether the expanded intermediate circumscribed frame is the same size as the original image. If not, then step S1018 is performed; if so, step S1020 is performed.
In step S1018, the terminal device may update the intermediate circumscribed frame, and return to step S1010 to continue execution.
In step S1020, the terminal device may determine the intermediate circumscribed frame as a crop frame.
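A compact sketch of the expansion loop described by steps S1002 to S1020 is given below. It assumes the mask is a binary NumPy array and that the box area is computed as (x2 − x1) × (y2 − y1), consistent with the area-ratio formula above; names are illustrative.

```python
import numpy as np

def crop_box_by_area_ratio(mask: np.ndarray, min_box, ratio_threshold: float):
    """Expand the minimum circumscribed frame by one pixel per side while the
    to-be-edited area still occupies at least `ratio_threshold` of the box."""
    H, W = mask.shape
    x1, y1, x2, y2 = min_box
    edit_area = float(np.count_nonzero(mask))         # S(area to be edited)

    def ratio(a, b, c, d):
        return edit_area / float((c - a) * (d - b))    # area ratio within box (a, b, c, d)

    # S1004: the minimum circumscribed frame is already loose enough.
    if ratio(x1, y1, x2, y2) < ratio_threshold:
        return x1, y1, x2, y2

    while True:
        # S1008 / S1014: one expansion operation, clamped to the image borders.
        x1, y1 = max(0, x1 - 1), max(0, y1 - 1)
        x2, y2 = min(x2 + 1, W), min(y2 + 1, H)
        covers_whole_image = (x1 == 0 and y1 == 0 and x2 == W and y2 == H)
        # S1012 / S1016: stop when the ratio drops below the threshold or the
        # intermediate frame has grown to the size of the original image.
        if ratio(x1, y1, x2, y2) < ratio_threshold or covers_whole_image:
            return x1, y1, x2, y2
```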
The present disclosure does not limit the shape of the minimum bounding box and/or the crop box, and may be rectangular or square, for example. In addition, in view of the subsequent model processing procedure, if the input of the model is square, the shape of the crop box may be configured at least as square to ensure that the image is not stretched and distorted due to scaling.
After determining the cropping frame, on the one hand, the terminal device may crop the original image with the cropping frame to obtain the sub-image to be edited. Referring to fig. 11, for an original image 111, after determining a crop box 1100, the terminal device may crop out a sub-image 112 to be edited.
On the other hand, the terminal device may crop the mask image using a cropping frame to obtain the mask sub-image. Referring to fig. 12, for a mask image, after determining a crop box 1200, the terminal device may crop out the mask sub-image 122.
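With the images held as NumPy arrays, applying the same crop frame to both images reduces to array slicing (a sketch; variable names are illustrative):

```python
x1, y1, x2, y2 = crop_box
sub_image_to_edit = original_image[y1:y2, x1:x2]   # cropped original (H' x W' x 3)
mask_sub_image = mask_image[y1:y2, x1:x2]          # cropped mask (H' x W')
# Slicing returns views, so the original image and mask image themselves are untouched.
```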
S56, generating an edited sub-image based on the sub-image to be edited, the mask sub-image and the text information.
In an exemplary embodiment of the present disclosure, the text information may be information input by a user. For example, the text information may be text information input by a user through a keyboard or the like. For another example, the text information may be text information obtained by text converting a voice signal input by the user by the terminal device. In addition, the text information can also be the text information randomly generated by the terminal equipment. The present disclosure is not limited in terms of the source, content, etc. of the textual information.
After determining the sub-image to be edited, the mask sub-image, and the text information, the terminal device may generate an edited sub-image based on the sub-image to be edited, the mask sub-image, and the text information.
Referring to fig. 13, the terminal device may input the sub-image to be edited, the mask sub-image, and the text information into an image editing model, and output the image editing model as an edited sub-image.
If the sizes of the sub-image to be edited and the mask sub-image are inconsistent with the input size of the image editing model, the terminal device may scale the sub-image to be edited to a size consistent with the input size of the image editing model (e.g., 512×512) to obtain a first input image, and may scale the mask sub-image to a size consistent with the input size of the image editing model to obtain a second input image. Subsequently, the terminal device may input the first input image, the second input image, and the text information into the image editing model to generate an edited sub-image.
The image editing model of the embodiments of the present disclosure may be a Stable Diffusion inpainting model, a generative adversarial network (GAN), or the like, to which the present disclosure is not limited.
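The disclosure does not prescribe a specific model, but as one concrete possibility, the open-source diffusers library exposes a Stable Diffusion inpainting pipeline whose inputs match this step. The checkpoint name, the 512 × 512 input size, and the variable names below (PIL images for the crops and a string for the text) are illustrative assumptions:

```python
import torch
from diffusers import StableDiffusionInpaintPipeline

# Load a pretrained text-guided inpainting model (example checkpoint name).
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

# Scale both crops to the model input size before inference.
first_input = sub_image_to_edit_pil.resize((512, 512))    # cropped original image
second_input = mask_sub_image_pil.resize((512, 512))      # cropped mask image

edited_sub_image = pipe(
    prompt=text_information,        # e.g. "a stone"
    image=first_input,
    mask_image=second_input,
).images[0]                          # edited 512 x 512 sub-image
```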
An image editing model of an embodiment of the present disclosure will be described with reference to fig. 14. On the one hand, the mask sub-image may be resized to obtain a resized mask image; on the other hand, the mask sub-image may be applied to the sub-image to be edited to obtain a masked sub-image (masked image). The masked sub-image is fed to an image encoder for image encoding, which outputs the hidden (latent) variable of the masked image.
A random-noise hidden variable is used as the initial noise of the generation result; a concat operation is performed on it together with the resized mask image and the masked-image hidden variable, and the result is input to the diffusion model.
The text encoder encodes the input text information and outputs a vector representation of the text, which can be fed into the diffusion model to control the model so that the generated result conforms to the text semantics.
The diffusion model outputs a result-image hidden variable according to the input data; specifically, the diffusion model is guided by the text vector and generates a result-image hidden variable that conforms to the text description.
The image decoder decodes the resulting image hidden variable to obtain the edited sub-image.
S58, fusing the edited sub-image with the original image.
In the case that the edited sub-image determined in step S56 is a sub-image whose image size is not restored, the edited sub-image may be scaled to a size consistent with the size of the sub-image to be edited, so as to obtain a sub-image to be fused, and the sub-image to be fused is fused with the original image, so as to generate an edited image corresponding to the original image.
In the case that the edited sub-image determined in step S56 is the sub-image with the restored image size or the size itself is consistent with the size of the sub-image to be edited, the terminal device may directly fuse the edited sub-image with the original image to generate an edited image corresponding to the original image.
The procedure of the fusion operation is described below:
according to some embodiments of the present disclosure, the terminal device may replace the pixel value of the corresponding position in the original image with the pixel value of the edited sub-image. It can be appreciated that in the embodiment of the disclosure, in the image processing process of clipping, editing and the like, each sub-image retains corresponding coordinate position information, and after determining the edited sub-image, the pixel value can be replaced by using the coordinate position information.
Corresponding to the example shown in fig. 4, referring to fig. 15, an edited image 150 may be generated using the above-described processes of the present disclosure. However, as shown in image 150, the fused boundary (rectangular box) is apparent, resulting in an unnatural image.
In order to improve the effect of image editing, the present disclosure also provides some optimization methods.
According to further embodiments of the present disclosure, in one aspect, the terminal device may generate a filler image consistent with the size of the original image using the edited sub-image. As shown in fig. 16, the terminal device may generate a filler image 161 consistent with the size of the original image using the edited sub-image 160. For example, an area of the fill image 161 other than the edited sub-image 160 may be filled with 1.
On the other hand, the terminal device can generate a transparency mask image by using the original image, the minimum circumscribed frame of the area to be edited and the clipping frame. The transparency mask image comprises a first area, a second area and a third area, wherein the first area corresponds to a minimum circumscribed frame of an area to be edited, the second area corresponds to an area outside the minimum circumscribed frame of the area to be edited and inside the cutting frame, and the third area corresponds to an area outside the cutting frame in the original image.
Referring to fig. 17, for the original image 1710, a crop box 1711, and a minimum circumscribed box 1712 of the region to be edited. Based on the crop box 1711 and the minimum circumscribed box 1712 of the region to be edited, the original image 1710 may be divided into the above-described first region, second region, and third region, and the terminal device may generate the transparency mask image 1720 according to the first region, second region, and third region.
As exemplarily shown in fig. 17, for the first region, corresponding to the minimum circumscribed frame of the area to be edited, the mask-map transparency takes the value 255; for the third region, corresponding to the part of the original image outside the crop frame, the mask-map transparency takes the value 0; and for the second region, outside the minimum circumscribed frame of the area to be edited but inside the crop frame, the mask-map transparency takes gradually interpolated values in the range 0 to 255.
After determining the filler image and the transparency mask image, the terminal device may perform transparency gradient fusion on the original image, the filler image, and the transparency mask image to generate an edited image corresponding to the original image.
Referring to fig. 18, first, on the one hand, a terminal device generates a fill image 1820 using the edited sub-image 1810; on the other hand, the terminal device generates a transparency mask image 1830 from the image area relation among the original image, the minimum circumscribed frame of the area to be edited, and the crop frame. Next, the terminal device may take the fill image 1820, transparency mask image 1830, and original image 1840 as inputs to a transparency fusion algorithm to obtain an edited image 1850 corresponding to the original image 1840.
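The following sketch illustrates one way the fill image, the transparency mask image and the original image could be combined. The exact gradient in the second region is not fixed by the text, so a per-axis linear fall-off from the minimum circumscribed frame to the crop frame is assumed here, and the blend weight is kept in the range 0–1 rather than 0–255; all names are illustrative.

```python
import numpy as np

def transparency_blend(original, edited_sub, crop_box, min_box):
    """Alpha-blend the edited result into the original image: opaque inside the
    minimum circumscribed frame, transparent outside the crop frame, and a
    gradient in the ring between the two rectangles."""
    H, W = original.shape[:2]
    cx1, cy1, cx2, cy2 = crop_box
    bx1, by1, bx2, by2 = min_box

    # Fill image: edited content inside the crop frame. Outside the crop frame
    # the blend weight is 0, so the padding value never shows through.
    fill = original.astype(np.float32).copy()
    fill[cy1:cy2, cx1:cx2] = edited_sub

    def falloff(v, b1, b2, c1, c2):
        # 0 inside [b1, b2], rising linearly to 1 at the crop-frame border.
        if v < b1:
            return (b1 - v) / max(b1 - c1, 1)
        if v > b2:
            return (v - b2) / max(c2 - b2, 1)
        return 0.0

    alpha = np.zeros((H, W), dtype=np.float32)   # blend weight (1 = edited pixel)
    for y in range(cy1, cy2):
        for x in range(cx1, cx2):
            alpha[y, x] = 1.0 - max(falloff(x, bx1, bx2, cx1, cx2),
                                    falloff(y, by1, by2, cy1, cy2))

    blended = alpha[..., None] * fill + (1.0 - alpha[..., None]) * original.astype(np.float32)
    return np.clip(blended, 0, 255).astype(np.uint8)
```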
Fig. 19 shows an effect diagram of generating an edited image based on the processing manner of transparency gradation fusion, corresponding to the example shown in fig. 4. Compared to the fusion method of fig. 15 in which direct stitching is adopted, the problem of obvious fusion boundary does not occur in the image 190, and it can be seen that the above-mentioned transparency gradient fusion method of the embodiment of the disclosure helps to improve the image editing effect.
According to further embodiments of the present disclosure, the terminal device may divide the original image into a first region, a second region, and a third region, similarly, the first region corresponds to a minimum bounding box of the region to be edited, the second region corresponds to a region outside the minimum bounding box of the region to be edited and inside the crop frame, and the third region corresponds to a region outside the crop frame.
In this case, for the first region, the terminal device may replace the pixel value of the first region in the original image with the pixel value of the minimum circumscribed frame of the region to be edited in the edited sub-image. For the second region, the terminal device may perform interpolation processing on the pixel values of the region except the minimum circumscribed frame of the region to be edited in the edited sub-image and the pixel values of the second region in the original image, and update the pixel values of the second region in the original image with the interpolation result, where the weight of the interpolation is not limited in the embodiment of the present disclosure. For the third region, the pixel values of the third region in the original image are maintained unchanged.
Compared with the transparency-gradient fusion mode, this embodiment does not need to generate the fill image and the transparency mask image, so the step of generating those images is saved while achieving a comparable effect.
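An equivalent region-wise update, without building the fill and transparency mask images, might look like the sketch below. It reuses the crop-frame coordinates (cx1, cy1, cx2, cy2) and minimum-circumscribed-frame coordinates (bx1, by1, bx2, by2) of the previous sketch, and the interpolation weight `w` is an assumption, since the disclosure leaves it open:

```python
import numpy as np

out = original_image.copy()
# First region: copy the edited pixels inside the minimum circumscribed frame.
out[by1:by2, bx1:bx2] = edited_sub[by1 - cy1:by2 - cy1, bx1 - cx1:bx2 - cx1]

# Second region: the ring between the minimum circumscribed frame and the crop
# frame, updated with a weighted mix of edited and original pixel values.
w = 0.5
ring = np.zeros(original_image.shape[:2], dtype=bool)
ring[cy1:cy2, cx1:cx2] = True
ring[by1:by2, bx1:bx2] = False
edited_full = original_image.copy()
edited_full[cy1:cy2, cx1:cx2] = edited_sub
out[ring] = (w * edited_full[ring] + (1 - w) * original_image[ring]).astype(out.dtype)
# Third region (outside the crop frame): left unchanged.
```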
Fig. 20 schematically shows a flowchart of the entire processing procedure of the image processing method of the embodiment of the present disclosure.
Referring to fig. 20, in the case of acquiring an original image 201 and a mask image 202, first, a terminal device may determine a cropping frame 203 according to an area to be edited in the mask image, and crop the mask image 202 and the original image 201 by using the cropping frame 203, respectively, to obtain a mask sub-image 204 and a sub-image 205 to be edited.
Next, the terminal device may take the mask sub-image 204, the sub-image 205 to be edited, and the entered text (e.g., "kokuki") as inputs to the image editing model, whereby the image editing model outputs the edited sub-image 206.
The terminal device may then fuse the edited sub-image 206 with the original image 201 to generate an edited image 207 that corresponds to the original image 201 and matches the input text semantics.
In addition, in order to further verify the effect of the image processing scheme of the embodiments of the present disclosure, experiments were performed on different types of original images and texts, and results with good image editing effects were obtained. As shown in fig. 21, when the region to be edited is specified by the mask and text ("lightning") is input, a better image editing effect can be obtained by applying the image processing scheme of the present disclosure. The scheme of the embodiments of the present disclosure has strong universality and can be widely applied to scenes such as photo retouching and image sample set expansion.
It should be noted that although the steps of the methods in the present disclosure are depicted in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all illustrated steps be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
Further, an image processing apparatus is also provided in the present exemplary embodiment.
Fig. 22 schematically shows a block diagram of an image processing apparatus of an exemplary embodiment of the present disclosure. Referring to fig. 22, an image processing apparatus 22 according to an exemplary embodiment of the present disclosure may include an image acquisition module 221, an image cropping module 223, an image generation module 225, and an image fusion module 227.
Specifically, the image obtaining module 221 may be configured to obtain an original image and a mask image, where the mask image is used to identify a region to be edited of the original image; the image cropping module 223 may be configured to determine a cropping frame according to the region to be edited, crop the original image with the cropping frame to obtain a sub-image to be edited, and crop the mask image with the cropping frame to obtain a mask sub-image; the image generation module 225 may be configured to generate an edited sub-image based on the sub-image to be edited, the mask sub-image, and the text information; the image fusion module 227 may be used to fuse the edited sub-image with the original image.
According to an example embodiment of the present disclosure, the image cropping module 223 may be configured to perform: determining a minimum circumscribed frame of the area to be edited; and determining a cutting frame according to the minimum circumscribed frame of the area to be edited.
According to an example embodiment of the present disclosure, the image cropping module 223 may be configured to perform: and carrying out equal proportion amplification on the minimum circumscribed frame of the area to be edited so as to determine the cutting frame.
According to an exemplary embodiment of the present disclosure, the image cropping module 223 may be further configured to perform: determining the area occupation ratio of the area to be edited in the minimum circumscribed frame of the area to be edited; and expanding the minimum circumscribed frame of the area to be edited according to the area occupation ratio of the area to be edited in the minimum circumscribed frame of the area to be edited so as to determine the cutting frame.
According to an example embodiment of the present disclosure, the process of determining a crop box by the image cropping module 223 by area ratio may be configured to perform: if the area ratio of the area to be edited in the minimum circumscribed frame of the area to be edited is larger than or equal to the threshold value of the area ratio, performing one-time expansion operation on the minimum circumscribed frame to obtain an intermediate circumscribed frame of the area to be edited; determining the area occupation ratio of the area to be edited in the middle external frame, if the area occupation ratio of the area to be edited in the middle external frame is larger than or equal to a occupation ratio threshold value, performing one-time expansion operation on the middle external frame, and updating the middle external frame by using the external frame obtained after one-time expansion operation; and if the area occupation ratio of the area to be edited in the intermediate circumscribed frame is smaller than the occupation ratio threshold value, or the size of the intermediate circumscribed frame is the same as the size of the original image, determining the intermediate circumscribed frame as a cutting frame.
According to an example embodiment of the present disclosure, the image generation module 225 may be configured to perform: scaling the sub-image to be edited to a size consistent with the input size of the image editing model to obtain a first input image; scaling the mask sub-image to a size consistent with the input size of the image editing model to obtain a second input image; the first input image, the second input image, and the text information are input into an image editing model to generate an edited sub-image.
According to an example embodiment of the present disclosure, the image fusion module 227 may be configured to perform: scaling the edited sub-image to a size consistent with the size of the sub-image to be edited so as to obtain a sub-image to be fused; and fusing the sub-images to be fused with the original images.
According to an example embodiment of the present disclosure, the image fusion module 227 may be configured to perform: and replacing the pixel value of the corresponding position in the original image by the pixel value of the edited sub-image.
According to an exemplary embodiment of the present disclosure, the image fusion module 227 may be further configured to perform: generating a filling image with the same size as the original image by using the edited sub-image; generating a transparency mask image by using the original image, the minimum external frame of the area to be edited and the cutting frame; the transparency mask image comprises a first area, a second area and a third area, wherein the first area corresponds to a minimum circumscribed frame of an area to be edited, the second area corresponds to an area outside the minimum circumscribed frame of the area to be edited and inside the cutting frame, and the third area corresponds to an area outside the cutting frame in the original image; and carrying out transparency gradient fusion on the original image, the filling image and the transparency mask image.
According to an exemplary embodiment of the present disclosure, an original image is divided into a first region, a second region, and a third region, the first region corresponding to a minimum bounding box of a region to be edited, the second region corresponding to a region outside the minimum bounding box of the region to be edited and inside a crop box, the third region corresponding to a region outside the crop box. In this case, the image fusion module 227 may be further configured to perform: replacing the pixel value of the first area in the original image with the pixel value of the minimum circumscribed frame of the area to be edited in the edited sub-image; interpolation processing is carried out on the pixel values of the areas except the minimum circumscribed frame of the area to be edited in the edited sub-image and the pixel values of the second area in the original image, and the pixel values of the second area in the original image are updated by interpolation results; the pixel values of the third region in the original image are maintained unchanged.
According to an exemplary embodiment of the present disclosure, the image acquisition module 221 may be further configured to perform: a mask image is generated in response to a region selection operation for the original image.
According to an exemplary embodiment of the present disclosure, the image acquisition module 221 may be further configured to perform: extracting object features contained in the text information; determining an area matched with the object characteristics contained in the text information from the original image; a mask image is generated based on an area in the original image that matches the object feature contained in the text information.
Since each functional module of the image processing apparatus according to the embodiment of the present disclosure is the same as that of the above-described method embodiment, a detailed description thereof will be omitted.
Fig. 23 shows a schematic diagram of an electronic device suitable for use in implementing exemplary embodiments of the present disclosure. The terminal device of the exemplary embodiment of the present disclosure may be configured as in the form of fig. 23. It should be noted that the electronic device shown in fig. 23 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiments of the present disclosure.
The electronic device of the present disclosure includes at least a processor and a memory for storing one or more programs, which when executed by the processor, enable the processor to implement the image processing method of the exemplary embodiments of the present disclosure.
Specifically, as shown in fig. 23, the electronic device 230 may include: processor 2310, internal memory 2321, external memory interface 2322, universal serial bus (Universal Serial Bus, USB) interface 2330, charge management module 2340, power management module 2341, battery 2342, antenna 1, antenna 2, mobile communication module 2350, wireless communication module 2360, audio module 2370, sensor module 2380, display screen 2390, camera module 2391, indicator 2392, motor 2393, keys 2394, and subscriber identity module (Subscriber Identification Module, SIM) card interface 2395, among others. The sensor module 2380 may include a depth sensor, a pressure sensor, a gyroscope sensor, a barometric sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
It is to be understood that the illustrated structure of the presently disclosed embodiments does not constitute a particular limitation of the electronic device 230. In other embodiments of the present disclosure, electronic device 230 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 2310 may include one or more processing units, for example: the processor 2310 may include an application processor (Application Processor, AP), a modem processor, a graphics processor (Graphics Processing Unit, GPU), an image signal processor (Image Signal Processor, ISP), a controller, a video codec, a digital signal processor (Digital Signal Processor, DSP), a baseband processor, and/or a Neural network processor (Neural-network Processing Unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors. In addition, a memory may be provided in the processor 2310 for storing instructions and data.
The electronic device 230 may implement a photographing function through an ISP, a camera module 2391, a video codec, a GPU, a display screen 2390, an application processor, and the like. In some embodiments, the electronic device 230 may include 1 or N camera modules 2391, where N is a positive integer greater than 1, and if the electronic device 230 includes N cameras, one of the N cameras is a master camera.
Internal memory 2321 may be used to store computer-executable program code that includes instructions. Internal memory 2321 may include a stored program area and a stored data area. The external memory interface 2322 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 230.
The present disclosure also provides a computer-readable storage medium that may be included in the electronic device described in the above embodiments; or may exist alone without being incorporated into the electronic device.
The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable storage medium may transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The computer-readable storage medium carries one or more programs which, when executed by such an electronic device, cause the electronic device to implement the methods described in the embodiments of the present disclosure.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software or by means of hardware, and the described units may also be provided in a processor. In some cases, the names of the units do not constitute a limitation of the units themselves.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to cause a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
Furthermore, the above-described figures are only schematic illustrations of processes included in the method according to the exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (15)

1. An image processing method, comprising:
acquiring an original image and a mask image, wherein the mask image is used for identifying a region to be edited of the original image;
determining a crop box according to the region to be edited, cropping the original image by using the crop box to obtain a sub-image to be edited, and cropping the mask image by using the crop box to obtain a mask sub-image;
generating an edited sub-image based on the sub-image to be edited, the mask sub-image and text information;
and fusing the edited sub-image with the original image.
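By way of illustration only (not part of the filed application), the overall flow of claim 1 might be sketched in Python as follows, where `edit_model` stands for a hypothetical text-guided image editing model and `determine_crop_box` for the box-selection step elaborated in claims 2 to 5:

```python
def process_image(original, mask, text, edit_model, determine_crop_box):
    """Rough sketch of the method of claim 1 on NumPy image arrays (illustrative only)."""
    # Determine a crop box from the region to be edited identified by the mask.
    x0, y0, x1, y1 = determine_crop_box(mask)

    # Crop the original image and the mask image with the same crop box.
    sub_to_edit = original[y0:y1, x0:x1]
    mask_sub = mask[y0:y1, x0:x1]

    # Generate the edited sub-image from the crops and the text information.
    # The result is assumed here to already match the crop size (see claims 6-7).
    edited_sub = edit_model(sub_to_edit, mask_sub, text)

    # Fuse the edited sub-image back into the original image
    # (simple replacement; claims 8-10 describe richer fusion schemes).
    fused = original.copy()
    fused[y0:y1, x0:x1] = edited_sub
    return fused
```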
2. The image processing method according to claim 1, wherein determining the crop box according to the region to be edited comprises:
determining a minimum bounding box of the region to be edited;
and determining the crop box according to the minimum bounding box of the region to be edited.
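A minimal sketch of the bounding-box step, assuming the mask is a single-channel NumPy array whose non-zero pixels mark the region to be edited (the function name is illustrative):

```python
import numpy as np

def min_bounding_box(mask):
    """Return (x0, y0, x1, y1) of the smallest box enclosing the non-zero mask pixels."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask contains no region to be edited")
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1
```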
3. The image processing method according to claim 2, wherein determining the crop box according to the minimum bounding box of the region to be edited comprises:
and enlarging the minimum bounding box of the region to be edited in equal proportion to determine the crop box.
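One possible reading of the equal-proportion enlargement of claim 3 expands the box about its center by a fixed factor and clips it to the image bounds; the factor itself is not specified in the claim and is an assumption here:

```python
def enlarge_box(box, scale, img_w, img_h):
    """Enlarge a box (x0, y0, x1, y1) about its center by `scale`, clipped to the image."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half_w, half_h = (x1 - x0) * scale / 2.0, (y1 - y0) * scale / 2.0
    return (max(0, int(cx - half_w)), max(0, int(cy - half_h)),
            min(img_w, int(cx + half_w)), min(img_h, int(cy + half_h)))
```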
4. The image processing method according to claim 2, wherein determining the crop box according to the minimum bounding box of the region to be edited comprises:
determining an area ratio of the region to be edited in the minimum bounding box of the region to be edited;
and expanding the minimum bounding box of the region to be edited according to the area ratio of the region to be edited in the minimum bounding box, to determine the crop box.
5. The image processing method according to claim 4, wherein expanding the minimum bounding box of the region to be edited according to the area ratio of the region to be edited in the minimum bounding box to determine the crop box comprises:
if the area ratio of the region to be edited in the minimum bounding box is greater than or equal to an area ratio threshold, performing an expansion operation on the minimum bounding box to obtain an intermediate bounding box of the region to be edited;
determining the area ratio of the region to be edited in the intermediate bounding box; if the area ratio of the region to be edited in the intermediate bounding box is greater than or equal to the area ratio threshold, performing the expansion operation on the intermediate bounding box, and updating the intermediate bounding box with the bounding box obtained after the expansion operation;
and if the area ratio of the region to be edited in the intermediate bounding box is smaller than the area ratio threshold, or the size of the intermediate bounding box is the same as the size of the original image, determining the intermediate bounding box as the crop box.
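Claims 4 and 5 describe an iterative expansion driven by how much of the box the region to be edited occupies. The sketch below reuses `min_bounding_box` and `enlarge_box` from the earlier sketches; the threshold and per-step scale are placeholder values, not taken from the application:

```python
import numpy as np

def determine_crop_box(mask, ratio_threshold=0.4, step_scale=1.2):
    """Expand the minimum bounding box until the region to be edited occupies less
    than `ratio_threshold` of it, or the box covers the whole image (claims 4-5)."""
    img_h, img_w = mask.shape[:2]
    box = min_bounding_box(mask)
    mask_area = float(np.count_nonzero(mask))
    while True:
        x0, y0, x1, y1 = box
        box_area = (x1 - x0) * (y1 - y0)
        covers_image = (x0, y0, x1, y1) == (0, 0, img_w, img_h)
        if mask_area / box_area < ratio_threshold or covers_image:
            return box
        box = enlarge_box(box, step_scale, img_w, img_h)
```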
6. The image processing method according to claim 1, wherein generating an edited sub-image based on the sub-image to be edited, the mask sub-image, and text information comprises:
scaling the sub-image to be edited to a size consistent with an input size of an image editing model to obtain a first input image;
scaling the mask sub-image to a size consistent with an input size of the image editing model to obtain a second input image;
inputting the first input image, the second input image and the text information into the image editing model to generate the edited sub-image.
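A sketch of the generation step of claim 6, assuming a square model input and a generic `image_editing_model` callable standing in for whatever inpainting or text-guided editing network is used (OpenCV is used here only for the resizing):

```python
import cv2

def generate_edited_sub_image(sub_to_edit, mask_sub, text, image_editing_model,
                              model_input_size=512):
    """Scale both crops to the model input size and run the editing model (claim 6)."""
    first_input = cv2.resize(sub_to_edit, (model_input_size, model_input_size))
    second_input = cv2.resize(mask_sub, (model_input_size, model_input_size),
                              interpolation=cv2.INTER_NEAREST)  # keep the mask binary
    return image_editing_model(first_input, second_input, text)
```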
7. The image processing method according to claim 6, wherein fusing the edited sub-image with the original image comprises:
scaling the edited sub-image to a size consistent with the size of the sub-image to be edited so as to obtain a sub-image to be fused;
and fusing the sub-image to be fused with the original image.
8. The image processing method according to claim 1, wherein fusing the edited sub-image with the original image comprises:
and replacing the pixel values at corresponding positions in the original image with the pixel values of the edited sub-image.
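Taken together, claims 7 and 8 amount to scaling the edited result back to the crop size and overwriting the corresponding pixels; a minimal sketch, assuming the crop box is given as (x0, y0, x1, y1) in original-image coordinates:

```python
import cv2

def fuse_by_replacement(original, edited_sub, crop_box):
    """Scale the edited sub-image to the crop size and overwrite the original pixels."""
    x0, y0, x1, y1 = crop_box
    sub_to_fuse = cv2.resize(edited_sub, (x1 - x0, y1 - y0))  # dsize is (width, height)
    fused = original.copy()
    fused[y0:y1, x0:x1] = sub_to_fuse
    return fused
```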
9. The image processing method according to claim 2, wherein fusing the edited sub-image with the original image comprises:
generating a filling image with the same size as the original image by using the edited sub-image;
generating a transparency mask image by using the original image, the minimum bounding box of the region to be edited, and the crop box, wherein the transparency mask image comprises a first area, a second area, and a third area, the first area corresponding to the minimum bounding box of the region to be edited, the second area corresponding to an area outside the minimum bounding box of the region to be edited and inside the crop box, and the third area corresponding to an area outside the crop box in the original image;
and carrying out transparency gradient fusion on the original image, the filling image and the transparency mask image.
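One way to realize the transparency gradient fusion of claim 9 is a per-pixel alpha blend that is fully opaque inside the minimum bounding box, fully transparent outside the crop box, and ramps smoothly in between. The blur-based ramp below is only one possible construction and is not taken from the application:

```python
import cv2
import numpy as np

def transparency_fusion(original, filled, min_box, crop_box, blur_ksize=31):
    """Blend the filling image into the original with a soft transparency mask.

    `filled` is the filling image of claim 9: the same size as `original`,
    carrying the edited sub-image at its crop position.
    """
    h, w = original.shape[:2]
    alpha = np.zeros((h, w), dtype=np.float32)

    # First area: fully take the filling image inside the minimum bounding box.
    bx0, by0, bx1, by1 = min_box
    alpha[by0:by1, bx0:bx1] = 1.0

    # Second area: soft gradient between the minimum bounding box and the crop box.
    alpha = cv2.GaussianBlur(alpha, (blur_ksize, blur_ksize), 0)

    # Third area: outside the crop box the original image is kept unchanged.
    cx0, cy0, cx1, cy1 = crop_box
    outside = np.ones((h, w), dtype=bool)
    outside[cy0:cy1, cx0:cx1] = False
    alpha[outside] = 0.0
    alpha[by0:by1, bx0:bx1] = 1.0  # re-assert full opacity in the first area

    alpha3 = alpha[..., None]
    blended = alpha3 * filled.astype(np.float32) + (1.0 - alpha3) * original.astype(np.float32)
    return blended.astype(np.uint8)
```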
10. The image processing method according to claim 2, wherein the original image is divided into a first area, a second area, and a third area, the first area corresponding to the minimum bounding box of the region to be edited, the second area corresponding to an area outside the minimum bounding box of the region to be edited and inside the crop box, and the third area corresponding to an area outside the crop box; wherein fusing the edited sub-image with the original image comprises:
replacing the pixel values of the first area in the original image with the pixel values within the minimum bounding box of the region to be edited in the edited sub-image;
performing interpolation processing on the pixel values of the edited sub-image outside the minimum bounding box of the region to be edited and the pixel values of the second area in the original image, and updating the pixel values of the second area in the original image with the interpolation result;
and keeping the pixel values of the third area in the original image unchanged.
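Claim 10 can be read as a three-region update: copy the edited pixels inside the minimum bounding box, interpolate between edited and original pixels in the ring between that box and the crop box, and leave everything outside the crop box untouched. The fixed 50/50 interpolation weight below is an assumption; the claim does not specify how the interpolation is performed:

```python
import numpy as np

def fuse_three_regions(original, edited_sub, min_box, crop_box, weight=0.5):
    """Three-region fusion of claim 10 with a fixed-weight linear interpolation.

    `edited_sub` is assumed to be already resized to the crop-box size.
    """
    fused = original.copy()
    cx0, cy0, cx1, cy1 = crop_box
    bx0, by0, bx1, by1 = min_box

    # Second area (and, temporarily, the first): interpolate edited and original pixels.
    crop_orig = original[cy0:cy1, cx0:cx1].astype(np.float32)
    crop_edit = edited_sub.astype(np.float32)
    fused[cy0:cy1, cx0:cx1] = (weight * crop_edit
                               + (1.0 - weight) * crop_orig).astype(original.dtype)

    # First area: replace outright with the edited pixels of the minimum bounding box.
    fused[by0:by1, bx0:bx1] = edited_sub[by0 - cy0:by1 - cy0, bx0 - cx0:bx1 - cx0]

    # Third area (outside the crop box) was never written to and stays unchanged.
    return fused
```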
11. The image processing method according to claim 1, characterized in that the image processing method further comprises:
generating the mask image in response to a region selection operation on the original image.
12. The image processing method according to claim 1, characterized in that the image processing method further comprises:
extracting object features contained in the text information;
determining, from the original image, a region matched with the object features contained in the text information;
and generating the mask image based on the region of the original image matched with the object features contained in the text information.
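Claim 12 derives the mask from the text itself. A sketch under the assumption that some text-conditioned grounding or segmentation model returns a boolean region for the object named in the text; both callables below are hypothetical placeholders, not components named in the application:

```python
import numpy as np

def mask_from_text(original, text, extract_object_features, text_grounding_model):
    """Generate the mask image from object features extracted from the text (claim 12)."""
    object_features = extract_object_features(text)
    # Boolean H x W array marking the area of the original image matching the features.
    matched_region = text_grounding_model(original, object_features)
    mask = np.zeros(original.shape[:2], dtype=np.uint8)
    mask[matched_region] = 255
    return mask
```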
13. An image processing apparatus, comprising:
the image acquisition module is used for acquiring an original image and a mask image, wherein the mask image is used for identifying a region to be edited of the original image;
the image cropping module is used for determining a crop box according to the region to be edited, cropping the original image by using the crop box to obtain a sub-image to be edited, and cropping the mask image by using the crop box to obtain a mask sub-image;
the image generation module is used for generating an edited sub-image based on the sub-image to be edited, the mask sub-image and the text information;
and the image fusion module is used for fusing the edited sub-image with the original image.
14. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the image processing method according to any one of claims 1 to 12.
15. An electronic device, comprising:
a processor;
a memory for storing one or more programs that, when executed by the processor, cause the processor to implement the image processing method of any of claims 1 to 12.
CN202310294651.5A 2023-03-23 2023-03-23 Image processing method and device, computer readable storage medium and electronic equipment Pending CN116485944A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310294651.5A CN116485944A (en) 2023-03-23 2023-03-23 Image processing method and device, computer readable storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310294651.5A CN116485944A (en) 2023-03-23 2023-03-23 Image processing method and device, computer readable storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN116485944A true CN116485944A (en) 2023-07-25

Family

ID=87222233

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310294651.5A Pending CN116485944A (en) 2023-03-23 2023-03-23 Image processing method and device, computer readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116485944A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117114978A (en) * 2023-10-24 2023-11-24 深圳软牛科技有限公司 Picture cropping and restoring method and device based on iOS and related medium thereof
CN117114978B (en) * 2023-10-24 2024-03-29 深圳软牛科技集团股份有限公司 Picture cropping and restoring method and device based on iOS and related medium thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination