CN115082298A - Image generation method, image generation device, electronic device, and storage medium

Image generation method, image generation device, electronic device, and storage medium

Info

Publication number
CN115082298A
Authority
CN
China
Prior art keywords
image
stylized
face
target
processed
Prior art date
Legal status
Pending
Application number
CN202210839803.0A
Other languages
Chinese (zh)
Inventor
王美玲
李甫
林天威
邓瑞峰
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210839803.0A priority Critical patent/CN115082298A/en
Publication of CN115082298A publication Critical patent/CN115082298A/en

Classifications

    • G06T3/04
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships

Abstract

The present disclosure provides an image generation method, an image generation apparatus, an electronic device, a storage medium, and a program product, which relate to the technical field of artificial intelligence, in particular to the technical fields of deep learning, image processing, and computer vision, and can be applied to scenes such as portrait processing. The specific implementation scheme is as follows: an image to be processed is processed to obtain a target face image; style conversion is performed on the target face image and the image to be processed to generate a stylized face image and a stylized global image, respectively; and the stylized face image and the stylized global image are fused to generate a stylized target image.

Description

Image generation method, image generation device, electronic device, and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technology, more particularly to the fields of deep learning, image processing, and computer vision technology, is applicable to scenes such as portrait processing, and specifically concerns an image generation method, apparatus, electronic device, storage medium, and program product.
Background
With the rapid development of image processing and special-effect technologies, new ways of processing the original images captured by cameras have gradually emerged, for example, converting an original image from a realistic style to other styles such as colored-drawing or cartoon styles. Such style conversion satisfies users' personalized requirements for images.
Disclosure of Invention
The disclosure provides an image generation method, an apparatus, an electronic device, a storage medium, and a program product.
According to an aspect of the present disclosure, there is provided an image generation method including: processing an image to be processed to obtain a target face image; performing style conversion on the target face image and the image to be processed to generate a stylized face image and a stylized global image, respectively; and fusing the stylized face image and the stylized global image to generate a stylized target image.
According to another aspect of the present disclosure, there is provided an image generation apparatus including: a processing module for processing an image to be processed to obtain a target face image; a conversion module for performing style conversion on the target face image and the image to be processed to generate a stylized face image and a stylized global image, respectively; and a generating module for fusing the stylized face image and the stylized global image to generate a stylized target image.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform a method according to the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a method as disclosed herein.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 schematically illustrates an exemplary application scene diagram of an image generation method according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates an exemplary system architecture to which the image generation method and apparatus may be applied, according to an embodiment of the present disclosure;
FIG. 3 schematically shows a flow chart of an image generation method according to an embodiment of the present disclosure;
FIG. 4A schematically illustrates a flow diagram of an image generation method according to an embodiment of the present disclosure;
FIG. 4B schematically shows a flow diagram of an image generation method according to another embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow diagram of a method of training a portrait stylized model, in accordance with an embodiment of the present disclosure;
FIG. 6 schematically shows a schematic diagram of an image generation method according to another embodiment of the present disclosure;
FIG. 7 schematically shows a flow chart of an image generation method according to another embodiment of the present disclosure;
fig. 8 schematically shows a block diagram of an image generation apparatus according to an embodiment of the present disclosure; and
fig. 9 schematically shows a block diagram of an electronic device adapted to implement an image generation method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 schematically shows an exemplary application scene diagram of an image generation method according to an embodiment of the present disclosure.
As shown in fig. 1, an original image 101 is subjected to a style conversion using a first style conversion technique, and a first stylized target image 103 is generated. The original image 101 is subjected to the style conversion using the second style conversion technique, and a second stylized target image 104 is generated.
As shown in fig. 1, the original image 101 is subjected to style conversion with reference to a style image 102, generating a first stylized target image 103 and a second stylized target image 104 whose image styles differ from, but whose image contents are similar to, those of the original image 101. The first stylized target image 103 and the second stylized target image 104 each have the same image style as the style image 102 but different image content.
As shown in fig. 1, when the original image is subjected to style conversion using different style conversion techniques, the generated stylized target images, for example the first stylized target image 103 and the second stylized target image 104, differ in quality: the sharpness of image texture, the degree of texture disorder, the fidelity of color, the fidelity of image content, the distortion of facial expression, and so on all differ, and all of these become important factors influencing the image effect and the video effect.
The embodiment of the disclosure provides an image generation method, an image generation device, an electronic device, a storage medium and a program product.
According to an embodiment of the present disclosure, an image generation method includes: processing the image to be processed to obtain a target face image; carrying out style conversion on the target face image and the image to be processed to respectively generate a stylized face image and a stylized global image; and fusing the stylized face image and the stylized global image to generate a stylized target image.
With the image generation method provided by the embodiments of the present disclosure, style conversion can be performed on an image to be processed that contains a face object, for example, portrait colored-drawing style conversion, converting the image to be processed into a stylized target image with a colored-drawing style.
According to the embodiments of the present disclosure, a stylized target image having a target style can be generated quickly using the image generation method. In addition, a stylized face image and a stylized global image are generated from the target face image and the image to be processed, respectively, and then fused to generate the stylized target image, so that the textures of the stylized target image are clear, the facial features are stable, and the key point features are not distorted, giving the user a visually flawless impression and improving the user experience.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, application, and other handling of the personal information of the users involved all comply with the provisions of relevant laws and regulations, necessary confidentiality measures are taken, and public order and good customs are not violated.
In the technical scheme of the disclosure, before the personal information of the user is acquired or collected, the authorization or the consent of the user is acquired.
Fig. 2 schematically illustrates an exemplary system architecture to which the image generation method and apparatus may be applied, according to an embodiment of the present disclosure.
It should be noted that fig. 2 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios. For example, in another embodiment, an exemplary system architecture to which the image generation method and apparatus may be applied may include a terminal device, but the terminal device may implement the image generation method and apparatus provided in the embodiments of the present disclosure without interacting with a server.
As shown in fig. 2, the system architecture 200 according to this embodiment may include terminal devices 201, 202, 203, a network 204 and a server 205. The network 204 serves as a medium for providing communication links between the terminal devices 201, 202, 203 and the server 205. Network 204 may include various connection types, such as wired and/or wireless communication links, and so forth.
The user may use the terminal devices 201, 202, 203 to interact with the server 205 via the network 204 to receive or send messages or the like. The terminal devices 201, 202, 203 may have installed thereon various communication client applications, such as a knowledge reading application, a web browser application, a search application, an instant messaging tool, a mailbox client, and/or social platform software, etc. (by way of example only).
The terminal devices 201, 202, 203 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 205 may be a server providing various services, such as a background management server (for example only) providing support for content browsed by users using the terminal devices 201, 202, 203. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the image generation method provided by the embodiment of the present disclosure may be generally executed by the terminal device 201, 202, or 203. Accordingly, the image generation apparatus provided by the embodiment of the present disclosure may also be provided in the terminal device 201, 202, or 203.
Alternatively, the image generation method provided by the embodiment of the present disclosure may be generally executed by the server 205. Accordingly, the image generation apparatus provided by the embodiment of the present disclosure may be generally disposed in the server 205. The image generation method provided by the embodiment of the present disclosure may also be performed by a server or a server cluster that is different from the server 205 and is capable of communicating with the terminal devices 201, 202, 203 and/or the server 205. Accordingly, the image generation apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 205 and capable of communicating with the terminal devices 201, 202, 203 and/or the server 205.
For example, a user determines an image to be processed on the terminal device 201, 202, or 203 and then sends it to the server 205, in which a style conversion model for processing the image is loaded in advance. The server 205 processes the image to be processed to obtain the target face image, performs style conversion on the target face image and the image to be processed to generate a stylized face image and a stylized global image, respectively, and fuses the stylized face image and the stylized global image to generate a stylized target image. Alternatively, these operations may be performed by a server or server cluster capable of communicating with the terminal devices 201, 202, 203 and/or the server 205, which finally generates the stylized target image.
It should be understood that the number of terminal devices, networks, and servers in fig. 2 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
It should be noted that the sequence numbers of the respective operations in the following methods are merely used as representations of the operations for description, and should not be construed as representing the execution order of the respective operations. The method need not be performed in the exact order shown, unless explicitly stated.
Fig. 3 schematically shows a flow chart of an image generation method according to an embodiment of the present disclosure.
As shown in fig. 3, the method includes operations S310 to S330.
In operation S310, the image to be processed is processed to obtain a target face image.
In operation S320, the target face image and the image to be processed are subjected to style conversion, and a stylized face image and a stylized global image are generated, respectively.
In operation S330, the stylized face image and the stylized global image are fused to generate a stylized target image.
According to an embodiment of the present disclosure, the image to be processed may refer to an image to be subjected to style conversion. The image to be processed may include a facial object, such as a human face object, an animal face object, and the like, and may also include a background object, such as a landscape, a building, and the like.
According to the embodiments of the present disclosure, the image to be processed may be processed, for example, by using an image segmentation technique to crop the target face image from the image to be processed. Alternatively, features of the target face region may be extracted from the image to be processed using a feature extraction technique to obtain the target face image.
According to the embodiments of the present disclosure, performing style conversion on the target face image and the image to be processed to generate a stylized face image and a stylized global image, respectively, may refer to: performing a first style conversion on the target face image to generate a stylized face image; and performing a second style conversion on the image to be processed to generate a stylized global image. The first style conversion and the second style conversion may use the same or different conversion manners.
According to an embodiment of the present disclosure, fusing the stylized face image and the stylized global image to generate the stylized target image may refer to: replacing the image of the region corresponding to the stylized face image in the stylized global image with the stylized face image to generate the stylized target image; but it is not limited thereto, and may also refer to: performing a weighted summation of the pixel information of the stylized face image and the pixel information of the stylized global image to generate the stylized target image.
According to an embodiment of the present disclosure, the style of the stylized target image may include at least one of: cartoon style, oil painting style, wash painting style, simple drawing style, and colored drawing style; any style may be used as long as it differs from the image style of the image to be processed.
According to other embodiments of the present disclosure, compared with performing style conversion only on the image to be processed, performing style conversion on both the target face image and the image to be processed to generate a stylized face image and a stylized global image respectively, and then fusing the two into a stylized target image, allows the stylized face image to improve the texture sharpness of the face object in the stylized target image and to reduce the distortion of key point features, thereby improving the fidelity of the stylized target image. In addition, when the first and second style conversion manners differ, the stylized target image can contain image content in two different conversion styles, making it rich and varied, providing the user with a striking visual experience, enhancing user stickiness, and improving the user experience.
Fig. 4A schematically shows a flow diagram of an image generation method according to another embodiment of the present disclosure.
As shown in fig. 4A, the image to be processed 410 is input into the face processing model 420, and the target face image 430 is output. The target face image 430 is input into a portrait stylization model 440, and a stylized face image 450 is output. The image to be processed 410 is input into the global stylized model 460, and the stylized global image 470 is output. The stylized face image 450 and the stylized global image 470 are fused to generate a stylized target image 480.
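To make the flow of fig. 4A concrete, the following Python sketch wires the three models together as plain callables. This is a minimal illustration under assumed interfaces: the function name generate_stylized_target and the callable signatures are hypothetical, not names from the disclosure.

```python
# Hypothetical end-to-end sketch of the fig. 4A pipeline.
from typing import Callable
import numpy as np

Model = Callable[[np.ndarray], np.ndarray]

def generate_stylized_target(image: np.ndarray,
                             face_processing_model: Model,
                             portrait_stylized_model: Model,
                             global_stylized_model: Model,
                             fuse: Callable[[np.ndarray, np.ndarray], np.ndarray]
                             ) -> np.ndarray:
    target_face = face_processing_model(image)        # image to be processed -> target face image
    stylized_face = portrait_stylized_model(target_face)
    stylized_global = global_stylized_model(image)
    return fuse(stylized_face, stylized_global)       # fused stylized target image
```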
According to the embodiments of the present disclosure, using the portrait stylized model keeps the distortion of face key point features in the target face image low and the textures clear, while the global stylized model enables intelligent and efficient style conversion for the stylized global image.
According to another embodiment of the present disclosure, with respect to operation S310 shown in fig. 3, processing the image to be processed to obtain the target face image may further include: processing the image to be processed to obtain a face image and a face key point feature map, respectively, and fusing the face image and the face key point feature map to obtain the target face image.
According to an embodiment of the present disclosure, the face image may refer to an image obtained by cropping away the background region of the image to be processed while retaining the face object. The face key point feature map may refer to a feature map obtained by extracting key point features of the face object from the image to be processed. The face image and the face key point feature map are obtained from the image to be processed and fused, so that the finally obtained target face image contains both the global features and the face key point features of the face object, and the face object in the stylized face image obtained after style conversion has high imaging sharpness and low distortion of face key point features.
According to another embodiment of the present disclosure, processing the image to be processed to obtain the face image and the face key point feature map, respectively, may further include: performing first face detection on the image to be processed to obtain a face detection frame; obtaining a face image corresponding to the face detection frame from the image to be processed; and performing second face detection on the image to be processed to obtain a face key point feature map. For example, the face processing model may include a face detection sub-model and a face key point extraction sub-model: the face detection sub-model performs the first face detection on the image to be processed to obtain the face detection frame, and the face key point extraction sub-model performs the second face detection on the image to be processed to obtain the face key point feature map.
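A minimal sketch of these two detection passes is given below, assuming the sub-models are exposed as callables; build_target_face, face_detector, keypoint_extractor, crop_face, and fuse are hypothetical interfaces introduced for illustration only.

```python
# Hypothetical two-pass face processing: detection frame, crop, key points, fusion.
import numpy as np

def build_target_face(image: np.ndarray, face_detector, keypoint_extractor,
                      crop_face, fuse) -> np.ndarray:
    box = face_detector(image)                 # first face detection -> face detection frame
    face_image = crop_face(image, box)         # face image corresponding to the frame
    keypoint_map = keypoint_extractor(image)   # second face detection -> key point feature map
    return fuse(face_image, keypoint_map)      # fused into the target face image
```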
According to the embodiments of the present disclosure, the network structure of the face detection sub-model may include one or more of R-CNN (Region-based Convolutional Neural Networks), Fast R-CNN, SPP-Net (Spatial Pyramid Pooling in Deep Convolutional Networks), R-FCN (Region-based Fully Convolutional Networks), YOLO (You Only Look Once), and SSD (Single Shot MultiBox Detector).
According to an embodiment of the present disclosure, the network structure of the face key point extraction sub-model may include one or more of PFLD (A Practical Facial Landmark Detector), VGG16 (Visual Geometry Group Networks), ResNet (Residual Neural Networks), ASM (Active Shape Models), AAM (Active Appearance Models), CPR (Cascaded Pose Regression), DCNN (Deep Convolutional Neural Networks), TCNN (Task-Constrained Deep Convolutional Networks), MTCNN (Multi-Task Convolutional Neural Networks), and Tweaked CNN.
According to the embodiment of the present disclosure, a face image corresponding to the face detection frame may be obtained from the image to be processed by means of cropping.
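The cropping itself can be as simple as the sketch below, assuming the detection frame arrives as pixel coordinates (x1, y1, x2, y2); the clamping is an added safeguard for frames that extend past the image border.

```python
# Hypothetical crop of the face image from the detection frame.
import numpy as np

def crop_face(image: np.ndarray, box: tuple) -> np.ndarray:
    h, w = image.shape[:2]
    x1, y1, x2, y2 = (int(v) for v in box)
    x1, y1 = max(0, x1), max(0, y1)            # clamp to the image bounds
    x2, y2 = min(w, x2), min(h, y2)
    return image[y1:y2, x1:x2].copy()
```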
According to the embodiment of the disclosure, the obtaining mode of the face image and the face key point feature map is simple and efficient, and the face image and the face key point feature map are easy to execute on the terminal device, so that a basis is provided for quick execution of the image generation method provided by the embodiment of the disclosure.
According to another embodiment of the present disclosure, fusing the face image and the face key point feature map to obtain the target face image may further include: inputting the face image and the face key point feature map into a fusion function to obtain the target face image.
According to an embodiment of the present disclosure, the fusion function may include a Concat fusion function, but is not limited thereto, and may also include an Add fusion function. Any fusion function may be used as long as it can fuse the image features of the face image and the key point features in the face key point feature map.
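The two fusion functions named above can be sketched as follows, assuming channels-last arrays and a key point feature map rendered as K heatmap channels with the same spatial size as the face image; these are generic implementations, not the disclosure's exact operators.

```python
import numpy as np

def concat_fuse(face_image: np.ndarray, keypoint_map: np.ndarray) -> np.ndarray:
    # Concat: stack along the channel axis, (H, W, 3) + (H, W, K) -> (H, W, 3 + K)
    return np.concatenate([face_image, keypoint_map], axis=-1)

def add_fuse(face_image: np.ndarray, keypoint_map: np.ndarray) -> np.ndarray:
    # Add: element-wise sum, which requires both inputs to share one shape
    return face_image + keypoint_map
```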
According to the embodiments of the present disclosure, fusing with the fusion function can combine the image features of the face image with the key point features in the face key point feature map while highlighting the distinctions between the two different kinds of features, thereby ensuring that the style-converted stylized face image improves the sharpness of facial texture while avoiding distortion of key feature points.
According to the embodiments of the present disclosure, the face key point feature map may include key point feature maps of the facial features (eyes, nose, mouth, etc.), hair, face contour, and the like. The target face image obtained from the face image and the face key point feature map better preserves feature information such as facial features and expressions, so that the style-converted stylized face image avoids problems of texture disorder or distortion of key features.
Fig. 4B schematically shows a flow diagram of an image generation method according to another embodiment of the present disclosure.
As shown in fig. 4B, the image to be processed 410 is input into the face detection submodel 421, and a face detection frame is output. A face image 431 corresponding to the face detection frame is obtained from the image to be processed 410. The image 410 to be processed is input to the face key point extraction sub-model 422, and a face key point feature map 432 is output. The face image 431 and the face keypoint feature map 432 are input into a fusion function 490, resulting in the target face image 430. The target face image 430 is input into a portrait stylization model 440, and a stylized face image 450 is output. The image to be processed 410 is input into the global stylized model 460, and the stylized global image 470 is output. The stylized face image 450 and stylized global image 470 are fused to generate a stylized target image 480.
According to the embodiment of the present disclosure, the portrait stylized model and the global stylized model may be processing models of the same network structure, but are not limited thereto, and may also be processing models of different network structures. The portrait stylized model and the global stylized model may be the same stylized processing model, but are not limited thereto and may also be different stylized processing models.
According to an embodiment of the present disclosure, the portrait stylized model and the global stylized model may each include at least one of the following network structures: for example, the generation module of a generative adversarial network (GAN), Pix2Pix (Image-to-Image Translation), derivative networks of Pix2Pix, and Vid2Vid (Video-to-Video Synthesis).
Fig. 5 schematically shows a flowchart of a training method of a portrait stylized model according to an embodiment of the present disclosure.
As shown in fig. 5, a GAN model 520 can be trained using first training samples 510, resulting in a trained GAN model 530. A first training sample may include a style image, an original image, and a style-converted image matching the original image.
According to an embodiment of the present disclosure, the style of the style image may include at least one of: cartoon style, oil painting style, wash painting style, simple drawing style, and colored drawing style. Style images of the same type as the target style of the stylized target image may be selected as the first training samples.
As shown in fig. 5, the pix2pix model 550 may be trained using the second training samples 540, resulting in the portrait stylized model 560.
As shown in fig. 5, the sample image 541 may be input into the generation module of the trained GAN model 530, resulting in a sample stylized image 542. A sample image 541 and a sample stylized image 542 may be taken as the second training sample 540.
According to an embodiment of the present disclosure, training the Pix2Pix model with the second training samples to obtain the portrait stylized model may include: inputting the sample image into the Pix2Pix model to obtain a stylized image, and adjusting parameters of the Pix2Pix model based on the sample stylized image and the stylized image until a training condition is met. The training condition may include one or more of: the styles of the sample stylized image and the generated stylized image converging toward each other, the loss value converging, the number of parameter adjustment rounds reaching a preset number, and the like.
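A simplified PyTorch sketch of one training step is shown below. It keeps only a reconstruction (L1) term between the prediction and the GAN-generated sample stylized image; a full Pix2Pix setup would add the adversarial discriminator term, so this is an illustrative reduction, not the complete training procedure.

```python
import torch
from torch import nn

def train_step(generator: nn.Module,
               optimizer: torch.optim.Optimizer,
               sample_image: torch.Tensor,
               sample_stylized: torch.Tensor) -> float:
    optimizer.zero_grad()
    predicted = generator(sample_image)                       # stylized prediction
    loss = nn.functional.l1_loss(predicted, sample_stylized)  # reconstruction term only
    loss.backward()
    optimizer.step()
    return loss.item()   # stop once the loss converges or a preset round is reached
```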
According to the embodiments of the present disclosure, a Pix2Pix model may also be used as the initial model to train a global stylized model in the manner shown in fig. 5, but this is not limiting; the portrait stylized model may also be used directly as the global stylized model.
According to an embodiment of the present disclosure, inputting the sample image into the generation module of the trained GAN model to obtain the sample stylized image may include: inputting the sample image into the generation module of the trained GAN model to obtain an initial sample stylized image, then screening the initial sample stylized image and taking it as the sample stylized image when it satisfies a style conversion condition. The style conversion condition may include: the image texture sharpness being greater than a preset sharpness threshold, the distortion of key features being lower than a preset distortion threshold, and the like.
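The screening step can be read as a simple predicate, sketched below; sharpness() and keypoint_distortion() are hypothetical quality metrics (for example, a Laplacian-variance sharpness score and a landmark-deviation measure) standing in for whatever measures are actually used.

```python
def satisfies_style_conversion_condition(image,
                                         sharpness, keypoint_distortion,
                                         sharpness_threshold: float,
                                         distortion_threshold: float) -> bool:
    # Keep the sample when its texture is sharp enough
    # or its key features are sufficiently undistorted.
    return (sharpness(image) > sharpness_threshold
            or keypoint_distortion(image) < distortion_threshold)
```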
According to the embodiments of the present disclosure, the second training samples are generated using the trained GAN model, so that a large number of second training samples is available, improving the robustness of the portrait stylized model and the global stylized model. In addition, the screening ensures that the second training samples satisfy the style conversion condition, so that the portrait stylized model and the global stylized model have good and stable style conversion effects.
According to another embodiment of the present disclosure, as shown in operation S330 of fig. 3, fusing the stylized face image and the stylized global image to generate the stylized target image may further include: determining a replacement region, a fusion region, and a general region from the stylized global image based on the face detection frame corresponding to the face image; fusing the image corresponding to the fusion region in the stylized face image and the image of the fusion region in the stylized global image to obtain a target fusion region image; and generating the stylized target image based on the target fusion region image, the image corresponding to the replacement region in the stylized face image, and the image of the general region in the stylized global image.
According to an embodiment of the present disclosure, a replacement region may refer to a region in a stylized global image to be replaced by a stylized facial image. An image corresponding to the replacement region in the stylized face image may be determined based on the position coordinate information of the face detection frame as a reference.
According to an embodiment of the present disclosure, fusing the image corresponding to the fusion region in the stylized face image and the image of the fusion region in the stylized global image to obtain the target fusion region image may include: determining first pixel point information corresponding to the fusion region in the stylized face image; determining second pixel point information of the fusion region in the stylized global image; and obtaining the target fusion region image based on the first pixel point information and the second pixel point information.
According to an embodiment of the present disclosure, the target fusion region image may be obtained using formula (1).
Target_res = mask * face_res + (1 - mask) * ori_res;    Formula (1)
Here, Target_res denotes the information of any pixel point M in the target fusion region image, mask denotes the fusion weight (a value between 0 and 1), face_res denotes the first pixel point information corresponding to pixel point M in the stylized face image, and ori_res denotes the second pixel point information corresponding to pixel point M in the stylized global image.
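Formula (1) vectorizes directly over all pixels at once, as in the NumPy sketch below; mask is assumed to be a float array in [0, 1] of shape (H, W, 1), broadcast over the color channels, with larger values weighting the stylized face more heavily.

```python
import numpy as np

def fuse_pixels(face_res: np.ndarray,   # first pixel point information, (H, W, 3)
                ori_res: np.ndarray,    # second pixel point information, (H, W, 3)
                mask: np.ndarray        # fusion weights in [0, 1], (H, W, 1)
                ) -> np.ndarray:
    # Formula (1): Target_res = mask * face_res + (1 - mask) * ori_res
    return mask * face_res + (1.0 - mask) * ori_res
```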
According to the embodiments of the present disclosure, by designing the fusion region and fusing the image corresponding to the fusion region in the stylized face image with the image of the fusion region in the stylized global image to obtain the target fusion region image, the transition between the stylized face image and the stylized global image in the stylized target image can be made natural and realistic.
According to an embodiment of the present disclosure, "determining a replacement region, a fusion region, and a general region from a stylized global image based on a face detection frame corresponding to a face image" is one way to achieve "dynamic image fusion of an image corresponding to the fusion region in the stylized face image and an image of the fusion region in the stylized global image to obtain a target fusion region image". By utilizing the dynamic fusion mode of the images, the artificial synthetic traces can be reduced on the premise of improving the fidelity of style conversion.
According to an embodiment of the present disclosure, image dynamic fusion may refer to: the size of the fusion region is dynamically adjusted according to the region of the face image, i.e., the face detection frame corresponding to the face image.
According to an embodiment of the present disclosure, determining the replacement region, the fusion region, and the general region from the stylized global image based on the face detection frame corresponding to the face image may further include: determining the size of the face detection frame based on the face detection frame corresponding to the face image; determining the fusion size based on the size of the face detection frame; and determining the replacement region, the fusion region, and the general region from the stylized global image based on the face detection frame and the fusion size.
For example, the face detection frame is a rectangular detection frame and may serve as the outer boundary of the fusion region. The inner boundary of the fusion region is determined according to the size of the face detection frame. For example, the fusion size may include: a predetermined length separating the inner and outer boundaries of the fusion region laterally, and a predetermined width separating them longitudinally. The predetermined length is a predetermined ratio of the length of the face detection frame, and the predetermined width is a predetermined ratio of its width. The two ratios may be the same, e.g., 15% each, or may be different, and may be adjusted according to actual conditions.
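The boundary geometry can be sketched as below, assuming a rectangular detection frame given as (x1, y1, x2, y2) and using the 15% figure from the example; the function name and the equal lateral/longitudinal ratios are illustrative assumptions.

```python
def fusion_boundaries(box: tuple, ratio: float = 0.15) -> tuple:
    x1, y1, x2, y2 = box
    inset_x = int((x2 - x1) * ratio)   # predetermined length (lateral inset)
    inset_y = int((y2 - y1) * ratio)   # predetermined width (longitudinal inset)
    outer = (x1, y1, x2, y2)           # outer boundary = detection frame
    inner = (x1 + inset_x, y1 + inset_y, x2 - inset_x, y2 - inset_y)
    return outer, inner                # the fusion region lies between the two
```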
Fig. 6 schematically shows a schematic diagram of an image generation method according to another embodiment of the present disclosure.
As shown in fig. 6, the image to be processed 610 may be input into the face detection sub-model, which outputs a face detection frame 611, shown as a dashed box in the image to be processed 610 in fig. 6. Based on the face detection frame 611, the outer boundary 621 of the fusion region in the stylized global image 620 is determined. A fusion size, such as a predetermined length and a predetermined width between the outer boundary 621 and the inner boundary 622, is determined based on the size of the outer boundary 621, thereby determining the inner boundary 622. The region between the inner boundary 622 and the outer boundary 621 is determined as the fusion region 623, i.e., the region between the two dashed boxes in the stylized global image 620 shown in fig. 6. The area within the inner boundary 622 in the stylized global image 620 is determined as the replacement region 624, and the area outside the outer boundary 621 is determined as the general region 625. With the face detection frame 611 as a reference, the inner boundary 631 is determined in the stylized face image 630; the region outside the inner boundary 631 in the stylized face image 630 is taken as the fusion region 632, and the area within the inner boundary 631 is taken as the replacement region 633. The image corresponding to the fusion region 632 in the stylized face image 630 and the image of the fusion region 623 in the stylized global image 620 are fused to obtain the target fusion region image 641. The image in the replacement region 624 of the stylized global image 620 is replaced with the image corresponding to the replacement region 633 in the stylized face image 630, and the image in the general region 625 of the stylized global image 620 is retained. That is, the stylized target image 640 is generated based on the target fusion region image 641, the image corresponding to the replacement region 633 in the stylized face image 630, and the image of the general region 625 in the stylized global image 620.
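Putting the three regions together, a hedged NumPy sketch of the assembly is shown below. It assumes the stylized face has already been pasted onto a canvas the size of the stylized global image, that replace_mask is a boolean array marking the replacement region, and that fuse_weights holds per-pixel fusion weights that are nonzero only inside the fusion region, both of shape (H, W, 1).

```python
import numpy as np

def assemble_target(stylized_global: np.ndarray,
                    stylized_face_canvas: np.ndarray,
                    replace_mask: np.ndarray,
                    fuse_weights: np.ndarray) -> np.ndarray:
    # Weighted blend inside the fusion region (formula (1) applied per pixel).
    blended = fuse_weights * stylized_face_canvas + (1 - fuse_weights) * stylized_global
    out = np.where(fuse_weights > 0, blended, stylized_global)   # fusion region
    out = np.where(replace_mask, stylized_face_canvas, out)      # replacement region
    return out   # the general region keeps the stylized global image unchanged
```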
According to an embodiment of the present disclosure, before performing the operation of processing the image to be processed to obtain the target face image as shown in fig. 3, the image generation method may further include: performing face detection on the image to be processed to obtain a face detection result, wherein the face detection includes at least one of: first face detection and second face detection. When it is determined based on the face detection result that the image to be processed includes a face object, the operation of processing the image to be processed to obtain the target face image is performed. When it is determined based on the face detection result that the image to be processed does not include a face object, style conversion is performed on the image to be processed to generate a stylized global image, and the stylized global image is taken as the stylized target image.
Fig. 7 schematically shows a flow chart of an image generation method according to another embodiment of the present disclosure.
As shown in fig. 7, the method includes operations S710 to S760.
In operation S710, a face detection is performed on an image to be processed, resulting in a face detection result.
According to an embodiment of the present disclosure, the face detection includes at least one of: first face detection and second face detection.
In operation S720, it is determined whether a face object is included in the image to be processed based on the face detection result. In the case where it is determined that the face object is included in the image to be processed based on the face detection result, operations S730 to S750 are performed. In the case where it is determined that the face object is not included in the image to be processed based on the face detection result, operation S760 is performed.
In operation S730, the image to be processed is processed to obtain a target face image.
In operation S740, the target face image and the image to be processed are subjected to a style conversion, and a stylized face image and a stylized global image are generated, respectively.
In operation S750, the stylized face image and the stylized global image are fused to generate a stylized target image.
In operation S760, the image to be processed is subjected to style conversion, a stylized global image is generated, and the stylized global image is used as a stylized target image.
According to an embodiment of the present disclosure, for operation S760, the method may further include: and inputting the image to be processed into the global stylized model to obtain a stylized global image.
According to the embodiments of the present disclosure, when it is determined that the image to be processed does not include a face object, style conversion may be performed directly on the image to be processed, for example by inputting it into the global stylized model, generating a stylized global image that is taken as the stylized target image. This ensures the style conversion effect of the stylized target image while simplifying processing and improving efficiency.
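The gate described in operations S710 to S760 reduces to a small dispatch, sketched here with hypothetical callables (has_face, build_target_face, and the two stylized models) standing in for the actual components.

```python
def stylize(image, has_face, build_target_face,
            portrait_stylized_model, global_stylized_model, fuse):
    stylized_global = global_stylized_model(image)        # S740 / S760
    if not has_face(image):                               # no face object (S720)
        return stylized_global                            # taken as the target (S760)
    target_face = build_target_face(image)                # S730
    stylized_face = portrait_stylized_model(target_face)  # S740
    return fuse(stylized_face, stylized_global)           # S750
```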
Fig. 8 schematically shows a block diagram of an image generation apparatus according to an embodiment of the present disclosure.
As shown in fig. 8, the image generation apparatus 800 includes: a processing module 810, a conversion module 820, and a generation module 830.
The processing module 810 is configured to process the image to be processed to obtain a target face image.
The conversion module 820 is configured to perform style conversion on the target face image and the image to be processed to generate a stylized face image and a stylized global image, respectively.
The generating module 830 is configured to fuse the stylized face image and the stylized global image to generate a stylized target image.
According to an embodiment of the present disclosure, the processing module includes a processing unit and a first fusion unit.
The processing unit is used for processing the image to be processed to obtain a face image and a face key point feature map, respectively.
The first fusion unit is used for fusing the face image and the face key point feature map to obtain a target face image.
According to an embodiment of the present disclosure, the generating module includes a determining unit, a second fusion unit, and a generating unit.
A determination unit configured to determine a replacement region, a fusion region, and a general region from the stylized global image based on a face detection frame corresponding to the face image.
The second fusion unit is used for fusing the image corresponding to the fusion region in the stylized face image and the image of the fusion region in the stylized global image to obtain a target fusion region image.
A generating unit configured to generate a stylized target image based on the target fusion region image, the image corresponding to the replacement region in the stylized face image, and the image of the general region in the stylized global image.
According to an embodiment of the present disclosure, the second fusion unit includes a first determining subunit, a second determining subunit, and a first fusion subunit.
The first determining subunit is used for determining first pixel point information corresponding to the fusion region in the stylized face image.
The second determining subunit is used for determining second pixel point information of the fusion region in the stylized global image.
The first fusion subunit is used for obtaining a target fusion region image based on the first pixel point information and the second pixel point information.
According to an embodiment of the present disclosure, the determining unit includes: a third determining subunit, a fourth determining subunit, and a fifth determining subunit.
A third determining subunit operable to determine a size of the face detection frame based on the face detection frame corresponding to the face image.
A fourth determining subunit operable to determine the fusion size based on the size of the face detection frame.
A fifth determining subunit operable to determine a replacement region, a fusion region, and a general region from the stylized global image based on the face detection frame and the fusion size.
According to an embodiment of the present disclosure, the first fusion unit includes: a second fusion subunit.
The second fusion subunit is used for inputting the face image and the face key point feature map into the fusion function to obtain a target face image.
According to an embodiment of the present disclosure, the processing unit includes a first detection subunit, a sixth determining subunit, and a second detection subunit.
The first detection subunit is used for performing first face detection on the image to be processed to obtain a face detection frame.
The sixth determining subunit is used for obtaining a face image corresponding to the face detection frame from the image to be processed.
The second detection subunit is used for performing second face detection on the image to be processed to obtain a face key point feature map.
According to an embodiment of the present disclosure, the image generation apparatus further includes a detection module, a face processing module, and a general processing module.
The detection module is used for performing face detection on the image to be processed to obtain a face detection result, wherein the face detection includes at least one of: first face detection and second face detection.
The face processing module is used for performing the operation of processing the image to be processed to obtain the target face image when it is determined based on the face detection result that the image to be processed includes a face object.
The general processing module is used for performing style conversion on the image to be processed, generating a stylized global image, and taking the stylized global image as the stylized target image when it is determined based on the face detection result that the image to be processed does not include a face object.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to an embodiment of the disclosure.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform a method as in an embodiment of the present disclosure.
According to an embodiment of the disclosure, a computer program product comprising a computer program which, when executed by a processor, implements a method as in an embodiment of the disclosure.
FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read-Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the respective methods and processes described above, such as the image generation method. For example, in some embodiments, the image generation method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the image generation method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the image generation method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (19)

1. An image generation method, comprising:
processing an image to be processed to obtain a target face image;
performing style conversion on the target face image and the image to be processed to generate a stylized face image and a stylized global image, respectively; and
fusing the stylized face image and the stylized global image to generate a stylized target image.
2. The method of claim 1, wherein the processing the image to be processed to obtain the target face image comprises:
processing the image to be processed to obtain a face image and a face key point feature map, respectively; and
fusing the face image and the face key point feature map to obtain the target face image.
3. The method of claim 1, wherein the fusing the stylized face image and the stylized global image to generate a stylized target image comprises:
determining a replacement region, a fusion region, and a general region from the stylized global image based on a face detection frame corresponding to the face image;
fusing an image corresponding to the fusion region in the stylized face image and an image of the fusion region in the stylized global image to obtain a target fusion region image; and
generating the stylized target image based on the target fusion region image, the image in the stylized face image corresponding to the replacement region, and the image of the general region in the stylized global image.
4. The method of claim 3, wherein the fusing the image corresponding to the fusion region in the stylized face image and the image of the fusion region in the stylized global image to obtain a target fusion region image comprises:
determining first pixel point information corresponding to the fusion region in the stylized face image;
determining second pixel point information of the fusion region in the stylized global image; and
obtaining the target fusion region image based on the first pixel point information and the second pixel point information.
5. The method of claim 3 or 4, wherein the determining a replacement region, a fusion region, and a general region from the stylized global image based on a face detection frame corresponding to the face image comprises:
determining a size of the face detection frame corresponding to the face image;
determining a fusion size based on the size of the face detection frame; and
determining the replacement region, the fusion region, and the general region from the stylized global image based on the face detection frame and the fusion size.
6. The method of claim 2, wherein the fusing the face image and the face key point feature map to obtain the target face image comprises:
inputting the face image and the face key point feature map into a fusion function to obtain the target face image.
7. The method of claim 2, wherein the processing the image to be processed to obtain the face image and the face key point feature map, respectively, comprises:
performing first face detection on the image to be processed to obtain a face detection frame;
obtaining the face image corresponding to the face detection frame from the image to be processed; and
performing second face detection on the image to be processed to obtain the face key point feature map.
8. The method of claim 7, further comprising:
performing face detection on the image to be processed to obtain a face detection result, wherein the face detection comprises at least one of the first face detection and the second face detection;
in a case that it is determined, based on the face detection result, that the image to be processed contains a face object, performing the operation of processing the image to be processed to obtain the target face image; and
in a case that it is determined, based on the face detection result, that the image to be processed does not contain a face object, performing style conversion on the image to be processed to generate the stylized global image, and taking the stylized global image as the stylized target image.
9. An image generation apparatus comprising:
a processing module configured to process an image to be processed to obtain a target face image;
a conversion module configured to perform style conversion on the target face image and the image to be processed to generate a stylized face image and a stylized global image, respectively; and
a generating module configured to fuse the stylized face image and the stylized global image to generate a stylized target image.
10. The apparatus of claim 9, wherein the processing module comprises:
a processing unit configured to process the image to be processed to obtain a face image and a face key point feature map, respectively; and
a first fusion unit configured to fuse the face image and the face key point feature map to obtain the target face image.
11. The apparatus of claim 9, wherein the generating module comprises:
a determination unit configured to determine a replacement region, a fusion region, and a general region from the stylized global image based on a face detection frame corresponding to the face image;
a second fusion unit configured to fuse an image corresponding to the fusion region in the stylized face image and an image of the fusion region in the stylized global image to obtain a target fusion region image; and
a generating unit configured to generate the stylized target image based on the target fusion region image, the image corresponding to the replacement region in the stylized face image, and the image of the general region in the stylized global image.
12. The apparatus of claim 11, wherein the second fusion unit comprises:
a first determining subunit configured to determine first pixel point information corresponding to the fusion region in the stylized face image;
a second determining subunit configured to determine second pixel point information of the fusion region in the stylized global image; and
a first fusion subunit configured to obtain the target fusion region image based on the first pixel point information and the second pixel point information.
13. The apparatus of claim 11 or 12, wherein the determination unit comprises:
a third determining subunit configured to determine a size of the face detection frame corresponding to the face image;
a fourth determining subunit configured to determine a fusion size based on the size of the face detection frame; and
a fifth determining subunit configured to determine the replacement region, the fusion region, and the general region from the stylized global image based on the face detection frame and the fusion size.
14. The apparatus of claim 10, wherein the first fusion unit comprises:
a second fusion subunit configured to input the face image and the face key point feature map into a fusion function to obtain the target face image.
15. The apparatus of claim 10, wherein the processing unit comprises:
a first detection subunit configured to perform first face detection on the image to be processed to obtain a face detection frame;
a sixth determining subunit configured to obtain the face image corresponding to the face detection frame from the image to be processed; and
a second detection subunit configured to perform second face detection on the image to be processed to obtain the face key point feature map.
16. The apparatus of claim 15, further comprising:
a detection module configured to perform face detection on the image to be processed to obtain a face detection result, wherein the face detection comprises at least one of the first face detection and the second face detection;
a face processing module configured to, in a case that it is determined, based on the face detection result, that the image to be processed contains a face object, perform the operation of processing the image to be processed to obtain the target face image; and
a general processing module configured to, in a case that it is determined, based on the face detection result, that the image to be processed does not contain a face object, perform style conversion on the image to be processed to generate the stylized global image, and take the stylized global image as the stylized target image.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 8.
18. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1 to 8.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.
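For a reader trying to follow the method claims, the following is a minimal, illustrative Python sketch of the pipeline of claims 1, 2, and 6 to 8. It is not part of the disclosure: the face detector, key point extractor, and style-conversion network are stubbed with trivial stand-ins (a fixed central box, a zero feature map, an identity copy), and the additive fusion is an assumption standing in for whatever learned "fusion function" an implementation actually uses.

    import numpy as np

    def detect_face_box(img):
        # Stand-in for the first face detection (claim 7); a real detector
        # would return a learned face detection frame, or None if no face.
        h, w = img.shape[:2]
        return (w // 4, h // 4, 3 * w // 4, 3 * h // 4)

    def detect_face_keypoints(img):
        # Stand-in for the second face detection (claim 7): a per-pixel
        # face key point feature map (here all zeros).
        return np.zeros(img.shape[:2], dtype=np.float32)

    def stylize(img):
        # Stand-in for the style-conversion model (e.g., a generative
        # network); an identity copy keeps the sketch runnable.
        return img.astype(np.float32).copy()

    def generate_stylized_image(image):
        box = detect_face_box(image)
        if box is None:
            # Claim 8: no face object, so the stylized global image is
            # itself the stylized target image.
            return stylize(image)
        x0, y0, x1, y1 = box
        face = image[y0:y1, x0:x1]                        # face image (claim 7)
        kps = detect_face_keypoints(image)[y0:y1, x0:x1]
        # Claims 2/6: fuse the face image with its key point feature map
        # via a fusion function; simple addition is assumed here.
        target_face = face.astype(np.float32) + kps[..., None]
        stylized_face = stylize(target_face)              # claim 1
        stylized_global = stylize(image)
        out = stylized_global.copy()
        out[y0:y1, x0:x1] = stylized_face                 # crude paste; see below
        return out

The final paste-back is deliberately crude; claims 3 to 5 replace it with a region-aware fusion, sketched next.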
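Likewise illustrative only: one plausible reading of the region logic of claims 3 to 5. The fusion size is derived from the detection-frame size (the 10% band ratio is an assumption), the replacement region is the interior of the frame, the fusion region is the band in which the first (face) and second (global) pixel point information are blended, and the general region keeps the stylized global image untouched.

    import numpy as np

    def fuse_stylized(stylized_face, stylized_global, box, band_ratio=0.1):
        # box is the (x0, y0, x1, y1) face detection frame; stylized_face
        # is assumed to be the stylized crop at the frame's resolution.
        x0, y0, x1, y1 = box
        h, w = y1 - y0, x1 - x0
        band = max(1, int(band_ratio * min(h, w)))        # fusion size (claim 5)
        # Per-pixel weight inside the frame: 1 deep inside (replacement
        # region), a 0-to-1 ramp across the band (fusion region), and 0 at
        # the frame border, matching the general region outside it.
        ys = np.arange(h)[:, None]
        xs = np.arange(w)[None, :]
        dist = np.minimum(np.minimum(ys, h - 1 - ys), np.minimum(xs, w - 1 - xs))
        alpha = np.clip(dist / band, 0.0, 1.0)[..., None]
        out = stylized_global.astype(np.float32).copy()
        face = stylized_face.astype(np.float32)
        # Claim 4: the target fusion region image comes from the first and
        # second pixel point information (here, a linear blend).
        out[y0:y1, x0:x1] = alpha * face + (1 - alpha) * out[y0:y1, x0:x1]
        return out

In the pipeline sketch above, the line marked "crude paste" would then become out = fuse_stylized(stylized_face, stylized_global, box).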
CN202210839803.0A 2022-07-15 2022-07-15 Image generation method, image generation device, electronic device, and storage medium Pending CN115082298A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210839803.0A CN115082298A (en) 2022-07-15 2022-07-15 Image generation method, image generation device, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
CN115082298A (en) 2022-09-20

Family

ID=83259510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210839803.0A Pending CN115082298A (en) 2022-07-15 2022-07-15 Image generation method, image generation device, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN115082298A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107045618A (en) * 2016-02-05 2017-08-15 北京陌上花科技有限公司 A kind of facial expression recognizing method and device
CN109978754A (en) * 2017-12-28 2019-07-05 广东欧珀移动通信有限公司 Image processing method, device, storage medium and electronic equipment
CN111028137A (en) * 2018-10-10 2020-04-17 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN111046763A (en) * 2019-11-29 2020-04-21 广州久邦世纪科技有限公司 Portrait cartoon method and device
CN111402135A (en) * 2020-03-17 2020-07-10 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN112395922A (en) * 2019-08-16 2021-02-23 杭州海康威视数字技术股份有限公司 Face action detection method, device and system
CN112668480A (en) * 2020-12-29 2021-04-16 上海高德威智能交通系统有限公司 Head attitude angle detection method and device, electronic equipment and storage medium
CN113160039A (en) * 2021-04-28 2021-07-23 北京达佳互联信息技术有限公司 Image style migration method and device, electronic equipment and storage medium
CN113160038A (en) * 2021-04-28 2021-07-23 北京达佳互联信息技术有限公司 Image style migration method and device, electronic equipment and storage medium
CN114419300A (en) * 2022-01-24 2022-04-29 北京字跳网络技术有限公司 Stylized image generation method and device, electronic equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115761855A (en) * 2022-11-23 2023-03-07 北京百度网讯科技有限公司 Face key point information generation, neural network training and three-dimensional face reconstruction method
CN115761855B (en) * 2022-11-23 2024-02-09 北京百度网讯科技有限公司 Face key point information generation, neural network training and three-dimensional face reconstruction method

Similar Documents

Publication Publication Date Title
EP3920147A1 (en) Method and apparatus for generating virtual avatar, device, storage medium and computer program product
CN115049799B (en) Method and device for generating 3D model and virtual image
CN114187633B (en) Image processing method and device, and training method and device for image generation model
CN115147265B (en) Avatar generation method, apparatus, electronic device, and storage medium
CN114549710A (en) Virtual image generation method and device, electronic equipment and storage medium
US20210350541A1 (en) Portrait extracting method and apparatus, and storage medium
CN113591918B (en) Training method of image processing model, image processing method, device and equipment
CN113177472A (en) Dynamic gesture recognition method, device, equipment and storage medium
CN114792355B (en) Virtual image generation method and device, electronic equipment and storage medium
US20230047748A1 (en) Method of fusing image, and method of training image fusion model
CN114723888A (en) Three-dimensional hair model generation method, device, equipment, storage medium and product
CN113657396B (en) Training method, translation display method, device, electronic equipment and storage medium
CN113962845B (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN114708374A (en) Virtual image generation method and device, electronic equipment and storage medium
CN115082298A (en) Image generation method, image generation device, electronic device, and storage medium
CN113837194A (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN116259064A (en) Table structure identification method, training method and training device for table structure identification model
CN114926322B (en) Image generation method, device, electronic equipment and storage medium
CN115147306A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114066790A (en) Training method of image generation model, image generation method, device and equipment
CN114648601A (en) Virtual image generation method, electronic device, program product and user terminal
CN113903071A (en) Face recognition method and device, electronic equipment and storage medium
CN113947146A (en) Sample data generation method, model training method, image detection method and device
CN113781653A (en) Object model generation method and device, electronic equipment and storage medium
CN116385829B (en) Gesture description information generation method, model training method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination