CN115222895B - Image generation method, device, equipment and storage medium


Info

Publication number
CN115222895B
CN115222895B (application CN202211052349.0A)
Authority
CN
China
Prior art keywords
image
avatar
head
template
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211052349.0A
Other languages
Chinese (zh)
Other versions
CN115222895A (en)
Inventor
郭汉奇
刘家铭
胡天舒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211052349.0A priority Critical patent/CN115222895B/en
Publication of CN115222895A publication Critical patent/CN115222895A/en
Application granted granted Critical
Publication of CN115222895B publication Critical patent/CN115222895B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/10 Geometric effects
    • G06T15/20 Perspective computation
    • G06T15/205 Image-based rendering
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The disclosure provides an image generation method, apparatus, device, storage medium and program product, relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, augmented reality, virtual reality and deep learning, and can be applied to scenes such as the metaverse and virtual digital humans. One embodiment of the method comprises the following steps: acquiring a template character image and a three-dimensional avatar model; projecting the three-dimensional avatar model onto a two-dimensional plane to generate an avatar image consistent with the head pose of the template character image; calculating a color matching image of the heads of the template character image and the avatar image; fusing the head of the avatar image to the template character image to obtain a fused image and a fused region mask; and generating an avatar character image based on the color matching image and the fused image. This embodiment enhances the naturalness of the avatar character image.

Description

Image generation method, device, equipment and storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, augmented reality, virtual reality, deep learning and the like, and can be applied to scenes such as the metaverse and virtual digital humans.
Background
In recent years, with the rapid development of computer vision technology based on artificial intelligence, driving a three-dimensional avatar from a real person to generate a brand-new avatar character has shown broad application prospects in many fields.
Three-dimensional avatar head fusion refers to generating, given a template character image and a specified three-dimensional avatar, a new character image through a specific algorithm, such that the head of the new character is consistent in appearance with the three-dimensional avatar while its body and actions are consistent with the template character.
Disclosure of Invention
The embodiment of the disclosure provides an image generation method, an image generation device, a storage medium and a program product.
In a first aspect, an embodiment of the present disclosure provides an image generating method, including: acquiring a template character image and a three-dimensional virtual image model; projecting the three-dimensional avatar model to a two-dimensional plane to generate an avatar image consistent with the head pose of the template character image; calculating a color matching image of the head of the template character image and the virtual image; fusing the head of the virtual image to the template character image to obtain a fused image; an avatar character image is generated based on the color matching image and the fusion image.
In a second aspect, an embodiment of the present disclosure proposes an image generating apparatus including: an acquisition module configured to acquire a template character image and a three-dimensional avatar model; a projection module configured to project the three-dimensional avatar model onto a two-dimensional plane, generating an avatar image consistent with the head pose of the template character image; a calculation module configured to calculate a color matching image of the template character image and the head of the avatar image; the fusion module is configured to fuse the head of the virtual image to the template character image to obtain a fused image; and a generation module configured to generate an avatar character image based on the color matching image and the fusion image.
In a third aspect, an embodiment of the present disclosure proposes an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method as described in any one of the implementations of the first aspect.
In a fifth aspect, embodiments of the present disclosure propose a computer program product comprising a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect.
According to the image generation method provided by the embodiments of the present disclosure, an avatar character image with consistent skin color is generated through similar-feature color matching, which improves the naturalness of the avatar character image. On the one hand, there is no sense of incongruity between the head and the body movements; on the other hand, the avatar blends naturally into the background without hard seams. Moreover, the avatar character image can be generated rapidly, at low cost and with high efficiency, making it easy to produce avatar character images at scale and broadening the application scenarios. The method can be applied to three-dimensional digital human products, and can also generate corresponding virtual humans on demand for scenes such as advertising and film production.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings. The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of one embodiment of an image generation method according to the present disclosure;
FIG. 2 is a flow chart of yet another embodiment of an image generation method according to the present disclosure;
FIG. 3 is the overall algorithm flow of the image generation method;
FIG. 4 is the motion capture rendering flow in FIG. 3;
FIG. 5 is the fusion repair flow in FIG. 3;
FIG. 6 is the enhancement flow in FIG. 3;
FIG. 7 is a schematic diagram of a structure of one embodiment of an image generation apparatus according to the present disclosure;
fig. 8 is a block diagram of an electronic device for implementing an image generation method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates a flow 100 of one embodiment of an image generation method according to the present disclosure. The image generation method comprises the following steps:
step 101, a template character image and a three-dimensional avatar model are acquired.
In this embodiment, the execution subject of the image generation method may acquire a template character image and a three-dimensional avatar model.
The template person image is usually an image obtained by photographing a real person, including the person's head and body. A three-dimensional avatar model is typically a digitized character model, created by computer graphics techniques, that approximates a human figure.
Step 102, projecting the three-dimensional avatar model to a two-dimensional plane to generate an avatar image consistent with the head pose of the template character image.
In this embodiment, the execution body may project the three-dimensional avatar model onto a two-dimensional plane, generating an avatar image in conformity with the head pose of the template character image.
Generally, the three-dimensional avatar model may be head-driven based on the template character image using a generative head driving technique to generate an avatar image conforming to the head pose of the template character image. Specifically, according to the relevant information of the template character in the template character image, relevant parameters of the head gesture are generated for the three-dimensional virtual image model to render a two-dimensional virtual image under the same gesture. For example, the three-dimensional avatar model is rotated according to the head pose of the template person in the template person image until the three-dimensional avatar model is identical to the head pose of the template person in the template person image, and projected onto a two-dimensional plane, thereby obtaining an avatar image consistent with the head pose of the template person image.
Step 103, calculating a color matching image of the head of the template character image and the avatar image.
In this embodiment, the execution subject may calculate a color matching image of the head of the template character image and the avatar image. Wherein the color matching image may be used to record a target color of the head of the avatar character in the avatar character image to be generated.
Typically, the color similarity of each pixel of the heads of the template character image and the avatar image is first calculated to generate a similarity matrix; the avatar image is then multiplied by the similarity matrix to obtain the color matching image.
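For illustration only, the following is a minimal NumPy sketch of this step. The disclosure does not fix a particular similarity measure, so the per-pixel RGB-distance similarity used here, and all function names, are assumptions:

```python
import numpy as np

def color_matching_image(template_head: np.ndarray, avatar_img: np.ndarray) -> np.ndarray:
    """Per-pixel color similarity between two aligned H x W x 3 images,
    then reweight the avatar image by that similarity matrix.
    The RGB-distance similarity below is an illustrative assumption."""
    t = template_head.astype(np.float32) / 255.0
    a = avatar_img.astype(np.float32) / 255.0
    # One plausible similarity: 1 - normalized Euclidean distance in RGB space.
    dist = np.linalg.norm(t - a, axis=-1) / np.sqrt(3.0)  # H x W, values in [0, 1]
    similarity = 1.0 - dist                               # the "similarity matrix"
    matched = a * similarity[..., None]                   # broadcast over channels
    return (matched * 255.0).clip(0, 255).astype(np.uint8)
```

Any measure that maps identical colors toward 1 and distant colors toward 0 would fit the description above.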
Step 104, fusing the head of the avatar image to the template character image to obtain a fused image.
In this embodiment, the execution subject may fuse the head of the avatar image to the template character image to obtain a fused image.
Generally, the head of the avatar image may be fused to the body of the template character image using a fusion technique. Specifically, the head of the avatar image is overlaid on the head of the template character image, and a fusion image is obtained.
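A rough sketch of such overlay fusion follows, assuming an aligned binary head mask is available; all names here are illustrative:

```python
import numpy as np

def fuse_head(template_img: np.ndarray, avatar_img: np.ndarray,
              avatar_head_mask: np.ndarray) -> np.ndarray:
    """Overlay the avatar head onto the template character image using a
    binary mask (1 inside the avatar head, 0 elsewhere)."""
    m = avatar_head_mask.astype(template_img.dtype)[..., None]  # H x W x 1
    return avatar_img * m + template_img * (1 - m)
```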
Step 105, generating an avatar character image based on the color matching image and the fusion image.
In this embodiment, the execution subject may generate the avatar character image based on the color matching image and the fusion image.
Typically, the color of the head in the fused image is migrated to the color matching image to produce an avatar character image having consistent body and head skin tone.
According to the image generation method provided by the embodiments of the present disclosure, an avatar character image with consistent skin color is generated through similar-feature color matching, which improves the naturalness of the avatar character image. On the one hand, there is no sense of incongruity between the head and the body movements; on the other hand, the avatar blends naturally into the background without hard seams. Moreover, the avatar character image can be generated rapidly, at low cost and with high efficiency, making it easy to produce avatar character images at scale and broadening the application scenarios. The method can be applied to three-dimensional digital human products, and can also generate corresponding virtual humans on demand for scenes such as advertising and film production.
With continued reference to fig. 2, a flow 200 of yet another embodiment of an image generation method according to the present disclosure is shown. The image generation method comprises the following steps:
step 201, a template character image and a three-dimensional avatar model are acquired.
In this embodiment, the specific operation of step 201 is described in detail in step 101 in the embodiment shown in fig. 1, and will not be described herein.
Step 202, head keypoints in the template person image are detected.
In this embodiment, the execution subject of the image generation method may detect a head keypoint in the template person image.
In general, head keypoints may be detected by a deep learning method or a conventional image processing method. The accuracy of the head key points detected by the deep learning method is higher. The specific steps can be as follows:
first, a template person image is input to a head detection model, and head position information is obtained.
Wherein the head detection model may be used to detect head position.
Then, based on the head position information, a template head image is segmented from the template person image.
And finally, inputting the head image of the template into a key point detection model to obtain head key points.
Wherein a keypoint detection model can be used to detect head keypoints.
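A minimal sketch of this two-stage pipeline is given below; the detector, the keypoint model and the bounding-box format are placeholders for whatever networks the system actually uses:

```python
import numpy as np

def detect_head_keypoints(template_img, head_detector, keypoint_model):
    """Two-stage detection: locate the head, crop the template head image,
    then run the key point model on the crop.  Both models are placeholders."""
    x0, y0, x1, y1 = head_detector(template_img)   # head position information
    head_crop = template_img[y0:y1, x0:x1]         # segment the template head image
    keypoints = keypoint_model(head_crop)          # (N, 2) points in crop coordinates
    return keypoints + np.array([x0, y0])          # map back to full-image coordinates
```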
Step 203, calculating the distance between the head key points and the corresponding points of the three-dimensional avatar model, and determining transformation parameters.
In this embodiment, the execution body may calculate a distance between a head key point and a corresponding point of the three-dimensional avatar model, and determine the transformation parameters. Wherein the transformation parameters are parameters transformed from corresponding points of the three-dimensional avatar model to head key points.
In general, according to a mapping relationship between head key points and a three-dimensional avatar model topology, transformation parameters are searched for by an optimization method such that distances between corresponding points are minimized, and the transformation parameters at this time are acquired.
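As an illustrative sketch of such an optimization, the snippet below assumes a weak-perspective projection and a known mapping from keypoints to model vertices, with SciPy's least-squares solver standing in for whatever optimizer is actually used:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def fit_transformation(model_pts_3d: np.ndarray, keypoints_2d: np.ndarray) -> np.ndarray:
    """Find rotation/scale/translation minimizing the 2D distance between
    projected model points and detected head key points.
    model_pts_3d: (N, 3) vertices corresponding to the N key points."""
    def residuals(p):
        rot = Rotation.from_euler("xyz", p[:3]).as_matrix()
        scale, tx, ty = p[3], p[4], p[5]
        projected = scale * (model_pts_3d @ rot.T)[:, :2] + np.array([tx, ty])
        return (projected - keypoints_2d).ravel()
    p0 = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0])  # identity pose as initial guess
    return least_squares(residuals, p0).x          # the transformation parameters
```

In practice, any pose parameterization and projection model supported by the rendering pipeline could be substituted here.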
Step 204, projecting the three-dimensional avatar model onto a two-dimensional plane according to the transformation parameters to generate an avatar image.
In this embodiment, the execution body may project the three-dimensional avatar model to the two-dimensional plane according to the transformation parameters, and generate the avatar image.
Generally, the three-dimensional avatar model is transformed according to transformation parameters, i.e., projected onto a two-dimensional plane, to generate an avatar image conforming to the head pose of the template character image. By calculating the distance between the corresponding points and selecting the transformation parameter with the smallest distance to transform the three-dimensional avatar model, an avatar image consistent with the head posture of the template character image can be quickly generated.
Step 205, extracting template person head features from the template person image and extracting avatar head features from the avatar image.
In this embodiment, the execution subject may extract the template character head features from the template character image and the avatar head features from the avatar image. Wherein the template person's head features may be used to characterize the characteristics of the template person's head. The avatar header characteristics may be used to characterize characteristics that the head of the avatar has.
In general, the template character head features and the avatar head features may be extracted by a deep learning method, and the features extracted in this way are more accurate. The specific steps may be as follows:
first, head segmentation is performed on a template character image and an avatar image, respectively, to obtain a template character head mask and an avatar head mask.
Wherein the template character head mask corresponds to the points on the template character image one by one. For the template person head mask, the value of a point corresponding to the template person head in the template person image is 1, and the value of a point corresponding to a portion other than the template person head in the template person image is 0. Similarly, the avatar head mask corresponds to points on the avatar image one by one. For the avatar header mask, the value of a point corresponding to an avatar header in the avatar image is 1, and the value of a point corresponding to the other part except for the avatar header in the avatar image is 0.
Then, feature extraction is performed on the template character image and the avatar image, respectively, to obtain template character features and avatar features.
In general, the template character image and the avatar image are respectively input to a convolutional neural network to perform feature extraction, i.e., the template character feature and the avatar feature can be output.
Finally, the template character head mask is utilized to filter the template character features to obtain template character head features, and the avatar head mask is utilized to filter the avatar features to obtain avatar head features.
Specifically, the template person head features may be retained by multiplying the template person features with the template person head mask, while other features than the template person head features are set to 0. Similarly, the avatar header characteristics may be preserved by multiplying the avatar characteristics with the avatar header mask, and the other characteristics except for the avatar header characteristics are set to 0.
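A minimal PyTorch sketch of this mask filtering follows; the backbone is a stand-in for the feature extraction network, and the names and tensor shapes are assumptions:

```python
import torch
import torch.nn.functional as F

def masked_head_features(image: torch.Tensor, head_mask: torch.Tensor,
                         backbone: torch.nn.Module) -> torch.Tensor:
    """Extract a dense feature map and zero out everything outside the head.
    image: (1, 3, H, W); head_mask: (1, 1, H, W) with 1 inside the head."""
    features = backbone(image)                        # (1, C, h, w) feature map
    mask = F.interpolate(head_mask, size=features.shape[-2:], mode="nearest")
    return features * mask                            # head features kept, rest set to 0
```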
Step 206, calculating a similarity matrix of the template character head features and the avatar head features.
In this embodiment, the execution subject may calculate a similarity matrix of the template character head features and the avatar head features. Wherein elements in the similarity matrix may characterize color similarity of corresponding pixels of the template person image and the head in the avatar image.
In general, the similarity between the features of each point of the template character head and the features of the corresponding point of the avatar head is calculated to obtain a similarity matrix.
Step 207, multiplying the avatar image with the similarity matrix to obtain a color matching image.
In this embodiment, the execution body may multiply the avatar image with the similarity matrix to obtain the color matching image. Wherein the color matching image may be used to record a target color of the head of the avatar character in the avatar character image to be generated.
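For illustration, here is a sketch of computing the similarity matrix from the masked head features and applying it to the avatar image. Cosine similarity is one plausible choice (the disclosure does not name one), and the sketch assumes the feature maps have already been upsampled to the image resolution:

```python
import torch
import torch.nn.functional as F

def feature_color_matching(avatar_img: torch.Tensor,
                           template_head_feats: torch.Tensor,
                           avatar_head_feats: torch.Tensor) -> torch.Tensor:
    """avatar_img: (1, 3, H, W); *_feats: (1, C, H, W) masked feature maps."""
    sim = F.cosine_similarity(template_head_feats, avatar_head_feats, dim=1)
    sim = sim.clamp(min=0).unsqueeze(1)   # (1, 1, H, W) similarity matrix
    return avatar_img * sim               # broadcast multiply over RGB channels
```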
Step 208, fusing the head of the avatar image to the template character image to obtain a fused image and a fused region mask.
In this embodiment, the executing body may fuse the head of the avatar image to the template character image to obtain a fused image and a fused region mask.
Generally, the head of the avatar image is overlaid on the head of the template character image, and the fused region is then eroded and dilated to obtain the fused image. In addition, points in the erosion-dilation region are set to 1, and points in the other regions are set to 0, yielding the fusion region mask.
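A rough OpenCV sketch of this step follows. Treating the dilated-minus-eroded band as the fusion region mask is one reading of the erosion/dilation description, and the kernel size is arbitrary:

```python
import cv2
import numpy as np

def fuse_with_region_mask(template_img, avatar_img, avatar_head_mask, kernel_size=15):
    """Paste the avatar head onto the template image, then erode and dilate
    the head mask; the band between them serves as the fusion region mask."""
    m = avatar_head_mask.astype(np.uint8)
    fused = np.where(m[..., None] == 1, avatar_img, template_img)
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    eroded = cv2.erode(m, kernel)
    dilated = cv2.dilate(m, kernel)
    fusion_region_mask = dilated - eroded   # 1 in the transition band, 0 elsewhere
    return fused, fusion_region_mask
```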
Step 209, repairing the fused image based on the color matching image and the fusion area mask to generate an avatar character image.
In this embodiment, the execution subject may repair the fused image based on the color matching image and the fused region mask, and generate the avatar character image.
In general, the color of the head in the fused image is migrated toward the color matching image, and the color of the fusion area corresponding to the fusion area mask is adjusted based on the color matching image, so that an avatar character image with consistent body and head skin color is generated, with more natural fused edges and more consistent skin color.
In general, the avatar character image may be generated by a deep learning method, and images generated this way are more natural. Specifically, the color matching image, the fusion area mask and the fused image are input into a generative convolutional network to obtain the avatar character image, where the generative convolutional network may be a network with an encoder-decoder structure.
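The disclosure does not specify the architecture beyond an encoder-decoder structure; the toy PyTorch module below only illustrates the input/output arrangement (color matching image, fused image and fusion region mask concatenated along the channel axis) and is not the actual network:

```python
import torch
import torch.nn as nn

class RepairNet(nn.Module):
    """Toy encoder-decoder standing in for the generative convolutional
    network: input is 3 + 3 + 1 = 7 channels, output is an RGB image."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(7, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh())

    def forward(self, color_match, fused, region_mask):
        x = torch.cat([color_match, fused, region_mask], dim=1)
        return self.decoder(self.encoder(x))  # the avatar character image
```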
Step 210, performing feature decomposition on the avatar character image and the avatar image respectively to obtain first features and second features of the avatar character image and the avatar image.
In this embodiment, the executing body may perform feature decomposition on the avatar character image and the avatar image, respectively, to obtain first features and second features of the avatar character image and the avatar image. Wherein the first feature belongs to a high frequency feature, which may be used to characterize texture, and the second feature belongs to a low frequency feature, which may be used to characterize shape and color.
In general, features of the avatar character image and the avatar image at different granularities can be computed separately through Laplacian decomposition.
Step 211, performing the inverse transform of the decomposition on the combination of the second features of the avatar character image and the first features of the avatar image to obtain the target avatar character image.
In this embodiment, the executing body may perform the inverse transform of the decomposition on the combination of the second features of the avatar character image and the first features of the avatar image to obtain the target avatar character image.
Due to data and computational limitations, the avatar character image is generally a low-resolution image with serious loss of detail. Therefore, after Laplacian decomposition of the avatar character image and the avatar image, the low-frequency features of the avatar character image are combined with the high-frequency features of the avatar image through the inverse transform of the decomposition, so that the generated target avatar character image has richer details, higher resolution and higher definition.
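A sketch of this detail transfer using OpenCV image pyramids follows; the number of levels, the use of cv2.pyrDown/cv2.pyrUp as the decomposition and its inverse, and the assumption that both images are aligned and of the same size are all illustrative:

```python
import cv2
import numpy as np

def laplacian_detail_transfer(avatar_char_img, avatar_img, levels=3):
    """Keep the low-frequency residual (shape/color) of the generated avatar
    character image and the high-frequency bands (texture) of the rendered
    avatar image, then reconstruct via the inverse of the decomposition."""
    def pyramid(img):
        g, bands = img.astype(np.float32), []
        for _ in range(levels):
            down = cv2.pyrDown(g)
            up = cv2.pyrUp(down, dstsize=(g.shape[1], g.shape[0]))
            bands.append(g - up)   # high-frequency band at this level
            g = down
        return bands, g            # g is the low-frequency residual

    high_bands, _ = pyramid(avatar_img)        # texture from the avatar image
    _, low_res = pyramid(avatar_char_img)      # shape/color from the generated image
    out = low_res
    for band in reversed(high_bands):          # inverse transform: upsample and add
        out = cv2.pyrUp(out, dstsize=(band.shape[1], band.shape[0])) + band
    return out.clip(0, 255).astype(np.uint8)
```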
As can be seen from fig. 2, compared with the embodiment corresponding to fig. 1, the flow 200 of the image generation method in this embodiment highlights the steps of generating the avatar image, the color matching image and the avatar character image, and adds an image enhancement step. The scheme described in this embodiment can therefore quickly generate an avatar image consistent with the head pose of the template character image by calculating the distances between corresponding points and selecting the transformation parameters that minimize those distances; color processing of the fusion region makes the fused edges more natural and the skin colors more consistent; and after Laplacian decomposition of the avatar character image and the avatar image, combining the low-frequency features of the avatar character image with the high-frequency features of the avatar image through the inverse transform of the decomposition gives the generated target avatar character image richer details, higher resolution and higher definition.
For ease of understanding, fig. 3 shows the overall algorithm flow of the image generation method. As shown in fig. 3, the image generation method consists of three major parts: motion capture rendering, fusion repair, and enhancement. Motion capture is performed on the template person image, and the three-dimensional avatar model is rendered based on the capture result to obtain the avatar image. The template character image and the avatar image are fused, and the fusion result is repaired and then enhanced to obtain the target avatar character image.
Fig. 4 shows the motion capture rendering flow in fig. 3. As shown in fig. 4, head detection and key point detection are performed in sequence on the template character image, and the key point detection result is fitted and optimized against the three-dimensional avatar model to obtain rendering parameters.
Fig. 5 shows the fusion repair flow in fig. 3. As shown in fig. 5, head segmentation is performed on the template character image and the avatar image respectively to obtain a template character head mask and an avatar head mask. The template character image and the avatar image are input into a feature extraction network respectively to obtain the template character features and the avatar features. The template character features are filtered using the template character head mask to obtain the template character head features, and the avatar features are filtered using the avatar head mask to obtain the avatar head features. Similarity between the template character head features and the avatar head features is computed to obtain a similarity matrix. The avatar image is color-transformed using the similarity matrix to obtain the color matching image. The fused image of the avatar image and the template character image, the color matching image and the fusion region mask are input into a repair network to obtain the avatar character image.
Fig. 6 shows the enhancement flow in fig. 3. As shown in fig. 6, the avatar image and the avatar character image are each subjected to Laplacian decomposition to obtain components one through six, where components one, two and three are low-frequency components and components four, five and six are high-frequency components. Components four, five and six of the avatar image are combined with components one, two and three of the avatar character image, and the inverse transform of the decomposition is applied to obtain the target avatar character image.
With further reference to fig. 7, as an implementation of the method shown in the foregoing figures, the present disclosure provides an embodiment of an image generating apparatus, which corresponds to the method embodiment shown in fig. 1, and which is particularly applicable to various electronic devices.
As shown in fig. 7, the image generating apparatus 700 of the present embodiment may include: an acquisition module 701, a projection module 702, a calculation module 703, a fusion module 704 and a generation module 705. The acquisition module 701 is configured to acquire a template character image and a three-dimensional avatar model; the projection module 702 is configured to project the three-dimensional avatar model onto a two-dimensional plane to generate an avatar image consistent with the head pose of the template character image; the calculation module 703 is configured to calculate a color matching image of the heads of the template character image and the avatar image; the fusion module 704 is configured to fuse the head of the avatar image to the template character image to obtain a fused image; and the generation module 705 is configured to generate an avatar character image based on the color matching image and the fused image.
In this embodiment, in the image generating apparatus 700, the specific processing of the acquisition module 701, the projection module 702, the calculation module 703, the fusion module 704 and the generation module 705 and the technical effects thereof may refer to the relevant descriptions of steps 101-105 in the embodiment corresponding to fig. 1, and are not repeated here.
In some alternative implementations of the present embodiment, the projection module 702 includes: a detection sub-module configured to detect head keypoints in the template person image; a determining sub-module configured to calculate a distance between a head key point and a corresponding point of the three-dimensional avatar model, and determine a transformation parameter; and a projection sub-module configured to project the three-dimensional avatar model to a two-dimensional plane according to the transformation parameters, generating an avatar image.
In some optional implementations of the present embodiment, the detection submodule is further configured to: inputting the template character image into a head detection model to obtain head position information; segmenting a template head image from the template character image based on the head position information; and inputting the template head image into a key point detection model to obtain head key points.
In some alternative implementations of the present embodiment, the computing module 703 includes: an extraction sub-module configured to extract template character head features from the template character image and to extract avatar head features from the avatar image; a computing sub-module configured to compute a similarity matrix of template persona head features and avatar head features, wherein elements in the similarity matrix characterize color similarity of corresponding pixel points of the template persona image and the head in the avatar image; and the multiplication submodule is configured to multiply the avatar image with the similarity matrix to obtain a color matching image.
In some optional implementations of the present embodiment, the extraction submodule is further configured to: perform head segmentation on the template character image and the avatar image respectively to obtain a template character head mask and an avatar head mask; extract features from the template character image and the avatar image respectively to obtain template character features and avatar features; and filter the template character features using the template character head mask to obtain the template character head features, and filter the avatar features using the avatar head mask to obtain the avatar head features.
In some alternative implementations of the present embodiment, the generating module 705 includes: an acquisition sub-module configured to acquire a fusion area mask; and the restoration submodule is configured to restore the fusion image based on the color matching image and the fusion area mask to generate an avatar character image.
In some optional implementations of the present embodiment, the repair submodule is further configured to: input the color matching image, the fusion area mask and the fused image into a generative convolutional network to obtain the avatar character image.
In some optional implementations of the present embodiment, the image generating apparatus 700 further includes: a decomposition module configured to perform feature decomposition on the avatar character image and the avatar image respectively to obtain first features and second features of the avatar character image and the avatar image, wherein the first features are used for representing textures and the second features are used for representing shapes and colors; and a transformation module configured to perform the inverse transform of the decomposition on the combination of the second features of the avatar character image and the first features of the avatar image to obtain a target avatar character image.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision and disclosure of the user's personal information involved all comply with the provisions of relevant laws and regulations, and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 8 illustrates a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the apparatus 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the respective methods and processes described above, for example, an image generation method. For example, in some embodiments, the image generation method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 800 via ROM 802 and/or communication unit 809. When a computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the image generation method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the image generation method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions provided by the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (16)

1. An image generation method, comprising:
acquiring a template character image and a three-dimensional virtual image model;
projecting the three-dimensional avatar model onto a two-dimensional plane to generate an avatar image consistent with the head pose of the template character image, wherein transformation parameters are searched for by an optimization method according to the mapping relationship between head key points in the template character image and the topological structure of the three-dimensional avatar model, and the transformation parameters that minimize the distance between corresponding points are selected to transform the three-dimensional avatar model to generate the avatar image;
calculating a color matching image of the template character image and the head of the avatar image;
fusing the head of the avatar image to the template character image to obtain a fused image and a fused region mask;
generating an avatar character image based on the color matching image and the fusion image;
respectively carrying out feature decomposition on the avatar character image and the avatar image to obtain first features and second features of the avatar character image and the avatar image, wherein the first features are used for representing textures, and the second features are used for representing shapes and colors;
and performing the inverse transform of the decomposition on the combination of the second features of the avatar character image and the first features of the avatar image to obtain a target avatar character image.
2. The method of claim 1, wherein the projecting the three-dimensional avatar model onto a two-dimensional plane to generate an avatar image that conforms to the head pose of the template character image comprises:
detecting head key points in the template character image;
calculating the distance between the head key points and the corresponding points of the three-dimensional virtual image model, and determining transformation parameters;
and projecting the three-dimensional avatar model onto a two-dimensional plane according to the transformation parameters to generate the avatar image.
3. The method of claim 2, wherein the detecting head keypoints in the template person image comprises:
inputting the template character image into a head detection model to obtain head position information;
segmenting a template head image from the template character image based on the head position information;
and inputting the template head image into a key point detection model to obtain the head key point.
4. The method of claim 1, wherein the calculating a color matching image of the template character image and the head of the avatar image comprises:
extracting template character head features from the template character image, and extracting avatar head features from the avatar image;
calculating a similarity matrix of the template character head features and the avatar head features, wherein elements in the similarity matrix represent color similarity of corresponding pixel points of the template character image and the head in the avatar image;
multiplying the avatar image with the similarity matrix to obtain the color matching image.
5. The method of claim 4, wherein the extracting template character head features from the template character image and extracting avatar head features from the avatar image comprises:
performing head segmentation on the template character image and the avatar image respectively to obtain a template character head mask and an avatar head mask;
extracting features from the template character image and the avatar image respectively to obtain template character features and avatar features;
and filtering the template character features using the template character head mask to obtain the template character head features, and filtering the avatar features using the avatar head mask to obtain the avatar head features.
6. The method of claim 1, wherein the generating an avatar character image based on the color matching image and the fused image comprises:
acquiring a fusion area mask;
and repairing the fusion image based on the color matching image and the fusion area mask to generate an avatar character image.
7. The method of claim 6, wherein the repairing the fused image based on the color matching image and the fused region mask to generate an avatar character image comprises:
and inputting the color matching image, the fusion area mask and the fused image into a generative convolutional network to obtain the avatar character image.
8. An image generating apparatus comprising:
an acquisition module configured to acquire a template character image and a three-dimensional avatar model;
the projection module is configured to project the three-dimensional avatar model onto a two-dimensional plane to generate an avatar image consistent with the head pose of the template character image, wherein transformation parameters are searched for by an optimization method according to the mapping relationship between head key points in the template character image and the topological structure of the three-dimensional avatar model, and the transformation parameters that minimize the distance between corresponding points are selected to transform the three-dimensional avatar model to generate the avatar image;
a calculation module configured to calculate a color matching image of the template character image and a head of the avatar image;
the fusion module is configured to fuse the head of the avatar image to the template character image to obtain a fusion image and a fusion area mask;
a generation module configured to generate an avatar character image based on the color matching image and the fusion image;
a decomposition module configured to perform feature decomposition on the avatar character image and the avatar image respectively to obtain first features and second features of the avatar character image and the avatar image, wherein the first features are used for representing textures, and the second features are used for representing shapes and colors;
and a transformation module configured to perform the inverse transform of the decomposition on the combination of the second features of the avatar character image and the first features of the avatar image to obtain a target avatar character image.
9. The apparatus of claim 8, wherein the projection module comprises:
a detection sub-module configured to detect head keypoints in the template person image;
a determining sub-module configured to calculate a distance of the head keypoint from a corresponding point of the three-dimensional avatar model, determining a transformation parameter;
and a projection sub-module configured to project the three-dimensional avatar model to a two-dimensional plane according to the transformation parameters, generating the avatar image.
10. The apparatus of claim 9, wherein the detection sub-module is further configured to:
inputting the template character image into a head detection model to obtain head position information;
segmenting a template head image from the template character image based on the head position information;
and inputting the template head image into a key point detection model to obtain the head key point.
11. The apparatus of claim 8, wherein the computing module comprises:
an extraction sub-module configured to extract template character head features from the template character image and avatar head features from the avatar image;
a computing sub-module configured to compute a similarity matrix of the template persona head features and the avatar head features, wherein elements in the similarity matrix characterize color similarity of corresponding pixels of the template persona image and the head in the avatar image;
and a multiplication sub-module configured to multiply the avatar image with the similarity matrix to obtain the color matching image.
12. The apparatus of claim 11, wherein the extraction submodule is further configured to:
perform head segmentation on the template character image and the avatar image respectively to obtain a template character head mask and an avatar head mask;
extract features from the template character image and the avatar image respectively to obtain template character features and avatar features;
and filter the template character features using the template character head mask to obtain the template character head features, and filter the avatar features using the avatar head mask to obtain the avatar head features.
13. The apparatus of claim 8, wherein the generating means comprises:
an acquisition sub-module configured to acquire a fusion area mask;
and the restoration submodule is configured to restore the fusion image based on the color matching image and the fusion area mask and generate an avatar character image.
14. The apparatus of claim 13, wherein the repair sub-module is further configured to:
and inputting the color matching image, the fusion area mask and the fused image into a generative convolutional network to obtain the avatar character image.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.
CN202211052349.0A 2022-08-30 2022-08-30 Image generation method, device, equipment and storage medium Active CN115222895B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211052349.0A CN115222895B (en) 2022-08-30 2022-08-30 Image generation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211052349.0A CN115222895B (en) 2022-08-30 2022-08-30 Image generation method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115222895A CN115222895A (en) 2022-10-21
CN115222895B true CN115222895B (en) 2023-06-27

Family

ID=83617071

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211052349.0A Active CN115222895B (en) 2022-08-30 2022-08-30 Image generation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115222895B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116740764A (en) * 2023-06-19 2023-09-12 北京百度网讯科技有限公司 Image processing method and device for virtual image and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110060334A (en) * 2019-04-19 2019-07-26 吉林大学 Calculating integration imaging image reconstructing method based on Scale invariant features transform
CN113744374A (en) * 2021-09-03 2021-12-03 浙江大学 Expression-driven 3D virtual image generation method
CN114359453A (en) * 2021-12-21 2022-04-15 广州虎牙科技有限公司 Three-dimensional special effect rendering method and device, storage medium and equipment
CN114820907A (en) * 2021-01-28 2022-07-29 腾讯科技(深圳)有限公司 Human face image cartoon processing method and device, computer equipment and storage medium
CN114860084A (en) * 2022-06-08 2022-08-05 广东技术师范大学 Interactive presentation method of templated edited virtual content in MR/AR head-mounted device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112150638B (en) * 2020-09-14 2024-01-26 北京百度网讯科技有限公司 Virtual object image synthesis method, device, electronic equipment and storage medium
CN112541963B (en) * 2020-11-09 2023-12-26 北京百度网讯科技有限公司 Three-dimensional avatar generation method, three-dimensional avatar generation device, electronic equipment and storage medium
CN112381927A (en) * 2020-11-19 2021-02-19 北京百度网讯科技有限公司 Image generation method, device, equipment and storage medium
CN115335865A (en) * 2021-01-07 2022-11-11 广州视源电子科技股份有限公司 Virtual image construction method, device, equipment and storage medium
CN113362263B (en) * 2021-05-27 2023-09-15 百度在线网络技术(北京)有限公司 Method, apparatus, medium and program product for transforming an image of a virtual idol
CN114267061A (en) * 2021-12-03 2022-04-01 北京紫晶光电设备有限公司 Head gesture recognition method, device, equipment and computer storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110060334A (en) * 2019-04-19 2019-07-26 吉林大学 Calculating integration imaging image reconstructing method based on Scale invariant features transform
CN114820907A (en) * 2021-01-28 2022-07-29 腾讯科技(深圳)有限公司 Human face image cartoon processing method and device, computer equipment and storage medium
CN113744374A (en) * 2021-09-03 2021-12-03 浙江大学 Expression-driven 3D virtual image generation method
CN114359453A (en) * 2021-12-21 2022-04-15 广州虎牙科技有限公司 Three-dimensional special effect rendering method and device, storage medium and equipment
CN114860084A (en) * 2022-06-08 2022-08-05 广东技术师范大学 Interactive presentation method of templated edited virtual content in MR/AR head-mounted device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Learning Deep Object Detectors from 3D Models";Xingchao Peng .etc;《2015 IEEE International Conference on Computer Vision (ICCV)》;1278-1286 *
"从单幅图像生成人物三维动画";林之钊 等;《计算机辅助设计与图形学学报》;第34卷(第9期);1341-1350 *
"结合三维几何形状信息和二维纹理的3D目标匹配";李水平 等;《计算机应用》;第34卷(第5期);1453-1457 *

Also Published As

Publication number Publication date
CN115222895A (en) 2022-10-21

Similar Documents

Publication Publication Date Title
CN114723888B (en) Three-dimensional hair model generation method, device, equipment, storage medium and product
CN112785674A (en) Texture map generation method, rendering method, device, equipment and storage medium
CN115409933B (en) Multi-style texture mapping generation method and device
CN115330940B (en) Three-dimensional reconstruction method, device, equipment and medium
CN115018992B (en) Method and device for generating hair style model, electronic equipment and storage medium
CN111696196A (en) Three-dimensional face model reconstruction method and device
CN114067051A (en) Three-dimensional reconstruction processing method, device, electronic device and storage medium
CN115222895B (en) Image generation method, device, equipment and storage medium
CN113962845B (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN116309983B (en) Training method and generating method and device of virtual character model and electronic equipment
CN111339969B (en) Human body posture estimation method, device, equipment and storage medium
CN115409951B (en) Image processing method, image processing device, electronic equipment and storage medium
CN115375847B (en) Material recovery method, three-dimensional model generation method and model training method
CN114140320B (en) Image migration method and training method and device of image migration model
CN113379932B (en) Human body three-dimensional model generation method and device
CN113781653B (en) Object model generation method and device, electronic equipment and storage medium
CN114862716A (en) Image enhancement method, device and equipment for face image and storage medium
CN113903071A (en) Face recognition method and device, electronic equipment and storage medium
CN115953553B (en) Avatar generation method, apparatus, electronic device, and storage medium
CN116206035B (en) Face reconstruction method, device, electronic equipment and storage medium
CN114140319A (en) Image migration method and training method and device of image migration model
CN114187168A (en) Image migration method and training method and device of image migration model
CN117333584A (en) Target image generation method and device, and image synthesis model training method and device
Li et al. RGB-D Image Enhancement by Prior Guided Weighted Nonlocal Laplacian
CN114445900A (en) Micro-expression recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant