CN115147265A

CN115147265A - Virtual image generation method and device, electronic equipment and storage medium

Info

Publication number: CN115147265A
Application number: CN202210776196.8A
Authority: CN
Inventors: 李�杰; 陈睿智; 赵晨
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-06-30
Filing date: 2022-06-30
Publication date: 2022-10-04
Anticipated expiration: 2042-06-30
Also published as: CN115147265B

Abstract

The present disclosure provides an avatar generation method, which relates to the technical field of artificial intelligence, in particular to the technical fields of augmented reality, virtual reality, computer vision, deep learning and the like, and can be applied to scenarios such as the metaverse and avatar generation. The specific implementation scheme is as follows: converting a preset image according to conversion information between a first coordinate system of a target style image and a second coordinate system of the preset image to obtain a first registration image; aligning a plurality of preset bases of the first registration image with a plurality of target style bases of the target style image to obtain a second registration image; obtaining first mapping information according to the conversion information and the second registration image; and generating a target avatar of a target object in a target image according to the target image and the first mapping information. The present disclosure also provides an avatar generation apparatus, an electronic device, and a storage medium.

Description

Virtual image generation method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of artificial intelligence, in particular to the fields of augmented reality, virtual reality, computer vision, deep learning, and the like, and can be applied to scenarios such as the metaverse and avatar generation. More specifically, the present disclosure provides an avatar generation method, apparatus, electronic device, and storage medium.

Background

With the development of artificial intelligence technology, deep learning models are widely used for image processing and image generation in fields such as virtual reality and augmented reality. In addition, avatars are widely used in scenarios such as social networking, live streaming, and gaming.

Disclosure of Invention

The present disclosure provides an avatar generation method, apparatus, device, and storage medium.

According to an aspect of the present disclosure, there is provided an avatar generation method, the method including: converting the preset image according to conversion information between a first coordinate system of the target style image and a second coordinate system of the preset image to obtain a first registration image; aligning a plurality of preset bases of the first registration image with a plurality of target style bases of the target style image to obtain a second registration image; obtaining first mapping information according to the conversion information and the second registration image; and generating a target avatar of the target object in the target image according to the target image and the first mapping information.

According to another aspect of the present disclosure, there is provided an avatar generation apparatus, the apparatus including: a conversion module configured to convert a preset image according to conversion information between a first coordinate system of a target style image and a second coordinate system of the preset image to obtain a first registration image; an alignment module configured to align a plurality of preset bases of the first registration image with a plurality of target style bases of the target style image to obtain a second registration image; an obtaining module configured to obtain first mapping information according to the conversion information and the second registration image; and a first generation module configured to generate a target avatar of a target object in a target image according to the target image and the first mapping information.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method provided in accordance with the present disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform a method provided according to the present disclosure.

According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a method provided according to the present disclosure.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:

FIG. 1 is a schematic diagram of an exemplary system architecture to which the avatar generation method and apparatus may be applied, according to one embodiment of the present disclosure;

FIG. 2 is a flow diagram of an avatar generation method according to one embodiment of the present disclosure;

FIGS. 3A-3C are schematic diagrams of an avatar generation method according to one embodiment of the present disclosure;

FIG. 4 is a flow diagram of an avatar generation method according to another embodiment of the present disclosure;

FIG. 5 is a block diagram of an avatar generation apparatus according to one embodiment of the present disclosure; and

FIG. 6 is a block diagram of an electronic device to which an avatar generation method may be applied according to one embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of embodiments of the present disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

The avatar may be of various types, such as a two-dimensional image, a three-dimensional image, a cartoon image, a realistic-style image, a hyper-realistic-style image, and the like. An avatar can be designed manually, but manual design incurs high labor costs. In addition, when an avatar is generated with related software, the software has to respond in real time to the instructions input by the designer, which incurs a high resource cost.

Fig. 1 is a schematic diagram of an exemplary system architecture to which the avatar generation method and apparatus may be applied, according to one embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.

As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.

The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and process received data such as a user request, and feed back a processing result (for example, a web page, information, or data obtained or generated according to the user request) to the terminal device.

It should be noted that the avatar generation method provided in the embodiments of the present disclosure may generally be executed by the server 105. Accordingly, the avatar generation apparatus provided by the embodiments of the present disclosure may generally be provided in the server 105. The avatar generation method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the avatar generation apparatus provided in the embodiments of the present disclosure may also be disposed in a server or a server cluster that is different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.

Fig. 2 is a flowchart of an avatar generation method according to one embodiment of the present disclosure.

As shown in fig. 2, the method 200 may include operations S210 to S240.

In operation S210, the preset image is transformed according to transformation information between the first coordinate system of the target-style image and the second coordinate system of the preset image, so as to obtain a first registration image.

For example, the target style image may be a manually designed image. The target style image may be a three-dimensional image.

For example, the preset image may be an image preset in a parameterized model. The preset image may be a three-dimensional image. In one example, the parameterized model may be, for example, a 3DMM (3D Morphable Model) or a Blendshape model.

For example, the coordinate systems of the target style image and the preset image are different. Three-dimensional points in different coordinate systems can be transformed into each other according to the transformation information. The transformation information may be implemented as a transformation matrix, and the coordinates of a three-dimensional point may be represented as a 3 x 1 matrix. Multiplying the transformation matrix by the coordinates of a three-dimensional point in the second coordinate system transforms that point into the first coordinate system.

For example, a first registration image may be obtained by transforming each three-dimensional point in the preset image into the first coordinate system using the transformation matrix.
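The point transformation in operation S210 can be sketched as follows. This is a minimal sketch, assuming the transformation information is split into a 3x3 matrix `m` and a translation vector `t` (an affine transform; with `t = (0, 0, 0)` it reduces to the plain matrix multiplication described above). The function names are hypothetical, not part of the disclosure.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def transform_point(m: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Map a point p (a 3 x 1 column) from the second to the first coordinate system: m @ p + t."""
    return tuple(sum(m[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def first_registration_vertices(preset_vertices: Sequence[Vec3], m: Mat3, t: Vec3) -> List[Vec3]:
    """Hypothetical helper: transform every vertex of the preset image into the first coordinate system."""
    return [transform_point(m, t, v) for v in preset_vertices]


# Example: rotate 90 degrees about the z axis, then shift by 1 along x.
rot_z = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(first_registration_vertices([(1.0, 0.0, 0.0)], rot_z, (1.0, 0.0, 0.0)))  # [(1.0, 1.0, 0.0)]
```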

For example, the target style image may include a first object, which may be an object having a face or a head, such as a human, an animal, or a robot. Similarly, the preset image may include a second object, which may also be an object having a face or a head, such as a human, an animal, or a robot.

In operation S220, a plurality of preset bases of the first registered image are aligned with a plurality of target style bases of the target style image, resulting in a second registered image.

For example, the second registration image may be obtained by aligning the plurality of preset bases with the plurality of target style bases using a non-rigid iterative closest point (NICP) algorithm.

For another example, the triangular patches and point normals associated with the preset bases may be adjusted to obtain the second registration image.
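The full non-rigid iterative closest point algorithm solves a regularized (stiffness-weighted) linear system at every iteration, which is not reproduced here. As a heavily simplified stand-in, assuming both bases are given as plain vertex lists, each iteration below (1) finds the closest target style vertex for every preset-base vertex, which is the correspondence step shared by ICP-family algorithms, and (2) moves each vertex part of the way towards its match. Because each vertex moves independently, the deformation is non-rigid. The `step` damping parameter and all names are hypothetical and only loosely play the role of NICP's regularization.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def closest_point(p: Vec3, targets: Sequence[Vec3]) -> Vec3:
    """Brute-force nearest neighbour; a spatial index would be preferable for real meshes."""
    return min(targets, key=lambda q: math.dist(p, q))


def simplified_nonrigid_icp(source: Sequence[Vec3], target: Sequence[Vec3],
                            iterations: int = 10, step: float = 0.5) -> List[Vec3]:
    """Deform the source vertices towards the target vertices; each vertex moves independently."""
    current = [tuple(v) for v in source]
    for _ in range(iterations):
        # Correspondence step: pair every source vertex with its closest target vertex.
        matches = [closest_point(v, target) for v in current]
        # Damped update: move each vertex the fraction `step` of the way towards its match.
        current = [
            tuple(v[k] + step * (q[k] - v[k]) for k in range(3))
            for v, q in zip(current, matches)
        ]
    return current
```

A smaller `step` spreads the motion over more iterations, so correspondences are re-estimated as the shape changes.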

In operation S230, first mapping information is obtained according to the conversion information and the second registration image.

For example, the first mapping information may be used to convert the second registration image into the second coordinate system.

In operation S240, a target avatar of a target object in the target image is generated according to the target image and the first mapping information.

For example, a first three-dimensional image of the target object may be obtained by processing the target image with a parameterized model. An avatar may be obtained by processing the first three-dimensional image with the first mapping information.

For another example, the target object may be an object having a face or a head, such as a human, an animal, or a robot.

With the embodiments of the present disclosure, the first mapping information is determined according to the target style image and the preset image, whereby style information consistent with the target style image can be efficiently added to the avatar of the target object.

In some embodiments, the bases of the present disclosure are described in detail below, taking a Blendshape model as an example of the parameterized model.

A face with an expression can be split into two components: a personality component and an expression component. The personality component relates to the inherent characteristics of an object's face and may be used to distinguish the faces of different objects; it may remain unchanged over a long time scale (e.g., 7x24 hours). The expression component, by contrast, may change within a short time scale (e.g., 1 second), since the face of an object may show a variety of expressions.

For example, to represent the expression component, the Blendshape model includes a plurality of bases for representing expressions. In one example, combining different bases yields an expression component, and the expression component is then superposed on the personality component to determine a virtual face. It is understood that there may be a plurality of feature points on the virtual face.

For another example, the base may correspond to the subject's facial contours, eyes, mouth, nose, etc.
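The superposition above is often written in delta-blendshape form: a face equals a neutral mesh plus a personality (identity) offset plus a weighted sum of expression bases, each base taken as a displacement from the neutral mesh. The sketch below assumes that form and assumes all meshes share one vertex ordering. The disclosure does not fix a particular formula, and the names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mesh = Sequence[Vec3]


def blend_face(neutral: Mesh, identity_offset: Mesh,
               expression_bases: Sequence[Mesh], weights: Sequence[float]) -> List[Vec3]:
    """face = neutral + identity_offset + sum_i w_i * (base_i - neutral), per vertex."""
    if len(expression_bases) != len(weights):
        raise ValueError("one weight per expression base is required")
    face = []
    for idx, v in enumerate(neutral):
        # Personality component: a long-lived, per-object offset.
        x = [v[k] + identity_offset[idx][k] for k in range(3)]
        # Expression component: a weighted combination of several bases.
        for base, w in zip(expression_bases, weights):
            for k in range(3):
                x[k] += w * (base[idx][k] - v[k])
        face.append(tuple(x))
    return face
```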

In some embodiments, an input image may be preprocessed to obtain the target image. The preprocessing may include image cropping, translation, and the like, so that the target object is located at a preset position in the target image.
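This preprocessing can be sketched minimally, assuming a face detector (not specified in the disclosure) has already returned a bounding box `(x0, y0, x1, y1)` for the target object. The sketch computes the offset that moves the box centre onto a preset position and a square crop window around the object. The function names, the `margin` parameter and the use of the box centre as the reference point are illustrative assumptions.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def translation_to_preset(box: Box, preset_center: Tuple[float, float]) -> Tuple[float, float]:
    """Offset (dx, dy) that moves the centre of `box` onto `preset_center`."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    return preset_center[0] - cx, preset_center[1] - cy


def square_crop(box: Box, margin: float = 0.2) -> Box:
    """Square crop window centred on the object, enlarged by `margin` of its size on each side."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    half = max(box[2] - box[0], box[3] - box[1]) * (1 + 2 * margin) / 2
    return cx - half, cy - half, cx + half, cy + half
```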

In some embodiments, aligning the plurality of preset bases of the first registration image with the plurality of target style bases of the target style image to obtain the second registration image includes: determining the target style base corresponding to a preset base according to style semantic information of the target style base and preset semantic information of the preset base; and adjusting the position and size of the preset base corresponding to the target style base in the first registration image according to the position and size of the target style base in the target style image, to obtain the second registration image. This is described in detail below with reference to fig. 3A to 3C.

Fig. 3A to 3C are schematic diagrams of an avatar generation method according to one embodiment of the present disclosure.

As shown in fig. 3A, the target style image 301 may be a three-dimensional image. For example, the target style image 301 may include a plurality of target style bases, each of which may correspond to a facial feature of the first object (such as the eyes or the mouth). The style semantic information may indicate the organ to which a target style base corresponds. For example, the plurality of target style bases may include: a target style base Base_style_eye corresponding to the eyes of the first object, a target style base Base_style_mouth corresponding to the mouth of the first object, and so on.

As shown in fig. 3B, the preset image 302 may be a preset image in the parametric model. The preset image 302 is converted according to a conversion matrix between the first coordinate system of the target style image 301 and the second coordinate system of the preset image 302, so as to obtain a first registration image 303. The coordinate system of the first registration image 303 may coincide with the coordinate system of the target-style image 301. In one example, the transformation matrix may be an affine transformation matrix.

For example, the preset image 302 may include a plurality of initial bases. The first registration image 303 obtained by converting the preset image 302 may accordingly include a plurality of preset bases, each corresponding to one initial base. For another example, the preset bases may correspond to the facial features of the second object, and the preset semantic information may indicate the organ to which a preset base corresponds. The plurality of preset bases may include: a preset base Base_pre_eye corresponding to the eyes of the second object, a preset base Base_pre_mouth corresponding to the mouth of the second object, and the like.

Next, the plurality of preset bases of the first registration image 303 may be aligned with the plurality of target style bases of the target style image 301. For example, it may be determined, according to the style semantic information of the target style base Base_style_eye and the preset semantic information of the preset base Base_pre_eye, that Base_style_eye corresponds to Base_pre_eye. The position of Base_pre_eye in the first registration image 303 is then adjusted according to the position of Base_style_eye in the target style image 301, and the size of Base_pre_eye is adjusted according to the size of Base_style_eye, so that the positions of the eye corners in the adjusted first registration image 303 are consistent with those in the target style image 301. In one example, the sizes and point normals of the triangular patches corresponding to Base_pre_eye may be adjusted to adjust its position and size.

After the adjustment of the plurality of preset bases is completed, a second registration image 304 may be obtained.
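The semantic matching and position/size adjustment can be sketched minimally, assuming each base is a list of vertices and that the semantic label can be read from the base name (a hypothetical convention under which `Base_pre_eye` and `Base_style_eye` both carry the label `eye`). Position is matched via the vertex centroid and size via the per-axis bounding-box extent. This moves vertices only; it does not model the triangle-patch and point-normal adjustment mentioned above.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def semantic_label(base_name: str) -> str:
    """Hypothetical naming convention: the text after the last underscore is the organ label."""
    return base_name.rsplit("_", 1)[-1]


def centroid(pts: Sequence[Vec3]) -> Vec3:
    return tuple(sum(p[k] for p in pts) / len(pts) for k in range(3))


def extent(pts: Sequence[Vec3]) -> Vec3:
    return tuple(max(p[k] for p in pts) - min(p[k] for p in pts) for k in range(3))


def align_bases(preset: Dict[str, List[Vec3]],
                style: Dict[str, List[Vec3]]) -> Dict[str, List[Vec3]]:
    """Move and rescale each preset base to match the style base with the same semantics."""
    style_by_label = {semantic_label(name): pts for name, pts in style.items()}
    aligned = {}
    for name, pts in preset.items():
        target = style_by_label.get(semantic_label(name))
        if target is None:  # no semantic counterpart: keep the preset base unchanged
            aligned[name] = list(pts)
            continue
        c_pre, c_sty = centroid(pts), centroid(target)
        e_pre, e_sty = extent(pts), extent(target)
        # Per-axis size ratio; flat (zero-extent) axes keep their original scale.
        scale = [e_sty[k] / e_pre[k] if e_pre[k] > 0 else 1.0 for k in range(3)]
        aligned[name] = [
            tuple(c_sty[k] + (p[k] - c_pre[k]) * scale[k] for k in range(3))
            for p in pts
        ]
    return aligned
```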

From the second registration image 304 and the preset image 302, a mapping relationship between the two can be determined as the first mapping information in various ways.

In one example, the second registration image 304 is converted into the second coordinate system of the preset image 302 using the conversion matrix (or the inverse of the conversion matrix). In the second coordinate system, an initial mapping relationship between the converted second registration image 304 and the preset image 302 is determined. The first mapping information can then be obtained according to the initial mapping relationship and the conversion matrix. With this embodiment, the second registration image carries the style information of the target style image; when other images are subsequently processed by the parameterized model according to the first mapping information, the processed images also carry the style information of the target style image.
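This step can be sketched minimally under several assumptions not stated in the disclosure. The conversion is taken to be affine (`p_first = m @ p_second + t`, so its inverse is `p_second = inv(m) @ (p_first - t)`). The second registration image is assumed to keep the vertex ordering of the preset image. The initial mapping relationship is represented as per-vertex offsets in the second coordinate system. Under these assumptions the first mapping information reduces to those offsets, and applying it to another three-dimensional image of the same topology means adding them. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def inverse3(m: Mat3) -> List[List[float]]:
    """Inverse of a 3x3 matrix via the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("conversion matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def to_second_coords(p: Vec3, m_inv: Mat3, t: Vec3) -> Vec3:
    """Undo p_first = m @ p_second + t."""
    q = [p[k] - t[k] for k in range(3)]
    return tuple(sum(m_inv[r][k] * q[k] for k in range(3)) for r in range(3))


def first_mapping_info(second_registration: Sequence[Vec3], preset: Sequence[Vec3],
                       m: Mat3, t: Vec3) -> List[Vec3]:
    """Per-vertex offsets from the preset image to the back-converted second registration image."""
    m_inv = inverse3(m)
    back = [to_second_coords(p, m_inv, t) for p in second_registration]
    return [tuple(bv[k] - sv[k] for k in range(3)) for bv, sv in zip(back, preset)]


def apply_mapping(vertices: Sequence[Vec3], offsets: Sequence[Vec3]) -> List[Vec3]:
    """Apply the mapping to another 3D image that shares the preset topology."""
    return [tuple(v[k] + o[k] for k in range(3)) for v, o in zip(vertices, offsets)]
```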

As shown in fig. 3C, a first three-dimensional image 306 may be obtained by processing the object image 305 with a parameterized model. By processing the first three-dimensional image 306 with the first mapping information, an avatar 307 may be obtained. It is to be understood that the avatar 307 may be a three-dimensional image or a three-dimensional mesh model.

It is to be appreciated that, in the embodiments described above, the avatar may be generated by processing the first three-dimensional image with the first mapping information. In embodiments of the present disclosure, the first three-dimensional image may be further processed to obtain an avatar with a higher similarity to the target object, as described in detail below.

In some embodiments, obtaining the target avatar of the target object in the target image according to the target image and the first mapping information comprises: determining a first three-dimensional image of the target object according to at least one target feature point of the target object; adjusting the first three-dimensional image to make a first difference between the first three-dimensional image and the target image converge to obtain a second three-dimensional image; and processing the second three-dimensional image using the first mapping information to generate the target avatar.

For example, the first difference may be a projection error, and the first three-dimensional image may be adjusted to minimize it. It will be appreciated that the target image may be, for example, a two-dimensional image. The target image is processed by the parameterized model according to at least one target feature point of the target object to obtain the first three-dimensional image. Adjusting the first three-dimensional image to minimize the projection error makes it more similar to the target image, in particular making the facial features and facial contours of the object in the two images more similar.
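A full fit would also adjust pose and the coefficients of the parameterized model. The sketch below is much reduced. It assumes an orthographic projection that simply drops the z coordinate and adjusts only a uniform scale `s` and a 2D translation `(tx, ty)`. Under those assumptions, the projection error between the model's 3D feature points and the 2D target feature points has a closed-form least-squares minimizer: `t = mean(q) - s * mean(P)` and `s = sum(Pc . qc) / sum(|Pc|^2)` over centred points. The function names and the reduced parameter set are assumptions.

```python
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]


def fit_scale_translation(model_pts: Sequence[Vec3], image_pts: Sequence[Vec2]):
    """Closed-form least squares for s, (tx, ty) minimising sum ||s * proj(P_i) + t - q_i||^2."""
    n = len(model_pts)
    proj = [(p[0], p[1]) for p in model_pts]  # orthographic projection: drop z
    mp = (sum(p[0] for p in proj) / n, sum(p[1] for p in proj) / n)
    mq = (sum(q[0] for q in image_pts) / n, sum(q[1] for q in image_pts) / n)
    num = sum((p[0] - mp[0]) * (q[0] - mq[0]) + (p[1] - mp[1]) * (q[1] - mq[1])
              for p, q in zip(proj, image_pts))
    den = sum((p[0] - mp[0]) ** 2 + (p[1] - mp[1]) ** 2 for p in proj)
    if den == 0:
        raise ValueError("model feature points must not all project to one point")
    s = num / den
    return s, (mq[0] - s * mp[0], mq[1] - s * mp[1])


def projection_error(model_pts: Sequence[Vec3], image_pts: Sequence[Vec2], s: float, t: Vec2) -> float:
    """Mean squared distance between projected model points and target feature points."""
    return sum((s * p[0] + t[0] - q[0]) ** 2 + (s * p[1] + t[1] - q[1]) ** 2
               for p, q in zip(model_pts, image_pts)) / len(model_pts)


def adjust_first_image(vertices: Sequence[Vec3], s: float, t: Vec2) -> List[Vec3]:
    """Apply the fitted scale and shift to all vertices (z is scaled too, keeping proportions)."""
    return [(s * v[0] + t[0], s * v[1] + t[1], s * v[2]) for v in vertices]
```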

For another example, by processing the second three-dimensional image using the first mapping information, an avatar having a higher similarity to the target object can be obtained.

It will be appreciated that after an avatar is derived from the first mapping information and the second three-dimensional image, the avatar may be further processed to obtain a more realistic target avatar, as described in detail below.

In some embodiments, processing the second three-dimensional image using the first mapping information to generate the target avatar comprises: processing the second three-dimensional image by using the first mapping information to obtain an initial virtual image; and weighting the plurality of regions of the initial avatar by using the plurality of preset weights respectively to generate the target avatar.

For example, after the second three-dimensional image is processed by the parameterized model according to the first mapping information, the resulting avatar may be used as the initial avatar. It will be appreciated that when the parameterized model is a Blendshape model, the facial contour of the initial avatar may appear distorted. For example, the chin of the initial avatar may be sharp and not smooth enough.

In this case, the facial-feature region of the initial avatar may be weighted with a first preset weight (e.g., 0.8), and the facial contour region of the initial avatar may be weighted with a second preset weight (e.g., 0.2). The weighted initial avatar may be used as the target avatar. By weighting the avatar in this way, a more realistic target avatar can be obtained.
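The disclosure does not state what the weighted regions are blended with. One plausible reading, used here purely as an assumption, is a per-region linear blend between the initial avatar and the second three-dimensional image (the realistic reconstruction). With weight 0.8, the facial-feature region keeps most of the stylized shape. With weight 0.2, the facial contour is pulled mostly back towards the smoother realistic contour. The region labels and function name are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Example weights from the text; the region names are hypothetical labels.
EXAMPLE_WEIGHTS = {"facial_features": 0.8, "face_contour": 0.2}


def weight_regions(initial_avatar: Sequence[Vec3], reference: Sequence[Vec3],
                   vertex_regions: Sequence[str], region_weights: Dict[str, float]) -> List[Vec3]:
    """Per vertex: w * avatar + (1 - w) * reference, with w chosen by the vertex's region."""
    out = []
    for a, r, region in zip(initial_avatar, reference, vertex_regions):
        w = region_weights.get(region, 1.0)  # unlisted regions keep the avatar unchanged
        out.append(tuple(w * a[k] + (1.0 - w) * r[k] for k in range(3)))
    return out
```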

Some embodiments of generating the target avatar have been described in detail above. After the target avatar is obtained, it may be driven so that it exhibits different expressions. This is described in detail below with reference to fig. 4.

Fig. 4 is a flowchart of an avatar generation method according to another embodiment of the present disclosure.

As shown in fig. 4, the method 400 may include operations S450 to S470. It is understood that operation S450 may be performed after operation S240 described above.

In operation S450, driving information for a target avatar is acquired.

In the embodiment of the present disclosure, the first mapping information is associated with a plurality of target feature points.

For example, as described above, the first mapping information is derived from the transformation information and the second registration image. The second object in the preset image may have a plurality of initial feature points. The plurality of initial feature points may include, for example: initial feature points associated with the eyes of the second object, initial feature points associated with the mouth of the second object, and so on. In one example, there may be six initial feature points associated with the left eye of the second object.

For another example, the target object in the target image may have a plurality of target feature points. The plurality of target feature points may include, for example: target feature points associated with the eyes of the target object, target feature points associated with the mouth of the target object, and so on. In one example, there may be six target feature points associated with the left eye of the target object. Based on the semantic information of the target feature points, each target feature point can be associated with the corresponding initial feature point, and thereby with the first mapping information.
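The semantic association can be sketched minimally, assuming every feature point carries a semantic label such as `("left_eye", 0)` (organ plus index within the organ) and that the preset and target landmark sets use the same labelling scheme. The label format and function name are hypothetical.

```python
from typing import Dict, Tuple

Label = Tuple[str, int]  # (organ, index within the organ), a hypothetical labelling scheme


def associate_feature_points(initial_ids: Dict[Label, int],
                             target_ids: Dict[Label, int]) -> Dict[int, int]:
    """Map each target feature point id to the initial feature point id with the same label."""
    return {
        target_ids[label]: initial_ids[label]
        for label in target_ids
        if label in initial_ids  # target points without a semantic counterpart are skipped
    }
```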

For example, the driving information is associated with at least one target feature point of the plurality of target feature points. In one example, a piece of driving information is used to drive the target avatar so that the avatar exhibits the expression "smile". The driving information may include first sub-driving information and second sub-driving information, which may be related to the target feature points associated with the mouth of the target object and the target feature points associated with the eyes of the target object, respectively.

In operation S460, the first mapping information is updated with the driving information, resulting in second mapping information.

For example, the first mapping information may include first sub-mapping information, second sub-mapping information, and the like, which may be related to the target feature points associated with the mouth of the target object and the target feature points associated with the eyes of the target object, respectively. For example, various operations may be performed on the first sub-mapping information and the first sub-driving information to obtain first updated sub-mapping information, and likewise on the second sub-mapping information and the second sub-driving information to obtain second updated sub-mapping information. The second mapping information is then obtained from the first updated sub-mapping information and the second updated sub-mapping information.
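The disclosure leaves the combining operation open ("various operations"). As one simple assumption, each piece of sub-mapping information can be treated as a set of per-feature-point offsets and each piece of sub-driving information as an extra displacement for the same feature points. The update is then an element-wise addition per sub-item. The region keys (`mouth`, `eyes`) and all names are hypothetical.

```python
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]
SubMap = Dict[int, Vec3]  # feature point id -> offset


def update_sub_mapping(sub_map: SubMap, sub_drive: SubMap) -> SubMap:
    """One possible operation: add the driving displacement to the existing offset."""
    updated = dict(sub_map)  # copy, so the first mapping information is left untouched
    for pid, d in sub_drive.items():
        o = updated.get(pid, (0.0, 0.0, 0.0))
        updated[pid] = (o[0] + d[0], o[1] + d[1], o[2] + d[2])
    return updated


def second_mapping_info(first_mapping: Dict[str, SubMap],
                        driving: Dict[str, SubMap]) -> Dict[str, SubMap]:
    """Update every driven sub-mapping; undriven sub-mappings are copied unchanged."""
    return {
        key: update_sub_mapping(sub, driving.get(key, {}))
        for key, sub in first_mapping.items()
    }
```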

In operation S470, an updated avatar of the target object is generated according to the target image and the second mapping information.

For example, the first three-dimensional image is processed with the second mapping information to generate an updated avatar such that the updated avatar may exhibit the expression "smile".

Fig. 5 is a block diagram of an avatar generation apparatus according to one embodiment of the present disclosure.

As shown in fig. 5, the apparatus 500 may include a conversion module 510, an alignment module 520, an obtaining module 530, and a first generation module 540.

The conversion module 510 is configured to convert the preset image according to conversion information between the first coordinate system of the target style image and the second coordinate system of the preset image, to obtain a first registration image.

The alignment module 520 is configured to align the plurality of preset bases of the first registration image with the plurality of target style bases of the target style image, to obtain a second registration image.

The obtaining module 530 is configured to obtain first mapping information according to the conversion information and the second registration image.

The first generation module 540 is configured to generate a target avatar of the target object in the target image according to the target image and the first mapping information.
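The module structure of the apparatus 500 can be mirrored in software as a pipeline of injected stages. The sketch below is an illustrative assumption: the class, the attribute names and the use of plain callables are not from the disclosure. It only shows how the outputs of modules 510 to 540 feed each other.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class AvatarGenerationApparatus:
    convert: Callable[[Any, Any], Any]   # conversion module 510: (preset image, conversion info)
    align: Callable[[Any, Any], Any]     # alignment module 520: (first registration, style image)
    obtain: Callable[[Any, Any], Any]    # obtaining module 530: (conversion info, second registration)
    generate: Callable[[Any, Any], Any]  # first generation module 540: (target image, first mapping)

    def run(self, preset_image: Any, style_image: Any, conversion_info: Any, target_image: Any) -> Any:
        first_registration = self.convert(preset_image, conversion_info)
        second_registration = self.align(first_registration, style_image)
        first_mapping = self.obtain(conversion_info, second_registration)
        return self.generate(target_image, first_mapping)
```

Injecting the stages keeps each module independently replaceable, for example swapping the alignment stage for a different registration algorithm.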

In some embodiments, the first generation module includes: a first determining submodule configured to determine a first three-dimensional image of the target object according to at least one target feature point of the target object; a first adjusting submodule configured to adjust the first three-dimensional image so that a first difference between the first three-dimensional image and the target image converges, to obtain a second three-dimensional image; and a processing submodule configured to process the second three-dimensional image using the first mapping information to generate the target avatar.

In some embodiments, the processing submodule includes: a processing unit configured to process the second three-dimensional image using the first mapping information to obtain an initial avatar; and a weighting unit configured to weight a plurality of regions of the initial avatar with a plurality of preset weights, respectively, to generate the target avatar.

In some embodiments, the alignment module includes: a second determining submodule configured to determine the target style base corresponding to a preset base according to the style semantic information of the target style base and the preset semantic information of the preset base; and a second adjusting submodule configured to adjust the position and size of the preset base corresponding to the target style base in the first registration image according to the position and size of the target style base in the target style image, to obtain the second registration image.

In some embodiments, the first mapping information is associated with a plurality of target feature points, and the apparatus 500 further includes: an acquisition module configured to acquire driving information for the target avatar, wherein the driving information is related to at least one of the plurality of target feature points; an updating module configured to update the first mapping information using the driving information to obtain second mapping information; and a second generation module configured to generate an updated avatar of the target object according to the target image and the second mapping information.

In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other processing of the personal information of the users involved all comply with relevant laws and regulations and do not violate public order and good morals.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in FIG. 6, the device 600 comprises a computing unit 601, which may perform various suitable actions and processes according to a computer program stored in a Read-Only Memory (ROM) 602 or loaded from a storage unit 608 into a Random Access Memory (RAM) 603. The RAM 603 can also store various programs and data necessary for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The computing unit 601 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the methods and processes described above, such as the avatar generation method. For example, in some embodiments, the avatar generation method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the avatar generation method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the avatar generation method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims

1. An avatar generation method, comprising:

converting a preset image according to conversion information between a first coordinate system of a target style image and a second coordinate system of the preset image, to obtain a first registration image;

aligning a plurality of preset bases of the first registration image with a plurality of target style bases of the target style image to obtain a second registration image;

obtaining first mapping information according to the conversion information and the second registration image; and

generating a target avatar of a target object in a target image according to the target image and the first mapping information.

2. The method of claim 1, wherein generating a target avatar of a target object in the target image according to the target image and the first mapping information comprises:

determining a first three-dimensional image of the target object according to at least one target feature point of the target object;

adjusting the first three-dimensional image to make a first difference between the first three-dimensional image and the target image converge to obtain a second three-dimensional image; and

processing the second three-dimensional image using the first mapping information to generate the target avatar.

3. The method of claim 2, wherein said processing the second three-dimensional image with the first mapping information to generate the target avatar comprises:

processing the second three-dimensional image using the first mapping information to obtain an initial avatar; and

weighting a plurality of regions of the initial avatar with a plurality of preset weights, respectively, to generate the target avatar.

4. The method of claim 1, wherein the aligning a plurality of preset bases of the first registration image with a plurality of target style bases of the target style image to obtain a second registration image comprises:

determining, according to style semantic information of the target style bases and preset semantic information of the preset bases, the target style base corresponding to each preset base; and

adjusting, according to the position and size of the target style base in the target style image, the position and size of the corresponding preset base in the first registration image, to obtain the second registration image.

5. The method of claim 1, wherein the first mapping information relates to a plurality of target feature points, and the method further comprises:

acquiring driving information for the target avatar, wherein the driving information is related to at least one target feature point of the plurality of target feature points;

updating the first mapping information by using the driving information to obtain second mapping information;

generating an updated avatar of the target object according to the target image and the second mapping information.

6. An avatar generation apparatus comprising:

a conversion module, configured to convert a preset image according to conversion information between a first coordinate system of a target style image and a second coordinate system of the preset image, to obtain a first registration image;

an alignment module, configured to align a plurality of preset bases of the first registration image with a plurality of target style bases of the target style image to obtain a second registration image;

an obtaining module, configured to obtain first mapping information according to the conversion information and the second registration image; and

a first generating module, configured to generate a target avatar of a target object in a target image according to the target image and the first mapping information.

7. The apparatus of claim 6, wherein the first generating module comprises:

a first determining submodule, configured to determine a first three-dimensional image of the target object according to at least one target feature point of the target object;

a first adjusting submodule, configured to adjust the first three-dimensional image so that a first difference between the first three-dimensional image and the target image converges to obtain a second three-dimensional image; and

a processing submodule, configured to process the second three-dimensional image using the first mapping information to generate the target avatar.

8. The apparatus of claim 7, wherein the processing submodule comprises:

a processing unit, configured to process the second three-dimensional image using the first mapping information to obtain an initial avatar; and

a weighting unit, configured to weight a plurality of regions of the initial avatar with a plurality of preset weights, respectively, to generate the target avatar.

9. The apparatus of claim 6, wherein the alignment module comprises:

a second determining submodule, configured to determine, according to style semantic information of the target style bases and preset semantic information of the preset bases, the target style base corresponding to each preset base; and

a second adjusting submodule, configured to adjust, according to the position and size of the target style base in the target style image, the position and size of the corresponding preset base in the first registration image, to obtain the second registration image.

10. The apparatus of claim 6, wherein the first mapping information relates to a plurality of target feature points, and the apparatus further comprises:

an acquisition module, configured to acquire driving information for the target avatar, wherein the driving information is related to at least one of the plurality of target feature points;

an updating module, configured to update the first mapping information using the driving information to obtain second mapping information; and

a second generating module, configured to generate an updated avatar of the target object according to the target image and the second mapping information.

11. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 5.

12. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1 to 5.

13. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 5.
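Claims 2 and 3 recite two numerical steps: adjusting a three-dimensional image until its difference from the target image converges, and weighting regions of the initial avatar with preset weights. A minimal sketch of both follows, under assumptions that are not taken from the disclosure: the "three-dimensional image" is reduced to a list of 3D feature points, the "first difference" is taken to be the mean squared distance between their orthographic projection and 2D target landmarks, only a scale and a 2D translation are fitted by plain gradient descent, and region weighting is modeled as blending each vertex between a hypothetical neutral template and the initial avatar. The step that applies the first mapping information is omitted, and all names (`fit`, `difference`, `weight_regions`, region labels) are hypothetical.

```python
from typing import Dict, List, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def project(points: List[Point3], s: float, tx: float, ty: float) -> List[Point2]:
    """Orthographic projection of scaled, translated 3D points (z is dropped)."""
    return [(s * x + tx, s * y + ty) for x, y, _ in points]


def difference(points: List[Point3], targets: List[Point2],
               s: float, tx: float, ty: float) -> float:
    """Assumed 'first difference': mean squared 2D distance to target landmarks."""
    proj = project(points, s, tx, ty)
    return sum((u - a) ** 2 + (v - b) ** 2
               for (u, v), (a, b) in zip(proj, targets)) / len(targets)


def fit(points: List[Point3], targets: List[Point2], lr: float = 0.1,
        tol: float = 1e-12, max_iter: int = 10000) -> Tuple[float, float, float]:
    """Gradient descent on (s, tx, ty) until the difference stops changing."""
    s, tx, ty = 1.0, 0.0, 0.0
    n = len(targets)
    prev = difference(points, targets, s, tx, ty)
    for _ in range(max_iter):
        gs = gtx = gty = 0.0
        for (x, y, _), (a, b) in zip(points, targets):
            ru, rv = s * x + tx - a, s * y + ty - b  # projection residuals
            gs += 2.0 * (ru * x + rv * y) / n
            gtx += 2.0 * ru / n
            gty += 2.0 * rv / n
        s, tx, ty = s - lr * gs, tx - lr * gtx, ty - lr * gty
        cur = difference(points, targets, s, tx, ty)
        if abs(prev - cur) < tol:  # a negligible change is treated as convergence
            break
        prev = cur
    return s, tx, ty


def weight_regions(initial: List[Point3], template: List[Point3],
                   regions: List[str], weights: Dict[str, float]) -> List[Point3]:
    """Blend each vertex toward the initial avatar by its region's preset weight.

    A weight of 1.0 keeps the initial avatar; 0.0 falls back to the template.
    Regions without a preset weight default to 1.0.
    """
    out = []
    for v, t, region in zip(initial, template, regions):
        w = weights.get(region, 1.0)
        out.append(tuple(tc + w * (vc - tc) for vc, tc in zip(v, t)))
    return out
```

With the four corners of a unit square as points and targets generated by scaling by 2 and shifting by (3, -1), `fit` recovers approximately `s = 2`, `tx = 3`, `ty = -1`. The default learning rate assumes normalized coordinates; pixel-scale coordinates would need a smaller step or prior normalization.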