CN110766777B - Method and device for generating virtual image, electronic equipment and storage medium

Info

Publication number: CN110766777B (application CN201911053622.XA)
Authority: CN (China)
Prior art keywords: target object, expression, avatar, target, model
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN110766777A
Inventor: 蒋颂晟
Assignee: Beijing ByteDance Network Technology Co Ltd
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201911053622.XA; publication of CN110766777A; application granted; publication of CN110766777B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00: Animation
    • G06T 13/20: 3D [Three Dimensional] animation
    • G06T 13/40: 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/70: Multimodal biometrics, e.g. combining information from different biometric modalities
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

Embodiments of the present disclosure provide a method, an apparatus, an electronic device, and a storage medium for generating an avatar. The method includes: performing feature recognition on a frame image including a target object to obtain skeletal features and expression features of the target object; acquiring a base avatar model; adjusting the skeletal features of the base avatar model based on the skeletal features of the target object to obtain an avatar model matching the skeletal features of the target object; and adjusting the expression features of the avatar model based on the expression features of the target object to obtain a target avatar model matching the expression features of the target object, the target avatar model being used for rendering the avatar of the target object. With the method and apparatus, creation of a personalized avatar can be realized.

Description

Method and device for generating virtual image, electronic equipment and storage medium
Technical Field
Embodiments of the present disclosure relate to video processing technologies, and in particular to a method and an apparatus for generating an avatar, an electronic device, and a storage medium.
Background
With the rapid development of the internet industry, artificial intelligence has brought ever more applications of the virtual world, and the construction of avatars is involved in scenarios ranging from animation to live streaming to short-video production. In the related art, a generic template is mostly used to provide an "avatar" for users; such template avatars look alike, lack individuation, and present a poor effect.
Disclosure of Invention
In view of this, the embodiments of the present disclosure provide a method, an apparatus, an electronic device, and a storage medium for generating an avatar.
In a first aspect, an embodiment of the present disclosure provides a method for generating an avatar, including:
performing feature recognition on a frame image comprising a target object to obtain skeleton features and expression features of the target object;
acquiring a basic virtual image model;
based on the bone characteristics of the target object, adjusting the bone characteristics of the basic avatar model to obtain an avatar model matched with the bone characteristics of the target object;
based on the expression characteristics of the target object, adjusting the expression characteristics of the virtual image model to obtain a target virtual image model matched with the expression characteristics of the target object; the target avatar model is used for rendering to obtain an avatar of the target object.
In the above scheme, the feature recognition is performed on the frame image including the target object to obtain the skeletal feature and the expression feature of the target object, including:
identifying different parts of the head of the target object contained in the frame image so as to determine image areas corresponding to the parts of the head of the target object;
and carrying out feature extraction on images corresponding to all parts of the head of the target object based on the determined image areas to obtain skeleton features and expression features of the target object.
In the above aspect, the adjusting the bone feature of the base avatar model based on the bone feature of the target object includes:
acquiring skeletal features of the base avatar model;
determining bone transformation information corresponding to the avatar model relative to the base avatar model based on bone characteristics of the target object and bone characteristics of the base avatar model;
and adjusting vertex information of each part in the basic avatar model based on the skeleton transformation information.
In the above aspect, the adjusting vertex information of each part in the base avatar model based on the bone transformation information includes:
determining a corresponding bone scaling factor and a corresponding bone displacement based on the bone transformation information;
and adjusting the positions of all vertexes in the basic avatar model based on the bone scaling coefficient and the bone displacement to obtain the avatar model.
In the above aspect, the adjusting the expression feature of the avatar model based on the expression feature of the target object includes:
determining expression parameters corresponding to target parts in all parts of the head of the virtual image model based on the expression characteristics of the target object, wherein the expression parameters are used for indicating the expression states of the target parts;
acquiring first expression data of the basic avatar model and second expression data of the avatar model;
and adjusting the expression characteristics of the virtual image model based on the expression parameters, the first expression data and the second expression data.
In the above aspect, the adjusting the expression feature of the avatar model based on the expression parameter, the first expression data and the second expression data includes:
interpolation is carried out on the first expression data and the second expression data based on the expression parameters, and an interpolation result is obtained;
and adjusting each vertex in the avatar model based on the interpolation result to obtain the target avatar model.
In the above scheme, the interpolating the first expression data and the second expression data based on the expression parameter to obtain an interpolation result includes:
based on the expression parameters, interpolating the first expression data and the second expression data by adopting the following formula to obtain an interpolation result:
Z=X*(1-a)+Y*a
wherein Z is the expression data of the target virtual image model, X is the first expression data, Y is the second expression data, and a is the expression parameter.
In the above scheme, the method further comprises:
performing key point identification on a plurality of continuous frame images comprising the target object;
acquiring key point change information of the target object in the multi-frame continuous frame images;
and generating a form update instruction of the avatar based on the key point change information so as to dynamically present the avatar.
In the above scheme, the method further comprises:
receiving a modification request for a target portion of the avatar, the modification request carrying an image of the target object including the target portion;
and in response to the modification request, updating a target avatar model of the target object based on the image of the target object including the target portion, so as to update the avatar of the target object based on the updated target avatar model.
In a second aspect, an embodiment of the present disclosure provides an avatar generating apparatus including:
the first recognition module is used for carrying out feature recognition on a frame image comprising a target object to obtain skeleton features and expression features of the target object;
the acquisition module is used for acquiring a basic virtual image model;
the first adjusting module is used for adjusting the skeleton characteristics of the basic avatar model based on the skeleton characteristics of the target object to obtain an avatar model matched with the skeleton characteristics of the target object;
the second adjusting module is used for adjusting the expression characteristics of the virtual image model based on the expression characteristics of the target object to obtain a target virtual image model matched with the expression characteristics of the target object; the target avatar model is used for rendering to obtain an avatar of the target object.
In the above solution, the first identifying module is further configured to identify different parts of the head of the target object included in the frame image, so as to determine an image area corresponding to each part of the head of the target object;
and carrying out feature extraction on images corresponding to all parts of the head of the target object based on the determined image areas to obtain skeleton features and expression features of the target object.
In the above aspect, the first adjustment module is further configured to obtain a bone feature of the basic avatar model;
determining bone transformation information corresponding to the avatar model relative to the base avatar model based on bone characteristics of the target object and bone characteristics of the base avatar model;
and adjusting vertex information of each part in the basic avatar model based on the skeleton transformation information.
In the above aspect, the first adjustment module is further configured to determine a corresponding bone scaling factor and a corresponding bone displacement based on the bone transformation information;
and adjusting the positions of all vertexes in the basic avatar model based on the bone scaling coefficient and the bone displacement to obtain the avatar model.
In the above scheme, the second adjustment module is further configured to determine, based on expression characteristics of the target object, expression parameters corresponding to a target portion in each portion of the head of the avatar model, where the expression parameters are used to indicate an expression state of the target portion;
acquiring first expression data of the basic avatar model and second expression data of the avatar model;
and adjusting the expression characteristics of the virtual image model based on the expression parameters, the first expression data and the second expression data.
In the above scheme, the second adjustment module is further configured to interpolate the first expression data and the second expression data based on the expression parameter, so as to obtain an interpolation result;
and adjusting each vertex in the avatar model based on the interpolation result to obtain the target avatar model.
In the above scheme, the second adjustment module is further configured to interpolate the first expression data and the second expression data by using the following formula based on the expression parameter, to obtain an interpolation result:
Z=X*(1-a)+Y*a
wherein Z is the expression data of the target virtual image model, X is the first expression data, Y is the second expression data, and a is the expression parameter.
In the above scheme, the device further includes:
the second recognition module is used for carrying out key point recognition on a plurality of continuous frame images comprising the target object;
acquiring key point change information of the target object in the multi-frame continuous frame images;
and generating a form update instruction of the avatar based on the key point change information so as to dynamically present the avatar.
In the above scheme, the device further includes:
a modification module for receiving a modification request for a target portion of the avatar, the modification request carrying an image of the target object including the target portion;
in response to the modification request, a target avatar model of the target object is updated based on an image of the target object including the target portion to update an avatar of the target object based on the updated target avatar model.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the method for generating the virtual image provided by the embodiment of the disclosure when executing the executable instructions.
In a fourth aspect, an embodiment of the present disclosure provides a storage medium storing executable instructions which, when executed, implement the avatar generation method provided by the embodiments of the present disclosure.
The application of the embodiment of the present disclosure has the following beneficial effects:
By applying the embodiments of the present disclosure, the skeletal features and expression features of the target object are obtained by recognizing a frame image of the target object; the skeletal features of the base avatar model are adjusted based on the skeletal features, and the expression features of the resulting avatar model, which matches the skeletal features of the target object, are adjusted based on the expression features, yielding a target avatar model that matches the expression features of the target object and from which the avatar of the target object is rendered and generated. Since the target avatar model is obtained by adjusting the base avatar model according to both the skeletal features and the expression features of the target object, personalized creation of the avatar can be realized, and the expression of the avatar can be presented with a better effect.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
Fig. 1 is a schematic architecture diagram of an avatar generation system provided in an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure;
fig. 3 is a flowchart illustrating a method for generating an avatar according to an embodiment of the present disclosure;
fig. 4 is a schematic diagram of frame image acquisition of a target object according to an embodiment of the disclosure;
FIG. 5 is a schematic diagram I of an image scanning frame according to an embodiment of the disclosure;
fig. 6 is a schematic diagram II of an image scanning frame according to an embodiment of the disclosure;
fig. 7 is an interface schematic diagram of face key point detection provided in an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of bone feature adjustment of a base avatar model provided by an embodiment of the present disclosure;
FIG. 9 is a schematic diagram showing an avatar obtained by differently adjusting an avatar model according to an embodiment of the present disclosure;
fig. 10 is a schematic view of an avatar modification interface provided by an embodiment of the present disclosure;
fig. 11 is a second flowchart of a method for generating an avatar according to an embodiment of the present disclosure;
fig. 12 is a schematic structural view of an avatar generation apparatus provided in an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "one" and "a plurality" in this disclosure are intended to be illustrative rather than limiting; those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
Before explaining the embodiments of the present disclosure in further detail, terms and terminology involved in the embodiments of the present disclosure are explained, and the terms and terminology involved in the embodiments of the present disclosure are applicable to the following explanation.
1) Avatar: through intelligent recognition, the expressions, movements, demeanor, and speech of a user are converted in real time into those of a virtual character, so that the facial expressions, movements, demeanor, and voice intonation of the virtual character faithfully reproduce the user's.
2) "In response to": represents the condition or state on which a performed operation depends. When the condition or state on which it depends is satisfied, the operation or operations may be performed in real time or with a set delay; unless otherwise specified, there is no restriction on the order in which multiple such operations are performed.
Based on the above explanation of terms and terminology involved in the embodiments of the present disclosure, referring to fig. 1, fig. 1 is a schematic architecture diagram of an avatar generation system provided in the embodiments of the present disclosure, in order to support an exemplary application, a terminal 400 (including a terminal 400-1 and a terminal 400-2) is connected to a server 200 through a network 300, where the network 300 may be a wide area network or a local area network, or a combination of both, and data transmission is implemented using a wireless or wired link.
A terminal 400 (e.g., terminal 400-1) for acquiring a frame image containing a target object; based on the target avatar model, an avatar of the target object is rendered and presented.
The server 200 is configured to perform feature recognition on a frame image including a target object to obtain skeletal features and expression features of the target object; acquiring a basic virtual image model; based on the bone characteristics of the target object, adjusting the bone characteristics of the basic avatar model to obtain an avatar model matched with the bone characteristics of the target object; and adjusting the expression characteristics of the virtual image model based on the expression characteristics of the target object to obtain a target virtual image model matched with the expression characteristics of the target object.
Here, in practical applications, the terminal 400 may be various types of user terminals such as a smart phone, a tablet computer, a notebook computer, etc., and may also be a wearable computing device, a Personal Digital Assistant (PDA), a desktop computer, a cellular phone, a media player, a navigation device, a game console, a television, or a combination of any two or more of these data processing devices or other data processing devices; the server 200 may be a separately configured server supporting various services or may be configured as a server cluster.
Referring to fig. 2, fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure. The electronic device may be various terminals including mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDA, personal Digital Assistant), tablet computers (PAD), portable multimedia players (PMP, portable Media Player), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital Televisions (TVs), desktop computers, and the like. The electronic device shown in fig. 2 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 2, the electronic device may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 210 that may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 220 or a program loaded from a storage means 280 into a random access memory (RAM) 230. The RAM 230 also stores various programs and data required for the operation of the electronic device. The processing means 210, the ROM 220, and the RAM 230 are connected to each other by a bus 240. An input/output (I/O) interface 250 is also connected to the bus 240.
In general, the following devices may be connected to the I/O interface 250: input devices 260 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 270 including, for example, a liquid crystal display (LCD, liquid Crystal Display), a speaker, a vibrator, and the like; storage 280 including, for example, magnetic tape, hard disk, etc.; and a communication device 290. The communication means 290 may allow the electronic device to communicate with other devices wirelessly or by wire to exchange data.
In particular, according to embodiments of the present disclosure, the processes described by the provided flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 290, or installed from the storage means 280, or from the ROM 220. When the computer program is executed by the processing means 210, the functions in the avatar generation method of the embodiments of the present disclosure are performed.
It should be noted that, the computer readable medium described above in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM, erasable Programmable Read Only Memory), a flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In the disclosed embodiments, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the disclosed embodiments, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including electrical wiring, optical fiber cable, radio Frequency (RF), the like, or any suitable combination of the foregoing.
The computer readable medium may be contained in the electronic device; or may exist alone without being assembled into an electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the avatar generation method provided by the embodiments of the present disclosure.
Computer program code for carrying out operations in embodiments of the present disclosure may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the internet using an internet service provider).
The flowcharts and block diagrams provided by the embodiments of the present disclosure illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The name of the unit does not in any way constitute a limitation of the unit itself, for example the first acquisition unit may also be described as "unit acquiring at least two internet protocol addresses".
The functions described in the embodiments of the present disclosure may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), and the like.
In the context of the disclosed embodiments, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The following describes a method for generating an avatar provided by the embodiments of the present disclosure. Referring to fig. 3, fig. 3 is a flowchart of an avatar generation method provided in an embodiment of the present disclosure, and in some embodiments, the avatar generation method may be implemented by a server or a terminal, or implemented by the server and the terminal cooperatively, and taking the server implementation as an example, the avatar generation method provided in the embodiment of the present disclosure includes:
step 301: and the server performs feature recognition on the frame image comprising the target object to obtain skeleton features and expression features of the target object.
Here, a frame image including the target object is acquired by the terminal, and an acquisition request of an avatar corresponding to the target object is transmitted based on the frame image. The frame image of the target object is carried in the acquisition request and is used for requesting a target avatar model corresponding to the target object.
In some embodiments, a client, such as an instant messaging client, a microblog client, a short video client, etc., is provided on the terminal, and when a user needs to shoot relevant video of the avatar, a generating instruction of the avatar can be triggered by sliding, clicking, etc. on a view interface displayed on the terminal. The terminal responds to the generation instruction of the virtual image, acquires a frame image containing the target object, and further sends an acquisition request of the virtual image corresponding to the target object to the server based on the frame image.
In practical applications, the terminal presents to the user, through the view interface, a toolbar containing icons of various shooting props such as stickers, filters, and avatars, and the user can select the desired shooting prop by a click operation. When the terminal detects that the shooting prop icon selected by the user is the avatar icon, it receives an avatar generation instruction triggered by the user's click on that icon. For example, referring to fig. 4, fig. 4 is a schematic diagram of frame image acquisition of a target object provided in an embodiment of the present disclosure: the terminal presents a preview frame image including the target object through the view interface, together with a page containing the avatar icon. When the user clicks the avatar icon, the terminal presents the icon in a selected state, e.g., enclosed in a box; at this point the terminal receives the user-triggered avatar generation instruction and acquires the frame image of the target object based on it.
In some embodiments, the terminal may also present the image scan frame through the view interface when acquiring a frame image of the target object. The image scanning frame is set based on the target object, is matched with the outline of the target object, and can present corresponding prompt information to the user so as to prompt the user to adjust the shooting posture, the shooting angle, the shooting distance and the like when shooting.
For example, referring to fig. 5, fig. 5 is a schematic diagram one of an image scanning frame provided in an embodiment of the present disclosure, where when a terminal presents an image acquisition interface and detects a target object, the image scanning frame is presented, and a user is prompted by displaying the text "please put a face into the frame" that the face needs to be put into the image scanning frame when creating an avatar. If the terminal detects that the outline of the target object is not in the image scanning frame, the terminal may prompt the user to adjust the shooting pose, angle or distance by the words "please shoot the face", "please move the face into the frame", etc., referring to fig. 6, fig. 6 is a schematic diagram of the image scanning frame provided in the embodiment of the present disclosure, and the outline of the target object in fig. 6 is not matched with the image scanning frame.
The server receives the acquisition request containing the frame image of the target object, and identifies the frame image of the target object so as to acquire the skeleton characteristics and the expression characteristics of the target object.
In some embodiments, the skeletal features and the expression features of the target object in the frame image may be obtained as follows: identifying different parts of the head of the target object contained in the frame image to determine image areas corresponding to the parts of the head of the target object; and carrying out feature extraction on images corresponding to all parts of the head of the target object based on the determined image areas to obtain skeleton features and expression features of the target object.
Here, the head parts of the target object include at least one of: eyes, hair, ears, mouth, nose, eyebrows, beard, and face. The eyes may include eyes and glasses, and the hair may include hair and a hat.
In some embodiments, to determine the features of the head parts of the target object, the image areas corresponding to the head parts in the frame image must first be acquired. Specifically, the image area of each head part of the target object can be determined by means of face key point recognition. Here, a face key point refers to a point in the image that reflects local features of the target object (such as color, shape, and texture features) and is generally a set of multiple pixel points; for example, a face key point may be an eye key point, a mouth key point, or a nose key point.
In practical applications, face key point detection is performed on the frame image containing the target object to determine the key points included in each head part of the target object; based on the detected face key points, the face is aligned using a face alignment algorithm, and the region formed by the key points, i.e., the image region corresponding to each head part of the target object, is then determined. Referring to fig. 7, fig. 7 is an interface schematic diagram of face key point detection provided in an embodiment of the present disclosure, where dashed box 1 is the image area of the nose determined by the key points included in the nose, and dashed box 2 is the image area of the mouth determined by the key points included in the mouth.
Based on the determined image areas corresponding to the head parts of the target object, the acquired frame image is segmented into regions so that each resulting image corresponds to one of the head parts of the target object; features are then extracted from the image corresponding to each head part to obtain the skeletal features and expression features of the target object.
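For illustration, the following Python sketch shows how per-part image areas can be derived from detected face key points. The landmark-index grouping and the margin are assumptions made for the example; the embodiment does not prescribe a particular key point detector or index layout.

    import numpy as np

    # Hypothetical landmark-index groups; real layouts depend on the detector used.
    PART_LANDMARKS = {
        "nose": list(range(43, 52)),
        "mouth": list(range(84, 104)),
        "left_eye": list(range(52, 58)),
    }

    def part_regions(landmarks: np.ndarray, margin: int = 8) -> dict:
        """Map each head part to an (x0, y0, x1, y1) image area.

        landmarks: (N, 2) array of detected face key points in pixel coordinates.
        """
        regions = {}
        for part, idx in PART_LANDMARKS.items():
            pts = landmarks[idx]
            # Bounding box of the part's key points, padded by a small margin.
            x0, y0 = pts.min(axis=0) - margin
            x1, y1 = pts.max(axis=0) + margin
            regions[part] = (int(x0), int(y0), int(x1), int(y1))
        return regions

Each region can then be cropped from the frame image and fed to the feature extractor for the corresponding head part.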
Step 302: a base avatar model is acquired.
The base avatar model is made in advance by a designer; specifically, it is made based on the standard forms of the head parts of a person, such as a standard face shape, standard lips, and standard eyes, and serves as a general template for generating an avatar.
In practical applications, the creation of the personalized avatar is based on the basic avatar model, and thus, when generating the target avatar model corresponding to the target object, it is necessary to acquire the pre-made basic avatar model.
Step 303: the skeletal features of the base avatar model are adjusted based on the skeletal features of the target object to obtain an avatar model matching the skeletal features of the target object.
After the skeletal features of the target object are obtained, the skeletal features of the base avatar model may be adjusted based on them, i.e., a face-pinching process, to obtain an avatar model matching the skeletal features of the target object.
In some embodiments, the skeletal characteristics of the base avatar model may be adjusted by: acquiring skeleton characteristics of a basic virtual image model; determining bone transformation information of the corresponding avatar model relative to the base avatar model based on bone characteristics of the target object and bone characteristics of the base avatar model; and adjusting vertex information of each part in the basic avatar model based on the skeleton transformation information.
First, the skeletal features of the base avatar model, that is, the position information of the key points of each bone constituting the base avatar model, are acquired. According to the skeletal features of the target object and those of the base avatar model, for example according to the change in the positions of the bone key points, the bone transformation information of the avatar model relative to the base avatar model is determined, and the vertex information of each part in the base avatar model can then be adjusted according to the bone transformation information.
In some embodiments, the vertex information for each portion of the base avatar model may be adjusted by: determining a corresponding bone scaling factor, and a corresponding bone displacement, based on the bone transformation information; and adjusting the positions of all vertexes in the basic avatar model based on the bone scaling coefficient and the bone displacement to obtain the avatar model.
When adjusting the vertex information of each part in the base avatar model, the bone scaling coefficients and bone displacements corresponding to the bones of each part are determined according to the obtained bone transformation information, and the positions of the vertices in the base avatar model are then adjusted based on those bone scaling coefficients and bone displacements.
In practical applications, the bone transformation information is a transformation matrix, and each vertex of the base avatar model is likewise represented by a matrix. Adjusting the position of a vertex is therefore a spatial position transformation: multiplying the matrix corresponding to a model vertex by the transformation matrix (i.e., the bone transformation information) changes the spatial position of the vertex. In this way, the bones in the base avatar model can be scaled and displaced, changing the appearance characteristics of the head and achieving the face-pinching effect.
Illustratively, referring to fig. 8, fig. 8 is a bone feature adjustment schematic of a basic avatar model provided by an embodiment of the present disclosure. Here, the points shown in the figure are the vertices, the connecting lines are the bones to be constructed, and the bones in fig. 8 are the corresponding parts of the eyebrows. Based on the change of the vertex information, the scaling and displacement of bones are realized, so that the appearance of the basic virtual image model is changed, and the effect of pinching the face is achieved.
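As a minimal Python sketch of the vertex adjustment described above, the following code applies a bone scaling coefficient and a bone displacement to a set of vertices through a 4x4 transformation matrix. Skinning weights and per-bone pivot points, which a production face-pinching system would also handle, are omitted here for brevity; this is an illustration, not the embodiment's implementation.

    import numpy as np

    def adjust_vertices(vertices: np.ndarray,
                        scale: np.ndarray,
                        displacement: np.ndarray) -> np.ndarray:
        """Scale the vertices about the origin, then translate them.

        vertices: (N, 3) model-space vertex positions.
        scale: (3,) per-axis bone scaling coefficients.
        displacement: (3,) bone displacement.
        """
        # Build the 4x4 affine transform (the "bone transformation information").
        transform = np.diag([scale[0], scale[1], scale[2], 1.0])
        transform[:3, 3] = displacement
        # Promote the vertices to homogeneous coordinates and multiply through.
        homo = np.hstack([vertices, np.ones((len(vertices), 1))])
        return (homo @ transform.T)[:, :3]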
Step 304: based on the expression characteristics of the target object, adjusting the expression characteristics of the virtual image model to obtain a target virtual image model matched with the expression characteristics of the target object; the target avatar model is used for rendering to obtain an avatar of the target object.
After the expression characteristics of the target object are obtained, the expression characteristics of the obtained virtual image model can be adjusted based on the expression characteristics, so that the expression characteristics of the virtual image model are matched with the expression characteristics of the target object, and the virtual image of the target object can be obtained by rendering based on the adjusted target virtual image model.
In some embodiments, the expressive features of the avatar model may be adjusted by: based on the expression characteristics of the target object, determining expression parameters corresponding to the target part in each part of the head of the virtual image model, wherein the expression parameters are used for indicating the expression state of the target part; acquiring first expression data of a basic avatar model and second expression data of the avatar model; and adjusting the expression characteristics of the avatar model based on the expression parameters, the first expression data and the second expression data.
Based on the obtained expression features of the target part among the head parts of the target object, the expression parameter of the corresponding part of the avatar model's head, i.e., the BlendShape parameter, is determined. The expression parameter indicates the current expression state of the target part; for example, if the target part is an eye, a value in [0, 1] can indicate the eye state, with 0 representing the open-eye state and 1 the closed-eye state.
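As one hedged example of deriving such a parameter, eye openness can be mapped to a value in [0, 1] from detected eyelid key points. The aspect-ratio heuristic and its threshold below are illustrative assumptions, not part of the embodiment.

    import numpy as np

    def eye_expression_parameter(upper_lid: np.ndarray,
                                 lower_lid: np.ndarray,
                                 eye_width: float) -> float:
        """Return an expression parameter a in [0, 1] for the eye.

        0 denotes the open-eye state and 1 the closed-eye state,
        matching the convention described above.
        """
        openness = np.linalg.norm(upper_lid - lower_lid) / eye_width
        OPEN_RATIO = 0.35  # hypothetical aspect ratio of a fully open eye
        return float(np.clip(1.0 - openness / OPEN_RATIO, 0.0, 1.0))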
After the expression parameters corresponding to the target parts among the head parts of the avatar are determined, the expression features of the avatar model are adjusted based on the expression parameters. Directly adjusting the bone-adjusted avatar model according to the expression parameters alone leads to a poor expression presentation effect. Therefore, in the embodiments of the present disclosure, the expression features of the avatar model are adjusted by combining the base avatar model with the bone-adjusted avatar model. In practical applications, first expression data of the base avatar model and second expression data of the avatar model are acquired, where expression data is the position information of each vertex of the respective model; the expression features of the avatar model are then adjusted based on the first expression data, the second expression data, and the expression parameters.
For example, referring to fig. 9, fig. 9 is a comparative schematic diagram of avatars obtained by adjusting the avatar model in different ways according to an embodiment of the present disclosure. The left side shows the result of directly adjusting the bone-adjusted avatar model according to the expression parameters and rendering it: there is a visible problem in the eye region of the target object, with a poor eyelid fit. The right side shows the result of the approach provided by the embodiment of the present disclosure, in which the avatar model is adjusted according to the expression parameters by combining the expression data of the base avatar model and of the avatar model, i.e., the expression features of the face-pinched model are adjusted using both the pre-pinching and post-pinching data. The resulting avatar has a very close eyelid fit, and the presented expression is more natural.
In some embodiments, the expressive features of the avatar model may be adjusted by: interpolation is carried out on the first expression data and the second expression data based on the expression parameters, and an interpolation result is obtained; and adjusting each vertex in the avatar model based on the interpolation result to obtain the target avatar model.
In practical application, interpolation processing can be performed on the first expression data and the second expression data based on the expression parameters, and specifically, the following formula can be adopted to interpolate the first expression data and the second expression data to obtain an interpolation result:
Z=X*(1-a)+Y*a
wherein Z is the expression data of the target virtual image model, X is the first expression data, Y is the second expression data, and a is the expression parameter.
Based on the interpolation result, the positions of the vertexes in the avatar model are adjusted to obtain a target avatar model matched with the target object.
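A direct Python rendering of this interpolation, operating on per-vertex expression data, might look as follows. This is a sketch only; the actual data layout of the models is not specified by the embodiment.

    import numpy as np

    def blend_expression(first_data: np.ndarray,
                         second_data: np.ndarray,
                         a: float) -> np.ndarray:
        """Z = X * (1 - a) + Y * a, applied per vertex.

        first_data: expression data X of the base avatar model.
        second_data: expression data Y of the bone-adjusted avatar model.
        a: expression parameter in [0, 1].
        """
        a = float(np.clip(a, 0.0, 1.0))
        return first_data * (1.0 - a) + second_data * a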
Based on the target avatar model matching the target object and the preset materials of the head parts corresponding to the target avatar model, the avatar of the target object is obtained through rendering by a graphics processing unit (GPU) or the like, and is presented.
In some embodiments, after the avatar of the target object is obtained, the avatar may also be dynamically presented: performing key point identification on a plurality of continuous frame images including a target object; obtaining key point change information of a target object in multiple continuous frame images; based on the key point change information, a form update instruction of the avatar is generated to dynamically present the avatar.
After the corresponding avatar is generated for the target object, the user video capturing experience can also be improved by dynamically presenting the avatar, i.e., the avatar can be changed according to the change of the motion or expression of the head of the target object.
In practical implementation, the terminal may collect multiple continuous frame images of the target object and send them to the server. The server performs the following operations for each received frame image: acquiring position change information of the key points of each head part of the target object in the frame image relative to the key points of the corresponding parts in the previous frame image; adjusting the materials corresponding to the head parts based on that position change information; updating the target avatar model based on the adjusted materials and generating a form update instruction of the avatar accordingly; and sending the form update instruction of the avatar to the terminal. The instruction may carry the updated target avatar model so that the terminal updates the presented avatar of the target object according to it.
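The following sketch illustrates how key point change information might be computed between consecutive frames to drive a form update instruction; the movement threshold and the instruction structure are assumptions made for illustration.

    import numpy as np

    def keypoint_change(prev_landmarks: np.ndarray,
                        curr_landmarks: np.ndarray,
                        threshold: float = 1.0):
        """Return which key points moved and by how much, or None.

        prev_landmarks, curr_landmarks: (N, 2) key point arrays from
        two consecutive frame images of the target object.
        """
        delta = curr_landmarks - prev_landmarks
        moved = np.linalg.norm(delta, axis=1) > threshold
        if not moved.any():
            return None  # no visible motion; keep the current avatar form
        # A form update instruction would carry this change information.
        return {"indices": np.nonzero(moved)[0], "delta": delta[moved]}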
In some embodiments, after the avatar corresponding to the target object is generated, the avatar may also be modified: receiving a modification request for a target portion of the avatar, the modification request carrying an image of a target object comprising the target portion; in response to the modification request, a target avatar model of the target object is updated based on the image of the target object including the target portion to update an avatar of the target object based on the updated target avatar model.
When the user is not satisfied with the constructed avatar, or wants to further perfect it, a modification instruction for the avatar can be triggered through the terminal by clicking the avatar icon, i.e., a thumbnail of the avatar, presented on the view interface. Referring to fig. 10, fig. 10 is a schematic view of an avatar modification interface provided in an embodiment of the present disclosure: the terminal shows that the user has created two avatars in total; when the terminal receives the user's click operation, the avatar icon corresponding to the avatar to be modified is marked with a selection box, and a "modify avatar" button is displayed on the view interface for the user to perform the modification operation.
After receiving the modification instruction of the user for the avatar, the terminal re-acquires the frame image of the target object, carries the frame image comprising the target part in a modification request, and sends the modification request to the server so as to request the modification of the target part of the avatar.
After receiving the modification request for the target part of the avatar, the server parses the modification request and acquires the frame image carried in it; identifies the frame image of the target object including the target part to determine the image area corresponding to the target part; segments the frame image based on that image area to obtain the image corresponding to the target part; performs category prediction on the image of the target part through a pre-trained neural network model to determine the category to which the target part belongs; determines, according to that category, the material corresponding to the new target part; and replaces the material of the target part in the original target avatar model with the material of the new target part, thereby updating the target avatar model of the target object. A sketch of this flow is given below.
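In the sketch, the locate and classify callables and the material table stand in for the key point detector, the pre-trained category-prediction network, and the material store; none of their interfaces are specified by the embodiment, and the frame image is assumed to be an array-like image.

    from typing import Callable, Dict, Tuple

    Box = Tuple[int, int, int, int]

    def modify_target_part(frame_image,
                           target_part: str,
                           avatar_materials: Dict[str, str],
                           locate: Callable[[object, str], Box],
                           classify: Callable[[object], str],
                           material_table: Dict[Tuple[str, str], str]) -> Dict[str, str]:
        """Update the avatar model's material for one target part."""
        x0, y0, x1, y1 = locate(frame_image, target_part)  # image area of the part
        crop = frame_image[y0:y1, x0:x1]                   # region segmentation
        category = classify(crop)                          # category prediction
        # Replace the old material with the one mapped to the predicted category.
        avatar_materials[target_part] = material_table[(target_part, category)]
        return avatar_materials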
The updated target avatar model is then transmitted to the terminal, so that the terminal updates the presented avatar of the target object based on it.
The application of the embodiment of the present disclosure has the following beneficial effects:
By applying the embodiments of the present disclosure, the skeletal features and expression features of the target object are obtained by recognizing a frame image of the target object; the skeletal features of the base avatar model are adjusted based on the skeletal features, and the expression features of the resulting avatar model, which matches the skeletal features of the target object, are adjusted based on the expression features, yielding a target avatar model that matches the expression features of the target object and from which the avatar of the target object is rendered and generated. Since the target avatar model is obtained by adjusting the base avatar model according to both the skeletal features and the expression features of the target object, personalized creation of the avatar can be realized, and the expression of the avatar can be presented with a better effect.
The following describes a method for generating an avatar provided by the embodiments of the present disclosure with reference to a specific embodiment. Referring to fig. 11, fig. 11 is a second flowchart of a method for generating an avatar according to an embodiment of the present disclosure, where the method for generating an avatar according to the embodiment of the present disclosure includes:
step 1101: the terminal acquires a frame image of the target object and sends an avatar acquisition request corresponding to the target object.
Step 1102: the server performs feature recognition on the frame image of the target object to obtain the skeletal features and expression features of the target object.
When the server performs feature recognition, firstly, different parts of the head of the target object contained in the frame image are recognized, the image area corresponding to each part is determined, and feature extraction is performed on the image of the image area to obtain skeleton features and expression features of the target object.
Step 1103: bone characteristics of the base avatar model are obtained.
Step 1104: bone transformation information is determined based on bone characteristics of the target object and bone characteristics of the base avatar model.
Step 1105: based on the bone transformation information, a bone scaling factor and a bone displacement are determined.
Step 1106: the positions of the vertices in the base avatar model are adjusted based on the bone scaling factor and the bone displacement.
Here, the basic avatar model is subjected to bone adjustment to obtain an avatar model, that is, a face pinching process.
Step 1107: the expression parameters corresponding to the target parts among the head parts of the avatar model are determined based on the expression features of the target object.
Step 1108: first expression data of a basic avatar model and second expression data of the avatar model are acquired.
Step 1109: The first expression data and the second expression data are interpolated based on the expression parameters to obtain an interpolation result.
Step 1110: The positions of the vertices in the avatar model are adjusted based on the interpolation result to obtain the target avatar model.
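A minimal sketch of steps 1109 and 1110, using the interpolation formula Z=X*(1-a)+Y*a stated in this disclosure; treating the expression data directly as per-vertex offsets is an assumption made for illustration.

    # Blend the base model's expression data X with the avatar model's
    # expression data Y under the expression parameter a: Z = X*(1-a) + Y*a.
    import numpy as np

    def blend_expression(first_expr, second_expr, a):
        # Linear interpolation between the two sets of expression data.
        return first_expr * (1.0 - a) + second_expr * a

    X = np.array([[0.0, 0.0], [0.0, 0.0]])   # first expression data (base model)
    Y = np.array([[0.0, 0.4], [0.2, 0.0]])   # second expression data (avatar model)
    a = 0.5                                  # expression parameter, e.g. half-open mouth
    Z = blend_expression(X, Y, a)            # interpolation result
    avatar_vertices = np.array([[1.0, 1.0], [2.0, 2.0]]) + Z   # adjust the vertices
    print(Z)                # [[0.  0.2] [0.1 0. ]]
    print(avatar_vertices)  # [[1.  1.2] [2.1 2. ]]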
Step 1111: The server transmits the target avatar model to the terminal.
Step 1112: the terminal renders and presents the avatar of the target object based on the target avatar model.
Step 1113: The terminal acquires a second frame image of the target object and sends it to the server.
Here, the second frame image and the first frame image are consecutive frames.
Step 1114: The server acquires the position information of the face keypoints of the target object in the second frame image.
Step 1115: The server determines the position change information of the face keypoints in the second frame image relative to the face keypoints in the first frame image.
Step 1116: The server updates the target avatar model of the target object based on the position change information of the face keypoints, and sends a form update instruction for the avatar to the terminal.
Step 1117: The terminal updates the presented avatar of the target object according to the updated target avatar model carried in the update instruction.
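Steps 1113 to 1116 can be illustrated with the sketch below. Mapping the pixel-space keypoint delta onto a single expression parameter through a fixed gain is a simplifying assumption introduced here; the disclosure specifies only that the position change information of the face keypoints drives the update of the target avatar model.

    # Sketch: keypoint position change between consecutive frames drives the
    # update of an expression parameter of the target avatar model.
    import numpy as np

    def keypoint_delta(prev_keypoints, curr_keypoints):
        # Position change information of the face keypoints between frames.
        return {name: curr_keypoints[name] - prev_keypoints[name]
                for name in curr_keypoints}

    def update_expression_parameter(a, delta, gain=0.01):
        # Hypothetical rule: a widening gap between the eye keypoints
        # increases the eye-openness expression parameter.
        change = delta["eye_bottom"][1] - delta["eye_top"][1]
        return float(np.clip(a + gain * change, 0.0, 1.0))

    prev = {"eye_top": np.array([35.0, 38.0]), "eye_bottom": np.array([35.0, 44.0])}
    curr = {"eye_top": np.array([35.0, 36.0]), "eye_bottom": np.array([35.0, 46.0])}
    print(update_expression_parameter(0.5, keypoint_delta(prev, curr)))  # 0.54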
The following describes the units and/or modules in an apparatus implementing the avatar generation method provided by the embodiments of the present disclosure. It will be appreciated that the units or modules in the avatar generation apparatus may be implemented in the electronic device shown in fig. 2 in the form of software (e.g., a computer program stored in the memory mentioned above), or in the form of the hardware logic components described above (e.g., FPGA, ASIC, SOC, and CPLD).
Referring to fig. 12, fig. 12 is a schematic diagram of an alternative structure of an avatar generation apparatus 1200 implementing an embodiment of the present disclosure, which includes a first recognition module 1210, an acquisition module 1220, a first adjustment module 1230, and a second adjustment module 1240. The functions of the respective modules are described below.
It should be noted that the above classification of modules does not constitute a limitation on the electronic device itself, for example, some modules may be split into two or more sub-modules, or some modules may be combined into one new module.
It should also be noted that the names of the above modules do not, in some cases, constitute limitations on the modules themselves; for example, the above "first recognition module 1210" may also be described as "a module for performing feature recognition on a frame image including a target object to obtain the skeletal features and expression features of the target object".
Likewise, units and/or modules of the electronic device that are not described in detail are not thereby absent; any operation performed by the electronic device may be performed by a corresponding unit and/or module in the electronic device.
With continued reference to fig. 12, fig. 12 is a schematic structural view of an avatar generation apparatus 1200 provided in an embodiment of the present disclosure, the apparatus including:
the first recognition module 1210 is configured to perform feature recognition on a frame image including a target object, so as to obtain skeletal features and expression features of the target object;
an acquisition module 1220 for acquiring a base avatar model;
a first adjustment module 1230, configured to adjust the skeletal feature of the basic avatar model based on the skeletal feature of the target object, to obtain an avatar model that matches the skeletal feature of the target object;
a second adjustment module 1240, configured to adjust the expression features of the avatar model based on the expression features of the target object, to obtain a target avatar model that matches the expression features of the target object; the target avatar model is used for rendering to obtain an avatar of the target object.
In some embodiments, the first identifying module 1210 is further configured to identify different parts of a head of the target object included in the frame image, so as to determine an image area corresponding to each part of the head of the target object;
and carrying out feature extraction on images corresponding to all parts of the head of the target object based on the determined image areas to obtain skeleton features and expression features of the target object.
In some embodiments, the first adjustment module 1230 is further configured to obtain skeletal features of the base avatar model;
determining bone transformation information corresponding to the avatar model relative to the base avatar model based on bone characteristics of the target object and bone characteristics of the base avatar model;
and adjusting vertex information of each part in the basic avatar model based on the skeleton transformation information.
In some embodiments, the first adjustment module 1230 is further configured to determine a corresponding bone scaling factor, and a corresponding bone displacement, based on the bone transformation information;
and adjusting the positions of all vertexes in the basic avatar model based on the bone scaling coefficient and the bone displacement to obtain the avatar model.
In some embodiments, the second adjusting module 1240 is further configured to determine, based on the expression characteristics of the target object, expression parameters corresponding to a target portion in each portion of the avatar model head, where the expression parameters are used to indicate an expression state of the target portion;
acquiring first expression data of the basic avatar model and second expression data of the avatar model;
and adjusting the expression characteristics of the virtual image model based on the expression parameters, the first expression data and the second expression data.
In some embodiments, the second adjusting module 1240 is further configured to interpolate the first expression data and the second expression data based on the expression parameter to obtain an interpolation result;
and adjusting each vertex in the avatar model based on the interpolation result to obtain the target avatar model.
In some embodiments, the second adjusting module 1240 is further configured to interpolate the first expression data and the second expression data based on the expression parameter by using the following formula to obtain an interpolation result:
Z=X*(1-a)+Y*a
wherein Z is the expression data of the target virtual image model, X is the first expression data, Y is the second expression data, and a is the expression parameter.
In some embodiments, the apparatus further comprises:
a second recognition module 1250 for performing key point recognition on a plurality of continuous frame images including the target object;
acquiring key point change information of the target object in the multi-frame continuous frame images;
and generating a form update instruction of the avatar based on the key point change information so as to dynamically present the avatar.
In some embodiments, the apparatus further comprises:
a modification module 1260 for receiving a modification request for a target portion of the avatar, the modification request carrying an image of the target object including the target portion;
in response to the modification request, a target avatar model of the target object is updated based on an image of the target object including the target portion to update an avatar of the target object based on the updated target avatar model.
It should be noted that the above description of the avatar generation apparatus is similar to the description of the method, and the apparatus has beneficial effects similar to those of the method, which are therefore not repeated here. For technical details not disclosed in the apparatus embodiments, please refer to the description of the method embodiments of the present disclosure.
The embodiment of the disclosure also provides an electronic device, which comprises:
a memory for storing an executable program;
and the processor is used for realizing the method for generating the virtual image provided by the embodiment of the disclosure when executing the executable program.
The embodiment of the present disclosure also provides a storage medium storing executable instructions that, when executed, are used to implement the avatar generation method provided by the embodiment of the present disclosure.
According to one or more embodiments of the present disclosure, the embodiments of the present disclosure provide a method of generating an avatar, including:
performing feature recognition on a frame image comprising a target object to obtain skeleton features and expression features of the target object;
acquiring a basic virtual image model;
based on the bone characteristics of the target object, adjusting the bone characteristics of the basic avatar model to obtain an avatar model matched with the bone characteristics of the target object;
based on the expression characteristics of the target object, adjusting the expression characteristics of the virtual image model to obtain a target virtual image model matched with the expression characteristics of the target object; the target avatar model is used for rendering to obtain an avatar of the target object.
According to one or more embodiments of the present disclosure, the embodiments of the present disclosure provide a method for generating an avatar, further including:
the feature recognition is performed on the frame image including the target object to obtain skeleton features and expression features of the target object, including:
identifying different parts of the head of the target object contained in the frame image so as to determine image areas corresponding to the parts of the head of the target object;
and carrying out feature extraction on images corresponding to all parts of the head of the target object based on the determined image areas to obtain skeleton features and expression features of the target object.
According to one or more embodiments of the present disclosure, the embodiments of the present disclosure provide a method for generating an avatar, further including:
the adjusting the skeletal feature of the base avatar model based on the skeletal feature of the target object comprises:
acquiring skeletal features of the base avatar model;
determining bone transformation information corresponding to the avatar model relative to the base avatar model based on bone characteristics of the target object and bone characteristics of the base avatar model;
And adjusting vertex information of each part in the basic avatar model based on the skeleton transformation information.
According to one or more embodiments of the present disclosure, the embodiments of the present disclosure provide a method for generating an avatar, further including:
the adjusting vertex information of each part in the basic avatar model based on the bone transformation information comprises:
determining a corresponding bone scaling factor, and a corresponding bone displacement, based on the bone transformation information;
and adjusting the positions of all vertexes in the basic avatar model based on the bone scaling coefficient and the bone displacement to obtain the avatar model.
According to one or more embodiments of the present disclosure, the embodiments of the present disclosure provide a method for generating an avatar, further including:
the adjusting the expression characteristic of the avatar model based on the expression characteristic of the target object includes:
determining expression parameters corresponding to target parts in all parts of the head of the virtual image model based on the expression characteristics of the target object, wherein the expression parameters are used for indicating the expression states of the target parts;
Acquiring first expression data of the basic avatar model and second expression data of the avatar model;
and adjusting the expression characteristics of the virtual image model based on the expression parameters, the first expression data and the second expression data.
According to one or more embodiments of the present disclosure, the embodiments of the present disclosure provide a method for generating an avatar, further including:
the adjusting the expression features of the avatar model based on the expression parameters, the first expression data and the second expression data includes:
interpolation is carried out on the first expression data and the second expression data based on the expression parameters, and an interpolation result is obtained;
and adjusting each vertex in the avatar model based on the interpolation result to obtain the target avatar model.
According to one or more embodiments of the present disclosure, the embodiments of the present disclosure provide a method for generating an avatar, further including:
the interpolating the first expression data and the second expression data based on the expression parameters to obtain an interpolation result, including:
based on the expression parameters, interpolating the first expression data and the second expression data by adopting the following formula to obtain an interpolation result:
Z=X*(1-a)+Y*a
Wherein Z is the expression data of the target virtual image model, X is the first expression data, Y is the second expression data, and a is the expression parameter.
According to one or more embodiments of the present disclosure, the embodiments of the present disclosure provide a method for generating an avatar, further including:
performing key point identification on a plurality of continuous frame images comprising the target object;
acquiring key point change information of the target object in the multi-frame continuous frame images;
and generating a form update instruction of the avatar based on the key point change information so as to dynamically present the avatar.
According to one or more embodiments of the present disclosure, the embodiments of the present disclosure provide a method for generating an avatar, further including:
receiving a modification request for a target portion of the avatar, the modification request carrying an image of the target object including the target portion;
in response to the modification request, a target avatar model of the target object is updated based on an image of the target object including the target portion to update an avatar of the target object based on the updated target avatar model.
According to one or more embodiments of the present disclosure, there is also provided an avatar generating apparatus including:
the first recognition module is used for carrying out feature recognition on a frame image comprising a target object to obtain skeleton features and expression features of the target object;
the acquisition module is used for acquiring a basic virtual image model;
the first adjusting module is used for adjusting the skeleton characteristics of the basic avatar model based on the skeleton characteristics of the target object to obtain an avatar model matched with the skeleton characteristics of the target object;
the second adjusting module is used for adjusting the expression characteristics of the virtual image model based on the expression characteristics of the target object to obtain a target virtual image model matched with the expression characteristics of the target object; the target avatar model is used for rendering to obtain an avatar of the target object.
The foregoing description is only illustrative of the embodiments of the present disclosure and the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the disclosure is not limited to the specific combinations of features described above, and also covers other embodiments formed by any combination of the above features, or of their equivalents, without departing from the spirit of the disclosure; for example, embodiments formed by substituting the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims (16)

1. A method of generating an avatar, the method comprising:
performing feature recognition on a frame image comprising a target object to obtain skeleton features and expression features of the target object;
Acquiring a basic virtual image model;
based on the bone characteristics of the target object, adjusting the bone characteristics of the basic avatar model to obtain an avatar model matched with the bone characteristics of the target object;
based on the expression characteristics of the target object, adjusting the expression characteristics of the virtual image model to obtain a target virtual image model matched with the expression characteristics of the target object; the target virtual image model is used for rendering to obtain the virtual image of the target object;
the adjusting the expression characteristic of the avatar model based on the expression characteristic of the target object includes:
determining expression parameters corresponding to target parts in all parts of the head of the virtual image model based on the expression characteristics of the target object, wherein the expression parameters are used for indicating the expression states of the target parts;
acquiring first expression data of the basic avatar model and second expression data of the avatar model;
based on the expression parameters, the first expression data and the second expression data, adjusting the expression characteristics of the virtual image model;
The adjusting the expression features of the avatar model based on the expression parameters, the first expression data and the second expression data includes:
interpolation is carried out on the first expression data and the second expression data based on the expression parameters, and an interpolation result is obtained;
and adjusting each vertex in the avatar model based on the interpolation result to obtain the target avatar model.
2. The method of claim 1, wherein the feature recognition of the frame image including the target object to obtain the skeletal feature and the expression feature of the target object comprises:
identifying different parts of the head of the target object contained in the frame image so as to determine image areas corresponding to the parts of the head of the target object;
and carrying out feature extraction on images corresponding to all parts of the head of the target object based on the determined image areas to obtain skeleton features and expression features of the target object.
3. The method of claim 1, wherein the adjusting the skeletal characteristics of the base avatar model based on the skeletal characteristics of the target object comprises:
Acquiring skeletal features of the base avatar model;
determining bone transformation information corresponding to the avatar model relative to the base avatar model based on bone characteristics of the target object and bone characteristics of the base avatar model;
and adjusting vertex information of each part in the basic avatar model based on the skeleton transformation information.
4. The method of claim 3, wherein adjusting vertex information for portions of the base avatar model based on the bone transformation information comprises:
determining a corresponding bone scaling factor, and a corresponding bone displacement, based on the bone transformation information;
and adjusting the positions of all vertexes in the basic avatar model based on the bone scaling coefficient and the bone displacement to obtain the avatar model.
5. The method of claim 1, wherein interpolating the first expression data and the second expression data based on the expression parameters to obtain interpolation results comprises:
based on the expression parameters, interpolating the first expression data and the second expression data by adopting the following formula to obtain an interpolation result:
Z=X*(1-a)+Y*a
wherein Z is the expression data of the target avatar model, X is the first expression data, Y is the second expression data, and a is the expression parameter.
6. The method of claim 1, wherein the method further comprises:
performing key point identification on a plurality of continuous frame images comprising the target object;
acquiring key point change information of the target object in the multi-frame continuous frame images;
and generating a form update instruction of the avatar based on the key point change information so as to dynamically present the avatar.
7. The method of claim 1, wherein the method further comprises:
receiving a modification request for a target portion of the avatar, the modification request carrying an image of the target object including the target portion;
in response to the modification request, a target avatar model of the target object is updated based on an image of the target object including the target portion to update an avatar of the target object based on the updated target avatar model.
8. An avatar generation apparatus, the apparatus comprising:
The first recognition module is used for carrying out feature recognition on a frame image comprising a target object to obtain skeleton features and expression features of the target object;
the acquisition module is used for acquiring a basic virtual image model;
the first adjusting module is used for adjusting the skeleton characteristics of the basic avatar model based on the skeleton characteristics of the target object to obtain an avatar model matched with the skeleton characteristics of the target object;
the second adjusting module is used for adjusting the expression characteristics of the virtual image model based on the expression characteristics of the target object to obtain a target virtual image model matched with the expression characteristics of the target object; the target virtual image model is used for rendering to obtain the virtual image of the target object;
the second adjusting module is further configured to determine, based on expression characteristics of the target object, expression parameters corresponding to a target portion in each portion of the head of the avatar model, where the expression parameters are used to indicate an expression state in which the target portion is located;
acquiring first expression data of the basic avatar model and second expression data of the avatar model;
Based on the expression parameters, the first expression data and the second expression data, adjusting the expression characteristics of the virtual image model;
the second adjusting module is further configured to interpolate the first expression data and the second expression data based on the expression parameter, so as to obtain an interpolation result;
and adjusting each vertex in the avatar model based on the interpolation result to obtain the target avatar model.
9. The apparatus of claim 8, wherein,
the first recognition module is further configured to recognize different parts of the head of the target object included in the frame image, so as to determine an image area corresponding to each part of the head of the target object;
and carrying out feature extraction on images corresponding to all parts of the head of the target object based on the determined image areas to obtain skeleton features and expression features of the target object.
10. The apparatus of claim 8, wherein,
the first adjusting module is further used for acquiring skeleton characteristics of the basic virtual image model;
determining bone transformation information corresponding to the avatar model relative to the base avatar model based on bone characteristics of the target object and bone characteristics of the base avatar model;
And adjusting vertex information of each part in the basic avatar model based on the skeleton transformation information.
11. The apparatus of claim 10, wherein,
the first adjustment module is further configured to determine a corresponding bone scaling factor and a corresponding bone displacement based on the bone transformation information;
and adjusting the positions of all vertexes in the basic avatar model based on the bone scaling coefficient and the bone displacement to obtain the avatar model.
12. The apparatus of claim 8, wherein,
the second adjustment module is further configured to interpolate the first expression data and the second expression data by using the following formula based on the expression parameter, to obtain an interpolation result:
Z=X*(1-a)+Y*a
wherein Z is the expression data of the target avatar model, X is the first expression data, Y is the second expression data, and a is the expression parameter.
13. The apparatus of claim 8, wherein the apparatus further comprises:
the second recognition module is used for carrying out key point recognition on a plurality of continuous frame images comprising the target object;
acquiring key point change information of the target object in the multi-frame continuous frame images;
And generating a form update instruction of the avatar based on the key point change information so as to dynamically present the avatar.
14. The apparatus of claim 8, wherein the apparatus further comprises:
a modification module for receiving a modification request for a target portion of the avatar, the modification request carrying an image of the target object including the target portion;
in response to the modification request, a target avatar model of the target object is updated based on an image of the target object including the target portion to update an avatar of the target object based on the updated target avatar model.
15. An electronic device, the electronic device comprising:
a memory for storing executable instructions;
a processor for implementing the avatar generation method according to any one of claims 1 to 7 when executing the executable instructions.
16. A storage medium storing executable instructions which, when executed, are adapted to carry out the avatar generation method of any one of claims 1 to 7.
CN201911053622.XA 2019-10-31 2019-10-31 Method and device for generating virtual image, electronic equipment and storage medium Active CN110766777B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911053622.XA CN110766777B (en) 2019-10-31 2019-10-31 Method and device for generating virtual image, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN110766777A CN110766777A (en) 2020-02-07
CN110766777B true CN110766777B (en) 2023-09-29

Family

ID=69335070

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911053622.XA Active CN110766777B (en) 2019-10-31 2019-10-31 Method and device for generating virtual image, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110766777B (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111462204B (en) * 2020-02-13 2023-03-03 腾讯科技(深圳)有限公司 Virtual model generation method, virtual model generation device, storage medium, and electronic device
CN111612876A (en) * 2020-04-27 2020-09-01 北京小米移动软件有限公司 Expression generation method and device and storage medium
CN111695471B (en) * 2020-06-02 2023-06-27 北京百度网讯科技有限公司 Avatar generation method, apparatus, device and storage medium
CN111652983A (en) * 2020-06-10 2020-09-11 上海商汤智能科技有限公司 Augmented reality AR special effect generation method, device and equipment
CN111935491B (en) * 2020-06-28 2023-04-07 百度在线网络技术(北京)有限公司 Live broadcast special effect processing method and device and server
CN111880709A (en) * 2020-07-31 2020-11-03 北京市商汤科技开发有限公司 Display method and device, computer equipment and storage medium
CN111970535B (en) * 2020-09-25 2021-08-31 魔珐(上海)信息科技有限公司 Virtual live broadcast method, device, system and storage medium
CN112182194A (en) * 2020-10-21 2021-01-05 南京创维信息技术研究院有限公司 Method, system and readable storage medium for expressing emotional actions of television avatar
CN112286610A (en) * 2020-10-28 2021-01-29 北京有竹居网络技术有限公司 Interactive processing method and device, electronic equipment and storage medium
CN112184921B (en) * 2020-10-30 2024-02-06 北京百度网讯科技有限公司 Avatar driving method, apparatus, device and medium
CN112423022A (en) * 2020-11-20 2021-02-26 北京字节跳动网络技术有限公司 Video generation and display method, device, equipment and medium
CN112330805B (en) * 2020-11-25 2023-08-08 北京百度网讯科技有限公司 Face 3D model generation method, device, equipment and readable storage medium
CN112581571B (en) * 2020-12-02 2024-03-12 北京达佳互联信息技术有限公司 Control method and device for virtual image model, electronic equipment and storage medium
CN112598785B (en) * 2020-12-25 2022-03-25 游艺星际(北京)科技有限公司 Method, device and equipment for generating three-dimensional model of virtual image and storage medium
CN112819971B (en) * 2021-01-26 2022-02-25 北京百度网讯科技有限公司 Method, device, equipment and medium for generating virtual image
CN112967212A (en) * 2021-02-01 2021-06-15 北京字节跳动网络技术有限公司 Virtual character synthesis method, device, equipment and storage medium
CN112785669B (en) * 2021-02-01 2024-04-23 北京字节跳动网络技术有限公司 Virtual image synthesis method, device, equipment and storage medium
CN112807688A (en) * 2021-02-08 2021-05-18 网易(杭州)网络有限公司 Method and device for setting expression in game, processor and electronic device
CN113050794A (en) 2021-03-24 2021-06-29 北京百度网讯科技有限公司 Slider processing method and device for virtual image
CN113099298B (en) * 2021-04-08 2022-07-12 广州华多网络科技有限公司 Method and device for changing virtual image and terminal equipment
CN113350801A (en) * 2021-07-20 2021-09-07 网易(杭州)网络有限公司 Model processing method and device, storage medium and computer equipment
CN114501065A (en) * 2022-02-11 2022-05-13 广州方硅信息技术有限公司 Virtual gift interaction method and system based on face jigsaw and computer equipment
CN114612643B (en) * 2022-03-07 2024-04-12 北京字跳网络技术有限公司 Image adjustment method and device for virtual object, electronic equipment and storage medium
CN115861503A (en) * 2022-09-29 2023-03-28 北京字跳网络技术有限公司 Rendering method, device and equipment of virtual object and storage medium
CN115578493B (en) * 2022-10-20 2023-05-30 武汉两点十分文化传播有限公司 Maya expression coding method and system thereof
CN116129091B (en) * 2023-04-17 2023-06-13 海马云(天津)信息技术有限公司 Method and device for generating virtual image video, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103116902A (en) * 2011-11-16 2013-05-22 华为软件技术有限公司 Three-dimensional virtual human head image generation method, and method and device of human head image motion tracking
CN108171789A (en) * 2017-12-21 2018-06-15 迈吉客科技(北京)有限公司 A kind of virtual image generation method and system
CN108564641A (en) * 2018-03-16 2018-09-21 中国科学院自动化研究所 Expression method for catching and device based on UE engines
CN109857311A (en) * 2019-02-14 2019-06-07 北京达佳互联信息技术有限公司 Generate method, apparatus, terminal and the storage medium of human face three-dimensional model
CN110111247A (en) * 2019-05-15 2019-08-09 浙江商汤科技开发有限公司 Facial metamorphosis processing method, device and equipment
CN110135226A (en) * 2018-02-09 2019-08-16 腾讯科技(深圳)有限公司 Expression animation data processing method, device, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110292181A1 (en) * 2008-04-16 2011-12-01 Canesta, Inc. Methods and systems using three-dimensional sensing for user interaction with applications

Also Published As

Publication number Publication date
CN110766777A (en) 2020-02-07

Similar Documents

Publication Publication Date Title
CN110766777B (en) Method and device for generating virtual image, electronic equipment and storage medium
CN110827378B (en) Virtual image generation method, device, terminal and storage medium
CN110782515A (en) Virtual image generation method and device, electronic equipment and storage medium
WO2021008166A1 (en) Method and apparatus for virtual fitting
CN111541907B (en) Article display method, apparatus, device and storage medium
CN111833461B (en) Method and device for realizing special effect of image, electronic equipment and storage medium
JP2024505995A (en) Special effects exhibition methods, devices, equipment and media
US11790621B2 (en) Procedurally generating augmented reality content generators
WO2020248900A1 (en) Panoramic video processing method and apparatus, and storage medium
CN110827379A (en) Virtual image generation method, device, terminal and storage medium
CN110796721A (en) Color rendering method and device of virtual image, terminal and storage medium
CN114730483A (en) Generating 3D data in a messaging system
CN112884908A (en) Augmented reality-based display method, device, storage medium, and program product
CN113706440A (en) Image processing method, image processing device, computer equipment and storage medium
CN110807769B (en) Image display control method and device
CN112766215A (en) Face fusion method and device, electronic equipment and storage medium
WO2022072418A1 (en) Ingestion pipeline for augmented reality content generators
CN110059739B (en) Image synthesis method, image synthesis device, electronic equipment and computer-readable storage medium
US20230237625A1 (en) Video processing method, electronic device, and storage medium
US20220254116A1 (en) Display method based on augmented reality, device, storage medium and program product
CN112099712B (en) Face image display method and device, electronic equipment and storage medium
CN109472855B (en) Volume rendering method and device and intelligent device
CN111353929A (en) Image processing method and device and electronic equipment
CN112764649B (en) Virtual image generation method, device, equipment and storage medium
CN114049417B (en) Virtual character image generation method and device, readable medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant