CN110390705B - Method and device for generating virtual image - Google Patents

Method and device for generating virtual image

Info

Publication number
CN110390705B
Authority
CN
China
Prior art keywords
image
limb
preset
target
target object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810339894.5A
Other languages
Chinese (zh)
Other versions
CN110390705A (en)
Inventor
王丽婧
辛晓哲
范典
王君
李鲲鹏
彭飞
李健涛
刘建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN201810339894.5A priority Critical patent/CN110390705B/en
Publication of CN110390705A publication Critical patent/CN110390705A/en
Application granted granted Critical
Publication of CN110390705B publication Critical patent/CN110390705B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

In order to obtain an avatar consistent with the motion of a target object, a target image including the target object is acquired only by an image acquisition device, motion information characterizing the target object is detected from the target image, and the avatar is generated according to that motion information. Since the motion information can characterize the motion of the target object, generating the avatar based on the motion information ensures that the motion of the avatar is consistent with the motion of the target object. This scheme for generating an avatar can therefore generate an avatar consistent with the motion of the target object using an ordinary image acquisition device, without complex hardware equipment, which reduces the cost of generating the avatar.

Description

Method and device for generating virtual image
Technical Field
The present invention relates to the field of internet technologies, and in particular, to a method and apparatus for generating an avatar.
Background
Currently, there are many practical application scenarios in which the limb motion of a target object needs to be mapped onto an avatar so that the motion of the avatar is consistent with the motion of the target object, for example: somatosensory games, cartoon production and the like. Before the limb motion of the target object is mapped onto the avatar, Kinect technology is generally adopted to obtain information representing the limb motion of the target object. The specific manner is as follows: an infrared projector emits laser light, which is uniformly projected into a measurement space through a grating in front of the lens of the infrared projector; the illuminated objects in the measurement space (including the target human body) diffusely reflect the laser light to form random speckles; each speckle in the measurement space is collected by an infrared camera, and data processing is performed on the speckles to obtain the limb motion information.
However, in the course of implementing the present invention, the inventors found that obtaining limb motion information by means of the Kinect technique requires a great deal of equipment such as an infrared projector, a grating and an infrared camera, and in addition requires measurement in a specific measurement space. Therefore, the cost of obtaining limb motion information with the Kinect technique is high, i.e., the cost of mapping the limb motion of the target object onto the avatar is high.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method and a device for generating an avatar, so that the cost for mapping the action of a target object to the avatar can be reduced.
Therefore, the technical scheme for solving the technical problems is as follows:
a first aspect of the present invention provides a method of generating an avatar, the method comprising:
acquiring a target image acquired by image acquisition equipment;
detecting action information representing a target object from the target image;
and generating an avatar according to the action information, wherein the avatar is consistent with the action of the target object.
Optionally, detecting motion information characterizing the target object from the target image includes:
detecting limb motion information representing the target object from the target image.
Optionally, when the target image includes one target object, the detecting limb motion information characterizing the target object from the target image includes:
processing the target image by adopting a preset first model, identifying the position information of a preset joint point in the target image, and taking the position information of the preset joint point as limb motion information representing the target object; the preset first model is obtained by adopting a preset convolutional neural network algorithm and is used for representing the corresponding relation between the image and the position information of the preset joint point in the image.
Optionally, when the target image includes at least two target objects, the detecting limb motion information characterizing the target objects from the target image includes:
processing the target image by adopting a preset second model, and identifying the position information of each preset joint point in the target image, wherein the preset second model is obtained by adopting a preset convolutional neural network algorithm and is used for representing the corresponding relation between the image and the position information of the preset joint points in the image;
processing the target image by adopting a preset third model, and determining the affinity between the preset joint points in the target image, wherein the preset third model is obtained by adopting a part affinity field algorithm and is used for representing the correspondence between the image and the affinity between the preset joint points in the image;
and determining the position information of the preset joint points belonging to each target object according to the affinity between the preset joint points, and taking the position information as limb motion information representing the target objects.
Optionally, the generating an avatar according to the motion information, where the avatar is consistent with the motion of the target object includes:
acquiring an image role corresponding to the target object;
and determining the limb model data of the avatar character according to the limb action information of the target object, and generating the avatar consistent with the limb action of the target object.
Optionally, determining the limb model data of the avatar character according to the limb motion information of the target object and generating the avatar consistent with the limb motion of the target object includes:
according to the limb motion information of the target object, determining the limb model data of the avatar character corresponding to the limb motion information of the target object from a preset avatar library, wherein the preset avatar library comprises a corresponding relation between the limb motion information of the target object and the limb model data of the avatar character;
Combining the determined limb model data of the avatar character to generate the avatar consistent with the limb action of the target object.
Optionally, when the target image includes a plurality of video images obtained from a target video, the generating an avatar according to the motion information, where the avatar is consistent with the motion of the target object includes:
and generating an animation video of the virtual image according to the limb action information of the target object in the video images, wherein the virtual image in the animation video is consistent with the action of the target object in the target video.
Optionally, detecting motion information characterizing the target object from the target image includes:
facial expression information characterizing the target object is detected from the target image.
A second aspect of the present invention provides an apparatus for generating an avatar, the apparatus comprising:
the acquisition module is used for acquiring the target image acquired by the image acquisition equipment;
the detection module is used for detecting action information representing a target object from the target image;
and the generation module is used for generating an avatar according to the action information, and the avatar keeps consistent with the action of the target object.
Optionally, the detection module includes:
and the first detection unit is used for detecting limb motion information representing the target object from the target image.
Optionally, when the target image includes one target object, the first detection unit includes:
the first identification subunit is used for processing the target image by adopting a preset first model, identifying the position information of a preset joint point in the target image, and taking the position information of the preset joint point as limb motion information representing the target object; the preset first model is obtained by adopting a preset convolutional neural network algorithm and is used for representing the corresponding relation between the image and the position information of the preset joint point in the image.
Optionally, when the target image includes at least two target objects, the detection unit includes:
the second recognition subunit is used for processing the target image by adopting a preset second model, and recognizing the position information of each preset joint point in the target image, wherein the preset second model is obtained by adopting a preset convolutional neural network algorithm and is used for representing the corresponding relation between the image and the position information of the preset joint points in the image;
the first determining unit is used for processing the target image by adopting a preset third model, and determining the affinity between the preset joint points in the target image, wherein the preset third model is obtained by adopting a part affinity field algorithm and is used for representing the correspondence between the image and the affinity between the preset joint points in the image;
and the second determining unit is used for determining the position information of the preset joint points belonging to each target object according to the affinity between the preset joint points and taking the position information as limb motion information representing the target objects.
Optionally, the generating module includes:
the acquisition unit is used for acquiring the image roles corresponding to the target objects;
and the generating unit is used for determining the limb model data of the avatar character according to the limb action information of the target object and generating the avatar consistent with the limb action of the target object.
Optionally, the generating module further includes:
a third determining unit, configured to determine, according to the limb motion information of the target object, limb model data of the avatar character corresponding to the limb motion information of the target object from a preset avatar library, where the preset avatar library includes a correspondence between the limb motion information of the target object and the limb model data of the avatar character;
And the combining unit is used for combining the limb model data of the determined avatar character to generate the avatar consistent with the limb action of the target object.
Optionally, when the target image includes a plurality of video images obtained from a target video, the generating module is specifically configured to:
and generating an animation video of the virtual image according to the limb action information of the target object in the video images, wherein the virtual image in the animation video is consistent with the action of the target object in the target video.
Optionally, the detection module includes:
and the second detection unit is used for detecting facial expression information representing the target object from the target image.
A third aspect of the present invention provides an apparatus for generating an avatar, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising means for performing the method as provided in the first aspect of the present invention described above.
A fourth aspect of the invention provides a non-transitory computer readable storage medium storing instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the method as provided by the first aspect of the invention described above.
According to the technical scheme, the invention has the following beneficial effects:
in order to obtain an avatar consistent with the motion of the target object, a target image including the target object is acquired only by an image acquisition device, motion information characterizing the motion of the target object is detected from the target image, and the avatar is generated based on that motion information. Since the motion information can characterize the motion of the target object, generating the avatar based on the motion information ensures that the motion of the avatar is consistent with the motion of the target object. According to this scheme for generating an avatar, no complex hardware equipment is needed; with only an ordinary image acquisition device, an avatar consistent with the motion of the target object can be generated, and the cost of generating the avatar is reduced.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a method of generating an avatar according to an embodiment of the present invention;
Fig. 2 is a schematic illustration of the limbs of a target object;
Fig. 3 is a schematic diagram of an avatar generated when there is one target object in an embodiment of the present invention;
Fig. 4 is a schematic diagram of an avatar generated when there are a plurality of target objects in an embodiment of the present invention;
fig. 5 is a schematic view of an apparatus for generating an avatar according to an embodiment of the present invention;
fig. 6 is a schematic structural view of an apparatus for generating an avatar in an embodiment of the present invention.
Detailed Description
In order to provide a low-cost implementation scheme for generating the avatar, the embodiment of the invention provides a method and a device for generating the avatar, and the embodiment of the invention is described below with reference to the accompanying drawings.
In application scenarios such as somatosensory games and cartoon production, an avatar needs to be generated, that is, the motion of a target object is mapped onto the avatar so that the motion of the avatar is consistent with the motion of the target object. Before mapping the motion of the target object onto the avatar, the motion information of the target object needs to be acquired first; typically, the motion information representing the motion of the target object is acquired with Kinect technology through a number of dedicated devices.
However, when the Kinect technique is used to obtain the motion information of the target object, a great deal of dedicated equipment such as an infrared projector, a grating and an infrared camera is required, and measurement must be performed in a specific measurement space. Therefore, obtaining the motion information of the target object with the Kinect technique involves numerous dedicated devices and complex operation, and the cost required to generate the avatar is high because the dedicated devices are expensive.
In order to solve the above problems, the embodiments of the present invention provide a technical solution for generating an avatar, in which a target image is acquired only by an image acquisition device (for example, an ordinary camera, or a camera on a mobile terminal). The target image includes a target object, and motion information of the target object, which can characterize the motion of the target object, can be obtained by detecting the target image. The avatar is generated based on this motion information, which ensures that the motion of the avatar is consistent with the motion of the target object. No complex hardware equipment is required; with only an ordinary image acquisition device, an avatar consistent with the motion of the target object can be generated, and the cost of generating the avatar is reduced.
Exemplary method
Fig. 1 is a flowchart of a method for generating an avatar according to an embodiment of the present invention, as shown in fig. 1, the method includes:
step 101, acquiring a target image acquired by an image acquisition device.
An image acquisition device is a device capable of acquiring an image. For example, the image acquisition device may be a video camera, a still camera, a camera carried by a terminal device such as a mobile phone or a computer, etc.
The target image in the present invention refers to an image obtained by an image acquisition device alone, without the assistance of other dedicated devices (such as an infrared projector or a grating). For example: an image in a video captured by a video camera, a photograph taken by a still camera, or a video image or photograph captured by a camera provided on a terminal device. The persons, animals or movable dolls contained in the target image are the target objects.
It can be understood that, in one case, the device for generating the avatar and the image acquisition device may be two independent devices that transmit images over an established communication connection. The connection may be wired, such as a communication connection established with a signal transmission line, or wireless, such as a communication connection established over WIFI or Bluetooth. In another case, the device for generating the avatar and the image acquisition device may be two different sub-devices integrated in the same device, in which case images are transmitted over a data transmission channel built into that device. For example, a mobile phone includes both a camera with an image acquisition function and a processor with the function of generating the avatar.
In a specific implementation, there are a number of possible implementations for acquiring the target image in step 101.
In one implementation, the target image is an image acquired in real time by an image acquisition device.
In a specific implementation, the target image may be a single image. In one case, the image acquisition device acquires a single image and sends the image as the target image to the device for generating the avatar; in another case, the image acquisition device acquires a video, intercepts one frame of video image from the video, and sends the video image as the target image to the device for generating the avatar.
The target image may also be a plurality of images. In one case, the image acquisition device acquires a plurality of images and sends these images as target images to the device for generating the avatar; in another case, the image acquisition device acquires a video, intercepts a plurality of frames of video images from the video, and sends these video images as target images to the device for generating the avatar.
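As a purely illustrative sketch of acquiring target images in step 101 (not part of the patent), the following Python snippet reads frames from an ordinary camera or a video file with OpenCV; the function name and parameters are assumptions made for the example.

```python
import cv2


def grab_target_images(source=0, num_frames=1):
    """Return up to num_frames frames from a camera (source=0) or a video file
    (source=path) to be used as target images."""
    cap = cv2.VideoCapture(source)
    frames = []
    while len(frames) < num_frames:
        ok, frame = cap.read()
        if not ok:          # camera unavailable or video exhausted
            break
        frames.append(frame)
    cap.release()
    return frames
```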
In another implementation, the target image is an image obtained from local memory.
The local memory refers to a memory in the device that generates the avatar. The image in the local memory can be the image which is directly sent to the local memory for storage by the image acquisition equipment after being acquired by the image acquisition equipment; or the images can be acquired by the image acquisition equipment, then sent to the network server, and then downloaded from the network server by the local memory according to the need.
In particular, in one case, the target image may be an image, and the apparatus for generating the avatar selects an image from the images stored in its local memory as the target image; in another case, the target image may be a plurality of images, and the apparatus for generating the avatar selects the plurality of images as the target image from among the images stored in the local memory thereof. When the target image is a plurality of images, the target image may be a plurality of images that are continuous in acquisition time or storage location, or may be a plurality of images that are completely uncorrelated.
For example, on a server of a live-streaming platform there is a local gallery including 100 images: image 1, image 2, …, image 100. In one case, if the target image is one image, image 52 is selected as the target image from the 100 images stored in the local memory. In another case, if the target image is 10 images, image 10, image 11, …, image 20 are selected in sequence from the 100 images in the local memory as the target images; if the target image is 100 images, "select all" is applied to the 100 images in the local gallery, or all images in the local gallery are selected in sequence as the target images.
The following steps are described for the case where the target image is one image. If the target image is a plurality of images, then for each of the plurality of images the process of generating the avatar is the same as the operation performed when the target image is one image, except that the following process of generating the avatar is repeated a plurality of times; this is not repeated in this embodiment.
Step 102, detecting motion information representing a target object from the target image.
And step 103, generating an avatar according to the action information, wherein the avatar is consistent with the action of the target object.
In some application scenarios, the motion of the target object may be a limb motion, and detecting motion information representing the target object from the target image includes:
limb motion information characterizing a target object is detected from a target image.
The target image obtained in step 101 includes the target object. As shown in Fig. 2, the limbs of the target object include: hands, forearms, upper arms, trunk, thighs, calves, feet, etc. The limb motion of the target object refers to a motion formed by the coordinated movement of the limbs, such as waving a hand, kicking a leg, walking, running, standing, etc.
The limb movement information refers to data for describing the limb movement of the target object. For example, the limb motion information may be position information of each preset joint point on the limb of the target object in the target image; alternatively, the limb movement information may be a relative positional relationship between each preset limb on the target object, or the like.
As an example, when the limb motion information is the position information of the preset joint points of the target object in the target image, it is assumed that there are 12 preset joint points, namely the wrists, elbows, shoulders, hips, knees and ankles on both sides; the limb motion information of the target object is then the position information of these 12 preset joint points of the target object in the target image, i.e., the 12 coordinate points in the target image corresponding to the 12 joint points.
As another example, when the limb motion information is the relative positional relationship between the preset limbs of the target object, it is assumed that there are 8 preset limbs, namely the left forearm, left upper arm, right forearm, right upper arm, left thigh, left calf, right thigh and right calf; the limb motion information of the target object is then the relative positions between these 8 preset limbs, i.e., the relative angle information between the 8 limbs of the target object.
In a specific implementation, the device for generating the avatar detects the limb motion information of the target object in the obtained target image. If the limb motion information is the position information of each preset joint point on the limbs of the target object in the target image, a point on the target image is taken as the coordinate origin, a two-dimensional coordinate system is established on the target image, and the limb motion information of the target object in the target image consists of the two-dimensional coordinate points (X, Y) corresponding to all the preset joint points in this two-dimensional coordinate system.
The limb motion information can represent the limb motion of the target object. That is, each two-dimensional coordinate point corresponds to a preset joint point and, taken together, the coordinate points represent the limb motion of the target object in the target image. For example, if the ordinate values of 6 preset joint points in the obtained limb motion information, namely the 6 coordinate points corresponding to the left wrist, left elbow, left shoulder, right shoulder, right elbow and right wrist, are all equal, the limb motion represented by this set of limb motion information is: the hands are raised level.
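The following Python fragment is a hedged illustration of how such limb motion information might be held in memory and how the "hands raised level" example could be checked; the joint names, the dictionary layout and the tolerance are assumptions for the example, not details from the patent.

```python
from typing import Dict, Tuple

# joint name -> (x, y) coordinate in the target image's 2D coordinate system
LimbMotionInfo = Dict[str, Tuple[float, float]]

example_motion: LimbMotionInfo = {
    "left_wrist": (120.0, 200.0),
    "left_elbow": (150.0, 200.0),
    "left_shoulder": (180.0, 200.0),
    "right_shoulder": (220.0, 200.0),
    "right_elbow": (250.0, 200.0),
    "right_wrist": (280.0, 200.0),
    # ... the remaining hips, knees and ankles omitted for brevity
}


def hands_raised_level(motion: LimbMotionInfo) -> bool:
    """If the six arm joint points share (roughly) the same ordinate,
    interpret the pose as 'hands raised level'."""
    arm_joints = ["left_wrist", "left_elbow", "left_shoulder",
                  "right_shoulder", "right_elbow", "right_wrist"]
    ys = [motion[j][1] for j in arm_joints]
    return max(ys) - min(ys) < 5.0  # tolerance in pixels (illustrative)
```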
It will be appreciated that when limb movement information characterizing a target object is detected from a target image, an avatar can be generated from the limb movement information, the avatar remaining in accordance with the limb movement of the target object.
The term "virtual" refers to reproduction and reconstruction of the real world. An avatar, which is a real life-free character, but may be a visual character symbol familiar to people, in most cases, a game character such as: galen, supermary, etc., or, a cartoon binary image, such as: bear, machine cat, young girl warrior, etc.
The avatar in this embodiment is an avatar generated in accordance with the limb motion of the target object in the target image based on the detected limb motion information.
The character corresponding to the avatar may be randomly selected by the device for generating the avatar from a plurality of avatar characters, for example, randomly selecting Hello Kitty from a plurality of cartoon characters as the character corresponding to the avatar to be generated. Alternatively, the character corresponding to the avatar may be selected based on the physical characteristics of the target object represented by the limb motion information; for example, if the target object has relatively long legs like a model, a girl warrior may be selected, based on this physical characteristic, as the character corresponding to the avatar to be generated. In addition, the character corresponding to the avatar may be selected by the user according to his or her own preference or specific requirements; for example, if the user likes the robot cat, the character corresponding to the avatar to be generated may be set as the robot cat.
The limb motion of the avatar is similar to the limb motion of the target object and can be represented by the position information of the preset joint points of the avatar character, where the preset joint points of the avatar are the same as the preset joint points of the target object. Alternatively, the limb motion of the avatar may be represented by the relative positional relationship between the preset limbs of the avatar character, the preset limbs of the avatar being the same as the preset limbs of the target object.
That the limb motion of the avatar is consistent with the limb motion of the target object means that the position and angle of each limb of the avatar relative to the avatar are consistent with the position and angle of the corresponding limb of the target object relative to the target object. In other words, for each limb of the avatar, its direction information relative to a certain standard reference direction (e.g., the horizontal direction) is the same as the direction information of the corresponding limb of the target object relative to that standard reference direction.
For example, assume the standard reference direction is the horizontal direction. If the limb motion information of the target object detected in the target image includes the direction information of the right upper arm and the right forearm as deviated 90 degrees downward to the right relative to the horizontal standard reference direction, the limb motion corresponding to the right arm of the target object is: hanging down. Then the limb motion information corresponding to the right upper arm and the right forearm of the avatar is likewise deviated 90 degrees downward to the right relative to the horizontal standard reference direction, and the limb motion corresponding to the right arm of the avatar is also: hanging down. Similarly, the direction information of each limb of the avatar is the same as the direction information of the corresponding limb of the target object, so that the limb motion of the generated avatar is consistent with the limb motion of the target object.
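A small sketch of the direction computation implied by this example, under the assumption that image coordinates have y growing downward; the function name and sign convention are illustrative, not taken from the patent.

```python
import math


def limb_direction_degrees(proximal, distal):
    """Angle of the limb segment from the proximal joint to the distal joint,
    measured against the horizontal reference direction (degrees). The y axis
    is flipped so that 'up' is positive in image coordinates."""
    dx = distal[0] - proximal[0]
    dy = distal[1] - proximal[1]
    return math.degrees(math.atan2(-dy, dx))


# Example from the text: a right arm hanging straight down is about -90 degrees
# relative to the horizontal; the avatar's right upper arm and forearm are posed
# with the same direction so the two limb motions stay consistent.
print(limb_direction_degrees((250.0, 200.0), (250.0, 300.0)))  # -> -90.0
```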
It can be understood that, on platforms such as live network broadcasting and psychological counseling, if a user does not want to reveal his or her face to the other party, the method provided by this embodiment can be used to generate an avatar that faces the other party in place of the user. Because the limb motion of the generated avatar is consistent with the limb motion of the user in real time, the other party can judge the limb motion of the user through the limb motion of the avatar, and can make inferences and judgments from the avatar's limb motion and its changes, thereby further understanding the user's behavior.
By the method provided by the embodiment, the limb actions of the target object in the target image can be mapped to the virtual image by only providing the target image by using the image acquisition equipment, and the virtual image consistent with the limb actions of the target object is generated. Compared with the method for generating the virtual image by using special equipment to realize the mapping from the limb action of the target object to the virtual image, the method for generating the virtual image reduces the cost for generating the virtual image and enables the generation of the virtual image to be simpler and more convenient.
In particular, the limb motion information of the target object can be detected from the target image by a model preset in the device for generating the avatar. Depending on the number of target objects in the target image, the device for generating the avatar adopts different preset models for detecting the limb motion information of the target objects.
As an example, when the target image includes one target object, the apparatus for generating the avatar employs a preset first model for detecting limb movement information of the target object.
The preset first model is a model preset in the device for generating the avatar. The preset first model adopts a convolutional neural network algorithm; its input is the target image, and its output is the position information of the preset joint points of the target object in the target image. Therefore, the preset first model is used for representing the correspondence between the target image and the position information of the preset joint points of the target object in the target image.
The preset first model is a model obtained by training a preset convolutional neural network with a large number of training samples. That is, a convolutional neural network and a large number of first training images are known, where each first training image contains only one person and the position information of the person's preset joint points in the first training image is marked as known position information. The specific training process is as follows: each of the first training images is taken as the input of the preset convolutional neural network, the known position information of the preset joint points of the person corresponding to that first training image is taken as the output of the preset convolutional neural network, and the preset convolutional neural network is continuously adjusted; the convolutional neural network obtained after the training is finished is the preset first model.
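A minimal PyTorch-style sketch of the kind of training described above, assuming a toy CNN that directly regresses the 2D coordinates of 12 preset joint points; the architecture, hyperparameters and names are assumptions made for illustration and stand in for the patent's preset convolutional neural network.

```python
import torch
import torch.nn as nn

NUM_JOINTS = 12  # the preset joint points mentioned in the text


class JointRegressor(nn.Module):
    """A deliberately tiny CNN mapping an RGB image to (x, y) coordinates of
    the preset joint points; a stand-in for the 'preset first model'."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, NUM_JOINTS * 2)

    def forward(self, x):
        f = self.features(x).flatten(1)
        return self.head(f).view(-1, NUM_JOINTS, 2)


def train(model, loader, epochs=10):
    """loader yields (image, joints) pairs: the first training images and the
    annotated joint positions described in the text."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for images, joints in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), joints)
            loss.backward()
            opt.step()
```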
When the apparatus for generating the avatar includes a preset first model and the acquired target image includes a target object, detecting limb motion information representing the target object from the target image may include:
step 102A, processing the target image by adopting a preset first model, identifying the position information of a preset joint point in the target image, and taking the position information of the preset joint point as limb action information representing a target object.
In specific implementation, the device for generating the virtual image inputs the target image into a trained preset first model, the preset first model processes the target image, the position information of a preset joint point of a target object in the target image is output, and the output position information of the preset joint point is the detected limb action information of the target object.
It can be understood that when the preset first model is adopted to detect the limb motion information of the target object, the preset joint points obtained for the target object are of the same types as the preset joint points in the preset first model. For example, if the output of the preset first model includes the position information of 12 preset joint points, namely the left wrist, left elbow, left shoulder, right shoulder, right elbow, right wrist, left hip, left knee, left ankle, right hip, right knee and right ankle, then the detected limb motion information of the target object in the target image only includes the position information of these 12 preset joint points.
For example, assume that the target image is image 1, image 1 includes a target object A, and the preset first model is the trained model 1. Image 1 is input into model 1, which outputs the position information of the 12 preset joint points of A in image 1, namely (X1, Y1), (X2, Y2), …, (X12, Y12). The position information of these 12 preset joint points is the limb motion information of A detected from image 1, and the limb motion of A can be determined according to the 12 coordinate values.
If the target image includes only one target object, the position information of the preset joint points of the target object in the target image can be determined through the preset first model and used as the detected limb motion information of the target object; this limb motion information can represent the limb motion of the target object and prepares for the subsequent generation of the avatar.
As another example, when the target image includes at least two targets, the apparatus for generating the avatar employs a preset second model and a preset third model to implement detection of limb movement information of the targets.
The preset second model is a model preset in the device for generating the avatar. The preset second model adopts a convolutional neural network algorithm; its input is the target image, and its output is the position information of all preset joint points of the plurality of target objects in the target image. Therefore, the preset second model is used for representing the correspondence between the target image and the position information of the preset joint points of the target objects in the target image.
Unlike the training samples used for training the preset first model, the second training images in the training samples used for training the preset second model are images that each include a plurality of training objects, and the marked position information of the preset joint points in a second training image is the position information of the preset joint points of all training objects in that second training image. Otherwise, the training process of the preset second model is essentially the same as that of the preset first model and is not described here again. The training object may be a person, an animal, or a movable doll.
When the apparatus for generating the avatar includes a preset second model and the acquired target image includes a plurality of targets, step 102 may include:
step 102B1, processing the target image by using a preset second model, and identifying the position information of each preset node in the target image, where the preset second model is obtained by using a preset convolutional neural network algorithm and is used for representing the corresponding relationship between the image and the position information of the preset node in the image.
In specific implementation, the device for generating the avatar inputs the target image into a trained preset second model, the preset second model processes the target image, and the position information of a preset articulation point of a target object in the target image is output.
It can be understood that, assuming that the number of preset joint points of each target object in the preset second model is n, if the target image includes m (m > 1) target objects, the position information obtained through the preset second model covers n × m preset joint points of all target objects in the target image, and the n preset joint points on each target object are of the same types.
For example, assume that the target image is image 2, image 2 includes the target objects A and B, and the preset second model is the trained model 2. Image 2 is input into model 2, which outputs the position information of the 24 preset joint points of A and B in image 2, namely (X1, Y1), (X2, Y2), …, (X24, Y24).
Obviously, since the target image includes a plurality of target objects, the position information of the preset joint points obtained through the preset second model in step 102B1 consists of the coordinate values of joint points belonging to several target objects. From these coordinate values alone, it cannot be determined which coordinate values are the preset joint points belonging to which target object in the target image, that is, the limb motion information characterizing each of the plurality of target objects cannot yet be determined.
At this time, the apparatus for generating the avatar also needs to classify the obtained position information of all preset joint points by using a preset third model, and identify the position information of the preset joint points belonging to the same target object.
The preset third model is a model preset in the device for generating the avatar. The preset third model adopts a part affinity field algorithm; its input is the target image, and its output is the affinity between the preset joint points in the target image. Therefore, the preset third model is used for representing the correspondence between the target image and the affinity between the preset joint points of the target objects in the target image.
The affinity between preset joint points is used for representing how closely two preset joint points are associated; generally, the value of the affinity is between 0 and 1. The affinity indicates the probability that the two preset joint points corresponding to it belong to the same target object.
The preset third model is a model obtained by training a preset part affinity field algorithm with a large number of training samples. That is, a part affinity field algorithm and a large number of second training images are known, where each second training image contains a plurality of persons, and the affinities between the preset joint points of the training objects in the second training image are marked as known affinities. The specific training process is as follows: each of the second training images is taken as the input of the preset part affinity field algorithm, the known affinities between the preset joint points of the training objects corresponding to that second training image are taken as the output of the preset part affinity field algorithm, and the preset part affinity field algorithm is continuously adjusted; the part affinity field algorithm obtained after the training is finished is the preset third model.
When the apparatus for generating the avatar includes a preset third model and the acquired target image includes a plurality of targets, after step 102B1, it may include:
step 102B2, processing the target image by adopting a preset third model, and determining the intimacy between each preset joint point in the target image. The preset third model is obtained by adopting a partial association field algorithm and is used for representing the corresponding relation of the image and the intimacy between preset joint points in the image.
In specific implementation, the device for generating the avatar inputs the target image into a trained preset third model, the preset third model processes the target image, and the affinity between preset joint points of all targets in the target image is output to represent the possibility that each preset joint point and other preset joint points belong to the same target object.
It will be appreciated that when the preset third model is used to detect the intimacy between preset joints of the target object, the intimacy between each preset joint in the target image and all preset joints except the preset joint can be obtained, i.e. the number of acquired affinities of each preset joint is one less than that of all preset joints.
For example, assume that the target image is image 2, the image 2 includes the targets a and b, and the preset third model is the trained model 3. The image 2 is input into the model 3, and the intimacy a between 24 preset joint points A and B in the image 2 is output i,j Wherein i=1, 2, …, 24, j=1, 2, …, 24, a i,j Indicating the affinity between the preset joint point i and the preset joint point j.
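One common way to realize such an affinity with a part affinity field is to average the field, projected onto the direction between two candidate joint points, along the segment connecting them. The sketch below assumes this formulation (the patent does not spell it out) and uses NumPy; the names and sampling count are illustrative.

```python
import numpy as np


def affinity(paf, p_i, p_j, num_samples=10):
    """Score how likely joint candidates p_i and p_j belong to the same target
    object: average the part affinity field (paf, shape H x W x 2), projected
    onto the unit vector from p_i to p_j, at points sampled along the segment.
    This mirrors the affinity a(i, j) in the example above."""
    p_i, p_j = np.asarray(p_i, float), np.asarray(p_j, float)
    v = p_j - p_i
    norm = np.linalg.norm(v)
    if norm < 1e-6:
        return 0.0
    v = v / norm
    scores = []
    for t in np.linspace(0.0, 1.0, num_samples):
        x, y = p_i + t * (p_j - p_i)
        scores.append(np.dot(paf[int(round(y)), int(round(x))], v))
    return float(np.mean(scores))
```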
It should be noted that, the step 102B2 may be performed before the step 102B1, or may be performed simultaneously with the step 102B1, which does not affect the implementation of the present embodiment.
When all preset joint points corresponding to all target objects in the target image have been identified by using the preset second model, and the affinity between the preset joint points in the target image has been determined by using the preset third model, the two parts of data thus obtained can be used to determine the position information of the joint points of each target object in the target image as the detected limb motion information of that target object. In this implementation manner, the method further includes:
Step 102B3, determining the position information of the preset joint points belonging to each target object according to the affinity between the preset joint points, and using the position information as the limb motion information representing that target object.
In specific implementation, the results obtained by executing steps 102B1 and 102B2 are used to determine, from all the preset joint points of the target objects in the target image, which preset joint points belong to each target object, that is, to determine the position information of the preset joint points belonging to each target object. The position information of the preset joint points of each target object can represent the limb motion information of that target object.
As an example, all preset joint points may be constructed as a fully connected graph, and the position information of the preset joint points belonging to the same target object may be determined by means of maximum weight bipartite matching.
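As a hedged sketch of this grouping step, the snippet below uses SciPy's Hungarian solver to compute a maximum weight bipartite matching between two sets of candidate joint points, negating the affinities because the solver minimizes cost; treating the problem pairwise per connected joint type is an assumption for illustration, not a detail quoted from the patent.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_joint_pairs(affinity_matrix):
    """Given affinities between candidate joints of two connected types
    (e.g. all detected right elbows vs. all detected right wrists), return the
    pairing with maximum total affinity."""
    rows, cols = linear_sum_assignment(-np.asarray(affinity_matrix))
    return list(zip(rows, cols))


# Example: two target objects, so two right-elbow and two right-wrist candidates.
pairs = match_joint_pairs([[0.9, 0.1],
                           [0.2, 0.8]])
print(pairs)  # -> [(0, 0), (1, 1)]: each elbow is grouped with its own wrist
```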
Therefore, when the target image includes a plurality of target objects, the limb motions of the target objects in the target image can be determined by combining the output results obtained from the preset second model and the preset third model.
Whether the target image comprises one target object or a plurality of target objects, the limb action information used for representing the target objects in the target image can be detected by utilizing a model preset in the device for generating the virtual image, so that preparation is made for the subsequent generation of the virtual image.
After detecting the limb movement information of the target object, the apparatus for generating the avatar may execute step 103, and generate the avatar according to the detected limb movement information. In a specific implementation, the step 103 may specifically include:
acquiring an image role corresponding to a target object;
and determining limb model data of the avatar character according to the limb motion information of the target object, and generating an avatar consistent with the limb motion of the target object.
The limb model of an avatar character refers to a model of the minimum limb structure capable of representing the limb motion of the avatar character, and consists of two preset joint points with a direct connection relation and the line segment connecting them. For example, if the avatar character corresponding to the target object is a little bear, the limb models of that avatar character are: left forearm, left upper arm, right forearm, right upper arm, left thigh, left calf, right thigh, right calf. The right forearm of the little bear consists of the preset joint point corresponding to the little bear's right wrist, the preset joint point corresponding to its right elbow, and the connecting line between the two preset joint points.
The apparatus for generating an avatar can generate avatars with different limb motions by combining different poses of the limb models of the avatar character. For example, if the right forearm and the right upper arm of the little bear point to the right and are nearly horizontal, the limb motion of the little bear's right arm can be determined as: the right arm is raised to the horizontal.
In some possible implementations, according to the limb motion information of the target object, limb model data of the avatar character is determined, and an avatar consistent with the limb motion of the target object is generated, which specifically may be:
according to the limb motion information of the target object, determining the limb model data of the avatar character corresponding to the limb motion information of the target object from a preset avatar library, wherein the preset avatar library comprises the corresponding relation between the limb motion information of the target object and the limb model data of the avatar character;
and combining the limb model data of the determined avatar character to generate an avatar consistent with the limb action of the target object.
It is understood that an avatar library may be established in advance for storing the correspondence between limb motion information of a person and the limb model data of each avatar character. That is, the preset avatar library includes a plurality of sets of correspondences, each set being limb motion information, an avatar character, and the limb model data of that avatar character corresponding to the limb motion information. Thus, for one avatar character, there is limb model data at each angle, corresponding to a plurality of different pieces of limb motion information. It will be appreciated that, within the same correspondence, the limb of the target object and the limb of the avatar character are of the same type, for example both are arms, or both are legs.
For example, the preset avatar library includes the correspondence between the data of a person's arms under different motions and the arm model data of the girl warrior at various angles, and the correspondence between the data of a person's legs under different motions and the leg model data of the girl warrior at various angles. For another example, the preset avatar library also includes the correspondence between the data of a person's arms under different motions and the arm model data of the little bear at various angles, and the correspondence between the data of a person's legs under different motions and the leg model data of the little bear at various angles.
In specific implementation, according to the limb motion information of a target object: in a first step, a first data set related to the selected avatar character is searched from the pre-established avatar library; in a second step, for each piece of the target object's limb motion information, a second data set related to the limb corresponding to that piece of limb motion information is determined from the first data set; in a third step, the limb model data of the avatar character corresponding to the target object's limb motion information (for example, with a consistent angle) is searched from the second data set according to the correspondence between the limb motion information of the target object and the limb model data of the avatar character. The second and third steps are executed for each limb of the avatar to determine the limb model data corresponding to each limb, and finally the limb model data determined for all limbs are combined together to generate the avatar consistent with the limb motion of the target object.
For example, assume that the selected avatar character is the girl warrior, and the obtained limb motion information of the target object is: a standing posture. First, the first data related to the girl warrior is obtained from the avatar library. For the left arm, the second data related to the left arm is searched within the obtained first data related to the girl warrior, and the limb model data corresponding to the vertically downward posture of the left arm is determined within that second data. Similarly, for the right arm, the second data related to the right arm is searched within the obtained first data related to the girl warrior, and the limb model data corresponding to the vertically downward posture of the right arm is determined within that second data. By analogy, the limb model data of all limbs of the girl warrior consistent with the limb motion of the target object are determined. Finally, all of the determined limb model data are combined together to generate a girl warrior in the "standing posture".
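An illustrative sketch of such a preset avatar library and lookup follows; the dictionary keys, pose labels and file names are invented for the example and are not taken from the patent.

```python
# Illustrative structure for the preset avatar library: for each avatar character
# and each limb, limb model data indexed by a (quantised) pose label derived from
# the detected limb motion information.
avatar_library = {
    ("girl_warrior", "left_arm"): {
        "down_vertical": "girl_warrior_left_arm_down.obj",
        "raised_horizontal": "girl_warrior_left_arm_horizontal.obj",
    },
    # ... entries for the other limbs and other characters (little bear, etc.)
}


def build_avatar(character, limb_poses):
    """limb_poses maps each limb name to a pose label; the returned dict holds
    the limb model data to be combined into the final avatar."""
    return {limb: avatar_library[(character, limb)][pose]
            for limb, pose in limb_poses.items()}


parts = build_avatar("girl_warrior", {"left_arm": "down_vertical"})
print(parts)  # -> {'left_arm': 'girl_warrior_left_arm_down.obj'}
```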
In other possible implementations, when the target image includes a target object, step 103 specifically includes: firstly, acquiring an image role corresponding to a target object; secondly, determining limb model data corresponding to the image role according to detected limb motion information of the target object, namely position information of a preset joint point of the target object; and thirdly, determining the generated avatar according to the limb model data as the generated avatar consistent with the limb action of the target object.
For example, as shown in Fig. 3, assume that the avatar character corresponding to the target object A is set as a little bear. First, the limb models of the little bear are obtained: left forearm, left upper arm, right forearm, right upper arm, left thigh, left calf, right thigh, right calf. Based on the detected limb motion information A1 (X1, Y1), A2 (X2, Y2), …, A12 (X12, Y12), it is determined that A's right arm is raised up to the right at 15 degrees from the horizontal direction, A's left arm hangs down to the left at 45 degrees from the horizontal direction, and so on. Further, the data of the left and right arms of the corresponding little bear can be determined, that is, the little bear's right arm is raised up to the right at 15 degrees from the horizontal, its left arm hangs down to the left at 45 degrees from the horizontal, and the other limb model data of the little bear can be determined likewise. As shown in the image on the right side of Fig. 3, a little bear consistent with A's limb motion is generated from all the determined limb model data of the avatar character.
In another case, when the target image includes a plurality of targets, the specific implementation procedure of step 103 is: the method comprises the steps that firstly, a limb model of an image role corresponding to each object in n objects is obtained; secondly, determining limb model data of an image character corresponding to a first object according to the detected limb motion information of the first object in the limb motion information; thirdly, determining the generated virtual image according to the limb model data, wherein the generated virtual image is used as a first virtual image which is generated and is consistent with the limb action of a first target object; similarly, the operations of the second and third steps are performed for each object until the avatars corresponding to the n objects are determined.
For example, as shown in Fig. 4, the target image 2 includes the target objects A and B. The avatar character corresponding to A is set in advance as a little bear, and the avatar character corresponding to B is set as a girl warrior. First, the limb models of the little bear and of the girl warrior are obtained: left forearm, left upper arm, right forearm, right upper arm, left thigh, left calf, right thigh, right calf. Based on the detected limb motion information B1 (X1, Y1), B2 (X2, Y2), …, B12 (X12, Y12) of A, the position information of the little bear's limb models is determined in the implementation shown in Fig. 3, and the generated little bear is consistent with A's limb motion. Then, based on the detected limb motion information C1 (Z1, W1), C2 (Z2, W2), …, C12 (Z12, W12) of B, the limb model data of the girl warrior are determined, and a girl warrior consistent with B's limb motion is generated. Finally, the generated girl warrior and little bear are combined to determine the avatar generated from the target image 2.
In addition, if the proportional relationship between the limbs of the set avatar character is consistent with the proportional relationship between the corresponding limbs of the target object, generating the avatar according to the detected limb motion information in step 103 may specifically include: forming a first triangle from three preset joint points of the target object that are not on the same straight line; forming a second triangle from the three corresponding preset joint points of the avatar character. Since the first triangle and the second triangle are similar and the similarity ratio is known, the position information of the other two preset joint points can be determined from the position information of one known preset joint point among the three preset joint points of the avatar character. Similarly, following this implementation, the position information of all preset joint points of the avatar character can be determined, so that an avatar consistent with the limb motion of the target object is generated.
For example, assume that the position information of three of the preset joint points of the target object is A1(25, 185), B1(50, 180), C1(70, 190), where A1→B1→C1 connected in sequence form the right arm of the target object. Assume further that the avatar character set for the target object is a little bear, that the three joint points corresponding to the right arm of the little bear are A2, B2, C2, where A2 is (25, 185), and that the known ratio of the upper limb of the target object to that of the little bear is 5:1. Since triangle A1B1C1 and triangle A2B2C2 are similar with similarity ratio 5, it can be calculated that B2 is (30, 184) and C2 is (34, 186). In the same way, the position information of all preset joint points of the avatar can be determined, so that the avatar is generated.
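The similar-triangle calculation can be sketched as follows in Python; this is only an illustration under the assumptions of the example above (a known anchor joint and a known similarity ratio), not the patent's implementation:

```python
def map_joints_by_similarity(target_joints, avatar_anchor, ratio):
    """target_joints: (x, y) joint positions of the target object, the first of
    which corresponds to avatar_anchor; ratio: target-to-avatar similarity ratio."""
    ax0, ay0 = target_joints[0]
    bx0, by0 = avatar_anchor
    mapped = [avatar_anchor]
    for x, y in target_joints[1:]:
        # Offsets from the anchor joint shrink by the similarity ratio.
        mapped.append((bx0 + (x - ax0) / ratio, by0 + (y - ay0) / ratio))
    return mapped

# Reproduces the example: B2 = (30, 184), C2 = (34, 186).
print(map_joints_by_similarity([(25, 185), (50, 180), (70, 190)], (25, 185), 5))
```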
It can be understood that when the target image includes a plurality of target objects, the avatar may still be generated in this implementation manner, that is, the above operation is performed for each avatar character in the avatar, so that each avatar character is consistent with the limb motion of its corresponding target object; this is not repeated here.
The avatar generated in this embodiment may be the avatar itself whose limb motion corresponds to the target object, or may be an image including the avatar. In the image including the avatar, the background of the image may be set by the user, or may be a default background of the device for generating the avatar, or may be a background in the target image, which is not specifically limited in this embodiment.
The above description is based on the scenario in which one target image is acquired. In practical applications, the acquired target image may also be a plurality of video images obtained from a target video. In that case, the operation of step 102 needs to be performed in turn for all video images in the target video, that is, the limb motion information of the target object is detected for each video image.
After all video images in the target video are detected, generating an avatar according to the motion information, wherein the avatar is consistent with the motion of the target object, specifically comprises the following steps:
and generating an animation video of the virtual image according to the limb motion information of the target object in the video images, wherein the virtual image in the animation video is consistent with the limb motion of the target object in the target video.
In a specific implementation, the process of generating the animated video of the avatar is as follows. In the first step, for each video image in the target video, the limb motion information of the target object detected in that video image is determined. In the second step, a first animation image including the avatar and corresponding to the first video image is generated according to the determined limb motion information of the target object in the first video image, where the limb motion of the avatar in the first animation image is consistent with the limb motion of the target object in the first video image; similarly, the second step is performed for each video image until corresponding animation images have been generated for all video images in the target video. In the third step, the animation images generated in the second step are combined into an animation video according to the time order of their corresponding video images in the target video; this animation video is the animation video including the avatar that is generated from the target video.
It can be appreciated that the limb motion of the avatar in each animated image of the generated animated video is consistent with the limb motion of the object in the video image corresponding to that animated image.
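The per-frame pipeline can be outlined as follows; the two helper functions are hypothetical stand-ins for the detection of step 102 and the avatar generation of step 103, so this is a structural sketch rather than the actual implementation:

```python
def detect_limb_info(video_image):
    # Stand-in for step 102: return the preset joint positions in this frame.
    return []

def render_avatar_frame(limb_info):
    # Stand-in for step 103: return an animation image of the avatar.
    return {"limbs": limb_info}

def video_to_animation(video_frames):
    """Map each video image to an animation image, preserving the frame order."""
    animation_frames = []
    for frame in video_frames:                       # frames in time order
        limb_info = detect_limb_info(frame)          # per-frame detection
        animation_frames.append(render_avatar_frame(limb_info))
    return animation_frames                          # combined, these form the animation video

print(len(video_to_animation(["frame1", "frame2", "frame3"])))  # 3 frames in, 3 frames out
```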
By using the method provided in this embodiment, the server can, for each video image in the target video, map the limb motion of the target object onto the avatar character and obtain an animation image including an avatar consistent with the limb motion of the target object, thereby realizing the mapping from the target video to an animation video. Compared with mapping the limb motion of the target object onto the avatar with dedicated equipment to generate the animation video, this method reduces the cost of generating the animation video and makes its generation simpler and more convenient.
In other application scenarios, the action of the target object may also be a facial expression, and the facial expression of the generated avatar is required to be consistent with the facial expression of the target object.
For example, in some live-streaming scenes, when the anchor uses the generated avatar to face the audience in place of the anchor, it is not enough for only the limb motion of the avatar to be kept consistent with the limb motion of the anchor; the facial expression of the avatar also needs to be consistent with the facial expression of the anchor, so that the audience can perceive the anchor's emotion and expression in real time through the generated avatar, improving the viewing experience of the audience.
In some implementations, detecting motion information characterizing a target object from a target image includes:
facial expression information characterizing the target object is detected from the target image.
The facial expression information can be embodied in various ways. In some examples, the facial expression information may be state data of the five sense organs of the target object, such as: eyes: closed, squinting, open, etc.; mouth: pouting, pursed, open, corners raised, etc. In other examples, the facial expression information may be a plurality of feature points of the five sense organs obtained with a face detection algorithm, together with the position information of those feature points. In still other examples, the facial expression information may be the position information of the feature points of the five sense organs together with the corresponding state data, for example, eyes: [(I1, J1), (I2, J2), (I3, J3), (I4, J4), closed], where (I1, J1), (I2, J2), (I3, J3), (I4, J4) are the position coordinates of 4 feature points of the eyes and "closed" indicates the state of the eyes.
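As a purely illustrative example of the third form (the field names and coordinates below are invented), such facial expression information might be held in a structure like the following:

```python
# Feature-point coordinates of each organ plus a state label (illustrative only).
facial_expression_info = {
    "eyes":  {"points": [(30, 52), (34, 51), (38, 52), (42, 53)], "state": "closed"},
    "mouth": {"points": [(33, 70), (36, 72), (39, 70)], "state": "corners raised"},
}

for organ, data in facial_expression_info.items():
    print(organ, data["state"], len(data["points"]), "feature points")
```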
As an example, the apparatus for generating an avatar may detect and recognize a face of a target object in a target image through a preset machine learning model to obtain facial expression information of the target object.
The preset machine learning model is a model that can determine the facial expression information of a person in an image from the image. Its input is an image, and its output is the facial expression information of the person in the image. This facial expression information may be used to characterize the facial expression of a person, such as smiling, laughing, or sadness.
The preset machine learning model is preset on the apparatus for generating the avatar and is obtained after training on a large number of training samples. The training samples are a plurality of third training images and the facial expression information of the target object in each third training image. That is, an initial machine learning model is given, along with a large number of third training images whose target objects' facial expression information is known. The specific training process is: each of the third training images is taken as the input of the initial machine learning model, the facial expression information of the target object in that third training image is taken as the output, and the preset initial machine learning model is adjusted continuously; the machine learning model obtained when training ends is the preset machine learning model.
In specific implementation, the facial expression information of the target object in the target image can be output by inputting the target image into the trained preset machine learning model.
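The training procedure described above can be sketched as a standard supervised loop; the toy network, tensor shapes, and random stand-in data below are assumptions made for illustration and do not reflect the patent's actual model:

```python
import torch
import torch.nn as nn

# Toy stand-in for the preset machine learning model: image in, 50 facial
# feature points (100 coordinates) out.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(8), nn.Flatten(),
    nn.Linear(8 * 8 * 8, 100),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

images = torch.rand(16, 3, 64, 64)   # stand-ins for the third training images
targets = torch.rand(16, 100)        # their known facial expression information

for _ in range(5):                   # repeatedly adjust the initial model
    optimizer.zero_grad()
    loss = loss_fn(model(images), targets)
    loss.backward()
    optimizer.step()
```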
For example, for the target image 1, assume that the preset machine learning model is model 4. The target image 1 is input into model 4, and the facial expression information of the target object A in the target image 1 is output: (I1, J1), (I2, J2), ..., (I50, J50). It can be understood that these coordinate points are distributed over the five sense organs of object A in the target image 1. The current facial expression of object A can be known by analyzing the obtained facial expression information.
After facial expression information of the target object in the target image is obtained, the virtual image is processed according to the facial expression information, so that the virtual image is consistent with the facial expression of the target object.
A facial expression, that is, the expression of various emotional states through changes in the state of the five sense organs such as the eyes and the mouth, is an important non-verbal means of interaction.
The facial expression of the avatar being consistent with the facial expression of the target object means that the emotion conveyed by the avatar's face is the same as that conveyed by the target object's face. For example, if the facial expression of the target object in the target image is smiling, the facial expression of the generated avatar is also smiling, indicating that the avatar remains consistent with the facial expression of the target object.
In a specific implementation, according to the detected facial expression information of the target object, the avatar generated in this embodiment may be further processed, so that the facial expression of the avatar is consistent with the facial expression of the target object.
For example, for the generated avatar 1, it is detected that the current facial expression information of the target object A corresponding to avatar 1 is (I1, J1), (I2, J2), ..., (I50, J50), and analysis of the detected facial expression information shows that the current facial expression of object A is sadness. Then, based on the detected facial expression information (I1, J1), (I2, J2), ..., (I50, J50), avatar 1 is further processed to obtain avatar 2, whose facial expression is sad and consistent with the current facial expression of object A.
Step 104 may be performed at any time between step 101 and step 105. Step 105 may be performed after step 103, or simultaneously with step 103. The specific times at which steps 104 and 105 are performed are not particularly limited in this embodiment.
It will be appreciated that steps 104 and 105 may be performed with respect to one target image, a plurality of target images, or a plurality of video images in a target video; and each target image may contain one target object or a plurality of target objects. How many target images there are, and how many target objects exist in each target image, does not affect the implementation of this embodiment.
In addition, other implementations are possible for generating an avatar consistent with the facial expression of the target object. As an example, the apparatus for generating an avatar may also pre-establish an avatar expression library for pre-storing the facial data of avatar characters under different expression types.
It can be appreciated that, in order to make the facial expression of the generated avatar consistent with the facial expression of the target object, the expression type characterizing the target object's facial expression (such as smiling or surprised) may first be detected from the target image; then, third data related to the selected avatar character is searched for in the pre-established avatar expression library; next, the five sense organ data corresponding to the detected expression type is determined from the third data; and finally, the determined five sense organ data is used as the facial data for generating the avatar, so that an avatar consistent with the facial expression of the target object is generated.
For example, assume that the selected avatar character is a young girl warrior. First, the expression type of the target object is detected, namely smiling; then, the data related to the young girl warrior is obtained from the avatar expression library; next, the five sense organ data corresponding to smiling is searched for within the obtained data related to the young girl warrior; finally, a "smiling" young girl warrior is generated using the determined five sense organ data, that is, an avatar consistent with the facial expression of the target object is generated.
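A toy sketch of such an avatar expression library lookup is given below; the library contents and keys are invented for illustration only:

```python
avatar_expression_library = {
    "young girl warrior": {
        "smile":    {"eyes": "curved", "mouth": "corners raised"},
        "surprise": {"eyes": "wide open", "mouth": "open"},
    },
    "little bear": {
        "smile": {"eyes": "curved", "mouth": "open"},
    },
}

def facial_data_for(character, expression_type):
    # Data related to the selected character, then the entry for this expression type.
    return avatar_expression_library[character][expression_type]

print(facial_data_for("young girl warrior", "smile"))
```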
After the limb motion of the target object in the target image has been mapped onto the avatar using the method provided in this embodiment, the generated avatar is further processed using the facial expression of the target object in the target image to obtain the final avatar. Compared with mapping the limb motion of the target object onto the avatar with dedicated equipment, the method of this embodiment also maps the facial expression of the target object onto the generated avatar, which reduces the cost of generating the avatar and enhances the user experience.
Exemplary apparatus
Referring to fig. 5, an apparatus for generating an avatar in an embodiment of the present invention is shown. In this embodiment, the apparatus includes:
an acquiring module 501, configured to acquire a target image acquired by an image acquisition device;
the detection module 502 is configured to detect, from the target image, motion information that characterizes a target object;
and a generating module 503, configured to generate an avatar according to the action information, where the avatar is consistent with the action of the target object.
Optionally, the detection module includes:
and the first detection unit is used for detecting limb motion information representing the target object from the target image.
Optionally, when the target image includes one of the targets, the detecting unit includes:
the first identification subunit is used for processing the target image by adopting a preset first model, identifying the position information of a preset joint point in the target image, and taking the position information of the preset joint point as limb action information representing the target object; the preset first model is obtained by adopting a preset convolutional neural network algorithm and is used for representing the corresponding relation between the image and the position information of the preset articulation point in the image.
Optionally, when the target image includes at least two targets, the detection unit includes:
the second recognition subunit is used for processing the target image by adopting a preset second model, and recognizing the position information of each preset joint point in the target image, wherein the preset second model is obtained by adopting a preset convolutional neural network algorithm and is used for representing the corresponding relation between the image and the position information of the preset joint point in the image;
the first determining unit is used for processing the target image by adopting a preset third model, determining the intimacy between each preset joint point in the target image, wherein the preset third model is obtained by adopting a partial association field algorithm and is used for representing the correspondence between the image and the intimacy between the preset joint points in the image;
And the second determining unit is used for determining the position information of the preset joint points belonging to each target object according to the affinity between the preset joint points, and taking the position information as limb action information representing the target objects.
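To illustrate how such affinities can be used to assign joints to individual target objects, the following greedy matching sketch pairs detected joints by descending affinity; this is a simplified illustration rather than the part affinity field algorithm itself, and the toy affinity function is an assumption:

```python
import math

def group_joints(shoulders, elbows, affinity, threshold=0.5):
    """Pair each shoulder with at most one elbow, taking the highest-affinity
    candidate connections first."""
    candidates = sorted(((affinity(s, e), i, j)
                         for i, s in enumerate(shoulders)
                         for j, e in enumerate(elbows)), reverse=True)
    used_s, used_e, pairs = set(), set(), []
    for score, i, j in candidates:
        if score >= threshold and i not in used_s and j not in used_e:
            pairs.append((shoulders[i], elbows[j]))
            used_s.add(i)
            used_e.add(j)
    return pairs

# Toy affinity: nearer joints are more likely to belong to the same person.
aff = lambda a, b: 1.0 / (1.0 + math.dist(a, b) / 50.0)
print(group_joints([(10, 100), (200, 105)], [(15, 110), (205, 112)], aff))
```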
Optionally, the generating module includes:
the acquisition unit is used for acquiring the image roles corresponding to the target objects;
and the generating unit is used for determining the limb model data of the avatar character according to the limb action information of the target object and generating the avatar consistent with the limb action of the target object.
Optionally, the generating module further includes:
a third determining unit, configured to determine, according to the limb motion information of the target object, limb model data of the avatar character corresponding to the limb motion information of the target object from a preset avatar library, where the preset avatar library includes a correspondence between the limb motion information of the target object and the limb model data of the avatar character;
and the combining unit is used for combining the limb model data of the determined avatar character to generate the avatar consistent with the limb action of the target object.
Optionally, when the target image includes a plurality of video images obtained from a target video, the generating module is specifically configured to:
and generating an animation video of the virtual image according to the limb action information of the target object in the video images, wherein the virtual image in the animation video is consistent with the action of the target object in the target video.
Optionally, the detection module includes:
and the second detection unit is used for detecting facial expression information representing the target object from the target image.
The embodiment is an embodiment of a device corresponding to the embodiment of the method for generating an avatar, and specific implementation manner and achieved technical effects may refer to the description of the embodiment of the method for generating an avatar, which is not repeated herein.
Referring to fig. 6, apparatus 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
The processing component 602 generally controls overall operation of the apparatus 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operations at the device 600. Examples of such data include instructions for any application or method operating on the apparatus 600, contact data, phonebook data, messages, pictures, videos, and the like. The memory 604 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 606 provides power to the various components of the device 600. The power supply components 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 600.
The multimedia component 608 includes a screen providing an output interface between the apparatus 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the apparatus 600 is in an operational mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a microphone (MIC) configured to receive external audio signals when the apparatus 600 is in an operational mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 604 or transmitted via the communication component 616. In some embodiments, the audio component 610 further includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 614 includes one or more sensors for providing status assessments of various aspects of the apparatus 600. For example, the sensor assembly 614 may detect the on/off state of the apparatus 600 and the relative positioning of components, such as the display and keypad of the apparatus 600; the sensor assembly 614 may also detect a change in position of the apparatus 600 or of a component of the apparatus 600, the presence or absence of user contact with the apparatus 600, the orientation or acceleration/deceleration of the apparatus 600, and a change in temperature of the apparatus 600. The sensor assembly 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate wired or wireless communication between the apparatus 600 and other devices. The apparatus 600 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 616 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 600 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for executing the methods described above.
Specifically, an embodiment of the present invention provides an apparatus for generating an avatar, which may be embodied as an apparatus 600, including a memory 604, and one or more programs, wherein the one or more programs are stored in the memory 604 and configured to be executed by the one or more processors 620, the one or more programs including instructions for:
Acquiring a target image acquired by image acquisition equipment;
detecting action information representing a target object from the target image;
and generating an avatar according to the action information, wherein the avatar is consistent with the action of the target object.
Optionally, detecting motion information characterizing the target object from the target image includes:
and detecting limb motion information representing the target object from the target image.
Optionally, when the target image includes one of the targets, the detecting limb motion information characterizing the target from the target image includes:
processing the target image by adopting a preset first model, identifying the position information of a preset joint point in the target image, and taking the position information of the preset joint point as limb action information representing the target object; the preset first model is obtained by adopting a preset convolutional neural network algorithm and is used for representing the corresponding relation between the image and the position information of the preset articulation point in the image.
Optionally, when the target image includes at least two targets, the detecting limb motion information characterizing the targets from the target image includes:
Processing the target image by adopting a preset second model, and identifying the position information of each preset joint point in the target image, wherein the preset second model is obtained by adopting a preset convolutional neural network algorithm and is used for representing the corresponding relation between the image and the position information of the preset joint point in the image;
processing the target image by adopting a preset third model, and determining the affinity between the preset joint points in the target image, wherein the preset third model is obtained by adopting a part affinity field algorithm and is used for representing the correspondence between an image and the affinity between the preset joint points in the image;
and determining the position information of the preset joint points belonging to each target object according to the affinity between the preset joint points, and taking the position information as limb action information representing the target objects.
Optionally, the generating an avatar according to the motion information, where the avatar is consistent with the motion of the target object includes:
acquiring an image role corresponding to the target object;
and determining the limb model data of the avatar character according to the limb action information of the target object, and generating the avatar consistent with the limb action of the target object.
Optionally, the determining of the limb model data of the avatar character according to the limb motion information of the target object, and the generating of the avatar consistent with the limb motion of the target object, include:
according to the limb motion information of the target object, determining the limb model data of the avatar character corresponding to the limb motion information of the target object from a preset avatar library, wherein the preset avatar library comprises a correspondence between the limb motion information of the target object and the limb model data of the avatar character;
combining the determined limb model data of the avatar character to generate the avatar consistent with the limb action of the target object.
Optionally, when the target image includes a plurality of video images obtained from a target video, the generating an avatar according to the motion information, where the avatar is consistent with the motion of the target object includes:
and generating an animation video of the virtual image according to the limb action information of the target object in the video images, wherein the virtual image in the animation video is consistent with the action of the target object in the target video.
Optionally, detecting motion information characterizing the target object from the target image includes:
facial expression information characterizing the target object is detected from the target image.
Embodiments of the present invention also provide a non-transitory computer-readable storage medium, such as the memory 604, comprising instructions executable by the processor 620 of the apparatus 600 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer readable storage medium, which when executed by a processor of an electronic device, causes the electronic device to perform a method of generating an avatar, the method comprising:
acquiring a target image acquired by image acquisition equipment;
detecting action information representing a target object from the target image;
and generating an avatar according to the action information, wherein the avatar is consistent with the action of the target object.
Optionally, detecting motion information characterizing the target object from the target image includes:
And detecting limb motion information representing the target object from the target image.
Optionally, when the target image includes one of the targets, the detecting limb motion information characterizing the target from the target image includes:
processing the target image by adopting a preset first model, identifying the position information of a preset joint point in the target image, and taking the position information of the preset joint point as limb action information representing the target object; the preset first model is obtained by adopting a preset convolutional neural network algorithm and is used for representing the corresponding relation between the image and the position information of the preset articulation point in the image.
Optionally, when the target image includes at least two targets, the detecting limb motion information characterizing the targets from the target image includes:
processing the target image by adopting a preset second model, and identifying the position information of each preset joint point in the target image, wherein the preset second model is obtained by adopting a preset convolutional neural network algorithm and is used for representing the corresponding relation between the image and the position information of the preset joint point in the image;
Processing the target image by adopting a preset third model, and determining the affinity between the preset joint points in the target image, wherein the preset third model is obtained by adopting a part affinity field algorithm and is used for representing the correspondence between an image and the affinity between the preset joint points in the image;
and determining the position information of the preset joint points belonging to each target object according to the affinity between the preset joint points, and taking the position information as limb action information representing the target objects.
Optionally, the generating an avatar according to the motion information, where the avatar is consistent with the motion of the target object includes:
acquiring an image role corresponding to the target object;
and determining the limb model data of the avatar character according to the limb action information of the target object, and generating the avatar consistent with the limb action of the target object.
Optionally, the determining of the limb model data of the avatar character according to the limb motion information of the target object, and the generating of the avatar consistent with the limb motion of the target object, include:
according to the limb motion information of the target object, determining the limb model data of the avatar character corresponding to the limb motion information of the target object from a preset avatar library, wherein the preset avatar library comprises a correspondence between the limb motion information of the target object and the limb model data of the avatar character;
Combining the determined limb model data of the avatar character to generate the avatar consistent with the limb action of the target object.
Optionally, when the target image includes a plurality of video images obtained from a target video, the generating an avatar according to the motion information, where the avatar is consistent with the motion of the target object includes:
and generating an animation video of the virtual image according to the limb action information of the target object in the video images, wherein the virtual image in the animation video is consistent with the action of the target object in the target video.
Optionally, detecting motion information characterizing the target object from the target image includes:
facial expression information characterizing the target object is detected from the target image.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are intended to fall within the scope of the present invention.

Claims (12)

1. A method of generating an avatar, the method comprising:
Acquiring a target image acquired by image acquisition equipment;
detecting limb motion information representing a target object from the target image;
acquiring an image role corresponding to the target object according to the shape characteristics of the target object;
searching a first data set related to the character of the character from a preset virtual character library; in the virtual character library, for one character, limb data models of all angles exist, and the limb data models correspond to a plurality of different limb action information;
for each piece of limb motion information in the limb motion information of the target object, determining a second data set related to the limb corresponding to each piece of limb motion information from the searched first data set;
searching limb model data corresponding to each limb of the persona corresponding to each limb motion information of the target object from the second data set according to the corresponding relation between the limb motion information of the target object and the limb model data of the persona; the limb model is a model of a minimum limb structure representing limb actions of the character figure and is composed of two preset joint points with direct connection relation and connecting line segments thereof;
And combining the limb model data corresponding to all the limbs of the determined avatar character to generate the avatar consistent with the limb action of the target object.
2. The method of claim 1, wherein when the target image includes one of the targets, the detecting limb movement information characterizing the target from the target image includes:
processing the target image by adopting a preset first model, identifying the position information of a preset joint point in the target image, and taking the position information of the preset joint point as limb action information representing the target object; the preset first model is obtained by adopting a preset convolutional neural network algorithm and is used for representing the corresponding relation between the image and the position information of the preset articulation point in the image.
3. The method of claim 1, wherein when the target image includes at least two of the targets, the detecting limb movement information characterizing the targets from the target image comprises:
processing the target image by adopting a preset second model, and identifying the position information of each preset joint point in the target image, wherein the preset second model is obtained by adopting a preset convolutional neural network algorithm and is used for representing the corresponding relation between the image and the position information of the preset joint point in the image;
Processing the target image by adopting a preset third model, and determining the affinity between the preset joint points in the target image, wherein the preset third model is obtained by adopting a part affinity field algorithm and is used for representing the correspondence between an image and the affinity between the preset joint points in the image;
and determining the position information of the preset joint points belonging to each target object according to the affinity between the preset joint points, and taking the position information as limb action information representing the target objects.
4. A method according to any one of claims 1-3, wherein when the target image comprises a plurality of video images obtained from a target video, the generating an avatar according to the motion information, the avatar being consistent with the motion of the target object, comprises:
and generating an animation video of the virtual image according to the limb action information of the target object in the video images, wherein the virtual image in the animation video is consistent with the action of the target object in the target video.
5. The method of claim 1, wherein detecting motion information characterizing a target object from the target image comprises:
Facial expression information characterizing the target object is detected from the target image.
6. An apparatus for generating an avatar, the apparatus comprising:
the acquisition module is used for acquiring the target image acquired by the image acquisition equipment;
the detection module is used for detecting action information representing a target object from the target image;
the generation module is used for generating an virtual image according to the action information, and the virtual image is consistent with the action of the target object;
the detection module comprises:
a first detection unit for detecting limb movement information characterizing the target object from the target image;
the generation module comprises:
the acquisition unit is used for acquiring the image roles corresponding to the target object according to the physical characteristics of the target object;
a third determining unit, configured to search a preset avatar library for a first data set related to the avatar role; in the virtual character library, for one character, limb data models of all angles exist, and the limb data models correspond to a plurality of different limb action information; for each piece of limb motion information in the limb motion information of the target object, determining a second data set related to the limb corresponding to each piece of limb motion information from the searched first data set; searching limb model data corresponding to each limb of the persona corresponding to each limb motion information of the target object from the second data set according to the corresponding relation between the limb motion information of the target object and the limb model data of the persona; the limb model is a model of a minimum limb structure representing limb actions of the character figure and is composed of two preset joint points with direct connection relation and connecting line segments thereof;
And the combining unit is used for combining the limb model data corresponding to all the limbs of the determined avatar character to generate the avatar consistent with the limb action of the target object.
7. The apparatus of claim 6, wherein when the target image includes one of the targets, the first detecting unit includes:
the first identification subunit is used for processing the target image by adopting a preset first model, identifying the position information of a preset joint point in the target image, and taking the position information of the preset joint point as limb action information representing the target object; the preset first model is obtained by adopting a preset convolutional neural network algorithm and is used for representing the corresponding relation between the image and the position information of the preset articulation point in the image.
8. The apparatus according to claim 6, wherein when the target image includes at least two of the targets, the first detection unit includes:
the second recognition subunit is used for processing the target image by adopting a preset second model, and recognizing the position information of each preset joint point in the target image, wherein the preset second model is obtained by adopting a preset convolutional neural network algorithm and is used for representing the corresponding relation between the image and the position information of the preset joint point in the image;
The first determining unit is used for processing the target image by adopting a preset third model and determining the affinity between the preset joint points in the target image, wherein the preset third model is obtained by adopting a part affinity field algorithm and is used for representing the correspondence between an image and the affinity between the preset joint points in the image;
and the second determining unit is used for determining the position information of the preset joint points belonging to each target object according to the affinity between the preset joint points, and taking the position information as limb action information representing the target objects.
9. The apparatus according to any one of claims 6-8, wherein when the target image includes a plurality of video images obtained from a target video, the generating module is specifically configured to:
and generating an animation video of the virtual image according to the limb action information of the target object in the video images, wherein the virtual image in the animation video is consistent with the action of the target object in the target video.
10. The apparatus of claim 6, wherein the detection module comprises:
and the second detection unit is used for detecting facial expression information representing the target object from the target image.
11. An apparatus for generating an avatar, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method of any of claims 1-5.
12. A non-transitory computer readable storage medium, which when executed by a processor of an electronic device, causes the electronic device to perform the method of any one of claims 1 to 5.
CN201810339894.5A 2018-04-16 2018-04-16 Method and device for generating virtual image Active CN110390705B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810339894.5A CN110390705B (en) 2018-04-16 2018-04-16 Method and device for generating virtual image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810339894.5A CN110390705B (en) 2018-04-16 2018-04-16 Method and device for generating virtual image

Publications (2)

Publication Number Publication Date
CN110390705A CN110390705A (en) 2019-10-29
CN110390705B true CN110390705B (en) 2023-11-10

Family

ID=68283847

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810339894.5A Active CN110390705B (en) 2018-04-16 2018-04-16 Method and device for generating virtual image

Country Status (1)

Country Link
CN (1) CN110390705B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784622B (en) * 2019-11-01 2023-07-25 抖音视界有限公司 Image processing method and device, electronic equipment and storage medium
CN110889382A (en) * 2019-11-29 2020-03-17 深圳市商汤科技有限公司 Virtual image rendering method and device, electronic equipment and storage medium
CN113033242A (en) * 2019-12-09 2021-06-25 上海幻电信息科技有限公司 Action recognition method and system
CN113126746A (en) * 2019-12-31 2021-07-16 中移(成都)信息通信科技有限公司 Virtual object model control method, system and computer readable storage medium
CN111265879B (en) * 2020-01-19 2023-08-08 百度在线网络技术(北京)有限公司 Avatar generation method, apparatus, device and storage medium
CN111612876A (en) * 2020-04-27 2020-09-01 北京小米移动软件有限公司 Expression generation method and device and storage medium
CN113747113A (en) * 2020-05-29 2021-12-03 北京小米移动软件有限公司 Image display method and device, electronic equipment and computer readable storage medium
CN111638794A (en) * 2020-06-04 2020-09-08 上海商汤智能科技有限公司 Display control method and device for virtual cultural relics
CN112533017B (en) * 2020-12-01 2023-04-11 广州繁星互娱信息科技有限公司 Live broadcast method, device, terminal and storage medium
WO2022205167A1 (en) * 2021-03-31 2022-10-06 深圳市大疆创新科技有限公司 Image processing method and apparatus, mobile platform, terminal device, and storage medium
CN113112580B (en) * 2021-04-20 2022-03-25 北京字跳网络技术有限公司 Method, device, equipment and medium for generating virtual image
CN113325950B (en) * 2021-05-27 2023-08-25 百度在线网络技术(北京)有限公司 Function control method, device, equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101098241A (en) * 2006-06-26 2008-01-02 腾讯科技(深圳)有限公司 Method and system for implementing virtual image
CN102662472A (en) * 2012-04-10 2012-09-12 苏州中科启慧软件技术有限公司 Body movement based learning method and cloud service system thereof
CN105704507A (en) * 2015-10-28 2016-06-22 北京七维视觉科技有限公司 Method and device for synthesizing animation in video in real time
CN105759960A (en) * 2016-02-02 2016-07-13 上海尚镜信息科技有限公司 Augmented reality remote guidance method and system in combination with 3D camera
WO2016177290A1 (en) * 2015-05-06 2016-11-10 北京蓝犀时空科技有限公司 Method and system for generating and using expression for virtual image created through free combination
CN107219925A (en) * 2017-05-27 2017-09-29 成都通甲优博科技有限责任公司 Pose detection method, device and server
CN107341476A (en) * 2017-07-07 2017-11-10 深圳市唯特视科技有限公司 A kind of unsupervised manikin construction method based on system-computed principle
CN107340859A (en) * 2017-06-14 2017-11-10 北京光年无限科技有限公司 The multi-modal exchange method and system of multi-modal virtual robot
CN107613310A (en) * 2017-09-08 2018-01-19 广州华多网络科技有限公司 A kind of live broadcasting method, device and electronic equipment
CN107886069A (en) * 2017-11-10 2018-04-06 东北大学 A kind of multiple target human body 2D gesture real-time detection systems and detection method

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101098241A (en) * 2006-06-26 2008-01-02 腾讯科技(深圳)有限公司 Method and system for implementing virtual image
CN102662472A (en) * 2012-04-10 2012-09-12 苏州中科启慧软件技术有限公司 Body movement based learning method and cloud service system thereof
WO2016177290A1 (en) * 2015-05-06 2016-11-10 北京蓝犀时空科技有限公司 Method and system for generating and using expression for virtual image created through free combination
CN106204698A (en) * 2015-05-06 2016-12-07 北京蓝犀时空科技有限公司 Virtual image for independent assortment creation generates and uses the method and system of expression
CN105704507A (en) * 2015-10-28 2016-06-22 北京七维视觉科技有限公司 Method and device for synthesizing animation in video in real time
CN105759960A (en) * 2016-02-02 2016-07-13 上海尚镜信息科技有限公司 Augmented reality remote guidance method and system in combination with 3D camera
CN107219925A (en) * 2017-05-27 2017-09-29 成都通甲优博科技有限责任公司 Pose detection method, device and server
CN107340859A (en) * 2017-06-14 2017-11-10 北京光年无限科技有限公司 The multi-modal exchange method and system of multi-modal virtual robot
CN107341476A (en) * 2017-07-07 2017-11-10 深圳市唯特视科技有限公司 A kind of unsupervised manikin construction method based on system-computed principle
CN107613310A (en) * 2017-09-08 2018-01-19 广州华多网络科技有限公司 A kind of live broadcasting method, device and electronic equipment
CN107886069A (en) * 2017-11-10 2018-04-06 东北大学 A kind of multiple target human body 2D gesture real-time detection systems and detection method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields; Zhe Cao et al.; IEEE; see Section 3 *
Implementation of a Kinect-based motion capture system; Zheng Liguo; Luo Jianglin; Xu Ge; Journal of Jilin University (Engineering and Technology Edition) (Issue S1); full text *

Also Published As

Publication number Publication date
CN110390705A (en) 2019-10-29

Similar Documents

Publication Publication Date Title
CN110390705B (en) Method and device for generating virtual image
KR102661019B1 (en) Electronic device providing image including 3d avatar in which motion of face is reflected by using 3d avatar corresponding to face and method for operating thefeof
CN108525305B (en) Image processing method, image processing device, storage medium and electronic equipment
US9799133B2 (en) Facial gesture driven animation of non-facial features
US20230078483A1 (en) Controlling interactive fashion based on voice
CN112199016B (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
KR102148151B1 (en) Intelligent chat based on digital communication network
US11790614B2 (en) Inferring intent from pose and speech input
US11673054B2 (en) Controlling AR games on fashion items
US11983826B2 (en) 3D upper garment tracking
CN110121026A (en) Intelligent capture apparatus and its scene generating method based on living things feature recognition
US11900506B2 (en) Controlling interactive fashion based on facial expressions
CN110928411B (en) AR-based interaction method and device, storage medium and electronic equipment
JP6563580B1 (en) Communication system and program
CN111626105B (en) Gesture estimation method and device and electronic equipment
CN113766168A (en) Interactive processing method, device, terminal and medium
JP2023524119A (en) Facial image generation method, device, electronic device and readable storage medium
KR102345729B1 (en) Method and apparatus for generating video
CN111639615B (en) Trigger control method and device for virtual building
CN113158918A (en) Video processing method and device, electronic equipment and storage medium
US20230196712A1 (en) Real-time motion and appearance transfer
CN114821799A (en) Motion recognition method, device and equipment based on space-time graph convolutional network
US20240135581A1 (en) Three dimensional hand pose estimator
CN114245021B (en) Interactive shooting method, electronic equipment, storage medium and computer program product
WO2023151551A1 (en) Video image processing method and apparatus, and electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant