CN114527873A - Virtual character model control method and device and electronic equipment - Google Patents
- Publication number
- CN114527873A (application number CN202210126954.1A)
- Authority
- CN
- China
- Prior art keywords
- dimensional
- data
- key point
- fully connected network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
Abstract
According to the virtual character model control method and device and the electronic device provided by the present application, different first key point data sets are extracted for different limb parts and processed by different first prediction models to obtain first key point three-dimensional pose data corresponding to each limb part, and the groups of first key point three-dimensional pose data are then combined to control the same virtual character model to perform corresponding actions. In this way, the first key point three-dimensional pose data corresponding to different limbs are decoupled even when the prediction models have few training samples, so that erroneous limb linkage of the virtual character model is avoided when the groups of first key point three-dimensional pose data are used together to control the same virtual character model.
Description
Technical Field
The application relates to the technical field of image processing, in particular to a virtual character model control method, a virtual character model control device and electronic equipment.
Background
In some image processing scenarios, key point identification and prediction can be performed on a two-dimensional video image of a person to obtain three-dimensional pose data (such as spatial position coordinates and pose angles) of the person's limb key points for modeling or model control. For example, in some live broadcast scenes, the positions of human limb key points can be identified in a two-dimensional live video image acquired from an anchor terminal; three-dimensional pose data are then predicted from the coordinate data of those key points in the two-dimensional image, and the resulting three-dimensional pose data drive the corresponding virtual character model to imitate the anchor's actions. Predicting three-dimensional pose data from the two-dimensional coordinate data of the key points is usually performed by a machine learning model. However, because the number or diversity of its training samples is limited, the three-dimensional pose data predicted for relatively independent limbs may be over-coupled, so that the subsequent modeling or model control produces erroneous limb linkage and the modeling or control result is degraded.
Disclosure of Invention
In order to overcome the above disadvantages in the prior art, the present application aims to provide a virtual character model control method, including:
acquiring key point two-dimensional coordinate data of a target person from a two-dimensional image;
for at least two limb parts among the four limbs of a human body, respectively extracting at least two corresponding groups of first key point data sets from the key point two-dimensional coordinate data;
inputting the at least two groups of first key point data sets into at least two different first prediction models respectively for processing to obtain first key point three-dimensional pose data respectively corresponding to the at least two limb parts;
and controlling the same virtual character model to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points.
In a possible implementation manner, the step of extracting, for at least two limb parts among the four limbs of a human body, at least two corresponding groups of first key point data sets from the key point two-dimensional coordinate data respectively includes:
extracting, for each of the at least two limb parts, the key point two-dimensional coordinate data corresponding to the limb part and the key point two-dimensional coordinate data corresponding to the trunk part from the key point two-dimensional coordinate data as the first key point data set of the limb part.
In a possible implementation manner, the three-dimensional pose data of the first key point includes spatial position data and pose angle data of a preset joint point in a corresponding limb part.
In one possible implementation, the method further includes:
taking the key point two-dimensional coordinate data of the target person as a whole as a second key point data set;
inputting the second key point data set into a second prediction model for processing to obtain overall three-dimensional pose data, wherein the overall three-dimensional pose data comprises third key point three-dimensional pose data corresponding to the at least two limb parts and second key point three-dimensional pose data corresponding to the trunk part;
the step of controlling the same virtual character model to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points comprises the following steps:
and replacing the third key point three-dimensional pose data in the whole three-dimensional pose data by using the first key point three-dimensional pose data, and controlling the virtual character model to execute corresponding actions by using the replaced whole three-dimensional pose data.
In one possible implementation, the at least two limb portions comprise a left arm and a right arm, and the at least two different first predictive models comprise a left arm predictive model and a right arm predictive model;
the left arm prediction model comprises a left arm first fully connected network, a left arm second fully connected network and a left arm third fully connected network which are connected in sequence; the input of the left arm first fully connected network is the 44-dimensional first key point data set of the left arm, and the output of the left arm first fully connected network is 512-dimensional data; the input of the left arm second fully connected network is the 512-dimensional data output by the left arm first fully connected network, and the output of the left arm second fully connected network is 512-dimensional data; the input of the left arm third fully connected network is the 512-dimensional data output by the left arm second fully connected network, and the output of the left arm third fully connected network is the 12-dimensional first key point three-dimensional pose data of the left arm;
the right arm prediction model comprises a right arm first fully connected network, a right arm second fully connected network and a right arm third fully connected network which are connected in sequence; the input of the right arm first fully connected network is the 44-dimensional first key point data set of the right arm, and the output of the right arm first fully connected network is 512-dimensional data; the input of the right arm second fully connected network is the 512-dimensional data output by the right arm first fully connected network, and the output of the right arm second fully connected network is 512-dimensional data; the input of the right arm third fully connected network is the 512-dimensional data output by the right arm second fully connected network, and the output of the right arm third fully connected network is the 12-dimensional first key point three-dimensional pose data of the right arm;
the second prediction model comprises a first trunk fully connected network, a second trunk fully connected network and a third trunk fully connected network which are connected in sequence; the input of the first trunk fully connected network is the 48-dimensional second key point data set, and the output of the first trunk fully connected network is 512-dimensional data; the input of the second trunk fully connected network is the 512-dimensional data output by the first trunk fully connected network, and the output of the second trunk fully connected network is 512-dimensional data; the input of the third trunk fully connected network is the 512-dimensional data output by the second trunk fully connected network, and the output of the third trunk fully connected network is the 144-dimensional overall three-dimensional pose data.
In one possible implementation, the method further includes:
extracting a second key point data set comprising trunk part key points from the key point two-dimensional coordinate data;
inputting the second key point data set into a second prediction model for processing to obtain second key point three-dimensional pose data;
the step of controlling the same virtual character model to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points comprises the following steps:
and controlling the same virtual character model to execute corresponding actions by using the three-dimensional pose data of the first key point and the three-dimensional pose data of the second key point.
In a possible implementation manner, the step of obtaining two-dimensional coordinate data of key points of the target person from the two-dimensional image includes:
acquiring key point two-dimensional coordinate data of an anchor user from a first live video image of the anchor user; the first live video image is the two-dimensional image;
the step of controlling the same virtual character model to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points comprises the following steps:
and controlling a virtual character model corresponding to the anchor user to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points, so that the virtual character model executes actions similar to those of the anchor user.
Another object of the present application is to provide a virtual character model control apparatus, including:
the acquisition module is used for acquiring key point two-dimensional coordinate data of a target person from the two-dimensional image;
the extraction module is used for extracting, for at least two limb parts among the limbs of a human body, at least two corresponding groups of first key point data sets from the key point two-dimensional coordinate data respectively;
the prediction module is used for inputting the at least two groups of first key point data sets into at least two different first prediction models respectively for processing to obtain first key point three-dimensional pose data respectively corresponding to the at least two limb parts;
and the model control module is used for controlling the same virtual character model to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points.
Another object of the present application is to provide an electronic device, which includes a processor and a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions, and when the machine-executable instructions are executed by the processor, the method for controlling a virtual character model provided in the present application is implemented.
Another object of the present application is to provide a machine-readable storage medium storing machine-executable instructions, which when executed by one or more processors, implement the virtual character model control method provided by the present application.
Compared with the prior art, the present application has the following beneficial effects:
according to the virtual character model control method, the virtual character model control device and the electronic equipment, different first key point data sets are extracted aiming at different limb parts, different first prediction models are used for processing to obtain first key point three-dimensional pose data corresponding to the different limb parts, and then all groups of first key point three-dimensional pose data are synthesized to control the same virtual character model to execute corresponding actions. Therefore, decoupling between the first key point three-dimensional pose data corresponding to different limbs can be achieved under the condition that training samples of the prediction model are few, and therefore when the same virtual human model is controlled by using all groups of first key point three-dimensional pose data, wrong limb linkage of the virtual human model is avoided.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should therefore not be regarded as limiting the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic flow chart illustrating steps of a virtual character model control method according to an embodiment of the present application.
Fig. 2 is a schematic view of a live broadcast system provided in an embodiment of the present application.
Fig. 3 is a schematic view of an electronic device according to an embodiment of the present application.
Fig. 4 is a functional module schematic diagram of a virtual character model control device provided in the embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present application, it is noted that the terms "first", "second", "third", and the like are used merely for distinguishing between descriptions and are not intended to indicate or imply relative importance.
In the description of the present application, it should also be noted that, unless otherwise explicitly specified and limited, the terms "disposed," "mounted," "connected," and "coupled" are to be construed broadly: for example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediate medium, or internal between two elements. The specific meanings of the above terms in the present application can be understood by those of ordinary skill in the art on a case-by-case basis.
The inventors have found through research that the process of predicting the three-dimensional pose data of a person's limb key points from a two-dimensional video image of the person is usually performed by a trained machine learning model. During training of such a model, a person wearing three-dimensional pose sensors generally performs various actions; the three-dimensional pose data of the limb key points are collected from the sensors, while a two-dimensional image acquisition device (such as a camera) captures two-dimensional images from which the two-dimensional coordinates of the limb key points are obtained. The two-dimensional coordinates and the three-dimensional pose data of each limb key point are then used as training samples to train the machine learning model to predict the three-dimensional pose data of the limb key points from their two-dimensional coordinates.
In this approach, because the number or diversity of the training samples is limited, the training set may contain a large number of scenes in which different limbs move simultaneously but few scenes in which a single limb moves alone. For example, there may be many samples in which the left and right hands move together and few in which only one hand moves. The three-dimensional pose predictions output by the trained machine learning model are then biased toward simultaneous movement of different limbs, and the three-dimensional pose data of different limbs are over-coupled, so that erroneous limb linkage occurs in subsequent model reconstruction or model control. For example, the actual two-dimensional image may contain only left-hand motion, yet the machine learning model predicts left-hand motion together with slight motion of the right hand.
In view of the above findings, the present embodiment provides a solution that can reduce the occurrence of erroneous limb linkage of the virtual character model. The solution provided by the present embodiment is explained in detail below.
Referring to fig. 1, fig. 1 is a flowchart of the virtual character model control method provided in this embodiment. The steps of the method are described in detail below.
Step S110, two-dimensional coordinate data of key points of the target person is acquired from the two-dimensional image.
In this embodiment, the limb key points may correspond to the limb joints of the target person, such as the shoulders, elbows, and wrists. In a possible implementation manner, a pre-trained key point recognition model may perform image recognition on a two-dimensional image (e.g., a two-dimensional video image acquired through a camera) containing the target person, so as to determine the position of each limb key point of the target person in the two-dimensional image and thereby obtain the key point two-dimensional coordinate data of the target person.
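The patent does not tie step S110 to any particular key point recognition model. As a minimal illustrative sketch of this step, the following Python snippet uses the off-the-shelf MediaPipe Pose detector purely as a stand-in; any 2D human pose estimator that outputs per-joint image coordinates would serve, and the joint numbering is the detector's, not the patent's.

```python
import cv2
import mediapipe as mp

def get_keypoint_2d_coords(image_bgr):
    """Detect one person's limb key points; return {joint_index: (x, y)} in pixels."""
    with mp.solutions.pose.Pose(static_image_mode=True) as pose:
        result = pose.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    if result.pose_landmarks is None:
        return {}  # no person detected in this image
    h, w = image_bgr.shape[:2]
    # MediaPipe landmarks are normalized to [0, 1]; scale to pixel coordinates.
    return {i: (lm.x * w, lm.y * h)
            for i, lm in enumerate(result.pose_landmarks.landmark)}
```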
Step S120, for at least two limb parts among the four limbs of the human body, respectively extracting at least two corresponding groups of first key point data sets from the key point two-dimensional coordinate data.
In this embodiment, according to the result of identifying the key points of the human body in the two-dimensional image, at least two corresponding first key point data sets may be extracted from the key point two-dimensional coordinate data for at least two human limb parts capable of moving independently. For example, the left arm (including the left shoulder, left upper arm, left lower arm, and left hand) and the right arm (including the right shoulder, right upper arm, right lower arm, and right hand) are two limbs that can move relatively independently; in this embodiment, at least one first key point data set corresponding to the left arm and one first key point data set corresponding to the right arm may be extracted from the key point two-dimensional coordinate data. The first key point data set may include the two-dimensional coordinate data of the key points corresponding to the key joints of a limb part; for example, the first key point data set of the left arm includes at least the key point two-dimensional coordinate data of the left elbow and the left wrist, and the first key point data set of the right arm includes at least the key point two-dimensional coordinate data of the right elbow and the right wrist.
Optionally, in this embodiment, each first key point data set may correspond to a different limb part. For example, the first key point data set corresponding to the left arm may exclude the coordinate data of the key points corresponding to the right arm, and the first key point data set corresponding to the right arm may exclude the coordinate data of the key points corresponding to the left arm.
Step S130, the at least two groups of first key point data sets are respectively input into at least two different first prediction models to be processed, and first key point three-dimensional pose data respectively corresponding to the at least two limb parts are obtained.
In this embodiment, the at least two different first prediction models are machine learning models that may not share network parameters. It will be appreciated that in some cases the at least two different first predictive models may have the same model network structure, but may have different model parameters depending on the training samples.
In this embodiment, the first key point data sets corresponding to different limb parts may be input into different first prediction models for relatively independent prediction, so that the predicted groups of first key point three-dimensional pose data are decoupled from one another. For example, the first key point data sets of the left arm and the right arm are respectively input into different first prediction models for prediction, so that the key point two-dimensional coordinate data corresponding to the right arm do not influence the first key point three-dimensional pose data corresponding to the left arm, and the key point two-dimensional coordinate data corresponding to the left arm do not influence the first key point three-dimensional pose data corresponding to the right arm, thereby decoupling the first key point three-dimensional pose data of the left arm and the right arm.
The first key point three-dimensional pose data may include spatial position data and pose angle data of preset joint points in the corresponding limb part. The spatial position data may be the three-dimensional spatial position coordinates of a joint point, and the pose angle data may be represented by the rotation angles of the joint point about three axes in three-dimensional space relative to an initial posture.
And step S140, controlling the same virtual character model to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points.
In this embodiment, the limb parts of the virtual character model may be respectively controlled according to each group of first key point three-dimensional pose data. Because the groups of first key point three-dimensional pose data are decoupled from one another, erroneous limb linkage of the virtual character model is avoided when they are used together to control the same virtual character model.
In one possible implementation manner, in step S120, for each of the at least two limb parts, the first key point data set extracted from the key point two-dimensional coordinate data includes both the key point two-dimensional coordinate data corresponding to that limb part and the key point two-dimensional coordinate data corresponding to the trunk part.
For example, since the left arm is connected to the trunk and has a strong linkage relationship with it, when the first key point data set corresponding to the left arm is obtained in this embodiment, the key point two-dimensional coordinate data corresponding to each key joint of the left arm (such as the left elbow and the left wrist) and the key point two-dimensional coordinate data corresponding to the trunk part may be extracted from the key point two-dimensional coordinate data as the first key point data set of the left arm. In this way, when the three-dimensional pose data of the left arm are subsequently predicted, the prediction can draw on the two-dimensional coordinate data of both the left arm and the trunk part, making the result more accurate. A sketch of this extraction step is given below.
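A minimal sketch of this extraction step, assuming the 2D coordinates are held in a {joint_index: (x, y)} dictionary as in the earlier snippet. The joint-index groups are illustrative placeholders; the real indices depend on the key point recognition model used in step S110.

```python
import numpy as np

# Hypothetical joint-index groups (placeholder indices, not from the patent).
TORSO = [0, 1, 2, 3, 23, 24]   # e.g. head, neck, shoulders, hips
LEFT_ARM = [11, 13, 15]        # e.g. left shoulder, left elbow, left wrist
RIGHT_ARM = [12, 14, 16]       # e.g. right shoulder, right elbow, right wrist

def build_first_keypoint_set(coords_2d, limb_joints, torso_joints=TORSO):
    """Concatenate one limb's 2D coordinates with the trunk's into a flat vector.

    The other limb's joints are deliberately left out, which is what decouples
    the two first prediction models downstream.
    """
    return np.asarray([coords_2d[j] for j in limb_joints + torso_joints],
                      dtype=np.float32).ravel()
```

In the embodiment described below, each such set covers 22 joints (limb plus trunk), giving the 22 × 2 = 44 input dimensions of the arm prediction models.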
When the virtual character model is controlled to move, three-dimensional pose data of the trunk part may be needed in addition to those of the limb parts. Thus, in one possible implementation, the method may further include the following steps.
Step S210, taking the key point two-dimensional coordinate data of the target person as a whole as the second key point data set.
Step S220, inputting the second key point data set into a second prediction model for processing, and obtaining overall three-dimensional pose data, wherein the overall three-dimensional pose data comprises third key point three-dimensional pose data corresponding to the at least two limb parts and second key point three-dimensional pose data corresponding to the trunk part.
Since all the limbs are connected to the trunk, in this implementation manner the whole of the key point two-dimensional coordinate data of the target person may be input to the second prediction model as the second key point data set, so that the three-dimensional pose data of the trunk can be predicted accurately. The data output by the second prediction model may include the third key point three-dimensional pose data corresponding to the at least two limb parts and the second key point three-dimensional pose data corresponding to the trunk part, where the trunk part may include the upper body except the left and right arms, such as the chest, neck, and head.
It can be understood that, in this embodiment, due to limits on the number or diversity of the training samples, the third key point three-dimensional pose data corresponding to the limb parts may be over-coupled. Therefore, in step S140, the first key point three-dimensional pose data may be used to replace the third key point three-dimensional pose data in the overall three-dimensional pose data, and the overall three-dimensional pose data after the replacement may be used to control the virtual character model to perform the corresponding actions. In other words, the decoupled prediction results (the first key point three-dimensional pose data) replace the possibly over-coupled prediction results (the third key point three-dimensional pose data) in the overall three-dimensional pose data, and the replaced overall three-dimensional pose data then control the virtual character model, so that erroneous limb linkage of the virtual character model can be avoided. A sketch of this replacement step is given below.
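Treating each pose vector as a flat array with six values per joint (three spatial coordinates plus three pose angles, per the 6D pose data described below), the replacement might look like the following sketch; the joint indices are hypothetical placeholders.

```python
import numpy as np

VALUES_PER_JOINT = 6  # 3 spatial coordinates + 3 pose angles per joint

# Hypothetical positions of each arm's elbow and wrist within the 24-joint
# overall pose vector (placeholder indices, not from the patent).
LEFT_ELBOW_WRIST_IDS = [5, 6]
RIGHT_ELBOW_WRIST_IDS = [8, 9]

def replace_limb_pose(overall_pose, limb_pose, limb_joint_ids):
    """Overwrite the possibly over-coupled third key point pose entries in the
    overall prediction with the decoupled first key point pose data."""
    patched = overall_pose.copy()
    for k, j in enumerate(limb_joint_ids):
        patched[j * VALUES_PER_JOINT:(j + 1) * VALUES_PER_JOINT] = \
            limb_pose[k * VALUES_PER_JOINT:(k + 1) * VALUES_PER_JOINT]
    return patched

# The 144-dim overall pose covers 24 joints x 6 values; each arm model outputs
# 12 dims = 2 joints x 6 values, replacing exactly those joints' entries.
```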
Specifically, in this embodiment, taking the prediction processing of the three-dimensional pose data of the upper body of the target person as an example, the at least two limb portions include a left arm and a right arm, and the at least two different first prediction models include a left arm prediction model and a right arm prediction model.
In this case, the left arm prediction model comprises a left arm first fully connected network, a left arm second fully connected network and a left arm third fully connected network connected in sequence. The input to the left arm first fully connected network is the first keypoint data set for the left arm in 44 dimensions, which may include two-dimensional coordinate data for the 22 joints of the left arm and torso. The output of the first full-connection network of the left arm is 512-dimensional data, the input of the second full-connection network of the left arm is 512-dimensional data output by the first full-connection network of the left arm, and the output of the second full-connection network of the left arm is 512-dimensional data. The input of the left arm third fully-connected network is 512-dimensional data output by the left arm second fully-connected network, and the output of the left arm third fully-connected network is 12-dimensional first key point three-dimensional pose data of the left arm, wherein the data can comprise 6D pose data of 2 joints of the left elbow and the left wrist.
The right arm prediction model comprises a right arm first full-connection network, a right arm second full-connection network and a right arm third full-connection network which are connected in sequence. The input to the right arm first fully connected network is the first keypoint data set for the right arm in 44 dimensions, which may include two-dimensional coordinate data for the right arm and 22 joints of the torso. The output of the right arm first full-connection network is 512-dimensional data, the input of the right arm second full-connection network is 512-dimensional data output by the right arm first full-connection network, and the output of the right arm second full-connection network is 512-dimensional data. The input of the right arm third fully-connected network is 512-dimensional data output by the right arm second fully-connected network, and the output of the right arm third fully-connected network is 12-dimensional first key point three-dimensional pose data of the right arm, wherein the data can comprise 6D pose data of 2 joints of the right elbow and the right wrist.
The second prediction model comprises a first trunk fully connected network, a second trunk fully connected network and a third trunk fully connected network which are connected in sequence. The input of the first trunk fully connected network is the 48-dimensional second key point data set, which may include the two-dimensional coordinate data of 24 joints of the left arm, the right arm, and the trunk; the output of the first trunk fully connected network is 512-dimensional data. The input of the second trunk fully connected network is the 512-dimensional data output by the first trunk fully connected network, and the output of the second trunk fully connected network is 512-dimensional data. The input of the third trunk fully connected network is the 512-dimensional data output by the second trunk fully connected network, and the output of the third trunk fully connected network is the 144-dimensional overall three-dimensional pose data, which may include 6D pose data of the 24 joints of the left arm, the right arm, and the trunk.
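All three prediction models described above share the same three-layer fully connected structure and differ only in input and output width. A minimal PyTorch sketch under that reading follows; the framework choice and the ReLU activations between layers are assumptions, as the patent specifies only the layer dimensions.

```python
import torch
import torch.nn as nn

class ThreeLayerFC(nn.Module):
    """First, second, and third fully connected networks connected in sequence."""
    def __init__(self, in_dim, out_dim, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),  # first FC network:  in_dim -> 512
            nn.Linear(hidden, hidden), nn.ReLU(),  # second FC network: 512 -> 512
            nn.Linear(hidden, out_dim),            # third FC network:  512 -> out_dim
        )

    def forward(self, x):
        return self.net(x)

# Dimensions as stated in this embodiment:
left_arm_model = ThreeLayerFC(44, 12)   # 22 joints x 2D in -> 2 joints x 6D pose out
right_arm_model = ThreeLayerFC(44, 12)  # same structure, separately trained weights
torso_model = ThreeLayerFC(48, 144)     # 24 joints x 2D in -> 24 joints x 6D pose out
```

The three instances share a structure but not parameters, matching the earlier statement that the first prediction models may have the same model network structure while their parameters differ with their training samples.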
As another possible implementation, the method further includes the following steps.
Step S310, extracting a second keypoint data set including the torso part keypoints from the keypoint two-dimensional coordinate data.
And step S320, inputting the second key point data set into a second prediction model for processing to obtain the three-dimensional pose data of the second key point.
And in step S140, the first and second keypoint three-dimensional pose data may be used to control the same virtual character model to perform corresponding actions.
The torso part may include the parts of the upper body other than the left and right arms, such as the chest, neck, and head. In this implementation, the key point two-dimensional coordinate data corresponding to the trunk part are extracted separately and processed by an independent prediction model, so that the predicted key point three-dimensional pose data of the trunk part are likewise decoupled from those of the other limb parts, further avoiding erroneous limb or body linkage in subsequent modeling or model control.
In this embodiment, the above scheme can be applied to virtual character control in a live broadcast system. The two-dimensional image containing the target person may be a live video image obtained from a live broadcast terminal, and the virtual character model may be the virtual character image corresponding to an anchor.
Specifically, referring to fig. 2, fig. 2 is a schematic diagram of a live system, which may include a main broadcast terminal 201, a server 202 and a viewer terminal 203.
The anchor user can shoot a live video image through the anchor terminal 201, and the live video image can contain a half-body or whole-body image of the anchor user.
The server 202 may be a stand-alone device or a cluster of multiple cooperating devices. The server 202 may obtain the key point two-dimensional coordinate data of the anchor user from the live video image of the anchor user, and predict the first key point three-dimensional pose data corresponding to each limb part from the key point two-dimensional coordinate data. It then controls the virtual character model corresponding to the anchor user to perform corresponding actions according to the obtained groups of first key point three-dimensional pose data, so that the virtual character model performs actions similar to those of the anchor user. Finally, the server 202 may generate a second live video image containing the virtual character model and send it to the viewer terminal 203 or the anchor terminal 201 for display. A sketch of this per-frame pipeline is given below.
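Putting the pieces together, the server-side handling of one live frame might look like the following sketch. It reuses the illustrative helpers from the earlier snippets (get_keypoint_2d_coords, build_first_keypoint_set, replace_limb_pose, and the ThreeLayerFC models); drive_avatar stands in for whatever renderer produces the second live video image and is purely hypothetical.

```python
import numpy as np
import torch

@torch.no_grad()
def process_live_frame(frame_bgr, avatar):
    coords = get_keypoint_2d_coords(frame_bgr)                    # step S110
    if not coords:
        return  # no person detected in this frame

    # Second key point data set: the whole of the 2D key point coordinates
    # (48-dimensional under this embodiment's 24-joint numbering).
    second_set = np.asarray([coords[j] for j in sorted(coords)],
                            dtype=np.float32).ravel()
    overall = torso_model(torch.from_numpy(second_set)).numpy()   # steps S210-S220

    # First key point data sets and decoupled per-limb predictions (S120-S130).
    left = left_arm_model(torch.from_numpy(
        build_first_keypoint_set(coords, LEFT_ARM))).numpy()
    right = right_arm_model(torch.from_numpy(
        build_first_keypoint_set(coords, RIGHT_ARM))).numpy()

    # Replace the possibly over-coupled limb entries, then drive the model (S140).
    overall = replace_limb_pose(overall, left, LEFT_ELBOW_WRIST_IDS)
    overall = replace_limb_pose(overall, right, RIGHT_ELBOW_WRIST_IDS)
    drive_avatar(avatar, overall)
```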
Based on the same inventive concept, the present embodiment also provides an electronic device, which may have certain image processing capability, for example, the electronic device may be a personal computer or the server 202 shown in fig. 2.
Referring to fig. 3, fig. 3 is a block diagram of the electronic device 100. The electronic device 100 comprises a virtual character model control device 110, a machine-readable storage medium 120 and a processor 130.
The machine-readable storage medium 120, the processor 130, and the communication unit 140 are electrically connected to one another, directly or indirectly, to enable data transmission or interaction; for example, these components may be electrically connected via one or more communication buses or signal lines. The virtual character model control device 110 includes at least one software function module that can be stored in the machine-readable storage medium 120 in the form of software or firmware, or that is built into the operating system (OS) of the electronic device 100. The processor 130 is configured to execute the executable modules stored in the machine-readable storage medium 120, such as the software function modules and computer programs included in the virtual character model control device 110.
The machine-readable storage medium 120 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The machine-readable storage medium 120 is used for storing a program, and the processor 130 executes the program after receiving an execution instruction.
The processor 130 may be an integrated circuit chip having signal processing capabilities. The Processor may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like. But may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Referring to fig. 4, the embodiment further provides a virtual character model control device 110, where the virtual character model control device 110 includes at least one functional module that can be stored in a machine-readable storage medium 120 in a software form. Functionally, the virtual character model control device 110 may include an obtaining module 111, an extracting module 112, a predicting module 113, and a model control module 114.
The obtaining module 111 is configured to obtain two-dimensional coordinate data of key points of a target person from a two-dimensional image.
In this embodiment, the obtaining module 111 may be configured to execute step S110 shown in fig. 1, and reference may be made to the description of step S110 for a detailed description of the obtaining module 111.
The extracting module 112 is configured to extract, for at least two limb portions of a human body, at least two corresponding sets of first key point data sets from the key point two-dimensional coordinate data, respectively.
In this embodiment, the extracting module 112 may be configured to execute step S120 shown in fig. 1, and reference may be made to the description of step S120 for a detailed description of the extracting module 112.
The prediction module 113 is configured to input the at least two groups of first keypoint data sets into at least two different first prediction models respectively for processing, so as to obtain first keypoint three-dimensional pose data corresponding to the at least two limb portions respectively.
In this embodiment, the prediction module 113 may be configured to perform the step S130 shown in fig. 1, and the detailed description about the prediction module 113 may refer to the description about the step S130.
The model control module 114 is configured to control the same virtual character model to execute corresponding actions according to the obtained sets of three-dimensional pose data of the first key points.
In this embodiment, the model control module 114 can be used to execute step S140 shown in fig. 1, and the detailed description about the model control module 114 can refer to the description about step S140.
In summary, according to the virtual character model control method and device and the electronic device provided by the embodiments of the present application, different first key point data sets are extracted for different limb parts and processed by different first prediction models to obtain first key point three-dimensional pose data corresponding to each limb part, and the groups of first key point three-dimensional pose data are then combined to control the same virtual character model to perform corresponding actions. In this way, the first key point three-dimensional pose data corresponding to different limbs are decoupled even when the prediction models have few training samples, so that erroneous limb linkage of the virtual character model is avoided when the groups of first key point three-dimensional pose data are used together to control the same virtual character model.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only for various embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present application, and all such changes or substitutions are included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A virtual character model control method is characterized by comprising the following steps:
acquiring key point two-dimensional coordinate data of a target person from a two-dimensional image;
for at least two limb parts among the four limbs of a human body, respectively extracting at least two corresponding groups of first key point data sets from the key point two-dimensional coordinate data;
inputting the at least two groups of first key point data sets into at least two different first prediction models respectively for processing to obtain first key point three-dimensional pose data respectively corresponding to the at least two limb parts;
and controlling the same virtual character model to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points.
2. The method according to claim 1, wherein the step of extracting at least two corresponding sets of first keypoint data sets from the keypoint two-dimensional coordinate data for at least two limb portions of the human body, respectively, comprises:
extracting, for each of the at least two limb parts, the key point two-dimensional coordinate data corresponding to the limb part and the key point two-dimensional coordinate data corresponding to the trunk part from the key point two-dimensional coordinate data as the first key point data set of the limb part.
3. The method of claim 2, wherein the first keypoint three-dimensional pose data comprises spatial position data and pose angle data of a pre-determined joint point in the corresponding limb portion.
4. The method of claim 2, further comprising:
taking the key point two-dimensional coordinate data of the target person as a whole as a second key point data set;
inputting the second key point data set into a second prediction model for processing to obtain integral three-dimensional pose data; wherein the overall three-dimensional pose data comprises third keypoint three-dimensional pose data corresponding to the at least two limb portions and second keypoint three-dimensional pose data corresponding to a torso portion;
the step of controlling the same virtual character model to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points comprises the following steps:
and replacing the third key point three-dimensional pose data in the whole three-dimensional pose data by using the first key point three-dimensional pose data, and controlling the virtual character model to execute corresponding actions by using the replaced whole three-dimensional pose data.
5. The method of claim 4, wherein the at least two limb portions comprise a left arm and a right arm, and the at least two different first predictive models comprise a left arm predictive model and a right arm predictive model;
the left arm prediction model comprises a left arm first fully connected network, a left arm second fully connected network and a left arm third fully connected network which are connected in sequence; the input of the left arm first fully connected network is the 44-dimensional first key point data set of the left arm, and the output of the left arm first fully connected network is 512-dimensional data; the input of the left arm second fully connected network is the 512-dimensional data output by the left arm first fully connected network, and the output of the left arm second fully connected network is 512-dimensional data; the input of the left arm third fully connected network is the 512-dimensional data output by the left arm second fully connected network, and the output of the left arm third fully connected network is the 12-dimensional first key point three-dimensional pose data of the left arm;
the right arm prediction model comprises a right arm first fully connected network, a right arm second fully connected network and a right arm third fully connected network which are connected in sequence; the input of the right arm first fully connected network is the 44-dimensional first key point data set of the right arm, and the output of the right arm first fully connected network is 512-dimensional data; the input of the right arm second fully connected network is the 512-dimensional data output by the right arm first fully connected network, and the output of the right arm second fully connected network is 512-dimensional data; the input of the right arm third fully connected network is the 512-dimensional data output by the right arm second fully connected network, and the output of the right arm third fully connected network is the 12-dimensional first key point three-dimensional pose data of the right arm;
the second prediction model comprises a first trunk fully connected network, a second trunk fully connected network and a third trunk fully connected network which are connected in sequence; the input of the first trunk fully connected network is the 48-dimensional second key point data set, and the output of the first trunk fully connected network is 512-dimensional data; the input of the second trunk fully connected network is the 512-dimensional data output by the first trunk fully connected network, and the output of the second trunk fully connected network is 512-dimensional data; the input of the third trunk fully connected network is the 512-dimensional data output by the second trunk fully connected network, and the output of the third trunk fully connected network is the 144-dimensional overall three-dimensional pose data.
6. The method of claim 2, further comprising:
extracting a second key point data set comprising trunk part key points from the key point two-dimensional coordinate data;
inputting the second key point data set into a second prediction model for processing to obtain second key point three-dimensional pose data;
the step of controlling the same virtual character model to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points comprises the following steps:
and controlling the same virtual character model to execute corresponding actions by using the three-dimensional pose data of the first key point and the three-dimensional pose data of the second key point.
7. The method of claim 1, wherein the step of obtaining the two-dimensional coordinate data of the key points of the target person from the two-dimensional image comprises:
acquiring key point two-dimensional coordinate data of an anchor user from a first live video image of the anchor user; the first live video image is the two-dimensional image;
the step of controlling the same virtual character model to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points comprises the following steps:
and controlling a virtual character model corresponding to the anchor user to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points, so that the virtual character model executes actions similar to those of the anchor user.
8. A virtual character model control device is characterized in that the device comprises:
the acquisition module is used for acquiring key point two-dimensional coordinate data of a target person from the two-dimensional image;
the extraction module is used for extracting, for at least two limb parts among the limbs of a human body, at least two corresponding groups of first key point data sets from the key point two-dimensional coordinate data respectively;
the prediction module is used for inputting the at least two groups of first key point data sets into at least two different first prediction models respectively for processing to obtain first key point three-dimensional pose data respectively corresponding to the at least two limb parts;
and the model control module is used for controlling the same virtual character model to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points.
9. An electronic device comprising a processor and a machine-readable storage medium having stored thereon machine-executable instructions that, when executed by the processor, implement the method of any of claims 1-7.
10. A machine-readable storage medium having stored thereon machine-executable instructions which, when executed by one or more processors, perform the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210126954.1A CN114527873A (en) | 2022-02-11 | 2022-02-11 | Virtual character model control method and device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114527873A true CN114527873A (en) | 2022-05-24 |
Family
ID=81621979
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210126954.1A Pending CN114527873A (en) | 2022-02-11 | 2022-02-11 | Virtual character model control method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114527873A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111488824A (en) * | 2020-04-09 | 2020-08-04 | 北京百度网讯科技有限公司 | Motion prompting method and device, electronic equipment and storage medium |
CN111640172A (en) * | 2020-05-08 | 2020-09-08 | 大连理工大学 | Attitude migration method based on generation of countermeasure network |
CN111611903A (en) * | 2020-05-15 | 2020-09-01 | 北京百度网讯科技有限公司 | Training method, using method, device, equipment and medium of motion recognition model |
US20210374989A1 (en) * | 2020-06-02 | 2021-12-02 | Naver Corporation | Distillation of part experts for whole-body pose estimation |
CN113705520A (en) * | 2021-09-03 | 2021-11-26 | 广州虎牙科技有限公司 | Motion capture method and device and server |
Non-Patent Citations (1)
Title |
---|
高翔;黄法秀;刘春平;陈虎;: "3DMM与GAN结合的实时人脸表情迁移方法", 计算机应用与软件, no. 04, 12 April 2020 (2020-04-12) * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118570432A (en) * | 2024-07-30 | 2024-08-30 | 浙江核新同花顺网络信息股份有限公司 | Virtual person posture correction method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110457414B (en) | Offline map processing and virtual object display method, device, medium and equipment | |
US9330470B2 (en) | Method and system for modeling subjects from a depth map | |
CN111488824A (en) | Motion prompting method and device, electronic equipment and storage medium | |
US8374394B2 (en) | Augmented reality method and devices using a real time automatic tracking of marker-free textured planar geometrical objects in a video stream | |
WO2023071964A1 (en) | Data processing method and apparatus, and electronic device and computer-readable storage medium | |
US11132845B2 (en) | Real-world object recognition for computing device | |
CN109815776B (en) | Action prompting method and device, storage medium and electronic device | |
CN110544301A (en) | Three-dimensional human body action reconstruction system, method and action training system | |
CN113449570A (en) | Image processing method and device | |
US20200097732A1 (en) | Markerless Human Movement Tracking in Virtual Simulation | |
US20160210761A1 (en) | 3d reconstruction | |
CN111833457A (en) | Image processing method, apparatus and storage medium | |
CN117635897B (en) | Three-dimensional object posture complement method, device, equipment, storage medium and product | |
CN110211222A (en) | A kind of AR immersion tourism guide method, device, storage medium and terminal device | |
CN113658211A (en) | User posture evaluation method and device and processing equipment | |
CN111508033A (en) | Camera parameter determination method, image processing method, storage medium, and electronic apparatus | |
CN115482556A (en) | Method for key point detection model training and virtual character driving and corresponding device | |
CN112882576A (en) | AR interaction method and device, electronic equipment and storage medium | |
Caliskan et al. | Multi-view consistency loss for improved single-image 3d reconstruction of clothed people | |
Lee et al. | From human pose similarity metric to 3D human pose estimator: Temporal propagating LSTM networks | |
CN114527873A (en) | Virtual character model control method and device and electronic equipment | |
US20230401740A1 (en) | Data processing method and apparatus, and device and medium | |
CN116485953A (en) | Data processing method, device, equipment and readable storage medium | |
CN116266408A (en) | Body type estimating method, body type estimating device, storage medium and electronic equipment | |
CN110148202B (en) | Method, apparatus, device and storage medium for generating image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||