CN114527873A - Virtual character model control method and device and electronic equipment

Info

Publication number
CN114527873A
CN114527873A
Authority
CN
China
Prior art keywords
dimensional, data, key point, fully connected network
Prior art date
2022-02-11
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210126954.1A
Other languages
Chinese (zh)
Inventor
钱立辉
韩欣彤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huya Technology Co Ltd
Original Assignee
Guangzhou Huya Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-02-11
Filing date
2022-02-11
Publication date
2022-05-24
Application filed by Guangzhou Huya Technology Co Ltd
Priority to CN202210126954.1A
Publication of CN114527873A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 - Arrangements for interaction with the human body, e.g. for user immersion in virtual reality

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

In the virtual character model control method and device and the electronic equipment provided by the application, a different first key point data set is extracted for each of several different limb parts, and each set is processed by a different first prediction model to obtain first key point three-dimensional pose data corresponding to that limb part; the groups of first key point three-dimensional pose data are then combined to control the same virtual character model to execute corresponding actions. In this way, the first key point three-dimensional pose data corresponding to different limbs remain decoupled even when the prediction models have few training samples, so that when the groups of first key point three-dimensional pose data jointly control the same virtual character model, erroneous limb linkage of the model is avoided.

Description

Virtual character model control method and device and electronic equipment
Technical Field
The application relates to the technical field of image processing, in particular to a virtual character model control method, a virtual character model control device and electronic equipment.
Background
In some image processing scenarios, key point identification and prediction are performed on a two-dimensional video image of a person to obtain three-dimensional pose data (such as spatial position coordinates and pose angles) of the person's limb key points for modeling or model control. For example, in some live streaming scenes, the positions of the key points of a human body's limbs are first identified in a two-dimensional live video image acquired from an anchor terminal; three-dimensional pose data are then predicted from the coordinate positions of those key points in the two-dimensional image; and finally the corresponding virtual character model is driven to imitate the anchor's actions according to the three-dimensional pose data. The prediction of three-dimensional pose data from the two-dimensional coordinates of the key points is usually performed by a machine learning model. However, because the number or diversity of the model's training samples is limited, the three-dimensional pose data predicted for relatively independent limbs may be excessively coupled, causing erroneous limb linkage in subsequent modeling or model control and degrading the modeling or model control effect.
Disclosure of Invention
In order to overcome the above disadvantages in the prior art, the present application aims to provide a virtual character model control method, including:
acquiring key point two-dimensional coordinate data of a target person from a two-dimensional image;
for at least two limb parts among the four limbs of a human body, respectively extracting at least two corresponding groups of first key point data sets from the key point two-dimensional coordinate data;
inputting the at least two groups of first key point data sets into at least two different first prediction models respectively for processing, to obtain first key point three-dimensional pose data respectively corresponding to the at least two limb parts;
and controlling the same virtual character model to execute corresponding actions according to the obtained groups of first key point three-dimensional pose data.
In a possible implementation manner, the step of extracting, for at least two limb parts among the four limbs of the human body, at least two corresponding groups of first key point data sets from the key point two-dimensional coordinate data respectively includes:
for each limb part of the at least two limb parts, extracting, from the key point two-dimensional coordinate data, the key point two-dimensional coordinate data corresponding to that limb part and the key point two-dimensional coordinate data corresponding to the torso part as the first key point data set of that limb part.
In a possible implementation manner, the three-dimensional pose data of the first key point includes spatial position data and pose angle data of a preset joint point in a corresponding limb part.
In one possible implementation, the method further includes:
taking the key point two-dimensional coordinate data of the target person as a whole as a second key point data set;
inputting the second key point data set into a second prediction model for processing to obtain overall three-dimensional pose data, wherein the overall three-dimensional pose data comprises third key point three-dimensional pose data corresponding to the at least two limb parts and second key point three-dimensional pose data corresponding to the trunk part;
the step of controlling the same virtual character model to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points comprises the following steps:
and replacing the third key point three-dimensional pose data in the whole three-dimensional pose data by using the first key point three-dimensional pose data, and controlling the virtual character model to execute corresponding actions by using the replaced whole three-dimensional pose data.
In one possible implementation, the at least two limb portions comprise a left arm and a right arm, and the at least two different first predictive models comprise a left arm predictive model and a right arm predictive model;
the left arm prediction model comprises a left arm first fully connected network, a left arm second fully connected network and a left arm third fully connected network which are connected in sequence; the input of the left arm first fully connected network is the 44-dimensional first key point data set of the left arm, and its output is 512-dimensional data; the input of the left arm second fully connected network is the 512-dimensional data output by the left arm first fully connected network, and its output is 512-dimensional data; the input of the left arm third fully connected network is the 512-dimensional data output by the left arm second fully connected network, and its output is the 12-dimensional first key point three-dimensional pose data of the left arm;
the right arm prediction model comprises a right arm first fully connected network, a right arm second fully connected network and a right arm third fully connected network which are connected in sequence; the input of the right arm first fully connected network is the 44-dimensional first key point data set of the right arm, and its output is 512-dimensional data; the input of the right arm second fully connected network is the 512-dimensional data output by the right arm first fully connected network, and its output is 512-dimensional data; the input of the right arm third fully connected network is the 512-dimensional data output by the right arm second fully connected network, and its output is the 12-dimensional first key point three-dimensional pose data of the right arm;
the second prediction model comprises a first trunk fully connected network, a second trunk fully connected network and a third trunk fully connected network which are connected in sequence; the input of the first trunk fully connected network is the 48-dimensional second key point data set, and its output is 512-dimensional data; the input of the second trunk fully connected network is the 512-dimensional data output by the first trunk fully connected network, and its output is 512-dimensional data; the input of the third trunk fully connected network is the 512-dimensional data output by the second trunk fully connected network, and its output is the 144-dimensional overall three-dimensional pose data.
In one possible implementation, the method further includes:
extracting a second key point data set comprising the torso part key points from the key point two-dimensional coordinate data;
inputting the second key point data set into a second prediction model for processing to obtain second key point three-dimensional pose data;
the step of controlling the same virtual character model to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points comprises the following steps:
and controlling the same virtual character model to execute corresponding actions by using the three-dimensional pose data of the first key point and the three-dimensional pose data of the second key point.
In a possible implementation manner, the step of obtaining two-dimensional coordinate data of key points of the target person from the two-dimensional image includes:
acquiring key point two-dimensional coordinate data of an anchor user from a first live video image of the anchor user; the first live video image is the two-dimensional image;
the step of controlling the same virtual character model to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points comprises the following steps:
and controlling a virtual character model corresponding to the anchor user to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points, so that the virtual character model executes actions similar to those of the anchor user.
Another object of the present application is to provide a virtual character model control apparatus, including:
the acquisition module is used for acquiring key point two-dimensional coordinate data of a target person from the two-dimensional image;
the extraction module is used for extracting, for at least two limb parts of the human body's limbs, at least two corresponding groups of first key point data sets from the key point two-dimensional coordinate data respectively;
the prediction module is used for inputting the at least two groups of first key point data sets into at least two different first prediction models respectively for processing to obtain first key point three-dimensional pose data respectively corresponding to the at least two limb parts;
and the model control module is used for controlling the same virtual character model to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points.
Another object of the present application is to provide an electronic device, which includes a processor and a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions, and when the machine-executable instructions are executed by the processor, the method for controlling a virtual character model provided in the present application is implemented.
Another object of the present application is to provide a machine-readable storage medium storing machine-executable instructions, which when executed by one or more processors, implement the virtual character model control method provided by the present application.
Compared with the prior art, the method has the following beneficial effects:
according to the virtual character model control method, the virtual character model control device and the electronic equipment, different first key point data sets are extracted aiming at different limb parts, different first prediction models are used for processing to obtain first key point three-dimensional pose data corresponding to the different limb parts, and then all groups of first key point three-dimensional pose data are synthesized to control the same virtual character model to execute corresponding actions. Therefore, decoupling between the first key point three-dimensional pose data corresponding to different limbs can be achieved under the condition that training samples of the prediction model are few, and therefore when the same virtual human model is controlled by using all groups of first key point three-dimensional pose data, wrong limb linkage of the virtual human model is avoided.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should therefore not be regarded as limiting its scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic flow chart illustrating steps of a virtual character model control method according to an embodiment of the present application.
Fig. 2 is a schematic view of a live broadcast system provided in an embodiment of the present application.
Fig. 3 is a schematic view of an electronic device according to an embodiment of the present application.
Fig. 4 is a functional module schematic diagram of a virtual character model control device provided in the embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present application, it is noted that the terms "first", "second", "third", and the like are used merely for distinguishing between descriptions and are not intended to indicate or imply relative importance.
In the description of the present application, it should also be noted that, unless otherwise explicitly specified or limited, the terms "disposed," "mounted," and "connected" are to be construed broadly: a connection may, for example, be fixed, detachable, or integral; mechanical or electrical; direct, indirect through an intermediate medium, or internal between two elements. The specific meanings of the above terms in the present application can be understood by those of ordinary skill in the art on a case-by-case basis.
Through research, the inventors found that predicting the three-dimensional pose data of a person's limb key points from a two-dimensional video image of the person is usually performed by a trained machine learning model. To train such a model, a person wearing three-dimensional pose sensors performs various actions; the three-dimensional pose data of the limb key points are collected by the sensors, while a two-dimensional image acquisition device (such as a camera) captures two-dimensional images from which the two-dimensional coordinates of the limb key points are obtained. The two-dimensional coordinates and three-dimensional pose data of each limb key point are then used as training samples to train the machine learning model to predict the three-dimensional pose data of the limb key points from their two-dimensional coordinates.
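As a minimal sketch of this training setup (assuming paired samples of 2D joint coordinates from the camera and 3D pose labels from the wearable sensors; the layer sizes, activation, optimizer, and loss below are illustrative assumptions rather than details given in this application):

```python
import torch
import torch.nn as nn

# Illustrative 2D-to-3D pose regressor: 48 inputs = 24 joints x 2D coordinates,
# 144 outputs = 24 joints x 6D pose (mirroring the dimensions used later on).
model = nn.Sequential(
    nn.Linear(48, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 144),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(coords_2d: torch.Tensor, pose_3d: torch.Tensor) -> float:
    """One supervised step: predict 3D pose from 2D coordinates and regress
    against the sensor-measured ground truth."""
    optimizer.zero_grad()
    loss = loss_fn(model(coords_2d), pose_3d)
    loss.backward()
    optimizer.step()
    return loss.item()
```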
In this approach, because the number or diversity of the training samples is limited, the training set may contain a large number of scenes in which different limbs move simultaneously but few scenes in which a single limb moves alone. For example, there may be many scenes in which the left and right hands move together, and few in which only one hand moves. The three-dimensional pose data predicted by the trained machine learning model is then biased toward simultaneous movement of different limbs, and the three-dimensional pose data of different limbs become excessively coupled, so that erroneous limb linkage occurs in subsequent model reconstruction or model control. For example, although only the left hand moves in the actual two-dimensional image, the machine learning model may predict that the left hand moves and the right hand also moves slightly.
In view of the above findings, the present embodiment provides a solution that can reduce the occurrence of erroneous limb linkage of the virtual character model; this solution is explained in detail below.
Referring to fig. 1, fig. 1 is a flowchart of the virtual character model control method provided in this embodiment; each step of the method is described in detail below.
Step S110, two-dimensional coordinate data of key points of the target person is acquired from the two-dimensional image.
In this embodiment, the limb key points may correspond to the various limb joints of the target person, such as the shoulders, elbows, and wrists. In a possible implementation manner, a pre-trained key point recognition model may perform image recognition on a two-dimensional image containing the target person (e.g., a two-dimensional video image captured by a camera) to determine the position of each limb key point of the target person in the two-dimensional image, thereby obtaining the key point two-dimensional coordinate data of the target person.
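For illustration, the output of such a key point recognition model can be pictured as a mapping from joint names to pixel coordinates; the joint names and coordinate values below are hypothetical:

```python
# Hypothetical 2D key point output for one video frame; in practice these
# values would come from a pre-trained key point recognition model.
keypoint_2d = {
    "left_shoulder":  (412.0, 233.5),
    "left_elbow":     (430.2, 310.8),
    "left_wrist":     (405.7, 381.1),
    "right_shoulder": (318.9, 235.0),
    "right_elbow":    (301.4, 312.6),
    "right_wrist":    (322.8, 380.3),
    "neck":           (365.0, 210.2),
    "left_hip":       (398.3, 420.7),
    "right_hip":      (333.6, 421.9),
    # ... remaining joints of the target person
}
```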
Step S120, for at least two limb parts among the four limbs of the human body, respectively extracting at least two corresponding groups of first key point data sets from the key point two-dimensional coordinate data.
In this embodiment, according to the result of identifying the human body key points in the two-dimensional image, the corresponding first key point data sets may be extracted from the key point two-dimensional coordinate data for at least two human limb parts capable of moving independently. For example, the left arm (including the left shoulder, left upper arm, left lower arm, and left hand) and the right arm (including the right shoulder, right upper arm, right lower arm, and right hand) are two limb parts that can move relatively independently; in this embodiment, at least one first key point data set corresponding to the left arm and one corresponding to the right arm may be extracted from the key point two-dimensional coordinate data. Each first key point data set may include the two-dimensional coordinate data of the key points corresponding to the key joints of the limb part; for example, the first key point data set of the left arm includes at least the key point two-dimensional coordinate data of the left elbow and the left wrist, and the first key point data set of the right arm includes at least that of the right elbow and the right wrist.
Optionally, in this embodiment, the first key point data sets corresponding to different limb parts may be mutually exclusive; for example, the first key point data set corresponding to the left arm may exclude the coordinate data of the key points corresponding to the right arm, and vice versa.
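A minimal sketch of this extraction, reusing the hypothetical named key points above (the joint lists are illustrative; the requirement carried over from this step is only that each limb's set excludes the other limb's key points):

```python
LEFT_ARM_JOINTS  = ["left_shoulder", "left_elbow", "left_wrist"]
RIGHT_ARM_JOINTS = ["right_shoulder", "right_elbow", "right_wrist"]
TORSO_JOINTS     = ["neck", "left_hip", "right_hip"]  # shared torso context

def extract_first_keypoint_set(keypoint_2d, limb_joints, context_joints=()):
    """Flatten one limb's 2D key point coordinates (plus optional torso
    context) into a single feature vector."""
    names = list(limb_joints) + list(context_joints)
    return [coord for name in names for coord in keypoint_2d[name]]

# Mutually exclusive per-limb sets: neither vector contains the other arm.
left_arm_set  = extract_first_keypoint_set(keypoint_2d, LEFT_ARM_JOINTS, TORSO_JOINTS)
right_arm_set = extract_first_keypoint_set(keypoint_2d, RIGHT_ARM_JOINTS, TORSO_JOINTS)
```

Passing the torso joints as context anticipates the refinement described below, in which each limb's first key point data set also carries the torso key points.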
Step S130, the at least two groups of first key point data sets are respectively input into at least two different first prediction models to be processed, and first key point three-dimensional pose data respectively corresponding to the at least two limb parts are obtained.
In this embodiment, the at least two different first prediction models are machine learning models that do not share network parameters with each other. It will be appreciated that in some cases the at least two different first prediction models may have the same network structure but, depending on their training samples, different model parameters.
In this embodiment, the first key point data sets corresponding to different limb parts may be input into different first prediction models for relatively independent prediction, so that the predicted groups of first key point three-dimensional pose data are decoupled from one another. For example, the first key point data sets of the left arm and the right arm are input into different first prediction models, so that the key point two-dimensional coordinate data of the right arm does not influence the first key point three-dimensional pose data of the left arm, and vice versa, thereby decoupling the first key point three-dimensional pose data of the left arm from that of the right arm.
The first key point three-dimensional pose data may include spatial position data and pose angle data of preset joint points in the corresponding limb part. The spatial position data may be the three-dimensional spatial position coordinates of the joint points, and the pose angle data may be represented by the rotation angles of the joint points in three directions of three-dimensional space relative to the initial pose.
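A sketch of step S130 under the three-fully-connected-layer structure detailed later in this embodiment (the ReLU activations are an assumption; this application specifies only the fully connected layers and their widths):

```python
import torch
import torch.nn as nn

class LimbPosePredictor(nn.Module):
    """First prediction model for one limb part: three fully connected
    networks connected in sequence."""
    def __init__(self, in_dim: int, hidden: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),   # first fully connected network
            nn.Linear(hidden, hidden), nn.ReLU(),   # second fully connected network
            nn.Linear(hidden, out_dim),             # third fully connected network
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Separate models with separate parameters: the left arm's predicted pose
# cannot be influenced by the right arm's input, and vice versa.
left_arm_model  = LimbPosePredictor(44, 512, 12)
right_arm_model = LimbPosePredictor(44, 512, 12)

left_pose  = left_arm_model(torch.randn(1, 44))   # placeholder left-arm input
right_pose = right_arm_model(torch.randn(1, 44))  # placeholder right-arm input
```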
And step S140, controlling the same virtual character model to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points.
In this embodiment, the corresponding limb parts of the virtual character model may be respectively controlled according to each group of first key point three-dimensional pose data. Since the groups of first key point three-dimensional pose data are decoupled from one another, erroneous limb linkage of the virtual character model is avoided when they are used together to control the same virtual character model.
In one possible implementation manner, in step S120, for each of the at least two limb parts, the first key point data set extracted from the key point two-dimensional coordinate data includes both the key point two-dimensional coordinate data corresponding to that limb part and the key point two-dimensional coordinate data corresponding to the torso part.
For example, since the left arm is connected to the torso and moves in close association with it, when obtaining the first key point data set corresponding to the left arm in this embodiment, the key point two-dimensional coordinate data corresponding to each key joint of the left arm (such as the left elbow and the left wrist) and the key point two-dimensional coordinate data corresponding to the torso part may be extracted from the key point two-dimensional coordinate data as the first key point data set of the left arm. In this way, when the three-dimensional pose data of the left arm is subsequently predicted from the key point two-dimensional coordinate data of both the left arm and the torso part, the prediction result is more accurate.
When the virtual character model is controlled to move, the three-dimensional pose data of both the limb parts and the torso part may be needed. Thus, in one possible implementation, the method may further comprise the following steps.
Step S210, taking the whole of the key point two-dimensional coordinate data of the target person as the second key point data set.
Step S220, inputting the second key point data set into a second prediction model for processing, and obtaining overall three-dimensional pose data, wherein the overall three-dimensional pose data comprises third key point three-dimensional pose data corresponding to the at least two limb parts and second key point three-dimensional pose data corresponding to the trunk part.
Since all limbs are connected to the torso, in this implementation manner the whole of the key point two-dimensional coordinate data of the target person may be input into the second prediction model as the second key point data set, so that the three-dimensional pose data of the torso can be predicted accurately. The data output by the second prediction model may include third key point three-dimensional pose data corresponding to the at least two limb parts and second key point three-dimensional pose data corresponding to the torso part, where the torso part may include the upper body except the left and right arms, such as the body, the neck, and the head.
It can be understood that, due to the limited number or diversity of the training samples, the three-dimensional pose data of the limb parts in the third key point three-dimensional pose data may be excessively coupled. Therefore, in step S140, the first key point three-dimensional pose data may be used to replace the third key point three-dimensional pose data in the overall three-dimensional pose data, and the replaced overall three-dimensional pose data may be used to control the virtual character model to execute the corresponding actions. In other words, the decoupled prediction results (the first key point three-dimensional pose data) replace the possibly over-coupled prediction results (the third key point three-dimensional pose data) in the overall three-dimensional pose data, and the replaced overall three-dimensional pose data controls the virtual character model, so that erroneous limb-action linkage of the virtual character model is avoided.
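A sketch of this replacement, assuming the 144-dimensional overall output is laid out as 24 joints x 6D pose and that the arm joints occupy known positions in that layout (the joint indices below are illustrative):

```python
import torch

JOINT_DIM  = 6            # 6D pose per joint
NUM_JOINTS = 24           # joints covered by the overall prediction
LEFT_ARM_IDX  = [7, 8]    # left elbow, left wrist (illustrative indices)
RIGHT_ARM_IDX = [11, 12]  # right elbow, right wrist (illustrative indices)

def replace_limb_pose(overall: torch.Tensor, limb: torch.Tensor, idx: list) -> torch.Tensor:
    """Overwrite the possibly over-coupled third key point pose data in the
    overall prediction with the decoupled first key point pose data."""
    out = overall.clone().view(-1, NUM_JOINTS, JOINT_DIM)
    out[:, idx, :] = limb.view(-1, len(idx), JOINT_DIM)
    return out.view(-1, NUM_JOINTS * JOINT_DIM)

overall_pose = torch.randn(1, 144)  # placeholder second prediction model output
left_pose    = torch.randn(1, 12)   # placeholder decoupled left-arm prediction
right_pose   = torch.randn(1, 12)   # placeholder decoupled right-arm prediction

overall_pose = replace_limb_pose(overall_pose, left_pose, LEFT_ARM_IDX)
overall_pose = replace_limb_pose(overall_pose, right_pose, RIGHT_ARM_IDX)
# The replaced overall pose data then drives the virtual character model.
```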
Specifically, in this embodiment, taking the prediction processing of the three-dimensional pose data of the upper body of the target person as an example, the at least two limb portions include a left arm and a right arm, and the at least two different first prediction models include a left arm prediction model and a right arm prediction model.
In this case, the left arm prediction model comprises a left arm first fully connected network, a left arm second fully connected network and a left arm third fully connected network connected in sequence. The input to the left arm first fully connected network is the first keypoint data set for the left arm in 44 dimensions, which may include two-dimensional coordinate data for the 22 joints of the left arm and torso. The output of the first full-connection network of the left arm is 512-dimensional data, the input of the second full-connection network of the left arm is 512-dimensional data output by the first full-connection network of the left arm, and the output of the second full-connection network of the left arm is 512-dimensional data. The input of the left arm third fully-connected network is 512-dimensional data output by the left arm second fully-connected network, and the output of the left arm third fully-connected network is 12-dimensional first key point three-dimensional pose data of the left arm, wherein the data can comprise 6D pose data of 2 joints of the left elbow and the left wrist.
The right arm prediction model comprises a right arm first full-connection network, a right arm second full-connection network and a right arm third full-connection network which are connected in sequence. The input to the right arm first fully connected network is the first keypoint data set for the right arm in 44 dimensions, which may include two-dimensional coordinate data for the right arm and 22 joints of the torso. The output of the right arm first full-connection network is 512-dimensional data, the input of the right arm second full-connection network is 512-dimensional data output by the right arm first full-connection network, and the output of the right arm second full-connection network is 512-dimensional data. The input of the right arm third fully-connected network is 512-dimensional data output by the right arm second fully-connected network, and the output of the right arm third fully-connected network is 12-dimensional first key point three-dimensional pose data of the right arm, wherein the data can comprise 6D pose data of 2 joints of the right elbow and the right wrist.
The second prediction model comprises a first trunk fully connected network, a second trunk fully connected network and a third trunk fully connected network which are connected in sequence. The input of the first trunk fully connected network is the 48-dimensional second key point data set, which may include the two-dimensional coordinate data of 24 joints of the left arm, the right arm, and the trunk; its output is 512-dimensional data. The input of the second trunk fully connected network is the 512-dimensional data output by the first trunk fully connected network, and its output is 512-dimensional data. The input of the third trunk fully connected network is the 512-dimensional data output by the second trunk fully connected network, and its output is the 144-dimensional overall three-dimensional pose data, which may include the 6D pose data of the 24 joints of the left arm, the right arm, and the trunk.
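The dimension bookkeeping of this embodiment, as a sketch (an arm model maps 22 joints x 2D coordinates = 44 inputs to 2 joints x 6D pose = 12 outputs; the trunk model maps 24 joints x 2D coordinates = 48 inputs to 24 joints x 6D pose = 144 outputs; the ReLU activations are again an assumption):

```python
import torch.nn as nn

def three_layer_fc(in_dim: int, out_dim: int, hidden: int = 512) -> nn.Sequential:
    """Three fully connected networks connected in sequence."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

left_arm_model  = three_layer_fc(22 * 2, 2 * 6)   # 44 -> 512 -> 512 -> 12
right_arm_model = three_layer_fc(22 * 2, 2 * 6)   # 44 -> 512 -> 512 -> 12
trunk_model     = three_layer_fc(24 * 2, 24 * 6)  # 48 -> 512 -> 512 -> 144
```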
As another possible implementation, the method further includes the following steps.
Step S310, extracting a second keypoint data set including the torso part keypoints from the keypoint two-dimensional coordinate data.
And step S320, inputting the second key point data set into a second prediction model for processing to obtain the three-dimensional pose data of the second key point.
And in step S140, the first and second keypoint three-dimensional pose data may be used to control the same virtual character model to perform corresponding actions.
Wherein the torso portion may include parts of the upper body of the human body other than the left and right arms, such as the body, neck, and head. In the implementation mode, the key point two-dimensional coordinate data corresponding to the trunk part is independently extracted, and the independent prediction model is used for processing, so that the predicted key point three-dimensional pose data of the trunk part and other limb parts are also decoupled, and the subsequent generation of wrong limb or body linkage in modeling or control of the model is further avoided.
In this embodiment, the above scheme can be applied to virtual character control in a live streaming system. The two-dimensional image containing the target person may be a live video image obtained from a live streaming terminal, and the virtual character model may be the virtual character image corresponding to an anchor.
Specifically, referring to fig. 2, fig. 2 is a schematic diagram of a live system, which may include a main broadcast terminal 201, a server 202 and a viewer terminal 203.
The anchor user can shoot a live video image through the anchor terminal 201, and the live video image can contain a half-body or whole-body image of the anchor user.
The server 202 may be a stand-alone device or a cluster of multiple cooperating devices. The server 202 may obtain the key point two-dimensional coordinate data of the anchor user from a live video image of the anchor user, and predict the first key point three-dimensional pose data corresponding to each limb part from the key point two-dimensional coordinate data. It may then control the virtual character model corresponding to the anchor user to execute corresponding actions according to the obtained groups of first key point three-dimensional pose data, so that the virtual character model executes actions similar to those of the anchor user. Finally, the server 202 may generate a second live video image containing the virtual character model and send it to the viewer terminal 203 or the anchor terminal 201 for display.
Based on the same inventive concept, the present embodiment also provides an electronic device, which may have certain image processing capability, for example, the electronic device may be a personal computer or the server 202 shown in fig. 2.
Referring to fig. 3, fig. 3 is a block diagram of the electronic device 100. The electronic device 100 comprises a virtual character model control device 110, a machine-readable storage medium 120 and a processor 130.
The machine-readable storage medium 120 and the processor 130 are electrically connected to each other, directly or indirectly, to enable data transmission or interaction; for example, these components may be electrically connected to each other via one or more communication buses or signal lines. The virtual character model control device 110 includes at least one software function module which can be stored in the machine-readable storage medium 120 in the form of software or firmware or embedded in the operating system (OS) of the electronic device 100. The processor 130 is configured to execute the executable modules stored in the machine-readable storage medium 120, such as the software function modules and computer programs included in the virtual character model control device 110.
The machine-readable storage medium 120 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), or the like. The machine-readable storage medium 120 is used for storing a program, and the processor 130 executes the program after receiving an execution instruction.
The processor 130 may be an integrated circuit chip having signal processing capabilities. The Processor may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like. But may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Referring to fig. 4, the embodiment further provides a virtual character model control device 110, where the virtual character model control device 110 includes at least one functional module that can be stored in a machine-readable storage medium 120 in a software form. Functionally, the virtual character model control device 110 may include an obtaining module 111, an extracting module 112, a predicting module 113, and a model control module 114.
The obtaining module 111 is configured to obtain two-dimensional coordinate data of key points of a target person from a two-dimensional image.
In this embodiment, the obtaining module 111 may be configured to execute step S110 shown in fig. 1, and reference may be made to the description of step S110 for a detailed description of the obtaining module 111.
The extracting module 112 is configured to extract, for at least two limb portions of a human body, at least two corresponding sets of first key point data sets from the key point two-dimensional coordinate data, respectively.
In this embodiment, the extracting module 112 may be configured to execute step S120 shown in fig. 1, and reference may be made to the description of step S120 for a detailed description of the extracting module 112.
The prediction module 113 is configured to input the at least two groups of first keypoint data sets into at least two different first prediction models respectively for processing, so as to obtain first keypoint three-dimensional pose data corresponding to the at least two limb portions respectively.
In this embodiment, the prediction module 113 may be configured to perform the step S130 shown in fig. 1, and the detailed description about the prediction module 113 may refer to the description about the step S130.
The model control module 114 is configured to control the same virtual character model to execute corresponding actions according to the obtained sets of three-dimensional pose data of the first key points.
In this embodiment, the model control module 114 can be used to execute step S140 shown in fig. 1, and the detailed description about the model control module 114 can refer to the description about step S140.
In summary, in the virtual character model control method and device and the electronic equipment provided by the embodiments of the application, different first key point data sets are extracted for different limb parts and processed by different first prediction models to obtain first key point three-dimensional pose data corresponding to the different limb parts, and the groups of first key point three-dimensional pose data are then combined to control the same virtual character model to execute corresponding actions. In this way, the first key point three-dimensional pose data corresponding to different limbs remain decoupled even when the prediction models have few training samples, so that erroneous limb linkage of the virtual character model is avoided when the groups of first key point three-dimensional pose data jointly control the same model.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It is noted that, herein, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above description is only for various embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present application, and all such changes or substitutions are included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A virtual character model control method is characterized by comprising the following steps:
acquiring key point two-dimensional coordinate data of a target person from the two-dimensional image;
for at least two limb parts among the four limbs of a human body, respectively extracting at least two corresponding groups of first key point data sets from the key point two-dimensional coordinate data;
inputting the at least two groups of first key point data sets into at least two different first prediction models respectively for processing to obtain first key point three-dimensional pose data respectively corresponding to the at least two limb parts;
and controlling the same virtual character model to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points.
2. The method according to claim 1, wherein the step of extracting at least two corresponding sets of first keypoint data sets from the keypoint two-dimensional coordinate data for at least two limb portions of the human body, respectively, comprises:
for each limb part of the at least two limb parts, extracting, from the key point two-dimensional coordinate data, the key point two-dimensional coordinate data corresponding to that limb part and the key point two-dimensional coordinate data corresponding to the torso part as the first key point data set of that limb part.
3. The method of claim 2, wherein the first keypoint three-dimensional pose data comprises spatial position data and pose angle data of a pre-determined joint point in the corresponding limb portion.
4. The method of claim 2, further comprising:
taking the key point two-dimensional coordinate data of the target person as a whole as a second key point data set;
inputting the second key point data set into a second prediction model for processing to obtain integral three-dimensional pose data; wherein the overall three-dimensional pose data comprises third keypoint three-dimensional pose data corresponding to the at least two limb portions and second keypoint three-dimensional pose data corresponding to a torso portion;
the step of controlling the same virtual character model to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points comprises the following steps:
and replacing the third key point three-dimensional pose data in the whole three-dimensional pose data by using the first key point three-dimensional pose data, and controlling the virtual character model to execute corresponding actions by using the replaced whole three-dimensional pose data.
5. The method of claim 4, wherein the at least two limb portions comprise a left arm and a right arm, and the at least two different first predictive models comprise a left arm predictive model and a right arm predictive model;
the left arm prediction model comprises a left arm first fully connected network, a left arm second fully connected network and a left arm third fully connected network which are connected in sequence; the input of the left arm first fully connected network is the 44-dimensional first key point data set of the left arm, and its output is 512-dimensional data; the input of the left arm second fully connected network is the 512-dimensional data output by the left arm first fully connected network, and its output is 512-dimensional data; the input of the left arm third fully connected network is the 512-dimensional data output by the left arm second fully connected network, and its output is the 12-dimensional first key point three-dimensional pose data of the left arm;
the right arm prediction model comprises a right arm first fully connected network, a right arm second fully connected network and a right arm third fully connected network which are connected in sequence; the input of the right arm first fully connected network is the 44-dimensional first key point data set of the right arm, and its output is 512-dimensional data; the input of the right arm second fully connected network is the 512-dimensional data output by the right arm first fully connected network, and its output is 512-dimensional data; the input of the right arm third fully connected network is the 512-dimensional data output by the right arm second fully connected network, and its output is the 12-dimensional first key point three-dimensional pose data of the right arm;
the second prediction model comprises a first trunk fully connected network, a second trunk fully connected network and a third trunk fully connected network which are connected in sequence; the input of the first trunk fully connected network is the 48-dimensional second key point data set, and its output is 512-dimensional data; the input of the second trunk fully connected network is the 512-dimensional data output by the first trunk fully connected network, and its output is 512-dimensional data; the input of the third trunk fully connected network is the 512-dimensional data output by the second trunk fully connected network, and its output is the 144-dimensional overall three-dimensional pose data.
6. The method of claim 2, further comprising:
extracting a second key point data set comprising the torso part key points from the key point two-dimensional coordinate data;
inputting the second key point data set into a second prediction model for processing to obtain second key point three-dimensional pose data;
the step of controlling the same virtual character model to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points comprises the following steps:
and controlling the same virtual character model to execute corresponding actions by using the three-dimensional pose data of the first key point and the three-dimensional pose data of the second key point.
7. The method of claim 1, wherein the step of obtaining the two-dimensional coordinate data of the key points of the target person from the two-dimensional image comprises:
acquiring key point two-dimensional coordinate data of an anchor user from a first live video image of the anchor user; the first live video image is the two-dimensional image;
the step of controlling the same virtual character model to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points comprises the following steps:
and controlling a virtual character model corresponding to the anchor user to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points, so that the virtual character model executes actions similar to those of the anchor user.
8. A virtual character model control device is characterized in that the device comprises:
the acquisition module is used for acquiring key point two-dimensional coordinate data of a target person from the two-dimensional image;
the extraction module is used for extracting, for at least two limb parts of the human body's limbs, at least two corresponding groups of first key point data sets from the key point two-dimensional coordinate data respectively;
the prediction module is used for inputting the at least two groups of first key point data sets into at least two different first prediction models respectively for processing to obtain first key point three-dimensional pose data respectively corresponding to the at least two limb parts;
and the model control module is used for controlling the same virtual character model to execute corresponding actions according to the obtained three-dimensional pose data of each group of the first key points.
9. An electronic device comprising a processor and a machine-readable storage medium having stored thereon machine-executable instructions that, when executed by the processor, implement the method of any of claims 1-7.
10. A machine-readable storage medium having stored thereon machine-executable instructions which, when executed by one or more processors, perform the method of any one of claims 1-7.
CN202210126954.1A, filed 2022-02-11 (priority date 2022-02-11): Virtual character model control method and device and electronic equipment. Status: Pending. Published as CN114527873A (en).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210126954.1A CN114527873A (en) 2022-02-11 2022-02-11 Virtual character model control method and device and electronic equipment


Publications (1)

Publication Number Publication Date
CN114527873A 2022-05-24

Family

ID=81621979

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210126954.1A Pending CN114527873A (en) 2022-02-11 2022-02-11 Virtual character model control method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114527873A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488824A (en) * 2020-04-09 2020-08-04 北京百度网讯科技有限公司 Motion prompting method and device, electronic equipment and storage medium
CN111640172A (en) * 2020-05-08 2020-09-08 大连理工大学 Attitude migration method based on generation of countermeasure network
CN111611903A (en) * 2020-05-15 2020-09-01 北京百度网讯科技有限公司 Training method, using method, device, equipment and medium of motion recognition model
US20210374989A1 (en) * 2020-06-02 2021-12-02 Naver Corporation Distillation of part experts for whole-body pose estimation
CN113705520A (en) * 2021-09-03 2021-11-26 广州虎牙科技有限公司 Motion capture method and device and server

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GAO XIANG; HUANG FAXIU; LIU CHUNPING; CHEN HU: "Real-time facial expression transfer method combining 3DMM and GAN" (3DMM与GAN结合的实时人脸表情迁移方法), Computer Applications and Software (计算机应用与软件), no. 04, 12 April 2020 (2020-04-12) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118570432A (en) * 2024-07-30 2024-08-30 浙江核新同花顺网络信息股份有限公司 Virtual person posture correction method, device, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination