CN116091622A - Object action data processing method and device, electronic equipment and storage medium


Info

Publication number
CN116091622A
Authority
CN
China
Prior art keywords
image frame, target, data, target image, predicted
Legal status
Pending
Application number
CN202310003605.5A
Other languages
Chinese (zh)
Inventor
陈刚
赵培尧
周严
王斌
Current Assignee
Tsinghua University
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Tsinghua University
Beijing Dajia Internet Information Technology Co Ltd
Application filed by Tsinghua University and Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202310003605.5A
Publication of CN116091622A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration

Abstract

The disclosure relates to an object action data processing method, an object action data processing device, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring an image frame sequence containing a target object and object pose data corresponding to each image frame in the image frame sequence; the object pose data are obtained based on a pose measurement module arranged on the target object; predicting the motion data of the target object in the target image frame to obtain predicted motion data corresponding to the target image frame; the target image frame is an image frame which is not subjected to motion data prediction in the image frame sequence; and performing error constraint processing on the predicted action data corresponding to the target image frame based on the target image frame and the object pose data corresponding to the target image frame to obtain target action data corresponding to the target image frame. The method and the device can improve the accuracy of determining the motion data of the target object, and further improve the accuracy of motion driving.

Description

Object action data processing method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the field of computer technology, and in particular, to a method and a device for processing object motion data, electronic equipment and a storage medium.
Background
Object pose estimation is an important research field of computer vision. It aims to estimate the pose of an object from given input information and can be applied in many scenarios, such as motion recognition, motion detection, film, animation, virtual reality, human-computer interaction, motion analysis, and the like.
In the related art, object motion data is generally predicted from image information acquired by a monocular camera system; because monocular vision is subject to visual occlusion, the predicted object motion data may be inaccurate.
Disclosure of Invention
The disclosure provides an object motion data processing method, an object motion data processing device, electronic equipment and a storage medium, so as to at least solve the problem of inaccurate object motion data prediction in the related art. The technical scheme of the present disclosure is as follows:
according to a first aspect of an embodiment of the present disclosure, there is provided an object action data processing method, including:
acquiring an image frame sequence containing a target object and object pose data corresponding to each image frame in the image frame sequence; the object pose data are obtained based on a pose measurement module arranged on the target object;
predicting the motion data of the target object in the target image frame to obtain predicted motion data corresponding to the target image frame; the target image frame is an image frame which is not subjected to motion data prediction in the image frame sequence;
and performing error constraint processing on the predicted action data corresponding to the target image frame based on the target image frame and the object pose data corresponding to the target image frame to obtain target action data corresponding to the target image frame.
In an exemplary embodiment, the performing error constraint processing on the predicted motion data corresponding to the target image frame based on the target image frame and the object pose data corresponding to the target image frame to obtain target motion data corresponding to the target image frame includes:
performing action driving processing based on the predicted action data to obtain predicted driving data; the prediction driving data is observation data corresponding to the prediction action data;
performing error constraint processing on the prediction driving data based on the target image frame and the object pose data corresponding to the target image frame to obtain target driving data;
and determining the target motion data based on the target driving data.
In an exemplary embodiment, the performing error constraint processing on the predicted motion data corresponding to the target image frame based on the target image frame and the object pose data corresponding to the target image frame to obtain target motion data corresponding to the target image frame includes:
performing key point detection based on the target image frame to obtain detection key point information of the target object;
performing action driving processing based on the predicted action data to obtain predicted key point information of the target object;
and performing error constraint processing on the predicted action data based on the error information between the detection key point information and the prediction key point information and the object pose data corresponding to the target image frame to obtain the target action data.
In an exemplary embodiment, the detection key point information includes detection confidence degrees corresponding to each of the plurality of target key points;
the performing error constraint processing on the predicted action data based on the error information between the detected key point information and the predicted key point information and the object pose data corresponding to the target image frame to obtain the target action data includes:
obtaining a target confidence threshold corresponding to the target image frame based on the detection confidence corresponding to each of the plurality of target key points in the target image frame, the detection confidence corresponding to each of the plurality of target key points in the history image frame, and the smooth confidence threshold corresponding to the history image frame; the historical image frames are image frames of the image frame sequence, the time sequence of which is positioned before the target image frame; the smooth confidence threshold is determined based on an average confidence of the plurality of target keypoints in the historical image frames;
determining a first weight corresponding to each of the plurality of target key points in the target image frame based on a preset confidence threshold, a target confidence threshold corresponding to the target image frame and detection confidence corresponding to each of the plurality of target key points in the target image frame; the preset confidence coefficient threshold value is smaller than a target confidence coefficient threshold value corresponding to the target image frame;
based on the first weights corresponding to the target key points, carrying out weighting processing on the error information of the detection key point information and the prediction key point information to generate first weighted error information;
and performing error constraint processing on the predicted action data based on the first weighted error information and the object pose data corresponding to the target image frame to obtain the target action data.
In an exemplary embodiment, the performing error constraint processing on the predicted motion data corresponding to the target image frame based on the target image frame and the object pose data corresponding to the target image frame, to obtain the target motion data corresponding to the target image frame includes:
transforming the object pose data corresponding to the target image frame from an inertial coordinate system to a global coordinate system to obtain target bone orientation data of the target object under the global coordinate system;
performing action driving processing based on the predicted action data to obtain predicted bone orientation data of the target object under a global coordinate system;
generating bone orientation error information based on the target bone orientation data and the predicted bone orientation data;
and performing error constraint processing on the predicted action data based on the target image frame and the bone orientation error information to obtain the target action data.
In an exemplary embodiment, the performing error constraint processing on the predicted motion data based on the target image frame and the bone orientation error information to obtain the target motion data includes:
performing key point detection based on the target image frame to obtain detection key point information of the target object; the detection key point information comprises detection confidence degrees corresponding to a plurality of target key points respectively; the target object has a corresponding relation with a plurality of bones in the target image frame and a plurality of target key points;
determining a second weight corresponding to any bone and a third weight corresponding to the target key point corresponding to any bone under the condition that the detection confidence coefficient of the target key point corresponding to any bone is smaller than a target confidence coefficient threshold; the second weight is greater than the third weight;
weighting the bone orientation error information based on a second weight corresponding to any bone to generate second weighted error information;
based on a third weight corresponding to the target key point corresponding to any bone, carrying out weighting processing on error information between the detection information of the target key point and the prediction information of the target key point to obtain third weighted error information; the predicted information of the target key point is obtained based on action driving processing of the predicted action data;
and performing error constraint processing on the predicted action data based on the second weighted error information and the third weighted error information to obtain the target action data.
In an exemplary embodiment, the predicted action data includes object shape prediction data of the target object;
the performing error constraint processing on the predicted motion data corresponding to the target image frame based on the target image frame and the object pose data corresponding to the target image frame to obtain target motion data corresponding to the target image frame includes:
determining shape error information based on object shape prediction data of the target object in the target image frame, shape smoothing data corresponding to a historical image frame; the historical image frames are image frames of the image frame sequence, the time sequence of which is positioned before the target image frame; the shape smoothing data is obtained based on time sequence smoothing of object shape data of the target object in the historical image frame;
and performing error constraint processing on the predicted action data based on the shape error information, the target image frame and the object pose data corresponding to the target image frame to obtain the target action data.
In an exemplary embodiment, the predicted motion data includes joint prediction data of the target object;
the performing error constraint processing on the predicted motion data corresponding to the target image frame based on the target image frame and the object pose data corresponding to the target image frame to obtain target motion data corresponding to the target image frame includes:
obtaining joint error information based on joint prediction data corresponding to the target image frame and joint constraint data corresponding to a previous image frame of the target image frame; the joint constraint data corresponding to the previous image frame is obtained based on data error constraint on the joint prediction data of the target object in the previous image frame; the previous image frame is the image frame that is located before, and adjacent to, the target image frame in time sequence in the image frame sequence;
determining a fourth weight based on motion data loss information corresponding to the previous image frame; the motion data loss information corresponding to the previous image frame is determined based on the object pose data corresponding to the previous image frame and constraint driving data corresponding to the previous image frame; the constraint driving data corresponding to the previous image frame is obtained based on data error constraint on the prediction action data corresponding to the previous image frame;
weighting the joint error information based on the fourth weight to generate fourth weighted error information;
iteratively updating the joint prediction data based on the fourth weighted error information to obtain joint constraint data corresponding to the target image frame;
and performing error constraint processing on the predicted action data based on the target image frame, the object pose data corresponding to the target image frame and the joint constraint data corresponding to the target image frame to obtain the target action data.
In an exemplary embodiment, before the determining the fourth weight based on the motion data loss information corresponding to the previous image frame, the method further includes:
determining constraint driving data corresponding to target action data of the previous image frame;
determining motion data loss information corresponding to the previous image frame based on the object pose data corresponding to the previous image frame and the constraint driving data;
the determining the fourth weight based on the motion data loss information corresponding to the previous image frame includes:
determining the fourth weight based on motion data loss information corresponding to the previous image frame and loss smoothing information corresponding to the previous image frame; and the loss smoothing information corresponding to the previous image frame is obtained by carrying out time sequence smoothing on the action data loss information of the historical image frame with time sequence positioned before the previous image frame in the image frame sequence.
In an exemplary embodiment, the method further includes, before performing error constraint processing on the predicted motion data corresponding to the target image frame based on the target image frame and the object pose data corresponding to the target image frame to obtain the target motion data corresponding to the target image frame:
acquiring a calibration image frame containing a calibration object and object pose data corresponding to the calibration image frame; the pose measuring module on the calibration object is consistent with the pose measuring module on the target object in arrangement mode;
performing motion data prediction on the calibration object in the calibration image frame to obtain predicted motion data corresponding to the calibration image frame;
performing action driving processing based on the predicted action data corresponding to the calibration image frame to obtain predicted bone orientation data of the calibration object under a global coordinate system;
performing coordinate system transformation processing on the object pose data corresponding to the calibration image frame based on the initial coordinate system transformation parameters to obtain calibration bone orientation data of the calibration object under a global coordinate system;
obtaining calibration orientation error information based on the predicted bone orientation data of the calibration object under the global coordinate system and the calibration bone orientation data;
iteratively updating the initial coordinate system transformation parameters based on the calibration orientation error information to obtain target coordinate system transformation parameters; the target coordinate system transformation parameters are used for carrying out coordinate system transformation processing on the object pose data corresponding to the target image frames.
According to a second aspect of the embodiments of the present disclosure, there is provided an object action data processing apparatus, including:
a target data acquisition unit configured to perform acquisition of an image frame sequence including a target object, and object pose data corresponding to each image frame in the image frame sequence; the object pose data are obtained based on a pose measurement module arranged on the target object;
a first data prediction unit configured to perform motion data prediction on the target object in a target image frame, and obtain predicted motion data corresponding to the target image frame; the target image frame is an image frame which is not subjected to motion data prediction in the image frame sequence;
and the error constraint unit is configured to perform error constraint processing on the predicted action data corresponding to the target image frame based on the target image frame and the object pose data corresponding to the target image frame to obtain target action data corresponding to the target image frame.
In an exemplary embodiment, the error constraint unit includes:
a first driving unit configured to perform an action driving process based on the predicted action data to obtain predicted driving data; the prediction driving data is observation data corresponding to the prediction action data;
the first constraint processing unit is configured to execute error constraint processing on the prediction driving data based on the target image frame and the object pose data corresponding to the target image frame to obtain target driving data;
and a target motion data determination unit configured to perform determination of the target motion data based on the target drive data.
In an exemplary embodiment, the error constraint unit includes:
a first detection unit configured to perform keypoint detection based on the target image frame to obtain detection keypoint information of the target object;
the second driving unit is configured to execute action driving processing based on the predicted action data to obtain predicted key point information of the target object;
and the second constraint processing unit is configured to perform error constraint processing on the predicted action data based on the error information between the detection key point information and the prediction key point information and the object pose data corresponding to the target image frame to obtain the target action data.
In an exemplary embodiment, the detection key point information includes detection confidence degrees corresponding to each of the plurality of target key points;
the second constraint processing unit includes:
a target confidence threshold determining unit configured to perform detection confidence degrees corresponding to the plurality of target key points in the target image frame, detection confidence degrees corresponding to the plurality of target key points in the history image frame, and a smooth confidence threshold corresponding to the history image frame, so as to obtain a target confidence threshold corresponding to the target image frame; the historical image frames are image frames of the image frame sequence, the time sequence of which is positioned before the target image frame; the smooth confidence threshold is determined based on an average confidence of the plurality of target keypoints in the historical image frames;
a first weight determining unit configured to perform determining a first weight corresponding to each of the plurality of target key points in the target image frame based on a preset confidence threshold, a target confidence threshold corresponding to the target image frame, and detection confidence corresponding to each of the plurality of target key points in the target image frame; the preset confidence coefficient threshold value is smaller than a target confidence coefficient threshold value corresponding to the target image frame;
a first weighting unit configured to perform a weighting process on error information of the detected key point information and the predicted key point information based on first weights corresponding to the plurality of target key points, and generate first weighted error information;
and a third constraint processing unit configured to perform error constraint processing on the predicted motion data based on the first weighted error information and the object pose data corresponding to the target image frame, so as to obtain the target motion data.
In an exemplary embodiment, the error constraint unit includes:
a first coordinate system transformation unit configured to perform transformation of object pose data corresponding to the target image frame from an inertial coordinate system to a global coordinate system, to obtain target bone orientation data of the target object in the global coordinate system;
the third driving unit is configured to perform action driving processing based on the predicted action data to obtain predicted bone orientation data of the target object under a global coordinate system;
a first error information determination unit configured to perform generation of bone orientation error information based on the target bone orientation data and the predicted bone orientation data;
And a fourth constraint processing unit configured to perform error constraint processing on the predicted motion data based on the target image frame and the bone orientation error information, to obtain the target motion data.
In an exemplary embodiment, the fourth constraint processing unit includes:
a second detection unit configured to perform keypoint detection based on the target image frame to obtain detection keypoint information of the target object; the detection key point information comprises detection confidence degrees corresponding to a plurality of target key points respectively; the target object has a corresponding relation with a plurality of bones in the target image frame and a plurality of target key points;
a second weight determining unit configured to determine a second weight corresponding to any bone and a third weight corresponding to a target key point corresponding to any bone, if the detection confidence of the target key point corresponding to any bone is smaller than a target confidence threshold; the second weight is greater than the third weight;
a second weighting unit configured to perform weighting processing on the bone orientation error information based on a second weight corresponding to any one of the bones, and generate second weighted error information;
a third weighting unit configured to perform weighting processing on error information between detection information of the target key point and prediction information of the target key point based on a third weight corresponding to the target key point corresponding to any bone, so as to obtain third weighted error information; the predicted information of the target key point is obtained based on action driving processing of the predicted action data;
and a fifth constraint processing unit configured to perform error constraint processing on the predicted motion data based on the second weighted error information and the third weighted error information, to obtain the target motion data.
In an exemplary embodiment, the predicted action data includes object shape prediction data of the target object;
the error constraint unit includes:
a shape error information determination unit configured to perform determination of shape error information based on object shape prediction data of the target object in the target image frame, shape smoothing data corresponding to a history image frame; the historical image frames are image frames of the image frame sequence, the time sequence of which is positioned before the target image frame; the shape smoothing data is obtained based on time sequence smoothing of object shape data of the target object in the historical image frame;
And a sixth constraint processing unit configured to perform error constraint processing on the predicted motion data based on the shape error information, the target image frame, and object pose data corresponding to the target image frame, to obtain the target motion data.
In an exemplary embodiment, the predicted motion data includes joint prediction data of the target object;
the error constraint unit includes:
a joint error information determining unit configured to obtain joint error information based on joint prediction data corresponding to the target image frame and joint constraint data corresponding to a previous image frame of the target image frame; the joint constraint data corresponding to the previous image frame is obtained based on data error constraint on the joint prediction data of the target object in the previous image frame; the previous image frame is the image frame that is located before, and adjacent to, the target image frame in time sequence in the image frame sequence;
a third weight determining unit configured to perform determination of a fourth weight based on motion data loss information corresponding to the previous image frame; the motion data loss information corresponding to the previous image frame is determined based on the object pose data corresponding to the previous image frame and constraint driving data corresponding to the previous image frame; the constraint driving data corresponding to the previous image frame is obtained based on data error constraint on the prediction action data corresponding to the previous image frame;
a fourth weighting unit configured to perform weighting processing on the joint error information based on the fourth weight to generate fourth weighted error information;
a second iteration updating unit configured to perform iteration updating on the joint prediction data based on the fourth weighted error information, so as to obtain joint constraint data corresponding to the target image frame;
and a seventh constraint processing unit configured to perform error constraint processing on the predicted motion data based on the target image frame, the object pose data corresponding to the target image frame, and the joint constraint data corresponding to the target image frame, to obtain the target motion data.
In an exemplary embodiment, the apparatus further comprises:
a fourth driving unit configured to determine constraint driving data corresponding to the target motion data of the previous image frame;
a loss information unit configured to perform determination of motion data loss information corresponding to the previous image frame based on the object pose data corresponding to the previous image frame and the constraint driving data;
the third weight determination unit includes:
a weight calculation unit configured to perform determination of the fourth weight based on motion data loss information corresponding to the previous image frame and loss smoothing information corresponding to the previous image frame; and the loss smoothing information corresponding to the previous image frame is obtained by carrying out time sequence smoothing on the action data loss information of the historical image frame with time sequence positioned before the previous image frame in the image frame sequence.
In an exemplary embodiment, the apparatus further comprises:
a calibration data acquisition unit configured to perform acquisition of a calibration image frame including a calibration object, and object pose data corresponding to the calibration image frame; the pose measuring module on the calibration object is consistent with the pose measuring module on the target object in arrangement mode;
the second data prediction unit is configured to perform motion data prediction on the calibration object in the calibration image frame to obtain predicted motion data corresponding to the calibration image frame;
a fifth driving unit configured to perform motion driving processing based on the predicted motion data corresponding to the calibration image frame, so as to obtain predicted bone orientation data of the calibration object in a global coordinate system;
the second coordinate system transformation unit is configured to perform coordinate system transformation processing on the object pose data corresponding to the calibration image frame based on the initial coordinate system transformation parameters to obtain calibration bone orientation data of the calibration object under a global coordinate system;
a second error information determining unit configured to perform obtaining calibration orientation error information based on the predicted bone orientation data of the calibration object in the global coordinate system and the calibration bone orientation data;
A third iterative updating unit configured to perform iterative updating of the initial coordinate system transformation parameters based on the calibration orientation error information to obtain target coordinate system transformation parameters; the target coordinate system transformation parameters are used for carrying out coordinate system transformation processing on the object pose data corresponding to the target image frames.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement the object action data processing method as described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium; when instructions in the storage medium are executed by a processor of a server, the server is enabled to perform the object action data processing method described above.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program stored in a readable storage medium, from which at least one processor of a computer device reads and executes the computer program, causing the device to perform the above-described object action data processing method.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
when the method and the device are used for processing the object action data, the object action data of the target object can be processed together based on the target image frame containing the target object and the object pose data corresponding to the target image frame, so that the image data corresponding to the target image frame is supplemented based on the object pose data, the problem of vision shielding in monocular vision is solved, the overall acquisition of the target object action data is realized, and the accuracy of action characterization of the target object is improved; further, when the motion data of the target object is predicted based on the target image frame, the predicted motion data can be jointly constrained based on the image information in the target image frame and the object pose data corresponding to the target image frame, so that errors between the image information in the target image frame and the object pose data corresponding to the target image frame and the constrained predicted motion data are reduced, the target motion data are determined based on the predicted motion data after constraint processing, the accuracy of determining the predicted motion data of the target object can be improved, and further, the motion driving is performed based on the target motion data, so that the accuracy of the motion driving can be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
FIG. 1 is a schematic diagram of an implementation environment shown in accordance with an exemplary embodiment;
FIG. 2 is a flowchart of a method for object action data processing, according to an example embodiment;
FIG. 3 is a flow chart illustrating a method of motion driven based error constraint processing in accordance with an exemplary embodiment;
FIG. 4 is a flowchart illustrating an error constraint method based on keypoint detection, according to an exemplary embodiment;
FIG. 5 is a flowchart illustrating a keypoint weight based error constraint method in accordance with an exemplary embodiment;
FIG. 6 is a flowchart illustrating a method for constraint processing based on bone orientation data, according to an exemplary embodiment;
FIG. 7 is a flowchart illustrating a method for constraint processing based on skeletal weights, in accordance with an exemplary embodiment;
FIG. 8 is a flowchart illustrating a method of object shape based data constraint in accordance with an exemplary embodiment;
FIG. 9 is a flowchart illustrating a method for error constraint based on loss information for historical image frames, according to an example embodiment;
FIG. 10 is a flowchart of a method of calculating a fourth weight, shown in accordance with an exemplary embodiment;
FIG. 11 is a flowchart illustrating a coordinate system transformation parameter optimization method, according to an exemplary embodiment;
FIG. 12 is a schematic diagram illustrating an optimization flow of coordinate system transformation parameters, according to an example embodiment;
FIG. 13 is a schematic diagram illustrating a constraint flow of predicted action data according to an example embodiment;
FIG. 14 is a schematic diagram of an object action data processing apparatus, shown in accordance with an exemplary embodiment;
FIG. 15 is a block diagram of an electronic device for object action data processing, shown in accordance with an exemplary embodiment;
FIG. 16 is a block diagram of another electronic device for object action data processing, shown in accordance with an exemplary embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for presentation, analyzed data, etc.) related to the present disclosure are information and data authorized by the user or sufficiently authorized by each party.
Referring to fig. 1, a schematic diagram of an implementation environment provided by an embodiment of the disclosure is shown; the implementation environment may include: a data acquisition end 110 and an action data processing end 120; the data acquisition end 110 and the action data processing end 120 can perform data communication through a network.
Specifically, the data acquisition end 110 may perform object data acquisition on the target object, where the acquired object data may include object image data, object posture data, and the like; the data acquisition end 110 may send the acquired object data to the action data processing end 120, and the action data processing end 120 may predict the action data of the target object based on the received object image data, to obtain predicted action data; and then, performing error constraint processing on the predicted action data based on the object image data and the object posture data to obtain target action data corresponding to the target object.
The data collection terminal 110 may communicate with the action data processing terminal 120 based on Browser/Server (B/S) or Client/Server (C/S) mode. The data acquisition terminal 110 may include: image acquisition devices, inertial measurement devices (Inertial Measurement Unit, IMU), smart wearable devices, and the like.
The action data processing end 120 and the data acquisition end 110 can establish a communication connection in a wired or wireless manner, and the operating system running on the action data processing end 120 may include, but is not limited to, Android, iOS, Linux, Windows, and the like. Specifically, the action data processing end 120 may be a smart phone, a tablet computer, a notebook computer, a digital assistant, an intelligent wearable device, a vehicle-mounted terminal, a server, or another type of physical device. Where the action data processing end 120 is a server, it may be an independently operating server, a distributed server, or a server cluster composed of multiple servers, and the server may be a cloud server.
In order to solve the problem that in the related art, due to the fact that monocular vision may have vision occlusion, prediction of object action data is inaccurate, an embodiment of the present disclosure provides an object action data processing method, and an execution subject of the method may be the action data processing end; referring specifically to fig. 2, the method may include:
s210, acquiring an image frame sequence containing a target object and object pose data corresponding to each image frame in the image frame sequence; the object pose data is obtained based on a pose measurement module arranged on the target object.
The target object in the present embodiment may include an object exhibiting motion changes, such as a person or an animal. An image acquisition device may capture images of the target object to obtain an image frame sequence containing the target object. The pose measurement module may specifically be an inertial measurement unit arranged at each bone of the target object, so that corresponding object pose data can be obtained through the inertial measurement units. An inertial measurement unit (IMU) is a device for measuring the three-axis attitude angle (or angular velocity) and acceleration of an object; the gyroscope and the acceleration sensor are the core devices of an inertial navigation system. By means of the built-in acceleration sensor and gyroscope, the IMU can measure linear acceleration and rotational angular velocity along three directions, and information such as the attitude, velocity, and displacement of the object can be obtained by calculation from the measured linear acceleration and rotational angular velocity.
Because the image frame sequence containing the target object and the object pose data of the target object are obtained through different data acquisition devices, the data obtained by the different devices need to be time-aligned so as to determine, for each time node, the image information of the target object and the object pose data of the corresponding time node; based on the image information and object pose data corresponding to each time node, the specific motion performance of the target object at each time node can be reflected from multiple aspects.
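For illustration, one common way to time-align the two streams is nearest-timestamp matching. The sketch below is a minimal Python example; the function and field names are assumptions for illustration, not part of this disclosure:

```python
import bisect

def align_imu_to_frames(frame_timestamps, imu_samples):
    """For each video frame, pick the IMU sample closest in time.

    frame_timestamps: sorted list of frame times (seconds).
    imu_samples: sorted list of (timestamp, pose_data) tuples.
    Returns one pose_data entry per frame.
    """
    imu_times = [t for t, _ in imu_samples]
    aligned = []
    for ft in frame_timestamps:
        i = bisect.bisect_left(imu_times, ft)
        # Compare the neighbors on either side of the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(imu_times)]
        best = min(candidates, key=lambda j: abs(imu_times[j] - ft))
        aligned.append(imu_samples[best][1])
    return aligned
```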
S220, predicting motion data of the target object in the target image frame to obtain predicted motion data corresponding to the target image frame; the target image frame is an image frame in the image frame sequence for which motion data is not predicted.
In this embodiment, motion data prediction processing may be sequentially performed on each image frame based on a time sequence precedence relationship of each image frame in the image frame sequence, and the target image frame may be an image frame in the image frame sequence that needs to perform motion prediction constraint.
In the case of a target image frame containing a target object, the target image frame may be subjected to image recognition by a deep learning method to obtain predicted motion data of the target object, specifically, the predicted motion data of the target object represented in the target image frame may be obtained by extracting features of the target image frame.
In this embodiment, the motion data of the target object may include a shape parameter (shape), a pose parameter (pose), and the like. The shape parameter comprises 10 parameters describing the human body, such as height, build, and head-to-body proportion; it is a 10-dimensional vector, and the shape change of the human body can be controlled through 10 incremental templates. The pose parameter comprises a global displacement parameter with 3 degrees of freedom and 72 relative rotation parameters, 75 parameters in total. Accordingly, the predicted motion data corresponding to the target image frame may include predicted shape parameters, predicted pose parameters, etc. for the target object.
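For intuition, the dimensions of these parameters can be sketched as follows; the variable names are illustrative assumptions, and the 24-joint decomposition of the 72 rotation parameters follows the common SMPL convention:

```python
import numpy as np

shape_params = np.zeros(10)        # "shape": 10-dimensional body shape vector
global_translation = np.zeros(3)   # 3-degree-of-freedom global displacement
joint_rotations = np.zeros(72)     # 72 relative rotation parameters (24 joints x 3)

pose_params = np.concatenate([global_translation, joint_rotations])
assert pose_params.shape == (75,)  # 75 pose parameters in total
```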
S230, performing error constraint processing on the predicted action data corresponding to the target image frame based on the target image frame and the object pose data corresponding to the target image frame to obtain target action data corresponding to the target image frame.
The action data constraint of the target image frame may refer to performing action data prediction on a target object in the target image frame to obtain predicted action data corresponding to the target image frame; and performing error constraint processing on the prediction action data corresponding to the target image frame based on the image information of the target image frame and the object pose data corresponding to the target image frame. Each image frame in the image frame sequence needs to be subjected to action data constraint processing, so that the predicted action data corresponding to each image frame is closer to the real action data, and the action of a driving object is more real and natural when the action driving is performed based on the constrained predicted action data.
As described above, the motion performance information of the target object can be reflected from multiple aspects based on the image information in the target image frame and the object pose data corresponding to the target image frame, improving the comprehensiveness and accuracy of the motion performance information. When the motion data of the target object is predicted based on the target image frame, errors may arise for two reasons: on the one hand, object occlusion or self-occlusion can make the acquired target image frame inaccurate, so the predicted motion data obtained by image recognition on the target image frame may deviate from the actual motion data; on the other hand, the prediction is made from the target image frame alone, without combining the object pose data corresponding to the target image frame, even though the object pose data is not affected by visual occlusion, so errors remain between the predicted motion data and the actual motion data. For these two reasons, error constraint processing can be performed on the predicted motion data corresponding to the target image frame based on the actually acquired target image frame and the object pose data corresponding to the target image frame, that is, the predicted motion data is optimized to obtain target motion data close to the actual motion data.
When the method and the device are used for processing the object action data, the object action data of the target object can be processed together based on the target image frame containing the target object and the object pose data corresponding to the target image frame, so that the image data corresponding to the target image frame is supplemented based on the object pose data, the problem of vision shielding in monocular vision is solved, the overall acquisition of the target object action data is realized, and the accuracy of action characterization of the target object is improved; further, when the motion data of the target object is predicted based on the target image frame, the predicted motion data can be jointly constrained based on the image information in the target image frame and the object pose data corresponding to the target image frame, so that errors between the image information in the target image frame and the object pose data corresponding to the target image frame and the constrained motion data are reduced, the target motion data are determined based on the motion data after constraint processing, the accuracy of determining the motion data of the target object can be improved, and further, the motion driving is performed based on the target motion data, so that the accuracy of the motion driving can be improved.
The target motion data described in this embodiment is generally data that cannot be directly observed. It can be used as an input parameter of a motion driving model: by inputting the target motion data into the motion driving model, directly observable data corresponding to the target motion data, that is, motion characterization data, can be obtained. The motion driving model may be an SMPL model (Skinned Multi-Person Linear model), whose input parameters may include the shape data and pose data described above and whose output is the motion characterization data corresponding to them.
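As a sketch of what such a driving step might look like in practice, the open-source smplx package implements the SMPL model; using it here is an assumption for illustration (the disclosure does not prescribe an implementation), and the model path must point to downloaded SMPL model files:

```python
import torch
import smplx

# "models" is an assumed local directory containing the SMPL model files.
model = smplx.create("models", model_type="smpl")

betas = torch.zeros(1, 10)         # shape parameters
global_orient = torch.zeros(1, 3)  # root rotation (part of the 72 rotation params)
body_pose = torch.zeros(1, 69)     # remaining 23 joints x 3 axis-angle rotations
transl = torch.zeros(1, 3)         # 3-degree-of-freedom global displacement

output = model(betas=betas, global_orient=global_orient,
               body_pose=body_pose, transl=transl)
joints_3d = output.joints          # observable motion characterization: 3D joints
vertices = output.vertices         # body mesh vertices
```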
Accordingly, referring to fig. 3, a method for motion-driven error constraint processing is shown, which may include:
s310, performing action driving processing based on the predicted action data to obtain predicted driving data; the predicted driving data is observation data corresponding to the predicted motion data.
S320, performing error constraint processing on the prediction driving data based on the target image frame and the object pose data corresponding to the target image frame to obtain target driving data.
S330, determining the target action data based on the target driving data.
In this embodiment, the image frame data of the target image frame and the corresponding object pose data may be observable data, while the predicted motion data may be unobservable data. Motion driving processing may be performed on the predicted motion data based on the motion driving model to obtain predicted driving data, where the predicted driving data is the motion characterization data corresponding to the predicted motion data. Further, the target image frame and the object pose data corresponding to the target image frame are observable motion characterization data acquired directly by the information acquisition devices; the predicted driving data, the target image frame, and the corresponding object pose data are therefore all motion characterization data of the target object. Error constraint processing can thus be performed on the predicted driving data using the target image frame and the object pose data corresponding to it, that is, the errors between them and the predicted driving data are made as small as possible, so that corresponding target driving data are obtained; after the target driving data are obtained, error constraint processing can be performed in reverse on the predicted motion data, so that corresponding target motion data are obtained.
Specifically, in the process of performing error constraint processing on the predicted driving data, the following steps may be repeated: performing motion driving processing based on the current predicted motion data to obtain current predicted driving data; performing error constraint processing on the current predicted driving data based on the target image frame and the object pose data corresponding to the target image frame to obtain current constraint driving data; reversely updating the current predicted motion data based on the current constraint driving data to obtain current constraint motion data; and taking the current constraint motion data as the current predicted motion data. The iteration stops when a preset number of iterations is reached, or when the error information between the current predicted driving data and the target image frame together with its corresponding object pose data satisfies a preset error condition.
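The iteration just described is, in essence, an optimization loop that drives unobservable parameters into observable space, measures the error there, and updates the parameters in reverse. The following self-contained Python sketch illustrates the idea with a toy linear driver standing in for the motion driving model; all names and the gradient-descent update are illustrative assumptions:

```python
import numpy as np

def drive(params, W):
    # Toy "motion driving": a linear map from parameters to observable data.
    return W @ params

def constrain_motion_data(predicted, observed, W, lr=0.01, max_iters=2000, tol=1e-10):
    """Iteratively refine predicted parameters so that the driven (observable)
    data matches the actually observed data, in a least-squares sense."""
    params = predicted.copy()
    for _ in range(max_iters):
        residual = drive(params, W) - observed  # error between driven and observed data
        loss = float(residual @ residual)
        if loss < tol:                          # preset error condition met
            break
        params -= lr * (2.0 * W.T @ residual)   # reverse-update the motion parameters
    return params                               # constrained (target) motion data

# Usage: recover parameters from observations of the toy linear driver.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
true_params = rng.normal(size=4)
estimate = constrain_motion_data(rng.normal(size=4), W @ true_params, W)
```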
Therefore, in the process of applying data constraints to the predicted motion data, the predicted motion data that cannot be directly observed can be driven into observable predicted driving data, which makes it convenient to perform error comparison against the target image frame and the object pose data corresponding to the target image frame; constraining the predicted driving data correspondingly realizes error constraint on the predicted motion data, further improving the convenience and operability of error-constraining the predicted motion data.
When the predicted motion data is subjected to data constraint based on the target image frame and the object pose data corresponding to the target image frame, corresponding observable motion characterization data, such as key point information, can be further determined based on the target image frame and the object pose data corresponding to the target image frame, and the key point information can characterize the position information of each bone of the target object; referring specifically to fig. 4, an error constraint method based on keypoint detection is shown, which may include:
s410, detecting key points based on the target image frames to obtain detection key point information of the target object.
S420, performing action driving processing based on the predicted action data to obtain predicted key point information of the target object.
S430, performing error constraint processing on the predicted action data based on error information between the detected key point information and the predicted key point information and the object pose data corresponding to the target image frame to obtain the target action data.
The target image frame may be input into a key point identification model to obtain the detection key point information of the corresponding target object. After motion driving processing is performed on the predicted motion data, the obtained predicted driving data may include the predicted key point information corresponding to the target image frame; error constraint processing can therefore be performed on the predicted key point information against the detected key point information, thereby realizing error constraint processing on the predicted motion data.
Specifically, the detection key point information may be two-dimensional key point information, while the predicted key point information obtained after performing motion driving processing on the predicted motion data is three-dimensional; therefore, dimension mapping needs to be performed on the three-dimensional predicted key point information to obtain two-dimensional predicted key point information, which facilitates error constraint processing between the two-dimensional detected key point information and the two-dimensional predicted key point information.
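The dimension mapping is, in effect, a camera projection. A minimal pinhole-projection sketch follows; the disclosure does not specify a camera model, so the intrinsic parameters here are illustrative assumptions:

```python
import numpy as np

def project_to_2d(keypoints_3d, fx=1000.0, fy=1000.0, cx=640.0, cy=360.0):
    """Project N x 3 keypoints in camera coordinates to N x 2 pixel coordinates
    with an assumed pinhole camera (all keypoints assumed to have Z > 0)."""
    X, Y, Z = keypoints_3d[:, 0], keypoints_3d[:, 1], keypoints_3d[:, 2]
    u = fx * X / Z + cx
    v = fy * Y / Z + cy
    return np.stack([u, v], axis=1)
```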
In this embodiment, by performing key point detection on the target image frame, the detected key point information of the target object can be obtained; this information characterizes the actual features of each key part of the target object. By calculating errors between the predicted key point information and the detected key point information, with the latter as reference, key point error information can be determined; this error describes the deviation between the predicted motion data and the actual motion data. Performing error constraint processing on the predicted motion data based on the key point error information can therefore improve the accuracy of determining the target motion data.
In a specific embodiment, the detection key point information includes detection confidence degrees corresponding to the plurality of target key points, and the corresponding weight can be determined based on the detection confidence degrees corresponding to the plurality of target key points; referring to fig. 5, an error constraint method based on key point weights is shown, and the method may include:
S510, obtaining a target confidence threshold corresponding to the target image frame based on the detection confidence corresponding to each of the plurality of target key points in the target image frame, the detection confidence corresponding to each of the plurality of target key points in the history image frame, and the smooth confidence threshold corresponding to the history image frame; the historical image frames are image frames of the image frame sequence, the time sequence of which is positioned before the target image frame; the smoothing confidence threshold is determined based on an average confidence of the plurality of target keypoints in the historical image frames.
S520, determining first weights corresponding to the target key points in the target image frame based on a preset confidence threshold, a target confidence threshold corresponding to the target image frame and detection confidence corresponding to the target key points in the target image frame; the preset confidence threshold is smaller than a target confidence threshold corresponding to the target image frame.
S530, based on the first weights corresponding to the target key points, carrying out weighting processing on the error information of the detection key point information and the prediction key point information, and generating first weighted error information.
S540, performing error constraint processing on the predicted action data based on the first weighted error information and the object pose data corresponding to the target image frame to obtain the target action data.
For each image frame in the image frame sequence, the average confidence corresponding to that image frame may be determined based on the detection confidences corresponding to the plurality of target key points in that image frame. For each image frame, a smoothed confidence corresponding to its historical image frames may be calculated based on the average confidences of the historical image frames preceding it, and a smoothed confidence threshold for the historical image frames may likewise be calculated based on the confidence thresholds of those historical image frames. The target confidence threshold corresponding to the target image frame may then be calculated based on the average confidence corresponding to the target image frame, the smoothed confidence corresponding to the historical image frames preceding the target image frame, and the smoothed confidence threshold of those historical image frames.
Further, in this embodiment, a preset confidence threshold may be set, which may be smaller than the target confidence threshold. The preset confidence threshold is mainly used to filter out target key points with low detection confidence; for example, the preset confidence threshold may be 0.2 or 0.25, and the first weight corresponding to a target key point whose detection confidence is smaller than the preset confidence threshold is 0.
The data smoothing in this embodiment may be achieved by an exponential moving average, a weighted moving average, or the like, and is not limited thereto.
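For illustration only, the two smoothing schemes mentioned above can be sketched as follows; the parameter alpha and the window weights are hypothetical choices, not values specified by the disclosure:

```python
def ema_update(prev_smooth, new_value, alpha=0.1):
    # Exponential moving average: blend the new observation into the running value.
    return alpha * new_value + (1.0 - alpha) * prev_smooth

def weighted_moving_average(values, weights):
    # Weighted moving average over a sliding window of recent values.
    total = sum(w * v for v, w in zip(values, weights))
    return total / sum(weights)
```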
In this embodiment, the correspondence between the weight and the target confidence threshold may be preset, and under the condition that the preset confidence threshold and the target confidence threshold of the target image frame are determined, the detection confidence of each target key point may be compared with the preset confidence threshold or the target confidence threshold to determine the first weight corresponding to each target key point.
In one example, the detection key point information and the predicted key point information may each include position information of each target key point, so that error information corresponding to each target key point may be obtained from the detected position information and the predicted position information of that key point; the error information of each target key point is then weighted by its first weight to obtain the first weighted error information. Taking the minimization of the first weighted error information as the objective, that is, making it as small as possible or making it satisfy a preset optimization error condition, the corresponding target action data can be obtained.
The first weighted error information corresponding to the target image frame may be calculated by formula (1):

$$E_{proj} = \sum_{f} w_f \left\| \hat{J}_{2d} - J_{2d,det} \right\|^2 \tag{1}$$

where E_proj is the first weighted error information corresponding to the target image frame, w_f is the first weight corresponding to the target key point, Ĵ_2d is the predicted key point information, and J_2d,det is the detection key point information.
The first weight w_f may be obtained by formula (2):

$$w_f = \begin{cases} 0, & c < \tau_1 \\ \dfrac{c - \tau_1}{\tau_2 - \tau_1}, & \tau_1 \le c \le \tau_2 \\ 1, & c > \tau_2 \end{cases} \tag{2}$$

where τ_1 is the preset confidence threshold, τ_2 is the target confidence threshold, and c is the detection confidence corresponding to the target key point.
The target confidence threshold τ_2 may be obtained by formula (3):

$$\tau_2 = \tau_{2,mv} \cdot \bar{c} / \bar{c}_{mv} \tag{3}$$

where c̄ is the average confidence of the target image frame, c̄_mv is the smoothed confidence corresponding to the historical image frames preceding the target image frame, and τ_2,mv is the smoothing confidence threshold of the historical image frames preceding the target image frame; c̄_mv and τ_2,mv may be obtained by exponential or weighted moving averages.
Thus, when calculating the confidence threshold of the target image frame, the calculation is based on the smoothed confidence threshold and the smoothed average confidence of the historical image frames, so that the determination of the confidence threshold corresponding to the target image frame is associated with the confidence information corresponding to the historical image frames; in other words, the threshold can be adaptively adjusted according to the historical confidence level. The weight of each target key point is then determined based on this confidence threshold, so that the importance of different target key points can be distinguished: the weight is increased for effective information and reduced for ineffective information, and the first weighted error information can therefore be obtained accurately. Performing error constraint processing based on the first weighted error information to optimize the predicted action data improves the match between the target action data and the actual action data and reduces the error between them. For example, the detection confidence of a complex action is generally lower, and the threshold is correspondingly reduced so as to retain key information, thereby avoiding the problem that, with a fixed confidence threshold, the confidences corresponding to a complex action would all be zeroed out.
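A minimal sketch of this adaptive-threshold weighting follows; the linear ramp in the weight and the ratio-based threshold update reflect the assumed readings of formulas (2) and (3) above, and all names are illustrative:

```python
import numpy as np

def target_confidence_threshold(conf_frame, conf_mv, tau2_mv):
    """Adaptive target confidence threshold for the current frame.

    conf_frame: detection confidences of the target key points in this frame.
    conf_mv:    smoothed average confidence over the historical frames.
    tau2_mv:    smoothed confidence threshold over the historical frames.
    """
    mean_conf = float(np.mean(conf_frame))
    # Scale the historical threshold by the current-vs-historical confidence ratio.
    return tau2_mv * mean_conf / conf_mv

def first_weights(conf, tau1, tau2):
    """Per-key-point weights: 0 below tau1, 1 above tau2, linear ramp between."""
    return np.clip((conf - tau1) / (tau2 - tau1), 0.0, 1.0)

def weighted_reprojection_error(kps_pred_2d, kps_det_2d, weights):
    """First weighted error information: weighted sum of squared 2D residuals."""
    residuals = np.sum((kps_pred_2d - kps_det_2d) ** 2, axis=1)
    return float(np.sum(weights * residuals))
```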
In an optional embodiment, action driving processing is performed on the predicted action data, and the obtained predicted driving data may further include predicted bone orientation data of the target object under the global coordinate system; based on the object pose data corresponding to the target image frame, target bone orientation data of the target object under the global coordinate system can be obtained, so that constraint processing can be performed on the predicted action data based on the bone orientation data. Referring specifically to fig. 6, a method for constraint processing based on bone orientation data is shown, and may include:
S610, transforming the object pose data corresponding to the target image frame from an inertial coordinate system to a global coordinate system to obtain target bone orientation data of the target object under the global coordinate system.
S620, performing action driving processing based on the predicted action data to obtain predicted bone orientation data of the target object under a global coordinate system.
S630, generating bone orientation error information based on the target bone orientation data and the predicted bone orientation data.
S640, performing error constraint processing on the predicted motion data based on the target image frame and the bone orientation error information to obtain the target motion data.
Based on the above description of the embodiment, the object pose data may include the acceleration and orientation of the inertial measurement device, from which the orientation data of the target object's bones can be calculated. Taking a human body as the target object as an example, 5 inertial measurement devices may be introduced in this embodiment, arranged respectively at the spine, the left forearm, the right forearm, the left calf and the right calf of the target object, so that sparse object pose data can be obtained correspondingly.
To facilitate comparison, the collected object pose data and the bone orientation data obtained by the action driving processing can both be transformed into the global coordinate system. Specifically, coordinate system transformation can be performed on the object pose data corresponding to the target image frame to obtain target bone orientation data of the target object in the global coordinate system, and bone orientation error information can then be generated based on the target bone orientation data and the predicted bone orientation data.
Corresponding bone orientation data can be obtained from the object pose data acquired by the inertial measurement devices, and such bone orientation data can characterize the motion of the target object from the perspective of bone movement. Therefore, performing error constraint processing on the predicted action data through the bone orientation error information between the target bone orientation data and the predicted bone orientation data realizes a constraint on the predicted action data from the perspective of bone movement, further improving the effectiveness and accuracy of the constraint on the predicted action data.
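The comparison of the two orientations can be sketched as follows, assuming all rotations are given as 3x3 matrices; the weighting and the deviation-from-identity form follow formulas (4) and (5) below, and the function name is illustrative:

```python
import numpy as np

def bone_orientation_error(R_ig, R_i_list, R_ib_list, R_pred_list, weights):
    """Weighted error between IMU-derived and predicted bone orientations.

    R_ig:        (3, 3) inertial-to-global transformation.
    R_i_list:    per-IMU bone orientations measured in the inertial frame.
    R_ib_list:   per-IMU transformations from IMU local frame to bone local frame.
    R_pred_list: predicted bone orientations in the global frame.
    """
    err, I = 0.0, np.eye(3)
    for R_i, R_ib, R_pred, w in zip(R_i_list, R_ib_list, R_pred_list, weights):
        R_target = R_ig @ R_i @ R_ib         # target bone orientation (global frame)
        R_rel = R_target.T @ R_pred          # relative rotation between the two
        err += w * np.sum((I - R_rel) ** 2)  # zero when the orientations agree
    return err
```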
In this embodiment, there is a correspondence between the detection key point information obtained by performing key point detection on the target object and the joints of each bone of the target object. For example, each bone has two ends, each end may correspond to one joint, and each joint may correspond to at least one key point; a detection key point can thus be understood as a detected joint. Since the detection confidence of each key point, and hence of each detected joint, may differ, the confidence of a joint needs to be determined based on the detection confidence of that joint, and the weight of each bone then determined accordingly. Referring specifically to fig. 7, a method for constraint processing based on bone weights is shown, and may include:
S710, performing key point detection based on the target image frame to obtain detection key point information of the target object; the detection key point information includes detection confidences corresponding to a plurality of target key points respectively, and the plurality of bones of the target object in the target image frame have a correspondence with the plurality of target key points.
S720, determining a second weight corresponding to any bone and a third weight corresponding to the target key point corresponding to any bone under the condition that the detection confidence coefficient of the target key point corresponding to any bone is smaller than a target confidence coefficient threshold; the second weight is greater than the third weight.
S730, weighting the bone orientation error information based on the second weight corresponding to any bone, and generating second weighted error information.
S740, carrying out weighting processing on error information between detection information of the target key points and prediction information of the target key points based on third weights corresponding to the target key points corresponding to any bone, so as to obtain third weighted error information; and the predicted information of the target key point is obtained based on action driving processing of the predicted action data.
S750, performing error constraint processing on the predicted action data based on the second weighted error information and the third weighted error information to obtain the target action data.
The method for detecting the key point may be identical to the methods shown in fig. 4 and 5, and will not be described herein.
Key point detection is performed on the target image frame; referring to formula (2), when the detection confidence of the target key point corresponding to any bone is smaller than the target confidence threshold, i.e., c < τ_1 or τ_1 ≤ c ≤ τ_2, the key points in these two ranges may suffer from inaccurate detection. For inaccurately detected key points, the degree of dependence on the key point information can be reduced while the degree of dependence on the orientation data of the bones corresponding to those key points is increased, so that the former is smaller than the latter. That is, when the detection confidence of the target key point corresponding to any bone is smaller than the target confidence threshold, the weight of the bone orientation error information corresponding to that bone is increased and the weight of the detection error information of the key points corresponding to that bone is reduced; in other words, the second weight is increased and the third weight is reduced, so that the bone-based data constraint relies more on the bone orientation data and less on the key point detection information.
Further, the bone orientation constraint can be achieved by formula (4):

$$E_{ori} = \sum_i w_i \left\| I - R_{rel,i} \right\|^2 \tag{4}$$

where E_ori is the second weighted error information, w_i is the second weight corresponding to the bone, I is an identity matrix, and R_rel,i can be obtained by formula (5):

$$R_{rel,i} = \left( R_{ig} R_i R_{ib} \right)^{\mathsf T} \hat{R}_i^{g} \tag{5}$$

where R_ig is the transformation between the inertial coordinate system of the IMU and the global coordinate system, R_i is the bone orientation data of the IMU data, R_ib is the transformation between the local coordinate system of the IMU and the corresponding local coordinate system of the bone, R̂_i^g is the predicted bone orientation data under the global coordinate system, and R_ig R_i R_ib is the target bone orientation data under the global coordinate system.
The confidence information of each target detection point is determined based on the key point detection result, and this in turn determines, during data constraint processing, the degree of dependence on the image information of the target image frame versus the object pose data. When key point detection based on the image information of the target image frame is inaccurate, the degree of dependence on the corresponding object pose data is increased, which improves the accuracy of determining the error information and thereby improves both the optimization result and the optimization efficiency of the target action data.
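As a sketch of this weight switching, the following chooses a second (bone) weight and a third (key point) weight from the detection confidence; all numeric weight values are illustrative assumptions:

```python
def bone_and_keypoint_weights(conf, tau2, w_bone_high=1.0, w_bone_low=0.5,
                              w_kp_high=1.0, w_kp_low=0.1):
    """Pick per-bone and per-key-point weights from the detection confidence.

    When the key points attached to a bone are detected with low confidence,
    rely more on the IMU bone orientation (second weight) and less on the
    key point detection (third weight).
    """
    if conf < tau2:
        # Unreliable detection: second weight greater than third weight.
        return w_bone_high, w_kp_low
    # Reliable detection: rely primarily on the detected key points.
    return w_bone_low, w_kp_high
```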
The predicted action data corresponding to the target image frame may include object shape prediction data of the target object. Accordingly, error constraint processing may be performed between the object shape prediction data corresponding to the target image frame and the object shape corresponding to the historical image frames that have undergone error constraint processing, so as to constrain the predicted action data from the shape dimension of the target object. Referring specifically to FIG. 8, a method of object-shape-based data constraint is illustrated and may include:
S810, determining shape error information based on object shape prediction data of the target object in the target image frame and shape smoothing data corresponding to historical image frames; the historical image frames are image frames of the image frame sequence whose time sequence is located before the target image frame, and the shape smoothing data is obtained by time-sequence smoothing of the object shape data of the target object in the historical image frames.
S820, performing error constraint processing on the predicted action data based on the shape error information, the target image frame and the object pose data corresponding to the target image frame to obtain the target action data.
The object shape data may include appearance data and bone length data; correspondingly, the object shape prediction data of the target object includes appearance prediction data of the target object and length prediction data of a plurality of bones of the target object. Determining the shape error information may then specifically include:
determining appearance error information based on appearance prediction data of a target object in a target image frame and appearance smoothing data corresponding to a history image frame; the historical image frames are image frames of the image frame sequence, the time sequence of which is positioned before the target image frame; the appearance smoothing data is obtained based on time sequence smoothing of the appearance data of the target object in the historical image frame;
determining bone length error information based on length prediction data of a plurality of bones of the target object in the target image frame, length smoothing data corresponding to the historical image frame; the length smoothing data is obtained based on time sequence smoothing of the length data of a plurality of bones of the target object in the historical image frame;
obtaining the shape error information based on the appearance error information and the bone length error information.
The determination of the appearance error information may be achieved based on formula (6):

$$E_{shape} = \left\| \widehat{shape} - shape_{mv} \right\|^2 \tag{6}$$

where E_shape is the appearance error information, the hatted term is the appearance prediction data corresponding to the target image frame, and shape_mv is the appearance smoothing data corresponding to the historical image frames.

The determination of the bone length error information may be achieved based on formula (7):

$$E_{bone} = \left\| \widehat{bone} - bone_{mv} \right\|^2 \tag{7}$$

where E_bone is the bone length error information, the hatted term is the bone length prediction data corresponding to the target image frame, and bone_mv is the bone length smoothing data corresponding to the historical image frames.

shape_mv and bone_mv are updated by an exponential moving average; for the specific update procedure, refer to formulas (8) and (9):

$$shape_{mv1} = \alpha_s \cdot \widehat{shape} + (1 - \alpha_s) \cdot shape_{mv} \tag{8}$$

$$bone_{mv1} = \alpha_s \cdot \widehat{bone} + (1 - \alpha_s) \cdot bone_{mv} \tag{9}$$

where α_s is a preset parameter, shape_mv1 is the appearance smoothing data corresponding to the target image frame, and bone_mv1 is the bone length smoothing data corresponding to the target image frame.
The time sequence constraint is carried out on the predicted shape data corresponding to the target image frame through the smooth shape data corresponding to the historical image frame, so that the shape of the target object is consistent in time sequence, and the stability of the shape of the target object in time sequence is improved.
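The smoothing-based shape constraint above can be sketched as follows; the EMA form matches formulas (8) and (9) as reconstructed, and alpha_s here is an illustrative value:

```python
import numpy as np

def shape_errors_and_update(shape_pred, bone_pred, shape_mv, bone_mv, alpha_s=0.1):
    """Shape/bone-length errors against smoothed history, plus the EMA updates.

    shape_pred: predicted shape parameters for the target frame.
    bone_pred:  predicted bone lengths for the target frame.
    shape_mv / bone_mv: smoothed values accumulated over the historical frames.
    """
    e_shape = float(np.sum((shape_pred - shape_mv) ** 2))  # formula (6)
    e_bone = float(np.sum((bone_pred - bone_mv) ** 2))     # formula (7)
    # Exponential-moving-average updates, formulas (8)-(9).
    shape_mv_new = alpha_s * shape_pred + (1 - alpha_s) * shape_mv
    bone_mv_new = alpha_s * bone_pred + (1 - alpha_s) * bone_mv
    return e_shape + e_bone, shape_mv_new, bone_mv_new
```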
In this embodiment, the predicted driving data obtained by performing action driving on the predicted action data may further include joint prediction data of the target object. A corresponding error generally exists between the target action data obtained after data constraint processing of each image frame and the actual action data, and this error information can be determined as loss information. Accordingly, when error constraint processing is performed on the current target image frame, the degree of dependence of the current target image frame on the historical image frames can be determined based on the loss information corresponding to the historical image frames. Referring specifically to fig. 9, a method for error constraint based on loss information of historical image frames is shown, which may include:
S910, obtaining joint error information based on joint prediction data corresponding to the target image frame and joint constraint data corresponding to the last image frame of the target image frame; the joint constraint data corresponding to the previous image frame is obtained based on data error constraint on the joint prediction data of the target object in the previous image frame; the last image frame is an image frame that is temporally located before and adjacent to the target image frame in the image frame sequence.
S920, determining a fourth weight based on motion data loss information corresponding to the previous image frame; the motion data loss information corresponding to the previous image frame is determined based on the object pose data corresponding to the previous image frame and constraint driving data corresponding to the previous image frame; and the constraint driving data corresponding to the previous image frame is obtained based on data error constraint on the prediction action data corresponding to the previous image frame.
S930, weighting the joint error information based on the fourth weight to generate fourth weighted error information.
S940, iteratively updating the joint prediction data based on the fourth weighted error information to obtain joint constraint data corresponding to the target image frame.
S950, performing error constraint processing on the predicted action data based on the target image frame, the object pose data corresponding to the target image frame and the joint constraint data corresponding to the target image frame to obtain the target action data.
The loss between the target action data and the actual action data can be represented by the loss between the object pose data and the constraint driving data that has undergone error constraint processing; this loss is the action data loss information. An error between the object pose data and the constraint driving data implies an error between the target action data and the actual action data, so the action data loss information can characterize the error between the target action data and the actual action data. When the action data loss information of the previous image frame is greater than or equal to the average loss information, the degree of dependence on the previous image frame is reduced when calculating the error information between the joint prediction data and the joint constraint data, i.e., the fourth weight is reduced; conversely, when the action data loss information of the previous image frame is smaller than the average loss information, the degree of dependence on the previous image frame is increased, i.e., the fourth weight is increased.
When error constraint processing is performed on the joint prediction data corresponding to the target image frame, it can be based on the joint constraint data corresponding to the previous image frame, so that the joint data of the target object remains stable between adjacent image frames and abrupt changes in the joint data are avoided. In addition, the fourth weight is determined based on the action data loss information corresponding to the previous image frame, so that the degree of dependence reflects how well the previous frame was actually optimized: the motion capture result can effectively escape from a poorly predicted result trend, while temporal stability of the motion capture result is maintained when past results were optimized accurately.
Specifically, for a method for calculating the fourth weight, referring to fig. 10, the method may include:
S1010, determining constraint driving data corresponding to the target action data of the previous image frame.
S1020, determining motion data loss information corresponding to the previous image frame based on the object pose data corresponding to the previous image frame and the constraint driving data.
S1030, determining the fourth weight based on motion data loss information corresponding to the previous image frame and loss smoothing information corresponding to the previous image frame; and the loss smoothing information corresponding to the previous image frame is obtained by carrying out time sequence smoothing on the action data loss information of the historical image frame with time sequence positioned before the previous image frame in the image frame sequence.
Further, the fourth weight may be determined based on a ratio of motion data loss information of the previous image frame to motion data loss information of the historical image frame, where the motion data loss information of the historical image frame may be obtained by performing time-series smoothing based on motion data loss information of each historical image frame before the previous image frame.
The joint prediction data may further include joint posture prediction data, which may be one item of predicted motion data corresponding to the target image frame, and joint position prediction data, which may be data obtained by performing motion driving based on the predicted motion data.
Error information corresponding to the joint position data can be obtained based on formula (10):

$$E_{pos,tc} = w_t \left\| \hat{J}_{3d} - J_{3d,last} \right\|^2 \tag{10}$$

where E_pos,tc is the error information corresponding to the joint position data, w_t is the fourth weight, Ĵ_3d is the three-dimensional joint position prediction data obtained through action driving, and J_3d,last is the joint position constraint data corresponding to the previous image frame.

Error information corresponding to the joint posture data can be obtained based on formula (11):

$$E_{ang,tc} = w_t \left\| \hat{\theta} - \theta_{last} \right\|^2 \tag{11}$$

where E_ang,tc is the error information corresponding to the joint posture data, θ̂ is the joint posture prediction data corresponding to the target image frame, and θ_last is the joint posture constraint data corresponding to the previous image frame.

The error information corresponding to the joint prediction data is:

$$E_{tc} = E_{pos,tc} + E_{ang,tc} \tag{12}$$

Further, the fourth weight w_t satisfies the following condition:

$$w_t \propto \left( L_{imu,last} / L_{imu,mv} \right)^{-1} \tag{13}$$

where L_imu,last is the action data loss information corresponding to the previous image frame and L_imu,mv is the loss smoothing information corresponding to the previous image frame.
When the fourth weight is determined, the loss smoothing information corresponding to the previous image frame is used together with the action data loss information corresponding to the previous image frame; the loss smoothing information stably represents the average loss level of the historical image frames, so the suitability of the fourth weight for the image frame sequence can be improved. Determining the fourth weight from the action data loss information of the previous image frame ties the degree of dependence to how well the previous frame was actually optimized, so the motion capture result can effectively escape from a poorly predicted result trend while maintaining temporal stability when past results were optimized accurately.
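A minimal sketch of this temporally weighted constraint follows; reading formula (13) as an inverse proportionality is an assumption, and eps guards against division by zero:

```python
import numpy as np

def temporal_consistency_error(j3d_pred, j3d_last, theta_pred, theta_last,
                               loss_last, loss_mv, eps=1e-8):
    """Temporal-consistency error E_tc with the adaptive fourth weight.

    j3d_pred / j3d_last:     3D joint positions (current prediction vs. the
                             previous frame's constrained result).
    theta_pred / theta_last: joint pose parameters, same convention.
    loss_last: action data loss of the previous frame; loss_mv: its smoothed
               historical level.
    """
    # A frame whose loss was above the historical average gets less influence.
    w_t = loss_mv / (loss_last + eps)
    e_pos = w_t * float(np.sum((j3d_pred - j3d_last) ** 2))      # formula (10)
    e_ang = w_t * float(np.sum((theta_pred - theta_last) ** 2))  # formula (11)
    return e_pos + e_ang                                         # formula (12)
```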
In this embodiment, when performing error constraint processing on the predicted action data based on the object pose data, the object pose data may first be converted into data that is convenient for calculation; for example, through the coordinate system conversion described above, the object pose data is converted into target bone orientation data in the global coordinate system and then compared with the predicted bone orientation data obtained through action driving processing, thereby realizing error constraint processing on the predicted action data. Coordinate system transformation parameters are needed in the coordinate system conversion, the bone orientation loss calculation, and similar processes; in order to facilitate directly adopting these parameters for subsequent coordinate system transformation of the object pose data, the coordinate system transformation parameters can be iteratively optimized in advance to obtain optimized coordinate system transformation parameters. Referring specifically to fig. 11, a method for optimizing coordinate system transformation parameters is shown, which may include:
S1110, acquiring a calibration image frame containing a calibration object and object pose data corresponding to the calibration image frame; and the pose measuring module on the calibration object is consistent with the pose measuring module on the target object in setting mode.
S1120, predicting the motion data of the calibration object in the calibration image frame to obtain predicted motion data corresponding to the calibration image frame.
It should be noted that when action prediction is performed on the calibration object in the calibration image frame, the predicted action data corresponding to the calibration image frame is predicted directly from the image data in the calibration image frame, without combining the object pose data of the calibration object, because the target coordinate system transformation parameters for transforming the object pose data have not yet been determined. In this embodiment, the initial coordinate system transformation parameters are optimized by performing data constraint on the object pose data of the calibration object based on the predicted action data corresponding to the calibration image frame and on the initial coordinate system transformation parameters.
S1130, performing motion driving processing based on the predicted motion data corresponding to the calibration image frame to obtain predicted bone orientation data of the calibration object under a global coordinate system.
S1140, performing coordinate system transformation processing on the object pose data corresponding to the calibration image frame based on the initial coordinate system transformation parameters to obtain calibration skeleton orientation data of the calibration object under a global coordinate system.
S1150, obtaining calibration orientation error information based on the predicted bone orientation data of the calibration object under the global coordinate system and the calibration bone orientation data.
S1160, iteratively updating the initial coordinate system transformation parameters based on the calibration orientation error information to obtain target coordinate system transformation parameters; the target coordinate system transformation parameters are used for carrying out coordinate system transformation processing on the object pose data corresponding to the target image frames.
When error constraint processing is carried out on the predicted motion data, coordinate system transformation processing can be carried out on object pose data corresponding to the object image frame based on object coordinate system transformation parameters, so that object skeleton orientation data of the object under a global coordinate system is obtained; and performing error constraint processing on the predicted action data corresponding to the target image frame based on the target image frame and target bone orientation data of the target object under the global coordinate system to obtain target action data corresponding to the target image frame.
The calibration image frame can be an image frame containing a static calibration pose, which can be a simple pose such as a T-pose or an A-pose; selecting image frames containing such a pose can simplify data processing and improve data processing efficiency.
The coordinate system transformation parameters to be optimized in this embodiment include R_ig and R_ib, where R_ig is the transformation parameter between the inertial coordinate system of the IMU and the global coordinate system, and R_ib is the transformation parameter between the local coordinate system of the IMU and the corresponding bone local coordinate system. The solving steps for the coordinate system transformation parameters may refer to fig. 12, and include:
1. Calibrating the internal and external parameters of the camera system.
2. Performing time synchronization of the camera system and the IMU system according to the correspondence between the acceleration peaks of the IMU and the foot strikes visible in the video, so as to obtain the IMU data corresponding to each image frame.
3. Detecting the key points of each frame and smoothing them using Kalman filtering (a sketch is given after this list).
4. Estimating initial SMPL parameters using a monocular SMPL prediction method.
5. Selecting, according to the result of step 4, several T/A-pose frames with better prediction quality for calibration.
6. Using the SMPL prediction results, the IMU data and the key point detection results of the T/A-pose frames as input, optimizing and solving R_ig and R_ib.
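For step 3, a constant-velocity Kalman filter per key point track is one common choice; the following sketch assumes a 2D position measurement and illustrative noise levels:

```python
import numpy as np

class KeypointKalman:
    """Constant-velocity Kalman filter smoothing one 2D key point track.

    State: [x, y, vx, vy]; measurement: [x, y].
    """
    def __init__(self, q=1e-2, r=1.0):
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = 1.0  # constant-velocity motion model
        self.H = np.zeros((2, 4))
        self.H[0, 0] = self.H[1, 1] = 1.0  # observe position only
        self.Q = q * np.eye(4)             # process noise
        self.R = r * np.eye(2)             # measurement noise
        self.x, self.P = None, np.eye(4)

    def step(self, z):
        z = np.asarray(z, dtype=float)
        if self.x is None:                 # initialize on first detection
            self.x = np.array([z[0], z[1], 0.0, 0.0])
            return z
        # Predict.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Update with the new detection.
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]                  # smoothed 2D position
```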
In this embodiment, R_ig and R_ib are optimized using a nonlinear least squares method; the residual is calculated by formula (14):

$$r = \phi\left( \left( R_{ig} R_i R_{ib} \right)^{\mathsf T} \hat{R}^{g} \right) \tag{14}$$

where R_ig is the transformation between the inertial coordinate system of the IMU and the global coordinate system, R_i is the bone orientation data of the IMU data, R_ib is the transformation between the IMU local coordinate system and the corresponding bone local coordinate system, R̂^g is the predicted bone orientation data under the global coordinate system, and R_ig R_i R_ib is the target bone orientation data under the global coordinate system. That is, the bone orientation measured by the IMU is required to be consistent with the bone orientation predicted by monocular SMPL, where φ denotes extracting the axis-angle vector of a rotation. A separate R_ib needs to be optimized between each IMU sensor and the bone it is bound to. In contrast to conventional methods that optimize a separate R_ig for each IMU sensor, the present disclosure jointly optimizes the same R_ig for all IMU sensors: R_ig represents the transformation between the inertial coordinate system and the global world coordinate system and should remain the same for every IMU sensor, so joint optimization ensures consistency of R_ig among the IMU sensors and yields a more accurate result.
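A sketch of this joint solve with SciPy's nonlinear least squares follows; the rotation-vector parameterization, the data layout and the identity initialization are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def calibrate(R_i_frames, R_pred_frames, n_imus):
    """Jointly solve one shared R_ig plus one R_ib per IMU.

    R_i_frames:    per calibration frame, a list of per-IMU measured orientations
                   (3x3 matrices in the inertial frame).
    R_pred_frames: matching predicted bone orientations in the global frame.
    """
    def residuals(x):
        R_ig = Rotation.from_rotvec(x[:3]).as_matrix()  # shared across all IMUs
        res = []
        for R_is, R_preds in zip(R_i_frames, R_pred_frames):
            for k in range(n_imus):
                R_ib = Rotation.from_rotvec(x[3 + 3 * k: 6 + 3 * k]).as_matrix()
                R_rel = (R_ig @ R_is[k] @ R_ib).T @ R_preds[k]
                # phi: axis-angle vector of the relative rotation, formula (14).
                res.extend(Rotation.from_matrix(R_rel).as_rotvec())
        return np.asarray(res)

    x0 = np.zeros(3 + 3 * n_imus)  # start from identity rotations
    return least_squares(residuals, x0).x
```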
Therefore, the method of performing error constraint processing on the predicted action data corresponding to the target image frame based on the target image frame and the corresponding object pose data, together with the optimization and calibration of R_ig and R_ib, is of great significance for producing multi-modal motion capture datasets.
Accordingly, referring to fig. 13, the constraint flow of the predicted motion data mainly includes:
1. collecting image data and IMU data through a data collecting and processing module;
2. performing key point detection and Kalman filtering smoothing based on the acquired data to obtain smooth key points;
3. monocular SMPL prediction is carried out based on image data, and initial SMPL parameters are obtained;
4. and optimizing the initial SMPL parameters based on the acquired image data, IMU data and the smooth key points to obtain an SMPL parameter optimization result.
According to the method of the present disclosure, sparse IMU sensor data is added under a monocular camera setting and various constraints are designed, which can effectively improve the motion capture effect under self-occlusion and improve overall motion capture accuracy and temporal stability. The multi-level threshold processing scheme for the two-dimensional key point confidence can adaptively handle motion sequences of different complexity, improving the capture of complex motion sequences. Iteratively optimizing R_ig and R_ib yields an accurate calibration result, which is of great significance for producing multi-modal datasets.
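As a closing illustration, the individual constraints can be composed into one objective per target frame; the composition by weighted sum and all names below are assumptions, since the disclosure does not fix a specific combination:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ErrorTerms:
    """Callables returning each constraint's error for a parameter vector."""
    proj: Callable[[object], float]   # weighted 2D key point reprojection error
    ori: Callable[[object], float]    # IMU bone orientation error
    shape: Callable[[object], float]  # appearance / bone-length smoothness error
    tc: Callable[[object], float]     # temporal-consistency error

def total_objective(params, terms: ErrorTerms, w: Dict[str, float]) -> float:
    # Weighted sum of the constraints described in the embodiments above.
    return (w["proj"] * terms.proj(params) + w["ori"] * terms.ori(params)
            + w["shape"] * terms.shape(params) + w["tc"] * terms.tc(params))
```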
Fig. 14 is an illustration of an object action data processing apparatus according to an exemplary embodiment. Referring to fig. 14, the apparatus includes:
A target data acquisition unit 1410 configured to perform acquisition of an image frame sequence including a target object, and object pose data corresponding to each image frame in the image frame sequence; the object pose data are obtained based on a pose measurement module arranged on the target object;
a first data prediction unit 1420 configured to perform motion data prediction on the target object in a target image frame, resulting in predicted motion data corresponding to the target image frame; the target image frame is an image frame which is not subjected to motion data prediction in the image frame sequence;
and an error constraint unit 1430 configured to perform error constraint processing on the predicted motion data corresponding to the target image frame based on the target image frame and the object pose data corresponding to the target image frame, so as to obtain target motion data corresponding to the target image frame.
In an exemplary embodiment, the error constraint unit includes:
a first driving unit configured to perform an action driving process based on the predicted action data to obtain predicted driving data; the prediction driving data is observation data corresponding to the prediction action data;
The first constraint processing unit is configured to execute error constraint processing on the prediction driving data based on the target image frame and the object pose data corresponding to the target image frame to obtain target driving data;
and a target motion data determination unit configured to perform determination of the target motion data based on the target drive data.
In an exemplary embodiment, the error constraint unit includes:
a first detection unit configured to perform keypoint detection based on the target image frame to obtain detection keypoint information of the target object;
the second driving unit is configured to execute action driving processing based on the predicted action data to obtain predicted key point information of the target object;
and the second constraint processing unit is configured to perform error constraint processing on the predicted action data based on the error information between the detection key point information and the prediction key point information and the object pose data corresponding to the target image frame to obtain the target action data.
In an exemplary embodiment, the detection key point information includes detection confidence degrees corresponding to each of the plurality of target key points;
The second constraint processing unit includes:
a target confidence threshold determining unit configured to perform detection confidence degrees corresponding to the plurality of target key points in the target image frame, detection confidence degrees corresponding to the plurality of target key points in the history image frame, and a smooth confidence threshold corresponding to the history image frame, so as to obtain a target confidence threshold corresponding to the target image frame; the historical image frames are image frames of the image frame sequence, the time sequence of which is positioned before the target image frame; the smooth confidence threshold is determined based on an average confidence of the plurality of target keypoints in the historical image frames;
a first weight determining unit configured to perform determining a first weight corresponding to each of the plurality of target key points in the target image frame based on a preset confidence threshold, a target confidence threshold corresponding to the target image frame, and detection confidence corresponding to each of the plurality of target key points in the target image frame; the preset confidence threshold is smaller than the target confidence threshold corresponding to the target image frame;
a first weighting unit configured to perform a weighting process on error information of the detected key point information and the predicted key point information based on first weights corresponding to the plurality of target key points, and generate first weighted error information;
And a third constraint processing unit configured to perform error constraint processing on the predicted motion data based on the first weighted error information and the object pose data corresponding to the target image frame, so as to obtain the target motion data.
In an exemplary embodiment, the error constraint unit includes:
a first coordinate system transformation unit configured to perform transformation of object pose data corresponding to the target image frame from an inertial coordinate system to a global coordinate system, to obtain target bone orientation data of the target object in the global coordinate system;
the third driving unit is configured to perform action driving processing based on the predicted action data to obtain predicted bone orientation data of the target object under a global coordinate system;
a first error information determination unit configured to perform generation of bone orientation error information based on the target bone orientation data and the predicted bone orientation data;
and a fourth constraint processing unit configured to perform error constraint processing on the predicted motion data based on the target image frame and the bone orientation error information, to obtain the target motion data.
In an exemplary embodiment, the fourth constraint processing unit includes:
a second detection unit configured to perform keypoint detection based on the target image frame to obtain detection keypoint information of the target object; the detection key point information comprises detection confidence degrees corresponding to a plurality of target key points respectively; the target object has a corresponding relation with a plurality of bones in the target image frame and a plurality of target key points;
a second weight determining unit configured to determine a second weight corresponding to any bone and a third weight corresponding to a target key point corresponding to any bone, if the detection confidence of the target key point corresponding to any bone is smaller than a target confidence threshold; the second weight is greater than the third weight;
a second weighting unit configured to perform weighting processing on the bone orientation error information based on second weights corresponding to the bones, and generate second weighted error information;
the third weighting unit is configured to perform weighting processing on error information between detection information of the target key point and prediction information of the target key point based on a third weight corresponding to the target key point corresponding to any bone, so as to obtain third weighted error information; the predicted information of the target key point is obtained based on action driving processing of the predicted action data;
And a fifth constraint processing unit configured to perform error constraint processing on the predicted motion data based on the second weighted error information and the third weighted error information, to obtain the target motion data.
In an exemplary embodiment, the predicted action data includes object shape prediction data of the target object;
the error constraint unit includes:
a shape error information determination unit configured to perform determination of shape error information based on object shape prediction data of the target object in the target image frame, shape smoothing data corresponding to a history image frame; the historical image frames are image frames of the image frame sequence, the time sequence of which is positioned before the target image frame; the shape smoothing data is obtained based on time sequence smoothing of object shape data of the target object in the historical image frame;
and a sixth constraint processing unit configured to perform error constraint processing on the predicted motion data based on the shape error information, the target image frame, and object pose data corresponding to the target image frame, to obtain the target motion data.
In an exemplary embodiment, the predicted motion data includes joint prediction data of the target object;
the error constraint unit includes:
a joint error information determining unit configured to perform joint prediction data corresponding to the target image frame and joint constraint data corresponding to a previous image frame of the target image frame to obtain joint error information; the joint constraint data corresponding to the previous image frame is obtained based on data error constraint on the joint prediction data of the target object in the previous image frame; the last image frame is an image frame which is positioned before the target image frame in the sequence of image frames in time sequence and is adjacent to the target image frame;
a third weight determining unit configured to perform determination of a fourth weight based on motion data loss information corresponding to the previous image frame; the motion data loss information corresponding to the previous image frame is determined based on the object pose data corresponding to the previous image frame and constraint driving data corresponding to the previous image frame; the constraint driving data corresponding to the previous image frame is obtained based on data error constraint on the prediction action data corresponding to the previous image frame;
A fourth weighting unit configured to perform weighting processing on the joint error information based on the fourth weight, generating fourth weighted error information;
the iteration updating unit is configured to perform iteration updating on the joint prediction data based on the fourth weighted error information to obtain joint constraint data corresponding to the target image frame;
and a seventh constraint processing unit configured to perform error constraint processing on the predicted motion data based on the target image frame, the object pose data corresponding to the target image frame, and the joint constraint data corresponding to the target image frame, to obtain the target motion data.
In an exemplary embodiment, the apparatus further comprises:
a fourth driving unit configured to perform constraint driving data corresponding to the target motion data of the previous image frame;
a loss information unit configured to perform determination of motion data loss information corresponding to the previous image frame based on the object pose data corresponding to the previous image frame and the constraint driving data;
the third weight determination unit includes:
a weight calculation unit configured to perform determination of the fourth weight based on motion data loss information corresponding to the previous image frame and loss smoothing information corresponding to the previous image frame; and the loss smoothing information corresponding to the previous image frame is obtained by carrying out time sequence smoothing on the action data loss information of the historical image frame with time sequence positioned before the previous image frame in the image frame sequence.
In an exemplary embodiment, the apparatus further comprises:
a calibration data acquisition unit configured to perform acquisition of a calibration image frame including a calibration object, and object pose data corresponding to the calibration image frame; the pose measuring module on the calibration object is consistent with the pose measuring module on the target object in arrangement mode;
the second data prediction unit is configured to perform motion data prediction on the calibration object in the calibration image frame to obtain predicted motion data corresponding to the calibration image frame;
a fifth driving unit configured to perform motion driving processing based on the predicted motion data corresponding to the calibration image frame, so as to obtain predicted bone orientation data of the calibration object in a global coordinate system;
the second coordinate system transformation unit is configured to perform coordinate system transformation processing on the object pose data corresponding to the calibration image frame based on the initial coordinate system transformation parameters to obtain calibration bone orientation data of the calibration object under a global coordinate system;
a second error information determining unit configured to perform obtaining calibration orientation error information based on the predicted bone orientation data of the calibration object in the global coordinate system and the calibration bone orientation data;
A third iterative updating unit configured to perform iterative updating of the initial coordinate system transformation parameters based on the calibration orientation error information to obtain target coordinate system transformation parameters; the target coordinate system transformation parameters are used for carrying out coordinate system transformation processing on the object pose data corresponding to the target image frames.
The specific manner in which the various modules perform their operations in the apparatus of the above embodiments has been described in detail in the embodiments of the method and will not be elaborated here.
In an exemplary embodiment, there is also provided a computer-readable storage medium including instructions; the storage medium may optionally be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like. The instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform any one of the methods described above.
In an exemplary embodiment, a computer program product is also provided, the computer program product comprising a computer program stored in a readable storage medium, from which at least one processor of a computer device reads and executes the computer program, causing the device to perform any one of the methods described above.
Fig. 15 is a block diagram illustrating an electronic device for object motion data processing, which may be a terminal, according to an exemplary embodiment, and an internal structure diagram thereof may be as shown in fig. 15. The electronic device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the electronic device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of object action data processing. The display screen of the electronic equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the electronic equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
Fig. 16 is a block diagram illustrating an electronic device for object action data processing, which may be a server, and an internal structure diagram thereof may be as shown in fig. 16, according to an exemplary embodiment. The electronic device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the electronic device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of object action data processing.
It will be appreciated by those skilled in the art that the structures shown in fig. 15 and 16 are merely block diagrams of portions of structures related to the disclosed aspects and do not constitute a limitation of the electronic device to which the disclosed aspects are applied, and that a particular electronic device may include more or fewer components than shown in the figures, or may combine certain components, or have a different arrangement of components.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following the general principles thereof and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (13)

1. A method of processing object motion data, comprising:
acquiring an image frame sequence containing a target object and object pose data corresponding to each image frame in the image frame sequence; the object pose data are obtained based on a pose measurement module arranged on the target object;
predicting the motion data of the target object in the target image frame to obtain predicted motion data corresponding to the target image frame; the target image frame is an image frame which is not subjected to motion data prediction in the image frame sequence;
And performing error constraint processing on the predicted action data corresponding to the target image frame based on the target image frame and the object pose data corresponding to the target image frame to obtain target action data corresponding to the target image frame.
2. The method according to claim 1, wherein the performing error constraint processing on the predicted motion data corresponding to the target image frame based on the target image frame and the object pose data corresponding to the target image frame to obtain the target motion data corresponding to the target image frame includes:
performing action driving processing based on the predicted action data to obtain predicted driving data; the prediction driving data is observation data corresponding to the prediction action data;
performing error constraint processing on the prediction driving data based on the target image frame and the object pose data corresponding to the target image frame to obtain target driving data;
the target motion data is determined based on the target drive data.
3. The method according to claim 1 or 2, wherein the performing error constraint processing on the predicted action data corresponding to the target image frame based on the target image frame and the object pose data corresponding to the target image frame to obtain the target action data corresponding to the target image frame includes:
performing key point detection based on the target image frame to obtain detection key point information of the target object;
performing action driving processing based on the predicted action data to obtain predicted key point information of the target object;
and performing error constraint processing on the predicted action data based on the error information between the detection key point information and the predicted key point information and the object pose data corresponding to the target image frame to obtain the target action data.
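A hedged sketch of the error information in claim 3 follows: detected and predicted key points are compared per frame. The squared-error form and the array shapes are assumptions; the claim only requires some error measure between the two sets of key point information.

```python
import numpy as np

def keypoint_error(detected_xy, predicted_xy):
    """detected_xy, predicted_xy: (K, 2) arrays of 2D key-point positions,
    detected from the target image frame and driven from the predicted
    action data respectively."""
    return float(np.sum((detected_xy - predicted_xy) ** 2))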
4. The method of claim 3, wherein the detection key point information comprises detection confidences corresponding to each of a plurality of target key points;
the performing error constraint processing on the predicted action data based on the error information between the detection key point information and the predicted key point information and the object pose data corresponding to the target image frame to obtain the target action data includes:
obtaining a target confidence threshold corresponding to the target image frame based on the detection confidence corresponding to each of the plurality of target key points in the target image frame, the detection confidence corresponding to each of the plurality of target key points in historical image frames, and a smooth confidence threshold corresponding to the historical image frames; the historical image frames are image frames in the image frame sequence that temporally precede the target image frame; the smooth confidence threshold is determined based on an average confidence of the plurality of target key points in the historical image frames;
determining a first weight corresponding to each of the plurality of target key points in the target image frame based on a preset confidence threshold, the target confidence threshold corresponding to the target image frame, and the detection confidence corresponding to each of the plurality of target key points in the target image frame; the preset confidence threshold is smaller than the target confidence threshold corresponding to the target image frame;
weighting the error information between the detection key point information and the predicted key point information based on the first weights corresponding to the target key points to generate first weighted error information;
and performing error constraint processing on the predicted action data based on the first weighted error information and the object pose data corresponding to the target image frame to obtain the target action data.
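The adaptive threshold and first weights of claim 4 could be instantiated as below. The exponential moving average over historical confidences and the linear ramp between the preset and target thresholds are illustrative choices, not taken from the patent.

```python
import numpy as np

def first_weights(confidence, smooth_thresh, preset_thresh=0.3, alpha=0.9):
    """confidence: (K,) detection confidences of the target key points in
    the target image frame; smooth_thresh: smoothed threshold carried over
    from the historical image frames."""
    # target threshold: blend the historical smoothed threshold with the
    # current frame's average detection confidence
    target_thresh = alpha * smooth_thresh + (1 - alpha) * float(confidence.mean())
    # ramp each key point's weight from 0 at the preset threshold up to
    # 1 at the (larger) target threshold
    ramp = (confidence - preset_thresh) / max(target_thresh - preset_thresh, 1e-6)
    return np.clip(ramp, 0.0, 1.0), target_thresh
```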
5. The method according to claim 1 or 2, wherein the performing error constraint processing on the predicted action data corresponding to the target image frame based on the target image frame and the object pose data corresponding to the target image frame to obtain the target action data corresponding to the target image frame includes:
transforming the object pose data corresponding to the target image frame from an inertial coordinate system to a global coordinate system to obtain target bone orientation data of the target object under the global coordinate system;
performing action driving processing based on the predicted action data to obtain predicted bone orientation data of the target object under a global coordinate system;
generating bone orientation error information based on the target bone orientation data and the predicted bone orientation data;
and performing error constraint processing on the predicted action data based on the target image frame and the bone orientation error information to obtain the target action data.
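Claim 5's bone orientation error might look as follows, assuming orientations are rotation matrices and the inertial-to-global transform is the calibration rotation of claim 10. The geodesic-angle metric is an assumed error measure.

```python
import numpy as np

def bone_orientation_error(R_imu_inertial, R_pred_global, R_ig):
    """R_imu_inertial: (B, 3, 3) IMU bone orientations in the inertial frame;
    R_pred_global: (B, 3, 3) orientations driven from the predicted action
    data; R_ig: (3, 3) inertial-to-global rotation."""
    R_target_global = R_ig @ R_imu_inertial            # inertial -> global
    # relative rotation between target and predicted orientations
    R_rel = np.einsum('bij,bkj->bik', R_target_global, R_pred_global)
    # geodesic angle recovered from the trace of the relative rotation
    cos_angle = (np.trace(R_rel, axis1=1, axis2=2) - 1.0) / 2.0
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))    # (B,) angles in radians
```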
6. The method of claim 5, wherein performing error constraint processing on the predicted motion data based on the target image frame and the bone orientation error information to obtain the target motion data comprises:
performing key point detection based on the target image frame to obtain detection key point information of the target object; the detection key point information comprises detection confidences corresponding to a plurality of target key points respectively; a plurality of bones of the target object in the target image frame have a correspondence with the plurality of target key points;
determining, in a case where the detection confidence of the target key point corresponding to any bone is smaller than a target confidence threshold, a second weight corresponding to the bone and a third weight corresponding to the target key point corresponding to the bone; the second weight is greater than the third weight;
weighting the bone orientation error information based on the second weight corresponding to the bone to generate second weighted error information;
weighting the error information between the detection information of the target key point and the prediction information of the target key point based on the third weight corresponding to the target key point corresponding to the bone, to obtain third weighted error information; the prediction information of the target key point is obtained by performing action driving processing based on the predicted action data;
and performing error constraint processing on the predicted action data based on the second weighted error information and the third weighted error information to obtain the target action data.
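A minimal reading of claim 6's weighting rule: when the key points attached to a bone are detected with low confidence, the bone orientation term receives a larger weight than the key point term. The concrete weight values below are placeholders, not values from the patent.

```python
import numpy as np

def bone_and_keypoint_weights(kp_confidence, target_thresh,
                              w_bone_low_conf=1.0, w_kp_low_conf=0.1,
                              w_default=0.5):
    """kp_confidence: (B,) detection confidence of the target key point
    tied to each bone; target_thresh: target confidence threshold."""
    low = kp_confidence < target_thresh
    second_weight = np.where(low, w_bone_low_conf, w_default)  # bone term
    third_weight = np.where(low, w_kp_low_conf, w_default)     # key-point term
    return second_weight, third_weight   # second > third where confidence is low
```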
7. The method of claim 1, wherein the predicted action data comprises object shape prediction data of the target object;
the performing error constraint processing on the predicted action data corresponding to the target image frame based on the target image frame and the object pose data corresponding to the target image frame to obtain target action data corresponding to the target image frame includes:
determining shape error information based on object shape prediction data of the target object in the target image frame and shape smoothing data corresponding to historical image frames; the historical image frames are image frames in the image frame sequence that temporally precede the target image frame; the shape smoothing data is obtained by temporally smoothing object shape data of the target object in the historical image frames;
and performing error constraint processing on the predicted action data based on the shape error information, the target image frame and the object pose data corresponding to the target image frame to obtain the target action data.
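Claim 7's shape consistency term could be sketched as follows; the exponential moving average smoother and the L2 penalty are assumptions, since the claim specifies only that the smoothed historical shape constrains the current prediction.

```python
import numpy as np

def shape_error(shape_pred, shape_smooth, alpha=0.95):
    """shape_pred: (S,) shape parameters predicted for the target frame;
    shape_smooth: (S,) temporally smoothed shape from historical frames."""
    error = float(np.sum((shape_pred - shape_smooth) ** 2))   # shape error term
    # update the smoothed shape for the next frame (EMA smoothing)
    new_smooth = alpha * shape_smooth + (1 - alpha) * shape_pred
    return error, new_smooth
```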
8. The method according to claim 1 or 2, wherein the predicted action data comprises joint prediction data of the target object;
the performing error constraint processing on the predicted action data corresponding to the target image frame based on the target image frame and the object pose data corresponding to the target image frame to obtain target action data corresponding to the target image frame includes:
obtaining joint error information based on the joint prediction data corresponding to the target image frame and joint constraint data corresponding to a previous image frame of the target image frame; the joint constraint data corresponding to the previous image frame is obtained by performing data error constraint on the joint prediction data of the target object in the previous image frame; the previous image frame is the image frame in the image frame sequence that temporally precedes and is adjacent to the target image frame;
determining a fourth weight based on action data loss information corresponding to the previous image frame; the action data loss information corresponding to the previous image frame is determined based on the object pose data corresponding to the previous image frame and constraint driving data corresponding to the previous image frame; the constraint driving data corresponding to the previous image frame is obtained by performing data error constraint on the predicted action data corresponding to the previous image frame;
weighting the joint error information based on the fourth weight to generate fourth weighted error information;
iteratively updating the joint prediction data based on the fourth weighted error information to obtain joint constraint data corresponding to the target image frame;
and performing error constraint processing on the predicted action data based on the target image frame, the object pose data corresponding to the target image frame and the joint constraint data corresponding to the target image frame to obtain the target action data.
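One way to realize claim 8's weighted, iterative joint constraint is gradient descent on a penalized objective. The data term that keeps the result near the current prediction is an assumed regularizer; the claim itself specifies only the weighted temporal error and its iterative update.

```python
import numpy as np

def constrain_joints(joints_pred, joints_prev, w4, steps=20, lr=0.1):
    """joints_pred: (J, 3) joint prediction data for the target frame;
    joints_prev: (J, 3) joint constraint data of the previous frame;
    w4: fourth weight scaling the temporal term."""
    joints = joints_pred.copy()
    for _ in range(steps):
        # gradient of ||joints - joints_pred||^2 + w4 * ||joints - joints_prev||^2
        grad = 2.0 * (joints - joints_pred) + 2.0 * w4 * (joints - joints_prev)
        joints -= lr * grad                   # iterative update
    return joints                             # joint constraint data for this frame
```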
9. The method of claim 8, wherein prior to the determining a fourth weight based on action data loss information corresponding to the previous image frame, the method further comprises:
determining constraint driving data corresponding to the target action data of the previous image frame;
determining the action data loss information corresponding to the previous image frame based on the object pose data corresponding to the previous image frame and the constraint driving data;
the determining a fourth weight based on action data loss information corresponding to the previous image frame includes:
determining the fourth weight based on the action data loss information corresponding to the previous image frame and loss smoothing information corresponding to the previous image frame; the loss smoothing information corresponding to the previous image frame is obtained by temporally smoothing the action data loss information of the historical image frames that precede the previous image frame in the image frame sequence.
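Claim 9's fourth weight could be derived from the ratio of the smoothed loss to the previous frame's loss, so the temporal constraint is down-weighted when the previous frame was unreliable. Both the ratio form and the smoothing factor below are assumptions.

```python
def fourth_weight(prev_loss, smoothed_loss, w_max=1.0, eps=1e-6):
    """Shrinks toward 0 when the previous frame's loss spikes above its
    temporally smoothed history."""
    return min(w_max, smoothed_loss / (prev_loss + eps))

def update_smoothed_loss(smoothed_loss, prev_loss, alpha=0.9):
    """Temporal smoothing of the action data loss over historical frames."""
    return alpha * smoothed_loss + (1 - alpha) * prev_loss
```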
10. The method according to claim 1, wherein before the performing error constraint processing on the predicted action data corresponding to the target image frame based on the target image frame and the object pose data corresponding to the target image frame to obtain the target action data corresponding to the target image frame, the method further comprises:
acquiring a calibration image frame containing a calibration object and object pose data corresponding to the calibration image frame; the pose measurement module on the calibration object is arranged in the same manner as the pose measurement module on the target object;
performing action data prediction on the calibration object in the calibration image frame to obtain predicted action data corresponding to the calibration image frame;
performing action driving processing based on the predicted action data corresponding to the calibration image frame to obtain predicted bone orientation data of the calibration object under a global coordinate system;
performing coordinate system transformation processing on the object pose data corresponding to the calibration image frame based on the initial coordinate system transformation parameters to obtain calibration bone orientation data of the calibration object under a global coordinate system;
obtaining calibration orientation error information based on the predicted bone orientation data of the calibration object under the global coordinate system and the calibration bone orientation data;
iteratively updating the initial coordinate system transformation parameters based on the calibration orientation error information to obtain target coordinate system transformation parameters; the target coordinate system transformation parameters are used for carrying out coordinate system transformation processing on the object pose data corresponding to the target image frames.
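Finally, the calibration of claim 10 amounts to fitting a coordinate transform that aligns IMU-derived and prediction-derived bone orientations. The sketch below parametrizes the transform as a single rotation and relies on scipy's least-squares solver; both are conveniences assumed for illustration, not the patent's method.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def calibrate(R_imu_inertial, R_pred_global, rotvec0=np.zeros(3)):
    """R_imu_inertial, R_pred_global: (B, 3, 3) bone orientations for the
    calibration image frame; rotvec0: initial transform parameters."""
    def residuals(rotvec):
        R_ig = Rotation.from_rotvec(rotvec).as_matrix()
        R_cal = R_ig @ R_imu_inertial            # calibrated bone orientation
        return (R_cal - R_pred_global).ravel()   # calibration orientation error
    sol = least_squares(residuals, rotvec0)      # iterative parameter update
    return Rotation.from_rotvec(sol.x).as_matrix()
```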
11. An object action data processing apparatus, comprising:
an acquisition unit configured to perform acquisition of an image frame sequence including a target object, and object pose data corresponding to each image frame in the image frame sequence; the object pose data are obtained based on a pose measurement module arranged on the target object;
a data prediction unit configured to perform action data prediction on the target object in a target image frame to obtain predicted action data corresponding to the target image frame; the target image frame is an image frame in the image frame sequence that has not been subjected to action data prediction;
and the error constraint unit is configured to perform error constraint processing on the predicted action data corresponding to the target image frame based on the target image frame and the object pose data corresponding to the target image frame to obtain target action data corresponding to the target image frame.
12. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the object action data processing method of any one of claims 1 to 10.
13. A computer readable storage medium, characterized in that instructions in the computer readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the object action data processing method of any one of claims 1 to 10.
CN202310003605.5A 2023-01-03 2023-01-03 Object action data processing method and device, electronic equipment and storage medium Pending CN116091622A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310003605.5A CN116091622A (en) 2023-01-03 2023-01-03 Object action data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116091622A true CN116091622A (en) 2023-05-09

Family

ID=86209777

Country Status (1)

Country Link
CN (1) CN116091622A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination