CN116250829A - Motion attitude evaluation method and device and electronic equipment
- Publication number: CN116250829A
- Application number: CN202310130499.7A
- Authority: CN (China)
- Legal status: Pending
Classifications
- A61B5/1116 — Determining posture transitions
- A61B5/1121 — Determining geometric values, e.g. centre of rotation or angular range of movement
- A61B5/1128 — Measuring movement of the entire body or parts thereof using a particular sensing technique using image analysis
- G06N3/02, G06N3/08 — Neural networks; learning methods
- G06V10/82 — Image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06V20/42 — Higher-level, semantic clustering, classification or understanding of video scenes of sport video content
- G06V40/20 — Recognition of human movements or behaviour, e.g. gesture recognition
- Y02P90/30 — Computing systems specially adapted for manufacturing
Abstract
The invention provides a motion attitude evaluation method and device and electronic equipment. The method comprises: acquiring motion gesture data to be evaluated of a current user, wherein the motion gesture data to be evaluated is gesture data obtained based on a motion video of the current user shot by an optical camera; and performing space-time alignment and evaluation on the motion gesture data to be evaluated based on reference motion gesture data, so as to determine a motion gesture evaluation result of the current user. The reference motion gesture data is gesture data determined based on a motion video shot by the optical camera for a target user wearing an IMU. According to the invention, the optical camera is combined with the IMU only when the reference motion gesture data is acquired in the early stage; when the motion gesture of a user is evaluated in the later stage, only the optical camera is needed to acquire the motion gesture data to be evaluated, and the user does not need to wear an IMU for each evaluation. The accuracy of human motion gesture evaluation is therefore improved, and the application range of human motion gesture evaluation is broadened.
Description
Technical Field
The present invention relates to the field of human body posture assessment technologies, and in particular, to a motion posture assessment method and apparatus, and an electronic device.
Background
With the continuous development of technology, research on human motion gestures has become increasingly convenient and efficient. For example, the motion gesture of a user may be evaluated through a wearable sensor system comprising a control circuit, an inertial measurement unit (Inertial Measurement Unit, IMU) and a plantar pressure insole; that is, while the user wears the IMU and the plantar pressure insole, the motion gesture is evaluated based on the acquired IMU acceleration signal and the gait information collected during walking.
However, the existing method for evaluating the motion gesture depends on the IMU acceleration signal, and the user to be evaluated must wear an IMU every time the human motion gesture is evaluated; as a result, the accuracy of human motion gesture evaluation is not high and its application range is greatly limited.
Disclosure of Invention
The invention provides a motion gesture evaluation method, a motion gesture evaluation device and electronic equipment, which are used for overcoming the defects of the prior art, in which the human gesture is evaluated by relying on an IMU acceleration signal acquired from an IMU worn on the human body: because the user to be evaluated must wear the IMU, the accuracy of human motion gesture evaluation is low and the application range is greatly limited.
The invention provides a motion gesture evaluation method, which comprises the following steps:
acquiring motion gesture data to be evaluated of a current user, wherein the motion gesture data to be evaluated is gesture data obtained based on a motion video of the current user shot by an optical camera;
based on reference motion gesture data, performing space-time alignment and evaluation on the motion gesture data to be evaluated, and determining a motion gesture evaluation result of the current user;
the reference motion gesture data are gesture data determined based on motion videos shot by the optical camera aiming at a target user wearing the IMU.
According to the motion gesture evaluation method provided by the invention, the motion gesture data to be evaluated is subjected to space-time alignment and evaluation based on the reference motion gesture data, and the motion gesture evaluation result of the current user is determined, which comprises the following steps:
performing space-time alignment based on the motion gesture data to be evaluated and the reference motion gesture data;
based on a result of successful space-time alignment, decomposing the reference motion gesture data into a plurality of time infinitesimals, wherein each time infinitesimal corresponds to one reference action in the reference motion gesture data;
determining different time segments corresponding to the plurality of time infinitesimals in the motion gesture data to be evaluated, wherein each time segment corresponds to one action in the motion gesture data to be evaluated;
determining a time scoring result of the motion gesture data to be evaluated based on the variance statistics of the different time segments;
determining a space evaluation result between the motion gesture data to be evaluated and the reference motion gesture data based on the Euclidean distance between the key point position information of each time segment;
and determining the time scoring result and the space evaluation result as the motion gesture evaluation result of the current user.
According to the motion gesture evaluation method provided by the invention, the space-time alignment based on the motion gesture data to be evaluated and the reference motion gesture data comprises the following steps:
determining an action set of identified and labeled actions in the motion gesture data to be evaluated based on a preset action segmentation recognition labeling model;
determining body part length information of the current user based on the motion gesture data to be evaluated;
correcting the reference motion gesture data based on the body part length information, and determining personalized motion gesture data of the current user;
and performing space-time alignment on the action set and the personalized motion gesture data.
According to the motion gesture evaluation method provided by the invention, the training process of the preset action segmentation recognition labeling model comprises the following steps:
constructing an initial action segmentation recognition labeling model comprising a batch normalization layer, a one-dimensional convolution layer, a one-dimensional max pooling layer and a fully connected layer;
acquiring a sample data set, wherein the sample data set comprises gesture data obtained based on motion videos of different first sample users shot by the optical camera and IMU acceleration signals acquired from the IMUs worn by the first sample users;
dividing the sample data set into a sample training set and a sample testing set;
performing recognition and labeling training of different actions on the initial action segmentation recognition labeling model by using the sample training set, and determining a trained intermediate action segmentation recognition labeling model;
and testing the trained intermediate action segmentation recognition labeling model by using the sample test set, and determining that the corresponding intermediate action segmentation recognition labeling model is the preset action segmentation recognition labeling model when the test result reaches a preset accuracy.
According to the motion gesture evaluation method provided by the invention, the acquisition process of the reference motion gesture data comprises the following steps:
acquiring a motion video of a target user shot by the optical camera and acquiring a target IMU acceleration signal from an IMU worn by the target user, wherein the target user is a user who moves according to standard action requirements and wears the IMU at different joint points;
and acquiring the reference motion gesture data based on the motion video of the target user, the target IMU acceleration signal and a preset motion gesture estimation network model.
According to the motion gesture evaluation method provided by the invention, the method for acquiring the motion gesture data to be evaluated of the current user comprises the following steps:
acquiring a motion video of a current user shot by the optical camera, wherein the current user is a user who moves randomly and does not wear an IMU at any joint point;
and acquiring motion gesture data to be evaluated of the current user based on the motion video of the current user and the preset motion gesture estimation network model.
According to the motion gesture evaluation method provided by the invention, the training process of the preset motion gesture estimation network model comprises the following steps:
acquiring a motion video of a second sample user shot by the optical camera and a sample IMU acceleration signal, wherein the second sample user is a user who wears IMUs at different joint points and moves according to the standard action requirements;
determining different joint point position information of the second sample user based on the sample IMU acceleration signal;
and based on the motion video of the second sample user and the position information of the different joint points, training an initial motion gesture estimation network model comprising a deep network and a time sequence convolution network to perform two-dimensional gesture estimation, to process the two-dimensional gesture into a three-dimensional gesture, and to compensate for missing and misjudged motion information in the three-dimensional gesture, so as to determine the preset motion gesture estimation network model.
According to the motion gesture evaluation method provided by the invention, the method further comprises the following steps:
and information display is carried out based on the motion gesture evaluation result.
The invention also provides a motion gesture evaluation device, which comprises:
an acquisition module, used for acquiring motion gesture data to be evaluated of a current user, wherein the motion gesture data to be evaluated is gesture data obtained based on a motion video of the current user shot by an optical camera;
an evaluation module, used for performing space-time alignment and evaluation on the motion gesture data to be evaluated based on reference motion gesture data, and determining a motion gesture evaluation result of the current user;
the reference motion gesture data are gesture data determined based on motion videos shot by the optical camera aiming at a target user wearing the IMU.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the motion gesture evaluation method according to any one of the above when executing the program.
According to the motion gesture evaluation method and device and the electronic equipment provided by the invention, the terminal equipment determines the motion gesture evaluation result of the current user by first acquiring the motion gesture data to be evaluated of the current user, and then performing space-time alignment and evaluation on the motion gesture data to be evaluated based on the reference motion gesture data. The motion gesture data to be evaluated is gesture data obtained based on the motion video of the current user shot by the optical camera, and the reference motion gesture data is gesture data determined based on the motion video shot by the optical camera for the target user wearing the IMU. Thus, the optical camera is combined with the IMU only when the reference motion gesture data is acquired in the early stage, and only the optical camera is needed when the motion gesture of the user is evaluated in the later stage: the IMU does not need to be worn for each evaluation, and each evaluation requires only the optical camera. Because the capturing precision of the motion video is improved, the precision of human motion gesture evaluation is greatly improved, and at the same time the application range of human motion gesture evaluation is greatly broadened.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a motion gesture evaluation method provided by the invention;
FIG. 2 is a schematic diagram of the gesture processing of a time-series convolutional network provided by the present invention;
FIG. 3 is a schematic diagram of a process for three-dimensional pose estimation provided by the present invention;
FIG. 4 is a logic block diagram of a motion gesture evaluation method provided by the present invention;
FIG. 5 is a block diagram of the execution of the motion gesture evaluation method provided by the present invention;
FIG. 6 is a schematic structural diagram of the motion gesture evaluation device provided by the present invention;
fig. 7 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In an existing human motion gesture evaluation method, with IMUs worn on the thigh, shank and instep of the user's right leg and a plantar pressure insole worn on the right foot, the motion gesture is evaluated based on the acquired IMU acceleration signals and the gait information collected during walking, using an STM32F767 chip and RS485 serial port communication. The motion of each part of the human body is rigid motion, and the IMU is an acceleration measurement device that measures linear acceleration and rotational angular acceleration in three directions. However, the existing method for evaluating the motion gesture depends on the IMU acceleration signal, and the user to be evaluated must wear the IMU every time the human motion gesture is evaluated, so the accuracy of human motion gesture evaluation is not high and the application range is greatly limited.
In order to solve the above technical problems, the present invention provides a motion gesture evaluation method, apparatus, and electronic device, and the motion gesture evaluation method, apparatus, and electronic device of the present invention are described below with reference to fig. 1 to 7, where an execution subject of the motion gesture evaluation method may be a terminal device or a server, and the terminal device may be a personal computer (Personal Computer, PC), a portable device, a notebook computer, a smart phone, a tablet computer, a portable wearable device, or other electronic devices; the server may be one server, or may be a server cluster composed of a plurality of servers, a cloud computing center, or the like. The specific form of the terminal device or the server is not limited herein. The following method embodiments are described taking an execution body as a terminal device as an example.
Referring to fig. 1, which is a flow chart of the motion gesture evaluation method provided by the invention, as shown in fig. 1, the method includes the following steps:
The current user is the user whose gesture is to be evaluated. The optical camera may be a visible light camera, such as a monocular or binocular camera, and may be arranged on the terminal device or arranged independently and communicatively connected to the terminal device; the placement of the optical camera is not specifically limited here. The number of current users may be one or more, and is likewise not specifically limited. Further, the motion video may be a sequence of continuous video frame images captured by the optical camera of the moving user, where each video frame image in the motion video is an RGB image, RGB denoting the three color channels red (R), green (G) and blue (B).
Specifically, to obtain the motion gesture data to be evaluated of the current user, the terminal device may start the optical camera to shoot a motion video of the current user in motion, for example starting the optical camera while the current user performs several continuous actions at random; the terminal device then analyzes the motion video shot by the optical camera, for example performing motion gesture recognition on each image frame in the motion video, so as to obtain the motion gesture data to be evaluated of the current user.
The target user is a user who wears the IMU and moves according to the standard action requirements; the standard action requirements can include the motion standard of each action in a set of actions, such as the head-up angle of a stretching action, the angle between the two arms, and the like. The spatio-temporal evaluation may be a temporal evaluation and a spatial evaluation.
Specifically, the terminal device performs space-time alignment and evaluation on the motion gesture data to be evaluated based on the reference motion gesture data, for example by templating the reference motion gesture data to determine a standard motion gesture template and then performing a gesture evaluation in the time dimension and a gesture evaluation in the space dimension on the motion gesture data to be evaluated based on the standard motion gesture template. The space-time evaluation result of the motion gesture data to be evaluated obtained in this way is determined as the motion gesture evaluation result of the current user.
According to the motion gesture evaluation method provided by the invention, the terminal equipment determines the motion gesture evaluation result of the current user by first acquiring the motion gesture data to be evaluated of the current user, and then performing space-time alignment and evaluation on the motion gesture data to be evaluated based on the reference motion gesture data. The motion gesture data to be evaluated is gesture data obtained based on the motion video of the current user shot by the optical camera, and the reference motion gesture data is gesture data determined based on the motion video shot by the optical camera for the target user wearing the IMU. Thus, the optical camera is combined with the IMU only when the reference motion gesture data is acquired in the early stage, and only the optical camera is needed when the motion gesture of the user is evaluated in the later stage: the IMU does not need to be worn for each evaluation, and each evaluation requires only the optical camera. Because the capturing precision of the motion video is improved, the precision of human motion gesture evaluation is greatly improved, and at the same time the application range of human motion gesture evaluation is greatly broadened.
Optionally, the specific implementation procedure of step 110 may include:
Firstly, acquiring a motion video of a current user shot by an optical camera, wherein the current user is a user who moves randomly and does not wear an IMU at any joint point; and further acquiring motion gesture data to be evaluated of the current user based on the motion video of the current user and a preset motion gesture estimation network model.
Specifically, for a current user needing motion gesture evaluation, the terminal device firstly acquires a motion video of the current user when the current user randomly and continuously acts, then inputs the acquired motion video into a preset motion gesture estimation network model, and acquires motion gesture data to be evaluated output by the preset motion gesture estimation network model.
According to the motion gesture evaluation method provided by the invention, the terminal equipment acquires the motion gesture data to be evaluated of the current user by inputting the motion video of the current user into the preset motion gesture estimation network model. Combining the preset motion gesture estimation network model improves the accuracy of acquiring the motion gesture data to be evaluated, and at the same time effectively improves the accuracy and reliability of the motion gesture evaluation.
Optionally, the specific process of acquiring the reference motion gesture data in step 120 may include:
Firstly, acquiring a motion video of a target user shot by an optical camera and acquiring a target IMU acceleration signal from an IMU worn by the target user, wherein the target user is a user who moves according to standard action requirements and wears the IMU at different joint points; and further acquiring reference motion attitude data based on the motion video of the target user, the target IMU acceleration signal and a preset motion attitude estimation network model.
The different joint points at which the target user wears the IMUs may be joint points of the target user's body that can move, such as the neck, both knees, both ankles, both shoulders, both elbows, both hips and both wrists; furthermore, an IMU may or may not additionally be worn at the waist joint point. This is not specifically limited here.
Specifically, in order to facilitate quick and accurate evaluation of the motion gesture later, the terminal device may acquire and store the reference motion gesture data in advance: the target user is instructed to move according to the standard action requirements, the optical camera is started synchronously to shoot the motion video during the motion of the target user, the target IMU acceleration signals are acquired from each IMU worn by the target user, and the motion video of the target user and the acquired target IMU acceleration signals are then input into the preset motion gesture estimation network model, so as to obtain the reference motion gesture data output by the preset motion gesture estimation network model.
It should be noted that, because IMUs are worn at different joint points of the target user, the positions of the joint points when the target user performs different actions can be determined based on the different target IMU acceleration signals acquired during the motion. Each action of the target user in the motion stage can thus be marked automatically, and the loss and misjudgment of motion information caused by factors such as occlusion and clothing artifacts can be compensated, so that accurate and reliable reference motion gesture data is determined.
According to the motion gesture evaluation method provided by the invention, the terminal equipment acquires the reference motion gesture data by inputting the motion video of the target user into the preset motion gesture estimation network model. Combining the preset motion gesture estimation network model, the IMU and the optical technology greatly improves the accuracy and reliability of the reference motion gesture data, laying a foundation for the subsequent accurate evaluation of the motion gesture of the current user.
Optionally, the training process of the preset motion gesture estimation network model may include:
firstly, acquiring a motion video of a second sample user shot by the optical camera and a sample IMU acceleration signal, wherein the second sample user is a user who wears IMUs at different joint points and moves according to the standard action requirements; then determining different joint point position information of the second sample user based on the sample IMU acceleration signal; and finally, based on the motion video of the second sample user and the position information of the different joint points, training an initial motion gesture estimation network model containing a deep network and a time sequence convolution network to perform two-dimensional posture estimation, to process the two-dimensional posture into a three-dimensional posture, and to compensate for missing and misjudged motion information in the three-dimensional posture, thereby determining the preset motion gesture estimation network model.
The motion video of the second sample user may be a plurality of different motion videos shot for 1 second user, or may be one motion video shot for a plurality of second users respectively. And, the number of sample IMU acceleration signals is the same as and corresponds one-to-one to the number of motion videos of the second sample user. Further, the motion video of each second sample user may be monocular RGB motion video.
Specifically, the IMU is an acceleration measurement device that measures linear acceleration and rotational angular acceleration in three directions; that is, the IMU produces acceleration signals. When IMUs are worn on the joint points of a human body, the real-time sample IMU acceleration signals generated by each IMU reflect the real-time motion state of the body, including real-time position and rotation. Combining physical laws with mathematical analysis, the acceleration signal is the second-order derivative of the position and rotation variables, so the position information of the different joint points of the second sample user, i.e. the positions of each joint point at which the second sample user wears an IMU, can be recovered from the sample IMU acceleration signals (in practice by integrating the acceleration signals twice).
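To make the relationship between the IMU acceleration signals and joint positions concrete, the following sketch numerically double-integrates a per-joint acceleration signal into a position trajectory. It is only an illustration under simplifying assumptions (gravity already removed, known initial velocity and position, trapezoidal integration); the patent does not specify the integration scheme, and in practice drift accumulation would have to be corrected, for example by the visually positioned key points described below.

```python
import numpy as np

def integrate_acceleration(acc, dt, v0=None, p0=None):
    """Recover a joint position trajectory from a joint-worn IMU acceleration signal.

    acc : (T, 3) array of linear accelerations in m/s^2 (gravity assumed removed)
    dt  : sampling interval in seconds
    v0, p0 : initial velocity / position; assumed zero if not given
    """
    acc = np.asarray(acc, dtype=float)
    v0 = np.zeros(3) if v0 is None else np.asarray(v0, dtype=float)
    p0 = np.zeros(3) if p0 is None else np.asarray(p0, dtype=float)

    # First integration: acceleration -> velocity (cumulative trapezoidal rule).
    vel = v0 + np.vstack([np.zeros(3),
                          np.cumsum((acc[1:] + acc[:-1]) * 0.5 * dt, axis=0)])
    # Second integration: velocity -> position.
    pos = p0 + np.vstack([np.zeros(3),
                          np.cumsum((vel[1:] + vel[:-1]) * 0.5 * dt, axis=0)])
    return vel, pos

# Example: a hypothetical wrist-worn IMU sampled at 100 Hz for 2 seconds.
t = np.arange(0, 2, 0.01)
acc = np.stack([np.sin(2 * np.pi * t), np.zeros_like(t), np.zeros_like(t)], axis=1)
_, wrist_positions = integrate_acceleration(acc, dt=0.01)
print(wrist_positions.shape)  # (200, 3)
```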
Based on the above, the terminal device may train the initial motion gesture estimation network model using the motion video of the second sample user and the different joint point position information of the second sample user. First, the motion video of the second sample user is input into the initial motion gesture estimation network model, and the two-dimensional gesture in each video frame image is estimated via the deep network, so as to obtain a two-dimensional gesture sequence. Then, as in the gesture processing schematic diagram of the time sequence convolution network shown in fig. 2, the two-dimensional gesture sequence is processed into a three-dimensional gesture sequence through the time sequence convolution network. The process of estimating the three-dimensional gesture for each frame of the motion video of the second sample user may be as shown in fig. 3: the input image first passes through a two-dimensional human gesture estimation network to estimate the two-dimensional gesture, and then through a three-dimensional gesture network to estimate the three-dimensional gesture, where the two-dimensional human gesture estimation network is the above deep network and the three-dimensional gesture network is the time sequence convolution network.
Specific key points in the three-dimensional gesture sequence are then selected to obtain a visually positioned spatial motion track, and this spatial motion track is matched with the sample IMU acceleration signal corresponding to the motion video of the second sample user. The matching includes correcting the selected key points, and using the different joint point position information determined from the matched sample IMU acceleration signal to compensate for the loss and misjudgment of key point motion information caused by factors such as occlusion and clothing artifacts. The motion gesture output after this round of training and the trained motion gesture estimation network model are thereby obtained.
After multiple rounds of training in this way, it is judged whether the loss of the trained motion gesture estimation network model meets a preset loss requirement. If it does, training is stopped, and the motion gesture estimation network model at the time training stops is determined to be the preset motion gesture estimation network model; otherwise, if the loss does not meet the preset loss requirement, the motion gesture estimation network model is further trained with motion videos of new second sample users and new sample IMU acceleration signals, until training stops and the corresponding model is taken as the preset motion gesture estimation network model. The specific key points may include the head, hands, wrists, elbows, shoulders, feet, knees and hips.
It should be noted that motion gesture estimation based on monocular RGB motion video has the advantage that the context information provided by adjacent video frame images helps to predict the gesture of the current video frame image; for occlusion cases, a reasonable estimate can be made from the gestures of several frames before and after the current video frame image. In addition, since the bone lengths of the same second sample user do not change within a segment of motion video, a bone-length-consistency constraint can be introduced, which helps the network output a more stable three-dimensional posture.
It should be noted that the nature of the time sequence convolution network is convolution in the time domain, and its biggest advantage over a recurrent neural network is that multiple two-dimensional gesture sequences can be processed in parallel; its computational complexity is low and its number of parameters is small. The receptive field of the time sequence convolution network can be further expanded by using dilated (hole) convolution, and the specific network structure can adopt a fully convolutional network with residual connections. In addition, the time sequence convolution network can also be trained with a semi-supervised method, the main idea of which is to add a trajectory prediction model for predicting the absolute coordinates of the root joint and to project the absolute three-dimensional posture in the camera coordinate system back onto the two-dimensional plane, thereby introducing a re-projection loss; this semi-supervised method improves model training precision when three-dimensional labels are limited.
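As a rough, non-authoritative illustration of a dilated temporal convolution network of the kind described above, the following PyTorch sketch lifts a sequence of 2D key points to per-frame 3D key points using residual blocks of one-dimensional dilated convolutions. The channel count, dilation rates, joint count and block layout are placeholders chosen for the example, not the parameters of the network in this patent.

```python
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    """One residual block of the temporal (1D) convolutional network."""
    def __init__(self, channels, dilation):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3,
                               dilation=dilation, padding=dilation)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=1)
        self.bn1 = nn.BatchNorm1d(channels)
        self.bn2 = nn.BatchNorm1d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        residual = x
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.relu(self.bn2(self.conv2(x)))
        return x + residual  # residual connection keeps gradients stable

class Lift2Dto3D(nn.Module):
    """Maps a sequence of 2D keypoints to per-frame 3D keypoints."""
    def __init__(self, num_joints=17, channels=256, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.expand = nn.Conv1d(num_joints * 2, channels, kernel_size=1)
        self.blocks = nn.Sequential(
            *[DilatedResidualBlock(channels, d) for d in dilations])
        self.head = nn.Conv1d(channels, num_joints * 3, kernel_size=1)

    def forward(self, pose2d):            # pose2d: (batch, frames, joints, 2)
        b, t, j, _ = pose2d.shape
        x = pose2d.reshape(b, t, j * 2).transpose(1, 2)   # (b, 2j, t)
        x = self.head(self.blocks(self.expand(x)))        # (b, 3j, t)
        return x.transpose(1, 2).reshape(b, t, j, 3)      # (b, t, j, 3)

model = Lift2Dto3D()
dummy = torch.randn(1, 81, 17, 2)         # 81 frames of 17 2D keypoints
print(model(dummy).shape)                 # torch.Size([1, 81, 17, 3])
```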
It should be noted that matching the visually positioned spatial motion track with the sample IMU acceleration signal corresponding to the motion video of the second sample user serves two purposes. On the one hand, the sample IMU acceleration signal is used to correct the visually positioned key points and to add real-time motion direction and speed information, the main contributing factor being the acceleration sensitivity of the sample IMU acceleration signal. On the other hand, the sample IMU acceleration signal is used to fill in key points missing due to factors such as occlusion, the main contributing factor being that the sample IMU acceleration signal is independent of the visual image. The motion key point data obtained by combining the sample IMU acceleration signal with the visual information can be recorded through three-dimensional data modeling software and stored as a standard motion gesture template for evaluating subsequent test motions.
According to the motion gesture evaluation method provided by the invention, the terminal equipment determines the preset motion gesture estimation network model by using the motion video of the second sample user and the sample IMU acceleration signal to train the initial motion gesture estimation network model comprising the deep network and the time sequence convolution network: performing two-dimensional gesture estimation, processing the two-dimensional gesture into a three-dimensional gesture, and compensating for missing and misjudged motion information in the three-dimensional gesture. Combining the deep network, the time sequence convolution network and the sample IMU acceleration signal compensation technology effectively improves the precision and accuracy of the preset motion gesture estimation network model.
Optionally, the specific implementation procedure of step 120 may include:
firstly, performing space-time alignment based on the motion gesture data to be evaluated and the reference motion gesture data; based on a result of successful space-time alignment, decomposing the reference motion gesture data into a plurality of time infinitesimals, wherein each time infinitesimal corresponds to one reference action in the reference motion gesture data; further, determining the different time segments corresponding to the plurality of time infinitesimals in the motion gesture data to be evaluated, wherein each time segment corresponds to one action in the motion gesture data to be evaluated; then, determining a time scoring result of the motion gesture data to be evaluated based on the variance statistics of the different time segments, and determining a space evaluation result between the motion gesture data to be evaluated and the reference motion gesture data based on the Euclidean distance between the key point position information of each time segment; and finally, determining the time scoring result and the space evaluation result as the motion gesture evaluation result of the current user.
Specifically, the terminal equipment performs time alignment and space alignment based on the motion gesture data to be evaluated and the reference motion gesture data, and establishes a new time line based on the space-time alignment result; the function of the new time line is to absorb the time stretching caused by uneven motion speed. First, the reference motion gesture data is decomposed into N time infinitesimals, where N is a positive integer greater than 0; to keep the computation tractable, each time infinitesimal is set to 0.01 seconds. Then the N time infinitesimals are mapped into the motion gesture data to be evaluated; ideally, each infinitesimal of the motion gesture data to be evaluated is consistent with the standard motion gesture template. In practice, however, the actual motion is complex, so there may be arbitrary stretching and shrinking on the time axis: some decomposed actions are performed faster and some slower. To quantify this motion difference, the duration of the segment of the motion gesture data to be evaluated corresponding to each time infinitesimal can be estimated, which determines the different time segments corresponding to the plurality of time infinitesimals in the motion gesture data to be evaluated. Finally, the sample variance of the durations of the different time segments is taken as the time scoring result diff_t of the motion gesture data to be evaluated, calculated as follows:
diff_t = (1 / (N − 1)) · Σ_{i=1..N} (t_i − μ_t)²
wherein t_i is the duration of the i-th time segment and μ_t is the sample mean of the N time segments; the sample variance of the different time segments is the variance statistic of the different time segments.
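A minimal sketch of this time-scoring step, assuming the durations of the mapped segments have already been measured; it simply returns the sample variance of the N segment durations as diff_t.

```python
import numpy as np

def time_score(segment_durations):
    """Sample variance of the per-infinitesimal segment durations (diff_t).

    segment_durations : durations, in seconds, of the N segments of the motion
    gesture data to be evaluated that correspond to the N reference time
    infinitesimals. A perfectly paced motion gives identical durations and a
    score of 0; uneven pacing increases the score.
    """
    t = np.asarray(segment_durations, dtype=float)
    return float(np.var(t, ddof=1))   # ddof=1 -> sample variance

# Example: the reference is split into 5 infinitesimals of 0.01 s each,
# and the user's corresponding segments were stretched unevenly in time.
print(time_score([0.010, 0.012, 0.009, 0.015, 0.011]))
```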
For the spatial analysis, the key point position information of each time segment can be averaged over time, so that the time dimension is averaged over multiple identical time points. Because the standard motion gesture template and the motion gesture data to be evaluated correspond one-to-one at these time points, the Euclidean distance between the key point position information of each time segment can be calculated and determined as the space evaluation result between the motion gesture data to be evaluated and the reference motion gesture data, calculated as follows:
diff_x = sqrt( Σ_{i=1..N} (f_xi − g_xi)² )
diff_y = sqrt( Σ_{i=1..N} (f_yi − g_yi)² )
diff_z = sqrt( Σ_{i=1..N} (f_zi − g_zi)² )
wherein diff_x, diff_y and diff_z are the x-axis, y-axis and z-axis scoring results between the motion gesture data to be evaluated and the reference motion gesture data; f_xi, f_yi and f_zi are the x-, y- and z-axis coordinate values of key point f in the i-th time segment; and g_xi, g_yi and g_zi are the x-, y- and z-axis coordinate values of key point g in the i-th time segment.
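Correspondingly, the spatial score per axis can be sketched as below, under the assumption that f holds the time-averaged coordinates of a key point in the reference template and g the coordinates of the corresponding key point in the motion gesture data to be evaluated, and that each per-axis distance aggregates the squared differences over all N time segments.

```python
import numpy as np

def spatial_score(f, g):
    """Per-axis Euclidean distances (diff_x, diff_y, diff_z) between a key
    point of the reference template (f) and the same key point of the motion
    gesture data to be evaluated (g).

    f, g : (N, 3) arrays of time-averaged key point coordinates, one row per
           time segment, columns ordered x, y, z.
    """
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    diff = np.sqrt(np.sum((f - g) ** 2, axis=0))   # aggregate over segments
    return {"diff_x": diff[0], "diff_y": diff[1], "diff_z": diff[2]}

# Example with 4 time segments of one key point (e.g. the right wrist)
f = np.array([[0.0, 1.0, 0.5], [0.1, 1.1, 0.5], [0.2, 1.2, 0.6], [0.3, 1.2, 0.6]])
g = f + 0.02   # the evaluated motion deviates slightly from the template
print(spatial_score(f, g))
```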
According to the motion gesture evaluation method provided by the invention, the terminal equipment first performs space-time alignment based on the motion gesture data to be evaluated and the reference motion gesture data, then decomposes the reference motion gesture data into a plurality of time infinitesimals, determines the different time segments corresponding to the time infinitesimals in the motion gesture data to be evaluated, and finally determines the variance statistic of the different time segments and the Euclidean distance between the key point position information of each time segment as the motion gesture evaluation result of the current user. Combining the infinitesimal decomposition technology with the space-time evaluation technology improves the flexibility and diversity as well as the reliability and accuracy of motion gesture evaluation.
Optionally, performing space-time alignment based on the motion gesture data to be evaluated and the reference motion gesture data includes:
firstly, determining an action set of identified and labeled actions in the motion gesture data to be evaluated based on a preset action segmentation recognition labeling model; determining the body part length information of the current user based on the motion gesture data to be evaluated; correcting the reference motion gesture data based on the body part length information to determine the personalized motion gesture data of the current user; and then performing space-time alignment on the action set and the personalized motion gesture data.
Specifically, the terminal device obtains the action set of identified and labeled actions in the motion gesture data to be evaluated by inputting the motion gesture data to be evaluated into the preset action segmentation recognition labeling model, and extracts the body part length information of the current user from the motion gesture data to be evaluated. The body part length information may include, but is not limited to, the lengths of the upper arm, the lower arm, the trunk, the thigh and the lower leg. The body part length information of the current user is used to correct the parameters of the standard motion gesture template, namely the parameters of the upper arm, lower arm, trunk, thigh and lower leg of the standard motion gesture template, so as to determine the personalized motion gesture data of the current user, i.e. the personalized motion gesture template of the current user. On this basis, to perform space-time alignment between the action set and the personalized motion gesture data, the personalized motion gesture data can first be parsed into key point motion data, and the action set identified and labeled in the motion gesture data to be evaluated is then aligned in space and time with the parsed key point motion data.
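A hedged sketch of the personalization step described above: the standard template skeleton is rescaled limb by limb to the body part lengths measured from the current user's data. The joint names, limb hierarchy and template coordinates are illustrative assumptions, not the patent's data format.

```python
import numpy as np

# Illustrative parent -> child limb segments of a simplified skeleton
# (listed parent-before-child so each parent is repositioned first).
LIMBS = [
    (("shoulder", "elbow"), "upper_arm"),
    (("elbow", "wrist"), "lower_arm"),
    (("hip", "knee"), "thigh"),
    (("knee", "ankle"), "lower_leg"),
]

def personalize_template(template, user_lengths):
    """Rescale each limb of the standard template to the current user's
    body part lengths, keeping the parent joint of each limb fixed."""
    personalized = {k: np.asarray(v, dtype=float) for k, v in template.items()}
    for (parent, child), limb in LIMBS:
        if limb not in user_lengths:
            continue
        # Direction of the limb in the original template.
        direction = (np.asarray(template[child], dtype=float)
                     - np.asarray(template[parent], dtype=float))
        norm = np.linalg.norm(direction)
        if norm > 0:
            personalized[child] = (personalized[parent]
                                   + direction / norm * user_lengths[limb])
    return personalized

template = {"shoulder": [0, 0, 1.40], "elbow": [0, 0, 1.10], "wrist": [0, 0, 0.85],
            "hip": [0, 0, 0.90], "knee": [0, 0, 0.45], "ankle": [0, 0, 0.00]}
user = {"upper_arm": 0.33, "lower_arm": 0.27, "thigh": 0.48, "lower_leg": 0.43}
print(personalize_template(template, user)["wrist"])  # [0.  0.  0.8]
```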
To improve space-time alignment efficiency, the motion gesture data to be evaluated of the current user can be obtained by instructing the current user to repeatedly perform specified actions and to perform different specified actions continuously. The continuous actions in the motion gesture data to be evaluated are then segmented and identified using the preset action segmentation recognition labeling model, and the segments are aligned with the personalized motion gesture data of the current user; the alignment includes both spatial alignment and temporal alignment.
Spatial alignment refers to alignment of the spatial positions of the key points, while temporal alignment refers to alignment of the time sequences of the individual key points; the two are mutually coupled factors, so the two alignments can be achieved simultaneously. To avoid an overly complex calculation process, the invention can use a least-squares loss function to match the motion gesture data to be evaluated with the standard motion gesture template, by estimating a set of scaling coefficients and offset coefficients on the time axis and the space axes. The scaling coefficient β1 and offset coefficient b1 are calculated as follows:
(β1, b1) = argmin Σ (f. − (g.·β1 + b1))²
wherein f. denotes the coordinate values of key point f on the x, y and z axes, and g. denotes the coordinate values of key point g on the x, y and z axes. The matched motion gesture data to be evaluated g′ is then:
g′ = g.·β1 + b1
Another matching principle is that the total time and the maximum motion amplitude of the motion gesture data to be evaluated and of the standard motion gesture template are kept consistent; this is achieved by applying, on the time axis and the space axes, another set of scaling coefficients β2 chosen so that the total duration and the maximum amplitude coincide, giving the matched motion gesture data to be evaluated:
g″ = g′·β2
The matched motion parameters to be evaluated are then used for scoring the motion.
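The least-squares estimation of the scaling coefficient β1 and offset coefficient b1 can be sketched per axis with a closed-form fit as below. The second-stage β2 shown in the example is an assumption (the ratio of maximum amplitudes), since the text does not give its exact formula.

```python
import numpy as np

def fit_scale_and_offset(f, g):
    """Least-squares estimate of (beta1, b1) such that f ≈ g * beta1 + b1.

    f : reference-template key point coordinates on one axis, shape (N,)
    g : corresponding to-be-evaluated coordinates,            shape (N,)
    """
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    A = np.stack([g, np.ones_like(g)], axis=1)        # design matrix [g, 1]
    (beta1, b1), *_ = np.linalg.lstsq(A, f, rcond=None)
    return beta1, b1

# One axis of one key point over N aligned samples
f = np.array([0.00, 0.10, 0.22, 0.31, 0.40])           # template
g = np.array([0.02, 0.07, 0.13, 0.18, 0.23])           # to be evaluated
beta1, b1 = fit_scale_and_offset(f, g)
g_prime = g * beta1 + b1                               # matched data g'

# Assumed second-stage scaling: make the maximum amplitude of g' match f.
# (The exact beta2 formula is not given in the text; this is one plausible choice.)
beta2 = np.max(np.abs(f)) / np.max(np.abs(g_prime))
g_double_prime = g_prime * beta2
print(round(beta1, 3), round(b1, 3))
```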
According to the motion gesture evaluation method provided by the invention, the terminal equipment performs space-time alignment between the motion gesture data to be evaluated and the reference motion gesture data by determining the identified and labeled action set in the motion gesture data to be evaluated, correcting the reference motion gesture data based on the body part length information of the current user, and performing space-time alignment between the action set and the corrected personalized motion gesture data, which ensures that the evaluation of the motion gesture is more accurate and reliable. Furthermore, because the identification and labeling are based on the preset action segmentation recognition labeling model, the accuracy and intelligence of action set identification are improved.
Optionally, the training process of the preset action segmentation recognition labeling model may include:
Firstly, constructing an initial action segmentation recognition labeling model comprising a batch normalization layer, a one-dimensional convolution layer, a one-dimensional max pooling layer and a fully connected layer; acquiring a sample data set, wherein the sample data set comprises motion gesture data obtained based on motion videos of different first sample users shot by the optical camera and IMU acceleration signals acquired from the IMUs worn by the first sample users; further dividing the sample data set into a sample training set and a sample test set; then, using the sample training set to train the initial action segmentation recognition labeling model to recognize and label different actions, and determining the trained intermediate action segmentation recognition labeling model; and finally, testing the trained intermediate action segmentation recognition labeling model with the sample test set, and, when the test result reaches a preset accuracy, determining the corresponding intermediate action segmentation recognition labeling model to be the preset action segmentation recognition labeling model.
Specifically, in the constructed initial action segmentation recognition labeling model, the one-dimensional convolution layer extracts motion information well, and the max pooling layer reduces the size of the model while significantly improving its robustness; skip connections are adopted in the initial action segmentation recognition labeling model to obtain better feature selection from the IMU acceleration signals. Experimental results show that the adopted method obtains a good classification effect, has practical value, and has strong robustness and generalization capability. The sample data set may consist of 200 ms time-window data, split at a ratio of 8:2 into the sample training set and the sample test set.
Based on this, the sample training set is used to train the initial action segmentation recognition labeling model to recognize and label different actions, i.e. the IMU acceleration signals are used to automatically mark the motion stages in the motion gesture data so as to recognize and label the start and end positions of different actions. Specifically, through the initial action segmentation recognition labeling model, each training sample can be divided into 20 consecutive 10-millisecond segments of one-dimensional motion data, giving an input of 120 values in total, which the model converts into a classification vector of size 9. Cross entropy is used as the loss function and stochastic gradient descent as the optimizer to judge the training effect of the action segmentation recognition labeling model after each round of training; starting from the initial weights and thresholds of the initial action segmentation recognition labeling model, the learning rate is set to 0.002 and decays by a factor of 0.9 every 20 rounds. Training the initial action segmentation recognition labeling model in this way to recognize and label different actions, and testing the intermediate action segmentation recognition labeling model after 200 rounds of training on the sample test set, an accuracy of 97.28% can be obtained; training can then be stopped, the test result is determined to have reached the preset accuracy, and the corresponding intermediate action segmentation recognition labeling model at that point is determined to be the preset action segmentation recognition labeling model.
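A minimal PyTorch sketch consistent with the layers named above (batch normalization, one-dimensional convolution, one-dimensional max pooling, fully connected layer, skip connection), mapping a 200 ms window of 120 values to a 9-class output and trained with cross entropy and stochastic gradient descent at a learning rate of 0.002 decayed by 0.9 every 20 rounds. The split of the 120 values into 6 IMU channels × 20 time steps, and the channel widths, are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class ActionSegmentationNet(nn.Module):
    """Conv1d + max-pool classifier for 200 ms IMU windows (120 values -> 9 classes)."""
    def __init__(self, in_channels=6, window=20, num_classes=9):
        super().__init__()
        self.bn_in = nn.BatchNorm1d(in_channels)
        self.conv1 = nn.Conv1d(in_channels, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(32, 32, kernel_size=3, padding=1)
        self.pool = nn.MaxPool1d(kernel_size=2)
        self.relu = nn.ReLU()
        self.fc = nn.Linear(32 * (window // 2), num_classes)

    def forward(self, x):                 # x: (batch, 6 IMU channels, 20 steps)
        x = self.bn_in(x)
        h = self.relu(self.conv1(x))
        h = self.relu(self.conv2(h)) + h  # skip connection over the second conv
        h = self.pool(h)
        return self.fc(h.flatten(1))

model = ActionSegmentationNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.002)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.9)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on random data (batch of 8 windows)
x = torch.randn(8, 6, 20)
y = torch.randint(0, 9, (8,))
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
scheduler.step()
print(float(loss))
```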
According to the motion gesture evaluation method provided by the invention, the terminal device determines the preset action segmentation recognition annotation model by training and testing, with a sample training set and a sample test set, an initial model comprising a batch normalization layer, a one-dimensional convolution layer, a one-dimensional max pooling layer and a fully connected layer. Combining batch normalization, convolution, pooling and full connection with the automatic marking and compensation provided by the IMU effectively improves the accuracy of action segmentation recognition and labeling, and therefore also the accuracy and reliability of the preset action segmentation recognition annotation model.
Optionally, after step 130, the motion gesture evaluation method provided by the present invention may further include:
displaying information based on the motion gesture evaluation result.
Specifically, when the current motion gesture evaluation result includes a time evaluation result and a space evaluation result, the terminal device may display the time evaluation result along a time axis of the display interface while displaying the space evaluation result on the same interface. Further, when the time evaluation result is a time score and the space evaluation result is a space score, the time scores and space scores of the current user's motion gesture over different periods may be fed back to the current user in the form of an error chart; for example, the time score of the current user's action A on the 12th may be 3, and on the 13th it may be 1. In this way the current user can, during the usage stage, visually check the evaluation result of the motion gesture on a per-joint basis.
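Purely as an illustration of such an error chart, the sketch below plots hypothetical time and space scores for action A over three sessions; the session labels and score values are made up for illustration and are not taken from the patent.

```python
import numpy as np
import matplotlib.pyplot as plt

periods = ["12th", "13th", "14th"]             # hypothetical sessions for action A
time_scores = np.array([3, 1, 2])              # hypothetical time scores per session
space_scores = np.array([2, 3, 1])             # hypothetical space scores per session

x = np.arange(len(periods))
fig, ax = plt.subplots()
ax.bar(x - 0.2, time_scores, width=0.4, label="time score")    # laid out along the time axis
ax.bar(x + 0.2, space_scores, width=0.4, label="space score")
ax.set_xticks(x)
ax.set_xticklabels(periods)
ax.set_ylabel("score")
ax.set_title("Action A: time/space evaluation per session")
ax.legend()
plt.show()
```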
According to the motion gesture evaluation method provided by the invention, the terminal device displays the current user's motion gesture evaluation result so that the result is presented intuitively, ensuring that the current user learns of motion errors promptly and accurately and improving the effect of feedback and correction.
Referring to fig. 4, which is a logic block diagram of a motion gesture evaluation method provided by the present invention, as shown in fig. 4 the terminal device performs motion integration based on the motion video of the target user captured by the optical camera and the target IMU acceleration signals acquired from the IMUs worn by the target user, inputs the result to the preset motion gesture estimation network model, and determines the reference sequence, that is, the reference motion gesture data of the target user. At the same time, motion analysis is performed based on the motion video of the current user captured by the optical camera and the preset motion gesture estimation network model to determine the sequence to be tested, that is, the motion gesture data to be evaluated of the current user. Finally, space-time alignment and evaluation (time scoring and space scoring) are performed on the motion gesture data to be evaluated based on the reference motion gesture data, so that the time scores and space scores of the current user's motion gesture over different periods are obtained and fed back to the current user in the form of an error chart. For the specific procedures involved, reference is made to the foregoing embodiments, which are not repeated here.
Referring to fig. 5, which is a block diagram of the motion gesture evaluation method provided by the present invention, as shown in fig. 5 the method may include a standard motion gesture template generation stage and a motion gesture scoring stage. In the standard motion gesture template generation stage, the motion process can be automatically marked and missing or misjudged motion information compensated based on the preset motion gesture estimation network model and the acquired target IMU acceleration signals; the reference motion gesture data of the target user are determined, the standard motion gesture template is determined from the reference motion gesture data, and the template parameters are corrected according to the body part length information of the current user in the motion gesture data to be evaluated, yielding a personalized motion gesture template for the current user. In the motion gesture scoring stage, the motion gesture data to be evaluated of the current user are first acquired based on the preset motion gesture estimation network model, and then the current user's motion gesture is space-time aligned and evaluated based on the reference motion gesture data and the personalized motion gesture template, that is, the time score and the space score are determined. For the specific procedures involved, reference is likewise made to the foregoing embodiments, which are not repeated here.
The motion gesture evaluation apparatus provided by the present invention will be described below, and the motion gesture evaluation apparatus described below and the motion gesture evaluation method described above may be referred to correspondingly to each other.
Referring to fig. 6, which is a schematic structural diagram of a motion gesture evaluation apparatus according to the present invention, as shown in fig. 6 the motion gesture evaluation apparatus 600 includes:
the acquiring module 610 is configured to acquire motion gesture data to be evaluated of a current user, where the motion gesture data to be evaluated is gesture data obtained based on a motion video of the current user captured by an optical camera;
the evaluation module 620 is configured to perform space-time alignment and evaluation on the motion gesture data to be evaluated based on the reference motion gesture data, and determine a motion gesture evaluation result of the current user; the reference motion gesture data are gesture data determined based on motion videos shot by the optical camera aiming at a target user wearing the IMU.
Optionally, the evaluation module 620 may be specifically configured to perform space-time alignment based on the motion gesture data to be evaluated and the reference motion gesture data; decompose, on the basis of a successful space-time alignment, the reference motion gesture data into a plurality of time elements, each time element corresponding to one reference action in the reference motion gesture data; determine the time segments in the motion gesture data to be evaluated that correspond to the different time elements, each time segment corresponding to one action in the motion gesture data to be evaluated; determine a time scoring result of the motion gesture data to be evaluated based on variance statistics over the different time segments; determine a space evaluation result between the motion gesture data to be evaluated and the reference motion gesture data based on the Euclidean distances between the key point position information of each time segment; and determine the time scoring result and the space evaluation result as the motion gesture evaluation result of the current user.
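A hedged sketch of the two scoring steps is given below. The patent states only that the time score is derived from variance statistics over the matched time segments and the space score from Euclidean distances between key point positions; the exact formulas used here (variance of duration differences, mean joint distance) are illustrative assumptions.

```python
import numpy as np

def time_score(segment_durations_s, reference_durations_s):
    """Variance statistic over the durations of the matched time segments (assumed form)."""
    diff = np.asarray(segment_durations_s, dtype=float) - np.asarray(reference_durations_s, dtype=float)
    return float(np.var(diff))                 # smaller variance -> timing closer to the reference

def space_score(user_keypoints, reference_keypoints):
    """Mean Euclidean distance between corresponding key points after space-time alignment.

    Both inputs are arrays of shape (num_frames, num_joints, 3)."""
    d = np.linalg.norm(np.asarray(user_keypoints) - np.asarray(reference_keypoints), axis=-1)
    return float(d.mean())                     # smaller distance -> posture closer to the reference
```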
Optionally, the evaluation module 620 may be further specifically configured to determine, based on a preset action segmentation recognition annotation model, the set of recognized and labeled actions in the motion gesture data to be evaluated; determine the body part length information of the current user based on the motion gesture data to be evaluated; correct the reference motion gesture data based on the body part length information to determine the personalized motion gesture data of the current user; and perform space-time alignment between the action set and the personalized motion gesture data.
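As an illustration of this personalization step, the sketch below rescales each limb segment of the reference pose to the current user's measured segment length while keeping the reference joint directions. The skeleton layout and the exact scaling rule are assumptions; the patent only states that the reference data are corrected using the user's body part length information.

```python
import numpy as np

def personalize_reference(reference_pose, parent_of, user_lengths):
    """reference_pose: (num_joints, 3) joint positions of one reference frame.
    parent_of: dict child_joint -> parent_joint, listed in kinematic order (parents first).
    user_lengths: dict child_joint -> segment length measured from the current user's pose data."""
    ref = np.asarray(reference_pose, dtype=float)
    adjusted = ref.copy()
    for child, parent in parent_of.items():
        direction = ref[child] - ref[parent]
        direction = direction / (np.linalg.norm(direction) + 1e-8)   # keep the reference direction
        adjusted[child] = adjusted[parent] + user_lengths[child] * direction  # rescale to user length
    return adjusted
```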
Optionally, the motion gesture evaluation apparatus provided by the invention further comprises a training module, configured to construct an initial action segmentation recognition annotation model comprising a batch normalization layer, a one-dimensional convolution layer, a one-dimensional max pooling layer and a fully connected layer; acquire a sample data set, wherein the sample data set comprises gesture data obtained from motion videos of different first sample users shot by an optical camera and the IMU acceleration signals acquired while those first sample users wore IMUs; divide the sample data set into a sample training set and a sample test set; use the sample training set to train the initial action segmentation recognition annotation model to recognize and label different actions, and determine a trained intermediate action segmentation recognition annotation model; and test the trained intermediate action segmentation recognition annotation model with the sample test set, determining the corresponding intermediate model to be the preset action segmentation recognition annotation model when the test result reaches the preset accuracy.
Optionally, the acquiring module 610 may be specifically configured to acquire a motion video of a target user captured by an optical camera and acquire target IMU acceleration signals from the IMUs worn by the target user, where the target user is a user who moves according to the standard action requirement and wears IMUs at different joint points; and acquire the reference motion gesture data based on the motion video of the target user, the target IMU acceleration signals and a preset motion gesture estimation network model.
Optionally, the acquiring module 610 may be further configured to acquire a motion video of the current user captured by the optical camera, where the current user is a user who moves freely and does not wear an IMU at any joint point; and acquire the motion gesture data to be evaluated of the current user based on the motion video of the current user and the preset motion gesture estimation network model.
Optionally, the training module may be further configured to acquire a motion video of a second sample user captured by the optical camera together with sample IMU acceleration signals, where the second sample user is a user who wears IMUs at different joint points and moves according to the standard action requirement; determine the position information of the different joint points of the second sample user based on the sample IMU acceleration signals; and, based on the motion video of the second sample user and the joint point position information, train an initial motion gesture estimation network model comprising a deep network and a temporal convolution network to perform two-dimensional gesture estimation, process the two-dimensional gesture into a three-dimensional gesture, and compensate for missing and misjudged motion information in the three-dimensional gesture, thereby determining the preset motion gesture estimation network model.
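A minimal sketch of a temporal-convolution lifting network in the spirit of this description is shown below: per-frame two-dimensional key points are lifted to three dimensions. The number of joints, layer widths, dilation pattern and the omission of the IMU-based compensation branch are all assumptions made for brevity, not details given by the patent.

```python
import torch
import torch.nn as nn

class TemporalLiftingNet(nn.Module):
    """Lifts a sequence of 2-D poses (batch, frames, joints, 2) to 3-D poses (batch, frames, joints, 3)."""
    def __init__(self, num_joints: int = 17, hidden: int = 256):
        super().__init__()
        self.num_joints = num_joints
        self.net = nn.Sequential(
            nn.Conv1d(num_joints * 2, hidden, kernel_size=3, padding=1),      # temporal convolution
            nn.BatchNorm1d(hidden), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, dilation=3, padding=3),  # dilated temporal convolution
            nn.BatchNorm1d(hidden), nn.ReLU(),
            nn.Conv1d(hidden, num_joints * 3, kernel_size=1),                 # per-frame 3-D output
        )

    def forward(self, pose2d):
        b, t, j, _ = pose2d.shape
        x = pose2d.reshape(b, t, j * 2).transpose(1, 2)   # (batch, 2*joints, frames)
        y = self.net(x).transpose(1, 2)                   # (batch, frames, 3*joints)
        return y.reshape(b, t, j, 3)
```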
Optionally, the motion gesture evaluation apparatus provided by the invention may further include a display module, configured to display information based on the motion gesture evaluation result.
Fig. 7 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 7, the electronic device 700 may include a processor 710, a communication interface 720, a memory 730 and a communication bus 740, where the processor 710, the communication interface 720 and the memory 730 communicate with each other via the communication bus 740. The processor 710 may invoke logic instructions in the memory 730 to perform a motion gesture evaluation method comprising:
acquiring motion gesture data to be evaluated of a current user, wherein the motion gesture data to be evaluated is gesture data obtained based on a motion video of the current user shot by an optical camera;
based on the reference motion gesture data, performing space-time alignment and evaluation on the motion gesture data to be evaluated, and determining a motion gesture evaluation result of the current user;
the reference motion gesture data are gesture data determined based on motion videos shot by the optical camera aiming at a target user wearing the IMU.
Further, the logic instructions in the memory 730 may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product comprising a computer program, the computer program being storable on a non-transitory computer readable storage medium; when the computer program is executed by a processor, the computer can perform the motion gesture evaluation method provided above, the method comprising:
acquiring motion gesture data to be evaluated of a current user, wherein the motion gesture data to be evaluated is gesture data obtained based on a motion video of the current user shot by an optical camera;
based on the reference motion gesture data, performing space-time alignment and evaluation on the motion gesture data to be evaluated, and determining a motion gesture evaluation result of the current user;
the reference motion gesture data are gesture data determined based on motion videos shot by the optical camera aiming at a target user wearing the IMU.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the motion gesture evaluation method provided above, the method comprising:
acquiring motion gesture data to be evaluated of a current user, wherein the motion gesture data to be evaluated is gesture data obtained based on a motion video of the current user shot by an optical camera;
Based on the reference motion gesture data, performing space-time alignment and evaluation on the motion gesture data to be evaluated, and determining a motion gesture evaluation result of the current user;
the reference motion gesture data are gesture data determined based on motion videos shot by the optical camera aiming at a target user wearing the IMU.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by software plus a necessary general hardware platform, or of course by hardware. Based on this understanding, the foregoing technical solution, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a computer readable storage medium such as a ROM/RAM, a magnetic disk or an optical disk, and comprising several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or in parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A motion gesture evaluation method, comprising:
acquiring motion gesture data to be evaluated of a current user, wherein the motion gesture data to be evaluated is gesture data obtained based on a motion video of the current user shot by an optical camera;
based on reference motion gesture data, performing space-time alignment and evaluation on the motion gesture data to be evaluated, and determining a motion gesture evaluation result of the current user;
the reference motion gesture data are gesture data determined based on motion videos shot by the optical camera aiming at a target user wearing the IMU.
2. The motion gesture evaluation method according to claim 1, wherein the performing space-time alignment and evaluation on the motion gesture data to be evaluated based on reference motion gesture data, determining the motion gesture evaluation result of the current user, comprises:
Performing space-time alignment based on the motion gesture data to be evaluated and the reference motion gesture data;
based on a successful space-time alignment result, decomposing the reference motion gesture data into a plurality of time elements, wherein each time element corresponds to one reference action in the reference motion gesture data;
determining time segments in the motion gesture data to be evaluated that correspond to the plurality of time elements, wherein each time segment corresponds to one action in the motion gesture data to be evaluated;
determining a time scoring result of the motion gesture data to be evaluated based on the variance statistics of the different time segments;
determining a space evaluation result between the motion gesture data to be evaluated and the reference motion gesture data based on the Euclidean distances between the key point position information of each time segment;
and determining the time scoring result and the space evaluation result as the motion gesture evaluation result of the current user.
3. The motion gesture evaluation method according to claim 2, wherein the performing the space-time alignment based on the motion gesture data to be evaluated and the reference motion gesture data includes:
determining, based on a preset action segmentation recognition annotation model, the set of recognized and labeled actions in the motion gesture data to be evaluated;
determining body part length information of the current user based on the motion gesture data to be evaluated;
correcting the reference motion gesture data based on the body part length information, and determining the personalized motion gesture data of the current user;
and performing space-time alignment on the action set and the personalized motion gesture data.
4. The motion gesture evaluation method according to claim 3, wherein the training process of the preset action segmentation recognition annotation model comprises:
constructing an initial action segmentation recognition annotation model comprising a batch normalization layer, a convolution layer, a max pooling layer and a fully connected layer;
acquiring a sample data set, wherein the sample data set comprises gesture data obtained based on motion videos of different first sample users shot by the optical camera and IMU acceleration signals acquired by the first sample users wearing an IMU;
dividing the sample data set into a sample training set and a sample testing set;
performing recognition and labeling training of different actions on the initial action segmentation recognition annotation model by using the sample training set, and determining a trained intermediate action segmentation recognition annotation model;
and testing the trained intermediate action segmentation recognition annotation model by using the sample test set, and determining the corresponding intermediate action segmentation recognition annotation model to be the preset action segmentation recognition annotation model when the test result reaches the preset accuracy.
5. The motion gesture evaluation method according to any one of claims 1 to 4, wherein the acquisition process of the reference motion gesture data includes:
acquiring a motion video of a target user shot by the optical camera and acquiring a target IMU acceleration signal from an IMU worn by the target user, wherein the target user is a user who moves according to standard action requirements and wears the IMU at different joint points;
and acquiring the reference motion gesture data based on the motion video of the target user, the target IMU acceleration signal and a preset motion gesture estimation network model.
6. The motion gesture evaluation method according to claim 5, wherein the acquiring motion gesture data to be evaluated of the current user includes:
acquiring a motion video of a current user shot by the optical camera, wherein the current user is a user who moves freely and does not wear an IMU at any joint point;
And acquiring motion gesture data to be evaluated of the current user based on the motion video of the current user and the preset motion gesture estimation network model.
7. The motion gesture evaluation method according to claim 5 or 6, wherein the training process of the preset motion gesture estimation network model includes:
acquiring a motion video of a second sample user shot by the optical camera and a sample IMU acceleration signal, wherein the second sample user is a user who wears IMUs at different joint points and moves according to the standard action requirement;
determining position information of different joint points of the second sample user based on the sample IMU acceleration signal;
and based on the motion video of the second sample user and the position information of the different joint points, training an initial motion gesture estimation network model comprising a deep network and a temporal convolution network to perform two-dimensional gesture estimation, process the two-dimensional gesture into a three-dimensional gesture, and compensate for missing and misjudged motion information in the three-dimensional gesture, so as to determine the preset motion gesture estimation network model.
8. The motion gesture evaluation method according to any one of claims 1 to 4, characterized in that the method further comprises:
and displaying information based on the motion gesture evaluation result.
9. A motion gesture evaluation apparatus, comprising:
the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring motion gesture data to be evaluated of a current user, and the motion gesture data to be evaluated is gesture data obtained based on a motion video of the current user shot by an optical camera;
the evaluation module is used for carrying out space-time alignment and evaluation on the motion gesture data to be evaluated based on the reference motion gesture data and determining a motion gesture evaluation result of the current user;
the reference motion gesture data are gesture data determined based on motion videos shot by the optical camera aiming at a target user wearing the IMU.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the motion gesture evaluation method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310130499.7A CN116250829A (en) | 2023-02-03 | 2023-02-03 | Motion attitude evaluation method and device and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310130499.7A CN116250829A (en) | 2023-02-03 | 2023-02-03 | Motion attitude evaluation method and device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116250829A true CN116250829A (en) | 2023-06-13 |
Family
ID=86678885
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310130499.7A Pending CN116250829A (en) | 2023-02-03 | 2023-02-03 | Motion attitude evaluation method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116250829A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118366087A (en) * | 2024-06-17 | 2024-07-19 | 佛山市蠢材科技有限公司 | Student exercise posture correction method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||