CN116580454A - Motion evaluation method and device based on target detection and human body posture estimation - Google Patents
Motion evaluation method and device based on target detection and human body posture estimation
- Publication number
- CN116580454A (application number CN202310474749.9A)
- Authority
- CN
- China
- Prior art keywords
- key
- motion
- motion state
- estimation
- human body
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/34—Smoothing or thinning of the pattern; Morphological operations; Skeletonisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The application provides a motion evaluation method and device based on target detection and human body posture estimation. The method comprises: obtaining a motion video of an object to be evaluated and extracting key frames from the motion video; performing human body posture estimation on the key frames with a human body posture estimation neural network to obtain normalized first position information of first key points of the object to be evaluated while it moves; restoring the first position information to first coordinate information on the key frames; selecting some of the first key points to construct a plurality of first key triangles and calculating the first key angles of those triangles from the first coordinate information; applying exponential smoothing to the first key angles to obtain a first smoothing result; inputting the first smoothing result into a pre-trained target detection model to obtain a detection result; and generating a motion evaluation result for the object to be evaluated from the detection result. The scheme of the application overcomes defects of the prior art and improves the accuracy and objectivity of motion state evaluation.
Description
Technical Field
The present application relates to the field of computer vision, and in particular, to a motion estimation method, apparatus, computer device, and storage medium based on target detection and human body posture estimation.
Background
The middle school sports examination is a comprehensive assessment of students' physical quality; its examination items and auxiliary training items include push-ups, sit-ups, the standing long jump, pull-ups, and others. Reasonable training and correct technique are critical for students to achieve good performance.
The traditional method for evaluating the state of middle school physical examinations relies mainly on manual measurement and judgment, such as manual timing, manual counting, and manually judging whether actions are standard. These methods require manual operation by referees, which is time-consuming and labor-intensive; moreover, the evaluation results are subjective and error-prone, making them far from ideal for accurately evaluating students' sports level.
In recent years, with the development of computer vision technology, methods for evaluating middle school students' physical training state based on target detection and classification have gradually become a research hotspot. These methods use computer vision to identify and monitor the motion state of students, enabling real-time assessment of that state. However, current methods still suffer from problems such as low accuracy and unstable classification.
Therefore, a new sports status assessment scheme based on computer vision is needed to solve the above problems.
Disclosure of Invention
To solve the problems in the prior art, the application provides a motion evaluation method, device, computer equipment, and storage medium based on target detection and human body posture estimation, addressing the low accuracy and unstable classification of prior methods that identify and monitor motion state with computer vision technology.
In order to solve one or more of the technical problems, the application adopts the following technical scheme:
in a first aspect, a motion estimation method based on object detection and human body posture estimation is provided, the method comprising:
acquiring a motion video of an object to be evaluated, and extracting a key frame of the motion video;
performing human body posture estimation on the key frame based on a human body posture estimation neural network to obtain normalized first position information of a first key point of the object to be evaluated when the object to be evaluated moves;
restoring the first position information into first coordinate information on the key frame;
selecting part of key points from the first key points to construct a plurality of first key triangles, and calculating a first key angle of the first key triangles according to the first coordinate information;
performing exponential smoothing on the first key angle to obtain a first smoothing result;
inputting the first smoothing result into a target detection model obtained by training in advance, and obtaining a detection result, wherein the detection result comprises a motion state and an error posture;
and generating a motion evaluation result of the object to be evaluated according to the detection result.
In a specific embodiment, each first key triangle is composed of three first key points, and each first key triangle corresponds to a preset portion of the object to be evaluated.
In a specific embodiment, the motion state includes at least: not the target motion, an initial motion state, a process motion state, and a final motion state; and there is at least one error posture.
In a specific embodiment, the method further comprises:
initializing video stream attitude estimation parameters, and setting the number of standard actions and the number of non-standard actions to be 0;
creating a motion state sequence and an error gesture set, wherein the motion state sequence is initially a null array, the corresponding initial value of each element in the error gesture set is 0, and the elements in the error gesture set respectively correspond to one of the error gestures;
the generating the motion estimation result of the object to be estimated according to the detection result comprises the following steps:
and updating the motion state sequence according to the motion state in the detection result and updating the value corresponding to each element in the error gesture set according to the error gesture in the detection result.
In a specific embodiment, the updating the motion state sequence according to the motion state in the detection result and updating the value corresponding to each element in the error gesture set according to the error gesture in the detection result includes:
when the motion state in the detection result is an initial motion state, if the length of the motion state sequence is 3 and the values corresponding to the elements in the error gesture set are all 0, the standard motion number is increased by 1, if the motion state sequence only comprises a process motion state, the non-standard motion number is increased by 1, the values corresponding to the relevant elements in the error gesture set are set to be 1, and if the length of the motion state sequence is 3 and the values corresponding to the elements in the error gesture set are not equal to 0, the non-standard motion number is increased by 1;
when the motion state in the detection result is a process motion state and the motion state sequence is a null array or comprises a process motion state and a final motion state, adding the process motion state in the motion state sequence;
and adding a final motion state in the motion state sequence when the motion state in the detection result is the final motion state and the motion state sequence comprises a process motion state.
In a specific embodiment, the method further includes a training process of the object detection model, including:
acquiring a moving image of a target object, and labeling the moving image to obtain label data;
performing human body posture estimation on the moving image based on a human body posture estimation neural network to obtain normalized second position information of a second key point of the target object when the target object moves;
restoring the second position information to second coordinate information on the moving image;
selecting part of key points from the second key points to construct a plurality of second key triangles, and calculating second key angles of the second key triangles according to the second coordinate information;
performing exponential smoothing on the second key angle to obtain a second smoothing result;
and taking the second smooth result as input, taking the label data as output, and training the neural network model to obtain a target detection model.
In a specific embodiment, the tag data includes at least the motion state and the false gesture.
In a second aspect, corresponding to the above-mentioned motion estimation method based on object detection and human body posture estimation, there is also provided a motion estimation device based on object detection and human body posture estimation, the device comprising:
the video extraction module is used for obtaining a motion video of an object to be evaluated and extracting key frames of the motion video;
the gesture estimation module is used for estimating the human body gesture of the key frame based on a human body gesture estimation neural network to obtain normalized first position information of a first key point of the object to be evaluated when the object to be evaluated moves;
the first calculation module is used for restoring the first position information into first coordinate information on the key frame;
the second calculation module is used for selecting part of key points from the first key points to construct a plurality of first key triangles, and calculating a first key angle of the first key triangles according to the first coordinate information;
the smoothing processing module is used for carrying out exponential smoothing processing on the first key angle to obtain a first smoothing result;
the target detection module is used for inputting the first smoothing result into a target detection model obtained by training in advance to obtain a detection result, wherein the detection result comprises a motion state and an error posture;
and the motion evaluation module is used for generating a motion evaluation result of the object to be evaluated according to the detection result.
In a third aspect, there is also provided a computer device comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor, the computer program, when executed by the processor, implementing the motion estimation method based on object detection and human body pose estimation.
In a fourth aspect, there is also provided a computer readable storage medium having stored therein a computer program which, when executed, implements the motion estimation method based on object detection and human body posture estimation.
According to the specific embodiment provided by the application, the application discloses the following technical effects:
the application provides a motion evaluation method, a motion evaluation device, a motion evaluation computer device and a motion evaluation storage medium based on target detection and human body posture estimation, wherein the method comprises the steps of obtaining a motion video of an object to be evaluated and extracting a key frame of the motion video; performing human body posture estimation on the key frame based on a human body posture estimation neural network to obtain normalized first position information of a first key point of the object to be evaluated when the object to be evaluated moves; restoring the first position information into first coordinate information on the key frame; selecting part of key points from the first key points to construct a plurality of first key triangles, and calculating first key angles of the first key triangles; performing exponential smoothing on the first key angle to obtain a first smoothing result; inputting the first smoothing result into a target detection model obtained by training in advance, and obtaining a detection result, wherein the detection result comprises a motion state and an error posture; and generating a motion evaluation result of the object to be evaluated according to the detection result. The scheme of the application can overcome the defects of the prior art and improve the accuracy and objectivity of the motion state evaluation.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 and fig. 2 are flowcharts of a motion evaluation method based on object detection and human body posture estimation according to an embodiment of the present application;
FIG. 3 is a flowchart of a training process of a target detection model provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a motion estimation device based on object detection and human body posture estimation according to an embodiment of the present application;
fig. 5 is a schematic diagram of a computer device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
As described in the background art, in the current scheme of identifying and monitoring the motion state of a student by using a computer vision technology to implement real-time assessment of the motion state of the student, there are some problems, such as low accuracy, unstable classification effect, and the like.
In view of the above needs and defects, the application provides a new motion state evaluation method based on computer vision. In specific implementation, the motion state of an object to be evaluated (such as a student) is identified and monitored with target detection and human body posture estimation algorithms, realizing real-time evaluation of the object's motion state; this overcomes the defects of the prior art and improves the accuracy and objectivity of state evaluation.
The following describes the embodiments of the present application in detail with reference to the drawings.
Example 1
Fig. 1 and fig. 2 are flowcharts of a motion estimation method based on object detection and human body posture estimation according to an embodiment of the present application, and referring to fig. 1 and fig. 2, the method mainly includes the following steps:
s110: and acquiring a motion video of the object to be evaluated, and extracting key frames of the motion video.
The method and the device are mainly suitable for motion state evaluation, where the object to be evaluated is an object whose motion state needs to be evaluated; for example, when the method is applied to a sports examination scene, the objects to be evaluated are mainly the students taking the examination. In implementation, the motion video of the object to be evaluated is first acquired, the video is split into frames, and the key frames in the motion video are extracted.
S120: and estimating the human body posture of the key frame based on the human body posture estimation neural network to obtain normalized first position information of a first key point of the object to be evaluated when the object to be evaluated moves.
Specifically, the human body posture estimation neural networks usable in the embodiment of the present application include, but are not limited to, the BlazePose convolutional neural network. As a preferred example, the first key points may be the main joints of the object to be evaluated; preferably, 32 joints of the object to be evaluated may be selected as the first key points. Human body posture estimation is performed on the extracted key frames with the BlazePose convolutional neural network to obtain the normalized first position information of all main joints (i.e., the first key points) of the object to be evaluated while it moves, denoted J(x, y, z, v), where x and y are the normalized coordinate position of the joint in the image, z is the depth of the joint with the crotch as the origin, and v is the visibility probability of the joint.
S130: and restoring the first position information into first coordinate information on the key frame.
Specifically, the first position information of each main node is restored to first coordinate information on a specific image (i.e. a key frame), wherein the X-coordinate of the first coordinate information of the node on the image is expressed as:
X=x×image_width
the Y coordinate is expressed as:
Y=y×image_height
where image_width is the original image width and image_height is the original image height.
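As a minimal sketch of step S130 (assuming the normalized key-point format J(x, y, z, v) from S120, with x and y in [0, 1]), the restoration to image coordinates is a per-key-point scaling:

```python
def restore_coordinates(keypoints, image_width, image_height):
    """Map normalized (x, y) in [0, 1] back to pixel coordinates on the key frame.

    `keypoints` is a list of (x, y, z, v) tuples as described in S120; the
    depth z and visibility v are passed through unchanged.
    """
    return [(x * image_width, y * image_height, z, v)
            for (x, y, z, v) in keypoints]
```

For a 640x480 key frame, for example, the normalized point (0.5, 0.25) maps back to the pixel position (320, 120).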
S140: and selecting part of key points from the first key points to construct a plurality of first key triangles, and calculating a first key angle of the first key triangles according to the first coordinate information.
Specifically, in the embodiment of the present application, the first key triangle may be a triangle composed of 3 key points, where each key point corresponds to one vertex of the triangle. Wherein each triangle corresponds to a main part of the human body (e.g., head, arm, leg, etc.). By combining these key points, the posture and the motion of the human body can be accurately represented.
As a preferred example, the first key angle may be calculated with the law of cosines; by calculating the angles of the key triangles, the posture and motion of the human body can be accurately estimated. The specific formula of the first key angle ang_A is:
ang_A = arccos((b² + c² − a²) / (2bc))
where b and c are the two sides adjacent to the angle ang_A, and a is the side opposite ang_A.
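The law-of-cosines calculation can be sketched as follows (`key_angle` is a hypothetical helper name; the clamp guards against floating-point values of the cosine falling slightly outside [-1, 1]):

```python
import math

def key_angle(a, b, c):
    """Angle (in degrees) opposite side `a` of a triangle with sides a, b, c,
    computed via the law of cosines: ang = arccos((b^2 + c^2 - a^2) / (2bc))."""
    cos_ang = (b * b + c * c - a * a) / (2.0 * b * c)
    cos_ang = max(-1.0, min(1.0, cos_ang))  # clamp against rounding error
    return math.degrees(math.acos(cos_ang))
```

For a triangle built from, say, the shoulder, elbow, and wrist key points, `a` would be the shoulder-wrist side and the result the elbow flexion angle.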
S150: and carrying out exponential smoothing on the first key angle to obtain a first smoothing result.
Specifically, the exponential smoothing method is a special weighted moving average; applying exponential smoothing to the first key angle reduces the influence of posture recognition errors. The exponential smoothing algorithm has two parameters that must be initialized: the sliding window size n and the smoothing parameter α. The sliding window is a collection, expressed as:
data_in_window = [ang_cur, ang_(cur-1), ..., ang_(cur-n)]
the recursive formula for the exponential smoothing is:
Y_t = α × ang_t + (1 − α) × Y_(t−1)
where Y is the output smoothing result and α is the smoothing parameter.
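A minimal sketch of the recursion, assuming simple exponential smoothing seeded with the first observation (the sliding window of size n mentioned above would simply restrict the input to the most recent n angles):

```python
def exponential_smooth(angles, alpha):
    """Simple exponential smoothing: Y_t = alpha * x_t + (1 - alpha) * Y_(t-1),
    seeded with the first observed angle."""
    y = angles[0]
    smoothed = [y]
    for x in angles[1:]:
        y = alpha * x + (1.0 - alpha) * y
        smoothed.append(y)
    return smoothed
```

A larger α tracks the raw key-angle series more closely, while a smaller α suppresses frame-to-frame jitter from posture recognition errors more aggressively.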
S160: inputting the first smoothing result into a target detection model obtained through pre-training, and obtaining a detection result, wherein the detection result comprises a motion state and an error posture.
Specifically, the output value Y (i.e., the first smoothing result) after smoothing the selected first key angle is formed into an array, and the array is used as an input value of a target detection model after area scaling, and the output result of the model is formed into an array composed of a predicted motion state and an error posture.
S170: and generating a motion evaluation result of the object to be evaluated according to the detection result.
In a preferred embodiment of the present application, each of the first critical triangles is composed of three first critical points, and each of the first critical triangles corresponds to a preset portion of the object to be evaluated.
Specifically, the preset portion includes, but is not limited to, a main part of the human body, such as the head, arms, or legs. As a preferred example, in the embodiment of the present application, a side length of a first key triangle may be represented by the Euclidean distance between the two adjacent first key points that form it. For example, if two adjacent first key points of a first key triangle are A(x_a, y_a, z_a, v_a) and B(x_b, y_b, z_b, v_b), the side length corresponding to key points A and B is the Euclidean distance between them:
d(A, B) = √((x_a − x_b)² + (y_a − y_b)²)
where x_a, y_a are the coordinate position of A in the original image (i.e., the first coordinate information of A) and x_b, y_b are the coordinate position of B in the original image (i.e., the first coordinate information of B).
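The side-length computation can be sketched as below, using only the planar image coordinates of the (x, y, z, v) key-point tuples, as in the formula above:

```python
import math

def side_length(p1, p2):
    """Euclidean distance between two key points given as (x, y, z, v) tuples;
    only the x, y image coordinates enter the distance, per the text."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])
```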
As a preferred implementation manner, in the embodiment of the present application, the motion states include at least: not the target motion, the initial motion state s1, the process motion state s2, and the final motion state s3. The error postures include at least one of error posture 1, error posture 2, error posture 3, and so on; the specific error postures may be set according to the actual motion and are not limited here.
As a preferred implementation manner, in an embodiment of the present application, the method further includes:
initializing video stream attitude estimation parameters, and setting the number of standard actions and the number of non-standard actions to be 0;
creating a motion state sequence and an error gesture set, wherein the motion state sequence is initially a null array, the corresponding initial value of each element in the error gesture set is 0, and the elements in the error gesture set respectively correspond to one of the error gestures;
the generating the motion estimation result of the object to be estimated according to the detection result comprises the following steps:
and updating the motion state sequence according to the motion state in the detection result and updating the value corresponding to each element in the error gesture set according to the error gesture in the detection result.
Specifically, before evaluation, the video stream posture estimation parameters are first initialized, and the number of standard actions and the number of non-standard actions are both set to 0. Next, a motion state sequence, denoted state_seq, and an error posture set, denoted INCORRECT_POSTURE = {"0": 0, "1": 0, "2": 0, "3": 0}, are created, where "1", "2", and "3" correspond to error posture 1, error posture 2, and error posture 3 respectively (the corresponding value stays 0 if the error does not occur) and "0" indicates that the motion did not reach the standard. Finally, after the trained target detection model produces a detection result, the motion state sequence is updated according to the motion state in the detection result, and the values of the elements of the error posture set are updated according to the error posture in the detection result.
In a preferred embodiment of the present application, the updating the motion state sequence according to the motion state in the detection result and updating the values corresponding to each element in the set of error poses according to the error pose in the detection result include:
when the motion state in the detection result is an initial motion state: if the length of the motion state sequence is 3 and the values corresponding to the elements in the error posture set are all 0, the standard action count is increased by 1; if the motion state sequence only comprises a process motion state, the non-standard action count is increased by 1 and the values corresponding to the relevant elements in the error posture set are set to 1; and if the length of the motion state sequence is 3 and the values corresponding to the elements in the error posture set are not all 0, the non-standard action count is increased by 1;
when the motion state in the detection result is a process motion state and the motion state sequence is a null array or comprises a process motion state and a final motion state, adding the process motion state in the motion state sequence;
and adding a final motion state in the motion state sequence when the motion state in the detection result is the final motion state and the motion state sequence comprises a process motion state.
Specifically, if the motion state in the detection result is the initial motion state s1: when the length of state_seq is 3 and the values in the error posture set INCORRECT_POSTURE are all 0, the number of standard actions is increased by 1; when state_seq = [s2], the number of non-standard actions is increased by 1, the value of the relevant element in the error posture set is set to 1, and an error is displayed in the video indicating that the action did not reach the standard (for example, not rising fully in a sit-up, or not lowering fully in a push-up); and when the length of state_seq is 3 and the values in INCORRECT_POSTURE are not all 0, the number of non-standard actions is increased by 1.
If the motion state in the detection result is the process motion state s2, and the state_seq= [ ] or the state_seq= [ s2, s3], s2 is added in the state_seq.
If the motion state in the detection result is the final motion state s3, and state_seq= [ s2], s3 is added in state_seq.
As a preferred implementation manner, in the embodiment of the present application, after completing one counting and judging pass, the motion state sequence and the error POSTURE set are initialized: after initialization, state_seq is an empty array, and INCORRECT_POSTURE = { "0": 0, "1": 0, "2": 0, "3": 0 }. The next key frame of the video is then processed; if the video has ended, parameters such as the number of standard actions, the number of non-standard actions, and the motion history are output, and prompts for improving and guiding the athletic performance can be given according to these parameters.
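The per-key-frame counting and initialization logic above can be sketched as follows. This is a minimal illustration, not the patented implementation: the state codes "s1"/"s2"/"s3", the dictionary layouts, and which error-posture element is flagged are all assumptions inferred from the text.

```python
# Hypothetical sketch of the counting logic: a state sequence of
# [s2, s3, s2] (length 3) closed by a new initial state s1 counts as
# one repetition, judged standard or non-standard by the error flags.

def update_counters(state, state_seq, incorrect_posture, counts):
    """Apply one detection result to the running counters.

    state_seq: motion state sequence (list of state codes);
    incorrect_posture: maps error-posture ids to 0/1 flags;
    counts: dict with "standard" / "nonstandard" totals.
    """
    if state == "s1":  # an initial motion state closes a repetition
        counted = False
        if len(state_seq) == 3 and not any(incorrect_posture.values()):
            counts["standard"] += 1
            counted = True
        elif state_seq == ["s2"]:  # the final state was never reached
            counts["nonstandard"] += 1
            incorrect_posture["0"] = 1  # flag the relevant error posture
            # (a real implementation would display the error here,
            # before the reset below)
            counted = True
        elif len(state_seq) == 3:  # full cycle but posture errors occurred
            counts["nonstandard"] += 1
            counted = True
        if counted:  # re-initialize for the next repetition
            state_seq.clear()
            for key in incorrect_posture:
                incorrect_posture[key] = 0
    elif state == "s2" and state_seq in ([], ["s2", "s3"]):
        state_seq.append("s2")
    elif state == "s3" and state_seq == ["s2"]:
        state_seq.append("s3")
```

For example, feeding the states s2, s3, s2, s1 in order with no error flags set yields one standard repetition and resets the sequence.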
As a preferred implementation manner, in an embodiment of the present application, the method further includes a training process of the target detection model, including:
s210: and acquiring a moving image of the target object, and labeling the moving image to obtain tag data.
Specifically, for each moving image, its motion state and error postures are annotated. The motion state may be represented by a one-hot code; as a preferred example, each motion has four motion states: not this motion, the initial motion state s1, the process motion state s2, and the final motion state s3. For example, a moving image in state s2 is denoted by [0,0,1,0]; if error posture 1 is present while error postures 2 and 3 are absent, the label is [0,0,1,0,1,0,0]. For example, if a sit-up is in the final motion state s3 and its only error posture (bending the knees) does not occur, the label is [0,0,0,1,0]. For example, if a push-up is in the process motion state s2 and errors occur in both the knee and waist postures, the label is [0,0,1,0,1,1].
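The label layout described above can be illustrated with a short sketch. The state ordering and flag layout below are assumptions inferred from the examples, not a normative specification:

```python
# Illustrative label construction matching the push-up example above:
# four one-hot entries for the motion state ("not this motion", s1, s2, s3),
# followed by one 0/1 flag per error posture of that motion.

STATES = ("other", "s1", "s2", "s3")

def make_label(state, error_flags):
    onehot = [1 if s == state else 0 for s in STATES]
    return onehot + list(error_flags)

# push-up in process state s2, knee and waist errors both present:
print(make_label("s2", [1, 1]))  # [0, 0, 1, 0, 1, 1]
```

A sit-up in the final state s3 with its single error posture absent would similarly produce [0, 0, 0, 1, 0].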
S220: and estimating the human body posture of the moving image based on the human body posture estimation neural network to obtain normalized second position information of a second key point of the target object when the target object moves.
S230: and restoring the second position information into second coordinate information on the moving image.
S240: and selecting part of key points from the second key points to construct a plurality of second key triangles, and calculating second key angles of the second key triangles according to the second coordinate information.
S250: and carrying out exponential smoothing on the second key angle to obtain a second smoothing result.
Specifically, steps S220-S250 may refer to the relevant description of steps S120-S150, which is not repeated here.
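Steps S240-S250 (shared with steps S140-S150) can be sketched as below. The law-of-cosines angle formula and the fixed smoothing factor alpha are assumptions about one reasonable realization; the patent does not fix these details:

```python
import math

def key_angle(a, b, c):
    """Interior angle (degrees) at vertex b of the key triangle a-b-c,
    computed from restored pixel coordinates via the law of cosines."""
    ab, cb, ac = math.dist(a, b), math.dist(c, b), math.dist(a, c)
    cos_b = (ab ** 2 + cb ** 2 - ac ** 2) / (2 * ab * cb)
    # clamp against floating-point drift before acos
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_b))))

def exp_smooth(angles, alpha=0.3):
    """Exponential smoothing over a key-angle series:
    s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    out = [angles[0]]
    for x in angles[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out
```

For instance, the angle at the elbow keypoint between the shoulder and wrist keypoints would be computed per key frame and then smoothed across frames before being fed to the detection model.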
S260: and taking the second smooth result as input, taking the label data as output, and training the neural network model to obtain a target detection model.
Specifically, the array formed by the smoothed output values of the selected second key angles is subjected to range scaling (normalization) and used as the input of the neural network model, and the array formed by the annotated motion state and error postures is used as the output for model training. Softmax is selected as the classifier, the loss function of the neural network is taken as the sum of the predicted-state classification loss and the error posture estimation loss, the hyperparameters are adjusted, and the model with the highest accuracy is taken as the target detection model.
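The combined loss (softmax cross-entropy over the motion state plus binary cross-entropy over the error posture flags) can be sketched as below. The equal weighting of the two terms and the separation of state logits from posture probabilities are assumptions; the network architecture itself is omitted:

```python
import math

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def state_classification_loss(state_logits, state_onehot):
    """Cross-entropy between the softmax of the logits and the one-hot state."""
    probs = softmax(state_logits)
    return -sum(t * math.log(p) for t, p in zip(state_onehot, probs) if t)

def posture_estimation_loss(posture_probs, posture_flags):
    """Binary cross-entropy over the per-posture error flags."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(posture_flags, posture_probs))

def total_loss(state_logits, posture_probs, label, n_states=4):
    """Sum of state classification loss and error posture estimation loss,
    with the label laid out as [state one-hot ..., posture flags ...]."""
    return (state_classification_loss(state_logits, label[:n_states])
            + posture_estimation_loss(posture_probs, label[n_states:]))
```

In a real training loop this scalar would be minimized by gradient descent in a deep-learning framework; the pure-Python version here only illustrates how the two loss terms combine.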
Example two
Corresponding to the first embodiment, the present application further provides a motion evaluation device based on object detection and human body posture estimation. Content of this embodiment that is the same as or similar to that of the first embodiment may be referred to the above description and will not be repeated here. Referring to fig. 4, the apparatus includes:
the video extraction module is used for obtaining a motion video of an object to be evaluated and extracting key frames of the motion video;
the gesture estimation module is used for estimating the human body gesture of the key frame based on a human body gesture estimation neural network to obtain normalized first position information of a first key point of the object to be evaluated when the object to be evaluated moves;
the first calculation module is used for restoring the first position information into first coordinate information on the key frame;
the second calculation module is used for selecting part of key points from the first key points to construct a plurality of first key triangles, and calculating a first key angle of the first key triangles according to the first coordinate information;
the smoothing processing module is used for carrying out exponential smoothing processing on the first key angle to obtain a first smoothing result;
the target detection module is used for inputting the first smoothing result into a target detection model obtained by training in advance to obtain a detection result, wherein the detection result comprises a motion state and an error posture;
and the motion evaluation module is used for generating a motion evaluation result of the object to be evaluated according to the detection result.
Example III
Corresponding to the first and second embodiments, the present application further provides a computer device, including: a processor and a memory, the memory storing a computer program executable on the processor, which when executed by the processor, performs the motion estimation method based on object detection and human body posture estimation provided in any one of the above embodiments.
As shown in FIG. 5, the computer device may include a processor 1510, a video display adapter 1511, a disk drive 1512, an input/output interface 1513, a network interface 1514, and a memory 1520. The processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, and the memory 1520 may be communicatively connected by a communication bus 1530.
The processor 1510 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided by the present application.
The memory 1520 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1520 may store an operating system 1521 for controlling the operation of the electronic device, and a Basic Input Output System (BIOS) for controlling low-level operations of the electronic device. In addition, a web browser 1523, a data storage management system 1524, a device identification information processing system 1525, and the like may also be stored. The device identification information processing system 1525 may be an application program that implements the operations of the foregoing steps in the embodiments of the present application. In general, when the present application is implemented in software or firmware, the relevant program code is stored in the memory 1520 and executed by the processor 1510.
The input/output interface 1513 is used for connecting with an input/output module to realize information input and output. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
The network interface 1514 is used to connect communication modules (not shown) to enable communication interactions of the present device with other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
The bus includes a path to transfer information between various components of the device (e.g., the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, and the memory 1520).
It should be noted that although only the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, the memory 1520, and the bus are shown above, in a specific implementation the device may include other components necessary for proper operation. Furthermore, it will be appreciated by those skilled in the art that the device may include only the components necessary to implement the solution of the present application, and not all of the components shown in the drawings.
Example IV
The present application also provides a computer readable storage medium corresponding to the first to third embodiments, wherein in the present embodiment, the same or similar content as that of the first to third embodiments can be referred to the above description, and the description is omitted.
The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements a motion estimation method based on object detection and human body pose estimation as described above.
In some embodiments, when the computer program is executed by the processor, the steps corresponding to the method described in the first embodiment may be further implemented, and reference may be made to the detailed description in the first embodiment, which is not repeated herein.
From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present application.
In this specification, each embodiment is described in a progressive manner; identical and similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, for the system or apparatus embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to the description of the method embodiments. The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the application without creative effort.
The preferred embodiments of the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the above embodiments are provided only to facilitate understanding of the method of the present application and its core ideas. Meanwhile, a person of ordinary skill in the art may make modifications to the specific implementations and application scope in light of the ideas of the present application. In view of the foregoing, the content of this specification should not be construed as limiting the application.
Claims (10)
1. A motion estimation method based on object detection and human body posture estimation, the method comprising:
acquiring a motion video of an object to be evaluated, and extracting a key frame of the motion video;
performing human body posture estimation on the key frame based on a human body posture estimation neural network to obtain normalized first position information of a first key point of the object to be evaluated when the object to be evaluated moves;
restoring the first position information into first coordinate information on the key frame;
selecting part of key points from the first key points to construct a plurality of first key triangles, and calculating a first key angle of the first key triangles according to the first coordinate information;
performing exponential smoothing on the first key angle to obtain a first smoothing result;
inputting the first smoothing result into a target detection model obtained by training in advance, and obtaining a detection result, wherein the detection result comprises a motion state and an error posture;
and generating a motion evaluation result of the object to be evaluated according to the detection result.
2. The method according to claim 1, wherein each of the first key triangles consists of three first key points, and each of the first key triangles corresponds to a preset portion of the object to be evaluated.
3. The method of claim 1, wherein the motion state includes at least one of a state other than the motion, a start motion state, a process motion state, and a final motion state.
4. A method of motion estimation based on object detection and human body pose estimation according to claim 3, further comprising:
initializing video stream attitude estimation parameters, and setting the number of standard actions and the number of non-standard actions to be 0;
creating a motion state sequence and an error gesture set, wherein the motion state sequence is initially a null array, the corresponding initial value of each element in the error gesture set is 0, and the elements in the error gesture set respectively correspond to one of the error gestures;
the generating the motion estimation result of the object to be estimated according to the detection result comprises the following steps:
and updating the motion state sequence according to the motion state in the detection result and updating the value corresponding to each element in the error gesture set according to the error gesture in the detection result.
5. The method according to claim 4, wherein updating the motion state sequence according to the motion state in the detection result and updating the values corresponding to the elements in the set of erroneous postures according to the erroneous posture in the detection result comprises:
when the motion state in the detection result is an initial motion state, if the length of the motion state sequence is 3 and the values corresponding to the elements in the error gesture set are all 0, the standard motion number is increased by 1, if the motion state sequence only comprises a process motion state, the non-standard motion number is increased by 1, the values corresponding to the relevant elements in the error gesture set are set to be 1, and if the length of the motion state sequence is 3 and the values corresponding to the elements in the error gesture set are not all 0, the non-standard motion number is increased by 1;
when the motion state in the detection result is a process motion state and the motion state sequence is a null array or comprises a process motion state and a final motion state, adding the process motion state in the motion state sequence;
and adding a final motion state in the motion state sequence when the motion state in the detection result is the final motion state and the motion state sequence comprises a process motion state.
6. The method of motion estimation based on object detection and human body pose estimation according to any of claims 1 to 5, further comprising a training process of said object detection model comprising:
acquiring a moving image of a target object, and labeling the moving image to obtain label data;
performing human body posture estimation on the moving image based on a human body posture estimation neural network to obtain normalized second position information of a second key point of the target object when the target object moves;
restoring the second position information to second coordinate information on the moving image;
selecting part of key points from the second key points to construct a plurality of second key triangles, and calculating second key angles of the second key triangles according to the second coordinate information;
performing exponential smoothing on the second key angle to obtain a second smoothing result;
and taking the second smooth result as input, taking the label data as output, and training the neural network model to obtain a target detection model.
7. The method of motion estimation based on object detection and human pose estimation according to claim 6, wherein said tag data includes at least said motion state and said false pose.
8. A motion estimation device based on object detection and human body pose estimation, the device comprising:
the video extraction module is used for obtaining a motion video of an object to be evaluated and extracting key frames of the motion video;
the gesture estimation module is used for estimating the human body gesture of the key frame based on a human body gesture estimation neural network to obtain normalized first position information of a first key point of the object to be evaluated when the object to be evaluated moves;
the first calculation module is used for restoring the first position information into first coordinate information on the key frame;
the second calculation module is used for selecting part of key points from the first key points to construct a plurality of first key triangles, and calculating a first key angle of the first key triangles according to the first coordinate information;
the smoothing processing module is used for carrying out exponential smoothing processing on the first key angle to obtain a first smoothing result;
the target detection module is used for inputting the first smoothing result into a target detection model obtained by training in advance to obtain a detection result, wherein the detection result comprises a motion state and an error posture;
and the motion evaluation module is used for generating a motion evaluation result of the object to be evaluated according to the detection result.
9. A computer device comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor, which when executed by the processor, implements the object detection and human posture estimation based motion estimation method of any of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored, characterized in that the computer program, when executed, implements the motion estimation method based on object detection and human body posture estimation according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310474749.9A CN116580454A (en) | 2023-04-27 | 2023-04-27 | Motion evaluation method and device based on target detection and human body posture estimation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116580454A true CN116580454A (en) | 2023-08-11 |
Family
ID=87536883
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310474749.9A Pending CN116580454A (en) | 2023-04-27 | 2023-04-27 | Motion evaluation method and device based on target detection and human body posture estimation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116580454A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117216313A (en) * | 2023-09-13 | 2023-12-12 | 中关村科学城城市大脑股份有限公司 | Attitude evaluation audio output method, attitude evaluation audio output device, electronic equipment and readable medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||