CN116580454A - Motion evaluation method and device based on target detection and human body posture estimation - Google Patents

Motion evaluation method and device based on target detection and human body posture estimation

Info

Publication number
CN116580454A
Authority
CN
China
Prior art keywords
key
motion
motion state
estimation
human body
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310474749.9A
Other languages
Chinese (zh)
Inventor
余天驰
徐文峰
胡文博
聂磊
潘连军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xinji Intelligent Technology Co ltd
Original Assignee
Shanghai Xinji Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xinji Intelligent Technology Co ltd filed Critical Shanghai Xinji Intelligent Technology Co ltd
Priority to CN202310474749.9A
Publication of CN116580454A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/34 - Smoothing or thinning of the pattern; Morphological operations; Skeletonisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 - Computing systems specially adapted for manufacturing

Abstract

The application provides a motion evaluation method and device based on target detection and human body posture estimation. The method comprises: obtaining a motion video of an object to be evaluated and extracting key frames of the motion video; performing human body posture estimation on the key frames based on a human body posture estimation neural network to obtain normalized first position information of first key points of the object to be evaluated during motion; restoring the first position information to first coordinate information on the key frames; selecting some of the first key points to construct a plurality of first key triangles and calculating first key angles of the first key triangles according to the first coordinate information; performing exponential smoothing on the first key angles to obtain a first smoothing result; inputting the first smoothing result into a pre-trained target detection model to obtain a detection result; and generating a motion evaluation result of the object to be evaluated according to the detection result. The scheme of the application overcomes the defects of the prior art and improves the accuracy and objectivity of motion state evaluation.

Description

Motion evaluation method and device based on target detection and human body posture estimation
Technical Field
The present application relates to the field of computer vision, and in particular, to a motion evaluation method, apparatus, computer device, and storage medium based on target detection and human body posture estimation.
Background
The middle school physical education examination is a comprehensive assessment of students' physical fitness; its test items and auxiliary training items include push-ups, sit-ups, the standing long jump, pull-ups, and the like. Reasonable training and correct technique are critical for students to achieve good results.
The traditional evaluation of middle school students' physical examinations relies mainly on manual measurement and judgment, such as manual timing, manual counting, and manually judging whether an action is standard. This requires manual operation by referees, which is time-consuming and labor-intensive; the evaluation results are also subjective and error-prone, making the approach unsuitable for accurately assessing students' athletic level.
In recent years, with the development of computer vision technology, evaluation methods for middle school students' physical training based on target detection and classification have gradually become a research hotspot. These methods use computer vision to identify and monitor students' motion states, enabling real-time assessment of the motion state. However, current methods still have problems such as low accuracy and unstable classification.
Therefore, a new sports status assessment scheme based on computer vision is needed to solve the above problems.
Disclosure of Invention
In order to solve the problems in the prior art, the present application provides a motion evaluation method, apparatus, computer device, and storage medium based on target detection and human body posture estimation, addressing the low accuracy, unstable classification, and related problems of existing methods that identify and monitor motion states using computer vision technology.
In order to solve one or more of the technical problems, the application adopts the following technical scheme:
in a first aspect, a motion evaluation method based on target detection and human body posture estimation is provided, the method comprising:
acquiring a motion video of an object to be evaluated, and extracting a key frame of the motion video;
performing human body posture estimation on the key frame based on a human body posture estimation neural network to obtain normalized first position information of a first key point of the object to be evaluated when the object to be evaluated moves;
restoring the first position information into first coordinate information on the key frame;
selecting part of key points from the first key points to construct a plurality of first key triangles, and calculating a first key angle of the first key triangles according to the first coordinate information;
performing exponential smoothing on the first key angle to obtain a first smoothing result;
inputting the first smoothing result into a target detection model obtained by training in advance, and obtaining a detection result, wherein the detection result comprises a motion state and an error posture;
and generating a motion evaluation result of the object to be evaluated according to the detection result.
In a specific embodiment, each first key triangle is composed of three first key points, and each first key triangle corresponds to a preset portion of the object to be evaluated.
In a specific embodiment, the motion state includes at least: not the target motion, an initial motion state, a process motion state, and a final motion state; and at least one error posture is included.
In a specific embodiment, the method further comprises:
initializing pose estimation parameters of the video stream, and setting the number of standard actions and the number of non-standard actions to 0;
creating a motion state sequence and an error gesture set, wherein the motion state sequence is initially a null array, the corresponding initial value of each element in the error gesture set is 0, and the elements in the error gesture set respectively correspond to one of the error gestures;
the generating the motion estimation result of the object to be estimated according to the detection result comprises the following steps:
and updating the motion state sequence according to the motion state in the detection result and updating the value corresponding to each element in the error gesture set according to the error gesture in the detection result.
In a specific embodiment, the updating the motion state sequence according to the motion state in the detection result and updating the value corresponding to each element in the error gesture set according to the error gesture in the detection result includes:
when the motion state in the detection result is an initial motion state, if the length of the motion state sequence is 3 and the values corresponding to the elements in the error gesture set are all 0, the standard motion number is increased by 1, if the motion state sequence only comprises a process motion state, the non-standard motion number is increased by 1, the values corresponding to the relevant elements in the error gesture set are set to be 1, and if the length of the motion state sequence is 3 and the values corresponding to the elements in the error gesture set are not equal to 0, the non-standard motion number is increased by 1;
when the motion state in the detection result is a process motion state and the motion state sequence is a null array or comprises a process motion state and a final motion state, adding the process motion state in the motion state sequence;
and adding a final motion state in the motion state sequence when the motion state in the detection result is the final motion state and the motion state sequence comprises a process motion state.
In a specific embodiment, the method further includes a training process of the target detection model, including:
acquiring a moving image of a target object, and labeling the moving image to obtain label data;
performing human body posture estimation on the moving image based on a human body posture estimation neural network to obtain normalized second position information of a second key point of the target object when the target object moves;
restoring the second position information to second coordinate information on the moving image;
selecting part of key points from the second key points to construct a plurality of second key triangles, and calculating second key angles of the second key triangles according to the second coordinate information;
performing exponential smoothing on the second key angle to obtain a second smoothing result;
and taking the second smooth result as input, taking the label data as output, and training the neural network model to obtain a target detection model.
In a specific embodiment, the label data includes at least the motion state and the error posture.
In a second aspect, corresponding to the above motion evaluation method based on target detection and human body posture estimation, there is also provided a motion evaluation device based on target detection and human body posture estimation, the device comprising:
the video extraction module is used for obtaining a motion video of an object to be evaluated and extracting key frames of the motion video;
the gesture estimation module is used for estimating the human body gesture of the key frame based on a human body gesture estimation neural network to obtain normalized first position information of a first key point of the object to be evaluated when the object to be evaluated moves;
the first calculation module is used for restoring the first position information into first coordinate information on the key frame;
the second calculation module is used for selecting part of key points from the first key points to construct a plurality of first key triangles, and calculating a first key angle of the first key triangles according to the first coordinate information;
the smoothing processing module is used for carrying out exponential smoothing processing on the first key angle to obtain a first smoothing result;
the target detection module is used for inputting the first smoothing result into a target detection model obtained by training in advance to obtain a detection result, wherein the detection result comprises a motion state and an error posture;
and the motion evaluation module is used for generating a motion evaluation result of the object to be evaluated according to the detection result.
In a third aspect, there is also provided a computer device comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor, the computer program, when executed by the processor, implementing the motion evaluation method based on target detection and human body posture estimation.
In a fourth aspect, there is also provided a computer-readable storage medium having stored therein a computer program which, when executed, implements the motion evaluation method based on target detection and human body posture estimation.
According to the specific embodiment provided by the application, the application discloses the following technical effects:
the application provides a motion evaluation method, a motion evaluation device, a motion evaluation computer device and a motion evaluation storage medium based on target detection and human body posture estimation, wherein the method comprises the steps of obtaining a motion video of an object to be evaluated and extracting a key frame of the motion video; performing human body posture estimation on the key frame based on a human body posture estimation neural network to obtain normalized first position information of a first key point of the object to be evaluated when the object to be evaluated moves; restoring the first position information into first coordinate information on the key frame; selecting part of key points from the first key points to construct a plurality of first key triangles, and calculating first key angles of the first key triangles; performing exponential smoothing on the first key angle to obtain a first smoothing result; inputting the first smoothing result into a target detection model obtained by training in advance, and obtaining a detection result, wherein the detection result comprises a motion state and an error posture; and generating a motion evaluation result of the object to be evaluated according to the detection result. The scheme of the application can overcome the defects of the prior art and improve the accuracy and objectivity of the motion state evaluation.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 and fig. 2 are flowcharts of a motion evaluation method based on target detection and human body posture estimation according to an embodiment of the present application;
FIG. 3 is a flowchart of a training process of a target detection model provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a motion estimation device based on object detection and human body posture estimation according to an embodiment of the present application;
fig. 5 is a schematic diagram of a computer device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
As described in the background art, in the current scheme of identifying and monitoring the motion state of a student by using a computer vision technology to implement real-time assessment of the motion state of the student, there are some problems, such as low accuracy, unstable classification effect, and the like.
In view of the above needs and defects, the present application provides a new motion state evaluation method based on computer vision. In a specific implementation, the motion state of an object to be evaluated (such as a student) is identified and monitored based on target detection and a human body posture estimation algorithm, realizing real-time evaluation of the motion state of the object to be evaluated; this overcomes the defects of the prior art and improves the accuracy and objectivity of state evaluation.
The following describes the embodiments of the present application in detail with reference to the drawings.
Example 1
Fig. 1 and fig. 2 are flowcharts of a motion evaluation method based on target detection and human body posture estimation according to an embodiment of the present application; referring to fig. 1 and fig. 2, the method mainly includes the following steps:
s110: and acquiring a motion video of the object to be evaluated, and extracting key frames of the motion video.
The method and the device are mainly suitable for motion state evaluation, wherein an object to be evaluated refers to an object needing to evaluate the motion state, and for example, when the method and the device are applied to a sports examination scene, the object to be evaluated mainly comprises students taking examination and the like. In the implementation, firstly, a motion video of an object to be evaluated is acquired, the motion video is subjected to frame processing, and key frames in the motion video are extracted.
S120: and estimating the human body posture of the key frame based on the human body posture estimation neural network to obtain normalized first position information of a first key point of the object to be evaluated when the object to be evaluated moves.
Specifically, the human body posture estimation neural network adopted in the embodiment of the application comprises, but is not limited to, a BlazePose convolution neural network. As a preferred example, the first key point in the embodiment of the present application may be a main node of the object to be evaluated, and preferably, 32 nodes of the object to be evaluated may be selected as the first key point. Human body posture estimation is carried out on the proposed key frames based on BlazePose convolutional neural network, normalized first position information of all main articulation points (namely first key points) of an object to be evaluated when the object to be evaluated moves is obtained, the normalized first position information is marked as J (x, y, z and v), wherein x and y are represented as normalized coordinate positions of the image articulation points, z represents the depth of the articulation points taking the crotch as an origin, and v represents the possibility of visibility of the articulation points.
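As an illustration, the following is a minimal sketch of this step using the MediaPipe implementation of BlazePose; the file name and the printing of landmarks are assumptions for illustration and are not part of the original disclosure.

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

# "key_frame.jpg" is a hypothetical file name for one extracted key frame.
frame = cv2.imread("key_frame.jpg")
with mp_pose.Pose(static_image_mode=True) as pose:
    results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

if results.pose_landmarks:
    # Each landmark carries J(x, y, z, v): normalized x and y, depth z
    # relative to the mid-hip, and visibility likelihood v.
    for lm in results.pose_landmarks.landmark:
        print(lm.x, lm.y, lm.z, lm.visibility)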
S130: and restoring the first position information into first coordinate information on the key frame.
Specifically, the first position information of each main node is restored to first coordinate information on a specific image (i.e. a key frame), wherein the X-coordinate of the first coordinate information of the node on the image is expressed as:
X=x×image_width
the Y coordinate is expressed as:
Y=y×image_height
where image_width is the original image width and image_height is the original image height.
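A one-function sketch of this restoration step, assuming the normalized coordinates produced above:

def to_pixel(x_norm: float, y_norm: float, image_width: int, image_height: int):
    """Restore normalized landmark coordinates to pixel coordinates on the key frame:
    X = x * image_width, Y = y * image_height."""
    return x_norm * image_width, y_norm * image_height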
S140: and selecting part of key points from the first key points to construct a plurality of first key triangles, and calculating a first key angle of the first key triangles according to the first coordinate information.
Specifically, in the embodiment of the present application, a first key triangle may be a triangle composed of 3 key points, where each key point corresponds to one vertex of the triangle and each triangle corresponds to a main part of the human body (e.g., head, arm, leg). By combining these key points, the posture and motion of the human body can be accurately represented.
As a preferred example, the first key angle may be calculated using the law of cosines; by calculating the angles of the key triangles, the posture and motion of the human body can be accurately estimated. The first key angle ang_A is computed as:

ang_A = arccos((b^2 + c^2 - a^2) / (2bc))

where b and c are the two sides adjacent to the angle ang_A, and a is the side opposite ang_A.
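The following sketch implements this law-of-cosines computation; the function name is an assumption for illustration.

import math

def key_angle(a: float, b: float, c: float) -> float:
    """Angle (in degrees) opposite side a of a triangle with sides a, b, c,
    via the law of cosines: ang_A = arccos((b^2 + c^2 - a^2) / (2bc))."""
    cos_a = (b * b + c * c - a * a) / (2.0 * b * c)
    cos_a = max(-1.0, min(1.0, cos_a))  # guard against floating-point drift
    return math.degrees(math.acos(cos_a))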
S150: and carrying out exponential smoothing on the first key angle to obtain a first smoothing result.
Specifically, the exponential smoothing method is a special weighted moving average; performing exponential smoothing on the first key angle reduces the influence of posture recognition errors. The exponential smoothing algorithm has two parameters that must be initialized: the sliding window size n and the smoothing parameter α. The sliding window is the set:

data_in_window = [ang_cur, ang_cur-1, ..., ang_cur-n]

The exponential average is computed recursively as:

Y_t = α · ang_t + (1 - α) · Y_(t-1)

where Y is the output smoothing result and α is the smoothing parameter.
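A minimal sketch of this smoothing step, assuming the window is ordered newest-first as in the set above and that the standard single-exponential-smoothing recurrence is intended:

def exponential_smooth(window, alpha=0.5):
    """Single exponential smoothing Y_t = alpha * ang_t + (1 - alpha) * Y_(t-1).

    window: [ang_cur, ang_cur-1, ..., ang_cur-n], newest first.
    Returns the smoothed value Y for the current frame.
    """
    y = window[-1]                     # seed with the oldest angle in the window
    for ang in reversed(window[:-1]):  # advance from older frames to the newest
        y = alpha * ang + (1 - alpha) * y
    return y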
S160: inputting the first smoothing result into a target detection model obtained through pre-training, and obtaining a detection result, wherein the detection result comprises a motion state and an error posture.
Specifically, the smoothed output values Y (i.e., the first smoothing result) of the selected first key angles are formed into an array, which, after range scaling, is used as the input of the target detection model; the output of the model is an array composed of the predicted motion state and error postures.
S170: and generating a motion evaluation result of the object to be evaluated according to the detection result.
In a preferred embodiment of the present application, each of the first critical triangles is composed of three first critical points, and each of the first critical triangles corresponds to a preset portion of the object to be evaluated.
Specifically, the preset portion includes, but is not limited to, a main part of the human body, such as the head, arms, or legs. As a preferred example, in the embodiment of the present application, the side length of a first key triangle may be represented by the Euclidean distance between the two adjacent first key points that form it. For example, if two adjacent first key points of a first key triangle are A(x_a, y_a, z_a, v_a) and B(x_b, y_b, z_b, v_b), the side length corresponding to key points A and B is the Euclidean distance between them:

d(A, B) = sqrt((x_a - x_b)^2 + (y_a - y_b)^2)

where x_a, y_a is the coordinate position of A in the original image (i.e., the first coordinate information of A), and x_b, y_b is the coordinate position of B in the original image (i.e., the first coordinate information of B).
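Continuing the sketch, the side lengths can be computed from the restored keypoint coordinates and fed to the key_angle function from the earlier sketch; the vertex coordinates here are hypothetical.

import math

def side_length(p1, p2):
    """Euclidean distance between two keypoints given as (x, y) pixel coordinates."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])

# Hypothetical pixel coordinates for one key triangle with vertices A, B, C.
A, B, C = (120.0, 80.0), (150.0, 160.0), (90.0, 170.0)
a = side_length(B, C)  # side opposite vertex A
b = side_length(A, C)
c = side_length(A, B)
ang_A = key_angle(a, b, c)  # angle at vertex A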
As a preferred implementation, in the embodiment of the present application, the motion states include at least: not the target motion, the initial motion state s1, the process motion state s2, and the final motion state s3; the error postures include at least one of error posture 1, error posture 2, error posture 3, and so on. The specific error postures may be set according to the actual exercise and are not limited here.
As a preferred implementation manner, in an embodiment of the present application, the method further includes:
initializing pose estimation parameters of the video stream, and setting the number of standard actions and the number of non-standard actions to 0;
creating a motion state sequence and an error gesture set, wherein the motion state sequence is initially a null array, the corresponding initial value of each element in the error gesture set is 0, and the elements in the error gesture set respectively correspond to one of the error gestures;
the generating the motion estimation result of the object to be estimated according to the detection result comprises the following steps:
and updating the motion state sequence according to the motion state in the detection result and updating the value corresponding to each element in the error gesture set according to the error gesture in the detection result.
Specifically, before evaluation, the video stream pose estimation parameters are first initialized, and the number of standard actions and the number of non-standard actions are set to 0. Next, a motion state sequence, denoted state_seq, and an error posture set, denoted INCORRECT_POSTURE = {"0": 0, "1": 0, "2": 0, "3": 0}, are created, where "1", "2", and "3" correspond to error posture 1, error posture 2, and error posture 3 respectively (the corresponding value remains 0 if the error does not occur), and "0" indicates that the action does not meet the standard. Finally, after the trained target detection model produces a detection result, the motion state sequence is updated according to the motion state in the detection result, and the values of the elements in the error posture set are updated according to the error postures in the detection result.
In a preferred embodiment of the present application, updating the motion state sequence according to the motion state in the detection result and updating the values corresponding to each element in the error posture set according to the error posture in the detection result includes:
when the motion state in the detection result is an initial motion state, if the length of the motion state sequence is 3 and the values corresponding to the elements in the error posture set are all 0, the number of standard actions is increased by 1; if the motion state sequence only comprises a process motion state, the number of non-standard actions is increased by 1 and the values corresponding to the relevant elements in the error posture set are set to 1; and if the length of the motion state sequence is 3 and the values corresponding to the elements in the error posture set are not all 0, the number of non-standard actions is increased by 1;
when the motion state in the detection result is a process motion state and the motion state sequence is a null array or comprises a process motion state and a final motion state, adding the process motion state in the motion state sequence;
and adding a final motion state in the motion state sequence when the motion state in the detection result is the final motion state and the motion state sequence comprises a process motion state.
Specifically, if the motion state in the detection result is the initial motion state s1: if the length of state_seq is 3 and the values in INCORRECT_POSTURE are all 0, the number of standard actions is increased by 1; if state_seq = [s2], the number of non-standard actions is increased by 1, the values of the relevant elements in INCORRECT_POSTURE are set to 1, and an error prompt is displayed in the video indicating which erroneous posture failed to meet the standard (for example, the rise not meeting the standard for sit-ups, or the downward press not meeting the standard for push-ups); if the length of state_seq is 3 and the values in INCORRECT_POSTURE are not all 0, the number of non-standard actions is increased by 1.
If the motion state in the detection result is the process motion state s2, and state_seq = [] or state_seq = [s2, s3], then s2 is appended to state_seq.
If the motion state in the detection result is the final motion state s3, and state_seq = [s2], then s3 is appended to state_seq.
As a preferred implementation, in the embodiment of the present application, after one count and judgment is completed, the motion state sequence and the error posture set are re-initialized: state_seq becomes an empty array and INCORRECT_POSTURE = {"0": 0, "1": 0, "2": 0, "3": 0}. The next key frame of the video is then processed; when the video ends, parameters such as the number of standard actions, the number of non-standard actions, and the motion history are output, and based on these parameters prompts can be given to guide and improve athletic performance.
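A sketch of the counting logic described above; the point at which the error flags are recorded and the placement of the re-initialization are assumptions based on the general description.

counts = {"standard": 0, "nonstandard": 0}
state_seq = []                                        # motion state sequence
incorrect_posture = {"0": 0, "1": 0, "2": 0, "3": 0}  # error posture set

def update(state, detected_errors):
    """Apply one detection result to the counters.

    state: one of "s1", "s2", "s3"; detected_errors: keys of the error
    postures reported for this frame, e.g. ["1"] for error posture 1.
    """
    for e in detected_errors:           # record reported error postures
        incorrect_posture[e] = 1

    if state == "s1":                   # a repetition may just have completed
        if len(state_seq) == 3 and not any(incorrect_posture.values()):
            counts["standard"] += 1     # full s2-s3-s2 cycle without errors
        elif state_seq == ["s2"]:
            counts["nonstandard"] += 1  # incomplete repetition
        elif len(state_seq) == 3:
            counts["nonstandard"] += 1  # full cycle, but with posture errors
        # re-initialize after one count and judgment, as described above
        state_seq.clear()
        for k in incorrect_posture:
            incorrect_posture[k] = 0
    elif state == "s2" and state_seq in ([], ["s2", "s3"]):
        state_seq.append("s2")
    elif state == "s3" and state_seq == ["s2"]:
        state_seq.append("s3")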
As a preferred implementation manner, in an embodiment of the present application, the method further includes a training process of the target detection model, including:
s210: and acquiring a moving image of the target object, and labeling the moving image to obtain tag data.
Specifically, for each moving image, to annotate its motion state and error gesture, annotating its motion state may be represented by a one-hot code, and as a preferred example, there are four motion states for each motion: not the motion, the initial motion state s1, the process motion state s2, the final motion state s3. For example, the moving image is in the state s2, and is denoted by [0, 1,0], and if there is a wrong posture 1, there is no wrong posture 2 and wrong posture 3, the label is [0,0,1,0,1,0,0]. For example, if the sit-up is in the procedural motion state s3, only one posture error is that no error occurs in the knee-bending non-standard image, the label is [0, 1,0]. For example, when the push-up is in the motion state s2, if errors occur in both the knee and waist postures of extension, the label is [0,0,1,0,1,1].
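A sketch of this labeling scheme, assuming four state slots followed by one 0/1 slot per defined error posture; the state names are placeholders.

import numpy as np

STATES = ["not_motion", "s1", "s2", "s3"]

def make_label(state: str, error_flags: list) -> np.ndarray:
    """One-hot motion state followed by 0/1 flags for each defined error posture."""
    label = np.zeros(len(STATES) + len(error_flags), dtype=np.float32)
    label[STATES.index(state)] = 1.0
    label[len(STATES):] = error_flags
    return label

# Push-up in process state s2 with both knee and waist errors -> [0,0,1,0,1,1]
print(make_label("s2", [1, 1]))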
S220: and estimating the human body posture of the moving image based on the human body posture estimation neural network to obtain normalized second position information of a second key point of the target object when the target object moves.
S230: and restoring the second position information into second coordinate information on the moving image.
S240: and selecting part of key points from the second key points to construct a plurality of second key triangles, and calculating second key angles of the second key triangles according to the second coordinate information.
S250: and carrying out exponential smoothing on the second key angle to obtain a second smoothing result.
Specifically, steps S220-S250 may refer to the relevant content of steps S120-S150, which is not described herein.
S260: and taking the second smooth result as input, taking the label data as output, and training the neural network model to obtain a target detection model.
Specifically, the array formed by the smoothed output values of the selected second key angles is, after range scaling, used as the input of the neural network model, and the array formed by the annotated motion state and error postures is used as the output for model training. Softmax is selected as the classifier; the loss function of the neural network is the sum of the predicted-state classification loss and the error posture estimation loss. The hyperparameters are tuned, and the model with the highest accuracy is taken as the target detection model.
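A sketch of one possible two-headed network with the summed loss, written in PyTorch; the layer sizes and head structure are assumptions, not taken from the original disclosure.

import torch.nn as nn
import torch.nn.functional as F

class TargetDetectionModel(nn.Module):
    """Maps the scaled array of smoothed key angles to a motion state
    (softmax classification) and per-error-posture predictions."""
    def __init__(self, num_angles: int, num_states: int = 4, num_errors: int = 3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(num_angles, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        self.state_head = nn.Linear(64, num_states)  # softmax classifier head
        self.error_head = nn.Linear(64, num_errors)  # error posture head

    def forward(self, x):
        h = self.backbone(x)
        return self.state_head(h), self.error_head(h)

def total_loss(state_logits, error_logits, state_target, error_target):
    """Sum of the state classification loss (softmax cross-entropy over class
    indices) and the error posture estimation loss (binary cross-entropy)."""
    return (F.cross_entropy(state_logits, state_target)
            + F.binary_cross_entropy_with_logits(error_logits, error_target))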
Example two
Corresponding to the first embodiment, the present application further provides a motion evaluation device based on target detection and human body posture estimation. In this embodiment, content that is the same as or similar to the first embodiment can be found in the description above and is not repeated. Referring to fig. 4, the apparatus includes:
the video extraction module is used for obtaining a motion video of an object to be evaluated and extracting key frames of the motion video;
the gesture estimation module is used for estimating the human body gesture of the key frame based on a human body gesture estimation neural network to obtain normalized first position information of a first key point of the object to be evaluated when the object to be evaluated moves;
the first calculation module is used for restoring the first position information into first coordinate information on the key frame;
the second calculation module is used for selecting part of key points from the first key points to construct a plurality of first key triangles, and calculating a first key angle of the first key triangles according to the first coordinate information;
the smoothing processing module is used for carrying out exponential smoothing processing on the first key angle to obtain a first smoothing result;
the target detection module is used for inputting the first smoothing result into a target detection model obtained by training in advance to obtain a detection result, wherein the detection result comprises a motion state and an error posture;
and the motion evaluation module is used for generating a motion evaluation result of the object to be evaluated according to the detection result.
Example III
Corresponding to the first and second embodiments, the present application further provides a computer device, including: a processor and a memory, the memory storing a computer program executable on the processor; when the computer program is executed by the processor, the motion evaluation method based on target detection and human body posture estimation provided in any one of the above embodiments is performed.
FIG. 5 illustrates, among other things, a computer device that may include a processor 1510, a video display adapter 1511, a disk drive 1512, an input/output interface 1513, a network interface 1514, and a memory 1520. The processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, and the memory 1520 may be communicatively connected by a communication bus 1530.
The processor 1510 may be implemented by a general-purpose CPU (Central Processing Unit ), a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc., for executing related programs to implement the technical scheme provided by the present application.
The memory 1520 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1520 may store an operating system 1521 for controlling the operation of the electronic device, and a Basic Input Output System (BIOS) for controlling low-level operation of the electronic device. In addition, a web browser 1523, a data storage management system 1524, a device identification information processing system 1525, and the like may also be stored. The device identification information processing system 1525 may be an application program that implements the operations of the steps described above in the embodiments of the present application. In general, when the present application is implemented in software or firmware, the relevant program code is stored in the memory 1520 and executed by the processor 1510.
The input/output interface 1513 is used for connecting with an input/output module to realize information input and output. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
The network interface 1514 is used to connect communication modules (not shown) to enable communication interactions of the present device with other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
The bus includes a path to transfer information between various components of the device (e.g., the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, and the memory 1520).
In addition, the electronic device may also obtain information of specific acquisition conditions from the virtual resource object acquisition condition information database, so as to be used for performing condition judgment, and the like.
It is noted that although the above devices illustrate only the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, the memory 1520, the bus, etc., in particular implementations, the device may include other components necessary to achieve proper functioning. Furthermore, it will be appreciated by those skilled in the art that the apparatus may include only the components necessary to implement the present application, and not all of the components shown in the drawings.
Example IV
The present application also provides a computer readable storage medium corresponding to the first to third embodiments, wherein in the present embodiment, the same or similar content as that of the first to third embodiments can be referred to the above description, and the description is omitted.
The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the motion evaluation method based on target detection and human body posture estimation described above.
In some embodiments, when the computer program is executed by the processor, the steps corresponding to the method described in the first embodiment may be further implemented, and reference may be made to the detailed description in the first embodiment, which is not repeated herein.
From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present application.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for a system or system embodiment, since it is substantially similar to a method embodiment, the description is relatively simple, with reference to the description of the method embodiment being made in part. The systems and system embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present application without undue burden.
The foregoing describes a preferred embodiment of the present application in detail. Specific examples are used herein to illustrate the principles and implementations of the present application, and the above embodiments are provided only to help understand the method of the present application and its core idea. Those of ordinary skill in the art may make modifications within the scope of the present application in light of these teachings. In view of the foregoing, this description should not be construed as limiting the application.

Claims (10)

1. A motion evaluation method based on target detection and human body posture estimation, the method comprising:
acquiring a motion video of an object to be evaluated, and extracting a key frame of the motion video;
performing human body posture estimation on the key frame based on a human body posture estimation neural network to obtain normalized first position information of a first key point of the object to be evaluated when the object to be evaluated moves;
restoring the first position information into first coordinate information on the key frame;
selecting part of key points from the first key points to construct a plurality of first key triangles, and calculating a first key angle of the first key triangles according to the first coordinate information;
performing exponential smoothing on the first key angle to obtain a first smoothing result;
inputting the first smoothing result into a target detection model obtained by training in advance, and obtaining a detection result, wherein the detection result comprises a motion state and an error posture;
and generating a motion evaluation result of the object to be evaluated according to the detection result.
2. The method according to claim 1, wherein each of the first key triangles consists of three first key points, and each of the first key triangles corresponds to a preset portion of the object to be evaluated.
3. The method according to claim 1, wherein the motion state includes at least: not the target motion, an initial motion state, a process motion state, and a final motion state.
4. The motion evaluation method based on target detection and human body posture estimation according to claim 3, further comprising:
initializing pose estimation parameters of the video stream, and setting the number of standard actions and the number of non-standard actions to 0;
creating a motion state sequence and an error gesture set, wherein the motion state sequence is initially a null array, the corresponding initial value of each element in the error gesture set is 0, and the elements in the error gesture set respectively correspond to one of the error gestures;
the generating the motion estimation result of the object to be estimated according to the detection result comprises the following steps:
and updating the motion state sequence according to the motion state in the detection result and updating the value corresponding to each element in the error gesture set according to the error gesture in the detection result.
5. The method according to claim 4, wherein updating the motion state sequence according to the motion state in the detection result and updating the values corresponding to the elements in the set of erroneous postures according to the erroneous posture in the detection result comprises:
when the motion state in the detection result is an initial motion state, if the length of the motion state sequence is 3 and the values corresponding to the elements in the error gesture set are all 0, the standard motion number is increased by 1, if the motion state sequence only comprises a process motion state, the non-standard motion number is increased by 1, the values corresponding to the relevant elements in the error gesture set are set to be 1, and if the length of the motion state sequence is 3 and the values corresponding to the elements in the error gesture set are not equal to 0, the non-standard motion number is increased by 1;
when the motion state in the detection result is a process motion state and the motion state sequence is a null array or comprises a process motion state and a final motion state, adding the process motion state in the motion state sequence;
and adding a final motion state in the motion state sequence when the motion state in the detection result is the final motion state and the motion state sequence comprises a process motion state.
6. The motion evaluation method based on target detection and human body posture estimation according to any one of claims 1 to 5, further comprising a training process of the target detection model, comprising:
acquiring a moving image of a target object, and labeling the moving image to obtain label data;
performing human body posture estimation on the moving image based on a human body posture estimation neural network to obtain normalized second position information of a second key point of the target object when the target object moves;
restoring the second position information to second coordinate information on the moving image;
selecting part of key points from the second key points to construct a plurality of second key triangles, and calculating second key angles of the second key triangles according to the second coordinate information;
performing exponential smoothing on the second key angle to obtain a second smoothing result;
and taking the second smooth result as input, taking the label data as output, and training the neural network model to obtain a target detection model.
7. The motion evaluation method based on target detection and human body posture estimation according to claim 6, wherein the label data includes at least the motion state and the error posture.
8. A motion evaluation device based on target detection and human body posture estimation, the device comprising:
the video extraction module is used for obtaining a motion video of an object to be evaluated and extracting key frames of the motion video;
the gesture estimation module is used for estimating the human body gesture of the key frame based on a human body gesture estimation neural network to obtain normalized first position information of a first key point of the object to be evaluated when the object to be evaluated moves;
the first calculation module is used for restoring the first position information into first coordinate information on the key frame;
the second calculation module is used for selecting part of key points from the first key points to construct a plurality of first key triangles, and calculating a first key angle of the first key triangles according to the first coordinate information;
the smoothing processing module is used for carrying out exponential smoothing processing on the first key angle to obtain a first smoothing result;
the target detection module is used for inputting the first smoothing result into a target detection model obtained by training in advance to obtain a detection result, wherein the detection result comprises a motion state and an error posture;
and the motion evaluation module is used for generating a motion evaluation result of the object to be evaluated according to the detection result.
9. A computer device comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor, the computer program, when executed by the processor, implementing the motion evaluation method based on target detection and human body posture estimation of any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored, wherein the computer program, when executed, implements the motion evaluation method based on target detection and human body posture estimation according to any one of claims 1 to 7.
CN202310474749.9A 2023-04-27 2023-04-27 Motion evaluation method and device based on target detection and human body posture estimation Pending CN116580454A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310474749.9A CN116580454A (en) 2023-04-27 2023-04-27 Motion evaluation method and device based on target detection and human body posture estimation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310474749.9A CN116580454A (en) 2023-04-27 2023-04-27 Motion evaluation method and device based on target detection and human body posture estimation

Publications (1)

Publication Number Publication Date
CN116580454A true CN116580454A (en) 2023-08-11

Family

ID=87536883

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310474749.9A Pending CN116580454A (en) 2023-04-27 2023-04-27 Motion evaluation method and device based on target detection and human body posture estimation

Country Status (1)

Country Link
CN (1) CN116580454A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117216313A (en) * 2023-09-13 2023-12-12 中关村科学城城市大脑股份有限公司 Attitude evaluation audio output method, attitude evaluation audio output device, electronic equipment and readable medium


Similar Documents

Publication Publication Date Title
US20190220657A1 (en) Motion recognition device and motion recognition method
CN110458061B (en) Method for identifying old people falling down and accompanying robot
CN109948590B (en) Attitude problem detection method and device
CN108205654B (en) Action detection method and device based on video
CN109376631B (en) Loop detection method and device based on neural network
US11417095B2 (en) Image recognition method and apparatus, electronic device, and readable storage medium using an update on body extraction parameter and alignment parameter
CN111597975B (en) Personnel action detection method and device and electronic equipment
US11074713B2 (en) Recognition device, recognition system, recognition method, and non-transitory computer readable recording medium
CN110688929A (en) Human skeleton joint point positioning method and device
CN110765946B (en) Running posture assessment method, device, equipment and storage medium
CN116580454A (en) Motion evaluation method and device based on target detection and human body posture estimation
CN111932568A (en) Human body image segmentation method, and training method and device of human body image segmentation model
CN115738219A (en) Pull-up evaluation method and device, electronic equipment and storage medium
CN116188695A (en) Construction method of three-dimensional hand gesture model and three-dimensional hand gesture estimation method
CN114343618A (en) Training motion detection method and device
CN111353347B (en) Action recognition error correction method, electronic device, and storage medium
CN111353345B (en) Method, apparatus, system, electronic device, and storage medium for providing training feedback
CN115346640B (en) Intelligent monitoring method and system for closed-loop feedback of functional rehabilitation training
CN116453222A (en) Target object posture determining method, training device and storage medium
CN116343007A (en) Target detection method, device, equipment and storage medium
CN115019399A (en) Human body posture detection method
CN112257642B (en) Human body continuous motion similarity evaluation method and evaluation device
CN113963202A (en) Skeleton point action recognition method and device, electronic equipment and storage medium
CN113392743A (en) Abnormal action detection method, abnormal action detection device, electronic equipment and computer storage medium
CN111260692A (en) Face tracking method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination