CN116740618A - Motion video action evaluation method, system, computer equipment and medium - Google Patents


Info

Publication number
CN116740618A
CN116740618A (application CN202310987808.2A)
Authority
CN
China
Prior art keywords
motion
human body
key points
detected
coordinates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310987808.2A
Other languages
Chinese (zh)
Inventor
Wang Qing (王青)
Lei Yi (雷毅)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Digital Poly Beijing Technology Co ltd
Original Assignee
Digital Poly Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Digital Poly Beijing Technology Co ltd filed Critical Digital Poly Beijing Technology Co ltd
Priority to CN202310987808.2A
Publication of CN116740618A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
        • G06T 7/00: Image analysis
        • G06T 7/20: Analysis of motion
        • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
        • G06T 7/60: Analysis of geometric attributes
        • G06T 7/66: Analysis of geometric attributes of image moments or centre of gravity
        • G06T 7/70: Determining position or orientation of objects or cameras
        • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
        • G06T 2207/00: Indexing scheme for image analysis or image enhancement
        • G06T 2207/10: Image acquisition modality
        • G06T 2207/10016: Video; image sequence
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
        • G06V 10/00: Arrangements for image or video recognition or understanding
        • G06V 10/40: Extraction of image or video features
        • G06V 10/62: Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; pattern tracking
        • G06V 10/70: Arrangements using pattern recognition or machine learning
        • G06V 10/74: Image or video pattern matching; proximity measures in feature spaces
        • G06V 10/761: Proximity, similarity or dissimilarity measures
        • G06V 10/77: Processing image or video features in feature spaces; data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
        • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
        • G06V 20/00: Scenes; scene-specific elements
        • G06V 20/40: Scenes; scene-specific elements in video content
        • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
        • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
        • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
        • G06V 40/20: Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a motion video action evaluation method, system, computer device, and medium, relating to the technical field of image processing. The method comprises: obtaining a motion video to be detected; performing spatial alignment and time alignment between the motion video to be detected and a standard motion video, thereby obtaining transformed coordinates of the human body key points and the standard motion image corresponding to each motion image to be detected; calculating the key-point similarity, the temporal complexity of the key-point coordinates, and the standard deviation of the key-point coordinates from the transformed coordinates and visibility of the human body key points in each motion image to be detected and the transformed coordinates of the human body key points in the corresponding standard motion image; and evaluating the motion video to be detected according to the key-point similarity, the temporal complexity of the key-point coordinates, and the standard deviation of the key-point coordinates.

Description

Motion video action evaluation method, system, computer equipment and medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular to a motion video action evaluation method, system, computer device, and medium.
Background
Whether an action is performed to standard is particularly important for athletes and fitness users, so actions need to be evaluated. At present, most evaluation is performed by human assessors, who must receive systematic professional training and accumulate sufficient experience to evaluate effectively. Because an assessor's judgement is influenced by personal experience and subjective knowledge, the evaluation process carries a degree of uncertainty and error. A sports video action evaluation method capable of evaluating actions objectively is therefore needed.
Disclosure of Invention
The invention aims to provide a motion video action evaluation method, a motion video action evaluation system, computer equipment and a medium, which can improve the objectivity and the accuracy of evaluation.
In order to achieve the above object, the present invention provides the following solutions:
a method of motion video action assessment, the method comprising:
S1: acquiring a motion video to be detected of a user; the motion video to be detected comprises multiple frames of motion images to be detected;
S2: identifying the pixel coordinates and visibility of the human body key points in each motion image to be detected; performing coordinate transformation on the pixel coordinates of the human body key points in each motion image to be detected and in each standard motion image of a standard motion video, so that the motion images to be detected and the standard motion images are spatially aligned, thereby obtaining transformed coordinates of the human body key points in each motion image to be detected and in each standard motion image; and performing time alignment between the motion video to be detected and the standard motion video to obtain the standard motion image corresponding to each motion image to be detected;
S3: calculating the key-point similarity according to the transformed coordinates and visibility of the human body key points in each motion image to be detected and the transformed coordinates of the human body key points in the corresponding standard motion image;
S4: calculating the standard deviation of the key-point coordinates and the temporal complexity of the key-point coordinates according to the transformed coordinates of the human body key points in all motion images to be detected;
S5: determining the action evaluation result of the motion video to be detected according to the key-point similarity, the standard deviation of the key-point coordinates, and the temporal complexity of the key-point coordinates.
Optionally, identifying the pixel coordinates and visibility of the human body key points in each motion image to be detected specifically comprises:
for each motion image to be detected, using a trained human body posture estimation model to recognize the pixel coordinates and visibility of the human body key points; the model is trained with sample action images as input and the pixel coordinates and visibility of their human body key points as labels.
Optionally, the coordinate transformation that spatially aligns the motion images to be detected with the standard motion images, yielding transformed coordinates of the human body key points in both sets of images, specifically comprises:
establishing a first coordinate system with a first hip center point as the origin, and transforming the pixel coordinates of the human body key points in each motion image to be detected into this coordinate system; the first hip center point is determined from the hip center points of all motion images to be detected;
establishing a second coordinate system with a second hip center point as the origin, and transforming the pixel coordinates of the human body key points in each standard motion image of the standard motion video into this coordinate system; the second hip center point is determined from the hip center points of all standard motion images.
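The hip-centered alignment above can be sketched in pure Python. The patent does not spell out how the "first hip center point" is computed from all frames, so this sketch assumes it is the mean of the per-frame hip midpoints (keypoints 23/24 in the layout of fig. 3):

```python
import statistics

def hip_centered(frames_keypoints, hip_idx=(23, 24)):
    """Translate per-frame keypoint pixel coordinates so that the origin
    is a hip center point computed over all frames.

    frames_keypoints: list of frames; each frame is a list of (x, y) tuples.
    hip_idx: indices of the right/left hip keypoints (23/24 in the
             33-point layout of fig. 3).
    """
    # Hip center of each frame = midpoint of the two hip keypoints.
    centers = [((f[hip_idx[0]][0] + f[hip_idx[1]][0]) / 2,
                (f[hip_idx[0]][1] + f[hip_idx[1]][1]) / 2)
               for f in frames_keypoints]
    # Assumption: the video-level hip center point is the mean of the
    # per-frame hip centers (the patent leaves the exact rule open).
    cx = statistics.mean(c[0] for c in centers)
    cy = statistics.mean(c[1] for c in centers)
    # Shift every keypoint so the hip center becomes the coordinate origin.
    return [[(x - cx, y - cy) for (x, y) in f] for f in frames_keypoints]
```

The same function applied to the standard video yields the second coordinate system, making the two videos comparable regardless of where the person stands in the frame.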
Optionally, the performing time alignment on the motion video to be detected and the standard motion video specifically includes:
and performing time alignment on the motion video to be detected and the standard motion video by adopting a DTW dynamic time warping algorithm.
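The time-alignment step can be sketched with the textbook dynamic-programming formulation of DTW; `dist` is any per-frame distance (e.g. mean keypoint distance), which the patent leaves open:

```python
def dtw_align(seq_a, seq_b, dist):
    """Classic O(len(seq_a) * len(seq_b)) dynamic time warping.

    Returns the optimal alignment cost and the warping path as a list of
    (i, j) index pairs matching frames of seq_a to frames of seq_b.
    """
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a frame of seq_b
                                 cost[i][j - 1],      # skip a frame of seq_a
                                 cost[i - 1][j - 1])  # match both frames
    # Backtrack the cheapest path from (n, m) to (0, 0).
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = min((cost[i - 1][j - 1], (i - 1, j - 1)),
                   (cost[i - 1][j], (i - 1, j)),
                   (cost[i][j - 1], (i, j - 1)))
        i, j = step[1]
    return cost[n][m], path[::-1]
```

The warping path directly gives, for each motion image to be detected, the index of its corresponding standard motion image.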
Optionally, step S3 specifically includes:
obtaining the similarity of each motion image to be detected and the standard motion image corresponding to the motion image to be detected according to the transformation coordinates and the visibility of the human body key points in the motion image to be detected and the transformation coordinates of the human body key points in the standard motion image corresponding to the motion image to be detected;
And calculating the similarity of the key points according to the similarity of all the motion images to be detected and the standard motion images corresponding to the motion images to be detected.
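One plausible reading of this two-level computation, assuming a visibility-weighted exp(-distance) per-frame measure (the patent does not fix the formula at this level of the description):

```python
import math

def frame_similarity(test_kpts, std_kpts, visibility, eps=1e-6):
    """Visibility-weighted similarity between one frame of the video
    under test and its time-aligned standard frame.

    test_kpts / std_kpts: lists of (x, y) transformed coordinates.
    visibility: per-keypoint confidence in [0, 1] from the pose model,
    so occluded keypoints contribute less to the score.
    """
    num = den = 0.0
    for (tx, ty), (sx, sy), v in zip(test_kpts, std_kpts, visibility):
        d = math.hypot(tx - sx, ty - sy)
        num += v * math.exp(-d)  # a perfectly matched keypoint contributes ~1
        den += v
    return num / (den + eps)

def keypoint_similarity(per_frame_sims):
    # Sequence-level similarity = mean over all aligned frame pairs.
    return sum(per_frame_sims) / len(per_frame_sims)
```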
Optionally, step S4 specifically includes:
selecting m human body key points from all human body key points and denoting them standard-deviation key points; for each standard-deviation key point, calculating its standard deviation from its transformed coordinates across all motion images to be detected; and calculating the key-point coordinate standard deviation from the standard deviations of all m key points;
selecting J human body key points from all human body key points and denoting them complexity key points; for each complexity key point in each motion image to be detected, calculating its complexity from its transformed coordinates in the current motion image to be detected and in the previous frame, then calculating its temporal complexity from its complexity across all motion images to be detected; and finally calculating the temporal complexity of the key-point coordinates from the temporal complexities of all J key points.
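A minimal sketch of both indicators, under the assumption that the per-keypoint values are combined by simple averaging (the patent does not fix the combination rule here):

```python
import math
import statistics

def coordinate_stddev(frames, idxs):
    """Mean per-keypoint positional standard deviation over the selected
    m keypoints; each keypoint's x- and y-spread across all frames is
    combined into a single scalar."""
    stds = []
    for k in idxs:
        xs = [f[k][0] for f in frames]
        ys = [f[k][1] for f in frames]
        stds.append(math.hypot(statistics.pstdev(xs), statistics.pstdev(ys)))
    return statistics.mean(stds)

def temporal_complexity(frames, idxs):
    """Mean frame-to-frame displacement of the selected J keypoints,
    a simple stand-in for the per-frame complexity defined from the
    current and previous frame's transformed coordinates."""
    total, count = 0.0, 0
    for k in idxs:
        for prev, cur in zip(frames, frames[1:]):
            total += math.hypot(cur[k][0] - prev[k][0],
                                cur[k][1] - prev[k][1])
            count += 1
    return total / count
```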
Optionally, step S5 specifically includes:
and carrying out weighted summation on the similarity of the key points, the standard deviation of the coordinates of the key points and the complexity of the time sequence of the coordinates of the key points, and determining an action evaluation result of the action video to be tested.
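The weighted summation can be sketched as follows; the weights are purely illustrative, since the patent only states that a weighted sum is used:

```python
def action_score(similarity, stddev, complexity,
                 weights=(0.6, 0.2, 0.2)):
    """Weighted sum of the three indicators. The weights are an
    assumption for illustration; a real system would also normalise
    the standard deviation and temporal complexity to a common scale
    before combining them with the similarity."""
    ws, wd, wc = weights
    return ws * similarity + wd * stddev + wc * complexity
```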
The invention also provides a motion video action evaluation system, which comprises:
the to-be-detected action video acquisition module is used for acquiring to-be-detected action videos of a user; the motion video to be detected comprises a plurality of frames of motion images to be detected;
the alignment module is used for identifying the pixel coordinates and the visibility of the key points of the human body in each motion image to be detected; carrying out coordinate transformation on pixel coordinates of human body key points in each to-be-measured action image and pixel coordinates of human body key points in each standard action image of a standard action video so as to enable the to-be-measured action image and the standard action image to be in spatial position alignment, and obtaining transformation coordinates of the human body key points in each to-be-measured action image and transformation coordinates of the human body key points in each standard action image; time alignment is carried out on the motion video to be detected and the standard motion video to obtain standard motion images corresponding to each motion image to be detected;
The key point similarity calculation module is used for calculating the key point similarity according to the transformation coordinates and the visibility of the human key points in the to-be-detected action image and the transformation coordinates of the human key points in the standard action image corresponding to the to-be-detected action image;
the standard deviation and complexity calculation module is used for calculating the standard deviation of the coordinates of the key points and the complexity of the time sequence of the coordinates of the key points according to the transformation coordinates of the key points of the human body in all the motion images to be detected;
and the action evaluation module is used for determining an action evaluation result of the action video to be tested according to the similarity of the key points, the standard deviation of the coordinates of the key points and the complexity of the time sequence of the coordinates of the key points.
The invention also provides a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above described motion video action assessment method.
The present invention also provides a computer readable storage medium storing a computer program adapted to be loaded by a processor and to perform the above-described motion video action evaluation method.
According to the specific embodiments provided by the invention, the invention discloses the following technical effects: after spatial alignment and time alignment with the standard motion video, the key-point similarity, the temporal complexity of the key-point coordinates, and the standard deviation of the key-point coordinates are calculated from the transformed coordinates and visibility of the human body key points, and the motion video to be detected is evaluated by combining these three indicators. This replaces subjective manual assessment with measurable quantities and thereby improves the objectivity and accuracy of the evaluation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a motion video motion estimation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an implementation process of a motion video motion estimation method according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a human body placement of key points of a human body according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a human body posture estimation model reasoning flow based on BlazePose provided by the embodiment of the invention;
fig. 5 is an output schematic diagram of a face detector according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a computer device according to the present invention.
Symbol description:
0-nose; 1-right eye (inner); 2-right eye; 3-right eye (outer); 4-left eye (inner); 5-left eye; 6-left eye (outer); 7-right ear; 8-left ear; 9-right mouth corner; 10-left mouth corner; 11-right shoulder; 12-left shoulder; 13-right elbow; 14-left elbow; 15-right wrist; 16-left wrist; 17-right little-finger knuckle; 18-left little-finger knuckle; 19-right index-finger knuckle; 20-left index-finger knuckle; 21-right thumb knuckle; 22-left thumb knuckle; 23-right hip; 24-left hip; 25-right knee; 26-left knee; 27-right ankle; 28-left ankle; 29-right heel; 30-left heel; 31-right big toe; 32-left big toe; 1000-computer device; 1001-processor; 1002-communication bus; 1003-user interface; 1004-network interface; 1005-memory.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The traditional motion capture modeling method generally comprises the following steps:
1) Capturing data acquisition: the sensor system is used for collecting human body motion and generally comprises an Inertial Measurement Unit (IMU), an electromagnetic tracker and other devices, and can capture information such as the position, the posture and the motion state of the human body.
2) Data preprocessing: preprocessing the acquired data, including denoising, interpolation, filtering and other operations, so as to improve the accuracy and reliability of the data.
3) Motion analysis: and extracting the motion characteristics such as key points, joint angles, joint angular velocity and the like by analyzing and processing the preprocessed data.
4) Establishing a motion model: based on the extracted motion characteristics, a motion model is established, and the motion model generally comprises indexes such as joint angles, joint angular velocities and the like, and can be a linear or nonlinear model.
5) Motion reconstruction: and applying the acquired data to a motion model to perform motion reconstruction to obtain a corresponding human motion state.
Key indexes of the motion, such as joint angle, motion speed, acceleration and the like, are extracted through decomposing the motion video motion, and the quality of the motion is evaluated through analyzing the indexes.
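The joint-angle index mentioned above can be computed from three keypoints with basic vector geometry; this is the generic formula, not one taken from the patent:

```python
import math

def joint_angle(a, b, c):
    """Angle at joint b (in degrees) formed by keypoints a-b-c,
    e.g. shoulder-elbow-wrist gives the elbow angle."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_t = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_t))
```

Differentiating such angles over the frame sequence yields the joint angular velocity used in the motion model.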
In the prior art, professional hardware equipment is required to be used for data acquisition and processing, such as equipment of a sensor, a high-speed camera and the like, and the equipment has high cost, is greatly influenced by factors such as acquisition environment, sensor precision and the like, and is not beneficial to large-scale popularization. In addition, the prior art can only perform offline analysis and processing on actions, and cannot evaluate and feed back actions in real time, which is also problematic in some application scenarios.
To address the defects of the prior art, the invention provides a motion video action evaluation method, system, computer device, and medium applicable to scenarios requiring action-quality evaluation, such as athlete action assessment and fitness evaluation of complex movements. The key-point similarity, the temporal complexity of the key-point coordinates, and the standard deviation of the key-point coordinates are calculated from the transformed coordinates of the human body key points, and the motion video to be detected is evaluated using these three indicators. No special equipment (sensors, high-speed cameras) or controlled environment is needed when collecting the motion video to be detected, giving greater flexibility in practical applications and scalability; the evaluation is based on key-point action features and requires no complex data processing, making it simpler and more efficient; and actions can be evaluated with real-time feedback, assisting training and helping to improve athletes' skill level and the effect of rehabilitation therapy.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
As shown in fig. 1 and 2, the present invention provides a motion video motion estimation method, which includes:
s1: acquiring a motion video to be detected of a user; the motion video to be detected comprises a plurality of frames of motion images to be detected.
S2: identifying pixel coordinates and visibility of key points of the human body in each motion image to be detected; carrying out coordinate transformation on pixel coordinates of human body key points in each to-be-measured action image and pixel coordinates of human body key points in each standard action image of a standard action video so as to enable the to-be-measured action image and the standard action image to be in spatial position alignment, and obtaining transformation coordinates of the human body key points in each to-be-measured action image and transformation coordinates of the human body key points in each standard action image; and performing time alignment on the motion video to be detected and the standard motion video to obtain standard motion images corresponding to each motion image to be detected.
S3: and calculating the similarity of the key points according to the transformation coordinates and the visibility of the key points of the human body in the motion image to be detected and the transformation coordinates of the key points of the human body in the standard motion image corresponding to the motion image to be detected.
S4: and calculating the standard deviation of the key point coordinates and the complexity of the key point coordinate time sequence according to the transformation coordinates of the human key points in all the motion images to be detected.
S5: and determining an action evaluation result of the action video to be tested according to the similarity of the key points, the standard deviation of the coordinates of the key points and the complexity of the time sequence of the coordinates of the key points.
Firstly, in this embodiment, a mobile terminal with a video recording function, such as a mobile phone or a tablet, is used for video acquisition. The collected actions are divided into two types, namely standard action videos and action videos to be detected. Wherein, a certain limit should be set for video shooting, and the limit standard is as follows:
(1) The illumination intensity is ensured to be certain, and the illumination intensity is not obviously changed in the acquisition process. (2) The background of the video shooting environment is not required to have too many complex elements, so that a good capturing and comparing environment is ensured. (3) In the video acquisition process, only the person to be acquired is allowed to enter the video capture range area, so that the interference of irrelevant personnel on the acquired data is avoided. (4) The video acquisition equipment is guaranteed to have basic imaging quality, and required videos can be recorded clearly. (5) In the acquisition process, the shooting angle should be ensured not to be obviously changed, and the shooting equipment should not be obviously dithered.
Video data may be acquired by different devices, which often differ in resolution and frame rate; such differences are disadvantageous for the subsequent evaluation. Therefore, after the standard motion video and the motion video to be detected are acquired, a standardized conversion is performed so that the format, resolution, and frame rate of the video data are consistent. After this preprocessing, the converted motion video files are stored as MPEG-4 at 30 frames/second with a resolution of 340×256 (frame width 340, frame height 256).
After the motion video to be detected is processed, identifying the pixel coordinates and the visibility of the key points of the human body in each motion image to be detected in the motion video to be detected specifically comprises the following steps:
for each motion image to be detected, recognizing pixel coordinates and visibility of key points of a human body in the motion image to be detected by adopting a trained human body posture estimation model; the trained human body posture estimation model is obtained by taking an image of a to-be-detected action sample as input and taking pixel coordinates and visibility of human body key points of the image of the to-be-detected action sample as labels.
The trained human body posture estimation model may be a BlazePose-based model whose detector operates according to the principle shown in FIG. 4. The pipeline comprises a lightweight body pose detector and a pose tracker network. The tracker predicts the pixel coordinates of the human body key points, whether a person is present in the current video frame, and a refined region of interest for the current frame. When the tracker predicts that no person is present in the current frame, the detector is re-run on the next frame. The BlazePose detector network consists of two parts: keypoint detection and keypoint regression. During training the two parts are trained jointly but only share features, with gradient propagation between them blocked; at inference time only the regression part runs, which greatly improves the inference speed of the whole network.
The ML pipeline (i.e., detector and tracker) for pose tracking in a blazePose-based human pose estimation model is described below:
As shown in FIG. 4, in the first pose estimation step, the detector–tracker ML pipeline of the BlazePose-based human body posture estimation model uses a fast on-device face detector to estimate the person region in the first frame of the video image to be measured. The tracker then predicts all 33 pose keypoints (human body key points, whose specific distribution is shown in FIG. 3) from the person region.
For the real-time performance of a complete ML pipeline including the pose detection and tracking models, each component must be very fast, using only a few milliseconds per frame. To achieve this, note that the strongest signal a neural network observes about torso position is the face (due to its high-contrast features and relatively small appearance variations). Thus, the present invention uses the fast on-device face detector BlazeFace as a proxy for a person detector. The face detector additionally predicts person-specific alignment parameters, including: the midpoint between a person's hips, the size of the circle circumscribing the whole person, and the tilt angle (the angle between the vertical and the line connecting the midpoint of the two shoulders with the midpoint of the hips).
Inspired by the sub-millisecond BlazeFace face detector model, a face detector was trained as a proxy for the pose detector. This model only detects the position of the person within the frame and cannot be used to identify the person. In contrast to the face mesh and MediaPipe hand tracking pipelines, which derive the ROI from predicted keypoints, for human body pose tracking two additional virtual keypoints are explicitly predicted that firmly describe the human body's center, rotation and scale as one circle. The inspiration comes from Leonardo da Vinci's Vitruvian Man. As shown in FIG. 5, the output of the detector includes: the midpoint of the hips, the center point of the shoulders, the bounding box of the head, the circumscribed circle of the whole person (the human body bounding box, a circle centered on the hip midpoint), and the tilt angle of the line connecting the shoulder center point and the hip midpoint. This results in consistent tracking even in very complex cases, such as certain yoga poses.
The input of the tracking model (tracker) is the output of the detector — the pixel coordinates of the shoulder center and hip center in the first frame, the bounding box of the head, the bounding box of the human body and the tilt angle — together with the motion images to be detected in the subsequent frames. Its output for each motion image to be detected is the human body key point pixel coordinates and their visibility, the visibility taking values between 0 and 1. The tracking model comprises two parts: keypoint detection and keypoint regression. The pose estimation component of the pipeline predicts the pixel coordinates of all 33 human body key points, each with three degrees of freedom (x and y position plus visibility).
It is noted that for the video use case, the detector runs only on the first frame; for subsequent frames, the person region is derived from the pose keypoints of the previous frame. First, the first frame of the motion image to be detected is input to the detector to obtain its shoulder center pixel coordinates, hip center pixel coordinates, head bounding box, human body bounding box and tilt angle. These, together with the first frame itself, are then input to the tracker, which predicts and outputs the pixel coordinates and visibility of the 33 human body key points in the first frame. When predicting the second frame, the shoulder center pixel coordinates, hip center pixel coordinates, human body bounding box and tilt angle of the current video frame are derived from the pose landmarks computed in the previous frame; these are input to the tracker together with the second frame, and if the pose landmarks can be predicted, the tracker outputs the pixel coordinates and visibility of the 33 human body key points in the second frame.
If the pose cannot be predicted, the frame is input to the detector again to obtain the shoulder center pixel coordinates, hip center pixel coordinates, head bounding box, human body bounding box and tilt angle of the second frame, and these are input to the tracker again to predict all human body key point pixel coordinates and visibility in the second frame. When predicting the third frame, the shoulder center pixel coordinates, hip center pixel coordinates, human body bounding box and tilt angle of the current video frame are again derived from the pose landmarks computed in the previous frame and input to the tracker, which predicts the human body key point pixel coordinates and visibility in the third frame. Whenever the human body key points cannot be predicted in a subsequent frame, that frame is input to the detector again to obtain new shoulder center and hip center pixel coordinates, a head bounding box, a human body bounding box and a tilt angle.
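As an illustration, the detect-once-then-track scheduling described above can be sketched in Python. Here `run_detector` and `run_tracker` are hypothetical stand-ins for the BlazePose detector and tracker networks (real inference would go through a framework such as MediaPipe); the sketch shows only the control flow, not actual inference.

```python
def track_video(frames, run_detector, run_tracker):
    """Run the detector on the first frame, then let the tracker carry the
    person region forward; whenever the tracker fails to predict a pose,
    fall back to the detector on that same frame before continuing.
    Returns one keypoint set (or None) per frame."""
    results = []
    roi = None  # shoulder/hip centers, bounding boxes, tilt angle
    for frame in frames:
        if roi is None:
            roi = run_detector(frame)        # (re-)detect the person region
        keypoints = run_tracker(frame, roi)  # 33 keypoints + visibility, or None
        if keypoints is None:                # tracker lost the person:
            roi = run_detector(frame)        # re-run the detector on this frame
            keypoints = run_tracker(frame, roi)
        results.append(keypoints)
        roi = keypoints  # next frame's region derives from this frame's pose
    return results
```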
This embodiment evaluates using the human body key point coordinate sequence (the pixel coordinates of the human body key points in all the motion images to be tested form the sequence), which has the following advantages:
the information content is rich: the human body key point coordinate sequence can provide rich human body movement information including positions, movement tracks, angle changes and the like of various parts of the human body, and the information is helpful for accurately analyzing details and problems in the human body movement process.
The data structure is simple: the human body key point coordinate sequence can be simply expressed as a time sequence, and the data structure is relatively simple and easy to process and analyze. At the same time, this representation can be conveniently aligned and synchronized with other information (e.g., video).
The expandability is strong: the human body key point coordinate sequence can be conveniently expanded, for example, more key points or data acquired by other sensors can be added, so that the human body motion quality can be more comprehensively estimated.
The visual effect is good: by visual presentation of the human body key point coordinate sequence, details and problems in the human body movement process can be intuitively displayed, and the human body movement can be deeply and comprehensively analyzed and evaluated.
Therefore, the human body movement can be estimated by using the human body key point coordinate sequence to obtain more accurate, comprehensive and visual information, so that the problems in the human body movement can be better identified and improved.
The coordinate transformation is performed on the pixel coordinates of the human body key points in each motion image to be tested and the pixel coordinates of the human body key points in each standard motion image of the standard motion video, so that the motion image to be tested and the standard motion image are aligned in space position, and transformation coordinates of the human body key points in each motion image to be tested and transformation coordinates of the human body key points in each standard motion image are obtained, and the method specifically comprises the following steps:
establishing a first coordinate system by taking a first hip center point as a coordinate origin, and carrying out coordinate transformation on pixel coordinates of human body key points in each motion image to be tested to obtain transformed coordinates of the human body key points in each motion image to be tested under the first coordinate system; the first hip center point is determined by hip center points in all the motion images to be detected.
Establishing a second coordinate system by taking a second hip center point as a coordinate origin, and carrying out coordinate transformation on pixel coordinates of human body key points in each standard action image of the standard action video to obtain transformed coordinates of the human body key points in each standard action image under the second coordinate system; the second hip center point is determined by the hip center points in all the standard motion images.
Specifically, if the pose to be measured (motion image to be measured) and the standard pose (standard motion image) are not aligned in spatial position, there is a problem in computing the Euclidean distance: the poses themselves may match while the computed Euclidean distance is still large. The hip center point pixel coordinates can be computed in the original coordinate system, and the average hip center point pixel coordinates (mid_x, mid_y) are obtained by averaging the hip center point pixel coordinates over multiple frames. By a coordinate transformation, the average hip center point becomes the origin of a new coordinate system in spatial position; the translation transformation is then computed in this new coordinate system to obtain the transformed coordinates of the human body key points in each motion image to be tested and each standard motion image.
x'_it = x_it − mid_x
y'_it = y_it − mid_y
where x'_it is the abscissa and y'_it the ordinate of the transformed coordinates of the human body key point after spatial position alignment, and x_it and y_it are the abscissa and ordinate of its pixel coordinates.
At the same time, alignment is performed on the spatial scale: the transformed coordinates of all human body key points are scale-normalized by the width and height of the image. Through displacement and scale normalization, the pose to be measured and the standard pose are spatially aligned as follows.
x''_it = x'_it / Width
y''_it = y'_it / Height
where x''_it is the abscissa and y''_it the ordinate of the transformed coordinates of the human body key points after spatial displacement and scale alignment, Width is the width of the motion image to be measured, and Height is its height.
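A minimal sketch of the displacement and scale normalization above, assuming keypoints are given as (x, y) pixel tuples and that a frame's hip center is the midpoint of BlazePose keypoints 23 and 24 (the function name and data layout are illustrative, not the patent's interface):

```python
def align_keypoints(frames_kp, width, height):
    """Translate all keypoints so the average hip center becomes the origin,
    then normalize by the image width and height.

    frames_kp: list of frames, each a list of (x, y) pixel coordinates."""
    # average hip center (mid_x, mid_y) over all frames
    hip_centers = [((kp[23][0] + kp[24][0]) / 2, (kp[23][1] + kp[24][1]) / 2)
                   for kp in frames_kp]
    mid_x = sum(c[0] for c in hip_centers) / len(hip_centers)
    mid_y = sum(c[1] for c in hip_centers) / len(hip_centers)
    # x'' = (x - mid_x) / width,  y'' = (y - mid_y) / height
    return [[((x - mid_x) / width, (y - mid_y) / height) for (x, y) in kp]
            for kp in frames_kp]
```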
After the motion image to be detected and the standard motion image are spatially aligned (spatial position alignment and spatial scale alignment), the motion video to be detected and the standard motion video are also required to be time-dimensionally aligned, which specifically comprises the following steps:
and performing time alignment (time dimension alignment) on the motion video to be detected and the standard motion video by adopting a DTW dynamic time warping algorithm.
Specifically: the DTW dynamic time warping algorithm is to obtain the mapping relation between the unequal length sequences according to a certain distance measurement rule and through a global minimum distance measurement mode, and the similarity matching between the unequal length sequences can be realized under the mapping relation.
Define the human body key point coordinate sequence of the motion video to be detected as sequence A with length M, and the human body key point coordinate sequence of the standard motion video as sequence B with length N, as follows:
A = (a_1, a_2, …, a_m, …, a_M)
B = (b_1, b_2, …, b_n, …, b_N)
The distance between a standard motion pose (standard motion image) and a test motion pose (motion image to be measured) is measured using the Euclidean distance. Each frame has 33 human body key points, so the distance measure can be set as follows: for a certain frame b_n in the standard motion pose and a certain frame a_m in the test motion pose, let the coordinates of the corresponding i-th human body key point be (x_i^bn, y_i^bn) and (x_i^am, y_i^am), respectively.
Then the distance between the a_m-th motion image to be measured and the b_n-th standard motion image is:
d(n, m) = sqrt( Σ_{i=1..33} [ (x_i^am − x_i^bn)² + (y_i^am − y_i^bn)² ] )
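This per-frame Euclidean distance can be sketched directly from the definition (a generic helper over the aligned keypoint lists, not the patent's exact code):

```python
import math

def frame_distance(test_kp, std_kp):
    """Euclidean distance between one motion image to be measured and one
    standard motion image, summed over the aligned keypoint pairs
    (33 keypoints in the pipeline described above)."""
    return math.sqrt(sum((xa - xb) ** 2 + (ya - yb) ** 2
                         for (xa, ya), (xb, yb) in zip(test_kp, std_kp)))
```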
The distance matrix between each frame of the motion image to be detected and each frame of the standard motion image can then be obtained, where d(n, m) is the distance between the n-th standard frame and the m-th test frame:

Test frame 5 | d(1,5)  d(2,5)  d(3,5)  d(4,5)  d(5,5)  d(6,5)
Test frame 4 | d(1,4)  d(2,4)  d(3,4)  d(4,4)  d(5,4)  d(6,4)
Test frame 3 | d(1,3)  d(2,3)  d(3,3)  d(4,3)  d(5,3)  d(6,3)
Test frame 2 | d(1,2)  d(2,2)  d(3,2)  d(4,2)  d(5,2)  d(6,2)
Test frame 1 | d(1,1)  d(2,1)  d(3,1)  d(4,1)  d(5,1)  d(6,1)
             | Std 1   Std 2   Std 3   Std 4   Std 5   Std 6
The main purpose of establishing the distance measure is to find, through the DTW algorithm, the minimum-distance mapping W between the two unequal-length sequences A and B:
W = (w_1, w_2, …, w_k, …, w_K)
where the k-th mapping is w_k = (n, m)_k, pairing the n-th standard frame with the m-th test frame, and max(M, N) ≤ K ≤ M + N − 1. This path is not chosen arbitrarily and needs to meet several constraints:
boundary condition: the start point and the end point must be aligned, that is, w_1 = (1, 1) and w_K = (N, M);
continuity condition: the mapping cannot skip any point; adjacent mappings may advance by at most one frame in each sequence;
monotonicity condition: a later mapping index cannot be smaller than an index that has already been matched.
According to the above constraints, the minimum-distance mapping can be obtained by dynamic programming. Define the cumulative cost matrix D; its core computation is:
D(n, m) = d(n, m) + min{ D(n−1, m), D(n, m−1), D(n−1, m−1) },  with D(1, 1) = d(1, 1).
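The recurrence and the backtracking step can be sketched as a generic DTW implementation under the stated constraints (this is a standard formulation, not the patent's exact code; indices are 1-based in the returned path to match the notation above):

```python
def dtw_align(dist):
    """Dynamic time warping over a precomputed distance matrix.

    dist[n][m] is d(n+1, m+1): the distance between the (n+1)-th standard
    frame and the (m+1)-th test frame.  Returns the total cumulative cost
    and the minimum-distance mapping as (standard, test) 1-based pairs."""
    N, M = len(dist), len(dist[0])
    INF = float("inf")
    D = [[INF] * M for _ in range(N)]
    for n in range(N):
        for m in range(M):
            if n == 0 and m == 0:
                prev = 0.0  # D(1,1) = d(1,1)
            else:  # D(n,m) = d(n,m) + min of the three predecessors
                prev = min(D[n - 1][m] if n > 0 else INF,
                           D[n][m - 1] if m > 0 else INF,
                           D[n - 1][m - 1] if n > 0 and m > 0 else INF)
            D[n][m] = dist[n][m] + prev
    # backtrack from the top-right corner D(N, M) down to D(1, 1)
    path = [(N, M)]
    n, m = N - 1, M - 1
    while (n, m) != (0, 0):
        candidates = []
        if n > 0 and m > 0:
            candidates.append((D[n - 1][m - 1], (n - 1, m - 1)))
        if n > 0:
            candidates.append((D[n - 1][m], (n - 1, m)))
        if m > 0:
            candidates.append((D[n][m - 1], (n, m - 1)))
        _, (n, m) = min(candidates)
        path.append((n + 1, m + 1))
    path.reverse()
    return D[N - 1][M - 1], path
```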
taking six frames of standard action images and five frames of action images to be detected as examples, the time dimension alignment process is introduced:
The distances from the first standard frame to the first through fifth test frames are d(1,1), d(1,2), d(1,3), d(1,4) and d(1,5) respectively, so D(1,1) = d(1,1), D(1,2) = D(1,1) + d(1,2), D(1,3) = D(1,2) + d(1,3), D(1,4) = D(1,3) + d(1,4), D(1,5) = D(1,4) + d(1,5).
The distances from the first test frame to the first through fifth standard frames are d(1,1), d(2,1), d(3,1), d(4,1) and d(5,1) respectively, so D(1,1) = d(1,1), D(2,1) = D(1,1) + d(2,1), D(3,1) = D(2,1) + d(3,1), D(4,1) = D(3,1) + d(4,1), D(5,1) = D(4,1) + d(5,1).
Next, D(2,2) is computed: its value equals the smallest of the three entries to its lower left (D(1,2), D(1,1), D(2,1)) plus its own distance d(2,2). The values of the remaining cells are all computed in the same way, finally yielding the full cumulative cost matrix D.
The shortest path starts from the upper-right corner D(6,5); find the minimum of its lower-left neighbors (D(5,5), D(5,4), D(6,4)). Suppose D(5,4) is the minimum; next find the minimum among D(5,4)'s lower-left neighbors (D(4,4), D(4,3), D(5,3)), and so on until D(1,1) is reached. Suppose the resulting shortest path is D(1,1), D(2,2), D(3,2), D(4,3), D(5,4), D(6,5); then the motion images to be measured correspond to the standard motion images as follows:
First standard motion image  ↔ first motion image to be measured
Second standard motion image ↔ second motion image to be measured
Third standard motion image  ↔ second motion image to be measured
Fourth standard motion image ↔ third motion image to be measured
Fifth standard motion image  ↔ fourth motion image to be measured
Sixth standard motion image  ↔ fifth motion image to be measured
The key indicators of concern in action evaluation are the stability, periodicity and balance of the action; these three main indicators relate to whether the action is up to standard. Therefore, a temporal feature sequence extraction method (key point similarity), the standard deviation of the key point coordinates, and the complexity of the key point coordinate time series are introduced. The temporal feature sequence is used to observe the period of the motion, the standard deviation of the key point coordinates to observe the stability of the motion, and the complexity of the key point coordinate time series to observe the balance of the motion.
After finding the standard motion image corresponding to each frame of motion image to be detected, the following steps are further performed:
and obtaining the similarity of each motion image to be detected and the standard motion image corresponding to the motion image to be detected according to the transformation coordinates and the visibility of the human body key points in the motion image to be detected and the transformation coordinates of the human body key points in the standard motion image corresponding to the motion image to be detected.
And calculating the similarity of the key points according to the similarity of all the motion images to be detected and the standard motion images corresponding to the motion images to be detected.
Specifically, the action evaluation task requires not only the key point similarity between the key point coordinate sequence of the action to be detected and that of the standard action, but also attention to the action parts emphasized by different action postures. Some actions differ in the relative importance of lower-limb and upper-limb stability, with lower-limb stability receiving more attention than upper-limb stability, so adjustment weight factors need to be introduced for different key points. In summary, the invention introduces a key point penalty factor k_i into the similarity (OKS) calculation; the greater its value, the greater the degree of attention paid to that human body key point, and its value is determined by subsequent evaluation. The similarity of each motion image to be detected with its corresponding standard motion image is computed as follows:
OKS_n = Σ_i [ k_i · exp( −d_i² / (2 · s² · σ_i²) ) · δ(v_i > 0) ] / Σ_i [ k_i · δ(v_i > 0) ]
where OKS_n is the similarity of the n-th frame of the motion image to be measured with its corresponding standard motion image; k_i is the key point penalty factor of the i-th human body key point; v_i denotes the visibility of the i-th human body key point, output by the tracker with a value range of 0–1; δ(v_i > 0) is an impulse function taking the value 1 when v_i > 0; s is the square root of the area occupied by the person (the human body bounding box); σ_i is the normalization factor of the i-th bone point (human body key point), reflecting the influence of the current bone point on the whole — the larger its value, the worse the labeling of this human body key point across the whole dataset, and the smaller its value, the better; d_i is the Euclidean distance between the transformed coordinates of the i-th human body key point in the motion image to be measured and in the standard motion image. s and σ_i are values given in advance.
The key point similarity is then computed from the above formula by averaging the per-frame similarities between each frame of the test action video and its standard frame over the total number of consecutive frames n, as given below. In this embodiment, after obtaining the key point similarity, the result is also normalized: if it is greater than 0.7, the final key point similarity score is 1, otherwise it is 0, and the action evaluation result is computed using this normalized final score.
Score_OKS = (1/n) · Σ_{t=1..n} OKS_t
where Score_OKS is the key point similarity and n is the total number of frames of the motion image to be measured.
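Under the formulation above, the per-frame OKS and the averaged, thresholded key point similarity score can be sketched as follows; the argument names and flat-list data layout are illustrative assumptions, not the patent's interface:

```python
import math

def oks_frame(test_kp, std_kp, vis, s, sigma, k):
    """OKS between one aligned test frame and its matched standard frame.

    test_kp/std_kp: transformed (x, y) per keypoint; vis: tracker visibility
    per keypoint; s: square root of the person's area; sigma: per-keypoint
    normalization factors; k: per-keypoint penalty (attention) factors.
    s, sigma and k are given in advance."""
    num = den = 0.0
    for (xt, yt), (xs, ys), v, sg, ki in zip(test_kp, std_kp, vis, sigma, k):
        if v > 0:  # impulse function delta(v_i > 0)
            d2 = (xt - xs) ** 2 + (yt - ys) ** 2
            num += ki * math.exp(-d2 / (2 * s ** 2 * sg ** 2))
            den += ki
    return num / den if den else 0.0

def keypoint_similarity(oks_per_frame, threshold=0.7):
    """Average OKS over all frames, then binarize at the 0.7 threshold."""
    score = sum(oks_per_frame) / len(oks_per_frame)
    return 1 if score > threshold else 0
```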
The step S4 specifically comprises the following steps:
Selecting m human body key points from all human body key points and marking them as standard-deviation human body key points; for each standard-deviation human body key point, calculating its standard deviation from its transformed coordinates in all the motion images to be detected; then calculating the key point coordinate standard deviation from the standard deviations of all the standard-deviation human body key points. The motion requires torso stability, so in this embodiment the m = 4 human body key points 11, 12, 23 and 24 shown in FIG. 3 are used for the key point coordinate standard deviation.
Selecting J human body key points from all human body key points and marking them as complexity human body key points; for each complexity human body key point in each motion image to be detected, calculating its complexity from its transformed coordinates in that image and in the previous frame of the motion image to be detected, then calculating its time-series complexity from its complexities in all the motion images to be detected; finally calculating the key point coordinate time-series complexity from the time-series complexities of all the complexity human body key points. The key point coordinate time-series complexity mainly reflects the degree of change of the hand motion, so in this embodiment the complexity human body key points can be selected from the hand key points (15, 16, 17, 18, 19, 20, 21, 22) shown in FIG. 3; here two key points are selected for each of the left and right hands, i.e. J = 4.
Specifically, the standard deviation of a given standard-deviation human body key point is computed from its transformed coordinates in all the motion images to be detected by the following formula:
σ_j = sqrt( (1/n) · Σ_{i=1..n} (p_ij − p̄_j)² )
where σ_j is the standard deviation of a standard-deviation human body key point, n is the total number of frames, p_ij is the coordinate of key point j in the i-th frame of the video image to be measured, and p̄_j is the average of key point j's coordinates over the n consecutive frames.
Then the key point coordinate standard deviation is computed from the standard deviations of the m standard-deviation human body key points by the formula below; it can be used to describe the stability of the action. The final result is normalized by setting a threshold on the total key point standard deviation: if Score_std is smaller than the threshold, the final key point coordinate standard deviation score is 1, otherwise it is 0, and the action evaluation result is computed using this final score.
Score_std = Σ_{j=1..m} σ_j
where Score_std is the key point coordinate standard deviation, m is the number of standard-deviation human body key points, and σ_j is the standard deviation of the j-th key point.
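The stability score above can be sketched as follows, assuming each chosen torso keypoint's coordinate is given as a flat per-frame sequence (the function names and threshold argument are illustrative, not the patent's interface):

```python
import math

def keypoint_std(coords):
    """Standard deviation of one keypoint's coordinate over n frames
    (population form, matching the formula above)."""
    n = len(coords)
    mean = sum(coords) / n
    return math.sqrt(sum((c - mean) ** 2 for c in coords) / n)

def std_score(per_keypoint_coords, threshold):
    """Sum the standard deviations of the m chosen keypoints
    (11, 12, 23, 24 in this embodiment) and binarize at the threshold."""
    total = sum(keypoint_std(c) for c in per_keypoint_coords)
    return 1 if total < threshold else 0
```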
The key point coordinate time-series complexity is computed by the following formulas and describes the degree of change of the action. After obtaining the key point coordinate time-series complexity, it is normalized by setting a threshold on the sum of the per-keypoint complexities: if Score_T is smaller than the threshold, the final key point coordinate time-series complexity score is 1, otherwise it is 0, and the action evaluation result is computed using this final score.
T_j = sqrt( Σ_{i=2..n} (x_i − x_{i−1})² )
where T_j is the time-series complexity of the j-th complexity key point, x_i is the abscissa of the transformed coordinates of the j-th complexity human body key point in the i-th frame of the motion image to be detected, x_{i−1} is that in the (i−1)-th frame, and n is the total number of frames of the motion image to be detected.
Score_T = Σ_{j=1..J} T_j
where Score_T is the key point coordinate time-series complexity and J is the number of complexity key points.
After the three parameters are calculated, the embodiment further includes:
and carrying out weighted summation on the similarity of the key points, the standard deviation of the coordinates of the key points and the complexity of the time sequence of the coordinates of the key points, and determining an action evaluation result of the action video to be tested.
According to the instructions of the clinician, each child's action condition needs to be judged comprehensively, so the individual scores are weighted and summed with different weights; when the weighted sum of the scores is greater than 0.6, the action is considered normal. The composite score (action evaluation result) is computed by the following formula:
S = Σ_{c=1..C} λ_c · S_c
where S is the action evaluation result of the motion video to be tested, C is the number of parameters used for evaluation (the final key point similarity score, the final key point coordinate standard deviation score and the final key point coordinate time-series complexity score; in this embodiment C = 3), λ_c is the weight of the c-th score, and S_c is the c-th score.
In this embodiment, the score weights are determined by the doctors' comprehensive diagnostic experience: the key point similarity score reflects how standard the child's action is and has the largest weight, while the key point coordinate standard deviation score and the key point coordinate time-series complexity score reflect the stability and the degree of change of the child's action respectively, and their weights are set equal. After integrating the doctors' opinions, the weights of the key point similarity score, the key point coordinate standard deviation score and the key point coordinate time-series complexity score are set to 0.4, 0.3 and 0.3 respectively.
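Putting the three binarized scores together, the composite decision can be sketched as follows, using the weights and the 0.6 threshold stated in this embodiment (function name illustrative):

```python
def action_score(s_oks, s_std, s_T, weights=(0.4, 0.3, 0.3)):
    """Weighted sum S = sum_c lambda_c * S_c of the three binarized scores;
    the action is judged normal when S > 0.6."""
    scores = (s_oks, s_std, s_T)
    S = sum(w * s for w, s in zip(weights, scores))
    return S, S > 0.6
```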
The invention is suitable for evaluating the action quality of athlete actions, body-building actions and the like, and can bring the following benefits when applied to action scoring:
1) By using the objective scoring method, subjectivity and uncertainty of artificial judgment can be reduced, and accuracy and reliability of evaluation can be improved.
2) The method for acquiring the video data does not need a wearable sensor, but directly uses mobile equipment with video recording function such as a mobile phone and the like to acquire the video data, is convenient and quick, has low cost, and is easy to popularize and implement.
3) The method can evaluate the action quality in real time or near real time, and is suitable for actual application scenes such as on-site competition, training and the like.
4) By comprehensively scoring in combination with stability, coordination and periodicity, the action evaluation is performed by using the similarity of the key points, the standard deviation of the coordinates of the key points and the complexity of the time sequence of the coordinates of the key points, so that a more comprehensive and accurate evaluation effect is achieved.
The invention also provides a motion video action evaluation system, which comprises:
a video acquisition module, used for acquiring a motion video to be detected of a user, the motion video to be detected comprising a plurality of frames of motion images to be detected;
an alignment module, used for identifying the pixel coordinates and the visibility of the human body key points in each motion image to be detected; performing coordinate transformation on the pixel coordinates of the human body key points in each motion image to be detected and the pixel coordinates of the human body key points in each standard motion image of a standard motion video, so that the motion images to be detected and the standard motion images are aligned in spatial position, to obtain the transformed coordinates of the human body key points in each motion image to be detected and in each standard motion image; and performing time alignment between the motion video to be detected and the standard motion video, to obtain the standard motion image corresponding to each motion image to be detected;
a key-point similarity calculation module, used for calculating the key-point similarity according to the transformed coordinates and visibility of the human body key points in each motion image to be detected and the transformed coordinates of the human body key points in the corresponding standard motion image;
a standard deviation and complexity calculation module, used for calculating the standard deviation of the key-point coordinates and the time-series complexity of the key-point coordinates according to the transformed coordinates of the human body key points in all the motion images to be detected; and
an action evaluation module, used for determining an action evaluation result of the motion video to be detected according to the key-point similarity, the standard deviation of the key-point coordinates, and the time-series complexity of the key-point coordinates.
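As a sketch only, the module decomposition above can be mirrored in a thin Python pipeline skeleton. The callables and their signatures are placeholders (assumptions), since the patent leaves each module's internals to the embodiments:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class MotionVideoEvaluator:
    """Thin orchestration of the modules described above (a sketch).

    Each field holds the callable implementing one module; wiring them this
    way keeps the alignment / similarity / statistics / evaluation
    responsibilities separate, as in the system description.
    """
    align: Callable       # spatial + temporal alignment of both videos
    similarity: Callable  # key-point similarity from aligned frames
    statistics: Callable  # coordinate std and time-series complexity
    evaluate: Callable    # weighted combination into the final result

    def run(self, test_video, standard_video):
        # Align the video under test with the standard video, then compute
        # the three metrics and combine them into the evaluation result.
        test_frames, std_frames, visibility = self.align(test_video, standard_video)
        sim = self.similarity(test_frames, std_frames, visibility)
        coord_std, ts_cx = self.statistics(test_frames)
        return self.evaluate(sim, coord_std, ts_cx)
```

Any concrete implementations of the four stages can then be plugged in without changing the orchestration.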
The application also provides a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above-described motion video action evaluation method.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a computer device according to the present application. As shown in fig. 6, the computer device 1000 may include a processor 1001, a network interface 1004, and a memory 1005; in addition, the computer device 1000 may further include a user interface 1003 and at least one communication bus 1002, wherein the communication bus 1002 is used to enable communication between these components. The user interface 1003 may include a display (Display) and a keyboard (Keyboard); optionally, the user interface 1003 may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory; optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in fig. 6, the memory 1005, which is one type of computer storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 1000 shown in fig. 6, the network interface 1004 may provide network communication functions, the user interface 1003 is primarily used to provide an input interface for the user, and the processor 1001 may be configured to invoke the device control application stored in the memory 1005 to implement the motion video action evaluation method of the above embodiment, which will not be described again here.
The present invention also provides a computer readable storage medium storing a computer program adapted to be loaded by a processor in order to execute the motion video action evaluation method of the above embodiment, which will not be described again here.
The above-described program may be deployed for execution on one computer device, on multiple computer devices at one site, or on multiple computer devices distributed across multiple sites and interconnected by a communication network; multiple computer devices distributed across multiple sites and interconnected by a communication network may constitute a blockchain network.
The computer readable storage medium may be an internal storage unit of the computer device, such as a hard disk or a memory of the computer device. It may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device. Further, the computer readable storage medium may include both the internal storage unit and an external storage device of the computer device. The computer readable storage medium is used to store the computer program and the other programs and data required by the computer device, and may also be used to temporarily store data that has been or is to be output.
In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the others, and identical or similar parts among the embodiments may be cross-referenced. Since the system disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.
The principles and embodiments of the present invention have been described herein with reference to specific examples, whose description is intended only to assist in understanding the method of the present invention and its core ideas. Those of ordinary skill in the art may make modifications to the specific embodiments and the scope of application in light of the idea of the present invention. In view of the foregoing, this description should not be construed as limiting the invention.

Claims (10)

1. A motion video action evaluation method, the method comprising:
S1: acquiring a motion video to be detected of a user, the motion video to be detected comprising a plurality of frames of motion images to be detected;
S2: identifying the pixel coordinates and the visibility of the human body key points in each motion image to be detected; performing coordinate transformation on the pixel coordinates of the human body key points in each motion image to be detected and the pixel coordinates of the human body key points in each standard motion image of a standard motion video, so that the motion images to be detected and the standard motion images are aligned in spatial position, to obtain the transformed coordinates of the human body key points in each motion image to be detected and in each standard motion image; and performing time alignment between the motion video to be detected and the standard motion video, to obtain the standard motion image corresponding to each motion image to be detected;
S3: calculating the key-point similarity according to the transformed coordinates and visibility of the human body key points in each motion image to be detected and the transformed coordinates of the human body key points in the corresponding standard motion image;
S4: calculating the standard deviation of the key-point coordinates and the time-series complexity of the key-point coordinates according to the transformed coordinates of the human body key points in all the motion images to be detected;
S5: determining an action evaluation result of the motion video to be detected according to the key-point similarity, the standard deviation of the key-point coordinates, and the time-series complexity of the key-point coordinates.
2. The motion video action evaluation method according to claim 1, wherein identifying the pixel coordinates and the visibility of the human body key points in each motion image to be detected specifically comprises:
for each motion image to be detected, recognizing the pixel coordinates and the visibility of the human body key points in that image with a trained human body posture estimation model, the trained human body posture estimation model being obtained by taking sample motion images as input and the pixel coordinates and visibility of the human body key points of the sample motion images as labels.
3. The motion video action evaluation method according to claim 1, wherein performing coordinate transformation on the pixel coordinates of the human body key points in each motion image to be detected and the pixel coordinates of the human body key points in each standard motion image of the standard motion video, so that the motion images to be detected and the standard motion images are aligned in spatial position, to obtain the transformed coordinates of the human body key points in each motion image to be detected and in each standard motion image, specifically comprises:
establishing a first coordinate system with a first hip center point as the coordinate origin, and performing coordinate transformation on the pixel coordinates of the human body key points in each motion image to be detected, to obtain their transformed coordinates in the first coordinate system, the first hip center point being determined from the hip center points in all the motion images to be detected; and
establishing a second coordinate system with a second hip center point as the coordinate origin, and performing coordinate transformation on the pixel coordinates of the human body key points in each standard motion image of the standard motion video, to obtain their transformed coordinates in the second coordinate system, the second hip center point being determined from the hip center points in all the standard motion images.
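A minimal sketch of the hip-centred spatial alignment described in claim 3, assuming a COCO-style keypoint layout (hips at indices 11 and 12) and NumPy arrays; neither assumption is fixed by the claim:

```python
import numpy as np

# COCO-style hip indices — an assumption; the patent does not fix a layout.
LEFT_HIP, RIGHT_HIP = 11, 12

def align_to_hip_center(frames):
    """Translate keypoints so the origin is the mean hip centre over all frames.

    frames: array of shape (T, K, 2) holding pixel coordinates per frame.
    The origin is "determined from the hip center points in all the
    motion images", i.e. averaged over the whole video.
    """
    frames = np.asarray(frames, dtype=float)
    hip_centers = (frames[:, LEFT_HIP] + frames[:, RIGHT_HIP]) / 2.0  # (T, 2)
    origin = hip_centers.mean(axis=0)                                 # (2,)
    return frames - origin
```

Applying the same function to the standard video (with its own origin) yields the two independently centred coordinate systems of the claim.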
4. The motion video action evaluation method according to claim 1, wherein the time alignment of the motion video to be detected and the standard motion video specifically comprises:
performing time alignment between the motion video to be detected and the standard motion video with a dynamic time warping (DTW) algorithm.
5. The motion video action evaluation method according to claim 1, wherein step S3 specifically comprises:
obtaining, for each motion image to be detected, the similarity between that image and its corresponding standard motion image according to the transformed coordinates and visibility of the human body key points in the motion image to be detected and the transformed coordinates of the human body key points in the corresponding standard motion image; and
calculating the key-point similarity from the similarities between all the motion images to be detected and their corresponding standard motion images.
6. The motion video action evaluation method according to claim 1, wherein step S4 specifically comprises:
selecting m human body key points from all the human body key points and marking them as standard-deviation human body key points; for each standard-deviation human body key point, calculating its standard deviation from its transformed coordinates in all the motion images to be detected; and calculating the standard deviation of the key-point coordinates from the standard deviations of all the standard-deviation human body key points;
selecting J human body key points from all the human body key points and marking them as complexity human body key points; for each complexity human body key point in each motion image to be detected, calculating its complexity in that image from its transformed coordinates in that image and in the previous frame of motion image to be detected; calculating the time-series complexity of each complexity human body key point from its complexities in all the motion images to be detected; and calculating the time-series complexity of the key-point coordinates from the time-series complexities of all the complexity human body key points.
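A sketch of the two statistics in claim 6, with the per-frame complexity taken as the Euclidean displacement from the previous frame (an assumption; the claim only says it depends on the current and previous frame coordinates):

```python
import numpy as np

def coordinate_standard_deviation(frames, std_indices):
    """Mean per-axis standard deviation over the m selected keypoints.

    frames: (T, K, 2) transformed coordinates; std_indices: indices of the
    standard-deviation keypoints. A small value indicates a stable pose.
    """
    frames = np.asarray(frames, float)
    per_point_std = frames[:, std_indices].std(axis=0)  # (m, 2)
    return float(per_point_std.mean())

def coordinate_time_series_complexity(frames, cx_indices):
    """Mean frame-to-frame displacement over the J selected keypoints.

    The per-frame complexity of a keypoint is its Euclidean displacement
    from the previous frame; its time-series complexity is the average of
    those displacements, and the overall value averages over keypoints.
    """
    frames = np.asarray(frames, float)
    diffs = np.diff(frames[:, cx_indices], axis=0)      # (T-1, J, 2)
    step_norms = np.linalg.norm(diffs, axis=2)          # (T-1, J)
    return float(step_norms.mean(axis=0).mean())
```

Other aggregations (e.g. maxima instead of means) would fit the claim equally well; the reduction order shown here is one reasonable reading.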
7. The motion video action evaluation method according to claim 1, wherein step S5 specifically comprises:
performing a weighted summation of the key-point similarity, the standard deviation of the key-point coordinates, and the time-series complexity of the key-point coordinates, to determine the action evaluation result of the motion video to be detected.
8. A motion video action evaluation system, the system comprising:
a video acquisition module, used for acquiring a motion video to be detected of a user, the motion video to be detected comprising a plurality of frames of motion images to be detected;
an alignment module, used for identifying the pixel coordinates and the visibility of the human body key points in each motion image to be detected; performing coordinate transformation on the pixel coordinates of the human body key points in each motion image to be detected and the pixel coordinates of the human body key points in each standard motion image of a standard motion video, so that the motion images to be detected and the standard motion images are aligned in spatial position, to obtain the transformed coordinates of the human body key points in each motion image to be detected and in each standard motion image; and performing time alignment between the motion video to be detected and the standard motion video, to obtain the standard motion image corresponding to each motion image to be detected;
a key-point similarity calculation module, used for calculating the key-point similarity according to the transformed coordinates and visibility of the human body key points in each motion image to be detected and the transformed coordinates of the human body key points in the corresponding standard motion image;
a standard deviation and complexity calculation module, used for calculating the standard deviation of the key-point coordinates and the time-series complexity of the key-point coordinates according to the transformed coordinates of the human body key points in all the motion images to be detected; and
an action evaluation module, used for determining an action evaluation result of the motion video to be detected according to the key-point similarity, the standard deviation of the key-point coordinates, and the time-series complexity of the key-point coordinates.
9. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1-7.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program adapted to be loaded by a processor and to perform the method of any of claims 1-7.
CN202310987808.2A 2023-08-07 2023-08-07 Motion video action evaluation method, system, computer equipment and medium Pending CN116740618A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310987808.2A CN116740618A (en) 2023-08-07 2023-08-07 Motion video action evaluation method, system, computer equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310987808.2A CN116740618A (en) 2023-08-07 2023-08-07 Motion video action evaluation method, system, computer equipment and medium

Publications (1)

Publication Number Publication Date
CN116740618A true CN116740618A (en) 2023-09-12

Family

ID=87908314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310987808.2A Pending CN116740618A (en) 2023-08-07 2023-08-07 Motion video action evaluation method, system, computer equipment and medium

Country Status (1)

Country Link
CN (1) CN116740618A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117352161A (en) * 2023-10-11 2024-01-05 凝动万生医疗科技(武汉)有限公司 Quantitative evaluation method and system for facial movement dysfunction


Similar Documents

Publication Publication Date Title
Li et al. Intelligent sports training system based on artificial intelligence and big data
Chaudhari et al. Yog-guru: Real-time yoga pose correction system using deep learning methods
CN111437583B (en) Badminton basic action auxiliary training system based on Kinect
US11759126B2 (en) Scoring metric for physical activity performance and tracking
US20090042661A1 (en) Rule based body mechanics calculation
Zhu Computer vision-driven evaluation system for assisted decision-making in sports training
Surer et al. Methods and technologies for gait analysis
CN106846372B (en) Human motion quality visual analysis and evaluation system and method thereof
CN116740618A (en) Motion video action evaluation method, system, computer equipment and medium
CN107256390B (en) Hand function evaluation device and method based on change of each part of hand in three-dimensional space position
CN114092971A (en) Human body action evaluation method based on visual image
Nie et al. The construction of basketball training system based on motion capture technology
CN113033501A (en) Human body classification method and device based on joint quaternion
Lin et al. Using hybrid sensoring method for motion capture in volleyball techniques training
US20210286983A1 (en) Estimation method, and computer-readable recording medium recording estimation program
Abd Shattar et al. Experimental Setup for Markerless Motion Capture and Landmarks Detection using OpenPose During Dynamic Gait Index Measurement
CN115953834A (en) Multi-head attention posture estimation method and detection system for sit-up
Jeng Hierarchical linear model approach to explore interaction effects of swimmers’ characteristics and breathing patterns on swimming performance in butterfly stroke with self-developed inertial measurement unit
CN111724901A (en) Method, system and device for predicting structure body parameters based on vision and storage medium
Lau et al. Cost-benefit analysis reference framework for human motion capture and analysis systems
WO2023182726A1 (en) Electronic device and method for segmentation of movement repetitions and extraction of performance metrics
CN115153517B (en) Testing method, device, equipment and storage medium for timing, standing and walking test
Wang et al. A mobile platform app to assist learning human kinematics in undergraduate biomechanics courses
CN116189382A (en) Fall detection method and system based on inertial sensor network
CN117523659A (en) Skeleton-based multi-feature multi-stream real-time action recognition method, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination