CN116597507A - Human body action normalization evaluation method and system - Google Patents
- Publication number: CN116597507A (Application CN202310441995.4A)
- Authority
- CN
- China
- Prior art keywords: human, motion, key point, video, action
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention discloses a human body action normalization evaluation method and system, comprising the following steps: S1, model the human body in the action video to be evaluated and in a standard action video to obtain human skeleton key point sequences; S2, interpolate the missing values at positions where key point coordinates are missing in all action videos; S3, divide the acquired human skeleton key point sequences into sequence blocks of equal length; S4, extract from the human skeleton key point sequences feature information that contains only the human action elements; S5, after temporal alignment, compare the feature information of the action video to be evaluated with that of the standard action video and generate an action normalization evaluation score. The method and system effectively address two problems of the prior art: unfair competition results caused by subjective factors or reviewer fatigue, and inconsistent evaluation of videos captured from different viewing angles.
Description
Technical Field
The invention relates to the field of human motion analysis, and in particular to a human action normalization evaluation method and system.
Background
With the growing emphasis on health and physical fitness, more and more people actively participate in various forms of sport, and in these sports the normative assessment of movements plays a vital role. However, conventional scoring relies mainly on the subjective judgment of human referees, which has several disadvantages. First, subjectivity is significant: a referee's judgment may be affected by personal preference, emotion, and other factors, leading to unfair competition results. Second, referees are subject to fatigue and error, which also affect their judgment of the competition; even an experienced referee cannot maintain a high level of performance over long periods of continuous work. In addition, in many scenarios the human action videos to be evaluated are captured from different viewing angles, owing to environmental and human factors.
Disclosure of Invention
The invention provides a human action normalization evaluation system, comprising: a pose estimation module, a human skeleton key point sequence missing-coordinate interpolation module, a human skeleton key point sequence segmentation module, a human joint angle feature extraction module, and a human action normalization evaluation module.
Further, the pose estimation module is used to model the human action poses in the video.
Further, the human skeleton key point sequence missing-coordinate interpolation module is used to complete key points missing from video frames.
Further, the human action normalization evaluation module is used to compare the human body features of the specified video with those of the video to be evaluated, and to produce a score evaluation from the comparison.
Further, the system also comprises an autoencoder network module; the autoencoder network module holds an autoencoder model for extracting and analyzing action information features.
Further, the autoencoder model comprises: a motion information encoder, a skeleton structure information encoder, a camera-view information encoder, and a decoder. The three encoders decouple the human skeleton key point sequence into the following three feature vectors: A1, a time-dependent motion information feature vector, representing the motion-related information of the human body; A2, a time-independent human skeleton structure feature, representing the structure of the human skeleton; A3, a time-independent camera-view feature, representing the camera angle at the time the action video was captured.
Further, the decoder is used to recombine the three independent feature vectors obtained by encoding, in order, to reconstruct the corresponding human skeleton key point sequence, compare it with a given ground-truth sample, and compute the corresponding loss.
A human action normalization evaluation method, comprising the steps of: S1, model the human body in the action video to be evaluated and in a standard action video to obtain human skeleton key point sequences; S2, interpolate the missing values at positions where key point coordinates are missing in all action videos; S3, divide the acquired human skeleton key point sequences into sequence blocks of equal length; S4, extract from the human skeleton key point sequences feature information that contains only the human action elements; S5, after temporal alignment, compare the feature information of the action video to be evaluated with that of the standard action video and generate an action normalization evaluation score.
Further, in step S1, human skeleton information in the action video is extracted by the OpenPose algorithm to obtain a human skeleton key point sequence.
Further, the step S2 comprises the following substeps: S21, find the nearest frames before and after the frame with the missing coordinate in which that key point is present, denoted t1 and t2, and compute a weighted combination of their key point coordinates to obtain P_ave, where t is the frame containing the missing key point coordinate and T is the total number of frames in the video; S22, split the sequence into two segments at the position of the missing coordinate, fit a polynomial regression to the key point data of each segment, and obtain regression predictions for the missing key point from the earlier and later segments: P_before = y_j, j = 0, 1, ..., i-1, and P_after = y_j, j = i+1, i+2, ..., T, where y_j is the polynomial-regression prediction and T is the total number of frames; S23, combine the three predictions P_ave, P_before, and P_after with fixed weights to obtain the final prediction: P_(t,i) = (1/2)P_ave + (1/4)P_before + (1/4)P_after.
In step S3, the input two-dimensional human skeleton key point sequence is divided into sequence blocks of equal length by a sliding window of size w and stride r.
Further, the step S5 comprises the following substeps: S51, compute through the DTW algorithm the best matching path W between the two sets of human motion information feature vectors F_1 and F_2: W = {(1,1), ..., (x,y), ..., (n1,n2)}, where a pair (x,y) indicates that element x of one feature vector sequence is aligned with element y of the other, and n1 and n2 are the last elements of the two sequences to be aligned; S52, compute the cosine similarity between every matched pair of motion feature vectors on the best matching path W to obtain the similarity set S of human skeleton data sequence blocks: cos(x, y) = Σ_k x_k·y_k / (√(Σ_k x_k²)·√(Σ_k y_k²)), where x_k and y_k are the k-th elements of the two motion feature vectors; S53, average the elements in the set S and normalize to obtain the final score: score = (1/n_s) Σ_{s∈S} s, where n_s is the number of elements in the set S.
The invention provides a human body action normalization evaluation method and system that effectively address two problems of the prior art: unfair competition results caused by subjective factors or reviewer fatigue, and inconsistent evaluation of videos captured from different viewing angles.
Drawings
FIG. 1 is a flow chart of a method and system for evaluating normalization of human actions according to the present invention;
FIG. 2 is a schematic diagram of a system structure of a method and a system for evaluating normalization of human actions according to the present invention;
fig. 3 is a two-dimensional human body coordinate diagram of a human body motion normalization evaluation method and system according to the present invention.
Detailed Description
The following describes embodiments of the invention in detail with reference to the accompanying drawings. The embodiments described are only some, not all, of the possible embodiments; for clarity, material not related to the invention is omitted from the drawings and the description.
As shown in fig. 1, the invention provides a human action normalization evaluation method comprising the following steps: S1, model the human body in the action video to be evaluated and in a standard action video to obtain human skeleton key point sequences; S2, interpolate the missing values at positions where key point coordinates are missing in all action videos; S3, divide the acquired human skeleton key point sequences into sequence blocks of equal length; S4, extract from the human skeleton key point sequences feature information that contains only the human action elements; S5, after temporal alignment, compare the feature information of the action video to be evaluated with that of the standard action video and generate an action normalization evaluation score.
In step S1, human skeleton information in the action video is extracted by the OpenPose algorithm to obtain the human skeleton key point sequence.
Common interpolation algorithms for missing values in time series include mean interpolation, median interpolation, regression interpolation, and nearest-neighbor interpolation. Because video naturally carries good contextual information, an average-interpolation scheme is a natural choice for filling in missing coordinates: if the coordinate P_(t,i) of the i-th key point is missing in frame t but the corresponding key point is present in the previous and following frames, its predicted value is the mean of those two coordinates, P_(t,i) = (P_(t-1,i) + P_(t+1,i)) / 2.
the step S2 comprises the following substeps: s21, finding out the front and rear nearest neighbor frames of the key point coordinate missing value, and carrying out feature weighting calculation on the front and rear nearest neighbor frames to obtain P ave :And t is j T, wherein T is the frame where the coordinate of the missing key point is located, T1 and T2 respectively represent two frames which are nearest to the frame before and after the frame T and have no missing corresponding to the coordinate of the key point, and T is the total frame number of the video; s22, segmenting the sequence two according to the position of the missing value of the key point coordinate, performing polynomial regression on key point data corresponding to each segment of data, and obtaining regression predicted values of the time sequence of the front segment and the rear segment according to the missing key point: p (P) before =y j ;j=0,1,...,i-1;P after =y j The method comprises the steps of carrying out a first treatment on the surface of the j=i+1, i+2,.. T, in which y j The prediction result of polynomial regression is that T is the total frame number of the video; s23, predicting the obtained P ave 、P before 、P after Weighting calculation is carried out, and a final prediction result is obtained: p (P) t,i = 1 / 2 P ave + 1 / 4 P before + 1 / 4 P after 。
In step S3, the input two-dimensional human skeleton key point sequence is divided into sequence blocks of equal length using a sliding window of size w and stride r.
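The windowing step is straightforward; a minimal sketch (window size w and stride r are parameters chosen by the user):

```python
# Split a key point sequence of length T into fixed-length blocks
# using a sliding window of size w and stride r; trailing frames
# that do not fill a whole window are dropped.
def split_into_blocks(sequence, w, r):
    return [sequence[i:i + w] for i in range(0, len(sequence) - w + 1, r)]
```

For example, a 10-frame sequence with w = 4 and r = 2 yields four overlapping blocks starting at frames 0, 2, 4, and 6.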
The step S5 comprises the following substeps: S51, compute through the DTW algorithm the best matching path W between the two sets of human motion information feature vectors F_1 and F_2: W = {(1,1), ..., (x,y), ..., (n1,n2)}, where a pair (x,y) indicates that element x of one feature vector sequence is aligned with element y of the other, and n1 and n2 are the last elements of the two sequences to be aligned; S52, compute the cosine similarity between every matched pair of motion feature vectors on the best matching path W to obtain the similarity set S of human skeleton data sequence blocks: cos(x, y) = Σ_k x_k·y_k / (√(Σ_k x_k²)·√(Σ_k y_k²)), where x_k and y_k are the k-th elements of the two motion feature vectors; S53, average the elements in the set S and normalize to obtain the final score: score = (1/n_s) Σ_{s∈S} s, where n_s is the number of elements in the set S.
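A minimal sketch of S51-S53, assuming plain Python lists of nonzero feature vectors. The textbook dynamic-programming DTW with cosine distance as step cost, and the final mapping of the mean cosine similarity from [-1, 1] into [0, 1], are assumptions; the patent does not fix a specific DTW implementation or normalization.

```python
import math

def cosine(x, y):
    # Cosine similarity between two (nonzero) feature vectors.
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

def dtw_path(F1, F2):
    # S51: DP table of cumulative cosine distance, then backtrack the path W.
    n1, n2 = len(F1), len(F2)
    INF = float("inf")
    D = [[INF] * (n2 + 1) for _ in range(n1 + 1)]
    D[0][0] = 0.0
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            cost = 1.0 - cosine(F1[i - 1], F2[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    path, i, j = [], n1, n2
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    return list(reversed(path))

def normativity_score(F1, F2):
    # S52: cosine similarity along the matched path;
    # S53: average, then map from [-1, 1] into [0, 1] (assumed normalization).
    sims = [cosine(F1[i], F2[j]) for i, j in dtw_path(F1, F2)]
    return (sum(sims) / len(sims) + 1.0) / 2.0
```

Comparing a feature sequence with itself aligns along the diagonal and yields the maximum score of 1.0.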
As shown in fig. 2, the invention provides a human action normalization evaluation system comprising: a pose estimation module, a human skeleton key point sequence missing-coordinate interpolation module, a human skeleton key point sequence segmentation module, a human joint angle feature extraction module, and a human action normalization evaluation module.
The pose estimation module is used to model the human action poses in the video. The missing-coordinate interpolation module completes key points missing from video frames. The human action normalization evaluation module compares the human body features of the specified video with those of the video to be evaluated and produces a score evaluation from the comparison.
The system further comprises an autoencoder network module holding an autoencoder model for extracting and analyzing action information features. The autoencoder model comprises a motion information encoder, a skeleton structure information encoder, a camera-view information encoder, and a decoder. The three encoders decouple the human skeleton key point sequence into three feature vectors: A1, a time-dependent motion information feature vector, representing the motion-related information of the human body; A2, a time-independent human skeleton structure feature, representing the structure of the human skeleton; A3, a time-independent camera-view feature, representing the camera angle at the time the action video was captured.
The decoder recombines the three independent feature vectors obtained by encoding, in order, to reconstruct the corresponding human skeleton key point sequence, compares it with a given ground-truth sample, and computes the corresponding loss.
Extracting the two-dimensional human skeleton key point sequence: the human action poses in the standard action video and the action video to be evaluated are modeled with an existing pose estimation algorithm; that is, human skeleton information is extracted from the action videos by the pose estimation algorithm to obtain the human skeleton key point sequences, as shown in fig. 3.
Interpolation of missing coordinate values in the human skeleton key point sequence: although the OpenPose algorithm achieves high accuracy for single-person detection, coordinate values may still be missing from the extracted key point sequence, for example because of occlusion of the body or motion blur caused by fast movement. Because the coordinates of a given key point change continuously across the video stream, a missing coordinate in one frame also degrades the information available from the neighboring frames. Only by completing the missing key points can the extracted skeleton data fully express the action pose of the human body in the video, which in turn improves the accuracy of the subsequent action similarity evaluation.
Let the two-dimensional human skeleton key point sequences corresponding to the standard action video and the action video to be evaluated have lengths T_1 and T_2 respectively. A sliding window of size w and stride r divides the two sequences into two sets of human skeleton key point data sequence blocks (patches), X_1 and X_2.
The elements of X_1 and X_2 are then fed in turn to the motion encoder of the trained autoencoder network to extract motion information feature vectors, yielding two sets of human motion information feature vectors, F_1 and F_2.
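The block-to-feature pipeline can be sketched as follows. The encoder here is only a stand-in for the trained motion encoder (a per-joint mean of frame-to-frame displacements), chosen so the sketch runs without the network; the function names are illustrative.

```python
# Stand-in motion encoder: summarises a sequence block as the average
# per-joint displacement between consecutive frames (NOT the patent's
# trained encoder, just a runnable placeholder with the same interface).
def motion_encoder(block):
    # block: list of frames, each frame a list of (x, y) key points
    T, J = len(block), len(block[0])
    feats = []
    for j in range(J):
        dx = sum(block[t + 1][j][0] - block[t][j][0] for t in range(T - 1))
        dy = sum(block[t + 1][j][1] - block[t][j][1] for t in range(T - 1))
        feats.extend([dx / (T - 1), dy / (T - 1)])
    return feats

def extract_features(blocks):
    # Map every sequence block in X_1 (or X_2) to one feature vector in F_1 (or F_2).
    return [motion_encoder(b) for b in blocks]
```

With the real system, `motion_encoder` would be replaced by a forward pass through the trained motion information encoder.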
The invention constrains and separates the three latent features, and reconstructs the corresponding two-dimensional human skeleton key point sequence, using a loss function composed of three terms: a cross-reconstruction loss, a reconstruction loss, and a triplet loss.
Cross-reconstruction loss: when training the autoencoder network, each iteration randomly selects a pair of samples from the dataset and passes them through the encoders and decoder to output two-dimensional human action sequences. The cross-reconstruction loss L_cross is used to minimize the difference between the input and the reconstructed output.
reconstruction loss: in addition to cross-reconstruction, at each iterative training, a network is also required to reconstruct the original inputSample entering:thus, the total reconstruction loss function is: l (L) rec_corss =L rec +L cross 。
Triplet loss: the cross-reconstruction and reconstruction losses ensure that actions of the same category are reassembled into the corresponding two-dimensional skeleton key point sequence after encoding and decoding, but they impose no explicit separation between the different latent features, so the latent space of one attribute may still contain information about the other two. To enhance feature separation, and so that action samples of the same class cluster well in the latent space, the invention introduces the triplet loss from deep metric learning to increase inter-class distances and reduce intra-class distances. The goal of the triplet loss is to make the distance between the positive sample and the anchor sample smaller, and the distance between the negative sample and the anchor sample larger, in the embedding space, so that the anchor-positive distance is less than the anchor-negative distance: L_triplet_M = max(d(a, p) − d(a, n) + m, 0), where a is the anchor sample, p and n are the positive and negative samples, d is the distance in the embedding space, and m is the margin between positive and negative samples, here set to 0.3. Similarly, the skeleton structure and camera-view encoders use triplet losses of the same form, L_triplet_S and L_triplet_V, to enforce the separation of their features.
thus, the total triplet loss function is: l (L) triplct =L triplct_M +L triplct_S +L triplct_V Adding the two loss functions to obtain a total loss function as follows: l=l rec_cross +L triplct 。
The foregoing is merely a preferred embodiment of the invention. The invention is not limited to the form disclosed here; numerous other combinations, modifications, and environments are possible within the scope of the inventive concept, whether through the teaching above or through the skill and knowledge of the relevant art. Modifications and variations that do not depart from the spirit and scope of the invention are intended to fall within the scope of the appended claims.
Claims (12)
1. A human action normalization evaluation system, comprising: a pose estimation module, a human skeleton key point sequence missing-coordinate interpolation module, a human skeleton key point sequence segmentation module, a human joint angle feature extraction module, and a human action normalization evaluation module.
2. The human action normalization evaluation system of claim 1, wherein the pose estimation module is configured to model human action poses in a video.
3. The human action normalization evaluation system of claim 1, wherein the missing-coordinate interpolation module is configured to complete key points missing from video frames.
4. The human action normalization evaluation system of claim 1, wherein the human action normalization evaluation module is configured to compare the human body features of the specified video with those of the video to be evaluated, and to produce a score evaluation from the comparison.
5. The human action normalization evaluation system of claim 1, further comprising an autoencoder network module; the autoencoder network module holds an autoencoder model for extracting and analyzing action information features.
6. The human action normalization evaluation system of claim 5, wherein the autoencoder model comprises: a motion information encoder, a skeleton structure information encoder, a camera-view information encoder, and a decoder; the three encoders decouple the human skeleton key point sequence into the following three feature vectors: A1, a time-dependent motion information feature vector, representing the motion-related information of the human body; A2, a time-independent human skeleton structure feature, representing the structure of the human skeleton; A3, a time-independent camera-view feature, representing the camera angle at the time the action video was captured.
7. The human action normalization evaluation system of claim 6, wherein the decoder is configured to recombine the three independent feature vectors obtained by encoding, in order, to reconstruct the corresponding human skeleton key point sequence, compare it with a given ground-truth sample, and compute the corresponding loss.
8. A human action normalization evaluation method based on the human action normalization evaluation system according to any one of claims 1 to 7, characterized by comprising the steps of: S1, model the human body in the action video to be evaluated and in a standard action video to obtain human skeleton key point sequences; S2, interpolate the missing values at positions where key point coordinates are missing in all action videos; S3, divide the acquired human skeleton key point sequences into sequence blocks of equal length; S4, extract from the human skeleton key point sequences feature information that contains only the human action elements; S5, after temporal alignment, compare the feature information of the action video to be evaluated with that of the standard action video and generate an action normalization evaluation score.
9. The human action normalization evaluation method of claim 8, wherein in step S1, human skeleton information in the action video is extracted by the OpenPose algorithm to obtain a human skeleton key point sequence.
10. The human action normalization evaluation method of claim 8, wherein step S2 comprises the substeps: S21, find the nearest frames before and after the frame with the missing coordinate in which that key point is present, denoted t1 and t2, and compute a weighted combination of their key point coordinates to obtain P_ave, where t is the frame containing the missing key point coordinate and T is the total number of frames in the video; S22, split the sequence into two segments at the position of the missing coordinate, fit a polynomial regression to the key point data of each segment, and obtain regression predictions for the missing key point from the earlier and later segments: P_before = y_j, j = 0, 1, ..., i-1, and P_after = y_j, j = i+1, i+2, ..., T, where y_j is the polynomial-regression prediction and T is the total number of frames; S23, combine the three predictions P_ave, P_before, and P_after with fixed weights to obtain the final prediction: P_(t,i) = (1/2)P_ave + (1/4)P_before + (1/4)P_after.
11. The method for evaluating the normalization of human motion according to claim 8, wherein in said step S3, the input two-dimensional human skeleton key point sequence is divided into sequence blocks of equal length by a sliding window of window size w and step size r.
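The sliding-window segmentation of step S3 amounts to taking every window of w consecutive frames, advancing by r frames each time; `sliding_window_blocks` is a hypothetical helper name for illustration:

```python
def sliding_window_blocks(seq, w, r):
    """Divide a per-frame key point sequence into equal-length blocks
    using a sliding window of size w and step r (claim 11)."""
    return [seq[i:i + w] for i in range(0, len(seq) - w + 1, r)]
```

For a 10-frame sequence with w = 4 and r = 2 this yields four overlapping blocks starting at frames 0, 2, 4, and 6.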
12. The method of evaluating the normalization of human actions according to claim 8, wherein said step S5 comprises the sub-steps of:
S51, calculating, by the DTW algorithm, the best matching path matrix W between the two human motion information feature vector sets F_1 and F_2:
W = {(1, 1), ..., (x, y), ..., (n1, n2)},
where a pair (x, y) indicates that element x of one motion information feature vector is aligned with element y of the other, and n1 and n2 are the last elements of the two motion information feature vectors to be aligned;
S52, calculating the cosine similarity between every successfully matched pair of motion information feature vectors on the best matching path matrix W to obtain the similarity set S of human skeleton data sequence blocks:
s = (Σ_k x_k·y_k) / (√(Σ_k x_k²)·√(Σ_k y_k²)),
where x_k and y_k are respectively the kth elements of the two motion information feature vectors;
S53, averaging the elements of the set S and normalizing the result:
score = (1/n_s)·Σ_{s∈S} s,
where n_s is the number of elements in the set S.
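Sub-steps S51–S53 can be sketched as below. The claim does not specify the local cost used by DTW or the backtracking rule, so using cosine distance as the step cost and greedy minimum-cost backtracking are assumptions; `dtw_path` and `normalization_score` are hypothetical helper names:

```python
import numpy as np

def cosine_similarity(x, y):
    """S52: cosine similarity between two feature vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def dtw_path(F1, F2):
    """S51: classic DTW alignment between two feature-vector sequences,
    returning the 1-indexed best matching path {(1,1), ..., (n1,n2)}.
    Cosine distance as the local cost is an assumption."""
    n1, n2 = len(F1), len(F2)
    cost = np.full((n1 + 1, n2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            d = 1.0 - cosine_similarity(F1[i - 1], F2[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1],
                                 cost[i - 1, j - 1])
    # Backtrack from (n1, n2) to (1, 1) along the cheapest predecessors.
    path, i, j = [], n1, n2
    while (i, j) != (1, 1):
        path.append((i, j))
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                   key=lambda p: cost[p])
    path.append((1, 1))
    return path[::-1]

def normalization_score(F1, F2):
    """S52-S53: score each matched pair on the path, then average."""
    sims = [cosine_similarity(F1[i - 1], F2[j - 1])
            for i, j in dtw_path(F1, F2)]
    return sum(sims) / len(sims)
```

Two identical feature sequences align along the diagonal, every matched pair has cosine similarity 1, and the averaged score is 1.0.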
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310441995.4A CN116597507A (en) | 2023-04-23 | 2023-04-23 | Human body action normalization evaluation method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116597507A true CN116597507A (en) | 2023-08-15 |
Family
ID=87603581
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310441995.4A Pending CN116597507A (en) | 2023-04-23 | 2023-04-23 | Human body action normalization evaluation method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116597507A (en) |
Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN117809849A (en) * | 2024-02-29 | 2024-04-02 | 四川赛尔斯科技有限公司 | Analysis method and system for walking postures of old people with cognitive dysfunction
CN117809849B (en) * | 2024-02-29 | 2024-05-03 | 四川赛尔斯科技有限公司 | Analysis method and system for walking postures of old people with cognitive dysfunction
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||