CN112464808B - Rope skipping gesture and number identification method based on computer vision - Google Patents

Rope skipping gesture and number identification method based on computer vision

Info

Publication number
CN112464808B
CN112464808B (application CN202011352942.8A)
Authority
CN
China
Prior art keywords
rope skipping
key
key points
coordinate
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011352942.8A
Other languages
Chinese (zh)
Other versions
CN112464808A (en)
Inventor
杨婷
肖利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Shufeng Technology Co ltd
Chengdu Ruima Technology Co ltd
Original Assignee
Hangzhou Shufeng Technology Co ltd
Chengdu Ruima Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Shufeng Technology Co ltd, Chengdu Ruima Technology Co ltd filed Critical Hangzhou Shufeng Technology Co ltd
Priority to CN202011352942.8A priority Critical patent/CN112464808B/en
Publication of CN112464808A publication Critical patent/CN112464808A/en
Application granted granted Critical
Publication of CN112464808B publication Critical patent/CN112464808B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a rope skipping posture and number identification method based on computer vision. An ST-GCN model is trained on the skeleton sequence topological graph structure of each frame of image with extracted features; softmax is applied to the features extracted by the ST-GCN network to output an action classification and confidence, and if the output classification is rope skipping, the rope skipping actions are counted; the cosine similarity between the standard-action feature vector and the current-action feature vector is computed, and when the cosine similarity is greater than a preset threshold the current action is judged to be a standard action. The invention can judge whether the rope skipping posture is correct by using a GCN and a cosine similarity algorithm, and can count the number of skips by using a maximum or minimum value method.

Description

Rope skipping posture and number identification method based on computer vision
Technical Field
The invention relates to the field of human body action analysis, in particular to a rope skipping posture and number identification method based on computer vision.
Background
Rope skipping provides aerobic and anaerobic exercise at the same time, exercises muscles in multiple parts of the body, trains coordination and balance, and enhances cardio-pulmonary function. Long-term pattern rope skipping training can effectively improve the bone density of juveniles and significantly enhance physical qualities such as strength and explosive power. Rope skipping posture recognition means recognizing the skipping posture, comparing it with a standard posture, and correcting wrong skipping postures. Rope skipping number statistics refers to counting jumps for different skipping styles.
The method adopted in the prior art is to recognize human body key points on consecutive video frames with a human body key point recognition algorithm in a computer vision manner, to feed the key points into a GCN to recognize rope skipping actions and postures, and to compare them with standard postures so that wrong actions are warned against, avoiding harm to the human body caused by an improper skipping manner. The number of skips is further counted in real time according to the change rule of the key points. This eliminates the defects that manual counting is limited by concentration, that mechanical counters depend strongly on the skipping rope, and that audio-based counting places high requirements on the surrounding environment.
The GCN algorithm keeps translation invariance on data with a non-Euclidean structure, and its core is based on the spectral decomposition of the Laplacian matrix. A GCN is also a neural network layer, and the propagation rule between layers is:

H^(l+1) = δ( D̃^(-1/2) · Ã · D̃^(-1/2) · H^(l) · W^(l) )

wherein Ã = A + I and I is the identity matrix; D̃ is the degree matrix of Ã; H^(l) is the feature matrix of each layer, with H^(0) = X for the input layer; W^(l) is the trainable weight matrix of layer l; δ is a non-linear activation function.
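By way of illustration only, the propagation rule above can be sketched in a few lines of NumPy (a hypothetical example, not part of the claimed method; the adjacency matrix A, features H and weights W below are made-up placeholders):

    import numpy as np

    def gcn_layer(A, H, W):
        # A: (n, n) adjacency matrix, H: (n, c_in) node features, W: (c_in, c_out) weights
        A_tilde = A + np.eye(A.shape[0])            # add self-loops: A~ = A + I
        d = A_tilde.sum(axis=1)                     # node degrees of A~
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))      # D~^(-1/2)
        A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # symmetric normalization
        return np.maximum(A_hat @ H @ W, 0.0)       # δ taken as ReLU here

    # toy skeleton of 3 joints connected in a chain
    A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
    H = np.random.rand(3, 3)   # (x, y, score) per joint, as in step S4 below
    W = np.random.rand(3, 8)   # project 3 input channels to 8 hidden channels
    out = gcn_layer(A, H, W)   # shape (3, 8)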
The human body key points and their adjacency relations naturally form a topological structure. Compared with an ordinary CNN, a GCN can effectively extract the spatial features of a topological graph; a given set of human actions (not only rope skipping actions) is strongly related to the relations between the key points, and a GCN can effectively extract the distinct features of the key point topological graphs of different actions, so applying a GCN to extract features of the human body key points yields higher accuracy.
Disclosure of Invention
The invention aims to provide a rope skipping posture and number identification method based on computer vision, which uses a GCN and a cosine similarity calculation method to judge whether the rope skipping posture is correct and uses a maximum or minimum value method to count the number of skips, thereby solving the above problems.
The technical scheme adopted by the invention is as follows:
a rope skipping posture and number identification method based on computer vision comprises the following steps of:
step S1: collecting a video of a human body rope skipping;
step S2: detecting key points of a human body in each frame of a video to obtain the coordinate positions and the confidence degrees of the coordinates of 18 key points of the human body in each frame, and numbering the key points in sequence as 0, 1, …, 17;
step S3: processing the coordinate data of the key points, and normalizing the coordinates of the key points to the range -0.5 to 0.5;
step S4: constructing a skeleton sequence graph structure by using the screened T key frames and the coordinates and coordinate confidences of the key points in each key frame; the spatio-temporal graph of the skeleton sequence is denoted G = (V, E), with node set V = {v_ti | t = 1, …, T; i = 0, …, n-1}; the feature vector of the i-th node of the t-th frame is F(v_ti) = {x_ti, y_ti, score_ti}, wherein (x_ti, y_ti) are the processed coordinates of the i-th key point and score_ti is the coordinate confidence of the i-th key point;
Two edge sets are constructed: the spatial structure edges E_S = {v_ti v_tj | (i, j) ∈ H}, wherein H is the set of naturally connected human body joints, and the temporal edges E_F = {v_ti v_(t+1)i}, wherein v_ti v_(t+1)i is the edge connecting the same key point in two consecutive frames. The input dimension of the network is (N, C, V, T), wherein N is the batch size, C is the number of key points (18), V is 3 and represents the processed coordinates and confidence (x, y, score) of each key point, and T is the number of key frames.
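For illustration, the skeleton spatio-temporal graph and the (N, C, V, T) input of step S4 could be assembled roughly as follows (a hypothetical Python sketch; the joint-connection list H_edges shown here is only a small subset of the real naturally connected joint set H):

    import numpy as np

    T, n = 30, 18                      # number of key frames and key points per frame
    # small illustrative subset of naturally connected joints (i, j)
    H_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (5, 6), (6, 7)]

    # node features F(v_ti) = (x_ti, y_ti, score_ti); random placeholders here
    F = np.random.rand(T, n, 3)

    # spatial edges E_S: naturally connected joints within the same frame
    E_S = [((t, i), (t, j)) for t in range(T) for (i, j) in H_edges]
    # temporal edges E_F: the same key point in two consecutive frames
    E_F = [((t, i), (t + 1, i)) for t in range(T - 1) for i in range(n)]

    # network input of dimension (N, C, V, T): batch of 1, 18 key points, 3 values, T frames
    x = F.transpose(1, 2, 0)[np.newaxis]   # shape (1, 18, 3, 30)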
Step S5: extracting features by using a spatial graph convolution and a temporal convolution, wherein the spatial graph convolution is as follows:
f_out(v_ti) = Σ_{v_tj ∈ B(v_ti)} (1 / Z_ti(v_tj)) · f_in(p(v_ti, v_tj)) · w(v_ti, v_tj)

wherein the normalization term Z_ti(v_tj) = |{v_tk | l_ti(v_tk) = l_ti(v_tj)}| equals the cardinality of the corresponding subset; p(v_ti, v_tj) is the sampling function, with p(v_ti, v_tj) = v_tj; the neighbor set is defined as B(v_ti) = {v_tj | d(v_tj, v_ti) ≤ D}, wherein d(v_tj, v_ti) is the shortest distance from v_tj to v_ti;

the neighbors obtained by the sampling function p(v_ti, v_tj) are divided into different subsets, each subset having a numeric label; a neighbor is mapped to its subset label by l_ti: B(v_ti) → {0, …, K-1}, and the weight equation is w(v_ti, v_tj) = w′(l_ti(v_tj));
Extending the spatial-domain model into the time domain, the sampling function of the temporal graph convolution becomes

B(v_ti) = {v_qj | d(v_tj, v_ti) ≤ K, |q − t| ≤ ⌊Γ/2⌋}

wherein Γ is the size of the temporal convolution kernel, and the weight equation becomes w(v_ti, v_qj) = w′(l_ST(v_qj)), with the spatio-temporal label map l_ST(v_qj) = l_ti(v_tj) + (q − t + ⌊Γ/2⌋) × K.
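The partitioned spatial graph convolution above can be read as a sum over the K subset labels, each with its own normalized adjacency matrix and weight matrix. A hypothetical NumPy sketch (illustrative only; it uses the conventional (batch, channels, frames, joints) tensor layout rather than the (N, C, V, T) naming above, and the adjacency and weight arrays are placeholders):

    import numpy as np

    def spatial_graph_conv(x, A_parts, W_parts):
        # x: (N, C_in, T, V) input features
        # A_parts: K adjacency matrices of shape (V, V), one per subset label,
        #          assumed already normalized by the term Z_ti(v_tj)
        # W_parts: K weight matrices w'(k) of shape (C_in, C_out)
        out = 0.0
        for A_k, W_k in zip(A_parts, W_parts):
            # aggregate the neighbors of subset k, then apply that subset's weights
            out = out + np.einsum('nctv,vw,co->notw', x, A_k, W_k)
        return out

    V, K = 18, 3
    x = np.random.rand(1, 3, 30, V)                    # (x, y, score) over 30 frames
    A_parts = [np.eye(V) for _ in range(K)]            # placeholder adjacencies
    W_parts = [np.random.rand(3, 64) for _ in range(K)]
    y = spatial_graph_conv(x, A_parts, W_parts)        # shape (1, 64, 30, 18)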
Step S6: training a skeleton sequence topological graph structure of each frame of image with the extracted features by using an ST-GCN model;
step S7: performing softmax on the features extracted by the ST-GCN network, outputting action classification and confidence, and counting rope skipping actions if the action classification output is rope skipping;
step S8: the set of y coordinates of the 18 human key points over the start-to-end period of the rope skipping action is S = {(y_{1,0}, y_{2,0}, …, y_{T-1,0}, y_{T,0}), (y_{1,1}, y_{2,1}, …, y_{T-1,1}, y_{T,1}), …, (y_{1,16}, y_{2,16}, …, y_{T-1,16}, y_{T,16}), (y_{1,17}, y_{2,17}, …, y_{T-1,17}, y_{T,17})}, wherein y_{t,i} is the y coordinate of the i-th node of the t-th frame; each element of the set S is smoothed, that is, every n consecutive numbers are averaged, to obtain a new set S1;
The set of maxima of each element of S1 is found and recorded as Ma, and the set of minima is recorded as Mi. The extrema are determined as follows: if a_i is a maximum, then a_i satisfies a_i > a_{i-1} and a_i ≥ a_{i+1}; if a_i is a minimum, then a_i satisfies a_i < a_{i-1} and a_i ≤ a_{i+1};
The lengths of the elements of Ma and Mi are counted to generate a dictionary record D, wherein each key of D is a length and the corresponding value is the number of occurrences of that length; the values of D are sorted from large to small, and the key corresponding to the largest value, minus 1, is the number of rope skips;
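A hedged sketch of the smoothing and extremum-based counting of step S8 (hypothetical Python; the smoothing window n and the variable names are illustrative, not taken from the patent):

    import numpy as np
    from collections import Counter

    def count_skips(S, n=3):
        # S: 18 sequences, each holding the y coordinate of one key point over the action
        lengths = []
        for y in S:
            y = np.asarray(y, dtype=float)
            # smooth: average every n consecutive values to form an element of S1
            s1 = np.array([y[k:k + n].mean() for k in range(0, len(y) - n + 1, n)])
            maxima = [s1[i] for i in range(1, len(s1) - 1)
                      if s1[i] > s1[i - 1] and s1[i] >= s1[i + 1]]   # set Ma
            minima = [s1[i] for i in range(1, len(s1) - 1)
                      if s1[i] < s1[i - 1] and s1[i] <= s1[i + 1]]   # set Mi
            lengths.append(len(maxima))
            lengths.append(len(minima))
        # dictionary D: key = length, value = how often that length occurs
        D = Counter(lengths)
        most_common_length, _ = D.most_common(1)[0]
        return most_common_length - 1   # number of skips, per step S8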
step S9: constructing a skeleton included-angle vector to calculate cosine similarity, comparing the feature vector of included angles S_A = {An(a)(b)-(c)(d)}, wherein An(a)(b)-(c)(d) denotes the angle between the vector from key point a to key point b and the vector from key point c to key point d;
step S10: computing the cosine similarity between the standard action feature vector X and the current action feature vector Y, wherein x_i is the i-th component of X and y_i is the i-th component of Y; when the cosine similarity is greater than a preset threshold, the current action is judged to be a standard action.
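A minimal sketch of the comparison in steps S9 and S10 (hypothetical Python; the angle pairs listed here are only a subset of the full set S_A given below, and the threshold value 0.9 is a placeholder, not the patent's preset threshold):

    import numpy as np

    def joint_angle(p, a, b, c, d):
        # An(a)(b)-(c)(d): angle between the vector a->b and the vector c->d
        v1, v2 = p[b] - p[a], p[d] - p[c]
        cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
        return np.arccos(np.clip(cos, -1.0, 1.0))

    def is_standard(current_pts, standard_pts, angle_pairs, threshold=0.9):
        # current_pts / standard_pts: (18, 2) arrays of key point coordinates
        # build the included-angle feature vectors of the standard and current actions
        x = np.array([joint_angle(standard_pts, *pair) for pair in angle_pairs])
        y = np.array([joint_angle(current_pts, *pair) for pair in angle_pairs])
        cos_sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-8)
        return cos_sim > threshold   # standard action when similarity exceeds the threshold

    # e.g. An(1)(2)-(2)(3), An(1)(5)-(5)(6), An(1)(2)-(1)(5)
    angle_pairs = [(1, 2, 2, 3), (1, 5, 5, 6), (1, 2, 1, 5)]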
To better implement the present solution, further, the feature vector of included angles in step S9 is specifically the set: S_A = {An(0)(1)-(1)(5), An(0)(1)-(1)(2), An(1)(2)-(1)(5), An(1)(2)-(2)(3), An(2)(3)-(3)(4), An(1)(5)-(5)(6), An(5)(6)-(6)(7), An(1)(2)-(1)(8), An(1)(5)-(1)(11), An(1)(8)-(8)(9), An(8)(9)-(9)(10), An(1)(11)-(11)(12), An(11)(12)-(12)(13), An(2)(8)-(2)(3), An(5)(11)-(5)(6)}, wherein the 18 key points are numbered: nose 0, neck 1, right shoulder 2, right elbow 3, right wrist 4, left shoulder 5, left elbow 6, left wrist 7, right hip 8, right knee 9, right ankle 10, left hip 11, left knee 12, left ankle 13, right eye 14, left eye 15, right ear 16, left ear 17.
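For reference, the key point numbering and the full angle set S_A above could be written out as data for such a sketch (hypothetical Python; the names mirror the numbering given in this paragraph):

    KEYPOINTS = {
        0: "nose", 1: "neck", 2: "right shoulder", 3: "right elbow", 4: "right wrist",
        5: "left shoulder", 6: "left elbow", 7: "left wrist", 8: "right hip",
        9: "right knee", 10: "right ankle", 11: "left hip", 12: "left knee",
        13: "left ankle", 14: "right eye", 15: "left eye", 16: "right ear", 17: "left ear",
    }

    # each tuple (a, b, c, d) encodes An(a)(b)-(c)(d) from the set S_A
    S_A_PAIRS = [
        (0, 1, 1, 5), (0, 1, 1, 2), (1, 2, 1, 5), (1, 2, 2, 3), (2, 3, 3, 4),
        (1, 5, 5, 6), (5, 6, 6, 7), (1, 2, 1, 8), (1, 5, 1, 11), (1, 8, 8, 9),
        (8, 9, 9, 10), (1, 11, 11, 12), (11, 12, 12, 13), (2, 8, 2, 3), (5, 11, 5, 6),
    ]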
In order to better implement the present solution, further, the cosine similarity in step S10 is calculated by taking the cosine of each feature-vector included angle and computing the variance or standard deviation between these cosines and the cosines of the preset (standard) feature-vector included angles; when the variance is used, the preset threshold is generally 0.68.
In order to better implement the present solution, further, the method for training the skeleton sequence topological graph structure of each frame of image with extracted features by using the ST-GCN model in step S6 comprises: first performing batch normalization on the input data, then passing it through 9 ST-GCN units, then obtaining a 256-dimensional feature vector for each sequence by global pooling, and finally classifying with a SoftMax function to obtain the final label;
each ST-GCN unit adopts a Resnet structure; the first three layers output 64 channels, the middle three layers output 128 channels, and the last three layers output 256 channels; after each ST-GCN unit, features are randomly dropped out with probability 0.5, and the strides of the 4th and 7th temporal convolution layers are set to 2;
training uses SGD with a learning rate of 0.01, and the learning rate is reduced by 10% every 20 epochs.
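A hedged sketch of this training configuration using PyTorch's SGD optimizer and a step learning-rate schedule (illustrative only; the placeholder linear layer stands in for the real 9-unit ST-GCN network, whose channel plan and strides are listed as plain data):

    import torch

    # channel plan for the 9 ST-GCN units: 3 x 64, 3 x 128, 3 x 256;
    # temporal stride 2 at the 4th and 7th units, dropout 0.5 after each unit
    unit_channels = [64, 64, 64, 128, 128, 128, 256, 256, 256]
    unit_strides = [1, 1, 1, 2, 1, 1, 2, 1, 1]
    dropout_p = 0.5

    model = torch.nn.Linear(256, 2)   # placeholder for the real ST-GCN model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    # reduce the learning rate by 10% every 20 epochs (gamma = 0.9)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.9)

    for epoch in range(80):
        optimizer.zero_grad()
        loss = model(torch.randn(4, 256)).sum()   # dummy forward pass and loss
        loss.backward()
        optimizer.step()
        scheduler.step()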
In order to better implement the present solution, further, the method for normalizing the coordinates of the key points to the range -0.5 to 0.5 in step S3 is specifically: the position (x0, y0) of each key point is processed to obtain the new key point coordinates (x, y):
x=(x0/w)-0.5,y=(y0/h)-0.5
Where w is the image width and h is the image height.
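A one-line illustration of this normalization (hypothetical Python):

    def normalize_keypoint(x0, y0, w, h):
        # map pixel coordinates to the range [-0.5, 0.5] relative to image width w and height h
        return (x0 / w) - 0.5, (y0 / h) - 0.5

    # e.g. the image centre of a 1920x1080 frame maps to (0.0, 0.0)
    x, y = normalize_keypoint(960, 540, 1920, 1080)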
Many prior art solutions use the variance or standard deviation of the coordinates directly to determine whether the rope skipping action is standard. The standard deviation reflects the degree of dispersion of a data set, and the variance measures the degree of deviation between a random variable and its mathematical expectation (i.e. its mean). In the prior art, the two indexes are added to obtain a score for the human key points, and a threshold is set on this score to judge whether the key points are accurate; this is done for a single target. The present scheme instead uses the cosine similarity measure, which evaluates the similarity of two vectors by the cosine of the angle between them and therefore compares two targets. Furthermore, in rope skipping, whether an action is standard is strongly related to the angles between the limbs of the human body and only weakly related to the coordinate positions themselves, i.e. the jump height or any left-right movement during skipping; using the angles between the limbs as the vector for computing cosine similarity therefore compares the current action with the standard action more effectively and judges whether the current action is standard.
In addition, the key point sequences of the video frames are generally fed into the behavior recognition algorithm with a sliding window method: assuming the input video sequence has N frames, the sliding window size is n frames and the step size is s frames, the first n-frame key point sequence starting at frame 0 is fed into the network, the window is then moved backwards by s frames to obtain the next n-frame key point sequence, and so on, continuously sliding over the video sequence to provide the inputs of the behavior recognition network. The behavior recognition network adopts a spatio-temporal graph convolution network: a GCN (graph convolution network) extracts local spatial information at the spatial level, and a TCN (temporal convolution network) implicitly learns temporal dynamics at the temporal level. That is, for each key point, not only its neighboring key points at the spatial level but also its neighbors at the temporal level are considered, extending the concept of neighborhood to the spatio-temporal dimension, so the key points are recognized more accurately.
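The sliding-window slicing described above can be sketched as a simple generator (hypothetical Python; the window size n and step s are placeholders):

    def sliding_windows(keypoint_seq, n=30, s=10):
        # keypoint_seq: per-frame key point arrays for the whole N-frame video
        # yields consecutive n-frame windows, starting at frame 0 and moving s frames each time
        for start in range(0, len(keypoint_seq) - n + 1, s):
            yield keypoint_seq[start:start + n]

    # each window becomes one input sample of the behavior recognition network, e.g.:
    # for window in sliding_windows(all_frames): prediction = recognize(window)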
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
1. The rope skipping posture and number identification method based on computer vision can judge whether the rope skipping posture is correct by using a GCN and a cosine similarity algorithm;
2. The rope skipping posture and number identification method based on computer vision counts the number of skips with a maximum or minimum value method, which is convenient and fast.
Drawings
In order to more clearly illustrate the technical solution, the drawings needed to be used in the embodiments are briefly described below, and it should be understood that, for those skilled in the art, other related drawings can be obtained according to the drawings without creative efforts, wherein:
FIG. 1 is a diagram of the ST-GCN network architecture of the present invention;
FIG. 2 is a schematic diagram of 18 key points of the human body of the present invention;
FIG. 3 is a schematic flow diagram of the present invention.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and therefore should not be considered as a limitation to the scope of protection. All other embodiments, which can be obtained by a worker skilled in the art based on the embodiments of the present invention without making creative efforts, shall fall within the protection scope of the present invention.
In the description of the present invention, it is to be noted that, unless otherwise explicitly specified or limited, the terms "disposed," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through an intermediary, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
The present invention will be described in detail with reference to fig. 1 to 3.
Example 1
A rope skipping posture and number recognition method based on computer vision is disclosed, as shown in fig. 3, and comprises the following steps which are carried out in sequence:
step S1: collecting a video of a human body rope skipping;
step S2: detecting key points of a human body in each frame of a video to obtain the coordinate positions and the confidence degrees of the coordinates of 18 key points of the human body in each frame, and numbering the key points in sequence as 0, 1, …, 17;
step S3: processing the coordinate data of the key points, and normalizing the coordinates of the key points to the range -0.5 to 0.5;
step S4: constructing a skeleton sequence graph structure by using the screened T key frames and the coordinates and coordinate confidences of the key points in each key frame; the spatio-temporal graph of the skeleton sequence is denoted G = (V, E), with node set V = {v_ti | t = 1, …, T; i = 0, …, n-1}; the feature vector of the i-th node of the t-th frame is F(v_ti) = {x_ti, y_ti, score_ti}, wherein (x_ti, y_ti) are the processed coordinates of the i-th key point and score_ti is the coordinate confidence of the i-th key point;
Two edge sets are constructed: the spatial structure edges E_S = {v_ti v_tj | (i, j) ∈ H}, wherein H is the set of naturally connected human body joints, and the temporal edges E_F = {v_ti v_(t+1)i}, wherein v_ti v_(t+1)i is the edge connecting the same key point in two consecutive frames. The input dimension of the network is (N, C, V, T), wherein N is the batch size, C is the number of key points (18), V is 3 and represents the processed coordinates and confidence (x, y, score) of each key point, and T is the number of key frames.
Step S5: extracting features by using a spatial graph convolution and a temporal convolution, wherein the spatial graph convolution is as follows:
f_out(v_ti) = Σ_{v_tj ∈ B(v_ti)} (1 / Z_ti(v_tj)) · f_in(p(v_ti, v_tj)) · w(v_ti, v_tj)

wherein the normalization term Z_ti(v_tj) = |{v_tk | l_ti(v_tk) = l_ti(v_tj)}| equals the cardinality of the corresponding subset; p(v_ti, v_tj) is the sampling function, with p(v_ti, v_tj) = v_tj; the neighbor set is defined as B(v_ti) = {v_tj | d(v_tj, v_ti) ≤ D}, wherein d(v_tj, v_ti) is the shortest distance from v_tj to v_ti;

the neighbors obtained by the sampling function p(v_ti, v_tj) are divided into different subsets, each subset having a numeric label; a neighbor is mapped to its subset label by l_ti: B(v_ti) → {0, …, K-1}, and the weight equation is w(v_ti, v_tj) = w′(l_ti(v_tj));
Extending the spatial-domain model into the time domain, the sampling function of the temporal graph convolution becomes

B(v_ti) = {v_qj | d(v_tj, v_ti) ≤ K, |q − t| ≤ ⌊Γ/2⌋}

wherein Γ is the size of the temporal convolution kernel, and the weight equation becomes w(v_ti, v_qj) = w′(l_ST(v_qj)), with the spatio-temporal label map l_ST(v_qj) = l_ti(v_tj) + (q − t + ⌊Γ/2⌋) × K.
Step S6: training a skeleton sequence topological graph structure of each frame of image with the extracted features by using an ST-GCN model;
step S7: performing softmax on the features extracted by the ST-GCN network, outputting action classification and confidence, and counting rope skipping actions if the action classification output is rope skipping;
step S8: the set of y coordinates of the 18 human key points over the start-to-end period of the rope skipping action is S = {(y_{1,0}, y_{2,0}, …, y_{T-1,0}, y_{T,0}), (y_{1,1}, y_{2,1}, …, y_{T-1,1}, y_{T,1}), …, (y_{1,16}, y_{2,16}, …, y_{T-1,16}, y_{T,16}), (y_{1,17}, y_{2,17}, …, y_{T-1,17}, y_{T,17})}, wherein y_{t,i} is the y coordinate of the i-th node of the t-th frame; each element of the set S is smoothed, that is, every n consecutive numbers are averaged, to obtain a new set S1;
The set of maxima of each element of S1 is found and recorded as Ma, and the set of minima is recorded as Mi. The extrema are determined as follows: if a_i is a maximum, then a_i satisfies a_i > a_{i-1} and a_i ≥ a_{i+1}; if a_i is a minimum, then a_i satisfies a_i < a_{i-1} and a_i ≤ a_{i+1};
The lengths of the elements of Ma and Mi are counted to generate a dictionary record D, wherein each key of D is a length and the corresponding value is the number of occurrences of that length; the values of D are sorted from large to small, and the key corresponding to the largest value, minus 1, is the number of rope skips;
step S9: constructing a skeleton included-angle vector to calculate cosine similarity, comparing the feature vector of included angles S_A = {An(a)(b)-(c)(d)}, wherein An(a)(b)-(c)(d) denotes the angle between the vector from key point a to key point b and the vector from key point c to key point d;
step S10: computing the cosine similarity between the standard action feature vector X and the current action feature vector Y, wherein x_i is the i-th component of X and y_i is the i-th component of Y; when the cosine similarity is greater than a preset threshold, the current action is judged to be a standard action.
Many prior art solutions use the variance or standard deviation of the coordinates directly to determine whether the rope skipping action is standard. The standard deviation reflects the degree of dispersion of a data set, and the variance measures the degree of deviation between a random variable and its mathematical expectation (i.e. its mean). In the prior art, the two indexes are added to obtain a score for the human key points, and a threshold is set on this score to judge whether the key points are accurate; this is done for a single target. The present scheme instead uses the cosine similarity measure, which evaluates the similarity of two vectors by the cosine of the angle between them and therefore compares two targets. Furthermore, in rope skipping, whether an action is standard is strongly related to the angles between the limbs of the human body and only weakly related to the coordinate positions themselves, i.e. the jump height or any left-right movement during skipping; using the angles between the limbs as the vector for computing cosine similarity therefore compares the current action with the standard action more effectively and judges whether the current action is standard.
In addition, the key point sequences of the video frames are generally fed into the behavior recognition algorithm with a sliding window method: assuming the input video sequence has N frames, the sliding window size is n frames and the step size is s frames, the first n-frame key point sequence starting at frame 0 is fed into the network, the window is then moved backwards by s frames to obtain the next n-frame key point sequence, and so on, continuously sliding over the video sequence to provide the inputs of the behavior recognition network. The behavior recognition network adopts a spatio-temporal graph convolution network: a GCN (graph convolution network) extracts local spatial information at the spatial level, and a TCN (temporal convolution network) implicitly learns temporal dynamics at the temporal level. That is, for each key point, not only its neighboring key points at the spatial level but also its neighbors at the temporal level are considered, extending the concept of neighborhood to the spatio-temporal dimension, so the key points are recognized more accurately.
Example 2
The feature vector of included angles in step S9 is specifically the set: S_A = {An(0)(1)-(1)(5), An(0)(1)-(1)(2), An(1)(2)-(1)(5), An(1)(2)-(2)(3), An(2)(3)-(3)(4), An(1)(5)-(5)(6), An(5)(6)-(6)(7), An(1)(2)-(1)(8), An(1)(5)-(1)(11), An(1)(8)-(8)(9), An(8)(9)-(9)(10), An(1)(11)-(11)(12), An(11)(12)-(12)(13), An(2)(8)-(2)(3), An(5)(11)-(5)(6)}, wherein the 18 key points are numbered: nose 0, neck 1, right shoulder 2, right elbow 3, right wrist 4, left shoulder 5, left elbow 6, left wrist 7, right hip 8, right knee 9, right ankle 10, left hip 11, left knee 12, left ankle 13, right eye 14, left eye 15, right ear 16, left ear 17.
Further, the cosine similarity in step S10 is calculated by taking the cosine of each feature-vector included angle and computing the variance or standard deviation between these cosines and the cosines of the preset (standard) feature-vector included angles; when the variance is used, the preset threshold is generally 0.68.
Further, the method for training the skeleton sequence topological graph structure of each frame of image with extracted features by using the ST-GCN model in step S6 comprises: first performing batch normalization on the input data, then passing it through 9 ST-GCN units, then obtaining a 256-dimensional feature vector for each sequence by global pooling, and finally classifying with a SoftMax function to obtain the final label;
each ST-GCN unit adopts a Resnet structure; the first three layers output 64 channels, the middle three layers output 128 channels, and the last three layers output 256 channels; after each ST-GCN unit, features are randomly dropped out with probability 0.5, and the strides of the 4th and 7th temporal convolution layers are set to 2;
training uses SGD with a learning rate of 0.01, and the learning rate is reduced by 10% every 20 epochs.
Further, the method for normalizing the coordinates of the key points to the range -0.5 to 0.5 in step S3 is specifically: the position (x0, y0) of each key point is processed to obtain the new key point coordinates (x, y):
x=(x0/w)-0.5,y=(y0/h)-0.5
Where w is the image width and h is the image height.
Many prior art solutions use the variance or standard deviation of the coordinates directly to determine whether the rope skipping action is standard. The standard deviation reflects the degree of dispersion of a data set, and the variance measures the degree of deviation between a random variable and its mathematical expectation (i.e. its mean). In the prior art, the two indexes are added to obtain a score for the human key points, and a threshold is set on this score to judge whether the key points are accurate; this is done for a single target. The present scheme instead uses the cosine similarity measure, which evaluates the similarity of two vectors by the cosine of the angle between them and therefore compares two targets. Furthermore, in rope skipping, whether an action is standard is strongly related to the angles between the limbs of the human body and only weakly related to the coordinate positions themselves, i.e. the jump height or any left-right movement during skipping; using the angles between the limbs as the vector for computing cosine similarity therefore compares the current action with the standard action more effectively and judges whether the current action is standard.
In addition, the key point sequences of the video frames are generally fed into the behavior recognition algorithm with a sliding window method: assuming the input video sequence has N frames, the sliding window size is n frames and the step size is s frames, the first n-frame key point sequence starting at frame 0 is fed into the network, the window is then moved backwards by s frames to obtain the next n-frame key point sequence, and so on, continuously sliding over the video sequence to provide the inputs of the behavior recognition network. The behavior recognition network adopts a spatio-temporal graph convolution network: a GCN (graph convolution network) extracts local spatial information at the spatial level, and a TCN (temporal convolution network) implicitly learns temporal dynamics at the temporal level. That is, for each key point, not only its neighboring key points at the spatial level but also its neighbors at the temporal level are considered, extending the concept of neighborhood to the spatio-temporal dimension, so the key points are recognized more accurately.
Other parts of this embodiment are the same as those of embodiment 1, and thus are not described again.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications and equivalent variations of the above embodiments according to the technical spirit of the present invention are included in the scope of the present invention.

Claims (5)

1. A rope skipping posture and number identification method based on computer vision is characterized in that: comprises the following steps which are carried out in sequence:
step S1: collecting a video of a human body rope skipping;
step S2: detecting key points of a human body in each frame of a video to obtain the coordinate positions and the confidence degrees of the coordinates of 18 key points of the human body in each frame, and numbering the key points as 0, 1, …, 17;
step S3: processing the coordinate data of the key points, and normalizing the coordinates of the key points to the range -0.5 to 0.5;
step S4: constructing a skeleton sequence graph structure by using the T key frames and the coordinates and coordinate confidences of the key points in each key frame; the spatio-temporal graph of the skeleton sequence is denoted G = (V, E), with node set V = {v_ti | t = 1, …, T; i = 0, …, n-1}; the feature vector of the i-th node of the t-th frame is F(v_ti) = {x_ti, y_ti, score_ti}, wherein (x_ti, y_ti) are the processed coordinates of the i-th key point and score_ti is the coordinate confidence of the i-th key point;
constructing two edge sets: the spatial structure edges E_S = {v_ti v_tj | (i, j) ∈ H}, wherein H is the set of naturally connected human body joints, and the temporal edges E_F = {v_ti v_(t+1)i}, wherein v_ti v_(t+1)i is the edge connecting the same key point in two consecutive frames; the input dimension of the network is (N, C, V, T), wherein N is the batch size, C is the number of key points (18), V is 3 and represents the processed coordinates and confidence (x, y, score) of each key point, and T is the number of key frames;
step S5: extracting features by using spatial graph convolution and temporal convolution, wherein the spatial graph convolution is as follows:
f_out(v_ti) = Σ_{v_tj ∈ B(v_ti)} (1 / Z_ti(v_tj)) · f_in(p(v_ti, v_tj)) · w(v_ti, v_tj)

wherein the normalization term Z_ti(v_tj) = |{v_tk | l_ti(v_tk) = l_ti(v_tj)}| equals the cardinality of the corresponding subset; p(v_ti, v_tj) is the sampling function, with p(v_ti, v_tj) = v_tj; the neighbor set is defined as B(v_ti) = {v_tj | d(v_tj, v_ti) ≤ D}, wherein d(v_tj, v_ti) is the shortest distance from v_tj to v_ti;

the neighbors obtained by the sampling function p(v_ti, v_tj) are divided into different subsets, each subset having a numeric label; a neighbor is mapped to its subset label by l_ti: B(v_ti) → {0, …, K-1}, and the weight equation is w(v_ti, v_tj) = w′(l_ti(v_tj));
Extending the spatial-domain model into the time domain, the sampling function of the temporal graph convolution becomes

B(v_ti) = {v_qj | d(v_tj, v_ti) ≤ K, |q − t| ≤ ⌊Γ/2⌋}

wherein Γ is the size of the temporal convolution kernel, and the weight equation becomes w(v_ti, v_qj) = w′(l_ST(v_qj)), with the spatio-temporal label map l_ST(v_qj) = l_ti(v_tj) + (q − t + ⌊Γ/2⌋) × K;
Step S6: training a skeleton sequence topological graph structure of each frame of image with the extracted features by using an ST-CGN model;
step S7: performing softmax on the features extracted by the ST-GCN network, outputting action classification and confidence, and counting rope skipping actions if the action classification output is rope skipping;
step S8: the set of y coordinates of the 18 human key points over the start-to-end period of the rope skipping action is S = {(y_{1,0}, y_{2,0}, …, y_{T-1,0}, y_{T,0}), (y_{1,1}, y_{2,1}, …, y_{T-1,1}, y_{T,1}), …, (y_{1,16}, y_{2,16}, …, y_{T-1,16}, y_{T,16}), (y_{1,17}, y_{2,17}, …, y_{T-1,17}, y_{T,17})}, wherein y_{t,i} is the y coordinate of the i-th node of the t-th frame; each element of the set S is smoothed, that is, every n consecutive numbers are averaged, to obtain a new set S1;
The set of maxima of each element of S1 is found and marked as Ma, and the set of minima is marked as Mi. The extrema are determined as follows: if a_i is a maximum, then a_i satisfies a_i > a_{i-1} and a_i ≥ a_{i+1}; if a_i is a minimum, then a_i satisfies a_i < a_{i-1} and a_i ≤ a_{i+1};
The lengths of the elements of Ma and Mi are counted to generate a dictionary record D, wherein each key of D is a length and the corresponding value is the number of occurrences of that length; the values of D are sorted from large to small, and the key corresponding to the largest value, minus 1, is the number of rope skips;
step S9: constructing a skeleton included-angle vector to calculate cosine similarity, comparing the feature vector of included angles S_A = {An(a)(b)-(c)(d)}, wherein An(a)(b)-(c)(d) denotes the angle between the vector from key point a to key point b and the vector from key point c to key point d;
step S10: computing the cosine similarity between the standard action feature vector X and the current action feature vector Y, wherein x_i is the i-th component of X and y_i is the i-th component of Y; when the cosine similarity is greater than a preset threshold, the current action is judged to be a standard action.
2. The rope skipping posture and number identification method based on computer vision as claimed in claim 1, wherein: the feature vector of included angles in step S9 is specifically the set: S_A = {An(0)(1)-(1)(5), An(0)(1)-(1)(2), An(1)(2)-(1)(5), An(1)(2)-(2)(3), An(2)(3)-(3)(4), An(1)(5)-(5)(6), An(5)(6)-(6)(7), An(1)(2)-(1)(8), An(1)(5)-(1)(11), An(1)(8)-(8)(9), An(8)(9)-(9)(10), An(1)(11)-(11)(12), An(11)(12)-(12)(13), An(2)(8)-(2)(3), An(5)(11)-(5)(6)}, wherein the 18 key points are numbered: nose 0, neck 1, right shoulder 2, right elbow 3, right wrist 4, left shoulder 5, left elbow 6, left wrist 7, right hip 8, right knee 9, right ankle 10, left hip 11, left knee 12, left ankle 13, right eye 14, left eye 15, right ear 16, left ear 17.
3. The rope skipping posture and number identification method based on computer vision as claimed in claim 1, wherein: the cosine similarity in step S10 is calculated by taking the cosine of each feature-vector included angle and computing the variance or standard deviation between these cosines and the cosines of the preset feature-vector included angles.
4. The rope skipping posture and number identification method based on computer vision as claimed in claim 1, wherein: the method for training the skeleton sequence topological graph structure of each frame of image with extracted features by using the ST-GCN model in step S6 comprises: first performing batch normalization on the input data, then obtaining a 256-dimensional feature vector for each sequence after 9 ST-GCN units followed by global pooling, and finally classifying with a SoftMax function to obtain the final label;
each ST-GCN unit adopts a Resnet structure; the first three layers output 64 channels, the middle three layers output 128 channels, and the last three layers output 256 channels; after each ST-GCN unit, features are randomly dropped out with probability 0.5, and the strides of the 4th and 7th temporal convolution layers are set to 2;
training uses SGD with a learning rate of 0.01, and the learning rate is reduced by 10% every 20 epochs.
5. The rope skipping posture and number identification method based on computer vision as claimed in claim 1, wherein: the method for normalizing the coordinates of the key points to the range -0.5 to 0.5 in step S3 is specifically: the position (x0, y0) of each key point is processed to obtain the new key point coordinates (x, y):
x=(x0/w)-0.5,y=(y0/h)-0.5
Where w is the image width and h is the image height.
CN202011352942.8A 2020-11-26 2020-11-26 Rope skipping gesture and number identification method based on computer vision Active CN112464808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011352942.8A CN112464808B (en) 2020-11-26 2020-11-26 Rope skipping gesture and number identification method based on computer vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011352942.8A CN112464808B (en) 2020-11-26 2020-11-26 Rope skipping gesture and number identification method based on computer vision

Publications (2)

Publication Number Publication Date
CN112464808A CN112464808A (en) 2021-03-09
CN112464808B true CN112464808B (en) 2022-12-16

Family

ID=74808944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011352942.8A Active CN112464808B (en) 2020-11-26 2020-11-26 Rope skipping gesture and number identification method based on computer vision

Country Status (1)

Country Link
CN (1) CN112464808B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113137983B (en) * 2021-04-30 2023-08-22 深圳市恒星物联科技有限公司 Self-learning well lid posture monitoring method and monitoring system
CN114764946B (en) * 2021-09-18 2023-08-11 北京甲板智慧科技有限公司 Action counting method and system based on time sequence standardization and intelligent terminal
CN113893517B (en) * 2021-11-22 2022-06-17 动者科技(杭州)有限责任公司 Rope skipping true and false judgment method and system based on difference frame method
CN114343618B (en) * 2021-12-20 2024-10-11 中科视语(北京)科技有限公司 Training action detection method and device
CN114360060B (en) * 2021-12-31 2024-04-09 北京航空航天大学杭州创新研究院 Human body action recognition and counting method
CN115205750B (en) * 2022-07-05 2023-06-13 北京甲板智慧科技有限公司 Motion real-time counting method and system based on deep learning model
CN115100745B (en) * 2022-07-05 2023-06-20 北京甲板智慧科技有限公司 Swin transducer model-based motion real-time counting method and system
CN115937989B (en) * 2023-01-19 2023-09-22 山东领峰教育科技集团有限公司 Online education intelligent analysis system and method based on scaling processing
CN117612245B (en) * 2023-09-26 2024-08-06 广州乐体科技有限公司 Automatic counting method for conventional rope skipping test
CN117079192B (en) * 2023-10-12 2024-01-02 东莞先知大数据有限公司 Method, device, equipment and medium for estimating number of rope skipping when personnel are shielded
CN117253290B (en) * 2023-10-13 2024-05-10 景色智慧(北京)信息科技有限公司 Rope skipping counting implementation method and device based on yolopose model and storage medium
CN117880588A (en) * 2023-11-27 2024-04-12 无锡伙伴智能科技有限公司 Video editing method, device, equipment and storage medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016042039A1 (en) * 2014-09-16 2016-03-24 Foundation For Research And Technology - Hellas (Forth) Gesture recognition apparatuses, methods and systems for human-machine interaction
CN109758716A (en) * 2019-03-26 2019-05-17 林叶蓁 A kind of rope skipping method of counting based on acoustic information
CN109876416A (en) * 2019-03-26 2019-06-14 浙江大学 A kind of rope skipping method of counting based on image information
CN110096950A (en) * 2019-03-20 2019-08-06 西北大学 A kind of multiple features fusion Activity recognition method based on key frame
CN110245593A (en) * 2019-06-03 2019-09-17 浙江理工大学 A kind of images of gestures extraction method of key frame based on image similarity
WO2020015076A1 (en) * 2018-07-18 2020-01-23 平安科技(深圳)有限公司 Facial image comparison method and apparatus, computer device, and storage medium
CN110796110A (en) * 2019-11-05 2020-02-14 西安电子科技大学 Human behavior identification method and system based on graph convolution network
CN110991340A (en) * 2019-12-03 2020-04-10 郑州大学 Human body action analysis method based on image compression
WO2020134478A1 (en) * 2018-12-29 2020-07-02 北京灵汐科技有限公司 Face recognition method, feature extraction model training method and device thereof
CN111814719A (en) * 2020-07-17 2020-10-23 江南大学 Skeleton behavior identification method based on 3D space-time diagram convolution
CN111881731A (en) * 2020-05-19 2020-11-03 广东国链科技股份有限公司 Behavior recognition method, system, device and medium based on human skeleton
CN111985579A (en) * 2020-09-04 2020-11-24 王宗亚 Double-person diving synchronism analysis method based on camera cooperation and three-dimensional skeleton estimation

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8929600B2 (en) * 2012-12-19 2015-01-06 Microsoft Corporation Action recognition based on depth maps
US9489570B2 (en) * 2013-12-31 2016-11-08 Konica Minolta Laboratory U.S.A., Inc. Method and system for emotion and behavior recognition

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016042039A1 (en) * 2014-09-16 2016-03-24 Foundation For Research And Technology - Hellas (Forth) Gesture recognition apparatuses, methods and systems for human-machine interaction
WO2020015076A1 (en) * 2018-07-18 2020-01-23 平安科技(深圳)有限公司 Facial image comparison method and apparatus, computer device, and storage medium
WO2020134478A1 (en) * 2018-12-29 2020-07-02 北京灵汐科技有限公司 Face recognition method, feature extraction model training method and device thereof
CN110096950A (en) * 2019-03-20 2019-08-06 西北大学 A kind of multiple features fusion Activity recognition method based on key frame
CN109758716A (en) * 2019-03-26 2019-05-17 林叶蓁 A kind of rope skipping method of counting based on acoustic information
CN109876416A (en) * 2019-03-26 2019-06-14 浙江大学 A kind of rope skipping method of counting based on image information
CN110245593A (en) * 2019-06-03 2019-09-17 浙江理工大学 A kind of images of gestures extraction method of key frame based on image similarity
CN110796110A (en) * 2019-11-05 2020-02-14 西安电子科技大学 Human behavior identification method and system based on graph convolution network
CN110991340A (en) * 2019-12-03 2020-04-10 郑州大学 Human body action analysis method based on image compression
CN111881731A (en) * 2020-05-19 2020-11-03 广东国链科技股份有限公司 Behavior recognition method, system, device and medium based on human skeleton
CN111814719A (en) * 2020-07-17 2020-10-23 江南大学 Skeleton behavior identification method based on 3D space-time diagram convolution
CN111985579A (en) * 2020-09-04 2020-11-24 王宗亚 Double-person diving synchronism analysis method based on camera cooperation and three-dimensional skeleton estimation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Song S et al. An end-to-end spatio-temporal attention model for human action recognition from skeleton data. Proceedings of the AAAI Conference on Artificial Intelligence, 2017, pp. 4263-4270. *
Human behavior recognition fusing dual spatio-temporal network streams and an attention mechanism; Qiao Qingwei; China Master's Theses Full-text Database, Information Science and Technology; 2018-02-15 (No. 2); pp. I138-2110 *

Also Published As

Publication number Publication date
CN112464808A (en) 2021-03-09

Similar Documents

Publication Publication Date Title
CN112464808B (en) Rope skipping gesture and number identification method based on computer vision
CN113496217B (en) Method for identifying human face micro expression in video image sequence
Zadeh et al. Convolutional experts constrained local model for 3d facial landmark detection
CN110909651B (en) Method, device and equipment for identifying video main body characters and readable storage medium
CN109389074B (en) Facial feature point extraction-based expression recognition method
CN109344692B (en) Motion quality evaluation method and system
CN102938070B (en) A kind of behavior recognition methods based on action subspace and weight behavior model of cognition
CN107944431A (en) A kind of intelligent identification Method based on motion change
CN111783748A (en) Face recognition method and device, electronic equipment and storage medium
CN109902565B (en) Multi-feature fusion human behavior recognition method
CN105389583A (en) Image classifier generation method, and image classification method and device
KR20190061538A (en) Method and apparatus of recognizing motion pattern base on combination of multi-model
Lima et al. Simple and efficient pose-based gait recognition method for challenging environments
CN115482580A (en) Multi-person evaluation system based on machine vision skeletal tracking technology
CN106709508A (en) Typical weight correlation analysis method utilizing characteristic information
CN111259759B (en) Cross-database micro-expression recognition method and device based on domain selection migration regression
Szankin et al. Influence of thermal imagery resolution on accuracy of deep learning based face recognition
CN106971176A (en) Tracking infrared human body target method based on rarefaction representation
CN104517123A (en) Sub-spatial clustering method guided by local motion feature similarity
Canavan et al. Fitting and tracking 3D/4D facial data using a temporal deformable shape model
Chen et al. Skeleton moving pose-based human fall detection with sparse coding and temporal pyramid pooling
Das et al. Human gait recognition using deep neural networks
CN116597507A (en) Human body action normalization evaluation method and system
CN116012942A (en) Sign language teaching method, device, equipment and storage medium
Hachaj et al. Application of hidden markov models and gesture description language classifiers to oyama karate techniques recognition

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant