CN112883930A - Real-time true and false motion judgment method based on full-connection network - Google Patents

Real-time true and false motion judgment method based on full-connection network Download PDF

Info

Publication number
CN112883930A
CN112883930A (application CN202110335993.8A)
Authority
CN
China
Prior art keywords
data
key point
model
human body
sampling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110335993.8A
Other languages
Chinese (zh)
Inventor
吴友银
吕瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Movers Technology Hangzhou Co ltd
Original Assignee
Movers Technology Hangzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Movers Technology Hangzhou Co ltd filed Critical Movers Technology Hangzhou Co ltd
Priority to CN202110335993.8A priority Critical patent/CN112883930A/en
Publication of CN112883930A publication Critical patent/CN112883930A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Psychiatry (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a real-time true and false motion judgment method based on a fully-connected network. The method comprises a model training stage: acquiring a data set by inputting a motion video, frame by frame, into a human body key point detection model and outputting the key point data of the human body to form the data set samples; sampling the current motion as positive samples by oversampling and the other motions as negative samples by undersampling to form the sampled samples; selecting a training set from the sampled samples, inputting it into a fully-connected neural network, and finally calculating the Loss and updating the weights. The method further comprises a real-time judgment stage: the data to be detected are input into the model and the judgment result is output. The beneficial effects of the invention are: the method is built on a human body key point detection model, uses the human body key point data to fit a model, and identifies, through the fitted model, the type of human motion in the video and whether the motion is genuinely being performed.

Description

Real-time true and false motion judgment method based on full-connection network
Technical Field
The invention relates to the technical field of data identification, in particular to a real-time true and false motion judgment method based on a full-connection network.
Background
With the growing emphasis of the nation and society on the physical fitness of primary and secondary school students, and with the rapid development of artificial intelligence, it has become inevitable for artificial intelligence to enter the sports field. The existing approaches for calculating whether the current motion is genuine are roughly the following.
1. Traditional image difference frame method
The image transmitted by the camera is compared with the image of the previous frame; the difference between the two is the moving part.
Disadvantages: high cost, poor performance, high requirements on the environment, and inability to judge whether the motion is genuinely being performed.
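As an illustrative sketch of the frame-difference idea described above (not the claimed method; the function name and the threshold value are hypothetical), two grayscale frames can be compared pixel by pixel:

```python
import numpy as np

def motion_mask(prev_gray: np.ndarray, curr_gray: np.ndarray, thresh: int = 25) -> np.ndarray:
    """Return a boolean mask of pixels that changed between two grayscale frames."""
    # Cast to a signed type so the subtraction cannot wrap around.
    diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
    return diff > thresh

# Two tiny synthetic 4x4 "frames": one pixel brightens sharply.
prev = np.zeros((4, 4), dtype=np.uint8)
curr = prev.copy()
curr[1, 2] = 200
mask = motion_mask(prev, curr)
print(mask.sum())  # 1 changed pixel
```

The mask marks "the moving part"; as the text notes, it says nothing about whether the motion is a genuinely performed exercise.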
2. Deep learning classification (classification)
The images transmitted by the camera are classified into stages of the human movement, and the number of motions is calculated from the number of cycles through the stages.
Commonly used high-accuracy models are VGG, MobileNet, ResNet, etc.
Disadvantages: high cost and poor performance.
3. Deep learning Semantic segmentation method (Semantic segmentation)
Each pixel of the image transmitted by the camera is classified as belonging to the human body or to the background, and the judgment is made according to the change of the human-body pixels. Common accurate models are U-Net, DeepLab, etc.
Disadvantages: high cost, poor performance, high requirements on the environment, and inability to judge whether the motion is genuinely being performed.
4. Deep learning Object detection method (Object detection)
The position of the person in the image transmitted by the camera is framed with a bounding box, and the movement is judged according to the change of the bounding box. Common high-performance models are SSD, YOLO, etc.
Disadvantages: inability to judge whether the motion is genuinely being performed.
Disclosure of Invention
The invention aims to provide a real-time true and false motion judgment method based on a fully-connected network.
In order to achieve the purpose, the invention provides the following technical scheme:
The real-time true and false motion judgment method based on the fully-connected network comprises a model training stage:
acquiring a data set: the motion video is input, frame by frame, into a human body key point detection model, and the key point data of the human body are output to form the data set samples;
the current motion is sampled as positive samples by oversampling, and the other motions are sampled as negative samples by undersampling, forming the sampled samples;
a training set is selected from the sampled samples and input into a fully-connected neural network, and finally the Loss is calculated and the weights are updated;
the method further comprises a real-time judgment stage: the data to be detected are input into the model and the judgment result is output.
Preferably, the data set output from the human body key point detection model is normalized, and the normalized result is the key point's X coordinate divided by the image width and the key point's Y coordinate divided by the image height.
Preferably, the training set undergoes data enhancement processing before being input into the fully-connected neural network, the data enhancement processing comprising translation, scaling, and left-right flipping of the data.
Preferably, a random 25% of all positive samples is used as the positive-sample validation set, a random 25% of all negative samples is used as the negative-sample validation set, and the rest is used as the training set.
Preferably, the calculation of the Loss adopts the binary cross-entropy loss function:

L = −(1/N) Σ_{i=1}^{N} [ y_i log(p_i) + (1 − y_i) log(1 − p_i) ]

and the update includes the back-propagation and gradient-descent process of the fully-connected network:

δ^L = ∇_a C ⊙ σ′(z^L)    (BP1)
δ^l = ((w^(l+1))^T δ^(l+1)) ⊙ σ′(z^l)    (BP2)
∂C/∂b_j^l = δ_j^l    (BP3)
∂C/∂w_jk^l = a_k^(l−1) δ_j^l    (BP4)
Compared with the prior art, the beneficial effects of the invention are: the method is built on a human body key point detection model, uses the human body key point data to fit a model, and identifies, through the fitted model, the type of human motion in the video and whether the motion is genuinely being performed.
In addition, in data set sampling, oversampling is used for the positive samples and undersampling for the negative samples, which solves the data-imbalance problem; applied in this field, it greatly improves the accuracy.
Drawings
Fig. 1 is a schematic flow chart of a model training method according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment is built on a human body key point detection model: each frame of the video is input into a human body key point model (such as PoseNet, OpenPose, or Pose Proposal Networks) to detect the human body key points; the key points are stored as numerical data, and the true/false judgment of the motion is made from these numerical data.
Specifically, the real-time true and false motion judgment method based on the fully-connected network trains a judgment model, and the output of the key point model (PoseNet, OpenPose, Pose Proposal Networks) is the data source for training it. The model training comprises the following steps:
1. Data set acquisition and processing
(1) The collected videos of the 20 types of single-person sports are cut to their valid parts and classified.
The 20 types of single-person sports are:
1. rope skipping; 2. walking; 3. dancing; 4. high knees; 5. burpees; 6. pull-ups; 7. plank; 8. sit-ups; 9. standing forward bend; 10. seated forward bend; 11. jumping jacks; 12. non-standard rope skipping; 13. non-standard high knees; 14. non-standard burpees; 15. non-standard pull-ups; 16. non-standard plank; 17. non-standard sit-ups; 18. non-standard standing forward bend; 19. non-standard seated forward bend; 20. non-standard jumping jacks.
Cutting to the valid parts and classifying means deleting the extraneous parts of the video (the parts that are not the motion) and finally storing the videos of the different motions separately.
(2) Each frame of the video is read, the human body key point data are detected by the human body key point detection model, stored into a text file, and shuffled;
the effect of scrambling the data set: for models that are sensitive to randomness, typically NN, it is important to scramble the data. For models that are less sensitive to randomness, it is theoretically possible not to disturb. Whether the sensitivity is sensitive or not depends on the data magnitude, the complexity and the internal calculation mechanism of the algorithm, and an algorithm randomness sensitivity list with clear longitude and latitude is not available at present. Since scrambling the data does not yield a worse result, it is generally recommended to scramble the full amount of data;
(3) Data normalization.
The normalized result is the key point's x coordinate divided by the image width and the key point's y coordinate divided by the image height;
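A minimal sketch of this normalization step (the function name and the example image size are hypothetical, not taken from the patent):

```python
import numpy as np

def normalize_keypoints(kps: np.ndarray, img_w: int, img_h: int) -> np.ndarray:
    """Scale pixel coordinates into [0, 1]: x by image width, y by image height."""
    out = kps.astype(np.float64).copy()
    out[:, 0] /= img_w   # x / image width
    out[:, 1] /= img_h   # y / image height
    return out

kps = np.array([[320.0, 120.0], [160.0, 240.0]])  # (x, y) pixel coordinates
norm = normalize_keypoints(kps, img_w=640, img_h=480)
print(norm)  # [[0.5, 0.25], [0.25, 0.5]]
```

Normalizing this way makes the key point data independent of the camera resolution before it reaches the network.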
(4) The text file of the current motion is kept as the positive samples, and the text files of the other motions are kept as the negative samples;
(5) Oversampling is adopted for the positive samples and undersampling for the negative samples (solving the data set imbalance problem).
Oversampling: it increases the number of minority-class observations in the training set. The advantage of oversampling is that no information from the original training set is lost, since all observations of both the minority and the majority class are kept. On the other hand, it is prone to overfitting.
Undersampling: in contrast to oversampling, its goal is to reduce the number of majority-class samples to balance the class distribution. Because it deletes observations from the original data set, useful information may be discarded.
In this method, oversampling of the positive samples means that all positive samples are used, and undersampling of the negative samples means that as many negative samples as there are positive samples are drawn at random from all the negative samples.
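A sketch of the sampling rule just described, keeping every positive sample and randomly drawing an equal number of negatives (the function name, seed, and sample counts are hypothetical):

```python
import random

def balance(pos, neg, seed=0):
    """Keep all positive samples; randomly undersample the negatives
    down to the same count to balance the class distribution."""
    rng = random.Random(seed)
    neg_kept = rng.sample(neg, k=min(len(pos), len(neg)))
    return pos, neg_kept

pos = [f"pos_{i}" for i in range(100)]   # e.g. rope-skipping frames
neg = [f"neg_{i}" for i in range(900)]   # frames from the other 19 motions
pos_s, neg_s = balance(pos, neg)
print(len(pos_s), len(neg_s))  # 100 100
```

The result is the balanced 1:1 sample set that the text credits for the accuracy improvement.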
It should be noted that the above sampling scheme is designed specifically for the purpose of this invention, i.e. motion recognition. In motion recognition, judging a motion from a single picture is difficult, because poses occurring in different motions can be highly similar. With a traditional sampling scheme, whether plain oversampling or plain undersampling, the expected accuracy is difficult to reach; the sampling scheme above yields a scientifically balanced training set and contributes substantially to the accuracy of the final result.
(6) Data set segmentation.
A random 25% of all positive samples is taken as the positive-sample validation set, a random 25% of all negative samples as the negative-sample validation set, and the rest as the training set.
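The 75/25 split above can be sketched as follows (function name and seed are hypothetical; in practice it would be applied to the positive and negative samples separately):

```python
import random

def split_75_25(samples, seed=0):
    """Shuffle and hold out a random 25% as the validation set,
    keeping the remaining 75% for training."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    cut = len(samples) // 4                 # 25% for validation
    val = [samples[i] for i in idx[:cut]]
    train = [samples[i] for i in idx[cut:]]
    return train, val

train, val = split_75_25(list(range(200)))
print(len(train), len(val))  # 150 50
```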
Training set: the data samples used for model fitting. During training, gradient descent is performed on the training error, and the trainable weight parameters are learned.
Validation set: a sample set held out during model training; it can be used to tune the hyper-parameters of the model and to make a preliminary assessment of the model's ability.
The validation set can be used during training; typically, after every few epochs the validation set is run once to check the effect. The first benefit is that problems with the model or its parameters can be discovered in time, for example divergence on the validation set, strange outputs (e.g. infinity), or an mAP that does not grow or grows slowly; training can then be terminated early and the model or parameters adjusted, instead of waiting for training to finish. Another benefit is validating the model's generalization ability: if the effect on the validation set is much worse than on the training set, overfitting should be considered. The validation set can also be used to compare different models. In an ordinary neural network, the validation set is used to find the optimal network depth, to decide the stopping point of the back-propagation algorithm, or to choose the number of neurons in the hidden layers.
2. Training of models
a. A batch of data is taken from the shuffled training set as the network input.
b. Data Augmentation is applied: translation, scaling, and left-right flipping.
Image augmentation in computer vision introduces prior knowledge in the form of artificial visual invariances (semantic invariance). Data augmentation is essentially the simplest and most direct way to improve model performance. It brings a certain regularization effect, which reduces the structural risk of the model, and it improves the model's robustness: to some extent it makes the model focus on the general patterns in the data while ignoring data unrelated to those patterns.
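The three augmentations named above, applied to normalized keypoints rather than raw images, might look like this (function name, the choice to scale about the image centre, and the clipping to [0, 1] are all hypothetical details):

```python
import numpy as np

def augment(kps: np.ndarray, dx=0.0, dy=0.0, scale=1.0, flip=False) -> np.ndarray:
    """Translate, scale (about the image centre 0.5), and optionally mirror
    normalized (x, y) keypoints assumed to lie in [0, 1]."""
    out = kps.astype(np.float64).copy()
    out = (out - 0.5) * scale + 0.5      # scaling about the centre
    out[:, 0] += dx                       # horizontal translation
    out[:, 1] += dy                       # vertical translation
    if flip:
        out[:, 0] = 1.0 - out[:, 0]       # left-right flip
    return np.clip(out, 0.0, 1.0)

kps = np.array([[0.25, 0.5]])
flipped = augment(kps, flip=True)
print(flipped)  # x mirrored to 0.75, y unchanged
```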
c. Fully-connected classification.
The data are classified by a fully-connected neural network performing logistic regression. The output activation function of the logistic regression is the Sigmoid function, defined as:

σ(x) = 1 / (1 + e^(−x))
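A sketch of the Sigmoid output head (weights, bias, and feature values below are hypothetical, merely illustrating how the final score is squashed into a probability):

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: squashes the final layer's score into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# A single fully connected output unit acting as the logistic-regression head.
w = np.array([0.5, -0.25])   # hypothetical learned weights
b = 0.1                      # hypothetical learned bias
x = np.array([0.8, 0.4])     # normalized keypoint features
p = sigmoid(w @ x + b)       # probability that the motion is "real"
print(round(float(p), 4))
```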
d. The difference (Loss) between the network output and the label is calculated, and the network weights are updated by Back Propagation with gradient descent.
The Binary Cross-Entropy Loss Function is:

L = −(1/N) Σ_{i=1}^{N} [ y_i log(p_i) + (1 − y_i) log(1 − p_i) ]

The Back Propagation and gradient descent procedure of the fully-connected network is:

δ^L = ∇_a C ⊙ σ′(z^L)    (BP1)
δ^l = ((w^(l+1))^T δ^(l+1)) ⊙ σ′(z^l)    (BP2)
∂C/∂b_j^l = δ_j^l    (BP3)
∂C/∂w_jk^l = a_k^(l−1) δ_j^l    (BP4)
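As an illustrative exercise of the loss and update rules above (not the claimed network), a one-weight logistic regression can be trained by gradient descent; all data, names, and hyper-parameters are hypothetical. Note that with a Sigmoid output and cross-entropy loss, the output-layer error simplifies to p − y:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(y, p, eps=1e-12):
    """Binary cross-entropy averaged over the batch."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Toy separable data: one feature decides the label.
X = np.array([[0.1], [0.2], [0.8], [0.9]])
y = np.array([0.0, 0.0, 1.0, 1.0])

w, b, lr = np.zeros(1), 0.0, 1.0
losses = []
for _ in range(200):                    # gradient-descent loop
    p = sigmoid(X @ w + b)              # forward pass
    losses.append(bce(y, p))
    delta = p - y                       # output error (BP1 with sigmoid + BCE)
    w -= lr * X.T @ delta / len(y)      # BP4: weight gradient uses the inputs
    b -= lr * delta.mean()              # BP3: bias gradient is the error itself
print(losses[0] > losses[-1])  # True: the loss decreases
```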
After model training is finished, the video or picture to be detected can be input into the model to obtain the recognition result.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (5)

1. The real-time true and false motion judgment method based on the fully-connected network, characterized in that the method comprises a model training stage:
acquiring a data set: the motion video is input, frame by frame, into a human body key point detection model, and the key point data of the human body are output to form the data set samples;
the current motion is sampled as positive samples by oversampling, and the other motions are sampled as negative samples by undersampling, forming the sampled samples;
a training set is selected from the sampled samples and input into a fully-connected neural network, and finally the Loss is calculated and the weights are updated;
and the method further comprises a real-time judgment stage: the data to be detected are input into the model and the judgment result is output.
2. The method for real-time true and false motion determination based on the fully-connected network according to claim 1, wherein: the data set output from the human body key point detection model is normalized, and the normalized result is the key point's X coordinate divided by the image width and the key point's Y coordinate divided by the image height.
3. The method for real-time true and false motion determination based on the fully-connected network according to claim 2, wherein: the training set undergoes data enhancement processing before being input into the fully-connected neural network, the data enhancement processing comprising translation, scaling, and left-right flipping of the data.
4. The method for real-time true and false motion determination based on the fully-connected network according to claim 3, wherein: a random 25% of all positive samples is taken as the positive-sample validation set, a random 25% of all negative samples as the negative-sample validation set, and the rest as the training set.
5. The method for real-time true and false motion determination based on the fully-connected network according to claim 4, wherein: the calculation of the Loss adopts the binary cross-entropy loss function:

L = −(1/N) Σ_{i=1}^{N} [ y_i log(p_i) + (1 − y_i) log(1 − p_i) ]

and the update includes the back-propagation and gradient-descent process of the fully-connected network:

δ^L = ∇_a C ⊙ σ′(z^L)    (BP1)
δ^l = ((w^(l+1))^T δ^(l+1)) ⊙ σ′(z^l)    (BP2)
∂C/∂b_j^l = δ_j^l    (BP3)
∂C/∂w_jk^l = a_k^(l−1) δ_j^l    (BP4)
CN202110335993.8A 2021-03-29 2021-03-29 Real-time true and false motion judgment method based on full-connection network Pending CN112883930A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110335993.8A CN112883930A (en) 2021-03-29 2021-03-29 Real-time true and false motion judgment method based on full-connection network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110335993.8A CN112883930A (en) 2021-03-29 2021-03-29 Real-time true and false motion judgment method based on full-connection network

Publications (1)

Publication Number Publication Date
CN112883930A true CN112883930A (en) 2021-06-01

Family

ID=76039968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110335993.8A Pending CN112883930A (en) 2021-03-29 2021-03-29 Real-time true and false motion judgment method based on full-connection network

Country Status (1)

Country Link
CN (1) CN112883930A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113870896A (en) * 2021-09-27 2021-12-31 动者科技(杭州)有限责任公司 Motion sound false judgment method and device based on time-frequency graph and convolutional neural network
CN113893517A (en) * 2021-11-22 2022-01-07 动者科技(杭州)有限责任公司 Rope skipping true and false judgment method and system based on difference frame method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239736A (en) * 2017-04-28 2017-10-10 北京智慧眼科技股份有限公司 Method for detecting human face and detection means based on multitask concatenated convolutional neutral net
CN108629633A (en) * 2018-05-09 2018-10-09 浪潮软件股份有限公司 A kind of method and system for establishing user's portrait based on big data
CN110119703A (en) * 2019-05-07 2019-08-13 福州大学 The human motion recognition method of attention mechanism and space-time diagram convolutional neural networks is merged under a kind of security protection scene
CN112464844A (en) * 2020-12-07 2021-03-09 天津科技大学 Human behavior and action recognition method based on deep learning and moving target detection


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113870896A (en) * 2021-09-27 2021-12-31 动者科技(杭州)有限责任公司 Motion sound false judgment method and device based on time-frequency graph and convolutional neural network
CN113893517A (en) * 2021-11-22 2022-01-07 动者科技(杭州)有限责任公司 Rope skipping true and false judgment method and system based on difference frame method
CN113893517B (en) * 2021-11-22 2022-06-17 动者科技(杭州)有限责任公司 Rope skipping true and false judgment method and system based on difference frame method

Similar Documents

Publication Publication Date Title
CN104866810B (en) A kind of face identification method of depth convolutional neural networks
CN110309856A (en) Image classification method, the training method of neural network and device
CN111582397B (en) CNN-RNN image emotion analysis method based on attention mechanism
CN112883931A (en) Real-time true and false motion judgment method based on long and short term memory network
AU2017101803A4 (en) Deep learning based image classification of dangerous goods of gun type
CN109710804B (en) Teaching video image knowledge point dimension reduction analysis method
Vallet et al. A multi-label convolutional neural network for automatic image annotation
CN112883930A (en) Real-time true and false motion judgment method based on full-connection network
CN117557886A (en) Noise-containing tag image recognition method and system integrating bias tags and passive learning
Li et al. An improved lightweight network architecture for identifying tobacco leaf maturity based on Deep learning
CN112668486A (en) Method, device and carrier for identifying facial expressions of pre-activated residual depth separable convolutional network
An Pedestrian Re‐Recognition Algorithm Based on Optimization Deep Learning‐Sequence Memory Model
Pan et al. Hybrid dilated faster RCNN for object detection
Wang et al. Multi-scale feature pyramid and multi-branch neural network for person re-identification
Jin et al. VGG-S: Improved Small Sample Image Recognition Model Based on VGG16
CN112991281A (en) Visual detection method, system, electronic device and medium
CN113011436A (en) Traditional Chinese medicine tongue color and fur color collaborative classification method based on convolutional neural network
Abdelaziz et al. Few-shot learning with saliency maps as additional visual information
CN116935438A (en) Pedestrian image re-recognition method based on autonomous evolution of model structure
Wang et al. Image target recognition based on improved convolutional neural network
Wan et al. SGBGAN: minority class image generation for class-imbalanced datasets
CN111242114A (en) Character recognition method and device
CN115439791A (en) Cross-domain video action recognition method, device, equipment and computer-readable storage medium
CN116958615A (en) Picture identification method, device, equipment and medium
Sultana et al. A Deep CNN based Kaggle Contest Winning Model to Recognize Real-Time Facial Expression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination