CN112883930A - Real-time true and false motion judgment method based on full-connection network - Google Patents

Real-time true and false motion judgment method based on full-connection network Download PDF

Info

Publication number
CN112883930A
CN112883930A (application CN202110335993.8A)
Authority
CN
China
Prior art keywords
data
key point
model
human body
sampling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110335993.8A
Other languages
Chinese (zh)
Inventor
吴友银
吕瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Movers Technology Hangzhou Co ltd
Original Assignee
Movers Technology Hangzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Movers Technology Hangzhou Co ltd filed Critical Movers Technology Hangzhou Co ltd
Priority to CN202110335993.8A priority Critical patent/CN112883930A/en
Publication of CN112883930A publication Critical patent/CN112883930A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Psychiatry (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a real-time true and false motion judgment method based on a fully-connected network. The method comprises a model training stage: acquiring a data set by inputting a motion video, frame by frame, into a human body key point detection model and outputting the key point data of the human body to form the data set samples; sampling the current motion as positive samples by oversampling and the other motions as negative samples by undersampling to form the sampled samples; selecting a training set from the sampled samples, inputting it into a fully-connected neural network, and finally calculating the Loss and updating the weights. The method further comprises a real-time judgment stage: the data to be detected are input into the model and the judgment result is output. The beneficial effects of the invention are: the method is built on a human body key point detection model, uses the human body key point data to fit a model, and identifies, through the fitted model, the type of human motion in the video and whether the motion is genuinely being performed.

Description

Real-time true and false motion judgment method based on full-connection network
Technical Field
The invention relates to the technical field of data identification, in particular to a real-time true and false motion judgment method based on a full-connection network.
Background
With the growing emphasis of the nation and society on the physical fitness of primary and secondary school students, and with the rapid development of artificial intelligence, it has become inevitable for artificial intelligence to enter the sports field. The existing approaches for calculating whether the current motion is genuine are roughly the following.
1. Traditional image difference frame method
The image transmitted by the camera is compared with the image of the previous frame; the difference between the two is the moving part.
Disadvantages: high cost, poor performance, high requirements on the environment, and inability to judge whether the motion is genuinely being performed.
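As an illustrative sketch of the frame-difference idea described above (not the claimed method; the function name and the threshold value are hypothetical), two grayscale frames can be compared pixel by pixel:

```python
import numpy as np

def motion_mask(prev_gray: np.ndarray, curr_gray: np.ndarray, thresh: int = 25) -> np.ndarray:
    """Return a boolean mask of pixels that changed between two grayscale frames."""
    # Cast to a signed type so the subtraction cannot wrap around.
    diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
    return diff > thresh

# Two tiny synthetic 4x4 "frames": one pixel brightens sharply.
prev = np.zeros((4, 4), dtype=np.uint8)
curr = prev.copy()
curr[1, 2] = 200
mask = motion_mask(prev, curr)
print(mask.sum())  # 1 changed pixel
```

The mask marks "the moving part"; as the text notes, it says nothing about whether the motion is a genuinely performed exercise.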
2. Deep learning classification (classification)
The images transmitted by the camera are classified into stages of the human movement, and the number of motions is calculated from the number of cycles through the stages.
Commonly used high-accuracy models are VGG, MobileNet, ResNet, etc.
Disadvantages: high cost and poor performance.
3. Deep learning Semantic segmentation method (Semantic segmentation)
Each pixel of the image transmitted by the camera is classified as belonging to the human body or to the background, and the judgment is made according to the change of the human-body pixels. Common accurate models are U-Net, DeepLab, etc.
Disadvantages: high cost, poor performance, high requirements on the environment, and inability to judge whether the motion is genuinely being performed.
4. Deep learning Object detection method (Object detection)
The position of the person in the image transmitted by the camera is framed with a bounding box, and the movement is judged according to the change of the bounding box. Common high-performance models are SSD, YOLO, etc.
Disadvantages: inability to judge whether the motion is genuinely being performed.
Disclosure of Invention
The invention aims to provide a real-time true and false motion judgment method based on a fully-connected network.
In order to achieve the purpose, the invention provides the following technical scheme:
The real-time true and false motion judgment method based on the fully-connected network comprises a model training stage:
acquiring a data set: the motion video is input, frame by frame, into a human body key point detection model, and the key point data of the human body are output to form the data set samples;
the current motion is sampled as positive samples by oversampling, and the other motions are sampled as negative samples by undersampling, forming the sampled samples;
a training set is selected from the sampled samples and input into a fully-connected neural network, and finally the Loss is calculated and the weights are updated;
the method further comprises a real-time judgment stage: the data to be detected are input into the model and the judgment result is output.
Preferably, the data set output from the human body key point detection model is normalized, and the normalized result is the key point's X coordinate divided by the image width and the key point's Y coordinate divided by the image height.
Preferably, the training set undergoes data enhancement processing before being input into the fully-connected neural network, the data enhancement processing comprising translation, scaling, and left-right flipping of the data.
Preferably, a random 25% of all positive samples is used as the positive-sample validation set, a random 25% of all negative samples is used as the negative-sample validation set, and the rest is used as the training set.
Preferably, the calculation of the Loss adopts the binary cross-entropy loss function:

L = −(1/N) Σ_{i=1}^{N} [ y_i log(p_i) + (1 − y_i) log(1 − p_i) ]

and the update includes the back-propagation and gradient-descent process of the fully-connected network:

δ^L = ∇_a C ⊙ σ′(z^L)    (BP1)
δ^l = ((w^(l+1))^T δ^(l+1)) ⊙ σ′(z^l)    (BP2)
∂C/∂b_j^l = δ_j^l    (BP3)
∂C/∂w_jk^l = a_k^(l−1) δ_j^l    (BP4)
Compared with the prior art, the beneficial effects of the invention are: the method is built on a human body key point detection model, uses the human body key point data to fit a model, and identifies, through the fitted model, the type of human motion in the video and whether the motion is genuinely being performed.
In addition, in data set sampling, oversampling is used for the positive samples and undersampling for the negative samples, which solves the data-imbalance problem; applied in this field, it greatly improves the accuracy.
Drawings
Fig. 1 is a schematic flow chart of a model training method according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment is built on a human body key point detection model: each frame of the video is input into a human body key point model (such as PoseNet, OpenPose, or Pose Proposal Networks) to detect the human body key points; the key points are stored as numerical data, and the true/false judgment of the motion is made from these numerical data.
Specifically, the real-time true and false motion judgment method based on the fully-connected network trains a judgment model, and the output of the key point model (PoseNet, OpenPose, Pose Proposal Networks) is the data source for training it. The model training comprises the following steps:
1. Data set acquisition and processing
(1) The collected videos of the 20 types of single-person sports are cut to their valid parts and classified.
The 20 types of single-person sports are:
1. rope skipping; 2. walking; 3. dancing; 4. high knees; 5. burpees; 6. pull-ups; 7. plank; 8. sit-ups; 9. standing forward bend; 10. seated forward bend; 11. jumping jacks; 12. non-standard rope skipping; 13. non-standard high knees; 14. non-standard burpees; 15. non-standard pull-ups; 16. non-standard plank; 17. non-standard sit-ups; 18. non-standard standing forward bend; 19. non-standard seated forward bend; 20. non-standard jumping jacks.
Cutting to the valid parts and classifying means deleting the extraneous parts of the video (the parts that are not the motion) and finally storing the videos of the different motions separately.
(2) Each frame of the video is read, the human body key point data are detected by the human body key point detection model, stored into a text file, and shuffled;
the effect of scrambling the data set: for models that are sensitive to randomness, typically NN, it is important to scramble the data. For models that are less sensitive to randomness, it is theoretically possible not to disturb. Whether the sensitivity is sensitive or not depends on the data magnitude, the complexity and the internal calculation mechanism of the algorithm, and an algorithm randomness sensitivity list with clear longitude and latitude is not available at present. Since scrambling the data does not yield a worse result, it is generally recommended to scramble the full amount of data;
(3) Data normalization.
The normalized result is the key point's x coordinate divided by the image width and the key point's y coordinate divided by the image height;
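A minimal sketch of this normalization step (the function name and the example image size are hypothetical, not taken from the patent):

```python
import numpy as np

def normalize_keypoints(kps: np.ndarray, img_w: int, img_h: int) -> np.ndarray:
    """Scale pixel coordinates into [0, 1]: x by image width, y by image height."""
    out = kps.astype(np.float64).copy()
    out[:, 0] /= img_w   # x / image width
    out[:, 1] /= img_h   # y / image height
    return out

kps = np.array([[320.0, 120.0], [160.0, 240.0]])  # (x, y) pixel coordinates
norm = normalize_keypoints(kps, img_w=640, img_h=480)
print(norm)  # [[0.5, 0.25], [0.25, 0.5]]
```

Normalizing this way makes the key point data independent of the camera resolution before it reaches the network.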
(4) The text file of the current motion is kept as the positive samples, and the text files of the other motions are kept as the negative samples;
(5) Oversampling is adopted for the positive samples and undersampling for the negative samples (solving the data set imbalance problem).
Oversampling: it increases the number of minority-class observations in the training set. The advantage of oversampling is that no information from the original training set is lost, since all observations of both the minority and the majority class are kept. On the other hand, it is prone to overfitting.
Undersampling: in contrast to oversampling, its goal is to reduce the number of majority-class samples to balance the class distribution. Because it deletes observations from the original data set, useful information may be discarded.
In this method, oversampling of the positive samples means that all positive samples are used, and undersampling of the negative samples means that as many negative samples as there are positive samples are drawn at random from all the negative samples.
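A sketch of the sampling rule just described, keeping every positive sample and randomly drawing an equal number of negatives (the function name, seed, and sample counts are hypothetical):

```python
import random

def balance(pos, neg, seed=0):
    """Keep all positive samples; randomly undersample the negatives
    down to the same count to balance the class distribution."""
    rng = random.Random(seed)
    neg_kept = rng.sample(neg, k=min(len(pos), len(neg)))
    return pos, neg_kept

pos = [f"pos_{i}" for i in range(100)]   # e.g. rope-skipping frames
neg = [f"neg_{i}" for i in range(900)]   # frames from the other 19 motions
pos_s, neg_s = balance(pos, neg)
print(len(pos_s), len(neg_s))  # 100 100
```

The result is the balanced 1:1 sample set that the text credits for the accuracy improvement.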
It should be noted that the above sampling scheme is designed specifically for the purpose of this invention, i.e. motion recognition. In motion recognition, judging a motion from a single picture is difficult, because poses occurring in different motions can be highly similar. With a traditional sampling scheme, whether plain oversampling or plain undersampling, the expected accuracy is difficult to reach; the sampling scheme above yields a scientifically balanced training set and contributes substantially to the accuracy of the final result.
(6) Data set segmentation.
A random 25% of all positive samples is taken as the positive-sample validation set, a random 25% of all negative samples as the negative-sample validation set, and the rest as the training set.
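The 75/25 split above can be sketched as follows (function name and seed are hypothetical; in practice it would be applied to the positive and negative samples separately):

```python
import random

def split_75_25(samples, seed=0):
    """Shuffle and hold out a random 25% as the validation set,
    keeping the remaining 75% for training."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    cut = len(samples) // 4                 # 25% for validation
    val = [samples[i] for i in idx[:cut]]
    train = [samples[i] for i in idx[cut:]]
    return train, val

train, val = split_75_25(list(range(200)))
print(len(train), len(val))  # 150 50
```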
Training set: the data samples used for model fitting. During training, gradient descent is performed on the training error, and the trainable weight parameters are learned.
Validation set: a sample set held out during model training; it can be used to tune the hyper-parameters of the model and to make a preliminary assessment of the model's ability.
The validation set can be used during training; typically, after every few epochs the validation set is run once to check the effect. The first benefit is that problems with the model or its parameters can be discovered in time, for example divergence on the validation set, strange outputs (e.g. infinity), or an mAP that does not grow or grows slowly; training can then be terminated early and the model or parameters adjusted, instead of waiting for training to finish. Another benefit is validating the model's generalization ability: if the effect on the validation set is much worse than on the training set, overfitting should be considered. The validation set can also be used to compare different models. In an ordinary neural network, the validation set is used to find the optimal network depth, to decide the stopping point of the back-propagation algorithm, or to choose the number of neurons in the hidden layers.
2. Training of models
a. A batch of data is taken from the shuffled training set as the network input.
b. Data Augmentation is applied: translation, scaling, and left-right flipping.
Image augmentation in computer vision introduces prior knowledge in the form of artificial visual invariances (semantic invariance). Data augmentation is essentially the simplest and most direct way to improve model performance. It brings a certain regularization effect, which reduces the structural risk of the model, and it improves the model's robustness: to some extent it makes the model focus on the general patterns in the data while ignoring data unrelated to those patterns.
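The three augmentations named above, applied to normalized keypoints rather than raw images, might look like this (function name, the choice to scale about the image centre, and the clipping to [0, 1] are all hypothetical details):

```python
import numpy as np

def augment(kps: np.ndarray, dx=0.0, dy=0.0, scale=1.0, flip=False) -> np.ndarray:
    """Translate, scale (about the image centre 0.5), and optionally mirror
    normalized (x, y) keypoints assumed to lie in [0, 1]."""
    out = kps.astype(np.float64).copy()
    out = (out - 0.5) * scale + 0.5      # scaling about the centre
    out[:, 0] += dx                       # horizontal translation
    out[:, 1] += dy                       # vertical translation
    if flip:
        out[:, 0] = 1.0 - out[:, 0]       # left-right flip
    return np.clip(out, 0.0, 1.0)

kps = np.array([[0.25, 0.5]])
flipped = augment(kps, flip=True)
print(flipped)  # x mirrored to 0.75, y unchanged
```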
c. Fully-connected classification.
The data are classified by a fully-connected neural network performing logistic regression. The output activation function of the logistic regression is the Sigmoid function, defined as:

σ(x) = 1 / (1 + e^(−x))
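A sketch of the Sigmoid output head (weights, bias, and feature values below are hypothetical, merely illustrating how the final score is squashed into a probability):

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: squashes the final layer's score into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# A single fully connected output unit acting as the logistic-regression head.
w = np.array([0.5, -0.25])   # hypothetical learned weights
b = 0.1                      # hypothetical learned bias
x = np.array([0.8, 0.4])     # normalized keypoint features
p = sigmoid(w @ x + b)       # probability that the motion is "real"
print(round(float(p), 4))
```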
d. The difference (Loss) between the network output and the label is calculated, and the network weights are updated by Back Propagation with gradient descent.
The Binary Cross-Entropy Loss Function is:

L = −(1/N) Σ_{i=1}^{N} [ y_i log(p_i) + (1 − y_i) log(1 − p_i) ]

The Back Propagation and gradient descent procedure of the fully-connected network is:

δ^L = ∇_a C ⊙ σ′(z^L)    (BP1)
δ^l = ((w^(l+1))^T δ^(l+1)) ⊙ σ′(z^l)    (BP2)
∂C/∂b_j^l = δ_j^l    (BP3)
∂C/∂w_jk^l = a_k^(l−1) δ_j^l    (BP4)
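As an illustrative exercise of the loss and update rules above (not the claimed network), a one-weight logistic regression can be trained by gradient descent; all data, names, and hyper-parameters are hypothetical. Note that with a Sigmoid output and cross-entropy loss, the output-layer error simplifies to p − y:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(y, p, eps=1e-12):
    """Binary cross-entropy averaged over the batch."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Toy separable data: one feature decides the label.
X = np.array([[0.1], [0.2], [0.8], [0.9]])
y = np.array([0.0, 0.0, 1.0, 1.0])

w, b, lr = np.zeros(1), 0.0, 1.0
losses = []
for _ in range(200):                    # gradient-descent loop
    p = sigmoid(X @ w + b)              # forward pass
    losses.append(bce(y, p))
    delta = p - y                       # output error (BP1 with sigmoid + BCE)
    w -= lr * X.T @ delta / len(y)      # BP4: weight gradient uses the inputs
    b -= lr * delta.mean()              # BP3: bias gradient is the error itself
print(losses[0] > losses[-1])  # True: the loss decreases
```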
After model training is finished, the video or picture to be detected can be input into the model to obtain the recognition result.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (5)

1. The real-time true and false motion judgment method based on the fully-connected network, characterized in that the method comprises a model training stage:
acquiring a data set: the motion video is input, frame by frame, into a human body key point detection model, and the key point data of the human body are output to form the data set samples;
the current motion is sampled as positive samples by oversampling, and the other motions are sampled as negative samples by undersampling, forming the sampled samples;
a training set is selected from the sampled samples and input into a fully-connected neural network, and finally the Loss is calculated and the weights are updated;
and the method further comprises a real-time judgment stage: the data to be detected are input into the model and the judgment result is output.
2. The method for real-time true and false motion determination based on the fully-connected network according to claim 1, wherein: the data set output from the human body key point detection model is normalized, and the normalized result is the key point's X coordinate divided by the image width and the key point's Y coordinate divided by the image height.
3. The method for real-time true and false motion determination based on the fully-connected network according to claim 2, wherein: the training set undergoes data enhancement processing before being input into the fully-connected neural network, the data enhancement processing comprising translation, scaling, and left-right flipping of the data.
4. The method for real-time true and false motion determination based on the fully-connected network according to claim 3, wherein: a random 25% of all positive samples is taken as the positive-sample validation set, a random 25% of all negative samples as the negative-sample validation set, and the rest as the training set.
5. The method for real-time true and false motion determination based on the fully-connected network according to claim 4, wherein: the calculation of the Loss adopts the binary cross-entropy loss function:

L = −(1/N) Σ_{i=1}^{N} [ y_i log(p_i) + (1 − y_i) log(1 − p_i) ]

and the update includes the back-propagation and gradient-descent process of the fully-connected network:

δ^L = ∇_a C ⊙ σ′(z^L)    (BP1)
δ^l = ((w^(l+1))^T δ^(l+1)) ⊙ σ′(z^l)    (BP2)
∂C/∂b_j^l = δ_j^l    (BP3)
∂C/∂w_jk^l = a_k^(l−1) δ_j^l    (BP4)
CN202110335993.8A 2021-03-29 2021-03-29 Real-time true and false motion judgment method based on full-connection network Pending CN112883930A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110335993.8A CN112883930A (en) 2021-03-29 2021-03-29 Real-time true and false motion judgment method based on full-connection network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110335993.8A CN112883930A (en) 2021-03-29 2021-03-29 Real-time true and false motion judgment method based on full-connection network

Publications (1)

Publication Number Publication Date
CN112883930A true CN112883930A (en) 2021-06-01

Family

ID=76039968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110335993.8A Pending CN112883930A (en) 2021-03-29 2021-03-29 Real-time true and false motion judgment method based on full-connection network

Country Status (1)

Country Link
CN (1) CN112883930A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113870896A (en) * 2021-09-27 2021-12-31 动者科技(杭州)有限责任公司 Motion sound false judgment method and device based on time-frequency graph and convolutional neural network
CN113893517A (en) * 2021-11-22 2022-01-07 动者科技(杭州)有限责任公司 Rope skipping true and false judgment method and system based on difference frame method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239736A (en) * 2017-04-28 2017-10-10 北京智慧眼科技股份有限公司 Method for detecting human face and detection means based on multitask concatenated convolutional neutral net
CN108629633A (en) * 2018-05-09 2018-10-09 浪潮软件股份有限公司 A kind of method and system for establishing user's portrait based on big data
CN110119703A (en) * 2019-05-07 2019-08-13 福州大学 The human motion recognition method of attention mechanism and space-time diagram convolutional neural networks is merged under a kind of security protection scene
CN112464844A (en) * 2020-12-07 2021-03-09 天津科技大学 Human behavior and action recognition method based on deep learning and moving target detection


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113870896A (en) * 2021-09-27 2021-12-31 动者科技(杭州)有限责任公司 Motion sound false judgment method and device based on time-frequency graph and convolutional neural network
CN113893517A (en) * 2021-11-22 2022-01-07 动者科技(杭州)有限责任公司 Rope skipping true and false judgment method and system based on difference frame method
CN113893517B (en) * 2021-11-22 2022-06-17 动者科技(杭州)有限责任公司 Rope skipping true and false judgment method and system based on difference frame method

Similar Documents

Publication Publication Date Title
CN104866810B (en) A kind of face identification method of depth convolutional neural networks
CN110309856A (en) Image classification method, the training method of neural network and device
CN111582397B (en) CNN-RNN image emotion analysis method based on attention mechanism
CN112883931A (en) Real-time true and false motion judgment method based on long and short term memory network
AU2017101803A4 (en) Deep learning based image classification of dangerous goods of gun type
CN109710804B (en) Teaching video image knowledge point dimension reduction analysis method
Vallet et al. A multi-label convolutional neural network for automatic image annotation
CN112883930A (en) Real-time true and false motion judgment method based on full-connection network
CN117557886A (en) Noise-containing tag image recognition method and system integrating bias tags and passive learning
Li et al. An improved lightweight network architecture for identifying tobacco leaf maturity based on Deep learning
CN112668486A (en) Method, device and carrier for identifying facial expressions of pre-activated residual depth separable convolutional network
An Pedestrian Re‐Recognition Algorithm Based on Optimization Deep Learning‐Sequence Memory Model
Pan et al. Hybrid dilated faster RCNN for object detection
Wang et al. Multi-scale feature pyramid and multi-branch neural network for person re-identification
Jin et al. VGG-S: Improved Small Sample Image Recognition Model Based on VGG16
CN112991281A (en) Visual detection method, system, electronic device and medium
CN113011436A (en) Traditional Chinese medicine tongue color and fur color collaborative classification method based on convolutional neural network
Abdelaziz et al. Few-shot learning with saliency maps as additional visual information
CN116935438A (en) Pedestrian image re-recognition method based on autonomous evolution of model structure
Wang et al. Image target recognition based on improved convolutional neural network
Wan et al. SGBGAN: minority class image generation for class-imbalanced datasets
CN111242114A (en) Character recognition method and device
CN115439791A (en) Cross-domain video action recognition method, device, equipment and computer-readable storage medium
CN116958615A (en) Picture identification method, device, equipment and medium
Sultana et al. A Deep CNN based Kaggle Contest Winning Model to Recognize Real-Time Facial Expression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination