CN114821819A - Real-time monitoring method for body-building action and artificial intelligence recognition system - Google Patents

Real-time monitoring method for body-building action and artificial intelligence recognition system

Info

Publication number: CN114821819A (granted as CN114821819B)
Application number: CN202210757131.9A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: key point, dimensional, coordinate, vector, image
Inventor: 张桢
Assignee (original and current): Nantong Tongxing Fitness Equipment Co., Ltd.
Legal status: Granted, Active (the listed status is an assumption by Google and not a legal conclusion)

Classifications

    • G06V 40/20 — Recognition of biometric, human-related or animal-related patterns in image or video data: movements or behaviour, e.g. gesture recognition
    • G06N 3/045 — Neural networks: combinations of networks
    • G06N 3/049 — Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08 — Neural networks: learning methods
    • G06V 10/26 — Image preprocessing: segmentation of patterns in the image field; detection of occlusion
    • G06V 10/75 — Image or video pattern matching: organisation of the matching processes, e.g. coarse-fine or multi-scale approaches
    • G06V 10/764 — Recognition using pattern recognition or machine learning: classification, e.g. of video objects
    • G06V 10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 — Recognition using pattern recognition or machine learning: neural networks
    • G06V 20/52 — Scene-specific elements: surveillance or monitoring of activities, e.g. for recognising suspicious objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image recognition, in particular to a real-time body-building action monitoring method and an artificial intelligence recognition system. The method comprises the following steps: acquiring images and human body key point category annotation images; segmenting the human body parts in each image; obtaining the average segmentation area of the human body part corresponding to each human body key point category, and deriving the Gaussian distribution size and standard deviation of each key point from that average area; converting the human body key point category annotations into one-dimensional coordinate distribution vectors and obtaining label value distribution vectors; training a neural network on the images, the one-dimensional coordinate distribution vectors and the label value distribution vectors; and identifying each person's body-building action state from his or her human body posture information. By deriving human body posture labels with different distribution sizes and standard deviations from the area of the human body part in the image, the method reduces the ambiguity of manual annotation; by representing human body postures as one-dimensional vectors, it saves computing resources while achieving high detection precision.

Description

Real-time monitoring method for body-building action and artificial intelligence recognition system
Technical Field
The invention relates to the field of image recognition, in particular to a real-time body-building action monitoring method and an artificial intelligence recognition system.
Background
In recent years, with increasing work intensity and life pressure, people's physical health faces many challenges. Against this background, healthy living has become a topic of wide concern, and more and more people choose to strengthen their fitness by exercising. At the same time, the rapid development of electronic technology has led to a growing number of people using MEMS-based inertial sensors to monitor their movement. Such monitoring can effectively assist physical exercise and is of great significance for promoting physical health.
In terms of application scenarios, however, existing human activity recognition is mainly used to recognize coarse states such as standing, lying, sitting and riding, which greatly limits its usefulness.
Disclosure of Invention
In order to solve the above technical problems, an object of the present invention is to provide a real-time monitoring method for body-building actions and an artificial intelligence recognition system. The adopted technical scheme is as follows:
In a first aspect, an embodiment of the present invention provides a method for monitoring fitness activities in real time, the method comprising the following steps: deploying a camera to collect RGB images; annotating the RGB images to obtain human body key point category annotation images; forming a data set from the RGB images and the annotation images; segmenting the human body parts of each RGB image with BodyPix 2.0 to obtain human body part segmentation maps; establishing a correspondence between human body key points and human body parts, and obtaining, based on this correspondence, the average segmentation area of the human body part corresponding to each human body key point category in the data set; obtaining the Gaussian distribution size and standard deviation of each key point in each category from that average area; obtaining the x-coordinate and y-coordinate one-dimensional distribution vectors of each image from the Gaussian distribution sizes and standard deviations; obtaining the label value of each person in the image from the key point positions in the annotation image; obtaining the x and y label value distribution vectors from the coordinate one-dimensional distribution vectors and the per-person label values; forming label data from the x-coordinate one-dimensional distribution vector, the y-coordinate one-dimensional distribution vector, the x label value distribution vector and the y label value distribution vector; establishing a first neural network and training it on the RGB images and the label data; obtaining predicted label data with the first neural network; obtaining the human body posture information of each person in the image from the predicted label data; and establishing a second neural network that identifies each person's fitness action state from his or her posture information.
Further, the Gaussian distribution size of each key point is calculated as:

b_{l,i} = \left\lceil B \cdot \frac{A_{l,i}}{\bar{A}_l} \right\rceil

where b_{l,i} denotes the distribution size of the i-th key point of the l-th key point category, \bar{A}_l denotes the average segmentation area of the human body part corresponding to the l-th category, A_{l,i} denotes the segmentation area of the human body part corresponding to the i-th key point of the l-th category, \lceil\cdot\rceil is the rounding-up function, and B denotes the baseline Gaussian distribution size.
Further, the Gaussian distribution standard deviation of each key point is calculated as:

\sigma_{l,i} = C \cdot \frac{A_{l,i}}{\bar{A}_l}

where \sigma_{l,i} denotes the Gaussian distribution standard deviation of the i-th key point of the l-th key point category, \bar{A}_l denotes the average segmentation area of the human body part corresponding to the l-th category, A_{l,i} denotes the segmentation area of the human body part corresponding to the i-th key point of the l-th category, and C denotes the baseline Gaussian distribution standard deviation.
Further, the method for obtaining the x-coordinate and y-coordinate one-dimensional distribution vectors comprises the following steps: first, generating an x-coordinate one-dimensional vector of width w and a y-coordinate one-dimensional vector of height h; for each one-dimensional vector, substituting the normalized coordinate of each key point, together with the neighborhood normalized coordinates determined by the key point's distribution size, into a one-dimensional Gaussian distribution function to obtain the one-dimensional Gaussian distribution value of the key point, where the normalized coordinate of each key point is 0, the neighborhood is determined by the distribution size, and the neighborhood normalized coordinates take the key point as origin; normalizing the resulting Gaussian probability values to obtain the one-dimensional Gaussian distribution probability values of the key point over its distribution size, each key point having its own distribution size and one-dimensional Gaussian distribution standard deviation; and finally processing all key points one by one in this way and summing the one-dimensional Gaussian distribution probability values of all key points of each image to obtain the image's x-coordinate and y-coordinate one-dimensional distribution vectors.
Further, the label value of each person is calculated as:

V_n = \left( \frac{1}{K}\sum_{i=1}^{K}\frac{x_i^n}{w},\ \frac{1}{K}\sum_{i=1}^{K}\frac{y_i^n}{h} \right)

where n denotes the n-th person in the image, K denotes the number of key point categories, x_i^n and y_i^n denote the x and y coordinates of the i-th key point of the n-th person, and w and h are the width and height of the image, respectively. The label value contains two parts: the front part is the x label value and the back part is the y label value.
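Since the equation image for the label value is not reproduced in this text, the following minimal sketch implements one reading consistent with the listed symbols: the per-person mean of the normalized x (resp. y) key point coordinates. The function name and argument layout are illustrative, not from the patent.

```python
def person_label_value(keypoints_xy, w, h):
    """x and y label values of one person (hedged reconstruction).

    Averages the normalized x (resp. y) coordinates of the person's K
    keypoints, so all keypoints of the same person share one value while
    persons standing at different positions get different values.
    """
    K = len(keypoints_xy)
    vx = sum(x / w for x, _ in keypoints_xy) / K
    vy = sum(y / h for _, y in keypoints_xy) / K
    return vx, vy
```

For example, a person whose two key points sit at (10, 20) and (30, 40) in a 100 x 100 image receives the x label value 0.2 and the y label value 0.3.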
Further, the method for obtaining the x and y label value distribution vectors comprises the following steps: first, obtaining the one-dimensional Gaussian distribution probability values of all key points of a person, summing them position by position and averaging to obtain the average one-dimensional Gaussian distribution probability, and multiplying this average by the person's label value to obtain the label distribution vector. Because the coordinates are represented as separate x and y one-dimensional vectors, an x label value distribution vector and a y label value distribution vector are obtained in the same way; all key point categories of the same person share the same label value.
Further, the structure and loss function of the first neural network are as follows. The first neural network consists of an image encoder and a fully connected network: the image encoder takes the RGB image as input and extracts features to obtain a feature map; the feature map is flattened into a feature vector, which is fed into the fully connected network for fitting; the output is the x-coordinate one-dimensional distribution vector, y-coordinate one-dimensional distribution vector, x label value distribution vector and y label value distribution vector of each key point category. The loss function is:

L = \|X - \hat{X}\|_2^2 + \|Y - \hat{Y}\|_2^2 + \|V_x - \hat{V}_x\|_2^2 + \|V_y - \hat{V}_y\|_2^2

where X and \hat{X} respectively denote the x-coordinate one-dimensional distribution vector label and the network's prediction of it; Y and \hat{Y} respectively denote the y-coordinate one-dimensional distribution vector label and its prediction; V_x and \hat{V}_x respectively denote the x label value distribution vector label and its prediction; and V_y and \hat{V}_y respectively denote the y label value distribution vector label and its prediction.
Further, the method for obtaining the human body posture information of each person in the image from the predicted label data comprises the following steps: obtaining candidate key point x and y coordinates from the x-coordinate and y-coordinate one-dimensional distribution vectors predicted by the first neural network using a threshold method; computing the degree of association between the candidate x coordinates and candidate y coordinates as the cosine similarity of the one-dimensional Gaussian distribution sequences of the corresponding distribution length centered at each coordinate; and matching the candidate x and y coordinates by KM (Kuhn-Munkres) matching to obtain the assignment with the maximum total cosine similarity, the matched x and y coordinates being the key point coordinates of the persons in the image.
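The x–y matching step above can be sketched as follows. The patent specifies KM matching of cosine similarities; this illustration brute-forces the assignment that maximizes the total cosine similarity, which gives the same result for small candidate sets. Function names and window layout are our assumptions.

```python
import itertools
import numpy as np

def cosine(a, b):
    """Cosine similarity of two 1-D sequences."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def match_xy(x_windows, y_windows):
    """Pair candidate x coordinates with candidate y coordinates.

    x_windows / y_windows: one-dimensional Gaussian sequences of the
    corresponding distribution length, cut out around each candidate
    coordinate. Brute-force stand-in for KM matching: try every
    assignment and keep the one with the highest total cosine similarity.
    """
    n = len(x_windows)
    best_score, best_perm = float("-inf"), None
    for perm in itertools.permutations(range(n)):
        score = sum(cosine(x_windows[i], y_windows[perm[i]]) for i in range(n))
        if score > best_score:
            best_score, best_perm = score, perm
    return [(i, best_perm[i]) for i in range(n)]
```

In practice, a proper assignment solver (the Kuhn-Munkres algorithm, e.g. `scipy.optimize.linear_sum_assignment` on the negated similarity matrix) replaces the permutation loop once the number of candidates grows.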
Further, the structure and training method of the second neural network are as follows. The second neural network is a temporal convolutional network comprising a time-sequence encoder and a fully connected network: the time-sequence encoder extracts temporal posture information, taking the time series of each person's posture information as input and outputting a feature vector; the fully connected network performs fitting and feature mapping, taking the feature vector as input and finally outputting the identification of the person's fitness action state through a classification function. The label data of this network are annotated manually, the annotated fitness action states being: currently in a fitness action state, and currently not in a fitness action state. The network parameters are optimized with a cross-entropy loss function using the Adam algorithm.
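The classification function referenced above is not reproduced in this text; softmax paired with cross-entropy loss is the conventional choice for such a two-class head and is assumed in this minimal sketch. The logit values are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(probs, label):
    """Cross-entropy loss for one sample; `label` is the true class index."""
    return -float(np.log(probs[label] + 1e-12))

# Two-class head: "in a fitness action state" vs. "not in a fitness action state".
logits = np.array([1.2, -0.3])   # hypothetical fully-connected-layer output
probs = softmax(logits)
loss = cross_entropy(probs, 0)   # ground truth: in a fitness action state
```

During training, this per-sample loss is averaged over a batch and minimized with Adam, as the passage above describes.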
In a second aspect, another embodiment of the present invention provides an artificial intelligence recognition system for body-building actions, comprising a memory, a processor and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the steps of any one of the methods described above.
The invention has the following beneficial effects:
the human body posture labels with different distribution sizes and standard deviations are obtained based on the area of the human body part in the image, the ambiguity of the artificial labels is reduced, meanwhile, the human body posture estimation is realized by utilizing the one-dimensional vector, compared with the process of subtracting a decoder in the prior art, the calculation resource is saved, and meanwhile, the problem of false detection caused by the overlapped 2D heat map is also reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention; other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a flowchart of a method for monitoring a fitness activity in real time according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of gaussian distributions with different standard deviations in a method for real-time monitoring of exercise activities according to an embodiment of the present invention.
Detailed Description
To further illustrate the technical means adopted by the present invention to achieve its predetermined objects and their effects, the following detailed description of the real-time monitoring method for body-building actions, its specific implementation, structure, features and effects is provided in conjunction with the accompanying drawings and the preferred embodiments. In the following description, different instances of "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following describes a specific scheme of the real-time monitoring method for body-building actions provided by the invention in detail with reference to the accompanying drawings.
Referring to fig. 1, a method for monitoring a fitness activity in real time according to the present invention is shown, wherein the method comprises the following steps:
step S001: a camera is deployed to collect RGB images; labeling the RGB image to obtain a human body key point category labeled image; forming a data set by the RGB image and the human body key point category labeling image;
firstly, a monitoring camera is deployed in a monitoring area, the monitoring camera can acquire images of the area in real time by using a common camera with a resolution of 1080P, and the acquired images are images of an RGB color space. And then detecting key point information of the human body by utilizing a human body posture estimation technology in real time on the RGB image.
A deep learning method is preferred for posture estimation because of its high precision. Among deep-learning-based human posture estimation techniques, the two-dimensional heat map representation has dominated for many years owing to its high performance. However, heat-map-based approaches have some drawbacks: quantization errors exist, limited by the spatial resolution of the heat map; larger heat maps require additional up-sampling operations and expensive high-resolution processing; and overlapping heat map signals may be mistaken for a single key point.
The present invention realizes multi-person human body posture estimation with a neural network having an ordinary image classification network structure.
First, the images are annotated by marking the human skeleton key points in each image. The annotation marks image coordinates, that is, it establishes a correspondence between image coordinates and human skeleton key points. The annotated human skeleton key points can follow the COCO human posture estimation data set.
And forming a data set by the RGB image and the human body key point category labeling image.
Step S002: segmenting the human body part of the RGB image by using a BodyPix 2.0 to obtain a human body part segmentation image; establishing a corresponding relation between the human body key points and human body parts, and acquiring the average area of the human body part segmentation corresponding to each human body key point category in the data set based on the corresponding relation; acquiring the Gaussian distribution size and the Gaussian distribution standard deviation of each key point in each key point category based on the average area of the human body part segmentation corresponding to each human body key point category;
obtaining a coordinate vector:
the heat map based approach also has the following problems on the annotation: typically the standard deviation is fixed, meaning that different keypoints are supervised by the same constructed heatmap. Semantic confusion may result from different coverage areas of the same keypoint, with inherent ambiguity in the keypoint coordinates. Artificial scale differences and annotation ambiguities exist.
The present invention separates the x-coordinate and y-coordinate representations of the keypoints into two independent one-dimensional vectors. The keypoint localization task is considered as two subtasks of horizontal and vertical regression.
First, the human body instances in the image are obtained with an instance segmentation algorithm, preferably one based on deep learning. The present method adopts the segmentMultiPersonParts method of the BodyPix 2.0 model, which yields the instance segmentation result of each person in the image together with each person's body part segmentation, comprising 24 body part segments in total.
Each body part corresponds to one or more key point positions; for example, the left-hand part corresponds to the left-hand key point. The segmentation area of the human body part corresponding to each key point is then obtained, and these areas are accumulated over all images in the labeled data set. Ignoring the influence of the human body pose, a larger segmentation area generally means that the person is closer to the camera's optical center and occupies a more prominent region of the image.
Further, the average segmentation area of the human body part corresponding to each key point category over all labeled data sets is obtained:

\bar{A}_j = \frac{1}{T}\sum_{t=1}^{T} A_{j,t}

where T is the number of samples of the j-th key point category in all labeled data sets, and A_{j,t} denotes the segmentation area of the human body part corresponding to the j-th key point category in the t-th sample.
Then, an empirical baseline distribution size B is set, which determines the size of the one-dimensional Gaussian distribution in the coordinate vector; its empirical value is 5.
Furthermore, the distribution size of each key point is obtained from the average segmentation area of the corresponding human body part:

b_{l,i} = \left\lceil B \cdot \frac{A_{l,i}}{\bar{A}_l} \right\rceil

where b_{l,i} denotes the distribution size of the i-th key point of the l-th key point category, \bar{A}_l denotes the average segmentation area of the human body part corresponding to the l-th category, A_{l,i} denotes the segmentation area of the human body part corresponding to the i-th key point of the l-th category, and \lceil\cdot\rceil is the rounding-up function. Under this formula, the smaller the segmented area of the part corresponding to a key point, the smaller its distribution, because a small area generally occupies little of the image and its key point distribution region would otherwise easily overlap with others. Conversely, the larger the area, the larger the distribution, because its label is more ambiguous, i.e. it is less certain whether a given point represents the skeleton key point. The distribution size is further constrained to a minimum of 3 and a maximum of 9.
For a Gaussian distribution with a fixed mean, different standard deviations produce different distributions, so the standard deviation can be used to represent the uncertainty of the label:

\sigma_{l,i} = C \cdot \frac{A_{l,i}}{\bar{A}_l}

where C represents the baseline standard deviation, with an empirical value of 1. The larger the area, the larger the standard deviation and the flatter the distribution. The standard deviation is further constrained to a minimum of 0.5 and a maximum of 2.5.
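The size and standard deviation rules above, with their constraints, can be sketched as follows. The original equation images are lost, so this is a reconstruction from the surrounding text; keeping the size odd so the key point sits at the window's center is our assumption, suggested by the "distribution size 3 with left and right neighbors" example later in the description.

```python
import math

def gaussian_size_and_sigma(area, mean_area, base_size=5, base_sigma=1.0):
    """Per-keypoint Gaussian distribution size and standard deviation.

    Reconstruction of the patent's rules:
      size  = ceil(base_size * area / mean_area), constrained to [3, 9]
      sigma = base_sigma * area / mean_area,      constrained to [0.5, 2.5]
    A larger segmented area (person closer to the camera) gives a larger,
    flatter Gaussian; a smaller area gives a tighter one.
    """
    ratio = area / mean_area
    size = min(max(math.ceil(base_size * ratio), 3), 9)
    if size % 2 == 0:  # assumption: keep the size odd so the keypoint is centered
        size += 1
    sigma = min(max(base_sigma * ratio, 0.5), 2.5)
    return size, sigma
```

For instance, a part whose area equals the category average gets the baseline size 5 and standard deviation 1, while very small or very large parts are clipped to the stated bounds.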
Step S003: obtaining the x-coordinate and y-coordinate one-dimensional distribution vectors of the image based on the Gaussian distribution size and standard deviation of each key point; obtaining the label value of each person in the image from the positions of the key points in the human body key point category annotation image; obtaining the x and y label value distribution vectors from the coordinate one-dimensional distribution vectors and the per-person label values; and forming the label data from the x-coordinate one-dimensional distribution vector, the y-coordinate one-dimensional distribution vector, the x label value distribution vector and the y label value distribution vector.
further, an x-coordinate one-dimensional distribution vector and a y-coordinate one-dimensional distribution vector are generated, and the method comprises the following steps:
for neural network reasoning, the input is an image of fixed size, where an x-coordinate one-dimensional vector of width w and a y-coordinate one-dimensional vector of height h are first generated.
For each one-dimensional vector, the normalized coordinate of each key point, together with the neighborhood normalized coordinates determined by the key point's distribution size, is substituted into a one-dimensional Gaussian distribution function to obtain the one-dimensional Gaussian distribution value of the key point. The one-dimensional Gaussian distribution function is:

f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}

where \mu is the mean, \sigma is the standard deviation, e is the natural base, and \pi is the circumference ratio.
The normalized coordinate of each key point itself is 0, and the neighborhood normalized coordinates are coordinates taking the key point as the origin. These coordinates are substituted into the one-dimensional Gaussian distribution function to obtain Gaussian probability values, which are then normalized, yielding the one-dimensional Gaussian distribution probability values of the key point over its distribution size. Each key point has its own distribution size and one-dimensional Gaussian distribution standard deviation. For example, if a key point has a distribution size of 3 and a standard deviation of 0.5, then, taking the x coordinate as an example, the key point's normalized coordinate and its neighborhood normalized coordinates (a distribution size of 3 means the left and right neighbors have normalized coordinates -1 and 1, respectively) are substituted into a one-dimensional Gaussian distribution function with standard deviation 0.5, and the resulting values are normalized. Normalization here means scaling so that the probability value at the key point's own coordinate position is 1.
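The worked example above (distribution size 3, standard deviation 0.5, peak normalized to 1) can be sketched as follows; the function name is illustrative:

```python
import numpy as np

def keypoint_gauss_1d(size=3, sigma=0.5):
    # Neighborhood normalized coordinates: the key point itself is the
    # origin (0), so a distribution size of 3 gives offsets [-1, 0, 1].
    half = size // 2
    offsets = np.arange(-half, half + 1)
    # One-dimensional Gaussian values at those offsets (mean 0).
    vals = np.exp(-offsets**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    # Normalize so the key point's own position has probability value 1.
    return vals / vals.max()
```

With sigma = 0.5 the neighbors at offsets ±1 come out as exp(-2) ≈ 0.135 of the peak, illustrating how a small standard deviation concentrates the distribution on the key point.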
All key points are then substituted and calculated one by one, and the one-dimensional Gaussian distribution probability values calculated for all key points of each image are added, finally yielding the x-coordinate one-dimensional distribution vector and the y-coordinate one-dimensional distribution vector of each image.
Obtaining a label value distribution vector:
Because different person instances in the image must be distinguished, label values are introduced: the label values of all key points of the same person are similar, while the label values of different persons differ. The label value of a person is calculated as follows:
$$T_{n}=\left(\frac{1}{K}\sum_{i=1}^{K}\frac{X_{i}^{n}}{w},\ \frac{1}{K}\sum_{i=1}^{K}\frac{Y_{i}^{n}}{h}\right)$$
where $n$ denotes the nth person in the image, $K$ denotes the number of key point categories, $X_{i}^{n}$ and $Y_{i}^{n}$ denote the x and y coordinates of the ith key point of the nth person, and $w$ and $h$ are the width and height of the image, respectively. The label value contains two parts: the front part is the x label value and the back part is the y label value.
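A minimal sketch of the per-person label value follows. The original formula is an unreproduced image, so the averaging form here — the mean of the normalized x coordinates as the "front" x part and the mean of the normalized y coordinates as the "back" y part — is an assumption consistent with the surrounding description, and the function name is illustrative:

```python
import numpy as np

def person_tag_value(keypoints_xy, w, h):
    # keypoints_xy: list of (X, Y) pixel coordinates for one person's
    # K key points; w, h: image width and height.
    pts = np.asarray(keypoints_xy, dtype=float)
    tx = float(np.mean(pts[:, 0] / w))   # x label value (front part)
    ty = float(np.mean(pts[:, 1] / h))   # y label value (back part)
    return tx, ty
```

Averaging normalized coordinates gives a per-person scalar pair that is similar across one person's key points and differs between spatially separated persons, which is the stated purpose of the label value.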
Then, the label value distribution vector of all key points of the person is obtained, and the method comprises the following steps:
First, the one-dimensional Gaussian distribution probability values of all key points of the person are acquired. The corresponding positions of these per-key-point vectors are summed (each key point's one-dimensional Gaussian probability can be regarded as a sequence, and the sequences of all key points are added position by position) and then averaged, giving the average one-dimensional Gaussian distribution probability. Multiplying this average by the label value yields the label distribution vector.
Since the coordinates are expressed as separate x and y one-dimensional vectors, there are correspondingly an x-label value distribution vector and a y-label value distribution vector. All key point label values of one person are the same value.
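The averaging-then-scaling step above can be sketched directly (the function name is illustrative):

```python
import numpy as np

def tag_distribution_vector(gauss_vectors, tag_value):
    # gauss_vectors: one 1-D Gaussian probability vector per key point of
    # the same person, all of equal length. Average them position by
    # position, then scale by the person's label (tag) value.
    avg = np.mean(np.asarray(gauss_vectors, dtype=float), axis=0)
    return avg * tag_value
```

The same call is made twice per person — once with x-axis Gaussian vectors and the x label value, once with y-axis vectors and the y label value.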
So far, for each image, an x-coordinate one-dimensional distribution vector, a y-coordinate one-dimensional distribution vector, an x-label value distribution vector and a y-label value distribution vector of each category key point can be obtained. The vectors are used as label data of a network, and human body posture information of each person in the image can be obtained based on the four vectors.
Step S004: establishing a first neural network, and training the first neural network according to the RGB image and the label data;
Further, a first neural network is trained; its structure consists of two parts, an image encoder and a fully-connected network.
The input of the image encoder is an RGB image. The encoder performs feature extraction to obtain a feature map; a flattening (Flatten) operation turns the feature map into a feature vector, which is input into the fully-connected network for fitting. The network outputs the x-coordinate one-dimensional distribution vector, y-coordinate one-dimensional distribution vector, x-label value distribution vector and y-label value distribution vector of each category of key points. Common network models such as ResNet and HRNet can be adopted for the encoder.
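A shape-level sketch of this pipeline is given below. The tensor sizes, the single dense layer standing in for the fully-connected network, and all variable names are illustrative assumptions (a real model would use a ResNet/HRNet encoder and a multi-layer head, and would produce the four vectors per key point category):

```python
import numpy as np

rng = np.random.default_rng(0)
w, h = 32, 24                            # illustrative image width/height

feat = rng.standard_normal((64, 4, 4))   # encoder feature map (C, H', W')
vec = feat.reshape(-1)                   # Flatten -> feature vector

# One dense layer standing in for the fully-connected network; the output
# concatenates the four vectors: x dist (w) + y dist (h) + x tag (w) + y tag (h).
W_fc = rng.standard_normal((2 * (w + h), vec.size)) * 0.01
out = W_fc @ vec

x_dist, y_dist = out[:w], out[w:w + h]
x_tag, y_tag = out[w + h:2 * w + h], out[2 * w + h:]
```

The point of the sketch is the bookkeeping: the fully-connected output dimension is 2(w + h) per category, split back into the four per-image vectors used as labels.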
The loss function is as follows:
$$Loss=\left\|X-\hat{X}\right\|_{2}^{2}+\left\|Y-\hat{Y}\right\|_{2}^{2}+\left\|T_{x}-\hat{T}_{x}\right\|_{2}^{2}+\left\|T_{y}-\hat{T}_{y}\right\|_{2}^{2}$$
where $X$ and $\hat{X}$ respectively represent the x-coordinate one-dimensional distribution vector label and the network-predicted x-coordinate one-dimensional distribution vector; $Y$ and $\hat{Y}$ respectively represent the y-coordinate one-dimensional distribution vector label and the network-predicted y-coordinate one-dimensional distribution vector; $T_{x}$ and $\hat{T}_{x}$ respectively represent the x-label value distribution vector label and the network-predicted x-label value distribution vector; $T_{y}$ and $\hat{T}_{y}$ respectively represent the y-label value distribution vector label and the network-predicted y-label value distribution vector.
Common optimization methods such as Adam or SGD can be adopted for the network; the implementer may choose freely.
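A plausible numpy sketch of this training loss follows, assuming a sum of squared errors over the four vectors (the patent's loss formula is an unreproduced image, so the exact form and the dict-based interface are assumptions):

```python
import numpy as np

def pose_loss(pred, target):
    # pred / target: dicts holding the four vectors described above,
    # keyed by which vector they are. The loss is the sum of squared
    # errors over all four pairs.
    keys = ("x_dist", "y_dist", "x_tag", "y_tag")
    return float(sum(np.sum((np.asarray(pred[k]) - np.asarray(target[k])) ** 2)
                     for k in keys))
```

A perfect prediction gives zero loss; each mismatched position contributes its squared difference.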
Step S005: obtaining predictive tag data using the first neural network; acquiring human body posture information of each person in the image according to the predicted tag data;
At this point the training of the first neural network is complete. Extraction and matching of human body postures are then performed based on the network's output; the post-processing method is as follows:
The x and y coordinates of suspected key points are acquired from the x-coordinate and y-coordinate one-dimensional distribution vectors using a threshold method; the empirical threshold is 0.85.
Because representing multi-person poses with coordinate vectors is ambiguous — the x and y coordinates of each key point are expressed in two independent one-dimensional vectors, so one obtained x coordinate may correspond to several y coordinates — a coordinate association degree is needed. It is calculated as the cosine similarity between one-dimensional Gaussian distribution sequences, of distribution length 5, centered on the candidate coordinates: the cosine similarity between the sequence around each suspected x coordinate and the sequence around each suspected y coordinate is obtained, and the larger the value, the better the two match.
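The association degree can be sketched as follows — a length-5 window is cut from each predicted distribution vector around a candidate coordinate and the two windows are compared by cosine similarity (window extraction, border handling, and names are illustrative):

```python
import numpy as np

def window(vec, idx, length=5):
    # Length-5 slice of a predicted 1-D distribution vector centred on a
    # suspected key-point coordinate, clipped at the vector borders.
    half = length // 2
    lo, hi = max(idx - half, 0), min(idx + half + 1, len(vec))
    return np.asarray(vec[lo:hi], dtype=float)

def association(x_vec, y_vec, xi, yi):
    # Cosine similarity of the Gaussian-shaped windows around the
    # candidate x coordinate and the candidate y coordinate.
    a, b = window(x_vec, xi), window(y_vec, yi)
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Two candidates whose surrounding Gaussian profiles match (same distribution size and standard deviation) score close to 1, which is what lets the x and y halves of one key point find each other.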
The x and y coordinates of the suspected key points are then matched. The matching method adopts KM (Kuhn-Munkres) matching to obtain the matching that maximizes the total cosine similarity; the matched x and y coordinate pairs are the key point coordinates of the persons in the image.
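The matching step can be illustrated with a brute-force stand-in that finds the same optimum as KM matching on small candidate sets (KM itself solves this assignment problem in polynomial time; the brute force here is for clarity only):

```python
import numpy as np
from itertools import permutations

def best_matching(sim):
    # sim[i][j]: cosine-similarity association between the i-th suspected
    # x coordinate and the j-th suspected y coordinate (square matrix).
    # Return the one-to-one assignment maximizing the total similarity.
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    best, best_perm = -np.inf, None
    for perm in permutations(range(n)):
        score = sum(sim[i, perm[i]] for i in range(n))
        if score > best:
            best, best_perm = score, perm
    return list(enumerate(best_perm))
```

For a production system the same similarity matrix would be fed to a proper Kuhn-Munkres implementation, since the permutation search grows factorially.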
Processing the coordinate vectors of all key points in this way yields the key point information of all persons in the image. The label value sum at each key point's position (namely the sum of its x label value and its y label value) is then obtained from the x-label value distribution vector and the y-label value distribution vector, and the key points are grouped by label value sum: for each key point, the group of key points whose label value sums are closest to its own is found, and the key points in one group are the human body posture information of one person in the image. One method of determining the closest group is as follows: take a head key point, obtain its label value sum, and for every other key point category select the key point whose label value sum has the minimum variance from it; these key points form one group.
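The head-anchored grouping can be sketched as follows (the data layout and function name are illustrative assumptions):

```python
import numpy as np

def group_by_tag(head_tags, other_tags):
    # head_tags: label value sum of each detected head key point (one per
    # person). other_tags: for every other key point category, the list of
    # label value sums of that category's candidates. For each head, pick
    # per category the candidate with minimum squared difference.
    groups = []
    for t in head_tags:
        group = [int(np.argmin((np.asarray(cat) - t) ** 2))
                 for cat in other_tags]
        groups.append(group)
    return groups
```

Because all key points of one person share (nearly) the same label value, nearest-label selection reassembles each person's skeleton from the per-category candidate pools.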
Therefore, the two-dimensional human body posture information of each person in the image can be obtained through the method.
Step S006: and establishing a second neural network according to the human body posture information of each person for identifying the fitness action state of the person.
The first neural network is used to acquire the posture information of each person in the image in real time; a temporal convolutional network is then adopted to recognize the fitness action state from the time-series human posture information. The training details are as follows:
The temporal convolutional network comprises a time-sequence encoder and a fully-connected network. The time-sequence encoder extracts time-series posture information: its input is the time-series posture information of each person and its output is a feature vector. The fully-connected network performs fitting and feature mapping: its input is the feature vector, and its final output is the fitness action state recognition of the person, using a Softmax classification function. The label data of the network is annotated manually and one-hot encoded before being input to the network. The neural network parameters are optimized with a cross-entropy loss function based on the Adam algorithm.
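The classification head described above can be sketched with a Softmax plus cross-entropy pair (Softmax is inferred here from the cross-entropy loss and one-hot labels; the patent's classification-function image is not reproduced):

```python
import numpy as np

def softmax(z):
    # Numerically stable Softmax: subtract the max before exponentiating.
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(probs, one_hot):
    # Cross-entropy between predicted class probabilities and a one-hot
    # label vector; the epsilon guards against log(0).
    return float(-np.sum(np.asarray(one_hot) * np.log(probs + 1e-12)))
```

With two equally scored classes the predicted probabilities are 0.5 each and the loss against either one-hot label is ln 2, the maximum-uncertainty baseline for this binary state classifier.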
The fitness action state recognition distinguishes two states: currently in a fitness action state and currently in a non-fitness action state. Data acquisition by the MEMS inertial sensor is then adjusted according to the classification result: in the non-fitness action state the MEMS-based data acquisition function is closed, and in the fitness action state it is opened.
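The sensor gating rule reduces to a single predicate; the state strings and function name below are illustrative only:

```python
def imu_acquisition_enabled(state):
    # MEMS-IMU data acquisition is on only while the classifier reports a
    # fitness action state; it is closed in the non-fitness state.
    return state == "exercising"
```

In a deployed system this flag would drive the actual start/stop calls of the MEMS inertial sensor driver.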
Therefore, the body-building action state of the person can be identified through the neural network.
Based on the same concept as the method embodiment, another embodiment of the present invention further provides a fitness action artificial intelligence recognition system. The system includes a memory, a processor, and a computer program stored in the memory and running on the processor; when executing the computer program, the processor implements the steps of the real-time monitoring method for fitness actions provided in any of the embodiments above. The method has been described in detail in those embodiments and is not repeated here.
It should be noted that: the precedence order of the above embodiments of the present invention is only for description, and does not represent the merits of the embodiments. And specific embodiments thereof have been described above. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A real-time monitoring method for fitness actions is characterized by comprising the following steps:
a camera is deployed to collect RGB images; labeling the RGB image to obtain a human body key point category labeled image; forming a data set by the RGB image and the human body key point category labeling image;
segmenting the human body parts of the RGB image using BodyPix 2.0 to obtain a human body part segmentation image; establishing a correspondence between the human body key points and human body parts, and acquiring the average area of the human body part segmentation corresponding to each human body key point category in the data set based on the correspondence; acquiring the Gaussian distribution size and the Gaussian distribution standard deviation of each key point in each key point category based on the average area of the human body part segmentation corresponding to each human body key point category;
acquiring an x-coordinate one-dimensional distribution vector and a y-coordinate one-dimensional distribution vector of the image based on the Gaussian distribution size and the Gaussian distribution standard deviation of each key point; obtaining the label value of each person in the image according to the positions of the key points in the human body key point category labeled image; acquiring an x-label value distribution vector and a y-label value distribution vector according to the x-coordinate and y-coordinate one-dimensional distribution vectors of the image and the label value of each person in the image; the x-coordinate one-dimensional distribution vector, the y-coordinate one-dimensional distribution vector, the x-label value distribution vector and the y-label value distribution vector of the image together form the label data;
establishing a first neural network, and training the first neural network according to the RGB image and the label data;
obtaining predictive tag data using the first neural network; acquiring human body posture information of each person in the image according to the predicted tag data;
and establishing a second neural network according to the human body posture information of each person for identifying the fitness action state of the person.
2. The real-time monitoring method for fitness actions according to claim 1, wherein the Gaussian distribution size of each key point is calculated by:
$$d_{i}^{l}=d_{0}\cdot\left\lceil \frac{s_{i}^{l}}{S^{l}}\right\rceil$$
wherein $d_{i}^{l}$ denotes the distribution size of the ith key point in the lth category of key points, $S^{l}$ denotes the average area of the human body part segmentation corresponding to the lth category of key points, $s_{i}^{l}$ denotes the segmented area of the human body part corresponding to the ith key point in the lth category of key points, $\lceil\cdot\rceil$ is the rounding-up function, and $d_{0}$ denotes the baseline Gaussian distribution size.
3. The real-time monitoring method for fitness actions according to claim 1, wherein the Gaussian distribution standard deviation of each key point is calculated by:
$$\sigma_{i}^{l}=\sigma_{0}\cdot\left\lceil \frac{s_{i}^{l}}{S^{l}}\right\rceil$$
wherein $\sigma_{i}^{l}$ denotes the Gaussian distribution standard deviation of the ith key point in the lth category of key points, $S^{l}$ denotes the average area of the human body part segmentation corresponding to the lth category of key points, $s_{i}^{l}$ denotes the segmented area of the human body part corresponding to the ith key point in the lth category of key points, $\lceil\cdot\rceil$ is the rounding-up function, and $\sigma_{0}$ denotes the baseline Gaussian distribution standard deviation.
4. A real-time monitoring method for body-building actions according to claim 1, wherein the x-coordinate one-dimensional distribution vector and the y-coordinate one-dimensional distribution vector are obtained by:
firstly, generating an x coordinate one-dimensional vector with the width of w and a y coordinate one-dimensional vector with the height of h;
for each one-dimensional vector, substituting the normalized coordinates corresponding to the key points into a one-dimensional Gaussian distribution function, and simultaneously substituting the neighborhood normalized coordinates obtained by the coordinates according to the distribution size of the key points to obtain a one-dimensional Gaussian distribution value of each key point;
the normalized coordinate of each key point is 0, and a neighborhood of each key point is obtained according to the distribution size of the key point, wherein the normalized coordinate of the neighborhood is a coordinate with the key point as an origin; finally, substituting the key point into a one-dimensional Gaussian distribution function, then obtaining a Gaussian distribution probability value, and then carrying out normalization to obtain the one-dimensional Gaussian distribution probability value of the key point and the corresponding distribution size; each key point has a corresponding distribution size and a one-dimensional Gaussian distribution standard deviation;
and then substituting and calculating all key points one by one, and adding the one-dimensional Gaussian distribution probability values calculated by all key points of each image to obtain an x-coordinate one-dimensional distribution vector and a y-coordinate one-dimensional distribution vector of each image.
5. A method for real-time monitoring of fitness activity according to claim 1, wherein the tag value of each person is calculated by:
$$T_{n}=\left(\frac{1}{K}\sum_{i=1}^{K}\frac{X_{i}^{n}}{w},\ \frac{1}{K}\sum_{i=1}^{K}\frac{Y_{i}^{n}}{h}\right)$$
wherein $n$ denotes the nth person in the image, $K$ denotes the number of key point categories, $X_{i}^{n}$ and $Y_{i}^{n}$ denote the x and y coordinates of the ith key point of the nth person, and $w$ and $h$ are the width and height of the image, respectively; the label value contains two parts, the front being the x label value and the back being the y label value.
6. The method for monitoring fitness activities in real time as claimed in claim 1, wherein the x-label value distribution vector and the y-label value distribution vector are obtained by:
firstly, acquiring one-dimensional Gaussian distribution probability values of all key points of a person, summing corresponding positions of the one-dimensional Gaussian distribution probability values of all the key points, averaging to obtain average one-dimensional Gaussian distribution probability, and multiplying the average one-dimensional Gaussian distribution probability by a label value to obtain a label distribution vector;
since the coordinates are expressed as separate x and y one-dimensional vectors, there are correspondingly an x-label value distribution vector and a y-label value distribution vector; all category key point label values of each person are the same value.
7. A method as claimed in claim 1, wherein the first neural network has a structure and loss function of:
the first neural network is structurally composed of an image encoder and a full-connection network;
the input of the image encoder is an RGB image which is used for feature extraction, then a feature map is obtained, the feature map is subjected to flattening operation to obtain a feature vector, the feature vector is input into a full-connection network for fitting, and an x coordinate one-dimensional distribution vector, a y coordinate one-dimensional distribution vector, an x label value distribution vector and a y label value distribution vector of each type of key point are output;
the loss function is as follows:
$$Loss=\left\|X-\hat{X}\right\|_{2}^{2}+\left\|Y-\hat{Y}\right\|_{2}^{2}+\left\|T_{x}-\hat{T}_{x}\right\|_{2}^{2}+\left\|T_{y}-\hat{T}_{y}\right\|_{2}^{2}$$
wherein $X$ and $\hat{X}$ respectively represent the x-coordinate one-dimensional distribution vector label and the network-predicted x-coordinate one-dimensional distribution vector; $Y$ and $\hat{Y}$ respectively represent the y-coordinate one-dimensional distribution vector label and the network-predicted y-coordinate one-dimensional distribution vector; $T_{x}$ and $\hat{T}_{x}$ respectively represent the x-label value distribution vector label and the network-predicted x-label value distribution vector; $T_{y}$ and $\hat{T}_{y}$ respectively represent the y-label value distribution vector label and the network-predicted y-label value distribution vector.
8. A method for real-time monitoring of fitness activity according to claim 1, wherein the method for obtaining the body posture information of each person in the image according to the predictive tag data comprises:
acquiring suspected key point x and y coordinates in an x coordinate one-dimensional distribution vector and a y coordinate one-dimensional distribution vector predicted by a first neural network by using a threshold method;
acquiring the association degree between the x coordinate and the y coordinate of each suspected key point, the association degree being calculated as the cosine similarity between the one-dimensional Gaussian distribution sequences of corresponding distribution length centered on the x coordinate and on the y coordinate;
and then matching the x and y coordinates of the suspected key points, wherein the matching method adopts KM matching to obtain the optimal maximum matching of cosine similarity so as to obtain matched x and y coordinates, and the matched coordinates are the coordinates of the key points of the personnel in the image.
9. The method for monitoring the fitness activity in real time as claimed in claim 1, wherein the structure and the training method of the second neural network are as follows:
the second neural network is a temporal convolutional network comprising a time-sequence encoder and a fully-connected network, wherein the time-sequence encoder extracts time-series posture information, taking the time-series posture information of each person as input and outputting a feature vector; the fully-connected network performs fitting and feature mapping, taking the feature vector as input and finally outputting the fitness action state recognition of the person, using a Softmax classification function;
the label data of the network is annotated manually, and the fitness action states to be recognized comprise: currently in a fitness action state and currently in a non-fitness action state; the neural network parameters are optimized with a cross-entropy loss function based on the Adam algorithm.
10. A fitness activity artificial intelligence recognition system comprising a processor and a memory, wherein the processor is configured to execute a fitness activity real-time monitoring method according to any one of claims 1-9 stored in the memory.
CN202210757131.9A 2022-06-30 2022-06-30 Real-time monitoring method for body-building action and artificial intelligence recognition system Active CN114821819B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210757131.9A CN114821819B (en) 2022-06-30 2022-06-30 Real-time monitoring method for body-building action and artificial intelligence recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210757131.9A CN114821819B (en) 2022-06-30 2022-06-30 Real-time monitoring method for body-building action and artificial intelligence recognition system

Publications (2)

Publication Number Publication Date
CN114821819A true CN114821819A (en) 2022-07-29
CN114821819B CN114821819B (en) 2022-09-23

Family

ID=82523293

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210757131.9A Active CN114821819B (en) 2022-06-30 2022-06-30 Real-time monitoring method for body-building action and artificial intelligence recognition system

Country Status (1)

Country Link
CN (1) CN114821819B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8837839B1 (en) * 2010-11-03 2014-09-16 Hrl Laboratories, Llc Method for recognition and pose estimation of multiple occurrences of multiple objects in visual images
CN108830150A (en) * 2018-05-07 2018-11-16 山东师范大学 One kind being based on 3 D human body Attitude estimation method and device
CN113673354A (en) * 2021-07-23 2021-11-19 湖南大学 Human body key point detection method based on context information and combined embedding
CN114187665A (en) * 2021-12-20 2022-03-15 长讯通信服务有限公司 Multi-person gait recognition method based on human body skeleton heat map


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Duan Junchen et al.: "Human posture recognition based on human skeleton point detection and multilayer perceptron", Electronic Measurement Technology *

Also Published As

Publication number Publication date
CN114821819B (en) 2022-09-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant