CN114821819B - Real-time monitoring method for body-building action and artificial intelligence recognition system - Google Patents


Info

Publication number
CN114821819B
CN114821819B (granted from application CN202210757131.9A)
Authority
CN
China
Prior art keywords
key point
dimensional
coordinate
distribution
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN202210757131.9A
Other languages
Chinese (zh)
Other versions
CN114821819A (en
Inventor
张桢
Current Assignee
Nantong Tongxing Fitness Equipment Co ltd
Original Assignee
Nantong Tongxing Fitness Equipment Co ltd
Priority date
Filing date
Publication date
Application filed by Nantong Tongxing Fitness Equipment Co ltd filed Critical Nantong Tongxing Fitness Equipment Co ltd
Priority to CN202210757131.9A priority Critical patent/CN114821819B/en
Publication of CN114821819A publication Critical patent/CN114821819A/en
Application granted granted Critical
Publication of CN114821819B publication Critical patent/CN114821819B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition

Abstract

The invention relates to the technical field of image recognition, in particular to a real-time body-building action monitoring method and an artificial intelligence recognition system. The method comprises the following steps: acquiring images and human body key point category annotation images; segmenting the human body parts of the images; acquiring the average segmentation area of the human body part corresponding to each key point category, and deriving the Gaussian distribution size and standard deviation of each key point from the average area; converting the human body key point category annotations into one-dimensional coordinate distribution vectors and acquiring label value distribution vectors; training a neural network on the images, the one-dimensional coordinate distribution vectors and the label value distribution vectors; and identifying each person's body-building action state from that person's human body posture information. The method derives human body posture labels with different distribution sizes and standard deviations from the human body part areas in the images, which reduces the ambiguity of manual annotation; it realizes human body posture recognition with one-dimensional vectors, saving computing resources while maintaining high detection precision.

Description

Real-time monitoring method for body-building action and artificial intelligence recognition system
Technical Field
The invention relates to the field of image recognition, in particular to a real-time body-building action monitoring method and an artificial intelligence recognition system.
Background
In recent years, with increasing work intensity and growing pressures of daily life, people's physical health faces many challenges. Against this background, healthy living has become a topic of wide interest, and more and more people choose to strengthen their physical fitness through exercise. At the same time, the rapid development of electronic technology has led to a growing number of people using MEMS-based inertial sensors to monitor their movement. This approach assists physical exercise and is of great significance for promoting physical health.
In terms of application scenarios, however, such monitoring is dominated by human activity recognition, i.e., recognizing states such as standing, lying, sitting and riding, which is a significant limitation.
Disclosure of Invention
In order to solve the above technical problems, an object of the present invention is to provide a real-time monitoring method for body-building actions and an artificial intelligence recognition system, wherein the adopted technical scheme is as follows:
In a first aspect, an embodiment of the present invention provides a method for monitoring body-building actions in real time, the method comprising the following steps: deploying a camera to collect RGB images; labeling the RGB images to obtain human body key point category annotation images; forming a data set from the RGB images and the annotation images; segmenting the human body parts of the RGB images using the BodyPix 2.0 model to obtain human body part segmentation images; establishing a correspondence between human body key points and human body parts, and acquiring, based on this correspondence, the average segmentation area of the human body part corresponding to each key point category in the data set; acquiring the Gaussian distribution size and Gaussian distribution standard deviation of each key point in each category based on that average area; acquiring the x-coordinate and y-coordinate one-dimensional distribution vectors of the image based on the Gaussian distribution size and standard deviation of each key point; labeling the key point positions in the image according to the key point categories to obtain the label value of each person in the image; acquiring the x label value distribution vector and the y label value distribution vector from the coordinate one-dimensional distribution vectors and the label value of each person; forming tag data from the x-coordinate one-dimensional distribution vector, the y-coordinate one-dimensional distribution vector, the x label value distribution vector and the y label value distribution vector; establishing a first neural network and training it on the RGB images and the tag data; obtaining predicted tag data using the first neural network; acquiring the human body posture information of each person in the image from the predicted tag data; and establishing a second neural network, fed with the human body posture information of each person, for identifying the person's fitness action state.
Further, the Gaussian distribution size of each key point is calculated as follows:

$$b_{l,i} = \left\lceil B \cdot \frac{s_{l,i}}{\bar{S}_l} \right\rceil$$

where $b_{l,i}$ denotes the distribution size of the i-th key point in the l-th category of key points, $\bar{S}_l$ denotes the average area of the human body part segmentation corresponding to the l-th category, $s_{l,i}$ denotes the segmented area of the human body part corresponding to the i-th key point in the l-th category, $\lceil\cdot\rceil$ is the rounding-up function, and $B$ is the baseline Gaussian distribution size.
Further, the Gaussian distribution standard deviation of each key point is calculated as follows:

$$\sigma_{l,i} = C \cdot \frac{s_{l,i}}{\bar{S}_l}$$

where $\sigma_{l,i}$ denotes the Gaussian distribution standard deviation of the i-th key point in the l-th category of key points, $\bar{S}_l$ denotes the average area of the human body part segmentation corresponding to the l-th category, $s_{l,i}$ denotes the segmented area of the human body part corresponding to the i-th key point in the l-th category, and $C$ is the baseline Gaussian distribution standard deviation.
Further, the method for obtaining the x-coordinate one-dimensional distribution vector and the y-coordinate one-dimensional distribution vector is as follows. First, an x-coordinate one-dimensional vector of width w and a y-coordinate one-dimensional vector of height h are generated. For each one-dimensional vector, the normalized coordinate of each key point, together with the neighborhood normalized coordinates determined by that key point's distribution size, is substituted into a one-dimensional Gaussian distribution function to obtain the one-dimensional Gaussian distribution values of the key point. The normalized coordinate of each key point is 0; its neighborhood is determined by its distribution size, and the neighborhood coordinates are normalized with the key point as the origin. The resulting Gaussian distribution probability values are then normalized, giving the one-dimensional Gaussian distribution probability values over the key point's distribution size; each key point has its own distribution size and one-dimensional Gaussian distribution standard deviation. Finally, this calculation is performed for all key points one by one, and the one-dimensional Gaussian distribution probability values of all key points in each image are added to obtain the x-coordinate and y-coordinate one-dimensional distribution vectors of the image.
Further, the label value of each person is calculated as:

$$v_n^x = \frac{1}{K}\sum_{i=1}^{K}\frac{x_i^n}{w},\qquad v_n^y = \frac{1}{K}\sum_{i=1}^{K}\frac{y_i^n}{h}$$

where n denotes the n-th person in the image, K denotes the number of key point categories, $x_i^n$ and $y_i^n$ denote the x and y coordinates of the i-th key point of the n-th person, and w and h are the width and height of the image, respectively. The label value contains two parts: the front part is the x label value and the back part is the y label value.
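The original equation image is lost; a minimal sketch under the mean-normalized-coordinate reconstruction above (the function name and the exact formula are assumptions, not confirmed by the patent):

```python
def person_label_values(keypoints, w, h):
    """keypoints: (x, y) pixel coordinates of one person's K key points.
    Returns (x label value, y label value) as mean normalised coordinates,
    so all key points of one person share one value and persons differ."""
    K = len(keypoints)
    vx = sum(x for x, _ in keypoints) / (K * w)
    vy = sum(y for _, y in keypoints) / (K * h)
    return vx, vy
```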
Further, the method for obtaining the x label value distribution vector and the y label value distribution vector is as follows. First, the one-dimensional Gaussian distribution probability values of all key points of a person are obtained; the values at corresponding positions are summed and averaged to obtain the average one-dimensional Gaussian distribution probability, and this average is multiplied by the person's label value to obtain the label distribution vector. Because the coordinates are represented as separate x and y one-dimensional vectors, both an x label value distribution vector and a y label value distribution vector are obtained. All category key point label values of one person share the same value.
Further, the structure and loss function of the first neural network are as follows. The first neural network consists of an image encoder and a fully connected network. The input of the image encoder is an RGB image, used for feature extraction; the resulting feature map is flattened into a feature vector, which is input into the fully connected network for fitting. The outputs are the x-coordinate one-dimensional distribution vector, the y-coordinate one-dimensional distribution vector, the x label value distribution vector and the y label value distribution vector of each category of key point. The loss function is:

$$L = \|X - \hat{X}\|_2^2 + \|Y - \hat{Y}\|_2^2 + \|V_x - \hat{V}_x\|_2^2 + \|V_y - \hat{V}_y\|_2^2$$

where $X$ and $\hat{X}$ denote the x-coordinate one-dimensional distribution vector label and the network-predicted x-coordinate one-dimensional distribution vector, respectively; $Y$ and $\hat{Y}$ denote the y-coordinate one-dimensional distribution vector label and the network-predicted one; $V_x$ and $\hat{V}_x$ denote the x label value distribution vector label and the network-predicted one; and $V_y$ and $\hat{V}_y$ denote the y label value distribution vector label and the network-predicted one.
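The original loss-function image is lost; assuming a sum of mean-squared errors over the four vector pairs, the loss can be sketched as (dictionary keys and function name are illustrative):

```python
import numpy as np

def pose_vector_loss(pred, target):
    """Sum of mean-squared errors over the four predicted vectors.
    pred / target: dicts with keys 'x', 'y', 'vx', 'vy' holding
    equal-length 1-D arrays (coordinate and label-value distributions)."""
    return float(sum(np.mean((np.asarray(pred[k]) - np.asarray(target[k])) ** 2)
                     for k in ('x', 'y', 'vx', 'vy')))
```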
Further, the method for acquiring the human body posture information of each person in the image from the predicted tag data is as follows. Suspected key point x and y coordinates are obtained from the x-coordinate and y-coordinate one-dimensional distribution vectors predicted by the first neural network using a threshold method. The degree of association between a suspected x coordinate and a suspected y coordinate is then computed as the cosine similarity of the one-dimensional Gaussian distribution sequences of the corresponding distribution length centered on the x and y coordinates. The suspected x and y coordinates are then matched using KM (Kuhn-Munkres) matching to obtain the matching with maximum total cosine similarity; the matched x and y coordinate pairs are the key point coordinates of the persons in the image.
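The matching step can be sketched with SciPy's `linear_sum_assignment`, which solves the same assignment problem as KM matching; the profile inputs here are simplified stand-ins for the Gaussian sequences described above:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cosine(a, b):
    """Cosine similarity of two 1-D sequences."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_xy(x_profiles, y_profiles):
    """Pair each suspected x coordinate with a suspected y coordinate by
    maximising the total cosine similarity of their Gaussian sequences."""
    sim = np.array([[cosine(xp, yp) for yp in y_profiles] for xp in x_profiles])
    rows, cols = linear_sum_assignment(sim, maximize=True)
    return list(zip(rows.tolist(), cols.tolist()))
```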
Further, the structure and training method of the second neural network are as follows. The second neural network is a temporal convolutional network comprising a time-series encoder and a fully connected network. The time-series encoder extracts temporal posture information: its input is the time-series posture information of each person and its output is a feature vector. The fully connected network performs fitting and feature mapping: its input is the feature vector and its final output is the fitness action state recognition of the person, using a classification function such as softmax. The label data of the network are annotated manually, and the fitness action state labels comprise two classes: currently in a fitness action state, and currently not in a fitness action state. The neural network parameters are optimized with a cross-entropy loss function using the Adam algorithm.
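The classification-function image is lost from the source; taking it to be softmax (an assumption, but the standard pairing with a cross-entropy loss), the output head computes:

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, target_index):
    """Cross-entropy loss of one sample against its true class index."""
    return -math.log(softmax(logits)[target_index])
```

With two classes (fitness action / non-fitness action), equal logits give probability 0.5 each and a loss of ln 2.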
In a second aspect, another embodiment of the present invention provides a fitness activity artificial intelligence recognition system, comprising a memory, a processor and a computer program stored in the memory and running on the processor, wherein the processor implements the steps of the method as described in any one of the above when executing the computer program.
The invention has the following beneficial effects:
the method and the device acquire the human body posture labels with different distribution sizes and standard deviations based on the area of the human body part in the image, reduce the ambiguity of artificial labeling, and simultaneously realize human body posture estimation by utilizing the one-dimensional vector.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions and advantages of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a method for monitoring a fitness activity in real time according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of gaussian distributions with different standard deviations in a method for real-time monitoring of exercise activities according to an embodiment of the present invention.
Detailed Description
To further illustrate the technical means and effects adopted by the present invention to achieve its intended objects, the real-time monitoring method for body-building actions, its specific implementation, structure, features and effects are described in detail below in conjunction with the accompanying drawings and the preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following describes a specific scheme of the real-time monitoring method for body-building actions provided by the invention in detail with reference to the accompanying drawings.
Referring to fig. 1, a method for monitoring a fitness activity in real time according to the present invention is shown, wherein the method comprises the following steps:
step S001: a camera is deployed to collect RGB images; labeling the RGB image to obtain a human body key point category labeled image; forming a data set by the RGB image and the human body key point category labeling image;
firstly, a monitoring camera is deployed in a monitoring area, the monitoring camera can acquire images of the area in real time by using a common camera with a resolution of 1080P, and the acquired images are images of an RGB color space. And then detecting key point information of the human body by utilizing a human body posture estimation technology in real time on the RGB image.
A deep learning method is preferred for posture estimation because of its high precision. Among deep-learning-based human body posture estimation techniques, the two-dimensional heatmap representation has dominated for many years owing to its high performance. However, the heatmap-based approach has some drawbacks: quantization errors exist, limited by the spatial resolution of the heatmap; larger heatmaps require additional up-sampling operations and expensive high-resolution processing; and overlapping heatmap signals may be mistaken for a single key point.
The invention realizes multi-person human body posture estimation using a neural network with an ordinary image classification network structure.
First, annotation is carried out on the images: the human skeleton key points in each image are labeled with their image coordinates, i.e., each labeled image coordinate corresponds to a human skeleton key point. The annotated human skeleton key points can follow the COCO human posture estimation data set.
And forming a data set by the RGB image and the human body key point category labeling image.
Step S002: segmenting the human body part of the RGB image by using a BodyPix 2.0 to obtain a human body part segmentation image; establishing a corresponding relation between the human body key points and human body parts, and acquiring the average area of the human body part segmentation corresponding to each human body key point category in the data set based on the corresponding relation; acquiring the Gaussian distribution size and the Gaussian distribution standard deviation of each key point in each key point category based on the average area of the human body part segmentation corresponding to each human body key point category;
obtaining a coordinate vector:
the heatmap-based approach also has the following problems with annotations: typically the standard deviation is fixed, meaning that different key points are supervised by the same constructed heatmap. Semantic confusion may result from different coverage areas of the same keypoint, with inherent ambiguity in the keypoint coordinates. There are artificial scale differences and annotation ambiguities.
The present invention separates the x-coordinate and y-coordinate representations of the keypoints into two independent one-dimensional vectors. The keypoint localization task is considered as two subtasks of horizontal and vertical regression.
First, the human body instances in an image are obtained using an instance segmentation algorithm, preferably a deep-learning-based one. The invention adopts the segmentMultiPersonParts method of the BodyPix 2.0 model, which yields the instance segmentation result of each person in the image together with each person's human body part segmentation result, comprising 24 body part segmentation results in total.
Each body part corresponds to one or more key point positions; for example, the left-hand part corresponds to the left-hand key point. The human body part segmentation area corresponding to each key point is then acquired, and these areas are counted over all labeled data sets. Regarding the part segmentation area, ignoring the influence of human body pose, a larger area generally means the person is closer to the optical center of the camera and occupies a more prominent region in the image.
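The key-point-to-part correspondence and the area measurement can be sketched as follows; the part and key point names here are illustrative placeholders, not BodyPix's actual labels:

```python
# Hypothetical key point -> body part lookup (names are illustrative)
KEYPOINT_TO_PART = {
    "left_wrist": "left_hand",
    "right_wrist": "right_hand",
    "left_knee": "left_lower_leg",
    "right_knee": "right_lower_leg",
}

def part_area(part_mask):
    """Pixel area of one binary part mask, given as rows of 0/1 values."""
    return sum(sum(row) for row in part_mask)
```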
Further, the average area $\bar{S}_j$ of the human body part segmentation corresponding to each key point category over all labeled data sets is obtained as:

$$\bar{S}_j = \frac{1}{T}\sum_{t=1}^{T} s_{j,t}$$

where T is the number of occurrences of the key point category in all labeled data sets and $s_{j,t}$ denotes the area of the human body part segmentation corresponding to the j-th key point category in the t-th occurrence.
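The per-category averaging can be sketched as follows (the function name and the input layout as (category, area) pairs are assumptions for illustration):

```python
from collections import defaultdict

def mean_part_areas(samples):
    """samples: (category_id, part_area) pairs collected over all annotations.
    Returns the mean human body part segmentation area per key point category."""
    totals, counts = defaultdict(float), defaultdict(int)
    for category, area in samples:
        totals[category] += area
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}
```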
Then, an empirical distribution size B is set, which determines the size of the one-dimensional Gaussian distribution in the coordinate vector; its empirical value is 5.
Furthermore, the distribution size is obtained from the average area $\bar{S}_l$ of the human body part segmentation corresponding to each key point category:

$$b_{l,i} = \left\lceil B \cdot \frac{s_{l,i}}{\bar{S}_l} \right\rceil$$

where $b_{l,i}$ denotes the distribution size of the i-th key point in the l-th category of key points, $\bar{S}_l$ denotes the average area of the human body part segmentation corresponding to the l-th category, $s_{l,i}$ denotes the segmented area of the human body part corresponding to the i-th key point in the l-th category, and $\lceil\cdot\rceil$ is the rounding-up function. By this formula, the smaller the area of the part corresponding to a key point, the smaller its distribution: a smaller area is generally smaller in the image, so the distribution regions of key points overlap more easily. Conversely, the larger the area, the larger its distribution, because a larger area makes the label more ambiguous, i.e., whether a given point represents the bone key point or not. The distribution size is therefore constrained: the minimum distribution size is 3 and the maximum distribution size is 9.
For a Gaussian distribution with a given mean, different standard deviations produce different distributions, so the standard deviation can be used to represent the uncertainty of the annotation:

$$\sigma_{l,i} = C \cdot \frac{s_{l,i}}{\bar{S}_l}$$

where C is the baseline standard deviation, with an empirical value of 1. The larger the area, the larger the standard deviation, and the closer the values of the distribution become. The standard deviation is constrained: the minimum standard deviation is 0.5 and the maximum is 2.5.
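Under the reconstructed formulas and the stated constraints (size in [3, 9], standard deviation in [0.5, 2.5], B = 5, C = 1), the per-key-point computation might look like this sketch (the function name is an assumption):

```python
import math

def gaussian_size_and_std(area, mean_area, base_size=5, base_std=1.0):
    """Per-key-point 1-D Gaussian distribution size and standard deviation,
    scaled by the ratio of the key point's part area to the category mean."""
    ratio = area / mean_area
    size = math.ceil(base_size * ratio)   # round up
    size = min(max(size, 3), 9)           # constraint: size in [3, 9]
    std = base_std * ratio
    std = min(max(std, 0.5), 2.5)         # constraint: std in [0.5, 2.5]
    return size, std
```

A part at its category's average area gets the baseline values; very small and very large parts hit the lower and upper constraints.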
Step S003: acquiring the x-coordinate and y-coordinate one-dimensional distribution vectors of the image based on the Gaussian distribution size and standard deviation of each key point; obtaining the label value of each person in the image from the key point positions in the human body key point category annotation image; acquiring the x label value distribution vector and the y label value distribution vector from the coordinate one-dimensional distribution vectors and the label value of each person; forming tag data from the x-coordinate one-dimensional distribution vector, the y-coordinate one-dimensional distribution vector, the x label value distribution vector and the y label value distribution vector;
further, an x-coordinate one-dimensional distribution vector and a y-coordinate one-dimensional distribution vector are generated, and the method comprises the following steps:
for neural network reasoning, the input is an image of fixed size, where an x-coordinate one-dimensional vector of width w and a y-coordinate one-dimensional vector of height h are first generated.
For each one-dimensional vector, the normalized coordinates corresponding to a key point, together with the neighborhood normalized coordinates determined by the key point's distribution size, are substituted into a one-dimensional Gaussian distribution function to obtain the one-dimensional Gaussian distribution values of the key point. The one-dimensional Gaussian distribution function is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

where $\mu$ is the mean, $\sigma$ is the standard deviation, e is the natural base, and $\pi$ is the circular constant.
The normalized coordinate of each key point is 0, and the normalized neighborhood coordinates are coordinates taking the key point as the origin. Substituting these into the one-dimensional Gaussian distribution function yields Gaussian probability values, which are then normalized to give the one-dimensional Gaussian distribution probability values of the key point over its distribution size. Each key point has its own distribution size and one-dimensional Gaussian distribution standard deviation. For example, if a key point has a distribution size of 3 and a standard deviation of 0.5, then taking the x coordinate: the normalized coordinate of the key point and the normalized coordinates of its neighborhood (a distribution size of 3 means the left and right neighbors have normalized coordinates -1 and 1, respectively) are substituted into a one-dimensional Gaussian function with standard deviation 0.5, and the resulting values are normalized. Normalization here means that the probability value at the key point's own coordinate position becomes 1.
All key points are then substituted and computed one by one, and the one-dimensional Gaussian distribution probability values of all key points of each image are summed, finally yielding the x coordinate one-dimensional distribution vector and the y coordinate one-dimensional distribution vector of each image.
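The construction just described can be sketched as follows; `coordinate_vector` and its argument layout are illustrative names, and key point coordinates are assumed to be integer pixel indices along one axis:

```python
import numpy as np

def gaussian_1d(offsets, sigma):
    # 1D Gaussian over normalized neighborhood offsets (mean 0), then
    # normalized so the key point's own position has probability 1.
    g = np.exp(-(offsets ** 2) / (2 * sigma ** 2))
    return g / g.max()

def coordinate_vector(length, keypoints):
    """Build one axis's coordinate distribution vector.

    length: w for the x vector, h for the y vector.
    keypoints: list of (coord, size, sigma) per key point; size is the
    (odd) distribution size, sigma its Gaussian standard deviation.
    Probability values of all key points are summed into one vector.
    """
    vec = np.zeros(length)
    for coord, size, sigma in keypoints:
        half = size // 2
        offsets = np.arange(-half, half + 1)   # normalized neighborhood coords
        probs = gaussian_1d(offsets, sigma)
        for off, p in zip(offsets, probs):
            pos = coord + off
            if 0 <= pos < length:              # clip neighbors at the border
                vec[pos] += p
    return vec
```

For a single key point at coordinate 5 with distribution size 3 and sigma 0.5, the vector has value 1 at index 5 and e^(−2) ≈ 0.135 at indices 4 and 6.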
Obtaining a label value distribution vector:
Because different person instances in the image must be distinguished, label values are introduced: the label values of all key points of the same person are similar, while the label values of different persons differ. The label value of a person is calculated as follows:
[Label-value formula, shown only as an image in the original.]

where n denotes the n-th person in the image, K denotes the number of key point categories, X_i^n and Y_i^n denote the x and y coordinates of the i-th key point of the n-th person, and w and h are the width and height of the image, respectively. The label value contains two parts: the front part is the x label value and the back part is the y label value.
Then, the label value distribution vector of all key points of the person is obtained, and the method comprises the following steps:
First, the one-dimensional Gaussian distribution vectors of all key points of the person are obtained. The probability sequences of all key points are then summed position by position (each key point's one-dimensional Gaussian distribution probability can be regarded as a sequence, and the corresponding positions of all key points' sequences are added) and averaged to obtain the average one-dimensional Gaussian distribution probability, which is multiplied by the label value to obtain the label distribution vector.
Since the coordinates are expressed as separate x and y one-dimensional vectors, there are both an x label value distribution vector and a y label value distribution vector. All key point label values of the same person are the same value.
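The averaging-and-scaling step above can be sketched as follows; `label_distribution` is an illustrative name, and each element of `person_keypoint_probs` is assumed to be one key point's full-length 1D Gaussian probability sequence along one axis:

```python
import numpy as np

def label_distribution(person_keypoint_probs, label_value):
    """Average the per-key-point 1D Gaussian probability sequences of one
    person element-wise, then scale by the person's label value to get
    that person's contribution to the label value distribution vector."""
    stacked = np.stack(person_keypoint_probs)   # (K, w) or (K, h)
    mean_prob = stacked.mean(axis=0)            # average 1D Gaussian probability
    return mean_prob * label_value
```

Contributions of all persons in the image would then be summed into the final x or y label value distribution vector.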
So far, for each image, an x-coordinate one-dimensional distribution vector, a y-coordinate one-dimensional distribution vector, an x-label value distribution vector and a y-label value distribution vector of each category key point can be obtained. The vectors are used as label data of a network, and human body posture information of each person in the image can be obtained based on the four vectors.
Step S004: establishing a first neural network, and training the first neural network according to the RGB image and the label data;
Further, the first neural network is trained. Its structure has two parts: an image encoder and a fully-connected network.
The input of the image encoder is an RGB image, from which features are extracted to obtain a feature map. A flattening (Flatten) operation is applied to the feature map to obtain a feature vector, which is fed into the fully-connected network for fitting; the outputs are the x coordinate one-dimensional distribution vector, y coordinate one-dimensional distribution vector, x label value distribution vector and y label value distribution vector of each category of key points. Common backbone models such as ResNet or HRNet can be used.
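The flatten-then-fit head described above can be sketched as follows. This is a toy stand-in, not the patent's network: `head_forward` and the output names are illustrative, and the weight shapes are assumptions:

```python
import numpy as np

def head_forward(feature_map, weights, biases):
    """Flatten an encoder feature map and apply one fully-connected layer
    per output vector.

    feature_map: (C, H, W) array produced by the image encoder.
    weights / biases: dicts keyed by output name, e.g. 'x_dist', 'y_dist',
    'x_label', 'y_label'; each weight has shape (out_len, C*H*W).
    """
    flat = feature_map.reshape(-1)                     # Flatten operation
    return {name: weights[name] @ flat + biases[name]  # fully-connected fit
            for name in weights}
```

In practice the encoder and head would be a trained model (e.g. a ResNet backbone); this only illustrates the data flow from feature map to the four output vectors.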
The loss function is as follows:
[Loss-function formula, shown only as an image in the original.]

Its terms compare, respectively: the x coordinate one-dimensional distribution vector label with the network-predicted x coordinate one-dimensional distribution vector; the y coordinate one-dimensional distribution vector label with the network-predicted y coordinate one-dimensional distribution vector; the x label value distribution vector label with the network-predicted x label value distribution vector; and the y label value distribution vector label with the network-predicted y label value distribution vector.
Common optimizers such as Adam or SGD can be used to train the network; the implementer is free to choose.
Step S005: obtaining predictive tag data using the first neural network; acquiring human body posture information of each person in the image according to the predicted tag data;
So far the training of the first neural network is complete. Extraction and matching of human poses is then performed on the network's output; the post-processing method is as follows:
The x and y coordinates of suspected key points are obtained from the x coordinate one-dimensional distribution vector and the y coordinate one-dimensional distribution vector by thresholding. The empirical threshold is 0.85.
Because representing multi-person poses with separate coordinate vectors is ambiguous (the x and y coordinates of a key point are split into two independent one-dimensional vectors, so an obtained x coordinate may correspond to several y coordinates), a degree of association between coordinates is needed. It is computed as the cosine similarity of the one-dimensional Gaussian distribution sequences of length 5 centered on each coordinate: the cosine similarity between the sequence around each suspected x coordinate and the sequence around each suspected y coordinate is obtained, and the larger the value, the better the two match.
The suspected x and y coordinates are then matched using KM (Kuhn-Munkres) matching to obtain the assignment maximizing total cosine similarity; the matched x and y pairs are the key point coordinates of the persons in the image.
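The matching step can be sketched as follows. The patent uses KM matching; for a small, equal number of x and y candidates, a brute-force search over permutations finds the same maximum-similarity assignment, which keeps this sketch dependency-free. `match_coordinates` is an illustrative name:

```python
import numpy as np
from itertools import permutations

def cosine(a, b):
    # Cosine similarity between two 1D Gaussian distribution sequences.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_coordinates(x_seqs, y_seqs):
    """Pair suspected x coordinates with suspected y coordinates.

    x_seqs / y_seqs: length-5 1D Gaussian sequences centered on each
    suspected coordinate (assumed equal in number here). Returns a list
    of (x index, matched y index) pairs maximizing total similarity.
    """
    n = len(x_seqs)
    sim = np.array([[cosine(x, y) for y in y_seqs] for x in x_seqs])
    best, best_perm = -np.inf, None
    for perm in permutations(range(n)):        # exhaustive stand-in for KM
        total = sum(sim[i, perm[i]] for i in range(n))
        if total > best:
            best, best_perm = total, perm
    return list(enumerate(best_perm))
```

A production implementation would use a proper Kuhn-Munkres (Hungarian) solver, e.g. `scipy.optimize.linear_sum_assignment`, which scales polynomially instead of factorially.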
Processing the coordinate vectors of all key point categories yields the key point information of all persons in the image. Based on the x label value distribution vector and the y label value distribution vector, the label value sum at each key point's position (i.e., the sum of its x label value and y label value) is obtained, and key points are grouped by this sum: for each key point, the group of key points whose label value sums are closest to its own is found, and the key points in one group constitute the human body pose information of one person in the image. One way to determine the group closest to a key point's label value sum is as follows: take a head key point, obtain its label value sum, and for every other key point category select the key point whose label value sum has the minimum variance from it; these key points form one group.
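The grouping heuristic can be sketched as follows; `group_by_label` and the candidate layout are illustrative, assuming each candidate carries its label value sum alongside its matched (x, y) coordinates:

```python
def group_by_label(head_label, candidates):
    """Group key points with a head key point by label value sum.

    head_label: the head key point's label value sum.
    candidates: {category: [(label_sum, (x, y)), ...]} for every other
    key point category. For each category, pick the key point whose
    label value sum is closest (minimum squared difference) to the
    head key point's, forming one person's group.
    """
    group = {}
    for cat, kps in candidates.items():
        group[cat] = min(kps, key=lambda kp: (kp[0] - head_label) ** 2)[1]
    return group
```

With two elbow candidates whose label sums are 0.9 and 2.1, a head label sum of 1.0 selects the first.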
Therefore, the two-dimensional human body posture information of each person in the image can be obtained through the method.
Step S006: and establishing a second neural network according to the human body posture information of each person for identifying the fitness action state of the person.
The first neural network is used to acquire the pose information of each person in the image in real time, and a temporal convolutional network is then used to recognize the fitness action state from the time-series pose information. The training details are as follows:
The temporal convolutional network comprises a time-series encoder and a fully-connected network. The time-series encoder extracts time-series pose information: its input is the time-series pose information of one person and its output is a feature vector. The fully-connected network performs fitting and feature mapping: its input is the feature vector and its final output is the person's fitness action state classification, produced by a classification function [shown only as an image in the original]. The label data of the network is manually annotated and one-hot encoded before being fed to the network. The network parameters are optimized with a cross-entropy loss function using the Adam algorithm.
The fitness action state recognition distinguishes two states: currently performing a fitness action and currently not performing a fitness action. Data acquisition by the MEMS inertial sensor is then adjusted according to the classification result: in the non-fitness action state the MEMS-based data acquisition function is turned off, and in the fitness action state it is turned on.
Therefore, the body-building action state of the person can be identified through the neural network.
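The temporal-convolution encoder can be sketched as a plain 1D convolution over a pose sequence. This is a toy stand-in for the network described above; `temporal_conv` and the array shapes are illustrative assumptions:

```python
import numpy as np

def temporal_conv(pose_seq, kernels):
    """Minimal 1D temporal convolution over a pose sequence.

    pose_seq: (T, D) array of per-frame pose vectors (T frames, D pose
    values per frame, e.g. flattened key point coordinates).
    kernels: (F, k, D) array of F temporal filters of width k.
    Returns a (T-k+1, F) feature map, the kind of time-series feature
    a temporal encoder would pass on to the fully-connected classifier.
    """
    T, D = pose_seq.shape
    F, k, _ = kernels.shape
    out = np.zeros((T - k + 1, F))
    for t in range(T - k + 1):
        window = pose_seq[t:t + k]                     # (k, D) temporal window
        out[t] = np.tensordot(kernels, window,
                              axes=([1, 2], [0, 1]))   # one value per filter
    return out
```

A real temporal convolutional network would stack several such layers with nonlinearities and dilation; this only illustrates how the time dimension is convolved.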
Based on the same concept as the above method embodiment, another embodiment of the present invention further provides a fitness action artificial intelligence recognition system. The system comprises a memory, a processor, and a computer program stored in the memory and runnable on the processor; when executing the computer program, the processor implements the steps of the fitness action real-time monitoring method provided in any of the above embodiments. The method has been described in detail above and is not repeated here.
It should be noted that: the sequence of the above embodiments of the present invention is only for description, and does not represent the advantages or disadvantages of the embodiments. And specific embodiments thereof have been described above. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (8)

1. A real-time monitoring method for fitness actions is characterized by comprising the following steps:
a camera is deployed to collect RGB images; labeling the RGB image to obtain a human body key point category labeled image; forming a data set by the RGB image and the human body key point category labeling image;
segmenting the human body part of the RGB image by using a BodyPix 2.0 to obtain a human body part segmentation image; establishing a corresponding relation between the human body key points and human body parts, and acquiring the average area of the human body part segmentation corresponding to each human body key point category in the data set based on the corresponding relation; acquiring the Gaussian distribution size and the Gaussian distribution standard deviation of each key point in each key point category based on the average area of the human body part segmentation corresponding to each human body key point category;
acquiring an x coordinate one-dimensional distribution vector and a y coordinate one-dimensional distribution vector of the image based on the Gaussian distribution size and the Gaussian distribution standard deviation of each key point; obtaining the label value of each person in the image according to the positions of the key points in the human body key point category label image; acquiring an x label value distribution vector and a y label value distribution vector according to the x coordinate one-dimensional distribution vector and the y coordinate one-dimensional distribution vector of the image and the label value of each person in the image; the x coordinate one-dimensional distribution vector, the y coordinate one-dimensional distribution vector, the x label value distribution vector and the y label value distribution vector of the image form the label data;
the method for acquiring the x-coordinate one-dimensional distribution vector and the y-coordinate one-dimensional distribution vector comprises the following steps:
firstly, generating an x coordinate one-dimensional vector with the width of w and a y coordinate one-dimensional vector with the height of h;
for each one-dimensional vector, substituting the corresponding normalized coordinates of the key points into a one-dimensional Gaussian distribution function, and simultaneously substituting the neighborhood normalized coordinates obtained by the coordinates according to the distribution size of the key points to obtain a one-dimensional Gaussian distribution value of each key point;
the normalized coordinate of each key point is 0, and the neighborhood of each key point is obtained according to its distribution size, the normalized neighborhood coordinates being coordinates with the key point as the origin; these are substituted into the one-dimensional Gaussian distribution function to obtain Gaussian distribution probability values, which are then normalized to give the one-dimensional Gaussian distribution probability values of the key point over its distribution size; each key point has its own distribution size and one-dimensional Gaussian distribution standard deviation;
then, substituting and calculating all key points one by one, and adding the one-dimensional Gaussian distribution probability values calculated by all key points of each image to obtain an x-coordinate one-dimensional distribution vector and a y-coordinate one-dimensional distribution vector of each image;
the calculation method of the label value of each person is as follows:

[Label-value formula, shown only as an image in the original.]

where n denotes the n-th person in the image, K denotes the number of key point categories, X_i^n and Y_i^n denote the x and y coordinates of the i-th key point of the n-th person, and w and h are the width and height of the image, respectively; the label value contains two parts, the front part being the x label value and the back part being the y label value;
Establishing a first neural network, and training the first neural network according to the RGB image and the label data;
obtaining predictive tag data using the first neural network; acquiring human body posture information of each person in the image according to the predicted tag data;
and establishing a second neural network according to the human body posture information of each person for identifying the body-building action state of the person.
2. The real-time monitoring method for fitness actions according to claim 1, wherein the Gaussian distribution size of each key point is calculated as:

[Formula, shown only as an image in the original.]

whose symbols denote, respectively: the distribution size of the i-th key point in the l-th category of key points; the average area of the human body part segmentation corresponding to the l-th category of key points; the segmented area of the human body part corresponding to the i-th key point in the l-th category; a round-up (ceiling) function; and the baseline Gaussian distribution size.
3. The real-time monitoring method for fitness actions according to claim 1, wherein the Gaussian distribution standard deviation of each key point is calculated as:

[Formula, shown only as an image in the original.]

whose symbols denote, respectively: the Gaussian distribution standard deviation of the i-th key point in the l-th category of key points; the average area of the human body part segmentation corresponding to the l-th category of key points; the segmented area of the human body part corresponding to the i-th key point in the l-th category; a round-up (ceiling) function; and the baseline Gaussian distribution standard deviation.
4. The method for monitoring fitness activities in real time as claimed in claim 1, wherein the x-label value distribution vector and the y-label value distribution vector are obtained by:
firstly, the one-dimensional Gaussian distribution probability values of all key points of a person are obtained; the probability sequences of all key points are summed position by position and averaged to obtain the average one-dimensional Gaussian distribution probability, which is multiplied by the label value to obtain the label distribution vector;
since the coordinates are expressed as separate x and y one-dimensional vectors, there are both an x label value distribution vector and a y label value distribution vector; all category key point label values of the same person are the same value.
5. The real-time monitoring method for fitness actions according to claim 1, wherein the structure and loss function of the first neural network are as follows:

the first neural network consists of an image encoder and a fully-connected network;

the input of the image encoder is an RGB image, from which features are extracted to obtain a feature map; a flattening operation is applied to the feature map to obtain a feature vector, which is fed into the fully-connected network for fitting; the outputs are the x coordinate one-dimensional distribution vector, y coordinate one-dimensional distribution vector, x label value distribution vector and y label value distribution vector of each category of key points;

the loss function is as follows:

[Loss-function formula, shown only as an image in the original.]

whose terms compare, respectively: the x coordinate one-dimensional distribution vector label with the network-predicted x coordinate one-dimensional distribution vector; the y coordinate one-dimensional distribution vector label with the network-predicted y coordinate one-dimensional distribution vector; the x label value distribution vector label with the network-predicted x label value distribution vector; and the y label value distribution vector label with the network-predicted y label value distribution vector.
6. A method for real-time monitoring of fitness activity according to claim 1, wherein the method for obtaining the body posture information of each person in the image according to the predictive tag data comprises:
acquiring suspected key point x and y coordinates in an x coordinate one-dimensional distribution vector and a y coordinate one-dimensional distribution vector predicted by a first neural network by using a threshold method;
obtaining the relevance between the x coordinate of the suspected key point and the y coordinate of the suspected key point, wherein the relevance calculation method of the coordinates comprises the following steps:
calculating the cosine similarity of the one-dimensional Gaussian distribution sequence with the corresponding distribution length by taking the x and y coordinates as a center;
then the suspected x and y coordinates are matched using KM matching to obtain the assignment maximizing total cosine similarity; the matched x and y coordinates are the key point coordinates of the persons in the image.
7. The real-time monitoring method for fitness actions according to claim 1, wherein the structure and training method of the second neural network comprise:

the second neural network is a temporal convolutional network comprising a time-series encoder and a fully-connected network; the time-series encoder extracts time-series pose information, its input being the time-series pose information of each person and its output a feature vector; the fully-connected network performs fitting and feature mapping, its input being the feature vector and its final output the person's fitness action state classification, produced by a classification function [shown only as an image in the original]; the label data of the network is manually annotated, and the fitness action state labels comprise: currently in a fitness action state and currently in a non-fitness action state; the network parameters are optimized with a cross-entropy loss function using the Adam algorithm.
8. A fitness activity artificial intelligence recognition system comprising a processor and a memory, wherein the processor is configured to execute a fitness activity real-time monitoring method as claimed in any one of claims 1 to 7 stored in the memory.
CN202210757131.9A 2022-06-30 2022-06-30 Real-time monitoring method for body-building action and artificial intelligence recognition system Expired - Fee Related CN114821819B (en)


Publications (2)

Publication Number Publication Date
CN114821819A CN114821819A (en) 2022-07-29
CN114821819B true CN114821819B (en) 2022-09-23



