CN114821819A - Real-time monitoring method for body-building action and artificial intelligence recognition system - Google Patents

Real-time monitoring method for body-building action and artificial intelligence recognition system

Info

Publication number: CN114821819A (granted as CN114821819B)
Application number: CN202210757131.9A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: key point, dimensional, coordinate, vector, image
Inventor: 张桢
Assignee (original and current): Nantong Tongxing Fitness Equipment Co., Ltd.
Legal status: Granted, Active (the listed status is an assumption by Google and not a legal conclusion)

Classifications

    • G06V 40/20 — Recognition of biometric, human-related or animal-related patterns in image or video data: movements or behaviour, e.g. gesture recognition
    • G06N 3/045 — Neural networks: combinations of networks
    • G06N 3/049 — Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08 — Neural networks: learning methods
    • G06V 10/26 — Image preprocessing: segmentation of patterns in the image field; detection of occlusion
    • G06V 10/75 — Image or video pattern matching: organisation of the matching processes, e.g. coarse-fine or multi-scale approaches
    • G06V 10/764 — Recognition using pattern recognition or machine learning: classification, e.g. of video objects
    • G06V 10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 — Recognition using pattern recognition or machine learning: neural networks
    • G06V 20/52 — Scene-specific elements: surveillance or monitoring of activities, e.g. for recognising suspicious objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image recognition, in particular to a real-time body-building action monitoring method and an artificial intelligence recognition system. The method comprises the following steps: acquiring images and human body key point category annotation images; segmenting the human body parts in each image; obtaining the average segmentation area of the human body part corresponding to each human body key point category, and deriving the Gaussian distribution size and standard deviation of each key point from that average area; converting the human body key point category annotations into one-dimensional coordinate distribution vectors and obtaining label value distribution vectors; training a neural network on the images, the one-dimensional coordinate distribution vectors and the label value distribution vectors; and identifying each person's body-building action state from his or her human body posture information. By deriving human body posture labels with different distribution sizes and standard deviations from the area of the human body part in the image, the method reduces the ambiguity of manual annotation; by representing human body postures as one-dimensional vectors, it saves computing resources while achieving high detection precision.

Description

Real-time monitoring method for body-building action and artificial intelligence recognition system
Technical Field
The invention relates to the field of image recognition, in particular to a real-time body-building action monitoring method and an artificial intelligence recognition system.
Background
In recent years, with increasing work intensity and life pressure, people's physical health faces many challenges. Against this background, healthy living has become a topic of wide concern, and more and more people choose to strengthen their fitness by exercising. At the same time, the rapid development of electronic technology has led to a growing number of people using MEMS-based inertial sensors to monitor their movement. Such monitoring can effectively assist physical exercise and is of great significance for promoting physical health.
In terms of application scenarios, however, existing human activity recognition is mainly used to recognize coarse states such as standing, lying, sitting and riding, which greatly limits its usefulness.
Disclosure of Invention
In order to solve the above technical problems, an object of the present invention is to provide a real-time monitoring method for body-building actions and an artificial intelligence recognition system. The adopted technical scheme is as follows:
In a first aspect, an embodiment of the present invention provides a method for monitoring fitness activities in real time, the method comprising the following steps: deploying a camera to collect RGB images; annotating the RGB images to obtain human body key point category annotation images; forming a data set from the RGB images and the annotation images; segmenting the human body parts of each RGB image with BodyPix 2.0 to obtain human body part segmentation maps; establishing a correspondence between human body key points and human body parts, and obtaining, based on this correspondence, the average segmentation area of the human body part corresponding to each human body key point category in the data set; obtaining the Gaussian distribution size and standard deviation of each key point in each category from that average area; obtaining the x-coordinate and y-coordinate one-dimensional distribution vectors of each image from the Gaussian distribution sizes and standard deviations; obtaining the label value of each person in the image from the key point positions in the annotation image; obtaining the x and y label value distribution vectors from the coordinate one-dimensional distribution vectors and the per-person label values; forming label data from the x-coordinate one-dimensional distribution vector, the y-coordinate one-dimensional distribution vector, the x label value distribution vector and the y label value distribution vector; establishing a first neural network and training it on the RGB images and the label data; obtaining predicted label data with the first neural network; obtaining the human body posture information of each person in the image from the predicted label data; and establishing a second neural network that identifies each person's fitness action state from his or her posture information.
Further, the Gaussian distribution size of each key point is calculated as:

b_{l,i} = \left\lceil B \cdot \frac{A_{l,i}}{\bar{A}_l} \right\rceil

where b_{l,i} denotes the distribution size of the i-th key point of the l-th key point category, \bar{A}_l denotes the average segmentation area of the human body part corresponding to the l-th category, A_{l,i} denotes the segmentation area of the human body part corresponding to the i-th key point of the l-th category, \lceil\cdot\rceil is the rounding-up function, and B denotes the baseline Gaussian distribution size.
Further, the Gaussian distribution standard deviation of each key point is calculated as:

\sigma_{l,i} = C \cdot \frac{A_{l,i}}{\bar{A}_l}

where \sigma_{l,i} denotes the Gaussian distribution standard deviation of the i-th key point of the l-th key point category, \bar{A}_l denotes the average segmentation area of the human body part corresponding to the l-th category, A_{l,i} denotes the segmentation area of the human body part corresponding to the i-th key point of the l-th category, and C denotes the baseline Gaussian distribution standard deviation.
Further, the method for obtaining the x-coordinate and y-coordinate one-dimensional distribution vectors comprises the following steps: first, generating an x-coordinate one-dimensional vector of width w and a y-coordinate one-dimensional vector of height h; for each one-dimensional vector, substituting the normalized coordinate of each key point, together with the neighborhood normalized coordinates determined by the key point's distribution size, into a one-dimensional Gaussian distribution function to obtain the one-dimensional Gaussian distribution value of the key point, where the normalized coordinate of each key point is 0, the neighborhood is determined by the distribution size, and the neighborhood normalized coordinates take the key point as origin; normalizing the resulting Gaussian probability values to obtain the one-dimensional Gaussian distribution probability values of the key point over its distribution size, each key point having its own distribution size and one-dimensional Gaussian distribution standard deviation; and finally processing all key points one by one in this way and summing the one-dimensional Gaussian distribution probability values of all key points of each image to obtain the image's x-coordinate and y-coordinate one-dimensional distribution vectors.
Further, the label value of each person is calculated as:

V_n = \left( \frac{1}{K}\sum_{i=1}^{K}\frac{x_i^n}{w},\ \frac{1}{K}\sum_{i=1}^{K}\frac{y_i^n}{h} \right)

where n denotes the n-th person in the image, K denotes the number of key point categories, x_i^n and y_i^n denote the x and y coordinates of the i-th key point of the n-th person, and w and h are the width and height of the image, respectively. The label value contains two parts: the front part is the x label value and the back part is the y label value.
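Since the equation image for the label value is not reproduced in this text, the following minimal sketch implements one reading consistent with the listed symbols: the per-person mean of the normalized x (resp. y) key point coordinates. The function name and argument layout are illustrative, not from the patent.

```python
def person_label_value(keypoints_xy, w, h):
    """x and y label values of one person (hedged reconstruction).

    Averages the normalized x (resp. y) coordinates of the person's K
    keypoints, so all keypoints of the same person share one value while
    persons standing at different positions get different values.
    """
    K = len(keypoints_xy)
    vx = sum(x / w for x, _ in keypoints_xy) / K
    vy = sum(y / h for _, y in keypoints_xy) / K
    return vx, vy
```

For example, a person whose two key points sit at (10, 20) and (30, 40) in a 100 x 100 image receives the x label value 0.2 and the y label value 0.3.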
Further, the method for obtaining the x and y label value distribution vectors comprises the following steps: first, obtaining the one-dimensional Gaussian distribution probability values of all key points of a person, summing them position by position and averaging to obtain the average one-dimensional Gaussian distribution probability, and multiplying this average by the person's label value to obtain the label distribution vector. Because the coordinates are represented as separate x and y one-dimensional vectors, an x label value distribution vector and a y label value distribution vector are obtained in the same way; all key point categories of the same person share the same label value.
Further, the structure and loss function of the first neural network are as follows. The first neural network consists of an image encoder and a fully connected network: the image encoder takes the RGB image as input and extracts features to obtain a feature map; the feature map is flattened into a feature vector, which is fed into the fully connected network for fitting; the output is the x-coordinate one-dimensional distribution vector, y-coordinate one-dimensional distribution vector, x label value distribution vector and y label value distribution vector of each key point category. The loss function is:

L = \|X - \hat{X}\|_2^2 + \|Y - \hat{Y}\|_2^2 + \|V_x - \hat{V}_x\|_2^2 + \|V_y - \hat{V}_y\|_2^2

where X and \hat{X} respectively denote the x-coordinate one-dimensional distribution vector label and the network's prediction of it; Y and \hat{Y} respectively denote the y-coordinate one-dimensional distribution vector label and its prediction; V_x and \hat{V}_x respectively denote the x label value distribution vector label and its prediction; and V_y and \hat{V}_y respectively denote the y label value distribution vector label and its prediction.
Further, the method for obtaining the human body posture information of each person in the image from the predicted label data comprises the following steps: obtaining candidate key point x and y coordinates from the x-coordinate and y-coordinate one-dimensional distribution vectors predicted by the first neural network using a threshold method; computing the degree of association between the candidate x coordinates and candidate y coordinates as the cosine similarity of the one-dimensional Gaussian distribution sequences of the corresponding distribution length centered at each coordinate; and matching the candidate x and y coordinates by KM (Kuhn-Munkres) matching to obtain the assignment with the maximum total cosine similarity, the matched x and y coordinates being the key point coordinates of the persons in the image.
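The x–y matching step above can be sketched as follows. The patent specifies KM matching of cosine similarities; this illustration brute-forces the assignment that maximizes the total cosine similarity, which gives the same result for small candidate sets. Function names and window layout are our assumptions.

```python
import itertools
import numpy as np

def cosine(a, b):
    """Cosine similarity of two 1-D sequences."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def match_xy(x_windows, y_windows):
    """Pair candidate x coordinates with candidate y coordinates.

    x_windows / y_windows: one-dimensional Gaussian sequences of the
    corresponding distribution length, cut out around each candidate
    coordinate. Brute-force stand-in for KM matching: try every
    assignment and keep the one with the highest total cosine similarity.
    """
    n = len(x_windows)
    best_score, best_perm = float("-inf"), None
    for perm in itertools.permutations(range(n)):
        score = sum(cosine(x_windows[i], y_windows[perm[i]]) for i in range(n))
        if score > best_score:
            best_score, best_perm = score, perm
    return [(i, best_perm[i]) for i in range(n)]
```

In practice, a proper assignment solver (the Kuhn-Munkres algorithm, e.g. `scipy.optimize.linear_sum_assignment` on the negated similarity matrix) replaces the permutation loop once the number of candidates grows.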
Further, the structure and training method of the second neural network are as follows. The second neural network is a temporal convolutional network comprising a time-sequence encoder and a fully connected network: the time-sequence encoder extracts temporal posture information, taking the time series of each person's posture information as input and outputting a feature vector; the fully connected network performs fitting and feature mapping, taking the feature vector as input and finally outputting the identification of the person's fitness action state through a classification function. The label data of this network are annotated manually, the annotated fitness action states being: currently in a fitness action state, and currently not in a fitness action state. The network parameters are optimized with a cross-entropy loss function using the Adam algorithm.
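The classification function referenced above is not reproduced in this text; softmax paired with cross-entropy loss is the conventional choice for such a two-class head and is assumed in this minimal sketch. The logit values are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(probs, label):
    """Cross-entropy loss for one sample; `label` is the true class index."""
    return -float(np.log(probs[label] + 1e-12))

# Two-class head: "in a fitness action state" vs. "not in a fitness action state".
logits = np.array([1.2, -0.3])   # hypothetical fully-connected-layer output
probs = softmax(logits)
loss = cross_entropy(probs, 0)   # ground truth: in a fitness action state
```

During training, this per-sample loss is averaged over a batch and minimized with Adam, as the passage above describes.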
In a second aspect, another embodiment of the present invention provides an artificial intelligence recognition system for body-building actions, comprising a memory, a processor and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the steps of any one of the methods described above.
The invention has the following beneficial effects:
the human body posture labels with different distribution sizes and standard deviations are obtained based on the area of the human body part in the image, the ambiguity of the artificial labels is reduced, meanwhile, the human body posture estimation is realized by utilizing the one-dimensional vector, compared with the process of subtracting a decoder in the prior art, the calculation resource is saved, and meanwhile, the problem of false detection caused by the overlapped 2D heat map is also reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention; other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a flowchart of a method for monitoring a fitness activity in real time according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of gaussian distributions with different standard deviations in a method for real-time monitoring of exercise activities according to an embodiment of the present invention.
Detailed Description
To further illustrate the technical means adopted by the present invention to achieve its predetermined objects and their effects, the following detailed description of the real-time monitoring method for body-building actions, its specific implementation, structure, features and effects is provided in conjunction with the accompanying drawings and the preferred embodiments. In the following description, different instances of "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following describes a specific scheme of the real-time monitoring method for body-building actions provided by the invention in detail with reference to the accompanying drawings.
Referring to fig. 1, a method for monitoring a fitness activity in real time according to the present invention is shown, wherein the method comprises the following steps:
step S001: a camera is deployed to collect RGB images; labeling the RGB image to obtain a human body key point category labeled image; forming a data set by the RGB image and the human body key point category labeling image;
firstly, a monitoring camera is deployed in a monitoring area, the monitoring camera can acquire images of the area in real time by using a common camera with a resolution of 1080P, and the acquired images are images of an RGB color space. And then detecting key point information of the human body by utilizing a human body posture estimation technology in real time on the RGB image.
A deep learning method is preferred for posture estimation because of its high precision. Among deep-learning-based human posture estimation techniques, the two-dimensional heat map representation has dominated for many years owing to its high performance. However, heat-map-based approaches have some drawbacks: quantization errors exist, limited by the spatial resolution of the heat map; larger heat maps require additional up-sampling operations and expensive high-resolution processing; and overlapping heat map signals may be mistaken for a single key point.
The present invention realizes multi-person human body posture estimation with a neural network having an ordinary image classification network structure.
First, the images are annotated by marking the human skeleton key points in each image. The annotation marks image coordinates, that is, it establishes a correspondence between image coordinates and human skeleton key points. The annotated human skeleton key points can follow the COCO human posture estimation data set.
And forming a data set by the RGB image and the human body key point category labeling image.
Step S002: segmenting the human body part of the RGB image by using a BodyPix 2.0 to obtain a human body part segmentation image; establishing a corresponding relation between the human body key points and human body parts, and acquiring the average area of the human body part segmentation corresponding to each human body key point category in the data set based on the corresponding relation; acquiring the Gaussian distribution size and the Gaussian distribution standard deviation of each key point in each key point category based on the average area of the human body part segmentation corresponding to each human body key point category;
obtaining a coordinate vector:
the heat map based approach also has the following problems on the annotation: typically the standard deviation is fixed, meaning that different keypoints are supervised by the same constructed heatmap. Semantic confusion may result from different coverage areas of the same keypoint, with inherent ambiguity in the keypoint coordinates. Artificial scale differences and annotation ambiguities exist.
The present invention separates the x-coordinate and y-coordinate representations of the keypoints into two independent one-dimensional vectors. The keypoint localization task is considered as two subtasks of horizontal and vertical regression.
First, the human body instances in the image are obtained with an instance segmentation algorithm, preferably one based on deep learning. The present method adopts the segmentMultiPersonParts method of the BodyPix 2.0 model, which yields the instance segmentation result of each person in the image together with each person's body part segmentation, comprising 24 body part segments in total.
Each body part corresponds to one or more key point positions; for example, the left-hand part corresponds to the left-hand key point. The segmentation area of the human body part corresponding to each key point is then obtained, and these areas are accumulated over all images in the labeled data set. Ignoring the influence of the human body pose, a larger segmentation area generally means that the person is closer to the camera's optical center and occupies a more prominent region of the image.
Further, the average segmentation area of the human body part corresponding to each key point category over all labeled data sets is obtained:

\bar{A}_j = \frac{1}{T}\sum_{t=1}^{T} A_{j,t}

where T is the number of samples of the j-th key point category in all labeled data sets, and A_{j,t} denotes the segmentation area of the human body part corresponding to the j-th key point category in the t-th sample.
Then, an empirical baseline distribution size B is set, which determines the size of the one-dimensional Gaussian distribution in the coordinate vector; its empirical value is 5.
Furthermore, the distribution size of each key point is obtained from the average segmentation area of the corresponding human body part:

b_{l,i} = \left\lceil B \cdot \frac{A_{l,i}}{\bar{A}_l} \right\rceil

where b_{l,i} denotes the distribution size of the i-th key point of the l-th key point category, \bar{A}_l denotes the average segmentation area of the human body part corresponding to the l-th category, A_{l,i} denotes the segmentation area of the human body part corresponding to the i-th key point of the l-th category, and \lceil\cdot\rceil is the rounding-up function. Under this formula, the smaller the segmented area of the part corresponding to a key point, the smaller its distribution, because a small area generally occupies little of the image and its key point distribution region would otherwise easily overlap with others. Conversely, the larger the area, the larger the distribution, because its label is more ambiguous, i.e. it is less certain whether a given point represents the skeleton key point. The distribution size is further constrained to a minimum of 3 and a maximum of 9.
For a Gaussian distribution with a fixed mean, different standard deviations produce different distributions, so the standard deviation can be used to represent the uncertainty of the label:

\sigma_{l,i} = C \cdot \frac{A_{l,i}}{\bar{A}_l}

where C represents the baseline standard deviation, with an empirical value of 1. The larger the area, the larger the standard deviation and the flatter the distribution. The standard deviation is further constrained to a minimum of 0.5 and a maximum of 2.5.
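The size and standard deviation rules above, with their constraints, can be sketched as follows. The original equation images are lost, so this is a reconstruction from the surrounding text; keeping the size odd so the key point sits at the window's center is our assumption, suggested by the "distribution size 3 with left and right neighbors" example later in the description.

```python
import math

def gaussian_size_and_sigma(area, mean_area, base_size=5, base_sigma=1.0):
    """Per-keypoint Gaussian distribution size and standard deviation.

    Reconstruction of the patent's rules:
      size  = ceil(base_size * area / mean_area), constrained to [3, 9]
      sigma = base_sigma * area / mean_area,      constrained to [0.5, 2.5]
    A larger segmented area (person closer to the camera) gives a larger,
    flatter Gaussian; a smaller area gives a tighter one.
    """
    ratio = area / mean_area
    size = min(max(math.ceil(base_size * ratio), 3), 9)
    if size % 2 == 0:  # assumption: keep the size odd so the keypoint is centered
        size += 1
    sigma = min(max(base_sigma * ratio, 0.5), 2.5)
    return size, sigma
```

For instance, a part whose area equals the category average gets the baseline size 5 and standard deviation 1, while very small or very large parts are clipped to the stated bounds.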
Step S003: obtaining the x-coordinate and y-coordinate one-dimensional distribution vectors of the image based on the Gaussian distribution size and standard deviation of each key point; obtaining the label value of each person in the image from the positions of the key points in the human body key point category annotation image; obtaining the x and y label value distribution vectors from the coordinate one-dimensional distribution vectors and the per-person label values; and forming the label data from the x-coordinate one-dimensional distribution vector, the y-coordinate one-dimensional distribution vector, the x label value distribution vector and the y label value distribution vector.
further, an x-coordinate one-dimensional distribution vector and a y-coordinate one-dimensional distribution vector are generated, and the method comprises the following steps:
for neural network reasoning, the input is an image of fixed size, where an x-coordinate one-dimensional vector of width w and a y-coordinate one-dimensional vector of height h are first generated.
For each one-dimensional vector, the normalized coordinate of each key point, together with the neighborhood normalized coordinates determined by the key point's distribution size, is substituted into a one-dimensional Gaussian distribution function to obtain the one-dimensional Gaussian distribution value of the key point. The one-dimensional Gaussian distribution function is:

f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}

where \mu is the mean, \sigma is the standard deviation, e is the natural base, and \pi is the circumference ratio.
The normalized coordinate of each key point itself is 0, and the neighborhood normalized coordinates are coordinates taking the key point as the origin. These coordinates are substituted into the one-dimensional Gaussian distribution function to obtain Gaussian probability values, which are then normalized, yielding the one-dimensional Gaussian distribution probability values of the key point over its distribution size. Each key point has its own distribution size and one-dimensional Gaussian distribution standard deviation. For example, if a key point has a distribution size of 3 and a standard deviation of 0.5, then, taking the x coordinate as an example, the key point's normalized coordinate and its neighborhood normalized coordinates (a distribution size of 3 means the left and right neighbors have normalized coordinates -1 and 1, respectively) are substituted into a one-dimensional Gaussian distribution function with standard deviation 0.5, and the resulting values are normalized. Normalization here means scaling so that the probability value at the key point's own coordinate position is 1.
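The worked example above (distribution size 3, standard deviation 0.5, peak normalized to 1) can be sketched as follows; the function name is illustrative:

```python
import numpy as np

def keypoint_gauss_1d(size=3, sigma=0.5):
    # Neighborhood normalized coordinates: the key point itself is the
    # origin (0), so a distribution size of 3 gives offsets [-1, 0, 1].
    half = size // 2
    offsets = np.arange(-half, half + 1)
    # One-dimensional Gaussian values at those offsets (mean 0).
    vals = np.exp(-offsets**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    # Normalize so the key point's own position has probability value 1.
    return vals / vals.max()
```

With sigma = 0.5 the neighbors at offsets ±1 come out as exp(-2) ≈ 0.135 of the peak, illustrating how a small standard deviation concentrates the distribution on the key point.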
All key points are then substituted and calculated one by one, and the one-dimensional Gaussian distribution probability values calculated for all key points of each image are added, finally yielding the x-coordinate one-dimensional distribution vector and the y-coordinate one-dimensional distribution vector of each image.
Obtaining a label value distribution vector:
Because different person instances in the image must be distinguished, label values are introduced: the label values of all key points of the same person are similar, while the label values of different persons differ. The label value of a person is calculated as follows:
$$T_{n}=\left(\frac{1}{K}\sum_{i=1}^{K}\frac{X_{i}^{n}}{w},\ \frac{1}{K}\sum_{i=1}^{K}\frac{Y_{i}^{n}}{h}\right)$$
where $n$ denotes the nth person in the image, $K$ denotes the number of key point categories, $X_{i}^{n}$ and $Y_{i}^{n}$ denote the x and y coordinates of the ith key point of the nth person, and $w$ and $h$ are the width and height of the image, respectively. The label value contains two parts: the front part is the x label value and the back part is the y label value.
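A minimal sketch of the per-person label value follows. The original formula is an unreproduced image, so the averaging form here — the mean of the normalized x coordinates as the "front" x part and the mean of the normalized y coordinates as the "back" y part — is an assumption consistent with the surrounding description, and the function name is illustrative:

```python
import numpy as np

def person_tag_value(keypoints_xy, w, h):
    # keypoints_xy: list of (X, Y) pixel coordinates for one person's
    # K key points; w, h: image width and height.
    pts = np.asarray(keypoints_xy, dtype=float)
    tx = float(np.mean(pts[:, 0] / w))   # x label value (front part)
    ty = float(np.mean(pts[:, 1] / h))   # y label value (back part)
    return tx, ty
```

Averaging normalized coordinates gives a per-person scalar pair that is similar across one person's key points and differs between spatially separated persons, which is the stated purpose of the label value.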
Then, the label value distribution vector of all key points of the person is obtained, and the method comprises the following steps:
First, the one-dimensional Gaussian distribution probability values of all key points of the person are acquired. The corresponding positions of these per-key-point vectors are summed (each key point's one-dimensional Gaussian probability can be regarded as a sequence, and the sequences of all key points are added position by position) and then averaged, giving the average one-dimensional Gaussian distribution probability. Multiplying this average by the label value yields the label distribution vector.
Since the coordinates are expressed as separate x and y one-dimensional vectors, there are correspondingly an x-label value distribution vector and a y-label value distribution vector. All key point label values of one person are the same value.
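The averaging-then-scaling step above can be sketched directly (the function name is illustrative):

```python
import numpy as np

def tag_distribution_vector(gauss_vectors, tag_value):
    # gauss_vectors: one 1-D Gaussian probability vector per key point of
    # the same person, all of equal length. Average them position by
    # position, then scale by the person's label (tag) value.
    avg = np.mean(np.asarray(gauss_vectors, dtype=float), axis=0)
    return avg * tag_value
```

The same call is made twice per person — once with x-axis Gaussian vectors and the x label value, once with y-axis vectors and the y label value.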
So far, for each image, an x-coordinate one-dimensional distribution vector, a y-coordinate one-dimensional distribution vector, an x-label value distribution vector and a y-label value distribution vector of each category key point can be obtained. The vectors are used as label data of a network, and human body posture information of each person in the image can be obtained based on the four vectors.
Step S004: establishing a first neural network, and training the first neural network according to the RGB image and the label data;
Further, a first neural network is trained; its structure consists of two parts, an image encoder and a fully-connected network.
The input of the image encoder is an RGB image. The encoder performs feature extraction to obtain a feature map; a flattening (Flatten) operation turns the feature map into a feature vector, which is input into the fully-connected network for fitting. The network outputs the x-coordinate one-dimensional distribution vector, y-coordinate one-dimensional distribution vector, x-label value distribution vector and y-label value distribution vector of each category of key points. Common network models such as ResNet and HRNet can be adopted for the encoder.
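A shape-level sketch of this pipeline is given below. The tensor sizes, the single dense layer standing in for the fully-connected network, and all variable names are illustrative assumptions (a real model would use a ResNet/HRNet encoder and a multi-layer head, and would produce the four vectors per key point category):

```python
import numpy as np

rng = np.random.default_rng(0)
w, h = 32, 24                            # illustrative image width/height

feat = rng.standard_normal((64, 4, 4))   # encoder feature map (C, H', W')
vec = feat.reshape(-1)                   # Flatten -> feature vector

# One dense layer standing in for the fully-connected network; the output
# concatenates the four vectors: x dist (w) + y dist (h) + x tag (w) + y tag (h).
W_fc = rng.standard_normal((2 * (w + h), vec.size)) * 0.01
out = W_fc @ vec

x_dist, y_dist = out[:w], out[w:w + h]
x_tag, y_tag = out[w + h:2 * w + h], out[2 * w + h:]
```

The point of the sketch is the bookkeeping: the fully-connected output dimension is 2(w + h) per category, split back into the four per-image vectors used as labels.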
The loss function is as follows:
$$Loss=\left\|X-\hat{X}\right\|_{2}^{2}+\left\|Y-\hat{Y}\right\|_{2}^{2}+\left\|T_{x}-\hat{T}_{x}\right\|_{2}^{2}+\left\|T_{y}-\hat{T}_{y}\right\|_{2}^{2}$$
where $X$ and $\hat{X}$ respectively represent the x-coordinate one-dimensional distribution vector label and the network-predicted x-coordinate one-dimensional distribution vector; $Y$ and $\hat{Y}$ respectively represent the y-coordinate one-dimensional distribution vector label and the network-predicted y-coordinate one-dimensional distribution vector; $T_{x}$ and $\hat{T}_{x}$ respectively represent the x-label value distribution vector label and the network-predicted x-label value distribution vector; $T_{y}$ and $\hat{T}_{y}$ respectively represent the y-label value distribution vector label and the network-predicted y-label value distribution vector.
Common optimization methods such as Adam or SGD can be adopted for the network; the implementer may choose freely.
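A plausible numpy sketch of this training loss follows, assuming a sum of squared errors over the four vectors (the patent's loss formula is an unreproduced image, so the exact form and the dict-based interface are assumptions):

```python
import numpy as np

def pose_loss(pred, target):
    # pred / target: dicts holding the four vectors described above,
    # keyed by which vector they are. The loss is the sum of squared
    # errors over all four pairs.
    keys = ("x_dist", "y_dist", "x_tag", "y_tag")
    return float(sum(np.sum((np.asarray(pred[k]) - np.asarray(target[k])) ** 2)
                     for k in keys))
```

A perfect prediction gives zero loss; each mismatched position contributes its squared difference.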
Step S005: obtaining predictive tag data using the first neural network; acquiring human body posture information of each person in the image according to the predicted tag data;
At this point the training of the first neural network is complete. Extraction and matching of human body postures are then performed based on the network's output; the post-processing method is as follows:
The x and y coordinates of suspected key points are acquired from the x-coordinate and y-coordinate one-dimensional distribution vectors using a threshold method; the empirical threshold is 0.85.
Because representing multi-person poses with coordinate vectors is ambiguous — the x and y coordinates of each key point are expressed in two independent one-dimensional vectors, so one obtained x coordinate may correspond to several y coordinates — a coordinate association degree is needed. It is calculated as the cosine similarity between one-dimensional Gaussian distribution sequences, of distribution length 5, centered on the candidate coordinates: the cosine similarity between the sequence around each suspected x coordinate and the sequence around each suspected y coordinate is obtained, and the larger the value, the better the two match.
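The association degree can be sketched as follows — a length-5 window is cut from each predicted distribution vector around a candidate coordinate and the two windows are compared by cosine similarity (window extraction, border handling, and names are illustrative):

```python
import numpy as np

def window(vec, idx, length=5):
    # Length-5 slice of a predicted 1-D distribution vector centred on a
    # suspected key-point coordinate, clipped at the vector borders.
    half = length // 2
    lo, hi = max(idx - half, 0), min(idx + half + 1, len(vec))
    return np.asarray(vec[lo:hi], dtype=float)

def association(x_vec, y_vec, xi, yi):
    # Cosine similarity of the Gaussian-shaped windows around the
    # candidate x coordinate and the candidate y coordinate.
    a, b = window(x_vec, xi), window(y_vec, yi)
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Two candidates whose surrounding Gaussian profiles match (same distribution size and standard deviation) score close to 1, which is what lets the x and y halves of one key point find each other.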
The x and y coordinates of the suspected key points are then matched. The matching method adopts KM (Kuhn-Munkres) matching to obtain the matching that maximizes the total cosine similarity; the matched x and y coordinate pairs are the key point coordinates of the persons in the image.
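The matching step can be illustrated with a brute-force stand-in that finds the same optimum as KM matching on small candidate sets (KM itself solves this assignment problem in polynomial time; the brute force here is for clarity only):

```python
import numpy as np
from itertools import permutations

def best_matching(sim):
    # sim[i][j]: cosine-similarity association between the i-th suspected
    # x coordinate and the j-th suspected y coordinate (square matrix).
    # Return the one-to-one assignment maximizing the total similarity.
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    best, best_perm = -np.inf, None
    for perm in permutations(range(n)):
        score = sum(sim[i, perm[i]] for i in range(n))
        if score > best:
            best, best_perm = score, perm
    return list(enumerate(best_perm))
```

For a production system the same similarity matrix would be fed to a proper Kuhn-Munkres implementation, since the permutation search grows factorially.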
Processing the coordinate vectors of all key points in this way yields the key point information of all persons in the image. The label value sum at each key point's position (namely the sum of its x label value and its y label value) is then obtained from the x-label value distribution vector and the y-label value distribution vector, and the key points are grouped by label value sum: for each key point, the group of key points whose label value sums are closest to its own is found, and the key points in one group are the human body posture information of one person in the image. One method of determining the closest group is as follows: take a head key point, obtain its label value sum, and for every other key point category select the key point whose label value sum has the minimum variance from it; these key points form one group.
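The head-anchored grouping can be sketched as follows (the data layout and function name are illustrative assumptions):

```python
import numpy as np

def group_by_tag(head_tags, other_tags):
    # head_tags: label value sum of each detected head key point (one per
    # person). other_tags: for every other key point category, the list of
    # label value sums of that category's candidates. For each head, pick
    # per category the candidate with minimum squared difference.
    groups = []
    for t in head_tags:
        group = [int(np.argmin((np.asarray(cat) - t) ** 2))
                 for cat in other_tags]
        groups.append(group)
    return groups
```

Because all key points of one person share (nearly) the same label value, nearest-label selection reassembles each person's skeleton from the per-category candidate pools.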
Therefore, the two-dimensional human body posture information of each person in the image can be obtained through the method.
Step S006: and establishing a second neural network according to the human body posture information of each person for identifying the fitness action state of the person.
The first neural network is used to acquire the posture information of each person in the image in real time; a temporal convolutional network is then adopted to recognize the fitness action state from the time-series human posture information. The training details are as follows:
The temporal convolutional network comprises a time-sequence encoder and a fully-connected network. The time-sequence encoder extracts time-series posture information: its input is the time-series posture information of each person and its output is a feature vector. The fully-connected network performs fitting and feature mapping: its input is the feature vector, and its final output is the fitness action state recognition of the person, using a Softmax classification function. The label data of the network is annotated manually and one-hot encoded before being input to the network. The neural network parameters are optimized with a cross-entropy loss function based on the Adam algorithm.
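The classification head described above can be sketched with a Softmax plus cross-entropy pair (Softmax is inferred here from the cross-entropy loss and one-hot labels; the patent's classification-function image is not reproduced):

```python
import numpy as np

def softmax(z):
    # Numerically stable Softmax: subtract the max before exponentiating.
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(probs, one_hot):
    # Cross-entropy between predicted class probabilities and a one-hot
    # label vector; the epsilon guards against log(0).
    return float(-np.sum(np.asarray(one_hot) * np.log(probs + 1e-12)))
```

With two equally scored classes the predicted probabilities are 0.5 each and the loss against either one-hot label is ln 2, the maximum-uncertainty baseline for this binary state classifier.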
The fitness action state recognition distinguishes two states: currently in a fitness action state and currently in a non-fitness action state. Data acquisition by the MEMS inertial sensor is then adjusted according to the classification result: in the non-fitness action state the MEMS-based data acquisition function is closed, and in the fitness action state it is opened.
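The sensor gating rule reduces to a single predicate; the state strings and function name below are illustrative only:

```python
def imu_acquisition_enabled(state):
    # MEMS-IMU data acquisition is on only while the classifier reports a
    # fitness action state; it is closed in the non-fitness state.
    return state == "exercising"
```

In a deployed system this flag would drive the actual start/stop calls of the MEMS inertial sensor driver.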
Therefore, the body-building action state of the person can be identified through the neural network.
Based on the same concept as the method embodiment, another embodiment of the present invention further provides a fitness action artificial intelligence recognition system. The system includes a memory, a processor, and a computer program stored in the memory and running on the processor; when executing the computer program, the processor implements the steps of the real-time monitoring method for fitness actions provided in any of the embodiments above. The method has been described in detail in those embodiments and is not repeated here.
It should be noted that: the precedence order of the above embodiments of the present invention is only for description, and does not represent the merits of the embodiments. And specific embodiments thereof have been described above. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A real-time monitoring method for fitness actions is characterized by comprising the following steps:
a camera is deployed to collect RGB images; labeling the RGB image to obtain a human body key point category labeled image; forming a data set by the RGB image and the human body key point category labeling image;
segmenting the human body parts of the RGB image using BodyPix 2.0 to obtain a human body part segmentation image; establishing a correspondence between the human body key points and human body parts, and acquiring the average area of the human body part segmentation corresponding to each human body key point category in the data set based on the correspondence; acquiring the Gaussian distribution size and the Gaussian distribution standard deviation of each key point in each key point category based on the average area of the human body part segmentation corresponding to each human body key point category;
acquiring an x-coordinate one-dimensional distribution vector and a y-coordinate one-dimensional distribution vector of the image based on the Gaussian distribution size and the Gaussian distribution standard deviation of each key point; obtaining the label value of each person in the image according to the positions of the key points in the human body key point category labeled image; acquiring an x-label value distribution vector and a y-label value distribution vector according to the x-coordinate and y-coordinate one-dimensional distribution vectors of the image and the label value of each person in the image; the x-coordinate one-dimensional distribution vector, the y-coordinate one-dimensional distribution vector, the x-label value distribution vector and the y-label value distribution vector of the image together form the label data;
establishing a first neural network, and training the first neural network according to the RGB image and the label data;
obtaining predictive tag data using the first neural network; acquiring human body posture information of each person in the image according to the predicted tag data;
and establishing a second neural network according to the human body posture information of each person for identifying the fitness action state of the person.
2. The real-time monitoring method for fitness actions according to claim 1, wherein the Gaussian distribution size of each key point is calculated by:
$$d_{i}^{l}=d_{0}\cdot\left\lceil \frac{s_{i}^{l}}{S^{l}}\right\rceil$$
wherein $d_{i}^{l}$ denotes the distribution size of the ith key point in the lth category of key points, $S^{l}$ denotes the average area of the human body part segmentation corresponding to the lth category of key points, $s_{i}^{l}$ denotes the segmented area of the human body part corresponding to the ith key point in the lth category of key points, $\lceil\cdot\rceil$ is the rounding-up function, and $d_{0}$ denotes the baseline Gaussian distribution size.
3. The real-time monitoring method for fitness actions according to claim 1, wherein the Gaussian distribution standard deviation of each key point is calculated by:
$$\sigma_{i}^{l}=\sigma_{0}\cdot\left\lceil \frac{s_{i}^{l}}{S^{l}}\right\rceil$$
wherein $\sigma_{i}^{l}$ denotes the Gaussian distribution standard deviation of the ith key point in the lth category of key points, $S^{l}$ denotes the average area of the human body part segmentation corresponding to the lth category of key points, $s_{i}^{l}$ denotes the segmented area of the human body part corresponding to the ith key point in the lth category of key points, $\lceil\cdot\rceil$ is the rounding-up function, and $\sigma_{0}$ denotes the baseline Gaussian distribution standard deviation.
4. A real-time monitoring method for body-building actions according to claim 1, wherein the x-coordinate one-dimensional distribution vector and the y-coordinate one-dimensional distribution vector are obtained by:
firstly, generating an x coordinate one-dimensional vector with the width of w and a y coordinate one-dimensional vector with the height of h;
for each one-dimensional vector, substituting the normalized coordinates corresponding to the key points into a one-dimensional Gaussian distribution function, and simultaneously substituting the neighborhood normalized coordinates obtained by the coordinates according to the distribution size of the key points to obtain a one-dimensional Gaussian distribution value of each key point;
the normalized coordinate of each key point is 0, and a neighborhood of each key point is obtained according to the distribution size of the key point, wherein the normalized coordinate of the neighborhood is a coordinate with the key point as an origin; finally, substituting the key point into a one-dimensional Gaussian distribution function, then obtaining a Gaussian distribution probability value, and then carrying out normalization to obtain the one-dimensional Gaussian distribution probability value of the key point and the corresponding distribution size; each key point has a corresponding distribution size and a one-dimensional Gaussian distribution standard deviation;
and then substituting and calculating all key points one by one, and adding the one-dimensional Gaussian distribution probability values calculated by all key points of each image to obtain an x-coordinate one-dimensional distribution vector and a y-coordinate one-dimensional distribution vector of each image.
5. A method for real-time monitoring of fitness activity according to claim 1, wherein the tag value of each person is calculated by:
$$T_{n}=\left(\frac{1}{K}\sum_{i=1}^{K}\frac{X_{i}^{n}}{w},\ \frac{1}{K}\sum_{i=1}^{K}\frac{Y_{i}^{n}}{h}\right)$$
wherein $n$ denotes the nth person in the image, $K$ denotes the number of key point categories, $X_{i}^{n}$ and $Y_{i}^{n}$ denote the x and y coordinates of the ith key point of the nth person, and $w$ and $h$ are the width and height of the image, respectively; the label value contains two parts, the front being the x label value and the back being the y label value.
6. The method for monitoring fitness activities in real time as claimed in claim 1, wherein the x-label value distribution vector and the y-label value distribution vector are obtained by:
firstly, acquiring one-dimensional Gaussian distribution probability values of all key points of a person, summing corresponding positions of the one-dimensional Gaussian distribution probability values of all the key points, averaging to obtain average one-dimensional Gaussian distribution probability, and multiplying the average one-dimensional Gaussian distribution probability by a label value to obtain a label distribution vector;
since the coordinates are expressed as separate x and y one-dimensional vectors, there are correspondingly an x-label value distribution vector and a y-label value distribution vector; all category key point label values of each person are the same value.
7. A method as claimed in claim 1, wherein the first neural network has a structure and loss function of:
the first neural network is structurally composed of an image encoder and a full-connection network;
the input of the image encoder is an RGB image which is used for feature extraction, then a feature map is obtained, the feature map is subjected to flattening operation to obtain a feature vector, the feature vector is input into a full-connection network for fitting, and an x coordinate one-dimensional distribution vector, a y coordinate one-dimensional distribution vector, an x label value distribution vector and a y label value distribution vector of each type of key point are output;
the loss function is as follows:
$$Loss=\left\|X-\hat{X}\right\|_{2}^{2}+\left\|Y-\hat{Y}\right\|_{2}^{2}+\left\|T_{x}-\hat{T}_{x}\right\|_{2}^{2}+\left\|T_{y}-\hat{T}_{y}\right\|_{2}^{2}$$
wherein $X$ and $\hat{X}$ respectively represent the x-coordinate one-dimensional distribution vector label and the network-predicted x-coordinate one-dimensional distribution vector; $Y$ and $\hat{Y}$ respectively represent the y-coordinate one-dimensional distribution vector label and the network-predicted y-coordinate one-dimensional distribution vector; $T_{x}$ and $\hat{T}_{x}$ respectively represent the x-label value distribution vector label and the network-predicted x-label value distribution vector; $T_{y}$ and $\hat{T}_{y}$ respectively represent the y-label value distribution vector label and the network-predicted y-label value distribution vector.
8. A method for real-time monitoring of fitness activity according to claim 1, wherein the method for obtaining the body posture information of each person in the image according to the predictive tag data comprises:
acquiring suspected key point x and y coordinates in an x coordinate one-dimensional distribution vector and a y coordinate one-dimensional distribution vector predicted by a first neural network by using a threshold method;
acquiring the association degree between the x coordinate and the y coordinate of each suspected key point, the association degree being calculated as the cosine similarity between the one-dimensional Gaussian distribution sequences of corresponding distribution length centered on the x coordinate and on the y coordinate;
and then matching the x and y coordinates of the suspected key points, wherein the matching method adopts KM matching to obtain the optimal maximum matching of cosine similarity so as to obtain matched x and y coordinates, and the matched coordinates are the coordinates of the key points of the personnel in the image.
9. The method for monitoring the fitness activity in real time as claimed in claim 1, wherein the structure and the training method of the second neural network are as follows:
the second neural network is a temporal convolutional network comprising a time-sequence encoder and a fully-connected network, wherein the time-sequence encoder extracts time-series posture information, taking the time-series posture information of each person as input and outputting a feature vector; the fully-connected network performs fitting and feature mapping, taking the feature vector as input and finally outputting the fitness action state recognition of the person, using a Softmax classification function;
the label data of the network is annotated manually, and the fitness action states to be recognized comprise: currently in a fitness action state and currently in a non-fitness action state; the neural network parameters are optimized with a cross-entropy loss function based on the Adam algorithm.
10. A fitness activity artificial intelligence recognition system comprising a processor and a memory, wherein the processor is configured to execute a fitness activity real-time monitoring method according to any one of claims 1-9 stored in the memory.
CN202210757131.9A 2022-06-30 2022-06-30 Real-time monitoring method for body-building action and artificial intelligence recognition system Active CN114821819B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210757131.9A CN114821819B (en) 2022-06-30 2022-06-30 Real-time monitoring method for body-building action and artificial intelligence recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210757131.9A CN114821819B (en) 2022-06-30 2022-06-30 Real-time monitoring method for body-building action and artificial intelligence recognition system

Publications (2)

Publication Number Publication Date
CN114821819A true CN114821819A (en) 2022-07-29
CN114821819B CN114821819B (en) 2022-09-23

Family

ID=82523293

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210757131.9A Active CN114821819B (en) 2022-06-30 2022-06-30 Real-time monitoring method for body-building action and artificial intelligence recognition system

Country Status (1)

Country Link
CN (1) CN114821819B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8837839B1 (en) * 2010-11-03 2014-09-16 Hrl Laboratories, Llc Method for recognition and pose estimation of multiple occurrences of multiple objects in visual images
CN108830150A (en) * 2018-05-07 2018-11-16 山东师范大学 One kind being based on 3 D human body Attitude estimation method and device
CN113673354A (en) * 2021-07-23 2021-11-19 湖南大学 Human body key point detection method based on context information and combined embedding
CN114187665A (en) * 2021-12-20 2022-03-15 长讯通信服务有限公司 Multi-person gait recognition method based on human body skeleton heat map


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Duan Junchen et al.: "Human posture recognition based on human skeleton point detection and multilayer perceptron", Electronic Measurement Technology *

Also Published As

Publication number Publication date
CN114821819B (en) 2022-09-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant