CN112464915A - Push-up counting method based on human body bone point detection

Info

Publication number: CN112464915A
Application number: CN202011606081.1A
Authority: CN
Inventors: 吴柯维; 叶佳林
Original assignee: Nanjing Jitu Network Technology Co ltd
Current assignee: Nanjing Jitu Network Technology Co ltd
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2021-03-09
Anticipated expiration: 2040-12-30
Also published as: CN112464915B

Abstract

The invention relates to a push-up counting method based on human body bone point detection, comprising: obtaining human body push-up images, marking human body bone point position information, and training to obtain a human body bone point detection network model; capturing images of the tested person with a camera and inputting them, in shooting order, into the trained human body bone point detection network model; acquiring the human body bone point information in each captured picture; judging, according to preset bone point position information, whether the detected human skeleton gives a straightened state signal or a bent state signal; and, if the tested person produces consecutive straightened and bent signals, judging that one push-up has been done and incrementing the count by one. Push-up data acquisition requires only one camera, so the implementation is simple, feasible and easy to popularize; and because counting is done by an image method, the manpower required is reduced.

Description

Push-up counting method based on human body bone point detection

Technical Field

The invention relates to the technical field of image recognition, in particular to a push-up counting method based on human body bone point detection.

Background

The push-up is a common fitness exercise. It mainly works the triceps brachii, together with the deltoid, the serratus anterior, the coracobrachialis and other muscles, and mainly serves to improve the muscle strength of the upper limbs, chest, waist, back and abdomen; it is a simple, easy and effective means of strength training. In a correct push-up starting position, the body is kept straight from the shoulders to the ankles and the arms support the body at chest level with the hands slightly wider than the shoulders, which ensures more effective biceps training in each repetition.

Prior art application No. CN201610303349, entitled "A counting method for push-up test", discloses a method in which a first identification area and a second identification area bearing specific textures are arranged in a test area. During the test, a camera mounted obliquely above the test area shoots continuously at a certain frame rate to obtain test images that include the tester, the first texture and the second texture. Over the test images from the initial frame onward, the method judges whether the minimum pixel value of the first texture, as it changes from decreasing to increasing, is smaller than a low threshold, and then whether the maximum pixel value of the first texture, as it changes from increasing to decreasing, is larger than a high threshold; if so, it judges whether the pixel values of the second texture stay above a kneeling-posture threshold during the judging period, and if so, it counts once. The pixel value comparison used by this method has a relatively high error rate, and the method places relatively high requirements on the acquisition angle of the camera.

Prior art application No. CN202010081035.8, entitled "A push-up test method and a device thereof", discloses rule judgment and performance measurement for push-up tests using a camera and machine vision technology. By introducing a camera and machine vision into the push-up test method and device, it overcomes the inability of traditional push-up test technology to test intelligently, thereby improving the efficiency of the push-up test and the accuracy of the performance measurement. However, this counting method also places relatively high requirements on picture acquisition.

Traditional push-up counting devices mainly use one of two counting modes: press-based counting and wearable-device-based counting. Press-based counting judges each repetition through pressing; it requires complicated on-site construction, the equipment cannot be installed outdoors, and it is strongly affected by rain and snow, which may damage the pressing equipment or sensors. Wearable-device-based counting requires the user to wear a device, so the user experience is poor.

Disclosure of Invention

1. The technical problem to be solved is as follows:

In view of the above technical problems, the invention provides a push-up counting method based on human body bone point detection.

2. The technical scheme is as follows:

A push-up recognition method based on human body recognition, characterized in that the method comprises the following steps:

Step one: acquiring pictures containing human body push-up images, and extracting and marking human body bone point location information from the push-up image data to obtain a human body bone point location data set; inputting the marked push-up pictures and the marked ground-truth data into a human body bone point detection network for training to obtain a human body bone point detection network model, wherein the human body bone point detection network is based on multi-scale fusion.

Step two: shooting the image of the human body of the tested person by a camera, and inputting the obtained image of the human body into the trained human body skeleton point detection network model according to the shooting sequence; acquiring human skeleton point information in the shot picture;

Step three: judging the state signal of the detected human skeleton according to preset bone point position information. The obtained human bone point information comprises the positions of the right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, right hip, right knee, right ankle, left hip, left knee and left ankle points. If the break angle of the polyline connecting the right shoulder, right elbow and right wrist points of the tested person is greater than 150 degrees, the break angle of the polyline connecting the left shoulder, left elbow and left wrist points is greater than 150 degrees, the break angle of the polyline connecting the right hip, right knee and right ankle points is greater than 170 degrees, and the break angle of the polyline connecting the left hip, left knee and left ankle points is greater than 170 degrees, the human body is judged to give a straightened state signal. If the break angle of the polyline connecting the right shoulder, right elbow and right wrist points is less than 30 degrees, the break angle of the polyline connecting the left shoulder, left elbow and left wrist points is less than 30 degrees, the break angle of the polyline connecting the right hip, right knee and right ankle points is greater than 170 degrees, and the break angle of the polyline connecting the left hip, left knee and left ankle points is greater than 170 degrees, the human body is judged to give a bent state signal.

Step four: if the tested person produces consecutive straightened and bent signals, it is judged that one push-up has been done, and the count is incremented by one.
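The straight/bent test of step three can be illustrated with a minimal Python sketch. It assumes each bone point is a 2-D image coordinate `(x, y)` and that the "break angle" is the interior angle at the middle point of each three-point polyline. The function names `break_angle` and `classify_pose` and the dictionary key names are hypothetical; only the 150, 30 and 170 degree thresholds come from the text.

```python
import math

def break_angle(a, b, c):
    """Interior angle in degrees at vertex b of the polyline a-b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("degenerate polyline: coincident points")
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding error
    return math.degrees(math.acos(cos_t))

def classify_pose(kp):
    """kp maps hypothetical names such as 'right_elbow' to (x, y).

    Returns 'straight', 'bent', or None when neither preset condition holds.
    """
    arms = [break_angle(kp[s + "_shoulder"], kp[s + "_elbow"], kp[s + "_wrist"])
            for s in ("right", "left")]
    legs = [break_angle(kp[s + "_hip"], kp[s + "_knee"], kp[s + "_ankle"])
            for s in ("right", "left")]
    if not all(angle > 170 for angle in legs):
        return None  # both conditions require straight legs
    if all(angle > 150 for angle in arms):
        return "straight"
    if all(angle < 30 for angle in arms):
        return "bent"
    return None
```

Frames in which the arms are between the two thresholds yield `None`; how such frames should be treated is not stated in the text, so ignoring them is an assumption of this sketch.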

Further, acquiring the human body push-up image data in step one specifically comprises: crawling push-up videos and pictures from the Internet; splitting each obtained video into frame images; mixing the frame images with the originally crawled pictures; marking the person in each image and cropping out the marked person; applying uniform normalization; and then marking the bone point locations.

Further, the human body skeleton point detection network comprises a backbone network, a feature fusion network and an output network.

Further, the human bone point detection network uses resnet101 as its backbone network, wherein resnet101 is composed of 5 convolution modules. The first convolution module of resnet101 consists of three repeated convolution groups, each containing a 1 × 1 convolution layer with 64 kernels, a 3 × 3 convolution layer with 64 kernels and a 1 × 1 convolution layer with 256 kernels; the second convolution module consists of four repeated convolution groups, each containing a 1 × 1 convolution layer with 128 kernels, a 3 × 3 convolution layer with 128 kernels and a 1 × 1 convolution layer with 512 kernels; the third convolution module consists of twenty-three repeated convolution groups, each containing a 1 × 1 convolution layer with 256 kernels, a 3 × 3 convolution layer with 256 kernels and a 1 × 1 convolution layer with 1024 kernels; the fourth convolution module consists of three repeated convolution groups, each containing a 1 × 1 convolution layer with 512 kernels, a 3 × 3 convolution layer with 512 kernels and a 1 × 1 convolution layer with 2048 kernels.
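As a quick check of the module description, the four listed modules can be written down as a table and their convolution layers counted. This is a sketch only: the name `STAGES` and the helper functions are hypothetical. In the standard ResNet-101 architecture, an initial 7 × 7 convolution and a final fully connected layer account for the remaining two of the 101 layers; whether the fifth module mentioned in the text corresponds to that initial convolution is an assumption.

```python
# (repeats, [(kernel_size, kernels), ...]) for each module listed in the text
STAGES = [
    (3,  [(1, 64),  (3, 64),  (1, 256)]),
    (4,  [(1, 128), (3, 128), (1, 512)]),
    (23, [(1, 256), (3, 256), (1, 1024)]),
    (3,  [(1, 512), (3, 512), (1, 2048)]),
]

def conv_layers_in_stages(stages):
    """Total number of convolution layers across all repeated groups."""
    return sum(repeats * len(group) for repeats, group in stages)

def output_channels(stages):
    """Channel count produced by the last layer of each module."""
    return [group[-1][1] for _, group in stages]

if __name__ == "__main__":
    print(conv_layers_in_stages(STAGES))  # 3*3 + 4*3 + 23*3 + 3*3
    print(output_channels(STAGES))
```

The four modules contribute 99 convolution layers, which together with the standard first convolution and the fully connected layer gives the 101 layers the network is named for.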

Furthermore, the feature fusion network of the human bone point detection network is a high-low layer fusion structure that fuses the feature maps at the three-times, four-times and five-times down-sampling scales. For the three-times down-sampling layer, a 3 × 3 convolution with a stride of 2 is first applied to perform down-sampling, and a 1 × 1 convolution layer then doubles the number of channels, so that its channel number and dimensions are the same as those of the four-times down-sampling layer. For the five-times down-sampling layer, a 1 × 1 convolution reduces the number of channels to half of the original, and up-sampling then doubles the size of the feature layer. The feature layers obtained at the three scales are concatenated, and a 3 × 3 convolution is applied to the concatenated feature layers to generate feature layer A. A 1 × 1 convolution, a 3 × 3 convolution and a 1 × 1 convolution are applied to feature layer A to generate feature layer B; a 1 × 1 convolution and 2 fully connected layers are applied to feature layer A to generate feature layer C; and A and C are multiplied to obtain heat map D.
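The shape bookkeeping behind the three-scale fusion can be traced with a small pure-Python sketch. Several points are assumptions rather than statements from the text: "n-times down-sampling" is read as n successive halvings of the input size, the channel counts 512, 1024 and 2048 are those of the standard ResNet-101 stages at those scales, the 3 × 3 stride-2 convolution uses a padding of 1, and the 256 × 256 input size is purely illustrative. The names `conv_out` and `fuse_shapes` are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def fuse_shapes(input_hw=(256, 256)):
    """Trace (channels, height, width) through the three fusion branches."""
    h, w = input_hw

    def level(n, channels):
        # n-times down-sampling: n successive halvings (assumption)
        return (channels, h >> n, w >> n)

    c3, c4, c5 = level(3, 512), level(4, 1024), level(5, 2048)

    # three-times branch: 3x3 stride-2 convolution, then 1x1 convolution doubling channels
    b3 = (c3[0] * 2, conv_out(c3[1], 3, 2, 1), conv_out(c3[2], 3, 2, 1))
    # four-times branch: used as is
    b4 = c4
    # five-times branch: 1x1 convolution halving channels, then 2x up-sampling
    b5 = (c5[0] // 2, c5[1] * 2, c5[2] * 2)

    if not (b3[1:] == b4[1:] == b5[1:]):
        raise ValueError("branches do not share a spatial size")
    concat = (b3[0] + b4[0] + b5[0], b4[1], b4[2])
    return {"b3": b3, "b4": b4, "b5": b5, "concat": concat}

if __name__ == "__main__":
    for name, shape in fuse_shapes().items():
        print(name, shape)
```

Under these assumptions all three branches arrive at 1024 channels on the same grid, so they can be concatenated channel-wise before the 3 × 3 convolution that produces feature layer A. The output channel count of that convolution is not given in the text and is left out here.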

Further, the output layer of the human bone point detection network outputs the key point coordinates through heat map D and outputs the key points through coordinate classification; the coordinate prediction loss function uses MSE loss, and the classification uses cross-entropy loss.
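For readers unfamiliar with the two losses and with reading a key point from a heat map, the following is a minimal pure-Python sketch. The text does not specify how the output layer pairs its outputs with the losses in detail, so the decoding by maximum response and the function names `decode_heatmap`, `mse_loss` and `cross_entropy_loss` are assumptions made for illustration.

```python
import math

def decode_heatmap(heatmap):
    """Return (x, y) of the strongest response in a 2-D heat map (list of rows)."""
    cells = ((value, x, y)
             for y, row in enumerate(heatmap)
             for x, value in enumerate(row))
    _, x, y = max(cells, key=lambda cell: cell[0])
    return x, y

def mse_loss(pred, target):
    """Mean squared error between two equal-length sequences of numbers."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def cross_entropy_loss(logits, label):
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]
```

In a training setup of this kind, `mse_loss` would compare predicted and ground-truth coordinates (or heat map values), while `cross_entropy_loss` would score the predicted key point class against its label.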

3. Beneficial effects:

(1) the push-up data acquisition of the testee can be realized only by one camera, and the implementation mode is simple and feasible and is easy to popularize.

(2) Push-up counting is realized by an image method, which reduces the manpower required.

(3) According to the invention, the multi-scale feature layers are fused and then feature selection is carried out, so that the network output layer information not only has deep semantic information, but also has low-level position information.

(4) The invention applies an attention mechanism to the multi-scale fusion layer, so that the human bone point positions learned by the network are more accurate.

(5) In the invention, the human bone point detection network model is trained using a multilayer convolutional neural network model from deep learning, so that the detection is more accurate.

Drawings

FIG. 1 is a schematic diagram of the present invention for collecting the picture of the tested person;

FIG. 2 is a diagram of human bone points in accordance with the present invention;

FIG. 3 is a flow chart of a detection network model for training human skeletal points in the present invention;

FIG. 4 is a schematic view of a human skeletal point detection network attention mechanism according to the present invention;

FIG. 5 is a schematic structural diagram of a feature fusion network of the human skeletal point detection network according to the present invention;

FIG. 6 is a flow chart of the present invention.

Detailed Description

The present invention will be described in detail with reference to the accompanying drawings.

As shown in FIGS. 1 to 5, the invention provides a push-up recognition method based on human body recognition.

A specific embodiment is as follows:

As shown in fig. 1, acquiring images of the tested person requires an imaging unit that can cover the tested person's position. Fig. 1 is a schematic diagram of one optional implementation, in which a camera is installed facing obliquely downward and an image set of the tested person is acquired through the camera.

As shown in fig. 2, the invention first needs to establish a human bone point detection network model which, given a picture containing a human body, can quickly and accurately output the human bone points in the picture. The human bone point map of the invention records 18 points, where 0 represents the nose, 1 the neck, 2 the right shoulder, 3 the right elbow, 4 the right wrist, 5 the left shoulder, 6 the left elbow, 7 the left wrist, 8 the right hip, 9 the right knee, 10 the right ankle, 11 the left hip, 12 the left knee, 13 the left ankle, 14 the right eye, 15 the left eye, 16 the right ear, and 17 the left ear.
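The 18-point numbering can be captured as a small lookup table, which makes it easy to pull out the shoulder, elbow and wrist or hip, knee and ankle triples used by the straight/bent test. This is a sketch under assumptions: the names `KEYPOINT_NAMES`, `LIMB_TRIPLES` and `limb_points` are hypothetical, and the network output is assumed to be a list of 18 `(x, y)` pairs in index order.

```python
# Index -> name, in the order given by the bone point map
KEYPOINT_NAMES = [
    "nose", "neck",
    "right_shoulder", "right_elbow", "right_wrist",
    "left_shoulder", "left_elbow", "left_wrist",
    "right_hip", "right_knee", "right_ankle",
    "left_hip", "left_knee", "left_ankle",
    "right_eye", "left_eye", "right_ear", "left_ear",
]

# Index triples whose break angles are tested
LIMB_TRIPLES = {
    "right_arm": (2, 3, 4),
    "left_arm": (5, 6, 7),
    "right_leg": (8, 9, 10),
    "left_leg": (11, 12, 13),
}

def limb_points(keypoints, limb):
    """Return the three (x, y) points of a limb from an 18-point list."""
    if len(keypoints) != len(KEYPOINT_NAMES):
        raise ValueError("expected 18 key points")
    return tuple(keypoints[i] for i in LIMB_TRIPLES[limb])
```

For example, `limb_points(keypoints, "right_arm")` returns the right shoulder, right elbow and right wrist coordinates in that order.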

As shown in fig. 3, to obtain the human bone point detection network model of the invention, pictures containing human body push-up images are acquired, and human bone point location information is extracted and marked from the push-up image data to obtain a human bone point data set; the marked push-up pictures and the marked ground-truth data are then input into the human bone point detection network for training to obtain the human bone point detection network model. When a person performs push-ups, a straightened state signal is issued when the break angle formed by connecting points 2, 3 and 4 in sequence is greater than 150 degrees, the break angle formed by connecting points 5, 6 and 7 in sequence is greater than 150 degrees, the break angle formed by connecting points 8, 9 and 10 in sequence is greater than 170 degrees, and the break angle formed by connecting points 11, 12 and 13 in sequence is greater than 170 degrees. A bent state signal is issued when the break angle formed by connecting points 2, 3 and 4 in sequence is less than 30 degrees, the break angle formed by connecting points 5, 6 and 7 in sequence is less than 30 degrees, the break angle formed by connecting points 8, 9 and 10 in sequence is greater than 170 degrees, and the break angle formed by connecting points 11, 12 and 13 in sequence is greater than 170 degrees. When consecutive straightened and bent signals are identified, the push-up count is increased by 1.
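The counting rule can be sketched as a two-state machine over the per-frame signals. The sketch assumes each frame has already been reduced to `"straight"`, `"bent"` or `None` (neither condition met), that `None` frames are simply skipped, and that repeated identical signals across adjacent frames belong to the same repetition. These choices, and the name `count_push_ups`, are illustrative and not taken from the text.

```python
def count_push_ups(states):
    """Count one push-up each time a 'straight' signal is followed by a 'bent' signal."""
    count = 0
    armed = False  # True once a straightened signal has been seen
    for state in states:
        if state == "straight":
            armed = True
        elif state == "bent" and armed:
            count += 1
            armed = False  # wait for the next straightened signal
    return count

if __name__ == "__main__":
    frames = ["straight", None, "bent", "bent", None, "straight", "bent"]
    print(count_push_ups(frames))
```

Requiring a new straightened signal before the next count prevents a long run of bent frames from being counted more than once.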

Fig. 4 is a schematic structural diagram of the attention mechanism of the human bone point detection network, namely a structural diagram of resnet101. As shown, resnet101 consists of 5 convolution modules. The first convolution module of resnet101 consists of three repeated convolution groups, each containing a 1 × 1 convolution layer with 64 kernels, a 3 × 3 convolution layer with 64 kernels and a 1 × 1 convolution layer with 256 kernels; the second convolution module consists of four repeated convolution groups, each containing a 1 × 1 convolution layer with 128 kernels, a 3 × 3 convolution layer with 128 kernels and a 1 × 1 convolution layer with 512 kernels; the third convolution module consists of twenty-three repeated convolution groups, each containing a 1 × 1 convolution layer with 256 kernels, a 3 × 3 convolution layer with 256 kernels and a 1 × 1 convolution layer with 1024 kernels; the fourth convolution module consists of three repeated convolution groups, each containing a 1 × 1 convolution layer with 512 kernels, a 3 × 3 convolution layer with 512 kernels and a 1 × 1 convolution layer with 2048 kernels.

FIG. 5 is a schematic structural diagram of the feature fusion network of the human bone point detection network according to the present invention. The feature fusion network is a high-low layer fusion structure that fuses the feature maps at the three-times, four-times and five-times down-sampling scales. For the three-times down-sampling layer, a 3 × 3 convolution with a stride of 2 is first applied to perform down-sampling, and a 1 × 1 convolution layer then doubles the number of channels, so that its channel number and dimensions are the same as those of the four-times down-sampling layer. For the five-times down-sampling layer, a 1 × 1 convolution reduces the number of channels to half of the original, and up-sampling then doubles the size of the feature layer. The feature layers obtained at the three scales are concatenated, and a 3 × 3 convolution is applied to the concatenated feature layers to generate feature layer A. A 1 × 1 convolution, a 3 × 3 convolution and a 1 × 1 convolution are applied to feature layer A to generate feature layer B; a 1 × 1 convolution and 2 fully connected layers are applied to feature layer A to generate feature layer C; and A and C are multiplied to obtain heat map D.

Although the present invention has been described with reference to the preferred embodiments, it should be understood that various changes and modifications can be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A push-up recognition method based on human body recognition, characterized in that the method comprises the following steps:

step one: acquiring pictures containing human body push-up images, and extracting and marking human body bone point location information from the push-up image data to obtain a human body bone point location data set; inputting the marked push-up pictures and the marked ground-truth data into a human body bone point detection network for training to obtain a human body bone point detection network model, wherein the human body bone point detection network is based on multi-scale fusion;

step three: judging the state signal of the detected human skeleton according to preset bone point position information; the obtained human bone point information comprises the positions of the right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, right hip, right knee, right ankle, left hip, left knee and left ankle points; if the break angle of the polyline connecting the right shoulder, right elbow and right wrist points of the tested person is greater than 150 degrees, the break angle of the polyline connecting the left shoulder, left elbow and left wrist points is greater than 150 degrees, the break angle of the polyline connecting the right hip, right knee and right ankle points is greater than 170 degrees, and the break angle of the polyline connecting the left hip, left knee and left ankle points is greater than 170 degrees, the human body is judged to give a straightened state signal; if the break angle of the polyline connecting the right shoulder, right elbow and right wrist points is less than 30 degrees, the break angle of the polyline connecting the left shoulder, left elbow and left wrist points is less than 30 degrees, the break angle of the polyline connecting the right hip, right knee and right ankle points is greater than 170 degrees, and the break angle of the polyline connecting the left hip, left knee and left ankle points is greater than 170 degrees, the human body is judged to give a bent state signal;

2. The push-up recognition method based on human body recognition according to claim 1, characterized in that: acquiring the human body push-up image data in step one specifically comprises crawling push-up videos and pictures from the Internet; splitting each obtained video into frame images; mixing the frame images with the originally crawled pictures; marking the person in each image and cropping out the marked person; applying uniform normalization; and then marking the bone point locations.

3. The push-up recognition method based on human body recognition according to claim 1, characterized in that: the human body skeleton point detection network comprises a backbone network, a feature fusion network and an output network.

4. The push-up recognition method based on human body recognition according to claim 3, characterized in that: the human bone point detection network uses resnet101 as its backbone network, wherein resnet101 is composed of 5 convolution modules; the first convolution module of resnet101 consists of three repeated convolution groups, each containing a 1 × 1 convolution layer with 64 kernels, a 3 × 3 convolution layer with 64 kernels and a 1 × 1 convolution layer with 256 kernels; the second convolution module consists of four repeated convolution groups, each containing a 1 × 1 convolution layer with 128 kernels, a 3 × 3 convolution layer with 128 kernels and a 1 × 1 convolution layer with 512 kernels; the third convolution module consists of twenty-three repeated convolution groups, each containing a 1 × 1 convolution layer with 256 kernels, a 3 × 3 convolution layer with 256 kernels and a 1 × 1 convolution layer with 1024 kernels; the fourth convolution module consists of three repeated convolution groups, each containing a 1 × 1 convolution layer with 512 kernels, a 3 × 3 convolution layer with 512 kernels and a 1 × 1 convolution layer with 2048 kernels.

5. The push-up recognition method based on human body recognition according to claim 3, characterized in that: the feature fusion network of the human bone point detection network is a high-low layer fusion structure that fuses the feature maps at the three-times, four-times and five-times down-sampling scales; for the three-times down-sampling layer, a 3 × 3 convolution with a stride of 2 is first applied to perform down-sampling, and a 1 × 1 convolution layer then doubles the number of channels, so that its channel number and dimensions are the same as those of the four-times down-sampling layer; for the five-times down-sampling layer, a 1 × 1 convolution reduces the number of channels to half of the original, and up-sampling then doubles the size of the feature layer; the feature layers obtained at the three scales are concatenated, and a 3 × 3 convolution is applied to the concatenated feature layers to generate feature layer A; a 1 × 1 convolution, a 3 × 3 convolution and a 1 × 1 convolution are applied to feature layer A to generate feature layer B; a 1 × 1 convolution and 2 fully connected layers are applied to feature layer A to generate feature layer C; and A and C are multiplied to obtain heat map D.

6. The push-up recognition method based on human body recognition according to claim 5, characterized in that: the output layer of the human bone point detection network outputs the key point coordinates through heat map D and outputs the key points through coordinate classification; the coordinate prediction loss function uses MSE loss, and the classification uses cross-entropy loss.