CN112464915B - Push-up counting method based on human skeleton point detection - Google Patents


Info

Publication number
CN112464915B
CN112464915B (Application No. CN202011606081.1A)
Authority
CN
China
Prior art keywords
point
convolution
push
human body
human
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011606081.1A
Other languages
Chinese (zh)
Other versions
CN112464915A (en)
Inventor
吴柯维
叶佳林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Jitu Network Technology Co ltd
Original Assignee
Nanjing Jitu Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Jitu Network Technology Co ltd filed Critical Nanjing Jitu Network Technology Co ltd
Priority to CN202011606081.1A priority Critical patent/CN112464915B/en
Publication of CN112464915A publication Critical patent/CN112464915A/en
Application granted granted Critical
Publication of CN112464915B publication Critical patent/CN112464915B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06M COUNTING MECHANISMS; COUNTING OF OBJECTS NOT OTHERWISE PROVIDED FOR
    • G06M1/00 Design features of general application
    • G06M1/27 Design features of general application for representing the result of count in the form of electric signals, e.g. by sensing markings on the counter drum
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a push-up counting method based on human skeleton point detection, which comprises the steps of: obtaining human push-up images, labeling human skeleton point information, and training a human skeleton point detection network model; capturing images of a tested person's body with a camera and inputting the acquired images into the trained human skeleton point detection network model in shooting order; obtaining the human skeleton point information in each captured picture; judging, from the position information of preset skeleton points, whether the detected human skeleton is in a straightened state or a bent state; and, if consecutive straightened-state and bent-state signals appear for the tested person, counting one push-up. The invention can acquire push-up data with only one camera, so the implementation is simple, feasible and easy to popularize; and push-up counting is realized by an image-based method, reducing the need for manual supervision.

Description

Push-up counting method based on human skeleton point detection
Technical Field
The invention relates to the technical field of image recognition, in particular to a push-up counting method based on human skeleton point detection.
Background
The push-up is a common body-building exercise. It mainly works the triceps brachii, and at the same time exercises the anterior deltoid, the serratus anterior, the coracobrachialis and other parts of the body. Its main effect is to improve the muscle strength of the upper limbs, chest, waist, back and abdomen, and it is a simple but very effective strength-training method. To achieve a correct starting position for push-ups, the body must be kept in a straight line from the shoulders to the ankles, the arms should be extended to support the body at chest level, and the hands should be placed slightly wider than the shoulders, which ensures that each repetition exercises the triceps brachii more effectively.
A prior-art application, application number CN201610303349, entitled "A counting method for a push-up test", arranges a first recognition area and a second recognition area bearing specific textures in the test area. During the test, a camera installed obliquely above the test area shoots continuously at a certain frame rate to obtain test images containing the tester, the first texture and the second texture. In each frame of test image from the initial frame onward, it is judged whether the minimum pixel value of the first texture falls below a low threshold while changing from decreasing to increasing and whether the maximum pixel value of the first texture then exceeds a high threshold while changing from increasing to decreasing; if so, it is further judged whether the pixel value of the second texture exceeds a kneeling threshold during the judging period, and if so, one count is made. The pixel-value comparison adopted by this method has a relatively high error rate and places high demands on the camera's acquisition angle.
Another prior-art application, application number CN202010081035.8, entitled "Push-up test method and device", uses a camera and machine vision technology to judge compliance and measure scores in push-up tests, breaking through the limitation that conventional push-up test techniques cannot perform intelligent testing. However, its counting also places relatively high requirements on image acquisition.
Traditional push-up counting devices mainly fall into two categories: press-type counters and counters based on wearable equipment. The press-type counter registers a count when it is pressed; its on-site installation is complex, the equipment cannot be installed in the open air, it is strongly affected by rain and snow, and the pressing device or sensor is easily damaged. The wearable-device-based method requires the user to wear a device, which gives a poor user experience.
Disclosure of Invention
1. The technical problems to be solved are as follows:
Aiming at the above technical problems, the invention provides a push-up counting method based on human skeleton point detection. A human skeleton point detection network model with an attention mechanism is trained, the skeleton points of the tested person are obtained from captured push-up pictures, and the posture of the human body is judged from the skeleton point positions, thereby realizing push-up counting.
2. The technical scheme is as follows:
a push-up recognition method based on human body recognition is characterized in that: the method comprises the following steps:
step one: acquiring a picture containing a human body push-up image, and extracting and marking human body bone point position information from the human body push-up image data to obtain a human body bone point position data set; inputting the marked push-up picture and the true value data marked by the mark into a human skeleton point detection network for training to obtain a human skeleton point detection network model, wherein the human skeleton point detection network is a human skeleton point detection network based on multi-scale fusion.
Step two: shooting images of the human body of a tested person through a camera, and inputting the acquired images of the human body into a trained human body skeleton point detection network model according to shooting sequence; acquiring human skeleton point information in the shot picture;
step three: judging the signal of the detected human skeleton according to the position information of the preset skeleton points; the obtained human skeleton point information comprises positions of a right shoulder point, a right elbow point, a right wrist point, a left shoulder point, a left elbow point, a left wrist point, a right hip point, a right knee point, a right ankle point, a left hip point, a left knee point and a left ankle point; the position information of the preset skeleton point is that the folding angle of a folding line formed by connecting a right shoulder point, a right elbow point and a right wrist point of a tested person is larger than 150 degrees, the folding angle of a folding line formed by connecting a left shoulder point, a left elbow point and a left wrist point is larger than 150 degrees, the folding angle of a folding line formed by connecting a right hip point, a right knee point and a right ankle point is larger than 170 degrees, and the folding angle of a folding line formed by connecting a left hip point, a left knee point and a left ankle point is larger than 170 degrees, so that a human body is judged to be in a straightened state signal; the method comprises the steps that the folding angle of a folding line formed by connecting a right shoulder point, a right elbow point and a right wrist point of a tested person is preset to be smaller than 30 degrees, the folding angle of a folding line formed by connecting a left shoulder point, a left elbow point and a left wrist point is preset to be smaller than 30 degrees, the folding angle of a folding line formed by connecting a right hip point, a right knee point and a right ankle point is preset to be larger than 170 degrees, and the folding angle of a folding line formed by connecting a left hip point, a left knee point and a left ankle point is preset to be larger than 170 degrees, so that a human body is judged to be in a straightened state signal.
Step four: if a continuous straightening and bending signal appears on the human body of the tested person, the person is judged to perform push-up counting.
Further, in step one, the human push-up image data is acquired as follows: videos and pictures of people doing push-ups are collected from the Internet; each frame of the collected videos is cut out as an image; the cut-out frames are mixed with the pictures originally obtained by the web crawler, the persons in the images are annotated, the annotated persons are cropped out and uniformly normalized, and the skeleton point positions are labeled.
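A minimal sketch of this preparation step, assuming OpenCV is used for frame extraction and that the person boxes come from the manual annotation; the 256×256 target size is an illustrative choice, not a value given in the patent.

```python
import cv2

def extract_frames(video_path):
    """Yield every frame of the video as an image, for mixing with the crawled pictures."""
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        yield frame
    cap.release()

def normalize_person_crop(image, box, size=(256, 256)):
    """Crop the annotated person box (x, y, w, h) and resize it to a uniform input size."""
    x, y, w, h = box
    crop = image[y:y + h, x:x + w]
    return cv2.resize(crop, size)
```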
Further, the human skeleton point detection network comprises a backbone network, a feature fusion network and an output network.
Further, the backbone of the human skeleton point detection network is a ResNet-101, which consists of 5 convolution modules. The first convolution module of the ResNet-101 consists of three repeated convolution groups, each containing a 1×1 convolution with 64 channels, a 3×3 convolution with 64 channels and a 1×1 convolution with 256 channels; the second module consists of four repeated convolution groups, each containing a 1×1 convolution with 128 channels, a 3×3 convolution with 128 channels and a 1×1 convolution with 512 channels; the third module consists of twenty-three repeated convolution groups, each containing a 1×1 convolution with 256 channels, a 3×3 convolution with 256 channels and a 1×1 convolution with 1024 channels; the fourth module consists of three repeated convolution groups, each containing a 1×1 convolution with 512 channels, a 3×3 convolution with 512 channels and a 1×1 convolution with 2048 channels.
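The stage layout above matches the standard ResNet-101 bottleneck stages, so the three scales needed later for fusion can be taken from an off-the-shelf model. The sketch below is an assumption on our part; the torchvision layer names and the stride annotations are not taken from the patent.

```python
import torch
import torchvision

class ResNet101Backbone(torch.nn.Module):
    """Return the feature maps after 3, 4 and 5 downsampling steps of a ResNet-101."""

    def __init__(self):
        super().__init__()
        net = torchvision.models.resnet101()  # randomly initialized; pretrained weights are an implementation choice
        self.stem = torch.nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.layer1, self.layer2 = net.layer1, net.layer2
        self.layer3, self.layer4 = net.layer3, net.layer4

    def forward(self, x):
        x = self.layer1(self.stem(x))
        c3 = self.layer2(x)   # downsampled three times (stride 8), 512 channels
        c4 = self.layer3(c3)  # downsampled four times (stride 16), 1024 channels
        c5 = self.layer4(c4)  # downsampled five times (stride 32), 2048 channels
        return c3, c4, c5
```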
Further, the feature fusion network of the human skeleton point detection network realizes a high-level/low-level fusion structure by fusing the feature maps obtained after three, four and five downsampling steps. For the three-times-downsampled layer, a 3×3 convolution with stride 2 is first applied to downsample it, and a 1×1 convolution then doubles its number of channels, so that its channel number and spatial size match those of the four-times-downsampled layer; for the five-times-downsampled layer, a 1×1 convolution first halves its number of channels, and upsampling then doubles its spatial size. The feature layers obtained at the three scales are concatenated, and a 3×3 convolution of the concatenated features generates feature layer A. Feature layer A passes through a 1×1 convolution, a 3×3 convolution and a 1×1 convolution to generate feature layer B; feature layer A also passes through a 1×1 convolution and 2 fully connected layers to generate feature layer C; and feature layer A is multiplied by feature layer C to obtain heatmap D.
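A sketch of this fusion-and-attention step in PyTorch follows. It is only our reading of the description: the fused channel width, the pooling before the fully connected layers and the sigmoid gating are assumptions, since the patent fixes only the order of operations.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Fuse the 3x/4x/5x-downsampled maps, then weight the fused map with a channel-attention branch."""

    def __init__(self, c3=512, c4=1024, c5=2048, mid=512):
        super().__init__()
        self.down3 = nn.Sequential(                       # three-times-downsampled layer: stride-2 3x3 conv, then double channels
            nn.Conv2d(c3, c3, 3, stride=2, padding=1), nn.Conv2d(c3, c4, 1))
        self.up5 = nn.Sequential(                         # five-times-downsampled layer: halve channels, then upsample 2x
            nn.Conv2d(c5, c5 // 2, 1), nn.Upsample(scale_factor=2, mode="nearest"))
        self.fuse = nn.Conv2d(c4 * 3, mid, 3, padding=1)  # concatenation -> 3x3 conv -> feature layer A
        self.branch_b = nn.Sequential(                    # A -> 1x1, 3x3, 1x1 -> feature layer B
            nn.Conv2d(mid, mid, 1), nn.Conv2d(mid, mid, 3, padding=1), nn.Conv2d(mid, mid, 1))
        self.branch_c = nn.Sequential(                    # A -> 1x1 conv + 2 fully connected layers -> feature layer C
            nn.Conv2d(mid, mid, 1), nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(mid, mid), nn.Linear(mid, mid), nn.Sigmoid())

    def forward(self, f3, f4, f5):
        a = self.fuse(torch.cat([self.down3(f3), f4, self.up5(f5)], dim=1))  # feature layer A
        b = self.branch_b(a)                     # feature layer B (its downstream use is not detailed in the text)
        c = self.branch_c(a)[:, :, None, None]   # feature layer C, used as per-channel weights
        d = a * c                                # heatmap D = A multiplied by C
        return d, b
```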
Further, the output layer of the human skeleton point detection network outputs the key-point coordinates and their classification from heatmap D to realize key-point output; the coordinate-prediction loss uses an MSE loss, and the classification uses a cross-entropy loss.
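A sketch of the corresponding training loss, assuming the network regresses one heatmap per keypoint against Gaussian target heatmaps and that the classification head predicts a per-keypoint label; the target encoding and tensor shapes are illustrative assumptions, not specified in the patent.

```python
import torch
import torch.nn.functional as F

def keypoint_loss(pred_heatmaps, target_heatmaps, pred_logits, target_labels, cls_weight=1.0):
    """MSE on the predicted heatmaps plus cross-entropy on the keypoint classification head.

    pred_heatmaps / target_heatmaps: (N, K, H, W); pred_logits: (N, K, C); target_labels: (N, K).
    """
    mse = F.mse_loss(pred_heatmaps, target_heatmaps)
    ce = F.cross_entropy(pred_logits.flatten(0, 1), target_labels.flatten())
    return mse + cls_weight * ce
```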
3. The beneficial effects are that:
(1) The invention can collect push-up data of the tested person with only one camera; the implementation is simple, feasible and easy to popularize.
(2) Push-up counting is realized by an image-based method, reducing the need for manual supervision.
(3) Feature selection is carried out after fusing the multi-scale feature layers, so that the network output layer carries both deep semantic information and low-level position information.
(4) The multi-scale fusion layer is fed into an attention mechanism, so that the skeleton point positions learned by the network are more accurate.
(5) A multi-layer convolutional neural network in the deep learning framework is used to train the human skeleton point detection network model, making the detection more accurate.
Drawings
FIG. 1 is a schematic diagram of a subject taking pictures according to the present invention;
FIG. 2 is a human skeleton point bitmap according to the present invention;
FIG. 3 is a flow chart of training a human skeletal point detection network model in accordance with the present invention;
FIG. 4 is a schematic diagram of the attention mechanism of the human skeleton point detection network according to the present invention;
FIG. 5 is a schematic diagram of a feature fusion network of a human skeletal point detection network according to the present invention;
FIG. 6 is a flow chart of the present invention.
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings.
The push-up recognition method based on human body recognition shown in FIGS. 1 to 5 is characterized in that the method comprises the following steps:
Step one: acquiring pictures containing human push-up images, and extracting and labeling human skeleton point position information from the push-up image data to obtain a human skeleton point position data set; inputting the labeled push-up pictures and the corresponding annotated ground-truth data into a human skeleton point detection network for training to obtain a human skeleton point detection network model, wherein the human skeleton point detection network is based on multi-scale fusion.
Step two: capturing images of the tested person's body with a camera, and inputting the acquired images into the trained human skeleton point detection network model in shooting order; obtaining the human skeleton point information in each captured picture.
Step three: judging the state signal of the detected human skeleton according to the position information of the preset skeleton points. The obtained human skeleton point information includes the positions of the right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, right hip, right knee, right ankle, left hip, left knee and left ankle points. If the fold angle of the polyline connecting the tested person's right shoulder, right elbow and right wrist points is larger than 150 degrees, the fold angle of the polyline connecting the left shoulder, left elbow and left wrist points is larger than 150 degrees, the fold angle of the polyline connecting the right hip, right knee and right ankle points is larger than 170 degrees, and the fold angle of the polyline connecting the left hip, left knee and left ankle points is larger than 170 degrees, the human body is judged to be in the straightened state and a straightened-state signal is produced. If the fold angle of the polyline connecting the right shoulder, right elbow and right wrist points is smaller than 30 degrees, the fold angle of the polyline connecting the left shoulder, left elbow and left wrist points is smaller than 30 degrees, the fold angle of the polyline connecting the right hip, right knee and right ankle points is larger than 170 degrees, and the fold angle of the polyline connecting the left hip, left knee and left ankle points is larger than 170 degrees, the human body is judged to be in the bent state and a bent-state signal is produced.
Step four: if consecutive straightened-state and bent-state signals appear for the tested person, one push-up is counted.
Further, in step one, the human push-up image data is acquired as follows: videos and pictures of people doing push-ups are collected from the Internet; each frame of the collected videos is cut out as an image; the cut-out frames are mixed with the pictures originally obtained by the web crawler, the persons in the images are annotated, the annotated persons are cropped out and uniformly normalized, and the skeleton point positions are labeled.
Further, the human skeleton point detection network comprises a backbone network, a feature fusion network and an output network.
Further, the backbone of the human skeleton point detection network is a ResNet-101, which consists of 5 convolution modules. The first convolution module of the ResNet-101 consists of three repeated convolution groups, each containing a 1×1 convolution with 64 channels, a 3×3 convolution with 64 channels and a 1×1 convolution with 256 channels; the second module consists of four repeated convolution groups, each containing a 1×1 convolution with 128 channels, a 3×3 convolution with 128 channels and a 1×1 convolution with 512 channels; the third module consists of twenty-three repeated convolution groups, each containing a 1×1 convolution with 256 channels, a 3×3 convolution with 256 channels and a 1×1 convolution with 1024 channels; the fourth module consists of three repeated convolution groups, each containing a 1×1 convolution with 512 channels, a 3×3 convolution with 512 channels and a 1×1 convolution with 2048 channels.
Further, the feature fusion network of the human skeleton point detection network realizes a high-level/low-level fusion structure by fusing the feature maps obtained after three, four and five downsampling steps. For the three-times-downsampled layer, a 3×3 convolution with stride 2 is first applied to downsample it, and a 1×1 convolution then doubles its number of channels, so that its channel number and spatial size match those of the four-times-downsampled layer; for the five-times-downsampled layer, a 1×1 convolution first halves its number of channels, and upsampling then doubles its spatial size. The feature layers obtained at the three scales are concatenated, and a 3×3 convolution of the concatenated features generates feature layer A. Feature layer A passes through a 1×1 convolution, a 3×3 convolution and a 1×1 convolution to generate feature layer B; feature layer A also passes through a 1×1 convolution and 2 fully connected layers to generate feature layer C; and feature layer A is multiplied by feature layer C to obtain heatmap D.
Further, the output layer of the human skeleton point detection network outputs the key-point coordinates and their classification from heatmap D to realize key-point output; the coordinate-prediction loss uses an MSE loss, and the classification uses a cross-entropy loss.
Specific examples:
As shown in fig. 1, image acquisition of the tested person in the invention requires an imaging unit that can capture the position of the tested person. Fig. 1 shows one optional implementation, in which a camera is installed facing obliquely downward; the image set of the tested person can be acquired through this camera.
As shown in figure 2, the invention first needs to establish a human skeleton point detection network model, which can rapidly and accurately output the human skeleton points in a picture whenever a picture containing a human body is input. In the human skeleton point diagram, 18 points are recorded in total, wherein 0 represents the nose, 1 the neck, 2 the right shoulder, 3 the right elbow, 4 the right wrist, 5 the left shoulder, 6 the left elbow, 7 the left wrist, 8 the right hip, 9 the right knee, 10 the right ankle, 11 the left hip, 12 the left knee, 13 the left ankle, 14 the right eye, 15 the left eye, 16 the right ear, and 17 the left ear.
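For reference, the 18-point layout above can be written as a simple index table; this mapping is a convenience we add here, with names mirroring the figure description.

```python
# Index -> body part, following the 18-point layout of figure 2.
SKELETON_POINTS = {
    0: "nose", 1: "neck",
    2: "right_shoulder", 3: "right_elbow", 4: "right_wrist",
    5: "left_shoulder", 6: "left_elbow", 7: "left_wrist",
    8: "right_hip", 9: "right_knee", 10: "right_ankle",
    11: "left_hip", 12: "left_knee", 13: "left_ankle",
    14: "right_eye", 15: "left_eye", 16: "right_ear", 17: "left_ear",
}
```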
As shown in fig. 3, the human skeleton point detection network model of the invention is obtained by acquiring pictures containing human push-up images, extracting and labeling the human skeleton point position information from the push-up image data to obtain a human skeleton point data set, and inputting the labeled push-up pictures and the annotated ground-truth data into the human skeleton point detection network for training. When a person performs push-ups, if the fold angle formed by connecting points 2, 3 and 4 in sequence is larger than 150 degrees, the fold angle formed by points 5, 6 and 7 is larger than 150 degrees, the fold angle formed by points 8, 9 and 10 is larger than 170 degrees, and the fold angle formed by points 11, 12 and 13 is larger than 170 degrees, a straightened-state signal is emitted; if the fold angle formed by points 2, 3 and 4 is smaller than 30 degrees, the fold angle formed by points 5, 6 and 7 is smaller than 30 degrees, the fold angle formed by points 8, 9 and 10 is larger than 170 degrees, and the fold angle formed by points 11, 12 and 13 is larger than 170 degrees, a bent-state signal is emitted. When consecutive straightened and bent signals are identified, the push-up count is incremented by 1.
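A minimal sketch of this judgment and counting logic follows, assuming the limb keypoints of figure 2 are available as (x, y) coordinates per frame; the function and class names and the handling of ambiguous frames are our own assumptions, not part of the patent.

```python
import numpy as np

def fold_angle(a, b, c):
    """Fold angle, in degrees, at point b of the polyline a-b-c; a, b, c are (x, y) keypoints."""
    v1 = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v2 = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def classify_state(kp):
    """Return "straight", "bent" or None for one frame, using the thresholds of the description."""
    arms_straight = fold_angle(kp[2], kp[3], kp[4]) > 150 and fold_angle(kp[5], kp[6], kp[7]) > 150
    arms_bent = fold_angle(kp[2], kp[3], kp[4]) < 30 and fold_angle(kp[5], kp[6], kp[7]) < 30
    legs_straight = fold_angle(kp[8], kp[9], kp[10]) > 170 and fold_angle(kp[11], kp[12], kp[13]) > 170
    if arms_straight and legs_straight:
        return "straight"
    if arms_bent and legs_straight:
        return "bent"
    return None

class PushUpCounter:
    """Count one push-up each time a straightened-state signal is followed by a bent-state signal."""

    def __init__(self):
        self.count = 0
        self.last_signal = None

    def update(self, kp):
        """kp maps the skeleton point indices of figure 2 to (x, y) coordinates for one frame."""
        state = classify_state(kp)
        if state is None or state == self.last_signal:
            return self.count
        if self.last_signal == "straight" and state == "bent":
            self.count += 1  # consecutive straightened-then-bent signals -> one repetition
        self.last_signal = state
        return self.count
```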
Fig. 4 is a schematic structural diagram of the attention mechanism of the human skeleton point detection network, namely the structure of the ResNet-101 used in the invention. As shown, the ResNet-101 consists of 5 convolution modules. The first convolution module consists of three repeated convolution groups, each containing a 1×1 convolution with 64 channels, a 3×3 convolution with 64 channels and a 1×1 convolution with 256 channels; the second module consists of four repeated convolution groups, each containing a 1×1 convolution with 128 channels, a 3×3 convolution with 128 channels and a 1×1 convolution with 512 channels; the third module consists of twenty-three repeated convolution groups, each containing a 1×1 convolution with 256 channels, a 3×3 convolution with 256 channels and a 1×1 convolution with 1024 channels; the fourth module consists of three repeated convolution groups, each containing a 1×1 convolution with 512 channels, a 3×3 convolution with 512 channels and a 1×1 convolution with 2048 channels.
FIG. 5 is a schematic diagram of the feature fusion network of the human skeleton point detection network according to the present invention. The feature fusion network realizes a high-level/low-level fusion structure by fusing the feature maps obtained after three, four and five downsampling steps. For the three-times-downsampled layer, a 3×3 convolution with stride 2 is first applied to downsample it, and a 1×1 convolution then doubles its number of channels, so that its channel number and spatial size match those of the four-times-downsampled layer; for the five-times-downsampled layer, a 1×1 convolution first halves its number of channels, and upsampling then doubles its spatial size. The feature layers obtained at the three scales are concatenated, and a 3×3 convolution of the concatenated features generates feature layer A. Feature layer A passes through a 1×1 convolution, a 3×3 convolution and a 1×1 convolution to generate feature layer B; feature layer A also passes through a 1×1 convolution and 2 fully connected layers to generate feature layer C; and feature layer A is multiplied by feature layer C to obtain heatmap D.
While the invention has been described with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention, and it is intended that the scope of the invention shall be limited only by the claims appended hereto.

Claims (3)

1. A push-up recognition method based on human body recognition, characterized in that the method comprises the following steps:
step one: acquiring pictures containing human push-up images, and extracting and labeling human skeleton point position information from the push-up image data to obtain a human skeleton point position data set; inputting the labeled push-up pictures and the corresponding annotated ground-truth data into a human skeleton point detection network for training to obtain a human skeleton point detection network model, wherein the human skeleton point detection network is based on multi-scale fusion;
step two: capturing images of the tested person's body with a camera, and inputting the acquired images into the trained human skeleton point detection network model in shooting order; obtaining the human skeleton point information in each captured picture;
step three: judging the state signal of the detected human skeleton according to the position information of the preset skeleton points; the obtained human skeleton point information includes the positions of the right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, right hip, right knee, right ankle, left hip, left knee and left ankle points; if the fold angle of the polyline connecting the tested person's right shoulder, right elbow and right wrist points is larger than 150 degrees, the fold angle of the polyline connecting the left shoulder, left elbow and left wrist points is larger than 150 degrees, the fold angle of the polyline connecting the right hip, right knee and right ankle points is larger than 170 degrees, and the fold angle of the polyline connecting the left hip, left knee and left ankle points is larger than 170 degrees, the human body is judged to be in the straightened state and a straightened-state signal is produced; if the fold angle of the polyline connecting the right shoulder, right elbow and right wrist points is smaller than 30 degrees, the fold angle of the polyline connecting the left shoulder, left elbow and left wrist points is smaller than 30 degrees, the fold angle of the polyline connecting the right hip, right knee and right ankle points is larger than 170 degrees, and the fold angle of the polyline connecting the left hip, left knee and left ankle points is larger than 170 degrees, the human body is judged to be in the bent state and a bent-state signal is produced;
step four: if consecutive straightened-state and bent-state signals appear for the tested person, one push-up is counted;
in step one, the human push-up image data is acquired as follows: videos and pictures of people doing push-ups are collected from the Internet; each frame of the collected videos is cut out as an image; the cut-out frames are mixed with the pictures originally obtained by the web crawler, the persons in the images are annotated, the annotated persons are cropped out and uniformly normalized, and the skeleton point positions are labeled;
the human skeleton point detection network comprises a backbone network, a feature fusion network and an output network;
the feature fusion network of the human skeleton point detection network realizes a high-level/low-level fusion structure by fusing the feature maps obtained after three, four and five downsampling steps; for the three-times-downsampled layer, a 3×3 convolution with stride 2 is first applied to downsample it, and a 1×1 convolution then doubles its number of channels, so that its channel number and spatial size match those of the four-times-downsampled layer; for the five-times-downsampled layer, a 1×1 convolution first halves its number of channels, and upsampling then doubles its spatial size; the feature layers obtained at the three scales are concatenated, and a 3×3 convolution of the concatenated features generates feature layer A; feature layer A passes through a 1×1 convolution, a 3×3 convolution and a 1×1 convolution to generate feature layer B; feature layer A also passes through a 1×1 convolution and 2 fully connected layers to generate feature layer C; and feature layer A is multiplied by feature layer C to obtain heatmap D.
2. The push-up recognition method based on human body recognition according to claim 1, characterized in that: the backbone of the human skeleton point detection network is a ResNet-101, which consists of 5 convolution modules; the first convolution module of the ResNet-101 consists of three repeated convolution groups, each containing a 1×1 convolution with 64 channels, a 3×3 convolution with 64 channels and a 1×1 convolution with 256 channels; the second module consists of four repeated convolution groups, each containing a 1×1 convolution with 128 channels, a 3×3 convolution with 128 channels and a 1×1 convolution with 512 channels; the third module consists of twenty-three repeated convolution groups, each containing a 1×1 convolution with 256 channels, a 3×3 convolution with 256 channels and a 1×1 convolution with 1024 channels; the fourth module consists of three repeated convolution groups, each containing a 1×1 convolution with 512 channels, a 3×3 convolution with 512 channels and a 1×1 convolution with 2048 channels.
3. The push-up recognition method based on human body recognition according to claim 1, characterized in that: the output layer of the human skeleton point detection network outputs the key-point coordinates and their classification from heatmap D; the coordinate-prediction loss uses an MSE loss, and the classification uses a cross-entropy loss.
CN202011606081.1A 2020-12-30 2020-12-30 Push-up counting method based on human skeleton point detection Active CN112464915B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011606081.1A CN112464915B (en) 2020-12-30 2020-12-30 Push-up counting method based on human skeleton point detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011606081.1A CN112464915B (en) 2020-12-30 2020-12-30 Push-up counting method based on human skeleton point detection

Publications (2)

Publication Number Publication Date
CN112464915A CN112464915A (en) 2021-03-09
CN112464915B true CN112464915B (en) 2024-03-26

Family

ID=74802207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011606081.1A Active CN112464915B (en) 2020-12-30 2020-12-30 Push-up counting method based on human skeleton point detection

Country Status (1)

Country Link
CN (1) CN112464915B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113191200A (en) * 2021-04-06 2021-07-30 恒鸿达科技有限公司 Push-up test counting method, device, equipment and medium
CN113255623B (en) * 2021-07-14 2021-09-24 北京壹体科技有限公司 System and method for intelligently identifying push-up action posture completion condition
CN113599776A (en) * 2021-08-05 2021-11-05 北京理工大学 Real-time push-up counting and standard judging method and system
CN114863558A (en) * 2022-04-19 2022-08-05 长三角信息智能创新研究院 Method and system for recognizing and counting human body training actions

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101172200A (en) * 2006-07-18 2008-05-07 孙学川 Intelligent push-up test system
CN101468256A (en) * 2008-08-13 2009-07-01 孙学川 Push-up intelligent tester
CN101934135A (en) * 2010-07-26 2011-01-05 南京理工大学 Physical ability automated test system
CN206463477U (en) * 2017-01-18 2017-09-05 中国人民解放军装甲兵工程学院 Push-up test system
CN106934830A (en) * 2017-03-14 2017-07-07 北京林业大学 A kind of contactless fitness test system and method for testing based on depth image
CN206965078U (en) * 2017-05-24 2018-02-06 中国人民武装警察部队警官学院 Military anti-cheating standard push-up examination device
CN110298221A (en) * 2018-03-23 2019-10-01 上海形趣信息科技有限公司 Self-service body building method, system, electronic equipment, storage medium
CN108379819A (en) * 2018-04-28 2018-08-10 广西职业技术学院 A kind of push-up counting device
CN111275023A (en) * 2020-03-19 2020-06-12 中国人民解放军国防科技大学 Push-up test system based on face recognition and human body posture estimation
CN111428664A (en) * 2020-03-30 2020-07-17 厦门瑞为信息技术有限公司 Real-time multi-person posture estimation method based on artificial intelligence deep learning technology for computer vision
CN111597975A (en) * 2020-05-14 2020-08-28 北京万觉科技有限公司 Personnel action detection method and device and electronic equipment
CN111368810A (en) * 2020-05-26 2020-07-03 西南交通大学 Sit-up detection system and method based on human body and skeleton key point identification
CN111967305A (en) * 2020-07-01 2020-11-20 华南理工大学 Real-time multi-scale target detection method based on lightweight convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Yue, "Research on human pose recognition methods based on deep learning", China Master's Theses Full-text Database, Information Science and Technology Series, No. 08, full text *

Also Published As

Publication number Publication date
CN112464915A (en) 2021-03-09

Similar Documents

Publication Publication Date Title
CN112464915B (en) Push-up counting method based on human skeleton point detection
CN108734104B (en) Body-building action error correction method and system based on deep learning image recognition
CN109815907B (en) Sit-up posture detection and guidance method based on computer vision technology
CN111368810B (en) Sit-up detection system and method based on human body and skeleton key point identification
CN111563887B (en) Intelligent analysis method and device for oral cavity image
CN108334899A (en) Quantify the bone age assessment method of information integration based on hand bone X-ray bone and joint
CN110728220A (en) Gymnastics auxiliary training method based on human body action skeleton information
CN109886225A (en) A kind of image gesture motion on-line checking and recognition methods based on deep learning
CN111466878A (en) Real-time monitoring method and device for pain symptoms of bedridden patients based on expression recognition
CN110298279A (en) A kind of limb rehabilitation training householder method and system, medium, equipment
CN111833439B (en) Artificial intelligence based ammunition throwing analysis and mobile simulation training method
CN107992783A (en) Face image processing process and device
CN112116236A (en) Trampoline dangerous behavior detection reminding method based on artificial intelligence
CN114005180A (en) Motion scoring method and device for badminton
CN116805433B (en) Human motion trail data analysis system
CN106964137B (en) A kind of ball service behavior rating method based on image procossing
CN111912531A (en) Dynamic human body trunk temperature analysis method based on intelligent thermal imaging video analysis
CN112633083A (en) Method for detecting abnormal behaviors of multiple persons and wearing of mask based on improved Openpos examination
CN112784699B (en) Implementation method and system for assessment and guidance of exercise training gestures
CN115530814A (en) Child motion rehabilitation training method based on visual posture detection and computer deep learning
CN115953834A (en) Multi-head attention posture estimation method and detection system for sit-up
CN114998803A (en) Body-building movement classification and counting method based on video
CN115690895A (en) Human skeleton point detection-based multi-person motion detection method and device
Zeng et al. Machine learning based automatic sport event detection and counting
Liang et al. Research on Fitness Action Evaluation System Based on Skeleton

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant