CN115482425A - Key point identification method, model training method, device and storage medium - Google Patents


Info

Publication number: CN115482425A
Application number: CN202110662793.3A
Authority: CN (China)
Prior art keywords: image, human body, key point, body part, prediction result
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 刘朋, 黄铁脉, 柏银, 刘睿
Applicant/Assignee: China Mobile Communications Group Co Ltd; China Mobile Chengdu ICT Co Ltd

Abstract

The application discloses a key point identification method, a model training method, an apparatus, an electronic device and a storage medium, relating to the field of artificial intelligence. The key point identification method comprises the following steps: performing human body target detection on a first human body image based on a first background image, and cropping the first human body image and the first background image based on a target frame located during the human body target detection to obtain a second human body image and a second background image; inputting the second human body image into a set human body part recognition model to obtain a corresponding human body part image, and cropping the area corresponding to the human body part image from the second background image to obtain a corresponding third background image; and inputting the human body part image and the third background image into a key point identification model, and outputting a key point identification result. Through this technical scheme, key points at specific parts are identified more accurately, improving the accuracy of key point identification.

Description

Key point identification method, model training method, device and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a key point identification method, a model training method and apparatus, an electronic device, and a storage medium.
Background
Key point identification is a technique that uses image processing and machine learning methods to find the positions of key points of a human body in an image. In some scenes, key points of specific body parts need to be identified; for example, in a standing long jump scene the heel must be identified. Identifying key points of specific parts of a human body image with a general key point identification algorithm suffers from low identification accuracy.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a key point identification method, a model training method, an apparatus, an electronic device, and a storage medium, so as to at least solve the problem of low identification accuracy in the related art.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a method for training a key point recognition model, which comprises the following steps:
inputting the first image sample and the corresponding second image sample into a set key point identification model to obtain a first prediction result and a second prediction result; the first image sample represents an image of a first scene including a human body part; the second image sample represents an image of the first scene which does not contain a human body part; the first prediction result represents the probability that the pixel belongs to the human body part; the second prediction result represents the probability that the pixel is a key point;
calculating a total loss value of the key point identification model based on the first difference value and the second difference value; the first difference value is obtained by calculation based on the first prediction result and the corresponding calibration result; the second difference is calculated based on the second prediction result and the corresponding calibration result;
and updating the weight parameters of the key point identification model according to the total loss value.
In the foregoing scheme, the inputting the first image sample and the corresponding second image sample into the set keypoint identification model to obtain a first prediction result and a second prediction result includes:
inputting the first image sample to a first neural network of the key point identification model to obtain a first feature vector corresponding to the first image sample;
inputting the second image sample into the first neural network to obtain a second feature vector corresponding to the second image sample;
fusing the first feature vector and the second feature vector, inputting a fusion result into a second neural network, and correspondingly obtaining a third feature vector representing the first prediction result;
and inputting the first feature vector into a third neural network to obtain a fourth feature vector, and fusing the fourth feature vector with the third feature vector to obtain the second prediction result.
In the foregoing solution, the calculating a total loss value of the keypoint identification model based on the first difference and the second difference includes:
inputting a third image sample corresponding to the first image sample into a set human body part recognition model to obtain a third prediction result of the third image sample; the third image sample represents an image in which a human body contour is captured, and its image content contains that of the first image sample;
calculating a total loss value of the keypoint identification model based on the first difference value, the second difference value and the third difference value; and the third difference is calculated based on the third prediction result and the corresponding calibration result.
The embodiment of the application further provides a key point identification method, which comprises the following steps:
based on a first background image, carrying out human body target detection on a first human body image, and based on a target frame positioned in the human body target detection process, cutting the first human body image and the first background image to obtain a second human body image and a second background image; the first human body image represents an image shot with a second scene containing a human body outline; the first background image represents an image of a second scene without a human body outline;
inputting the second human body image into a set human body part recognition model to obtain a corresponding human body part image, and cutting out an area corresponding to the human body part image from the second background image to obtain a corresponding third background image;
inputting the human body part image and the third background image into a key point identification model, and outputting a key point identification result; wherein,
the key point identification model is obtained by adopting any one of the key point identification model training methods.
In the foregoing solution, the detecting a human target for a first human body image based on a first background image includes:
and detecting the human body target of the first human body image by adopting an optical flow algorithm based on the first background image.
In the foregoing scheme, the set human body part recognition model is used for recognizing the heel; before performing human body target detection on the first human body image based on the first background image, the method further comprises:
acquiring a first video shot with the second scene; the second scene comprises a landing area of long jump sports;
determining the first background image and the first human body image from the first video.
The embodiment of the present application further provides a key point recognition model training device, including:
the prediction unit is used for inputting the first image sample and the corresponding second image sample into a set key point identification model to obtain a first prediction result and a second prediction result; the first image sample represents an image of a first scene including a human body part; the second image sample represents an image of the first scene which does not contain a human body part; the first prediction result represents the probability that the pixel belongs to the human body part; the second prediction result represents the probability that the pixel is a key point;
a calculating unit, configured to calculate a total loss value of the keypoint identification model based on the first difference value and the second difference value; the first difference value is obtained by calculation based on the first prediction result and the corresponding calibration result; the second difference value is calculated based on the second prediction result and the corresponding calibration result;
and the updating unit is used for updating the weight parameters of the key point identification model according to the total loss value.
The embodiment of the present application further provides a key point identification device, including:
the first cropping unit is used for detecting a human body target of a first human body image based on a first background image and cropping the first human body image and the first background image based on a target frame positioned in the human body target detection process to obtain a second human body image and a second background image; the first human body image represents an image shot with a second scene containing a human body outline; the first background image represents an image of a second scene without a human body outline;
the second cutting unit is used for inputting the second human body image into a set human body part recognition model to obtain a corresponding human body part image, and cutting out an area corresponding to the human body part image from the second background image to obtain a corresponding third background image;
the recognition unit is used for inputting the human body part image and the third background image into a key point recognition model and outputting a key point recognition result; wherein,
the key point identification model is obtained by adopting the key point identification model training method according to any one of the above items.
An embodiment of the present application further provides a first electronic device, including: a first processor and a first communication interface; wherein,
the first processor is used for inputting the first image sample and the corresponding second image sample into a set key point identification model to obtain a first prediction result and a second prediction result; the first image sample represents an image of a first scene including a human body part; the second image sample represents an image of the first scene without the human body part; the first prediction result represents the probability that the pixel belongs to the human body part; the second prediction result represents the probability that the pixel is a key point;
calculating a total loss value of the key point identification model based on the first difference value and the second difference value; the first difference value is obtained by calculation based on the first prediction result and the corresponding calibration result; the second difference is calculated based on the second prediction result and the corresponding calibration result;
and updating the weight parameters of the key point identification model according to the total loss value.
An embodiment of the present application further provides a second electronic device, including: a second processor and a second communication interface; wherein,
the second processor is used for detecting a human body target of a first human body image based on a first background image, and cutting the first human body image and the first background image based on a target frame positioned in the human body target detection process to obtain a second human body image and a second background image; the first human body image represents an image shot with a second scene containing a human body outline; the first background image represents an image of a second scene without a human body contour;
inputting the second human body image into a set human body part recognition model to obtain a corresponding human body part image, and cutting out an area corresponding to the human body part image from the second background image to obtain a corresponding third background image;
inputting the human body part image and the third background image into a key point identification model, and outputting a key point identification result; wherein,
the key point identification model is obtained by adopting the key point identification model training method according to any one of the above items.
An embodiment of the present application further provides an electronic device, including: a processor and a memory for storing a computer program capable of running on the processor,
wherein the processor is configured, when running the computer program, to execute the steps of any one of the above key point recognition model training methods, or the steps of any one of the above key point identification methods.
An embodiment of the present application further provides a storage medium, on which a computer program is stored; when executed by a processor, the computer program implements the steps of any one of the above key point recognition model training methods, or the steps of any one of the above key point identification methods.
In the key point identification method, the model training method and apparatus, the electronic device and the storage medium, human body target detection is performed on a first human body image based on a first background image, where the first human body image represents an image of a second scene in which a human body contour is captured, and the first background image represents an image of the second scene without the human body contour; the first human body image and the first background image are cropped based on a target frame located during human body target detection to obtain a second human body image and a second background image; the second human body image is input into a set human body part recognition model to obtain a corresponding human body part image, and the area corresponding to the human body part image is cropped from the second background image to obtain a corresponding third background image; the human body part image and the third background image are then input into a key point recognition model obtained by the above key point recognition model training method, and a key point recognition result corresponding to the human body part image is output. Therefore, in the process of identifying the key points, the identification objects are gradually narrowed down from the original data, so that the identification complexity is gradually reduced, the key points at specific parts are identified more accurately, and the accuracy of key point identification is improved.
Drawings
Fig. 1 is a schematic flowchart of a method for training a keypoint recognition model according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart of a general human body contour recognition algorithm in the related art;
FIG. 3 is a schematic diagram of a method for training a keypoint recognition model according to an embodiment of the present application;
FIG. 4 is a diagram illustrating three loss functions provided in an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram illustrating heel key point identification provided by an application example of the present application;
fig. 6 is a schematic flowchart of a method for identifying a keypoint according to an embodiment of the present application;
fig. 7 is a schematic flowchart of a standing long jump testing method according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a keypoint recognition model training device according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a key point identification apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a first electronic device according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a second electronic device according to an embodiment of the present application.
Detailed Description
Key point identification is a technique that uses image processing and machine learning methods to find the positions of key points of a human body in an image. The basic framework of such algorithms describes the problem as a regression problem on key points or a classification problem on pixels. The determination of the algorithm model parameters is based on supervised learning, finding optimal or better parameter values by learning from a large number of images annotated with key points. In some scenes, key points of specific body parts need to be identified; for example, in a standing long jump scene the heel key point must be identified. Identifying key points of specific parts of a human body image with a general key point identification algorithm suffers from low identification accuracy.
Based on this, in the key point identification method, the model training method, the apparatus, the electronic device and the storage medium of the embodiments of the present application, human body target detection is performed on a first human body image based on a first background image, where the first human body image represents an image of a second scene in which a human body contour is captured, and the first background image represents an image of the second scene without the human body contour; the first human body image and the first background image are cropped based on a target frame located during human body target detection to obtain a second human body image and a second background image; the second human body image is input into a set human body part recognition model to obtain a corresponding human body part image, and the area corresponding to the human body part image is cropped from the second background image to obtain a corresponding third background image; the human body part image and the third background image are then input into a key point recognition model obtained by the key point recognition model training method, and a key point recognition result corresponding to the human body part image is output. Therefore, in the process of identifying the key points, the identification objects are gradually narrowed down from the original data, so that the identification complexity is gradually reduced, the key points at specific parts are identified more accurately, and the accuracy of key point identification is improved.
The present application will be described in further detail with reference to the following drawings and examples.
The embodiment of the application provides a method for training a key point recognition model, as shown in fig. 1, the method includes:
step 101: and inputting the first image sample and the corresponding second image sample into a set key point identification model to obtain a first prediction result and a second prediction result. The first image sample represents an image of a first scene containing a human body part; the second image sample represents an image of a first scene without a human body part; the first prediction result represents the probability that the pixel belongs to the human body part; the second prediction characterizes a probability that the pixel is a keypoint.
A first image sample of the first scene containing the human body part and a second image sample of the first scene not containing the human body part are input into a set key point identification model to obtain a first prediction result and a second prediction result. Here, the first prediction result is the mask value of the set body part for the image, representing the probability that each pixel belongs to the specific human body part; the second prediction result is the key point identification result for the set part, representing the probability that each pixel is a key point.
Step 102: a total loss value of the keypoint identification model is calculated based on the first difference and the second difference. The first difference value is obtained by calculation based on the first prediction result and the corresponding calibration result; the second difference is calculated based on the second predicted result and the corresponding calibration result.
And calculating to obtain a first difference value based on the first prediction result and the corresponding calibration result, calculating to obtain a second difference value based on the second prediction result and the corresponding calibration result, and calculating the total loss value of the key point identification model according to the first difference value and the second difference value. Here, the first image sample and the corresponding second image sample may be taken from a sample library, and each first image sample and the corresponding second image sample in the sample library correspond to a corresponding calibration result, and the calibration result is used to represent a set region mask value and/or a set region key point result of the image.
Step 103: and updating the weight parameters of the key point identification model according to the total loss value.
The weight parameters of the key point identification model are updated according to the total loss value of the model, so as to improve the accuracy of the prediction results the model outputs. Specifically, the total loss value is back-propagated through the key point identification model; as it propagates back to each layer, the gradient of the loss function is calculated from the total loss value, and the weight parameters of the current layer are updated along the descending direction of the gradient.
And taking the updated weight parameters as weight parameters used by the trained key point identification model.
Here, an update stop condition may be set; when it is satisfied, the weight parameters obtained in the last update are taken as the weight parameters used by the trained key point identification model. The update stop condition may be, for example, a set number of training rounds (epochs), where one round is one pass of training the key point identification model on the first image samples and corresponding second image samples. Of course, the update stop condition is not limited to this; it may also be, for example, a set mean Average Precision (mAP).
It should be noted that the loss function is used to measure the degree of inconsistency between the predicted value and the actual value (calibration value) of the model. In practical applications, model training is achieved by minimizing a loss function.
The backward propagation is relative to the forward propagation, which refers to the feedforward processing of the model, and the direction of the backward propagation is opposite to the direction of the forward propagation. And the back propagation refers to updating the weight parameters of each layer of the model according to the output result of the model. For example, if the model includes a convolutional layer, a feature fusion layer, and a fully-connected layer, forward propagation refers to processing in the order of convolutional layer-feature fusion layer-fully-connected layer, and backward propagation refers to updating the weight parameters of the layers in turn in the order of fully-connected layer-feature fusion layer-convolutional layer.
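To make this training loop concrete, the following is a minimal sketch assuming a PyTorch-style framework; the function and parameter names are illustrative assumptions, not from the patent:

```python
import torch

def train_step(model, optimizer, first_sample, second_sample,
               mask_target, keypoint_target, mask_loss_fn, kp_loss_fn):
    """One forward/backward pass of the key point recognition model.

    first_sample:  image of the first scene containing the body part
    second_sample: background image of the same scene without the body part
    """
    optimizer.zero_grad()
    # Forward propagation: the model outputs both prediction heads.
    mask_pred, kp_pred = model(first_sample, second_sample)
    # First/second difference: prediction vs. calibration (ground truth).
    loss_mask = mask_loss_fn(mask_pred, mask_target)
    loss_kp = kp_loss_fn(kp_pred, keypoint_target)
    total_loss = loss_mask + loss_kp        # total loss value
    # Back propagation: gradients flow layer by layer in reverse order,
    # and the optimizer updates weights along the descending gradient.
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```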
Fig. 2 shows the flow of a general human body contour recognition algorithm, which can only obtain an image mask value from a human body image to determine the human body contour. The input of the algorithm is an image containing a person; features are extracted by a neural network, pixels are classified or regressed by a neural network, and finally a human body mask value of the image is output, indicating whether each pixel belongs to a human body. In the scheme provided by this embodiment, a first image sample of a first scene containing a human body part and a second image sample of the first scene not containing the human body part are input into a set key point identification model, yielding a first prediction result representing the probability that each pixel belongs to a specific human body part and a second prediction result representing the probability that each pixel is a key point; the first and second prediction results are each compared with the corresponding calibration results to obtain a total loss value, and the weight parameters of the key point identification model are updated according to the total loss value. A contour recognition algorithm is thus introduced into the training process, and the training of the key point recognition model is assisted by the first prediction result, which improves training efficiency and also improves the accuracy of the trained key point recognition model. In addition, the problem of poor generalization caused by lack of image diversity can be alleviated, improving the generalization capability and hence the accuracy of the trained model. Moreover, updating the weight parameters according to the total loss value improves the performance of the convolution layers and feature fusion layers of the key point identification model, further improving the accuracy of the trained model.
In an embodiment, the inputting the first image sample and the corresponding second image sample into the set keypoint identification model to obtain a first prediction result and a second prediction result includes:
inputting the first image sample to a first neural network of the key point identification model to obtain a first feature vector corresponding to the first image sample;
inputting the second image sample into the first neural network to obtain a second feature vector corresponding to the second image sample;
fusing the first feature vector and the second feature vector, inputting the fusion result into a second neural network, and correspondingly obtaining a third feature vector representing the first prediction result;
and inputting the first feature vector into a third neural network to obtain a fourth feature vector, and fusing the fourth feature vector with the third feature vector to obtain the second prediction result.
Inputting a first image sample containing a human body part into a first neural network of a key point identification model for feature extraction to obtain a first feature vector corresponding to the first image sample, inputting a second image sample not containing the human body part into the first neural network for feature extraction to obtain a second feature vector corresponding to the second image sample, fusing the first feature vector and the second feature vector to obtain a fusion result, and inputting the fusion result into a second neural network for feature extraction to obtain a first prediction result represented by a third feature vector; and inputting the first feature vector into a third neural network for classification, correspondingly obtaining a fourth feature vector, and fusing the fourth feature vector and the third feature vector to obtain a second prediction result. Here, the fusion of the feature vectors is realized by a set neural network. Therefore, by changing the neural network structure of the key point identification model, the feature vectors are fused when being transmitted among each layer of neural networks in the training and using processes of the key point identification model, so that the required calculated amount is reduced, and the key point identification can be efficiently and accurately carried out. In addition, a contour recognition algorithm is introduced in the training process, the training of the key point recognition model is assisted through the first prediction result, the training efficiency of model training is improved, and the accuracy of the trained key point recognition model is also improved. In addition, feature extraction is carried out on the first image sample and the second image sample through the first neural network, then the obtained feature vectors are fused to achieve background removal of the human body part, and the processed feature vectors can reflect the content of the image better than the original image, so that the effect of background removal is better than that of processing based on the first image sample and the second image sample directly.
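As an illustration of this forward structure, the following is a minimal PyTorch sketch; the backbone layers, channel counts and heads are simplified assumptions (the patent allows any suitable feature extraction network):

```python
import torch
import torch.nn as nn

class KeypointModel(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # First neural network: shared feature extractor applied to both images.
        self.first_net = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        # Second neural network: classifies fused features into a part mask.
        self.second_net = nn.Conv2d(2 * channels, 1, 1)
        # Third neural network: extracts key point features from the first vector.
        self.third_net = nn.Conv2d(channels, channels, 3, padding=1)
        # Head over the fused key point + contour features.
        self.head = nn.Conv2d(channels + 1, 1, 1)

    def forward(self, first_img, second_img):
        f1 = self.first_net(first_img)              # first feature vector
        f2 = self.first_net(second_img)             # second feature vector
        fused = torch.cat([f2, f1], dim=1)          # fusion: F2 concat F1
        f3 = self.second_net(fused)                 # third vector -> first prediction
        f4 = self.third_net(f1)                     # fourth feature vector
        kp = self.head(torch.cat([f4, f3], dim=1))  # second prediction
        return torch.sigmoid(f3), torch.sigmoid(kp)
```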
In practical applications, a schematic diagram of the method for training the keypoint recognition model is shown in fig. 3. The upper half part of the key point recognition model training method is a key point algorithm, and the lower half part of the key point recognition model training method is a human body contour recognition algorithm.
301: a first neural network.
A first image sample containing a human body part is input into the key point recognition model, and feature extraction is performed by the first neural network to obtain a first feature vector corresponding to the first image sample. Here, the first neural network performs preliminary sampling of image features on the first image sample. Depending on the real-time and precision requirements of the algorithm, an appropriate feature extraction network can be chosen for the first neural network, for example feature extraction network frameworks such as ResNet, MobileNet, Inception and VGG. The first image sample may be an image of a tester within a standing long jump test area.
302: a first neural network.
A second image sample not containing the human body part is input into the key point identification model, and feature extraction is performed by the first neural network to obtain a second feature vector corresponding to the second image sample. Here, the first neural network performs preliminary sampling of image features on the second image sample. The second image sample is the background image corresponding to the first image sample, which can be acquired when no tester is in the standing long jump test area. When selecting the background image, content changes caused by natural factors should be considered, such as changes of clouds in the sky of the background image.
The first image sample and the second image sample can be obtained by acquiring an image containing a person and a background image and manually annotating the human body contour; they can also be synthesized by fusing a standard person-containing image annotated with the human body contour into a background image. Samples annotated with heel markers are also used in supervised learning. During algorithm training, operations such as stretching, cropping, rotating, deforming and adding noise can be performed on the image samples, as sketched below, to eliminate environmental interference factors and improve the robustness of the algorithm.
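A hedged illustration of such augmentation, assuming torchvision; the specific transforms and magnitudes are illustrative, and in practice the same geometric transform must also be applied to the paired background image and the calibration labels:

```python
import torch
import torchvision.transforms as T

# Illustrative augmentation pipeline covering the operations named in the
# text: stretching, cropping, rotation, deformation, and adding noise.
augment = T.Compose([
    T.RandomResizedCrop(256, scale=(0.8, 1.0)),          # stretch + crop
    T.RandomRotation(degrees=15),                        # rotate
    T.RandomAffine(degrees=0, shear=10),                 # deform
    T.ToTensor(),
    T.Lambda(lambda x: x + 0.02 * torch.randn_like(x)),  # add noise
])
```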
303: and (4) fusing the networks.
The features of the first feature vector and the second feature vector are exchanged and fused to obtain a fusion result.
Let the first feature matrix F1 be an N×M×C three-dimensional feature matrix, the second feature matrix F2 be an N×M×C three-dimensional feature matrix, and F0 be the feature matrix after the fusion operation. Optional feature exchange and fusion operations are:
1. F0 = F2 ⊕ F1: the feature values of F2 and F1 are concatenated, and F0 is an N×M×2C matrix.
2. F0 = F2 − F1: the feature values of F1 are subtracted from those of F2, and F0 is an N×M×C matrix.
3. F0 = (F2 − F1) ⊕ F1: the feature values of F1 are subtracted from those of F2, and the result is concatenated with F1; F0 is an N×M×2C matrix.
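A minimal sketch of these three fusion operations, assuming PyTorch tensors with the channel dimension first; the function names are illustrative:

```python
import torch

# f1 and f2 are N×M×C feature maps stored as (C, N, M) tensors.
def fuse_concat(f1, f2):
    return torch.cat([f2, f1], dim=0)       # F0 = F2 ⊕ F1, shape N×M×2C

def fuse_subtract(f1, f2):
    return f2 - f1                          # F0 = F2 − F1, shape N×M×C

def fuse_subtract_concat(f1, f2):
    return torch.cat([f2 - f1, f1], dim=0)  # F0 = (F2 − F1) ⊕ F1, N×M×2C
```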
304: a second neural network.
The pixels of the input fusion result are classified by the second neural network to obtain the first prediction result represented by a third feature vector. The first prediction result here is the mask value of the set body part for the image, representing the probability that each pixel belongs to the specific human body part; in a standing long jump scene it represents the probability that each pixel belongs to a human foot. This output is used when training the algorithm.
305: a third neural network.
The first feature vector is input into a third neural network for feature extraction to obtain a fourth feature vector. Here, the third neural network may adopt the same kind of structure as the first neural network.
306: and fusing the network.
The fourth feature vector representing key point features and the third feature vector representing contour features are fused to obtain the second prediction result. Here, an optional fusion operation is as follows:
Let the third feature matrix F3 be an N×M×C three-dimensional feature matrix and the fourth feature matrix F4 be an N×M×C three-dimensional feature matrix, with F5 the feature matrix after the fusion operation. The fusion operation is F5 = F4 ⊕ F3: the feature values of F4 and F3 are concatenated, and F5 is an N×M×2C matrix.
The second prediction result is the key point identification result for the set part, representing the probability that each pixel is a key point; in a standing long jump scene the set key point is the heel, and the result represents the probability that each pixel is the heel key point. A regression network or a classification network may be selected: a regression network outputs the coordinate position of the heel key point, expressed relative to a reference point; a classification network outputs the probability of whether each pixel is the heel point, and with a classification network the probability value of each pixel can be regenerated with a Gaussian distribution, which facilitates model convergence.
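The Gaussian regeneration of per-pixel probabilities can be sketched as follows; the exact formulation is an illustrative assumption, and sigma is a free parameter:

```python
import numpy as np

def gaussian_heatmap(height, width, cx, cy, sigma=3.0):
    """Probability map for one key point: a 2-D Gaussian centered on the
    labeled heel coordinate (cx, cy), which eases model convergence
    compared with a single hot pixel."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
```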
The human body part can also be cropped: in a standing long jump scene the part is the foot, a rough image frame of the foot is cropped out, and the feature values corresponding to the heel are output directly.
In one embodiment, said calculating a total loss value of said keypoint identification model based on the first difference and the second difference comprises:
inputting a third image sample corresponding to the first image sample into a set human body part recognition model to obtain a third prediction result of the third image sample; the third image sample represents an image in which a human body contour is captured, and its image content contains that of the first image sample;
calculating a total loss value of the key point identification model based on the first difference value, the second difference value and the third difference value; and the third difference is calculated based on the third prediction result and the corresponding calibration result.
A third image sample, in which a human body contour is captured and which contains the human body part corresponding to the first image sample, is input into the set human body part recognition model to obtain a third prediction result of the third image sample; a third difference is calculated based on the third prediction result and the corresponding calibration result, and the total loss value is calculated based on the first, second and third differences so as to update the weight parameters of the identification model. Here, the first image sample may be obtained by cropping the third image sample containing the human body contour, yielding an image containing the specific human body part. The first, second and third differences may be weighted and summed to obtain the total loss value of the key point identification model; the weights of the three differences may be set according to the actual application scenario. The fusion can also be performed in a non-linear manner. In this way, the three recognition loss functions are fused linearly or non-linearly, making algorithm training more stable and the identification of key points at specific parts more accurate.
For example, in a standing long jump scenario, as shown in FIG. 4, the loss function used in training combines the heel key point loss function L1, the human body contour loss function L2 and the foot recognition loss function L3. The loss function is defined as L = (1 − α − β)·L3 + α·L1 + β·L2, where the first weight α corresponding to the heel key point loss function is greater than the second weight β corresponding to the human body contour loss function, the first weight α is also greater than the third weight (1 − α − β) corresponding to the foot recognition loss function, and the sum of the three weights equals 1. In this way, the loss function is adjusted through the corresponding weights, automatically regulating and controlling the influence of L2 and L3 on L1.
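A minimal sketch of this weighted combination; the concrete weight values are illustrative, since the patent only constrains their ordering and that they sum to 1:

```python
def total_loss(l1_heel, l2_contour, l3_foot, alpha=0.5, beta=0.3):
    """L = (1 - α - β)·L3 + α·L1 + β·L2, with α > β and α > 1 - α - β."""
    assert alpha > beta and alpha > 1 - alpha - beta
    return (1 - alpha - beta) * l3_foot + alpha * l1_heel + beta * l2_contour
```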
After the key point identification model is trained, it can be put into use. For example, in a key point recognition scene for standing long jump, as shown in fig. 5, human body recognition is performed on the image and a human body frame is cropped out; foot recognition is then performed on the human body frame to crop out left-foot and right-foot frames, and the background image is cropped again correspondingly; finally, the left-foot frame (or right-foot frame) and the corresponding background image are input into the heel key point recognition model obtained with the training method above for heel key point recognition. In this key point identification method, continuous image cropping reduces the complexity of the model.
As another embodiment of the present application, the keypoint identification model obtained through training in the above embodiment may be used for keypoint identification. Referring to fig. 6, the key point identifying method includes:
step 601: based on the first background image, carrying out human body target detection on the first human body image, and based on a target frame positioned in the human body target detection process, cutting the first human body image and the first background image to obtain a second human body image and a second background image; the first human body image represents an image shot with a second scene containing a human body outline; the first background image represents an image of a second scene captured without the human body contour.
The method comprises the steps of detecting a human body target of a first human body image based on a first background image, cutting the first human body image based on a target frame positioned in the human body target detection process to obtain a second human body image, and cutting the first background image based on the target frame to obtain a second background image. Here, the first background image and the first human body image are taken of the same scene, and the first human body image further includes an image of a human body contour.
Human body target detection is performed on the first human body image; the human body target in the image is selected by a target rectangular frame located during the detection, giving a coordinate frame of the human body position, and a detection frame containing the human body is cropped out according to the coordinate frame to obtain the second human body image.
In this way, the human body image is detected and trimmed, and the trimmed human body image is input into the set human body part recognition model, so that the calculation amount of the human body part recognition model can be reduced, and the complexity of the model is reduced.
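A minimal sketch of this paired cropping, assuming images as NumPy-style arrays indexed as height × width × channels; names are illustrative:

```python
def crop_pair(human_img, background_img, box):
    """Crop the human image and the background image with the same target
    frame (x1, y1, x2, y2) located by human body target detection, yielding
    the second human body image and second background image."""
    x1, y1, x2, y2 = box
    return human_img[y1:y2, x1:x2], background_img[y1:y2, x1:x2]
```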
Step 602: and inputting the second human body image into the set human body part recognition model to obtain a corresponding human body part image, and cutting out an area corresponding to the human body part image from the second background image to obtain a corresponding third background image.
And inputting the obtained second human body image into the set human body part recognition model to obtain a human body part image corresponding to the second human body image, and cutting out an area corresponding to the human body part image from the second background image to obtain a corresponding third background image. Here, a human body part recognition model is set according to a human body part to be acquired.
Therefore, the trimmed human body image is input into the human body part recognition model, the obtained image can reduce the hyper-parameters required to be used in the key point recognition model during key point recognition while the human body part of the key point recognition target is reserved, and therefore the model complexity is reduced.
Step 603: inputting the human body part image and the third background image into the key point identification model, and outputting a key point identification result; wherein,
the key point identification model is obtained by adopting the key point identification model training method according to any one of the above items.
The human body part image and the third background image are input into the trained key point identification model:
the human body part image containing the human body part is input into the first neural network of the key point identification model for feature extraction to obtain a fifth feature vector corresponding to the human body part image; the third background image not containing the human body part is input into the first neural network for feature extraction to obtain a sixth feature vector corresponding to the third background image; the fifth and sixth feature vectors are fused, and the fusion result is input into the second neural network for feature extraction to obtain a seventh feature vector; the fifth feature vector is input into the third neural network to correspondingly obtain an eighth feature vector, and the eighth and seventh feature vectors are fused to obtain the key point identification result. Here, the fusion of the feature vectors is realized by a set neural network.
Therefore, in the process of identifying the key points, the identification objects are gradually reduced on the basis of the original data, so that the identification complexity is gradually reduced, the identification of the key points at the specific parts is more accurate, and the accuracy of identifying the key points is improved. In addition, by changing the neural network structure of the key point recognition model, the feature vectors are fused when being transmitted among the neural networks of each layer in the training and using processes of the key point recognition model, so that the required calculated amount is reduced, and the key point recognition can be efficiently and accurately carried out.
In an embodiment, the human target detection on the first human body image based on the first background image includes:
and detecting the human body target of the first human body image by adopting an optical flow algorithm based on the first background image.
And based on the first background image, detecting a human body target of the first human body image by adopting an optical flow algorithm, and obtaining a target rectangular frame containing the human body target in the human body target detection process. Therefore, data acquisition and training are not needed, the calculation speed for acquiring the human body target is high, more accurate human body images are provided for human body part recognition, meanwhile, the input images of the human body part recognition model are optimized, and the complexity of a subsequent algorithm is reduced.
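A hedged sketch of such detection with OpenCV's dense (Farneback) optical flow; the threshold and post-processing are illustrative assumptions:

```python
import cv2
import numpy as np

def detect_human_box(background, human_img, mag_thresh=2.0):
    """Locate the human target via dense optical flow between the background
    frame and the frame containing the person (no training required)."""
    prev = cv2.cvtColor(background, cv2.COLOR_BGR2GRAY)
    curr = cv2.cvtColor(human_img, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)       # per-pixel motion magnitude
    ys, xs = np.where(mag > mag_thresh)      # pixels that moved
    if len(xs) == 0:
        return None
    # Target rectangular frame enclosing the moving (human) region.
    return xs.min(), ys.min(), xs.max(), ys.max()
```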
In one embodiment, the set human body part recognition model is used for recognizing heel; before performing human target detection on the first human body image based on the first background image, the method further comprises:
acquiring a first video shot with the second scene; the second scene comprises a landing area of long jump sports;
determining the first background image and the first human body image from the first video.
A first video of the scene of the landing area of the long jump is collected, the first human body image is determined from the first video, and the first background image without a human body is obtained from footage of the same scene; when recognizing human body parts, a human body part recognition model for recognizing the heel is used. Here, at least one first background image and one first human body image during the long jump are obtained from the first video (see the sketch below); these may be the frames when the athlete lands, or images of the athlete at the highest point during the movement. Thus, through image analysis of the first video and heel key point identification, the motion trajectory (including distance and height) of the athlete during the long jump can be obtained, helping athletes and coaches accurately analyze performance.
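A minimal sketch of extracting candidate frames from the first video, assuming OpenCV; the sampling step and selection logic are illustrative assumptions:

```python
import cv2

def sample_frames(video_path, step=5):
    """Read the first video of the landing area and yield candidate frames;
    an empty-scene frame can serve as the first background image, and
    frames with the athlete as first human body images."""
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            yield idx, frame
        idx += 1
    cap.release()
```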
As shown in fig. 7, the following gives an application example of the embodiment of the present application:
the key point identification method based on the embodiment of the application is applied to a standing long jump testing method so as to improve the accuracy of a testing result. The related process is as follows:
1. Preparation stage:
step 701: and installing a camera.
And selecting a designated standing long jump site, and installing a fixed camera.
Step 702: and setting key marking points.
In combination with the standing long jump test area, identification devices are arranged; the identification device may be a retractable marker mounted on a movable support.
Step 703: and setting a test area.
And setting a test area according to the image range of the camera, wherein the test area can comprise a landing area.
2. Testing stage:
step 704: the self-test mode is enabled.
A worker starts a self-service test mode in a human-computer interaction module such as a terminal.
Step 705: entry into a test area is identified.
The system can detect that the athlete enters the test area by means of a pressure sensor and the like.
Step 706: and (5) identity recognition.
The athlete carries out face recognition in the test area through the camera to confirm the identity.
Step 707: and informing to test.
And prompting the athlete to start the examination after the face recognition is successful.
Step 708: and collecting a standing long jump test video.
The system collects the video of the whole process of the standing long jump test of the athlete through the camera until the athlete leaves the standing long jump test area.
Step 709: an exit from the test area is identified.
The system identifies that the athlete has left the standing long jump test area through video information.
Step 710: and (5) analyzing the standing long jump result.
The system performs image analysis on the standing long jump test video. Based on the key point identification method of the embodiments of the present application, the distance and motion trajectory of the standing long jump are calculated, and the result is output with centimeter precision.
In the standing long jump scene, the environment background is simple, only one tester is in the test area, the human body target detection is carried out on the human body image by adopting the optical flow algorithm, data acquisition and training are not needed, and the calculation speed for acquiring the human body target is high.
In addition, the key point identification method is an end-to-end algorithm: given the person-containing image and the background image of the standing long jump as input, it can directly output information such as the coordinates of the heel key points, from which the standing long jump result is obtained after processing. The heel key point is both a point on the human body contour and a key point of the human skeleton; in a standing long jump scene, fusing human body contour information with human skeleton key information allows the heel key point to be identified more accurately.
Step 711: and storing and notifying results.
The system saves the results of the analysis and makes an announcement on the setting screen.
Step 712: and judging whether the examination is not completed by someone.
When the test is still unfinished, step 713 is entered.
When all personnel are finished training or testing, step 714 is entered
Step 713: the next athlete is notified to enter the test zone.
And after step 713, return to step 705 to continue the examination of the next person.
Step 714: and outputting a standing long jump exercise result report.
According to the standing long jump testing method, the relevant information of the heel is accurately identified through the identification method of the key point of the heel, and therefore an accurate standing long jump result is obtained.
In order to implement the method according to the embodiment of the present application, an embodiment of the present application further provides a keypoint recognition model training device, which is disposed on a first electronic device, and as shown in fig. 8, the device includes:
a prediction unit 801, configured to input a first image sample and a corresponding second image sample to a set keypoint identification model, so as to obtain a first prediction result and a second prediction result; the first image sample represents an image of a first scene including a human body part; the second image sample represents an image of the first scene which does not contain a human body part; the first prediction result represents the probability that the pixel belongs to the human body part; the second prediction result represents the probability that the pixel is a key point;
a calculating unit 802, configured to calculate a total loss value of the keypoint identification model based on the first difference and the second difference; the first difference value is obtained by calculation based on the first prediction result and the corresponding calibration result; the second difference is calculated based on the second prediction result and the corresponding calibration result;
an updating unit 803, configured to update the weight parameter of the keypoint identification model according to the total loss value.
In an embodiment, the prediction unit 801 is configured to:
inputting the first image sample to a first neural network of the key point identification model to obtain a first feature vector corresponding to the first image sample;
inputting the second image sample into the first neural network to obtain a second feature vector corresponding to the second image sample;
fusing the first feature vector and the second feature vector, inputting the fusion result into a second neural network, and correspondingly obtaining a third feature vector representing the first prediction result;
and inputting the first feature vector into a third neural network to obtain a fourth feature vector, and fusing the fourth feature vector with the third feature vector to obtain the second prediction result.
In an embodiment, the computing unit 802 is configured to:
inputting a third image sample corresponding to the first image sample into a set human body part recognition model to obtain a third prediction result of the third image sample; the third image sample represents an image in which a human body contour is captured, and its image content contains that of the first image sample;
calculating a total loss value of the keypoint identification model based on the first difference value, the second difference value and the third difference value; and the third difference is calculated based on the third prediction result and the corresponding calibration result.
In practical applications, the prediction unit 801, the calculation unit 802 and the update unit 803 may be implemented by a processor in the key point recognition model training apparatus, such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), a Micro Control Unit (MCU) or a Field-Programmable Gate Array (FPGA).
It should be noted that: in the above embodiment, when performing the key point recognition model training, the key point recognition model training apparatus is only illustrated by dividing the program modules, and in practical applications, the processing may be distributed by different program modules as needed, that is, the internal structure of the apparatus is divided into different program modules to complete all or part of the above-described processing. In addition, the key point recognition model training device provided in the above embodiments and the training method embodiment of the key point recognition model belong to the same concept, and the specific implementation process thereof is described in detail in the method embodiment and is not described herein again.
In order to implement the method of the embodiment of the present application, an embodiment of the present application further provides a key point identification apparatus, which is disposed on a second electronic device. As shown in Fig. 9, the apparatus includes:
a first cropping unit 901, configured to perform human body target detection on a first human body image based on a first background image, and crop the first human body image and the first background image based on a target frame located in the human body target detection process to obtain a second human body image and a second background image; the first human body image represents an image in which a second scene containing a human body contour is captured; the first background image represents an image of the second scene without the human body contour;
a second cropping unit 902, configured to input the second human body image into a set human body part recognition model to obtain a corresponding human body part image, and crop the area corresponding to the human body part image from the second background image to obtain a corresponding third background image;
an identifying unit 903, configured to input the human body part image and the third background image into a key point identification model, and output a key point identification result; wherein,
the key point identification model is trained by the key point identification model training method described in any one of the above embodiments.
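End to end, the apparatus therefore implements a three-stage pipeline. The sketch below shows the data flow only; the three callables (a detector, the human body part recognition model, and the key point identification model) are assumed interfaces rather than APIs defined by the embodiment, and the images are assumed to be NumPy arrays indexed as [row, column].

```python
def identify_keypoints(first_human_image, first_background_image,
                       detect_human, part_model, keypoint_model):
    # Stage 1: locate the target frame of the human body and crop both
    # the human image and the background image with it.
    x, y, w, h = detect_human(first_human_image, first_background_image)
    second_human_image = first_human_image[y:y + h, x:x + w]
    second_background_image = first_background_image[y:y + h, x:x + w]

    # Stage 2: recognise the human body part, then cut the matching
    # region out of the cropped background image.
    part_image, (px, py, pw, ph) = part_model(second_human_image)
    third_background_image = second_background_image[py:py + ph, px:px + pw]

    # Stage 3: the key point identification model sees the part image
    # together with its background counterpart.
    return keypoint_model(part_image, third_background_image)
```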
In an embodiment, the first cropping unit 901 is configured to:
detecting the human body target in the first human body image by using an optical flow algorithm based on the first background image.
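One plausible realisation of this detection step is sketched below, assuming OpenCV with dense Farneback optical flow computed between the background image and the human image; the flow-magnitude threshold is an illustrative assumption.

```python
import cv2
import numpy as np

def detect_human(first_human_image, first_background_image, mag_thresh=2.0):
    # Dense optical flow from the empty-background frame to the frame
    # containing the human body; moving pixels stand out against the
    # static background.
    prev = cv2.cvtColor(first_background_image, cv2.COLOR_BGR2GRAY)
    curr = cv2.cvtColor(first_human_image, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)

    # Target frame: bounding rectangle of the pixels whose flow
    # magnitude exceeds the (assumed) threshold.
    mask = (magnitude > mag_thresh).astype(np.uint8)
    x, y, w, h = cv2.boundingRect(mask)
    return x, y, w, h
```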
In one embodiment, the apparatus further comprises:
an acquisition unit, configured to acquire a first video in which the second scene is captured; the second scene comprises a landing area for long jump sports;
a determining unit configured to determine the first background image and the first human body image from the first video.
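A minimal sketch of these two units is given below, under the assumption that an early frame of the first video shows the empty landing area (the first background image) and a later frame contains the jumper (the first human body image); the frame indices are illustrative only.

```python
import cv2

def background_and_human_frames(video_path, human_frame_index=30):
    capture = cv2.VideoCapture(video_path)
    frames = []
    ok, frame = capture.read()
    while ok:
        frames.append(frame)
        ok, frame = capture.read()
    capture.release()

    # Assumed: frame 0 shows the empty landing area, a later frame the jumper.
    first_background_image = frames[0]
    first_human_image = frames[min(human_frame_index, len(frames) - 1)]
    return first_background_image, first_human_image
```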
In practical applications, the first cropping unit 901, the second cropping unit 902, the identifying unit 903, the acquisition unit, and the determining unit may be implemented by a processor in the key point identification apparatus, such as a CPU, a DSP, an MCU, or an FPGA.
It should be noted that the division of the key point identification apparatus into the above program modules is merely illustrative; in practical applications, the processing may be distributed among different program modules as needed, that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the key point identification apparatus provided in the above embodiment and the embodiment of the key point identification method belong to the same concept; the specific implementation process of the apparatus is described in detail in the method embodiment and is not repeated here.
Based on the hardware implementation of the above program modules, and in order to implement the key point recognition model training method of the embodiment of the present application, an embodiment of the present application further provides a first electronic device. As shown in Fig. 10, the first electronic device 1000 includes:
a first communication interface 1001 capable of performing information interaction with other network nodes;
a first processor 1002, connected to the first communication interface 1001 to implement information interaction with other network nodes, and configured to execute, when running a computer program, the method provided by one or more of the technical solutions on the first electronic device side; the computer program is stored in the first memory 1003.
In particular, the first processor 1002 is configured to:
inputting the first image sample and the corresponding second image sample into a set key point identification model to obtain a first prediction result and a second prediction result; the first image sample represents an image of a first scene containing a human body part; the second image sample represents an image of the first scene that does not contain the human body part; the first prediction result represents the probability that each pixel belongs to the human body part; the second prediction result represents the probability that each pixel is a key point;
calculating a total loss value of the key point identification model based on a first difference and a second difference; the first difference is calculated based on the first prediction result and the corresponding calibration result; the second difference is calculated based on the second prediction result and the corresponding calibration result;
and updating the weight parameters of the key point identification model according to the total loss value.
In an embodiment, the first processor 1002 is configured to:
inputting the first image sample into a first neural network of the key point identification model to obtain a first feature vector corresponding to the first image sample;
inputting the second image sample into the first neural network to obtain a second feature vector corresponding to the second image sample;
fusing the first feature vector and the second feature vector, and inputting the fusion result into a second neural network to correspondingly obtain a third feature vector representing the first prediction result;
and inputting the first feature vector into a third neural network, and fusing the correspondingly obtained fourth feature vector with the third feature vector to obtain the second prediction result.
In an embodiment, the first processor 1002 is configured to:
inputting a third image sample corresponding to the first image sample into a set human body part recognition model to obtain a third prediction result of the third image sample; the third image sample represents an image in which a human body contour is captured, and the third image sample contains the image content of the first image sample;
calculating the total loss value of the key point identification model based on the first difference, the second difference and a third difference; the third difference is calculated based on the third prediction result and the corresponding calibration result.
It should be noted that: the specific processing procedures of the first processor 1002 and the first communication interface 1001 can be understood with reference to the methods described above.
Of course, in practical applications, the various components in the first electronic device 1000 are coupled together by a bus system 1004. It is understood that the bus system 1004 is used to enable connection and communication among these components. In addition to a data bus, the bus system 1004 includes a power bus, a control bus, and a status signal bus; for the sake of clarity, however, the various buses are labeled as the bus system 1004 in Fig. 10.
The first memory 1003 in the embodiment of the present application is used to store various types of data to support the operation of the first electronic device 1000. Examples of such data include: any computer program for operating on the first electronic device 1000.
The method disclosed in the embodiment of the present application may be applied to the first processor 1002, or implemented by the first processor 1002. The first processor 1002 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be implemented by integrated logic circuits of hardware or instructions in the form of software in the first processor 1002. The first processor 1002 may be a general purpose processor, a DSP, or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, etc. The first processor 1002 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium located in the first memory 1003, and the first processor 1002 reads information in the first memory 1003, and completes the steps of the foregoing method in combination with hardware thereof.
Optionally, when the first processor 1002 executes the program, the corresponding process implemented by the electronic device in the methods according to the embodiment of the present application is implemented, and for brevity, is not described again here.
In an exemplary embodiment, the first electronic device 1000 may be implemented by one or more Application-Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), FPGAs, general-purpose processors, controllers, MCUs, microprocessors, or other electronic components for performing the aforementioned methods.
Based on the hardware implementation of the program module, and in order to implement the method for identifying key points in the embodiment of the present application, an embodiment of the present application further provides a second electronic device, as shown in fig. 11, where the second electronic device 1100 includes:
a second communication interface 1101 capable of performing information interaction with other network nodes;
a second processor 1102, connected to the second communication interface 1101 to implement information interaction with other network nodes, and configured to execute, when running a computer program, the key point identification method provided by one or more of the above technical solutions; the computer program is stored in the second memory 1103.
In particular, the second processor 1102 is configured to:
performing human body target detection on a first human body image based on a first background image, and cropping the first human body image and the first background image based on a target frame located in the human body target detection process to obtain a second human body image and a second background image; the first human body image represents an image in which a second scene containing a human body contour is captured; the first background image represents an image of the second scene without the human body contour;
inputting the second human body image into a set human body part recognition model to obtain a corresponding human body part image, and cropping the area corresponding to the human body part image from the second background image to obtain a corresponding third background image;
inputting the human body part image and the third background image into a key point identification model, and outputting a key point identification result; wherein,
the key point identification model is trained by the key point identification model training method described in any one of the above embodiments.
In an embodiment, the second processor 1102 is configured to perform human body target detection on the first human body image by using an optical flow algorithm based on the first background image.
In an embodiment, the second processor 1102 is configured to:
acquiring a first video in which the second scene is captured; the second scene comprises a landing area for long jump sports;
and determining the first background image and the first human body image from the first video.
It should be noted that: the specific processing of the second processor 1102 and the second communication interface 1101 can be understood with reference to the methods described above.
Of course, in practical applications, the various components in the second electronic device 1100 are coupled together by a bus system 1104. It is understood that the bus system 1104 is used to enable connection and communication among these components. In addition to a data bus, the bus system 1104 includes a power bus, a control bus, and a status signal bus; for clarity of illustration, however, the various buses are labeled as the bus system 1104 in Fig. 11.
The second memory 1103 in the embodiment of the present application is used to store various types of data to support the operation of the second electronic device 1100. Examples of such data include: any computer program for operating on the second electronic device 1100.
The method disclosed in the embodiments of the present application can be applied to the second processor 1102, or implemented by the second processor 1102. The second processor 1102 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method can be implemented by integrated logic circuits of hardware or instructions in the form of software in the second processor 1102. The second processor 1102 described above may be a general purpose processor, a DSP, or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The second processor 1102 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium located in the second memory 1103, and the second processor 1102 reads the information in the second memory 1103 and completes the steps of the foregoing method in combination with the hardware thereof.
Optionally, when the second processor 1102 executes the program, the corresponding process implemented by the electronic device in each method of the embodiment of the present application is implemented, and for brevity, no further description is given here.
In an exemplary embodiment, the second electronic device 1100 may be implemented by one or more ASICs, DSPs, PLDs, CPLDs, FPGAs, general-purpose processors, controllers, MCUs, microprocessors, or other electronic components for performing the aforementioned methods.
It is understood that the memories (the first memory 1003 and the second memory 1103) in the embodiments of the present application may be volatile memories or non-volatile memories, and may also include both volatile and non-volatile memories. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferroelectric Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be a magnetic disk memory or a magnetic tape memory. The volatile memory may be a Random Access Memory (RAM), which serves as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDRSDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memories described in the embodiments of the present application are intended to comprise, without being limited to, these and any other suitable types of memory.
In an exemplary embodiment, an embodiment of the present application further provides a storage medium, specifically a computer-readable storage medium. For example, the storage medium includes the first memory 1003 storing a computer program, and the computer program may be executed by the first processor 1002 of the first electronic device 1000 to complete the steps of the method on the first electronic device side; alternatively, the storage medium includes the second memory 1103 storing a computer program, and the computer program may be executed by the second processor 1102 of the second electronic device 1100 to complete the steps of the method on the second electronic device side. The computer-readable storage medium may be a memory such as an FRAM, a ROM, a PROM, an EPROM, an EEPROM, a Flash Memory, a magnetic surface memory, an optical disc, or a CD-ROM.
It should be noted that: "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The technical means described in the embodiments of the present application may be arbitrarily combined without conflict.
The above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application.

Claims (12)

1. A method for training a key point recognition model is characterized by comprising the following steps:
inputting a first image sample and a corresponding second image sample into a set key point identification model to obtain a first prediction result and a second prediction result; the first image sample represents an image of a first scene containing a human body part; the second image sample represents an image of the first scene that does not contain the human body part; the first prediction result represents the probability that each pixel belongs to the human body part; the second prediction result represents the probability that each pixel is a key point;
calculating a total loss value of the key point identification model based on a first difference and a second difference; the first difference is calculated based on the first prediction result and a corresponding calibration result; the second difference is calculated based on the second prediction result and a corresponding calibration result;
and updating weight parameters of the key point identification model according to the total loss value.
2. The key point recognition model training method according to claim 1, wherein the step of inputting the first image sample and the corresponding second image sample into the set key point identification model to obtain the first prediction result and the second prediction result comprises:
inputting the first image sample into a first neural network of the key point identification model to obtain a first feature vector corresponding to the first image sample;
inputting the second image sample into the first neural network to obtain a second feature vector corresponding to the second image sample;
fusing the first feature vector and the second feature vector, and inputting the fusion result into a second neural network to correspondingly obtain a third feature vector representing the first prediction result;
and inputting the first feature vector into a third neural network, and fusing the correspondingly obtained fourth feature vector with the third feature vector to obtain the second prediction result.
3. The key point recognition model training method according to claim 1, wherein the step of calculating the total loss value of the key point identification model based on the first difference and the second difference comprises:
inputting a third image sample corresponding to the first image sample into a set human body part recognition model to obtain a third prediction result of the third image sample; the third image sample represents an image in which a human body contour is captured, and the third image sample contains the image content of the first image sample;
calculating the total loss value of the key point identification model based on the first difference, the second difference and a third difference; and the third difference is calculated based on the third prediction result and the corresponding calibration result.
4. A method for identifying key points, comprising:
performing human body target detection on a first human body image based on a first background image, and cropping the first human body image and the first background image based on a target frame located in the human body target detection process to obtain a second human body image and a second background image; the first human body image represents an image in which a second scene containing a human body contour is captured; the first background image represents an image of the second scene without the human body contour;
inputting the second human body image into a set human body part recognition model to obtain a corresponding human body part image, and cutting out an area corresponding to the human body part image from the second background image to obtain a corresponding third background image;
inputting the human body part image and the third background image into a key point identification model, and outputting a key point identification result; wherein,
the key point recognition model is obtained by training according to the key point recognition model training method of any one of claims 1 to 3.
5. The method for identifying key points according to claim 4, wherein the detecting a human target on a first human body image based on a first background image comprises:
detecting the human body target in the first human body image by using an optical flow algorithm based on the first background image.
6. The key point identification method according to claim 4, wherein the set human body part recognition model is used for identifying a heel; before the human body target detection is performed on the first human body image based on the first background image, the method further comprises:
acquiring a first video in which the second scene is captured; the second scene comprises a landing area for long jump sports;
determining the first background image and the first human body image from the first video.
7. A key point recognition model training apparatus, characterized by comprising:
a prediction unit, configured to input a first image sample and a corresponding second image sample into a set key point identification model to obtain a first prediction result and a second prediction result; the first image sample represents an image of a first scene containing a human body part; the second image sample represents an image of the first scene that does not contain the human body part; the first prediction result represents the probability that each pixel belongs to the human body part; the second prediction result represents the probability that each pixel is a key point;
a calculating unit, configured to calculate a total loss value of the key point identification model based on a first difference and a second difference; the first difference is calculated based on the first prediction result and a corresponding calibration result; the second difference is calculated based on the second prediction result and a corresponding calibration result;
and an updating unit, configured to update weight parameters of the key point identification model according to the total loss value.
8. A keypoint recognition apparatus, comprising:
a first cropping unit, configured to perform human body target detection on a first human body image based on a first background image, and crop the first human body image and the first background image based on a target frame located in the human body target detection process to obtain a second human body image and a second background image; the first human body image represents an image in which a second scene containing a human body contour is captured; the first background image represents an image of the second scene without the human body contour;
a second cropping unit, configured to input the second human body image into a set human body part recognition model to obtain a corresponding human body part image, and crop the area corresponding to the human body part image from the second background image to obtain a corresponding third background image;
an identifying unit, configured to input the human body part image and the third background image into a key point identification model and output a key point identification result; wherein,
the key point identification model is trained by the key point recognition model training method according to any one of claims 1 to 3.
9. A first electronic device, comprising: a first processor and a first communication interface; wherein,
the first processor is configured to input a first image sample and a corresponding second image sample into a set key point identification model to obtain a first prediction result and a second prediction result; the first image sample represents an image of a first scene containing a human body part; the second image sample represents an image of the first scene that does not contain the human body part; the first prediction result represents the probability that each pixel belongs to the human body part; the second prediction result represents the probability that each pixel is a key point;
calculate a total loss value of the key point identification model based on a first difference and a second difference; the first difference is calculated based on the first prediction result and a corresponding calibration result; the second difference is calculated based on the second prediction result and a corresponding calibration result;
and update weight parameters of the key point identification model according to the total loss value.
10. A second electronic device, comprising: a second processor and a second communication interface; wherein,
the second processor is configured to perform human body target detection on a first human body image based on a first background image, and crop the first human body image and the first background image based on a target frame located in the human body target detection process to obtain a second human body image and a second background image; the first human body image represents an image in which a second scene containing a human body contour is captured; the first background image represents an image of the second scene without the human body contour;
input the second human body image into a set human body part recognition model to obtain a corresponding human body part image, and crop the area corresponding to the human body part image from the second background image to obtain a corresponding third background image;
and input the human body part image and the third background image into a key point identification model, and output a key point identification result; wherein,
the key point identification model is trained by the key point recognition model training method according to any one of claims 1 to 3.
11. An electronic device, comprising: a processor and a memory for storing a computer program capable of running on the processor,
wherein the processor is configured to execute the steps of the method for training a keypoint recognition model according to any one of claims 1 to 3 or the steps of the method for recognizing keypoints according to any one of claims 4 to 6 when the computer program is run.
12. A storage medium having stored thereon a computer program for implementing the steps of the method for training a keypoint recognition model according to any one of claims 1 to 3, or for implementing the steps of the method for keypoint recognition according to any one of claims 4 to 6, when said computer program is executed by a processor.
CN202110662793.3A 2021-06-15 2021-06-15 Key point identification method, model training method, device and storage medium Pending CN115482425A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110662793.3A CN115482425A (en) 2021-06-15 2021-06-15 Key point identification method, model training method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110662793.3A CN115482425A (en) 2021-06-15 2021-06-15 Key point identification method, model training method, device and storage medium

Publications (1)

Publication Number Publication Date
CN115482425A true CN115482425A (en) 2022-12-16

Family

ID=84418943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110662793.3A Pending CN115482425A (en) 2021-06-15 2021-06-15 Key point identification method, model training method, device and storage medium

Country Status (1)

Country Link
CN (1) CN115482425A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116229379A (en) * 2023-05-06 2023-06-06 浙江大华技术股份有限公司 Road attribute identification method and device, electronic equipment and storage medium
CN116229379B (en) * 2023-05-06 2024-02-02 浙江大华技术股份有限公司 Road attribute identification method and device, electronic equipment and storage medium
CN116492634A (en) * 2023-06-26 2023-07-28 广州思林杰科技股份有限公司 Standing long jump testing method based on image visual positioning
CN116492634B (en) * 2023-06-26 2023-09-26 广州思林杰科技股份有限公司 Standing long jump testing method based on image visual positioning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination