CN110619316A

CN110619316A - Human body key point detection method and device and electronic equipment

Info

Publication number: CN110619316A
Application number: CN201910918146.7A
Authority: CN
Inventors: 马骁
Original assignee: Lenovo Beijing Ltd
Current assignee: Lenovo Beijing Ltd
Priority date: 2019-09-26
Filing date: 2019-09-26
Publication date: 2019-12-27

Abstract

The application discloses a human body key point detection method and device and electronic equipment. The method comprises the following steps: obtaining a human body image to be detected; extracting feature information of the human body image by using a feature extraction sub-model in a trained key point detection model, wherein the key point detection model comprises the feature extraction sub-model and multi-stage detection sub-modules connected in sequence, the feature extraction sub-model is a pre-trained convolutional neural network model, and each detection sub-module consists of two different types of convolutional layers, with at least one layer of each type; processing the feature information of the human body image through the multi-stage detection sub-modules in sequence to obtain the thermodynamic diagrams (heat maps) of all key points in the human body image output by the key point detection model; and determining the position of each key point in the human body image according to the thermodynamic diagram of that key point. The scheme of the application can improve the accuracy of human body key point detection.

Description

Human body key point detection method and device and electronic equipment

Technical Field

The application relates to the technical field of human body posture estimation, and in particular to a human body key point detection method and device and electronic equipment.

Background

The key points of the human body are important for describing human posture and predicting human behavior, so human body key point detection is the basis of many computer vision applications, such as intelligent video surveillance and virtual reality. Human body key point detection mainly detects certain key points of the human body, such as joints and facial features, so that human skeleton information can be described through these key points.

Human body key point detection demands high accuracy, so how to improve the accuracy of human body key point detection is a technical problem to be solved by those skilled in the art.

Disclosure of Invention

The application aims to provide a method and a device for detecting key points of a human body and electronic equipment so as to improve the accuracy of detecting the key points of the human body.

In order to achieve the purpose, the application provides the following technical scheme:

a human body key point detection method comprises the following steps:

obtaining a human body image to be detected;

extracting feature information of the human body image by using a feature extraction sub-model in a trained key point detection model, wherein the key point detection model comprises the feature extraction sub-model and multi-stage detection sub-modules connected in sequence, the feature extraction sub-model is a pre-trained convolutional neural network model, and each detection sub-module is composed of two different types of convolutional layers, with at least one layer of each type;

processing the characteristic information of the human body image by the multistage detection submodule in sequence to obtain thermodynamic diagrams of all key points in the human body image output by the key point detection model;

and determining the position of each key point in the human body image according to the thermodynamic diagram of each key point in the human body image.

Preferably, the processing of the characteristic information of the human body image by the multilevel detection sub-modules in sequence includes:

sequentially inputting the feature information of the human body image to each stage of detection sub-module, and inputting to each detection sub-module the output result of the detection sub-module of the previous stage.

Preferably, the multi-stage detection sub-modules include a first detection sub-module and multiple stages of second detection sub-modules connected in sequence, wherein the input of the first of the sequentially connected second detection sub-modules is connected to the output of the first detection sub-module;

the first detection submodule is composed of at least one first type convolution layer and at least one second type convolution layer;

the second detection submodule is composed of at least one layer of a third type convolutional layer and at least one layer of the second type convolutional layer, and the number of convolution kernels of the second type convolutional layer is lower than that of the first type convolutional layer and that of the third type convolutional layer.

Preferably, the first detection sub-module is composed of three 3 × 3 convolutional layers and two 1 × 1 convolutional layers;

the second detection sub-module is composed of five third type convolutional layers and two 1 × 1 convolutional layers, and the convolution kernel of the third type convolutional layers is larger than 3 × 3.

Preferably, the feature extraction sub-model is a pre-trained Visual Geometry Group VGG16 network model, a VGG19 network model, or a residual network ResNet50.

Preferably, the obtaining of the human body image to be detected includes:

acquiring an image to be detected;

detecting the number of human bodies contained in the image through a human body detection model;

if the number of the human bodies in the image is one, determining the image as a human body image to be detected;

and if the number of the human bodies in the image is more than one, determining the human body to be detected from the image, and extracting a human body image containing the human body from the image.
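The branching on the number of detected human bodies can be sketched as follows. This is an illustrative sketch only: the box format, the function names, and in particular the rule of choosing the largest detected body when several are present are assumptions, since the text does not specify how the human body to be detected is determined.

```python
from typing import List, Optional, Tuple

# Hypothetical box format: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


def box_area(box: Box) -> int:
    """Area of a box; degenerate boxes count as zero."""
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])


def select_body_region(image_size: Tuple[int, int], boxes: List[Box]) -> Optional[Box]:
    """Return the region to use as the human body image, or None if no body was found."""
    width, height = image_size
    if not boxes:
        return None
    if len(boxes) == 1:
        # Exactly one human body: the whole image is the human body image.
        return (0, 0, width, height)
    # More than one human body: pick one target and crop to it.
    # Assumption: the largest detected body is the one to be detected.
    return max(boxes, key=box_area)


single = select_body_region((100, 80), [(10, 10, 30, 60)])
several = select_body_region((100, 80), [(0, 0, 10, 10), (20, 20, 80, 70)])
```

A different selection rule (for example, the body nearest the image center) could be substituted without changing the rest of the flow.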

Preferably, after obtaining the human body image, the method further comprises:

recognizing the human face in the human body image through a human face recognition model;

and if a plurality of faces are recognized, removing from the human body image the faces that do not meet a set condition.

In another aspect, the present application further provides a human body key point detection device, including:

the image acquisition unit is used for acquiring a human body image to be detected;

the feature extraction unit is used for extracting feature information of the human body image by using a feature extraction sub-model in a trained key point detection model, wherein the key point detection model comprises the feature extraction sub-model and multi-stage detection sub-modules connected in sequence, the feature extraction sub-model is a pre-trained convolutional neural network model, and each detection sub-module is composed of two different types of convolutional layers, with at least one layer of each type;

the thermodynamic diagram obtaining unit is used for sequentially processing the characteristic information of the human body image through the multi-stage detection submodule to obtain thermodynamic diagrams of all key points in the human body image output by the key point detection model;

and the key point positioning unit is used for determining the position of each key point in the human body image according to the thermodynamic diagram of each key point in the human body image.

In another aspect, the present application further provides an electronic device, including:

the data interface is used for obtaining a human body image to be detected;

the processor is used for extracting feature information of the human body image by using a feature extraction sub-model in a trained key point detection model, wherein the key point detection model comprises the feature extraction sub-model and multi-stage detection sub-modules connected in sequence, the feature extraction sub-model is a pre-trained convolutional neural network model, and each detection sub-module is composed of two different types of convolutional layers, with at least one layer of each type; processing the feature information of the human body image through the multi-stage detection sub-modules in sequence to obtain the thermodynamic diagrams of all key points in the human body image output by the key point detection model; and determining the position of each key point in the human body image according to the thermodynamic diagram of that key point.

Preferably, when processing the feature information of the human body image through the multi-stage detection sub-modules in sequence, the processor specifically inputs the feature information of the human body image to each stage of detection sub-module in turn, and inputs to each detection sub-module the output result of the detection sub-module of the previous stage.

According to the above scheme, key point detection is performed on the human body image using a key point detection model that consists of a feature extraction sub-model and multi-stage detection sub-modules connected in sequence. The feature extraction sub-model is a pre-trained convolutional neural network model, so the feature information of the human body image can be extracted accurately. The extracted feature information is then processed by the multi-stage detection sub-modules in sequence, and each stage of detection sub-module consists of two different types of convolutional layers. Because the feature information is processed repeatedly by the successive stages, the thermodynamic diagrams of the key points in the human body image can be obtained accurately, which in turn facilitates accurate determination of the position of each key point in the image from its thermodynamic diagram.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flow chart of an implementation of a method for detecting a human key point according to an embodiment of the present application;

Fig. 2 is a schematic diagram of a composition architecture of a keypoint detection model according to an embodiment of the present application;

Fig. 3 is a schematic flow chart of obtaining a human body image of a human body key point to be detected in the embodiment of the present application;

Fig. 4 is a schematic flow chart of another implementation of the human body key point detection method provided in the embodiment of the present application;

Fig. 5 is a schematic structural diagram illustrating an embodiment of a human body key point detection apparatus according to an embodiment of the present application;

Fig. 6 is a schematic diagram of a composition architecture of an electronic device according to an embodiment of the present application.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be practiced otherwise than as specifically illustrated.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without inventive step, are within the scope of the present disclosure.

Referring to fig. 1, fig. 1 is a schematic flow chart of an embodiment of a method for detecting key points of a human body according to the embodiment of the present application, where the method of the embodiment may include:

S101: obtaining a human body image to be detected.

The human body image is an image that needs to be subjected to human body key point detection, and therefore the human body image needs to include an image of a human body.

The manner of obtaining the human body image including the human body may be various, and the present application does not limit this.

S102: extracting the feature information of the human body image by using the feature extraction sub-model in the trained key point detection model.

Wherein, the key point detection model includes: the feature extraction submodel and the multi-level detection submodules which are connected in sequence.

The feature extraction sub-model is a pre-trained convolutional neural network model. Such a model can be obtained by starting from the pre-trained convolutional neural network models provided by some platforms and training them. It can be understood that using an already trained convolutional neural network model as the feature extraction sub-model helps ensure that feature information is extracted accurately from the human body image.

Meanwhile, the number of convolutional layers in the convolutional neural network model can be set as needed, so a convolutional neural network model with relatively few layers can be chosen as the feature extraction sub-model, which speeds up the extraction of feature information from the human body image while maintaining the accuracy of the extracted feature information.

Optionally, the feature extraction sub-model may be a convolutional neural network with a relatively small number of layers, such as a pre-trained Visual Geometry Group (VGG) 16-layer network model (VGG16), VGG19, or the 50-layer residual network (ResNet50). In particular, the feature extraction sub-model may also be a designated number of leading layers of a network such as VGG16, VGG19, or ResNet50, for example the first 10 layers of VGG19.

Each detection sub-module is composed of two different types of convolutional layers, with at least one layer of each type. The number of convolution kernels differs between the convolutional layers, and the specific number of convolution kernels in each convolutional layer can be set as required.

Optionally, the dimensions of the thermodynamic diagrams output by the detection sub-modules may be the same. To ensure this, the detection sub-modules may be set to be of the same type, or the last convolutional layers of the detection sub-modules may be set to be of the same type.

Optionally, to ensure detection accuracy, each detection sub-module includes at least one convolutional layer with a larger convolution kernel, the kernel being at least 3 × 3. Correspondingly, to reduce the dimension of the key point thermodynamic diagrams determined by the detection sub-modules, each detection sub-module also includes at least one convolutional layer with a 1 × 1 convolution kernel, and the last convolutional layer of the detection sub-module is a 1 × 1 convolutional layer.

S103: processing the feature information of the human body image through the multi-stage detection sub-modules in sequence to obtain the thermodynamic diagrams of all key points in the human body image output by the key point detection model.

The key points of the human body refer to the position points of important components of the human body, and for example, the key points of the human body may include: left eye, right eye, left ear, right ear, nose, chest, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip joint, right hip joint, left knee, right knee, left ankle, right ankle, and the like.

The thermodynamic diagram (heat map) of a key point gives, for each of a plurality of positions in the human body image, the probability that the position belongs to that key point. That is, the thermodynamic diagram corresponding to a key point presents every position in the human body image that may belong to the key point, together with the probability that it does. Compared with directly outputting a single specific position for each key point, using thermodynamic diagrams allows the key point information to be represented more accurately.

In this application, the multi-stage detection sub-modules are connected in sequence to form a stacked structure. The feature information extracted from the human body image therefore passes through the detection sub-modules stage by stage, and each stage of detection sub-module can correct the processing result of the previous stage, so that the precision of the determined key point thermodynamic diagrams is continuously improved.

Optionally, to further ensure the accuracy of the key point thermodynamic diagrams, in the embodiment of the present application not only is the output of each detection sub-module input to the next-stage detection sub-module, but the feature information extracted by the feature extraction sub-model is also input to every detection sub-module. Specifically, the feature information of the human body image may be input to each stage of detection sub-module in turn, and the output result of the previous-stage detection sub-module may be input to each detection sub-module as well. It can be seen that every detection sub-module other than the first one receives both the key point thermodynamic diagrams determined by the previous detection sub-module and the original feature information extracted by the feature extraction sub-model.

Correspondingly, except for the first detection submodule, each of the other detection submodules outputs the thermodynamic diagram of each determined key point based on the feature information provided by the feature extraction submodule and the thermodynamic diagrams of each key point output by the previous detection submodule, so that the accuracy of the thermodynamic diagrams of each key point output by each detection submodule is continuously improved.
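The data flow described above, in which every stage after the first receives both the original feature information and the previous stage's thermodynamic diagrams (heat maps), can be sketched as follows. This is a minimal illustration only: plain lists of floats stand in for feature tensors and heat maps, and the toy stage functions are hypothetical placeholders for trained convolutional sub-modules.

```python
from typing import Callable, List, Sequence

Features = List[float]   # stand-in for the extracted feature information
Heatmaps = List[float]   # stand-in for the per-key-point heat maps


def run_stages(
    features: Features,
    first_stage: Callable[[Features], Heatmaps],
    later_stages: Sequence[Callable[[Features, Heatmaps], Heatmaps]],
) -> List[Heatmaps]:
    """Return the heat maps produced by every stage, in order."""
    # The first stage sees only the feature information.
    outputs = [first_stage(features)]
    for stage in later_stages:
        # Each later stage sees the original features AND the previous heat maps.
        outputs.append(stage(features, outputs[-1]))
    return outputs


# Toy stages (hypothetical): each later stage moves the estimate halfway
# toward the feature values, mimicking stage-by-stage correction.
def toy_first(f: Features) -> Heatmaps:
    return [0.5 * x for x in f]


def toy_refine(f: Features, h: Heatmaps) -> Heatmaps:
    return [(x + y) / 2 for x, y in zip(f, h)]


stage_outputs = run_stages([1.0, 0.0], toy_first, [toy_refine, toy_refine])
final_heatmaps = stage_outputs[-1]
```

Keeping every stage's output in the returned list is a design choice made here so that the per-stage outputs remain available, for instance for a per-stage training loss.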

S104: determining the position of each key point in the human body image according to the thermodynamic diagram of that key point.

Once the thermodynamic diagram of a key point in the human body image has been determined, the position of the key point can be determined from it in various ways. For example, the position with the maximum probability value in the thermodynamic diagram of the key point may be taken as the position of the key point in the human body image.

As can be seen, the present application performs key point detection on the human body image using a key point detection model that consists of a feature extraction sub-model and multi-stage detection sub-modules connected in sequence. The feature extraction sub-model is a pre-trained convolutional neural network model, so the feature information of the human body image can be extracted accurately. The extracted feature information is then processed by the multi-stage detection sub-modules in sequence, and each stage of detection sub-module consists of two different types of convolutional layers. Because the feature information is processed repeatedly by the successive stages, the thermodynamic diagrams of the key points in the human body image can be obtained accurately, which in turn facilitates accurate determination of the position of each key point in the image from its thermodynamic diagram.
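The maximum-probability rule mentioned above can be sketched as follows. The nested-list representation of the thermodynamic diagram (heat map) and the function name are assumptions made for illustration; the returned coordinates are in heat-map cells, and mapping them back to image pixels is not shown.

```python
from typing import List, Tuple


def keypoint_from_heatmap(heatmap: List[List[float]]) -> Tuple[int, int, float]:
    """Return (row, col, probability) of the highest-valued cell in the heat map."""
    best_row, best_col, best_p = 0, 0, heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, p in enumerate(row):
            if p > best_p:
                best_row, best_col, best_p = r, c, p
    return best_row, best_col, best_p


# A small made-up heat map for one key point.
nose_heatmap = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0],
    [0.0, 0.2, 0.1, 0.0],
]
nose_position = keypoint_from_heatmap(nose_heatmap)
```

When several cells share the maximum value, this sketch keeps the first one encountered in row-major order.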

It can be understood that, in the embodiments of the present application, the number of layers of the convolutional neural network can be set as required; for example, the feature extraction sub-model may be VGG19 or the first 10 layers of VGG19. This increases the speed of feature extraction, and thus the efficiency of key point detection, while maintaining detection accuracy, which helps ensure real-time key point detection.

Similarly, each stage of detection sub-module is composed only of convolutional layers and involves no pooling layers or the like, which reduces the number of parameters and improves key point recognition efficiency. The number of convolutional layers, and convolutional layers with a suitable number of convolution kernels, can also be chosen according to the specific efficiency requirements.

To facilitate understanding of the specific composition of the key point detection model of the present application, an example of the model is described below.

Fig. 2 is a schematic diagram illustrating a component architecture of a keypoint detection model according to the present application.

As can be seen from fig. 2, the keypoint detection model sequentially includes, from left to right, a feature extraction sub-model 201 and a multi-stage detection sub-module disposed behind the feature extraction sub-model.

The feature extraction submodel is a convolutional neural network, such as the aforementioned VGG16 or VGG 19.

In fig. 2, the multi-stage detection sub-module includes: a first detection submodule 202 and a plurality of stages of sequentially connected second detection submodules 203.

The input of the first of the sequentially connected second detection sub-modules is connected to the output of the first detection sub-module.

The first detection sub-module 202 is composed of at least one first type convolutional layer and at least one second type convolutional layer. Within the first detection sub-module, the feature information extracted by the feature extraction sub-model passes first through the at least one first type convolutional layer in sequence and then through the at least one second type convolutional layer in sequence.

In fig. 2, the different types of convolutional layers in the first detection sub-module 202 are shown in different colors: the leading white convolutional layers are the first type convolutional layers, and the trailing black convolutional layers are the second type convolutional layers.

The number of convolution kernels of the second type of convolution layer is lower than that of the first type of convolution layer, so that the characteristic information is subjected to dimension reduction processing after being processed by at least one first type of convolution layer and then being processed by at least one second type of convolution layer. Optionally, the second type of convolution layer is a 1 × 1 convolution layer.

Fig. 2 shows an optional structure of the first detection sub-module, which specifically includes 3 first type convolutional layers and 2 second type convolutional layers.

Optionally, the first type of convolutional layer is a 3 × 3 convolutional layer; that is, the first detection sub-module is formed by three 3 × 3 convolutional layers followed by two 1 × 1 convolutional layers.

The second detection submodule 203 is composed of at least one layer of the third type convolutional layer and at least one layer of the second type convolutional layer, and the number of convolution kernels of the second type convolutional layer is lower than that of convolution kernels of the third type convolutional layer. Optionally, the second type of convolution layer is a 1 × 1 convolution layer.

Wherein the third type of convolutional layer is different from the second type of convolutional layer. Similar to the first detection submodule, the at least one third convolution layer and the at least one second convolution layer are sequentially arranged to form the second detection submodule.

As shown in fig. 2, the first few white convolutional layers in the second detection sub-module 203 are the third type convolutional layers, and the last few black convolutional layers are the second type convolutional layers.

Correspondingly, the input information of the second detection sub-module (e.g., the feature information extracted by the feature extraction sub-module, the thermodynamic diagram output by the first detection sub-module or the thermodynamic diagram output by the second detection sub-module at the previous stage) is sequentially processed by the at least one layer of the third type convolutional layer, and then the data processed by the at least one layer of the third type convolutional layer is sequentially processed by the at least one layer of the second type convolutional layer for dimension reduction.

Fig. 2 shows an optional structure of the second detection submodule, and in particular, the second detection submodule includes 5 third type convolutional layers and 2 second type convolutional layers.

Optionally, the convolution layer of the third type is a convolution layer with a convolution kernel larger than 3 × 3, for example, the convolution layer of the third type is a convolution layer of 5 × 5. Correspondingly, the second detection submodule is formed by arranging 5 convolution layers with convolution kernels larger than 3 × 3 and 2 convolution layers with convolution kernels 1 × 1 in sequence.
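The two optional layer layouts described above can be written down as lists of kernel sizes, which also makes it easy to work out how much spatial context each sub-module covers. This is a sketch under stated assumptions: the helper names are hypothetical, 5 × 5 is taken as the third type kernel because it is the example given above, and the receptive-field formula assumes stride 1 and no dilation, neither of which is stated in the text.

```python
from typing import List


def build_stage(big_kernel: int, n_big: int, n_small: int = 2) -> List[int]:
    """Kernel sizes of one detection sub-module, in processing order."""
    # Larger-kernel layers come first, followed by the 1 x 1 layers.
    return [big_kernel] * n_big + [1] * n_small


def receptive_field(kernels: List[int]) -> int:
    """Receptive field of stacked stride-1 convolutions: 1 + sum(k - 1)."""
    return 1 + sum(k - 1 for k in kernels)


first_stage = build_stage(big_kernel=3, n_big=3)   # three 3x3 layers, then two 1x1
second_stage = build_stage(big_kernel=5, n_big=5)  # five 5x5 layers, then two 1x1

first_rf = receptive_field(first_stage)    # 1 + 3 * 2
second_rf = receptive_field(second_stage)  # 1 + 5 * 4
```

Under these assumptions the second detection sub-module sees a considerably wider neighborhood of the feature map than the first, while both end in 1 × 1 layers.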

As can be seen from fig. 2, the human body image 204 is first input into the feature extraction sub-model of the key point detection model. Because the feature extraction sub-model is a convolutional neural network with relatively few convolutional layers, such as VGG16 or VGG19, feature information can be extracted more efficiently while its accuracy is maintained, which helps increase the overall speed of key point detection and provides reliable support for real-time key point detection.

The feature information output by the feature extraction submodel is respectively input to the first detection submodule and each second detection submodule, as shown in the arrow direction in fig. 2.

Because no other detection sub-module precedes the first detection sub-module, it processes only the feature information and outputs the thermodynamic diagrams of the key points that it determines. These thermodynamic diagrams are then input to the first of the second detection sub-modules, counting from left to right.

As can be seen from fig. 2, the input of this first second detection sub-module includes both the thermodynamic diagrams of the key points output by the first detection sub-module and the feature information output by the feature extraction sub-model. Correspondingly, it identifies key points from the feature information and those thermodynamic diagrams, and outputs the resulting thermodynamic diagrams of the key points in the human body image.

Each subsequent second detection sub-module is preceded by an adjacent second detection sub-module of the previous stage, so its input includes the thermodynamic diagrams of the key points output by that previous-stage second detection sub-module together with the feature information output by the feature extraction sub-model.

Because each stage of detection sub-module consists only of convolutional layers and involves no other structures such as pooling layers, the number of parameters involved in processing is reduced, which helps increase the key point detection speed. Moreover, because no pooling layer is involved, the feature map size does not change during convolution, which helps ensure detection precision.
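The claim that the feature map size is unchanged without pooling can be checked with the standard output-size arithmetic. This is a worked sketch, assuming stride-1 convolutions with padding of (k - 1) / 2 for an odd kernel size k; the stride, the padding, and the feature map width of 46 are illustrative assumptions not given in the text.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output size along one axis for a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def same_padding(kernel: int) -> int:
    """Padding that keeps the size unchanged for an odd kernel at stride 1."""
    return (kernel - 1) // 2


# Pass a hypothetical 46-wide feature map through three 3x3 and two 1x1 layers.
size = 46
for k in [3, 3, 3, 1, 1]:
    size = conv_output_size(size, k, stride=1, padding=same_padding(k))
unchanged_size = size

# For contrast, a single 2x2 pooling window with stride 2 halves the width.
pooled_size = conv_output_size(46, kernel=2, stride=2)
```

With these settings the width stays at 46 through all five convolutional layers, whereas one pooling step would reduce it to 23.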

Meanwhile, thermodynamic diagrams output by each second detection submodule are continuously processed by each subsequent second detection submodule, and each detection submodule updates and adjusts the thermodynamic diagrams of each input key point by synthesizing the characteristic information, so that the accuracy of thermodynamic diagram adjustment is more effectively ensured, and the thermodynamic diagrams of the key points with higher accuracy are obtained. Accordingly, the thermodynamic diagram 205 of each keypoint output by the last second detection submodule on the right side is the thermodynamic diagram of each keypoint output finally by the keypoint detection model.

Of course, fig. 2 shows only one possible structure of the key point detection model of the present application. Other architectures in which the key point detection model is formed by a convolutional neural network serving as the feature extraction sub-model and a plurality of detection sub-modules each including at least one convolutional layer are also applicable to this embodiment, and no limitation is imposed here.

It can be understood that the key point detection model of the application is composed of a feature extraction submodel and a multi-stage detection submodule, and in order to improve the key point detection accuracy of the key point detection model, the application trains each submodel or module in the key point detection model in a unified manner. For example, training may be performed using multiple human image samples labeled with key points or thermodynamic diagrams of key points.

Optionally, in order to ensure training precision and obtain the best key point detection model, relay supervision is adopted when training the key point detection model. Relay supervision means that a loss is calculated from the output of each stage of detection sub-module in the key point detection model, and the losses of the multi-stage detection sub-modules are added together to optimize the key point detection model as a whole.
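Relay supervision as described above, one loss per stage with the stage losses added together, can be sketched as follows. The per-stage squared-error loss and the flat-list representation of a thermodynamic diagram (heat map) are assumptions made for illustration; the text specifies only that the stage losses are summed.

```python
from typing import List


def stage_loss(predicted: List[float], target: List[float]) -> float:
    """Sum of squared differences between one stage's output and the labels."""
    # Assumption: squared error is used as the per-stage distance.
    return sum((p - t) ** 2 for p, t in zip(predicted, target))


def relay_supervised_loss(stage_outputs: List[List[float]], target: List[float]) -> float:
    """Total loss: every stage is compared with the same labels, then summed."""
    return sum(stage_loss(output, target) for output in stage_outputs)


# Made-up outputs of three stages for a two-cell heat map, and its label.
labels = [1.0, 0.0]
outputs_per_stage = [[0.5, 0.0], [0.75, 0.0], [1.0, 0.0]]
total_loss = relay_supervised_loss(outputs_per_stage, labels)  # 0.25 + 0.0625 + 0.0
```

Because every stage contributes its own term, earlier stages receive a direct training signal rather than relying only on the loss at the final output.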

For example, the loss function required to train the keypoint detection model can be expressed as the following equation one:

wherein L represents the overall loss error in training the keypoint detection model. S is the total number of the multi-stage detection submodules, s represents one detection submodule among the multi-stage detection submodules, and the value of s is a natural number from 1 to S;

N is the total number of key points of the human body; for example, if 18 key points of the human body are set to be detected, the value of N is 18. n represents one of the key points of the human body, and the value of n is a natural number from 1 to N.

Z represents a sample set of human body images used as training samples, and z represents one human body image sample in the sample set.

Accordingly, the predicted term in equation one indicates the thermodynamic diagram (or key point position) identified by the s-th detection sub-module for the n-th key point of the z-th human body image sample, and the reference term indicates the actual thermodynamic diagram (or actual key point position) of the n-th key point for the z-th human body image sample.
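Equation one itself is not reproduced in this text. One form consistent with the definitions above is sketched below; the squared Euclidean distance and the symbols \hat{H} (predicted thermodynamic diagram) and H_{*} (actual thermodynamic diagram) are assumptions introduced here for illustration and are not taken from the application.

```latex
L = \sum_{s=1}^{S} \sum_{n=1}^{N} \sum_{z \in Z}
    \left\| \hat{H}_{s}^{n}(z) - H_{*}^{n}(z) \right\|_{2}^{2}
```

In this form each stage s contributes its own error term, which matches the statement that the losses of the multi-stage detection sub-modules are added together.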

As can be seen from the above, if the loss error calculated with the loss function does not satisfy the condition (for example, it has not converged or is not smaller than a set threshold), the internal parameters of the feature extraction sub-model and of each stage of detection sub-module in the keypoint detection model still need to be adjusted, and training continues with the human body image samples until the loss error calculated based on the loss function satisfies the condition, so as to obtain the trained keypoint detection model.
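As a rough illustration of relay supervision and the stopping condition, the sketch below sums a loss over the outputs of all stages and compares the total with a threshold. The per-stage loss (sum of squared differences), the array layout, and the function names are assumptions made for this example.

```python
import numpy as np

def relay_supervision_loss(stage_outputs, target):
    """Add up the loss of every detection sub-module.

    stage_outputs: list of S arrays, each shaped (Z, N, H, W)
    target:        array shaped (Z, N, H, W) holding the labeled heatmaps
    """
    # Assumed per-stage loss: sum of squared differences to the labeled heatmaps.
    return float(sum(np.sum((out - target) ** 2) for out in stage_outputs))

def training_finished(loss, threshold):
    """Stopping rule from the text: the loss error is smaller than a set threshold."""
    return loss < threshold

# Toy data: 2 samples, 18 key points, 4x4 heatmaps, 2 stages.
target = np.zeros((2, 18, 4, 4))
stage_outputs = [target + 0.5, target + 0.1]
loss = relay_supervision_loss(stage_outputs, target)
```

With these toy values the first stage contributes far more to the total than the second, so an optimizer driven by the summed loss would still push on the early stage rather than only on the final output.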

It can be understood that, in the embodiment of the present application, in order to improve the accuracy of human body key point detection, some interference needs to be excluded. For example, in the case that an image includes multiple human bodies, if the image is captured in real time, unrelated persons need to be instructed to leave the image capturing area, or the target human body area to be detected needs to be determined from the multiple human body areas. It can be seen that, in obtaining the human body image to be detected, some interference-removal operations need to be performed on the acquired image.

For example, referring to fig. 3, which shows a schematic flow chart of an implementation of obtaining a human body image to be detected in an embodiment of the present application, the embodiment may include:

S301, acquiring an image to be detected.

It can be understood that the image obtained for detecting the key point may be a single frame image or a video, and in the case of inputting the video, each frame image in the video may be sequentially used as the image to be detected.

S302, detecting the number of human bodies contained in the image through a human body detection model.

The human body detection model is a pre-trained model for detecting the human body contour in the image.

The human body detection model can be set according to needs; for example, the human body detection model can be a pre-trained YOLO model.

S303, if the number of the human bodies in the image is one, determining the image as a human body image to be detected.

It can be understood that if the image includes only one human body, the image contains no other human body that would interfere with the detection of the key points of the human body, and in this case the image can be regarded as the human body image to be detected.

S304, if the number of the human bodies existing in the image is more than one, determining the human body to be detected from the image, and extracting the human body image containing the human body from the image.

If the human body detection model detects that more than one human body exists in the image, the human body image parts that constitute interference data need to be excluded, only the image part corresponding to the human body to be detected is retained, and that image part is used as the human body image to be detected.

The human body to be detected may be determined from the image by having a user designate it. Alternatively, the distance between each human body and the center of the image may be determined, and the human body closest to the center is taken as the human body to be detected.
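The second selection rule can be sketched as follows, assuming the human body detection model returns one bounding box per detected human body in the form (x_min, y_min, x_max, y_max). The box format and the function names are hypothetical.

```python
def pick_center_body(boxes, image_width, image_height):
    """Return the index of the box whose center lies closest to the image center."""
    cx, cy = image_width / 2.0, image_height / 2.0

    def squared_distance(box):
        x_min, y_min, x_max, y_max = box
        return ((x_min + x_max) / 2.0 - cx) ** 2 + ((y_min + y_max) / 2.0 - cy) ** 2

    # min() raises ValueError on an empty list, i.e. when no human body was detected.
    return min(range(len(boxes)), key=lambda i: squared_distance(boxes[i]))

def crop(image_rows, box):
    """Keep only the image part of the chosen human body (image given as a list of rows)."""
    x_min, y_min, x_max, y_max = box
    return [row[x_min:x_max] for row in image_rows[y_min:y_max]]

# Three detected bodies in a 640x480 image; the middle one is centered.
boxes = [(0, 0, 100, 200), (270, 140, 370, 340), (500, 100, 620, 400)]
chosen = pick_center_body(boxes, 640, 480)
```

Comparing squared distances avoids a square root and gives the same ordering as the true distance.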

In this way, the human body image can be prevented from containing other people who entered the detection environment by mistake and would affect key point detection.

It can be understood that, in the above embodiments of the present application, face portions of other users may still appear in the human body image after it is obtained; for example, the aforementioned human body detection model can detect the human body contours in the image but cannot detect an isolated face region. Therefore, in order to further reduce interference, the present application may also identify the faces included in the human body image through a face recognition model.

If the human body image is identified as including a plurality of faces, the faces that do not meet the set conditions are removed from the human body image. Certainly, when a plurality of faces are identified in the human body image, reminding information can be output so that the human body image is replaced; or, in an actual key point detection scene, irrelevant people are reminded to leave so that a human body image of a single person is obtained.

The face recognition model may be a pre-trained model for recognizing faces, and there are many possibilities for the model. For example, the face recognition model may be a pre-trained multitask convolutional neural network (MTCNN), or an SSD (Single Shot MultiBox Detector) model, etc.

It is understood that, in the above embodiments of the present application, after obtaining the human body image, it may also be detected whether the image quality of the human body image meets requirements, such as whether the human body image is blurred or not. Under the condition that the image quality of the human body image meets the requirement, the key point detection can be carried out on the human body image.

The image quality of the human body image can be judged by an image quality evaluation algorithm, by a method based on the Laplacian operator, or by other methods.
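One common way to use the Laplacian operator for a blur check is to take the variance of the Laplacian response: sharp images have strong edges and therefore a high variance. The sketch below follows that idea; the 4-neighbour kernel, the grayscale input, and the threshold value are assumptions, and a suitable threshold would have to be chosen for the actual camera and scene.

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian response over the interior pixels."""
    g = np.asarray(gray, dtype=np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def is_blurred(gray, threshold=100.0):
    """Assumed rule: an image with little edge energy is treated as blurred."""
    return laplacian_variance(gray) < threshold

# A uniform image has no edges; a checkerboard has an edge at every pixel.
flat = np.full((32, 32), 128.0)
checker = (np.indices((32, 32)).sum(axis=0) % 2) * 255.0
```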

To facilitate understanding of the scheme of the present application, the human body key point detection method of the present application is described below by way of an example. Fig. 4 shows a schematic flow chart of another embodiment of the method for detecting human body key points according to the present application; the method of the present embodiment may include:

S401, acquiring an image to be detected.

S402, detecting the number of human bodies contained in the image through a human body detection model.

S403, if the number of the human bodies in the image is one, determining the image as a human body image to be detected.

S404, if the number of the human bodies existing in the image is more than one, determining the human body to be detected from the image, and extracting the human body image containing the human body from the image.

Of course, in a real-time human body posture detection scene in which the human body image is collected in real time, if the image contains a plurality of human bodies, reminding information can also be output so that an image containing a single human body is collected again.

S405, identifying whether the human body image contains a plurality of human faces by using a face recognition algorithm, and if so, executing step S406; if not, executing step S407.

S406, if the human body image is identified as including a plurality of human faces, removing the human faces that do not meet the set condition from the human body image, and performing step S407 on the human body image from which the redundant human faces have been removed.

S407, detecting whether the human body image is blurred by using an image quality evaluation algorithm, and if so, outputting a blur prompt; if not, executing step S408.

S408, inputting the human body image into the feature extraction sub-model of the trained key point detection model to extract feature information of the human body image.

S409, respectively inputting the feature information extracted by the feature extraction sub-model to a first detection submodule and multiple stages of second detection submodules of the key point detection model.

S410, processing the feature information through the first detection submodule to obtain thermodynamic diagrams of all key points corresponding to the human body image.

S411, inputting the thermodynamic diagrams of the key points output by the first detection submodule into the foremost second detection submodule among the multiple stages of second detection submodules.

S412, for each second detection submodule, combining the input feature information with the thermodynamic diagrams of each key point output by the previous second detection submodule or by the first detection submodule, and re-determining the thermodynamic diagrams of each key point.

The steps S410 to S412 can refer to the related description of the embodiment in fig. 2, and are not described herein again.

S413, obtaining thermodynamic diagrams of all key points output by the last-stage second detection submodule in the key point detection model, and determining the positions of all key points in the human body image according to the thermodynamic diagrams of all key points output by the last-stage second detection submodule.
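The application does not spell out how a position is read from a thermodynamic diagram. A simple and common choice, assumed here, is to take the pixel with the highest response in each key point's diagram; the function name and the (N, H, W) array layout are hypothetical.

```python
import numpy as np

def keypoints_from_heatmaps(heatmaps):
    """Return one (x, y, score) tuple per key point from an (N, H, W) array."""
    n, h, w = heatmaps.shape
    flat = heatmaps.reshape(n, -1)
    best = flat.argmax(axis=1)            # index of the peak in each flattened heatmap
    ys, xs = np.unravel_index(best, (h, w))
    scores = flat.max(axis=1)
    return [(int(x), int(y), float(s)) for x, y, s in zip(xs, ys, scores)]

# Two toy heatmaps with a single peak each.
heatmaps = np.zeros((2, 5, 7))
heatmaps[0, 1, 4] = 0.9
heatmaps[1, 3, 2] = 0.7
points = keypoints_from_heatmaps(heatmaps)
```

The coordinates are in heatmap resolution. If the thermodynamic diagrams are smaller than the human body image, they would still need to be scaled back to image coordinates.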

In the embodiment of the application, besides the key point detection model, a human body detection model, a face recognition model, an image quality evaluation algorithm model and the like are involved, and relevant pre-trained versions of the human body detection model, the face recognition model and the image quality evaluation algorithm model can be obtained from some platforms.

Because all of these steps are part of the key point detection process, in order to ensure the speed of key point detection, the feature extraction models used in the human body detection model and the face recognition model can be the same as the feature extraction sub-model used in the key point detection model. In this way, during online inference the three network models use the same feedforward network to calculate the feature map and can share that feature map, which avoids repeated calculation of the feature map and allows the final result to be obtained quickly.
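The sharing idea can be illustrated with a small cache around the feature extractor, so that the three models trigger only one forward pass per image. Everything in this sketch is a stand-in: the class name, the image identifier used as cache key, and the toy "heads" are hypothetical and only show the call pattern.

```python
class SharedBackbone:
    """Caches the feature map so that several heads reuse one forward pass."""

    def __init__(self, extract_fn):
        self._extract_fn = extract_fn
        self._cached_id = None
        self._cached_features = None
        self.forward_calls = 0  # counts how often features were actually computed

    def features(self, image_id, image):
        if image_id != self._cached_id:
            self._cached_features = self._extract_fn(image)
            self._cached_id = image_id
            self.forward_calls += 1
        return self._cached_features

# Toy extractor and toy heads standing in for the three network models.
backbone = SharedBackbone(lambda image: [pixel * 2 for pixel in image])
heads = {
    "body": lambda f: len(f),
    "face": lambda f: max(f),
    "keypoints": lambda f: sum(f),
}
image = [1, 2, 3]
results = {name: head(backbone.features("frame-0", image)) for name, head in heads.items()}
```

After the three heads have run, `backbone.forward_calls` is still 1, which is the saving the paragraph above describes.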

The application also provides a human body key point detection device corresponding to the human body key point detection method.

As shown in fig. 5, which shows a schematic structural diagram of an embodiment of a human body key point detection device according to the present application, the device of the present embodiment may include:

an image obtaining unit 501, configured to obtain a human body image to be detected;

a feature extraction unit 502, configured to extract feature information of the human body image by using a feature extraction sub-model in a key point detection model obtained through training, where the key point detection model includes the feature extraction sub-model and multi-stage detection sub-modules, the feature extraction sub-model is a pre-trained convolutional neural network model, and each detection sub-module is composed of two different kinds of convolutional layers, with at least one layer of each kind;

a thermodynamic diagram obtaining unit 503, configured to sequentially process the feature information of the human body image through the multi-stage detection sub-modules, so as to obtain thermodynamic diagrams of each key point in the human body image output by the key point detection model;

a keypoint locating unit 504, configured to determine a position of each keypoint in the human body image according to the thermodynamic diagram of each keypoint in the human body image.

In a possible implementation manner, the thermodynamic diagram obtaining unit is specifically configured to input the feature information of the human body image to each stage of detection sub-module in sequence, and to input the output result of the preceding stage of detection sub-module to the current detection sub-module, so as to obtain the thermodynamic diagram of each key point in the human body image output by the key point detection model.

In a possible implementation manner, the multi-stage detection sub-modules used by the thermodynamic diagram obtaining unit in the embodiment of the present application include: a first detection sub-module and multiple stages of second detection sub-modules connected in sequence, wherein the input end of the foremost second detection sub-module among the sequentially connected second detection sub-modules is connected with the output end of the first detection sub-module;

Optionally, the first detection submodule is composed of three 3 × 3 convolutional layers and two 1 × 1 convolutional layers;

Optionally, the feature extraction submodel is a pre-trained Visual Geometry Group VGG16 network model, a VGG19 network model, or a residual network ResNet50.

In yet another possible implementation manner, the image obtaining unit may include:

the initial image obtaining unit is used for obtaining an image to be detected;

a human body detection unit for detecting the number of human bodies contained in the image through a human body detection model;

a first image obtaining unit, configured to determine the image as a human body image to be detected if the number of human bodies in the image is one;

a second image obtaining unit, configured to determine a human body to be detected from the image if the number of human bodies existing in the image is more than one, and extract a human body image including the human body from the image.

Optionally, the apparatus may further include:

the human face detection unit is used for identifying the human face in the human body image through a human face identification model after the first image obtaining unit or the second image obtaining unit obtains the human body image;

and the interference removal unit is used for removing the faces which do not meet the set conditions from the human body image if a plurality of faces are identified.

In another aspect, the present application further provides an electronic device.

Fig. 6 shows a schematic structural diagram of an electronic device according to the present application. The electronic device at least includes: a data interface 601 and a processor 602.

The data interface 601 is used for obtaining a human body image to be detected.

The processor 602 is configured to extract feature information of the human body image by using a feature extraction sub-model in a trained key point detection model, where the key point detection model includes the feature extraction sub-model and multi-stage detection sub-modules, the feature extraction sub-model is a pre-trained convolutional neural network model, and each detection sub-module is composed of two different kinds of convolutional layers, with at least one layer of each kind; to process the feature information of the human body image through the multi-stage detection sub-modules in sequence to obtain thermodynamic diagrams of all key points in the human body image output by the key point detection model; and to determine the position of each key point in the human body image according to the thermodynamic diagram of each key point in the human body image.

The electronic device may further include a memory for storing a program required for the processor to perform the above operations.

Optionally, when the processor processes the feature information of the human body image through the multi-stage detection sub-modules in sequence, the processor specifically inputs the feature information of the human body image to each stage of detection sub-module in sequence, and inputs the output result of the preceding stage of detection sub-module to the current detection sub-module.

Of course, the processor may also perform the operations of the relevant steps in the above embodiments, which are not described herein again.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A human body key point detection method comprises the following steps:

obtaining a human body image to be detected;

2. The method according to claim 1, wherein the processing the feature information of the human body image by the multi-stage detection sub-modules in sequence comprises:

3. The method of claim 1 or 2, the multi-stage detection sub-module comprising: the detection device comprises a first detection submodule and a plurality of stages of second detection submodules which are sequentially connected, wherein the input end of the second detection submodule which is most front in sequence in the second detection submodules which are sequentially connected is connected with the output end of the first detection submodule;

4. The method of claim 3, said first detection submodule being comprised of 3 layers of 3 x 3 convolutional layers and 2 layers of 1 x 1 convolutional layers;

5. The method of claim 1, the feature extraction submodel being a pre-trained visual geometry group VGG16 network model, VGG19, or a residual network ResNet 50.

6. The method of claim 1, wherein the obtaining of the human body image to be detected comprises:

acquiring an image to be detected;

7. The method of claim 6, after obtaining the image of the human body, further comprising:

8. A human keypoint detection device comprising:

9. An electronic device, comprising:

the data interface is used for obtaining a human body image to be detected;

10. The electronic device according to claim 9, wherein when the processor sequentially processes the feature information of the human body image by the multiple detection sub-modules, specifically, the processor sequentially inputs the feature information of the human body image to each detection sub-module, and inputs an output result of a previous detection sub-module in the detection sub-modules to the detection sub-module.