CN111860276B - Human body key point detection method, device, network equipment and storage medium - Google Patents
Human body key point detection method, device, network equipment and storage medium
- Publication number
- CN111860276B (application CN202010674493.2A)
- Authority
- CN
- China
- Prior art keywords
- human body
- feature map
- deconvolution
- key point
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Social Psychology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Psychiatry (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
An embodiment of the invention relates to the technical field of computer vision and discloses a human body key point detection method applied to a pre-trained key point detection model. The method comprises: extracting a first feature map of a target human body image; performing multiple layers of deconvolution on the first feature map to obtain a second feature map, wherein, before each subsequent deconvolution layer, the third feature map output by the previous deconvolution layer is combined with the first feature map of the same resolution, and the combined feature map is then deconvolved; and outputting a heatmap for each human body key point of the target human body image according to the second feature map, and determining each human body key point of the target human body image according to the heatmaps. Embodiments of the invention also provide a human body key point detection device, a network device and a storage medium. The method, device, network device and storage medium can improve the detection precision of human body key points and the accuracy of the human body posture estimation result.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a human body key point detection method, a human body key point detection device, network equipment and a storage medium.
Background
Human body posture estimation is an important research direction in the field of computer vision, and is widely applied to the aspects of human body activity analysis, human-computer interaction, video monitoring and the like. The human body posture estimation refers to the estimation of the human body posture by positioning key points (such as shoulders, elbows, wrists, knees, ankles and the like) of the human body in an image or a video through a computer algorithm.
However, the inventors found that the prior art has at least the following problem: existing human body key point detection methods have low precision, which easily leads to erroneous key point outputs and degrades the result of human body posture estimation.
Disclosure of Invention
An object of embodiments of the invention is to provide a human body key point detection method, device, network device and storage medium that improve the detection precision of human body key points and the accuracy of the human body posture estimation result.
To solve the above technical problem, an embodiment of the invention provides a human body key point detection method, applied to a pre-trained key point detection model, comprising: extracting a first feature map of a target human body image by using a first sub-module of the key point detection model, wherein the first feature map comprises multiple groups of feature maps with different resolutions; performing multiple layers of deconvolution on the first feature map by using a second sub-module of the key point detection model to obtain a second feature map, wherein, before each subsequent deconvolution layer, the second sub-module combines the third feature map output by the previous deconvolution layer with the first feature map of the same resolution and then deconvolves the combined feature map; and outputting, by using a third sub-module of the key point detection model, a heatmap for each human body key point of the target human body image according to the second feature map, and determining each human body key point of the target human body image according to the heatmaps.
An embodiment of the invention also provides a human body key point detection device, comprising: an extraction module for extracting a first feature map of a target human body image, wherein the first feature map comprises multiple groups of feature maps with different resolutions; a deconvolution module for performing multiple layers of deconvolution on the first feature map to obtain a second feature map, wherein, before each subsequent deconvolution layer, the third feature map output by the previous deconvolution layer is combined with the first feature map of the same resolution and the combined feature map is then deconvolved; and a detection module for outputting a heatmap for each human body key point of the target human body image according to the second feature map and determining each human body key point of the target human body image according to the heatmaps.
An embodiment of the present invention further provides a network device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the human body key point detection method.
The embodiment of the invention also provides a computer readable storage medium, which stores a computer program, and the computer program is executed by a processor to realize the human body key point detection method.
Compared with the prior art, embodiments of the invention extract, through the key point detection model, multiple groups of first feature maps of the target human body image with different resolutions; perform multiple layers of deconvolution on the first feature map to obtain a second feature map, wherein, before each subsequent deconvolution layer, the third feature map output by the previous deconvolution layer is combined with the first feature map of the same resolution and the combined feature map is then deconvolved; and output a heatmap for each human body key point of the target human body image according to the second feature map and determine each human body key point according to the heatmaps. Because the information in the third feature map and the information in the first feature map combined with it complement each other, deconvolving the combined feature map lets the key point detection model learn more information about the human body key points in the target human body image, which effectively improves the detection precision of the human body key points and the accuracy of the human body posture estimation result.
In addition, performing multiple layers of deconvolution on the first feature map by using the second sub-module of the key point detection model comprises: when performing the deconvolution of each layer, dividing the deconvolution into multiple groups by channel count by using the second sub-module. Dividing the deconvolution of each layer into groups by channel count greatly reduces the amount of deconvolution computation.
In addition, performing multiple layers of deconvolution on the first feature map by using the second sub-module of the key point detection model comprises: when performing the deconvolution of the first layer, deconvolving the first feature map with the lowest resolution by using the second sub-module.
In addition, outputting a heatmap for each human body key point of the target human body image according to the second feature map by using the third sub-module of the key point detection model comprises: performing grouped convolution on the second feature map according to the spatial relationship of human limbs by using the third sub-module, and outputting the heatmap of each human body key point of the target human body image according to the grouped convolution result. Because limbs of the same body part have a relatively fixed spatial relationship in the target human body image and have relatively similar image features, performing grouped convolution on the second feature map according to the spatial relationship of human limbs lets the key point detection model learn and reference the spatial positions and image features of limbs of the same body part within the same convolution group, which effectively improves the detection accuracy of the human body key points.
In addition, performing grouped convolution on the second feature map according to the spatial relationship of human limbs by using the third sub-module specifically comprises: performing grouped convolution on the second feature map according to preset key point groups by using the third sub-module, wherein the second feature maps of each key point group comprise the second feature maps of first human body key points that are adjacent to one another and belong to the same limb of the human body, the first human body key points being key points predefined according to the spatial relationship of human limbs.
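As a rough illustration of the preset key point groups described above, the following Python sketch plans a grouped output head. The group names, the key point names, the channel count and the even split of feature channels across groups are all assumptions made for illustration; the source does not specify them.

```python
from typing import Dict, List, Tuple

# Hypothetical key point groups: each group holds adjacent key points of one limb.
KEYPOINT_GROUPS: Dict[str, List[str]] = {
    "left_arm": ["left_shoulder", "left_elbow", "left_wrist"],
    "right_arm": ["right_shoulder", "right_elbow", "right_wrist"],
    "left_leg": ["left_hip", "left_knee", "left_ankle"],
    "right_leg": ["right_hip", "right_knee", "right_ankle"],
}


def plan_grouped_head(
    feature_channels: int, groups: Dict[str, List[str]]
) -> List[Tuple[str, Tuple[int, int], int]]:
    """Assign each key point group an equal slice of the feature channels.

    Returns (group name, [start, end) channel slice, number of heatmaps) per group.
    The even split is an assumption, not something the source prescribes.
    """
    n_groups = len(groups)
    if n_groups == 0 or feature_channels % n_groups != 0:
        raise ValueError("feature_channels must be divisible by the number of groups")
    per_group = feature_channels // n_groups
    plan = []
    for index, (name, keypoints) in enumerate(groups.items()):
        start = index * per_group
        plan.append((name, (start, start + per_group), len(keypoints)))
    return plan


if __name__ == "__main__":
    for name, channel_slice, n_heatmaps in plan_grouped_head(16, KEYPOINT_GROUPS):
        print(name, channel_slice, n_heatmaps)
```

Under this plan, each convolution group sees only its own channel slice and emits heatmaps only for the key points of one limb.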
In addition, extracting the first feature map of the target human body image by using the first sub-module of the key point detection model specifically comprises: obtaining the first feature map of the target human body image from a lightweight convolutional neural network by using the first sub-module. A lightweight convolutional neural network lowers the computing power the key point detection model requires, so obtaining the first feature map from such a network reduces, overall, the computing power required of the device that runs the model.
In addition, obtaining the first feature map of the target human body image from the lightweight convolutional neural network by using the first sub-module comprises: reducing the number of channels of the last convolutional layer of the lightweight convolutional neural network to a preset number by using the first sub-module, and obtaining the first feature map of the target human body image from the lightweight convolutional neural network with the reduced channel count. With a reasonably chosen preset number, reducing the channel count of the last convolutional layer cuts the computation of the lightweight convolutional neural network with little impact on the detection performance of the key point detection model, which further reduces the computing power required of the device that runs the model.
Drawings
One or more embodiments are illustrated by the corresponding figures in the drawings, which are not meant to be limiting.
FIG. 1 is a schematic flow chart of a human body key point detection method according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of a human body key point detection method according to the first embodiment of the present invention;
FIG. 3 is a schematic flow chart of a human body key point detection method according to a second embodiment of the present invention;
FIG. 4 is a schematic flow chart of a human body key point detection method according to a third embodiment of the present invention;
FIG. 5 is a diagram illustrating an example of a human body key point detection method according to the third embodiment of the present invention;
FIG. 6 is a schematic block diagram of a human body key point detection device according to a fourth embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a network device according to a fifth embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, embodiments of the present invention are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the embodiments to give the reader a better understanding of the present application. The technical solutions claimed in the present application can nevertheless be implemented without these technical details and with various changes and modifications based on the following embodiments.
The first embodiment of the invention relates to a human body key point detection method, comprising: extracting a first feature map of a target human body image by using a first sub-module of a key point detection model, wherein the first feature map comprises multiple groups of feature maps with different resolutions; performing multiple layers of deconvolution on the first feature map by using a second sub-module of the key point detection model to obtain a second feature map, wherein, before each subsequent deconvolution layer, the second sub-module combines the third feature map output by the previous deconvolution layer with the first feature map of the same resolution and then deconvolves the combined feature map; and outputting, by using a third sub-module of the key point detection model, a heatmap for each human body key point of the target human body image according to the second feature map, and determining each human body key point of the target human body image according to the heatmaps. In other words, the first feature map with the same resolution as the third feature map output by the previous deconvolution layer is linked to that third feature map, and the next deconvolution layer operates on the combination of the two.
It should be noted that the human body key point detection method provided by the embodiment of the present invention is applied to a pre-trained key point detection model; that is, the method is executed by the key point detection model. The first, second and third sub-modules are components of the key point detection model. Optionally, the key point detection model may also include other components (sub-modules), which are not specifically limited here. Optionally, the key point detection model may be trained on data in RGB format, and the learning rate, number of training epochs, batch size and similar settings used during training may all be set according to actual needs; they are not specifically limited in the embodiment of the present invention.
The specific flow of the human body key point detection method provided by the embodiment of the invention is shown in FIG. 1 and comprises the following steps:
S101: Extract a first feature map of the target human body image by using the first sub-module of the key point detection model, wherein the first feature map comprises multiple groups of feature maps with different resolutions.
The target human body image is the human body image whose key points are to be detected. It may be obtained by a target detection method, for example by extracting the target human body image from a video frame; the specific target detection method can be chosen according to actual needs and is not specifically limited here.
Refer to FIG. 2, which is a schematic diagram of the human body key point detection method according to the embodiment of the present invention. Specifically, the first sub-module of the key point detection model extracts the first feature map of the target human body image and corresponds to the feature extraction part of the figure; the second sub-module performs deconvolution on the first feature map and corresponds to the deconvolution part of the figure; the third sub-module outputs the heatmaps of the human body key points and corresponds to the heatmap output part of the figure.
Optionally, the first sub-module may use a sub-neural network, for example a convolutional neural network, to extract the features of the target human body image, obtaining the first feature map through the convolutional layers of that network. The specific sub-neural network can be chosen according to actual needs and is not specifically limited here. By setting the convolution kernels of the layers in the sub-neural network to different sizes, multiple groups of first feature maps with different resolutions can be obtained, for example 64 × 48, 32 × 24 and 16 × 12; the kernel size of each layer can be set according to actual needs and is not specifically limited here.
S102: Perform multiple layers of deconvolution on the first feature map by using the second sub-module of the key point detection model to obtain a second feature map, wherein, before each subsequent deconvolution layer, the second sub-module combines the third feature map output by the previous deconvolution layer with the first feature map of the same resolution and then deconvolves the combined feature map.
As shown in FIG. 2, G00/G10 represents the deconvolution of the first layer, G01/G11 the deconvolution of the second layer, and G02/G12 the deconvolution of the third layer. In practice, the number of deconvolution layers of the key point detection model can be set as required. More deconvolution layers give higher detection precision for the human body key points but also require more computation, so the number of deconvolution layers can be chosen according to the application scenario and/or the computing power of the device running the model.
In FIG. 2, taking G01 as an example, the left (dark) square is a first feature map copied directly from the extracted first feature maps, and the right (light) square is the third feature map obtained by the deconvolution of the previous layer. The two squares have the same resolution: for example, if the resolution of the right square in G01 is 32 × 24, the first feature map with a resolution of 32 × 24 is copied from the first feature maps as the content of the left square. When the deconvolution of G02 (the next layer) is performed, the left and right squares of G01 are combined and then deconvolved. Specifically, the first feature map corresponding to the left square and the third feature map corresponding to the right square may be placed side by side, and the deconvolution kernel is then applied to the side-by-side feature maps.
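The combination step can be traced with a small shape-only sketch in Python. It is a minimal illustration, not the patented implementation: it assumes that "side by side" means concatenation along the channel axis, that each deconvolution doubles height and width, and that the channel counts shown are hypothetical; only the resolutions 64 × 48, 32 × 24 and 16 × 12 come from the text.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def deconv_shape(shape: Shape, out_channels: int) -> Shape:
    """Output shape of a deconvolution assumed to double height and width."""
    _, height, width = shape
    return (out_channels, height * 2, width * 2)


def concat_channels(a: Shape, b: Shape) -> Shape:
    """Place two equal-resolution feature maps side by side along the channel axis."""
    if a[1:] != b[1:]:
        raise ValueError(f"resolution mismatch: {a[1:]} vs {b[1:]}")
    return (a[0] + b[0], a[1], a[2])


def trace_decoder(
    first_maps: Dict[Tuple[int, int], Shape], out_channels: List[int]
) -> List[Shape]:
    """Return the input shape of each deconvolution layer, then the final output shape."""
    # The first layer deconvolves the lowest-resolution first feature map directly.
    current = min(first_maps.values(), key=lambda s: s[1] * s[2])
    trace = [current]
    for index, channels in enumerate(out_channels):
        third = deconv_shape(current, channels)  # third feature map of this layer
        skip = first_maps.get((third[1], third[2]))
        is_last = index == len(out_channels) - 1
        if skip is not None and not is_last:
            # Combine with the first feature map of the same resolution.
            current = concat_channels(skip, third)
        else:
            current = third
        trace.append(current)
    return trace


if __name__ == "__main__":
    # Hypothetical channel counts; resolutions follow the example in the text.
    maps = {(64, 48): (32, 64, 48), (32, 24): (64, 32, 24), (16, 12): (128, 16, 12)}
    for shape in trace_decoder(maps, [64, 32, 16]):
        print(shape)
```

In this trace the input to the second and third deconvolution layers carries the channels of both squares of FIG. 2, while the last entry is the shape of the second feature map.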
In a specific example, S102 includes: when performing the deconvolution of the first layer, the key point detection model uses the second sub-module to deconvolve the first feature map with the lowest resolution.
With continued reference to FIG. 2, when the deconvolution of G00 (the first layer) is performed, the longest block on the left of FIG. 2 (that is, the first feature map with the lowest resolution) is deconvolved directly. Optionally, the first feature map with the lowest resolution is the feature map extracted from the last convolutional layer of the sub-neural network of the first sub-module.
S103: Output a heatmap for each human body key point of the target human body image according to the second feature map by using the third sub-module of the key point detection model, and determine each human body key point of the target human body image according to the heatmaps.
The number of human body key points may be set according to the actual situation, for example 14, 16 or 63, and is not specifically limited here.
Optionally, the key point detection model uses the third sub-module to convolve the second feature map obtained by the second sub-module according to the number of human body key points and outputs as many heatmaps as there are key points, each heatmap corresponding to one human body key point; the key point detection model then determines all the human body key points of the target human body image from the heatmaps. Optionally, the third sub-module uses a 1 × 1 convolution kernel when convolving the second feature map.
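The source does not state how a key point is read off its heatmap. A common choice, assumed here purely for illustration, is to take the location of the strongest response in each heatmap; the function name `decode_keypoints` and the toy values are hypothetical.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # rows of response values, indexed [y][x]


def decode_keypoints(heatmaps: List[Heatmap]) -> List[Tuple[int, int, float]]:
    """Return (x, y, score) of the strongest response in each heatmap.

    Assumption: one heatmap per key point, decoded by a plain argmax.
    """
    keypoints = []
    for heatmap in heatmaps:
        best_x, best_y, best_score = 0, 0, float("-inf")
        for y, row in enumerate(heatmap):
            for x, value in enumerate(row):
                if value > best_score:
                    best_x, best_y, best_score = x, y, value
        keypoints.append((best_x, best_y, best_score))
    return keypoints


if __name__ == "__main__":
    toy = [
        [[0.0, 0.1], [0.9, 0.2]],  # peak at x=0, y=1
        [[0.3, 0.8], [0.1, 0.0]],  # peak at x=1, y=0
    ]
    print(decode_keypoints(toy))
```

The coordinates are in heatmap pixels; mapping them back to the target human body image would need a scale factor that depends on the chosen resolutions.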
Compared with the prior art, the human body key point detection method provided by the embodiment of the invention extracts, through the key point detection model, multiple groups of first feature maps of the target human body image with different resolutions; performs multiple layers of deconvolution on the first feature map to obtain a second feature map, wherein, before each subsequent deconvolution layer, the third feature map output by the previous deconvolution layer is combined with the first feature map of the same resolution and the combined feature map is then deconvolved; and outputs a heatmap for each human body key point of the target human body image according to the second feature map and determines each human body key point according to the heatmaps. Because the information in the third feature map and the information in the first feature map combined with it complement each other, deconvolving the combined feature map lets the key point detection model learn more information about the human body key points in the target human body image, which effectively improves the detection precision of the human body key points and the accuracy of the human body posture estimation result.
A second embodiment of the present invention relates to a human body key point detection method. The second embodiment is substantially the same as the first embodiment and differs mainly in that performing multiple layers of deconvolution on the first feature map by using the second sub-module of the key point detection model comprises: when performing the deconvolution of each layer, dividing the deconvolution into multiple groups by channel count by using the second sub-module. Dividing the deconvolution into groups greatly reduces the computation of the key point detection model, so that the model can run on a mobile terminal with lower computing power, which widens the application range of the human body key point detection method.
The specific flow of the human body key point detection method provided by this embodiment is shown in FIG. 3 and comprises the following steps:
S201: Extract a first feature map of the target human body image by using the first sub-module of the key point detection model, wherein the first feature map comprises multiple groups of feature maps with different resolutions.
S202: Perform multiple layers of deconvolution on the first feature map by using the second sub-module of the key point detection model to obtain a second feature map, wherein, before each subsequent deconvolution layer, the second sub-module combines the third feature map output by the previous deconvolution layer with the first feature map of the same resolution and then deconvolves the combined feature map, and wherein, when performing the deconvolution of each layer, the second sub-module divides the deconvolution into multiple groups by channel count.
S203: Output a heatmap for each human body key point of the target human body image according to the second feature map by using the third sub-module of the key point detection model, and determine each human body key point of the target human body image according to the heatmaps.
S201 and S203 are the same as S101 and S103 in the first embodiment; refer to the description of the first embodiment for details, which are not repeated here.
For S202, it can be understood that, in order to output the heatmaps of multiple human body key points, multiple convolution kernels are needed to deconvolve the first feature map, and the number of convolution kernels corresponds to the number of channels. The deconvolution of each layer therefore has multiple channels, and the key point detection model can divide the deconvolution into multiple groups by channel count. The number of groups in each deconvolution layer may be set according to actual needs, for example 2, 4 or 8 groups, and is not limited here. The number of groups may be the same or different from layer to layer; for example, G00 and G01 in FIG. 2 may use the same or different numbers of groups. To simplify the deconvolution of consecutive layers, every layer may use the same number of groups, for example 2 equal groups.
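To make the channel grouping concrete, the Python sketch below partitions channels into equal groups and counts the kernel weights of a grouped layer. It assumes the usual grouped-convolution convention in which each group connects its own slice of input channels to its own slice of output channels; the channel counts and the 4 × 4 kernel are hypothetical values, not taken from the source.

```python
from typing import List


def split_channels(num_channels: int, groups: int) -> List[List[int]]:
    """Partition channel indices into `groups` equal, contiguous groups."""
    if groups <= 0 or num_channels % groups != 0:
        raise ValueError("num_channels must be divisible by groups")
    size = num_channels // groups
    return [list(range(g * size, (g + 1) * size)) for g in range(groups)]


def grouped_weight_count(c_in: int, c_out: int, kernel: int, groups: int = 1) -> int:
    """Number of kernel weights when each group maps c_in/groups to c_out/groups channels."""
    if groups <= 0 or c_in % groups != 0 or c_out % groups != 0:
        raise ValueError("channel counts must be divisible by groups")
    return groups * (c_in // groups) * (c_out // groups) * kernel * kernel


if __name__ == "__main__":
    print(split_channels(8, 2))
    print(grouped_weight_count(128, 64, 4, groups=1))
    print(grouped_weight_count(128, 64, 4, groups=2))
```

Using the same number of groups in consecutive layers keeps the channel slices aligned from one layer to the next, which matches the simplification suggested above.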
In order to evaluate the operation amount of the deconvolution, the operation amount of the i-th layer deconvolution can be calculated using the following calculation formula:
$$\mathrm{FLOPs}_i = C_{i-1} \cdot H_i \cdot W_i \cdot C_i \cdot C$$
where $C$ is a constant related to the size of the deconvolution kernel and to the operation amount of each deconvolution; $C_{i-1}$ is the number of channels of the deconvolution input; $C_i$ is the number of channels of the deconvolution output; and $H_i$ and $W_i$ are the length and width of the feature map output by the deconvolution.
Since $H_i$ and $W_i$ do not change when grouping and $C$ is a constant, when the deconvolution is divided into 2 groups, $C_{i-1}$ and $C_i$ of each group are both reduced to 1/2 of their original values, so the calculation amount of each group is reduced to 1/4 of the original, and the total calculation amount of the 2 groups is 1/2 of the original (non-grouped) calculation amount. If the deconvolution is divided into 4 groups, $C_{i-1}$ and $C_i$ of each group are both reduced to 1/4 of their original values, so the calculation amount of each group is reduced to 1/16 of the original, and the total calculation amount of the 4 groups is 1/4 of the original (non-grouped) calculation amount.
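The arithmetic above can be checked with a short sketch. The function name `deconv_flops` and all numeric values below are illustrative assumptions; only the formula itself comes from the text.

```python
def deconv_flops(c_in, c_out, h_out, w_out, c_const, groups=1):
    """Operation count of one deconvolution layer, following
    FLOPs_i = C_{i-1} * H_i * W_i * C_i * C from the text.

    With `groups` > 1, each group sees c_in/groups input channels and
    produces c_out/groups output channels; the layer total is the
    per-group cost multiplied by the number of groups.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    per_group = (c_in // groups) * h_out * w_out * (c_out // groups) * c_const
    return per_group * groups


# Illustrative numbers only (not taken from the patent).
base = deconv_flops(256, 128, 64, 48, c_const=16)
for g in (1, 2, 4, 8):
    total = deconv_flops(256, 128, 64, 48, c_const=16, groups=g)
    print(g, total, total / base)
```

For these inputs the printed ratio is 1/g, matching the statement that 2 groups halve and 4 groups quarter the total operation amount.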
Compared with the prior art, the human body key point detection method provided by this embodiment of the invention divides the deconvolution of each layer into a plurality of groups according to the number of channels, which greatly reduces the operation amount of the deconvolution. With the same computing power, the detection precision of the human body key points can therefore be improved by spending the saved operation amount elsewhere in the model (for example, by adding deconvolution layers). Alternatively, without increasing the operation amount of the model, the detection speed of the key point detection model can be improved and the computing power required of the equipment that runs the model is reduced, so that the key point detection model can run on equipment with lower computing power (such as a mobile terminal), which expands the application range of the key point detection model and of the human body key point detection method.
To further reduce the computing power required by the key point detection model, in a specific example, extracting the first feature map of the target human body image by using the first sub-module of the key point detection model in S201 may specifically be: acquiring the first feature map of the target human body image by using the first sub-module according to a lightweight convolutional neural network.
The lightweight convolutional neural network may be selected according to actual needs, for example MobileNet V2 or ShuffleNet V2, which is not specifically limited herein. A lightweight convolutional neural network requires less computing power, so obtaining the first feature map according to a lightweight convolutional neural network reduces the overall computing power that the key point detection model requires of the equipment that runs it.
In a specific example, acquiring the first feature map of the target human body image by using the first sub-module according to the lightweight convolutional neural network may be further refined as follows: reducing the number of channels of the last convolutional layer of the lightweight convolutional neural network to a preset number by using the first sub-module, and acquiring the first feature map of the target human body image according to the lightweight convolutional neural network after the number of channels is reduced.
The preset number can be determined according to the training effect: if, after the number of channels of the last convolutional layer of the lightweight convolutional neural network is reduced, the detection effect degrades only within an acceptable range compared with the result before the reduction, the reduced number of channels is used as the preset number. It can be understood that reducing the number of channels to the preset number further reduces the operation amount of the lightweight convolutional neural network. For example, if the lightweight convolutional neural network is MobileNet V2 and reducing the number of channels of its last layer from 1280 to 160 lowers the detection accuracy of the key point detection model by only 0.5%, then 160 may be used as the preset number of channels of the last layer.
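The selection rule described above can be sketched as a small helper that picks the smallest candidate channel count whose measured accuracy drop stays within a tolerance. The helper name `pick_preset_channels`, the tolerance value, and all measurements except the 1280 to 160 / 0.5% pair are hypothetical placeholders, not results from the patent.

```python
def pick_preset_channels(accuracy_drop_by_channels, max_drop):
    """Return the smallest channel count whose measured accuracy drop
    (in percent, relative to the unreduced network) is at most
    `max_drop`. Returns None if no candidate qualifies."""
    acceptable = [
        channels
        for channels, drop in accuracy_drop_by_channels.items()
        if drop <= max_drop
    ]
    return min(acceptable) if acceptable else None


# Only the 160-channel / 0.5% entry comes from the text; the other
# entries are made-up stand-ins for results of separate training runs.
measured = {1280: 0.0, 320: 0.2, 160: 0.5, 80: 2.4}
print(pick_preset_channels(measured, max_drop=1.0))
```

Each entry in `measured` would in practice require retraining and evaluating the key point detection model with that channel count.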
By reasonably setting the preset number and reducing the number of channels of the last convolutional layer of the lightweight convolutional neural network to that number, the operation amount of the lightweight convolutional neural network can be reduced without greatly affecting the detection effect of the key point detection model, which further reduces the computing power that the key point detection model requires of the equipment that runs it.
A third embodiment of the present invention relates to a human body key point detection method. The third embodiment is substantially the same as the first embodiment, and mainly differs in that outputting the thermodynamic diagrams of all human body key points of the target human body image according to the second feature map by using the third sub-module of the key point detection model comprises: performing grouping convolution on the second feature map by using the third sub-module according to the human body limb spatial relationship, and outputting the thermodynamic diagrams of all human body key points of the target human body image according to the result of the grouping convolution. Because limbs of the same body part have a relatively fixed spatial relationship in the image, performing grouping convolution on the second feature map according to the human body limb spatial relationship makes the detection of human body key points more accurate.
The specific flow of the human body key point detection method provided by the embodiment of the invention is shown in fig. 4, and specifically comprises the following steps:
S301: extracting a first feature map of the target human body image by using a first sub-module of the key point detection model, wherein the first feature map comprises a plurality of groups of feature maps with different resolutions.
S302: performing deconvolution in a plurality of layers according to the first feature map by using a second sub-module of the key point detection model to obtain a second feature map, wherein, when the deconvolution of a next layer is performed, the second sub-module merges the third feature map obtained by the deconvolution of the previous layer with the first feature map that has the same resolution as that third feature map, and then performs the deconvolution.
S303: performing grouping convolution on the second feature map according to the human body limb spatial relationship by using a third sub-module of the key point detection model, outputting thermodynamic diagrams of all human body key points of the target human body image according to the result of the grouping convolution, and determining all human body key points of the target human body image according to the thermodynamic diagrams.
S301 and S302 are the same as S101 and S102 in the first embodiment, and for details, reference may be made to the description in the first embodiment, and details are not repeated here to avoid repetition.
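The merging in S302 can be traced with a shape-only sketch. Two points are assumptions, not statements from the text: each deconvolution layer is taken to double the height and width, and "combining" is taken to mean channel concatenation. The function name `decoder_shapes` and all shapes are hypothetical.

```python
def decoder_shapes(first_maps, out_channels):
    """Trace (channels, height, width) through the deconvolution stack.

    `first_maps` lists the first feature maps from lowest to highest
    resolution. Assumptions: every deconvolution doubles height and
    width, and merging is channel concatenation.
    Returns a list of (layer_input_shape, layer_output_shape) pairs.
    """
    trace = []
    merged = first_maps[0]  # the first layer uses the minimum-resolution map
    for layer, c_out in enumerate(out_channels):
        _, h, w = merged
        third = (c_out, h * 2, w * 2)  # "third feature map" of this layer
        trace.append((merged, third))
        if layer + 1 < len(out_channels):
            # first feature map with the same resolution as `third`
            skip = next(m for m in first_maps if m[1:] == third[1:])
            merged = (third[0] + skip[0], third[1], third[2])
    return trace


# Hypothetical encoder outputs at three resolutions.
first_maps = [(160, 8, 6), (64, 16, 12), (32, 32, 24)]
for layer_in, layer_out in decoder_shapes(first_maps, [128, 64, 32]):
    print(layer_in, "->", layer_out)
```

The output of the last layer in the trace corresponds to the second feature map that is passed to the third sub-module.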
For S303, specifically, limbs of the same body part have a relatively fixed spatial relationship in the target human body image; for example, the human body key points of a hand are adjacent to one another. In addition, because limbs of the same body part are concentrated in a certain area of the target human body image, their features in the image are generally similar. Performing grouping convolution on the second feature map according to the human body limb spatial relationship therefore allows the key point detection model to learn and refer to the spatial positions and image features of the limbs of the same body part within the same convolution group, which effectively improves the detection accuracy of the human body key points.
Optionally, how the second feature map is grouped according to the human body limb spatial relationship, and the number of groups, may be set according to actual needs and are not specifically limited here. For example, the groups may be based on the regions of the head, torso, hands, and feet. Alternatively, a body part may be subdivided to obtain new groups; for example, the hands may be split so that the left hand and the right hand each form a group.
It can be understood that, because the second feature map is subjected to grouping convolution according to the human body limb spatial relationship, during training the grouping convolution of the second feature map also affects the preceding grouped deconvolution and the other parts of the key point detection model through the back propagation algorithm. The key point detection model as a whole is thus trained with reference to the human body limb spatial relationship, which effectively improves the detection precision of the human body key points.
In a specific example, performing grouping convolution on the second feature map according to the human body limb spatial relationship by using the third sub-module may specifically be: performing grouping convolution on the second feature map according to preset key point groups by using the third sub-module, wherein the second feature map of each group in the key point groups comprises the second feature maps of first human body key points that are located at adjacent positions and belong to the same limb part of the human body, and the first human body key points are key points predefined according to the human body limb spatial relationship.
Please refer to fig. 5, which is a schematic diagram of an example of the human body key point detection method provided by an embodiment of the present invention. In fig. 5, 63 human body key points are defined, each predefined according to the human body limb spatial relationship; for example, key points 0 and 58 refer to the left and right sides of the neck, respectively. Each box in fig. 5 represents one group: the key points of the head form one group, the hands form four groups, the body forms two groups, and the feet form two groups.
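The effect of grouping convolution (commonly called grouped convolution) by limb group can be sketched at a single pixel with a 1x1 kernel. Everything concrete here is a toy assumption: the function name `grouped_pointwise_conv`, the two group names, the channel counts, and the weights are hypothetical and do not reproduce the grouping of fig. 5.

```python
def grouped_pointwise_conv(features, weights_per_group):
    """Grouped 1x1 convolution evaluated at a single pixel.

    `features` is the channel vector of the second feature map at one
    pixel. `weights_per_group[g]` is a matrix (list of rows) mapping
    the g-th slice of input channels to that group's key point
    channels. Input channels are split evenly across groups; the
    number of output channels may differ per group (an assumption).
    """
    n_groups = len(weights_per_group)
    if len(features) % n_groups:
        raise ValueError("channels must divide evenly across groups")
    slice_len = len(features) // n_groups
    outputs = []
    for g, weights in enumerate(weights_per_group):
        chunk = features[g * slice_len:(g + 1) * slice_len]
        for row in weights:
            if len(row) != slice_len:
                raise ValueError("weight row does not match slice length")
            outputs.append(sum(w * x for w, x in zip(row, chunk)))
    return outputs


# Hypothetical toy layout: two limb groups with 2 input channels each;
# "head" yields 2 key point channels, "left_hand" yields 1.
weights = [
    [[1.0, 0.0], [0.0, 1.0]],   # head
    [[0.5, 0.5]],               # left_hand
]
print(grouped_pointwise_conv([1.0, 2.0, 3.0, 5.0], weights))
```

Because each output channel reads only its own group's slice, changing the "left_hand" input channels leaves the "head" outputs untouched, which is the isolation between limb groups that the text relies on.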
It can be understood that the human body key points in fig. 5 are edge points of the human body in the target human body image. When the key points of the skeleton in the target human body image are needed, they can be obtained by calculation from the relative positions or relative distances between the skeleton key points and the human body key points defined in the figure.
Compared with the prior art, the human body key point detection method provided by this embodiment of the invention performs grouping convolution on the second feature map according to the human body limb spatial relationship, and outputs the thermodynamic diagram of each human body key point of the target human body image according to the result of the grouping convolution. Because limbs of the same body part have a relatively fixed spatial relationship in the target human body image and their features in the image are similar, the grouping convolution allows the key point detection model to learn and refer to the spatial positions and image features of the limbs of the same body part within the same convolution group, which effectively improves the detection accuracy of the human body key points.
The steps of the above methods are divided only for clarity of description. In implementation, several steps may be combined into one step, or one step may be split into multiple steps; as long as the same logical relationship is included, such variations fall within the protection scope of this patent. Adding insignificant modifications to an algorithm or process, or introducing insignificant design changes, without changing the core design of the algorithm or process also falls within the protection scope of this patent.
A fourth embodiment of the present invention relates to a human body key point detection device 400, as shown in fig. 6, including: an extraction module 401, a deconvolution module 402 and a detection module 403, the functions of which are described in detail as follows:
the extraction module 401 is configured to extract a first feature map of the target human body image, where the first feature map includes multiple groups of feature maps with different resolutions;
a deconvolution module 402, configured to perform deconvolution in a plurality of layers according to the first feature map to obtain a second feature map, where, when the deconvolution of a next layer is performed, the third feature map obtained by the deconvolution of the previous layer is merged with the first feature map that has the same resolution as that third feature map, and the deconvolution is then performed;
and the detecting module 403 is configured to output a thermodynamic diagram of each human body key point of the target human body image according to the second feature map, and determine each human body key point of the target human body image according to the thermodynamic diagram.
Further, the deconvolution module 402 is further configured to:
when deconvolution of each layer is performed, the deconvolution is divided into a plurality of groups according to the number of channels.
Further, the deconvolution module 402 is further configured to:
in the deconvolution of the first layer, the deconvolution is performed based on the first feature map of the minimum resolution.
Further, the detection module 403 is further configured to:
and performing grouping convolution on the second feature map according to the human body limb spatial relationship, and outputting thermodynamic diagrams of all human body key points of the target human body image according to the result of the grouping convolution.
Further, the detection module 403 is further configured to:
and carrying out grouping convolution on the second feature maps according to preset key point groups, wherein the second feature maps of each group in the key point groups comprise second feature maps of first human body key points which are located at adjacent positions and belong to the same limb position of the human body, and the first human body key points are key points which are predefined according to the spatial relationship of the limb of the human body.
Further, the extraction module 401 is further configured to:
and acquiring a first feature map of the target human body image according to the lightweight convolutional neural network.
Further, the extraction module 401 is further configured to:
and reducing the number of channels of the last convolutional layer of the lightweight convolutional neural network to a preset number, and acquiring a first feature map of the target human body image according to the lightweight convolutional neural network after the number of channels is reduced.
It should be understood that this embodiment is an apparatus example corresponding to the first, second, and third embodiments, and may be implemented in cooperation with them. The related technical details mentioned in the first, second, and third embodiments remain valid in this embodiment and are not repeated here. Correspondingly, the related technical details mentioned in this embodiment can also be applied to the first, second, and third embodiments.
It should be noted that each module referred to in this embodiment is a logical module, and in practical applications, one logical unit may be one physical unit, may be a part of one physical unit, and may be implemented by a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, elements that are not so closely related to solving the technical problems proposed by the present invention are not introduced in the present embodiment, but this does not indicate that other elements are not present in the present embodiment.
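Since the modules are logical units, their cooperation can be sketched as a plain composition of three callables. The class name `KeyPointDetectionDevice`, the field names, and the stand-in lambdas are hypothetical; the sketch only shows the order in which modules 401, 402, and 403 hand data to one another.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class KeyPointDetectionDevice:
    """Logical wiring of device 400; each field stands in for a module."""
    extraction: Callable[[Any], Any]      # module 401: image -> first feature maps
    deconvolution: Callable[[Any], Any]   # module 402: first maps -> second map
    detection: Callable[[Any], Any]       # module 403: second map -> key points

    def detect(self, image):
        first_maps = self.extraction(image)
        second_map = self.deconvolution(first_maps)
        return self.detection(second_map)


# Stand-in callables that only record the call order.
device = KeyPointDetectionDevice(
    extraction=lambda image: ["extract", image],
    deconvolution=lambda maps: ["deconv", maps],
    detection=lambda fmap: ["detect", fmap],
)
print(device.detect("img"))
```

Whether each callable maps to one physical unit, part of one, or several is left open, as the paragraph above states.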
A fifth embodiment of the present invention relates to a network device, as shown in fig. 7, including at least one processor 501; and a memory 502 communicatively coupled to the at least one processor 501; the memory 502 stores instructions executable by the at least one processor 501, and the instructions are executed by the at least one processor 501, so that the at least one processor 501 can execute the above-mentioned human body key point detection method.
Where the memory 502 and the processor 501 are coupled by a bus, the bus may comprise any number of interconnected buses and bridges that couple one or more of the various circuits of the processor 501 and the memory 502 together. The bus may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor 501 is transmitted over a wireless medium through an antenna, which further receives the data and transmits the data to the processor 501.
The processor 501 is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory 502 may be used to store data used by the processor 501 in performing operations.
A sixth embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program realizes the above-described method embodiments when executed by a processor.
That is, those skilled in the art can understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing related hardware. The program is stored in a storage medium and includes several instructions that enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples of practicing the invention, and that various changes in form and detail may be made therein without departing from the spirit and scope of the invention in practice.
Claims (9)
1. A human body key point detection method is characterized by being applied to a pre-trained key point detection model and comprising the following steps:
extracting a first feature map of the target human body image by using a first sub-module of the key point detection model, wherein the first feature map comprises a plurality of groups of feature maps with different resolutions;
performing deconvolution on a plurality of layers by using a second submodule of the key point detection model according to the first feature map to obtain a second feature map, wherein when performing deconvolution on the next layer, the first feature map and the third feature map which have the same resolution as that of a third feature map obtained by deconvolution on the previous layer are combined by using the second submodule to perform deconvolution;
outputting thermodynamic diagrams of all human body key points of the target human body image according to the second feature diagram by using a third sub-module of the key point detection model, and determining all human body key points of the target human body image according to the thermodynamic diagrams;
wherein the outputting, by the third sub-module of the keypoint detection model, the thermodynamic diagrams of the respective human keypoints of the target human image according to the second feature map includes:
and performing grouping convolution on the second characteristic diagram by using the third sub-module according to the human body limb space relationship, and outputting the thermodynamic diagram of each human body key point of the target human body image according to the result of the grouping convolution.
2. The method according to claim 1, wherein the deconvoluting a plurality of layers with the second sub-module of the keypoint detection model according to the first feature map comprises:
and when the deconvolution of each layer is carried out, the deconvolution is divided into a plurality of groups according to the number of channels by utilizing the second submodule.
3. The method according to claim 1, wherein the deconvoluting a plurality of layers with the second sub-module of the keypoint detection model according to the first feature map comprises:
and when the deconvolution of the first layer is carried out, carrying out deconvolution by utilizing the second submodule according to the first feature map with the minimum resolution.
4. The method for detecting human key points according to claim 1, wherein the performing, by using the third sub-module, the group convolution on the second feature map according to the human body limb space relationship specifically includes:
and performing grouping convolution on the second feature maps according to preset key point groups by using the third sub-module, wherein the second feature maps of each group in the key point groups comprise the second feature maps of first human body key points which are located at adjacent positions and belong to the same limb position of the human body, and the first human body key points are key points predefined according to the spatial relationship of the limb of the human body.
5. The method according to claim 1, wherein the extracting the first feature map of the target human body image by using the first sub-module of the keypoint detection model specifically comprises:
and acquiring a first characteristic diagram of the target human body image by using the first sub-module according to the lightweight convolutional neural network.
6. The method according to claim 5, wherein the obtaining a first feature map of the target human body image by the first sub-module according to a lightweight convolutional neural network comprises:
and reducing the number of channels of the last layer of convolution layer of the lightweight convolutional neural network to a preset number by using the first submodule, and acquiring a first characteristic diagram of the target human body image according to the lightweight convolutional neural network after the number of channels is reduced.
7. A human key point detection device, comprising:
the extraction module is used for extracting a first feature map of the target human body image, and the first feature map comprises a plurality of groups of feature maps with different resolutions;
the deconvolution module is used for performing deconvolution in a plurality of layers according to the first feature map to obtain a second feature map, wherein, when the deconvolution of a next layer is performed, the third feature map obtained by the deconvolution of the previous layer is merged with the first feature map that has the same resolution as that third feature map, and the deconvolution is then performed;
the detection module is used for outputting thermodynamic diagrams of all human key points of the target human body image according to the second feature diagram and determining all human key points of the target human body image according to the thermodynamic diagrams;
wherein, the outputting the thermodynamic diagrams of the key points of the target human body image according to the second characteristic diagram comprises:
and performing grouping convolution on the second characteristic graph according to the human body limb space relationship, and outputting thermodynamic diagrams of all human body key points of the target human body image according to the result of the grouping convolution.
8. A network device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a human keypoint detection method as claimed in any one of claims 1 to 6.
9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the human keypoint detection method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010674493.2A CN111860276B (en) | 2020-07-14 | 2020-07-14 | Human body key point detection method, device, network equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111860276A CN111860276A (en) | 2020-10-30 |
CN111860276B true CN111860276B (en) | 2023-04-11 |
Family
ID=72983982
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010674493.2A Active CN111860276B (en) | 2020-07-14 | 2020-07-14 | Human body key point detection method, device, network equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111860276B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN111860276A (en) | 2020-10-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||