CN109508681B - Method and device for generating human body key point detection model - Google Patents


Info

Publication number
CN109508681B
Authority
CN
China
Prior art keywords
model
initial
human body
attention
sample
Legal status
Active
Application number
CN201811380813.2A
Other languages
Chinese (zh)
Other versions
CN109508681A (en)
Inventor
鲍慊
刘武
梅涛
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201811380813.2A
Publication of CN109508681A
Application granted
Publication of CN109508681B
Current legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application disclose a method and a device for generating a human body key point detection model. One embodiment of the method comprises: acquiring a sample set comprising sample human body images and annotation information; selecting samples from the sample set, and performing the following training steps: inputting the sample human body image of a selected sample into an initial first model to obtain a feature map with a pyramid structure; determining a first-layer loss value based on the feature map and the annotation information of the key points in the sample human body image; inputting the feature map into an initial second model to obtain the position coordinates of the detected key points; determining a second-layer loss value based on the position coordinates of the detected key points and the annotation information of the key points in the sample human body image; determining, based on the first-layer and second-layer loss values, whether training of the initial first model and the initial second model is complete; and, in response to determining that training is complete, determining the initial first model and the initial second model as the human body key point detection model. This embodiment can detect occluded or hidden human body key points more accurately.

Description

Method and device for generating human body key point detection model
Technical Field
The embodiments of the present application relate to the field of computer technology, and in particular to a method and a device for generating a human body key point detection model.
Background
Human body key point detection uses computer vision techniques to obtain the positions of human body key points in an image or video. It is divided into two problems: single-person key point detection and multi-person key point detection. Multi-person key point detection usually first detects each human body and then applies a single-person key point detection method to obtain the key point positions of each person in the picture, so improving the performance of single-person key point detection is especially important. Deep learning provides an effective way to improve the accuracy of human body key point detection.
In the related art, detection accuracy is low for key points such as wrists and ankles, which are easily occluded and deformed. In addition, the differences between feature maps of different scales are not taken into account, so the problem that the target region of a key point is small and hard to detect cannot be effectively solved.
Disclosure of Invention
The embodiment of the application provides a method and a device for generating a human body key point detection model.
In a first aspect, an embodiment of the present application provides a method for generating a human body key point detection model, including: acquiring a sample set, where each sample in the sample set comprises a sample human body image and annotation information of the key points in the sample human body image; selecting samples from the sample set, and performing the following training steps: inputting the sample human body image of a selected sample into an initial first model to obtain a feature map with a pyramid structure; determining a first-layer loss value based on the feature map and the annotation information of the key points in the sample human body image; inputting the feature map into an initial second model to obtain the position coordinates of the detected key points; determining a second-layer loss value based on the position coordinates of the detected key points and the annotation information of the key points in the sample human body image; determining, based on the first-layer loss value and the second-layer loss value, whether training of the initial first model and the initial second model is complete; and, in response to determining that training of the initial first model and the initial second model is complete, determining the initial first model and the initial second model as the human body key point detection model.
In some embodiments, inputting the sample human body image of the selected sample into the initial first model to obtain a feature map with a pyramid structure includes: inputting the sample human body image of the selected sample into a residual network to obtain the feature map output by the last convolutional layer of each residual block; and passing each of these feature maps through a fully convolutional layer, upsampling them, and merging them through lateral connections to obtain the feature map with the pyramid structure.
In some embodiments, determining the first-layer loss value based on the feature map and the annotation information of the key points in the sample human body image includes: generating a ground-truth heatmap for each key point according to the annotation information of the key points in the sample human body image; generating a predetermined number of first predicted heatmaps according to the feature map, where each first predicted heatmap corresponds to one key point; and determining the first-layer loss value based on the positional deviation of each key point between the ground-truth heatmap and the first predicted heatmap.
In some embodiments, inputting the feature map into the initial second model to obtain the position coordinates of the detected key points includes: generating an attention feature map according to the feature map; generating a predetermined number of second predicted heatmaps according to the attention feature map, where each second predicted heatmap corresponds to one key point; and, for each of the second predicted heatmaps, determining the position coordinates of the corresponding key point from the position of the pixel with the maximum probability in that heatmap.
In some embodiments, generating the attention feature map according to the feature map includes: passing the feature maps through different numbers of bottleneck blocks to obtain feature maps of different scales; upsampling the feature maps of different scales and fusing them to obtain a first feature map; inputting the feature maps of different scales into an attention model to obtain first attention maps of different resolutions; upsampling the first attention maps of different resolutions and fusing them to obtain a fused first attention map, and combining the fused first attention map with the first feature map to obtain a second feature map; inputting the second feature map into the attention model to obtain a second attention map; and combining the second attention map with the second feature map to obtain the attention feature map.
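The fusion and combination operations in the steps above can be illustrated with a minimal pure-Python sketch on small nested-list tensors. It rests on three assumptions, none of which the text specifies. "Fusing" the upsampled attention maps is taken to mean averaging them. "Combining" an attention map with a feature map is taken to mean element-wise multiplication. The learned attention model is replaced by a hypothetical toy, `toy_attention`, which takes the sigmoid of the per-pixel channel mean.

```python
import math
from typing import List

Map2D = List[List[float]]   # H x W
Feature = List[Map2D]       # C x H x W


def toy_attention(feature: Feature) -> Map2D:
    """Hypothetical stand-in for the learned attention model: the sigmoid of
    the per-pixel channel mean, giving one weight in (0, 1) per location."""
    c = len(feature)
    h, w = len(feature[0]), len(feature[0][0])
    return [[1.0 / (1.0 + math.exp(-sum(feature[k][i][j] for k in range(c)) / c))
             for j in range(w)] for i in range(h)]


def upsample_nearest(m: Map2D, factor: int) -> Map2D:
    """Nearest-neighbour upsampling by an integer factor."""
    return [[m[i // factor][j // factor] for j in range(len(m[0]) * factor)]
            for i in range(len(m) * factor)]


def fuse_attention(maps: List[Map2D]) -> Map2D:
    """Assumed fusion: upsample every attention map to the largest resolution
    and average them (resolutions must differ by integer factors)."""
    top_h = max(len(m) for m in maps)
    ups = [upsample_nearest(m, top_h // len(m)) for m in maps]
    h, w = len(ups[0]), len(ups[0][0])
    return [[sum(u[i][j] for u in ups) / len(ups) for j in range(w)]
            for i in range(h)]


def apply_attention(feature: Feature, att: Map2D) -> Feature:
    """Assumed combination: reweight every channel by the spatial attention map."""
    return [[[v * a for v, a in zip(row, arow)] for row, arow in zip(ch, att)]
            for ch in feature]
```

For example, fusing a 2 × 2 map `[[1.0, 0.0], [0.0, 1.0]]` with a 1 × 1 map `[[0.0]]` gives `[[0.5, 0.0], [0.0, 0.5]]`. Multiplying a feature map by such a map suppresses the locations that have low attention weights.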
In some embodiments, determining the second-layer loss value based on the position coordinates of the detected key points and the annotation information of the key points in the sample human body image includes: generating a ground-truth heatmap for each key point according to the annotation information of the key points in the sample human body image; generating a predetermined number of second predicted heatmaps according to the attention feature map, where each second predicted heatmap corresponds to one key point; and determining the second-layer loss value based on the positional deviation of each key point between the ground-truth heatmap and the second predicted heatmap.
In some embodiments, the method further includes: in response to determining that training of the initial first model and the initial second model is not complete, adjusting relevant parameters in the initial first model and the initial second model, reselecting samples from the sample set, and continuing to perform the training steps using the adjusted initial first model and the adjusted initial second model.
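The training procedure described above can be sketched as a generic loop: evaluate both loss values, decide whether training is complete, and otherwise adjust the parameters and reselect samples. This is a minimal sketch only. The stage callables, the `update` hook and the threshold-based completion test are hypothetical placeholders, because the text does not state the exact stopping criterion. In practice `update` would perform a backpropagation step through both models.

```python
import random
from typing import Any, Callable, List, Tuple

Sample = Tuple[Any, Any]  # (sample human body image, key point annotations)


def train_cascade(samples: List[Sample],
                  first_stage: Callable[[Any, Any], Tuple[Any, float]],
                  second_stage: Callable[[Any, Any], Tuple[Any, float]],
                  update: Callable[[float, float], None],
                  batch_size: int = 2,
                  loss_threshold: float = 1e-3,
                  max_iters: int = 1000,
                  seed: int = 0) -> int:
    """Repeat the training steps until both stages are judged trained.

    first_stage(image, labels) -> (feature_map, first_layer_loss)
    second_stage(feature_map, labels) -> (keypoints, second_layer_loss)
    update(loss1, loss2) adjusts the parameters of both models (hypothetical hook).
    Returns the number of parameter updates performed.
    """
    rng = random.Random(seed)
    for it in range(max_iters):
        # (Re)select samples from the sample set.
        batch = rng.sample(samples, min(batch_size, len(samples)))
        loss1 = loss2 = 0.0
        for image, labels in batch:
            features, l1 = first_stage(image, labels)
            _, l2 = second_stage(features, labels)
            loss1 += l1
            loss2 += l2
        # Assumed completion criterion: the summed two-layer loss is small enough.
        if loss1 + loss2 < loss_threshold:
            return it
        update(loss1, loss2)
    return max_iters
```

The two losses are kept separate until the completion check. This lets an implementation weight them or log them individually.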
In a second aspect, an embodiment of the present application provides a method for detecting human body key points, including: acquiring a human body image of a detection object; and inputting the human body image into a human body key point detection model generated by the method according to the first aspect, to generate the position coordinates of the human body key points of the detection object.
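At detection time, peak positions found on the output heatmaps must be mapped back to the original image. Below is a minimal sketch of that mapping. It rests on two assumptions. First, the input was zero-padded by `(pad_x, pad_y)` on the left and top and then resized by `scale`. Second, the heatmap is `stride` times smaller than the network input. These parameter names and the padding convention are hypothetical and are not taken from the text.

```python
from typing import List, Tuple


def heatmap_to_image(points: List[Tuple[float, float]], stride: float,
                     scale: float, pad_x: float, pad_y: float) -> List[Tuple[float, float]]:
    """Map key point positions found on a heatmap back to the original image.

    Assumed preprocessing (hypothetical): zero padding of (pad_x, pad_y) on the
    left/top, then a resize by `scale`; the heatmap is `stride` times smaller
    than the network input.
    """
    result = []
    for hx, hy in points:
        in_x, in_y = hx * stride, hy * stride                      # heatmap -> network input
        result.append((in_x / scale - pad_x, in_y / scale - pad_y))  # undo resize, then padding
    return result
```

Take, for instance, a stride of 4, a scale of 1.44 and a left padding of 75 pixels. Under these values, a peak at heatmap position (81, 108) maps back to approximately (150, 300) in the original crop.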
In a third aspect, an embodiment of the present application provides an apparatus for generating a human body key point detection model, including: an acquisition unit configured to acquire a sample set, where each sample in the sample set comprises a sample human body image and annotation information of the key points in the sample human body image; and a training unit configured to select samples from the sample set and perform the following training steps: inputting the sample human body image of a selected sample into an initial first model to obtain a feature map with a pyramid structure; determining a first-layer loss value based on the feature map and the annotation information of the key points in the sample human body image; inputting the feature map into an initial second model to obtain the position coordinates of the detected key points; determining a second-layer loss value based on the position coordinates of the detected key points and the annotation information of the key points in the sample human body image; determining, based on the first-layer loss value and the second-layer loss value, whether training of the initial first model and the initial second model is complete; and, in response to determining that training of the initial first model and the initial second model is complete, determining the initial first model and the initial second model as the human body key point detection model.
In some embodiments, the training unit is further configured to: input the sample human body image of the selected sample into a residual network to obtain the feature map output by the last convolutional layer of each residual block; and pass each of these feature maps through a fully convolutional layer, upsample them, and merge them through lateral connections to obtain the feature map with the pyramid structure.
In some embodiments, the training unit is further configured to: generate a ground-truth heatmap for each key point according to the annotation information of the key points in the sample human body image; generate a predetermined number of first predicted heatmaps according to the feature map, where each first predicted heatmap corresponds to one key point; and determine the first-layer loss value based on the positional deviation of each key point between the ground-truth heatmap and the first predicted heatmap.
In some embodiments, the training unit is further configured to: generate an attention feature map according to the feature map; generate a predetermined number of second predicted heatmaps according to the attention feature map, where each second predicted heatmap corresponds to one key point; and, for each of the second predicted heatmaps, determine the position coordinates of the corresponding key point from the position of the pixel with the maximum probability in that heatmap.
In some embodiments, the training unit is further configured to: pass the feature maps through different numbers of bottleneck blocks to obtain feature maps of different scales; upsample the feature maps of different scales and fuse them to obtain a first feature map; input the feature maps of different scales into an attention model to obtain first attention maps of different resolutions; upsample the first attention maps of different resolutions and fuse them to obtain a fused first attention map, and combine the fused first attention map with the first feature map to obtain a second feature map; input the second feature map into the attention model to obtain a second attention map; and combine the second attention map with the second feature map to obtain the attention feature map.
In some embodiments, the training unit is further configured to: generate a ground-truth heatmap for each key point according to the annotation information of the key points in the sample human body image; generate a predetermined number of second predicted heatmaps according to the attention feature map, where each second predicted heatmap corresponds to one key point; and determine the second-layer loss value based on the positional deviation of each key point between the ground-truth heatmap and the second predicted heatmap.
In some embodiments, the apparatus further includes an adjustment unit configured to: in response to determining that training of the initial first model and the initial second model is not complete, adjust relevant parameters in the initial first model and the initial second model, reselect samples from the sample set, and continue to perform the training steps using the adjusted initial first model and the adjusted initial second model.
In a fourth aspect, an embodiment of the present application provides an apparatus for detecting human body key points, including: a detection unit configured to acquire a human body image of a detection object; and a generating unit configured to input the human body image into a human body key point detection model generated by the method according to any implementation of the first aspect, to generate the position coordinates of the human body key points of the detection object.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and a storage apparatus for storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any implementation of the first aspect.
In a sixth aspect, an embodiment of the present application provides a computer-readable medium storing a computer program, where the computer program, when executed by a processor, implements the method according to any implementation of the first aspect.
In the method and device for generating a human body key point detection model provided by the embodiments of the present application, a pyramid model and an attention model are fused to generate the human body key point detection model. In a convolutional neural network, different depths correspond to different levels of semantic features. Shallow layers have high resolution and capture detail features, while deep layers have low resolution and capture semantic features. The cascade model connects two or more neural networks in series to obtain more context information. The pyramid structure addresses the multi-scale problem in object detection by changing the network connections, with essentially no increase in the computation of the original model. The attention model measures the importance of different features to the current task by computing feature weights, thereby emphasizing important features and suppressing unimportant ones. Together, these techniques help the network detect occluded or hidden human body key points more accurately.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method of generating a human keypoint detection model according to the application;
FIGS. 3a and 3b are schematic diagrams of an application scenario of a method for generating a human body keypoint detection model according to the application;
FIG. 4 is a flow diagram of yet another embodiment of a method of generating a human keypoint detection model according to the present application;
FIG. 5 is a schematic diagram of an embodiment of an apparatus for generating a human keypoint detection model according to the application;
FIG. 6 is a schematic block diagram of one embodiment of an apparatus for detecting key points in a human body according to the present application;
FIG. 7 is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments of the present application and the features in those embodiments may be combined with each other provided they do not conflict. The present application will be described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which a method of generating a human keypoint detection model, an apparatus for generating a human keypoint detection model, a method of detecting human keypoints, or an apparatus for detecting human keypoints according to an embodiment of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminals 101, 102, a network 103, a database server 104, and a server 105. The network 103 serves as a medium for providing communication links between the terminals 101, 102, the database server 104 and the server 105. Network 103 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user 110 may use the terminals 101, 102 to interact with the server 105 over the network 103 to receive or send messages or the like. The terminals 101 and 102 may have various client applications installed thereon, such as a model training application, a human key point detection and recognition application, a shopping application, a payment application, a web browser, an instant messenger, and the like.
Here, the terminals 101 and 102 may be hardware or software. When the terminals 101 and 102 are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), laptop portable computers, desktop computers, and the like. When the terminals 101 and 102 are software, they can be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
When the terminals 101, 102 are hardware, an image capturing device may be further mounted thereon. The image acquisition device can be various devices capable of realizing the function of acquiring images, such as a camera, a sensor and the like. The user 110 may use the image capturing device on the terminal 101, 102 to capture a human body image of himself or another person.
Database server 104 may be a database server that provides various services. For example, a database server may have a sample set stored therein. The sample set contains a large number of samples. The sample can include a sample human body image and labeling information of key points in the sample human body image. In this way, the user 110 may also select samples from a set of samples stored by the database server 104 via the terminals 101, 102.
The server 105 may also be a server providing various services, such as a background server providing support for various applications displayed on the terminals 101, 102. The background server may train the initial model by using samples in the sample set sent by the terminals 101 and 102, and may send a training result (e.g., the generated human body key point detection model) to the terminals 101 and 102. In this way, the user can apply the generated human key point detection model to perform human key point detection.
Here, the database server 104 and the server 105 may be hardware or software. When they are hardware, they can be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When they are software, they may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. No specific limitation is imposed here.
It should be noted that the method for generating a human body key point detection model or the method for detecting human body key points provided by the embodiments of the present application is generally performed by the server 105. Accordingly, the apparatus for generating a human body key point detection model or the apparatus for detecting human body key points is generally also provided in the server 105.
It is noted that database server 104 may not be provided in system architecture 100, as server 105 may perform the relevant functions of database server 104.
It should be understood that the number of terminals, networks, database servers, and servers in fig. 1 are merely illustrative. There may be any number of terminals, networks, database servers, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method of generating a human keypoint detection model according to the present application is shown. The method for generating the human body key point detection model can comprise the following steps:
Step 201, obtaining a sample set.
In this embodiment, the execution subject of the method of generating a human body key point detection model (e.g., the server 105 shown in FIG. 1) may obtain the sample set in a variety of ways. For example, the execution subject may obtain an existing sample set from a database server (e.g., the database server 104 shown in FIG. 1) via a wired or wireless connection. As another example, a user may collect samples via a terminal (e.g., the terminals 101, 102 shown in FIG. 1). In this way, the execution subject may receive the samples collected by the terminal and store them locally, thereby generating a sample set.
Here, the sample set may include at least one sample. The sample can include a sample human body image and annotation information associated with key points in the sample human body image.
Optionally, data augmentation may be performed on the training samples, including rotation, scaling, cropping, flipping, changing light intensity, etc. This produces augmented training data and helps the model generalize. Experiments show that the size of the input picture affects the accuracy of key point detection: within a certain range, the larger the input picture, the more accurate the detected key point positions. Because a person in a picture usually occupies a tall, narrow region, the method sets the input picture size to 864 x 648 as a trade-off between accuracy and computational cost. In implementation, the picture is cropped while keeping its aspect ratio unchanged, its edges are zero-padded, and it is then resized to 864 x 648. When data augmentation is performed on the picture, the corresponding operations, such as rotation, scaling and flipping, are also applied to the annotated key point coordinates.
In the present embodiment, the sample human body image generally refers to an image containing a human body. It may be a planar human body image or a stereoscopic human body image (i.e., a human body image containing depth information). The sample human body image may be a color image (e.g., an RGB (red-green-blue) photograph) and/or a grayscale image, etc. The format of the image is not limited in the present application. It may be JPEG (Joint Photographic Experts Group), BMP (bitmap), RAW (raw image format), or another format, as long as the execution subject can read and recognize it.
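The geometric part of this preprocessing can be sketched as follows. The crop is zero-padded to the target aspect ratio and then resized, and the annotated key points receive the same transform. The sketch also covers a horizontal flip, which must swap left and right key point labels. Three things here are assumptions rather than statements from the text. "864 x 648" is read as height x width. Padding is split evenly on both sides. The `flip_pairs` list of left/right index pairs is hypothetical. Rotation and light-intensity changes are not covered.

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Assumption: "864 x 648" is read as height x width (a portrait input).
TARGET_H, TARGET_W = 864, 648


def pad_and_resize_params(crop_w: int, crop_h: int,
                          target_w: int = TARGET_W,
                          target_h: int = TARGET_H) -> Tuple[float, int, int]:
    """Zero-pad a crop to the target aspect ratio, then resize it to the target
    size. Returns (scale, pad_x, pad_y); padding is split evenly (assumption)."""
    target_ratio = target_w / target_h
    if crop_w / crop_h < target_ratio:        # too narrow: pad left and right
        padded_w, padded_h = round(crop_h * target_ratio), crop_h
    else:                                     # too wide: pad top and bottom
        padded_w, padded_h = crop_w, round(crop_w / target_ratio)
    pad_x = (padded_w - crop_w) // 2
    pad_y = (padded_h - crop_h) // 2
    scale = target_h / padded_h               # same factor for x and y (up to rounding)
    return scale, pad_x, pad_y


def transform_keypoints(points: List[Point], scale: float,
                        pad_x: int, pad_y: int) -> List[Point]:
    """Apply the same padding and resize to the annotated key point coordinates."""
    return [((x + pad_x) * scale, (y + pad_y) * scale) for x, y in points]


def hflip_keypoints(points: List[Point], width: int,
                    flip_pairs: List[Tuple[int, int]]) -> List[Point]:
    """Mirror key points horizontally and swap left/right labels
    (flip_pairs is a hypothetical list of index pairs, e.g. left/right wrist)."""
    flipped = [(width - 1 - x, y) for x, y in points]
    for i, j in flip_pairs:
        flipped[i], flipped[j] = flipped[j], flipped[i]
    return flipped
```

Consider a 300 x 600 (width x height) crop. It is padded by 75 pixels on each side to 450 x 600 and then scaled by 1.44, so a key point at (150, 300) lands at roughly (324, 432) in the 648 x 864 input.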
Step 202, selecting samples from the sample set.
In this embodiment, the execution subject may select samples from the sample set obtained in step 201 and perform the training steps of step 203 to step 208. The manner of selection and the number of samples are not limited in the present application. For example, at least one sample may be selected at random, or samples whose human body images have better definition (i.e., higher resolution) may be selected.
Step 203, inputting the sample human body image of the selected sample into the initial first model to obtain a feature map with a pyramid structure.
In this embodiment, the execution subject may input the sample human body image of the sample selected in step 202 into the initial first model. A feature map containing the key points can be obtained by detecting and analyzing the key point regions in the sample human body image.
In this embodiment, the initial first model may be any of various existing neural network models created based on machine learning techniques. The neural network model may have any of various existing neural network structures (e.g., DenseBox, VGGNet, ResNet, SegNet, etc.). The storage location of the initial model is likewise not limited in this application.
As shown in FIG. 3a, in the first stage of the cascade model, ResNet-101 may be selected as the base network structure. The feature maps conv2, conv3, conv4 and conv5 output by the last convolutional layer of each residual block are taken out, and each is passed through a 1 × 1 fully convolutional layer. The feature maps of the feature pyramid structure are then obtained through upsampling and lateral connections between the layers.
Step 204, determining a first-layer loss value based on the feature map and the annotation information of the key points in the sample human body image.
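The top-down pathway described above can be sketched in pure Python for small nested-list tensors. The sketch follows the general feature-pyramid idea: 1 × 1 lateral convolutions, 2× upsampling, and element-wise addition. The nearest-neighbour upsampling mode and the example weights are assumptions. The 3 × 3 smoothing convolution often applied after each merge is omitted. This is an illustration of the technique, not the patent's implementation, which would operate on ResNet-101 features in a deep-learning framework.

```python
from typing import List

Map2D = List[List[float]]
Feature = List[Map2D]  # channels x height x width


def conv1x1(x: Feature, weight: List[List[float]]) -> Feature:
    """1 x 1 convolution without bias: out[o] = sum_c weight[o][c] * x[c]."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(weight[o][c] * x[c][i][j] for c in range(len(x)))
              for j in range(w)] for i in range(h)]
            for o in range(len(weight))]


def upsample2x(x: Feature) -> Feature:
    """Nearest-neighbour 2x upsampling of every channel (assumed mode)."""
    return [[[ch[i // 2][j // 2] for j in range(2 * len(ch[0]))]
             for i in range(2 * len(ch))] for ch in x]


def add(a: Feature, b: Feature) -> Feature:
    """Element-wise sum of two feature maps of identical shape."""
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def feature_pyramid(c_maps: List[Feature],
                    lateral_weights: List[List[List[float]]]) -> List[Feature]:
    """Top-down pathway with lateral connections.

    c_maps runs from the highest resolution (e.g. conv2) to the lowest
    (e.g. conv5); each level is assumed to be half the size of the previous
    one, and every lateral 1x1 conv must output the same number of channels.
    """
    laterals = [conv1x1(c, w) for c, w in zip(c_maps, lateral_weights)]
    outputs = [laterals[-1]]                  # the coarsest level starts the top-down path
    for lat in reversed(laterals[:-1]):
        outputs.insert(0, add(lat, upsample2x(outputs[0])))
    return outputs                            # same order as c_maps
```

Each finer level thus receives the upsampled semantic information from the coarser levels while keeping its own high-resolution detail. This is the balance between shallow and deep features that the description attributes to the pyramid structure.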
In this embodiment, the execution subject may analyze the annotation information of the key points of the sample human body image and the feature map obtained in step 203 to determine the first-layer loss value. For example, the feature map and the annotation information of the key points may be input as parameters into a specified loss function, so that the loss value between them can be calculated.
In this embodiment, the loss function is usually used to measure the degree of inconsistency between the model's predicted value (e.g., the feature map) and the actual value (e.g., the annotation information). It is a non-negative real-valued function. In general, the smaller the loss value, the closer the model's predictions are to the annotations. The loss function may be set according to actual requirements.
In some optional implementations of this embodiment, determining the first-layer loss value based on the feature map and the annotation information of the key points in the sample human body image includes: generating a ground-truth heatmap for each key point according to the annotation information of the key points in the sample human body image; generating a predetermined number of first predicted heatmaps according to the feature map, where each first predicted heatmap corresponds to one key point; and determining the first-layer loss value based on the positional deviation of each key point between the ground-truth heatmap and the first predicted heatmap. Specifically, each level of the feature map output in step 203 is passed sequentially through a 1 × 1 convolutional layer and a 3 × 3 convolutional layer to obtain the detected key point heatmaps at each resolution. The L2 loss between the detected key point heatmaps and the ground-truth key point heatmaps is then calculated as the first-layer loss function of the network.
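A minimal sketch of this multi-resolution L2 loss follows. The text does not say how the ground-truth heatmaps are drawn, so this sketch assumes the common choice of a 2-D Gaussian centred on each annotated key point, with a hypothetical `sigma` of 2 heatmap pixels. It also assumes a per-level mean squared error summed over the pyramid levels, rather than some other reduction.

```python
import math
from typing import List, Tuple

Heatmap = List[List[float]]


def gaussian_heatmap(h: int, w: int, cx: float, cy: float,
                     sigma: float = 2.0) -> Heatmap:
    """Assumed ground-truth heatmap: a 2-D Gaussian with peak value 1 at (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(w)] for y in range(h)]


def l2_loss(pred: List[Heatmap], target: List[Heatmap]) -> float:
    """Mean squared error over all key points and pixels of one level."""
    total, n = 0.0, 0
    for p_map, t_map in zip(pred, target):
        for p_row, t_row in zip(p_map, t_map):
            for p, t in zip(p_row, t_row):
                total += (p - t) ** 2
                n += 1
    return total / n


def first_layer_loss(pred_pyramid: List[List[Heatmap]],
                     keypoints: List[Tuple[float, float]],
                     img_w: int, img_h: int, sigma: float = 2.0) -> float:
    """Sum of per-level L2 losses; each level holds one predicted heatmap per key
    point. Key points in input-image pixels are rescaled to each level's size."""
    loss = 0.0
    for pred in pred_pyramid:
        h, w = len(pred[0]), len(pred[0][0])
        sx, sy = w / img_w, h / img_h
        target = [gaussian_heatmap(h, w, x * sx, y * sy, sigma) for x, y in keypoints]
        loss += l2_loss(pred, target)
    return loss
```

The second-layer loss described for the attention branch can reuse the same `l2_loss` on the second predicted heatmaps.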
Step 205, inputting the feature map into the initial second model to obtain the position coordinates of the detected key points.
In this embodiment, the execution subject may input the feature map generated in step 203 into the initial second model to obtain the position coordinates of the detected key points. The initial second model may be a neural network based on an attention model. Its main purpose is to extract the features of interest from feature maps of different scales while retaining both detail features and semantic features, thereby emphasizing important features and suppressing unimportant ones.
In some optional implementations of this embodiment, inputting the feature map into the initial second model to obtain the position coordinates of the detected key points includes: generating an attention feature map according to the feature map; generating a predetermined number of second predicted heatmaps according to the attention feature map, wherein each second predicted heatmap corresponds to one key point; and, for each of the predetermined number of second predicted heatmaps, detecting the position coordinates of the corresponding key point from the position of the maximum-probability pixel in that heatmap.
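Reading a key point off each predicted heatmap by its maximum-probability pixel amounts to a per-channel argmax. The sketch below assumes heatmaps are plain nested lists and that ties go to the first maximum in row-major order; `decode_keypoints` is a hypothetical name. The returned coordinates are in heatmap-grid units, and mapping them back to image pixels (for example by multiplying by the network's output stride) is an assumption the text does not spell out.

```python
def decode_keypoints(heatmaps):
    """Return the (x, y) position of the maximum-probability pixel of each heatmap.

    heatmaps: list of K 2D lists (rows of floats), one heatmap per key point.
    Ties are broken by the first maximum encountered in row-major order.
    """
    coords = []
    for hm in heatmaps:
        best_val, best_xy = float("-inf"), (0, 0)
        for y, row in enumerate(hm):
            for x, v in enumerate(row):
                if v > best_val:
                    best_val, best_xy = v, (x, y)
        coords.append(best_xy)
    return coords

hm_a = [[0.0, 0.1, 0.0],
        [0.2, 0.9, 0.3],
        [0.0, 0.1, 0.0]]
hm_b = [[0.0, 0.0, 0.0],
        [0.0, 0.1, 0.0],
        [0.0, 0.0, 0.8]]
print(decode_keypoints([hm_a, hm_b]))  # [(1, 1), (2, 2)]
```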
In some optional implementations of this embodiment, generating the attention feature map according to the feature map includes:
Step 2051, passing the feature maps through different numbers of bottleneck blocks to obtain feature maps of different scales.
In this embodiment, in the second stage of the cascade model, the feature map of each level output in step 203 is passed through a different number of bottleneck blocks to obtain feature maps of different scales. Deeper levels, which have smaller spatial sizes, are given more stacked bottleneck blocks, which achieves a good balance between performance and efficiency.
Step 2052, upsampling the feature maps of different scales and fusing them to obtain a first feature map.
In this embodiment, the feature maps of different scales obtained in step 2051 are upsampled and then combined by pixel-wise addition to obtain a feature map f_c; see 301 in fig. 3a.
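The upsample-then-add fusion of step 2052 can be illustrated on single-channel maps. This sketch assumes nearest-neighbour upsampling by integer factors and square maps whose sizes divide the target size; the text does not name the upsampling method (bilinear interpolation is another common choice), and `upsample_nearest` and `fuse_pixelwise_add` are hypothetical names.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2D map by an integer factor."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        for _ in range(factor):
            out.append(list(wide))
    return out

def fuse_pixelwise_add(fmaps, target_size):
    """Upsample every map to target_size x target_size and add them pixel-wise.

    fmaps: square 2D maps whose side lengths divide target_size.
    """
    fused = [[0.0] * target_size for _ in range(target_size)]
    for fmap in fmaps:
        up = upsample_nearest(fmap, target_size // len(fmap))
        for y in range(target_size):
            for x in range(target_size):
                fused[y][x] += up[y][x]
    return fused

levels = [
    [[1.0] * 4 for _ in range(4)],  # 4 x 4 level
    [[2.0, 0.0], [0.0, 2.0]],       # 2 x 2 level
    [[10.0]],                       # 1 x 1 level
]
f_c = fuse_pixelwise_add(levels, 4)
print(f_c[0])  # [13.0, 13.0, 11.0, 11.0]
```

Pixel-wise addition requires every branch to share one spatial size (and, in a real network, one channel count), which is why each scale is upsampled to the finest resolution first.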
Step 2053, inputting the feature maps of different scales into the attention model to obtain first attention maps of different resolutions.
In this embodiment, the attention model generates attention maps of different resolutions from the feature maps of different scales output in step 2051; see 302 in fig. 3a.
Step 2054, upsampling the first attention maps of different resolutions and fusing them to obtain a fused first attention map, and combining the fused first attention map with the first feature map to obtain a second feature map.
In this embodiment, the fused first attention map is combined with f_c obtained in step 2052 to obtain a refined feature map f_AM1; see 303 in fig. 3a.
Step 2055, inputting the second feature map into the attention model to obtain a second attention map.
In this embodiment, f_AM1 is further passed through the attention model to obtain a refined attention map AM_2, i.e., 304 in fig. 3a.
Step 2056, combining the second attention map with the second feature map to obtain the attention feature map.
In this embodiment, f_AM1 and AM_2 are combined to obtain a refined feature map f_out, i.e., 305 in fig. 3a. In this process, attention maps at different resolutions focus on different image features: low resolutions focus on global information, while high resolutions focus more on local details.
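Steps 2053 to 2056 can be sketched as a small single-channel pipeline. The text does not define what the attention model computes, how the upsampled attention maps are fused, or what "combining" an attention map with a feature map means, so this sketch makes three labelled assumptions: the attention model is a hypothetical per-pixel affine map followed by a sigmoid (standing in for learned convolutions), the upsampled first attention maps are fused by a pixel-wise mean, and combining means element-wise multiplication. The names `attention_model`, `multiply`, and `refine` are illustrative only.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2D map by an integer factor."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def attention_model(fmap, weight=1.0, bias=0.0):
    """Hypothetical stand-in for the learned attention model:
    per-pixel affine transform plus sigmoid, so every value lies in (0, 1)."""
    return [[sigmoid(weight * v + bias) for v in row] for row in fmap]

def multiply(a, b):
    """Element-wise product: the assumed meaning of 'combining' two maps."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def refine(multi_scale_maps, f_c):
    """Return f_out from the multi-scale maps (step 2051) and f_c (step 2052)."""
    size = len(f_c)
    # Step 2053: one first attention map per scale.
    att = [attention_model(m) for m in multi_scale_maps]
    # Step 2054: upsample to a common size, fuse (assumed pixel-wise mean),
    # then combine with f_c to obtain the second feature map f_AM1.
    ups = [upsample_nearest(a, size // len(a)) for a in att]
    fused = [[sum(u[y][x] for u in ups) / len(ups) for x in range(size)]
             for y in range(size)]
    f_am1 = multiply(fused, f_c)
    # Step 2055: refined attention map AM_2 computed from f_AM1.
    am2 = attention_model(f_am1)
    # Step 2056: combine AM_2 with f_AM1 to obtain f_out.
    return multiply(am2, f_am1)

scales = [[[0.0, 0.0], [0.0, 0.0]], [[0.0]]]
f_c = [[2.0, 2.0], [2.0, 2.0]]
f_out = refine(scales, f_c)
print(round(f_out[0][0], 4))  # 0.7311
```

Under these assumptions each attention map acts as a soft per-pixel gate, which is one way to read "concentrating the important features and weakening the unimportant features"; a learned attention model would produce spatially varying gates rather than the uniform ones in this toy input.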
Step 206, determining a second-layer loss value based on the position coordinates of the detected key points and the annotation information of the key points in the sample human body image.
In this embodiment, the execution subject may analyze the annotation information of the key points of the sample human body image together with the position coordinates of the key points obtained in step 205 to determine the second-layer loss value. For example, the position coordinates of the detected key points and the annotation information of the key points may be input as parameters to a predetermined loss function, and the loss value between them may be calculated.
In this embodiment, the loss function is generally used to measure the degree of inconsistency between the predicted value of the model (e.g., the position coordinates of the detected key points) and the actual value (e.g., the annotation information of the key points). It is a non-negative real-valued function. In general, the smaller the loss value, the better the robustness of the model. The loss function may be set according to actual requirements.
In some optional implementations of this embodiment, determining the second-layer loss value based on the position coordinates of the detected key points and the annotation information of the key points in the sample human body image includes: generating a ground-truth heatmap for each key point according to the annotation information of the key points in the sample human body image; generating a predetermined number of second predicted heatmaps according to the attention feature map, wherein each second predicted heatmap corresponds to one key point; and determining the second-layer loss value based on the positional deviation of each key point between the ground-truth heatmap and the second predicted heatmap. Specifically, the feature map f_out output in step 2056 is input into fully convolutional layer 2 (i.e., passed sequentially through a 1 × 1 convolutional layer and a 3 × 3 convolutional layer) to obtain the detected key point heatmaps. The number of heatmaps equals the number of key points, each heatmap corresponds to one key point, and the position of the maximum-probability pixel on each heatmap is the position coordinate of the detected key point. The L2 loss between the detected heatmaps and the ground-truth heatmaps of the key points is calculated as the second-stage loss of the network.
Step 207, determining whether training of the initial first model and the initial second model is complete based on the first-layer loss value and the second-layer loss value.
In this embodiment, the first-layer loss value and the second-layer loss value are added to obtain the total loss value of the network. In each training iteration, images and the corresponding key point annotation data are input, the first-layer and second-layer loss values are calculated by forward propagation, and their gradients are then calculated to complete back propagation through the network and update the parameters. Experiments show that if, after a certain number of iterations, the loss calculation is changed to consider only hard-to-detect key points, that is, only the several key point channels with the largest second-layer loss values are calculated and back-propagated, a better detection effect on hard-to-detect key points is achieved.
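The total loss and the switch to hard-key-point-only supervision can be sketched as follows. The text says only "after a certain number of iterations" and "several key point channels with larger second loss values", so `switch_iter` and `top_k` are hypothetical hyperparameters, and averaging the selected channels (and averaging all channels before the switch) is an assumption. In a real framework, channels outside the top-k would simply contribute no gradient.

```python
def per_keypoint_l2(pred_maps, true_maps):
    """Mean squared error of each key point channel, computed separately."""
    losses = []
    for pred, true in zip(pred_maps, true_maps):
        diffs = [(p - t) ** 2
                 for pr, tr in zip(pred, true) for p, t in zip(pr, tr)]
        losses.append(sum(diffs) / len(diffs))
    return losses

def hard_keypoint_loss(channel_losses, top_k):
    """Average only the top_k channels with the largest loss (the hard key points)."""
    hardest = sorted(channel_losses, reverse=True)[:top_k]
    return sum(hardest) / len(hardest)

def total_loss(first_layer_loss, second_channel_losses, iteration,
               switch_iter, top_k):
    """Total loss = first-layer loss + second-layer loss.

    Before switch_iter all key point channels count; afterwards the
    second-layer loss is restricted to the top_k hardest channels.
    """
    if iteration < switch_iter:
        second = sum(second_channel_losses) / len(second_channel_losses)
    else:
        second = hard_keypoint_loss(second_channel_losses, top_k)
    return first_layer_loss + second

channel_losses = [0.25, 0.5, 0.125, 1.0]
print(total_loss(1.0, channel_losses, iteration=0, switch_iter=10, top_k=2))   # 1.46875
print(total_loss(1.0, channel_losses, iteration=20, switch_iter=10, top_k=2))  # 1.75
```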
Based on the change in the loss values, the execution subject may determine whether training of the initial models is complete. As an example, if multiple samples are selected in step 202, the execution subject may determine that training of the initial first model and the initial second model is complete when the total loss value of every sample reaches the target value. As another example, the execution subject may compute the proportion of the selected samples whose total loss value reaches the target value; when this proportion reaches a preset sample ratio (e.g., 95%), it can be determined that training of the initial models is complete.
In this embodiment, if the execution subject determines that training of the initial first model and the initial second model is complete, it may proceed to step 208. If the execution subject determines that training of the initial first model and the initial second model is not complete, it may adjust the relevant parameters in both models, for example by using back propagation to update the weights of each convolutional layer in the initial first model and of each attention model in the initial second model. It may then return to step 202 to reselect samples from the sample set, so that the training steps described above can continue.
It should be noted that the manner of selecting samples is not limited in the present application. For example, when the sample set contains a large number of samples, the execution subject may select samples that have not yet been selected.
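The select, train, and check loop of steps 202 to 208, together with the sample-ratio completion criterion, can be outlined as a skeleton. It assumes that "reaching the target value" means a total loss at or below the target, and it delegates forward and backward propagation to a hypothetical `step_fn` callback that updates both models and returns one sample's total loss. Here parameters are updated on every batch, matching the per-iteration back propagation described for step 207, so the explicit "adjust if not complete" branch is folded into `step_fn`.

```python
import random

def training_complete(sample_losses, target, required_ratio=0.95):
    """True when the share of selected samples whose total loss is at or
    below the target value reaches required_ratio."""
    reached = sum(1 for loss in sample_losses if loss <= target)
    return reached / len(sample_losses) >= required_ratio

def train(sample_set, step_fn, target, batch_size, max_rounds=1000, seed=0):
    """Skeleton of steps 202 to 208.

    step_fn(sample) is a hypothetical callback that runs forward and backward
    propagation on one sample, updates both models, and returns its total loss.
    Returns True once the completion criterion is met, else False.
    """
    rng = random.Random(seed)
    for _ in range(max_rounds):
        batch = rng.sample(sample_set, batch_size)  # step 202: select samples
        losses = [step_fn(s) for s in batch]         # steps 203 to 206
        if training_complete(losses, target):        # step 207
            return True                              # step 208: keep the models
    return False
```

With `rng.sample`, samples are drawn without replacement within a batch; tracking which samples have already been used across batches, as the text suggests for large sample sets, would need extra bookkeeping.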
Step 208, in response to determining that the training of the initial first model and the initial second model is complete, determining the initial first model and the initial second model as the human body key point detection models.
In this embodiment, if the execution subject determines that the training of the initial first model and the initial second model is completed, the initial first model and the initial second model may be determined as the human body key point detection model.
Optionally, the execution subject may store the generated human body key point detection model locally, or may send it to a terminal or a database server.
According to the method provided by this embodiment of the application, adding the attention model to the cascaded feature pyramid model improves the accuracy of detecting human body key points that are hard to detect, occluded, or in rare poses. An example test result is shown in fig. 3b: the scheme can accurately detect 14 key points of the human body, namely the head, the neck, the left and right shoulders, the left and right elbows, the left and right wrists, the left and right hips, the left and right knees, and the left and right ankles, including key points that are hard to detect, occluded, or in rare poses.
Referring to fig. 4, a flowchart 400 of an embodiment of a method for detecting a human body provided by the present application is shown. The method for detecting a human body may include the steps of:
Step 401, acquiring a human body image of a detection object.
In the present embodiment, an execution subject (e.g., the server 105 shown in fig. 1) of the method for detecting a human body may acquire a human body image of a detection target in various ways. For example, the execution subject may obtain the human body image stored therein from a database server (e.g., database server 104 shown in fig. 1) through a wired connection manner or a wireless connection manner. As another example, the execution subject may also receive a human body image captured by a terminal (e.g., terminals 101, 102 shown in fig. 1) or other device.
In the present embodiment, the detection object may be any user, such as a user using a terminal, or another user who appears within the image capturing range. The human body image may be a color image and/or a grayscale image, etc., and its format is not limited in this application.
Step 402, inputting the human body image into the human body key point detection model, and generating the position coordinates of the human body key points of the detection object.
In this embodiment, the executing subject may input the human body image acquired in step 401 into the human body key point detection model, thereby generating a human body key point detection result of the detection object. The human body key point detection result may be position information for describing key points of the human body in the image.
In this embodiment, the human key point detection model may be generated by the method described in the embodiment of fig. 2. For a specific generation process, reference may be made to the related description of the embodiment in fig. 2, which is not described herein again.
It should be noted that the method for detecting a human body in this embodiment may be used to test the human body key point detection model generated in the foregoing embodiments, and the model can then be further optimized according to the test results. The method may also be a practical application of the human body key point detection model generated in the above embodiments. Using that model to detect human body key points can improve detection performance, for example by finding more human body key points and locating them more accurately, and in particular improves the accuracy of detecting key points that are hard to detect, occluded, or in rare poses.
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for generating a human body key point detection model, where the apparatus embodiment corresponds to the method embodiment shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 5, the apparatus 500 for generating a human body key point detection model of the present embodiment includes: an acquisition unit 501, a training unit 502 and an adjustment unit 503. The acquisition unit 501 is configured to acquire a sample set, wherein samples in the sample set include a sample human body image and annotation information of key points in the sample human body image. The training unit 502 is configured to select samples from the sample set and to perform the following training steps: inputting the sample human body image of a selected sample into the initial first model to obtain a feature map with a pyramid structure; determining a first-layer loss value based on the feature map and the annotation information of the key points in the sample human body image; inputting the feature map into the initial second model to obtain the position coordinates of the detected key points; determining a second-layer loss value based on the position coordinates of the detected key points and the annotation information of the key points in the sample human body image; determining whether training of the initial first model and the initial second model is complete based on the first-layer loss value and the second-layer loss value; and in response to determining that training of the initial first model and the initial second model is complete, determining the initial first model and the initial second model as the human body key point detection model.
In this embodiment, for the specific processing of the acquisition unit 501, the training unit 502 and the adjustment unit 503 of the apparatus 500 for generating a human body key point detection model, reference may be made to steps 201 to 208 in the embodiment corresponding to fig. 2.
In some optional implementations of this embodiment, the training unit 502 is further configured to: input the sample human body image of the selected sample into a residual network to obtain the feature map output by the last convolutional layer of each residual block; and pass the feature maps output by these convolutional layers through fully convolutional layers respectively, upsample them, and merge them through lateral connections to obtain the feature map of the pyramid structure.
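The lateral-connection pyramid described above follows the familiar feature pyramid pattern: start from the coarsest backbone output, upsample by 2, and add the lateral projection of the next finer level. The sketch below is simplified under stated assumptions: single-channel maps, a 1 × 1 convolution reduced to a scalar weight per level (a typical lateral layer in such pyramids, assumed here), nearest-neighbour upsampling, and hypothetical names `conv1x1`, `upsample2`, and `top_down_pyramid`.

```python
def upsample2(fmap):
    """Nearest-neighbour 2x upsampling of a 2D map."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.extend([list(wide), list(wide)])
    return out

def conv1x1(fmap, weight):
    """A single-channel 1 x 1 convolution reduces to scaling every pixel."""
    return [[weight * v for v in row] for row in fmap]

def top_down_pyramid(backbone_maps, lateral_weights):
    """Build pyramid features from backbone outputs ordered fine to coarse.

    Each map has twice the resolution of the next one. The result is also
    ordered fine to coarse.
    """
    laterals = [conv1x1(m, w) for m, w in zip(backbone_maps, lateral_weights)]
    pyramid = [laterals[-1]]  # the coarsest level starts the top-down pathway
    for lat in reversed(laterals[:-1]):
        up = upsample2(pyramid[0])
        merged = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(lat, up)]
        pyramid.insert(0, merged)
    return pyramid

backbone = [[[1.0] * 4 for _ in range(4)],  # finest residual-block output
            [[1.0] * 2 for _ in range(2)],
            [[1.0]]]                        # coarsest residual-block output
pyramid = top_down_pyramid(backbone, [1.0, 2.0, 3.0])
print([len(level) for level in pyramid])  # [4, 2, 1]
print(pyramid[0][0][0])                   # 6.0
```

Each finer level thus receives the coarser levels' semantic information through the upsampled path while keeping its own spatial detail through the lateral connection.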
In some optional implementations of this embodiment, the training unit 502 is further configured to: generate a ground-truth heatmap for each key point according to the annotation information of the key points in the sample human body image; generate a predetermined number of first predicted heatmaps according to the feature map, wherein each first predicted heatmap corresponds to one key point; and determine the first-layer loss value based on the positional deviation of each key point between the ground-truth heatmap and the first predicted heatmap.
In some optional implementations of this embodiment, the training unit 502 is further configured to: generate an attention feature map according to the feature map; generate a predetermined number of second predicted heatmaps according to the attention feature map, wherein each second predicted heatmap corresponds to one key point; and, for each of the predetermined number of second predicted heatmaps, detect the position coordinates of the corresponding key point from the position of the maximum-probability pixel in that heatmap.
In some optional implementations of this embodiment, the training unit 502 is further configured to: pass the feature map through different numbers of bottleneck blocks to obtain feature maps of different scales; upsample the feature maps of different scales and fuse them to obtain a first feature map; input the feature maps of different scales into an attention model to obtain first attention maps of different resolutions; upsample the first attention maps of different resolutions and fuse them to obtain a fused first attention map, and combine the fused first attention map with the first feature map to obtain a second feature map; input the second feature map into the attention model to obtain a second attention map; and combine the second attention map with the second feature map to obtain the attention feature map.
In some optional implementations of this embodiment, the training unit 502 is further configured to: generate a ground-truth heatmap for each key point according to the annotation information of the key points in the sample human body image; generate a predetermined number of second predicted heatmaps according to the attention feature map, wherein each second predicted heatmap corresponds to one key point; and determine the second-layer loss value based on the positional deviation of each key point between the ground-truth heatmap and the second predicted heatmap.
In some optional implementations of this embodiment, the apparatus 500 further includes the adjustment unit 503, configured to: in response to determining that training of the initial first model and the initial second model is not complete, adjust the relevant parameters in the initial first model and the initial second model, reselect samples from the sample set, and continue to perform the training steps using the adjusted initial first model and initial second model.
With further reference to fig. 6, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for detecting key points of a human body, which corresponds to the embodiment of the method shown in fig. 4, and which can be applied in various electronic devices.
As shown in fig. 6, the apparatus 600 for detecting key points of a human body according to the present embodiment includes: a detection unit 601 and a generation unit 602. Wherein the detection unit 601 is configured to acquire a human body image of the detection object. The generating unit 602 is configured to input the human body image into a human body key point detection model generated by the method described in the embodiment of fig. 2, and generate the position coordinates of the human body key points of the detection object.
It will be understood that the elements described in the apparatus 600 correspond to various steps in the method described with reference to fig. 4. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 600 and the units included therein, and are not described herein again.
Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The RAM 703 also stores various programs and data necessary for the operation of the system 700. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a touch panel, a keyboard, a mouse, a camera, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage section 708 as needed.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by the Central Processing Unit (CPU) 701, performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium of the present application can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor includes an acquisition unit and a training unit. As another example, it can also be described as: a processor includes a detection unit and a generation unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the acquisition unit may also be described as a "unit acquiring a sample set".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a sample set, wherein samples in the sample set comprise sample human body images and annotation information of key points in the sample human body images; select samples from the sample set, and perform the following training steps: inputting the sample human body image of a selected sample into the initial first model to obtain a feature map with a pyramid structure; determining a first-layer loss value based on the feature map and the annotation information of the key points in the sample human body image; inputting the feature map into the initial second model to obtain the position coordinates of the detected key points; determining a second-layer loss value based on the position coordinates of the detected key points and the annotation information of the key points in the sample human body image; determining whether training of the initial first model and the initial second model is complete based on the first-layer loss value and the second-layer loss value; and in response to determining that training of the initial first model and the initial second model is complete, determining the initial first model and the initial second model as the human body key point detection model.
Further, the one or more programs, when executed by the electronic device, may further cause the electronic device to: acquiring a human body image of a detection object; and inputting the human body image into the human body key point detection model to generate the position coordinates of the human body key points of the detection object. The human key point detection model may be generated by using the method for generating the human key point detection model described in the above embodiments.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the invention as referred to in the present application is not limited to the embodiments with a specific combination of the above-mentioned features, but also covers other embodiments with any combination of the above-mentioned features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (14)

1. A method of generating a human keypoint detection model, comprising:
acquiring a sample set, wherein samples in the sample set comprise sample human body images and annotation information of key points in the sample human body images;
selecting samples from the sample set, and performing the following training steps: inputting a sample human body image of a selected sample into an initial first model to obtain a feature map with a pyramid structure, wherein the initial first model adopts Resnet101 as its base network structure; determining a first-layer loss value based on the feature map and the annotation information of key points in the sample human body image; inputting the feature map into an initial second model to obtain the position coordinates of the detected key points, wherein the initial second model is a neural network based on an attention model; determining a second-layer loss value based on the position coordinates of the detected key points and the annotation information of the key points in the sample human body image; determining whether training of the initial first model and the initial second model is complete based on the first-layer loss value and the second-layer loss value; in response to determining that training of the initial first model and the initial second model is complete, determining the initial first model and the initial second model as the human body key point detection model;
wherein, the inputting the feature map into the initial second model to obtain the position coordinates of the detected key points comprises:
generating an attention feature map according to the feature map;
generating a predetermined number of second predicted heatmaps according to the attention feature map, wherein each second predicted heatmap corresponds to one key point;
for each of the predetermined number of second predicted heatmaps, detecting the position coordinates of the corresponding key point according to the position of the maximum-probability pixel in that heatmap;
the generating of the attention feature map according to the feature map comprises:
passing the feature map through different numbers of bottleneck blocks to obtain feature maps of different scales;
upsampling the feature maps of different scales and fusing them to obtain a first feature map;
inputting the feature maps of different scales into an attention model to obtain first attention maps of different resolutions;
upsampling the first attention maps of different resolutions and fusing them to obtain a fused first attention map, and combining the fused first attention map with the first feature map to obtain a second feature map;
inputting the second feature map into the attention model to obtain a second attention map;
and combining the second attention map and the second feature map to obtain an attention feature map.
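The attention branch of claim 1 can be illustrated with a minimal single-channel sketch. It is not the patented network: the bottleneck blocks are replaced by a hypothetical 2x2 average pooling (`bottleneck_stub`), the attention model by a hypothetical parameter-free element-wise sigmoid (`attention_stub`), fusion by element-wise averaging, and "combining" by element-wise multiplication. All of these are assumptions chosen only to make the order of operations in the claim concrete.

```python
import math
from typing import List

Grid = List[List[float]]  # single-channel 2-D map, indexed [row][col]


def bottleneck_stub(x: Grid) -> Grid:
    """Hypothetical stand-in for one bottleneck block: 2x2 average pooling (halves H and W)."""
    return [
        [
            (x[2 * i][2 * j] + x[2 * i][2 * j + 1] + x[2 * i + 1][2 * j] + x[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(len(x[0]) // 2)
        ]
        for i in range(len(x) // 2)
    ]


def upsample_to(x: Grid, h: int, w: int) -> Grid:
    """Nearest-neighbour upsampling to h x w (h and w assumed to be multiples of x's size)."""
    sh, sw = h // len(x), w // len(x[0])
    return [[x[i // sh][j // sw] for j in range(w)] for i in range(h)]


def attention_stub(x: Grid) -> Grid:
    """Hypothetical attention model: element-wise sigmoid, giving weights in (0, 1)."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in x]


def fuse(maps: List[Grid]) -> Grid:
    """Fuse equally sized maps by element-wise averaging (assumed fusion rule)."""
    n = len(maps)
    return [
        [sum(m[i][j] for m in maps) / n for j in range(len(maps[0][0]))]
        for i in range(len(maps[0]))
    ]


def combine(attn: Grid, feat: Grid) -> Grid:
    """Combine an attention map with a feature map by element-wise multiplication (assumed)."""
    return [[a * f for a, f in zip(ra, rf)] for ra, rf in zip(attn, feat)]


def attention_feature_map(feat: Grid, num_scales: int = 3) -> Grid:
    h, w = len(feat), len(feat[0])
    # Scale k is produced by k + 1 bottleneck blocks, so each scale is coarser than the last.
    scales, x = [], feat
    for _ in range(num_scales):
        x = bottleneck_stub(x)
        scales.append(x)
    # Upsample every scale to the input size and fuse them: the first feature map.
    first_feature = fuse([upsample_to(s, h, w) for s in scales])
    # One first attention map per scale, upsampled to a common size and fused.
    fused_attention = fuse([upsample_to(attention_stub(s), h, w) for s in scales])
    second_feature = combine(fused_attention, first_feature)
    # The second attention map is computed on the second feature map.
    second_attention = attention_stub(second_feature)
    return combine(second_attention, second_feature)
```

For an 8x8 input and three scales, the intermediate maps are 4x4, 2x2 and 1x1, and the output is again 8x8. In the patented model these would be multi-channel tensors produced by learned layers; the sketch only fixes the data flow.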
2. The method of claim 1, wherein inputting the sample human body image of the selected sample into the initial first model to obtain the feature map with the pyramid structure comprises:
inputting the sample human body image of the selected sample into a residual network to obtain the feature map output by the last convolutional layer of each residual block; and
passing each of the feature maps output by those convolutional layers through a fully convolutional layer, upsampling the results, and combining them through lateral connections to obtain the feature map with the pyramid structure.
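Claim 2 describes a feature-pyramid construction: a lateral convolution on each residual stage's output, followed by a top-down pathway that upsamples coarser levels and merges them into finer ones. A minimal single-channel sketch, assuming that the "fully convolutional layers" act as 1x1 lateral convolutions (modelled here by hypothetical scalar weights), that upsampling is nearest-neighbour 2x, and that the lateral connection is an element-wise sum:

```python
from typing import List

Grid = List[List[float]]  # single-channel 2-D map


def lateral_conv(x: Grid, weight: float, bias: float = 0.0) -> Grid:
    """Hypothetical single-channel 1x1 convolution: an affine map applied to every pixel."""
    return [[weight * v + bias for v in row] for row in x]


def upsample2(x: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling."""
    return [[x[i // 2][j // 2] for j in range(2 * len(x[0]))] for i in range(2 * len(x))]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum used for the lateral connection."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def build_pyramid(stages: List[Grid], weights: List[float]) -> List[Grid]:
    """stages: last-layer outputs of each residual stage, ordered fine to coarse,
    each exactly half the height and width of the previous one."""
    laterals = [lateral_conv(c, w) for c, w in zip(stages, weights)]
    pyramid = [laterals[-1]]  # the coarsest level starts the top-down pathway
    for lat in reversed(laterals[:-1]):
        # Upsample the current coarsest-so-far level and merge it with the finer lateral map.
        pyramid.insert(0, add(lat, upsample2(pyramid[0])))
    return pyramid  # ordered fine to coarse, like the input stages
```

With all-ones stages of sizes 4x4, 2x2 and 1x1 and unit weights, the pyramid levels hold 3, 2 and 1 respectively, showing how each finer level accumulates information from every coarser one.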
3. The method of claim 2, wherein determining the first-layer loss value based on the feature map and the annotation information of the key points in the sample human body image comprises:
generating a ground-truth heatmap for each key point from the annotation information of the key points in the sample human body image;
generating a predetermined number of first predicted heatmaps from the feature map, wherein each first predicted heatmap corresponds to one key point; and
determining the first-layer loss value based on the positional deviation of each key point between the ground-truth heatmap and the first predicted heatmap.
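The claims do not say how the ground-truth heatmap is drawn or how the "positional deviation" is measured. A common choice, assumed here rather than taken from the patent, is a 2-D Gaussian centred on the annotated key point and a per-pixel mean squared error between ground-truth and predicted heatmaps; `sigma` is a hypothetical parameter.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def gaussian_heatmap(h: int, w: int, center: Tuple[float, float], sigma: float = 1.0) -> Grid:
    """Ground-truth heatmap: a 2-D Gaussian peaking (value 1.0) at the annotated (x, y) key point."""
    cx, cy = center
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2)) for x in range(w)]
        for y in range(h)
    ]


def heatmap_loss(real: Sequence[Grid], predicted: Sequence[Grid]) -> float:
    """Mean squared error over all key points and pixels (assumed loss form)."""
    total, count = 0.0, 0
    for real_map, pred_map in zip(real, predicted):
        for real_row, pred_row in zip(real_map, pred_map):
            for r, p in zip(real_row, pred_row):
                total += (r - p) ** 2
                count += 1
    return total / count
```

Under the same assumptions, the second-layer loss of claim 4 would be `heatmap_loss` applied to the ground-truth heatmaps and the second predicted heatmaps.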
4. The method of claim 1, wherein determining the second-layer loss value based on the position coordinates of the detected key points and the annotation information of the key points in the sample human body image comprises:
generating a ground-truth heatmap for each key point from the annotation information of the key points in the sample human body image;
generating a predetermined number of second predicted heatmaps from the attention feature map, wherein each second predicted heatmap corresponds to one key point; and
determining the second-layer loss value based on the positional deviation of each key point between the ground-truth heatmap and the second predicted heatmap.
5. The method according to any one of claims 1-4, further comprising:
in response to determining that training of the initial first model and the initial second model is not complete, adjusting relevant parameters of the initial first model and the initial second model, reselecting samples from the sample set, and continuing to perform the training steps using the adjusted initial first model and initial second model.
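Claims 1 and 5 together describe a loop: select samples, compute a first-layer loss on the intermediate output and a second-layer loss on the final output, stop if training is complete, otherwise adjust both models and reselect samples. The toy sketch below keeps that control flow but replaces both networks with hypothetical scalar "models" (the first predicts `a * x`, the second `b * (a * x)`) trained by plain gradient descent. The completion rule, the sum of both losses falling below a threshold, is an assumption; the claims only state that completion is judged from both loss values.

```python
import random
from typing import List, Tuple


def train(
    sample_set: List[Tuple[float, float]],
    batch_size: int = 4,
    lr: float = 0.05,
    threshold: float = 1e-6,
    max_steps: int = 10_000,
    seed: int = 0,
) -> Tuple[float, float, int]:
    """Toy two-stage training loop with intermediate (first-layer) and final (second-layer) supervision."""
    rng = random.Random(seed)
    a, b = 0.5, 0.5  # hypothetical initial parameters of the first and second models
    for step in range(1, max_steps + 1):
        batch = rng.sample(sample_set, batch_size)  # (re)select samples from the sample set
        n = len(batch)
        loss1 = sum((a * x - y) ** 2 for x, y in batch) / n      # first-layer loss
        loss2 = sum((b * a * x - y) ** 2 for x, y in batch) / n  # second-layer loss
        if loss1 + loss2 < threshold:  # assumed completion criterion using both losses
            return a, b, step
        # Training not complete: adjust both models by gradient descent on loss1 + loss2,
        # then continue the loop with a freshly selected batch.
        grad_a = sum(2 * (a * x - y) * x + 2 * (b * a * x - y) * b * x for x, y in batch) / n
        grad_b = sum(2 * (b * a * x - y) * a * x for x, y in batch) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b, max_steps
```

For example, with samples `(x, 2x)` the loop converges to `a ≈ 2` (the first stage alone fits the target, as the first-layer loss demands) and `b ≈ 1`. Supervising the intermediate stage directly is what distinguishes this two-loss setup from training on the final output alone.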
6. A method for detecting human body key points, comprising:
acquiring a human body image of a target object; and
inputting the human body image into a human body key point detection model generated by the method of any one of claims 1 to 4, to generate position coordinates of the human body key points of the target object.
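At detection time, claim 1 states that each key point's coordinates come from the position of the maximum-probability pixel in its second predicted heatmap. A minimal sketch of that decoding step; the `stride` parameter, which maps heatmap cells back to input-image pixels, is a hypothetical addition, since the claims do not say whether the heatmaps are at input resolution.

```python
from typing import List, Tuple

Grid = List[List[float]]


def decode_keypoints(heatmaps: List[Grid], stride: int = 1) -> List[Tuple[int, int]]:
    """Return one (x, y) coordinate per heatmap: the location of its maximum-probability
    pixel, scaled by a hypothetical output stride back to input-image pixels."""
    coords = []
    for hm in heatmaps:
        best_value, best_x, best_y = float("-inf"), 0, 0
        for y, row in enumerate(hm):
            for x, value in enumerate(row):
                if value > best_value:  # strict '>' keeps the first maximum on ties
                    best_value, best_x, best_y = value, x, y
        coords.append((best_x * stride, best_y * stride))
    return coords
```

Plain argmax decoding limits precision to one heatmap cell; that is a consequence of the rule as claimed, not something the sketch adds.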
7. An apparatus for generating a human body key point detection model, comprising:
an acquisition unit configured to acquire a sample set, wherein each sample in the sample set comprises a sample human body image and annotation information of key points in the sample human body image;
a training unit configured to select samples from the sample set and to perform the following training steps: inputting a sample human body image of a selected sample into an initial first model to obtain a feature map with a pyramid structure, wherein the initial first model adopts Resnet101 as its backbone network; determining a first-layer loss value based on the feature map and the annotation information of the key points in the sample human body image; inputting the feature map into an initial second model to obtain position coordinates of detected key points, wherein the initial second model is a neural network based on an attention model; determining a second-layer loss value based on the position coordinates of the detected key points and the annotation information of the key points in the sample human body image; determining, based on the first-layer loss value and the second-layer loss value, whether training of the initial first model and the initial second model is complete; and in response to determining that training of the initial first model and the initial second model is complete, determining the initial first model and the initial second model as the human body key point detection model;
wherein the training unit is further configured to:
generate an attention feature map from the feature map;
generate a predetermined number of second predicted heatmaps from the attention feature map, wherein each second predicted heatmap corresponds to one key point; and
for each of the predetermined number of second predicted heatmaps, detect the position coordinates of the corresponding key point from the position of the pixel with the maximum probability in that heatmap;
and wherein the training unit is further configured to:
pass the feature map through different numbers of bottleneck blocks to obtain feature maps of different scales;
upsample the feature maps of different scales and fuse them together to obtain a first feature map;
input the feature maps of different scales into an attention model to obtain first attention maps of different resolutions;
upsample the first attention maps of different resolutions and fuse them together to obtain a fused first attention map, and combine the fused first attention map with the first feature map to obtain a second feature map;
input the second feature map into the attention model to obtain a second attention map; and
combine the second attention map with the second feature map to obtain the attention feature map.
8. The apparatus of claim 7, wherein the training unit is further configured to:
input the sample human body image of the selected sample into a residual network to obtain the feature map output by the last convolutional layer of each residual block; and
pass each of the feature maps output by those convolutional layers through a fully convolutional layer, upsample the results, and combine them through lateral connections to obtain the feature map with the pyramid structure.
9. The apparatus of claim 8, wherein the training unit is further configured to:
generate a ground-truth heatmap for each key point from the annotation information of the key points in the sample human body image;
generate a predetermined number of first predicted heatmaps from the feature map, wherein each first predicted heatmap corresponds to one key point; and
determine the first-layer loss value based on the positional deviation of each key point between the ground-truth heatmap and the first predicted heatmap.
10. The apparatus of claim 7, wherein the training unit is further configured to:
generate a ground-truth heatmap for each key point from the annotation information of the key points in the sample human body image;
generate a predetermined number of second predicted heatmaps from the attention feature map, wherein each second predicted heatmap corresponds to one key point; and
determine the second-layer loss value based on the positional deviation of each key point between the ground-truth heatmap and the second predicted heatmap.
11. The apparatus according to any one of claims 7-10, further comprising an adjustment unit configured to:
in response to determining that training of the initial first model and the initial second model is not complete, adjust relevant parameters of the initial first model and the initial second model, reselect samples from the sample set, and continue performing the training steps using the adjusted initial first model and initial second model.
12. An apparatus for detecting human body key points, comprising:
a detection unit configured to acquire a human body image of a target object; and
a generating unit configured to input the human body image into a human body key point detection model generated by the method according to any one of claims 1 to 5, to generate position coordinates of the human body key points of the target object.
13. An electronic device, comprising:
one or more processors;
a storage device storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-6.
14. A computer-readable medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811380813.2A CN109508681B (en) 2018-11-20 2018-11-20 Method and device for generating human body key point detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811380813.2A CN109508681B (en) 2018-11-20 2018-11-20 Method and device for generating human body key point detection model

Publications (2)

Publication Number Publication Date
CN109508681A CN109508681A (en) 2019-03-22
CN109508681B CN109508681B (en) 2021-11-30

Family

ID=65749172

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811380813.2A Active CN109508681B (en) 2018-11-20 2018-11-20 Method and device for generating human body key point detection model

Country Status (1)

Country Link
CN (1) CN109508681B (en)

Families Citing this family (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020220126A1 (en) 2019-04-30 2020-11-05 Modiface Inc. Image processing using a convolutional neural network to track a plurality of objects
CN110188634B (en) * 2019-05-14 2022-11-01 广州虎牙信息科技有限公司 Human body posture model construction method and device, electronic equipment and storage medium
CN110175544B (en) * 2019-05-14 2021-06-29 广州虎牙信息科技有限公司 Target model construction method and device, electronic equipment and storage medium
CN110210526A (en) * 2019-05-14 2019-09-06 广州虎牙信息科技有限公司 Predict method, apparatus, equipment and the storage medium of the key point of measurand
CN110287954A (en) * 2019-06-05 2019-09-27 北京字节跳动网络技术有限公司 Target area determines training method, device and the computer readable storage medium of model
CN110287846B (en) * 2019-06-19 2023-08-04 南京云智控产业技术研究院有限公司 Attention mechanism-based face key point detection method
CN110378253B (en) * 2019-07-01 2021-03-26 浙江大学 Real-time key point detection method based on lightweight neural network
CN110532981B (en) * 2019-09-03 2022-03-15 北京字节跳动网络技术有限公司 Human body key point extraction method and device, readable storage medium and equipment
CN110647834B (en) * 2019-09-18 2021-06-25 北京市商汤科技开发有限公司 Human face and human hand correlation detection method and device, electronic equipment and storage medium
CN112633305A (en) * 2019-09-24 2021-04-09 深圳云天励飞技术有限公司 Key point marking method and related equipment
CN110738654B (en) * 2019-10-18 2022-07-15 中国科学技术大学 Key point extraction and bone age prediction method in hip joint image
CN110895809B (en) * 2019-10-18 2022-07-15 中国科学技术大学 Method for accurately extracting key points in hip joint image
CN110766701B (en) * 2019-10-31 2020-11-06 北京推想科技有限公司 Network model training method and device, and region division method and device
CN110796105A (en) * 2019-11-04 2020-02-14 中国矿业大学 Remote sensing image semantic segmentation method based on multi-modal data fusion
CN110930385A (en) * 2019-11-20 2020-03-27 北京推想科技有限公司 Breast lump detection and positioning method and device
CN111126379B (en) * 2019-11-22 2022-05-17 苏州浪潮智能科技有限公司 Target detection method and device
CN111160111B (en) * 2019-12-09 2021-04-30 电子科技大学 Human body key point detection method based on deep learning
CN111126416A (en) * 2019-12-12 2020-05-08 创新奇智(重庆)科技有限公司 Engine chain wheel identification system and identification method based on key point detection
CN111027504A (en) * 2019-12-18 2020-04-17 上海眼控科技股份有限公司 Face key point detection method, device, equipment and storage medium
CN111127632B (en) * 2019-12-20 2023-06-02 北京奇艺世纪科技有限公司 Human modeling model acquisition method and device, electronic equipment and storage medium
CN111160375B (en) * 2019-12-31 2024-01-23 北京奇艺世纪科技有限公司 Three-dimensional key point prediction and deep learning model training method, device and equipment
CN111259822A (en) * 2020-01-19 2020-06-09 杭州微洱网络科技有限公司 Method for detecting key point of special neck in E-commerce image
CN111260774B (en) * 2020-01-20 2023-06-23 北京百度网讯科技有限公司 Method and device for generating 3D joint point regression model
CN111291692B (en) * 2020-02-17 2023-10-20 咪咕文化科技有限公司 Video scene recognition method and device, electronic equipment and storage medium
CN111292378A (en) * 2020-03-12 2020-06-16 南京安科医疗科技有限公司 CT scanning auxiliary method, device and computer readable storage medium
CN111402228B (en) * 2020-03-13 2021-05-07 腾讯科技(深圳)有限公司 Image detection method, device and computer readable storage medium
CN112102947B (en) * 2020-04-13 2024-02-13 国家体育总局体育科学研究所 Apparatus and method for body posture assessment
CN111523422B (en) * 2020-04-15 2023-10-10 北京华捷艾米科技有限公司 Key point detection model training method, key point detection method and device
CN111522986B (en) * 2020-04-23 2023-10-10 北京百度网讯科技有限公司 Image retrieval method, device, equipment and medium
CN111539377A (en) * 2020-05-11 2020-08-14 浙江大学 Human body movement disorder detection method, device and equipment based on video
CN111783535B (en) * 2020-05-28 2024-06-18 北京沃东天骏信息技术有限公司 Method and device for enhancing key point data and method and device for detecting key point
CN111860573B (en) * 2020-06-04 2024-05-10 北京迈格威科技有限公司 Model training method, image category detection method and device and electronic equipment
CN111695519B (en) * 2020-06-12 2023-08-08 北京百度网讯科技有限公司 Method, device, equipment and storage medium for positioning key point
CN111783948A (en) * 2020-06-24 2020-10-16 北京百度网讯科技有限公司 Model training method and device, electronic equipment and storage medium
CN111985556A (en) * 2020-08-19 2020-11-24 南京地平线机器人技术有限公司 Key point identification model generation method and key point identification method
CN112417947B (en) * 2020-09-17 2021-10-26 重庆紫光华山智安科技有限公司 Method and device for optimizing key point detection model and detecting face key points
CN112036516A (en) * 2020-11-04 2020-12-04 北京沃东天骏信息技术有限公司 Image processing method and device, electronic equipment and storage medium
CN112380981B (en) * 2020-11-12 2024-06-28 平安科技(深圳)有限公司 Face key point detection method and device, storage medium and electronic equipment
CN112686097A (en) * 2020-12-10 2021-04-20 天津中科智能识别产业技术研究院有限公司 Human body image key point posture estimation method
CN112785582B (en) * 2021-01-29 2024-03-22 北京百度网讯科技有限公司 Training method and device for thermodynamic diagram generation model, electronic equipment and storage medium
CN112861678B (en) * 2021-01-29 2024-04-19 上海依图网络科技有限公司 Image recognition method and device
CN113011304A (en) * 2021-03-12 2021-06-22 山东大学 Human body posture estimation method and system based on attention multi-resolution network
CN113408568B (en) * 2021-04-16 2024-04-16 科大讯飞股份有限公司 Related method, device and equipment for training detection model of object key points
CN113313063A (en) * 2021-06-21 2021-08-27 暨南大学 Ear detection method, electronic device and storage medium
CN113706463B (en) * 2021-07-22 2024-04-26 杭州键嘉医疗科技股份有限公司 Joint image key point automatic detection method and device based on deep learning
CN113870215B (en) * 2021-09-26 2023-04-07 推想医疗科技股份有限公司 Midline extraction method and device
CN114550207B (en) * 2022-01-17 2023-01-17 北京新氧科技有限公司 Method and device for detecting key points of neck and method and device for training detection model
CN114519401A (en) * 2022-02-22 2022-05-20 平安科技(深圳)有限公司 Image classification method and device, electronic equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229341A (en) * 2017-12-15 2018-06-29 北京市商汤科技开发有限公司 Classification method and apparatus, electronic device, computer storage medium, and program
CN108229445A (en) * 2018-02-09 2018-06-29 深圳市唯特视科技有限公司 Multi-person pose estimation method based on a cascaded pyramid network
CN108710830A (en) * 2018-04-20 2018-10-26 浙江工商大学 3D human pose estimation method combining a densely connected attention pyramid residual network with equidistant constraints

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Towards Accurate Multi-Person Pose Estimation in the Wild; George Papandreou et al.; arXiv:1701.01779v2; 2017-04-14; pp. 1-9 *
Dynamic Target Tracking in Surveillance Video Based on Visual Saliency; Li Bo et al.; Information Technology (信息技术); 2014-12-31; pp. 60-65 *

Also Published As

Publication number Publication date
CN109508681A (en) 2019-03-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant