Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present invention and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present invention. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The method for identifying the key points of the human body provided by the embodiment of the invention can be applied to terminal devices such as robots, mobile phones, tablet computers, wearable devices, vehicle-mounted devices, Augmented Reality (AR)/Virtual Reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPCs), netbooks, Personal Digital Assistants (PDAs) and the like, and the embodiment of the invention does not limit the specific types of the terminal devices at all. The robot may be a service robot, an underwater robot, an entertainment robot, a military robot, an agricultural robot, or the like.
As shown in fig. 1, the method for identifying key points of a human body according to the embodiment of the present invention includes the following steps S101 to S105:
Step S101, detecting N key points of a human body in a human body image through a key point detection model, and obtaining N heat maps and a background map output by the key point detection model; wherein N is an integer greater than or equal to 2.
In application, the key point detection model can be constructed based on any algorithm for detecting key points of human bones, and a human body image containing a human body is input into the key point detection model, so that N heat maps and a background map output by the key point detection model can be obtained. The human body contains 18 keypoints, and the N keypoints contain at least 2 of the 18 keypoints of the human body. The human body image can be an RGB image, an infrared image or a depth image.
Fig. 2 exemplarily shows a schematic diagram of the positions of the 18 key points of a human body, where the 18 key points are respectively marked as: Nose 0, Neck 1, Right Shoulder 2, Right Elbow 3, Right Wrist 4, Left Shoulder 5, Left Elbow 6, Left Wrist 7, Right Hip 8, Right Knee 9, Right Ankle 10, Left Hip 11, Left Knee 12, Left Ankle 13, Right Eye 14, Left Eye 15, Right Ear 16, and Left Ear 17.
In one embodiment, the N key points include at least 14 key points of 18 key points of the human body, including a nose, a neck, a right shoulder, a right elbow, a right wrist, a left shoulder, a left elbow, a left wrist, a right hip, a right knee, a right ankle, a left hip, a left knee, and a left ankle, and optionally at least one of a right eye, a left eye, a right ear, and a left ear.
In application, under the condition that the head action of a human body does not need to be recognized, the N key points can only comprise other key points except the head key points, so that the data volume is reduced, and the training speed is improved.
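As a non-limiting illustration, the numbering of fig. 2 and the selection of N key points may be sketched as follows. The names `KeyPoint`, `REQUIRED_KEY_POINTS`, `OPTIONAL_HEAD_KEY_POINTS` and `select_key_points` are assumptions introduced for this sketch and are not part of the embodiment.

```python
from enum import IntEnum


class KeyPoint(IntEnum):
    """The 18 key points, numbered as in fig. 2."""
    NOSE = 0
    NECK = 1
    RIGHT_SHOULDER = 2
    RIGHT_ELBOW = 3
    RIGHT_WRIST = 4
    LEFT_SHOULDER = 5
    LEFT_ELBOW = 6
    LEFT_WRIST = 7
    RIGHT_HIP = 8
    RIGHT_KNEE = 9
    RIGHT_ANKLE = 10
    LEFT_HIP = 11
    LEFT_KNEE = 12
    LEFT_ANKLE = 13
    RIGHT_EYE = 14
    LEFT_EYE = 15
    RIGHT_EAR = 16
    LEFT_EAR = 17


# The 14 key points listed in the embodiment (indices 0 to 13).
REQUIRED_KEY_POINTS = frozenset(KeyPoint(i) for i in range(14))
# The eyes and ears, which the embodiment treats as optional.
OPTIONAL_HEAD_KEY_POINTS = frozenset(KeyPoint) - REQUIRED_KEY_POINTS


def select_key_points(include_optional):
    """Return the sorted list of key points to detect (hypothetical helper)."""
    selected = set(REQUIRED_KEY_POINTS)
    if include_optional:
        selected |= OPTIONAL_HEAD_KEY_POINTS
    return sorted(selected)
```

Leaving out the optional key points reduces N from 18 to 14, which is one way to read the reduction in data volume described above.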
In application, the key point detection model may detect the N key points of only the human bodies to be detected in the human body image, without detecting the other human bodies. Each human body in the human body image that requires key point detection is a human body to be detected, and each human body that does not require key point detection is a human body not to be detected. A user can set any human body in the human body image as a human body to be detected according to actual needs; for example, all human bodies in the human body image can be set as human bodies to be detected. Each heat map contains the same key point of all the human bodies to be detected, that is, each type of key point corresponds to one heat map. For example, when the N key points include all 18 key points of a human body, 18 heat maps and a background map are obtained: the 1st heat map contains the noses of all the human bodies to be detected, the 2nd heat map contains their necks, and so on, and the 18th heat map contains their left ears. The background map is the image output by the key point detection model that corresponds to the background of the human body image; because no key point exists in the background, the background map is a blank image. Since the human body image does not necessarily show complete human bodies, some key points may be occluded and therefore not detected, so the heat maps output by the key point detection model may include blank images that contain no key point.
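The handling of the model output described above may be sketched as follows. This is a minimal sketch under stated assumptions: the output is taken to be a list of N + 1 two-dimensional maps with the background map last, and a map is treated as blank when no value exceeds a threshold. The channel order, the threshold value, and the function names are assumptions for illustration only.

```python
def split_model_output(channels, n):
    """Split N + 1 output maps into N heat maps and one background map.

    Assumes (hypothetically) that the background map is the last channel.
    """
    if len(channels) != n + 1:
        raise ValueError("expected %d maps, got %d" % (n + 1, len(channels)))
    return channels[:n], channels[n]


def is_blank(heat_map, threshold=0.1):
    """Return True if no position in the map exceeds the threshold."""
    return all(value <= threshold for row in heat_map for value in row)
```

A blank heat map then simply yields no key point of that type, which corresponds to the occluded case described above.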
Fig. 3 exemplarily shows the key points of all human bodies in one human body image.
Fig. 4 exemplarily shows the 18 heat maps output by the key point detection model.
Step S102, segmenting the regions where the key points of different human bodies are located in each heat map, and determining the outlines of the regions where the key points of different human bodies are located in each heat map;
Step S103, acquiring the heat map peak within the outline of each region where a key point is located;
Step S104, obtaining the coordinates of the key points of different human bodies in each heat map according to the coordinates of the heat map peak within the outline of the region where the key points of different human bodies are located in each heat map.
In application, after obtaining N heat maps, the coordinates of the key points in each heat map are identified, and then the coordinates of the key points belonging to the same human body are distributed to the same person, so that the coordinates of the N key points of each human body can be obtained.
In application, the closer a position in the heat map is to the coordinates of a key point, the higher its confidence. Therefore, the region where each key point is located is first segmented from the heat map, the contour of that region is then identified by any contour identification method, and the heat map peak is finally searched for within the contour; the coordinates of the heat map peak are taken as the coordinates of the key point.
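Steps S102 to S104 may be sketched as follows for a single heat map. This sketch segments regions by thresholding followed by a 4-connected flood fill, so the contour of each region is implicit as the boundary of the filled region rather than extracted by a dedicated contour method; that substitution, the threshold value, and the function name `find_key_points` are assumptions for illustration.

```python
from collections import deque


def find_key_points(heat_map, threshold=0.1):
    """Return one (x, y, confidence) peak per connected region above threshold."""
    height, width = len(heat_map), len(heat_map[0])
    visited = [[False] * width for _ in range(height)]
    peaks = []
    for y0 in range(height):
        for x0 in range(width):
            if visited[y0][x0] or heat_map[y0][x0] <= threshold:
                continue
            # Flood-fill one region and track its highest value.
            queue = deque([(x0, y0)])
            visited[y0][x0] = True
            best = (x0, y0, heat_map[y0][x0])
            while queue:
                x, y = queue.popleft()
                if heat_map[y][x] > best[2]:
                    best = (x, y, heat_map[y][x])
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not visited[ny][nx]
                            and heat_map[ny][nx] > threshold):
                        visited[ny][nx] = True
                        queue.append((nx, ny))
            peaks.append(best)
    return peaks
```

Each separated region yields one peak, so a heat map containing the same key point of several human bodies yields one coordinate per human body.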
In one embodiment, step S104 includes:
and distributing the key points of different human bodies in each heat map to corresponding human bodies according to the coordinates of the key points of different human bodies in each heat map, and obtaining the coordinates of the key points of different human bodies in each heat map.
In application, after obtaining the coordinates of each keypoint, the keypoints belonging to each human body are assigned to each human body according to the coordinates of each keypoint, and the coordinates of all the keypoints belonging to each human body are obtained.
In one embodiment, step S104 specifically includes:
and mapping the key points of different human bodies in each heat map to corresponding human bodies according to the coordinates of the key points of different human bodies in each heat map and the position of each human body, and obtaining the coordinates of the key points of different human bodies in each heat map.
In application, the position of each human body in the human body image can be obtained by identifying the human body image through a target identification method, then the key points belonging to each human body can be obtained according to the coordinates of the key points and the position of the human body, and then each key point is mapped to the corresponding human body.
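The mapping of key points to human bodies according to the position of each human body may be sketched as follows. The sketch assumes that the position of each human body is given as an axis-aligned bounding box `(x_min, y_min, x_max, y_max)` produced by some target identification method, and that a key point falling inside several boxes is given to the first matching body; both the box representation and this tie-breaking rule are assumptions for illustration.

```python
def assign_key_points(key_points_per_heat_map, body_boxes):
    """Map detected key points to human bodies by bounding box.

    key_points_per_heat_map: list indexed by key point type; each entry is a
        list of (x, y) coordinates found in that heat map.
    body_boxes: list of (x_min, y_min, x_max, y_max), one per human body.
    Returns one dict per body: {key point type index: (x, y)}.
    """
    bodies = [{} for _ in body_boxes]
    for kp_index, points in enumerate(key_points_per_heat_map):
        for x, y in points:
            for body, box in zip(bodies, body_boxes):
                x_min, y_min, x_max, y_max = box
                inside = x_min <= x <= x_max and y_min <= y <= y_max
                # Each body holds at most one key point of each type.
                if inside and kp_index not in body:
                    body[kp_index] = (x, y)
                    break
    return bodies
```

A key point type missing from a body's dictionary corresponds to a key point that was occluded or not detected for that human body.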
Step S105, drawing a joint feature map of each human body according to the coordinates of the key points of each human body.
In application, after obtaining the coordinates of the key points of each human body, the obtained key points of each human body can be connected according to the connection rule of adjacent key points in the joints of the human body, so as to obtain the joint feature map of each human body.
As shown in fig. 5, an effect diagram obtained by assigning key points to corresponding human bodies and connecting the key points is exemplarily shown.
In one embodiment, step S105 includes:
and drawing a joint characteristic diagram of each human body according to the coordinates of the key points of each human body and a preset key point connection rule.
In application, the preset key point connection rule is a rule preset according to the way adjacent key points are connected at the joints of a human body. For example: the right shoulder is connected with the right elbow, the right wrist with the right elbow, the left shoulder with the left elbow, the left wrist with the left elbow, the neck with the right hip and the left hip, the right hip with the right knee, the right knee with the right ankle, the left hip with the left knee, the left knee with the left ankle, the right ear with the right eye, the left ear with the left eye, and the nose with the right eye and the left eye.
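A preset key point connection rule of this kind, and the drawing of a joint feature map from it, may be sketched as follows. The index pairs use the numbering of fig. 2. The first and third upper-limb pairs are read here as shoulder-to-elbow connections, which is an assumption, as are the names `CONNECTION_RULE`, `joint_segments`, `draw_line` and `draw_joint_feature_map`. Lines are rasterized with Bresenham's algorithm onto a solid background, and key point coordinates are assumed to be integers.

```python
CONNECTION_RULE = [
    (2, 3), (3, 4),      # right shoulder-elbow (assumed), right elbow-wrist
    (5, 6), (6, 7),      # left shoulder-elbow (assumed), left elbow-wrist
    (1, 8), (1, 11),     # neck to right hip and left hip
    (8, 9), (9, 10),     # right hip-knee, right knee-ankle
    (11, 12), (12, 13),  # left hip-knee, left knee-ankle
    (16, 14), (17, 15),  # right ear-eye, left ear-eye
    (0, 14), (0, 15),    # nose to right eye and left eye
]


def joint_segments(key_points):
    """Return the segments whose two end key points were both detected."""
    return [(key_points[a], key_points[b])
            for a, b in CONNECTION_RULE
            if a in key_points and b in key_points]


def draw_line(image, p0, p1, value=1):
    """Rasterize a line between integer points with Bresenham's algorithm."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        if 0 <= y0 < len(image) and 0 <= x0 < len(image[0]):
            image[y0][x0] = value
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def draw_joint_feature_map(key_points, width, height):
    """Draw one body's joint feature map on a solid (zero) background."""
    image = [[0] * width for _ in range(height)]
    for p0, p1 in joint_segments(key_points):
        draw_line(image, p0, p1)
    return image
```

Segments with a missing end point are skipped, so an occluded key point shortens the drawn skeleton instead of producing a spurious line.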
In application, the joint feature map may be an image with a solid-color background that includes the key points of only one human body, or may be an image with a natural-color background that includes the key points of all human bodies.
As shown in fig. 6, in an embodiment, after step S105, the method further includes:
step S601, classifying the joint characteristic diagram of each human body through a classification network to obtain the posture category label of each human body.
In application, after the joint feature map of each human body is drawn, the joint feature maps of all the human bodies are input into the classification network as training data to train the classification network, and the posture category label of each human body, output after the classification network classifies the joint feature maps, is obtained. The classification network may be a lightweight classification network (ShuffleNet V2) whose last layer is a normalized exponential loss function (softmax loss).
As shown in fig. 7, in one embodiment, step S601 includes the following steps S701 to S703:
Step S701, respectively marking the left-side joints and the right-side joints in the joint feature map of each human body with different colors;
Step S702, adjusting the marked joint feature map of each human body to a preset size;
Step S703, training a classification network with the adjusted joint feature map of each human body, and obtaining the posture category label of each human body output by the classification network.
In application, the left part of joints and the right part of joints in the joint feature map of each human body are marked by different colors, so that the classification network can classify different human postures conveniently. Since the joint feature map has no background semantic meaning, each joint feature map does not need to be large, and therefore, the oversized joint feature map can be reduced or cropped to adjust to a preset size, for example, 64 × 64 pixels. When the joint feature map is drawn based on key points of a body part of a human body other than the head, the joint feature map may be referred to as a skeleton map.
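Steps S701 and S702 may be sketched as follows. The sketch picks a color for each skeleton segment from the key point indices of fig. 2 and resizes an image to the preset size by nearest-neighbor sampling. The specific colors, the nearest-neighbor method, and the names `segment_color` and `resize_nearest` are assumptions for illustration; any other color pair or resizing method could be used instead.

```python
# Key point indices of fig. 2 grouped by body side.
RIGHT_SIDE = {2, 3, 4, 8, 9, 10, 14, 16}
LEFT_SIDE = {5, 6, 7, 11, 12, 13, 15, 17}

RIGHT_COLOR = (255, 0, 0)   # hypothetical choice
LEFT_COLOR = (0, 0, 255)    # hypothetical choice
CENTER_COLOR = (0, 255, 0)  # for segments touching neither side


def segment_color(a, b):
    """Return the color of the segment joining key points a and b."""
    if a in LEFT_SIDE or b in LEFT_SIDE:
        return LEFT_COLOR
    if a in RIGHT_SIDE or b in RIGHT_SIDE:
        return RIGHT_COLOR
    return CENTER_COLOR


def resize_nearest(image, size=64):
    """Resize a 2-D pixel grid to size x size by nearest-neighbor sampling."""
    height, width = len(image), len(image[0])
    return [[image[y * height // size][x * width // size]
             for x in range(size)]
            for y in range(size)]
```

Because the joint feature map carries no background semantics, a coarse resize of this kind loses little information relevant to the posture.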
As shown in fig. 8, 10 skeleton images with solid background are exemplarily shown.
In one embodiment, step S702 includes:
adjusting the identified joint characteristic diagram of each human body to a preset size according to the resolution of the classification network;
step S703 includes:
training the classification network through the adjusted joint feature maps of all the human bodies to obtain the posture class label of each human body output by the normalization index loss function of the last layer of the classification network.
In application, the size of each joint feature map can be adjusted according to the input resolution supported by the classification network, then the classification network is trained by using the joint feature maps, and finally the posture class labels are output through the normalized exponential loss function of the classification network, namely the training of the classification network is completed.
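The role of the normalized exponential (softmax) function in the last layer may be sketched as follows. The sketch converts the raw scores of the classification network into class probabilities, reads off the posture category label, and computes the softmax loss for one training sample. The network itself is not shown, and the label names and function names are hypothetical.

```python
import math


def softmax(logits):
    """Normalized exponential function, shifted by the maximum for stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(logits, target_index):
    """Cross-entropy of the softmax output against the true class index."""
    return -math.log(softmax(logits)[target_index])


def posture_label(logits, labels):
    """Return the most probable posture category label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical posture categories and raw scores for one joint feature map.
LABELS = ["standing", "sitting", "lying"]
label, confidence = posture_label([0.1, 2.0, -1.0], LABELS)
```

During training the loss is minimized over all joint feature maps; at inference only `posture_label` is needed.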
In application, the human body image can be an image needing to recognize human body posture, and can also be a human body image specially used for training a classification network. If other human body images are to be identified, the other human body images are input into the key point detection model, and the human body key point identification method provided by the embodiment of the invention is executed again.
The human body key point identification method provided by the embodiment of the invention can effectively improve the identification precision of the key points of the human body. When the posture of a static human body is recognized by using the joint feature map drawn from the coordinates of the key points of the human body, the interference of clothes, backgrounds and the like in the human body image can be effectively eliminated, the amount of data required for training is reduced, the training speed is thereby improved, and the method has strong interpretability.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
The embodiment of the invention also provides a human body key point identification device which is used for executing the steps in the human body key point identification method. The human body key point identification device may be a virtual device (virtual application) in the terminal device, and is run by a processor of the terminal device, or may be the terminal device itself.
As shown in fig. 9, the human body key point identification apparatus provided in the embodiment of the present invention includes:
the key point detection module 101 is configured to detect key points at N joint positions of a human body in a human body image through a key point detection model, and obtain N heat maps and a background map output by the key point detection model; wherein N is an integer greater than or equal to 2;
a key point segmentation module 102, configured to segment regions where key points of different human bodies are located in each heat map, and determine outlines of the regions where key points of different human bodies are located in each heat map;
a heat map peak acquisition module 103, configured to acquire a heat map peak within the contour of the keypoint region;
a key point positioning module 104, configured to obtain coordinates of key points of different human bodies in each heat map according to coordinates of a heat map peak in an outline of an area where the key points of different human bodies in each heat map are located;
and the feature map drawing module 105 is configured to draw a joint feature map of each human body according to the coordinates of the key points of each human body.
In one embodiment, the human body key point identification apparatus further includes:
and the classification module is used for classifying the skeleton map of each human body through a classification network to obtain the posture classification label of each human body.
In application, each module in the human body key point identification device can be a software program module, can also be realized by different logic circuits integrated in a processor, and can also be realized by a plurality of distributed processors.
As shown in fig. 10, an embodiment of the present invention further provides a terminal device 200, including: at least one processor 201 (only one processor is shown in fig. 10), a memory 202, and a computer program 203 stored in the memory 202 and operable on the at least one processor 201, the steps in the various human keypoint identification method embodiments described above being implemented when the computer program 203 is executed by the processor 201.
In application, the terminal device may include, but is not limited to, a memory and a processor. Those skilled in the art will appreciate that fig. 10 is merely an example of a terminal device and does not constitute a limitation of the terminal device, which may include more or fewer components than those shown, combine some components, or include different components, such as an input-output device, a network access device, etc.
In application, the processor may include a Central Processing Unit (CPU) and a Graphics Processing Unit (GPU), and may also be another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. A general purpose processor may be a microprocessor or any conventional processor.
In some embodiments, the memory may be an internal storage unit of the terminal device, such as a hard disk or an internal memory of the terminal device. In other embodiments, the memory may also be an external storage device of the terminal device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the terminal device. Further, the memory may also include both an internal storage unit of the terminal device and an external storage device. The memory is used for storing an operating system, an application program, a Boot Loader, data, and other programs, such as the program code of the computer program. The memory may also be used to temporarily store data that has been output or is to be output.
It should be noted that, because the contents of information interaction, execution process, and the like between the above-mentioned devices/modules are based on the same concept as the method embodiment of the present invention, specific functions and technical effects thereof can be referred to specifically in the method embodiment section, and are not described herein again.
It will be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely illustrated, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions. Each functional module in the embodiments may be integrated into one processing module, or each module may exist alone physically, or two or more modules are integrated into one module, and the integrated module may be implemented in a form of hardware, or in a form of software functional module. In addition, the specific names of the functional modules are only for convenience of distinguishing from each other and are not used for limiting the protection scope of the present invention. The specific working process of the modules in the system may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
The embodiment of the invention provides a computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the human body key point identification method of any one of the embodiments is realized.
The embodiment of the invention provides a computer program product, and when the computer program product runs on a terminal device, the terminal device is enabled to execute the human body key point identification method of any one of the embodiments.
The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may be implemented by a computer program, which is stored in a computer-readable storage medium and used for instructing related hardware to implement the steps of the embodiments of the method. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include at least: any entity or apparatus capable of carrying computer program code to a terminal device, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random-Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus and the terminal device are merely illustrative, and for example, the division of the modules is only one logical division, and there may be other divisions when the actual implementation is performed, for example, a plurality of modules or components may be combined or may be integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.