CA3136990A1 - A human body key point detection method, apparatus, computer device and storage medium - Google Patents

A human body key point detection method, apparatus, computer device and storage medium

Info

Publication number
CA3136990A1
Authority
CA
Canada
Prior art keywords
human body
key point
obtaining
depth image
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CA3136990A
Other languages
French (fr)
Inventor
Zhaokun XU
Yunxi LI
Huaiyuan JI
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
10353744 Canada Ltd
Original Assignee
10353744 Canada Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 10353744 Canada Ltd filed Critical 10353744 Canada Ltd
Publication of CA3136990A1 publication Critical patent/CA3136990A1/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4007Interpolation-based scaling, e.g. bilinear interpolation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/187Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/66Analysis of geometric attributes of image moments or centre of gravity
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Abstract

The present invention discloses a human body key point detection method, apparatus, computer device, and storage medium. The method comprises: detecting whether a human body is present in the current scenario; if not, collecting a first depth image; if so, collecting a second depth image and storing the human body detection frame; comparing the frame difference between the collected first depth image and second depth image to obtain a depth changing area map; obtaining a human body mask code and, from it, a human body area image; obtaining a single human body area image, inputting it into a key point detection model, and outputting plural human body key points; obtaining the confidence level of each human body key point and obtaining the final human body key points accordingly. The present invention is a multi-resolution key point detection method based on a human body mask code; it improves key point detection accuracy, resolves the problem of key points being detected on the background, reduces detection errors such as wrong person positioning, wrong orders, and missing orders in self-service shops, reduces economic losses, and improves user experience.

Description

A HUMAN BODY KEY POINT DETECTION METHOD, APPARATUS, COMPUTER DEVICE AND
STORAGE MEDIUM
Field [0001] The present disclosure relates to the field of computer vision technology, particularly to a human body key point detection method, apparatus, computer device, and storage medium.
Background
[0002] With the continuous development of the computer vision field, human body key point detection algorithms have attracted more and more attention. Human body key point detection is very important for human action recognition, human body posture description, and even human body behavior prediction.
Human body key point detection is the cornerstone of many computer vision tasks, such as human-to-goods interaction in self-service shops, abnormal human action detection in various monitoring scenarios, action capture in movie production, and so on.
[0003] At present, human body key point detection methods based on deep learning architectures have achieved remarkable results in the case of a single person against a simple background. They mainly fall into two types, bottom-up and top-down. However, neither type achieves the desired effect in the case of complex backgrounds and many people at close distances. There are mainly the following two categories of problems:
[0004] 1. In actual scenarios with complex backgrounds, current algorithms are likely to mis-detect human body key points onto the background.
[0005] 2. When many people are close together, the target human body's key points are likely to be mistakenly detected on the limbs of nearby human bodies. At the same time, the key points of the other human bodies are likely to be mistakenly detected on the target body's limbs.
Invention Content
[0006] In order to solve the above-mentioned problems, the present invention provides a human body key point detection method, which includes:
[0007] Detecting whether there is a human body in the current scenario; if no human body is detected, collecting a first depth image in the current scenario and overwriting the first depth image of the previous scenario in which no human body was detected; if a human body is detected, collecting a second depth image in the current scenario and storing a human body detection frame corresponding to the second depth image;
[0008] Comparing frame difference between the collected first depth image and second depth image, obtaining a depth changing area map;
[0009] Obtaining a human body mask code, obtaining a human body area image according to the human body mask code;
[0010] Obtaining a single body area image by using the human body detection frame, inputting a key point detection model, outputting plural human body key points;
[0011] Obtaining confidence level of each key point of the human body and obtaining final human body key point according to the confidence level.
[0012] In an implementation, comparing frame difference between the collected first depth image and second depth image, comprising:

[0013] Pre-setting a difference threshold, where the difference threshold is determined by the scenario and the depth camera;
[0014] Subtracting the pixels of the first depth image and the second depth image one by one; if the pixel difference is greater than the threshold, recording the difference, and if the pixel difference is not greater than the threshold, recording zero.
[0015] In an implementation, obtaining a human body mask code, comprising:
[0016] Pre-setting a limited threshold of human body connected domain, determining size relationship between each non-zero connected domain in the depth changing area map and the limited threshold;
[0017] If the non-zero connected domain is greater than the limited threshold, determining that the non-zero connected domain is a human body connected domain and calculating the centroid of the human body connected domain;
[0018] If the non-zero connected domain is not greater than the limited threshold, determining that the non-zero connected domain is not a human body connected domain and discarding this domain;
[0019] Performing area growth by using each centroid as reference point to obtain first human body area mask code;
[0020] Performing human body instance segmentation on the second depth image, combining the mask codes of the segmented human body instances to obtain a second human body area mask code;
[0021] Merging the first human body area mask code and the second human body area mask code to obtain the human body mask code.

[0022] In an implementation, obtaining the human body area image through the human body mask code, comprising:
[0023] Intercepting the human body area in the second depth image by using the human body mask code, filtering out background of the image and obtaining the human body area image;
[0024] Pre-processing the human body area image.
[0025] In an implementation, obtaining single human body area image by using the human body detection frame, comprising:
[0026] Keeping center point of the human body detection frame unchanged, expanding the range of the human body detection frame in an equal proportion;
[0027] Intercepting the pre-processed human body area image according to the expanded human body detection frame, obtaining single human body area image, scaling the single human body area image to predetermined size.
[0028] In an implementation, inputting a key point detection model, outputting plural human body key points, comprising:
[0029] Inputting the single human body area image into the key point detection model;
[0030] The key point detection model outputs plural human body key point heat maps, and each heat map of the human body key point corresponds to a key point of the human body.

Date recue / Date received 2021-10-29
[0031] In an implementation, obtaining the confidence level of each key point of the human body and obtaining the final key point of the human body according to the confidence level, comprising:
[0032] Searching for the peak position of each human body key point heat map, determining the peak position as the detection position of the human body key point corresponding to the heat map, and determining the peak value as the confidence level of the human body key point;
[0033] Pre-setting a threshold of confidence level, and determining the relationship between the confidence level of each human body key point and the threshold of confidence level;
[0034] If the confidence level of the human body key point is greater than the threshold of confidence level, outputting the coordinates and the confidence level of the human body key point;
[0035] If the confidence level of the human body key point is not greater than the threshold of confidence level, discarding the human body key point;
[0036] Obtaining the final human body key point.
[0037] In an implementation, inputting a key point detection model, outputting plural human body key points, also comprising:
[0038] Inputting the single human body area image into the key point detection model;
[0039] Down-sampling the inputted single human body area image, obtaining a first characteristic map;
[0040] Linearly interpolating and down-sampling the first characteristic map respectively, obtaining a second characteristic map and a third characteristic map with different resolutions, turning on the corresponding first resolution branch, second resolution branch, and third resolution branch, and processing each through residual blocks respectively;
[0041] The first resolution branch, the second resolution branch, and the third resolution branch are respectively subjected to a first multi-resolution cross merge, wherein each branch is required to add its own characteristic map to the characteristic maps corresponding to all other branches, and after completing the merge, each branch is processed by residual blocks again;
[0042] After amplifying the characteristic maps corresponding to the first resolution branch and the second resolution branch by linear interpolation, performing the second multi-resolution cross merge with the characteristic map of the third resolution branch;
[0043] Outputting plural human body key point heat maps according to the obtained characteristic map after the second multi-resolution cross merge, each human body key point heat map corresponds to a human body key point.
[0044] A human body key point detection apparatus, comprising:
[0045] A detection module configured to detect whether there is a human body in the current scenario;
[0046] A first collection module configured to, if no human body is detected, collect a first depth image in the current scenario and overwrite the first depth image of the previous scenario in which no human body was detected;
[0047] A second collection module configured to, if a human body is detected, collect a second depth image in the current scenario and store a human body detection frame corresponding to the second depth image;
[0048] A comparison module configured to compare the frame difference between the collected first depth image and second depth image, obtaining a depth changing area map;
[0049] An interception module configured to obtain a human body mask code and obtain a human body area image according to the human body mask code;
[0050] A key point acquisition module configured to obtain a single body area image by using the human body detection frame, inputting a key point detection model, outputting plural human body key points;
[0051] A judgement module configured to obtain confidence level of each key point of the human body and obtaining final human body key point according to the confidence level.
[0052] A computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor implements the steps of any of the above-mentioned methods when executing the computer program.
[0053] A computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the above-mentioned methods.
[0054] The human body key point detection method, device and computer storage medium of the present invention provide a multi-resolution human body key point detection method based on a human body mask, which has the following effects:
[0055] 1. The present invention combines two human body mask extraction methods, one based on depth dynamic frame-difference area growth and one based on Yolact instance segmentation, to obtain complete human body mask information, and uses the mask to filter out complex background interference information in the corresponding image. It solves the problem of human body key points being incorrectly detected on complex backgrounds, and greatly improves the accuracy of the algorithm.

[0056] 2. The present invention innovatively uses a structure of multiple resolutions and multiple parallel network branches, so that the key point detection model can take account of both local fine-grained information and the global information of the whole person. It can accurately detect local fine-grained key points while paying attention to the connections between the key points of the whole person, thus solving the problem of key points being mis-detected on people nearby and improving the accuracy of the key points.
[0057] 3. The multi-resolution network branches in the human body key point detection network structure of the present invention cross and merge with each other only twice, located respectively in the middle and at the end of the entire network, which is intended to allow each branch network to fully analyze the image before cross merging. This avoids the interference caused by other branches in frequent multi-scale merging. While reducing the computational consumption required by the network model and speeding up the operation, it also reduces the interference between the resolution branches and improves the accuracy of the algorithm.
[0058] 4. The present invention uses an RGBD depth camera to obtain RGB color images and depth images of the scenario, introducing depth information on top of RGB color texture information by adding the depth information as an additional image channel, thereby expanding the range of information that the human body key point detection algorithm can perceive and increasing the accuracy of the algorithm.
Drawing Description
[0059] Figure 1 is a steps diagram of a human body key point detecting method in an implementation;
[0060] Figure 2 is a process diagram of a human body key point detecting method in an implementation;

[0061] Figure 3 is a detection model diagram of a human body key point detecting method in an implementation;
[0062] Figure 4 is a structure diagram of a human body key point detecting apparatus in an implementation;
[0063] Figure 5 is an internal structure of a computer device in an implementation.
Specific implementation methods
[0064] In order to make the purposes, technical solutions, and advantages of the present application clearer, the present disclosure is further explained in detail with a particular embodiment thereof and with reference to the drawings. It shall be appreciated that these descriptions are only intended to be illustrative, and not to limit the scope of the disclosure.
[0065] The human body key point detection method provided by the present application, in one implementation, as shown in Figure 1, Figure 2, including the following steps:
[0066] S100: detecting whether there is a human body in the current scenario; if no human body is detected, collecting a first depth image in the current scenario and overwriting the first depth image of the previous scenario in which no human body was detected; if a human body is detected, collecting a second depth image in the current scenario and storing a human body detection frame corresponding to the second depth image.
[0067] In the present implementation, a human body detection algorithm is first used to detect the human body in the current scenario and determine whether there is a human body in the current scenario.
[0068] Then, the judgment process in the following two directions is carried out according to the detection result.
[0069] If no human body is detected in the current scenario, the current scenario is regarded as a non-human-body condition, collecting the first depth image in the current scenario, retaining the file, and overwriting the first depth image of the previous scenario where no human body is detected.
[0070] If a human body is detected in the current scenario, the current scenario is regarded as a normal human body condition, collecting the second depth image in the current scenario, retaining the file, and storing the related information of the human body detection frame corresponding to the second depth image. Preferably, the second depth image is a RGBD image.
[0071] S200: comparing frame difference between the collected first depth image and second depth image, obtaining a depth changing area map.
[0072] In the present implementation, comparing the first depth image without human body and the second depth image with human body collected in step S100 by frame difference to obtain a depth changing area map.
[0073] Specifically, for frame difference processing, firstly, pre-setting the difference threshold according to the characteristics of scenario and depth camera, and then subtracting the pixels of the first depth image and the second depth image one by one, if the pixel difference is greater than the threshold, recording the difference, if the pixel difference is not greater than the threshold, then recording as zero.
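The following Python sketch illustrates this frame-difference step; the function name, array handling, and the example threshold value of 80 are assumptions for illustration only, since the patent leaves the threshold to be tuned per scenario and camera.

```python
import numpy as np

def depth_change_map(first_depth: np.ndarray,
                     second_depth: np.ndarray,
                     diff_threshold: float = 80.0) -> np.ndarray:
    """Pixel-wise frame difference between the background (first) depth image
    and the current (second) depth image.

    Differences not greater than the threshold are recorded as zero; larger
    differences are kept, producing the depth changing area map.
    """
    diff = np.abs(second_depth.astype(np.float32) - first_depth.astype(np.float32))
    diff[diff <= diff_threshold] = 0.0
    return diff
```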
[0074] S300: obtaining a human body mask code, obtaining a human body area image according to the human body mask code.
[0075] In the present implementation, obtaining the human body mask code by a merging method, and obtaining the human body area image through the human body mask code.
[0076] Specifically, obtaining the human body mask code includes:
[0077] Pre-setting the limited threshold of the human body connected domain, calculating the size of the non-zero connected domain in the depth change area obtained in step S200, and determining the size relationship between each non-zero value connected domain in the depth changing area map and the limited threshold.
[0078] If the non-zero connected domain is greater than the limited threshold, determining that the non-zero connected domain is a human body connected domain and calculating the centroid of the human body connected domain; the calculation method is as follows:
x_c = \frac{\sum_{i=1}^{I} x_i}{I}, \qquad y_c = \frac{\sum_{i=1}^{I} y_i}{I}
[0079] Among them, x_c and y_c represent the centroid coordinates of the human body connected domain, I is the number of pixels contained in the human body connected domain, and x_i, y_i are the coordinates of the i-th pixel in the connected domain.
[0080] If the non-zero connected domain is not greater than the limited threshold, determining that the non-zero connected domain is not a human body connected domain and discarding this domain.
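A minimal sketch of the connected-domain filtering and centroid computation described above, using SciPy's connected-component labelling; the size threshold value and the function name are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def human_body_centroids(change_map: np.ndarray,
                         size_threshold: int = 2000) -> list:
    """Keep non-zero connected domains larger than the size threshold and
    return the centroid (x_c, y_c) of each retained (human body) domain."""
    labels, num = ndimage.label(change_map > 0)      # label non-zero connected domains
    centroids = []
    for k in range(1, num + 1):
        ys, xs = np.nonzero(labels == k)
        if xs.size <= size_threshold:                # not a human body domain: discard
            continue
        centroids.append((xs.mean(), ys.mean()))     # x_c = sum(x_i)/I, y_c = sum(y_i)/I
    return centroids
```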
[0081] Performing area growth by using each centroid as reference point to obtain first human body area mask code.
[0082] Preferably, the area growth algorithm is specifically as follows: firstly, setting a growth threshold according to the accuracy and characteristics of the depth camera actually used; then, for the four nearest-neighbour pixels above, below, to the left, and to the right of the reference point, determining whether the difference between each pixel's depth value and that of the reference point is less than the growth threshold; if it is less than the growth threshold, the pixel is included in the growth area where the reference point is located; if it is not less than the growth threshold, the pixel is not included in the growth area where the reference point is located; then setting each newly added pixel of the growth area as a reference point, and repeating the above steps until no neighbouring pixel of the growth area can grow.
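The area growth step might be sketched as follows; this is a generic 4-neighbour region-growing implementation under the stated growth-threshold rule, with the seed coordinates (row, column) and the threshold value as illustrative assumptions.

```python
import numpy as np
from collections import deque

def region_grow(depth: np.ndarray, seed: tuple, grow_threshold: float = 50.0) -> np.ndarray:
    """4-neighbour region growing from a seed (reference point).

    A neighbour joins the region when the difference between its depth value and
    that of the current reference point is below the growth threshold; newly
    added pixels become reference points in turn.
    """
    h, w = depth.shape
    mask = np.zeros((h, w), dtype=np.uint8)
    queue = deque([seed])
    mask[seed] = 1
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # up, down, left, right
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]:
                if abs(float(depth[ny, nx]) - float(depth[y, x])) < grow_threshold:
                    mask[ny, nx] = 1
                    queue.append((ny, nx))
    return mask
```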
[0083] Using the Yolact deep learning human body instance segmentation network, performing instance segmentation on the second depth image, extracting the human body mask code of each instance, and then merging the obtained mask codes of the instances together to obtain the second human body area mask code. This second human body area mask code includes the human body mask codes of all human bodies in the second depth image.
[0084] Performing binarization processing on the first human body area mask code, setting non-zero valued pixels to 1, and zero-valued pixels are still 0. Then, the binarized first body area mask code and the second body area mask code are merged to obtain a complete human body mask code.
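One possible reading of the binarization-and-merge step is a pixel-wise union of the two binarized masks, as sketched below; the union operation itself is an assumption, since the text only states that the two mask codes are merged.

```python
import numpy as np

def merge_masks(first_area_mask: np.ndarray, second_area_mask: np.ndarray) -> np.ndarray:
    """Binarize the frame-difference / region-growing mask (non-zero -> 1) and
    merge it with the instance-segmentation mask by pixel-wise union, giving
    the complete human body mask."""
    first_bin = (first_area_mask != 0).astype(np.uint8)
    second_bin = (second_area_mask != 0).astype(np.uint8)
    return np.logical_or(first_bin, second_bin).astype(np.uint8)
```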
[0085] Furthermore, obtaining the human body area image through the human body mask code includes:
multiplying the human body mask code obtained by the above-mentioned method with the second depth image, intercepting the human body area in the second depth image, and filtering out the background part in the image to obtain the human body area image, and pre-processing the human body area image.
[0086] Specifically, the pre-processing includes numerically truncating the depth channel of the human body area image to an appropriate range according to the scenario characteristics, and then uniformly normalizing the human body area image. In an implementation, in the overhead shooting scenario of the self-service shop, because the maximum possible shooting depth of the overhead camera in the self-service shop is 3500, the depth channel value is preferably truncated to the range of [0, 3500], and any value greater than 3500 is regarded as noise. In implementations of different scenarios, the range of the depth channel truncation can be adjusted according to the actual situation of the scenario.
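A hedged sketch of this pre-processing, assuming an H x W x 4 RGBD array whose last channel is depth and a simple division-based normalization (the exact normalization scheme is not specified in the text):

```python
import numpy as np

def preprocess_body_area(body_area: np.ndarray, max_depth: float = 3500.0) -> np.ndarray:
    """Truncate the depth channel to [0, max_depth] (larger values are treated
    as noise) and normalize the RGBD image to [0, 1]."""
    out = body_area.astype(np.float32).copy()
    out[..., 3] = np.clip(out[..., 3], 0.0, max_depth) / max_depth   # depth channel
    out[..., :3] = out[..., :3] / 255.0                              # RGB channels
    return out
```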
[0087] S400, obtaining a single body area image by using the human body detection frame, inputting a key point detection model, outputting plural human body key points.
[0088] In the present implementation, the human body detection frame corresponding to the second depth image stored in step S100 is used to process the human body area image obtained in step S300 to obtain a single human body area image. Preferably, the single human body area image is an RGBD four-channel image; the obtained single human body area image is input into the key point detection model, which outputs plural human body key points.
[0089] Specifically, using the stored human body detection frame corresponding to the current second depth image, firstly, keeping the center point of the human body detection frame unchanged, expanding the range of the human body detection frame to 1.25 times of the original detection range, and then according to the expanded range of the human body detection frame, cutting out a single human body area image from the pre-processed human body area image, and the single human body area image is scaled to a pre-determined size, preferably a size with a resolution of 256*256.
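The detection-frame expansion, cropping, and scaling might look like the following sketch; the clamping to image borders and the OpenCV resize call are assumptions added to make the example runnable.

```python
import numpy as np
import cv2

def crop_single_body(body_area: np.ndarray, box: tuple,
                     scale: float = 1.25, out_size: int = 256) -> np.ndarray:
    """Expand a detection box (x1, y1, x2, y2) about its centre by `scale`,
    crop the pre-processed human body area image, and resize to out_size."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w, half_h = (x2 - x1) * scale / 2.0, (y2 - y1) * scale / 2.0
    h, w = body_area.shape[:2]
    nx1, ny1 = max(int(cx - half_w), 0), max(int(cy - half_h), 0)
    nx2, ny2 = min(int(cx + half_w), w), min(int(cy + half_h), h)
    crop = body_area[ny1:ny2, nx1:nx2]
    return cv2.resize(crop, (out_size, out_size), interpolation=cv2.INTER_LINEAR)
```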
[0090] Inputting the scaled single human body area image into a pre-trained multi-resolution key point detection model; the model outputs plural human body key point heat maps, and each human body key point heat map corresponds to a human body key point. Preferably, the model outputs 10 human body key point heat maps to obtain 10 human body key points. The 10 human body key point heat maps correspond, in sequence, to the human body's left ear key point, right ear key point, left shoulder key point, right shoulder key point, left elbow key point, right elbow key point, left wrist key point, right wrist key point, left hip key point, and right hip key point.

[0091] In one of the implementations, the network structure of the multi-resolution key point detection model of the present invention uses the standard residual block and the Bottleneck residual block of the ResNet network as the basic building blocks. The network structure includes the following steps, as shown in Figure 3:
[0092] Receiving the incoming scaled image of a single human body area that is 256*256*4.
[0093] Using a convolution layer with two consecutive 3*3 convolution kernels and a stride of 2 to down-sample the input single human body area image to 64*64.
[0094] Inputting the 64*64 characteristic map into the Bottleneck residual block, the basic unit of the ResNet network.
[0095] Performing linear interpolation and down-sampling on the characteristic map output by the Bottleneck block to obtain two characteristic maps with different resolutions, 128*128 and 32*32 respectively, thereby opening these two network branches of different resolutions.
[0096] The three branches of different resolutions (32*32, 64*64, 128*128) are each processed by 4 ResNet standard residual blocks, during which the resolution of each branch remains unchanged.
[0097] For the first multi-resolution cross merge, the three branches of different resolutions are completely merged, and each branch needs to add the characteristic map of its own branch to the characteristic maps of all other branches. Before adding, the characteristic maps of different branches need to be scaled to the same size.

[0098] Specifically, for example, on the 128*128 resolution branch, the input characteristic map has 128*128 resolution; the 64*64 and 32*32 characteristic maps are enlarged to 128*128 through 2x and 4x linear interpolation and added to the original 128*128 characteristic map. Conversely, when reducing a characteristic map, 3*3 convolutions are used for down-sampling: one 3*3 strided convolution down-samples to one-half, and two 3*3 strided convolutions down-sample to one-quarter.
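As a simplified, non-authoritative sketch of one such cross merge, the snippet below rescales every other branch's characteristic map to the current branch's resolution and sums them; it uses bilinear interpolation in both directions, whereas the described network uses strided 3*3 convolutions for the reductions.

```python
import torch
import torch.nn.functional as F

def cross_merge(feats: list) -> list:
    """Multi-resolution cross merge: every branch receives the sum of its own
    characteristic map and all other branches' maps rescaled to its resolution.

    `feats` is a list of tensors of shape (N, C, H_i, W_i), e.g. at 128, 64,
    and 32 resolution; equal channel counts across branches are assumed.
    """
    merged = []
    for i, target in enumerate(feats):
        out = target.clone()
        size = target.shape[-2:]
        for j, other in enumerate(feats):
            if i != j:
                out = out + F.interpolate(other, size=size, mode='bilinear',
                                          align_corners=False)
        merged.append(out)
    return merged
```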
[0099] After completing interactive merge, each branch passes through 4 ResNet standard residual blocks again.
[0100] The 64*64 and 32*32 resolution branches are enlarged to 128*128 by 2x and 4x linear interpolation respectively and then merged by adding them to the 128*128 branch characteristic map, achieving the second multi-resolution cross merge.
[0101] After the merge, the 128*128 characteristic map is subjected to a 1*1*10 convolution operation to obtain the heat map outputs corresponding to the 10 key points, and finally each heat map is enlarged to the input size of 256*256 through linear interpolation.
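A minimal PyTorch sketch of this output head, assuming an arbitrary input channel count; the 1*1 convolution to 10 heat maps and the enlargement to 256*256 follow the text above, while everything else is illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KeypointHead(nn.Module):
    """Output head: a 1x1 convolution maps the merged 128x128 characteristic
    map to 10 key point heat maps, which are then enlarged to the 256x256
    input size by linear interpolation."""

    def __init__(self, in_channels: int = 32, num_keypoints: int = 10):
        super().__init__()
        self.head = nn.Conv2d(in_channels, num_keypoints, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        heatmaps = self.head(x)                                   # (N, 10, 128, 128)
        return F.interpolate(heatmaps, size=(256, 256),
                             mode='bilinear', align_corners=False)
```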
[0102] S500: obtaining confidence level of each human body key point and obtaining final human body key point according to the confidence level.
[0103] In the present implementation, obtaining the confidence level of each human body key point according to the human body key point heat maps, and obtaining the final human body key points according to the confidence levels.
[0104] Specifically, searching for the coordinate of the highest response point of each human body key point heat map, which is the peak position; determining the found peak position as the detection position of the human body key point corresponding to that heat map, and taking the peak value as the confidence level of the human body key point.
[0105] Pre-setting a threshold of confidence level to determine the relationship between the confidence level of each human body key point and the threshold of confidence level. If the confidence level of the human body key point is greater than the threshold of confidence level, outputting the coordinates and confidence level of the human body key point; if the confidence level of the human body key point is not greater than the threshold of confidence level, discarding the human body key point.
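The peak search and confidence thresholding can be sketched as below; the threshold value of 0.3 and the handling of discarded key points (returned as None) are assumptions for illustration.

```python
import numpy as np

def decode_keypoints(heatmaps: np.ndarray, conf_threshold: float = 0.3) -> list:
    """For each of the K heat maps (K x H x W), take the peak position as the
    key point location and the peak value as its confidence; key points whose
    confidence does not exceed the threshold are discarded."""
    results = []
    for hm in heatmaps:
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        conf = float(hm[y, x])
        results.append(((int(x), int(y)), conf) if conf > conf_threshold else None)
    return results
```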
[0106] The human body key points that remain after this thresholding are the obtained final human body key points.
[0107] Although the above-mentioned steps in the flowchart are shown in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order in which these steps must be performed, and they can be performed in other orders. In addition, at least part of the steps in the flowchart can include multiple sub-steps or multiple stages; these sub-steps or stages are not necessarily completed at the same time but can be executed at different times, and their execution order is also not necessarily sequential, but can alternate with other steps or with at least part of the sub-steps or stages of other steps.
[0108] In an implementation, a human body key point detection apparatus is provided, comprising:
detection module 100, first collection module 200, second collection module 300, comparison module 400, interception module 500, key point acquisition module 600, judgement module 700. Wherein:
[0109] A detection module 100 configured to detect whether there is a human body in the current scenario.
[0110] A first collection module 200 configured to, if no human body is detected, collect a first depth image in the current scenario and overwrite the first depth image of the previous scenario in which no human body was detected.
[0111] A second collection module 300 configured to, if a human body is detected, collect a second depth image in the current scenario and store a human body detection frame corresponding to the second depth image.
[0112] A comparison module 400 configured to compare frame difference between the collected first depth image and second depth image, obtaining a depth changing area map.
[0113] An interception module 500 configured to obtain a human body mask code and obtain a human body area image according to the human body mask code.
[0114] A key point acquisition module 600 configured to obtain a single body area image by using the human body detection frame, inputting a key point detection model, outputting plural human body key points; and
[0115] A judgement module 700 configured to obtain confidence level of each key point of the human body and obtaining final human body key point according to the confidence level.
[0116] For the specific limitations of the human body key point detection apparatus, reference can be made to the above limitations of the human body key point detection method, which will not be repeated here. Each module of the above apparatus can be implemented fully or partly by software, hardware, and their combinations. The above modules can be embedded in or independent of the processor of the computer device, and can be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
[0117] In an implementation, a computer device is provided to be a server, whose internal structure diagram is shown in Figure 5. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is configured to provide calculation and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external terminal through a network connection.
The computer program is executed by the processor to implement a human body key point detection method.
[0118] Those skilled in the art can understand that the structure shown in Figure 5 is only a partial structural diagram related to the solution of this application and does not constitute a limitation on the computer device to which the solution of the present application is applied; the specific computer device can include more or fewer components than what is shown in the figure, or combine some components, or have a different arrangement of components.
[0118] In an implementation, a computer device is provided which includes a memory, a processor, and a computer program stored on the memory and running on the processor. The processor achieves the above-mentioned human body key point detection method when executing the computer program.
[0119] Those skilled in the art can understand that all or part of the procedures of the above-mentioned methods can be implemented by instructing related hardware through a computer program; the computer program can be stored in a non-volatile computer readable storage medium, and when executed, the computer program can include the procedures of the implementations of the above-mentioned methods. Any reference to the memory, storage, database, or other media used in each implementation provided in the present application can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
[0120] The invention discloses a human body key point detection method, apparatus, computer device, and storage medium. The multi-resolution key point detection method based on a human body mask code proposed in the present invention introduces depth information on top of current key point algorithms that use RGB images, expanding the amount of information the algorithm model can perceive, which can effectively reduce the interference of complex RGB color noise on the algorithm and improve the accuracy of the key point algorithm.
[0121] Aiming at the problem that background-irrelevant information greatly interferes with the human body key point algorithm, the present invention proposes a high-precision method for extracting the human body mask code, which uses the human body mask code to filter out the background interference in the RGBD image, solves the problem of key points being erroneously positioned on complex backgrounds, and improves the accuracy of the key point detection algorithm.
[0122] The human body key point detection model structure proposed in the patent of the present invention is different from the serial network structure commonly used in the current key point detection.
The present invention innovatively uses a parallel network structure, including plural parallel network branches with different resolutions, so that the network model proposed in the present invention can consider the global information of the human body and the local fine-grained information of the limbs, which has greatly reduced the situation that the key points are incorrectly located on the limbs of other people around the target human body. It also makes the key point detection of the local limbs more accurate, thereby improving the performance of the human body key point detection algorithm.

[0123] To improve the efficiency of the multi-resolution parallel network, the present invention does not frequently perform multi-scale merging like most current network models, but only performs information merging between the resolution network branches in the middle and at the tail of the network, so that each branch of a different resolution can fully analyze and process its own information before blending with the others. This reduces the mutual interference between branches of different resolutions and improves the efficiency and running speed of the network model to meet the real-time requirements of many scenarios with only limited computing ability. At the same time, it also improves the accuracy of the algorithm.
[0124] The technical characteristics of the above-mentioned implementations can be combined arbitrarily. For conciseness of description, not all possible combinations of the technical characteristics in the above-mentioned implementations are described. However, as long as there is no conflict in the combination of these technical characteristics, it shall be considered to be within the scope of this description.
[0125] The above-mentioned implementations are only several implementations of this disclosure, and although the description is relatively specific and detailed, it cannot be understood as limiting the scope of the invention patent. Evidently, those of ordinary skill in the art can make various modifications and variations to the disclosure without departing from the spirit and scope of the disclosure. Therefore, the appended claims are intended to be construed as encompassing the described embodiments and all modifications and variations coming within the scope of the disclosure.

Claims (10)

Claims:
1. A human body key point detection method, comprising:
detecting whether there is a human body in the current scenario; if no human body is detected, collecting a first depth image in the current scenario and overwriting the first depth image of the previous scenario in which no human body was detected; if a human body is detected, collecting a second depth image in the current scenario, storing a human body detection frame corresponding to the second depth image;
comparing frame difference between the collected first depth image and second depth image, obtaining a depth changing area map;
obtaining a human body mask code, obtaining a human body area image according to the human body mask code;
obtaining a single body area image by using the human body detection frame, inputting a key point detection model, outputting plural human body key points; and obtaining confidence level of each key point of the human body and obtaining final human body key point according to the confidence level.
2. The method according to claim 1, wherein comparing frame difference between the collected first depth image and second depth image, comprising:
pre-setting a difference threshold, the difference threshold being determined by the scenario and the depth camera; and subtracting the pixels of the first depth image and the second depth image one by one, if the pixel difference is greater than the threshold, then recording the difference, and if the pixel difference is not greater than the threshold, then recording zero.
3. The method according to claim 1, wherein obtaining a human body mask code, comprising:
pre-setting a limited threshold of human body connected domain, determining size relationship between each non-zero connected domain in the depth changing area map and the limited threshold;
if the non-zero connected domain is greater than the limited threshold, determining that the non-zero connected domain is a human body connected domain and calculating the centroid of the human body connected domain;
if the non-zero connected domain is not greater than the limited threshold, determining that the non-zero connected domain is not a human body connected domain and discarding this domain;
performing area growth by using each centroid as reference point to obtain first human body area mask code;
performing human body instance segmentation on the second depth image, combining the mask codes of the segmented human body instances to obtain a second human body area mask code;
and merging the first human body area mask code and the second human body area mask code to obtain the human body mask code.
4. The method according to claim 3, wherein obtaining the human body area image through the human body mask code, comprising:

intercepting the human body area in the second depth image by using the human body mask code, filtering out background of the image and obtaining the human body area image;
and pre-processing the human body area image.
5. The method according to claim 4, wherein obtaining single human body area image by using the human body detection frame, comprising:
keeping center point of the human body detection frame unchanged, expanding the range of the human body detection frame in an equal proportion; and intercepting the pre-processed human body area image according to the expanded human body detection frame, obtaining single human body area image, scaling the single human body area image to predetermined size.
6. The method according to claim 5, wherein inputting a key point detection model, outputting plural human body key points, comprising:
inputting the single human body area image into the key point detection model; and outputting, by the key point detection model, plural human body key point heat maps, each human body key point heat map corresponding to a key point of the human body.
7. The method according to claim 6, wherein obtaining the confidence level of each key point of the human body and obtaining the final key point of the human body according to the confidence level, comprising:

searching for the peak position of each human body key point heat map, determining the peak position as the detection position of the human body key point according to the human body key point heat map, and determining the peak value as the confidence level of the human body key point;
pre-setting a threshold of confidence level, and determining the relationship between the confidence level of each human body key point and the threshold of confidence level;
if the confidence level of the human body key point is greater than the threshold of confidence level, outputting the coordinates and the confidence level of the human body key point; and if the confidence level of the human body key point is not greater than the threshold of confidence level, discarding the human body key point; and obtaining the final human body key point.
8. The method according to claim 6, wherein inputting a key point detection model, outputting plural human body key points, comprising:
inputting the single human body area image into the key point detection model;
down-sampling the inputted single human body area image, obtaining a first characteristic map;
linearly interpolating and down-sampling the first characteristic map respectively, obtaining a second characteristic map and a third characteristic map with different resolutions, turning on the corresponding first resolution branch, second resolution branch, and third resolution branch, and processing each through the residual block respectively;

the first resolution branch, the second resolution branch, and the third resolution branch are respectively subjected to a first multi-resolution cross merge, wherein each branch is required to add its own characteristic map to the characteristic maps corresponding to all other branches, and after completing the merge, each branch is processed by the residual block again;
after amplifying the characteristic maps corresponding to the first resolution branch and the second resolution branch by linear interpolation, performing the second multi-resolution cross merge with the characteristic map of the third resolution branch; and outputting plural human body key point heat maps according to the obtained characteristic map after the second multi-resolution cross merge, each human body key point heat map corresponds to a human body key point.
9. A human body key point detection apparatus, comprising:
a detection module configured to detect whether there is a human body in the current scenario;
a first collection module configured to, if no human body is detected, collect a first depth image in the current scenario and overwrite the first depth image of the previous scenario in which no human body was detected;
a second collection module configured to, if a human body is detected, collect a second depth image in the current scenario and store a human body detection frame corresponding to the second depth image;
a comparison module configured to compare frame difference between the collected first depth image and second depth image, obtaining a depth changing area map;

an interception module configured to obtain a human body mask code and obtain a human body area image according to the human body mask code;
a key point acquisition module configured to obtain a single body area image by using the human body detection frame, inputting a key point detection model, outputting plural human body key points; and a judgement module configured to obtain confidence level of each key point of the human body and obtaining final human body key point according to the confidence level.
10. A computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CA3136990A 2020-10-29 2021-10-29 A human body key point detection method, apparatus, computer device and storage medium Pending CA3136990A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011183648.9A CN114429474A (en) 2020-10-29 2020-10-29 Human body key point detection method and device and computer storage medium
CN202011183648.9 2020-10-29

Publications (1)

Publication Number Publication Date
CA3136990A1 true CA3136990A1 (en) 2022-04-29

Family

ID=81310215

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3136990A Pending CA3136990A1 (en) 2020-10-29 2021-10-29 A human body key point detection method, apparatus, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN114429474A (en)
CA (1) CA3136990A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116766213A (en) * 2023-08-24 2023-09-19 烟台大学 Bionic hand control method, system and equipment based on image processing
CN117115926A (en) * 2023-10-25 2023-11-24 天津大树智能科技有限公司 Human body action standard judging method and device based on real-time image processing

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116392086B (en) * 2023-06-06 2023-08-25 浙江多模医疗科技有限公司 Method, terminal and storage medium for detecting stimulation
CN116612298B (en) * 2023-07-18 2023-10-13 西华大学 Pedestrian feature mask generation method based on local key points

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116766213A (en) * 2023-08-24 2023-09-19 烟台大学 Bionic hand control method, system and equipment based on image processing
CN116766213B (en) * 2023-08-24 2023-11-03 烟台大学 Bionic hand control method, system and equipment based on image processing
CN117115926A (en) * 2023-10-25 2023-11-24 天津大树智能科技有限公司 Human body action standard judging method and device based on real-time image processing
CN117115926B (en) * 2023-10-25 2024-02-06 天津大树智能科技有限公司 Human body action standard judging method and device based on real-time image processing

Also Published As

Publication number Publication date
CN114429474A (en) 2022-05-03

Similar Documents

Publication Publication Date Title
CA3136990A1 (en) A human body key point detection method, apparatus, computer device and storage medium
CN108446585B (en) Target tracking method and device, computer equipment and storage medium
CN112132156B (en) Image saliency target detection method and system based on multi-depth feature fusion
CN110781756A (en) Urban road extraction method and device based on remote sensing image
CN111696110B (en) Scene segmentation method and system
CN111461170A (en) Vehicle image detection method and device, computer equipment and storage medium
CN110084238B (en) Finger vein image segmentation method and device based on LadderNet network and storage medium
CN111696196B (en) Three-dimensional face model reconstruction method and device
CN110415250B (en) Overlapped chromosome segmentation method and device based on deep learning
CN112509003B (en) Method and system for solving target tracking frame drift
CN113807361B (en) Neural network, target detection method, neural network training method and related products
CN110502977B (en) Building change classification detection method, system, device and storage medium
CN111598088B (en) Target detection method, device, computer equipment and readable storage medium
CN112101195A (en) Crowd density estimation method and device, computer equipment and storage medium
CN114998856B (en) 3D target detection method, device, equipment and medium for multi-camera image
CN111105452A (en) High-low resolution fusion stereo matching method based on binocular vision
CN111260564A (en) Image processing method and device and computer storage medium
CN113763412A (en) Image processing method and device, electronic equipment and computer readable storage medium
CN106663317B (en) Morphological processing method and digital image processing device for digital image
CN116612280A (en) Vehicle segmentation method, device, computer equipment and computer readable storage medium
CN114913194A (en) Parallel optical flow method moving target detection method and system based on CUDA
CN113159032A (en) Target tracking method, device, equipment and medium based on target detection network
CN110796684B (en) Target tracking method and related device
CN113160297A (en) Image depth estimation method and device, electronic equipment and computer-readable storage medium
CN111540016A (en) Pose calculation method and device based on image feature matching, computer equipment and storage medium

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20220916
