WO2021007859A1 - Method and apparatus for estimating pose of human body - Google Patents

Publication number
WO2021007859A1
Authority
WO
WIPO (PCT)
Prior art keywords
key point
type
heat map
image
processed
Prior art date
Application number
PCT/CN2019/096587
Other languages
French (fr)
Chinese (zh)
Inventor
谭文伟
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to CN201980096776.9A priority Critical patent/CN113892113A/en
Priority to PCT/CN2019/096587 priority patent/WO2021007859A1/en
Publication of WO2021007859A1 publication Critical patent/WO2021007859A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition

Definitions

  • the embodiments of the present application relate to the field of image processing, and in particular to a method and device for estimating a human body pose.
  • Human body posture estimation has received more and more attention due to its important application value and theoretical significance.
  • the research on single-person human body pose estimation has reached high accuracy, and the research direction of the industry focuses on multi-person human body pose estimation.
  • One multi-person human body pose estimation method is the top-down method, which mainly includes: detecting each human body with a human body detector, determining its position, and framing it; then, for each detected human body, independently predicting the key points of that single body, which finally yields the pose prediction.
  • the detection effect of the top-down method is often more sensitive.
  • the top-down approach is also difficult to handle.
  • As the number of people increases, the time cost increases proportionally.
  • Another multi-person human pose estimation method is the bottom-up method, which mainly includes key point detection and clustering. First, all the key points of all categories in the picture are detected, and then the key points are clustered, and different key points of different people are connected together, and the clustering produces different individuals. Compared with the top-down method, the bottom-up method has slightly lower accuracy, but it has great advantages in time efficiency.
  • the embodiments of the present application provide a method and device for estimating a human body posture to improve the efficiency and accuracy of multi-person human posture estimation.
  • a method for estimating a human body pose may include: inputting an image to be processed into a neural network, detecting K types of key points in the image to be processed in parallel, and obtaining a detection heat map and a tag pool for each type of key point, where the detection heat map of a type of key point indicates the possibility of that type of key point appearing at different positions in the image to be processed, the tag pool of a type of key point includes the tag value of each key point of that type in the image to be processed, and the tag value of a key point indicates the human body group to which the key point belongs; obtaining, according to the detection heat map of each type of key point, the peaks in that detection heat map; obtaining, according to the tag pool of each type of key point, the tag value of the key point corresponding to each peak in the detection heat map of that type; and clustering key points with similar tag values as connected key points of the same person.
  • the neural network is configured with a grouping loss function, and the grouping loss function is used to assign the label value of the key point.
  • N is the number of human bodies in the image to be processed;
  • w(y) represents the label value of the key point at the y coordinate;
  • the neural network is configured with a detection loss function, and the detection loss function is used to calculate the mean square error between the predicted detection heat map and the true key point heat map, so as to output the detection heat map.
  • the label values of key points are allocated according to the spatial constraint relationship, and the key points of the same person in the spatial constraint relationship are allocated similar label values.
  • a type of key point detection heat map indicates the possibility of such key points appearing in different positions in the image to be processed, including:
  • the detection heat map of key points represents the Gaussian distribution of this type of key point in different positions in the image to be processed; or, the detection heat map of a type of key point represents the probability of this type of key point in different positions in the image to be processed.
  • the detection heat map may be a confidence map.
  • a device for estimating a human body pose may include a detection unit, an acquisition unit, and a clustering unit.
  • the detection unit is used to input the image to be processed into the neural network, detect the K types of key points in the image to be processed in parallel, and obtain the detection heat map and tag pool of each type of key point, where the detection heat map of a type of key point represents the possibility that this type of key point appears at different positions in the image to be processed, the tag pool of a type of key point includes the tag value of each key point of this type in the image to be processed, and the tag value of a key point indicates the human body group to which the key point belongs;
  • the acquisition unit is used to obtain the peak value in the detection heat map of each type of key point according to the detection heat map of each type of key point obtained by the detection unit; the acquisition unit is also used to obtain according to the detection unit
  • the human body posture estimation device provided in the second aspect is used to implement the human body posture estimation method provided in the above-mentioned first aspect, and the specific implementation may refer to the specific implementation of the above-mentioned first aspect.
  • an embodiment of the present application provides a human body pose estimation device, the device includes a processor, and is configured to implement the human body pose estimation method described in the first aspect.
  • the device may further include a memory, which is coupled to the processor, and when the processor executes the instructions stored in the memory, the human body posture estimation method described in the first aspect can be implemented.
  • the device may further include a communication interface, which is used for the device to communicate with other devices.
  • the communication interface may be a transceiver, a circuit, a bus, a module, or other types of communication interfaces.
  • the instructions in the memory in this application can be pre-stored or downloaded from the Internet when the device is used and then stored. This application does not specifically limit the source of the instructions in the memory.
  • the coupling in the embodiments of the present application is an indirect coupling or connection between devices, units or modules, which can be electrical, mechanical or other forms, used for information exchange between devices, units or modules.
  • the embodiments of the present application also provide a computer-readable storage medium, including instructions which, when run on a computer, cause the computer to execute the human body posture estimation method described in any one of the above aspects or any one of the possible implementations.
  • the embodiments of the present application also provide a computer program product, which when running on a computer, causes the computer to execute the human body posture estimation method described in any one of the above-mentioned aspects or any possible implementation manner.
  • the embodiments of the present application provide a chip system, which includes a processor and may also include a memory, configured to implement the functions in the foregoing method.
  • the chip system can be composed of chips, or can include chips and other discrete devices.
  • FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the application
  • FIG. 2 is a schematic structural diagram of a human body posture estimation device provided by an embodiment of the application.
  • FIG. 3 is a schematic flowchart of a method for estimating a human body pose provided by an embodiment of this application;
  • Figure 4 is a schematic diagram of another application scenario provided by an embodiment of the application.
  • Figure 4a is a schematic diagram of a detection heat map provided by an embodiment of the application.
  • FIG. 5 is a comparison diagram before and after image processing provided by an embodiment of the application.
  • Fig. 6 is a schematic diagram of label value clustering provided by an embodiment of the application.
  • FIG. 7 is a comparison diagram of the human body pose estimation performed by the openpose algorithm provided by an embodiment of the application and the algorithm of the application;
  • FIG. 8 is a comparison diagram of the human body pose estimation performed by the openpose algorithm provided by the embodiment of the application and the algorithm of the application;
  • FIG. 9 is a comparison diagram of the human body pose estimation performed by the openpose algorithm provided by an embodiment of the application and the algorithm of the application;
  • FIG. 10 is a schematic structural diagram of another apparatus for estimating a human body pose according to an embodiment of the application.
  • FIG. 11 is a schematic structural diagram of another apparatus for estimating a human body pose according to an embodiment of the application.
  • words such as “first” and “second” are used to distinguish identical or similar items that have basically the same function and effect. Those skilled in the art can understand that words such as “first” and “second” do not limit quantity or order of execution, and do not necessarily indicate that the items are different.
  • the technical features described by “first” and “second” have no sequence and no order of magnitude.
  • words such as “exemplary” or “for example” are used as examples, illustrations, or illustrations. Any embodiment or design solution described as “exemplary” or “for example” in the embodiments of the present application should not be construed as being more preferable or advantageous than other embodiments or design solutions. To be precise, words such as “exemplary” or “for example” are used to present related concepts in a specific manner to facilitate understanding.
  • A/B can mean A or B; "and/or" in this application merely describes an association relationship between associated objects and means that three relationships are possible. For example, A and/or B can mean: A exists alone, A and B exist at the same time, or B exists alone, where A and B can each be singular or plural.
  • plural means two or more than two.
  • "at least one of the following items" or a similar expression refers to any combination of these items, including any combination of a single item or plural items.
  • for example, at least one of a, b, or c can mean: a, b, c, ab, ac, bc, or abc, where a, b, and c can each be single or multiple.
  • At least one can also be described as one or more, and the multiple can be two, three, four or more, which is not limited in this application.
  • Fig. 1 shows a schematic diagram of the application scenario of the present application. As shown in Fig. 1, an input image is input to a neural network, and the neural network performs human posture estimation to obtain an output image.
  • the neural network is illustrated as a cascaded hourglass network in FIG. 1, but it is not specifically limited.
  • the neural network performs human body pose estimation, although the single-person human body pose estimation has reached a higher accuracy, the method of multi-person human body pose estimation still needs improvement.
  • this application provides a human body pose estimation method, the basic principle of which is: in bottom-up multi-person human body pose estimation, key points are detected in parallel, and the detection heat of each type of key point is obtained while detecting the key points.
  • FIG. 2 is a schematic diagram of the composition of a human body posture estimation device 20 provided by an embodiment of the application.
  • the human body posture estimation device 20 may include at least one processor 21, a memory 22, a communication interface 23, and a communication bus 24.
  • the processor 21 may be one processor or a collective term for multiple processing elements.
  • the processor 21 is a central processing unit (CPU), or may be an application-specific integrated circuit (ASIC), or may be one or more integrated circuits configured to implement the embodiments of the present application, for example, one or more digital signal processors (DSP) or one or more field programmable gate arrays (FPGA).
  • the processor 21 can execute various functions of the human body posture estimation device 20 by running or executing a software program stored in the memory 22 and calling data stored in the memory 22.
  • the processor 21 may include one or more CPUs, such as CPU0 and CPU1 shown in FIG. 2.
  • the human body posture estimation device 20 may include multiple processors, such as the processor 21 and the processor 25 shown in FIG. 2. Each of these processors can be a single-core processor (single-CPU) or a multi-core processor (multi-CPU).
  • the processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
  • the memory 22 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.
  • the memory 22 may exist independently and is connected to the processor 21 through the communication bus 24.
  • the memory 22 may also be integrated with the processor 21. Among them, the memory 22 is used to store a software program for executing the solution of the present application, and the processor 21 controls the execution.
  • the communication interface 23 uses any device such as a transceiver to communicate with other devices or communication networks, such as Ethernet, radio access network (RAN), wireless local area networks (WLAN), etc. .
  • the communication interface 23 may include a receiving unit and a sending unit.
  • the communication bus 24 may be an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, or an extended industry standard architecture (EISA) bus.
  • the bus can be divided into address bus, data bus, control bus, etc. For ease of representation, only one thick line is used in FIG. 2, but it does not mean that there is only one bus or one type of bus.
  • the components shown in FIG. 2 do not constitute a limitation on the communication device.
  • the communication device may include more or fewer components than those shown in the figure, combine certain components, or use a different component arrangement.
  • the processor 21 executes the following functions by running or executing software programs and/or modules stored in the memory 22, and calling data stored in the memory 22:
  • the tag pool of a type of key point includes the tag value of each key point of this type in the image to be processed, and the tag value of a key point indicates the human body group to which the key point belongs; according to the detection heat map of each type of key point, obtain the peaks in that detection heat map; according to the tag pool of each type of key point, obtain the tag value of the key point corresponding to each peak in the detection heat map of that type;
  • cluster key points with similar tag values as connected key points of the same person.
  • an embodiment of the present application provides a method for estimating a human body pose. As shown in FIG. 3, the method may include:
  • S301 Input the to-be-processed image into the neural network, and detect K-type key points in the to-be-processed image in parallel, and obtain a detection heat map and tag pool for each type of key point.
  • S301 may be performed pixel by pixel in the image to be processed, or the image to be processed may be cut into multiple regions, and S301 may be performed region by region, which is not specifically limited in the embodiment of the present application.
  • the term “parallel” in this article refers to non-serial.
  • the key points may be joint points on the human body or others, which are not specifically limited in the embodiments of the present application.
  • the category number K of the key points can be configured according to actual needs, and the embodiment of the application does not specifically limit it.
  • 17 key points are defined in the COCO data set, namely: 0-nose, 1-left eye, 2-right eye, 3-left ear, 4-right ear, 5-left shoulder, 6-right shoulder, 7-left elbow, 8-right elbow, 9-left wrist, 10-right wrist, 11-left hip, 12-right hip, 13-left knee, 14-right knee, 15-left ankle, 16-right ankle.
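As a small illustration, the COCO key point types listed above can be held in an index-ordered list so that K and the per-type channel index are derived from a single place. The names `COCO_KEYPOINTS`, `K`, and `KEYPOINT_INDEX` are hypothetical and are not taken from the application.

```python
# COCO key point types in index order, as enumerated in the text above.
# The identifier spelling (underscores) is an assumption made for this sketch.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# Number of key point types (K in the method).
K = len(COCO_KEYPOINTS)

# Map a key point name to its type index.
KEYPOINT_INDEX = {name: i for i, name in enumerate(COCO_KEYPOINTS)}
```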
  • the detection heat map of a type of key point indicates the possibility of this type of key point appearing at different positions in the image to be processed; the tag pool of a type of key point includes the tag value of each key point of this type in the image to be processed, and the tag value of a key point indicates the human body group to which the key point belongs.
  • that the detection heat map of a type of key point indicates the possibility of this type of key point appearing at different positions in the image to be processed can be specifically implemented as follows: the detection heat map of a type of key point represents the Gaussian distribution of this type of key point over the different positions in the image to be processed.
  • in multi-person human pose estimation, since there are multiple human bodies in an image, the detection heat map of a type of key point has multiple peaks.
  • the detection heat map of the key point of the left button there are multiple people's left button peaks.
  • the detection heat map of a type of key points indicates the possibility of such key points appearing in different positions in the image to be processed, which can be specifically implemented as follows: the detection heat map of a type of key points represents the image to be processed The probability of such key points appearing in different positions.
  • the detection heat map can be a two-dimensional confidence map, which expresses the probability of key points by color depth.
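To make the Gaussian form of a detection heat map concrete, the following is a minimal sketch that renders one heat map for one key point type, with one Gaussian bump per person. The function name `gaussian_heatmap`, the value of `sigma`, and the use of an element-wise maximum to combine overlapping bumps are assumptions of this sketch, not details given in the application.

```python
import numpy as np

def gaussian_heatmap(height, width, centers, sigma=2.0):
    """Render a heat map for one key point type.

    centers: list of (x, y) pixel positions, one per person in the image.
    sigma:   spread of each Gaussian bump (assumed value).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((height, width), dtype=np.float64)
    for cx, cy in centers:
        bump = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
        # Where bumps of different people overlap, keep the stronger response.
        heatmap = np.maximum(heatmap, bump)
    return heatmap

# Two people in the image give two peaks in the nose heat map.
nose_map = gaussian_heatmap(128, 128, [(30, 40), (90, 60)])
```

Each person contributes one peak of value 1.0 at the key point position, which is why a multi-person image yields several peaks per heat map.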
  • a detection loss function can be configured for the neural network, and the detection loss function is used to calculate the mean square error between the predicted detection heat map and the true key point heat map to output the detection heat map.
  • the content of the detection loss function can be configured according to actual requirements, which is not specifically limited in the embodiment of the present application.
  • the detection loss function adopts the form in the paper "Stacked hourglass networks for human pose estimation", which will not be repeated here.
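The mean square error described above can be sketched in a few lines. This is only an illustration of the stated idea (mean squared difference between the predicted and true heat maps); the function name `detection_loss` and the array layout `(K, H, W)` are assumptions, and any weighting or intermediate supervision used in the cited paper is omitted.

```python
import numpy as np

def detection_loss(predicted, target):
    """Mean square error between predicted and true key point heat maps.

    Both arrays are assumed to have shape (K, H, W): one map per key point type.
    """
    predicted = np.asarray(predicted, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if predicted.shape != target.shape:
        raise ValueError("predicted and target heat maps must have the same shape")
    return float(np.mean((predicted - target) ** 2))
```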
  • the label values of key points can be assigned according to the spatial constraint relationship, and the key points of the same person in the spatial constraint relationship are assigned similar label values.
  • the spatial constraint relationship can be used to determine which areas in the image are the same human body and which areas are different human bodies.
  • This application does not limit the specific content of the spatial constraint relationship, and can be configured according to actual needs.
  • the spatial constraint relationship can be as follows: it exploits the fact that the pixel positions of the same human body lie relatively close together in adjacent areas of the image, whereas different human bodies appear far apart in image space. In order to assign joint points to the individuals they belong to, the area of each individual in the image is determined through non-maximum suppression, which then yields which areas in the image belong to the same human body and which areas belong to different human bodies.
  • different human bodies may be configured with different label value intervals in advance, and the label values may be obtained by determining the human body according to the spatial constraint relationship and the label value intervals configured for different human bodies.
  • Similar tag values mean that the absolute value of the difference value is less than a preset threshold value, and the specific value of the preset threshold value can be configured according to actual needs, which is not specifically limited in the embodiment of the present application.
  • similar label values can be randomly generated for different key points in the same person in the spatial constraint relationship, or, for different key points in the same person in the spatial constraint relationship, according to a preset algorithm Similar tag values are generated, which is not specifically limited in the embodiment of the present application. It should be noted that the content of the preset algorithm can be configured according to actual needs.
  • a real number is generated every time a key point is detected, and the real number is used as the label value to indicate the group to which the detected pixel belongs.
  • the type of the tag value may also be in other forms than real numbers, which are not specifically limited in the embodiment of the present application.
  • a grouping loss function can be configured for the neural network, and the grouping loss function is used to assign the label value of the key point.
  • the content of the grouping loss function can be configured according to actual needs, which is not specifically limited in the embodiment of the present application.
  • the embodiment of the present application provides a specific implementation of a grouping loss function, and the grouping loss function is as follows: (1).
  • N is the number of human bodies in the image to be processed; w ∈ R^(W×H) represents the tag pool corresponding to the image to be processed; w(y) represents the tag value of the key point at coordinate y; and the reference value of the nth person represents the mean of the tag values at all real key point positions of the nth person.
  • the grouping of key points is equivalent to a clustering problem.
  • the generated predicted tag pool needs to aggregate the joint points of the same person as closely as possible and separate the joint points of different people as far as possible, in order to reduce the error of grouping key points with a clustering algorithm during testing.
  • To establish the two losses separately, first apply the k-means clustering idea to the same person, that is, introduce a reference value that represents the mean of the tag values of all the real joint points of the nth person; its mathematical expression is given as equation (2).
  • the square loss function is established as the following equation (3).
  • As mentioned above, it is necessary to make equation (3) smaller and equation (4) larger. In order to unify (3) and (4) into one loss function that is minimized, equation (4) must be turned into a quantity whose minimum is sought, so a negative exponential function is introduced and equation (4) is rewritten to obtain equation (5).
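Because the equations themselves are not reproduced in this text, the following is only a hedged sketch of a grouping loss that matches the verbal description: a squared term that pulls each person's tag values toward that person's reference value (mean), plus a negative-exponential term that becomes small when the reference values of different people are far apart. The normalization, the parameter `sigma`, the function name `grouping_loss`, and the input layout are all assumptions, not the application's exact equations (1) to (5).

```python
import numpy as np

def grouping_loss(tag_pool, people, sigma=1.0):
    """Sketch of a grouping loss over a predicted tag pool.

    tag_pool: array of shape (K, H, W), predicted tag value per type and pixel.
    people:   list of persons; each person is a list of (k, x, y) tuples giving
              the key point type and true pixel position of each joint.
    """
    n = len(people)
    if n == 0:
        return 0.0
    means = []
    pull = 0.0
    for joints in people:
        tags = np.array([tag_pool[k, y, x] for k, x, y in joints], dtype=np.float64)
        mean = tags.mean()                      # reference value of this person
        means.append(mean)
        pull += np.mean((tags - mean) ** 2)     # same person: tags close to the mean
    pull /= n
    means = np.array(means)
    diff = means[:, None] - means[None, :]
    push_matrix = np.exp(-(diff ** 2) / (2.0 * sigma ** 2))
    # Remove the n diagonal terms (a person compared with itself, each equal to 1).
    push = (push_matrix.sum() - n) / max(n * (n - 1), 1)
    return float(pull + push)
```

With this form, minimizing the loss both tightens the tag values within one person and separates the reference values of different people, which is the behavior the text asks of equations (3) and (5).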
  • FIG. 4 is a schematic diagram of a scene of the human body pose estimation method described in an embodiment of the application.
  • the input image is input to the neural network (the cascaded hourglass network is shown in the figure), and the process of S301 is executed to obtain the detection heat map and tag pool shown in FIG. 4.
  • the detection heat map obtained from the input image in Figure 4 is shown in Figure 4a.
  • Each small image in Figure 4a is a detection heat map of a type of key point, and the bright spot in each small image represents the key point. Peak position.
  • this solution can adopt a 4-order hourglass structure, the input size of the network is 256 × 256, and the output size is 128 × 128. If there are K types of human key points to be predicted, the number of output channels of the network is 2K: K channels are used for detection and K channels are used for grouping. For the COCO data set, since each image has 17 key point annotations, the final output of the network has 34 channels, of which the first 17 channels output the joint point detection heat maps and the last 17 channels output the grouping tag information of the joint points.
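The 2K-channel output layout described above can be illustrated with a short sketch that splits a network output into K detection heat maps and K tag pools. The function name `split_network_output` and the channel-first layout `(2K, H, W)` are assumptions of this sketch; the zero array only stands in for a real network output.

```python
import numpy as np

def split_network_output(output, num_keypoints):
    """Split a (2K, H, W) output into K heat maps and K tag pools.

    The first K channels are detection heat maps; the last K channels
    carry the grouping tag values, following the description above.
    """
    if output.shape[0] != 2 * num_keypoints:
        raise ValueError("expected 2K channels, got %d" % output.shape[0])
    heatmaps = output[:num_keypoints]
    tag_pools = output[num_keypoints:]
    return heatmaps, tag_pools

# COCO example: 17 key point types, 128 x 128 output resolution.
output = np.zeros((34, 128, 128), dtype=np.float32)
heatmaps, tag_pools = split_network_output(output, 17)
```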
  • the original image on the left uses the COCO data set.
  • the detection heat map and tag pool for each type of key point will be obtained.
  • the detection heat map can be the Gaussian distribution of each type of key point.
  • the tag values included in the 17 tag pools are as follows (arranged in the order of the 17 key points in the COCO data set):
  • in the detection heat map of each type of key point, the positions where key points are most likely to be located are all peaks.
  • the tag value of the key point corresponding to each peak is obtained from the tag pool according to the coordinate position of the peak in the image to be processed.
  • for example, the three peaks in the detection heat map of the nose key points obtained in S302 each have their own coordinates. Looking up these coordinates in the tag pool of the nose key point gives the tag values at the corresponding positions: [-1.8509495, 1.0292919, 4.163754].
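The peak extraction (S302) and tag lookup (S303) described above can be sketched as follows. The application does not specify how peaks are found, so the 3 × 3 local-maximum test, the `threshold` value, and the function names `find_peaks` and `tags_at_peaks` are assumptions of this sketch. Heat maps are assumed to be floating-point arrays.

```python
import numpy as np

def find_peaks(heatmap, threshold=0.1):
    """Return (x, y) positions that exceed `threshold` and are not smaller
    than any of their 8 neighbours. A flat plateau yields several peaks."""
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    is_peak = heatmap > threshold
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            is_peak &= heatmap >= neighbour
    ys, xs = np.nonzero(is_peak)
    return list(zip(xs.tolist(), ys.tolist()))

def tags_at_peaks(tag_pool, peaks):
    """Read the tag value stored at each peak position of the same type."""
    return [float(tag_pool[y, x]) for x, y in peaks]
```

For one key point type, `tags_at_peaks(tag_pools[k], find_peaks(heatmaps[k]))` would give one tag value per detected person, analogous to the three nose tag values in the example above.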
  • the Lloyd method in the K-means algorithm can be used to match similar key points.
  • other clustering methods can also be used, which is not specifically limited in the embodiment of the present application.
  • the k value in the algorithm corresponds to the number of people in the picture.
  • the number of people in the picture can be determined in S302 by counting the peak points on the detection heat map that has the most peak points.
  • the "distance" in the Lloyd method can be Euclidean distance. In actual operation, other distance measurement methods can also be selected according to the actual situation to obtain the best key point grouping.
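The clustering step can be sketched as a one-dimensional Lloyd iteration over the tag values, with k taken from the heat map that has the most peaks, as described above. Seeding the cluster centers with the tag values of that heat map, the iteration cap `max_iters`, and the function name `group_by_tags` are assumptions of this sketch. In one dimension the Euclidean distance reduces to the absolute difference.

```python
import numpy as np

def group_by_tags(tags_per_type, max_iters=50):
    """Assign every detected key point to a person by clustering tag values.

    tags_per_type: list with one entry per key point type; each entry is the
                   list of tag values read at that type's heat map peaks.
    Returns (groups, centers): groups[k][i] is the person index of the i-th
    peak of type k; centers holds the final mean tag value of each person.
    """
    k = max((len(t) for t in tags_per_type), default=0)   # number of people
    if k == 0:
        return [[] for _ in tags_per_type], np.array([])
    # Assumed seeding: start from the tags of the type with the most peaks.
    centers = np.array(max(tags_per_type, key=len), dtype=np.float64)
    all_tags = np.array([t for tags in tags_per_type for t in tags], dtype=np.float64)
    labels = np.zeros(len(all_tags), dtype=int)
    for _ in range(max_iters):
        # Assignment step: nearest center for every tag value.
        labels = np.argmin(np.abs(all_tags[:, None] - centers[None, :]), axis=1)
        # Update step: each center moves to the mean of its assigned tags.
        new_centers = centers.copy()
        for j in range(k):
            if np.any(labels == j):
                new_centers[j] = all_tags[labels == j].mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    groups, start = [], 0
    for tags in tags_per_type:
        groups.append(labels[start:start + len(tags)].tolist())
        start += len(tags)
    return groups, centers
```

Note that plain k-means does not prevent two key points of the same type from landing in the same person; a practical implementation might add that constraint, which this sketch leaves out.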
  • the grouped label values can be obtained as follows:
  • This application is a bottom-up method. Compared with the existing top-down method, the operating speed of the model is greatly improved when the accuracy is similar. Turning the serial key point detection into parallel can reduce the calculation amount of the model, greatly reduce the complexity of the model, and increase the processing speed of the model.
  • the openpose on the left will have key points missed and incorrectly connected.
  • the right knee is detected as the left knee and the left hip is connected to the right knee.
  • the method of this application (the right picture in Figure 7) maintains relatively stable performance in this scenario.
  • in terms of time efficiency, the average time for openpose to process a picture is 8.492 seconds, while the method of this application takes only 3.409 seconds to process one picture, so the time efficiency is improved to a certain extent. From the above comparison of accuracy and time efficiency, it can be seen that the comprehensive performance of the method of this application is greatly improved.
  • FIGS 7, 8, and 9 illustrate the detection results of the method of this application and openpose.
  • the left image in the figure is the result of openpose, and the right image is the result of the method of this application.
  • the human body posture estimation apparatus may include a hardware structure and/or software module, and the above functions are implemented in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether one of the above-mentioned functions is executed in a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraint conditions of the technical solution.
  • the division of modules in the embodiments of the present application is illustrative, and is only a logical function division. In actual implementation, there may be other division methods.
  • the functional modules in the various embodiments of the present application may be integrated into one processor, or each module may exist alone physically, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules.
  • in the case where functional modules are divided according to their corresponding functions, as shown in FIG. 10, the human body posture estimation apparatus 100 provided in this embodiment of the present application is used to implement the functions in the foregoing method.
  • the human body posture estimation apparatus 100 may include: a detection unit 1001, an acquisition unit 1002, and a clustering unit 1003.
  • the detection unit 1001 is used to perform S301 in FIG. 3; the acquisition unit 1002 is used to perform S302 and S303 in FIG. 3; the clustering unit 1003 is used to perform S304 in FIG. 3.
  • all relevant content of each step involved in the above method embodiment can be cited in the function description of the corresponding function module, and will not be repeated here.
  • the human body posture estimation device 110 provided in this embodiment of the present application is used to implement the functions in the foregoing method.
  • the human body posture estimation device 110 includes at least one processing module 1101 for implementing the functions in the method provided in the embodiment of the present application.
  • the processing module 1101 may be used to execute the processes S301 to S304 in FIG. 3.
  • the human body posture estimation device 110 may further include at least one storage module 1102 for storing program instructions and/or data.
  • the storage module 1102 and the processing module 1101 are coupled.
  • the coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, units, or modules, and may be in electrical, mechanical or other forms, and is used for information exchange between devices, units or modules.
  • the processing module 1101 may cooperate with the storage module 1102 to operate.
  • the processing module 1101 may execute program instructions stored in the storage module 1102. At least one of the at least one storage module may be included in the processing module.
  • the human body posture estimation apparatus 110 may further include a communication module 1103 for communicating with other devices through a transmission medium, so that the human body posture estimation apparatus 110 can communicate with other devices.
  • the communication module 1103 is used for the device to communicate with other devices.
  • the human body posture estimation apparatus 110 involved in FIG. 11 in the embodiment of the present application may be the human body posture estimation apparatus 20 shown in FIG. 2.
  • the human body posture estimation device 100 or the human body posture estimation device 110 provided by the embodiments of the present application may be used to implement the functions in the methods implemented by the various embodiments of the present application.
  • only the same as those in the embodiments of the present application are shown.
  • a computer-readable storage medium is provided, and an instruction is stored thereon, and the method in the foregoing method embodiment is executed when the instruction is executed.
  • a computer program product containing instructions is provided, and when the instructions are executed, the method in the foregoing method embodiment is executed.
  • the embodiment of the present application further provides a chip system.
  • the chip system includes a processor for implementing the technical method in the embodiment of the present invention.
  • the chip system further includes a memory for storing necessary program instructions and/or data of the communication device in the embodiment of the present invention.
  • the chip system further includes a memory for the processor to call application program codes stored in the memory.
  • the chip system may be composed of one or more chips, and may also include chips and other discrete devices, which are not specifically limited in the embodiment of the present application.
  • the steps of the method or algorithm described in combination with the disclosure of the present application can be implemented in a hardware manner, or in a manner that a processor executes software instructions.
  • Software instructions can be composed of corresponding software modules, which can be stored in RAM, flash memory, ROM, erasable programmable read-only memory (erasable programmable ROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), registers, hard disk, mobile hard disk, CD-ROM or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor, so that the processor can read information from the storage medium and can write information to the storage medium.
  • the storage medium may also be an integral part of the processor.
  • the processor and the storage medium may be located in the ASIC.
  • the ASIC may be located in the core network interface device.
  • the processor and the storage medium may also exist as discrete components in the core network interface device.
  • the memory may be coupled with the processor.
  • the memory may exist independently and be connected to the processor through a bus.
  • the memory can also be integrated with the processor.
  • the memory may be used to store application program codes that execute the technical solutions provided in the embodiments of the present application, and the processor controls the execution.
  • the processor is used to execute the application program code stored in the memory, so as to implement the technical solution provided by the embodiment of the present application.
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the modules or units is only a logical function division.
  • in actual implementation, there may be other division methods; for example, multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate parts may or may not be physically separate.
  • the parts displayed as units may be one physical unit or multiple physical units, that is, they may be located in one place, or they may be distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium.
  • the technical solutions of the embodiments of the present application essentially, or the part that contributes to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product, which is stored in a storage medium and includes several instructions to make a device (which may be a single-chip microcomputer, a chip, etc.) or a processor execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: U disk, mobile hard disk, ROM, RAM, magnetic disk or optical disk and other media that can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

A method and apparatus for estimating the pose of a human body, the method comprising: inputting an image to be processed into a neural network and detecting in parallel K classes of key points in said image to obtain a detection heat map and a tag pool for each class of key points (S301); according to the detection heat map of each class of key points, acquiring the peaks in the detection heat map of each class of key points (S302); according to the tag pool of each class of key points, acquiring the tag value of the key point corresponding to each peak in the detection heat map of each class of key points (S303); and clustering key points with similar tag values as the connected key points of the same human body (S304). The method and apparatus may improve the efficiency and accuracy of estimating the pose of a human body for multiple individuals.

Description

Human body pose estimation method and apparatus
Technical field
The embodiments of the present application relate to the field of image processing, and in particular to a human body pose estimation method and apparatus.
Background
Human body pose estimation has received increasing attention because of its practical value and theoretical significance. Single-person human body pose estimation has already reached high accuracy, and industry research now focuses on multi-person human body pose estimation.
One multi-person human body pose estimation method is the top-down method, which mainly includes: detecting each human body with a human body detector, determining its position, and framing the human body; then, for each detected human body, independently predicting the key points of that single person to obtain the pose. When human bodies are occluded or the background is complex and easily confused with the body, the top-down method is sensitive to these conditions and its detection results tend to degrade. The top-down method also struggles with complex poses. In addition, as the number of people increases, the time cost grows in proportion to the number of people.
Another multi-person human body pose estimation method is the bottom-up method, which mainly includes key point detection and cluster grouping. All key points of all categories in the picture are detected first, and the key points are then clustered so that the different key points belonging to each person are connected together, with each cluster yielding a distinct individual. Compared with the top-down method, the bottom-up method is slightly less accurate but has a large advantage in time efficiency.
The many current multi-person human body pose estimation methods each have advantages and disadvantages. How to balance efficiency and accuracy is a problem to be solved urgently in multi-person human body pose estimation.
Summary of the invention
The embodiments of the present application provide a human body pose estimation method and apparatus to improve the efficiency and accuracy of multi-person human body pose estimation.
To achieve the foregoing objective, the following technical solutions are adopted in the embodiments of this application:
In a first aspect, a human body pose estimation method is provided. The method may include: inputting an image to be processed into a neural network and detecting K types of key points in the image in parallel, to obtain a detection heat map and a tag pool for each type of key point, where the detection heat map of a type of key point indicates the likelihood that a key point of that type appears at each position in the image to be processed, the tag pool of a type of key point includes the tag value of each key point of that type in the image to be processed, and the tag value of a key point indicates the human body group to which the key point belongs; obtaining, according to the detection heat map of each type of key point, the peaks in that detection heat map; obtaining, according to the tag pool of each type of key point, the tag value of the key point corresponding to each peak in the detection heat map of that type; and clustering key points with similar tag values as the connected key points of the same human body.
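As an illustration of the peak and tag lookup steps of the first aspect, the following minimal Python sketch finds local maxima in one detection heat map and reads the tag value at each peak from the matching tag pool. The names `find_peaks` and `tags_at_peaks`, the 8-neighbour test, and the 0.5 threshold are assumptions made for this example and are not taken from the application.

```python
from typing import List, Tuple

Grid = List[List[float]]  # one heat map or tag pool, indexed as grid[row][col]


def find_peaks(heatmap: Grid, threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (row, col) of cells at or above `threshold` that are strictly
    greater than all of their 8 neighbours (assumed peak criterion)."""
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = heatmap[r][c]
            if value < threshold:
                continue
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c))
    return peaks


def tags_at_peaks(tag_pool: Grid, peaks: List[Tuple[int, int]]) -> List[float]:
    """Look up the tag value stored at each peak position."""
    return [tag_pool[r][c] for r, c in peaks]
```

In a full pipeline this lookup would be repeated for each of the K key point types, which is independent work per type and can therefore run in parallel.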
With the human body pose estimation method provided in this application, in multi-person human body pose estimation, parallel detection reduces the computation of the model and increases its speed. Assigning a tag value to each key point as prior knowledge for grouping not only improves the accuracy of cluster grouping but also allows the grouping to be performed in parallel, which improves grouping efficiency. The efficiency and accuracy of multi-person human body pose estimation are therefore improved.
With reference to the first aspect, in a possible implementation, the neural network is configured with a grouping loss function, which is used to assign the tag values of the key points. This realizes the assignment of a tag value to each key point as prior knowledge for grouping, improving the accuracy and efficiency of cluster grouping.
With reference to the first aspect or any of the foregoing possible implementations, in another possible implementation, a specific implementation of the grouping loss function is provided. The grouping loss function is:
[Equation image: Figure PCTCN2019096587-appb-000001]
where N is the number of human bodies in the image to be processed; w(y) denotes the tag value of the key point at coordinate y; and
[Equation image: Figure PCTCN2019096587-appb-000002]
denotes the mean of the tag values at all ground-truth key point positions of the n-th person.
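The formula itself is not reproduced in this text. As a hedged illustration only, the sketch below implements a grouping loss of the kind used in associative-embedding approaches: a pull term that draws each key point's tag value w(y) toward the mean tag value of its person, plus a push term that penalizes different people whose mean tag values are close. The push term, the function name `grouping_loss`, and the data layout are assumptions and may differ from the formula in the application.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # coordinate y of a ground-truth key point


def grouping_loss(tag_value: Dict[Point, float], people: List[List[Point]]) -> float:
    """Assumed grouping loss: pull tags of one person together, push the
    mean tags of different people apart.

    tag_value[y] plays the role of w(y); people[n] lists the ground-truth
    key point coordinates of the n-th person.
    """
    n = len(people)
    # Mean tag value over all ground-truth key points of each person.
    means = [sum(tag_value[y] for y in pts) / len(pts) for pts in people]
    # Pull term: squared distance of every tag to its person's mean.
    pull = sum(
        (tag_value[y] - means[i]) ** 2
        for i, pts in enumerate(people)
        for y in pts
    ) / n
    # Push term (assumption): large when two different people share similar means.
    push = sum(
        math.exp(-0.5 * (means[i] - means[j]) ** 2)
        for i in range(n)
        for j in range(n)
        if i != j
    ) / (n * n)
    return pull + push
```

Under this sketch the loss is small when each person's key points share one tag value and different people have well-separated mean tag values.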
With reference to the first aspect or any of the foregoing possible implementations, in another possible implementation, the neural network is configured with a detection loss function, which computes the mean square error between the predicted detection heat map and the ground-truth key point heat map so as to output the detection heat map.
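A minimal sketch of such a mean square error, assuming both heat maps are given as equally sized nested lists; the function name `detection_loss` is hypothetical and the application does not prescribe this layout.

```python
from typing import List

Grid = List[List[float]]


def detection_loss(predicted: Grid, target: Grid) -> float:
    """Mean square error between a predicted detection heat map and the
    ground-truth key point heat map of the same shape."""
    total, count = 0.0, 0
    for predicted_row, target_row in zip(predicted, target):
        for p, t in zip(predicted_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count
```

With K key point types, the same quantity would typically be averaged over all K heat maps.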
With reference to the first aspect or any of the foregoing possible implementations, in another possible implementation, the tag values of the key points are assigned according to a spatial constraint relationship, under which key points of the same human body are assigned similar tag values.
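Because key points of the same human body carry similar tag values, grouping can be done by comparing tags. The sketch below shows one possible greedy scheme, assuming scalar tag values; the function name `group_by_tag`, the `max_gap` threshold, and the greedy strategy are illustrative assumptions rather than the clustering procedure of the application.

```python
from typing import Dict, List, Tuple

# (key point type, (row, col) position, tag value)
Detection = Tuple[str, Tuple[int, int], float]


def group_by_tag(
    detections: List[Detection], max_gap: float = 0.5
) -> List[Dict[str, Tuple[int, int]]]:
    """Greedily attach each detection to the person whose mean tag value is
    closest (within `max_gap`); otherwise start a new person."""
    people: List[Dict[str, Tuple[int, int]]] = []
    tags: List[List[float]] = []  # tag values collected so far per person
    for kind, position, tag in detections:
        best, best_dist = None, max_gap
        for i, person in enumerate(people):
            if kind in person:
                continue  # a person has at most one key point of each type
            dist = abs(tag - sum(tags[i]) / len(tags[i]))
            if dist < best_dist:
                best, best_dist = i, dist
        if best is None:
            people.append({kind: position})
            tags.append([tag])
        else:
            people[best][kind] = position
            tags[best].append(tag)
    return people
```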
With reference to the first aspect or any of the foregoing possible implementations, in another possible implementation, that the detection heat map of a type of key point indicates the likelihood that a key point of that type appears at different positions in the image to be processed includes: the detection heat map of a type of key point represents a Gaussian distribution of that type of key point over different positions in the image to be processed; or the detection heat map of a type of key point represents the probability that a key point of that type appears at different positions in the image to be processed. For example, the detection heat map may be a confidence map.
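To make the Gaussian case concrete, the sketch below builds a heat map for a single key point as an unnormalized 2D Gaussian centred on the key point position. The function name `gaussian_heatmap` and the default `sigma` are assumptions for illustration.

```python
import math
from typing import List, Tuple


def gaussian_heatmap(
    height: int, width: int, center: Tuple[int, int], sigma: float = 1.0
) -> List[List[float]]:
    """Heat map whose value is 1.0 at `center` (row, col) and decays as a
    Gaussian of the distance to it."""
    center_row, center_col = center
    return [
        [
            math.exp(
                -((r - center_row) ** 2 + (c - center_col) ** 2) / (2 * sigma ** 2)
            )
            for c in range(width)
        ]
        for r in range(height)
    ]
```

A map of this kind can serve as the ground-truth key point heat map against which a predicted detection heat map is compared.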
In a second aspect, a human body pose estimation apparatus is provided. The apparatus may include a detection unit, an acquisition unit, and a clustering unit. The detection unit is configured to input an image to be processed into a neural network and detect K types of key points in the image in parallel, to obtain a detection heat map and a tag pool for each type of key point, where the detection heat map of a type of key point indicates the likelihood that a key point of that type appears at different positions in the image to be processed, the tag pool of a type of key point includes the tag value of each key point of that type in the image to be processed, and the tag value of a key point indicates the human body group to which the key point belongs. The acquisition unit is configured to obtain, according to the detection heat map of each type of key point obtained by the detection unit, the peaks in the detection heat map of each type of key point. The acquisition unit is further configured to obtain, according to the tag pool of each type of key point obtained by the detection unit, the tag value of the key point corresponding to each peak. The clustering unit is configured to cluster key points whose tag values, as obtained by the acquisition unit, are similar, as the connected key points of the same human body.
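The division into units can be sketched as a small class that wires three callables together. This is an architectural illustration only: the class name `PoseEstimationApparatus`, its field names, and the callable signatures are hypothetical, and the network, peak finder, and clustering routine are supplied by the caller.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Grid = List[List[float]]
Peak = Tuple[int, int]
Detection = Tuple[str, Peak, float]  # (key point type, position, tag value)


@dataclass
class PoseEstimationApparatus:
    # Detection unit: image -> (heat map per type, tag pool per type).
    detect: Callable[[Any], Tuple[Dict[str, Grid], Dict[str, Grid]]]
    # First half of the acquisition unit: heat map -> peak positions.
    find_peaks: Callable[[Grid], List[Peak]]
    # Clustering unit: tagged detections -> grouped key points.
    cluster: Callable[[List[Detection]], Any]

    def estimate(self, image: Any) -> Any:
        heatmaps, tag_pools = self.detect(image)
        detections: List[Detection] = []
        for kind, heatmap in heatmaps.items():
            for row, col in self.find_peaks(heatmap):
                # Second half of the acquisition unit: tag lookup at each peak.
                detections.append((kind, (row, col), tag_pools[kind][row][col]))
        return self.cluster(detections)
```

Passing the units in as callables keeps the division purely logical, so the same flow applies whether the units run as separate modules or inside one processor.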
With the human body pose estimation apparatus provided in this application, in multi-person human body pose estimation, parallel detection reduces the computation of the model and increases its speed. Assigning a tag value to each key point as prior knowledge for grouping not only improves the accuracy of cluster grouping but also allows the grouping to be performed in parallel, which improves grouping efficiency. The efficiency and accuracy of multi-person human body pose estimation are therefore improved.
It should be noted that the human body pose estimation apparatus provided in the second aspect is used to perform the human body pose estimation method provided in the first aspect; for its specific implementation, refer to the specific implementation of the first aspect.
In a third aspect, an embodiment of the present application provides a human body pose estimation apparatus. The apparatus includes a processor configured to implement the human body pose estimation method described in the first aspect. The apparatus may further include a memory coupled to the processor; when the processor executes instructions stored in the memory, the human body pose estimation method described in the first aspect can be implemented. The apparatus may further include a communication interface used by the apparatus to communicate with other devices. For example, the communication interface may be a transceiver, a circuit, a bus, a module, or another type of communication interface.
It should be noted that the instructions in the memory may be pre-stored, or may be downloaded from the Internet and stored when the apparatus is used; this application does not specifically limit the source of the instructions in the memory. The coupling in the embodiments of the present application is an indirect coupling or connection between apparatuses, units, or modules; it may be electrical, mechanical, or in another form, and is used for information exchange between the apparatuses, units, or modules.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium including instructions that, when run on a computer, cause the computer to perform the human body pose estimation method according to any one of the foregoing aspects or any possible implementation.
In a fifth aspect, an embodiment of the present application further provides a computer program product that, when run on a computer, causes the computer to perform the human body pose estimation method according to any one of the foregoing aspects or any possible implementation.
In a sixth aspect, an embodiment of the present application provides a chip system. The chip system includes a processor and may further include a memory, and is configured to implement the functions in the foregoing method. The chip system may consist of chips, or may include chips and other discrete devices.
The solutions provided in the third to sixth aspects are used to implement the human body pose estimation method provided in the first aspect and can therefore achieve the same beneficial effects as the first aspect; details are not repeated here.
Description of the drawings
FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of this application;
FIG. 2 is a schematic structural diagram of a human body pose estimation apparatus provided by an embodiment of this application;
FIG. 3 is a schematic flowchart of a human body pose estimation method provided by an embodiment of this application;
FIG. 4 is a schematic diagram of another application scenario provided by an embodiment of this application;
FIG. 4a is a schematic diagram of a detection heat map provided by an embodiment of this application;
FIG. 5 is a comparison of an image before and after processing, provided by an embodiment of this application;
FIG. 6 is a schematic diagram of tag value clustering provided by an embodiment of this application;
FIG. 7 is a comparison of human body pose estimation performed by the OpenPose algorithm and by the algorithm of this application, provided by an embodiment of this application;
FIG. 8 is a comparison of human body pose estimation performed by the OpenPose algorithm and by the algorithm of this application, provided by an embodiment of this application;
FIG. 9 is a comparison of human body pose estimation performed by the OpenPose algorithm and by the algorithm of this application, provided by an embodiment of this application;
FIG. 10 is a schematic structural diagram of another human body pose estimation apparatus provided by an embodiment of this application;
FIG. 11 is a schematic structural diagram of yet another human body pose estimation apparatus provided by an embodiment of this application.
Detailed description
In the embodiments of the present application, to describe the technical solutions clearly, words such as “first” and “second” are used to distinguish between identical or similar items whose functions and effects are basically the same. Those skilled in the art will understand that words such as “first” and “second” do not limit quantity or execution order, and do not indicate that the items are necessarily different. There is no order of precedence or magnitude among the technical features described by “first” and “second”.
In the embodiments of the present application, words such as “exemplary” or “for example” are used to present an example, illustration, or explanation. Any embodiment or design solution described as “exemplary” or “for example” in the embodiments of the present application should not be construed as more preferable or advantageous than other embodiments or design solutions. Rather, words such as “exemplary” or “for example” are intended to present related concepts in a concrete manner for ease of understanding.
In the description of this application, unless otherwise specified, “/” means that the associated objects are in an “or” relationship; for example, A/B can mean A or B. The term “and/or” in this application merely describes an association between associated objects and indicates that three relationships are possible; for example, A and/or B can mean: A alone, both A and B, or B alone, where A and B can each be singular or plural. In addition, unless otherwise specified, “multiple” means two or more. “At least one of the following items” or a similar expression refers to any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b, or c can mean: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c can each be single or multiple.
In the embodiments of the present application, “at least one” can also be described as “one or more”, and “multiple” can be two, three, four, or more, which is not limited in this application.
In addition, the network architectures and scenarios described in the embodiments of this application are intended to explain the technical solutions of the embodiments more clearly and do not limit the technical solutions provided in the embodiments. Those of ordinary skill in the art will appreciate that, as network architectures evolve and new service scenarios emerge, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.
The method provided by the embodiments of the present application can be used in a neural network for human body pose estimation. The neural network may be a stacked hourglass network or have another structure, which is not specifically limited in the embodiments of the present application. FIG. 1 is a schematic diagram of an application scenario of the present application. As shown in FIG. 1, an input image is fed into the neural network, which performs human body pose estimation to produce an output image.
It should be noted that FIG. 1 depicts the neural network as a cascaded hourglass network, but this is not a specific limitation.
At present, when a neural network performs human body pose estimation, single-person human body pose estimation has reached high accuracy, but methods for multi-person human body pose estimation still leave room for improvement.
Based on this, the present application provides a human body pose estimation method whose basic principle is as follows: in bottom-up multi-person human body pose estimation, key points are detected in parallel, and the detection heat map and tag pool of each type of key point are obtained while the key points are detected. In this way, parallel detection reduces the computation of the model and increases its speed. Assigning a tag value to each key point as prior knowledge for grouping not only improves the accuracy of cluster grouping but also allows the grouping to be performed in parallel, which improves grouping efficiency. The efficiency and accuracy of multi-person human body pose estimation are therefore improved.
The implementation of the embodiments of the present application is described in detail below with reference to the accompanying drawings.
FIG. 2 is a schematic diagram of the composition of a human body pose estimation apparatus 20 provided by an embodiment of this application. As shown in FIG. 2, the human body pose estimation apparatus 20 may include at least one processor 21, a memory 22, a communication interface 23, and a communication bus 24. The components of the human body pose estimation apparatus 20 are described in detail below with reference to FIG. 2.
The processor 21 may be a single processor or a collective term for multiple processing elements. For example, the processor 21 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application, such as one or more digital signal processors (DSPs) or one or more field programmable gate arrays (FPGAs).
The processor 21 can perform the various functions of the human body pose estimation apparatus 20 by running or executing a software program stored in the memory 22 and calling data stored in the memory 22. In a specific implementation, as an embodiment, the processor 21 may include one or more CPUs, such as CPU0 and CPU1 shown in FIG. 2.
在具体实现中,作为一种实施例,该人体姿态估计装置20可以包括多个处理器, 例如图2中所示的处理器21和处理器25。这些处理器中的每一个可以是一个单核处理器(single-CPU),也可以是一个多核处理器(multi-CPU)。这里的处理器可以指一个或多个设备、电路、和/或用于处理数据(例如计算机程序指令)的处理核。In a specific implementation, as an embodiment, the human body posture estimation device 20 may include multiple processors, such as the processor 21 and the processor 25 shown in FIG. 2. Each of these processors can be a single-core processor (single-CPU) or a multi-core processor (multi-CPU). The processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
The memory 22 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions; a random access memory (RAM) or another type of dynamic storage device that can store information and instructions; an electrically erasable programmable read-only memory (EEPROM); a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like); a magnetic disk storage medium or other magnetic storage device; or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, without being limited to these. The memory 22 may exist independently and be connected to the processor 21 through the communication bus 24. The memory 22 may also be integrated with the processor 21. The memory 22 is used to store the software program that executes the solution of this application, and execution is controlled by the processor 21.
The communication interface 23 uses any transceiver-type apparatus to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN). The communication interface 23 may include a receiving unit and a sending unit.
The communication bus 24 may be an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 2, but this does not mean that there is only one bus or one type of bus.
It should be noted that the components shown in FIG. 2 do not limit the human body pose estimation device 20. Besides the components shown in FIG. 2, the device may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
Specifically, the processor 21 performs the following functions by running or executing software programs and/or modules stored in the memory 22 and calling data stored in the memory 22:
inputting the image to be processed into a neural network and detecting K types of key points in the image in parallel, to obtain a detection heat map and a tag pool for each type of key point, where the detection heat map of a type of key point indicates how likely that type of key point is to appear at different positions in the image to be processed, the tag pool of a type of key point includes the tag value of each key point of that type in the image to be processed, and the tag value of a key point indicates the human body group to which the key point belongs; obtaining, from the detection heat map of each type of key point, the peaks in that heat map; obtaining, from the tag pool of each type of key point, the tag value of the key point corresponding to each peak in the detection heat map of that type; and clustering key points with similar tag values as belonging to the same human body, and connecting those key points.
In one aspect, an embodiment of this application provides a human body pose estimation method. As shown in FIG. 3, the method may include the following steps:
S301. Input the image to be processed into a neural network and detect K types of key points in the image in parallel, to obtain a detection heat map and a tag pool for each type of key point.
Specifically, S301 may be performed pixel by pixel on the image to be processed, or the image may be divided into multiple regions and S301 performed region by region; the embodiments of this application do not specifically limit this. "Parallel" as used here means non-serial.
A key point may be a joint point on the human body or another point; the embodiments of this application do not specifically limit this. The number K of key point types can be configured according to actual requirements and is not specifically limited in the embodiments of this application.
For example, the COCO data set defines 17 key points: 0-nose, 1-left eye, 2-right eye, 3-left ear, 4-right ear, 5-left shoulder, 6-right shoulder, 7-left elbow, 8-right elbow, 9-left wrist, 10-right wrist, 11-left hip, 12-right hip, 13-left knee, 14-right knee, 15-left ankle, 16-right ankle.
The detection heat map of a type of key point indicates how likely that type of key point is to appear at different positions in the image to be processed. The tag pool of a type of key point includes the tag value of each key point of that type in the image to be processed, and the tag value of a key point indicates the human body group to which the key point belongs.
In one possible implementation, the detection heat map of a type of key point represents the Gaussian distribution of that type of key point over different positions in the image to be processed. In multi-person pose estimation, an image contains multiple human bodies, so a heat map contains multiple peaks. For example, the detection heat map of the left-shoulder key point contains one left-shoulder peak for each of several people.
In another possible implementation, the detection heat map of a type of key point represents the probability that this type of key point appears at different positions in the image to be processed. For example, the detection heat map may be a two-dimensional confidence map in which color intensity represents the key point probability.
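The Gaussian-distribution variant of the detection heat map can be illustrated with a minimal sketch. The function name `gaussian_heatmap`, the `sigma` parameter, and the rule of taking the maximum where bumps overlap are assumptions made for illustration; the application does not specify them.

```python
import math

def gaussian_heatmap(width, height, centers, sigma=2.0):
    """Render one key point type as a heat map with one Gaussian bump per person.

    centers: list of (x, y) key point positions, one per person.
    sigma is a hypothetical spread parameter, not taken from the application.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for cx, cy in centers:
        for y in range(height):
            for x in range(width):
                value = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
                # Where bumps overlap, keep the larger response (an assumption).
                heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap

# Two people in the image, hence two left-shoulder peaks in the same map.
left_shoulder = gaussian_heatmap(16, 12, [(3, 4), (11, 7)])
```

Each person contributes one peak of height 1.0 at the key point position, which is why a single heat map in a multi-person image has several peaks.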
In a specific implementation, a detection loss function may be configured for the neural network. The detection loss function computes the mean squared error between the predicted detection heat map and the ground-truth key point heat map, and is used to train the network to output the detection heat map. The content of the detection loss function can be configured according to actual requirements and is not specifically limited in the embodiments of this application.
For example, the detection loss function may take the form used in the paper "Stacked hourglass networks for human pose estimation", which is not repeated here.
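As a rough illustration of a mean-squared-error detection loss, the sketch below compares a predicted heat map with a ground-truth heat map element by element. The function name `detection_loss` and the plain per-pixel averaging are assumptions; the exact form used in the cited paper is not reproduced here.

```python
def detection_loss(predicted, target):
    """Mean squared error between two heat maps given as lists of rows."""
    total = 0.0
    count = 0
    for pred_row, true_row in zip(predicted, target):
        for p, t in zip(pred_row, true_row):
            total += (p - t) ** 2
            count += 1
    return total / count

# A prediction that is wrong at one of two pixels by 1.0 gives an MSE of 0.5.
example_loss = detection_loss([[0.0, 1.0]], [[0.0, 0.0]])
```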
Specifically, the tag values of key points may be assigned according to a spatial constraint relationship, under which the key points of the same human body are assigned similar tag values.
The spatial constraint relationship can be used to determine which regions in the image belong to the same human body and which belong to different human bodies. This application does not limit the specific content of the spatial constraint relationship, which can be configured according to actual requirements.
For example, in one possible implementation, the spatial constraint relationship exploits the fact that pixels of the same human body lie close together in adjacent image regions, while different human bodies lie farther apart in image space. To assign joint points to the person they belong to, non-maximum suppression is used to determine the region of each individual in the image, which in turn identifies which regions belong to the same human body and which belong to different human bodies.
Specifically, different tag value intervals may be configured in advance for different human bodies, and the tag values are obtained from the human bodies determined by the spatial constraint relationship together with the tag value intervals configured for them.
Tag values are similar when the absolute value of their difference is less than a preset threshold. The specific value of the preset threshold can be configured according to actual requirements and is not specifically limited in the embodiments of this application.
Specifically, for different key points that the spatial constraint relationship places within the same human body, similar tag values may be generated randomly or generated by a preset algorithm; the embodiments of this application do not specifically limit this. The content of the preset algorithm can be configured according to actual requirements.
For example, in multi-person pose estimation, a real number is generated each time a key point is detected, and that real number is used as the tag value indicating the group to which the detected pixel belongs. The tag value may also take a form other than a real number; the embodiments of this application do not specifically limit this.
In a specific implementation, a grouping loss function may be configured for the neural network, and the grouping loss function is used to assign the tag values of key points. The content of the grouping loss function can be configured according to actual requirements and is not specifically limited in the embodiments of this application.
By way of example, an embodiment of this application provides a specific implementation of the grouping loss function, given by formula (1).
Figure PCTCN2019096587-appb-000003
where N is the number of human bodies in the image to be processed; $w \in \mathbb{R}^{W \times H}$ denotes the tag pool corresponding to the image to be processed; $w(y)$ denotes the tag value of the key point at coordinate $y$; and
Figure PCTCN2019096587-appb-000004
denotes the mean of the tag values at all ground-truth key point positions of the n-th person. For a given image containing N people, the ground-truth position coordinates of their body joint points are $T = \{(y_{nk})\}$, $n = 1, \dots, N$, $k = 1, \dots, K$.
The design rationale of the grouping loss function is briefly described below.
Grouping key points is essentially a clustering problem. To obtain a good grouping result, the predicted tag pool should pull the joint points of the same person as close together as possible while pushing the joint points of different people apart, which reduces the error when a clustering algorithm groups the key points at test time. The two cases are modeled separately. For the same-person case, following the k-means clustering approach, a reference value
Figure PCTCN2019096587-appb-000005
is introduced, denoting the mean of the tag values at all ground-truth joint point positions of the n-th person. Its mathematical expression is formula (2).
Figure PCTCN2019096587-appb-000006
According to the definition of k-means clustering, a squared loss function is established as formula (3).
Figure PCTCN2019096587-appb-000007
For the different-person case, let the mean of the tag values at all ground-truth joint point positions of the n-th person be
Figure PCTCN2019096587-appb-000008
and let the mean of the tag values at all ground-truth joint point positions of the n′-th person be
Figure PCTCN2019096587-appb-000009
Then
Figure PCTCN2019096587-appb-000010
denotes the difference between the former and the latter. To measure the distance between the two, a squared loss function is likewise introduced, as formula (4).
Figure PCTCN2019096587-appb-000011
As noted above, formula (3) should be as small as possible and formula (4) as large as possible. To unify (3) and (4) into a single loss function that is minimized, the term derived from formula (4) must also decrease as the distance grows, so a negative exponential function is introduced and formula (4) is rewritten as formula (5).
Figure PCTCN2019096587-appb-000012
Combining (3) and (5) gives the loss function in formula (6).
Figure PCTCN2019096587-appb-000013
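The formula images (1) to (6) are not reproduced in this text. As a reading aid only, one form consistent with the verbal description is sketched below. The symbol $\bar{w}_n$ for the per-person mean, the names $L_{\text{pull}}$ and $L_{\text{push}}$, and the normalization by $N$ and $N^2$ are assumptions, not taken from the application; the actual formulas may differ in constants and scaling.

```latex
\begin{align}
\bar{w}_n &= \frac{1}{K} \sum_{k=1}^{K} w(y_{nk})
  && \text{(mean tag of person } n\text{, cf. (2))} \\
L_{\text{pull}} &= \sum_{n=1}^{N} \sum_{k=1}^{K} \bigl(\bar{w}_n - w(y_{nk})\bigr)^2
  && \text{(same person, cf. (3))} \\
d_{nn'} &= \bigl(\bar{w}_n - \bar{w}_{n'}\bigr)^2
  && \text{(different people, cf. (4))} \\
L_{\text{push}} &= \sum_{n=1}^{N} \sum_{n' \neq n} \exp\bigl(-d_{nn'}\bigr)
  && \text{(negative exponential, cf. (5))} \\
L_g &= \frac{1}{N} L_{\text{pull}} + \frac{1}{N^2} L_{\text{push}}
  && \text{(combined, cf. (6))}
\end{align}
```

Minimizing $L_{\text{pull}}$ draws the tags of one person toward their mean, while minimizing $L_{\text{push}}$ drives the means of different people apart, since $\exp(-d)$ shrinks as $d$ grows.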
The total loss is the weighted sum of the detection loss and the grouping loss: $\mathrm{Loss} = m L_g + n L_d$, where $m$ and $n$ are hyperparameters.
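The design idea of the grouping loss and the weighted total loss can be sketched numerically as follows. This is a minimal illustration under assumptions: the function names, the normalization by the number of people and its square, and the default weights are hypothetical and not taken from the application.

```python
import math

def grouping_loss(tags_per_person):
    """tags_per_person[n] holds the predicted tag values read at the
    ground-truth key point positions of person n."""
    n_people = len(tags_per_person)
    means = [sum(tags) / len(tags) for tags in tags_per_person]
    # Pull term: tags of one person should sit close to that person's mean.
    pull = sum((tag - mean) ** 2
               for tags, mean in zip(tags_per_person, means)
               for tag in tags) / n_people
    # Push term: the negative exponential turns "make the distance between
    # the means of different people large" into a quantity to be minimized.
    push = sum(math.exp(-(means[a] - means[b]) ** 2)
               for a in range(n_people)
               for b in range(n_people) if a != b) / (n_people ** 2)
    return pull + push

def total_loss(l_g, l_d, m=1.0, n=1.0):
    """Weighted sum Loss = m * L_g + n * L_d with hyperparameters m and n."""
    return m * l_g + n * l_d

# Tags that separate two people cleanly versus tags that mix them up.
well_separated = grouping_loss([[-1.85, -1.86], [4.16, 4.17]])
mixed_up = grouping_loss([[-1.85, 4.16], [-1.86, 4.17]])
```

In this sketch, well-separated tags give a loss close to zero, while tags that mix the two people are penalized by both terms.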
It should be noted that the design idea of the grouping loss function above is only an example; any grouping loss function designed along these lines falls within the grouping loss function described in this application.
FIG. 4 is a schematic diagram of a scenario of the human body pose estimation method described in an embodiment of this application. As shown in FIG. 4, the input image is fed into the neural network (shown in the figure as a cascaded hourglass network) and the process of S301 is performed, yielding the detection heat maps and tag pools shown in FIG. 4. By way of example, the detection heat maps obtained from the input image in FIG. 4 are shown in FIG. 4a. Each small image in FIG. 4a is the detection heat map of one type of key point, and the highlighted points in each small image mark the peak positions of the key points.
For example, this solution may use a 4-order hourglass structure in which the network input size is 256×256 and the output size is 128×128. If K human body key points are to be predicted, the network has 2K output channels: K channels are used for detection and the other K for grouping. For the COCO data set, each image is annotated with 17 key points, so the network outputs 34 channels, of which the first 17 output the detection heat maps of the joint points and the last 17 output the grouping tag information of the joint points.
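The 2K-channel layout described above can be sketched as a simple split of the output into K detection channels followed by K grouping channels. The function name `split_outputs` and the placeholder channel strings are hypothetical; in practice each channel would be a 128×128 map.

```python
def split_outputs(channels, num_keypoint_types):
    """Split 2K output channels into K heat maps and K tag maps."""
    if len(channels) != 2 * num_keypoint_types:
        raise ValueError("expected 2K channels for K key point types")
    heatmaps = channels[:num_keypoint_types]   # first K: detection heat maps
    tag_maps = channels[num_keypoint_types:]   # last K: grouping tag values
    return heatmaps, tag_maps

K = 17  # COCO key point types
fake_output = [f"channel_{i}" for i in range(2 * K)]  # stand-ins for real maps
heatmaps, tag_maps = split_outputs(fake_output, K)
```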
The execution process of this application is described below with a specific example.
As shown in FIG. 5, the original image on the left is taken from the COCO data set. Applying the model of this application yields a detection heat map and a tag pool for each type of key point. The detection heat map may be the Gaussian distribution of each type of key point. The tag values contained in the 17 tag pools are as follows (in the order of the 17 key points in the COCO data set):
[-1.8509495, 1.0292919, 4.163754], [-1.8537576, 1.0384212, 4.1641073], [-1.8466513, 1.0294132, 4.1671414], [-1.8384541, 1.0231285, 0], [-1.8542455, 0, 4.150399], [-1.8703872, 1.0921177, 4.120116], [-1.8843985, 1.0381733, 4.1086555], [-1.8894546, 1.1279857, 4.1338553], [-1.8111589, 1.057544, 4.1566596], [-1.9049348, 0, 4.1225743], [-1.8559527, 1.0369155, 4.124661], [-1.9162827, 1.0587022, 4.111921], [-1.9004283, 1.1545397, 4.1475873], [-1.9555404, 0, 4.185656], [-1.8526844, 0, 4.170965], [0, 0, 0], [0, 0, 0]. Here, 0 indicates that the key point information is unavailable.
S302. Obtain, from the detection heat map of each type of key point, the peaks in that detection heat map.
Specifically, in the detection heat map of each type of key point, the positions most likely to be key points are the peaks.
For example, based on the example in S301, three peaks can be obtained from the detection heat map of the nose key point.
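Peak extraction in S302 can be sketched as a search for local maxima above a confidence threshold. The 8-neighbour comparison, the `threshold` value, and the function name `find_peaks` are assumptions for illustration; the application does not prescribe a particular peak-finding procedure.

```python
def find_peaks(heatmap, threshold=0.5):
    """Return (x, y, value) for every cell that exceeds the threshold and is
    strictly greater than all of its neighbours (8-neighbourhood)."""
    height, width = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            value = heatmap[y][x]
            if value < threshold:
                continue
            neighbours = [heatmap[ny][nx]
                          for ny in range(max(0, y - 1), min(height, y + 2))
                          for nx in range(max(0, x - 1), min(width, x + 2))
                          if (ny, nx) != (y, x)]
            if all(value > other for other in neighbours):
                peaks.append((x, y, value))
    return peaks

# Toy heat map containing two people, hence two peaks.
toy_heatmap = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.2, 0.0],
    [0.0, 0.0, 0.2, 0.8, 0.2],
]
toy_peaks = find_peaks(toy_heatmap)
```

Because the comparison is strict, a flat plateau produces no peak in this sketch; a real implementation would need a tie-breaking rule.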
S303. Obtain, from the tag pool of each type of key point, the tag value of the key point corresponding to each peak.
Specifically, in S303, the tag value of the key point corresponding to each peak is obtained according to the peak's coordinate position in the image to be processed.
For example, based on the example in S301, each of the three peaks obtained in S302 from the detection heat map of the nose key point has its own coordinates. Looking up these coordinates in the tag pool of the nose key point gives the tag values at the corresponding positions: [-1.8509495, 1.0292919, 4.163754].
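The lookup in S303 amounts to reading the tag map at the peak coordinates. The sketch below uses the nose tag values from the example; the map size and the peak coordinates are made up for illustration, and the function name `tags_at_peaks` is hypothetical.

```python
def tags_at_peaks(tag_map, peaks):
    """Read the tag value stored at each peak position (x, y)."""
    return [tag_map[y][x] for x, y in peaks]

# Hypothetical 4x6 tag map for the nose key point; only the three peak
# positions carry meaningful values in this toy example.
nose_tag_map = [[0.0] * 6 for _ in range(4)]
nose_tag_map[1][1] = -1.8509495
nose_tag_map[2][4] = 1.0292919
nose_tag_map[3][2] = 4.163754

nose_peaks = [(1, 1), (4, 2), (2, 3)]  # (x, y) of the three nose peaks
nose_tags = tags_at_peaks(nose_tag_map, nose_peaks)
```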
S304. Cluster key points with similar tag values as belonging to the same human body, and connect those key points.
For example, in S304, Lloyd's method for k-means may be used to match similar key points. Other clustering methods may also be used; the embodiments of this application do not specifically limit this.
It should be noted that when Lloyd's method is used to match similar key points, the value k in the algorithm corresponds to the number of people in the image. In this application, that number can be determined in S302 as the number of peaks on the detection heat map that has the most peaks. The "distance" in Lloyd's method may be the Euclidean distance; in practice, other distance measures may be chosen according to the actual situation to obtain the best key point grouping.
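Lloyd's method on one-dimensional tag values can be sketched as follows. Seeding the centers with the tags of the key point type that has the most peaks, skipping tag values equal to 0 as unavailable, and the function name `lloyd_1d` are assumptions for illustration; only the first, second, and fourth tag pools of the example are used to keep the sketch short.

```python
def lloyd_1d(values, initial_centers, iterations=20):
    """Cluster scalar tag values with Lloyd's k-means iteration."""
    centers = list(initial_centers)
    assignment = []
    for _ in range(iterations):
        # Assignment step: nearest center by absolute (1-D Euclidean) distance.
        assignment = [min(range(len(centers)), key=lambda c: abs(v - centers[c]))
                      for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [v for v, a in zip(values, assignment) if a == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assignment

tag_pools = [
    [-1.8509495, 1.0292919, 4.163754],   # nose
    [-1.8537576, 1.0384212, 4.1641073],  # left eye
    [-1.8384541, 1.0231285, 0],          # left ear, third person unavailable
]
# k = 3 people: the largest number of peaks found for any key point type.
values = [t for pool in tag_pools for t in pool if t != 0]
centers, assignment = lloyd_1d(values, initial_centers=tag_pools[0])
```

Each entry of `assignment` is the index of the person to which the corresponding tag value, and hence its key point, is assigned.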
For example, based on the example in S301, clustering the tag values with Lloyd's method in S304 gives the following grouped tag values:
[-1.8509495, -1.8537576, -1.8466513, -1.8384541, -1.8542455, -1.8703872, -1.8843985, -1.8894546, -1.8111589, -1.9049348, -1.8559527, -1.9162827, -1.9004283, -1.9555404, -1.8526844, 0, 0];
[1.0292919, 1.0384212, 1.0294132, 1.0231285, 0, 1.0921177, 1.0381733, 1.1279857, 1.057544, 0, 1.0369155, 1.0587022, 1.1545397, 0, 0, 0, 0];
[4.163754, 4.1641073, 4.1671414, 0, 4.150399, 4.120116, 4.1086555, 4.1338553, 4.1566596, 4.1225743, 4.124661, 4.111921, 4.1475873, 4.185656, 4.170965, 0, 0].
FIG. 6 illustrates the tag values after clustering. It can be seen that the tag values are well separated.
Connecting the key points that correspond to the tag values in each group gives the output image on the right of FIG. 5, in which the human body poses of the original image are well recognized.
With the human body pose estimation method provided in this application, parallel detection in multi-person pose estimation reduces the computation of the model and increases its speed. Assigning each key point a tag value as prior knowledge for grouping improves the accuracy of clustering, and because the grouping process can also run in parallel, it improves grouping efficiency as well. The method therefore improves both the efficiency and the accuracy of multi-person pose estimation.
This application is a bottom-up method. Compared with existing top-down methods, it greatly increases the running speed of the model at similar accuracy. Turning serial key point detection into parallel detection reduces the computation of the model, greatly lowers its complexity, and increases its processing speed.
In the scenario shown in FIG. 7, OpenPose (left image) misses key points and connects key points incorrectly; for example, it detects the right knee as the left knee and connects the left hip to the right knee. The method of this application (right image in FIG. 7) maintains comparatively stable performance in this scenario.
The method was tested on the MS COCO test-dev data set and its accuracy was compared with that of OpenPose, another bottom-up method. The comparison results are shown in Table 1.
Table 1

| Method | AP | AP50 | AP75 | AP_M | AP_L | AR | AR50 | AR75 | AR_M | AR_L |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OpenPose | 0.611 | 0.844 | 0.667 | 0.558 | 0.684 | 0.665 | 0.872 | 0.718 | 0.602 | 0.749 |
| This application | 0.665 | 0.849 | 0.726 | 0.612 | 0.744 | 0.701 | 0.867 | 0.755 | 0.640 | 0.789 |
In addition, time efficiency was compared with OpenPose, a real-time human body pose estimation algorithm. OpenPose and the method of this application were tested on the same batch of images: OpenPose took 8.492 seconds on average to process one image, whereas the method of this application took only 3.409 seconds, an improvement in time efficiency. The comparison of accuracy and time efficiency above shows that the overall performance of the method of this application is considerably improved.
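The reported timings and the overall AP values in Table 1 can be put into relative terms with a short calculation. The derived figures below are simple arithmetic on the numbers stated in the text and are rounded.

```latex
\begin{align}
\text{speed-up} &= \frac{8.492\ \text{s}}{3.409\ \text{s}} \approx 2.49 \\
\text{time saved per image} &= 1 - \frac{3.409}{8.492} \approx 0.60 \quad (\text{about } 60\%) \\
\Delta \mathrm{AP} &= 0.665 - 0.611 = 0.054
\end{align}
```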
FIG. 7, FIG. 8, and FIG. 9 show the detection results of the method of this application and of OpenPose. In each figure, the left image is the OpenPose result and the right image is the result of the method of this application.
In the embodiments provided above, the method of the embodiments of this application has been described from the perspective of the working principle of the human body pose estimation device. To implement the functions of the method, the human body pose estimation device may include a hardware structure and/or a software module, and implement the functions in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether a given function is performed by a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraints of the technical solution.
The division of modules in the embodiments of this application is illustrative and is only a division by logical function; other divisions are possible in an actual implementation. In addition, the functional modules in the embodiments of this application may be integrated into one processor, may exist separately as physical units, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or as software functional modules.
Where functional modules are divided according to individual functions, FIG. 10 shows a human body pose estimation device 100 provided by an embodiment of this application for implementing the functions of the method above. As shown in FIG. 10, the human body pose estimation device 100 may include a detection unit 1001, an acquisition unit 1002, and a clustering unit 1003. The detection unit 1001 is used to perform S301 in FIG. 3; the acquisition unit 1002 is used to perform S302 and S303 in FIG. 3; and the clustering unit 1003 is used to perform S304 in FIG. 3. All relevant content of the steps in the method embodiment above applies to the functional description of the corresponding functional module and is not repeated here.
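The division into a detection unit, an acquisition unit, and a clustering unit can be sketched as three pluggable callables chained in the order S301 to S304. The class name, field names, and the stub functions used in the usage example are hypothetical; they only illustrate how data flows between the units.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class PoseEstimationDevice:
    detection_unit: Callable    # S301: image -> (heat maps, tag pools)
    acquisition_unit: Callable  # S302 + S303: -> (peaks, tag values)
    clustering_unit: Callable   # S304: -> grouped key points

    def estimate(self, image):
        heatmaps, tag_pools = self.detection_unit(image)
        peaks, tags = self.acquisition_unit(heatmaps, tag_pools)
        return self.clustering_unit(peaks, tags)

# Usage with trivial stand-in units, only to show the data flow.
device = PoseEstimationDevice(
    detection_unit=lambda image: (["heatmap"], ["tag_pool"]),
    acquisition_unit=lambda heatmaps, tag_pools: ([(1, 1)], [-1.85]),
    clustering_unit=lambda peaks, tags: {"person_0": list(zip(peaks, tags))},
)
result = device.estimate(image=None)
```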
Where functional modules are divided in an integrated manner, FIG. 11 shows a human body pose estimation device 110 provided by an embodiment of this application for implementing the functions of the method above. The human body pose estimation device 110 includes at least one processing module 1101 for implementing the functions of the method provided in the embodiments of this application. By way of example, the processing module 1101 may be used to perform processes S301 to S304 in FIG. 3. For details, see the detailed description in the method example, which is not repeated here.
The human body pose estimation device 110 may further include at least one storage module 1102 for storing program instructions and/or data. The storage module 1102 is coupled to the processing module 1101. Coupling in the embodiments of this application is an indirect coupling or communication connection between devices, units, or modules, which may be electrical, mechanical, or of another form, and is used for information exchange between the devices, units, or modules. The processing module 1101 may operate in cooperation with the storage module 1102. The processing module 1101 may execute the program instructions stored in the storage module 1102. At least one of the at least one storage module may be included in the processing module.
The human body pose estimation device 110 may further include a communication module 1103 for communicating with other devices through a transmission medium, so that the human body pose estimation device 110 can communicate with other devices.
When the processing module 1101 is a processor, the storage module 1102 is a memory, and the communication module 1103 is a communication interface, the human body posture estimation apparatus 110 shown in FIG. 11 of the embodiments of the present application may be the human body posture estimation apparatus 20 shown in FIG. 2.
As described above, the human body posture estimation apparatus 100 or the human body posture estimation apparatus 110 provided in the embodiments of the present application may be used to implement the functions in the methods of the foregoing embodiments of the present application. For ease of description, only the parts related to the embodiments of the present application are shown; for specific technical details that are not disclosed here, refer to the embodiments of the present application.
As another form of this embodiment, a computer-readable storage medium is provided, storing instructions that, when executed, perform the method in the foregoing method embodiments.
As another form of this embodiment, a computer program product containing instructions is provided; when the instructions are executed, the method in the foregoing method embodiments is performed.
An embodiment of the present application further provides a chip system. The chip system includes a processor configured to implement the technical method of the embodiments of the present invention. In one possible design, the chip system further includes a memory configured to store the program instructions and/or data necessary for the communication device of the embodiments of the present invention. In one possible design, the chip system further includes a memory, and the processor calls the application program code stored in the memory. The chip system may consist of one or more chips, or may include chips and other discrete devices; this is not specifically limited in the embodiments of the present application.
The steps of the method or algorithm described in connection with the disclosure of the present application may be implemented in hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in RAM, flash memory, ROM, erasable programmable read-only memory (erasable programmable ROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), registers, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor, so that the processor can read information from, and write information to, the storage medium. The storage medium may also be an integral part of the processor. The processor and the storage medium may be located in an ASIC, and the ASIC may be located in a core network interface device. The processor and the storage medium may also exist as discrete components in the core network interface device. Alternatively, the memory may be coupled to the processor; for example, the memory may exist independently and be connected to the processor through a bus. The memory may also be integrated with the processor. The memory may be used to store application program code for executing the technical solutions provided in the embodiments of the present application, with execution controlled by the processor. The processor is configured to execute the application program code stored in the memory, thereby implementing the technical solutions provided in the embodiments of the present application.
From the description of the foregoing implementations, a person skilled in the art can clearly understand that, for convenience and brevity of description, only the division into the foregoing functional modules is used as an example. In practical applications, the foregoing functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and the parts shown as units may be one physical unit or multiple physical units; that is, they may be located in one place or distributed across multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, ROM, RAM, a magnetic disk, or an optical disc.
The foregoing are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (15)

  1. A human body posture estimation method, characterized by comprising:
    inputting an image to be processed into a neural network, and detecting K types of key points in the image to be processed in parallel, to obtain a detection heat map and a tag pool for each type of key point; wherein the detection heat map of a type of key point represents the likelihood that a key point of that type appears at different positions in the image to be processed; the tag pool of a type of key point includes a tag value for each key point of that type in the image to be processed, and the tag value of a key point indicates the human body group to which the key point belongs;
    obtaining, according to the detection heat map of each type of key point, the peaks in the detection heat map of each type of key point;
    obtaining, according to the tag pool of each type of key point, the tag value of the key point corresponding to each of the peaks; and
    clustering key points with similar tag values as belonging to the same human body, and connecting the key points.
  2. The method according to claim 1, wherein the neural network is configured with a grouping loss function, and the grouping loss function is used to assign the tag values of the key points.
  3. The method according to claim 2, wherein
    the grouping loss function is
    Figure PCTCN2019096587-appb-100001
    where N is the number of human bodies in the image to be processed; w(y) denotes the tag value of the key point at coordinate y; and
    Figure PCTCN2019096587-appb-100002
    denotes the mean of the tag values at all ground-truth key point positions of the n-th person.
  4. The method according to any one of claims 1 to 3, wherein the neural network is configured with a detection loss function, and the detection loss function is used to compute the mean squared error between the predicted detection heat map and the ground-truth key point heat map, so as to output the detection heat map.
  5. The method according to any one of claims 1 to 4, wherein the tag values are assigned according to a spatial constraint relationship, under which key points of the same human body are assigned similar tag values.
  6. The method according to any one of claims 1 to 5, wherein the detection heat map of a type of key point representing the likelihood that a key point of that type appears at different positions in the image to be processed comprises:
    the detection heat map of a type of key point represents a Gaussian distribution of key points of that type appearing at different positions in the image to be processed;
    or,
    the detection heat map of a type of key point represents the probability that a key point of that type appears at different positions in the image to be processed.
  7. A human body posture estimation apparatus, characterized by comprising:
    a detection unit, configured to input an image to be processed into a neural network and detect K types of key points in the image to be processed in parallel, to obtain a detection heat map and a tag pool for each type of key point; wherein the detection heat map of a type of key point represents the likelihood that a key point of that type appears at different positions in the image to be processed; the tag pool of a type of key point includes a tag value for each key point of that type in the image to be processed, and the tag value of a key point indicates the human body group to which the key point belongs;
    an obtaining unit, configured to obtain, according to the detection heat map of each type of key point obtained by the detection unit, the peaks in the detection heat map of each type of key point;
    wherein the obtaining unit is further configured to obtain, according to the tag pool of each type of key point obtained by the detection unit, the tag value of the key point corresponding to each of the peaks; and
    a clustering unit, configured to cluster the key points whose tag values obtained by the obtaining unit are similar as belonging to the same human body, and to connect the key points.
  8. The apparatus according to claim 7, wherein the neural network is configured with a grouping loss function, and the grouping loss function is used to assign the tag values of the key points.
  9. The apparatus according to claim 8, wherein
    the grouping loss function is
    Figure PCTCN2019096587-appb-100003
    where N is the number of human bodies in the image to be processed; w(y) denotes the tag value of the key point at coordinate y; and
    Figure PCTCN2019096587-appb-100004
    denotes the mean of the tag values at all ground-truth key point positions of the n-th person.
  10. The apparatus according to any one of claims 7 to 9, wherein the neural network is configured with a detection loss function, and the detection loss function is used to compute the mean squared error between the predicted detection heat map and the ground-truth key point heat map, so as to output the detection heat map.
  11. The apparatus according to any one of claims 7 to 10, wherein the tag values are assigned according to a spatial constraint relationship, under which key points of the same human body are assigned similar tag values.
  12. The apparatus according to any one of claims 7 to 11, wherein the detection heat map of a type of key point representing the likelihood that a key point of that type appears at different positions in the image to be processed comprises:
    the detection heat map of a type of key point represents a Gaussian distribution of key points of that type appearing at different positions in the image to be processed;
    or,
    the detection heat map of a type of key point represents the probability that a key point of that type appears at different positions in the image to be processed.
  13. A human body posture estimation apparatus, comprising a processor and a memory, wherein the memory is coupled to the processor, and the processor is configured to perform the human body posture estimation method according to any one of claims 1 to 6.
  14. A computer-readable storage medium, comprising instructions that, when run on a computer, cause the computer to perform the human body posture estimation method according to any one of claims 1 to 6.
  15. A computer program product that, when run on a computer, causes the computer to perform the human body posture estimation method according to any one of claims 1 to 6.
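
By way of illustration only, and without limiting the claims, the post-processing steps recited in claim 1 (finding heat map peaks, reading the tag value at each peak, and clustering key points with similar tag values) might be sketched as follows. The sketch makes several assumptions that are not stated in the application: tag values are scalars, the tag pool of each key point type is stored as a per-pixel map, a peak is a strict local maximum above a threshold, and grouping is a simple greedy match against the running mean tag of each group. The names `gaussian_heatmap`, `find_peaks`, `group_keypoints`, `threshold` and `tag_tol` are hypothetical, and the neural network itself is not modeled.

```python
import math


def gaussian_heatmap(height, width, centers, sigma=1.0):
    """Build a toy detection heat map: a Gaussian bump at each (row, col) center."""
    return [[max((math.exp(-((r - cr) ** 2 + (c - cc) ** 2) / (2 * sigma ** 2))
                  for cr, cc in centers), default=0.0)
             for c in range(width)]
            for r in range(height)]


def find_peaks(heatmap, threshold=0.5):
    """Return (row, col) positions that are at least `threshold` and
    strictly greater than all of their 8-connected neighbours."""
    height, width = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(height):
        for c in range(width):
            value = heatmap[r][c]
            if value < threshold:
                continue
            neighbours = [heatmap[rr][cc]
                          for rr in range(max(0, r - 1), min(height, r + 2))
                          for cc in range(max(0, c - 1), min(width, c + 2))
                          if (rr, cc) != (r, c)]
            if all(value > n for n in neighbours):
                peaks.append((r, c))
    return peaks


def group_keypoints(heatmaps, tag_maps, threshold=0.5, tag_tol=0.5):
    """Greedy grouping (an assumed strategy, not the application's own).

    heatmaps[k] and tag_maps[k] are H x W nested lists for key point type k.
    Returns one dict per person, mapping key point type -> (row, col).
    """
    groups = []  # each group: {"tags": [float, ...], "keypoints": {k: (r, c)}}
    for k, (heatmap, tag_map) in enumerate(zip(heatmaps, tag_maps)):
        for r, c in find_peaks(heatmap, threshold):
            tag = tag_map[r][c]  # tag value of the key point at this peak
            best, best_dist = None, tag_tol
            for group in groups:
                if k in group["keypoints"]:
                    continue  # a person has at most one key point per type
                mean_tag = sum(group["tags"]) / len(group["tags"])
                dist = abs(tag - mean_tag)
                if dist < best_dist:
                    best, best_dist = group, dist
            if best is None:  # no group has a similar tag: start a new person
                best = {"tags": [], "keypoints": {}}
                groups.append(best)
            best["tags"].append(tag)
            best["keypoints"][k] = (r, c)
    return [group["keypoints"] for group in groups]
```

In this sketch, two heat maps built with `gaussian_heatmap` and two tag maps that hold one constant value per person can be passed to `group_keypoints` to obtain one key point dictionary per person. A practical implementation would likely use vectorized array operations and a more robust matching step than the greedy loop assumed here.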
PCT/CN2019/096587 2019-07-18 2019-07-18 Method and apparatus for estimating pose of human body WO2021007859A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980096776.9A CN113892113A (en) 2019-07-18 2019-07-18 Human body posture estimation method and device
PCT/CN2019/096587 WO2021007859A1 (en) 2019-07-18 2019-07-18 Method and apparatus for estimating pose of human body

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/096587 WO2021007859A1 (en) 2019-07-18 2019-07-18 Method and apparatus for estimating pose of human body

Publications (1)

Publication Number Publication Date
WO2021007859A1 true WO2021007859A1 (en) 2021-01-21

Family

ID=74209647

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/096587 WO2021007859A1 (en) 2019-07-18 2019-07-18 Method and apparatus for estimating pose of human body

Country Status (2)

Country Link
CN (1) CN113892113A (en)
WO (1) WO2021007859A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112861776A (en) * 2021-03-05 2021-05-28 罗普特科技集团股份有限公司 Human body posture analysis method and system based on dense key points
CN112884780A (en) * 2021-02-06 2021-06-01 罗普特科技集团股份有限公司 Estimation method and system for human body posture
CN113033300A (en) * 2021-02-07 2021-06-25 广东省科学院智能制造研究所 Escalator safety automatic monitoring method and system based on computer vision
CN113095129A (en) * 2021-03-01 2021-07-09 北京迈格威科技有限公司 Attitude estimation model training method, attitude estimation device and electronic equipment
CN113486772A (en) * 2021-07-01 2021-10-08 中国科学院空天信息创新研究院 Human body posture estimation model training method, estimation method, device and equipment
CN113569627A (en) * 2021-06-11 2021-10-29 北京旷视科技有限公司 Human body posture prediction model training method, human body posture prediction method and device
CN116469175A (en) * 2023-06-20 2023-07-21 青岛黄海学院 Visual interaction method and system for infant education
CN118212650A (en) * 2024-03-07 2024-06-18 中国科学院空间应用工程与技术中心 Biological attitude detection method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549844A (en) * 2018-03-22 2018-09-18 华侨大学 A kind of more people's Attitude estimation methods based on multi-layer fractal network and joint relatives' pattern
CN108960212A (en) * 2018-08-13 2018-12-07 电子科技大学 Based on the detection of human joint points end to end and classification method
CN109614882A (en) * 2018-11-19 2019-04-12 浙江大学 A kind of act of violence detection system and method based on human body attitude estimation
JP2019096113A (en) * 2017-11-24 2019-06-20 Kddi株式会社 Processing device, method and program relating to keypoint data
CN109919245A (en) * 2019-03-18 2019-06-21 北京市商汤科技开发有限公司 Deep learning model training method and device, training equipment and storage medium
CN109948453A (en) * 2019-02-25 2019-06-28 华中科技大学 A kind of more people's Attitude estimation methods based on convolutional neural networks
CN110020611A (en) * 2019-03-17 2019-07-16 浙江大学 A kind of more human action method for catching based on three-dimensional hypothesis space clustering

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112884780A (en) * 2021-02-06 2021-06-01 罗普特科技集团股份有限公司 Estimation method and system for human body posture
CN113033300A (en) * 2021-02-07 2021-06-25 广东省科学院智能制造研究所 Escalator safety automatic monitoring method and system based on computer vision
CN113095129A (en) * 2021-03-01 2021-07-09 北京迈格威科技有限公司 Attitude estimation model training method, attitude estimation device and electronic equipment
CN113095129B (en) * 2021-03-01 2024-04-26 北京迈格威科技有限公司 Gesture estimation model training method, gesture estimation device and electronic equipment
CN112861776A (en) * 2021-03-05 2021-05-28 罗普特科技集团股份有限公司 Human body posture analysis method and system based on dense key points
CN113569627A (en) * 2021-06-11 2021-10-29 北京旷视科技有限公司 Human body posture prediction model training method, human body posture prediction method and device
CN113486772A (en) * 2021-07-01 2021-10-08 中国科学院空天信息创新研究院 Human body posture estimation model training method, estimation method, device and equipment
CN113486772B (en) * 2021-07-01 2024-05-21 中国科学院空天信息创新研究院 Human body posture estimation model training method, estimation method, device and equipment
CN116469175A (en) * 2023-06-20 2023-07-21 青岛黄海学院 Visual interaction method and system for infant education
CN116469175B (en) * 2023-06-20 2023-08-29 青岛黄海学院 Visual interaction method and system for infant education
CN118212650A (en) * 2024-03-07 2024-06-18 中国科学院空间应用工程与技术中心 Biological attitude detection method and system

Also Published As

Publication number Publication date
CN113892113A (en) 2022-01-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19937535

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19937535

Country of ref document: EP

Kind code of ref document: A1