WO2022001123A1 - Key point detection method, apparatus, electronic device and storage medium - Google Patents

Key point detection method, apparatus, electronic device and storage medium

Info

Publication number
WO2022001123A1
Authority
WO
WIPO (PCT)
Prior art keywords
key point
key
graph model
key points
information
Prior art date
Application number
PCT/CN2021/076467
Other languages
English (en)
French (fr)
Inventor
金晟
刘文韬
钱晨
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Priority to JP2021565761A (JP7182021B2)
Publication of WO2022001123A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/23 - Clustering techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Definitions

  • the present disclosure relates to the technical field of image processing, and in particular, to a key point detection method, device, electronic device and storage medium.
  • a graph segmentation algorithm or a heuristic clustering algorithm can be used to cluster each key point.
  • the clustering process is only used as a post-processing operation and the clustering results are not directly supervised, so the accuracy of the key point clustering process is low.
  • the present disclosure provides at least a key point detection method, apparatus, electronic device, and storage medium.
  • the present disclosure provides a key point detection method, comprising: acquiring an image to be detected; generating an image feature map and a plurality of key point heat maps based on the to-be-detected image; the image feature map is used to characterize the relative positional relationship between the key points of each target object in the image to be detected; each key point heat map contains key points of one category of the image to be detected, and key points of different categories correspond to different parts of the target object.
  • based on the image feature map and the plurality of key point heat maps, an initial key point graph model is generated; the initial key point graph model contains information of key points of different categories in the image to be detected and information of connecting edges, and each connecting edge is an edge between two key points of different categories; the initial key point graph model is subjected to pruning of the connecting edges multiple times, until the multiple key points in the processed key point graph model are clustered into multiple clusters, and key point information belonging to each target object is obtained.
  • in this way, an initial key point graph model corresponding to the image to be detected can be generated based on the generated image feature map and the plurality of key point heat maps; since the initial key point graph model includes the information in the image feature map and the key point heat maps, and the image feature map can represent the relative positional relationship between different target objects in the image to be detected, pruning the connecting edges of the initial key point graph model yields the key point information of each target object, so that the key points of different target objects can be accurately distinguished and the accuracy of key point clustering is improved.
  • in a possible implementation, the information of a key point includes position information, category information, and pixel feature information; the information of each key point in the initial key point graph model is determined according to the following steps: based on the key point heat maps, determine the position information of each key point; based on the position information of each key point, extract the pixel feature information of the key point from the image feature map, and determine the category information corresponding to the key point based on the category label of the key point heat map to which the key point belongs.
  • in a possible implementation, the method further includes: for each key point in the initial key point graph model, determining the fusion feature of the key point based on the information of the key point and the information of other key points in the key point graph model that have connecting edges with the key point; performing the pruning of the connecting edges on the initial key point graph model multiple times includes: performing the pruning of the connecting edges on the initial key point graph model multiple times based on the fusion feature of each key point included in the initial key point graph model.
  • in a possible implementation, performing the pruning of the connecting edges on the initial key point graph model multiple times, until the multiple key points in the processed key point graph model are clustered into multiple clusters, includes: performing a first processing process for the current key point graph model: based on the fusion features of the two key points corresponding to each connecting edge in the current key point graph model, performing key point clustering of the same target object on adjacent key points among the multiple key points included in the current key point graph model to obtain at least one macro node, where a macro node includes a plurality of adjacent key points after clustering, and determining the fusion feature of each macro node based on the fusion features of the key points included in that macro node; based on the obtained at least one macro node and the current key point graph model, performing the current pruning of the connecting edges on the current key point graph model to obtain the key point graph model after the current pruning; after the current first processing process is performed, taking the key point graph model after the current pruning as the current key point graph model, taking the currently determined macro nodes and their fusion features as the key points and key point fusion features in the current key point graph model, and performing the first processing process again, until the multiple key points in the processed key point graph model are clustered into multiple clusters.
  • in a possible implementation, performing key point clustering of the same target object on adjacent key points among the multiple key points included in the current key point graph model, to obtain at least one macro node, includes: determining the weight of each connecting edge based on the fusion features of the two key points corresponding to the connecting edge, where the weight represents the probability that the two key points corresponding to the connecting edge belong to the same target object; and, based on the weight of each connecting edge included in the current key point graph model, performing key point clustering of the same target object on adjacent key points among the multiple key points included in the current key point graph model to obtain at least one macro node.
  • in this way, the weight represents the probability that the two key points corresponding to a connecting edge belong to the same target object; based on the weight of each connecting edge, key point clustering of the same target object is then performed on adjacent key points among the multiple key points included in the current key point graph model to obtain at least one macro node; for example, two key points whose connecting edge has a larger weight are clustered together into a macro node, which makes the determination of the macro node more accurate.
  • in a possible implementation, based on the obtained at least one macro node and the current key point graph model, performing the current pruning of the connecting edges on the current key point graph model to obtain the key point graph model after the current pruning includes: based on the obtained at least one macro node and the current key point graph model, determining the connecting edges to be deleted, and deleting the connecting edges to be deleted from the current key point graph model; taking the at least one macro node and the other key points in the current key point graph model except the key points included in the macro nodes as the key points after the pruning, and taking the remaining connecting edges after the deletion as the connecting edges after the pruning, to obtain the key point graph model after the current pruning.
  • in a possible implementation, determining the connecting edges to be deleted based on the obtained at least one macro node and the current key point graph model includes: determining the connecting edges to be deleted based on the category information of each key point included in the at least one macro node and the category information of the other key points in the current key point graph model except the key points included in the at least one macro node.
  • determining the connecting edges to be deleted based on the category information includes: for any connecting edge in the current key point graph model, when the two key points corresponding to the connecting edge are key points in different macro nodes and the two macro nodes corresponding to the connecting edge contain key points with the same category information, determining that the connecting edge is a connecting edge to be deleted; when the two key points corresponding to the connecting edge are key points in the same macro node, determining that the connecting edge is a connecting edge to be deleted; and when one of the two key points corresponding to the connecting edge is a key point in a macro node, the other key point is not a key point in any macro node, and the macro node corresponding to the connecting edge contains a key point with the same category information as the other key point, determining that the connecting edge is a connecting edge to be deleted.
  • in a possible implementation, the key point information of each target object is generated by a pre-trained target neural network; the target neural network is obtained by training a neural network to be trained that includes a macro node discriminator, and the macro node discriminator is used to discriminate whether the multiple key points included in each macro node belong to the same target object.
  • in a possible implementation, the neural network to be trained is trained through the following steps to obtain the pre-trained target neural network: obtaining a sample image; and, based on the sample image, training the neural network to be trained including the macro node discriminator to obtain the pre-trained target neural network.
  • in a possible implementation, training the neural network to be trained including the macro node discriminator, based on the sample image, to obtain the pre-trained target neural network includes: training the neural network to be trained based on the sample image to obtain a prediction result, where the prediction result includes the detection result of the macro node discriminator, the predicted category of each key point, and the predicted position information of each key point; determining a first loss value based on the detection result of the macro node discriminator, and determining a second loss value based on the predicted category of each key point, the predicted position information of each key point, and the labeling result carried in the sample image, where the labeling result includes the labeled category of each key point with respect to the corresponding target object and the labeled position information of each key point; and, based on the first loss value and the second loss value, training the neural network to be trained to obtain the pre-trained target neural network.
  • in a possible implementation, the method further includes: determining the behavior type of the target object based on the key point information corresponding to each target object.
  • in a possible implementation, the method further includes: determining position information of at least one target part of the target object based on the key point information corresponding to each target object, and generating special effect information for the at least one target part according to the position information of the at least one target part.
  • the present disclosure provides a key point detection device, comprising: an acquisition module configured to acquire an image to be detected; a first generation module configured to generate an image feature map and a plurality of key point heat maps based on the image to be detected, where the image feature map is used to represent the relative positional relationship between the key points of each target object in the to-be-detected image, each key point heat map contains key points of one category of the to-be-detected image, and key points of different categories correspond to different parts of the target object; a second generation module configured to generate an initial key point graph model based on the image feature map and the plurality of key point heat maps, where the initial key point graph model includes information of key points of different categories in the image to be detected and information of connecting edges, and each connecting edge is an edge between two key points of different categories; and a processing module configured to perform the pruning of the connecting edges on the initial key point graph model multiple times, until the key points in the processed key point graph model are clustered into multiple clusters, to obtain key point information belonging to each target object respectively.
  • the present disclosure provides an electronic device, including: a processor, a memory, and a bus, where the memory stores machine-readable instructions executable by the processor; when the electronic device is running, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the key point detection method according to the first aspect or any one of its implementation manners are executed.
  • the present disclosure provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the key point detection method described in the first aspect or any one of the embodiments above.
  • the present disclosure provides a computer program product comprising computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes steps for implementing the key point detection method described in one or more of the above embodiments.
  • FIG. 1 is a schematic flowchart of a key point detection method provided by an embodiment of the present disclosure
  • FIG. 2A is a schematic flowchart of pruning processing in a key point detection method provided by an embodiment of the present disclosure;
  • FIG. 2B is a schematic diagram of a network structure for implementing a key point detection method according to an embodiment of the present disclosure
  • FIG. 3 is a schematic structural diagram of a key point detection apparatus provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic structural diagram of an electronic device 400 according to an embodiment of the present disclosure.
  • the bottom-up method first predicts the position of each key point, and then clusters each key point to obtain a complete human pose.
  • a graph segmentation algorithm or a heuristic clustering algorithm can be used to cluster each key point.
  • the clustering process is only used as a post-processing operation and the clustering results are not directly supervised, so the accuracy of the key point clustering process is low.
  • an embodiment of the present disclosure provides a key point detection method.
  • the method includes S101-S104, wherein:
  • S101: acquire an image to be detected.
  • S102: based on the image to be detected, generate an image feature map and a plurality of key point heat maps; the image feature map is used to represent the relative positional relationship between the key points of each target object in the to-be-detected image; each key point heat map contains key points of one category of the to-be-detected image, and key points of different categories correspond to different parts of the target object.
  • S103: based on the image feature map and the plurality of key point heat maps, generate an initial key point graph model; the initial key point graph model includes information of key points of different categories in the image to be detected and information of connecting edges, and each connecting edge is an edge between two key points of different categories.
  • S104: perform the pruning of the connecting edges on the initial key point graph model multiple times, until the multiple key points in the processed key point graph model are clustered into multiple clusters, to obtain key point information belonging to each target object respectively.
  • in this way, an initial key point graph model corresponding to the image to be detected can be generated based on the generated image feature map and the plurality of key point heat maps; since the initial key point graph model includes the information in the image feature map and the key point heat maps, and the image feature map can represent the relative positional relationship between different target objects in the image to be detected, pruning the connecting edges of the initial key point graph model yields the key point information of each target object, so that the key points of different target objects can be accurately distinguished and the accuracy of key point clustering is improved.
  • the image to be detected may be any image including multiple target objects.
  • the target object can be a person, that is, the key points of a plurality of human bodies included in the image to be detected are detected.
  • the acquired image to be detected can be input into the trained keypoint detection neural network to generate an image feature map and multiple keypoint heatmaps; and based on the image feature map, multiple keypoint heatmaps, and The trained keypoint detection neural network determines each keypoint of each target object.
  • each keypoint heatmap contains keypoints of one category of the image to be detected, and keypoints of different categories correspond to different parts of the target object.
  • the categories of key points can be head, neck, hand, etc., and then the key point heat map can be an image containing head key points, or the key point heat map can be an image containing neck key points, etc.;
  • alternatively, the categories of the key points may be a set first category, second category, and so on, where the first-category key points may be key points on the thumb and the second-category key points may be key points on the index finger; the key point heat map may then be an image containing key points of the first category, or an image containing key points of the second category, and so on.
  • the categories of key points and the number of categories can be set according to actual needs.
  • the number of key points corresponding to each target object may be set according to actual needs, for example, the number of key points corresponding to each target object may be 17, 105, and so on.
  • the number of key point heat maps is consistent with the set number of key point categories; for example, when the set number of key point categories is 17, the number of key point heat maps generated based on the image to be detected is also 17; the number of key points of each category may be one.
  • the number of image feature maps can be one or more.
  • the image feature map may represent the relative positional relationship between parts of each target object in the image to be detected and corresponding to the key points of various categories.
  • alternatively, the number of image feature maps may be the same as the number of key point heat maps, that is, each image feature map may represent the relative positional relationship between the parts of each target object in the image to be detected that correspond to one category of key points.
  • the size of the image feature map is consistent with the size of the keypoint heatmap.
  • image feature maps and multiple keypoint heatmaps can be obtained by setting different loss functions in the keypoint detection neural network.
  • in the implementation, the information of each key point can be extracted from the multiple key point heat maps and the image feature map, each key point carrying its information is used as a node, and the edges between key points of different categories are used as connecting edges, which together constitute the initial key point graph model.
  • the information of the connection edge may be information corresponding to the connection relationship between two key points.
  • the information of the connection edge 1 may be: the key point P1 and the key point P2 corresponding to the connection edge 1 have a connection relationship.
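  • As a reading aid, the following Python sketch shows one way the initial key point graph model described above could be assembled from already extracted key points (extraction itself is sketched further below). The dict-based node representation and the name build_initial_graph are assumptions introduced for illustration, not terminology from the disclosure.

```python
# A minimal sketch, assuming each extracted keypoint is a dict with
# "category", "position" and "feature" keys. A connecting edge is created
# between every pair of keypoints of *different* categories, as described
# in the text above.
from itertools import combinations


def build_initial_graph(keypoints):
    edges = [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(keypoints), 2)
        if a["category"] != b["category"]
    ]
    return {"nodes": keypoints, "edges": edges}
```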
  • the information of key points includes location information, category information, and pixel feature information.
  • the information of each key point in the initial key point graph model can be determined according to the following steps: determining the position information of each key point based on the key point heat map; extracting the key point from the image feature map based on the position information of each key point The pixel feature information of the key point, and the category information corresponding to the key point is determined based on the category label of the key point heat map to which the key point belongs.
  • the position information of each key point may be determined based on the pixel value of each pixel point in the key point heat map.
  • for example, a pixel point whose pixel value is a local maximum may be selected as a key point, and the position information of the selected pixel point is determined as the position information of the key point.
  • that is, when the pixel value of a certain pixel in the key point heat map is greater than the pixel values of the surrounding pixels, the pixel value of that pixel is considered to be a local maximum, and the pixel is a key point.
  • the pixel value of the pixel corresponding to the position information can be extracted from the image feature map, and the extracted pixel value is determined as the pixel feature information of the key point.
  • the category information corresponding to each key point can also be determined according to the category label of the key point heat map to which the key point belongs; for example, when the category label of the key point heat map G1 is head, the category information of each key point included in the key point heat map G1 is the head key point, and when the category label of the key point heat map G2 is neck, the category information of each key point included in the key point heat map G2 is the neck key point.
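  • The extraction of key point information described above (positions from local maxima of each heat map, pixel features sampled from the image feature map, categories from the heat map's label) could look roughly as follows; the confidence threshold and the helper name extract_keypoints are assumptions made for this sketch.

```python
# Hedged sketch: find local maxima per category heatmap, then gather the
# position, pixel feature and category of each detected keypoint.
import numpy as np


def extract_keypoints(heatmaps, category_labels, feature_map, threshold=0.3):
    """heatmaps: (C, H, W); feature_map: (D, H, W); returns a list of dicts."""
    keypoints = []
    for heatmap, label in zip(heatmaps, category_labels):
        # A pixel is a keypoint if it exceeds all 8 neighbours (local maximum)
        # and an assumed confidence threshold.
        padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
        h, w = heatmap.shape
        neighbours = np.stack([
            padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)
        ])
        is_peak = (heatmap > neighbours.max(axis=0)) & (heatmap > threshold)
        for y, x in zip(*np.nonzero(is_peak)):
            keypoints.append({
                "category": label,                # from the heatmap's category label
                "position": (int(x), int(y)),     # position information
                "feature": feature_map[:, y, x],  # pixel feature information
            })
    return keypoints
```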
  • in a possible implementation, before the multiple rounds of pruning are performed on the initial key point graph model, the method may further include: for each key point in the initial key point graph model, determining the fusion feature of the key point based on the information of the key point and the information of other key points in the key point graph model that have connecting edges with the key point; further, performing the pruning of the connecting edges on the initial key point graph model multiple times may include: performing the pruning of the connecting edges on the initial key point graph model multiple times based on the fusion feature of each key point included in the initial key point graph model.
  • that is, a corresponding fusion feature can be generated for each key point in the initial key point graph model, and then, based on the fusion feature of each key point, the connecting edges of the initial key point graph model are pruned multiple times.
  • in some embodiments, a graph neural network (Graph Neural Network, GNN) can be used to determine the fusion feature of each key point in the initial key point graph model, that is, to generate the fusion feature corresponding to each key point; then, based on the fusion feature of each key point included in the initial key point graph model, the connecting edges of the initial key point graph model are pruned multiple times.
  • the fusion feature can characterize not only the features of the key point itself but also the relationship between the key point and other key points, so that, based on the fusion features corresponding to the key points, the multiple rounds of pruning of the connecting edges of the initial key point graph model can more accurately determine the key point information corresponding to each target object.
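  • The following numpy sketch illustrates how a fusion feature could combine a key point's own pixel feature with the features of the key points it shares connecting edges with; the single linear map W stands in for a learned GNN layer and its weights are random placeholders, so this is an illustration of the idea rather than the disclosed network.

```python
# Minimal sketch of one message-passing step: each node's fusion feature is a
# learned map over [own feature, mean feature of its connected neighbours].
import numpy as np

rng = np.random.default_rng(0)
D = 64                                           # assumed feature dimension
W = rng.standard_normal((2 * D, D)) * 0.1        # placeholder learned weights


def fuse_features(node_feats, edges):
    """node_feats: (N, D); edges: list of undirected (i, j) index pairs."""
    n = node_feats.shape[0]
    neighbour_sum = np.zeros_like(node_feats)
    degree = np.zeros(n)
    for i, j in edges:
        neighbour_sum[i] += node_feats[j]
        neighbour_sum[j] += node_feats[i]
        degree[i] += 1
        degree[j] += 1
    neighbour_mean = neighbour_sum / np.maximum(degree, 1)[:, None]
    fused = np.concatenate([node_feats, neighbour_mean], axis=1) @ W
    return np.maximum(fused, 0.0)                # ReLU non-linearity
```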
  • the initial keypoint graph model is subjected to multiple pruning processing of connecting edges until multiple keypoints in the processed keypoint graph model are clustered into multiple clusters, including:
  • step 1, based on the fusion features of the two key points corresponding to each connecting edge in the current key point graph model, perform key point clustering of the same target object on adjacent key points among the multiple key points included in the current key point graph model to obtain at least one macro node, where a macro node includes a plurality of adjacent key points after clustering; and determine the fusion feature of each macro node based on the fusion features of the key points included in that macro node; step 2, based on the obtained at least one macro node and the current key point graph model, perform the current pruning of the connecting edges on the current key point graph model to obtain the key point graph model after the current pruning; step 3, take the key point graph model after the current pruning as the current key point graph model, take the currently determined macro nodes and their fusion features as the key points and key point fusion features in the current key point graph model, and perform the first processing process again, until the multiple key points in the processed key point graph model are clustered into multiple clusters.
  • in the implementation, the initial key point graph model can be used as the current key point graph model, and the first processing process is performed for the first time to obtain a key point graph model after pruning; the key point graph model after the first pruning is then used as the current key point graph model, the macro nodes and their corresponding fusion features obtained after the first pruning are used as the key points and key point fusion features in the current key point graph model, and the first processing process is performed for the second time; this continues until the key points in the processed key point graph model are clustered into multiple clusters, where the number of clusters obtained by clustering is the same as the number of target objects included in the image to be detected.
  • Each cluster includes all key points corresponding to a target object, that is, each key point of each target object in the image to be detected is obtained.
  • in each first processing process, adjacent key points are clustered once to obtain at least one macro node, and the multiple key points included in each macro node are key points of the same target object.
  • for step 1, in some embodiments of the present disclosure, performing key point clustering of the same target object on adjacent key points among the multiple key points included in the current key point graph model based on the fusion features of the two key points corresponding to each connecting edge, to obtain at least one macro node, includes: A1, based on the fusion features of the two key points corresponding to each connecting edge, determining the weight of the connecting edge, where the weight represents the probability that the two key points corresponding to the connecting edge belong to the same target object; A2, based on the weight of each connecting edge included in the current key point graph model, performing key point clustering of the same target object on adjacent key points among the multiple key points included in the current key point graph model to obtain at least one macro node.
  • the weight corresponding to each connecting edge can be determined, and the weight represents the probability that the two key points on both sides of the connecting edge belong to the same target object.
  • the weight of each connecting edge may be determined according to the fusion feature of the two key points corresponding to each connecting edge through the trained edge discriminator.
  • keypoint clustering may be performed on adjacent keypoints among multiple keypoints included in the current keypoint graph model based on the weight of each connecting edge included in the current keypoint graph model, At least one macro node is obtained, wherein the multiple key points included in each macro node are key points belonging to the same target object. For example, two key points corresponding to a connecting edge with a larger weight can be clustered together to obtain a macro node.
  • each macro node includes two key points in the current key point graph model, and the clustering is performed so that the sum of the weights of the connecting edges included in the obtained at least one macro node is as large as possible; for example, when two macro nodes are obtained after this round of key point clustering on the current key point graph model, the sum of the connecting-edge weights included in the two macro nodes is the largest sum obtainable by the clustering.
  • the fusion feature of each macro node can be determined. That is, the fusion feature of each key point included in the macro node can be fused to obtain the fusion feature corresponding to the macro node. In the implementation process, the fusion feature of each key point included in the macro node may be pooled to obtain the fusion feature of the macro node.
  • in this way, the weight represents the probability that the two key points corresponding to a connecting edge belong to the same target object; based on the weight of each connecting edge, key point clustering of the same target object is then performed on adjacent key points among the multiple key points included in the current key point graph model to obtain at least one macro node; for example, two key points whose connecting edge has a larger weight are clustered together into a macro node, which makes the determination of the macro node more accurate.
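  • The sketch below shows one plausible reading of steps A1 and A2: an edge discriminator scores each connecting edge with the probability that its two endpoints belong to the same target object, the highest-scoring edges greedily pair adjacent key points into macro nodes, and each macro node's fusion feature is obtained by average pooling its members. The greedy matching, the 0.5 threshold, and example_score are assumptions for illustration.

```python
# Hedged sketch of edge weighting (A1) and macro-node clustering (A2).
import numpy as np


def example_score(a, b):
    """Placeholder edge discriminator: higher when two fusion features are close."""
    return 1.0 / (1.0 + float(np.linalg.norm(a - b)))


def edge_weights(fused, edges, score_fn):
    """Weight of each connecting edge = probability its endpoints share an object."""
    return {e: score_fn(fused[e[0]], fused[e[1]]) for e in edges}


def cluster_into_macro_nodes(fused, edges, weights, threshold=0.5):
    """Greedily merge the highest-weight remaining edges into macro nodes."""
    used, macro_nodes = set(), []
    for (i, j) in sorted(edges, key=lambda e: weights[e], reverse=True):
        if weights[(i, j)] < threshold or i in used or j in used:
            continue
        used.update((i, j))
        macro_nodes.append({
            "members": (i, j),
            "feature": (fused[i] + fused[j]) / 2.0,   # avg-pooled member features
        })
    return macro_nodes
```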
  • step 2 based on the obtained at least one macro node and the current key point graph model, the current key point graph model is subjected to the pruning process of the current connection edge, and the key point graph model after the current pruning process is obtained, including : B1, based on the obtained at least one macro node and the current key point graph model, determine the connection edge to be deleted, and delete the connection edge to be deleted from the current key point graph model. B2, take at least one macro node and other key points in the current key point graph model except the key points included in the macro node as the key points after pruning processing, and use the remaining connection edges after deletion as the pruning processing The connected edges of , get the key point graph model after the current pruning process.
  • in step B1, first, according to the obtained at least one macro node and the current key point graph model, the connecting edges to be deleted in the current pruning are determined, and the connecting edges to be deleted are deleted from the current key point graph model.
  • in step B1, determining the connecting edges to be deleted based on the obtained at least one macro node and the current key point graph model includes: determining the connecting edges to be deleted based on the category information of each key point included in the at least one macro node and the category information of the other key points in the current key point graph model except the key points included in the at least one macro node.
  • each target object can include only one key point of the same category, for example, each target object can only include one key point of the head category, one key point of the neck category, and one key point of the left foot category.
  • therefore, the connecting edges to be deleted can be determined based on the category information of each key point included in the at least one macro node and the category information of the other key points in the current key point graph model except the key points included in the at least one macro node.
  • determining the connecting edges to be deleted based on the category information of each key point included in the at least one macro node and the category information of the other key points in the current key point graph model except the key points included in the at least one macro node includes the following conditions:
  • condition 1: the two key points corresponding to a connecting edge are key points in different macro nodes, and the two macro nodes corresponding to the connecting edge contain key points with the same category information; in this case, the connecting edge is determined to be a connecting edge to be deleted.
  • condition 2: the two key points corresponding to a connecting edge are key points in the same macro node; in this case, the connecting edge is determined to be a connecting edge to be deleted.
  • condition 3: one of the two key points corresponding to a connecting edge is a key point in a macro node, the other key point is not a key point in any macro node, and the macro node corresponding to the connecting edge contains a key point with the same category information as the other key point; in this case, the connecting edge is determined to be a connecting edge to be deleted.
  • for any connecting edge in the current key point graph model, if the connecting edge satisfies any one of conditions 1, 2, and 3, the connecting edge is considered to be a connecting edge to be deleted; otherwise, the connecting edge is not a connecting edge to be deleted.
  • for condition 1, when the two key points corresponding to a connecting edge are key points in different macro nodes, it can be judged whether the two macro nodes corresponding to the connecting edge contain key points with the same category information; if such key points exist, the connecting edge is a connecting edge to be deleted, and if not, the connecting edge is not a connecting edge to be deleted. For condition 2, when the two key points corresponding to a connecting edge are key points in the same macro node, the connecting edge is a connecting edge to be deleted.
  • for condition 3, one of the key points corresponding to the connecting edge is a key point included in a macro node and the other key point is not a key point in any macro node, that is, the other key point is an unclustered key point in the current key point graph model; if the macro node contains a key point with the same category information as the other key point, the connecting edge is a connecting edge to be deleted.
  • since the key point information of each target object includes only one key point of each category (for example, one head key point, one neck key point, one left-foot key point, and so on), the connecting edges to be deleted can be determined based on the categories of the two key points corresponding to each connecting edge, and the pruned key point graph model can be generated; the next first processing process is then performed, until the multiple key points in the processed key point graph model are clustered into multiple clusters.
  • in step B2, the at least one macro node and the other key points in the current key point graph model except the key points included in the macro nodes may be taken as the key points after the pruning, and the remaining connecting edges after the deletion may be taken as the connecting edges after the pruning, to obtain the key point graph model after the current pruning. That is, in the key point graph model after the current pruning, when a key point is a macro node, the fusion feature of the key point is the fusion feature corresponding to the macro node.
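  • As a concrete illustration of the three deletion conditions above, the sketch below marks connecting edges for deletion. The helpers macro_of (key point index to macro node id, or None) and categories_of (macro node id to the set of member categories) are assumptions introduced for this sketch.

```python
# Hedged sketch of step B1: decide which connecting edges should be deleted.
def edges_to_delete(edges, category, macro_of, categories_of):
    """edges: (i, j) pairs; category: keypoint index -> category label."""
    deleted = []
    for (i, j) in edges:
        mi, mj = macro_of.get(i), macro_of.get(j)
        if mi is not None and mj is not None:
            if mi == mj:
                # Condition 2: both endpoints lie inside the same macro node.
                deleted.append((i, j))
            elif categories_of[mi] & categories_of[mj]:
                # Condition 1: the two macro nodes share a keypoint category,
                # so they cannot belong to the same target object.
                deleted.append((i, j))
        elif mi is not None or mj is not None:
            macro, other = (mi, j) if mi is not None else (mj, i)
            if category[other] in categories_of[macro]:
                # Condition 3: the macro node already contains a keypoint of
                # the other endpoint's category.
                deleted.append((i, j))
    return deleted
```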
  • in step 3, after the current first processing process is performed, the key point graph model after the current pruning can be used as the current key point graph model corresponding to the next pruning, and the currently determined macro nodes and their fusion features are used as the key points and key point fusion features in the current key point graph model; the first processing process is then performed again, until the multiple key points in the processed key point graph model are clustered into multiple clusters, that is, until there is no connecting edge in the processed key point graph model, thereby obtaining each key point of each target object included in the image to be detected.
  • as shown in FIG. 2A, which is a schematic diagram of the pruning process in a key point detection method, an image feature map 22 (Feature maps) and multiple key point heat maps 21 (Heatmaps) can be generated based on the image to be detected, and then an initial key point graph model 23 is generated based on the image feature map 22 and the multiple key point heat maps 21, where the circles in the initial key point graph model 23 represent key points and the dotted lines are the connecting edges between key points of different categories.
  • then, a corresponding fusion feature can be generated for each key point, the weight of each connecting edge can be determined by the trained edge discriminator 24, and, based on the weight of each connecting edge included in the initial key point graph model 23, key point clustering of the same target object is performed on adjacent key points among the multiple key points included in the initial key point graph model to obtain at least one macro node 25.
  • based on the obtained at least one macro node 25 and the initial key point graph model 23, the connecting edges to be deleted are determined and deleted from the initial key point graph model; the at least one macro node and the other key points in the initial key point graph model except the key points included in the macro nodes are then used as the key points after the pruning, and the remaining connecting edges after the deletion are used as the connecting edges after the pruning, obtaining the key point graph model 26 after the current pruning; the key point graph model 26 after the current pruning is used as the current key point graph model and the first processing process is performed again, until the multiple key points in the processed key point graph model are clustered into multiple clusters, that is, the processing result obtained in the last step in FIG. 2A.
  • during training, the macro node discriminator 27 can also be used to discriminate each generated macro node, that is, to judge whether the key points included in each macro node belong to the same target object, and the neural network to be trained is trained based on the detection result of the macro node discriminator 27 to obtain the trained target neural network.
  • in a possible implementation, the key point information of each target object is generated by a pre-trained target neural network, where the target neural network is obtained by training a neural network to be trained that includes a macro node discriminator, and the macro node discriminator is used to discriminate whether the multiple key points included in each macro node belong to the same target object.
  • the image to be detected can be input into the pre-trained target neural network, and the key point information of each target object included in the image to be detected can be obtained.
  • the category of each key point corresponding to each target object and the number of key points can be set according to actual needs.
  • the pre-trained target neural network may not include a macro node discriminator. That is, the macro node discriminator can judge whether the obtained multiple key points in each macro node belong to the same target object during the training process of the neural network to be trained.
  • the target neural network is obtained by training the neural network to be trained including the macro node discriminator, wherein the macro node discriminator is used to determine whether the multiple key points included in each macro node belong to the same target object , which can make the accuracy of the target neural network obtained by training higher.
  • the neural network to be trained is trained through the following steps to obtain a pre-trained target neural network:
  • a sample image is obtained; and based on the sample image, the neural network to be trained including the macro node discriminator is trained to obtain a pre-trained target neural network.
  • based on the sample image, the neural network to be trained including the macro node discriminator is trained to obtain the pre-trained target neural network, which may include: 1. based on the sample image, training the neural network to be trained to obtain a prediction result, where the prediction result includes the detection result of the macro node discriminator, the predicted category of each key point, and the predicted position information of each key point; 2. determining a first loss value based on the detection result of the macro node discriminator, and determining a second loss value based on the predicted category of each key point, the predicted position information of each key point, and the labeling result carried in the sample image, where the labeling result includes the labeled category of each key point with respect to the corresponding target object and the labeled position information of each key point; 3. based on the first loss value and the second loss value, training the neural network to be trained to obtain the pre-trained target neural network.
  • the sample image carries the labeling result
  • the labeling result includes the labeling category to which each key point belongs to the corresponding target object, and the labeling position information of each key point.
  • the neural network to be trained can be trained based on the prediction result and the labeling result, and the trained target neural network can be obtained.
  • the first loss value can be determined based on the detection result of the macro node discriminator, and the second loss value can be determined based on the predicted category of each key point, the predicted position information of each key point, and the labeling result carried in the sample image; Through the sum of the first loss value and the second loss value, the neural network to be trained is trained to obtain the target neural network.
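  • A hedged sketch of how the two loss values described above could be combined is shown below, written with PyTorch; the specific loss choices (binary cross-entropy for the macro node discriminator, cross-entropy for categories, mean squared error for positions) are illustrative assumptions rather than details taken from the disclosure.

```python
# Sketch: first loss from the macro node discriminator, second loss from the
# per-keypoint category and position predictions; train on their sum.
import torch
import torch.nn.functional as F


def total_loss(disc_logits, disc_labels,
               pred_category_logits, gt_categories,
               pred_positions, gt_positions):
    # First loss value: supervises whether each macro node really contains
    # keypoints of a single target object.
    first_loss = F.binary_cross_entropy_with_logits(disc_logits, disc_labels)

    # Second loss value: per-keypoint category and position supervision
    # against the labeling result carried in the sample image.
    second_loss = (F.cross_entropy(pred_category_logits, gt_categories)
                   + F.mse_loss(pred_positions, gt_positions))

    return first_loss + second_loss
```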
  • the method further includes: determining the behavior type of the target object based on the key point information corresponding to each target object.
  • in the implementation, the information of each key point of each target object can be input into a behavior detection neural network to determine the behavior type of the target object; for example, the behavior type can be running, walking, raising arms, and the like.
  • in a possible implementation, the method further includes: determining position information of at least one target part of the target object based on the key point information corresponding to each target object, and generating special effect information for the at least one target part according to the position information of the at least one target part.
  • the position information of at least one target part of the target object can be determined, and based on the preset special effect information corresponding to the target part, the corresponding special effect information is generated at the position of the target part .
  • the target part may be an arm, a head, a hand, or the like.
  • the arm position of the target object can be determined according to the information of each key point of the target object, and based on the preset special effect information of the arm, the special effect information corresponding to the arm is generated at the arm position of the target object.
  • the bottom-up method is generally divided into two steps.
  • the first step is to predict the Gaussian response map of the key points and obtain the position of each key point.
  • each key point is clustered to obtain the complete human pose.
  • the clustering step adopts a graph segmentation algorithm or a heuristic clustering algorithm. Clustering is only used as a post-processing operation and does not directly supervise the clustering results.
  • in the related art, 1) the clustering step generally adopts a graph segmentation algorithm or a heuristic clustering algorithm, which is only used as a post-processing operation and does not directly supervise the clustering results; 2) common graph clustering algorithms cannot make full use of the prior information of the hierarchical structure of the human body.
  • here, the hierarchical information is: a person can be decomposed into the upper body and the lower body, and the upper body can be decomposed into the head, shoulders, and arms; 3) ordinary graph clustering algorithms only use local information and ignore global human body information.
  • an embodiment of the present disclosure provides a key point detection method.
  • in the method, the detection and clustering of key points are combined to perform end-to-end training.
  • the clustering results are supervised, and the clustering loss can be transmitted directly back to the underlying feature extraction network for overall network optimization; in this way, the network pays more attention to the key points with wrong clustering results and can perform feature learning more effectively.
  • the hierarchical graph clustering algorithm iteratively clusters the key points of the target object step by step, forming a hierarchical structure from key points to limbs to the entire target object; the clustering structure can be supervised at each level, which better preserves the prior information of the target object's hierarchy.
  • through the macro node discriminator (Macro-Node Discriminator), the internal features of the entire macro node can be discriminated, and the global feature information can be better preserved.
  • as shown in FIG. 2A, which is a schematic flowchart of pruning processing in a key point detection method provided by an embodiment of the present disclosure, by judging whether two key points belong to the same target object, the key points of the same target object are gathered together.
  • the initial key point graph model G is divided into two parts: the key point V and the edge E, where the key point V is the information of each key point, that is, it includes "the type of the key point T, the coordinate X of the key point, and the feature F of the key point".
  • the edge E represents the relationship between key points, that is, whether they belong to the same target object.
  • Clustering: use the similarity matrix between key points to perform a clustering algorithm that gathers adjacent key points together into a new macro node (the key points obtained after clustering become macro nodes). A macro node discriminator (Macro-Node Discriminator) is trained to determine whether the key points inside a macro node belong to the same target object.
  • Feature Aggregation: update the features of each macro node. The whole clustering process is performed iteratively until all edges in the key point graph model are deleted, or all key points are successfully clustered into several clusters.
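  • Tying the earlier sketches together, the loop below gives one hedged view of the iterative clustering just described (feature aggregation, similarity update, clustering, pruning, repeated until no edges remain). It reuses the helper functions fuse_features, edge_weights, and cluster_into_macro_nodes sketched above, and assumes each node is a dict with a "categories" set and a "feature" vector; none of these names come from the disclosure.

```python
# Hedged sketch of the iterative hierarchical graph clustering loop.
import numpy as np


def hierarchical_clustering(nodes, edges, score_fn, max_iters=10):
    """nodes: list of {"categories": set, "feature": np.ndarray}; edges: (i, j) pairs."""
    for _ in range(max_iters):
        if not edges:                                        # fully clustered
            break
        feats = np.stack([n["feature"] for n in nodes])
        fused = fuse_features(feats, edges)                  # feature fusion (S41/S3)
        weights = edge_weights(fused, edges, score_fn)       # similarity update (S42)
        macro = cluster_into_macro_nodes(fused, edges, weights)  # clustering (S43)

        # Merge paired keypoints into macro nodes; unpaired nodes carry over.
        new_id, merged = {}, []
        for node in macro:
            i, j = node["members"]
            new_id[i] = new_id[j] = len(merged)
            merged.append({
                "categories": nodes[i]["categories"] | nodes[j]["categories"],
                "feature": node["feature"],
            })
        for k, n in enumerate(nodes):
            if k not in new_id:
                new_id[k] = len(merged)
                merged.append(n)

        # Pruning: drop edges that became internal to a macro node or that
        # connect nodes whose category sets overlap (conditions 1-3 above).
        kept = set()
        for i, j in edges:
            a, b = new_id[i], new_id[j]
            if a != b and not (merged[a]["categories"] & merged[b]["categories"]):
                kept.add((min(a, b), max(a, b)))
        nodes, edges = merged, sorted(kept)
    return nodes
```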
  • An embodiment of the present disclosure provides a method for detecting key points.
  • the input is: an RGB image of a multi-target object (the number of target objects is assumed to be P);
  • Step S1, extract key point information in the image to be detected; Step S2, construct heat maps of the key points of the multiple target objects; Step S3, perform feature learning based on the correlation modeled by the GNN; Step S4, iterate several times until there are no edges to be pruned in the key point graph model.
  • in step S4, the iteration is performed several times until there are no edges to be pruned in the key point graph model, including: step S41, using a pooling layer (avg-pooling) to perform key point feature fusion; step S42, updating the similarity matrix between key points; step S43, clustering the key points, where the clustering is used to realize the merging of key points; the key point graph model is then pruned, that is, the unreasonable edges in the current key point graph model are deleted according to the structural constraints of the target object (for example, a target object has only one head vertex).
  • FIG. 2B is a schematic diagram of a network structure for implementing a key point detection method provided by an embodiment of the present disclosure.
  • the network structure includes: a GNN module 21, an edge discriminator 22 (Edge Discriminator), and a macro node discriminator 23 (Macro-Node Discriminator), wherein the GNN module 21 is composed of stacked edge convolution (EdgeConv) layers and multi-layer perceptrons (Multi-Layer Perceptron, MLP).
  • the EdgeConv layer is a differentiable neural network module that can be embedded in an existing network architecture and has the advantage of capturing local neighborhood information; by stacking or repeatedly applying EdgeConv modules, global shape information can be extracted.
  • the edge discriminator 22 takes the features of a pair of key points as input to determine whether the two key points belong to the same target object.
  • the macro node discriminator 23 is configured to judge whether the key points inside a macro node completely belong to the same target object.
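  • The three modules named above could be sketched in PyTorch as follows; the layer sizes and the use of small MLPs are assumptions made for illustration, and this is not a reproduction of the disclosed network.

```python
# Hedged sketch of the GNN module (EdgeConv-style layer), the edge
# discriminator, and the macro node discriminator.
import torch
import torch.nn as nn


class EdgeConvLayer(nn.Module):
    """EdgeConv-style update: aggregate MLP([x_i, x_j - x_i]) over neighbours j."""

    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim_in, dim_out), nn.ReLU())

    def forward(self, x, edges):
        # x: (N, dim_in); edges: (i, j) pairs, listed in both directions for
        # an undirected connecting edge.
        msgs = [[] for _ in range(x.size(0))]
        for i, j in edges:
            msgs[i].append(self.mlp(torch.cat([x[i], x[j] - x[i]], dim=-1)))
        dim_out = self.mlp[0].out_features
        return torch.stack([
            torch.stack(m).max(dim=0).values if m else x.new_zeros(dim_out)
            for m in msgs
        ])


class EdgeDiscriminator(nn.Module):
    """Scores a pair of keypoint features: same target object or not."""

    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, feat_i, feat_j):
        return torch.sigmoid(self.mlp(torch.cat([feat_i, feat_j], dim=-1)))


class MacroNodeDiscriminator(nn.Module):
    """Judges whether the keypoints inside a macro node all belong to one object."""

    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, macro_feature):
        # macro_feature: e.g. the pooled fusion features of the member keypoints.
        return torch.sigmoid(self.mlp(macro_feature))
```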
  • the above method can not only be used to accurately predict the positions of key points of target objects in Internet videos, but can also be used to analyze the behavior types of target objects, and to add real-time special effects to different parts of a target object.
  • an online hierarchical graph clustering algorithm is realized, and the structural prior information of the target object and the global information of the target object are preserved.
  • an embodiment of the present disclosure also provides a key point detection apparatus.
  • as shown in FIG. 3, a schematic diagram of the architecture of the key point detection apparatus provided by the embodiment of the present disclosure, the apparatus includes an acquisition module 301, a first generation module 302, a second generation module 303, a processing module 304, a determination module 305, a training module 306, a behavior type determination module 307, and a special effect generation module 308, wherein:
  • an acquisition module 301 configured to acquire an image to be detected
  • the first generation module 302 is configured to generate an image feature map and a plurality of key point heat maps based on the image to be detected; the image feature map is used to represent the difference between the key points of each target object in the image to be detected. Relative positional relationship; each of the keypoint heatmaps includes a type of keypoints of the to-be-detected image, and keypoints of different types correspond to different parts of the target object;
  • the second generation module 303 is configured to generate an initial key point graph model based on the image feature map and the plurality of key point heat maps; the initial key point graph model includes information of key points of different categories in the to-be-detected image and information of connecting edges, and each connecting edge is an edge between two key points of different categories;
  • the processing module 304 is configured to perform multiple times of pruning processing of the connecting edges on the initial key point graph model, until multiple key points in the processed key point graph model are clustered into multiple clusters, to obtain respectively Keypoint information belonging to each target object.
  • the information of the key points includes position information, category information, and pixel feature information;
  • the second generation module 303 is configured to determine the information of each key point in the initial key point graph model according to the following steps: based on the key point heat maps, determine the position information of each key point; based on the position information of each key point, extract the pixel feature information of the key point from the image feature map, and determine the category information corresponding to the key point based on the category label of the key point heat map to which the key point belongs.
  • in a possible implementation, the apparatus further includes: a determination module 305, configured to, for each key point in the initial key point graph model, determine the fusion feature of the key point based on the information of the key point and the information of other key points in the key point graph model that have connecting edges with the key point; when the processing module 304 performs the pruning of the connecting edges on the initial key point graph model multiple times, it is configured to: perform the pruning of the connecting edges on the initial key point graph model multiple times based on the fusion feature of each key point included in the initial key point graph model.
  • in a possible implementation, when the processing module 304 performs the pruning of the connecting edges on the initial key point graph model multiple times, until the multiple key points in the processed key point graph model are clustered into multiple clusters, it is configured to: perform a first processing process for the current key point graph model: based on the fusion features of the two key points corresponding to each connecting edge in the current key point graph model, perform key point clustering of the same target object on adjacent key points among the multiple key points included in the current key point graph model, to obtain at least one macro node, where a macro node includes a plurality of clustered adjacent key points, and determine the fusion feature of each macro node based on the fusion features of the key points included in that macro node; based on the obtained at least one macro node and the current key point graph model, perform the current pruning of the connecting edges on the current key point graph model to obtain the key point graph model after the current pruning; after the current first processing process is completed, take the key point graph model after the current pruning as the current key point graph model, take the currently determined macro nodes and their fusion features as the key points and key point fusion features in the current key point graph model, and perform the first processing process again, until the multiple key points in the processed key point graph model are clustered into multiple clusters.
  • in a possible implementation, when the processing module 304 performs key point clustering of the same target object on adjacent key points among the multiple key points included in the current key point graph model based on the fusion features of the two key points corresponding to each connecting edge, to obtain at least one macro node, it is configured to: determine the weight of each connecting edge based on the fusion features of the two key points corresponding to the connecting edge, where the weight represents the probability that the two key points corresponding to the connecting edge belong to the same target object; and, based on the weight of each connecting edge included in the current key point graph model, perform key point clustering of the same target object on adjacent key points among the multiple key points included in the current key point graph model to obtain at least one macro node.
  • in a possible implementation, when the processing module 304 performs the current pruning of the connecting edges on the current key point graph model based on the obtained at least one macro node and the current key point graph model, to obtain the key point graph model after the current pruning, it is configured to: determine the connecting edges to be deleted based on the obtained at least one macro node and the current key point graph model, and delete the connecting edges to be deleted from the current key point graph model; take the at least one macro node and the other key points in the current key point graph model except the key points included in the macro nodes as the key points after the pruning, and take the remaining connecting edges after deletion as the connecting edges after the pruning, to obtain the key point graph model after the current pruning.
  • In a possible implementation, the processing module 304, in the case of determining the connecting edges to be deleted based on the obtained at least one macro node and the current key point graph model, is configured to: determine the connecting edges to be deleted based on the category information of each key point included in the at least one macro node and the category information of the other key points in the current key point graph model except the key points included in the at least one macro node.
  • In a possible implementation, the processing module 304, in the case of determining the connecting edges to be deleted based on the category information of each key point included in the at least one macro node and the category information of the other key points in the current key point graph model except the key points included in the at least one macro node, is configured to: for any connecting edge in the current key point graph model, in the case that the two key points corresponding to the connecting edge are key points in different macro nodes and the two macro nodes corresponding to the connecting edge contain key points with the same category information, determine that the connecting edge is a connecting edge to be deleted; in the case that the two key points corresponding to the connecting edge are key points in the same macro node, determine that the connecting edge is a connecting edge to be deleted; and in the case that one of the two key points corresponding to the connecting edge is a key point in a macro node, the other key point is not a key point in any macro node, and the macro node corresponding to the connecting edge contains a key point with the same category information as the other key point, determine that the connecting edge is a connecting edge to be deleted.
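The three deletion conditions can be written directly as a predicate over one connecting edge. The Python sketch below only illustrates that logic; `macro_of` (mapping a keypoint to the macro node containing it, or None) and `categories_of` (the set of keypoint categories inside a macro node) are hypothetical helpers, not names from the disclosure.

```python
def should_delete_edge(kp_a, kp_b, macro_of, categories_of):
    """Return True if the connecting edge between kp_a and kp_b is to be deleted.

    kp_a, kp_b     - keypoint objects with a `category` attribute
    macro_of       - hypothetical helper: keypoint -> containing macro node, or None
    categories_of  - hypothetical helper: macro node -> set of categories inside it
    """
    m_a, m_b = macro_of(kp_a), macro_of(kp_b)

    if m_a is not None and m_b is not None:
        if m_a is m_b:
            # Both endpoints already lie in the same macro node.
            return True
        # Two different macro nodes that share a keypoint category cannot belong to
        # one target object, since a target has at most one keypoint per category.
        return bool(categories_of(m_a) & categories_of(m_b))

    if m_a is not None:
        # One endpoint is in a macro node, the other is not: delete the edge when the
        # macro node already contains a keypoint of the other endpoint's category.
        return kp_b.category in categories_of(m_a)
    if m_b is not None:
        return kp_a.category in categories_of(m_b)

    # Neither endpoint has been grouped yet: keep the edge for later rounds.
    return False
```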
  • In a possible implementation, the key point information of each target object is generated by a pre-trained target neural network, where the target neural network is obtained by training a neural network to be trained that includes a macro-node discriminator, and the macro-node discriminator is used to determine whether the multiple key points included in each macro node belong to the same target object.
  • In a possible implementation, the apparatus further includes: a training module 306, configured to train the neural network to be trained through the following steps to obtain the pre-trained target neural network: acquiring a sample image; and training, based on the sample image, the neural network to be trained that includes the macro-node discriminator to obtain the pre-trained target neural network.
  • In a possible implementation, the training module 306, in the case of training the neural network to be trained that includes the macro-node discriminator based on the sample image to obtain the pre-trained target neural network, is configured to: train the neural network to be trained based on the sample image to obtain a prediction result, where the prediction result includes a detection result of the macro-node discriminator, a predicted category of each key point, and predicted position information of each key point; determine a first loss value based on the detection result of the macro-node discriminator, and determine a second loss value based on the predicted category of each key point, the predicted position information of each key point, and a labeling result carried in the sample image, where the labeling result includes a labeled category indicating the target object to which each key point belongs and labeled position information of each key point; and train the neural network to be trained based on the first loss value and the second loss value to obtain the pre-trained target neural network.
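The disclosure states only that a first loss value is computed from the macro-node discriminator's detection result and a second loss value from the predicted categories, predicted positions, and the labeling result. The concrete loss functions below (binary cross-entropy, cross-entropy, mean squared error) and the dictionary layout are assumptions made purely for illustration.

```python
import torch.nn.functional as F

def training_step(prediction, annotation, optimizer):
    """One optimization step combining the two supervision signals.

    Assumed layout (illustrative only):
      prediction["macro_node_scores"] - discriminator logits, one per macro node
      prediction["category_logits"]   - (N, num_categories) per-keypoint logits
      prediction["positions"]         - (N, 2) predicted keypoint coordinates
      annotation["macro_node_labels"] - 1.0 if all keypoints grouped into that
                                        macro node belong to one target object
      annotation["categories"]        - (N,) labelled keypoint categories
      annotation["positions"]         - (N, 2) labelled keypoint coordinates
    """
    # First loss value: supervision of the macro-node discriminator.
    loss_macro = F.binary_cross_entropy_with_logits(
        prediction["macro_node_scores"], annotation["macro_node_labels"]
    )

    # Second loss value: keypoint category and position supervision
    # against the labeling result carried in the sample image.
    loss_keypoint = (
        F.cross_entropy(prediction["category_logits"], annotation["categories"])
        + F.mse_loss(prediction["positions"], annotation["positions"])
    )

    loss = loss_macro + loss_keypoint
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```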
  • In a possible implementation, after the key point information of each target object in the to-be-detected image is obtained, the apparatus further includes: a behavior type determination module 307, configured to determine a behavior type of each target object based on the key point information corresponding to the target object.
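The form of the behavior detection network is not specified in the disclosure; purely as an assumed illustration, a minimal classifier over one target object's normalized keypoint coordinates might look like this.

```python
import torch
import torch.nn as nn

class BehaviorClassifier(nn.Module):
    """Hypothetical behavior-type head over one target object's keypoints."""

    def __init__(self, num_keypoints: int = 17, num_behaviors: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_keypoints * 2, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, num_behaviors),  # e.g. running, walking, raising both arms, other
        )

    def forward(self, keypoints_xy: torch.Tensor) -> torch.Tensor:
        # keypoints_xy: (num_keypoints, 2) coordinates normalized by the image size.
        return self.net(keypoints_xy.flatten())

# Usage sketch: classify one detected person from its 17 clustered keypoints.
logits = BehaviorClassifier()(torch.rand(17, 2))
```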
  • In a possible implementation, after the key point information of each target object in the to-be-detected image is obtained, the apparatus further includes: a special effect generation module 308, configured to determine, based on the key point information corresponding to each target object, position information of at least one target part of the target object, and generate special effect information for the at least one target part according to the position information of the at least one target part.
  • In some embodiments, the functions or modules included in the apparatus provided by the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments; for the implementation process, reference may be made to the descriptions in the above method embodiments, which are not repeated here for brevity.
  • an embodiment of the present disclosure also provides an electronic device.
  • Referring to FIG. 4, which is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure, the electronic device includes a processor 401, a memory 402, and a bus 403.
  • The memory 402 is configured to store execution instructions and includes an internal memory 4021 and an external memory 4022. The internal memory 4021 is configured to temporarily store operation data in the processor 401 and data exchanged with the external memory 4022 such as a hard disk, and the processor 401 exchanges data with the external memory 4022 through the internal memory 4021. When the electronic device is running, the processor 401 communicates with the memory 402 through the bus 403, so that the processor 401 executes the following instructions: acquiring an image to be detected; generating an image feature map and multiple key point heat maps based on the to-be-detected image, where the image feature map is used to represent the relative positional relationship between the target objects in the to-be-detected image, each key point heat map contains key points of one category of the to-be-detected image, and key points of different categories correspond to different parts of the target objects; generating an initial key point graph model based on the image feature map and the multiple key point heat maps, where the initial key point graph model contains information of key points of different categories in the to-be-detected image and information of connecting edges, and each connecting edge is an edge between two key points of different categories; and performing the pruning processing of the connecting edges on the initial key point graph model multiple times, until multiple key points in the processed key point graph model are clustered into multiple clusters, to obtain key point information belonging to each target object respectively.
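As an illustration of how keypoint positions could be read off one heat map and paired with pixel features from the image feature map: the local-maximum rule and the 0.3 threshold below are assumptions, since the disclosure states only that positions come from the heat maps and pixel features are taken from the feature map at those positions.

```python
import numpy as np

def extract_keypoints(heatmap: np.ndarray, feature_map: np.ndarray,
                      category: int, threshold: float = 0.3):
    """Collect keypoints of one category as (position, category, pixel feature).

    heatmap:     (H, W) response map of one keypoint category.
    feature_map: (C, H, W) image feature map aligned with the heat map.
    """
    keypoints = []
    h, w = heatmap.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = heatmap[y, x]
            # A pixel is taken as a keypoint when its response is a local maximum
            # in its 3x3 neighbourhood and exceeds the (assumed) threshold.
            if v >= threshold and v == heatmap[y - 1:y + 2, x - 1:x + 2].max():
                keypoints.append({
                    "position": (x, y),
                    "category": category,
                    "feature": feature_map[:, y, x],  # pixel feature at the keypoint
                })
    return keypoints
```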
  • An embodiment of the present disclosure further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is run by a processor, the steps of the key point detection method described in the above method embodiments are executed.
  • The computer program product of the key point detection method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program codes, and the instructions included in the program codes may be used to execute the steps of the key point detection method described in the above method embodiments; for details, reference may be made to the above method embodiments, which are not repeated here.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a processor-executable non-volatile computer-readable storage medium.
  • The technical solutions of the present disclosure, in essence, or the part contributing to the prior art, or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • The aforementioned storage medium includes various media that can store program codes, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
  • In the present disclosure, an initial key point graph model corresponding to the image to be detected is generated based on the generated image feature map and multiple key point heat maps. Since the initial key point graph model includes the information in the image feature map and the key point heat maps, and the image feature map can represent the relative positional relationship between different target objects in the image to be detected, the pruning processing of the connecting edges can be performed on the initial key point graph model to obtain the key point information of each target object, so that the key points of different target objects are distinguished relatively accurately, improving the accuracy of key point clustering.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

本公开提供了一种关键点检测方法、装置、电子设备及存储介质,该方法包括:获取待检测图像;基于待检测图像,生成图像特征图和多个关键点热图;图像特征图用于表征待检测图像中各个目标对象之间的相对位置关系;每个关键点热图中包含待检测图像的一种类别的关键点,不同类别的关键点对应目标对象的不同部位;基于图像特征图和多个关键点热图,生成初始关键点图模型;初始关键点图模型中包含待检测图像中不同类别的关键点的信息以及连接边的信息,每个连接边为两个不同类别的关键点之间的边;对初始关键点图模型进行多次连接边的剪枝处理,直到处理后的关键点图模型中的多个关键点被聚类为多个簇,得到分别属于各个目标对象的关键点信息。

Description

关键点检测方法、装置、电子设备及存储介质
相关申请的交叉引用
本公开基于申请号为202010622135.7、申请日为2020年06月30日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此以引入方式并入本申请。
技术领域
本公开涉及图像处理技术领域,尤其涉及一种关键点检测方法、装置、电子设备及存储介质。
背景技术
人体关键点检测和跟踪是视频分析的基础,在安防领域、动作分析领域具有重要的应用前景。自底向上的多人姿态检测技术,由于具有较高的计算效率,而被广泛应用。一般的,自底向上方法首先预测得到各个关键点的位置,再对各个关键点进行聚类,得到完整的人体姿态。
当前的方法中,可以采用图分割算法或者启发式的聚类算法,对各个关键点进行聚类,聚类过程只是作为后处理操作,并没有直接对聚类结果进行监督,使得关键点聚类过程的准确度较低。
发明内容
有鉴于此,本公开至少提供一种关键点检测方法、装置、电子设备及存储介质。
第一方面,本公开提供了一种关键点检测方法,包括:获取待检测图像;基于所述待检测图像,生成图像特征图和多个关键点热图;所述图像特征图用于表征所述待检测图像中各个目标对象之间的相对位置关系;每个所述关键点热图中包含所述待检测图像的一种类别的关键点,不同类别的关键点对应所述目标对象的不同部位;基于所述图像特征图和多个所述关键点热图,生成初始关键点图模型;所述初始关键点图模型中包含所述待检测图像中不同类别的关键点的信息以及连接边的信息,每个连接边为两个不同类别的关键点之间的边;对所述初始关键点图模型进行多次所述连接边的剪枝处理,直到处理后的关键点图模型中的多个关键点被聚类为多个簇,得到分别属于各个目标对象的关键点信息。
采用上述方法,可以基于生成的图像特征图和多个关键点热图,生成待检测图像对应的初始关键点图模型,由于初始关键点图模型中包括图像特征图和关键点热图中的信息,而图像特征图可以表征出待检测图像中不同目标对象之间的相对位置关系,从而可以对初始关键点图模型进行所处连接边的剪枝处理,得到各个目标对象的关键点信息, 较准确地对不同目标对象的关键点进行区分,以提高关键点聚类的精准度。
一种可能的实施方式中,所述关键点的信息包括位置信息、类别信息、以及像素特征信息;根据以下步骤确定所述初始关键点图模型中各个关键点的信息:基于所述关键点热图,确定各个关键点的位置信息;基于每个所述关键点的位置信息,从所述图像特征图中提取所述关键点的像素特征信息,并基于所述关键点所属关键点热图的类别标签,确定所述关键点对应的类别信息。
一种可能的实施方式中,所述方法还包括:针对所述初始关键点图模型中的每个所述关键点,基于所述关键点的信息和所述关键点图模型中与所述关键点之间存在连接边的其他关键点的信息,确定所述关键点的融合特征;所述对所述初始关键点图模型进行多次所述连接边的剪枝处理,包括:基于所述初始关键点图模型中包含的每个所述关键点的融合特征,对所述初始关键点图模型进行多次所述连接边的剪枝处理。
一种可能的实施方式中,所述对所述初始关键点图模型进行多次所述连接边的剪枝处理,直到处理后的关键点图模型中的多个关键点被聚类为多个簇,包括:针对当前关键点图模型执行第一处理过程:基于所述当前关键点图模型中的每条连接边对应的两个关键点的融合特征,对所述当前关键点图模型包括的多个关键点中相邻关键点进行同一目标对象的关键点聚类,得到至少一个宏节点;其中,所述宏节点中包括聚类后的多个相邻关键点;并基于每个所述宏节点中包括的关键点的融合特征,确定所述宏节点的融合特征;基于得到的至少一个所述宏节点和所述当前关键点图模型,对所述当前关键点图模型进行当前次所述连接边的剪枝处理,并得到当前次剪枝处理后的关键点图模型;在执行完当前次的所述第一处理过程之后,将当前次剪枝处理后的关键点图模型作为当前关键点图模型,将当前次确定的所述宏节点以及所述宏节点的融合特征作为所述当前关键图模型中的关键点以及关键点的融合特征,并再次执行所述第一处理过程,直到处理后的关键点图模型中的多个关键点被聚类为多个簇。
一种可能的实施方式中,所述基于所述当前关键点图模型中的每条连接边对应的两个关键点的融合特征,对所述当前关键点图模型包括的多个关键点中相邻关键点进行同一目标对象的关键点聚类,得到至少一个宏节点,包括:基于所述连接边对应的两个关键点的融合特征,确定所述连接边的权重,所述权重表征所述连接边对应的两个关键点属于同一目标对象的概率;基于所述当前关键点图模型中包括的每条连接边的权重,对所述当前关键点图模型包括的多个关键点中相邻关键点进行同一目标对象的关键点聚类,得到至少一个宏节点。这里,通过确定每条连接边的权重,该权重表征连接边对应的两个关键点属于同一目标对象的概率,再可以基于每条连接边的权重,对当前关键点图模型包括的多个关键点中相邻关键点进行同一目标对象的关键点聚类,得到至少一个宏节点,比如将对应权重较大的两个关键点聚类在一起,得到一个宏节点,使得宏节点的确定较为准确。
一种可能的实施方式中,所述基于得到的至少一个所述宏节点和所述当前关键点图模型,对所述当前关键点图模型进行当前次所述连接边的剪枝处理,并得到当前次剪枝 处理后的关键点图模型,包括:基于得到的至少一个所述宏节点和所述当前关键点图模型,确定待删减连接边,并从所述当前关键点图模型中将所述待删减连接边删除;将至少一个所述宏节点、和所述当前关键点图模型中除所述宏节点中包括的关键点之外的其他关键点作为剪枝处理后的关键点,将删除后剩余的连接边作为剪枝处理后的连接边,得到当前次剪枝处理后的关键点图模型。
一种可能的实施方式中,所述基于得到的至少一个宏节点和所述当前关键点图模型,确定待删减连接边,包括:基于至少一个宏节点中包括的每个关键点的类别信息、以及所述当前关键点图模型中除至少一个宏节点中包括的关键点之外的其他关键点的类别信息,确定所述待删减连接边。
一种可能的实施方式中,基于至少一个宏节点中包括的每个关键点的类别信息、以及所述当前关键点图模型中除至少一个宏节点中包括的关键点之外的其他关键点的类别信息,确定所述待删减连接边,包括:针对所述当前关键点图模型中的任一连接边,在该任一连接边对应的两个关键点为不同宏节点中的关键点,且该任一连接边对应的两个宏节点中存在类别信息相同的关键点的情况下,确定该任一连接边为所述待删减连接边;在该任一连接边对应的两个关键点为同一宏节点中的关键点的情况下,确定该任一连接边为所述待删减连接边;在该任一连接边对应的两个关键点中一个关键点为宏节点中的关键点、另一个关键点不是宏节点中的关键点,且该任一连接边对应的所述宏节点中存在与另一个关键点的类别信息相同的关键点的情况下,确定该任一连接边为所述待删减连接边。
一种可能的实施方式中,所述每个目标对象的关键点信息通过预先训练好的目标神经网络生成;其中,所述目标神经网络是由包括宏节点判别器的待训练神经网络训练得到的,所述宏节点判别器用于判别每个宏节点中包括的多个关键点是否属于同一目标对象。
一种可能的实施方式中,通过下述步骤对所述待训练神经网络进行训练,得到预先训练好的目标神经网络:获取样本图像;基于所述样本图像,对包括所述宏节点判别器的待训练神经网络进行训练,得到预先训练好的目标神经网络。
一种可能的实施方式中,基于所述样本图像,对包括所述宏节点判别器的待训练神经网络进行训练,得到预先训练好的目标神经网络,包括:基于所述样本图像,对所述待训练神经网络进行训练,得到预测结果,所述预测结果包括所述宏节点判别器的检测结果、每个关键点的预测类别、以及每个关键点的预测位置信息;基于所述宏节点判别器的检测结果,确定第一损失值;以及基于所述每个关键点的预测类别、所述每个关键点的预测位置信息,和所述样本图像中携带的标注结果,确定第二损失值;其中,所述标注结果包括每个关键点属于对应目标对象的标注类别,以及每个关键点的标注位置信息;基于所述第一损失值和所述第二损失值,对所述待训练神经网络进行训练,得到预先训练好的目标神经网络。
一种可能的实施方式中,在得到所述待检测图像中的每个目标对象的关键点信息之 后,还包括:基于每个目标对象对应的所述关键点信息,确定该目标对象的行为类型。
一种可能的实施方式中,在得到所述待检测图像中的每个目标对象的关键点信息之后,还包括:基于每个目标对象对应的所述关键点信息,确定该目标对象的至少一个目标部位的位置信息,并根据所述至少一个目标部位的位置信息,生成针对所述至少一个目标部位的特效信息。
以下装置、电子设备等的效果描述参见上述方法的说明,这里不再赘述。
第二方面,本公开提供了一种关键点检测装置,包括:获取模块,配置为获取待检测图像;第一生成模块,配置为基于所述待检测图像,生成图像特征图和多个关键点热图;所述图像特征图用于表征所述待检测图像中各个目标对象的关键点之间的相对位置关系;每个所述关键点热图中包含所述待检测图像的一种类别的关键点,不同类别的关键点对应所述目标对象的不同部位;第二生成模块,配置为基于所述图像特征图和多个所述关键点热图,生成初始关键点图模型;所述初始关键点图模型中包含所述待检测图像中不同类别的关键点的信息以及连接边的信息,每个连接边为两个不同类别的关键点之间的边;处理模块,配置为对所述初始关键点图模型进行多次所述连接边的剪枝处理,直到处理后的关键点图模型中的多个关键点被聚类为多个簇,得到分别属于各个目标对象的关键点信息。
第三方面,本公开提供一种电子设备,包括:处理器、存储器和总线,所述存储器存储有所述处理器可执行的机器可读指令,在电子设备运行的情况下,所述处理器与所述存储器之间通过总线通信,所述机器可读指令被所述处理器执行时执行如上述第一方面或任一实施方式所述的关键点检测方法的步骤。
第四方面,本公开提供一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器运行时执行如上述第一方面或任一实施方式所述的关键点检测方法的步骤。
第五方面,本公开提供了一种计算机程序产品,包括计算机可读代码,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行用于实现上述一个或多个实施例中服务器执行上述方法。为使本公开的上述目的、特征和优点能更明显易懂,下文特举较佳实施例,并配合所附附图,作详细说明如下。
附图说明
图1为本公开实施例所提供的一种关键点检测方法的流程示意图;
图2A为本公开实施例所提供的一种关键点检测方法中,剪枝处理的流程示意图;
图2B为本公开实施例所提供的一种实现关键点检测方法的网络结构示意图;
图3为本公开实施例所提供的一种关键点检测装置的架构示意图;
图4为本公开实施例所提供的一种电子设备400的结构示意图。
具体实施方式
为使本公开实施例的目的、技术方案和优点更加清楚,下面将结合本公开实施例中的附图,对本公开实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本公开一部分实施例,而不是全部的实施例。通常在此处附图中描述和示出的本公开实施例的组件可以以各种不同的配置来布置和设计。因此,以下对在附图中提供的本公开的实施例的详细描述并非旨在限制要求保护的本公开的范围,而是仅仅表示本公开的选定实施例。基于本公开的实施例,本领域技术人员在没有做出创造性劳动的前提下所获得的所有其他实施例,都属于本公开保护的范围。
一般的,自底向上方法首先预测得到各个关键点的位置,再对各个关键点进行聚类,得到完整的人体姿态。当前的方法中,可以采用图分割算法或者启发式的聚类算法,对各个关键点进行聚类,聚类过程只是作为后处理操作,并没有直接对聚类结果进行监督,使得关键点聚类过程的准确度较低。
为了提高关键点聚类过程的准确度,本公开实施例提供了一种关键点检测方法。
为便于对本公开实施例进行理解,首先对本公开实施例所公开的一种关键点检测方法进行详细介绍。
参见图1所示,为本公开实施例所提供的一种关键点检测方法的流程示意图,该方法包括S101-S104,其中:
S101,获取待检测图像。
S102,基于待检测图像,生成图像特征图和多个关键点热图;图像特征图用于表征待检测图像中各个目标对象之间的相对位置关系;每个关键点热图中包含待检测图像的一种类别的关键点,不同类别的关键点对应目标对象的不同部位。
S103,基于图像特征图和多个关键点热图,生成初始关键点图模型;初始关键点图模型中包含待检测图像中不同类别的关键点的信息以及连接边的信息,每个连接边为两个不同类别的关键点之间的边。
S104,对初始关键点图模型进行多次连接边的剪枝处理,直到处理后的关键点图模型中的多个关键点被聚类为多个簇,得到分别属于各个目标对象的关键点信息。
上述方法中,可以基于生成的图像特征图和多个关键点热图,生成待检测图像对应的初始关键点图模型,由于初始关键点图模型中包括图像特征图和关键点热图中的信息,而图像特征图可以表征出待检测图像中不同目标对象之间的相对位置关系,从而可以对初始关键点图模型进行所处连接边的剪枝处理,得到各个目标对象的关键点信息,较准确地对不同目标对象的关键点进行区分,以提高关键点聚类的精准度。
针对S101以及S102:待检测图像可以为任一包括多个目标对象的图像。目标对象可以为人,即对待检测对象中包括的多个人体的关键点进行检测。
在实施过程中,可以将获取的待检测图像输入至训练后的关键点检测神经网络中,生成图像特征图和多个关键点热图;并基于图像特征图、多个关键点热图、以及训练后 的关键点检测神经网络,确定每个目标对象的各个关键点。
这里,每个关键点热图中包含待检测图像的一种类别的关键点,不同类别的关键点对应目标对象的不同部位。比如,关键点的类别可以为头部、颈部、手部等,进而关键点热图可以为包含头部关键点的图像,或者,关键点热图可以为包含颈部关键点的图像等;或者,关键点的类别可以为设置的第一类别、第二类别等,其中,第一类别关键点可以为拇指上的关键点,第二类别关键点可以为食指上的关键点等,进而关键点热图可以为包含第一类别关键点的图像,或者,关键点热图可以为包含第二类别关键点的图像等。其中,关键点的类别和类别的数量可以根据实际需要进行设置。以及,每个目标对象对应的关键点的数量可以根据实际需要进行设置,比如,每个目标对象对应的关键点的数量可以为17个、105个等。
这里,关键点热图的数量与设置的关键点类别的数量一致,比如,在设置的关键点的类别数量为17个的情况下,基于待检测图像生成的关键点热图的数量也为17个。其中,每种类别的关键点的数量可以为一个。
图像特征图的数量可以为一个,也可以为多个。其中,在图像特征图的数量为一个的情况下,该图像特征图可以表征待检测图像中各个目标对象的、各种类别的关键点对应的部位之间的相对位置关系。在图像特征图的数量为多个的情况下,图像特征图的数量与关键点热图的数量可以相同,即每张图像特征图可以表征待检测图像中各个目标对象的一种类别的关键点对应的部位之间的相对位置关系。其中,图像特征图的尺寸与关键点热图的尺寸一致。
在实施过程中,可以通过在关键点检测神经网络中设置不同的损失函数,得到图像特征图和多个关键点热图。
针对S103:这里,可以从多个关键点热图和图像特征图中提取得到每个关键点的信息,将包含信息的每个关键点作为节点、以不同类别的关键点之间的边作为连接边,构成了初始关键点图模型。其中,连接边的信息可以为对应两个关键点之间存在连接关系的信息,比如,连接边一的信息可以为:连接边一对应的关键点P1和关键点P2存在连接关系。
在本公开的一些实施例中,关键点的信息包括位置信息、类别信息、以及像素特征信息。其中,可以根据以下步骤确定初始关键点图模型中各个关键点的信息:基于关键点热图,确定各个关键点的位置信息;基于每个关键点的位置信息,从图像特征图中提取关键点的像素特征信息,并基于关键点所属关键点热图的类别标签,确定关键点对应的类别信息。
在实施过程中,可以基于关键点热图中每个像素点的像素值,确定各个关键点的位置信息。示例性的,针对每个关键点热图,可以选择像素值为极大值的像素点,确定为一关键点,并将选择的该像素点的位置信息确定为关键点的位置信息。其中,在关键点热图中某一像素点的像素值大于周围像素点的像素值的情况下,认为该像素点的像素值为极大值,该像素点为关键点。在得到了每个像素点的位置信息之后,可以从图像特征 图中提取与该位置信息对应的像素点的像素值,将提取的像素值确定为关键点的像素特征信息。同时,还可以根据每个关键点所属关键点热图的类别标签,确定关键点对应的类别信息。比如,在关键点热图G1的类别标签为头部的情况下,关键点热图G1中包括的各个关键点的类别信息为头部关键点;在关键点热图G2的类别标签为颈部的情况下,关键点热图G2中包括的各个关键点的类别信息为颈部关键点。
针对S104:在本公开的一些实施例中,在对初始关键点进行多次剪枝处理之前,还可以包括:针对初始关键点图模型中的每个关键点,基于关键点的信息和关键点图模型中与关键点之间存在连接边的其他关键点的信息,确定关键点的融合特征。进而,对初始关键点图模型进行多次连接边的剪枝处理,可以包括:基于初始关键点图模型中包含的每个关键点的融合特征,对初始关键点图模型进行多次连接边的剪枝处理。
这里,可以先为初始关键点图模型中的每个关键点生成对应的融合特征,再基于每个关键点的融合特征,对初始关键点图模型进行多次连接边的剪枝处理。
在实施过程中,可以针对每个关键点,确定初始关键点图模型中与该关键点之间存在连接边的其他关键点,基于该关键点的信息和其他关键点的信息,生成该关键点的融合特征。示例性的,可以利用图神经网络(Graph Neural Network,GNN),确定初始关键点图模型中,每个关键点的融合特征,并基于初始关键点图模型中包含的每个关键点的融合特征,对初始关键点图模型进行多次连接边的剪枝处理。
上述实施方式中,通过针对每个关键点,基于该关键点的信息和与该关键点之间存在连接边的其他关键点的信息,生成该关键点对应的融合特征,这样,该关键点的融合特征不仅可以表征该关键点的特征,还可以表征该关键点与其他关键点之间的关联关系,使得基于各个关键点分别对应的融合特征,可以较准确的对初始关键点图模型进行多次连接边的剪枝处理,进而可以较准确的确定每个目标对象对应的关键点信息。
在本公开的一些实施例中,对初始关键点图模型进行多次连接边的剪枝处理,直到处理后的关键点图模型中的多个关键点被聚类为多个簇,包括:
针对当前关键点图模型执行第一处理过程:步骤一、基于当前关键点图模型中的每条连接边对应的两个关键点的融合特征,对当前关键点图模型包括的多个关键点中相邻关键点进行同一目标对象的关键点聚类,得到至少一个宏节点;其中,宏节点中包括聚类后的多个相邻关键点;并基于每个宏节点中包括的关键点的融合特征,确定宏节点的融合特征;步骤二、基于得到的至少一个宏节点和当前关键点图模型,对当前关键点图模型进行当前次连接边的剪枝处理,并得到当前次剪枝处理后的关键点图模型;步骤三、在执行完当前次的第一处理过程之后,将当前次剪枝处理后的关键点图模型作为当前关键点图模型,将当前次确定的宏节点以及宏节点的融合特征作为当前关键图模型中的关键点以及关键点的融合特征,并再次执行第一处理过程,直到处理后的关键点图模型中的多个关键点被聚类为多个簇。这里,可以将初始关键点图模型作为当前关键点图模型,执行第一次第一处理过程,得到剪枝处理后的关键点图模型;并将第一次剪枝处理后的关键点图模型作为当前关键点图模型,将第一次剪枝处理后得到的每个宏节点和每个宏 节点对应的融合特征,作为当前关键点图模型中的关键点以及关键点的融合特征,执行第二次第一处理过程,直至处理后的关键点图模型中的多个关键点被聚类为多个簇,聚类得到的簇的数量与待检测图像中包括的目标对象的数量相同,每个簇中包括一个目标对象对应的全部关键点,即得到了待检测图像中每个目标对象的各个关键点。
上述实施方式下,在每一次第一处理过程中,对相邻关键点进行一次聚类,得到至少一个宏节点,每个宏节点中包括的多个关键点为同一目标对象的关键点,通过对初始关键点图模型进行多次第一处理过程,直至处理后的关键点图模型中的多个关键点被聚类为多个簇,得到了每个目标对象的关键点信息,使得得到的每个目标对象对应的关键点信息较为准确。
下述对第一处理过程进行详细说明:在步骤一中,在本公开的一些实施例中,基于当前关键点图模型中的每条连接边对应的两个关键点的融合特征,对当前关键点图模型包括的多个关键点中相邻关键点进行同一目标对象的关键点聚类,得到至少一个宏节点,包括:A1,基于连接边对应的两个关键点的融合特征,确定连接边的权重,权重表征连接边对应的两个关键点属于同一目标对象的概率。A2,基于当前关键点图模型中包括的每条连接边的权重,对当前关键点图模型包括的多个关键点中相邻关键点进行同一目标对象的关键点聚类,得到至少一个宏节点。这里,可以确定每条连接边对应的权重,该权重表征连接边两侧的两个关键点属于同一目标对象的概率。示例性的,可以通过训练的边判别器,针对每条连接边对应的两个关键点的融合特征,确定每条连接边的权重。
在本公开的一些实施例中,可以基于当前关键点图模型中包括的每条连接边的权重,对当前关键点图模型中包括的多个关键点中相邻关键点进行关键点聚类,得到至少一个宏节点,其中,每个宏节点中包括的多个关键点为属于同一目标对象的关键点。比如可以将权重较大的连接边对应的两个关键点聚类在一起,得到一个宏节点。每个宏节点中包括当前关键点图模型中的两个关键点,并使得聚类后得到的至少一个宏节点中包括的连接边的权重和较大。比如,在对当前关键点图模型进行本次关键点聚类后,得到两个宏节点的情况下,可以使得聚类后得到该两个宏节点中包括的连接边权重和较大。
在得到了宏节点之后,进行下一次第一处理过程之前,可以确定每个宏节点的融合特征。即可以将宏节点中包括的每个关键点的融合特征进行融合处理,得到宏节点对应的融合特征。在实施过程中,可以将宏节点中包括的每个关键点的融合特征进行池化处理,得到该宏节点的融合特征。
这里,通过确定每条连接边的权重,该权重表征连接边对应的两个关键点属于同一目标对象的概率,再可以基于每条连接边的权重,对当前关键点图模型包括的多个关键点中相邻关键点进行同一目标对象的关键点聚类,得到至少一个宏节点,比如将对应权重较大的两个关键点聚类在一起,得到一个宏节点,使得宏节点的确定较为准确。
在步骤二中,基于得到的至少一个宏节点和当前关键点图模型,对当前关键点图模型进行当前次连接边的剪枝处理,并得到当前次剪枝处理后的关键点图模型,包括:B1,基于得到的至少一个宏节点和当前关键点图模型,确定待删减连接边,并从当前关键点 图模型中将待删减连接边删除。B2,将至少一个宏节点、和当前关键点图模型中除宏节点中包括的关键点之外的其他关键点作为剪枝处理后的关键点,将删除后剩余的连接边作为剪枝处理后的连接边,得到当前次剪枝处理后的关键点图模型。
这里,在步骤B1中,可以先根据得到的至少一个宏节点和当前关键点图模型,确定当前次剪枝处理中待删减连接边,并将该待删减连接边从当前关键点图模型中删除。
在本公开的一些实施例中,在步骤B1中,基于得到的至少一个宏节点和当前关键点图模型,确定待删减连接边,包括:基于至少一个宏节点中包括的每个关键点的类别信息、以及当前关键点图模型中除至少一个宏节点中包括的关键点之外的其他关键点的类别信息,确定待删减连接边。这里,考虑到每个目标对象中仅可以包括一个相同类别的关键点,比如,每个目标对象中仅可以包括一个头部类别的关键点、一个颈部类别的关键点、一个左脚类别的关键点,因此,可以基于至少一个宏节点中包括的每个关键点的类别信息、以及当前关键点图模型中除至少一个宏节点中包括的关键点之外的其他关键点的类别信息,确定待删减连接边。
在本公开的一些实施例中,基于至少一个宏节点中包括的每个关键点的类别信息、以及当前关键点图模型中除至少一个宏节点中包括的关键点之外的其他关键点的类别信息,确定待删减连接边,包括:
针对当前关键点图模型中的任一连接边:条件一、在该任一连接边对应的两个关键点为不同宏节点中的关键点,且该任一连接边对应的两个宏节点中存在类别信息相同的关键点的情况下,确定该任一连接边为待删减连接边。条件二、在该任一连接边对应的两个关键点为同一宏节点中的关键点的情况下,确定该任一连接边为待删减连接边。条件三、在该任一连接边对应的两个关键点中一个关键点为宏节点中的关键点、另一个关键点不是宏节点中的关键点,且该任一连接边对应的宏节点中存在与另一个关键点的类别信息相同的关键点的情况下,确定该任一连接边为待删减连接边。
这里,针对当前关键点图模型中的任一连接边,在该连接边满足条件一、条件二、和条件三中的任一种条件的情况下,认为该连接边为待删减连接边,否则,该连接边不属于待删减连接边。
在条件一中,在连接边对应的两个关键点为不同宏节点中的关键点的情况下,可以判断该连接边对应的两个宏节点中是否存在类别信息相同的关键点,在存在类别信息相同的关键点的情况下,该连接边为待删减连接边;在不存在类别信息相同的关键点的情况下,该连接边不属于待删减连接边。在条件二中,在连接边对应的两个关键点为同一宏节点中的关键点的情况下,该连接边为待删减连接边。在条件三中,在该连接边对应的关键点中一个关键点是宏节点中包括的关键点,另一个关键点不是宏节点中的关键点,即另一个关键点是当前关键点图模型中除宏节点包括的关键点之外的其他关键点的情况下,可以判断该连接边对应的宏节点中是否存在与另一关键点的类别信息相同的关键点,在存在所述关键点的情况下,该连接边为待删减连接边;在不存在所述关键点的情况下,该连接边不属于待删减连接边。
上述实施方式下,考虑到每个目标对象的关键点信息中同类别的关键点仅包括一个,即每个目标对象的关键点信息中包括一个头部关键点、一个颈部关键点、一个左脚关键点等,因此,这里可以基于连接边对应的两个关键点的类别,确定待删减连接边,并生成剪枝后的关键点图模型,进而可以进行下一次第一处理过程,直到处理后的关键点图模型中的多个关键点被聚类为多个簇。
在B2中,在将待删减连接边删除之后,可以将至少一个宏节点、和当前关键点图模型中除宏节点中包括的关键点之外的其他关键点作为剪枝处理后的关键点,将删除后剩余的连接边作为兼职处理后的连接边,得到当前次剪枝处理后的关键点图模型。即当前次剪枝处理后的关键点图模型中,在关键点为宏节点的情况下,该关键点的融合特征为该宏节点对应的融合特征。
在步骤三中,在执行完当前次的第一处理过程之后,可以将当前次剪枝处理后的关键点图模型作为下一次剪枝处理时对应的当前关键点图模型,将当前次确定的宏节点以及宏节点的融合特征作为当前关键图模型中的关键点以及关键点的融合特征,并再次执行第一处理过程,直到处理后的关键点图模型中的多个关键点被聚类为多个簇,也即直至处理后的关键点图模型中不存在任一条连接边时为止,进而得到了待检测图像中包括的每个目标对象的各个关键点。
参见图2A所示,图中示出的是一种关键点检测方法中,剪枝处理的流程示意图;可以基于待检测图像,生成图像特征图22(即Feature maps)和多个关键点热图21(即Heatmaps),再基于图像特征图22和多个关键点热图21,生成初始关键点图模型23,其中,初始关键点图模型23中的圆形标识为关键点,虚线为不同类别的关键点之间的连接边。接着,可以为每个关键点生成对应的融合特征,并通过训练的边判别器24确定每一条连接边的权重,并基于初始关键点图模型23中包括的每条连接边的权重,对初始关键点图模型包括的多个关键点中相邻关键点进行同一目标对象的关键点聚类,得到至少一个宏节点25。再接着,可以基于得到的至少一个宏节点25和初始关键点图模型23,确定待删减连接边,并从初始关键点图模型中将待删减连接边删除,再将至少一个宏节点、和初始关键点图模型中除宏节点中包括的关键点之外的其他关键点作为剪枝处理后的关键点,将删除后剩余的连接边作为剪枝处理后的连接边,得到当前次剪枝处理后的关键点图模型26,将当前次剪枝处理后的关键点图模型26作为当前关键点图模型再次进行第一处理过程,直至处理后的关键点图模型中的多个关键点被聚类为多个簇,即如图2A中最后一步得到的处理结果。其中,在待训练神经网络进行训练的情况下,还可以通过宏节点判别器27对生成的每个宏节点进行判别,即判断每个宏节点中包括的关键点是否属于同一个目标对象,并基于宏节点判别器27的检测结果对待训练神经网络进行训练,得到训练好的目标神经网络。
在本公开的一些实施例中,每个目标对象的关键点信息通过预先训练好的目标神经网络生成;其中,目标神经网络是由包括宏节点判别器的待训练神经网络训练得到的,宏节点判别器用于判别每个宏节点中包括的多个关键点是否属于同一目标对象。可以将 待检测图像输入至预先训练好的目标神经网络中,得到待检测图像中包括的每个目标对象的关键点信息。其中,每个目标对象对应的各个关键点的类别和关键点的数量,可以根据实际需要进行设置。
在实施过程中,该预先训练好的目标神经网络中可以不包括宏节点判别器。即该宏节点判别器可以在待训练神经网络的训练过程中,判断得到的每个宏节点中的多个关键点是否属于同一目标对象。在上述实施方式下,通过对包含宏节点判别器的待训练神经网络进行训练,得到目标神经网络,其中,宏节点判别器用于判别每个宏节点中包括的多个关键点是否属于同一目标对象,可以使得训练得到的目标神经网络的准确度较高。
在本公开的一些实施例中,通过下述步骤对待训练神经网络进行训练,得到预先训练好的目标神经网络:
获取样本图像;并基于样本图像,对包括宏节点判别器的待训练神经网络进行训练,得到预先训练好的目标神经网络。
在基于样本图像,对包括宏节点判别器的待训练神经网络进行训练,得到预先训练好的目标神经网络,可以包括:一、基于样本图像,对待训练神经网络进行训练,得到预测结果,预测结果包括宏节点判别器的检测结果、每个关键点的预测类别、以及每个关键点的预测位置信息。二、基于宏节点判别器的检测结果,确定第一损失值;以及基于每个关键点的预测类别、每个关键点的预测位置信息,和样本图像中携带的标注结果,确定第二损失值;其中,标注结果包括每个关键点属于对应目标对象的标注类别,以及每个关键点的标注位置信息。三、基于第一损失值和第二损失值,对待训练神经网络进行训练,得到预先训练好的目标神经网络。这里,样本图像中携带有标注结果,该标注结果中包括每个关键点属于对应目标对象的标注类别,以及每个关键点的标注位置信息。将样本图像输入至待训练神经网络中,得到预测结果,其中,预测结果中包括宏节点判别器的检测结果、每个关键点的预测类别、以及每个关键点的预测位置信息。进而可以基于预测结果和标注结果,对待训练神经网络进行训练,得到训练好的目标神经网络。可以基于宏节点判别器的检测结果,确定第一损失值,并基于每个关键点的预测类别、每个关键点的预测位置信息,和样本图像中携带的标注结果,确定第二损失值;通过第一损失值与第二损失值之和,对待训练神经网络进行训练,得到目标神经网络。
在本公开的一些实施例中,在得到待检测图像中的每个目标对象的关键点信息之后,还包括:基于每个目标对象对应的关键点信息,确定该目标对象的行为类型。这里,在得到每个目标对象的各个关键点的信息之后,可以将每个目标对象的各个关键点的信息输入至行为检测神经网络中,确定该目标对象的行为类型,比如,该行为类型可以为跑步、走步、托举双臂等。
在本公开的一些实施例中,在得到待检测图像中的每个目标对象的关键点信息之后,还包括:基于每个目标对象对应的关键点信息,确定该目标对象的至少一个目标部位的位置信息,并根据至少一个目标部位的位置信息,生成针对至少一个目标部位的特效信息。这里,可以针对每个目标对象的各个关键点的信息,确定该目标对象的至少一 个目标部位的位置信息,基于预设的目标部位对应的特效信息,在目标部位的位置处生成对应的特效信息。其中,目标部位可以为手臂、头部、手部等。比如,可以针对目标对象的各个关键点的信息,确定目标对象的手臂位置,并基于预设的手臂的特效信息,在目标对象的手臂位置处,生成手臂对应的特效信息。
相关技术中,人体关键点检测和跟踪是视频分析的基础,在安防领域、动作分析领域具有重要的应用前景。自底向上的多人姿态检测技术,由于较高的计算效率,而被广泛应用。一般地,自底向上方法一般分为两步,第一步预测关键点的高斯响应图,并得到各个关键点的位置。第二步,对各个关键点进行聚类,得到完整的人体姿态。然而,相关技术中,聚类步骤采用图分割算法或者启发式的聚类算法。聚类只是作为后处理操作,并没有直接对聚类结果进行监督。由此可知,相关技术中,存在以下缺点:1)聚类步骤一般采用图分割算法或者启发式的聚类算法,只是作为后处理操作,并没有直接对聚类结果进行监督;2)普通的图聚类算法,无法充分利用人体的分层结构先验信息,例如,分层信息为:一个人可以分解为上半身和下半身;上半身又可以分解为头、肩膀、胳膊;而头部又由脸部的5个关键点组成;3)普通的图聚类算法,只利用了局部信息,而忽视了全局人体信息。
为解决上述问题,本公开实施例提供了一种关键点检测方法,该方法中,首先基于可微分的“分层图聚类模块”,将关键点的检测和聚类联合起来,进行端到端地训练。然后,对聚类结果进行监督,聚类的损失可以直接反传回底层特征提取网络,进行整体网络优化。这样,网络更加注重聚类结果错误的关键点,可以更加有效的进行特征学习。一方面,分层的图聚类算法一步步迭代地对目标对象关键点进行聚类,构成了从关键点--肢体--整个目标对象的层次结构,可以对各个层次的聚类结构进行监督,能够更好地保留目标对象的层次结构先验信息。另一方面,通过引入宏关键点判别器(Macro-node Discriminator)可以对整个宏结点内部的特征进行判别,更好地保留了全局特征信息。
FIG. 2A is a schematic flowchart of the pruning processing in a key point detection method provided by an embodiment of the present disclosure: by judging whether each pair of key points belongs to the same target object, the key points of the same target object are gathered together.
1) Extract the key point information in the image to be detected, and construct the initial key point graph model according to the key point information. Here, the key point information is first extracted to construct the initial key point graph model G={V, E}. The initial key point graph model G consists of two parts, the key points V and the edges E, where V is the information of each key point, namely the key point category T, the key point coordinates X, and the key point feature F, and E represents the relationship between key points, that is, whether they belong to the same target object. After the initial key point graph model is constructed, correlation feature extraction is performed.
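A minimal sketch of constructing G = {V, E} as described above is given below. The rule of connecting every pair of keypoints of different categories follows this paragraph; the container names and types are illustrative assumptions.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Tuple

import numpy as np

@dataclass
class Keypoint:
    category: int               # keypoint category T (e.g. head, neck, ...)
    position: Tuple[int, int]   # coordinates X taken from the keypoint heat map
    feature: np.ndarray         # pixel feature F sampled from the image feature map

@dataclass
class KeypointGraph:
    nodes: List[Keypoint]
    edges: List[Tuple[int, int]] = field(default_factory=list)

def build_initial_graph(keypoints: List[Keypoint]) -> KeypointGraph:
    """Connect every pair of keypoints of different categories with an edge."""
    graph = KeypointGraph(nodes=keypoints)
    for i, j in combinations(range(len(keypoints)), 2):
        if keypoints[i].category != keypoints[j].category:
            graph.edges.append((i, j))
    return graph
```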
2) Use a GNN to learn correlation features. Edge convolution (EdgeConv) is used to build a graph convolutional neural network model, which convolves the constructed key point graph model (Graph) and updates the features of the key points.
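EdgeConv is the edge-convolution operator from Dynamic Graph CNN (cited in the non-patent literature below). A simplified, unbatched single layer that updates each keypoint feature from its graph neighbours could look as follows; the layer sizes and max aggregation are conventional EdgeConv choices, not values taken from this disclosure.

```python
import torch
import torch.nn as nn

class EdgeConvLayer(nn.Module):
    """Simplified EdgeConv: h_i' = max_j MLP([h_i, h_j - h_i]) over the neighbours j of i."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.out_dim = out_dim
        self.mlp = nn.Sequential(
            nn.Linear(2 * in_dim, out_dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, features: torch.Tensor, edges: list) -> torch.Tensor:
        # features: (N, in_dim) keypoint features; edges: list of (i, j) index pairs.
        n = features.size(0)
        neighbours = [[] for _ in range(n)]
        for i, j in edges:                 # connecting edges are treated as undirected
            neighbours[i].append(j)
            neighbours[j].append(i)
        updated = []
        for i in range(n):
            if not neighbours[i]:          # isolated keypoint: no message to aggregate
                updated.append(features.new_zeros(self.out_dim))
                continue
            msgs = torch.stack([
                self.mlp(torch.cat([features[i], features[j] - features[i]], dim=-1))
                for j in neighbours[i]
            ])
            updated.append(msgs.max(dim=0).values)  # max-aggregation over neighbours
        return torch.stack(updated)                 # fused keypoint features

# Usage sketch: two stacked EdgeConv layers over a keypoint graph with 5 nodes.
layer1, layer2 = EdgeConvLayer(64, 128), EdgeConvLayer(128, 128)
feats = torch.randn(5, 64)
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
fused = layer2(layer1(feats, edges), edges)
```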
3) Update the similarity matrix between key points: an edge discriminator (Edge Discriminator) is then trained to discriminate each pair of key points, that is, to judge whether the pair of key points belongs to the same target object. The discrimination information is used to update the similarity matrix between key points.
4) Grouping: using the similarity matrix between key points, a clustering algorithm is executed to gather adjacent key points together into a new macro node (the key points obtained after clustering become macro nodes). A macro-node discriminator (Macro-Node Discriminator) is trained to judge whether the key points inside a macro node belong to the same target object.
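One possible instantiation of this grouping step, assumed here purely for illustration, is a greedy pass over the similarity matrix: edges are visited from the highest to the lowest same-object probability and each still-unassigned pair of adjacent keypoints is merged into a new macro node.

```python
def group_into_macro_nodes(num_nodes, edge_weights, threshold=0.5):
    """Greedily merge adjacent keypoints into macro nodes.

    edge_weights maps (i, j) node-index pairs to the same-object probability taken
    from the similarity matrix; the 0.5 threshold is an assumption.
    """
    assigned = [None] * num_nodes
    macro_nodes = []
    # Visit connecting edges from the most to the least confident.
    for (i, j), w in sorted(edge_weights.items(), key=lambda kv: kv[1], reverse=True):
        if w < threshold:
            break
        if assigned[i] is None and assigned[j] is None:
            assigned[i] = assigned[j] = len(macro_nodes)
            macro_nodes.append([i, j])      # a new macro node holding two keypoints
    return macro_nodes
```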
5) Graph Pruning: the key point graph model (Graph) is pruned according to structural prior constraints of the target object, and irrelevant edges are deleted.
6) Feature Aggregation: the feature of each macro node is updated. The whole clustering process proceeds iteratively until all edges in the key point graph model are deleted, or all key points are successfully clustered into several clusters.
An embodiment of the present disclosure provides a key point detection method, taking Online Hierarchical Graph Clustering (OHGC) as an example. Input: an RGB image containing multiple target objects (the number of target objects is assumed to be P); output: P target-object key point clusters (all key points of one target object form one cluster). The method includes:
Step S1, extracting the key point information in the image to be detected; step S2, constructing the key point heat maps of the multiple target objects; step S3, performing feature learning based on GNN correlation; step S4, iterating several times until there is no edge to be pruned in the key point graph model.
In some embodiments of the present disclosure, the step S4 of iterating several times until there is no edge to be pruned in the key point graph model includes: step S41, performing key point feature fusion using a pooling layer (avg-pooling); step S42, updating the similarity matrix between key points; step S43, clustering the key points, where the clustering can be used to implement the merging of key points; step S44, pruning the current key point graph model, in which unreasonable edges in the current key point graph model are deleted according to the structural constraints of the target object; for example, one target object has only one head-top point.
FIG. 2B is a schematic diagram of a network structure for implementing a key point detection method provided by an embodiment of the present disclosure. As shown in FIG. 2B, the network structure includes a GNN module 21, an edge discriminator 22 (Edge Discriminator), and a macro-node discriminator 23 (Macro-node Discriminator). The GNN module 21 is formed by stacking EdgeConv layers and multi-layer perceptrons (Multi-Layer Perceptron, MLP), where the EdgeConv layer is a differentiable neural network module that can be embedded into existing network architectures; it has the advantages of capturing local neighborhood information, and global shape information can be extracted by stacking EdgeConv modules or applying them recurrently. The edge discriminator 22 is configured to take the features of a pair of key points as input and judge whether the two key points belong to the same target object. The macro-node discriminator 23 is configured to judge whether the key points inside a macro node all belong to the same target object.
上述方法既可以用于在互联网视频中,对目标对象关键点的位置进行准确预测;又可以用于分析目标对象的行为种类;还可以用于,在目标对象的不同部位增加实时特效。上述方法中,基于分层的图聚类模块,实现了在线分层的图聚类算法,保留了目标对象的结构先验信息和目标对象全局信息。一方面,通过端到端的训练,更加关注聚类结果出现的错误,更加有效的进行特征学习,可以直接优化聚类结果,提升了聚类精度;另一方面,能够利用目标对象结构先验信息和全局特征信息,提升了关键点的聚类精度。
本领域技术人员可以理解,在实施方式的上述方法中,各步骤的撰写顺序并不意味 着严格的执行顺序而对实施过程构成任何限定,各步骤的执行顺序应当以其功能和可能的内在逻辑确定。
基于相同的构思,本公开实施例还提供了一种关键点检测装置,参见图3所示,为本公开实施例提供的关键点检测装置的架构示意图,包括获取模块301、第一生成模块302、第二生成模块303、处理模块304、确定模块305、训练模块306、行为类型确定模块307、以及特效生成模块308,其中:
获取模块301,配置为获取待检测图像;
第一生成模块302,配置为基于所述待检测图像,生成图像特征图和多个关键点热图;所述图像特征图用于表征所述待检测图像中各个目标对象的关键点之间的相对位置关系;每个所述关键点热图中包含所述待检测图像的一种类别的关键点,不同类别的关键点对应所述目标对象的不同部位;
第二生成模块303,配置为基于所述图像特征图和多个所述关键点热图,生成初始关键点图模型;所述初始关键点图模型中包含所述待检测图像中不同类别的关键点的信息以及连接边的信息,每个连接边为两个不同类别的关键点之间的边;
处理模块304,配置为对所述初始关键点图模型进行多次所述连接边的剪枝处理,直到处理后的关键点图模型中的多个关键点被聚类为多个簇,得到分别属于各个目标对象的关键点信息。
一种可能的实施方式中,所述关键点的信息包括位置信息、类别信息、以及像素特征信息;所述第二生成模块303,配置为根据以下步骤确定所述初始关键点图模型中各个关键点的信息:基于所述关键点热图,确定各个关键点的位置信息;基于每个所述关键点的位置信息,从所述图像特征图中提取所述关键点的像素特征信息,并基于所述关键点所属关键点热图的类别标签,确定所述关键点对应的类别信息。
一种可能的实施方式中,所述装置还包括:确定模块305,配置为针对所述初始关键点图模型中的每个所述关键点,基于所述关键点的信息和所述关键点图模型中与所述关键点之间存在连接边的其他关键点的信息,确定所述关键点的融合特征;所述处理模块304,在对所述初始关键点图模型进行多次所述连接边的剪枝处理的情况下,配置为:基于所述初始关键点图模型中包含的每个所述关键点的融合特征,对所述初始关键点图模型进行多次所述连接边的剪枝处理。
一种可能的实施方式中,所述处理模块304,在对所述初始关键点图模型进行多次所述连接边的剪枝处理,直到处理后的关键点图模型中的多个关键点被聚类为多个簇的情况下,配置为:针对当前关键点图模型执行第一处理过程:基于所述当前关键点图模型中的每条连接边对应的两个关键点的融合特征,对所述当前关键点图模型包括的多个关键点中相邻关键点进行同一目标对象的关键点聚类,得到至少一个宏节点;其中,所述宏节点中包括聚类后的多个相邻关键点;并基于每个所述宏节点中包括的关键点的融合特征,确定所述宏节点的融合特征;基于得到的至少一个所述宏节点和所述当前关键点图模型,对所述当前关键点图模型进行当前次所述连接边的剪枝处理,并得到当前次 剪枝处理后的关键点图模型;在执行完当前次的所述第一处理过程之后,将当前次剪枝处理后的关键点图模型作为当前关键点图模型,将当前次确定的所述宏节点以及所述宏节点的融合特征作为所述当前关键图模型中的关键点以及关键点的融合特征,并再次执行所述第一处理过程,直到处理后的关键点图模型中的多个关键点被聚类为多个簇。
一种可能的实施方式中,所述处理模块304,在基于所述当前关键点图模型中的每条连接边对应的两个关键点的融合特征,对所述当前关键点图模型包括的多个关键点中相邻关键点进行同一目标对象的关键点聚类,得到至少一个宏节点的情况下,配置为:基于所述连接边对应的两个关键点的融合特征,确定所述连接边的权重,所述权重表征所述连接边对应的两个关键点属于同一目标对象的概率;基于所述当前关键点图模型中包括的每条连接边的权重,对所述当前关键点图模型包括的多个关键点中相邻关键点进行同一目标对象的关键点聚类,得到至少一个宏节点。
一种可能的实施方式中,所述处理模块304,在基于得到的至少一个所述宏节点和所述当前关键点图模型,对所述当前关键点图模型进行当前次所述连接边的剪枝处理,并得到当前次剪枝处理后的关键点图模型的情况下,配置为:基于得到的至少一个所述宏节点和所述当前关键点图模型,确定待删减连接边,并从所述当前关键点图模型中将所述待删减连接边删除;将至少一个所述宏节点、和所述当前关键点图模型中除所述宏节点中包括的关键点之外的其他关键点作为剪枝处理后的关键点,将删除后剩余的连接边作为剪枝处理后的连接边,得到当前次剪枝处理后的关键点图模型。
一种可能的实施方式中,所述处理模块304,在基于得到的至少一个宏节点和所述当前关键点图模型,确定待删减连接边的情况下,配置为:基于至少一个宏节点中包括的每个关键点的类别信息、以及所述当前关键点图模型中除至少一个宏节点中包括的关键点之外的其他关键点的类别信息,确定所述待删减连接边。
一种可能的实施方式中,所述处理模块304,在基于至少一个宏节点中包括的每个关键点的类别信息、以及所述当前关键点图模型中除至少一个宏节点中包括的关键点之外的其他关键点的类别信息,确定所述待删减连接边的情况下,配置为:针对所述当前关键点图模型中的任一连接边,在该任一连接边对应的两个关键点为不同宏节点中的关键点,且该任一连接边对应的两个宏节点中存在类别信息相同的关键点的情况下,确定该任一连接边为所述待删减连接边;在该任一连接边对应的两个关键点为同一宏节点中的关键点的情况下,确定该任一连接边为所述待删减连接边;在该任一连接边对应的两个关键点中一个关键点为宏节点中的关键点、另一个关键点不是宏节点中的关键点,且该任一连接边对应的所述宏节点中存在与另一个关键点的类别信息相同的关键点的情况下,确定该任一连接边为所述待删减连接边。
一种可能的实施方式中,所述每个目标对象的关键点信息通过预先训练好的目标神经网络生成;其中,所述目标神经网络是由包括宏节点判别器的待训练神经网络训练得到,宏节点判别器用于判别每个宏节点中包括的多个关键点是否属于同一目标对象。
一种可能的实施方式中,所述装置还包括:训练模块306,配置为通过下述步骤对 所述待训练神经网络进行训练,得到预先训练好的目标神经网络:获取样本图像;基于所述样本图像,对包括所述宏节点判别器的待训练神经网络进行训练,得到预先训练好的目标神经网络。
一种可能的实施方式中,所述训练模块306,在基于所述样本图像,对包括所述宏节点判别器的待训练神经网络进行训练,得到预先训练好的目标神经网络的情况下,配置为:基于所述样本图像,对所述待训练神经网络进行训练,得到预测结果,所述预测结果包括所述宏节点判别器的检测结果、每个关键点的预测类别、以及每个关键点的预测位置信息;基于所述宏节点判别器的检测结果,确定第一损失值;以及基于所述每个关键点的预测类别、所述每个关键点的预测位置信息,和所述样本图像中携带的标注结果,确定第二损失值;其中,所述标注结果包括每个关键点属于对应目标对象的标注类别,以及每个关键点的标注位置信息;基于所述第一损失值和所述第二损失值,对所述待训练神经网络进行训练,得到预先训练好的目标神经网络。
一种可能的实施方式中,在得到所述待检测图像中的每个目标对象的关键点信息之后,还包括:行为类型确定模块307,配置为基于每个目标对象对应的所述关键点信息,确定该目标对象的行为类型。
一种可能的实施方式中,在得到所述待检测图像中的每个目标对象的关键点信息之后,还包括:特效生成模块308,配置为基于每个目标对象对应的所述关键点信息,确定该目标对象的至少一个目标部位的位置信息,并根据所述至少一个目标部位的位置信息,生成针对所述至少一个目标部位的特效信息。
在一些实施例中,本公开实施例提供的装置具有的功能或包含的模板可以用于执行上文方法实施例描述的方法,其实现的过程可以参照上文方法实施例的描述,为了简洁,这里不再赘述。
基于同一技术构思,本公开实施例还提供了一种电子设备。参照图4所示,为本公开实施例提供的电子设备的结构示意图,包括处理器401、存储器402、和总线403。其中,存储器402配置为存储执行指令,包括内存4021和外部存储器4022;这里的内存4021也称内存储器,配置为暂时存放处理器401中的运算数据,以及与硬盘等外部存储器4022交换的数据,处理器401通过内存4021与外部存储器4022进行数据交换,在电子设备400运行的情况下,处理器401与存储器402之间通过总线403通信,使得处理器401在执行以下指令:获取待检测图像;基于所述待检测图像,生成图像特征图和多个关键点热图;所述图像特征图用于表征所述待检测图像中各个目标对象之间的相对位置关系;每个所述关键点热图中包含所述待检测图像的一种类别的关键点,不同类别的关键点对应所述目标对象的不同部位;基于所述图像特征图和多个所述关键点热图,生成初始关键点图模型;所述初始关键点图模型中包含所述待检测图像中不同类别的关键点的信息以及连接边的信息,每个连接边为两个不同类别的关键点之间的边;对所述初始关键点图模型进行多次所述连接边的剪枝处理,直到处理后的关键点图模型中的多个关键点被聚类为多个簇,得到分别属于各个目标对象的关键点信息。此外,本公开实 施例还提供一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器运行时执行上述方法实施例中所述的关键点检测方法的步骤。
本公开实施例所提供的关键点检测方法的计算机程序产品,包括存储了程序代码的计算机可读存储介质,所述程序代码包括的指令可用于执行上述方法实施例中所述的关键点检测方法的步骤,可参见上述方法实施例,在此不再赘述。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统和装置的工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。在本公开所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,又例如,多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些通信接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。另外,在本公开各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个处理器可执行的非易失的计算机可读取存储介质中。基于这样的理解,本公开的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本公开各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。以上仅为本公开的具体实施方式,但本公开的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本公开揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本公开的保护范围之内。因此,本公开的保护范围应以权利要求的保护范围为准。
工业实用性
本公开基于生成的图像特征图和多个关键点热图,生成待检测图像对应的初始关键点图模型,由于初始关键点图模型中包括图像特征图和关键点热图中的信息,而图像特征图可以表征出待检测图像中不同目标对象之间的相对位置关系,从而可以对初始关键点图模型进行所处连接边的剪枝处理,得到各个目标对象的关键点信息,较准确地对不同目标对象的关键点进行区分,以提高关键点聚类的精准度。

Claims (29)

  1. 一种关键点检测方法,包括:
    获取待检测图像;
    基于所述待检测图像,生成图像特征图和多个关键点热图;所述图像特征图用于表征所述待检测图像中各个目标对象之间的相对位置关系;每个所述关键点热图中包含所述待检测图像的一种类别的关键点,不同类别的关键点对应所述目标对象的不同部位;
    基于所述图像特征图和多个所述关键点热图,生成初始关键点图模型;所述初始关键点图模型中包含所述待检测图像中不同类别的关键点的信息以及连接边的信息,每个连接边为两个不同类别的关键点之间的边;
    对所述初始关键点图模型进行多次所述连接边的剪枝处理,直到处理后的关键点图模型中的多个关键点被聚类为多个簇,得到分别属于各个目标对象的关键点信息。
  2. 根据权利要求1所述的方法,其中,所述关键点的信息包括位置信息、类别信息、以及像素特征信息;
    根据以下步骤确定所述初始关键点图模型中各个关键点的信息:
    基于所述关键点热图,确定各个关键点的位置信息;
    基于每个所述关键点的位置信息,从所述图像特征图中提取所述关键点的像素特征信息,并基于所述关键点所属关键点热图的类别标签,确定所述关键点对应的类别信息。
  3. 根据权利要求1所述的方法,其中,所述方法还包括:
    针对所述初始关键点图模型中的每个所述关键点,基于所述关键点的信息和所述关键点图模型中与所述关键点之间存在连接边的其他关键点的信息,确定所述关键点的融合特征;
    所述对所述初始关键点图模型进行多次所述连接边的剪枝处理,包括:
    基于所述初始关键点图模型中包含的每个所述关键点的融合特征,对所述初始关键点图模型进行多次所述连接边的剪枝处理。
  4. 根据权利要求1至3任一所述的方法,其中,所述对所述初始关键点图模型进行多次所述连接边的剪枝处理,直到处理后的关键点图模型中的多个关键点被聚类为多个簇,包括:
    针对当前关键点图模型执行第一处理过程:
    基于所述当前关键点图模型中的每条连接边对应的两个关键点的融合特征,对所述当前关键点图模型包括的多个关键点中相邻关键点进行同一目标对象的关键点聚类,得到至少一个宏节点;其中,所述宏节点中包括聚类后的多个相邻关键点;并 基于每个所述宏节点中包括的关键点的融合特征,确定所述宏节点的融合特征;
    基于得到的至少一个所述宏节点和所述当前关键点图模型,对所述当前关键点图模型进行当前次所述连接边的剪枝处理,并得到当前次剪枝处理后的关键点图模型;
    在执行完当前次的所述第一处理过程之后,将当前次剪枝处理后的关键点图模型作为当前关键点图模型,将当前次确定的所述宏节点以及所述宏节点的融合特征作为所述当前关键图模型中的关键点以及关键点的融合特征,并再次执行所述第一处理过程,直到处理后的关键点图模型中的多个关键点被聚类为多个簇。
  5. 根据权利要求4所述的方法,其中,所述基于所述当前关键点图模型中的每条连接边对应的两个关键点的融合特征,对所述当前关键点图模型包括的多个关键点中相邻关键点进行同一目标对象的关键点聚类,得到至少一个宏节点,包括:
    基于所述连接边对应的两个关键点的融合特征,确定所述连接边的权重,所述权重表征所述连接边对应的两个关键点属于同一目标对象的概率;
    基于所述当前关键点图模型中包括的每条连接边的权重,对所述当前关键点图模型包括的多个关键点中相邻关键点进行同一目标对象的关键点聚类,得到至少一个宏节点。
  6. 根据权利要求4所述的方法,其中,所述基于得到的至少一个所述宏节点和所述当前关键点图模型,对所述当前关键点图模型进行当前次所述连接边的剪枝处理,并得到当前次剪枝处理后的关键点图模型,包括:
    基于得到的至少一个所述宏节点和所述当前关键点图模型,确定待删减连接边,并从所述当前关键点图模型中将所述待删减连接边删除;
    将至少一个所述宏节点、和所述当前关键点图模型中除所述宏节点中包括的关键点之外的其他关键点作为剪枝处理后的关键点,将删除后剩余的连接边作为剪枝处理后的连接边,得到当前次剪枝处理后的关键点图模型。
  7. 根据权利要求6所述的方法,其中,所述基于得到的至少一个宏节点和所述当前关键点图模型,确定待删减连接边,包括:
    基于至少一个宏节点中包括的每个关键点的类别信息、以及所述当前关键点图模型中除至少一个宏节点中包括的关键点之外的其他关键点的类别信息,确定所述待删减连接边。
  8. 根据权利要求7所述的方法,其中,基于至少一个宏节点中包括的每个关键点的类别信息、以及所述当前关键点图模型中除至少一个宏节点中包括的关键点之外的其他关键点的类别信息,确定所述待删减连接边,包括:
    针对所述当前关键点图模型中的任一连接边,在该任一连接边对应的两个关键点为不同宏节点中的关键点,且该任一连接边对应的两个宏节点中存在类别信息相同的关键点的情况下,确定该任一连接边为所述待删减连接边;
    在该任一连接边对应的两个关键点为同一宏节点中的关键点的情况下,确定该任一连接边为所述待删减连接边;
    在该任一连接边对应的两个关键点中一个关键点为宏节点中的关键点、另一个关键点不是宏节点中的关键点,且该任一连接边对应的所述宏节点中存在与另一个关键点的类别信息相同的关键点的情况下,确定该任一连接边为所述待删减连接边。
  9. 根据权利要求1至8任一所述的方法,其中,所述每个目标对象的关键点信息通过预先训练好的目标神经网络生成;其中,所述目标神经网络是由包括宏节点判别器的待训练神经网络训练得到的,所述宏节点判别器用于判别每个宏节点中包括的多个关键点是否属于同一目标对象。
  10. 根据权利要求9所述的方法,其中,通过下述步骤对所述待训练神经网络进行训练,得到预先训练好的目标神经网络:
    获取样本图像;
    基于所述样本图像,对包括所述宏节点判别器的待训练神经网络进行训练,得到预先训练好的目标神经网络。
  11. 根据权利要求10所述的方法,其中,基于所述样本图像,对包括所述宏节点判别器的待训练神经网络进行训练,得到预先训练好的目标神经网络,包括:
    基于所述样本图像,对所述待训练神经网络进行训练,得到预测结果,所述预测结果包括所述宏节点判别器的检测结果、每个关键点的预测类别、以及每个关键点的预测位置信息;
    基于所述宏节点判别器的检测结果,确定第一损失值;以及基于所述每个关键点的预测类别、所述每个关键点的预测位置信息,和所述样本图像中携带的标注结果,确定第二损失值;其中,所述标注结果包括每个关键点属于对应目标对象的标注类别,以及每个关键点的标注位置信息;
    基于所述第一损失值和所述第二损失值,对所述待训练神经网络进行训练,得到预先训练好的目标神经网络。
  12. 根据权利要求1至11任一所述的方法,其中,在得到所述待检测图像中的每个目标对象的关键点信息之后,还包括:
    基于每个目标对象对应的所述关键点信息,确定该目标对象的行为类型。
  13. 根据权利要求1至11任一所述的方法,其中,在得到所述待检测图像中的每个目标对象的关键点信息之后,还包括:
    基于每个目标对象对应的所述关键点信息,确定该目标对象的至少一个目标部位的位置信息,并根据所述至少一个目标部位的位置信息,生成针对所述至少一个目标部位的特效信息。
  14. 一种关键点检测装置,包括:
    获取模块,配置为获取待检测图像;
    第一生成模块,配置为基于所述待检测图像,生成图像特征图和多个关键点热图;所述图像特征图用于表征所述待检测图像中各个目标对象的关键点之间的相对位置关系;每个所述关键点热图中包含所述待检测图像的一种类别的关键点,不同类别 的关键点对应所述目标对象的不同部位;
    第二生成模块,配置为基于所述图像特征图和多个所述关键点热图,生成初始关键点图模型;所述初始关键点图模型中包含所述待检测图像中不同类别的关键点的信息以及连接边的信息,每个连接边为两个不同类别的关键点之间的边;
    处理模块,配置为对所述初始关键点图模型进行多次所述连接边的剪枝处理,直到处理后的关键点图模型中的多个关键点被聚类为多个簇,得到分别属于各个目标对象的关键点信息。
  15. 根据权利要求14所述的装置,其中,所述关键点的信息包括位置信息、类别信息、以及像素特征信息,所述第二生成模块,配置为根据以下步骤确定所述初始关键点图模型中各个关键点的信息:
    基于所述关键点热图,确定各个关键点的位置信息;
    基于每个所述关键点的位置信息,从所述图像特征图中提取所述关键点的像素特征信息,并基于所述关键点所属关键点热图的类别标签,确定所述关键点对应的类别信息。
  16. 根据权利要求14所述的装置,其中,所述装置还包括:
    确定模块,配置为针对所述初始关键点图模型中的每个所述关键点,基于所述关键点的信息和所述关键点图模型中与所述关键点之间存在连接边的其他关键点的信息,确定所述关键点的融合特征;
    所述处理模块,在对所述初始关键点图模型进行多次所述连接边的剪枝处理的情况下,配置为:
    基于所述初始关键点图模型中包含的每个所述关键点的融合特征,对所述初始关键点图模型进行多次所述连接边的剪枝处理。
  17. 根据权利要求14至16任一所述的装置,其中,所述处理模块,在对所述初始关键点图模型进行多次所述连接边的剪枝处理,直到处理后的关键点图模型中的多个关键点被聚类为多个簇的情况下,配置为:
    针对当前关键点图模型执行第一处理过程:
    基于所述当前关键点图模型中的每条连接边对应的两个关键点的融合特征,对所述当前关键点图模型包括的多个关键点中相邻关键点进行同一目标对象的关键点聚类,得到至少一个宏节点;其中,所述宏节点中包括聚类后的多个相邻关键点;并基于每个所述宏节点中包括的关键点的融合特征,确定所述宏节点的融合特征;
    基于得到的至少一个所述宏节点和所述当前关键点图模型,对所述当前关键点图模型进行当前次所述连接边的剪枝处理,并得到当前次剪枝处理后的关键点图模型;
    在执行完当前次的所述第一处理过程之后,将当前次剪枝处理后的关键点图模型作为当前关键点图模型,将当前次确定的所述宏节点以及所述宏节点的融合特征作为所述当前关键图模型中的关键点以及关键点的融合特征,并再次执行所述第一处理过程,直到处理后的关键点图模型中的多个关键点被聚类为多个簇。
  18. 根据权利要求17所述的装置,其中,所述处理模块,在基于所述当前关键点 图模型中的每条连接边对应的两个关键点的融合特征,对所述当前关键点图模型包括的多个关键点中相邻关键点进行同一目标对象的关键点聚类,得到至少一个宏节点的情况下,配置为:
    基于所述连接边对应的两个关键点的融合特征,确定所述连接边的权重,所述权重表征所述连接边对应的两个关键点属于同一目标对象的概率;
    基于所述当前关键点图模型中包括的每条连接边的权重,对所述当前关键点图模型包括的多个关键点中相邻关键点进行同一目标对象的关键点聚类,得到至少一个宏节点。
  19. 根据权利要求17所述的装置,其中,所述处理模块,在基于得到的至少一个所述宏节点和所述当前关键点图模型,对所述当前关键点图模型进行当前次所述连接边的剪枝处理,并得到当前次剪枝处理后的关键点图模型的情况下,配置为:
    基于得到的至少一个所述宏节点和所述当前关键点图模型,确定待删减连接边,并从所述当前关键点图模型中将所述待删减连接边删除;
    将至少一个所述宏节点、和所述当前关键点图模型中除所述宏节点中包括的关键点之外的其他关键点作为剪枝处理后的关键点,将删除后剩余的连接边作为剪枝处理后的连接边,得到当前次剪枝处理后的关键点图模型。
  20. 根据权利要求19所述的装置,其中,所述处理模块,在基于得到的至少一个宏节点和所述当前关键点图模型,确定待删减连接边的情况下,配置为:
    基于至少一个宏节点中包括的每个关键点的类别信息、以及所述当前关键点图模型中除至少一个宏节点中包括的关键点之外的其他关键点的类别信息,确定所述待删减连接边。
  21. 根据权利要求20所述的装置,其中,所述处理模块,在基于至少一个宏节点中包括的每个关键点的类别信息、以及所述当前关键点图模型中除至少一个宏节点中包括的关键点之外的其他关键点的类别信息,确定所述待删减连接边的情况下,配置为:
    针对所述当前关键点图模型中的任一连接边,在该任一连接边对应的两个关键点为不同宏节点中的关键点,且该任一连接边对应的两个宏节点中存在类别信息相同的关键点的情况下,确定该任一连接边为所述待删减连接边;
    在该任一连接边对应的两个关键点为同一宏节点中的关键点的情况下,确定该任一连接边为所述待删减连接边;
    在该任一连接边对应的两个关键点中一个关键点为宏节点中的关键点、另一个关键点不是宏节点中的关键点,且该任一连接边对应的所述宏节点中存在与另一个关键点的类别信息相同的关键点的情况下,确定该任一连接边为所述待删减连接边。
  22. 根据权利要求14至21任一所述的装置,其中,所述每个目标对象的关键点信息通过预先训练好的目标神经网络生成;其中,所述目标神经网络是由包括宏节点判别器的待训练神经网络训练得到的,所述宏节点判别器用于判别每个宏节点中包括的多个关键点是否属于同一目标对象。
  23. 根据权利要求22所述的装置,其中,所述装置还包括:训练模块,配置为通过下述步骤对所述待训练神经网络进行训练,得到预先训练好的目标神经网络:
    获取样本图像;
    基于所述样本图像,对包括所述宏节点判别器的待训练神经网络进行训练,得到预先训练好的目标神经网络。
  24. 根据权利要求23所述的装置,其中,所述训练模块,在基于所述样本图像,对包括所述宏节点判别器的待训练神经网络进行训练,得到预先训练好的目标神经网络的情况下,配置为:
    基于所述样本图像,对所述待训练神经网络进行训练,得到预测结果,所述预测结果包括所述宏节点判别器的检测结果、每个关键点的预测类别、以及每个关键点的预测位置信息;
    基于所述宏节点判别器的检测结果,确定第一损失值;以及基于所述每个关键点的预测类别、所述每个关键点的预测位置信息,和所述样本图像中携带的标注结果,确定第二损失值;其中,所述标注结果包括每个关键点属于对应目标对象的标注类别,以及每个关键点的标注位置信息;
    基于所述第一损失值和所述第二损失值,对所述待训练神经网络进行训练,得到预先训练好的目标神经网络。
  25. 根据权利要求14至24任一所述的装置,其中,在得到所述待检测图像中的每个目标对象的关键点信息之后,还包括:
    行为类型确定模块,配置为基于每个目标对象对应的所述关键点信息,确定该目标对象的行为类型。
  26. 根据权利要求14至24任一所述的装置,其中,在得到所述待检测图像中的每个目标对象的关键点信息之后,还包括:
    特效生成模块,配置为基于每个目标对象对应的所述关键点信息,确定该目标对象的至少一个目标部位的位置信息,并根据所述至少一个目标部位的位置信息,生成针对所述至少一个目标部位的特效信息。
  27. 一种电子设备,包括:处理器、存储器和总线,所述存储器存储有所述处理器可执行的机器可读指令,当电子设备运行时,所述处理器与所述存储器之间通过总线通信,所述机器可读指令被所述处理器执行时执行如权利要求1至13任一所述的关键点检测方法的步骤。
  28. 一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器运行时执行如权利要求1至13任一所述的关键点检测方法的步骤。
  29. 一种计算机程序产品,包括计算机可读代码,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行用于实现权利要求1至13任一项所述的关键点检测方法的步骤。
PCT/CN2021/076467 2020-06-30 2021-02-10 关键点检测方法、装置、电子设备及存储介质 WO2022001123A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2021565761A JP7182021B2 (ja) 2020-06-30 2021-02-10 キーポイント検出方法、キーポイント検出装置、電子機器及び記憶媒体

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010622135.7 2020-06-30
CN202010622135.7A CN111898642B (zh) 2020-06-30 2020-06-30 关键点检测方法、装置、电子设备及存储介质

Publications (1)

Publication Number Publication Date
WO2022001123A1 true WO2022001123A1 (zh) 2022-01-06

Family

ID=73191965

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/076467 WO2022001123A1 (zh) 2020-06-30 2021-02-10 关键点检测方法、装置、电子设备及存储介质

Country Status (4)

Country Link
JP (1) JP7182021B2 (zh)
CN (1) CN111898642B (zh)
TW (1) TWI766618B (zh)
WO (1) WO2022001123A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115019136A (zh) * 2022-08-05 2022-09-06 山东圣点世纪科技有限公司 抗边界点漂移的目标关键点检测模型训练方法及检测方法

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898642B (zh) * 2020-06-30 2021-08-13 北京市商汤科技开发有限公司 关键点检测方法、装置、电子设备及存储介质
CN111783882B (zh) * 2020-06-30 2022-09-09 北京市商汤科技开发有限公司 关键点检测方法、装置、电子设备及存储介质
CN112465006B (zh) * 2020-11-24 2022-08-05 中国人民解放军海军航空大学 一种图神经网络目标跟踪方法及装置
CN112561054B (zh) * 2020-12-03 2023-03-31 中国科学院光电技术研究所 一种基于批量特征热图的神经网络滤波器剪枝方法
CN112580652B (zh) * 2020-12-24 2024-04-09 咪咕文化科技有限公司 虚拟装饰方法、装置、电子设备及存储介质
CN112598070B (zh) * 2020-12-25 2023-07-28 创新奇智(广州)科技有限公司 目标检测方法、装置、电子设备及存储介质
CN113408568B (zh) * 2021-04-16 2024-04-16 科大讯飞股份有限公司 对象关键点的检测模型训练的相关方法、装置、设备
CN113850245A (zh) * 2021-11-30 2021-12-28 北京市商汤科技开发有限公司 图像处理方法、装置、电子设备及存储介质
CN114372477B (zh) 2022-03-21 2022-06-10 北京百度网讯科技有限公司 文本识别模型的训练方法、文本识别方法及装置

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532873A (zh) * 2019-07-24 2019-12-03 西安交通大学 一种联合人体检测与姿态估计的深度网络学习方法
CN111783882A (zh) * 2020-06-30 2020-10-16 北京市商汤科技开发有限公司 关键点检测方法、装置、电子设备及存储介质
CN111898642A (zh) * 2020-06-30 2020-11-06 北京市商汤科技开发有限公司 关键点检测方法、装置、电子设备及存储介质

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200919210A (en) * 2007-07-18 2009-05-01 Steven Kays Adaptive electronic design
CN105893920B (zh) * 2015-01-26 2019-12-27 阿里巴巴集团控股有限公司 一种人脸活体检测方法和装置
CN108985259B (zh) * 2018-08-03 2022-03-18 百度在线网络技术(北京)有限公司 人体动作识别方法和装置
WO2020046831A1 (en) * 2018-08-27 2020-03-05 TalkMeUp Interactive artificial intelligence analytical system
US11238612B2 (en) * 2018-08-28 2022-02-01 Beijing Jingdong Shangke Information Technology Co., Ltd. Device and method of tracking poses of multiple objects based on single-object pose estimator
US10643085B1 (en) * 2019-01-30 2020-05-05 StradVision, Inc. Method and device for estimating height and weight of passengers using body part length and face information based on human's status recognition
CN110020633B (zh) * 2019-04-12 2022-11-04 腾讯科技(深圳)有限公司 姿态识别模型的训练方法、图像识别方法及装置
CN111160085A (zh) * 2019-11-19 2020-05-15 天津中科智能识别产业技术研究院有限公司 一种人体图像关键点姿态估计方法
CN111339903B (zh) * 2020-02-21 2022-02-08 河北工业大学 一种多人人体姿态估计方法
CN111341438B (zh) * 2020-02-25 2023-04-28 中国科学技术大学 图像处理方法、装置、电子设备及介质

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532873A (zh) * 2019-07-24 2019-12-03 西安交通大学 一种联合人体检测与姿态估计的深度网络学习方法
CN111783882A (zh) * 2020-06-30 2020-10-16 北京市商汤科技开发有限公司 关键点检测方法、装置、电子设备及存储介质
CN111898642A (zh) * 2020-06-30 2020-11-06 北京市商汤科技开发有限公司 关键点检测方法、装置、电子设备及存储介质

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ALEJANDRO NEWELL, HUANG ZHIAO, DENG JIA: "Associative Embedding: End-to-End Learning for Joint Detection and Grouping", 9 June 2017 (2017-06-09), pages 1 - 11, XP055611461, Retrieved from the Internet <URL:https://arxiv.org/pdf/1611.05424.pdf> *
ANDREA VEDALDI, HORST BISCHOF, THOMAS BROX, JAN-MICHAEL FRAHM (EDS.): "Computer vision - ECCV 2020 : 16th European conference, Glasgow, UK, August 23-28, 2020 : proceedings; Part of the Lecture Notes in Computer Science ; ISSN 0302-9743", vol. 42, 1 January 1900, SPRINGER INTERNATIONAL PUBLISHING, Cham, ISBN: 978-3-030-58594-5, article JIN SHENG; LIU WENTAO; XIE ENZE; WANG WENHAI; QIAN CHEN; OUYANG WANLI; LUO PING: "Differentiable Hierarchical Graph Grouping for Multi-person Pose Estimation", pages: 718 - 734, XP047568932, DOI: 10.1007/978-3-030-58571-6_42 *
WANG YUE YUEWANG@CSAIL.MIT.EDU; SUN YONGBIN YB_SUN@MIT.EDU; LIU ZIWEI ZWLIU.HUST@GMAIL.COM; SARMA SANJAY E. SESARMA@MIT.EDU; BRONS: "Dynamic Graph CNN for Learning on Point Clouds", ACM TRANSACTIONS ON GRAPHICS, ACM, NY, US, vol. 38, no. 5, 10 October 2019 (2019-10-10), US , pages 1 - 12, XP058475830, ISSN: 0730-0301, DOI: 10.1145/3326362 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115019136A (zh) * 2022-08-05 2022-09-06 山东圣点世纪科技有限公司 抗边界点漂移的目标关键点检测模型训练方法及检测方法

Also Published As

Publication number Publication date
CN111898642A (zh) 2020-11-06
CN111898642B (zh) 2021-08-13
JP7182021B2 (ja) 2022-12-01
JP2022543954A (ja) 2022-10-17
TWI766618B (zh) 2022-06-01
TW202203212A (zh) 2022-01-16

Similar Documents

Publication Publication Date Title
WO2022001123A1 (zh) 关键点检测方法、装置、电子设备及存储介质
Goodfellow et al. Multi-digit number recognition from street view imagery using deep convolutional neural networks
CN109002834B (zh) 基于多模态表征的细粒度图像分类方法
CN110033018B (zh) 图形相似度判断方法、装置及计算机可读存储介质
TWI774271B (zh) 關鍵點檢測方法、電子設備及電腦可讀儲存介質
US10949653B2 (en) Intelligent persona generation
CN108229347A (zh) 用于人识别的拟吉布斯结构采样的深层置换的方法和装置
Ajagbe et al. Investigating the efficiency of deep learning models in bioinspired object detection
CN113673244B (zh) 医疗文本处理方法、装置、计算机设备和存储介质
CN108875456A (zh) 目标检测方法、目标检测装置和计算机可读存储介质
Wang et al. MOL: Towards accurate weakly supervised remote sensing object detection via Multi-view nOisy Learning
Defriani et al. Recognition of Regional Traditional House in Indonesia Using Convolutional Neural Network (CNN) Method
CN111898528B (zh) 数据处理方法、装置、计算机可读介质及电子设备
CN117457192A (zh) 智能远程诊断方法及系统
CN116884045A (zh) 身份识别方法、装置、计算机设备和存储介质
CN111914772A (zh) 识别年龄的方法、年龄识别模型的训练方法和装置
CN104200222B (zh) 一种基于因子图模型的图片中对象识别方法
CN116630062A (zh) 一种医保欺诈行为检测方法、系统、存储介质
CN116958724A (zh) 一种产品分类模型的训练方法和相关装置
US11836223B2 (en) Systems and methods for automated detection of building footprints
Ke et al. Human attribute recognition method based on pose estimation and multiple-feature fusion
CN115050066A (zh) 人脸伪造检测方法、装置、终端及存储介质
CN113627522A (zh) 基于关系网络的图像分类方法、装置、设备及存储介质
Anggoro et al. Classification of Solo Batik patterns using deep learning convolutional neural networks algorithm
Ingale et al. Deep Learning for Crowd Image Classification for Images Captured Under Varying Climatic and Lighting Condition

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021565761

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21832886

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21832886

Country of ref document: EP

Kind code of ref document: A1